Webspecia
Developer Reference

AI Visibility Platform
Technical Architecture

Complete developer reference: full database schema, all API routes, data flow, scoring algorithms, caching strategy, authentication, external integrations, and design decisions. Sufficient for an AI or developer to fully understand the product from scratch.

Product Overview

What This Product Does

AI Visibility is a SaaS platform that tracks whether a business is mentioned in AI assistant responses (ChatGPT, Google AIO, Gemini). Users register their domain, the platform runs AI queries on their behalf, parses responses for brand mentions, and builds historical trend data over time. The core value proposition is giving SMBs the same visibility into AI search that SEO tools gave them for Google search.

Product Type

B2B SaaS, PLG trial → paid subscription

Primary User

SMB owner / digital marketing manager

Core Loop

Scan → View results → Take action → Re-scan

Free Tier

Public scan (no auth), 5 queries, ChatGPT only

Paid Tier

Workspace scans, 20 prompts, AIO + ChatGPT, history

Revenue Model

Razorpay subscription (₹999/mo Pro plan)

Technology

Tech Stack

Framework

  • Next.js 14 App Router
  • React Server Components
  • TypeScript

Database

  • PostgreSQL (Neon serverless)
  • Prisma ORM
  • Prisma db push for schema changes (no migration files; prisma migrate dev needs an interactive TTY)

Auth

  • Clerk (JWT-based)
  • clerkMiddleware
  • auth() / currentUser() server helpers

AI / LLM

  • OpenAI GPT-4o (main engine)
  • Google AI Overviews via DataForSEO SERP API

External APIs

  • DataForSEO (SERP, backlinks, keywords)
  • Google Search Console OAuth
  • Razorpay (billing)

Deployment

  • Vercel (serverless functions)
  • maxDuration: 120s on scan routes

Database

Full Prisma Schema

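PostgreSQL via Neon. All models use cuid() primary keys. Cascade deletes are set on all child relations, although most relation fields (and the GSCConnection model used by the GSC routes) are left out of the listing below. There are no migration files: schema changes are applied with npx prisma db push because prisma migrate dev needs an interactive TTY.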

// ─── Workspace ────────────────────────────────────────────────────────────────
// One per user per domain. Container for all tracking data.
model Workspace {
  id            String         @id @default(cuid())
  clerkUserId   String                              // Clerk user ID
  domain        String                              // bare domain e.g. "abc.com"
  displayName   String?
  brandName     String?                             // overridable brand name for AI tracking
  brandPrimaryColor  String?                        // hex e.g. "#4f46e5" (from scan)
  brandBlogUrl       String?                        // blog URL override
  businessAbout      String?   @db.Text
  businessType       String?                        // "service"|"product"|"hybrid"
  headquarters       String?
  locationsServed    String?
  servicesOffered    String?   @db.Text
  establishedYear    Int?
  locationCode       Int?                           // DataForSEO location_code e.g. 2356
  locationName       String?                        // e.g. "Delhi,Delhi,India"
  // ── Brand Profile (confirmed during Brand Setup Wizard) ──────────────────────
  categories         String[]  @default([])         // e.g. ["Audiology","Hearing Aids"]
  primaryCategory    String?                        // used in prompt templates
  operatingCities    String[]  @default([])
  primaryCity        String?                        // used in geo-scoped prompts
  brandAliases       String[]  @default([])         // name variants for mention detection
  timezone      String   @default("UTC")
  createdAt     DateTime       @default(now())
  updatedAt     DateTime       @updatedAt
  @@unique([clerkUserId, domain])
  @@index([clerkUserId])
}

// ─── Topic / Prompt hierarchy ────────────────────────────────────────────────
// Topics are categories (e.g. "Brand Awareness"), Prompts are individual queries.
model Topic {
  id          String    @id @default(cuid())
  workspaceId String
  name        String
  sortOrder   Int       @default(0)
  isActive    Boolean   @default(true)
  createdAt   DateTime  @default(now())
  prompts     Prompt[]
  @@index([workspaceId])
}

model Prompt {
  id        String    @id @default(cuid())
  topicId   String
  topic     Topic     @relation(fields: [topicId], references: [id], onDelete: Cascade)
  text      String
  intent    String?   // "informational"|"navigational"|"transactional"|"commercial"
  isActive  Boolean   @default(true)
  sortOrder Int       @default(0)
  createdAt DateTime  @default(now())
  @@index([topicId])
}

// ─── ScanRun ─────────────────────────────────────────────────────────────────
// One row per scan execution. Groups all PromptResults from that run.
model ScanRun {
  id               String    @id @default(cuid())
  workspaceId      String
  localDate        String    // "YYYY-MM-DD" in workspace timezone
  runAt            DateTime  @default(now())
  promptsRan       Int       @default(0)
  inputTokens      Int       @default(0)
  outputTokens     Int       @default(0)
  estimatedCostUsd Float     @default(0)
  @@index([workspaceId, localDate])
}

// ─── PromptResult ────────────────────────────────────────────────────────────
// Mutable daily cache. One row per (prompt × engine × scanRun).
// Unique key: (promptId, engine, scanRunId).
model PromptResult {
  id          String   @id @default(cuid())
  promptId    String
  scanRunId   String?
  engine      String   @default("chatgpt")   // "chatgpt" | "google-aio"
  rawResponse String   @db.Text
  mentioned   Boolean  @default(false)
  rank        Int?                             // 1-indexed position in numbered list
  sentiment   String?  // "positive"|"neutral"|"negative"
  citations   Json     @default("[]")          // [{ name: string, url: string }]
  competitors Json     @default("[]")          // ["rival.com", ...]
  mentionTypes  String[] @default([])          // see mention-detection.ts
  mentionScore  Float    @default(0)
  aioPresent    Boolean?                       // google-aio only: did AIO block appear?
  cachedDate  String                           // "YYYY-MM-DD"
  runAt       DateTime @default(now())
  @@unique([promptId, engine, scanRunId])
}

// ─── PromptRankHistory ────────────────────────────────────────────────────────
// Immutable append-only ledger. Written on every scan. Powers trend charts.
model PromptRankHistory {
  id                  String   @id @default(cuid())
  promptId            String
  workspaceId         String
  scannedAt           DateTime @default(now())
  status              String   // "visible" | "not_found"
  position            Int?
  sentiment           String?
  competitorCount     Int      @default(0)
  llmResponseSnapshot String?  @db.Text
  @@index([promptId, scannedAt])
  @@index([workspaceId, scannedAt])
}

// ─── ScanCitation ────────────────────────────────────────────────────────────
// One row per URL per prompt per day. Unique key prevents duplicate writes.
model ScanCitation {
  id              String   @id @default(cuid())
  workspaceId     String
  promptId        String
  promptText      String?  @db.Text
  url             String   @db.Text
  domain          String
  citationName    String?
  citationType    String   // "news_media"|"review_platform"|"directory"|"third_party_blog"
                           // |"aggregator"|"social"|"ecommerce"|"owned"|"unknown"
  isOwnedByEntity Boolean  @default(false)
  scannedAt       DateTime @default(now())
  cachedDate      String
  @@unique([workspaceId, promptId, url, cachedDate])
}

// ─── EntityCitationDomain ─────────────────────────────────────────────────────
// Per-domain aggregation. Upserted (not incremented) after every scan.
model EntityCitationDomain {
  id              String   @id @default(cuid())
  workspaceId     String
  domain          String
  citationType    String
  isOwnedByEntity Boolean  @default(false)
  timesCited      Int      @default(0)
  firstSeen       DateTime @default(now())
  lastSeen        DateTime @default(now())
  promptsList     String[] // unique prompt texts that triggered this citation
  @@unique([workspaceId, domain])
}

// ─── CitationAnalysis ────────────────────────────────────────────────────────
// Entity-level summary. One row per workspace. Upserted after every scan.
model CitationAnalysis {
  id             String   @id @default(cuid())
  workspaceId    String   @unique
  diversityScore Float    @default(0)   // Σ(typeWeight per unique domain)
  totalCitations Int      @default(0)
  uniqueDomains  Int      @default(0)
  ownedCount     Int      @default(0)
  topDomainRatio Float    @default(0)   // highest single-domain share (0.0–1.0)
  flags          String[] // recommendation strings
  lastComputedAt DateTime @default(now())
}

// ─── Task ─────────────────────────────────────────────────────────────────────
// User-facing action items generated from scan analysis.
model Task {
  id               String   @id @default(cuid())
  workspaceId      String
  title            String
  description      String?  @db.Text
  category         String   // "schema"|"content"|"entity"|"citations"|"gbp"|"other"
  priority         String   // "high"|"medium"|"low"
  effort           String   // "quick"|"medium"|"large"
  status           String   @default("todo")  // "todo"|"in_progress"|"done"
  linkedPromptId   String?
  linkedPromptText String?
  expectedOutcome  String?
  createdBy        String   // Clerk userId
  createdAt        DateTime @default(now())
  updatedAt        DateTime @updatedAt
  completedAt      DateTime?
}

// ─── PinnedPrompt ─────────────────────────────────────────────────────────────
// User-curated watchlist. Unique per (workspaceId, promptId).
model PinnedPrompt {
  id          String   @id @default(cuid())
  workspaceId String
  promptId    String
  pinnedAt    DateTime @default(now())
  notes       String?
  @@unique([workspaceId, promptId])
}

// ─── Subscription ─────────────────────────────────────────────────────────────
// One row per Clerk user.
model Subscription {
  id                     String    @id @default(cuid())
  clerkUserId            String    @unique
  plan                   String    @default("free")    // "free"|"starter"|"pro"|"agency"
  status                 String    @default("active")  // "active"|"trialing"|"past_due"|"canceled"
  stripeCustomerId       String?   @unique
  stripeSubscriptionId   String?   @unique
  razorpaySubscriptionId String?   @unique
  currentPeriodEnd       DateTime?
  createdAt              DateTime  @default(now())
  updatedAt              DateTime  @updatedAt
}

// ─── BrandEntity ──────────────────────────────────────────────────────────────
// One row per known brand/domain. Aliases are pre-normalized (lowercase,
// alphanumeric only) for fast exact-match in AI responses.
model BrandEntity {
  id          String   @id @default(cuid())
  domain      String   @unique   // "widex.in"
  brandName   String             // "Widex India"
  aliases     String[]           // ["widexindia", "widex", "widex.in"]
  clerkUserId String?
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt
}

// ─── GlobalBrandMention ───────────────────────────────────────────────────────
// Cross-workspace. Every brand detected in ANY AI response is logged here.
// Enables pre-fill on signup, cross-mention emails, superadmin analytics.
model GlobalBrandMention {
  id              String   @id @default(cuid())
  mentionedDomain String?           // resolved domain
  mentionedName   String            // name as appeared in AI text
  promptText      String   @db.Text
  promptIntent    String?
  triggerDomain   String            // whose scan produced this
  engine          String   @default("chatgpt")
  mentionTypes    String[]
  isDirectAnswer  Boolean  @default(false)
  listPosition    Int?
  citationUrls    String[]
  sentiment       String?
  mentionScore    Float    @default(0)
  detectionMethod String   @default("citation")  // "citation"|"own_scan"|"alias_match"
  scannedAt       DateTime @default(now())
  sector          String?
  categories      String[] @default([])
}

// ─── PromptResponseCache ──────────────────────────────────────────────────────
// Shared cross-workspace cache. Key = SHA-256(exact prompt sent to LLM + "|" + engine).
// 2-day TTL for workspace scans, 24h for free scans.
model PromptResponseCache {
  id           String   @id @default(cuid())
  promptHash   String
  engine       String
  rawResponse  String   @db.Text
  mentioned    Boolean  @default(false)
  rank         Int?
  sentiment    String?
  citations    Json     @default("[]")
  competitors  Json     @default("[]")
  inputTokens  Int      @default(0)
  outputTokens Int      @default(0)
  createdAt    DateTime @default(now())
  expiresAt    DateTime
  @@unique([promptHash, engine])
  @@index([expiresAt])
}

// ─── Scan (public/anonymous) ──────────────────────────────────────────────────
// Free scan initiated from the landing page — no auth required.
model Scan {
  id            String   @id @default(cuid())
  url           String
  normalizedUrl String
  domain        String   @default("")
  status        String   @default("pending")  // "pending"|"processing"|"completed"|"failed"
  createdAt     DateTime @default(now())
  updatedAt     DateTime @updatedAt
}

// ─── BusinessProfile ──────────────────────────────────────────────────────────
// Enriched business data attached to a free Scan. Populated from:
// scraping homepage, JSON-LD structured data, DataForSEO, and AI (Gemini/GPT).
model BusinessProfile {
  id            String   @id @default(cuid())
  scanId        String   @unique
  name          String?
  description   String?
  location      String?
  businessType  String   // "product"|"service"|"hybrid"
  confidence    Float
  blogUrl       String?
  sitemapUrl    String?
  pageTitle     String?
  faviconUrl    String?
  themeColor    String?
  socialLinks   Json     @default("{}")    // { facebook, instagram, linkedin, youtube, twitter }
  telephone     String?                   // from schema.org
  priceRange    String?                   // "$"|"$$"|"$$$"
  ratingValue   Float?
  reviewCount   Int?
  schemaType    String?                   // schema.org @type e.g. "MedicalBusiness"
  sameAs        Json     @default("[]")   // official profile URLs
  openingHours  Json     @default("[]")   // ["Mon-Fri 09:00-18:00", ...]
  liveResults   Json     @default("[]")   // [{query, mentioned, response, competitors, cachedAt, googleAio?: {...}}]
  products      Json     @default("[]")
  services      Json     @default("[]")
  cities        Json     @default("[]")
  country       String?
  gmbUrl        String?
  email         String?
  sector              String?
  primaryCategory     String?
  secondaryCategories Json     @default("[]")
  categoryConfidence  Float?
  aeoChecklist        Json     @default("[]")  // [{id, label, pass, impact, tip}]
  domainRank          Int?
  backlinksCount      Int?
  referringDomains    Int?
  organicKeywordsCount Int?
  topRankedKeywords   Json     @default("[]")  // [{keyword, position, volume, url, cpc}]
  confirmedCity       String?
  confirmedName       String?
  hasKnowledgePanel   Boolean  @default(false)
  inLocalPack         Boolean  @default(false)
  peopleAlsoAsk       Json     @default("[]")
  competitorInsights  Json     @default("[]")  // [{domain, domainRank, backlinksCount, mentionCount}]
  createdAt     DateTime @default(now())
}
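ScanRun.localDate is documented as a "YYYY-MM-DD" string in the workspace timezone (Workspace.timezone, default "UTC"). A minimal sketch of producing that string with the built-in Intl API follows; the helper name toLocalDate is hypothetical, and whether the cachedDate columns use the same timezone is an assumption this reference does not state.

```typescript
// Hypothetical helper: formats an instant as "YYYY-MM-DD" in an IANA timezone.
function toLocalDate(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant)
  // Pick fields by type so the result does not depend on the locale's field order
  const get = (type: string): string => parts.find(p => p.type === type)?.value ?? ""
  return `${get("year")}-${get("month")}-${get("day")}`
}

// 20:00 UTC is already the next calendar day in India (UTC+05:30)
console.log(toLocalDate(new Date("2024-03-10T20:00:00Z"), "UTC"))          // 2024-03-10
console.log(toLocalDate(new Date("2024-03-10T20:00:00Z"), "Asia/Kolkata")) // 2024-03-11
```

Because the date is computed per workspace timezone, two workspaces scanning at the same instant can land on different localDate values.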

API Reference

All API Routes

Public Routes (no middleware protection; the trial routes read the Clerk session themselves)

POST /api/scan

Start a free scan. Body: { url }. Scrapes homepage, runs AI enrichment, runs 5 live GPT-4o + DataForSEO AIO queries. Returns { scanId }.

GET /api/scan/[id]

Poll scan status. Returns { status, businessProfile } when complete.

POST /api/brand/[domain]/confirm-city

Save confirmedCity on BusinessProfile. Body: { city }. No auth — scoped by domain.

POST /api/brand/[domain]/confirm-name

Save confirmedName on BusinessProfile. Body: { name }.

POST /api/trial/start

Create a Subscription row with status=trialing, currentPeriodEnd=+7 days. Requires Clerk auth (reads userId from JWT).

GET /api/trial/status

Returns { status, plan, daysLeft } for the authenticated user. Returns free plan if no subscription found.

POST /api/billing/webhook

Razorpay subscription lifecycle webhook. Verifies the x-razorpay-signature HMAC. Handles activated/charged/halted/cancelled/pending events. Matched by the isPublicApi middleware bypass, so no Clerk auth is applied.

Workspace Routes (Clerk auth required)

GET /api/workspace/[domain]

Fetch full workspace data including topics, latest scan results, and citation analysis.

POST /api/workspace/[domain]

Create a workspace for the domain. Triggers a BusinessProfile fetch and BrandEntity registration.

POST /api/workspace/[domain]/setup

Save Brand Setup Wizard output: categories, primaryCategory, operatingCities, primaryCity, brandAliases. Syncs BrandEntity aliases (normalized, deduplicated).

POST /api/workspace/[domain]/topics/generate

AI-generate 3 topics + 3–5 prompts per topic. Uses primaryCategory + primaryCity from workspace. Returns created Topic and Prompt records.

POST /api/workspace/[domain]/topics/run

Run scan. For each active prompt: check PromptResponseCache, else call GPT-4o + Google AIO in parallel. Writes PromptResult, PromptRankHistory, ScanCitation, GlobalBrandMention. maxDuration: 120.

GET /api/workspace/[domain]/scan-runs

List all ScanRun records for workspace with cost and token summaries.

GET /api/workspace/[domain]/results/[runId]

All PromptResult rows for a specific ScanRun, all engines, ordered by engine asc.

GET /api/workspace/[domain]/citations

EntityCitationDomain + CitationAnalysis for workspace. Used by citation panel.

POST /api/workspace/[domain]/tasks

Create a Task. Body: { title, category, priority, effort, linkedPromptId, expectedOutcome }.

PATCH /api/workspace/[domain]/tasks/[taskId]

Update task status. Body: { status }. Sets completedAt timestamp when status=done.

POST /api/workspace/[domain]/prompts/pin

Pin or unpin a prompt for the watchlist. Body: { promptId }.

GET /api/workspace/[domain]/brand-mentions

Fetch GlobalBrandMention rows cross-referenced against this domain's AI responses. Used by AiPresencePanel.

Billing Routes (auth required)

POST /api/billing/create-subscription

Creates a Razorpay subscription using RAZORPAY_PLAN_ID_PRO_MONTHLY. Checks for existing active subscription first. Returns { subscriptionId, keyId }.

POST /api/billing/verify

Verifies HMAC of razorpay_payment_id + razorpay_subscription_id. On success: upserts Subscription with plan=pro, status=active, currentPeriodEnd=+31 days (optimistic).
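The verify step can be sketched with Node's crypto module. Razorpay documents the subscription checkout signature as a hex HMAC-SHA256 of razorpay_payment_id + "|" + razorpay_subscription_id, keyed with the API key secret. The function name verifySubscriptionPayment and how the secret is loaded are assumptions, not the route's actual code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Hypothetical helper: checks the razorpay_signature returned by Checkout
// for a subscription payment against the locally computed HMAC.
function verifySubscriptionPayment(
  paymentId: string,
  subscriptionId: string,
  signature: string,
  keySecret: string,
): boolean {
  const expected = createHmac("sha256", keySecret)
    .update(`${paymentId}|${subscriptionId}`)
    .digest("hex")
  const a = Buffer.from(expected, "utf8")
  const b = Buffer.from(signature, "utf8")
  // timingSafeEqual throws on unequal lengths, so compare lengths first
  return a.length === b.length && timingSafeEqual(a, b)
}
```

A constant-time comparison avoids leaking how many leading characters of a forged signature were correct.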

GSC Routes

GET /api/gsc/auth

Initiates Google OAuth flow for Google Search Console. Redirects to Google consent screen.

GET /api/gsc/callback

OAuth callback. Saves access_token, refresh_token, expiry to GSCConnection.

GET /api/gsc/data

Fetch top queries from GSC for a domain. Refreshes token if expired. Returns { queries: [{query, clicks, impressions, position}] }.

Core Pipeline

Workspace Scan Pipeline

POST /api/workspace/[domain]/topics/run — runs all active prompts for a workspace.

// For each active Prompt in the workspace (shown as a loop; both engines are queried in parallel per prompt):
for (const prompt of activePrompts) {
  const promptId = prompt.id

  // 1. Build the exact prompt string sent to the LLM
  const promptText = prompt.text + INSTRUCTION_SUFFIX
  // INSTRUCTION_SUFFIX: "Please mention specific companies, brands, or websites
  //   by name where relevant. Include URLs where available."

  // 2. Check PromptResponseCache (2-day TTL, key = SHA-256 of promptText + "|" + engine).
  //    In the route this lookup lives inside runPromptAgainstChatGPT(); it is inlined here.
  const promptHash = sha256(promptText + "|chatgpt")
  const cached = await prisma.promptResponseCache.findUnique({
    where: { promptHash_engine: { promptHash, engine: "chatgpt" } }
  })

  // 3a. Cache hit path: parse the stored response instead of calling OpenAI
  if (cached && cached.expiresAt > new Date()) {
    const result = parseFromCache(cached)
    // brand aliases are re-checked: workspace.brandAliases.some(a => lower.includes(a.toLowerCase()))
  }

  // 3b. Cache miss: call GPT-4o + Google AIO in parallel
  const [chatgptResult, aioResult] = await Promise.all([
    runPromptAgainstChatGPT(promptText, domain, brandName, competitors, brandAliases),
    runGoogleAIOQuery(promptText, domain, brandName)
  ])

  // 4. Write results (PromptResult is unique on promptId + engine + scanRunId)
  await Promise.all([
    prisma.promptResult.upsert({
      where:  { promptId_engine_scanRunId: { promptId, engine: "chatgpt", scanRunId } },
      create: { ...chatgptResult, promptId, scanRunId, engine: "chatgpt", cachedDate },
      update: { ...chatgptResult }
    }),
    prisma.promptResult.upsert({
      where:  { promptId_engine_scanRunId: { promptId, engine: "google-aio", scanRunId } },
      create: { ...aioResult, promptId, scanRunId, engine: "google-aio", aioPresent: aioResult.aioPresent, cachedDate },
      update: { ...aioResult }
    }),
    prisma.promptRankHistory.create({
      data: { promptId, workspaceId, status, position, sentiment, competitorCount,
              llmResponseSnapshot }
    }),
    processCitationPipeline({ promptId, promptText: prompt.text, workspaceId, workspaceDomain: domain, citations, cachedDate }),
    writeGlobalBrandMentions({ promptText, domain, competitors, engine: "chatgpt", /* ... */ })
  ])
}

// After all prompts: recompute CitationAnalysis
await recomputeCitationAnalysis(workspaceId)

Mention Detection Logic

// Three checks; any one match sets mentioned = true
const lower = response.toLowerCase()
const bareDomain = domain.replace(/^www\./, "")
const domainBase = bareDomain.split(".")[0]
const brandLower = brandName.toLowerCase()

const mentioned =
  lower.includes(bareDomain) ||                               // exact domain match
  (brandLower.length >= 3 && lower.includes(brandLower)) ||   // brand name substring (min 3 chars)
  new RegExp("\\b" + domainBase + "\\b", "i").test(lower)     // word-boundary domain base

// Brand alias check (from workspace.brandAliases)
const aliasHit = brandAliases.some(
  alias => alias.length >= 4 && lower.includes(alias.toLowerCase())
)

const finalMentioned = mentioned || aliasHit
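The checks above can be folded into one self-contained function. This is a sketch, not the code in mention-detection.ts: the names isBrandMentioned, BrandContext and escapeRegExp are hypothetical, it enforces the 3-character minimum on the brand name that the comment describes (an empty brand name would otherwise match everything, since "".includes is always true), and it escapes the domain base before building the regex.

```typescript
interface BrandContext {
  domain: string         // e.g. "www.widex.in"
  brandName: string      // e.g. "Widex India"
  brandAliases: string[] // e.g. ["hearwell"]
}

// Escape regex metacharacters so the domain base is matched literally
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Hypothetical helper: true if any of the four checks matches
function isBrandMentioned(response: string, ctx: BrandContext): boolean {
  const lower = response.toLowerCase()
  const bareDomain = ctx.domain.toLowerCase().replace(/^www\./, "")
  const domainBase = bareDomain.split(".")[0]
  const brandLower = ctx.brandName.trim().toLowerCase()

  const domainHit = bareDomain.length > 0 && lower.includes(bareDomain)
  const brandHit = brandLower.length >= 3 && lower.includes(brandLower)
  const baseHit =
    domainBase.length > 0 && new RegExp(`\\b${escapeRegExp(domainBase)}\\b`).test(lower)
  const aliasHit = ctx.brandAliases.some(
    a => a.length >= 4 && lower.includes(a.toLowerCase()),
  )
  return domainHit || brandHit || baseHit || aliasHit
}
```

The word boundary on the domain base is what keeps a short base such as "ear" (from ear.com) from matching inside "early".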

Rank Extraction Logic

// Split response by newline, counting numbered list items as they appear
let listCount = 0
let rank: number | null = null

for (const line of response.split("\n")) {
  if (/^\s*\d+[.)]/.test(line)) listCount++
  const lowerLine = line.toLowerCase()
  if (lowerLine.includes(brandLower) || lowerLine.includes(bareDomain)) {
    rank = listCount > 0 ? listCount : 1  // default to 1 if no numbered item precedes the mention
    break
  }
}
if (mentioned && rank === null) rank = 1  // e.g. matched only by the word-boundary domain check
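A self-contained version of the rank rule with a worked example. The function name extractRank and its parameter list are hypothetical; it follows the same rule: the rank is the number of numbered list items seen up to and including the first line that names the brand or its domain.

```typescript
// Hypothetical helper: 1-indexed rank of the first line naming the brand.
function extractRank(
  response: string,
  brandLower: string,
  bareDomain: string,
  mentioned: boolean,
): number | null {
  let listCount = 0
  for (const line of response.split("\n")) {
    // Lines such as "3. Widex" or "3) Widex" count as numbered list items
    if (/^\s*\d+[.)]/.test(line)) listCount++
    const lowerLine = line.toLowerCase()
    const hit =
      (brandLower.length > 0 && lowerLine.includes(brandLower)) ||
      (bareDomain.length > 0 && lowerLine.includes(bareDomain))
    if (hit) return listCount > 0 ? listCount : 1
  }
  // Mentioned by another check (e.g. the domain-base regex) but no line matched
  return mentioned ? 1 : null
}

const answer = [
  "Popular hearing aid brands in Delhi:",
  "1. Phonak",
  "2. Signia",
  "3. Widex India (widex.in)",
].join("\n")
console.log(extractRank(answer, "widex india", "widex.in", true)) // 3
```

Because the counter accumulates across the whole response, a second numbered list later in the answer would continue from the first list's count.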

Sentiment Analysis Logic

// 450-char context window around the first brand-name mention (150 chars before, 300 after).
// If the brand name itself is absent (idx === -1), the slice falls back to the first 299 chars.
const idx = lower.indexOf(brandLower)
const context = lower.slice(Math.max(0, idx - 150), idx + 300)

const POSITIVE = ["best", "excellent", "highly recommended", "top", "leading", "trusted",
                  "affordable", "effective", "quality", "award"]
const NEGATIVE = ["avoid", "scam", "poor", "unreliable", "bad reviews", "issues",
                  "complaints", "overpriced", "worst"]

// Sentiment is only set when the brand is mentioned; negative terms take precedence
const sentiment =
  !finalMentioned ? null
  : NEGATIVE.some(w => context.includes(w)) ? "negative"
  : POSITIVE.some(w => context.includes(w)) ? "positive"
  : "neutral"
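A self-contained sketch of the keyword sentiment rule. The function name classifySentiment is hypothetical, and one behavior is a deliberate choice of this sketch: it returns null when the brand name never appears, instead of scanning the start of the response.

```typescript
type Sentiment = "positive" | "neutral" | "negative"

const POSITIVE = ["best", "excellent", "highly recommended", "top", "leading", "trusted",
                  "affordable", "effective", "quality", "award"]
const NEGATIVE = ["avoid", "scam", "poor", "unreliable", "bad reviews", "issues",
                  "complaints", "overpriced", "worst"]

// Hypothetical helper: keyword sentiment in a 450-char window around the first brand mention
function classifySentiment(response: string, brandLower: string): Sentiment | null {
  const lower = response.toLowerCase()
  const idx = brandLower ? lower.indexOf(brandLower) : -1
  if (idx === -1) return null
  const context = lower.slice(Math.max(0, idx - 150), idx + 300)
  // Negative terms are checked first, so a window with both kinds resolves to "negative"
  if (NEGATIVE.some(w => context.includes(w))) return "negative"
  if (POSITIVE.some(w => context.includes(w))) return "positive"
  return "neutral"
}
```

Substring matching is coarse: "top" also matches inside "laptop" or "stop", so word-boundary checks would be a natural refinement.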

Algorithms

Scoring Algorithms

AEO Visibility Score (0–100)

Computed on the frontend from PromptResult rows belonging to a ScanRun. Accounts for both presence and sentiment quality of each mention.

// Score weight per mention quality
const WEIGHTS: Record<string, number> = {
  "visible+positive": 1.0,
  "visible+neutral":  0.8,
  "visible+negative": 0.5,
  "not_found":        0.0,
}

// Per ScanRun across all prompts; a missing or unrecognized sentiment counts as neutral
const weightedSum = results.reduce((acc, r) => {
  const key = r.mentioned ? `visible+${r.sentiment ?? "neutral"}` : "not_found"
  return acc + (WEIGHTS[key] ?? WEIGHTS["visible+neutral"])
}, 0)

// An empty run scores 0 instead of NaN
const aeoScore = results.length > 0 ? Math.round((weightedSum / results.length) * 100) : 0
// Example: 5 prompts, weights [1.0, 0.8, 0.5, 0.0, 0.0] → score = 46

Citation Diversity Score

// Weights per citation type
const TYPE_WEIGHTS = {
  news_media:       3.0,
  review_platform:  2.5,
  directory:        2.0,
  third_party_blog: 1.5,
  aggregator:       1.5,
  social:           1.0,
  ecommerce:        1.0,
  owned:            0.5,
  unknown:          0.3,
}

// Computed in recomputeCitationAnalysis()
// Each UNIQUE domain contributes exactly once (breadth rewarded, not repetition)
const diversityScore = entityCitationDomains.reduce(
  (acc, ecd) => acc + (TYPE_WEIGHTS[ecd.citationType] ?? 0.3),
  0
)
// Stored in CitationAnalysis.diversityScore (1 decimal place)
// No defined maximum. >15 = strong multi-channel authority, <3 = weak presence

Mention Score (per PromptResult)

// src/lib/mention-detection.ts
// mentionTypes: string[] — classification of how the brand was mentioned
// Possible values:
//   "direct_answer":      brand is the direct answer to the query
//   "featured_in_list":   brand appears in a numbered/bulleted list
//   "own_url_cited":      brand's own URL is cited
//   "third_party_cited":  third-party URL about the brand is cited
//   "comparison_mention": brand mentioned in comparison context

// mentionScore: weighted composite
const mentionScore = detectMentionTypes(response, domain, brandName)
  .reduce((acc, type) => acc + MENTION_TYPE_WEIGHTS[type], 0)

Citation Health Flags

// Written to CitationAnalysis.flags[] after every scan
const flags: string[] = []

if (ownedCount === 0)
  flags.push("LLMs never cite your site directly — critical issue")

if (ownedCount / totalCitations < 0.30)
  flags.push("Strengthen your own site content")

if (maxDomainCount / totalCitations > 0.60)
  flags.push("Over-reliant on one source — diversify")

Caching

Cache Architecture

PromptResponseCache

  • Shared cross-workspace PostgreSQL table
  • Key: SHA-256(promptText + "|" + engine)
  • 2-day TTL for workspace scans
  • 24-hour TTL for free scans
  • Covers both chatgpt and google-aio engines separately
  • expiresAt indexed for efficient cleanup

PromptResult table (daily cache)

  • Per-workspace, per-day mutable cache
  • Unique key: (promptId, engine, scanRunId)
  • Upserted on each scan run
  • Force-refresh: delete all rows for today, re-run
  • cachedDate field (YYYY-MM-DD) enables daily dedup
  • Separate from immutable PromptRankHistory

// Cache lookup in runPromptAgainstChatGPT()
import { createHash } from "crypto"

const promptHash = createHash("sha256").update(fullPrompt + "|chatgpt").digest("hex")
const cached = await prisma.promptResponseCache.findUnique({
  where: { promptHash_engine: { promptHash, engine: "chatgpt" } }
})

if (cached && cached.expiresAt > new Date()) {
  return parseFromCache(cached)  // alias hits are still re-checked against the current aliases
}

// Cache miss → call OpenAI
const completion = await openai.chat.completions.create({ model: "gpt-4o", temperature: 0.3, max_tokens: 900, ... })
const rawResponse = completion.choices[0]?.message?.content ?? ""

// Upsert cache
await prisma.promptResponseCache.upsert({
  where:  { promptHash_engine: { promptHash, engine: "chatgpt" } },
  create: { promptHash, engine: "chatgpt", rawResponse, expiresAt, ...parsed },
  update: { rawResponse, expiresAt, ...parsed },
})
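The key and TTL rules from the cache design can be isolated as pure functions. This is a sketch under stated assumptions: the names promptCacheKey, cacheExpiresAt and isFresh, and the ScanKind switch between the 2-day and 24-hour TTLs, are hypothetical; the reference only states the two TTL values and the SHA-256(prompt + "|" + engine) key.

```typescript
import { createHash } from "node:crypto"

type Engine = "chatgpt" | "google-aio"
type ScanKind = "workspace" | "free"

// TTLs from the cache design: 2 days for workspace scans, 24 hours for free scans
const TTL_MS: Record<ScanKind, number> = {
  workspace: 2 * 24 * 60 * 60 * 1000,
  free: 24 * 60 * 60 * 1000,
}

// Key = hex SHA-256 of the exact prompt sent to the LLM + "|" + engine
function promptCacheKey(fullPrompt: string, engine: Engine): string {
  return createHash("sha256").update(`${fullPrompt}|${engine}`).digest("hex")
}

function cacheExpiresAt(kind: ScanKind, now: Date = new Date()): Date {
  return new Date(now.getTime() + TTL_MS[kind])
}

// A row is reusable only while expiresAt is strictly in the future
function isFresh(expiresAt: Date, now: Date = new Date()): boolean {
  return expiresAt > now
}
```

Since the engine is part of the hashed string, the chatgpt and google-aio entries for the same prompt never collide, which is what lets one table serve both engines.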

Citation System

Citation Pipeline

// processCitationPipeline(): runs after every prompt scan
async function processCitationPipeline({ promptId, promptText, workspaceId, workspaceDomain, citations, cachedDate }) {

  // Stage 1: Write ScanCitation rows (skipDuplicates makes force-refresh idempotent)
  await prisma.scanCitation.createMany({
    data: citations.map(c => {
      const domain = extractDomain(c.url)
      return {
        workspaceId, promptId, promptText, url: c.url, domain,
        citationName:    c.name,
        citationType:    classifyCitationType(domain),
        isOwnedByEntity: domain === workspaceDomain,
        cachedDate,
      }
    }),
    skipDuplicates: true,  // unique on (workspaceId, promptId, url, cachedDate)
  })
}

// Stage 2: recomputeCitationAnalysis(), runs after ALL prompts complete
async function recomputeCitationAnalysis(workspaceId) {
  const allCitations = await prisma.scanCitation.findMany({ where: { workspaceId } })

  // Group by domain, compute per-domain stats (promptText is nullable, so nulls are dropped)
  const domainMap = groupBy(allCitations, c => c.domain)
  await Promise.all(
    Object.entries(domainMap).map(([domain, rows]) => {
      const promptsList = [...new Set(rows.map(r => r.promptText).filter(Boolean))]
      return prisma.entityCitationDomain.upsert({
        where:  { workspaceId_domain: { workspaceId, domain } },
        create: { workspaceId, domain, citationType: rows[0].citationType,
                  isOwnedByEntity: rows[0].isOwnedByEntity, timesCited: rows.length, promptsList },
        update: { timesCited: rows.length, lastSeen: new Date(), promptsList },
      })
    })
  )

  // Compute entity-level diversity score and flags from the refreshed per-domain rows
  const entityDomains = await prisma.entityCitationDomain.findMany({ where: { workspaceId } })
  const diversityScore = entityDomains.reduce((acc, ecd) => acc + (TYPE_WEIGHTS[ecd.citationType] ?? 0.3), 0)
  await prisma.citationAnalysis.upsert({ ... })
}

Citation Type Classification

// Priority-ordered lookup (first match wins)
const SOCIAL_DOMAINS    = new Set(["linkedin.com", "twitter.com", "instagram.com", "youtube.com", ...])
const REVIEW_DOMAINS    = new Set(["trustpilot.com", "g2.com", "yelp.com", "practo.com", ...])
const DIRECTORY_DOMAINS = new Set(["justdial.com", "indiamart.com", "clutch.co", ...])
const NEWS_PATTERNS     = [/news/, /times/, /post/, /media/, /press/, /journal/, ...]
const ECOMMERCE_DOMAINS = new Set(["amazon.in", "flipkart.com", "ebay.com", ...])
const AGGREGATOR_DOMAINS = new Set(["policybazaar.com", "booking.com", "zomato.com", ...])

function classifyCitationType(domain: string): CitationType {
  if (!domain) return "unknown"
  if (SOCIAL_DOMAINS.has(domain))     return "social"
  if (REVIEW_DOMAINS.has(domain))     return "review_platform"
  if (DIRECTORY_DOMAINS.has(domain))  return "directory"
  if (NEWS_PATTERNS.some(p => p.test(domain))) return "news_media"
  if (ECOMMERCE_DOMAINS.has(domain))  return "ecommerce"
  if (AGGREGATOR_DOMAINS.has(domain)) return "aggregator"
  if (domain.includes("blog") || domain.includes("medium") || ...) return "third_party_blog"
  return "unknown"
}
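classifyCitationType() receives the output of extractDomain(), whose implementation is not shown here. A minimal sketch, assuming it strips a leading "www." so that set lookups such as SOCIAL_DOMAINS.has("linkedin.com") can match; returning an empty string on unparsable input lines up with the classifier's !domain → "unknown" branch.

```typescript
// Hypothetical helper: hostname of a URL with a leading "www." removed.
// The WHATWG URL parser already lowercases ASCII hostnames.
// Returns "" when the string does not parse as an absolute URL.
function extractDomain(url: string): string {
  try {
    return new URL(url).hostname.replace(/^www\./, "")
  } catch {
    return ""
  }
}
```

Other subdomains are kept as-is, so "in.linkedin.com" or "m.youtube.com" would not hit the exact-match sets; reducing hostnames to their registrable domain would be a separate design decision.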

URL Extraction from AI Response

// Two-pass extraction, capped at 6 citations per response
const MAX_CITATIONS = 6

function extractCitations(response: string): { name: string; url: string }[] {
  const citations: { name: string; url: string }[] = []
  const seen = new Set<string>()

  // Pass 1: markdown links [text](url)
  for (const [, name, url] of response.matchAll(/\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g)) {
    if (!seen.has(url)) { citations.push({ name, url }); seen.add(url) }
    if (citations.length >= MAX_CITATIONS) return citations
  }

  // Pass 2: bare http(s) URLs not already captured; trailing sentence punctuation is trimmed
  for (const [match] of response.matchAll(/https?:\/\/[a-zA-Z0-9.-]+\.[a-z]{2,}[^\s)]*/g)) {
    const url = match.replace(/[.,;:!?]+$/, "")
    if (!seen.has(url)) { citations.push({ name: extractDomain(url), url }); seen.add(url) }
    if (citations.length >= MAX_CITATIONS) return citations
  }

  return citations
}

Auth & Payments

Authentication & Billing

Clerk Auth Architecture

// middleware.ts: route protection
import { createRouteMatcher } from "@clerk/nextjs/server"

const isProtectedRoute = createRouteMatcher([
  "/dashboard(.*)", "/workspace(.*)", "/api/workspace(.*)", "/api/gsc(.*)"
])
const isPublicApi = createRouteMatcher([
  "/api/stripe/webhook", "/api/billing/webhook"
])

// Server components
import { auth, currentUser } from "@clerk/nextjs/server"
const { userId } = await auth()
const user = await currentUser()

// API routes
import { auth } from "@clerk/nextjs/server"
import { NextResponse } from "next/server"
const { userId } = await auth()
if (!userId) return NextResponse.json({ error: "Unauthorized" }, { status: 401 })

// Workspace ownership: always validated
const workspace = await prisma.workspace.findUnique({
  where: { clerkUserId_domain: { clerkUserId: userId, domain } }
})
if (!workspace) return NextResponse.json({ error: "Not found" }, { status: 404 })

Trial Flow

// PLG trial: free scan → signup CTA → localStorage → post-modal useEffect
// 1. User completes free scan on landing page
// 2. TrialCTA modal shown → user signs up via Clerk
// 3. After Clerk signup: useEffect reads localStorage "pendingTrialDomain"
// 4. Calls POST /api/trial/start → creates Subscription { status: "trialing", currentPeriodEnd: +7d }
// 5. Redirects to /workspace/[domain]?setup=1

// Trial banner: DomainShell.tsx fetches GET /api/trial/status on mount
// Shows amber banner when daysLeft <= 5, red when <= 2
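The daysLeft value and banner thresholds can be sketched as two small functions. Assumptions: the helper names daysLeft and trialBannerTone are hypothetical, and rounding partial days up (so a trial with 2 days and 1 hour left reports 3) is a choice of this sketch; the reference only gives the 7-day period and the <= 5 / <= 2 thresholds.

```typescript
type BannerTone = "amber" | "red" | null

const DAY_MS = 24 * 60 * 60 * 1000

// Hypothetical helper: whole days until currentPeriodEnd, rounded up and clamped at 0
function daysLeft(currentPeriodEnd: Date, now: Date = new Date()): number {
  return Math.max(0, Math.ceil((currentPeriodEnd.getTime() - now.getTime()) / DAY_MS))
}

// Thresholds from the trial flow: amber at <= 5 days left, red at <= 2
function trialBannerTone(days: number): BannerTone {
  if (days <= 2) return "red"
  if (days <= 5) return "amber"
  return null
}

// Trial created by POST /api/trial/start: currentPeriodEnd = start + 7 days
const trialStart = new Date("2024-05-01T10:00:00Z")
const trialEnd = new Date(trialStart.getTime() + 7 * DAY_MS)
console.log(daysLeft(trialEnd, trialStart)) // 7
```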

Razorpay Subscription Flow

// 1. User clicks upgrade button
// 2. POST /api/billing/create-subscription
//    → razorpay.subscriptions.create({ plan_id: RAZORPAY_PLAN_ID_PRO_MONTHLY,
//         total_count: 120, notes: { clerkUserId } })
//    → returns { subscriptionId, keyId }

// 3. Client opens Razorpay checkout modal (checkout.js)
//    → on success handler fires with { razorpay_payment_id, razorpay_subscription_id, razorpay_signature }

// 4. POST /api/billing/verify (optimistic activation)
//    → HMAC verify: HMAC-SHA256(paymentId + "|" + subscriptionId, RAZORPAY_KEY_SECRET) must equal razorpay_signature
//    → upsert Subscription { plan: "pro", status: "active", currentPeriodEnd: +31d }

// 5. Webhook (authoritative) POST /api/billing/webhook
//    → verifies x-razorpay-signature = HMAC-SHA256(raw request body, RAZORPAY_WEBHOOK_SECRET)
//    → subscription.activated/charged → active + real periodEnd (entity.current_end is Unix seconds, so * 1000)
//    → subscription.halted → past_due
//    → subscription.cancelled/completed → canceled
//    → resolves by notes.clerkUserId first, falls back to razorpaySubscriptionId
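Below is a sketch of the two signature checks and the webhook event mapping, using Node's crypto module.

- **Payment check:** follows Razorpay's documented subscription verification, an HMAC-SHA256 of the payment ID and subscription ID joined by "|", keyed with the key secret.
- **Webhook check:** hashes the raw request body with the webhook secret. The route must therefore read the body as text (for example await req.text()) before parsing JSON.

All function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Constant-time comparison; the length check avoids timingSafeEqual throwing on unequal lengths.
function safeEqualHex(expected: string, received: string): boolean {
  const a = Buffer.from(expected, "utf8")
  const b = Buffer.from(received, "utf8")
  return a.length === b.length && timingSafeEqual(a, b)
}

// Step 4: checkout success handler → POST /api/billing/verify
export function verifyCheckoutSignature(
  paymentId: string, subscriptionId: string, signature: string, keySecret: string,
): boolean {
  const expected = createHmac("sha256", keySecret).update(`${paymentId}|${subscriptionId}`).digest("hex")
  return safeEqualHex(expected, signature)
}

// Step 5: webhook. rawBody must be the exact bytes Razorpay sent, not re-serialized JSON.
export function verifyWebhookSignature(rawBody: string, signature: string, webhookSecret: string): boolean {
  const expected = createHmac("sha256", webhookSecret).update(rawBody).digest("hex")
  return safeEqualHex(expected, signature)
}

export type SubscriptionStatus = "active" | "past_due" | "canceled"

// Event → status mapping from the flow above; other events are ignored (null).
export function statusForEvent(event: string): SubscriptionStatus | null {
  switch (event) {
    case "subscription.activated":
    case "subscription.charged":   return "active"
    case "subscription.halted":    return "past_due"
    case "subscription.cancelled":
    case "subscription.completed": return "canceled"
    default:                       return null
  }
}

// entity.current_end is a Unix timestamp in seconds; JS Dates use milliseconds.
export function periodEndFromEntity(currentEnd: number): Date {
  return new Date(currentEnd * 1000)
}
```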

External Services

External Integrations

OpenAI (GPT-4o)

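// src/app/api/workspace/[domain]/topics/run/route.ts
import OpenAI from "openai"
const openai = new OpenAI()   // reads OPENAI_API_KEY from the environment

const completion = await openai.chat.completions.create({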
  model:       "gpt-4o",
  temperature: 0.3,       // low temp for consistent brand mention behavior
  max_tokens:  900,
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user",   content: fullPrompt },
  ]
})
// fullPrompt = prompt.text + "\n\nPlease mention specific companies..." + INSTRUCTION_SUFFIX
// Brand aliases injected into systemPrompt if workspace.brandAliases.length > 0

DataForSEO — Google AIO

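// src/lib/runGoogleAIOQuery.ts
// Auth: HTTP Basic, base64("EMAIL:PASSWORD") from the DATAFORSEO_LOGIN + DATAFORSEO_PASSWORD env vars
// (query and locationCode are supplied by the caller)
const login    = process.env.DATAFORSEO_LOGIN ?? ""
const password = process.env.DATAFORSEO_PASSWORD ?? ""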
const response = await fetch(
  "https://api.dataforseo.com/v3/serp/google/organic/live/advanced",
  {
    method:  "POST",
    headers: { Authorization: "Basic " + btoa(login + ":" + password),
               "Content-Type": "application/json" },
    body: JSON.stringify([{
      keyword:       query,
      location_code: locationCode ?? 2356,  // India default
      language_code: "en",
      device:        "desktop"
    }])
  }
)

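// Parse AIO block (result can be null for a failed task, hence the optional chaining)
const data    = await response.json()
const items   = data.tasks?.[0]?.result?.[0]?.items ?? []
const aioItem = items.find(item => item.type === "ai_overview")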

if (!aioItem) return { aioPresent: false, mentioned: false, ... }

const aioText  = aioItem.text?.slice(0, 1500) ?? null
const citations = aioItem.references?.map(r => ({ name: r.title, url: r.url })) ?? []
// Brand detection: same logic as ChatGPT (domain/brandName substring + regex)
// Cache in PromptResponseCache with engine: "google-aio", 2-day TTL
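Here is a framework-free sketch of the AIO parsing step. The item shapes cover only the fields used above, and parseAio is a hypothetical name. Two parts are assumptions:

- **Mention check:** brand name or domain in the overview text, or the domain inside a cited URL. This is a simplified stand-in for the shared ChatGPT detection logic.
- **Truncation:** the check runs on the full text before the 1500-character truncation.

```typescript
// Only the fields this sketch reads; real DataForSEO items carry many more.
interface SerpReference { title?: string; url?: string }
interface SerpItem { type: string; text?: string | null; references?: SerpReference[] }

export interface AioResult {
  aioPresent: boolean
  mentioned: boolean
  aioText: string | null
  citations: { name: string; url: string }[]
}

export function parseAio(items: SerpItem[], domain: string, brandName: string): AioResult {
  const aioItem = items.find(item => item.type === "ai_overview")
  if (!aioItem) return { aioPresent: false, mentioned: false, aioText: null, citations: [] }

  const fullText = (aioItem.text ?? "").toLowerCase()
  const aioText = aioItem.text?.slice(0, 1500) ?? null

  // Keep only references that have both a title and a URL.
  const citations = (aioItem.references ?? [])
    .filter((r): r is Required<SerpReference> => Boolean(r.title && r.url))
    .map(r => ({ name: r.title, url: r.url }))

  // Simplified mention check (assumption): substring match on text, domain match on cited URLs.
  const d = domain.toLowerCase()
  const needles = [d, brandName.toLowerCase()].filter(n => n !== "")
  const mentioned =
    needles.some(n => fullText.includes(n)) ||
    (d !== "" && citations.some(c => c.url.toLowerCase().includes(d)))

  return { aioPresent: true, mentioned, aioText, citations }
}
```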

DataForSEO — SEO Metrics

// Used in free scan (POST /api/scan)
// Domain Rank + Backlinks:
POST https://api.dataforseo.com/v3/backlinks/domain_pages_summary/live
→ domainRank, backlinksCount, referringDomains

// Organic Keywords:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/keywords_for_site/live
→ organicKeywordsCount, topRankedKeywords [{keyword, position, volume, cpc}]

// Knowledge Panel + Local Pack + People Also Ask:
POST https://api.dataforseo.com/v3/serp/google/organic/live/advanced
→ hasKnowledgePanel (item.type === "knowledge_graph")
→ inLocalPack (item.type === "local_pack")
→ peopleAlsoAsk (item.type === "people_also_ask")
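The SERP-feature checks reduce to type lookups over the item list. This sketch collects the item types into a Set once. It also guards against a result array that can be null for a failed task. Representing peopleAlsoAsk as a boolean is an assumption, since the real field may hold the questions themselves. The names serpItems and serpFeatures are hypothetical.

```typescript
interface SerpItem { type: string }
interface SerpResponse {
  tasks?: { result?: { items?: SerpItem[] | null }[] | null }[] | null
}

export interface SerpFeatures {
  hasKnowledgePanel: boolean
  inLocalPack: boolean
  peopleAlsoAsk: boolean
}

// Null-safe path to the first task's item list.
export function serpItems(data: SerpResponse): SerpItem[] {
  return data.tasks?.[0]?.result?.[0]?.items ?? []
}

export function serpFeatures(items: SerpItem[]): SerpFeatures {
  const types = new Set(items.map(item => item.type))
  return {
    hasKnowledgePanel: types.has("knowledge_graph"),
    inLocalPack:       types.has("local_pack"),
    peopleAlsoAsk:     types.has("people_also_ask"),
  }
}
```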

Google Search Console

// OAuth 2.0 flow
// Scopes: https://www.googleapis.com/auth/webmasters.readonly
// Tokens stored in GSCConnection: { accessToken, refreshToken, expiresAt }

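// Data fetch (GET /api/gsc/data); the property URL is percent-encoded into the path
const response = await fetch(
  `https://www.googleapis.com/webmasters/v3/sites/${encodeURIComponent(siteUrl)}/searchAnalytics/query`,
  {
    method: "POST",
    headers: { Authorization: "Bearer " + accessToken,
               "Content-Type": "application/json" },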
    body: JSON.stringify({
      startDate: "2024-01-01", endDate: today,
      dimensions: ["query"],
      rowLimit: 20
      // The Search Analytics API has no sort parameter; rows come back ordered by clicks
      // (descending), so any ordering by impressions has to be applied client-side.
    })
  }
)
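The section stores expiresAt alongside the tokens but does not show the refresh step. This sketch provides three helpers, all with hypothetical names:

- building the encoded endpoint URL;
- deciding when an access token should be refreshed, using a 60-second safety margin (an arbitrary choice);
- building the form body for Google's OAuth 2.0 token endpoint, https://oauth2.googleapis.com/token.

That endpoint is posted as application/x-www-form-urlencoded and returns a new access_token plus expires_in in seconds.

```typescript
// Properties like "https://example.com/" or "sc-domain:example.com" must be encoded in the path.
export function searchAnalyticsUrl(siteUrl: string): string {
  return `https://www.googleapis.com/webmasters/v3/sites/${encodeURIComponent(siteUrl)}/searchAnalytics/query`
}

// Refresh slightly before the real expiry so a request does not race it.
export function isAccessTokenExpired(expiresAt: Date, now: Date = new Date(), skewMs = 60000): boolean {
  return expiresAt.getTime() - skewMs <= now.getTime()
}

// Body for POST https://oauth2.googleapis.com/token (refresh_token grant).
export function refreshTokenRequestBody(refreshToken: string, clientId: string, clientSecret: string): URLSearchParams {
  return new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    refresh_token: refreshToken,
    grant_type: "refresh_token",
  })
}
```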

Configuration

Environment Variables

# ─── Database ─────────────────────────────────────────────────────────────────
DATABASE_URL="postgresql://..."              # Neon PostgreSQL connection string

# ─── Clerk Auth ────────────────────────────────────────────────────────────────
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_..."
CLERK_SECRET_KEY="sk_..."
NEXT_PUBLIC_CLERK_SIGN_IN_URL="/sign-in"
NEXT_PUBLIC_CLERK_SIGN_UP_URL="/sign-up"
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL="/dashboard"
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL="/dashboard"

# ─── OpenAI ────────────────────────────────────────────────────────────────────
OPENAI_API_KEY="sk-..."

# ─── DataForSEO ────────────────────────────────────────────────────────────────
DATAFORSEO_LOGIN="email@example.com"        # DataForSEO account email
DATAFORSEO_PASSWORD="..."                   # DataForSEO account password
DATAFORSEO_DEFAULT_LOCATION_CODE="2356"     # optional; 2356=India, 2840=USA

# ─── Google Search Console ─────────────────────────────────────────────────────
GOOGLE_CLIENT_ID="..."
GOOGLE_CLIENT_SECRET="..."
NEXT_PUBLIC_BASE_URL="https://yourapp.com"  # used for OAuth redirect URI

# ─── Razorpay ──────────────────────────────────────────────────────────────────
RAZORPAY_KEY_ID="rzp_..."                   # key ID (not secret); returned to the client as keyId for checkout.js
RAZORPAY_KEY_SECRET="..."                   # secret key (server only)
RAZORPAY_PLAN_ID_PRO_MONTHLY="plan_..."     # plan ID from Razorpay dashboard
RAZORPAY_WEBHOOK_SECRET="..."               # webhook signing secret

# ─── Optional / Legacy ─────────────────────────────────────────────────────────
STRIPE_SECRET_KEY="sk_..."                  # optional; admin routes gracefully skip if absent
STRIPE_WEBHOOK_SECRET="whsec_..."           # optional
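This sketch validates these variables once at startup, so a missing secret fails loudly instead of mid-request. Which variables count as required is an assumption: everything not marked optional, excluding the Clerk URL settings. The names loadConfig and AppConfig are hypothetical.

```typescript
type Env = Record<string, string | undefined>

const REQUIRED = [
  "DATABASE_URL", "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "CLERK_SECRET_KEY", "OPENAI_API_KEY",
  "DATAFORSEO_LOGIN", "DATAFORSEO_PASSWORD", "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET",
  "NEXT_PUBLIC_BASE_URL", "RAZORPAY_KEY_ID", "RAZORPAY_KEY_SECRET",
  "RAZORPAY_PLAN_ID_PRO_MONTHLY", "RAZORPAY_WEBHOOK_SECRET",
] as const

export interface AppConfig {
  required: Record<(typeof REQUIRED)[number], string>
  dataForSeoLocationCode: number   // 2356 = India, 2840 = USA
  stripeEnabled: boolean           // legacy; admin routes skip Stripe when false
}

export function loadConfig(env: Env): AppConfig {
  const missing = REQUIRED.filter(name => !env[name])
  if (missing.length > 0) throw new Error(`Missing environment variables: ${missing.join(", ")}`)

  const required = Object.fromEntries(
    REQUIRED.map(name => [name, env[name] as string]),
  ) as AppConfig["required"]

  // Empty or non-numeric values fall back to the India default.
  const location = Number(env.DATAFORSEO_DEFAULT_LOCATION_CODE || "2356")
  return {
    required,
    dataForSeoLocationCode: Number.isInteger(location) ? location : 2356,
    stripeEnabled: Boolean(env.STRIPE_SECRET_KEY),
  }
}
```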

UI Routing

Key Page Routes & Component Architecture

// ─── Public routes (no auth) ──────────────────────────────────────────────────
/                           → src/app/page.tsx            (landing + free scan form)
/brand/[domain]             → src/app/brand/[domain]/page.tsx  (public results page)
/methodology                → src/app/methodology/page.tsx     (original technical page)
/methodology/technical      → src/app/methodology/technical/page.tsx (this page)
/how-it-works               → src/app/how-it-works/page.tsx   (customer-friendly page)

// ─── Auth routes ───────────────────────────────────────────────────────────────
/sign-in                    → Clerk-hosted or custom sign-in
/sign-up                    → Clerk-hosted or custom sign-up

// ─── Protected routes ──────────────────────────────────────────────────────────
/dashboard                  → src/app/dashboard/page.tsx      (all workspaces list)
/workspace/[domain]         → src/app/workspace/[domain]/page.tsx
  // searchParams:
  //   ?setup=1             → BrandSetupWizard (if no categories yet)
  //   ?setup=1             → redirect to ?autoGenerate=1 (if categories exist)
  //   ?autoGenerate=1      → TopicsManager with autoGenerate={true}
/workspace/[domain]/results/[runId] → scan results detail page
/settings/billing           → src/app/settings/billing/page.tsx

// ─── Key client components ─────────────────────────────────────────────────────
DomainShell.tsx             → shell/layout for workspace pages, trial banner
BrandSetupWizard.tsx        → category/city/alias wizard (shown once on ?setup=1)
TopicsManager.tsx           → topic + prompt management, auto-generate trigger
PromptDrawer.tsx            → slide-over showing full AI response for a prompt
PublicScanResults.tsx       → free scan results on /brand/[domain]
RazorpayButton.tsx          → upgrade button, loads checkout.js, handles payment flow
AiPresencePanel.tsx         → GlobalBrandMention cross-intelligence view
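The ?setup=1 and ?autoGenerate=1 rules can be expressed as one pure function. In the page itself the redirect would presumably go through Next.js's redirect() from next/navigation. This sketch instead returns a description of the view, so it runs without the framework. The names resolveWorkspaceView and WorkspaceView are hypothetical.

```typescript
export type WorkspaceView =
  | { kind: "setupWizard" }
  | { kind: "redirect"; to: string }
  | { kind: "topics"; autoGenerate: boolean }

type SearchParams = Record<string, string | string[] | undefined>

export function resolveWorkspaceView(
  domain: string,
  searchParams: SearchParams,
  hasCategories: boolean,
): WorkspaceView {
  if (searchParams.setup === "1") {
    // Wizard runs only while the workspace has no categories; otherwise skip to auto-generation.
    return hasCategories
      ? { kind: "redirect", to: `/workspace/${encodeURIComponent(domain)}?autoGenerate=1` }
      : { kind: "setupWizard" }
  }
  // TopicsManager with autoGenerate={true} only when explicitly requested.
  return { kind: "topics", autoGenerate: searchParams.autoGenerate === "1" }
}
```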

Architecture

Design Decisions & Invariants

Mutable cache vs immutable ledger

PromptResult is mutable (upserted daily, overwritten on force-refresh). PromptRankHistory and ScanCitation are immutable (append-only, never deleted or modified). Derived views (EntityCitationDomain, CitationAnalysis) are recomputed from immutable tables — they can be wiped and rebuilt without data loss. This means the system is replayable.
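"Replayable" can be made concrete: a derived view is a pure function of the append-only ledger, so deleting and rebuilding it cannot lose data. The row shape and the per-domain count below are simplified stand-ins for ScanCitation and EntityCitationDomain, not their actual columns.

```typescript
// Append-only ledger row (simplified stand-in for ScanCitation).
interface CitationRow { runId: string; promptId: string; url: string }

// Derived view rebuilt from scratch: citation count per cited domain.
// Same ledger in → same view out, no matter how many times it runs.
export function rebuildCitationDomains(ledger: readonly CitationRow[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const row of ledger) {
    const domain = new URL(row.url).hostname.replace(/^www\./, "")
    counts.set(domain, (counts.get(domain) ?? 0) + 1)
  }
  return counts
}
```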

Idempotency by design

Every write is safe to repeat. PromptResult uses upsert with composite unique key. ScanCitation uses createMany skipDuplicates. EntityCitationDomain and CitationAnalysis are recomputed from scratch (not incremented). Running the pipeline 10 times produces the same result as once.
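In-memory maps can model the two write patterns and show why repeated runs converge. The composite-key fields are hypothetical. In Prisma, the corresponding calls would be upsert with a compound unique where clause and createMany with skipDuplicates: true.

```typescript
interface PromptResultRow { workspaceId: string; promptId: string; date: string; mentioned: boolean }
interface CitationRow { runId: string; url: string }

// Stand-in for upsert on a composite unique key: same key overwrites, never duplicates.
export function upsertPromptResult(table: Map<string, PromptResultRow>, row: PromptResultRow): void {
  table.set(`${row.workspaceId}|${row.promptId}|${row.date}`, row)
}

// Stand-in for createMany({ skipDuplicates: true }): existing keys are skipped.
// Returns how many rows were actually inserted (like Prisma's { count }).
export function insertCitationsSkipDuplicates(table: Map<string, CitationRow>, rows: CitationRow[]): number {
  let inserted = 0
  for (const row of rows) {
    const key = `${row.runId}|${row.url}`
    if (!table.has(key)) { table.set(key, row); inserted++ }
  }
  return inserted
}
```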

Cross-workspace PromptResponseCache

The LLM response cache is shared across all workspaces. The cache key is the SHA-256 hash of the exact prompt string plus the engine; it contains no workspace or domain. So once any workspace has run a given prompt on a given engine, every other workspace running the identical prompt gets a free cache hit. This also means AI responses are not personalised per user: they are the engine's real answers to the query.
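Below is a sketch of the cache key and a freshness check. The source says only that the key is SHA-256 of the prompt plus the engine. Joining the two with a NUL separator is an assumption that prevents two different (engine, prompt) pairs from producing the same input string. The 2-day TTL comes from the Google AIO section; the names promptCacheKey and isCacheFresh are hypothetical.

```typescript
import { createHash } from "node:crypto"

// Assumed format: engine + NUL + prompt, hashed to 64 hex characters.
// No workspace or domain goes into the key, which is what makes it shareable.
export function promptCacheKey(prompt: string, engine: string): string {
  return createHash("sha256").update(`${engine}\u0000${prompt}`).digest("hex")
}

const DAY_MS = 24 * 60 * 60 * 1000

// A cached response is reusable while it is younger than the engine's TTL (e.g. 2 days for "google-aio").
export function isCacheFresh(createdAt: Date, ttlDays: number, now: Date = new Date()): boolean {
  return now.getTime() - createdAt.getTime() < ttlDays * DAY_MS
}
```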

Brand aliases are pre-normalized

Aliases stored in BrandEntity.aliases and workspace.brandAliases are lowercased and alphanumeric-only. The mention detection code does the same normalisation at query time — no fuzzy matching, pure substring includes(). Minimum alias length of 4 characters prevents false positives on acronyms.
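The normalisation and matching rules translate directly into code. This sketch assumes the response text is normalised the same way as the aliases, so spaces and punctuation are stripped before the substring test. One consequence is that an alias like "acme" would also match inside "acmetools". The names normalizeAlias and isBrandMentioned are hypothetical.

```typescript
const MIN_ALIAS_LENGTH = 4

// Lowercase and keep only a-z / 0-9, e.g. "Acme-Corp Ltd." → "acmecorpltd".
export function normalizeAlias(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "")
}

// Pure substring check on normalised text; aliases shorter than 4 characters are ignored.
export function isBrandMentioned(response: string, aliases: string[]): boolean {
  const text = normalizeAlias(response)
  return aliases
    .map(normalizeAlias)
    .filter(alias => alias.length >= MIN_ALIAS_LENGTH)
    .some(alias => text.includes(alias))
}
```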

Google AIO is non-deterministic — flagged in UI

Google AI Overview appearance varies by location, A/B test, query freshness, and device. A 'not present' result does not mean AIO never appears for that query — it means it didn't appear on that specific run. PromptResult.aioPresent distinguishes 'no AIO block appeared' (false) from 'AIO appeared but brand not cited' (true + mentioned=false). The UI communicates this clearly.
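One way to keep the two "negative" outcomes distinct in code is a discriminated union, which rules out pairing mentioned = true with aioPresent = false. The type and the label strings are hypothetical.

```typescript
// "No AIO block appeared" and "AIO appeared, brand not cited" are separate states.
export type AioOutcome =
  | { aioPresent: false; mentioned: false }
  | { aioPresent: true; mentioned: boolean }

export function aioStatusLabel(outcome: AioOutcome): string {
  if (!outcome.aioPresent) return "No AI Overview on this run"
  return outcome.mentioned ? "Cited in AI Overview" : "AI Overview shown, brand not cited"
}
```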

No prisma migrate dev — db push only

Vercel serverless and Neon's serverless driver don't support the interactive TTY required by prisma migrate dev. All schema changes are applied via npx prisma db push --accept-data-loss. This means no migration history files — schema.prisma is the single source of truth.

Razorpay webhook is the authoritative billing source

The /api/billing/verify route does optimistic activation so the upgrade UX doesn't stall. The webhook typically follows within seconds and provides the authoritative period end from Razorpay's entity.current_end (a Unix timestamp in seconds). If verify succeeds but the webhook is late, the subscription is still valid, because verify has already written a provisional +31-day period end. If verify fails but the webhook fires, the webhook upserts the subscription correctly.

Workspace ownership always validated

All workspace API routes look up the workspace by (clerkUserId, domain) — not just domain. A user can never access another user's workspace data even if they know the domain. The composite unique key @@unique([clerkUserId, domain]) enforces one workspace per user per domain at the DB level.