
AI Visibility Platform
Technical Architecture
Complete developer reference: full database schema, all API routes, data flow, scoring algorithms, caching strategy, authentication, external integrations, and design decisions. Sufficient for an AI or developer to fully understand the product from scratch.
Product Overview
What This Product Does
AI Visibility is a SaaS platform that tracks whether a business is mentioned in AI assistant responses (ChatGPT, Google AIO, Gemini). Users register their domain, the platform runs AI queries on their behalf, parses responses for brand mentions, and builds historical trend data over time. The core value proposition is giving SMBs the same visibility into AI search that SEO tools gave them for Google search.
Product Type: B2B SaaS, PLG trial → paid subscription
Primary User: SMB owner / digital marketing manager
Core Loop: Scan → View results → Take action → Re-scan
Free Tier: Public scan (no auth), 5 queries, ChatGPT only
Paid Tier: Workspace scans, 20 prompts, AIO + ChatGPT, history
Revenue Model: Razorpay subscription (₹999/mo Pro plan)
Technology
Tech Stack
Framework
- Next.js 14 App Router
- React Server Components
- TypeScript
Database
- PostgreSQL (Neon serverless)
- Prisma ORM
- Prisma db push (no migrations TTY)
Auth
- Clerk (JWT-based)
- clerkMiddleware
- auth() / currentUser() server helpers
AI / LLM
- OpenAI GPT-4o (main engine)
- Google AI Overviews via DataForSEO SERP API
External APIs
- DataForSEO (SERP, backlinks, keywords)
- Google Search Console OAuth
- Razorpay (billing)
Deployment
- Vercel (serverless functions)
- maxDuration: 120s on scan routes
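The 120-second limit is the kind of setting Next.js App Router exposes as a route segment config. A minimal sketch, assuming it is declared in the scan route file named later in this document (src/app/api/workspace/[domain]/topics/run/route.ts); the handler body is a placeholder, not the platform's scan logic.

```typescript
// Route segment config: allows this serverless function to run for up to 120 seconds.
export const maxDuration = 120

// Placeholder handler (assumption): the real route runs the workspace scan pipeline.
export async function POST(): Promise<Response> {
  return new Response(JSON.stringify({ ok: true }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  })
}
```

Declaring the limit per route keeps the longer timeout scoped to scan endpoints rather than applying it to every function.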
Database
Full Prisma Schema
PostgreSQL via Neon. All models use cuid() primary keys. Cascade deletes are set on all child relations. Migration files are not used, because prisma migrate dev requires an interactive terminal (TTY); schema changes are applied via npx prisma db push.
// ─── Workspace ────────────────────────────────────────────────────────────────
// One per user per domain. Container for all tracking data.
model Workspace {
id String @id @default(cuid())
clerkUserId String // Clerk user ID
domain String // bare domain e.g. "abc.com"
displayName String?
brandName String? // overridable brand name for AI tracking
brandPrimaryColor String? // hex e.g. "#4f46e5" (from scan)
brandBlogUrl String? // blog URL override
businessAbout String? @db.Text
businessType String? // "service"|"product"|"hybrid"
headquarters String?
locationsServed String?
servicesOffered String? @db.Text
establishedYear Int?
locationCode Int? // DataForSEO location_code e.g. 2356
locationName String? // e.g. "Delhi,Delhi,India"
// ── Brand Profile (confirmed during Brand Setup Wizard) ──────────────────────
categories String[] @default([]) // e.g. ["Audiology","Hearing Aids"]
primaryCategory String? // used in prompt templates
operatingCities String[] @default([])
primaryCity String? // used in geo-scoped prompts
brandAliases String[] @default([]) // name variants for mention detection
timezone String @default("UTC")
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@unique([clerkUserId, domain])
@@index([clerkUserId])
}
// ─── Topic / Prompt hierarchy ────────────────────────────────────────────────
// Topics are categories (e.g. "Brand Awareness"), Prompts are individual queries.
model Topic {
id String @id @default(cuid())
workspaceId String
name String
sortOrder Int @default(0)
isActive Boolean @default(true)
createdAt DateTime @default(now())
prompts Prompt[]
@@index([workspaceId])
}
model Prompt {
id String @id @default(cuid())
topicId String
text String
intent String? // "informational"|"navigational"|"transactional"|"commercial"
isActive Boolean @default(true)
sortOrder Int @default(0)
createdAt DateTime @default(now())
@@index([topicId])
}
// ─── ScanRun ─────────────────────────────────────────────────────────────────
// One row per scan execution. Groups all PromptResults from that run.
model ScanRun {
id String @id @default(cuid())
workspaceId String
localDate String // "YYYY-MM-DD" in workspace timezone
runAt DateTime @default(now())
promptsRan Int @default(0)
inputTokens Int @default(0)
outputTokens Int @default(0)
estimatedCostUsd Float @default(0)
@@index([workspaceId, localDate])
}
// ─── PromptResult ────────────────────────────────────────────────────────────
// Mutable daily cache. One row per (prompt × engine × scanRun).
// Unique key: (promptId, engine, scanRunId).
model PromptResult {
id String @id @default(cuid())
promptId String
scanRunId String?
engine String @default("chatgpt") // "chatgpt" | "google-aio"
rawResponse String @db.Text
mentioned Boolean @default(false)
rank Int? // 1-indexed position in numbered list
sentiment String? // "positive"|"neutral"|"negative"
citations Json @default("[]") // [{ name: string, url: string }]
competitors Json @default("[]") // ["rival.com", ...]
mentionTypes String[] @default([]) // see mention-detection.ts
mentionScore Float @default(0)
aioPresent Boolean? // google-aio only: did AIO block appear?
cachedDate String // "YYYY-MM-DD"
runAt DateTime @default(now())
@@unique([promptId, engine, scanRunId])
}
// ─── PromptRankHistory ────────────────────────────────────────────────────────
// Immutable append-only ledger. Written on every scan. Powers trend charts.
model PromptRankHistory {
id String @id @default(cuid())
promptId String
workspaceId String
scannedAt DateTime @default(now())
status String // "visible" | "not_found"
position Int?
sentiment String?
competitorCount Int @default(0)
llmResponseSnapshot String? @db.Text
@@index([promptId, scannedAt])
@@index([workspaceId, scannedAt])
}
// ─── ScanCitation ────────────────────────────────────────────────────────────
// One row per URL per prompt per day. Unique key prevents duplicate writes.
model ScanCitation {
id String @id @default(cuid())
workspaceId String
promptId String
promptText String? @db.Text
url String @db.Text
domain String
citationName String?
citationType String // "news_media"|"review_platform"|"directory"|"third_party_blog"
// |"aggregator"|"social"|"ecommerce"|"owned"|"unknown"
isOwnedByEntity Boolean @default(false)
scannedAt DateTime @default(now())
cachedDate String
@@unique([workspaceId, promptId, url, cachedDate])
}
// ─── EntityCitationDomain ─────────────────────────────────────────────────────
// Per-domain aggregation. Upserted (not incremented) after every scan.
model EntityCitationDomain {
id String @id @default(cuid())
workspaceId String
domain String
citationType String
isOwnedByEntity Boolean @default(false)
timesCited Int @default(0)
firstSeen DateTime @default(now())
lastSeen DateTime @default(now())
promptsList String[] // unique prompt texts that triggered this citation
@@unique([workspaceId, domain])
}
// ─── CitationAnalysis ────────────────────────────────────────────────────────
// Entity-level summary. One row per workspace. Upserted after every scan.
model CitationAnalysis {
id String @id @default(cuid())
workspaceId String @unique
diversityScore Float @default(0) // Σ(typeWeight per unique domain)
totalCitations Int @default(0)
uniqueDomains Int @default(0)
ownedCount Int @default(0)
topDomainRatio Float @default(0) // highest single-domain share (0.0–1.0)
flags String[] // recommendation strings
lastComputedAt DateTime @default(now())
}
// ─── Task ─────────────────────────────────────────────────────────────────────
// User-facing action items generated from scan analysis.
model Task {
id String @id @default(cuid())
workspaceId String
title String
description String? @db.Text
category String // "schema"|"content"|"entity"|"citations"|"gbp"|"other"
priority String // "high"|"medium"|"low"
effort String // "quick"|"medium"|"large"
status String @default("todo") // "todo"|"in_progress"|"done"
linkedPromptId String?
linkedPromptText String?
expectedOutcome String?
createdBy String // Clerk userId
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
completedAt DateTime?
}
// ─── PinnedPrompt ─────────────────────────────────────────────────────────────
// User-curated watchlist. Unique per (workspaceId, promptId).
model PinnedPrompt {
id String @id @default(cuid())
workspaceId String
promptId String
pinnedAt DateTime @default(now())
notes String?
@@unique([workspaceId, promptId])
}
// ─── Subscription ─────────────────────────────────────────────────────────────
// One row per Clerk user.
model Subscription {
id String @id @default(cuid())
clerkUserId String @unique
plan String @default("free") // "free"|"starter"|"pro"|"agency"
status String @default("active") // "active"|"trialing"|"past_due"|"canceled"
stripeCustomerId String? @unique
stripeSubscriptionId String? @unique
razorpaySubscriptionId String? @unique
currentPeriodEnd DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
// ─── BrandEntity ──────────────────────────────────────────────────────────────
// One row per known brand/domain. Aliases are pre-normalized (lowercase,
// alphanumeric only) for fast exact-match in AI responses.
model BrandEntity {
id String @id @default(cuid())
domain String @unique // "widex.in"
brandName String // "Widex India"
aliases String[] // ["widexindia", "widex", "widex.in"]
clerkUserId String?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
// ─── GlobalBrandMention ───────────────────────────────────────────────────────
// Cross-workspace. Every brand detected in ANY AI response is logged here.
// Enables pre-fill on signup, cross-mention emails, superadmin analytics.
model GlobalBrandMention {
id String @id @default(cuid())
mentionedDomain String? // resolved domain
mentionedName String // name as appeared in AI text
promptText String @db.Text
promptIntent String?
triggerDomain String // whose scan produced this
engine String @default("chatgpt")
mentionTypes String[]
isDirectAnswer Boolean @default(false)
listPosition Int?
citationUrls String[]
sentiment String?
mentionScore Float @default(0)
detectionMethod String @default("citation") // "citation"|"own_scan"|"alias_match"
scannedAt DateTime @default(now())
sector String?
categories String[] @default([])
}
// ─── PromptResponseCache ──────────────────────────────────────────────────────
// Shared cross-workspace cache. Key = SHA-256(exact prompt sent to LLM + "|" + engine).
// 2-day TTL for workspace scans, 24h for free scans.
model PromptResponseCache {
id String @id @default(cuid())
promptHash String
engine String
rawResponse String @db.Text
mentioned Boolean @default(false)
rank Int?
sentiment String?
citations Json @default("[]")
competitors Json @default("[]")
inputTokens Int @default(0)
outputTokens Int @default(0)
createdAt DateTime @default(now())
expiresAt DateTime
@@unique([promptHash, engine])
@@index([expiresAt])
}
// ─── Scan (public/anonymous) ──────────────────────────────────────────────────
// Free scan initiated from the landing page — no auth required.
model Scan {
id String @id @default(cuid())
url String
normalizedUrl String
domain String @default("")
status String @default("pending") // "pending"|"processing"|"completed"|"failed"
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
// ─── BusinessProfile ──────────────────────────────────────────────────────────
// Enriched business data attached to a free Scan. Populated from:
// scraping homepage, JSON-LD structured data, DataForSEO, and AI (Gemini/GPT).
model BusinessProfile {
id String @id @default(cuid())
scanId String @unique
name String?
description String?
location String?
businessType String // "product"|"service"|"hybrid"
confidence Float
blogUrl String?
sitemapUrl String?
pageTitle String?
faviconUrl String?
themeColor String?
socialLinks Json @default("{}") // { facebook, instagram, linkedin, youtube, twitter }
telephone String? // from schema.org
priceRange String? // "$"|"$$"|"$$$"
ratingValue Float?
reviewCount Int?
schemaType String? // schema.org @type e.g. "MedicalBusiness"
sameAs Json @default("[]") // official profile URLs
openingHours Json @default("[]") // ["Mon-Fri 09:00-18:00", ...]
liveResults Json @default("[]") // [{query, mentioned, response, competitors, cachedAt, googleAio?: {...}}]
products Json @default("[]")
services Json @default("[]")
cities Json @default("[]")
country String?
gmbUrl String?
email String?
sector String?
primaryCategory String?
secondaryCategories Json @default("[]")
categoryConfidence Float?
aeoChecklist Json @default("[]") // [{id, label, pass, impact, tip}]
domainRank Int?
backlinksCount Int?
referringDomains Int?
organicKeywordsCount Int?
topRankedKeywords Json @default("[]") // [{keyword, position, volume, url, cpc}]
confirmedCity String?
confirmedName String?
hasKnowledgePanel Boolean @default(false)
inLocalPack Boolean @default(false)
peopleAlsoAsk Json @default("[]")
competitorInsights Json @default("[]") // [{domain, domainRank, backlinksCount, mentionCount}]
createdAt DateTime @default(now())
}
API Reference
All API Routes
Public Routes (no auth required)
/api/scan: Start a free scan. Body: { url }. Scrapes homepage, runs AI enrichment, runs 5 live GPT-4o + DataForSEO AIO queries. Returns { scanId }.
/api/scan/[id]: Poll scan status. Returns { status, businessProfile } when complete.
/api/brand/[domain]/confirm-city: Save confirmedCity on BusinessProfile. Body: { city }. No auth; scoped by domain.
/api/brand/[domain]/confirm-name: Save confirmedName on BusinessProfile. Body: { name }.
/api/trial/start: Create a Subscription row with status=trialing, currentPeriodEnd=+7 days. Requires Clerk auth (reads userId from JWT).
/api/trial/status: Returns { status, plan, daysLeft } for the authenticated user. Returns free plan if no subscription found.
/api/billing/webhook: Razorpay subscription lifecycle webhook. Verifies x-razorpay-signature HMAC. Handles activated/charged/halted/cancelled/pending events. Listed in the isPublicApi middleware bypass, so no Clerk auth is applied.
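The two trial routes above reduce to simple date arithmetic on Subscription.currentPeriodEnd. A minimal sketch of that arithmetic, not the platform's actual handler: the function names are hypothetical, rounding daysLeft up is an assumption, and the "active" status returned when no row exists is borrowed from the schema default.

```typescript
type SubscriptionRow = {
  plan: string
  status: string
  currentPeriodEnd: Date | null
}

type TrialStatus = { status: string; plan: string; daysLeft: number | null }

const MS_PER_DAY = 24 * 60 * 60 * 1000

// Trial start: currentPeriodEnd is seven days after the trial begins.
function trialEnd(startedAt: Date): Date {
  return new Date(startedAt.getTime() + 7 * MS_PER_DAY)
}

// Assumption: partial days round up, so 4.5 days remaining reports 5.
function trialStatus(sub: SubscriptionRow | null, now: Date): TrialStatus {
  // No Subscription row: report the free plan.
  if (!sub) return { status: "active", plan: "free", daysLeft: null }
  if (!sub.currentPeriodEnd) {
    return { status: sub.status, plan: sub.plan, daysLeft: null }
  }
  const remainingMs = sub.currentPeriodEnd.getTime() - now.getTime()
  const daysLeft = Math.max(0, Math.ceil(remainingMs / MS_PER_DAY))
  return { status: sub.status, plan: sub.plan, daysLeft }
}
```

Passing now as a parameter instead of calling new Date() inside keeps the calculation deterministic and easy to test.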
Workspace Routes (Clerk auth required)
/api/workspace/[domain]: Fetch full workspace data including topics, latest scan results, and citation analysis.
/api/workspace/[domain]: Create workspace for domain. Triggers BusinessProfile fetch, brand entity registration.
/api/workspace/[domain]/setup: Save Brand Setup Wizard output: categories, primaryCategory, operatingCities, primaryCity, brandAliases. Syncs BrandEntity aliases (normalized, deduplicated).
/api/workspace/[domain]/topics/generate: AI-generate 3 topics + 3–5 prompts per topic. Uses primaryCategory + primaryCity from workspace. Returns created Topic and Prompt records.
/api/workspace/[domain]/topics/run: Run scan. For each active prompt: check PromptResponseCache, else call GPT-4o + Google AIO in parallel. Writes PromptResult, PromptRankHistory, ScanCitation, GlobalBrandMention. maxDuration: 120.
/api/workspace/[domain]/scan-runs: List all ScanRun records for workspace with cost and token summaries.
/api/workspace/[domain]/results/[runId]: All PromptResult rows for a specific ScanRun, all engines, ordered by engine asc.
/api/workspace/[domain]/citations: EntityCitationDomain + CitationAnalysis for workspace. Used by citation panel.
/api/workspace/[domain]/tasks: Create a Task. Body: { title, category, priority, effort, linkedPromptId, expectedOutcome }.
/api/workspace/[domain]/tasks/[taskId]: Update task status. Body: { status }. Sets completedAt timestamp when status=done.
/api/workspace/[domain]/prompts/pin: Pin or unpin a prompt for the watchlist. Body: { promptId }.
/api/workspace/[domain]/brand-mentions: Fetch GlobalBrandMention rows cross-referenced against this domain's AI responses. Used by AiPresencePanel.
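The setup route's alias sync ("normalized, deduplicated") can be sketched as below. This follows the BrandEntity comment that aliases are lowercase and alphanumeric only; the function names are hypothetical, and stripping dots from domains is an assumption (the schema's own example keeps "widex.in" as an alias).

```typescript
// Lowercase, then drop every character that is not a-z or 0-9.
function normalizeAlias(raw: string): string {
  return raw.toLowerCase().replace(/[^a-z0-9]/g, "")
}

// Combine brand name, domain, and user-supplied aliases into one
// normalized list, dropping empties and duplicates (first occurrence wins).
function syncAliases(brandName: string, domain: string, userAliases: string[]): string[] {
  const normalized = [brandName, domain, ...userAliases]
    .map(normalizeAlias)
    .filter(alias => alias.length > 0)
  return Array.from(new Set(normalized))
}

const aliases = syncAliases("Widex India", "widex.in", ["Widex", "WIDEX  India", ""])
// aliases: ["widexindia", "widexin", "widex"]
```

Normalizing once at write time means mention detection can compare strings directly instead of re-cleaning every alias on every scan.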
Billing Routes (auth required)
/api/billing/create-subscription: Creates a Razorpay subscription using RAZORPAY_PLAN_ID_PRO_MONTHLY. Checks for existing active subscription first. Returns { subscriptionId, keyId }.
/api/billing/verify: Verifies HMAC of razorpay_payment_id + razorpay_subscription_id. On success: upserts Subscription with plan=pro, status=active, currentPeriodEnd=+31 days (optimistic).
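The check performed by /api/billing/verify can be sketched with Node's crypto module. This assumes the signature is a hex HMAC-SHA256 over "payment_id|subscription_id" keyed with RAZORPAY_KEY_SECRET; the function names are hypothetical and the constant-time comparison is a design choice of this sketch, not something the source specifies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Hex HMAC-SHA256 over "<payment_id>|<subscription_id>".
function expectedSignature(paymentId: string, subscriptionId: string, keySecret: string): string {
  return createHmac("sha256", keySecret)
    .update(`${paymentId}|${subscriptionId}`)
    .digest("hex")
}

function verifyCheckoutSignature(
  paymentId: string,
  subscriptionId: string,
  signature: string,
  keySecret: string
): boolean {
  const expected = Buffer.from(expectedSignature(paymentId, subscriptionId, keySecret), "utf8")
  const received = Buffer.from(signature, "utf8")
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received)
}

// Optimistic period end written on success: 31 days from now.
function optimisticPeriodEnd(now: Date): Date {
  return new Date(now.getTime() + 31 * 24 * 60 * 60 * 1000)
}
```

The 31-day value is only a placeholder until the webhook overwrites it with the real period end.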
GSC Routes
/api/gsc/auth: Initiates Google OAuth flow for Google Search Console. Redirects to Google consent screen.
/api/gsc/callback: OAuth callback. Saves access_token, refresh_token, expiry to GSCConnection.
/api/gsc/data: Fetch top queries from GSC for a domain. Refreshes token if expired. Returns { queries: [{query, clicks, impressions, position}] }.
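The "refreshes token if expired" step in /api/gsc/data can be sketched as an expiry check plus the form body for Google's OAuth 2.0 token endpoint (https://oauth2.googleapis.com/token). The helper names and the 60-second safety margin are assumptions; the GSCConnection fields match those listed in this document.

```typescript
type GscConnection = { accessToken: string; refreshToken: string; expiresAt: Date }

// Treat the token as expired slightly early (assumed 60 s margin) so a
// request does not fail because the token lapsed in flight.
function isExpired(conn: GscConnection, now: Date, skewMs = 60_000): boolean {
  return conn.expiresAt.getTime() - skewMs <= now.getTime()
}

// Form-encoded body for a refresh_token grant.
function refreshRequestBody(
  conn: GscConnection,
  clientId: string,
  clientSecret: string
): URLSearchParams {
  return new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: conn.refreshToken,
    client_id: clientId,
    client_secret: clientSecret,
  })
}
```

A caller would POST this body with Content-Type application/x-www-form-urlencoded, then store the new access token and expiry back on the GSCConnection row.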
Core Pipeline
Workspace Scan Pipeline
POST /api/workspace/[domain]/topics/run — runs all active prompts for a workspace.
// For each active Prompt in the workspace, concurrently:
for (const prompt of activePrompts) {
  // 1. Build the exact prompt string sent to the LLM
  const promptText = prompt.text + INSTRUCTION_SUFFIX
  // INSTRUCTION_SUFFIX: "Please mention specific companies, brands, or websites
  // by name where relevant. Include URLs where available."

  // 2. Check PromptResponseCache (2-day TTL, keyed by SHA-256 of promptText + "|" + engine)
  const cacheKey = sha256(promptText + "|chatgpt")
  const cached = await prisma.promptResponseCache.findUnique({
    where: { promptHash_engine: { promptHash: cacheKey, engine: "chatgpt" } }
  })

  // 3a. Cache hit path
  if (cached && cached.expiresAt > new Date()) {
    const result = parseFromCache(cached)
    // check brand aliases too: workspace.brandAliases.some(a => response.includes(a))
  }

  // 3b. Cache miss: call GPT-4o + Google AIO in parallel
  const [chatgptResult, aioResult] = await Promise.all([
    runPromptAgainstChatGPT(promptText, domain, brandName, competitors, brandAliases),
    runGoogleAIOQuery(promptText, domain, brandName)
  ])

  // 4. Write results
  await Promise.all([
    prisma.promptResult.upsert({
      where: { promptId_engine_scanRunId: { promptId, engine: "chatgpt", scanRunId } },
      create: { ...chatgptResult, engine: "chatgpt", cachedDate },
      update: { ...chatgptResult }
    }),
    prisma.promptResult.upsert({
      where: { promptId_engine_scanRunId: { promptId, engine: "google-aio", scanRunId } },
      create: { ...aioResult, engine: "google-aio", aioPresent: aioResult.aioPresent, cachedDate },
      update: { ...aioResult }
    }),
    prisma.promptRankHistory.create({
      data: { promptId, workspaceId, status, position, sentiment, competitorCount,
        llmResponseSnapshot }
    }),
    processCitationPipeline({ promptId, workspaceId, citations, cachedDate }),
    writeGlobalBrandMentions({ promptText, domain, competitors, engine: "chatgpt", ... })
  ])
}

// After all prompts: recompute CitationAnalysis
await recomputeCitationAnalysis(workspaceId)
Mention Detection Logic
// Three parallel checks, any one triggers mentioned = true
const lower = response.toLowerCase()
const bareDomain = domain.toLowerCase().replace(/^www\./, "")
const domainBase = bareDomain.split(".")[0]
const brandLower = brandName.toLowerCase()
const mentioned =
  lower.includes(bareDomain) || // exact domain match
  (brandLower.length >= 3 && lower.includes(brandLower)) || // brand name substring (min 3 chars)
  new RegExp("\\b" + domainBase + "\\b", "i").test(lower) // word-boundary domain base

// Brand alias check (from workspace.brandAliases)
const aliasHit = brandAliases.some(
  alias => alias.length >= 4 && lower.includes(alias.toLowerCase())
)
const finalMentioned = mentioned || aliasHit
Rank Extraction Logic
// Split response by newline, count numbered list items
let listCount = 0
let rank: number | null = null
for (const line of response.split("\n")) {
  if (/^\s*\d+[.)]/.test(line)) listCount++
  // Compare case-insensitively for both the brand name and the domain
  const lineLower = line.toLowerCase()
  if (lineLower.includes(brandLower) || lineLower.includes(bareDomain)) {
    rank = listCount > 0 ? listCount : 1 // default to 1 if not in numbered list
    break
  }
}
if (mentioned && rank === null) rank = 1
Sentiment Analysis Logic
// 450-char context window around first brand mention
const idx = lower.indexOf(brandLower)
const window = response.slice(Math.max(0, idx - 150), idx + 300)
const POSITIVE = ["best", "excellent", "highly recommended", "top", "leading", "trusted",
"affordable", "effective", "quality", "award"]
const NEGATIVE = ["avoid", "scam", "poor", "unreliable", "bad reviews", "issues",
"complaints", "overpriced", "worst"]
const sentiment =
NEGATIVE.some(w => window.toLowerCase().includes(w)) ? "negative"
: POSITIVE.some(w => window.toLowerCase().includes(w)) ? "positive"
: "neutral" // sentiment only set when brand is mentioned
Algorithms
Scoring Algorithms
AEO Visibility Score (0–100)
Computed on the frontend from PromptResult rows belonging to a ScanRun. Accounts for both presence and sentiment quality of each mention.
// Score weights per mention quality
const WEIGHTS = {
"visible+positive": 1.0,
"visible+neutral": 0.8,
"visible+negative": 0.5,
"not_found": 0.0,
}
// Per ScanRun across all prompts
const weightedSum = results.reduce((acc, r) => {
if (!r.mentioned) return acc
if (r.sentiment === "positive") return acc + 1.0
if (r.sentiment === "negative") return acc + 0.5
return acc + 0.8 // neutral
}, 0)
// Guard against an empty ScanRun, which would otherwise yield NaN
const aeoScore = results.length > 0 ? Math.round((weightedSum / results.length) * 100) : 0
// Example: 5 prompts, weights [1.0, 0.8, 0.5, 0.0, 0.0] → score = 46
Citation Diversity Score
// Weights per citation type
const TYPE_WEIGHTS = {
news_media: 3.0,
review_platform: 2.5,
directory: 2.0,
third_party_blog: 1.5,
aggregator: 1.5,
social: 1.0,
ecommerce: 1.0,
owned: 0.5,
unknown: 0.3,
}
// Computed in recomputeCitationAnalysis()
// Each UNIQUE domain contributes exactly once (breadth rewarded, not repetition)
const diversityScore = entityCitationDomains.reduce(
(acc, ecd) => acc + (TYPE_WEIGHTS[ecd.citationType] ?? 0.3),
0
)
// Stored in CitationAnalysis.diversityScore (1 decimal place)
// No defined maximum. >15 = strong multi-channel authority, <3 = weak presence
Mention Score (per PromptResult)
// src/lib/mention-detection.ts
// mentionTypes: string[] (classification of how the brand was mentioned)
// Possible values:
//   "direct_answer":      brand is the direct answer to the query
//   "featured_in_list":   brand appears in a numbered/bulleted list
//   "own_url_cited":      brand's own URL is cited
//   "third_party_cited":  third-party URL about the brand is cited
//   "comparison_mention": brand mentioned in comparison context
// mentionScore: weighted composite
const mentionScore = detectMentionTypes(response, domain, brandName)
  .reduce((acc, type) => acc + MENTION_TYPE_WEIGHTS[type], 0)
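The source does not list the values in MENTION_TYPE_WEIGHTS, so the sketch below uses hypothetical integer weights purely to illustrate how the composite is summed. Counting each mention type at most once is also an assumption of this sketch.

```typescript
type MentionType =
  | "direct_answer"
  | "featured_in_list"
  | "own_url_cited"
  | "third_party_cited"
  | "comparison_mention"

// Hypothetical weights for illustration only; the real values are not documented here.
const MENTION_TYPE_WEIGHTS: Record<MentionType, number> = {
  direct_answer: 3,
  featured_in_list: 2,
  own_url_cited: 2,
  third_party_cited: 1,
  comparison_mention: 1,
}

// Sum the weight of each distinct mention type (duplicates count once).
function scoreMentionTypes(types: MentionType[]): number {
  return Array.from(new Set(types)).reduce(
    (acc, type) => acc + MENTION_TYPE_WEIGHTS[type],
    0
  )
}

const exampleScore = scoreMentionTypes(["featured_in_list", "own_url_cited"])
// exampleScore is 4 under the hypothetical weights above
```

Typing the weights as Record<MentionType, number> makes the compiler flag any mention type that is added without a weight.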
Citation Health Flags
// Written to CitationAnalysis.flags[] after every scan
const flags: string[] = []
if (ownedCount === 0)
flags.push("LLMs never cite your site directly — critical issue")
if (ownedCount / totalCitations < 0.30)
flags.push("Strengthen your own site content")
if (maxDomainCount / totalCitations > 0.60)
flags.push("Over-reliant on one source — diversify")
Caching
Cache Architecture
PromptResponseCache
- Shared cross-workspace PostgreSQL table
- Key: SHA-256(promptText + "|" + engine)
- 2-day TTL for workspace scans
- 24-hour TTL for free scans
- Covers both chatgpt and google-aio engines separately
- expiresAt indexed for efficient cleanup
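The key and TTL rules above can be sketched as three small helpers. The helper names and the ScanKind type are hypothetical; the hash input and the 48-hour and 24-hour lifetimes come from the list above.

```typescript
import { createHash } from "node:crypto"

type Engine = "chatgpt" | "google-aio"
type ScanKind = "workspace" | "free"

const HOUR_MS = 60 * 60 * 1000
const TTL_MS: Record<ScanKind, number> = {
  workspace: 48 * HOUR_MS, // 2-day TTL for workspace scans
  free: 24 * HOUR_MS,      // 24-hour TTL for free scans
}

// Cache key: SHA-256 of the exact prompt text, a "|" separator, and the engine.
function promptHash(promptText: string, engine: Engine): string {
  return createHash("sha256").update(promptText + "|" + engine).digest("hex")
}

// expiresAt value to store alongside a freshly written cache row.
function cacheExpiry(kind: ScanKind, now: Date): Date {
  return new Date(now.getTime() + TTL_MS[kind])
}

// A row is usable only while its expiresAt lies in the future.
function isFresh(entry: { expiresAt: Date }, now: Date): boolean {
  return entry.expiresAt.getTime() > now.getTime()
}
```

Because the engine is part of the hash input, the same prompt produces separate cache rows for chatgpt and google-aio.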
PromptResult table (daily cache)
- Per-workspace, per-day mutable cache
- Unique key: (promptId, engine, scanRunId)
- Upserted on each scan run
- Force-refresh: delete all rows for today, re-run
- cachedDate field (YYYY-MM-DD) enables daily dedup
- Separate from immutable PromptRankHistory
// Cache lookup in runPromptAgainstChatGPT()
const promptHash = createHash("sha256").update(fullPrompt + "|chatgpt").digest("hex")
const cached = await prisma.promptResponseCache.findUnique({
where: { promptHash_engine: { promptHash, engine: "chatgpt" } }
})
if (cached && cached.expiresAt > new Date()) {
return parseFromCache(cached) // alias hits still re-checked against fresh aliases
}
// Cache miss → call OpenAI
const completion = await openai.chat.completions.create({ model: "gpt-4o", temperature: 0.3, max_tokens: 900, ... })
// Upsert cache
await prisma.promptResponseCache.upsert({
where: { promptHash_engine: { promptHash, engine: "chatgpt" } },
create: { promptHash, engine: "chatgpt", rawResponse, expiresAt, ...parsed },
update: { rawResponse, expiresAt, ...parsed },
})
Citation System
Citation Pipeline
// processCitationPipeline() runs after every prompt scan
async function processCitationPipeline({ promptId, workspaceId, citations, cachedDate }) {
  // workspaceDomain (the workspace's bare domain) is not a parameter here;
  // it is assumed to be resolved from workspaceId before this point.
  // Stage 1: Write ScanCitation rows (skipDuplicates makes force-refresh idempotent)
  await prisma.scanCitation.createMany({
    data: citations.map(c => {
      const domain = extractDomain(c.url)
      return {
        workspaceId, promptId, url: c.url,
        domain,
        citationName: c.name,
        citationType: classifyCitationType(domain),
        isOwnedByEntity: domain === workspaceDomain,
        cachedDate,
      }
    }),
    skipDuplicates: true, // unique on (workspaceId, promptId, url, cachedDate)
  })
}
// Stage 2: recomputeCitationAnalysis() runs after ALL prompts complete
async function recomputeCitationAnalysis(workspaceId) {
  const allCitations = await prisma.scanCitation.findMany({ where: { workspaceId } })
  // Group by domain, compute per-domain stats
  const domainMap = groupBy(allCitations, c => c.domain)
  await Promise.all(
    Object.entries(domainMap).map(([domain, rows]) =>
      prisma.entityCitationDomain.upsert({
        where: { workspaceId_domain: { workspaceId, domain } },
        create: { workspaceId, domain, citationType: rows[0].citationType,
          timesCited: rows.length, promptsList: [...new Set(rows.map(r => r.promptText))] },
        update: { timesCited: rows.length, lastSeen: new Date(), promptsList: ... },
      })
    )
  )
  // Compute entity-level diversity score and flags from the freshly upserted rows
  const entityDomains = await prisma.entityCitationDomain.findMany({ where: { workspaceId } })
  const diversityScore = entityDomains.reduce(
    (acc, ecd) => acc + (TYPE_WEIGHTS[ecd.citationType] ?? 0.3), // same fallback as the scoring section
    0
  )
  await prisma.citationAnalysis.upsert({ ... })
}
Citation Type Classification
// Priority-ordered lookup (first match wins)
const SOCIAL_DOMAINS = new Set(["linkedin.com", "twitter.com", "instagram.com", "youtube.com", ...])
const REVIEW_DOMAINS = new Set(["trustpilot.com", "g2.com", "yelp.com", "practo.com", ...])
const DIRECTORY_DOMAINS = new Set(["justdial.com", "indiamart.com", "clutch.co", ...])
const NEWS_PATTERNS = [/news/, /times/, /post/, /media/, /press/, /journal/, ...]
const ECOMMERCE_DOMAINS = new Set(["amazon.in", "flipkart.com", "ebay.com", ...])
const AGGREGATOR_DOMAINS = new Set(["policybazaar.com", "booking.com", "zomato.com", ...])
function classifyCitationType(domain: string): CitationType {
if (!domain) return "unknown"
if (SOCIAL_DOMAINS.has(domain)) return "social"
if (REVIEW_DOMAINS.has(domain)) return "review_platform"
if (DIRECTORY_DOMAINS.has(domain)) return "directory"
if (NEWS_PATTERNS.some(p => p.test(domain))) return "news_media"
if (ECOMMERCE_DOMAINS.has(domain)) return "ecommerce"
if (AGGREGATOR_DOMAINS.has(domain)) return "aggregator"
if (domain.includes("blog") || domain.includes("medium") || ...) return "third_party_blog"
return "unknown"
}
URL Extraction from AI Response
// Two-pass extraction, capped at 6 citations per response
const citations: { name: string; url: string }[] = []
const seen = new Set<string>()
// Pass 1: markdown links [text](url)
const mdLinks = response.matchAll(/\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g)
for (const [, name, url] of mdLinks) {
if (!seen.has(url)) { citations.push({ name, url }); seen.add(url) }
if (citations.length >= 6) return citations
}
// Pass 2: bare https:// URLs not already captured
const bareUrls = response.matchAll(/https?:\/\/[a-zA-Z0-9-.]+\.[a-z]{2,}[^\s)]*/g)
for (const [url] of bareUrls) {
if (!seen.has(url)) { citations.push({ name: extractDomain(url), url }); seen.add(url) }
if (citations.length >= 6) return citations
}
Auth & Payments
Authentication & Billing
Clerk Auth Architecture
// middleware.ts — route protection
const isProtectedRoute = createRouteMatcher([
"/dashboard(.*)", "/workspace(.*)", "/api/workspace(.*)", "/api/gsc(.*)"
])
const isPublicApi = createRouteMatcher([
"/api/stripe/webhook", "/api/billing/webhook"
])
// Server components
const { userId } = await auth()
const user = await currentUser()
// API routes
const { userId } = await auth()
if (!userId) return NextResponse.json({ error: "Unauthorized" }, { status: 401 })
// Workspace ownership — always validated
const workspace = await prisma.workspace.findUnique({
where: { clerkUserId_domain: { clerkUserId: userId, domain } }
})
if (!workspace) return NextResponse.json({ error: "Not found" }, { status: 404 })
Trial Flow
// PLG trial: free scan → signup CTA → localStorage → post-modal useEffect
// 1. User completes free scan on landing page
// 2. TrialCTA modal shown → user signs up via Clerk
// 3. After Clerk signup: useEffect reads localStorage "pendingTrialDomain"
// 4. Calls POST /api/trial/start → creates Subscription { status: "trialing", currentPeriodEnd: +7d }
// 5. Redirects to /workspace/[domain]?setup=1
// Trial banner: DomainShell.tsx fetches GET /api/trial/status on mount
// Shows amber banner when daysLeft <= 5, red when <= 2
Razorpay Subscription Flow
// 1. User clicks upgrade button
// 2. POST /api/billing/create-subscription
// → razorpay.subscriptions.create({ plan_id: RAZORPAY_PLAN_ID_PRO_MONTHLY,
// total_count: 120, notes: { clerkUserId } })
// → returns { subscriptionId, keyId }
// 3. Client opens Razorpay checkout modal (checkout.js)
// → on success handler fires with { razorpay_payment_id, razorpay_subscription_id, razorpay_signature }
// 4. POST /api/billing/verify (optimistic activation)
// → HMAC verify: HMAC-SHA256(paymentId + "|" + subscriptionId), keyed with RAZORPAY_KEY_SECRET
// → upsert Subscription { plan: "pro", status: "active", currentPeriodEnd: +31d }
// 5. Webhook (authoritative) POST /api/billing/webhook
// → verifies x-razorpay-signature HMAC
// → subscription.activated/charged → active + real periodEnd (entity.current_end * 1000)
// → subscription.halted → past_due
// → subscription.cancelled/completed → canceled
// → resolves by notes.clerkUserId first, falls back to razorpaySubscriptionId
External Services
External Integrations
OpenAI (GPT-4o)
// src/app/api/workspace/[domain]/topics/run/route.ts
const completion = await openai.chat.completions.create({
model: "gpt-4o",
temperature: 0.3, // low temp for consistent brand mention behavior
max_tokens: 900,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: fullPrompt },
]
})
// fullPrompt = prompt.text + INSTRUCTION_SUFFIX (the "Please mention specific companies..." instruction)
// Brand aliases injected into systemPrompt if workspace.brandAliases.length > 0
DataForSEO — Google AIO
// src/lib/runGoogleAIOQuery.ts
// Auth: Basic base64("EMAIL:PASSWORD") from DATAFORSEO_LOGIN + DATAFORSEO_PASSWORD env
const response = await fetch(
"https://api.dataforseo.com/v3/serp/google/organic/live/advanced",
{
method: "POST",
headers: { Authorization: "Basic " + btoa(login + ":" + password),
"Content-Type": "application/json" },
body: JSON.stringify([{
keyword: query,
location_code: locationCode ?? 2356, // India default
language_code: "en",
device: "desktop"
}])
}
)
// Parse AIO block
const data = await response.json()
const items = data.tasks?.[0]?.result?.[0]?.items ?? [] // result can be missing when the task fails
const aioItem = items.find(item => item.type === "ai_overview")
if (!aioItem) return { aioPresent: false, mentioned: false, ... }
const aioText = aioItem.text?.slice(0, 1500) ?? null
const citations = aioItem.references?.map(r => ({ name: r.title, url: r.url })) ?? []
// Brand detection: same logic as ChatGPT (domain/brandName substring + regex)
// Cache in PromptResponseCache with engine: "google-aio", 2-day TTL
DataForSEO — SEO Metrics
// Used in free scan (POST /api/scan)
// Domain Rank + Backlinks:
POST https://api.dataforseo.com/v3/backlinks/domain_pages_summary/live
→ domainRank, backlinksCount, referringDomains
// Organic Keywords:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/keywords_for_site/live
→ organicKeywordsCount, topRankedKeywords [{keyword, position, volume, cpc}]
// Knowledge Panel + Local Pack + People Also Ask:
POST https://api.dataforseo.com/v3/serp/google/organic/live/advanced
→ hasKnowledgePanel (item.type === "knowledge_graph")
→ inLocalPack (item.type === "local_pack")
→ peopleAlsoAsk (item.type === "people_also_ask")
Google Search Console
// OAuth 2.0 flow
// Scopes: https://www.googleapis.com/auth/webmasters.readonly
// Tokens stored in GSCConnection: { accessToken, refreshToken, expiresAt }
// Data fetch (GET /api/gsc/data)
const response = await fetch(
"https://www.googleapis.com/webmasters/v3/sites/{siteUrl}/searchAnalytics/query",
{
method: "POST",
headers: { Authorization: "Bearer " + accessToken },
body: JSON.stringify({
startDate: "2024-01-01", endDate: today,
dimensions: ["query"],
rowLimit: 20,
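// NOTE: the documented searchAnalytics.query body has no ordering field; rows come back sorted by clicks, so verify this is honoured
orderby: [{ fieldName: "impressions", sortOrder: "descending" }]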
})
}
)
Configuration
Environment Variables
# ─── Database ─────────────────────────────────────────────────────────────────
DATABASE_URL="postgresql://..." # Neon PostgreSQL connection string
# ─── Clerk Auth ────────────────────────────────────────────────────────────────
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="pk_..."
CLERK_SECRET_KEY="sk_..."
NEXT_PUBLIC_CLERK_SIGN_IN_URL="/sign-in"
NEXT_PUBLIC_CLERK_SIGN_UP_URL="/sign-up"
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL="/dashboard"
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL="/dashboard"
# ─── OpenAI ────────────────────────────────────────────────────────────────────
OPENAI_API_KEY="sk-..."
# ─── DataForSEO ────────────────────────────────────────────────────────────────
DATAFORSEO_LOGIN="email@example.com" # DataForSEO account email
DATAFORSEO_PASSWORD="..." # DataForSEO account password
DATAFORSEO_DEFAULT_LOCATION_CODE="2356" # optional; 2356=India, 2840=USA
# ─── Google Search Console ─────────────────────────────────────────────────────
GOOGLE_CLIENT_ID="..."
GOOGLE_CLIENT_SECRET="..."
NEXT_PUBLIC_BASE_URL="https://yourapp.com" # used for OAuth redirect URI
# ─── Razorpay ──────────────────────────────────────────────────────────────────
RAZORPAY_KEY_ID="rzp_..." # publishable key (used in checkout.js)
RAZORPAY_KEY_SECRET="..." # secret key (server only)
RAZORPAY_PLAN_ID_PRO_MONTHLY="plan_..." # plan ID from Razorpay dashboard
RAZORPAY_WEBHOOK_SECRET="..." # webhook signing secret
# ─── Optional / Legacy ─────────────────────────────────────────────────────────
STRIPE_SECRET_KEY="sk_..." # optional; admin routes gracefully skip if absent
STRIPE_WEBHOOK_SECRET="whsec_..." # optional
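A start-up check can surface a missing variable before the first request fails. The sketch below is illustrative only: the helper names (missingEnvVars, defaultLocationCode) are hypothetical, and which variables count as required is an assumption drawn from the list above (everything not marked optional, excluding the Clerk URL overrides).

```typescript
// Assumed-required variables, taken from the list above. Adjust to match the real deployment.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "OPENAI_API_KEY",
  "DATAFORSEO_LOGIN",
  "DATAFORSEO_PASSWORD",
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
  "NEXT_PUBLIC_BASE_URL",
  "RAZORPAY_KEY_ID",
  "RAZORPAY_KEY_SECRET",
  "RAZORPAY_PLAN_ID_PRO_MONTHLY",
  "RAZORPAY_WEBHOOK_SECRET",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Falls back to 2356 (India) when the optional override is absent or not a number.
function defaultLocationCode(env: Env): number {
  const raw = env["DATAFORSEO_DEFAULT_LOCATION_CODE"];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? 2356 : parsed;
}
```

In a Next.js app this could be called once with process.env, for example from a shared config module, and throw if the returned list is non-empty.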
UI Routing
Key Page Routes & Component Architecture
// ─── Public routes (no auth) ──────────────────────────────────────────────────
/ → src/app/page.tsx (landing + free scan form)
/brand/[domain] → src/app/brand/[domain]/page.tsx (public results page)
/methodology → src/app/methodology/page.tsx (original technical page)
/methodology/technical → src/app/methodology/technical/page.tsx (this page)
/how-it-works → src/app/how-it-works/page.tsx (customer-friendly page)
// ─── Auth routes ───────────────────────────────────────────────────────────────
/sign-in → Clerk-hosted or custom sign-in
/sign-up → Clerk-hosted or custom sign-up
// ─── Protected routes ──────────────────────────────────────────────────────────
/dashboard → src/app/dashboard/page.tsx (all workspaces list)
/workspace/[domain] → src/app/workspace/[domain]/page.tsx
// searchParams:
// ?setup=1 → BrandSetupWizard (if no categories yet)
// ?setup=1 → redirect to ?autoGenerate=1 (if categories exist)
// ?autoGenerate=1 → TopicsManager with autoGenerate={true}
/workspace/[domain]/results/[runId] → scan results detail page
/settings/billing → src/app/settings/billing/page.tsx
// ─── Key client components ─────────────────────────────────────────────────────
DomainShell.tsx → shell/layout for workspace pages, trial banner
BrandSetupWizard.tsx → category/city/alias wizard (shown once on ?setup=1)
TopicsManager.tsx → topic + prompt management, auto-generate trigger
PromptDrawer.tsx → slide-over showing full AI response for a prompt
PublicScanResults.tsx → free scan results on /brand/[domain]
RazorpayButton.tsx → upgrade button, loads checkout.js, handles payment flow
AiPresencePanel.tsx → GlobalBrandMention cross-intelligence view
Architecture
Design Decisions & Invariants
Mutable cache vs immutable ledger
PromptResult is mutable (upserted daily, overwritten on force-refresh). PromptRankHistory and ScanCitation are immutable (append-only, never deleted or modified). Derived views (EntityCitationDomain, CitationAnalysis) are recomputed from immutable tables — they can be wiped and rebuilt without data loss. This means the system is replayable.
Idempotency by design
Every write is safe to repeat. PromptResult uses upsert with composite unique key. ScanCitation uses createMany skipDuplicates. EntityCitationDomain and CitationAnalysis are recomputed from scratch (not incremented). Running the pipeline 10 times produces the same result as once.
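The two ideas above (append-only ledger with duplicates skipped, derived views recomputed from scratch) can be illustrated with an in-memory sketch. This is not the platform's Prisma code: the Citation shape, the uniqueness key (runId + url), and both function names are assumptions chosen for the example.

```typescript
interface Citation {
  runId: string;
  url: string;
}

// Append-only write that ignores rows already present,
// mirroring the effect of createMany with skipDuplicates.
function appendCitations(ledger: Map<string, Citation>, incoming: Citation[]): void {
  incoming.forEach((citation) => {
    const key = citation.runId + "|" + citation.url; // assumed uniqueness key
    if (!ledger.has(key)) {
      ledger.set(key, citation);
    }
  });
}

// Derived view rebuilt from the full ledger on every call.
// Nothing is incremented, so re-running cannot double-count.
function citationCountsByDomain(ledger: Map<string, Citation>): Map<string, number> {
  const counts = new Map<string, number>();
  ledger.forEach((citation) => {
    const domain = new URL(citation.url).hostname;
    counts.set(domain, (counts.get(domain) || 0) + 1);
  });
  return counts;
}
```

Calling appendCitations with the same batch any number of times leaves the ledger unchanged after the first call, and citationCountsByDomain always returns the same counts for the same ledger, which is the replay property described above.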
Cross-workspace PromptResponseCache
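The LLM response cache is shared across all workspaces. If two users track the same domain and run the same prompt, the second call is served from the cache and incurs no LLM cost. The cache key is the SHA-256 hash of the exact prompt string plus the engine, so any difference in the prompt text (including whitespace) or in the engine produces a separate entry. This also means AI responses are not user-personalised: they are real AI answers to the query.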
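A minimal sketch of such a key derivation, using Node's built-in crypto module. The function name promptCacheKey, the field order, and the newline separator between engine and prompt are assumptions; the source only states that the key is a SHA-256 hash over the prompt string and the engine.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes engine + prompt into a 64-character hex key.
// The prompt is deliberately not trimmed or lowercased, since the cache
// is keyed on the exact prompt string.
function promptCacheKey(prompt: string, engine: string): string {
  return createHash("sha256")
    .update(engine)
    .update("\n") // assumed separator so engine and prompt cannot run together
    .update(prompt)
    .digest("hex");
}
```

Because the engine is part of the hash input, the same prompt run on ChatGPT and on Google AIO lands in two separate cache entries.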
Brand aliases are pre-normalized
Aliases stored in BrandEntity.aliases and workspace.brandAliases are lowercased and alphanumeric-only. The mention detection code does the same normalisation at query time — no fuzzy matching, pure substring includes(). Minimum alias length of 4 characters prevents false positives on acronyms.
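The matching rule described above can be sketched as follows. The function names are hypothetical, and applying the same normalisation to the response text (which also removes spaces, so "Acme Corp" matches the alias "acmecorp") is an assumption about how "the same normalisation at query time" is implemented.

```typescript
const MIN_ALIAS_LENGTH = 4; // from the text: shorter aliases are ignored

// Lowercase, then drop every character that is not a-z or 0-9.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// True when any alias of sufficient length occurs as a substring
// of the normalised response. No fuzzy matching is involved.
function isBrandMentioned(responseText: string, aliases: string[]): boolean {
  const haystack = normalize(responseText);
  return aliases
    .map(normalize)
    .filter((alias) => alias.length >= MIN_ALIAS_LENGTH)
    .some((alias) => haystack.includes(alias));
}
```

For example, isBrandMentioned("We recommend Acme Corp.", ["acmecorp"]) matches, while a three-letter alias such as "ibm" is filtered out before matching and never produces a hit.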
Google AIO is non-deterministic — flagged in UI
Google AI Overview appearance varies by location, A/B test, query freshness, and device. A 'not present' result does not mean AIO never appears for that query — it means it didn't appear on that specific run. PromptResult.aioPresent distinguishes 'no AIO block appeared' (false) from 'AIO appeared but brand not cited' (true + mentioned=false). The UI communicates this clearly.
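The distinction between the two flags can be made explicit with a small classifier. The status labels and the function name are illustrative assumptions; only the aioPresent and mentioned fields come from the text.

```typescript
type AioStatus = "no_aio_block" | "aio_without_brand" | "aio_with_brand";

interface AioFlags {
  aioPresent: boolean;
  mentioned: boolean;
}

// Maps the two stored flags onto the three states the UI needs to show.
function classifyAio(flags: AioFlags): AioStatus {
  if (!flags.aioPresent) {
    // No AI Overview appeared on this run; says nothing about other runs.
    return "no_aio_block";
  }
  return flags.mentioned ? "aio_with_brand" : "aio_without_brand";
}
```

Keeping "no_aio_block" separate from "aio_without_brand" lets a visibility score exclude runs where no AIO appeared instead of counting them as misses.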
No prisma migrate dev — db push only
Vercel serverless and Neon's serverless driver don't support the interactive TTY required by prisma migrate dev. All schema changes are applied via npx prisma db push --accept-data-loss. This means no migration history files — schema.prisma is the single source of truth.
Razorpay webhook is the authoritative billing source
The /api/billing/verify route does optimistic activation (so UX doesn't stall). The webhook fires within seconds and provides the authoritative period end from Razorpay's entity.current_end (Unix timestamp). If verify succeeds but webhook is late, the subscription is still valid. If verify fails but webhook fires, the webhook upserts the subscription correctly.
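Two details of the webhook path are easy to get wrong: verifying the signature and converting the period end. Razorpay signs the raw request body with HMAC-SHA256 using the webhook secret and sends the hex digest in the X-Razorpay-Signature header. The sketch below assumes that scheme; the function names are hypothetical and this is not the platform's actual handler.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compares the received signature with one computed over the raw body.
// The body must be the unparsed request text, not a re-serialised object.
function isValidWebhookSignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const expectedBytes = Buffer.from(expected, "utf8");
  const receivedBytes = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on length mismatch, so check length first.
  return expectedBytes.length === receivedBytes.length && timingSafeEqual(expectedBytes, receivedBytes);
}

// entity.current_end is a Unix timestamp in seconds; Date expects milliseconds.
function periodEndFromUnix(currentEnd: number): Date {
  return new Date(currentEnd * 1000);
}
```

In a route handler the raw body would be read with request.text() before any JSON parsing, with RAZORPAY_WEBHOOK_SECRET passed as the secret.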
Workspace ownership always validated
All workspace API routes look up the workspace by (clerkUserId, domain) — not just domain. A user can never access another user's workspace data even if they know the domain. The composite unique key @@unique([clerkUserId, domain]) enforces one workspace per user per domain at the DB level.
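With Prisma, an unnamed @@unique([clerkUserId, domain]) constraint is addressed through a selector whose name joins the field names with an underscore. A minimal sketch of building that selector follows; the helper name ownedWorkspaceWhere is hypothetical, and the commented handler is an assumed usage pattern, not the platform's code.

```typescript
// Shape of the compound-unique selector Prisma generates for
// @@unique([clerkUserId, domain]) when the constraint has no custom name.
interface OwnedWorkspaceWhere {
  clerkUserId_domain: { clerkUserId: string; domain: string };
}

// Always scopes the lookup to the signed-in user, never to the domain alone.
function ownedWorkspaceWhere(clerkUserId: string, domain: string): OwnedWorkspaceWhere {
  return { clerkUserId_domain: { clerkUserId, domain } };
}

// Assumed usage inside a route handler (not executed here):
//   const workspace = await prisma.workspace.findUnique({
//     where: ownedWorkspaceWhere(userId, domain),
//   });
//   if (!workspace) return new Response("Not found", { status: 404 });
```

Returning 404 for both "no such workspace" and "not your workspace" avoids revealing which domains other users are tracking.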