Joshua Opolko

The GEO Field Manual: 10 Steps to Get Cited by ChatGPT, Claude, Perplexity & Gemini

The whole thing in 50 words

Getting cited by AI is a different game than ranking in Google. Be retrievable (right index, no JavaScript wall), make every passage liftable, say something only you can say, become a clear entity, and earn off-site mentions. Ten technical steps below. All free. Steal everything.

This is the entire AI-citation playbook on one page, free, with the data behind every move. Most GEO advice covers one trick and repeats it. This covers all of it: how to get retrieved, how to be liftable, how to be worth quoting, how to become an entity, how to earn the off-site signals that actually dominate, and how to cover the way AI shatters a question into pieces. Every step is tool-agnostic. Grab what you want and ship it.

~80% of LLM citations don't rank in Google's top 100 [1]
3:1 brand mentions beat backlinks for AI visibility [2]
44% of citations come from the first third of the page [5]
~12% of AI-cited URLs rank top-10 for the prompt [1]
0 major AI crawlers that run your JavaScript [4]
75% of Claude live-bot visits hit a single page [14]

Read those numbers again. They mean your Google playbook is aimed at the wrong target, your homepage hero is invisible if it renders client-side, and the thing you under-fund (off-site) is the thing that wins. Let's fix all of it.

SEO, GEO, AEO: one job, three labels

They are mostly the same field rebranded. Don't let anyone sell you four disciplines:

The fundamentals overlap with SEO, but the distribution and selection mechanics genuinely differ. The only distinction that matters is ranking (be in the list) vs citation (be in the answer). The rest of this manual is about the second one.

Each AI reads a different index, and it is usually not Google

Each assistant retrieves from a different backend, with a different crawler and a different index. Internalize this table, because it is the only place this manual leans on the index split:

| Answer engine | Reads from | Crawlers to allow | What to do |
| --- | --- | --- | --- |
| Gemini, Google AI Overviews, AI Mode | Google's index | Googlebot, Google-Extended | Classic Google SEO plus Search Console |
| ChatGPT, SearchGPT | Bing's index (plus OpenAI's own crawl) | OAI-SearchBot, ChatGPT-User, GPTBot | Bing Webmaster Tools plus IndexNow |
| Microsoft Copilot | Bing's index | Bingbot, OAI-SearchBot | Same as ChatGPT |
| Claude | Brave Search (~30B pages; Claude's citations overlap Brave's top results ~87% of the time) [6] | Claude-SearchBot, Claude-User, ClaudeBot | Rank in Brave |
| Perplexity | Its own AI-native index | PerplexityBot, Perplexity-User | Crawl access plus official-source signals |

Ranking in Google reaches Gemini and AI Overviews, and nothing else. Everything after this section (citability, information gain, entities, off-site, fan-out) applies across all four engines at once. Partnerships shift, so re-verify the map quarterly.


The rest of this page is the field manual: ten steps, each with the data, a fix, and a copy-paste prompt. Skim the headers and grab what you need.

1. You cannot be cited from an index you are not in

You cannot be cited from an index you are not in, or from content a crawler cannot read. This is the boring half that silently disqualifies most sites. Four checks:

  1. Be in all four indexes. Google Search Console; Bing Webmaster Tools (import from Search Console in two clicks, the ChatGPT and Copilot on-ramp almost everyone skips); confirm Brave and Perplexity can crawl you. Turn on IndexNow (one key file at your root) to ping Bing, Yandex, Seznam, and Naver the instant you publish. Google ignores IndexNow, so let its sitemap drive recrawl there.
  2. Do not hide behind JavaScript. AI crawlers do not execute JavaScript. Vercel's analysis of more than 500 million GPTBot fetches found zero evidence of JS execution [4]. If your content only appears after a client-side render, ChatGPT, Claude, and Perplexity see a blank page. Use server-side rendering, static generation, or ISR (anything that ships the finished HTML from the server instead of assembling it in the visitor's browser). Test by disabling JavaScript and reloading: whatever still shows up is what an AI crawler sees.
  3. Be fast enough to survive the crawl. Retrieval favours fast servers. Aim for sub-200ms time-to-first-byte; crawlers enforce short hard timeouts and skip pages that stall. Core Web Vitals are a constraint, not a growth driver: a severe failure disqualifies you, but a good score does not win on its own. Check both your TTFB and Core Web Vitals free with PageSpeed Insights.
  4. Keep the index tight. Your sitemap should list only live, indexable URLs; return 410 Gone (not 404) for removed pages so engines drop them fast; add an llms.txt map at your root. Be honest about llms.txt though: there is no strong evidence yet that it moves citations. But it is cheap insurance, because at least two crawlers are actively looking for it. A 3-month study of about 11 million crawler logs across 34 sites found that OpenAI's GPTBot and Meta's bot request /llms.txt constantly, even on sites where it returns 404, while every other crawler ignores it [14]. Add it, but do not expect magic.
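Check 2 is easy to script. Here is a minimal Python sketch, assuming you have saved the page's raw HTML (the view-source, before any JavaScript runs) as a string. It strips script and style blocks and reports whether a key phrase survives. The class name, function names, and sample markup are hypothetical, and this only approximates what a non-rendering crawler receives.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that exists in raw HTML outside <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the text a client that never runs JavaScript could read."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def visible_without_js(html: str, phrase: str) -> bool:
    """True if the phrase is present before any client-side rendering."""
    return phrase.lower() in raw_html_text(html).lower()


# Hypothetical pages: one server-rendered, one assembled in the browser.
server_rendered = "<html><body><h1>Pricing guide</h1><p>Plans start at $10.</p></body></html>"
client_rendered = '<html><body><div id="root"></div><script>render("Pricing guide")</script></body></html>'

print(visible_without_js(server_rendered, "pricing guide"))
print(visible_without_js(client_rendered, "pricing guide"))
```

Run it against the phrase you most want cited. If it only shows up inside a script payload, treat the page as invisible until it ships in the HTML.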

Copy-paste robots.txt (allow AI search; decide training separately)

The most common technical mistake is conflating bots. Training bots (GPTBot, ClaudeBot, Google-Extended) feed future models. Search and user bots (OAI-SearchBot, Claude-SearchBot, Perplexity-User, and friends) are what put you in live answers. Blocking training does not remove you from citations, so decide them independently:

# Search and user bots. These put you IN AI answers. Allow them.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Googlebot
User-agent: Bingbot
Allow: /

# Training crawlers. Your call. Allowing feeds future models;
# blocking does NOT remove you from live AI search.
# (Use Disallow: / under these names to opt out of training.)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml

⚡ Copy-paste prompt (any AI)
Act as a technical SEO. For [URL], check and report:
1) Is the main content present in raw HTML, or only after
   JavaScript runs? (I'll paste the view-source if needed.)
2) Does robots.txt allow OAI-SearchBot, ChatGPT-User,
   Claude-SearchBot, PerplexityBot, Googlebot, Bingbot?
3) Are removed pages returning 410, and is the sitemap
   limited to live, indexable URLs?
Give me a prioritized fix list. Then list the exact steps to
add the site to Bing Webmaster Tools and turn on IndexNow.
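If you would rather verify item 2 yourself, Python's standard library ships a robots.txt parser. A minimal sketch, assuming you paste your robots.txt in as text: the bot lists mirror the ones named in this step, the sample file is hypothetical, and `urllib.robotparser` is only an approximation of how each vendor's crawler actually interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Bot names as listed in this manual; re-verify them against vendor docs.
SEARCH_AND_USER_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Googlebot", "Bingbot",
]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]


def bot_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each bot name to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: parser.can_fetch(bot, url)
        for bot in SEARCH_AND_USER_BOTS + TRAINING_BOTS
    }


# Hypothetical file: allow two search bots, opt out of GPTBot training.
SAMPLE = """\
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
"""

for bot, allowed in bot_access(SAMPLE).items():
    # Bots with no matching group fall through to "allowed".
    print(f"{bot:18} {'allowed' if allowed else 'BLOCKED'}")
```

The sample shows the independence this step argues for: GPTBot is blocked while OAI-SearchBot and ChatGPT-User stay allowed.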

2. AI lifts one passage, not your whole page

An engine does not cite a page; it lifts a passage. Google's passage ranking (2021) means one clean, self-contained block can win even when the rest of the page is mediocre [13]. And when someone actually asks an AI about you, its live (user) bot does not crawl your whole site; it grabs the one page that answers. A 3-month log study found about 75% of Claude live-bot visits hit a single page [14]. The page (and the passage) an AI picks to represent you is the whole game. Engineer for the lift:

⚡ Copy-paste prompt (any AI)
You are an AI-citation editor. Here is my article: [paste].
For each H2 section:
1) Rate 0-5: could an AI lift this as a standalone quote? Is
   the answer in the first 2 sentences, is the block 120-180
   words, is there a specific claim + number + source?
2) Rewrite the weakest 3 sections to be answer-first.
3) Write one 40-60 word "citation capsule" per section:
   claim + data point + source, quotable verbatim.
Flag any section that reads as vague or opinion-only.
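The mechanical half of that rating can run locally before you spend a prompt on it. A rough Python sketch of the checks named in the prompt above (120 to 180 words, a number, an attribution): the function name, the cue phrases, and the sample passage are all hypothetical, and whether the opening really answers the question is still a human call.

```python
import re

# Hypothetical cue phrases that suggest the passage names its source.
ATTRIBUTION_CUES = ("according to", "source:")


def liftability_report(section: str) -> dict:
    """Run crude, mechanical citability checks on one section of text."""
    text = " ".join(section.split())  # collapse whitespace and line breaks
    sentences = re.split(r"(?<=[.!?])\s+", text)
    word_count = len(text.split())
    return {
        "word_count": word_count,
        "in_120_180_range": 120 <= word_count <= 180,
        "has_number": bool(re.search(r"\d", text)),
        "has_attribution": any(cue in text.lower() for cue in ATTRIBUTION_CUES),
        # Read this yourself: is the direct answer in these two sentences?
        "opening": " ".join(sentences[:2]),
    }


# Hypothetical passage, far too short to pass the length check.
sample = (
    "Acme is a hypothetical brand. It shipped 42 units, "
    "according to Example Source (2025). More context follows."
)
for check, result in liftability_report(sample).items():
    print(f"{check}: {result}")
```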

The answer-first passage template

Each citable passage should survive being cut out of the page: direct answer in sentence one, then the specific fact or number, then provenance, then a link. Generic shape:

"[Entity] is [direct answer]. [The specific number or fact], according to [named source, dated]. [One sentence of context that makes the number meaningful]." That is entity plus bolded fact plus provenance plus link: liftable, attributable, and hard to paraphrase away.

3. Say something new, or pay the paraphrase tax

If your page only restates the top ten, the engine has no reason to lift you over whoever said it first. Call it the paraphrase tax. Google's information-gain patent (US11354342B2) literally scores a document by the additional information it adds beyond what the user has already seen, explicitly in an assistant context [12]. Important caveat: this is a patent filing, not a confirmed ranking factor, so treat it as a documented idea rather than a proven mechanism. But it is the cleanest public description of "novelty relative to the existing web." The move:

  1. Know the consensus so you can add to it. Read the top ten.
  2. Map the negative space. Which attributes of the topic does the whole SERP skip?
  3. Fill it with what AI cannot synthesize: first-party data, original tests or surveys, named-expert experience, contrarian but defensible numbers. This is the only step that creates genuine novelty. It is also why human-signal sources get cited so heavily: Reddit's organic visibility jumped about +1,328% in 2023 to 2024 (Amsive) [10], and Google licensed Reddit content at a reported ~$60M per year (Reuters, 2024) [9].

The peer-reviewed backing matters here. The Princeton GEO paper (KDD 2024) tested content tweaks across thousands of queries and found that adding citations, quotations, and statistics each measurably lifted AI visibility (quotations and statistics in the range of roughly 30 to 41% in its tests), while keyword stuffing performed worse than baseline [3]. Substance wins; stuffing loses.

⚡ Copy-paste prompt (any AI)
Topic: [your topic]. Pretend you've read the top 10 Google
results and summarize the consensus they all share. Then find
the INFORMATION GAP: what does every one of them omit, gloss
over, or stay vague about? List 8 specific "empty cells"
(facts, numbers, comparisons, or first-hand angles nobody
covers). For each, tell me what ORIGINAL data or experience
I'd need to fill it so an AI cites ME instead of the pack.

4. Be one clear entity, not a bag of keywords

AI answers are built on entities, not just keywords. Every page should unambiguously represent one canonical entity, named the same way throughout. What makes you eligible is E-E-A-T (Experience, Expertise, Authoritativeness, Trust), which Google calls a rater framework, not a score, but its signals track with what survives competitive queries: named author with real experience, methodology, outbound citations, corroboration, and dates.

This is the two-stage model in practice: eligibility (trust and E-E-A-T) gets you into the candidate set; selection (novelty plus extractability, steps 2 and 3) gets you lifted into the answer. Classic SEO over-optimizes the first stage and ignores the second.

⚡ Copy-paste prompt (any AI)
Audit [URL or pasted page] for entity clarity and E-E-A-T:
1) Is there ONE unambiguous primary entity? Is it named
   consistently? Is there a clear "X is..." intro line?
2) Score Experience, Expertise, Authority, Trust 0-5 each,
   citing the specific evidence (or its absence) for each.
3) List the missing trust signals (author credentials, dates,
   methodology, outbound citations, sameAs profile links).
4) Draft a 4-line author bio and a sameAs list I should add.

5. Most AI citations are won off your own site

This is the big one, and the data is the most lopsided on the page. By multiple 2025 studies, the large majority of AI citations are driven by off-site signals, not on-page tweaks. Ahrefs' study of 75,000 brands found brand mentions correlate about 3 times more strongly with AI visibility than backlinks (web mentions ~0.66 vs backlinks ~0.22) [2], and AirOps found brands are roughly 6.5 times more likely to be cited through third-party sources than their own domain [11]. These are all vendor studies, named and dated, so weight them accordingly.

The correlation ranking from the Ahrefs work:

| Signal | Correlation with AI visibility [2] |
| --- | --- |
| YouTube mentions | ~0.737 (strongest) |
| Branded web mentions | ~0.66 |
| Domain Rating | ~0.27 to 0.33 |
| Backlinks | ~0.218 (far weaker than the folklore implies) |
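The "3 to 1" headline is just the ratio of two of those coefficients. A quick worked check in Python, using only the figures quoted above; keep in mind that a ratio of correlations describes how much more tightly one signal tracks visibility, not a causal multiplier.

```python
# Correlation coefficients as quoted from the Ahrefs studies above.
youtube_mentions_r = 0.737
web_mentions_r = 0.66
backlinks_r = 0.218

# How many times more strongly each signal correlates than backlinks do.
web_vs_backlinks = web_mentions_r / backlinks_r
youtube_vs_backlinks = youtube_mentions_r / backlinks_r

print(f"web mentions vs backlinks:     {web_vs_backlinks:.2f}x")
print(f"YouTube mentions vs backlinks: {youtube_vs_backlinks:.2f}x")
```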

So the budget split most companies get backwards: aim for roughly 40% owned content and 60% earned media, not 90/10. Where to earn it:

⚡ Copy-paste prompt (any AI)
For the query "[your target question]", list the specific
third-party sources an AI assistant is most likely already
citing: roundup articles, directories, subreddits, YouTube
channels, review sites, Wikipedia entries. Rank them by how
many of my target queries each would cover. For the top 5,
give me a concrete, non-spammy way to earn a mention on each.
Then audit my anchor-text mix for over-optimization.

6. AI splits one question into dozens: cover them all

Google's AI Mode (I/O 2025) uses query fan-out: it shatters one question into many sub-queries and fires them at once. You compete across dozens of sub-SERPs you never targeted, which is why one-page optimization quietly fails, and why only about 12% of AI-cited URLs rank top-10 for the original prompt [1]. The answer is coverage and structure:

⚡ Copy-paste prompt (any AI)
(A) CLUSTER: Seed = "[keyword]". Expand into 30-40 real
search variants (use People Also Ask + related searches).
Group them into a hub-and-spoke plan: one pillar + 3-5
clusters of spokes, grouped by shared search intent. Flag any
two that should merge (same intent) and design the internal
links (every spoke links to the pillar and back).

(B) PAGE TYPE: For "[keyword]", look at the top 10 results and
tell me the dominant PAGE TYPE (guide / product / listicle /
comparison / tool). Does my page ([type]) match? If not, what
should I rebuild it into, and why?
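Once the assistant hands back the variants from part (A), the bookkeeping is small enough to script. A minimal Python sketch, assuming you have labelled each variant with an intent yourself: queries that share a label collapse into one spoke, and every spoke links to the pillar and back. The function names, intent labels, and URLs are hypothetical.

```python
from collections import defaultdict


def group_by_intent(variants):
    """Group (query, intent_label) pairs; same intent means same spoke."""
    clusters = defaultdict(list)
    for query, intent in variants:
        clusters[intent].append(query)
    return dict(clusters)


def hub_and_spoke_links(pillar_url, spoke_urls):
    """Return (from_url, to_url) pairs: each spoke to the pillar and back."""
    links = []
    for spoke in spoke_urls:
        links.append((spoke, pillar_url))
        links.append((pillar_url, spoke))
    return links


# Hypothetical variants with hand-assigned intent labels.
variants = [
    ("best crm for startups", "compare"),
    ("top crm tools for startups", "compare"),  # same intent: merge
    ("what is a crm", "define"),
]
clusters = group_by_intent(variants)
spokes = [f"/crm/{intent}/" for intent in clusters]

for source, target in hub_and_spoke_links("/crm/", spokes):
    print(f"{source} -> {target}")
```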

7. Spell out the facts so engines stop guessing

Structured data is a small block of machine-readable facts you add to a page, almost always as JSON-LD (a snippet of code that sits in the page's <head> and states plainly what the page is: who wrote it, when it was updated, the question-answer pairs, the breadcrumb trail). Humans never see it; engines read it to remove the guesswork. It does not force a citation, but it removes ambiguity about entities, question-answer pairs, and dates, and all four major engines process it during citation selection. Use the high-value types and skip the dead ones:

⚡ Copy-paste prompt (any AI)
Generate valid JSON-LD for this page: [paste title, author,
dates, FAQ pairs, breadcrumb path]. Include Article, Person,
Organization (with sameAs), BreadcrumbList, and FAQPage. Do
NOT use HowTo (deprecated). Output one combined @graph block
I can paste into the <head>.

Validate free with Google's Rich Results Test or the Schema.org validator.
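For a sense of what the assistant should hand back, here is a minimal Python sketch that assembles one combined @graph block. The types and properties are standard schema.org vocabulary; the function name, the dictionary keys, and every value in the sample page are hypothetical placeholders, so swap in your own and still run the result through a validator.

```python
import json


def build_json_ld(page: dict) -> str:
    """Return one combined @graph JSON-LD string for a page."""
    org_id = page["org_url"] + "#organization"
    author_id = page["author_url"] + "#person"
    graph = [
        {"@type": "Organization", "@id": org_id, "name": page["org_name"],
         "url": page["org_url"], "sameAs": page["org_same_as"]},
        {"@type": "Person", "@id": author_id, "name": page["author_name"],
         "url": page["author_url"], "sameAs": page["author_same_as"]},
        {"@type": "Article", "headline": page["title"],
         "mainEntityOfPage": page["url"],
         "author": {"@id": author_id}, "publisher": {"@id": org_id},
         "datePublished": page["date_published"],
         "dateModified": page["date_modified"]},
        {"@type": "BreadcrumbList", "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(page["breadcrumbs"], start=1)
        ]},
        {"@type": "FAQPage", "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in page["faq"]
        ]},
    ]
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)


# Hypothetical page data; replace every value with your own.
page = {
    "url": "https://example.com/guides/widgets/",
    "title": "Example Widget Guide",
    "date_published": "2026-01-10",
    "date_modified": "2026-02-01",
    "author_name": "Jane Example",
    "author_url": "https://example.com/about/jane/",
    "author_same_as": ["https://www.linkedin.com/in/jane-example"],
    "org_name": "Example Co",
    "org_url": "https://example.com/",
    "org_same_as": ["https://www.youtube.com/@exampleco"],
    "breadcrumbs": [
        ("Home", "https://example.com/"),
        ("Guides", "https://example.com/guides/"),
    ],
    "faq": [("What is a widget?", "A widget is a placeholder product.")],
}

json_ld = build_json_ld(page)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

Linking Article to Person and Organization by @id keeps each entity defined once, which is the point of a single @graph.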

8. Local: fundamentals win, not prompt tricks

Local AEO is disciplined fundamentals, not prompt tricks. The signals:

⚡ Copy-paste prompt (any AI)
I run a [business type] in [city]. Build my local AEO plan:
1) A Google Business Profile completeness checklist for my
   category.
2) The exact NAP string to use everywhere, and the top
   directories and citations to claim for my industry + city.
3) The 10 "near me" / "best [x] in [city]" questions to build
   answer-first pages for.
4) Which local listicles or directories an AI likely cites
   for these, and how to get listed.
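Item 2 of that plan is easy to police once you have the canonical string. A small Python sketch, assuming you have copied the NAP (name, address, phone) text from each listing by hand: it ignores only case and whitespace and flags everything else as a mismatch. The function names, business details, and listing sources are hypothetical.

```python
import re


def normalize_nap(nap: str) -> str:
    """Collapse whitespace and case so only real differences remain."""
    return re.sub(r"\s+", " ", nap).strip().lower()


def nap_mismatches(canonical: str, listings: dict[str, str]) -> list[str]:
    """Return the listing sources whose NAP differs from the canonical one."""
    target = normalize_nap(canonical)
    return sorted(
        source for source, nap in listings.items()
        if normalize_nap(nap) != target
    )


# Hypothetical business and listings.
CANONICAL = "Example Plumbing, 12 Main Street, Springfield, 555-0100"
listings = {
    "google_business_profile": "Example Plumbing, 12 Main Street, Springfield, 555-0100",
    "directory_a": "Example Plumbing,  12 Main Street, Springfield, 555-0100",
    "directory_b": "Example Plumbing, 12 Main St., Springfield, 555-0100",
}

print(nap_mismatches(CANONICAL, listings))
```

Here only directory_b is flagged: "St." versus "Street" is exactly the kind of drift worth fixing at the source.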

9. Rankings lie about citations: measure per engine

Rankings lie about citations. Track the thing you actually want:

⚡ Copy-paste prompt (any AI)
Build me a 20-prompt "citation tracker": the real questions my
audience asks about [topic]. For each, I'll run it in ChatGPT,
Perplexity, and Gemini and paste back which domains got cited.
Then tell me which sources I need to get mentioned on, and
which of my pages should be the one that gets cited.
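If you prefer to tally the pasted results yourself, a few lines of Python will do it. A minimal sketch, assuming you record the cited URLs per engine and per prompt as full URLs: the function names, data shape, and sample results are hypothetical, not output from any real engine.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host of a full URL, without a leading www."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def citation_counts(results: dict, my_domain: str) -> dict:
    """Per engine: number of prompts where my domain was cited at least once."""
    return {
        engine: sum(
            1 for cited in runs
            if any(domain_of(u) == my_domain for u in cited)
        )
        for engine, runs in results.items()
    }


def top_other_domains(results: dict, my_domain: str, n: int = 5) -> list:
    """Most-cited domains that are not mine: where to go earn mentions."""
    counter = Counter(
        domain_of(u)
        for runs in results.values()
        for cited in runs
        for u in cited
        if domain_of(u) != my_domain
    )
    return counter.most_common(n)


# Hypothetical results: engine -> one list of cited URLs per prompt.
results = {
    "ChatGPT": [
        ["https://www.reddit.com/r/example/", "https://example.com/guide"],
        ["https://review-site.example/best-tools"],
    ],
    "Perplexity": [
        ["https://www.reddit.com/r/example/"],
        ["https://example.com/guide"],
    ],
}

print(citation_counts(results, "example.com"))
print(top_other_domains(results, "example.com"))
```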

Free tools: Search Console (queries and indexation), your raw access logs or CDN analytics for AI-bot hits.
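Counting AI-bot hits in raw logs takes very little code. A minimal Python sketch, assuming one request per line with the user-agent string somewhere in it (as in the common combined log format): the bot tokens are the ones named in step 1, while the function name and sample log lines, including their simplified user-agent strings, are hypothetical.

```python
from collections import Counter

# User-agent tokens named in this manual, split the way step 1 splits them.
SEARCH_AND_USER_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]
TRAINING_BOTS = ["GPTBot", "ClaudeBot"]


def count_ai_bot_hits(log_lines) -> Counter:
    """Count requests per AI bot by matching tokens in each log line."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in SEARCH_AND_USER_BOTS + TRAINING_BOTS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request once
    return hits


# Hypothetical log lines with simplified user-agent strings.
sample_log = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "ExampleUA (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2026:10:05:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "ExampleUA (compatible; Claude-User/1.0)"',
    '203.0.113.9 - - [01/Jan/2026:10:06:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "ExampleUA (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2026:10:07:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (ordinary browser)"',
]

for bot, count in count_ai_bot_hits(sample_log).most_common():
    print(f"{bot}: {count}")
```

User-agent strings can be spoofed, so treat these counts as a leading indicator rather than an audit.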

10. Freshness is a tiebreaker, and it is engine-specific

"Just keep updating" is half-true. Freshness is a tiebreaker, weighted very differently per engine:

Practical rule: refresh genuinely time-sensitive content on a roughly 30-day cycle with real changes (and an honest dateModified); do not fake-update evergreen reference pages just to chase a date.
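That rule reduces to a date check you can run over a content inventory. A minimal Python sketch, assuming each page carries an ISO-format dateModified and a flag you set by hand for whether it is time-sensitive: the function name, the 30-day default, and the sample pages are hypothetical, and the script only tells you what is due, not what to change.

```python
from datetime import date


def needs_refresh(date_modified: str, today: date,
                  time_sensitive: bool, max_age_days: int = 30) -> bool:
    """Flag time-sensitive pages older than the cycle; never flag evergreen ones."""
    if not time_sensitive:
        return False
    age_days = (today - date.fromisoformat(date_modified)).days
    return age_days > max_age_days


# Hypothetical inventory: (url, dateModified, time_sensitive).
pages = [
    ("/pricing-2026/", "2026-01-15", True),
    ("/news/launch/", "2026-02-20", True),
    ("/glossary/widget/", "2024-06-01", False),
]
today = date(2026, 3, 1)

due = [url for url, modified, sensitive in pages
       if needs_refresh(modified, today, sensitive)]
print(due)
```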

The whole playbook on one screen

| # | Step | Do this | Leading indicator |
| --- | --- | --- | --- |
| 1 | Get retrievable | In all four indexes; no JS wall; fast TTFB; 410 + clean sitemap | AI-bot hits in logs |
| 2 | Liftable passages | Answer-first, 120-180 word blocks, capsules, real tables | Sections quoted verbatim |
| 3 | Information gain | First-party data and experience the top 10 lack | Cited for sub-queries you own |
| 4 | Entity and E-E-A-T | One entity, named author, sameAs, dates, methodology | Brand recognized as a source |
| 5 | Earn off-site | 40/60 owned/earned; YouTube, Reddit, directories, Wikipedia | Third-party mentions growing |
| 6 | Cover the fan-out | SERP-overlap clusters; match the page type the SERP rewards | Coverage across sub-SERPs |
| 7 | Structured data | Article, FAQPage, Breadcrumb, ItemList, Person; no HowTo | Valid in Rich Results Test |
| 8 | Local | GBP complete, identical NAP, local schema, geo-grid | Map-pack and "near me" cites |
| 9 | Measure per engine | Seed prompts, per-engine citation tracking, deploy diffs | Citation count per engine |
| 10 | Freshness | 30-day cycle for time-sensitive; honest dateModified | Recent content cited first |

The master prompt: hand this to any AI

Want one block to paste into any assistant to run the whole playbook on a page? Start here and let it work down the list. It condenses the ten steps above into eight prompts (items 1-7 line up with steps 1-7; item 8 folds in measurement), so if a term is unfamiliar, scroll back to the matching step for the plain-English version:

You are my GEO / AI-citation engineer. Goal: get [URL or topic]
cited by ChatGPT, Claude, Perplexity, and Google AI Overviews.
Work through these steps and give me a prioritized action list
(Critical / High / Medium) with a concrete fix for each:

1. ACCESS: Is content in raw HTML (no JS wall)? Does robots.txt
   allow the search and user bots? Is the site in Bing Webmaster
   + IndexNow? 410s for dead pages, clean sitemap, llms.txt?
2. CITABILITY: Is each section answer-first, 120-180 words,
   self-contained, with a claim + number + source? Rewrite the 3
   weakest and add one 40-60 word citation capsule each.
3. INFORMATION GAIN: What does the top-10 consensus omit? List 8
   empty cells and the original data needed to fill them.
4. ENTITY + E-E-A-T: One clear entity? Author credentials, dates,
   methodology, sameAs links? Score E-E-A-T 0-5 each.
5. OFF-SITE: Which third-party sources (YouTube, Reddit,
   directories, listicles, Wikipedia) does an AI already cite for
   my queries? Give me a non-spammy plan to earn mentions.
6. FAN-OUT: A hub-and-spoke cluster plan grouped by search
   intent; does my page type match what the SERP rewards?
7. SCHEMA: Generate Article + Person + FAQPage + BreadcrumbList
   JSON-LD (no HowTo).
8. MEASURE: 20 seed prompts to track citations per engine.

Be specific and falsifiable. Every recommendation needs a "how
would I know this worked?" check. Flag any claim you're unsure
about instead of inventing a number.

Frequently asked questions

Do I need to pay for any tool to do this?

No. Every step here has a copy-paste prompt you can run in free ChatGPT, Gemini, or Claude, plus free tools (Search Console, Bing Webmaster Tools, PageSpeed Insights, your own logs). Implement whatever pieces you want for nothing. The hard part is not the tooling; it is doing the work consistently and having something original to say.

I rank #1 on Google but ChatGPT never mentions me. Why?

ChatGPT retrieves from Bing's index, not Google's. If you are not in Bing, it cannot see or cite you regardless of your Google rank. Get into Bing Webmaster Tools (import from Search Console), turn on IndexNow, and make sure your content is in raw HTML rather than rendered only by JavaScript.

What is the single highest-impact thing most people skip?

Off-site signals (step 5). The data is lopsided: brand mentions correlate about 3 times more strongly with AI visibility than backlinks, and brands are roughly 6.5 times more likely to be cited via third parties than their own site. Most people spend about 90% of effort on-page. Flip toward roughly 40% owned and 60% earned.

If I block GPTBot, do I disappear from ChatGPT?

No. That only opts you out of model training. Live answers use OAI-SearchBot and ChatGPT-User; keep those allowed and you stay citable. Block the search and user bots and you do disappear. Training and search access are independent decisions.

Can I just have AI write the content?

For citation, no. AI drafts rearrange existing consensus, which is the paraphrase tax. Citation rewards information gain: first-party data, original numbers, named experience. Use AI to structure and scale content built on something only you have.

Does schema or structured data get me cited?

It helps eligibility and disambiguation (entities, question-answer pairs, dates), and all four major engines process it during selection, but it is not deterministic. FAQPage and Article are the high-value types; never use deprecated HowTo.

Is llms.txt worth doing?

It is cheap hygiene and increasingly read by AI tooling, so add it, but be realistic: there is no strong evidence yet that it is a citation factor for the major engines. Do not prioritize it over access, citability, or off-site signals.

How long until I show up in AI answers?

The gating steps are index coverage (Bing, Brave) and crawl access. Fix those and you remove the longest delays. Watch AI-crawler hits in your logs as the leading indicator; citations follow. Freshness-hungry engines (Perplexity, ChatGPT) can pick up new content within days; established-authority surfaces (AI Overviews) are slower.

What tools do I actually need?

None to start. Search Console plus Bing Webmaster Tools cover index coverage; your server logs cover crawler tracking; any free chatbot runs the prompts here. Paid suites and automation add speed, not access to the steps themselves. (When the hand-rolling starts to hurt and you want this running on autopilot, that is the part I do for people.)

Caveats (read these)

Sources and references

  1. Ahrefs, "Only 12% of AI-Cited URLs Rank in Google's Top 10 for the Original Prompt" (Aug 2025), incl. the 80%-not-in-top-100 and ~17M-citation freshness data. ahrefs.com/blog/ai-search-overlap; ahrefs.com/blog/llm-search.
  2. Ahrefs, brand-visibility correlation studies across 75,000 brands (Aug and Dec 2025): brand web mentions ~0.66 vs backlinks ~0.218; YouTube mentions ~0.737. ai-overview-brand-correlation; ai-brand-visibility-correlations.
  3. Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande, "GEO: Generative Engine Optimization," KDD 2024. arXiv:2311.09735; ACM DOI 10.1145/3637528.3671900.
  4. Vercel, "The Rise of the AI Crawler" (analysis of 500M+ GPTBot fetches; no JS execution observed). vercel.com/blog/the-rise-of-the-ai-crawler.
  5. Kevin Indig, Growth Memo (Feb 2026): 44.2% of citations from the first 30% of text, across 18,012 citations. growth-memo.com; Search Engine Land summary.
  6. Profound research, via RivalHound and Search Engine Land: Claude's citations overlap Brave Search's top organic results ~86.7% of the time; Brave's index is ~30B pages. Search Engine Land; Brave Search API.
  7. SE Ranking (Nov 2025): sections of roughly 120 to 180 words earned about 70% more ChatGPT citations (vendor data, directional).
  8. The Digital Bloom, 2025 AI Visibility Report: ~11% of domains cited by both ChatGPT and Perplexity (domain-level). thedigitalbloom.com.
  9. Reuters, via Search Engine Land (Feb 2024): Google's Reddit content-licensing deal reported at ~$60M per year. searchengineland.com.
  10. Amsive (2024): Reddit organic search visibility rose roughly +1,328% across 2023 to 2024.
  11. AirOps (Oct 2025): brands are roughly 6.5 times more likely to be cited through third-party sources than via their own domain.
  12. USPTO / Google, Information Gain patent US11354342B2. patents.google.com/patent/US11354342B2.
  13. Google Search Central, passage ranking / passage-based indexing (2021).
  14. arrivl.ai, "Each AI crawls a website completely differently," a 3-month study of ~11 million crawler logs across 34 sites (2026): GPTBot and Meta's bot repeatedly request /llms.txt even when absent; Claude's live bot is single-page ~75% of the time; Bytespider out-crawls Google plus OpenAI on some sites. Shared by u/UptownOnion on r/aeo. reddit.com/r/aeo (data: arrivl.ai).

Related, deeper on this site: the GEO / AI-citation playbook, how I use AI for SEO and GEO, and treating a website as LLM infrastructure.