The GEO Field Manual: 10 Steps to Get Cited by ChatGPT, Claude, Perplexity & Gemini

Q: I rank #1 on Google but ChatGPT never mentions me. Why?

ChatGPT retrieves from Bing's index, not Google's. If you are not in Bing, it cannot see or cite you regardless of your Google rank. Get into Bing Webmaster Tools (import from Search Console), turn on IndexNow, and make sure your content is in raw HTML rather than rendered only by JavaScript.

Q: What is the single highest-impact thing most people skip in GEO?

Off-site signals. Ahrefs' study of 75,000 brands found brand mentions correlate about 3 times more strongly with AI visibility than backlinks, and AirOps found brands are roughly 6.5 times more likely to be cited through third-party sources than their own domain. Most people spend about 90 percent of effort on-page; shift toward roughly 40 percent owned content and 60 percent earned media.

Q: If I block GPTBot, do I disappear from ChatGPT?

No. Blocking GPTBot only opts you out of model training. Live ChatGPT answers use OAI-SearchBot and ChatGPT-User; keep those allowed and you stay citable. Block the search and user bots and you do disappear. Training access and search access are independent decisions.

Q: Is llms.txt worth doing?

It is cheap hygiene and increasingly read by AI tooling, so add it, but be realistic: there is no strong evidence yet that it is a citation factor for the major engines. Do not prioritize it over crawler access, passage citability, or off-site signals.

Q: How long until I show up in AI answers?

The gating steps are index coverage (Bing for ChatGPT, Brave for Claude) and crawl access. Fix those and you remove the longest delays. Watch AI-crawler hits in your server logs as the leading indicator; citations follow. Freshness-hungry engines like Perplexity and ChatGPT can pick up new content within days; established-authority surfaces like Google AI Overviews are slower.

The whole thing in 50 words

Getting cited by AI is a different game than ranking in Google. Be retrievable (right index, no JavaScript wall), make every passage liftable, say something only you can say, become a clear entity, and earn off-site mentions. Ten technical steps below. All free. Steal everything.

This is the entire AI-citation playbook on one page, free, with the data behind every move. Most GEO advice covers one trick and repeats it. This covers all of it: how to get retrieved, how to be liftable, how to be worth quoting, how to become an entity, how to earn the off-site signals that actually dominate, and how to cover the way AI shatters a question into pieces. Every step is tool-agnostic. Grab what you want and ship it.

75 clicks from search

91 clicks via AI

+23 this week

~80% of LLM citations don't rank in Google's top 100¹

3 : 1 brand mentions beat backlinks for AI visibility²

44% of citations come from the first 30% of the page⁵

~12% of AI-cited URLs rank top-10 for the prompt¹

3 of 14 AI crawlers render JavaScript (Google-Extended, Bingbot, Applebot); the rest see raw HTML only⁴

75% of Claude live-bot visits hit a single page¹⁴

Read those numbers again. They mean your Google playbook is aimed at the wrong target, your homepage hero is invisible to most AI crawlers if it renders client-side, and the thing you under-fund (off-site) is the thing that wins. Let's fix all of it.

Key takeaways

Index coverage first: You cannot be cited from an index you are not in. ChatGPT reads Bing, Claude reads Brave Search, Perplexity uses its own index. Being in Google alone only reaches Gemini and AI Overviews.
Passages win, not pages: AI extracts one liftable block per query. Sections of 120-180 words earned about 70% more ChatGPT citations in a 2025 vendor study. Answer first, then supporting evidence, then context.
Information gain beats rephrasing: Pages that restate existing consensus pay the paraphrase tax. First-party data, original numbers, and named experience are what get cited instead of the pack.
Off-site signals dominate: Ahrefs' study of 75,000 brands found brand mentions correlate 3x more strongly with AI visibility than backlinks. Aim for 40% owned content and 60% earned media.
Entity clarity is required: One canonical entity per page, consistent naming across the web, reciprocal sameAs links, and a named author with credentials get you into the candidate set.
Query fan-out demands coverage: Google AI Mode shatters one query into dozens of sub-queries simultaneously. Topic clusters beat single-page optimization for this kind of coverage.
Track citations, not rankings: About 80% of LLM-cited pages do not rank in Google's top 100. Measure citation counts per engine, not keyword positions, to know if GEO is working.

What is the difference between SEO, GEO, and AEO?

They are mostly the same field rebranded. Don't let anyone sell you three separate disciplines:

SEO: getting ranked in the ten blue links.
GEO (Generative Engine Optimization): getting cited or mentioned inside an AI-generated answer.
AEO (Answer Engine Optimization): the same idea, framed around the engines that answer you in prose instead of a list of links, like Google's AI Overviews (the AI summary that sits above the search results), ChatGPT, Perplexity, and Copilot.

The fundamentals overlap with SEO, but the distribution and selection mechanics genuinely differ. The only distinction that matters is ranking (be in the list) vs citation (be in the answer). The rest of this manual is about the second one.

Which index does each AI assistant read from?

Each assistant retrieves from a different backend, with a different crawler and a different index. Internalize this table, because it is the only place this manual leans on the index split:

Answer engine	Reads from	Crawlers to allow	What to do
Gemini, Google AI Overviews, AI Mode	Google's index	Googlebot, Google-Extended	Classic Google SEO plus Search Console
ChatGPT, SearchGPT	Bing's index (plus OpenAI's own crawl)	OAI-SearchBot, ChatGPT-User, GPTBot	Bing Webmaster Tools plus IndexNow
Microsoft Copilot	Bing's index	Bingbot, OAI-SearchBot	Same as ChatGPT
Claude	Brave Search (~30B pages; Claude's citations overlap Brave's top results ~87% of the time)⁶	Claude-SearchBot, Claude-User, ClaudeBot	Rank in Brave
Perplexity	Its own AI-native index	PerplexityBot, Perplexity-User	Crawl access plus official-source signals

Ranking in Google reaches Gemini and AI Overviews, and nothing else. Everything after this section (citability, information gain, entities, off-site, fan-out) applies across all four engines at once. Partnerships shift, so re-verify the map quarterly.

The rest of this page is the field manual: ten steps, each with the data, a fix, and a copy-paste prompt. Skim the headers and grab what you need.

1. You cannot be cited from an index you are not in

You cannot be cited from an index you are not in, or from content a crawler cannot read. This is the boring half that silently disqualifies most sites. Four checks:

Be in all four indexes. Google Search Console; Bing Webmaster Tools (import from Search Console in two clicks, the ChatGPT and Copilot on-ramp almost everyone skips); confirm Brave and Perplexity can crawl you. Turn on IndexNow (one key file at your root) to ping Bing, Yandex, Seznam, and Naver the instant you publish. Google ignores IndexNow, so let its sitemap drive recrawl there.
Do not hide behind JavaScript. Most AI crawlers do not execute JavaScript. Vercel's analysis of more than 500 million GPTBot fetches found zero evidence of JS execution.⁴ If your content only appears after a client-side render, ChatGPT, Claude, and Perplexity see a blank page. The three exceptions are Google-Extended (Gemini / AI Overviews), Bingbot (Copilot and ChatGPT via Bing's index), and Applebot (Apple Intelligence): all three render pages in a headless browser and do execute JavaScript. But that is three out of fourteen. Use server-side rendering, static generation, or ISR (anything that ships the finished HTML from the server instead of assembling it in the visitor's browser). Test by disabling JavaScript and reloading: whatever still shows up is what most AI crawlers see.
Be fast enough to survive the crawl. Retrieval favours fast servers. Aim for sub-200ms time-to-first-byte; crawlers enforce short hard timeouts and skip pages that stall. Core Web Vitals are a constraint, not a growth driver: a severe failure disqualifies you, but a good score does not win on its own. Check both your TTFB and Core Web Vitals free with PageSpeed Insights.
Keep the index tight and signal it explicitly. Your sitemap lists only live, indexable URLs; return 410 Gone (not 404) for removed pages so engines drop them fast; add an llms.txt map at your root. Format it as a Markdown file with at least one H1 header: Google PageSpeed Insights now validates this format under its Agentic Browsing audit, so a malformed or missing file is a flagged failure, not just an omission. Two crawlers actively look for it: a 3-month study of about 11 million crawler logs across 34 sites found that OpenAI's GPTBot and Meta's bot request /llms.txt constantly, even on sites where it returns 404.¹⁴ Add it in the right format, but do not expect it to move citations on its own. Then advertise it: add a Link response header pointing to your llms.txt and (optionally) a /.well-known/api-catalog file listing your machine-readable endpoints. AI agents reading RFC 8288 Link headers can discover your llms.txt without a crawl. In Apache: Header always set Link '</llms.txt>; rel="describedby"'. In Cloudflare or Nginx, the equivalent header directive. This is cheap to add and is now checked by Cloudflare's agent-readiness scanner.
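
The disable-JavaScript test in the list above can be approximated in code. This is a minimal sketch, assuming you already have the raw HTML as a string (for example from view-source); the function names and sample pages are hypothetical, and it only mimics a crawler that reads markup without running scripts.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collects the text a non-rendering crawler could read from raw HTML."""

    SKIPPED = {"script", "style", "template"}  # content here is never shown as text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the visible text present before any JavaScript runs."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def missing_from_raw_html(html: str, key_phrases: list[str]) -> list[str]:
    """List the key phrases that do NOT appear in the raw HTML text."""
    text = raw_html_text(html).lower()
    return [p for p in key_phrases if p.lower() not in text]


# Hypothetical pages: one server-rendered, one assembled client-side.
SSR_PAGE = "<html><body><h1>Pricing</h1><p>Plans start at $9 per month.</p></body></html>"
CSR_PAGE = ('<html><body><div id="root"></div>'
            '<script>render("Plans start at $9 per month.")</script></body></html>')

print(missing_from_raw_html(SSR_PAGE, ["Plans start at $9"]))
print(missing_from_raw_html(CSR_PAGE, ["Plans start at $9"]))
```

Feed it the two or three sentences you most want cited. Anything it reports as missing only exists after a client-side render.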

Copy-paste robots.txt (allow AI search; decide training separately)

The most common technical mistake is conflating bots. Training bots (GPTBot, ClaudeBot, Google-Extended) feed future models. Search and user bots (OAI-SearchBot, Claude-SearchBot, Perplexity-User, and friends) are what put you in live answers. Blocking training does not remove you from citations, so decide them independently:

# Search and user bots. These put you IN AI answers. Allow them.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Googlebot
User-agent: Bingbot
Allow: /

# Training crawlers. Your call. Allowing feeds future models;
# blocking does NOT remove you from live AI search.
# (Use Disallow: / under these names to opt out of training.)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml

# Declare content usage preferences (contentsignals.org; optional but scannable)
# search=yes: allow AI search bots to retrieve and cite this content
# ai-input=yes: allow content to be used as AI conversation context
# ai-train=no: opt out of model training crawls
Content-Signal: search=yes, ai-input=yes, ai-train=no

The Content-Signal directive is a new optional addition to robots.txt from contentsignals.org. It makes your training vs. search vs. citation stance explicit and machine-readable, so scanners and governance tools can report it accurately instead of guessing from your bot rules alone. Most sites skip it; adding it takes one line.
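
To confirm a robots.txt really separates training from search access, you can run it through a parser before deploying. A rough sketch using Python's standard urllib.robotparser; the sample file and the bot_access helper are illustrative assumptions, and the standard-library parser matches user agents by substring and takes the first matching group, so treat it as an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
               "Claude-User", "PerplexityBot", "Perplexity-User"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]


def bot_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each bot name to whether the given robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in SEARCH_BOTS + TRAINING_BOTS}


# Hypothetical file: stay in live ChatGPT answers, opt out of OpenAI training.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
"""

access = bot_access(SAMPLE_ROBOTS)
blocked_search_bots = [bot for bot in SEARCH_BOTS if not access[bot]]
print("GPTBot allowed:", access["GPTBot"])
print("Search bots blocked by mistake:", blocked_search_bots)
```

An empty "blocked by mistake" list is the goal: the training decision should never leak into the search and user bots.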

⚡ Copy-paste prompt (any AI)

Act as a technical SEO. For [URL], check and report:
1) Is the main content present in raw HTML, or only after
   JavaScript runs? (I'll paste the view-source if needed.)
2) Does robots.txt allow OAI-SearchBot, ChatGPT-User,
   Claude-SearchBot, PerplexityBot, Googlebot, Bingbot?
3) Are removed pages returning 410, and is the sitemap
   limited to live, indexable URLs?
Give me a prioritized fix list. Then list the exact steps to
add the site to Bing Webmaster Tools and turn on IndexNow.
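
For the IndexNow step, the ping itself is a small JSON POST. A sketch of building that request, assuming the shared api.indexnow.org endpoint and a key file hosted at your site root; the host, key, and URL below are placeholders, and the final network call is left commented out.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) an IndexNow submission for the given URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file sits at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_indexnow_request(
    "example.com", "hypothetical-key-123", ["https://example.com/new-post"]
)
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request, timeout=10)
```

Wire this into your publish hook so every new or updated URL is announced once, right after it goes live.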

2. AI lifts one passage, not your whole page

An engine does not cite a page; it lifts a passage. Google's passage ranking (2021) means one clean, self-contained block can win even when the rest of the page is mediocre.¹³ And when someone actually asks an AI about you, its live (user) bot does not crawl your whole site; it grabs the one page that answers. A 3-month log study found about 75% of Claude live-bot visits hit a single page.¹⁴ The page (and the passage) an AI picks to represent you is the whole game. Engineer for the lift:

Answer first. Lead each section with the direct answer in the first one or two sentences. Citation position is heavily front-loaded: Kevin Indig's study of 18,012 citations found 44.2% of LLM citations come from the first 30% of a page's text.⁵ Bury the conclusion and you lose.
Size passages to be quotable. The sweet spot is roughly 120 to 180 words per self-contained block; SE Ranking (Nov 2025) reported sections in that band earned about 70% more ChatGPT citations (vendor data, directional).⁷ Each block should still make sense if cut out of the page.
Use the structure AI parses. Question-style H2 and H3 headings, short paragraphs, ordered lists for steps, and comparison tables with real <thead> and <tbody>. Definitions in "X is..." or "X refers to..." form. A dedicated FAQ section.
Stay in the readable band. Aim for a Flesch reading-ease score around 60 to 75: fluent and authoritative, neither academic nor dumbed-down. (Vendor studies tie this band to higher citation rates; treat as directional, but it costs nothing.)
Drop a citation capsule per section. A 40 to 60 word standalone statement built as claim plus number plus source. It is a passage an AI can quote verbatim. (This very page is built that way; that is the honeypot.)
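
The sizing rules above are easy to check mechanically. A small sketch, assuming you have already split the article into a heading-to-body mapping; the helper names are hypothetical, the word count is a plain whitespace split, and the 120 to 180 band is the vendor figure quoted above, not a hard rule.

```python
import re


def size_report(sections: dict[str, str], low: int = 120, high: int = 180) -> dict[str, str]:
    """Flag sections whose word count falls outside the quotable band."""
    report = {}
    for heading, body in sections.items():
        n = len(body.split())
        if n < low:
            report[heading] = f"{n} words: too short"
        elif n > high:
            report[heading] = f"{n} words: too long"
        else:
            report[heading] = f"{n} words: ok"
    return report


def lead(body: str, n: int = 2) -> str:
    """Return the first n sentences, to eyeball whether the answer comes first."""
    sentences = re.split(r"(?<=[.!?])\s+", body.strip())
    return " ".join(sentences[:n])


# Dummy sections for illustration.
sections = {"What is GEO?": "word " * 150, "Pricing": "word " * 30}
print(size_report(sections))
print(lead("GEO is getting cited in AI answers. It differs from ranking. More detail follows."))
```

Pass low=40, high=60 to run the same check on citation capsules.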

⚡ Copy-paste prompt (any AI)

You are an AI-citation editor. Here is my article: [paste].
For each H2 section:
1) Rate 0-5: could an AI lift this as a standalone quote? Is
   the answer in the first 2 sentences, is the block 120-180
   words, is there a specific claim + number + source?
2) Rewrite the weakest 3 sections to be answer-first.
3) Write one 40-60 word "citation capsule" per section:
   claim + data point + source, quotable verbatim.
Flag any section that reads as vague or opinion-only.

The answer-first passage template

Each citable passage should survive being cut out of the page: direct answer in sentence one, then the specific fact or number, then provenance, then a link. Generic shape:

"[Entity] is [direct answer]. [The specific number or fact, in bold], according to [named source, dated, linked]. [One sentence of context that makes the number meaningful]." That is entity plus bolded fact plus provenance plus link: liftable, attributable, and hard to paraphrase away.

3. Say something new, or pay the paraphrase tax

If your page only restates the top ten, the engine has no reason to lift you over whoever said it first. Call it the paraphrase tax. Google's information-gain patent (US11354342B2) literally scores a document by the additional information it adds beyond what the user has already seen, explicitly in an assistant context.¹² Important caveat: this is a patent filing, not a confirmed ranking factor, so treat it as a documented idea rather than a proven mechanism. But it is the cleanest public description of "novelty relative to the existing web." The move:

Know the consensus so you can add to it. Read the top ten.
Map the negative space. Which attributes of the topic does the whole SERP skip?
Fill it with what AI cannot synthesize: first-party data, original tests or surveys, named-expert experience, contrarian but defensible numbers. This is the only step that creates genuine novelty. It is also why human-signal sources get cited so heavily: Reddit's organic visibility jumped about 1,328% from 2023 to 2024 (Amsive)¹⁰, and Google licensed Reddit content at a reported ~$60M per year (Reuters, 2024).⁹

The peer-reviewed backing matters here. The Princeton GEO paper (KDD 2024) tested content tweaks across thousands of queries and found that adding citations, quotations, and statistics each measurably lifted AI visibility (quotations and statistics in the range of roughly 30 to 41% in its tests), while keyword stuffing performed worse than baseline.³ Substance wins; stuffing loses.

⚡ Copy-paste prompt (any AI)

Topic: [your topic]. Pretend you've read the top 10 Google
results and summarize the consensus they all share. Then find
the INFORMATION GAP: what does every one of them omit, gloss
over, or stay vague about? List 8 specific "empty cells"
(facts, numbers, comparisons, or first-hand angles nobody
covers). For each, tell me what ORIGINAL data or experience
I'd need to fill it so an AI cites ME instead of the pack.

4. Be one clear entity, not a bag of keywords

AI answers are built on entities, not just keywords. Every page should unambiguously represent one canonical entity, named the same way throughout. What makes you eligible is E-E-A-T (Experience, Expertise, Authoritativeness, Trust), which Google calls a rater framework, not a score, but its signals track with what survives competitive queries: named author with real experience, methodology, outbound citations, corroboration, and dates.

One entity per page, with a clear "X is..." statement in the intro and a title that matches the content.
Consistency across the web: the exact same brand or person name everywhere; link your profiles with sameAs (a structured-data list of your official profile URLs, your LinkedIn, GitHub, Wikipedia, and so on, that tells engines "these are all the same entity"); aim for a Wikidata Q-ID (a unique ID for your entity in Wikidata, Wikipedia's machine-readable database) and, where genuinely warranted, a Wikipedia presence. Those act as a credibility tiebreaker when sources conflict.
Named author with credentials plus Person schema. Anonymous, undated, source-free content is the weakest possible profile.
One identity, not three claims. If you run more than one site, consolidate them under a single entity: give your person a stable @id on your main domain, then reference that same @id from every other property (a tool's Organization.founder, a project's author) so a crawler follows the graph back to one node. Name it identically everywhere; "Josh" and "Joshua" read as two different people.
Reciprocity is the step everyone skips. A sameAs or rel="me" link is only a claim: "this profile is me." It becomes a confirmed entity when the profile links back. Put your site URL in the bio of every profile you list (X, LinkedIn, GitHub, Reddit). One-way is an assertion; two-way is verification, and engines weigh them very differently.
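
Reciprocity can be spot-checked from a saved copy of each profile page. A minimal sketch, assuming you pass in the profile's HTML as a string; links_back and the sample markup are hypothetical, and platforms that wrap outbound links in a redirector will need the final URL resolved first.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_back(profile_html: str, site_host: str) -> bool:
    """True if the profile page contains a link to site_host (ignoring www.)."""
    collector = LinkCollector()
    collector.feed(profile_html)
    hosts = {urlparse(h).netloc.lower().removeprefix("www.") for h in collector.hrefs}
    return site_host.lower().removeprefix("www.") in hosts


# Hypothetical profile snippets.
two_way = '<p>Bio</p><a rel="me" href="https://www.yoursite.com/">My site</a>'
one_way = '<p>Bio</p><a href="https://other.example/">Somewhere else</a>'
print(links_back(two_way, "yoursite.com"), links_back(one_way, "yoursite.com"))
```

Run it over every URL in your sameAs list; each one that comes back false is still only a one-way claim.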

This is the two-stage model in practice: eligibility (trust and E-E-A-T) gets you into the candidate set; selection (novelty plus extractability, steps 2 and 3) gets you lifted into the answer. Classic SEO over-optimizes the first stage and ignores the second.

This is how a nobody bootstraps authority. The classic trust signal is a backlink from a high-authority site (high DA or DR), and you probably cannot get one on demand. But linking up and cross-confirming your own sites and profiles is entity authority you fully control: it costs nothing, needs no gatekeeper, and builds a consistent, machine-verifiable identity that compounds across every property you own. With no high-DA friends, you become your own corroboration network, and a project you founded (a tool, a brand) inherits your credibility while feeding its own back to you.

⚡ Copy-paste prompt (any AI)

Audit [URL or pasted page] for entity clarity and E-E-A-T:
1) Is there ONE unambiguous primary entity? Is it named
   consistently? Is there a clear "X is..." intro line?
2) Score Experience, Expertise, Authority, Trust 0-5 each,
   citing the specific evidence (or its absence) for each.
3) List the missing trust signals (author credentials, dates,
   methodology, outbound citations, sameAs profile links).
4) Draft a 4-line author bio and a sameAs list I should add.
5) Are my sameAs links reciprocal (does each profile link
   back to me), and do all my sites resolve to one shared
   entity @id?

Implement it: the entity stack (copy-paste)

The principle becomes real with three moves: give yourself one canonical entity node with a stable @id, point everything that mentions you at that same @id, and list your profiles in sameAs. Here is the exact stack I run on my own home page, generalised so you can lift it. Drop it in a <script type="application/ld+json"> tag in your site's <head>:

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#person",
      "name": "Your Name",
      "url": "https://yoursite.com/",
      "jobTitle": "What you actually do",
      "description": "One plain line: who you are and what you are known for.",
      "knowsAbout": ["Topic A", "Topic B", "Topic C"],
      "sameAs": [
        "https://github.com/you",
        "https://x.com/you",
        "https://www.linkedin.com/in/you/",
        "https://en.wikipedia.org/wiki/You"
      ]
    },
    {
      "@type": "WebPage",
      "@id": "https://yoursite.com/#webpage",
      "url": "https://yoursite.com/",
      "about":      { "@id": "https://yoursite.com/#person" },
      "mainEntity": { "@id": "https://yoursite.com/#person" }
    }
  ]
}

Then on every article, set "author" and "publisher" to { "@id": "https://yoursite.com/#person" } instead of retyping your name. Three rules make this stack actually work:

One @id, referenced everywhere. The WebPage about and mainEntity, plus every article's author and publisher, all point at the same #person node. That cross-referencing is the signal that collapses scattered mentions into a single entity instead of a bag of keywords.
sameAs must be reciprocal. Each profile you list should link back to your site (put rel="me" on those outbound links). A one-way claim is weak; a verifiable two-way one is what engines and knowledge graphs trust.
One entity across all your sites. If you run more than one site, do not mint a fresh identity on each. Point them all at one shared sameAs set and the same canonical person, so every property reinforces a single entity instead of competing ones.
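
The per-article half of the stack can be generated rather than hand-typed. A sketch under stated assumptions: the PERSON_ID placeholder matches the example node above, the "#article" fragment and both helper names are my own conventions, and the output is one plausible Article shape rather than a required one.

```python
import json

PERSON_ID = "https://yoursite.com/#person"  # placeholder @id from the stack above


def article_jsonld(url: str, headline: str, date_published: str, date_modified: str) -> dict:
    """Build Article markup whose author and publisher point at the shared @id."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "@id": f"{url}#article",  # assumed naming convention, not a standard
        "url": url,
        "headline": headline,
        "datePublished": date_published,
        "dateModified": date_modified,
        "author": {"@id": PERSON_ID},     # reference, not a retyped name
        "publisher": {"@id": PERSON_ID},
    }


def as_script_tag(data: dict) -> str:
    """Wrap the markup in the script tag that goes in the page <head>."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


article = article_jsonld(
    "https://yoursite.com/blog/example-post/", "Example post", "2025-01-10", "2025-02-01"
)
print(as_script_tag(article))
```

Because every article references the same node, renaming yourself or adding a profile happens in one place.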

The strongest external accelerant on top of this is a Wikidata item, and Wikipedia if you genuinely qualify: both feed the knowledge graphs AI engines lean on to disambiguate who you are. Add the Wikidata URL to your sameAs the moment it exists.

5. Most AI citations are won off your own site

This is the big one, and the data is the most lopsided on the page. By multiple 2025 studies, the large majority of AI citations are driven by off-site signals, not on-page tweaks. Ahrefs' study of 75,000 brands found brand mentions correlate about 3 times more strongly with AI visibility than backlinks (web mentions ~0.66 vs backlinks ~0.22),² and AirOps found brands are roughly 6.5 times more likely to be cited through third-party sources than their own domain.¹¹ All vendor studies, named and dated, so weight accordingly.

The correlation ranking from the Ahrefs work:

Signal	Correlation with AI visibility²
YouTube mentions	~0.737 (strongest)
Branded web mentions	~0.66
Domain Rating	~0.27 to 0.33
Backlinks	~0.218 (far weaker than the folklore implies)

So the budget split most companies get backwards: aim for roughly 40% owned content and 60% earned media, not 90/10. Where to earn it:

The sources each engine already pulls. Get mentioned in the roundups, listicles, directories, and forum threads the AI is already citing for your topic. Earned third-party corroboration is what gets you picked; on-page work only gets you retrieved.
YouTube: disproportionately cited, how-to and demo videos especially. Put keywords in titles and transcripts, keep transcripts public, structure as question and answer.
Reddit and community: authentic participation in a few relevant subreddits, not drive-by promotion. Perplexity leans heavily on Reddit.
Review platforms (B2B), Wikipedia, and Wikidata: multi-platform presence multiplies citations versus a single source.
Backlinks still matter for eligibility, but treat them as hygiene: keep anchor text natural (branded 30 to 50%, exact-match under about 10 to 15%) and avoid toxic or PBN links. Do not expect raw link count to move AI citation much.

⚡ Copy-paste prompt (any AI)

For the query "[your target question]", list the specific
third-party sources an AI assistant is most likely already
citing: roundup articles, directories, subreddits, YouTube
channels, review sites, Wikipedia entries. Rank them by how
many of my target queries each would cover. For the top 5,
give me a concrete, non-spammy way to earn a mention on each.
Then audit my anchor-text mix for over-optimization.

6. AI splits one question into dozens: cover them all

Google's AI Mode (I/O 2025) uses query fan-out: it shatters one question into many sub-queries and fires them at once. You compete across dozens of sub-SERPs you never targeted, which is why one-page optimization quietly fails, and why only about 12% of AI-cited URLs rank top-10 for the original prompt.¹ The answer is coverage and structure:

Topic clusters (hub and spoke). A pillar page plus spokes that each own a sub-question, interlinked so every spoke points to the pillar and back. Group keywords by actual SERP overlap (do two queries share top-10 results?), not by how similar the words look. Seven to ten shared results means merge into one page; four to six means same cluster; two to three means interlink across adjacent clusters. No two pages target the same primary keyword, because cannibalization kills both.
Match the page TYPE the SERP rewards. A blog post will never outrank a SERP that shows eight product pages, no matter how well-optimized, because it is the wrong type. Read the SERP backwards: classify the top ten by page type, check for a dominant type (over 60% is strong consensus), and if your page type does not match, restructure it before polishing it.
Mine the real sub-questions from People Also Ask, related searches, and how people actually phrase things to an assistant, then give each a self-contained, answer-first passage (step 2).
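
The overlap thresholds and the page-type check above translate directly into code. A sketch, assuming you have exported the top-10 result URLs per query and labeled each result's page type yourself; the function names are hypothetical, and the "separate topics" outcome for zero or one shared result is my assumption, since the rule above stops at two.

```python
from collections import Counter


def overlap_action(top10_a: list[str], top10_b: list[str]) -> str:
    """Decide how two queries relate from how many top-10 URLs they share."""
    shared = len(set(top10_a) & set(top10_b))
    if shared >= 7:
        return "merge into one page"
    if shared >= 4:
        return "same cluster"
    if shared >= 2:
        return "interlink across adjacent clusters"
    return "separate topics"  # assumption: not covered by the rule above


def dominant_page_type(page_types: list[str], threshold: float = 0.6):
    """Return the page type held by more than threshold of results, else None."""
    if not page_types:
        return None
    kind, count = Counter(page_types).most_common(1)[0]
    return kind if count / len(page_types) > threshold else None


# Made-up SERPs for illustration.
serp_a = [f"https://site{i}.example/" for i in range(10)]
serp_b = serp_a[:8] + ["https://x.example/", "https://y.example/"]
print(overlap_action(serp_a, serp_b))
print(dominant_page_type(["product"] * 8 + ["guide"] * 2))
```

Run overlap_action across every pair of target queries before you plan pages, so merges and clusters fall out of the data instead of keyword similarity.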

⚡ Copy-paste prompt (any AI)

(A) CLUSTER: Seed = "[keyword]". Expand into 30-40 real
search variants (use People Also Ask + related searches).
Group them into a hub-and-spoke plan: one pillar + 3-5
clusters of spokes, grouped by shared search intent. Flag any
two that should merge (same intent) and design the internal
links (every spoke links to the pillar and back).

(B) PAGE TYPE: For "[keyword]", look at the top 10 results and
tell me the dominant PAGE TYPE (guide / product / listicle /
comparison / tool). Does my page ([type]) match? If not, what
should I rebuild it into, and why?

7. Spell out the facts so engines stop guessing

Structured data is a small block of machine-readable facts you add to a page, almost always as JSON-LD (a snippet of code that sits in the page's <head> and states plainly what the page is: who wrote it, when it was updated, the question-answer pairs, the breadcrumb trail). Humans never see it; engines read it to remove the guesswork. It does not force a citation, but it removes ambiguity about entities, question-answer pairs, and dates, and all four major engines process it during citation selection. Use the high-value types and skip the dead ones:

Use: Article or BlogPosting, FAQPage, BreadcrumbList, ItemList (for lists and roundups), Person and Organization (with sameAs).
Use HowTo for step-by-step content: Google deprecated HowTo rich results (the visual accordions in SERPs) in Sept 2023, but the markup is still valid Schema.org vocabulary and can remain useful for AI/LLM comprehension. An LLM that parses HowTo schema gets the exact steps in order, which plausibly helps on instructional pages; treat that as directional rather than a proven citation factor. Use it on any page that genuinely walks users through a process. Note that FAQPage rich results have been restricted to government and health sites in Google's display since Aug 2023, but the markup still helps AI parse your question-answer pairs, so keep it for citation even though it will not show as a Google rich snippet.
Keep schema in the static HTML, not JavaScript-injected (see step 1), and keep dateModified honest.
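
Whether your schema really ships in the static HTML is checkable from the raw source. A rough sketch, assuming raw_html is what the server returns before any script runs; the class and function names are hypothetical, and it only confirms that the JSON parses and lists the declared @type values, not that the markup is valid Schema.org.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def static_schema_types(raw_html: str) -> list[str]:
    """List every @type declared in JSON-LD found in the raw HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    types = []
    for block in extractor.blocks:
        data = json.loads(block)  # raises ValueError if the JSON-LD is malformed
        nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
        for node in nodes:
            declared = node.get("@type") if isinstance(node, dict) else None
            if declared is not None:
                types.extend(declared if isinstance(declared, list) else [declared])
    return types


# Hypothetical page source.
page = ('<head><script type="application/ld+json">{"@context":"https://schema.org",'
        '"@graph":[{"@type":"Article"},{"@type":"FAQPage"}]}</script></head>')
print(static_schema_types(page))
```

An empty list on a page you know has schema usually means the markup is being injected by JavaScript.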

⚡ Copy-paste prompt (any AI)

Generate valid JSON-LD for this page: [paste title, author,
dates, FAQ pairs, breadcrumb path]. Include Article, Person,
Organization (with sameAs), BreadcrumbList, and FAQPage. Do
HowTo only if the page has genuine step-by-step instructions
(schema still valuable for AI/LLM comprehension, just no SERP
visual). Output one combined @graph block I can paste into the
<head>.

Validate free with Google's Rich Results Test or the Schema.org validator.

8. Local: fundamentals win, not prompt tricks

Local GEO runs on disciplined fundamentals, not clever prompt tricks. AI engines pull local answers from different sources: Google AI Overviews draws from Google Business Profile and the local pack, while ChatGPT and Microsoft Copilot reach local businesses through Bing Places and third-party directories like Yelp. Perplexity leans on Yelp, general web mentions, and its own crawl. The most common silent failure is NAP inconsistency: a business name, address, or phone number stated even slightly differently across a website, GBP listing, Yelp, and local directories confuses the entity-resolution step and suppresses AI citation. A business can have technically solid on-page structure and still be invisible in AI answers if its GBP listing is thin, its reviews are sparse, or its NAP string varies by a comma. Fix the basics first, then optimize for citability. The signals:

Google Business Profile fully completed and active; NAP (name, address, phone) identical across your site, your profile, and every directory. Inconsistency is the number-one silent killer.
Citations and reviews on the platforms that matter for your industry; LocalBusiness schema (the right subtype) on location pages.
Answer-first service-by-location pages (step 2) for the questions people actually ask an assistant, plus the earned-corroboration play (step 5): get onto the local listicles and directories the AI already cites.
Track rank by geo-grid (rankings vary block to block), not a single position.
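
NAP drift is the kind of thing a script catches faster than an eyeball. A minimal sketch, assuming you have copied the name, address, and phone from each listing by hand; the business data is invented, and the normalization rules (lowercase, strip punctuation, digits-only phone) are my own simplification.

```python
import re


def normalize_nap(name: str, address: str, phone: str) -> tuple[str, str, str]:
    """Reduce a NAP triple to a form that ignores punctuation, case, and spacing."""
    def squash(text: str) -> str:
        return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())
    return squash(name), squash(address), re.sub(r"\D", "", phone)


def nap_report(listings: dict[str, tuple[str, str, str]], reference: str = "website") -> dict[str, str]:
    """Compare every listing against the reference listing."""
    ref_raw = listings[reference]
    ref_norm = normalize_nap(*ref_raw)
    report = {}
    for source, raw in listings.items():
        if source == reference:
            continue
        if raw == ref_raw:
            report[source] = "exact match"
        elif normalize_nap(*raw) == ref_norm:
            report[source] = "cosmetic difference"  # still worth fixing
        else:
            report[source] = "conflict"
    return report


# Invented listings for illustration.
listings = {
    "website": ("Acme Plumbing", "12 Main St, Springfield", "(555) 010-2000"),
    "gbp": ("Acme Plumbing", "12 Main St, Springfield", "(555) 010-2000"),
    "yelp": ("Acme Plumbing", "12 Main St Springfield", "555-010-2000"),
    "directory": ("Acme Plumbing LLC", "12 Main Street, Springfield", "555 010 2000"),
}
print(nap_report(listings))
```

Fix conflicts first, then cosmetic differences, until every source reports an exact match.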

⚡ Copy-paste prompt (any AI)

I run a [business type] in [city]. Build my local AEO plan:
1) A Google Business Profile completeness checklist for my
   category.
2) The exact NAP string to use everywhere, and the top
   directories and citations to claim for my industry + city.
3) The 10 "near me" / "best [x] in [city]" questions to build
   answer-first pages for.
4) Which local listicles or directories an AI likely cites
   for these, and how to get listed.

9. Rankings lie about citations: measure per engine

Rankings lie about citations. Track the thing you actually want:

Seed prompts, not just keywords. Build a list of the real questions you want to own (start from your Search Console queries, then add how people phrase them to an AI). Run them in ChatGPT, Claude, Perplexity, and AI Overviews and record who gets cited. That list is your roadmap of what to create or earn.
Track citations per engine separately. Only about 11% of domains are cited by both ChatGPT and Perplexity, so a win on one is not a win on all.⁸
Watch AI-crawler hits in your server logs or CDN analytics (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and the rest) as a leading indicator. Crawl precedes citation. Their behavior varies wildly, which is exactly why you watch your own logs instead of trusting generic advice: log studies show Googlebot re-checks robots.txt thousands of times while GPTBot almost never does, and ByteDance's Bytespider can out-crawl Google and every OpenAI bot combined.¹⁴
Baseline your on-page SEO elements and diff them after each deploy so a well-meaning change does not silently break title tags, canonicals, or schema.
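
Counting AI-crawler hits does not need a paid tool. A sketch that scans access-log lines for the bot names used in this manual; the sample lines are fabricated, matching is a plain substring test on the whole line, and user-agent strings can be spoofed, so treat the counts as a trend rather than proof.

```python
from collections import Counter

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "Claude-SearchBot", "Claude-User", "PerplexityBot",
           "Perplexity-User", "Bytespider"]


def count_ai_bot_hits(log_lines) -> Counter:
    """Count log lines per AI bot, attributing each line to at most one bot."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break
    return hits


# Fabricated log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.6 - - [10/Oct/2025:13:56:02 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [10/Oct/2025:14:01:10 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [10/Oct/2025:14:05:44 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]
hits = count_ai_bot_hits(SAMPLE_LOG)
print(hits.most_common())
```

Run it weekly over the same log window and watch the per-bot trend; a page that starts getting search-bot or user-bot hits is a candidate for showing up in answers next.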

⚡ Copy-paste prompt (any AI)

Build me a 20-prompt "citation tracker": the real questions my
audience asks about [topic]. For each, I'll run it in ChatGPT,
Perplexity, and Gemini and paste back which domains got cited.
Then tell me which sources I need to get mentioned on, and
which of my pages should be the one that gets cited.
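
Once you have pasted results back, a few lines of code keep the per-engine picture honest. This is a rough sketch: the tuple layout, function names, and sample rows are assumptions for illustration, and the overlap formula (shared domains divided by all domains either engine cited) is one reasonable definition, not necessarily the one behind the 11% figure:

```python
from collections import defaultdict

def citations_by_engine(observations):
    """Group cited domains per engine.

    observations: iterable of (engine, prompt, cited_domain) tuples,
    i.e. what you record by hand after running each seed prompt.
    """
    by_engine = defaultdict(set)
    for engine, _prompt, domain in observations:
        by_engine[engine].add(domain.lower())
    return by_engine

def overlap_share(by_engine, engine_a, engine_b):
    """Share of domains cited by either engine that both engines cite."""
    a, b = by_engine[engine_a], by_engine[engine_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder rows, not real results.
log = [
    ("chatgpt", "best crm for startups", "example.com"),
    ("chatgpt", "best crm for startups", "reviews.example.org"),
    ("perplexity", "best crm for startups", "example.com"),
    ("perplexity", "best crm for startups", "forum.example.net"),
]
engines = citations_by_engine(log)
overlap = overlap_share(engines, "chatgpt", "perplexity")
print(f"{overlap:.0%} of cited domains appear in both engines")
```

A low overlap is normal. It tells you each engine needs its own list of target sources.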

Free tools: Search Console (queries and indexation), your raw access logs or CDN analytics for AI-bot hits.
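
The deploy diff from the list above can also be done with the standard library. A minimal sketch, assuming you save each page's HTML before and after a deploy. The class and function names are hypothetical, and it only tracks three elements (title, canonical, number of JSON-LD blocks):

```python
from html.parser import HTMLParser

class SeoSnapshot(HTMLParser):
    """Collect the title, canonical URL, and count of JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.elements = {"title": "", "canonical": None, "jsonld_blocks": 0}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.elements["canonical"] = attrs.get("href")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self.elements["jsonld_blocks"] += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.elements["title"] += data.strip()

def snapshot(html):
    parser = SeoSnapshot()
    parser.feed(html)
    parser.close()
    return parser.elements

def diff(before, after):
    """Return {element: (old, new)} for every element that changed."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}

# Made-up before/after HTML for one page.
old = ('<head><title>Pricing</title>'
       '<link rel="canonical" href="https://example.com/pricing">'
       '<script type="application/ld+json">{}</script></head>')
new = ('<head><title>Pricing</title>'
       '<link rel="canonical" href="https://example.com/"></head>')
changes = diff(snapshot(old), snapshot(new))
print(changes)
```

In this made-up case the deploy moved the canonical to the homepage and dropped the schema block, which is exactly the kind of silent break the baseline is meant to catch.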

10. Freshness is a tiebreaker, and it is engine-specific

Freshness matters, but "just keep updating" is only half the picture. Recency is a tiebreaker, not a primary signal, and engines weight it very differently: retrieval-oriented engines like Perplexity and ChatGPT favor recent content on time-sensitive topics, while Google AI Overviews generally skew toward established, authoritative pages regardless of publish date. A fake update (changing the date without changing the content) can hurt if an engine's quality signals catch stale substance under a fresh timestamp. So separate time-sensitive content from evergreen reference content and refresh each on its own cycle, with real changes and an honest dateModified.

ChatGPT skews newest: about 76% of its most-cited pages were updated within 30 days (Ahrefs, across roughly 17 million citations).¹
Perplexity decays fastest: citation relevance can start dropping 2 to 3 days after publication; it is the most recency-hungry engine.
Google AI Overviews skew older and more established.

Practical rule: refresh genuinely time-sensitive content on a roughly 30-day cycle with real changes (and an honest dateModified); do not fake-update evergreen reference pages just to chase a date.
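
The practical rule can be turned into two small checks: which time-sensitive pages are past the cycle, and which pages had their date bumped with no content change. This is a sketch under assumptions: the page records, the `time_sensitive` flag, and the content hashes are hypothetical fields you would maintain yourself (for example, a hash of the main article text):

```python
from datetime import date, timedelta

def refresh_queue(pages, today, max_age_days=30):
    """URLs of time-sensitive pages whose last real update is too old."""
    cutoff = today - timedelta(days=max_age_days)
    return [p["url"] for p in pages
            if p["time_sensitive"] and p["date_modified"] < cutoff]

def fake_updates(previous, current):
    """URLs whose dateModified moved forward while the content hash did not.

    previous/current: {url: (date_modified, content_hash)} from two crawls.
    """
    return [url for url, (modified, content_hash) in current.items()
            if url in previous
            and modified > previous[url][0]
            and content_hash == previous[url][1]]

# Hypothetical inventory, not real data.
pages = [
    {"url": "/pricing-2026", "time_sensitive": True, "date_modified": date(2026, 1, 2)},
    {"url": "/glossary", "time_sensitive": False, "date_modified": date(2024, 6, 1)},
    {"url": "/news", "time_sensitive": True, "date_modified": date(2026, 2, 20)},
]
stale = refresh_queue(pages, today=date(2026, 3, 1))

before = {"/glossary": (date(2024, 6, 1), "abc123")}
after = {"/glossary": (date(2026, 3, 1), "abc123")}
suspicious = fake_updates(before, after)
print(stale, suspicious)
```

The evergreen glossary never enters the refresh queue, but it is flagged when its date changes and its content does not.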

The whole playbook on one screen

#	Step	Do this	Leading indicator
1	Get retrievable	In all four indexes; no JS wall; fast TTFB; 410 + clean sitemap	AI-bot hits in logs
2	Liftable passages	Answer-first, 120-180 word blocks, capsules, real tables	Sections quoted verbatim
3	Information gain	First-party data and experience the top 10 lack	Cited for sub-queries you own
4	Entity and E-E-A-T	One entity, named author, sameAs, dates, methodology	Brand recognized as a source
5	Earn off-site	40/60 owned/earned; YouTube, Reddit, directories, Wikipedia	Third-party mentions growing
6	Cover the fan-out	SERP-overlap clusters; match the page type the SERP rewards	Coverage across sub-SERPs
7	Structured data	Article, FAQPage, BreadcrumbList, ItemList, Person, HowTo (for step-by-step pages)	Valid in Rich Results Test
8	Local	GBP complete, identical NAP, local schema, geo-grid	Map-pack and "near me" cites
9	Measure per engine	Seed prompts, per-engine citation tracking, deploy diffs	Citation count per engine
10	Freshness	30-day cycle for time-sensitive; honest dateModified	Recent content cited first

How do you run the full GEO playbook with one prompt?

Want one block to paste into any assistant to run the whole playbook on a page? Start here and let it work down the list. It condenses the ten steps above into one eight-item prompt: items 1-7 line up with steps 1-7, and item 8 covers measurement (step 9). Local (step 8) and freshness (step 10) are left out. If a term is unfamiliar, scroll back to the matching step for the plain-English version:

You are my GEO / AI-citation engineer. Goal: get [URL or topic]
cited by ChatGPT, Claude, Perplexity, and Google AI Overviews.
Work through these steps and give me a prioritized action list
(Critical / High / Medium) with a concrete fix for each:

1. ACCESS: Is content in raw HTML (no JS wall)? Does robots.txt
   allow the search and user bots? Is the site in Bing Webmaster
   + IndexNow? 410s for dead pages, clean sitemap, llms.txt?
2. CITABILITY: Is each section answer-first, 120-180 words,
   self-contained, with a claim + number + source? Rewrite the 3
   weakest and add one 40-60 word citation capsule each.
3. INFORMATION GAIN: What does the top-10 consensus omit? List 8
   empty cells and the original data needed to fill them.
4. ENTITY + E-E-A-T: One clear entity, named identically across
   all your sites, with one stable @id? Author credentials,
   dates, methodology, and reciprocal sameAs (profiles link
   back, not just out)? Score each E-E-A-T dimension 0-5.
5. OFF-SITE: Which third-party sources (YouTube, Reddit,
   directories, listicles, Wikipedia) does an AI already cite for
   my queries? Give me a non-spammy plan to earn mentions.
6. FAN-OUT: A hub-and-spoke cluster plan grouped by search
   intent; does my page type match what the SERP rewards?
7. SCHEMA: Generate Article + Person + FAQPage + BreadcrumbList
   JSON-LD. Add HowTo if the page has genuine step-by-step instructions.
8. MEASURE: 20 seed prompts to track citations per engine.

Be specific and falsifiable. Every recommendation needs a "how
would I know this worked?" check. Flag any claim you're unsure
about instead of inventing a number.

Frequently asked questions

Do I need to pay for any tool to do this?

No. Every step here has a copy-paste prompt you can run in free ChatGPT, Gemini, or Claude, plus free tools (Search Console, Bing Webmaster Tools, PageSpeed Insights, your own logs). Implement whatever pieces you want for nothing. The hard part is not the tooling; it is doing the work consistently and having something original to say.

I rank #1 on Google but ChatGPT never mentions me. Why?

ChatGPT retrieves from Bing's index, not Google's. If you are not in Bing, it cannot see or cite you regardless of your Google rank. Get into Bing Webmaster Tools (import from Search Console) and turn on IndexNow. Bingbot does render JavaScript (unlike GPTBot or ClaudeBot), so a JS-heavy site is not an automatic disqualifier for Bing specifically. Server-rendered HTML is still faster to crawl and safer across all AI systems.
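
If your CMS has no IndexNow plugin, the submission is a single JSON POST. A minimal sketch that builds the request without sending it. The host, key, and URL are placeholders, and it assumes your key file sits at the site root as `<key>.txt`:

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

request = build_indexnow_request(
    "www.example.com",      # placeholder host
    "0123456789abcdef",     # placeholder key; generate your own
    ["https://www.example.com/new-guide"],
)
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request)
```

Submitting tells participating engines that a URL changed. It does not guarantee indexing, so still confirm coverage in Bing Webmaster Tools.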

What is the single highest-impact thing most people skip?

Off-site signals (step 5). The data is lopsided: brand mentions correlate about 3 times more strongly with AI visibility than backlinks, and brands are roughly 6.5 times more likely to be cited via third parties than their own site. Most people spend about 90% of effort on-page. Flip toward roughly 40% owned and 60% earned.

If I block GPTBot, do I disappear from ChatGPT?

No. That only opts you out of model training. Live answers use OAI-SearchBot and ChatGPT-User; keep those allowed and you stay citable. Block the search and user bots and you do disappear. Training and search access are independent decisions.
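
You can sanity-check that split before you deploy it. A minimal sketch using Python's built-in robots.txt parser. The rules below are one way to express "no training, yes search" and the test URL is a placeholder. Real crawlers may interpret edge cases differently from this parser:

```python
from urllib.robotparser import RobotFileParser

# Opt out of training (GPTBot) while staying open to search and user fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

def allowed_bots(robots_txt, bots, url="https://example.com/any-page"):
    """Return {bot: True/False} for whether each bot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}

result = allowed_bots(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"])
print(result)
```

If GPTBot shows as allowed or either of the other two shows as blocked, the file does not say what you think it says.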

Can I just have AI write the content?

For citation, no. AI drafts rearrange existing consensus, which is the paraphrase tax. Citation rewards information gain: first-party data, original numbers, named experience. Use AI to structure and scale content built on something only you have.

Does schema or structured data get me cited?

It helps eligibility and disambiguation (entities, question-answer pairs, dates), and all four major engines process it during selection, but it is not deterministic. FAQPage, Article, and HowTo (for step-by-step content) are the high-value types. Note that Google deprecated HowTo rich results in September 2023, so the markup no longer produces an enhanced listing in Google's results. The schema itself is still valid and remains useful for AI/LLM comprehension of instructional content.
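
For reference, this is roughly what FAQPage markup looks like when generated from question-answer pairs. A minimal sketch: the function name is hypothetical and the sample pair is taken from this FAQ. The property names (`mainEntity`, `acceptedAnswer`, `text`) are standard schema.org vocabulary:

```python
import json

def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

markup = faq_jsonld([
    ("If I block GPTBot, do I disappear from ChatGPT?",
     "No. Blocking GPTBot only opts you out of model training."),
])
print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```

Keep the marked-up answers identical to the visible text on the page, then validate the output before shipping.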

Is llms.txt worth doing?

Yes, but for a different reason than most people assume. There is no strong evidence that llms.txt moves citations directly. What changed: Google PageSpeed Insights now validates it under a formal "Agentic Browsing" audit. The requirement is specific: a Markdown file with at least one H1 header (a line starting with # ). A missing or malformed file is now a flagged failure in PSI, the same tool your team uses to track Core Web Vitals. That alone justifies adding it. Do not prioritize it over access, citability, or off-site signals, but it is a ten-minute fix that clears a formal audit check.
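
The H1 requirement as described here is easy to check locally. A minimal sketch that tests only that one rule and makes no attempt to reproduce the PageSpeed Insights audit itself. Requiring some text after `# ` is my own assumption, and the sample file contents are placeholders:

```python
def has_h1(llms_txt):
    """True if any line is a Markdown H1: '# ' followed by some text."""
    return any(line.startswith("# ") and bool(line[2:].strip())
               for line in llms_txt.splitlines())

# Placeholder llms.txt content.
sample_llms_txt = """\
# Example Co

> One-sentence summary of what the site covers.

## Guides
- [GEO field manual](https://example.com/geo): the ten steps
"""

print(has_h1(sample_llms_txt))
print(has_h1("## Only an H2\nSome text"))
```

Note that `##` headings and `#Title` without a space do not count as an H1 under this check.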

How long until I show up in AI answers?

The gating steps are index coverage (Bing for ChatGPT, Brave for Claude) and crawl access. Fix those and you remove the longest delays. Watch AI-crawler hits in your logs as the leading indicator; citations follow. Freshness-hungry engines (Perplexity, ChatGPT) can pick up new content within days; established-authority surfaces (AI Overviews) are slower.

What tools do I actually need?

None to start. Search Console plus Bing Webmaster Tools cover index coverage; your server logs cover crawler tracking; any free chatbot runs the prompts here. Paid suites and automation add speed, not access to the steps themselves. (When the hand-rolling starts to hurt and you want this running on autopilot, that is the part I do for people.)

Caveats (read these)

The index map shifts as partnerships change. Re-verify it quarterly.
Most citation statistics here are vendor studies, not peer-reviewed. The Princeton GEO paper (KDD 2024) is the main peer-reviewed exception. Everything is named and dated; weight accordingly. Numbers I could not trace to a primary source are labelled "directional," not asserted as fact.
E-E-A-T is a rater framework, not a ranking score. Information gain is a concept described in a Google patent, not a confirmed ranking factor.

Sources and references

1. Ahrefs, "Only 12% of AI-Cited URLs Rank in Google's Top 10 for the Original Prompt" (Aug 2025), incl. the 80%-not-in-top-100 and ~17M-citation freshness data. ahrefs.com/blog/ai-search-overlap; ahrefs.com/blog/llm-search.
2. Ahrefs, brand-visibility correlation studies across 75,000 brands (Aug and Dec 2025): brand web mentions ~0.66 vs backlinks ~0.218; YouTube mentions ~0.737. ai-overview-brand-correlation; ai-brand-visibility-correlations.
3. Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande, "GEO: Generative Engine Optimization," KDD 2024. arXiv:2311.09735; ACM DOI 10.1145/3637528.3671900.
4. Vercel, "The Rise of the AI Crawler" (analysis of 500M+ GPTBot fetches; no JS execution observed). vercel.com/blog/the-rise-of-the-ai-crawler.
5. Kevin Indig, Growth Memo (Feb 2026): 44.2% of citations from the first 30% of text, across 18,012 citations. growth-memo.com; Search Engine Land summary.
6. Profound research, via RivalHound and Search Engine Land: Claude's citations overlap Brave Search's top organic results ~86.7% of the time; Brave's index is ~30B pages. Search Engine Land; Brave Search API.
7. SE Ranking (Nov 2025): sections of roughly 120 to 180 words earned about 70% more ChatGPT citations (vendor data, directional).
8. The Digital Bloom, 2025 AI Visibility Report: ~11% of domains cited by both ChatGPT and Perplexity (domain-level). thedigitalbloom.com.
9. Reuters, via Search Engine Land (Feb 2024): Google's Reddit content-licensing deal reported at ~$60M per year. searchengineland.com.
10. Amsive (2024): Reddit organic search visibility rose roughly +1,328% across 2023 to 2024.
11. AirOps (Oct 2025): brands are roughly 6.5 times more likely to be cited through third-party sources than via their own domain.
12. USPTO / Google, Information Gain patent US11354342B2. patents.google.com/patent/US11354342B2.
13. Google Search Central, passage ranking / passage-based indexing (2021).
14. arrivl.ai, "Each AI crawls a website completely differently," a 3-month study of ~11 million crawler logs across 34 sites (2026): GPTBot and Meta's bot repeatedly request /llms.txt even when absent; Claude's live bot is single-page ~75% of the time; Bytespider out-crawls Google plus OpenAI on some sites. Shared by u/UptownOnion on r/aeo. reddit.com/r/aeo (data: arrivl.ai).