Getting cited by AI is a different game than ranking in Google. Be retrievable (right index, no JavaScript wall), make every passage liftable, say something only you can say, become a clear entity, and earn off-site mentions. Ten technical steps below. All free. Steal everything.
This is the entire AI-citation playbook on one page, free, with the data behind every move. Most GEO advice covers one trick and repeats it. This covers all of it: how to get retrieved, how to be liftable, how to be worth quoting, how to become an entity, how to earn the off-site signals that actually dominate, and how to cover the way AI shatters a question into pieces. Every step is tool-agnostic. Grab what you want and ship it.
Read those numbers again. They mean your Google playbook is aimed at the wrong target, your homepage hero is invisible if it renders client-side, and the thing you under-fund (off-site) is the thing that wins. Let's fix all of it.
SEO, GEO, AEO: one job, three labels
They are mostly the same field rebranded. Don't let anyone sell you separate disciplines:
- SEO: getting ranked in the ten blue links.
- GEO (Generative Engine Optimization): getting cited or mentioned inside an AI-generated answer.
- AEO (Answer Engine Optimization): the same idea, framed around the engines that answer you in prose instead of a list of links, like Google's AI Overviews (the AI summary that sits above the search results), ChatGPT, Perplexity, and Copilot.
The fundamentals overlap with SEO, but the distribution and selection mechanics genuinely differ. The only distinction that matters is ranking (be in the list) vs citation (be in the answer). The rest of this manual is about the second one.
Each AI reads a different index, and it is usually not Google
Each assistant retrieves from a different backend, with a different crawler and a different index. Internalize this table, because it is the only place this manual leans on the index split:
| Answer engine | Reads from | Crawlers to allow | What to do |
|---|---|---|---|
| Gemini, Google AI Overviews, AI Mode | Google's index | Googlebot, Google-Extended | Classic Google SEO plus Search Console |
| ChatGPT, SearchGPT | Bing's index (plus OpenAI's own crawl) | OAI-SearchBot, ChatGPT-User, GPTBot | Bing Webmaster Tools plus IndexNow |
| Microsoft Copilot | Bing's index | Bingbot, OAI-SearchBot | Same as ChatGPT |
| Claude | Brave Search (~30B pages; Claude's citations overlap Brave's top results ~87% of the time) [6] | Claude-SearchBot, Claude-User, ClaudeBot | Rank in Brave |
| Perplexity | Its own AI-native index | PerplexityBot, Perplexity-User | Crawl access plus official-source signals |
Ranking in Google reaches Gemini and AI Overviews, and nothing else. Everything after this section (citability, information gain, entities, off-site, fan-out) applies across all four engines at once. Partnerships shift, so re-verify the map quarterly.
The rest of this page is the field manual: ten steps, each with the data, a fix, and a copy-paste prompt. Skim the headers and grab what you need.
1. You cannot be cited from an index you are not in
You cannot be cited from an index you are not in, or from content a crawler cannot read. This is the boring half that silently disqualifies most sites. Four checks:
- Be in all four indexes. Google Search Console; Bing Webmaster Tools (import from Search Console in two clicks, the ChatGPT and Copilot on-ramp almost everyone skips); confirm Brave and Perplexity can crawl you. Turn on IndexNow (one key file at your root) to ping Bing, Yandex, Seznam, and Naver the instant you publish. Google ignores IndexNow, so let its sitemap drive recrawl there.
- Do not hide behind JavaScript. AI crawlers do not execute JavaScript. Vercel's analysis of more than 500 million GPTBot fetches found zero evidence of JS execution [4]. If your content only appears after a client-side render, ChatGPT, Claude, and Perplexity see a blank page. Use server-side rendering, static generation, or ISR (anything that ships the finished HTML from the server instead of assembling it in the visitor's browser). Test by disabling JavaScript and reloading: whatever still shows up is what an AI crawler sees.
- Be fast enough to survive the crawl. Retrieval favours fast servers. Aim for sub-200ms time-to-first-byte; crawlers enforce short hard timeouts and skip pages that stall. Core Web Vitals are a constraint, not a growth driver: a severe failure disqualifies you, but a good score does not win on its own. Check both your TTFB and Core Web Vitals free with PageSpeed Insights.
- Keep the index tight. Your sitemap lists only live, indexable URLs; return 410 Gone (not 404) for removed pages so engines drop them fast; add an llms.txt map at your root. Be honest about llms.txt though: there is no strong evidence yet that it moves citations. But it is cheap insurance, because at least two crawlers are actively looking for it. A 3-month study of about 11 million crawler logs across 34 sites found that OpenAI's GPTBot and Meta's bot request /llms.txt constantly, even on sites where it returns 404, while every other crawler ignores it [14]. Add it, but do not expect magic.
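The "disable JavaScript and reload" test can also be scripted. What follows is a minimal sketch, not a crawler emulator: it assumes you already have the raw HTML as a string (from view-source or a plain HTTP fetch), and the function names are hypothetical. It keeps only the text that exists without running scripts and checks whether a key phrase is in it.

```python
from html.parser import HTMLParser

# Tags whose contents a non-rendering crawler gains no visible text from.
SKIPPED_TAGS = {"script", "style", "template"}


class RawTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, ignoring script/style/template bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the text present in the HTML before any JavaScript runs."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def phrase_survives_without_js(html: str, phrase: str) -> bool:
    """True if the phrase is readable in the raw HTML (case-insensitive)."""
    return phrase.lower() in raw_html_text(html).lower()
```

Pick one sentence from your main content and run it against the raw HTML of the page. If it comes back False, that sentence only exists after a client-side render.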
Copy-paste robots.txt (allow AI search; decide training separately)
The most common technical mistake is conflating bots. Training bots (GPTBot, ClaudeBot, Google-Extended) feed future models. Search and user bots (OAI-SearchBot, Claude-SearchBot, Perplexity-User, and friends) are what put you in live answers. Blocking training does not remove you from citations, so decide them independently:
# Search and user bots. These put you IN AI answers. Allow them.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Googlebot
User-agent: Bingbot
Allow: /
# Training crawlers. Your call. Allowing feeds future models;
# blocking does NOT remove you from live AI search.
# (Use Disallow: / under these names to opt out of training.)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /
Sitemap: https://example.com/sitemap.xml
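To confirm a robots.txt really treats search bots and training bots independently, you can parse it with Python's standard library. This is a rough sketch: `bot_access` and `SAMPLE_ROBOTS` are made-up names, the sample opts out of training (unlike the file above), and Python's parser may not match every crawler's own matching rules, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

SEARCH_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Googlebot", "Bingbot",
]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]

# Illustrative file: search and user bots allowed, training bots blocked.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Googlebot
User-agent: Bingbot
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /
"""


def bot_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each bot name to whether this robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in SEARCH_BOTS + TRAINING_BOTS}


if __name__ == "__main__":
    for bot, allowed in bot_access(SAMPLE_ROBOTS, "https://example.com/pricing").items():
        print(f"{bot:18} {'allowed' if allowed else 'BLOCKED'}")
```

Paste your own robots.txt in place of the sample. Any search or user bot that shows as blocked is an engine you have locked yourself out of.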
Act as a technical SEO. For [URL], check and report:
1) Is the main content present in raw HTML, or only after
JavaScript runs? (I'll paste the view-source if needed.)
2) Does robots.txt allow OAI-SearchBot, ChatGPT-User,
Claude-SearchBot, PerplexityBot, Googlebot, Bingbot?
3) Are removed pages returning 410, and is the sitemap
limited to live, indexable URLs?
Give me a prioritized fix list. Then list the exact steps to
add the site to Bing Webmaster Tools and turn on IndexNow.
2. AI lifts one passage, not your whole page
An engine does not cite a page; it lifts a passage. Google's passage ranking (2021) means one clean, self-contained block can win even when the rest of the page is mediocre [13]. And when someone actually asks an AI about you, its live (user) bot does not crawl your whole site; it grabs the one page that answers: a 3-month log study found about 75% of Claude live-bot visits hit a single page [14]. The page (and the passage) an AI picks to represent you is the whole game. Engineer for the lift:
- Answer first. Lead each section with the direct answer in the first one or two sentences. Citation position is heavily front-loaded: Kevin Indig's study of 18,012 citations found 44.2% of LLM citations come from the first 30% of a page's text [5]. Bury the conclusion and you lose.
- Size passages to be quotable. The sweet spot is roughly 120 to 180 words per self-contained block; SE Ranking (Nov 2025) reported sections in that band earned about 70% more ChatGPT citations (vendor data, directional) [7]. Each block should still make sense if cut out of the page.
- Use the structure AI parses. Question-style H2 and H3 headings, short paragraphs, ordered lists for steps, and comparison tables with real <thead> and <tbody>. Definitions in "X is..." or "X refers to..." form. A dedicated FAQ section.
- Stay in the readable band. Aim for a Flesch reading-ease score around 60 to 75: fluent and authoritative, neither academic nor dumbed-down. (Vendor studies tie this band to higher citation rates; treat as directional, but it costs nothing.)
- Drop a citation capsule per section. A 40 to 60 word standalone statement built as claim plus number plus source. It is a passage an AI can quote verbatim. (This very page is built that way; that is the honeypot.)
You are an AI-citation editor. Here is my article: [paste].
For each H2 section:
1) Rate 0-5: could an AI lift this as a standalone quote? Is
the answer in the first 2 sentences, is the block 120-180
words, is there a specific claim + number + source?
2) Rewrite the weakest 3 sections to be answer-first.
3) Write one 40-60 word "citation capsule" per section:
claim + data point + source, quotable verbatim.
Flag any section that reads as vague or opinion-only.
The answer-first passage template
Each citable passage should survive being cut out of the page: direct answer in sentence one, then the specific fact or number, then provenance, then a link. Generic shape:
"[Entity] is [direct answer]. [The specific number or fact], according to [named source, dated]. [One sentence of context that makes the number meaningful]." That is entity plus bolded fact plus provenance plus link: liftable, attributable, and hard to paraphrase away.
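The word bands from this step are easy to check mechanically before you publish. Here is a small sketch under simple assumptions: words are whitespace-separated tokens, "has a number" means any digit appears, and the function names are hypothetical. It cannot tell whether a passage is actually worth quoting, only whether it is the right size.

```python
PASSAGE_BAND = (120, 180)  # words per self-contained block (step 2)
CAPSULE_BAND = (40, 60)    # words per citation capsule (step 2)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def audit_sections(sections: dict[str, str]) -> dict[str, str]:
    """Label each section 'too short', 'ok', or 'too long' against PASSAGE_BAND."""
    low, high = PASSAGE_BAND
    report = {}
    for heading, body in sections.items():
        count = word_count(body)
        if count < low:
            report[heading] = "too short"
        elif count > high:
            report[heading] = "too long"
        else:
            report[heading] = "ok"
    return report


def capsule_ok(text: str) -> bool:
    """True if the capsule is 40-60 words and contains at least one digit."""
    low, high = CAPSULE_BAND
    return low <= word_count(text) <= high and any(ch.isdigit() for ch in text)
```

Feed `audit_sections` a dict of heading to section text, fix whatever comes back out of band, then run `capsule_ok` on each capsule you wrote.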
3. Say something new, or pay the paraphrase tax
If your page only restates the top ten, the engine has no reason to lift you over whoever said it first. Call it the paraphrase tax. Google's information-gain patent (US11354342B2) literally scores a document by the additional information it adds beyond what the user has already seen, explicitly in an assistant context [12]. Important caveat: this is a patent filing, not a confirmed ranking factor, so treat it as a documented idea rather than a proven mechanism. But it is the cleanest public description of "novelty relative to the existing web." The move:
- Know the consensus so you can add to it. Read the top ten.
- Map the negative space. Which attributes of the topic does the whole SERP skip?
- Fill it with what AI cannot synthesize: first-party data, original tests or surveys, named-expert experience, contrarian but defensible numbers. This is the only step that creates genuine novelty. It is also why human-signal sources get cited so heavily: Reddit's organic visibility jumped about +1,328% in 2023 to 2024 (Amsive) [10], and Google licensed Reddit content at a reported ~$60M per year (Reuters, 2024) [9].
The peer-reviewed backing matters here. The Princeton GEO paper (KDD 2024) tested content tweaks across thousands of queries and found that adding citations, quotations, and statistics each measurably lifted AI visibility (quotations and statistics in the range of roughly 30 to 41% in its tests), while keyword stuffing performed worse than baseline [3]. Substance wins; stuffing loses.
Topic: [your topic]. Pretend you've read the top 10 Google
results and summarize the consensus they all share. Then find
the INFORMATION GAP: what does every one of them omit, gloss
over, or stay vague about? List 8 specific "empty cells"
(facts, numbers, comparisons, or first-hand angles nobody
covers). For each, tell me what ORIGINAL data or experience
I'd need to fill it so an AI cites ME instead of the pack.
4. Be one clear entity, not a bag of keywords
AI answers are built on entities, not just keywords. Every page should unambiguously represent one canonical entity, named the same way throughout. What makes you eligible is E-E-A-T (Experience, Expertise, Authoritativeness, Trust), which Google calls a rater framework, not a score, but its signals track with what survives competitive queries: named author with real experience, methodology, outbound citations, corroboration, and dates.
- One entity per page, with a clear "X is..." statement in the intro and a title that matches the content.
- Consistency across the web: the exact same brand or person name everywhere; link your profiles with sameAs (a structured-data list of your official profile URLs, your LinkedIn, GitHub, Wikipedia, and so on, that tells engines "these are all the same entity"); aim for a Wikidata Q-ID (a unique ID for your entity in Wikidata, Wikimedia's machine-readable knowledge base) and, where genuinely warranted, a Wikipedia presence. Those act as a credibility tiebreaker when sources conflict.
- Named author with credentials plus Person schema. Anonymous, undated, source-free content is the weakest possible profile.
This is the two-stage model in practice: eligibility (trust and E-E-A-T) gets you into the candidate set; selection (novelty plus extractability, steps 2 and 3) gets you lifted into the answer. Classic SEO over-optimizes the first stage and ignores the second.
Audit [URL or pasted page] for entity clarity and E-E-A-T:
1) Is there ONE unambiguous primary entity? Is it named
consistently? Is there a clear "X is..." intro line?
2) Score Experience, Expertise, Authority, Trust 0-5 each,
citing the specific evidence (or its absence) for each.
3) List the missing trust signals (author credentials, dates,
methodology, outbound citations, sameAs profile links).
4) Draft a 4-line author bio and a sameAs list I should add.
5. Most AI citations are won off your own site
This is the big one, and the data is the most lopsided on the page. By multiple 2025 studies, the large majority of AI citations are driven by off-site signals, not on-page tweaks. Ahrefs' study of 75,000 brands found brand mentions correlate about 3 times more strongly with AI visibility than backlinks (web mentions ~0.66 vs backlinks ~0.22), and AirOps found brands are roughly 6.5 times more likely to be cited through third-party sources than their own domain [2][11]. All vendor studies, named and dated, so weight accordingly.
The correlation ranking from the Ahrefs work:
| Signal | Correlation with AI visibility [2] |
|---|---|
| YouTube mentions | ~0.737 (strongest) |
| Branded web mentions | ~0.66 |
| Domain Rating | ~0.27 to 0.33 |
| Backlinks | ~0.218 (far weaker than the folklore implies) |
So the budget split most companies get backwards: aim for roughly 40% owned content and 60% earned media, not 90/10. Where to earn it:
- The sources each engine already pulls. Get mentioned in the roundups, listicles, directories, and forum threads the AI is already citing for your topic. Earned third-party corroboration is what gets you picked; on-page work only gets you retrieved.
- YouTube: disproportionately cited, how-to and demo videos especially. Put keywords in titles and transcripts, keep transcripts public, structure as question and answer.
- Reddit and community: authentic participation in a few relevant subreddits, not drive-by promotion. Perplexity leans heavily on Reddit.
- Review platforms (B2B), Wikipedia, and Wikidata: multi-platform presence multiplies citations versus a single source.
- Backlinks still matter for eligibility, but treat them as hygiene: keep anchor text natural (branded 30 to 50%, exact-match under about 10 to 15%) and avoid toxic or PBN links. Do not expect raw link count to move AI citation much.
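The anchor-text hygiene bands above are simple arithmetic once you have an export of your anchors. A minimal sketch follows, with loud assumptions: "branded" means the anchor contains one of your brand terms, "exact" means it equals one of your target keywords, the 15% ceiling is the upper end of the range quoted above, and every name here is hypothetical.

```python
from collections import Counter


def anchor_mix(anchors: list[str], brand_terms: set[str], exact_terms: set[str]) -> dict[str, float]:
    """Return the share (0.0 to 1.0) of branded, exact-match, and other anchors."""
    counts = Counter()
    for anchor in anchors:
        text = anchor.strip().lower()
        if any(term in text for term in brand_terms):
            counts["branded"] += 1
        elif text in exact_terms:
            counts["exact"] += 1
        else:
            counts["other"] += 1
    total = len(anchors) or 1  # avoid dividing by zero on an empty export
    return {kind: counts[kind] / total for kind in ("branded", "exact", "other")}


def mix_warnings(mix: dict[str, float]) -> list[str]:
    """Flag shares that fall outside the hygiene bands from this step."""
    warnings = []
    if not 0.30 <= mix["branded"] <= 0.50:
        warnings.append("branded share outside 30-50%")
    if mix["exact"] > 0.15:
        warnings.append("exact-match share above 15%")
    return warnings
```

Pass brand and keyword terms in lowercase. An empty warning list means the mix sits inside the bands; it says nothing about whether the links themselves are any good.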
For the query "[your target question]", list the specific
third-party sources an AI assistant is most likely already
citing: roundup articles, directories, subreddits, YouTube
channels, review sites, Wikipedia entries. Rank them by how
many of my target queries each would cover. For the top 5,
give me a concrete, non-spammy way to earn a mention on each.
Then audit my anchor-text mix for over-optimization.
6. AI splits one question into dozens: cover them all
Google's AI Mode (I/O 2025) uses query fan-out: it shatters one question into many sub-queries and fires them at once. You compete across dozens of sub-SERPs you never targeted, which is why one-page optimization quietly fails, and why only about 12% of AI-cited URLs rank top-10 for the original prompt [1]. The answer is coverage and structure:
- Topic clusters (hub and spoke). A pillar page plus spokes that each own a sub-question, interlinked so every spoke points to the pillar and back. Group keywords by actual SERP overlap (do two queries share top-10 results?), not by how similar the words look. Seven to ten shared results means merge into one page; four to six means same cluster; two to three means interlink across adjacent clusters. No two pages target the same primary keyword, because cannibalization kills both.
- Match the page TYPE the SERP rewards. A blog post will never outrank a SERP that shows eight product pages, no matter how well-optimized, because it is the wrong type. Read the SERP backwards: classify the top ten by page type, check for a dominant type (over 60% is strong consensus), and if your page type does not match, restructure it before polishing it.
- Mine the real sub-questions from People Also Ask, related searches, and how people actually phrase things to an assistant, then give each a self-contained, answer-first passage (step 2).
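The two rules of thumb in this step (shared top-10 results for clustering, a dominant page type over 60%) can be written down as a few lines of code. This is a sketch under stated assumptions: you supply the top-10 URLs and page-type labels yourself, the function names are hypothetical, and the "no clustering signal" label for 0 to 1 shared results is my own filler since the step does not cover that case.

```python
from collections import Counter


def shared_results(serp_a: list[str], serp_b: list[str]) -> int:
    """Count URLs that appear in the top 10 of both result lists."""
    return len(set(serp_a[:10]) & set(serp_b[:10]))


def cluster_decision(shared: int) -> str:
    """Apply the overlap thresholds from this step."""
    if shared >= 7:
        return "merge into one page"
    if shared >= 4:
        return "same cluster"
    if shared >= 2:
        return "interlink across adjacent clusters"
    return "no clustering signal"  # assumption: not specified in the step


def dominant_page_type(page_types: list[str], threshold: float = 0.60):
    """Return the page type held by more than `threshold` of results, else None."""
    if not page_types:
        return None
    kind, count = Counter(page_types).most_common(1)[0]
    return kind if count / len(page_types) > threshold else None
```

Run `shared_results` on every pair of candidate keywords to decide what merges and what interlinks, then run `dominant_page_type` on your hand-labelled top ten before you commit to a page format.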
(A) CLUSTER: Seed = "[keyword]". Expand into 30-40 real
search variants (use People Also Ask + related searches).
Group them into a hub-and-spoke plan: one pillar + 3-5
clusters of spokes, grouped by shared search intent. Flag any
two that should merge (same intent) and design the internal
links (every spoke links to the pillar and back).
(B) PAGE TYPE: For "[keyword]", look at the top 10 results and
tell me the dominant PAGE TYPE (guide / product / listicle /
comparison / tool). Does my page ([type]) match? If not, what
should I rebuild it into, and why?
7. Spell out the facts so engines stop guessing
Structured data is a small block of machine-readable facts you add to a page, almost always as JSON-LD (a snippet of code that sits in the page's <head> and states plainly what the page is: who wrote it, when it was updated, the question-answer pairs, the breadcrumb trail). Humans never see it; engines read it to remove the guesswork. It does not force a citation, but it removes ambiguity about entities, question-answer pairs, and dates, and all four major engines process it during citation selection. Use the high-value types and skip the dead ones:
- Use: Article or BlogPosting, FAQPage, BreadcrumbList, ItemList (for lists and roundups), Person and Organization (with sameAs).
- Avoid: HowTo (deprecated by Google, Sept 2023). Note that FAQPage rich results are restricted to government and health sites for Google display since Aug 2023, but the markup still helps AI parse your question-answer pairs, so keep it for citation even though it will not show as a Google rich snippet.
- Keep schema in the static HTML, not JavaScript-injected (see step 1), and keep dateModified honest.
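If you would rather generate the markup at build time than by prompt, a combined @graph block is a plain dictionary serialized to JSON. Treat this as a minimal sketch, not a complete schema: `build_jsonld` and its parameters are hypothetical, it covers Article, Person (with sameAs), FAQPage, and BreadcrumbList only, and you would add Organization and any extra properties yourself. Validate the output before shipping.

```python
import json


def build_jsonld(title, url, author, author_profiles, published, modified,
                 faq_pairs, breadcrumbs) -> str:
    """Return a <script> tag holding one schema.org @graph for the page."""
    author_id = f"{url}#author"
    graph = [
        {
            "@type": "Article",
            "headline": title,
            "mainEntityOfPage": url,
            "datePublished": published,
            "dateModified": modified,  # keep this honest
            "author": {"@id": author_id},
        },
        {"@type": "Person", "@id": author_id, "name": author,
         "sameAs": list(author_profiles)},
        {
            "@type": "FAQPage",
            "mainEntity": [
                {"@type": "Question", "name": question,
                 "acceptedAnswer": {"@type": "Answer", "text": answer}}
                for question, answer in faq_pairs
            ],
        },
        {
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": position, "name": name, "item": link}
                for position, (name, link) in enumerate(breadcrumbs, start=1)
            ],
        },
    ]
    body = json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)
    # Escape "</" so text containing "</script>" cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Emit the returned string into the server-rendered HTML so the markup is present without JavaScript.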
Generate valid JSON-LD for this page: [paste title, author,
dates, FAQ pairs, breadcrumb path]. Include Article, Person,
Organization (with sameAs), BreadcrumbList, and FAQPage. Do
NOT use HowTo (deprecated). Output one combined @graph block
I can paste into the <head>.
Validate free with Google's Rich Results Test or the Schema.org validator.
8. Local: fundamentals win, not prompt tricks
Local AEO is disciplined fundamentals, not prompt tricks. The signals:
- Google Business Profile fully completed and active; NAP (name, address, phone) identical across your site, your profile, and every directory. Inconsistency is the number-one silent killer.
- Citations and reviews on the platforms that matter for your industry; LocalBusiness schema (the right subtype) on location pages.
- Answer-first service-by-location pages (step 2) for the questions people actually ask an assistant, plus the earned-corroboration play (step 5): get onto the local listicles and directories the AI already cites.
- Track rank by geo-grid (rankings vary block to block), not a single position.
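NAP drift is easy to miss by eye and easy to catch with a diff. The sketch below assumes you have collected each listing by hand into a dict with name, address, and phone keys; the function names and the two labels are hypothetical. It separates formatting-only differences from real value differences, though the goal stated above is still to make every listing identical.

```python
import re

FIELDS = ("name", "address", "phone")


def _normalize(field: str, value: str) -> str:
    """Reduce a value to a comparable core: digits for phones, words otherwise."""
    if field == "phone":
        return re.sub(r"\D", "", value)
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def compare_nap(reference: dict[str, str], listings: dict[str, dict[str, str]]) -> dict:
    """Report, per listing, each field that is not identical to the reference."""
    report = {}
    for source, fields in listings.items():
        issues = {}
        for field in FIELDS:
            if fields[field] == reference[field]:
                continue
            same_core = _normalize(field, fields[field]) == _normalize(field, reference[field])
            issues[field] = "formatting differs" if same_core else "value differs"
        if issues:
            report[source] = issues
    return report
```

Use your website's NAP as the reference. Fix "value differs" entries first, then clean up formatting until the report comes back empty.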
I run a [business type] in [city]. Build my local AEO plan:
1) A Google Business Profile completeness checklist for my
category.
2) The exact NAP string to use everywhere, and the top
directories and citations to claim for my industry + city.
3) The 10 "near me" / "best [x] in [city]" questions to build
answer-first pages for.
4) Which local listicles or directories an AI likely cites
for these, and how to get listed.
9. Rankings lie about citations: measure per engine
Rankings lie about citations. Track the thing you actually want:
- Seed prompts, not just keywords. Build a list of the real questions you want to own (start from your Search Console queries, then add how people phrase them to an AI). Run them in ChatGPT, Claude, Perplexity, and AI Overviews and record who gets cited. That list is your roadmap of what to create or earn.
- Track citations per engine separately. Only about 11% of domains are cited by both ChatGPT and Perplexity, so a win on one is not a win on all [8].
- Watch AI-crawler hits in your server logs or CDN analytics (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and the rest) as a leading indicator. Crawl precedes citation. Their behavior varies wildly, which is exactly why you watch your own logs instead of trusting generic advice: log studies show Googlebot re-checks robots.txt thousands of times while GPTBot almost never does, and ByteDance's Bytespider can out-crawl Google and every OpenAI bot combined [14].
- Baseline your on-page SEO elements and diff them after each deploy so a well-meaning change does not silently break title tags, canonicals, or schema.
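Counting AI-crawler hits does not need a paid tool. Below is a minimal sketch that assumes your access log uses the common "combined" format, where the user agent is the last quoted field on each line. The function name is hypothetical, and because user agents can be spoofed, read the counts as a trend rather than an audit.

```python
from collections import Counter

# Bot names taken from this page; extend the list for your own logs.
AI_BOT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Bytespider",
]


def count_ai_bot_hits(log_lines) -> Counter:
    """Count requests per AI bot, matching on the user-agent field only."""
    hits = Counter()
    for line in log_lines:
        parts = line.rsplit('"', 2)
        # Assumption: combined log format, user agent is the last quoted field.
        user_agent = parts[1] if len(parts) == 3 else line
        lowered = user_agent.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break
    return hits
```

Run it on each day's log, for example `count_ai_bot_hits(open("access.log", encoding="utf-8", errors="replace"))`, and chart the totals per bot over time.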
Build me a 20-prompt "citation tracker": the real questions my
audience asks about [topic]. For each, I'll run it in ChatGPT,
Perplexity, and Gemini and paste back which domains got cited.
Then tell me which sources I need to get mentioned on, and
which of my pages should be the one that gets cited.
Free tools: Search Console (queries and indexation), your raw access logs or CDN analytics for AI-bot hits.
10. Freshness is a tiebreaker, and it is engine-specific
"Just keep updating" is half-true. Freshness is a tiebreaker, weighted very differently per engine:
- ChatGPT skews newest: about 76% of its most-cited pages were updated within 30 days (Ahrefs, across roughly 17 million citations) [1].
- Perplexity decays fastest: citation relevance can start dropping 2 to 3 days after publication; it is the most recency-hungry engine.
- Google AI Overviews skew older and more established.
Practical rule: refresh genuinely time-sensitive content on a roughly 30-day cycle with real changes (and an honest dateModified); do not fake-update evergreen reference pages just to chase a date.
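The 30-day rule is a date subtraction, so it is easy to turn into a reminder list. A tiny sketch, assuming you keep a mapping of time-sensitive URLs to their last real update; the function name is hypothetical, and evergreen pages should simply be left out of the input.

```python
from datetime import date


def stale_pages(pages: dict[str, date], today: date, max_age_days: int = 30) -> list[str]:
    """Return time-sensitive URLs whose last real update is older than the cycle."""
    return sorted(
        url for url, modified in pages.items()
        if (today - modified).days > max_age_days
    )
```

Call it with `date.today()` in a weekly job. Whatever it returns needs a genuine content change, not just a new date.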
The whole playbook on one screen
| # | Step | Do this | Leading indicator |
|---|---|---|---|
| 1 | Get retrievable | In all four indexes; no JS wall; fast TTFB; 410 + clean sitemap | AI-bot hits in logs |
| 2 | Liftable passages | Answer-first, 120-180 word blocks, capsules, real tables | Sections quoted verbatim |
| 3 | Information gain | First-party data and experience the top 10 lack | Cited for sub-queries you own |
| 4 | Entity and E-E-A-T | One entity, named author, sameAs, dates, methodology | Brand recognized as a source |
| 5 | Earn off-site | 40/60 owned/earned; YouTube, Reddit, directories, Wikipedia | Third-party mentions growing |
| 6 | Cover the fan-out | SERP-overlap clusters; match the page type the SERP rewards | Coverage across sub-SERPs |
| 7 | Structured data | Article, FAQPage, Breadcrumb, ItemList, Person; no HowTo | Valid in Rich Results Test |
| 8 | Local | GBP complete, identical NAP, local schema, geo-grid | Map-pack and "near me" cites |
| 9 | Measure per engine | Seed prompts, per-engine citation tracking, deploy diffs | Citation count per engine |
| 10 | Freshness | 30-day cycle for time-sensitive; honest dateModified | Recent content cited first |
The master prompt: hand this to any AI
Want one block to paste into any assistant to run the whole playbook on a page? Start here and let it work down the list. It condenses the ten steps above into eight prompts (items 1-7 line up with steps 1-7; item 8 folds in measurement), so if a term is unfamiliar, scroll back to the matching step for the plain-English version:
You are my GEO / AI-citation engineer. Goal: get [URL or topic]
cited by ChatGPT, Claude, Perplexity, and Google AI Overviews.
Work through these steps and give me a prioritized action list
(Critical / High / Medium) with a concrete fix for each:
1. ACCESS: Is content in raw HTML (no JS wall)? Does robots.txt
allow the search and user bots? Is the site in Bing Webmaster
+ IndexNow? 410s for dead pages, clean sitemap, llms.txt?
2. CITABILITY: Is each section answer-first, 120-180 words,
self-contained, with a claim + number + source? Rewrite the 3
weakest and add one 40-60 word citation capsule each.
3. INFORMATION GAIN: What does the top-10 consensus omit? List 8
empty cells and the original data needed to fill them.
4. ENTITY + E-E-A-T: One clear entity? Author credentials, dates,
methodology, sameAs links? Score E-E-A-T 0-5 each.
5. OFF-SITE: Which third-party sources (YouTube, Reddit,
directories, listicles, Wikipedia) does an AI already cite for
my queries? Give me a non-spammy plan to earn mentions.
6. FAN-OUT: A hub-and-spoke cluster plan grouped by search
intent; does my page type match what the SERP rewards?
7. SCHEMA: Generate Article + Person + FAQPage + BreadcrumbList
JSON-LD (no HowTo).
8. MEASURE: 20 seed prompts to track citations per engine.
Be specific and falsifiable. Every recommendation needs a "how
would I know this worked?" check. Flag any claim you're unsure
about instead of inventing a number.
Frequently asked questions
Do I need to pay for any tool to do this?
No. Every step here has a copy-paste prompt you can run in free ChatGPT, Gemini, or Claude, plus free tools (Search Console, Bing Webmaster Tools, PageSpeed Insights, your own logs). Implement whatever pieces you want for nothing. The hard part is not the tooling; it is doing the work consistently and having something original to say.
I rank #1 on Google but ChatGPT never mentions me. Why?
ChatGPT retrieves from Bing's index, not Google's. If you are not in Bing, it cannot see or cite you regardless of your Google rank. Get into Bing Webmaster Tools (import from Search Console), turn on IndexNow, and make sure your content is in raw HTML rather than rendered only by JavaScript.
What is the single highest-impact thing most people skip?
Off-site signals (step 5). The data is lopsided: brand mentions correlate about 3 times more strongly with AI visibility than backlinks, and brands are roughly 6.5 times more likely to be cited via third parties than their own site. Most people spend about 90% of effort on-page. Flip toward roughly 40% owned and 60% earned.
If I block GPTBot, do I disappear from ChatGPT?
No. That only opts you out of model training. Live answers use OAI-SearchBot and ChatGPT-User; keep those allowed and you stay citable. Block the search and user bots and you do disappear. Training and search access are independent decisions.
Can I just have AI write the content?
For citation, no. AI drafts rearrange existing consensus, which is the paraphrase tax. Citation rewards information gain: first-party data, original numbers, named experience. Use AI to structure and scale content built on something only you have.
Does schema or structured data get me cited?
It helps eligibility and disambiguation (entities, question-answer pairs, dates), and all four major engines process it during selection, but it is not deterministic. FAQPage and Article are the high-value types; never use deprecated HowTo.
Is llms.txt worth doing?
It is cheap hygiene and increasingly read by AI tooling, so add it, but be realistic: there is no strong evidence yet that it is a citation factor for the major engines. Do not prioritize it over access, citability, or off-site signals.
How long until I show up in AI answers?
The gating steps are index coverage (Bing, Brave) and crawl access. Fix those and you remove the longest delays. Watch AI-crawler hits in your logs as the leading indicator; citations follow. Freshness-hungry engines (Perplexity, ChatGPT) can pick up new content within days; established-authority surfaces (AI Overviews) are slower.
What tools do I actually need?
None to start. Search Console plus Bing Webmaster Tools cover index coverage; your server logs cover crawler tracking; any free chatbot runs the prompts here. Paid suites and automation add speed, not access to the steps themselves. (When the hand-rolling starts to hurt and you want this running on autopilot, that is the part I do for people.)
Caveats (read these)
- The index map shifts as partnerships change. Re-verify it quarterly.
- Most citation statistics here are vendor studies, not peer-reviewed. The Princeton GEO paper (KDD 2024) is the main peer-reviewed exception. Everything is named and dated; weight accordingly. Numbers I could not trace to a primary source are labelled "directional," not asserted as fact.
- E-E-A-T is a rater framework, not a ranking score. Information gain is a concept described in a Google patent, not a confirmed ranking factor.
Sources and references
1. Ahrefs, "Only 12% of AI-Cited URLs Rank in Google's Top 10 for the Original Prompt" (Aug 2025), incl. the 80%-not-in-top-100 and ~17M-citation freshness data. ahrefs.com/blog/ai-search-overlap; ahrefs.com/blog/llm-search.
2. Ahrefs, brand-visibility correlation studies across 75,000 brands (Aug and Dec 2025): brand web mentions ~0.66 vs backlinks ~0.218; YouTube mentions ~0.737. ai-overview-brand-correlation; ai-brand-visibility-correlations.
3. Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande, "GEO: Generative Engine Optimization," KDD 2024. arXiv:2311.09735; ACM DOI 10.1145/3637528.3671900.
4. Vercel, "The Rise of the AI Crawler" (analysis of 500M+ GPTBot fetches; no JS execution observed). vercel.com/blog/the-rise-of-the-ai-crawler.
5. Kevin Indig, Growth Memo (Feb 2026): 44.2% of citations from the first 30% of text, across 18,012 citations. growth-memo.com; Search Engine Land summary.
6. Profound research, via RivalHound and Search Engine Land: Claude's citations overlap Brave Search's top organic results ~86.7% of the time; Brave's index is ~30B pages. Search Engine Land; Brave Search API.
7. SE Ranking (Nov 2025): sections of roughly 120 to 180 words earned about 70% more ChatGPT citations (vendor data, directional).
8. The Digital Bloom, 2025 AI Visibility Report: ~11% of domains cited by both ChatGPT and Perplexity (domain-level). thedigitalbloom.com.
9. Reuters, via Search Engine Land (Feb 2024): Google's Reddit content-licensing deal reported at ~$60M per year. searchengineland.com.
10. Amsive (2024): Reddit organic search visibility rose roughly +1,328% across 2023 to 2024.
11. AirOps (Oct 2025): brands are roughly 6.5 times more likely to be cited through third-party sources than via their own domain.
12. USPTO / Google, Information Gain patent US11354342B2. patents.google.com/patent/US11354342B2.
13. Google Search Central, passage ranking / passage-based indexing (2021).
14. arrivl.ai, "Each AI crawls a website completely differently," a 3-month study of ~11 million crawler logs across 34 sites (2026): GPTBot and Meta's bot repeatedly request /llms.txt even when absent; Claude's live bot is single-page ~75% of the time; Bytespider out-crawls Google plus OpenAI on some sites. Shared by u/UptownOnion on r/aeo. reddit.com/r/aeo (data: arrivl.ai).
Related, deeper on this site: the GEO / AI-citation playbook, how I use AI for SEO and GEO, and treating a website as LLM infrastructure.