Joshua Opolko

Your Site as AI Infrastructure: How to Become a Trusted Source for ChatGPT and Claude

The site becomes a database and the AIs learn how trustworthy it is. You are not building for human visitors anymore; you are becoming part of AI infrastructure.

Key takeaways

What Does It Mean to Be Part of AI Infrastructure?

When an AI assistant answers a question, it draws on a learned model of which sources are reliable for which topics. Brand mentions correlate 3x more strongly with AI visibility than backlinks, according to an Ahrefs study of 75,000 brands published December 2025. That correlation points to the infrastructure layer: the set of sources the AI has already decided to trust. Infrastructure does not mean scale. It means specificity and reliability. A small site that is consistently correct about a narrow subject gets cited more than a large site that is broadly unreliable. The AI is pattern-matching on the history of every time it fetched you and how accurate that fetch turned out to be.

How Do AI Assistants Decide What Sources to Trust?

Only 11% of domains are cited by both ChatGPT and Google AI Overviews for the same query (SparkToro, 2025). That divergence is the clearest evidence that AI assistants and search engines run different trust models. Google rewards links. AI rewards correctness, specificity, and source provenance. Source provenance matters enormously: a restaurant directory that cross-references a municipal licence feed is structurally different from one that aggregates user reviews. The first has a verifiable chain of custody. The second is opinion of unknown accuracy. Specificity also signals reliability: a page that answers exactly one question with a clear data source is more cite-worthy than a page that attempts to answer ten questions at medium confidence.

What Is the Difference Between an Indexing Crawler and a Live Citation Bot?

This is the distinction most people miss. Indexing bots (GPTBot and OAI-SearchBot from OpenAI, ClaudeBot from Anthropic) build a retrieval index in the background. No real user ever sees the direct results of these crawls. Live citation bots are different: ChatGPT-User and Claude-User fire at the moment a real user’s query triggers an active fetch of your URL. Every ChatGPT-User hit in your logs equals one citation being served to one real user. These are the hits that matter.

| Bot | Type | Hits (30 days) | Notes |
| --- | --- | --- | --- |
| GPTBot | Indexing | 1,073 | OpenAI background crawl; no direct user |
| ClaudeBot | Indexing | 1,383 | Anthropic background crawl; no direct user |
| OAI-SearchBot | Indexing | 254 | OpenAI search index; no direct user |
| ChatGPT-User | Live citation | 81 (+15 Jun 10) | Real user query served live |
| Claude-User | Live citation | 56 (+39 Jun 8) | Real user query served live |

Indexing bots were roughly 20 times as active as citation bots over the same 30 days: 2,710 indexing hits against 137 live citation hits. That ratio is normal and expected. The 137 live citation hits are the ones reaching real people: each ChatGPT-User and Claude-User hit represents one user's query answered with a citation to this page.
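The split in the table can be reproduced from raw access logs. What follows is a minimal sketch, not the tooling used for this site: it assumes logs in the common "combined" format, where the user-agent is the last double-quoted field, and the function names are hypothetical. User-agent strings can be spoofed, so treat the counts as approximate.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# User-agent tokens named in the article, grouped by role.
INDEXING_BOTS = ("GPTBot", "ClaudeBot", "OAI-SearchBot")
LIVE_CITATION_BOTS = ("ChatGPT-User", "Claude-User")


def user_agent_from_line(line: str) -> str:
    """Return the last double-quoted field of a log line (assumed to be the user-agent)."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (bot_name, role) if the user-agent names a known AI bot, else None."""
    for name in LIVE_CITATION_BOTS:
        if name in user_agent:
            return name, "live citation"
    for name in INDEXING_BOTS:
        if name in user_agent:
            return name, "indexing"
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count hits per (bot_name, role) across an iterable of log lines."""
    hits: Counter = Counter()
    for line in lines:
        match = classify_user_agent(user_agent_from_line(line))
        if match is not None:
            hits[match] += 1
    return hits


def indexing_to_citation_ratio(hits: Counter) -> Optional[float]:
    """Indexing hits divided by live citation hits; None if there are no live hits."""
    indexing = sum(n for (_, role), n in hits.items() if role == "indexing")
    live = sum(n for (_, role), n in hits.items() if role == "live citation")
    return indexing / live if live else None


# The 30-day counts from the table, entered by hand to check the ratio.
table_hits = Counter({
    ("GPTBot", "indexing"): 1073,
    ("ClaudeBot", "indexing"): 1383,
    ("OAI-SearchBot", "indexing"): 254,
    ("ChatGPT-User", "live citation"): 81,
    ("Claude-User", "live citation"): 56,
})
print(round(indexing_to_citation_ratio(table_hits), 1))  # prints 19.8
```

Against a real log file, `summarize(open("access.log"))` would produce the same kind of counter, with the file path being whatever your server writes to.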

What Makes a Source Trustworthy to an AI?

Trustworthiness to an AI has four concrete components: verifiable provenance, non-replicability, structural clarity, and error-free history. Verifiable provenance means the data has a traceable origin: “City of Toronto business licence feed, cross-referenced with Toronto Public Health inspection records” is verifiable; “we researched the best restaurants” is not. Non-replicability is the moat: if your data could have come from anywhere, it is worth the same as anything. Structural clarity means consistent schema, stable URL patterns, and clean taxonomy. Error-free history is the compounding factor: each correct citation is a vote of confidence at the domain level. Each error is a vote against it, and the AI applies that verdict domain-wide, not just per-page.

How Do You Design for AI Infrastructure Instead of Human Traffic?

The design principles invert most conventional web priorities. You are optimizing for machine parsing, not human attention. joshuaopolko.com drove 1,800 Bing AI citations in 120 days, from zero to a 97/day peak, by applying exactly this: answer one specific question per page, source every claim, make the data chain of custody explicit in the copy itself. Not implicit. Explicit. “This data comes from X, updated on Y, verified by Z.”

What Happens When an AI Gets You Wrong?

The compounding trust model works in reverse. If an AI cites your data and that data turns out to be wrong, the error is recorded at the domain level, not just the page level. The AI becomes less likely to cite you again. With enough errors, live citation hits go quiet: the indexing bots keep crawling, but the citation bots stop firing. That is why the verified-open gate on NowServingTO is a trust infrastructure decision, not just a product quality decision. A restaurant listed as open that turns out to be closed is exactly the kind of error that poisons a citation relationship. The strictness is a defensive posture against compounding error.

Why Did NowServingTO Get Cited Within 72 Hours of Launch?

The /answers page on NowServingTO was cited by AI assistants within 72 hours of launch. Not because of backlinks or domain authority. Because the page answered a specific, frequently asked question (which new restaurants have recently opened in Toronto, by cuisine) with data traceable to a municipal source, cross-referenced with public health inspection records, filtered to the last 365 days, and verified against Google Places for operating status. No user-submitted data. No editorial curation. A pipeline with a verifiable chain of custody at every step. That is what cite-worthy looks like to an AI: specific, sourced, non-replicable, and structured for extraction.
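The gating steps described above (licence feed, inspection cross-reference, 365-day window, operating-status check) can be sketched as a single filter. This is an illustration under stated assumptions, not NowServingTO's actual pipeline: the `Listing` fields and function names are hypothetical, and the status string simply mirrors the `OPERATIONAL` value that Google Places reports for an open business.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Listing:
    """One restaurant record after the source feeds are joined (hypothetical shape)."""
    name: str
    cuisine: str
    licence_issued: date         # from the municipal licence feed
    has_inspection_record: bool  # matched in public health inspection records
    operating_status: str        # from a places lookup, e.g. "OPERATIONAL"


def verified_open(listings: Iterable[Listing], today: date,
                  max_age_days: int = 365) -> List[Listing]:
    """Keep only listings that pass every gate; a failure at any step excludes the listing."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        item for item in listings
        if item.licence_issued >= cutoff          # opened within the window
        and item.has_inspection_record            # cross-referenced with inspections
        and item.operating_status == "OPERATIONAL"  # confirmed open right now
    ]


def by_cuisine(listings: Iterable[Listing]) -> Dict[str, List[str]]:
    """Group listing names by cuisine, ready to render one answer per cuisine."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in listings:
        grouped[item.cuisine].append(item.name)
    return dict(grouped)
```

The design choice worth noting is that the filter is conjunctive: a listing with a fresh licence but an unconfirmed operating status is dropped rather than shown with a caveat, which is the strictness the verified-open gate is meant to enforce.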

The pattern that preceded this on joshuaopolko.com was identical at a different scale: 1,800 Bing AI citations in 120 days from zero, driven by the same design principle: answer specific questions, cite every source, make the data chain explicit. The citations started small and compounded. Zero to 97 per day is not a traffic anomaly. It is what compounding trust looks like when it is working.

Frequently asked questions

How is AI citation different from Google ranking?

Google ranks pages by link authority and keyword relevance. AI citation systems prioritize the trustworthiness and specificity of individual answers. A small site that is consistently correct about a narrow topic earns more AI citations than a large site with broad but less reliable content. The mechanics favor depth over domain authority, which inverts many traditional SEO assumptions about what makes a page valuable.

What is the difference between GPTBot and ChatGPT-User in server logs?

GPTBot is an indexing crawler that builds OpenAI's retrieval index in the background. No user ever sees the direct result of a GPTBot hit. ChatGPT-User fires at the moment a real user's query triggers a live fetch of your page. Every ChatGPT-User hit in your logs equals one citation being served to one real person. Indexing bots typically run 15 to 20 times more frequently than citation bots in practice.

Does schema markup directly improve AI citation rates?

Schema markup improves AI citation rates by making content easier to extract and verify. Article schema with a current dateModified tells the model the data is fresh. FAQPage schema surfaces specific question-and-answer pairs for passage-level retrieval. The more structured and extractable your content, the lower the friction for an AI to cite it accurately and with confidence. Inline source attribution matters more than footnotes.
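As an illustration of the two schema types mentioned above, here is a minimal sketch that builds Article and FAQPage JSON-LD and wraps it in a script tag. The helper names are hypothetical; the property names (`headline`, `dateModified`, `mainEntity`, `acceptedAnswer`) come from the schema.org vocabulary. Whether a given AI system reads this markup is not something the code can confirm.

```python
import json
from datetime import date
from typing import Dict, Iterable, Tuple


def article_jsonld(headline: str, author: str, modified: date) -> Dict:
    """Article markup with an explicit dateModified to signal freshness."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "dateModified": modified.isoformat(),
    }


def faq_jsonld(pairs: Iterable[Tuple[str, str]]) -> Dict:
    """FAQPage markup: one Question entity per (question, answer) pair."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data: Dict) -> str:
    """Serialize to a JSON-LD script tag for the page head."""
    body = json.dumps(data, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


faq = faq_jsonld([
    ("What is llms.txt and does my site need one?",
     "llms.txt is a plain-text file at your domain root that maps pages "
     "with one-line descriptions of what each page answers."),
])
print(script_tag(faq))
```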

How long does it take for a new page to receive AI citations?

NowServingTO's /answers page received AI citations within 72 hours of launch, with no backlinks and no prior domain authority on that URL. Indexing bots crawled within hours. The first live citation bot hit followed within days. Timeline depends on how clearly the page answers a specific, frequently asked question and how verifiable the underlying data chain is. Pages that answer one question well get cited faster than pages attempting ten.

What is llms.txt and does my site need one?

llms.txt is a plain-text file at your domain root that maps pages with one-line descriptions of what each page answers. It is structured for AI consumption rather than human reading. Sites with llms.txt give AI crawlers a direct inventory of citable content without requiring the crawler to infer page purpose from HTML structure. It is most useful for sites with many pages covering narrow, specific topics where the crawl discovery path is not obvious.
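A minimal sketch of generating such a file follows. It assumes the layout used in the public llms.txt proposal (a title line, a short summary, then one linked entry per page with a one-line description); the function name, section heading, and example URLs are hypothetical placeholders.

```python
from typing import Iterable, Tuple


def build_llms_txt(site_name: str, summary: str,
                   pages: Iterable[Tuple[str, str, str]]) -> str:
    """Build llms.txt content from (title, url, one_line_description) tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for title, url, description in pages:
        # One entry per page: what it is, where it lives, what question it answers.
        lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"


content = build_llms_txt(
    "Example Site",
    "Answers to narrow, specific questions, each with a stated data source.",
    [
        ("Answers", "https://example.com/answers",
         "Which new restaurants have recently opened, by cuisine."),
    ],
)
print(content)
```

The output would be saved as `llms.txt` at the domain root, and regenerating it whenever pages are added keeps the inventory in step with the site.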