How many AI fetches did joshuaopolko.com receive in June 2026?

joshuaopolko.com received 4,288 AI fetches between June 13 and June 27, 2026 (14 days). Of those, 2,310 were live-user fetches (real people using AI tools that retrieved content in real time), 1,613 were AI training crawls, and 365 were AI search crawls. The site also received 3,081 traditional search engine fetches in the same window.

Which AI system sends the most traffic to joshuaopolko.com?

ChatGPT user-fetch is by far the dominant source, with 2,211 fetches in the June 13-27 window. This represents real ChatGPT users asking questions that caused ChatGPT to retrieve content from joshuaopolko.com in real time. It accounts for 51.6% of all AI fetches and 95.7% of all live-user AI fetches.

What page on joshuaopolko.com gets the most AI traffic?

The page /claude-fable-5-issues-fixes/ received 893 AI fetches in the June 13-27 window. Of those, 859 were live-user fetches, and 834 came from ChatGPT user-fetch alone. This page covers common issues and fixes for Claude Fable 5, a topic with high real-time search demand. The second-highest was /agent-zero/ with 328 AI fetches.

How many AI training crawlers visit joshuaopolko.com?

Seven AI training crawlers visited in the June 13-27 window. The four largest were ByteDance Bytespider (365 fetches, used for training Chinese LLMs), Anthropic ClaudeBot (353 fetches, used for training Claude), Amazonbot (352 fetches), and Meta AI (343 fetches, used for training LLaMA and related models). OpenAI GPTBot added 112, You.com 45, and Common Crawl 43. Total training fetches: 1,613.

How does this site verify that AI bot traffic is real and not spoofed?

The GEO Observatory uses rDNS verification: it does a reverse DNS lookup on each bot's IP address and checks that the resulting hostname matches the expected network for that bot (e.g., Googlebot IPs resolve to googlebot.com or google.com). In the June window, 2,163 of the 2,541 bot requests sampled for rDNS passed verification (85.1%). An additional 1,003 requests claimed to be known bots but failed verification and are logged as spoofed.

AI Crawler Report June 2026: First-Party Data from joshuaopolko.com

By Joshua Opolko. Published June 27, 2026. Data: June 13–27, 2026 (14-day window).

Bottom line

joshuaopolko.com received 4,288 AI fetches in the 14 days ending June 27, 2026, with 20 distinct AI and search bots detected. ChatGPT live-user traffic alone accounted for 2,211 of those fetches, more than Googlebot (1,018) and Bingbot (934) combined, though less than the 3,081 fetches from all traditional search engines together. The /claude-fable-5-issues-fixes/ page received 893 AI fetches in two weeks, driven almost entirely by real ChatGPT users asking about the model in real time. Four major training crawlers are actively building datasets from this domain. 84,975 scanner requests were filtered, and 1,003 spoofed bot requests were excluded.

4,288 AI fetches in 14 days (Jun 13-27)

2,310 live-user AI fetches (real people using AI tools)

20 distinct AI and search bots detected

85.1% bot verification rate (rDNS confirmed)

What these numbers mean

This report covers first-party data from Apache server logs for joshuaopolko.com, processed by the GEO Observatory. Cloudflare sits in front of the site and passes real visitor IPs via the CF-Connecting-IP header, which Apache logs directly. All bot classification uses a combination of User-Agent matching, rDNS verification, and ASN lookup. Bots that fail rDNS verification are classified as spoofed and excluded from the counts below.
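The User-Agent matching step described above can be sketched as a simple token lookup. This is a minimal illustration, not the GEO Observatory's actual code: `BOT_TOKENS` and `classify_user_agent` are hypothetical names, and the substrings are commonly published User-Agent tokens for a few of the bots in this report, which should be checked against each vendor's current documentation.

```python
from typing import Optional, Tuple

# Hypothetical token table: (substring of the User-Agent, bot label, category).
# The first matching entry wins.
BOT_TOKENS = [
    ("ChatGPT-User", "ChatGPT user-fetch", "live-user"),
    ("OAI-SearchBot", "ChatGPT search", "ai-search"),
    ("GPTBot", "OpenAI GPTBot", "training"),
    ("ClaudeBot", "Anthropic ClaudeBot", "training"),
    ("PerplexityBot", "Perplexity", "ai-search"),
    ("Bytespider", "ByteDance Bytespider", "training"),
    ("CCBot", "Common Crawl", "training"),
    ("Googlebot", "Googlebot", "search"),
    ("bingbot", "Bingbot", "search"),
]


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (bot label, category) for a known bot, or None otherwise."""
    ua = user_agent.lower()
    for token, label, category in BOT_TOKENS:
        if token.lower() in ua:
            return label, category
    return None
```

A match here only says what a request claims to be. The User-Agent header is trivially forged, which is why the report pairs this step with rDNS verification.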

The critical distinction in this data is between live-user fetches and training fetches. Live-user means a real person asked an AI tool something, and the AI fetched this site in real time to answer. Training means a crawler was building a dataset. The two have very different implications for citability: live-user traffic means you are being cited right now; training traffic means future model versions may incorporate your content.

Bot-by-bot breakdown

Bot	Type	Fetches	Verified
ChatGPT user-fetch	live-user	2,211	1,063 / 1,075
Googlebot	search	1,018	158 / 221
Bingbot	search	934	576 / 585
Petal (Huawei)	search	624	0 / 0
ByteDance Bytespider	training	365	0 / 0
Anthropic ClaudeBot	training	353	0 / 180
Amazonbot	training	352	0 / 0
Meta AI	training	343	0 / 0
Perplexity	ai-search	203	76 / 88
Yandex	search	184	89 / 103
Applebot	search	177	84 / 95
ChatGPT search	ai-search	157	68 / 73
OpenAI GPTBot	training	112	38 / 51
DuckDuckGo	search	84	0 / 27
Claude user-fetch	live-user	83	0 / 32
Seznam	search	60	0 / 0
You.com	training	45	0 / 0
Common Crawl	training	43	0 / 0
Perplexity user-fetch	live-user	16	11 / 11
Claude search	ai-search	5	0 / 0

Verification columns show confirmed / sampled. A "0 / 0" reading means this bot was not sampled for rDNS in this window, not that it failed. Bots that fail rDNS on a sampled check are classified as spoofed and excluded from this table (1,003 spoofed requests total this period).
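As a cross-check, the overall verification rate can be recomputed from the confirmed / sampled columns. The sketch below copies the pairs from the table; the names `SAMPLES`, `bot_rate`, and `overall` are illustrative and not part of the Observatory pipeline.

```python
from typing import Optional, Tuple

# (confirmed, sampled) pairs copied from the Verified column of the table.
SAMPLES = {
    "ChatGPT user-fetch": (1063, 1075),
    "Googlebot": (158, 221),
    "Bingbot": (576, 585),
    "Petal (Huawei)": (0, 0),
    "Anthropic ClaudeBot": (0, 180),
    "Perplexity": (76, 88),
    "Yandex": (89, 103),
    "Applebot": (84, 95),
    "ChatGPT search": (68, 73),
    "OpenAI GPTBot": (38, 51),
    "DuckDuckGo": (0, 27),
    "Claude user-fetch": (0, 32),
    "Perplexity user-fetch": (11, 11),
}


def bot_rate(bot: str) -> Optional[float]:
    """Per-bot rate in percent; None when the bot was not sampled (0 / 0)."""
    confirmed, sampled = SAMPLES[bot]
    if sampled == 0:
        return None
    return round(100 * confirmed / sampled, 1)


def overall() -> Tuple[int, int, float]:
    """Total confirmed, total sampled, and the overall rate in percent."""
    confirmed = sum(c for c, _ in SAMPLES.values())
    sampled = sum(s for _, s in SAMPLES.values())
    return confirmed, sampled, round(100 * confirmed / sampled, 1)


print(overall())
```

Returning `None` for an unsampled bot keeps "not sampled" distinct from "sampled and failed", which is the distinction the note above draws.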

ChatGPT dominates live-user AI traffic

ChatGPT user-fetch sent 2,211 fetches in the 14-day window, more than Googlebot (1,018) and Bingbot (934) combined. This is ChatGPT's real-time retrieval: a user types a question, ChatGPT decides the answer needs fresh web content, and it fetches from this site. These are not training crawls. They represent direct citations in ongoing conversations.

Claude user-fetch added 83 live-user fetches, and Perplexity user-fetch added 16. The live-user AI category total was 2,310 fetches, versus 365 for AI-search crawls (bots pre-crawling for AI search indexes) and 1,613 for AI training crawls.
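The category totals and ChatGPT's share can be reproduced from the bot-by-bot table with a short tally. This is only an arithmetic check on the published numbers; `ROWS` and `totals_by_category` are illustrative names.

```python
from collections import defaultdict

# (bot, category, fetches) copied from the bot-by-bot table.
ROWS = [
    ("ChatGPT user-fetch", "live-user", 2211),
    ("Googlebot", "search", 1018),
    ("Bingbot", "search", 934),
    ("Petal (Huawei)", "search", 624),
    ("ByteDance Bytespider", "training", 365),
    ("Anthropic ClaudeBot", "training", 353),
    ("Amazonbot", "training", 352),
    ("Meta AI", "training", 343),
    ("Perplexity", "ai-search", 203),
    ("Yandex", "search", 184),
    ("Applebot", "search", 177),
    ("ChatGPT search", "ai-search", 157),
    ("OpenAI GPTBot", "training", 112),
    ("DuckDuckGo", "search", 84),
    ("Claude user-fetch", "live-user", 83),
    ("Seznam", "search", 60),
    ("You.com", "training", 45),
    ("Common Crawl", "training", 43),
    ("Perplexity user-fetch", "live-user", 16),
    ("Claude search", "ai-search", 5),
]


def totals_by_category(rows):
    """Sum fetches per category."""
    totals = defaultdict(int)
    for _bot, category, fetches in rows:
        totals[category] += fetches
    return dict(totals)


totals = totals_by_category(ROWS)
ai_total = totals["live-user"] + totals["training"] + totals["ai-search"]
chatgpt = 2211

print(totals)
print(ai_total)
print(round(100 * chatgpt / ai_total, 1))             # share of all AI fetches
print(round(100 * chatgpt / totals["live-user"], 1))  # share of live-user fetches
```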

Four training crawlers are actively indexing this domain

ByteDance Bytespider (365 fetches), Anthropic ClaudeBot (353), Amazonbot (352), and Meta AI (343) each sent 340-370 fetches. OpenAI GPTBot added 112 more. These five bots collectively account for 1,525 of the 1,613 training fetches. Their operators build Doubao, Claude, Amazon's LLMs, LLaMA, and GPT respectively, so content fetched here may feed future versions of those models.

You.com (45) and Common Crawl (43) round out the training category. Common Crawl data is used by many smaller LLM projects and research models. Being present in Common Crawl is a secondary citability signal with wide reach.

Top pages by AI traffic

Page	AI fetches	Live-user	Total
/claude-fable-5-issues-fixes/	893	859	945
/agent-zero/	328	308	368
/ (homepage)	310	241	504
/dify-self-hosted-guide/	302	285	320
/searxng-self-hosted-guide/	183	164	203
/claude-code-specification-workflow-mcp/	160	142	193
/ollama/	77	70	121
/crewai-setup-production-guide/	71	54	86
/kidsevents/	57	25	128
/medical-training/	41	35	58
/hometurf/	27	7	54
/claude-seo/	26	1	51
/geo-observatory/	26	0	44
/perplexica-self-hosted-guide/	25	6	40
/geo-field-manual/	22	5	59
/driftlights/	21	0	37
/psychedelic-vr-visual-effects-meta-quest/	21	3	30
/n8n-self-hosted-guide/	20	4	40
/geo-ai-citation/	20	2	43

What drives the top pages

The /claude-fable-5-issues-fixes/ page is a case study in timely content. It covers known issues and fixes for a specific new model, a topic where real users query ChatGPT and get referred to specific sources. 834 of its 893 AI fetches came from ChatGPT user-fetch. The page did not exist a month earlier; this is what the GEO literature calls "freshness-sensitive" citation in action.

/agent-zero/ and /dify-self-hosted-guide/ represent a different pattern: high-intent install guides where someone asking "how do I set up Agent Zero" or "how do I install Dify" gets pointed here. These pages have durable citation appeal because the questions are stable, not time-sensitive.

/geo-field-manual/ and /geo-ai-citation/ show the opposite pattern: only 5 of 22 and 2 of 20 AI fetches, respectively, were live-user fetches, so most of their AI traffic came from crawlers rather than real-time retrieval. This suggests they are being incorporated into model training data rather than cited in real time. This is the long-game payoff: future model versions may reflect these pages' framing of GEO concepts.
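The contrast between these page types is easiest to see as a live-user share of AI fetches. The sketch below uses a handful of rows from the top-pages table; `PAGES` and `live_user_share` are illustrative names.

```python
# (page, AI fetches, live-user fetches) for selected rows of the top-pages table.
PAGES = [
    ("/claude-fable-5-issues-fixes/", 893, 859),
    ("/agent-zero/", 328, 308),
    ("/geo-field-manual/", 22, 5),
    ("/geo-ai-citation/", 20, 2),
    ("/geo-observatory/", 26, 0),
]


def live_user_share(ai_fetches, live_user):
    """Live-user fetches as a percentage of AI fetches; None if there were none."""
    if ai_fetches == 0:
        return None
    return round(100 * live_user / ai_fetches, 1)


for page, ai, live in PAGES:
    print(f"{page}: {live_user_share(ai, live)}% live-user")
```

On these rows, the timely and install-guide pages sit above 90% live-user, while the GEO pages sit at 23% or below.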

Spoofed bots and scanner noise

84,975 scanner and noise requests were filtered from these counts entirely. On top of that, 1,003 requests claimed to be known AI bots (using their User-Agent strings) but failed rDNS verification, meaning the IP did not resolve to the expected network. These are logged as spoofed and excluded. The verification rate among bots with a valid rDNS sample was 85.1% (2,163 verified out of 2,541 sampled).

Spoofed bots are common. Any request claiming to be Googlebot from a Hetzner VPS, or Bingbot from a random residential IP, fails immediately. Because a User-Agent string can be set to anything, the rDNS check is what reliably separates real crawlers from noise in this pipeline.
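A minimal sketch of such a check follows. It is not the Observatory's code, and it adds one step the report does not describe: after the reverse lookup it resolves the hostname forward again and requires the original IP to appear in the result, which is the procedure Google documents for verifying Googlebot. The function names are hypothetical, and the two resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List


def hostname_matches(hostname: str, allowed_suffixes: Iterable[str]) -> bool:
    """True if hostname equals, or is a subdomain of, an allowed suffix."""
    host = hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)


def reverse_lookup(ip: str) -> str:
    """PTR lookup; raises OSError (socket.herror) when no record exists."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Resolve a hostname to its addresses; raises OSError on failure."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def verify_bot_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve ip, check the hostname suffix, then confirm forward."""
    suffixes = list(allowed_suffixes)
    try:
        hostname = reverse(ip)
    except OSError:
        return False
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

For example, `verify_bot_ip(ip, ["googlebot.com", "google.com"])` would apply the Googlebot rule quoted in this report. The suffix comparison requires a dot boundary, so a hostname such as `evilgooglebot.com` does not pass as `googlebot.com`.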

Methodology

Data source: Apache combined_cf log format. Cloudflare passes real visitor IPs via CF-Connecting-IP; Apache logs this directly. The GEO Observatory pipeline runs daily at 06:40 UTC, processing the previous 14 days of logs. Bot classification uses User-Agent string matching against a curated bot list, then rDNS verification for bots where a known hostname pattern exists. Verification count columns in the table above show (confirmed / sampled): bots with "0 / 0" were not sampled in this window. ASN lookup via ip-api.com. All data is from server logs, not from any third-party analytics platform.
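For readers who want to reproduce the first step, here is a rough sketch of parsing one log line. The exact layout of the `combined_cf` format is not given in this report, so the sketch assumes it is Apache's standard combined format with the CF-Connecting-IP value written in the client-address position. The regular expression, the `parse_line` function, and the sample line are all illustrative.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout: standard Apache "combined" fields, client IP first.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache timestamps look like 20/Jun/2026:06:40:00 +0000.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up sample line using a documentation-range IP address.
SAMPLE = (
    '203.0.113.7 - - [20/Jun/2026:06:40:00 +0000] '
    '"GET /agent-zero/ HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.1)"'
)
record = parse_line(SAMPLE)
print(record["ip"], record["path"], record["status"])
```

Each parsed record carries the fields the later steps need: the IP for rDNS and ASN checks, the User-Agent for classification, and the path for the per-page counts.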

See the live version of this data at GEO Observatory (updates daily). Source code and methodology details in Site as AI Infrastructure. Questions about this report: see GEO Answers.
