AI Discovery Standards
Every file, protocol, and technique used to make websites discoverable by AI systems, search engines, and autonomous agents. One command to set up everything.
Why AI discovery matters now
The way people find information is changing. Google Search is no longer the only gateway to content. Perplexity, ChatGPT Search, Google AI Overviews, and Claude answer questions directly, pulling from websites and citing sources inline. If your content is not structured for these systems, it is invisible across a growing share of the ways people discover information.
Traditional SEO optimizes for one system: Google's ranking algorithm. AI discovery optimizes for three simultaneously: traditional search engines (SEO), AI answer engines that cite sources (AEO), and generative models that recommend content (GEO). Each requires different signals, different file formats, and different content structures.
Most websites today have zero AI discovery infrastructure. They have a robots.txt that was last updated in 2019, no llms.txt, no structured AI permissions, and no agent-readable metadata. This is the equivalent of not having a sitemap in 2010. The gap between sites with AI discovery files and sites without will widen as AI search traffic grows.
This project provides every file you need to close that gap. It is not a framework, a library, or a SaaS product. It is a set of static files that any website can deploy in under five minutes.
What it does
Run one command and generate 13 AI discovery files for any web project. The CLI tool auto-detects your public/ or static/ directory, asks for your site details, and creates every file you need. Existing files are never overwritten.
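A minimal run, from the project root (no flags; setup is interactive):

```bash
# Runs the generator once via npx; it detects public/ or static/,
# asks for your site details, and skips any files that already exist.
npx ai-discovery-standards
```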
- One-command setup: `npx ai-discovery-standards` generates all 13 files
- 25+ AI crawlers: complete robots.txt with every known AI bot
- AEO & GEO guides: Answer Engine and Generative Engine optimization
- Claude Code skill: slash command for AI-assisted setup
Discovery Files
Static files you place on your web server to communicate with AI crawlers and agents. Each file serves a specific purpose in the discovery stack.
| File | Purpose |
|---|---|
| robots.txt | Crawler access policies for 25+ AI bots |
| llms.txt | Curated content summary for LLMs |
| llms-full.txt | Full-text content for AI ingestion |
| ai.txt | AI usage permissions (training, citation, indexing) |
| ai.json | Structured content map for AI agents |
| brand.txt | Brand governance rules for AI systems |
| ai-plugin.json | ChatGPT plugin manifest |
| agents.json | A2A agent capability advertisement |
| security.txt | Vulnerability reporting (RFC 9116) |
| humans.txt | Team credits and technologies |
| sitemap.xml | URL index with metadata |
| manifest.json | PWA metadata and icons |
| browserconfig.xml | Windows tile configuration |
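Of these, llms.txt follows an emerging public convention (llmstxt.org): an H1 title, a blockquote summary, then H2 sections of annotated links. A minimal sketch for a hypothetical site, with all names and URLs illustrative:

```markdown
# Example Docs

> Developer documentation for the hypothetical Example API:
> authentication, endpoints, and rate limits.

## Docs

- [Quickstart](https://example.com/docs/quickstart): authenticate and make a first request
- [API reference](https://example.com/docs/api): every endpoint with parameters and errors

## Optional

- [Changelog](https://example.com/changelog): release history
```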
AEO vs GEO: how to optimize for both
Answer Engine Optimization (AEO) is about getting your content selected as the direct answer when someone asks a question to Perplexity, ChatGPT Search, or Google AI Overviews. The key is structure: use H2 headings that are literal questions, follow each with a concise 2-3 sentence answer, then provide supporting detail below. AI answer engines preferentially extract from this question-answer pattern because it maps cleanly to user queries.
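For instance, a page built around this pattern might look like the sketch below; the question and answer text are illustrative, drawn from this page's own robots.txt guidance:

```markdown
## How do I block AI training bots?

Add Disallow rules for training crawlers such as GPTBot, ClaudeBot, and
Google-Extended to your robots.txt. Search crawlers like OAI-SearchBot can
stay allowed, so your content remains citable without entering training data.

### Details

Supporting explanation, caveats, and per-bot examples follow the concise answer.
```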
Generative Engine Optimization (GEO) targets a different outcome: being cited as a source across AI platforms. When Claude or ChatGPT recommends a tool, a framework, or a company, what determines which ones get mentioned? The answer is authority signals: structured data (JSON-LD), consistent terminology across pages, clear authorship attribution, and machine-readable content summaries like llms.txt.
Practical implementation:
- Restructure your top 10 pages with question-format H2 headings and concise answer paragraphs
- Add FAQ schema (JSON-LD) to every page that answers common questions (see the sketch after this list)
- Publish an llms.txt with a clear, factual description of your site and its content
- Add Organization and Person schema to establish entity authority
- Use consistent, specific terminology rather than vague descriptions across all pages
- Ensure every page has a clear, quotable summary in the first paragraph
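A minimal FAQPage sketch for the JSON-LD item above, using a question from this page's own FAQ (embed it in a script tag with type="application/ld+json"; the answer text is a paraphrase of this page):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is llms.txt?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "llms.txt is a curated, machine-readable summary of a site's content, placed at the site root so LLMs and AI answer engines can ingest it."
      }
    }
  ]
}
```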
The companies that implement both AEO and GEO now will compound their visibility as AI search traffic grows. Sites without these signals will not lose Google traffic immediately, but they will miss the fastest-growing discovery channel of 2026.
AI Crawler Registry
All known AI crawler user-agent strings as of April 2026, organized by company. Your robots.txt should address each of these explicitly.
OpenAI
GPTBot, OAI-SearchBot, ChatGPT-User

Anthropic
ClaudeBot, Claude-SearchBot, Claude-User

Google
Googlebot, Google-Extended, GoogleOther

Perplexity
PerplexityBot, Perplexity-User

Meta
meta-externalagent, meta-externalfetcher

Apple
Applebot, Applebot-Extended

Amazon
Amazonbot

ByteDance
Bytespider, TikTokSpider

Others
CCBot, cohere-ai, CopilotBot, YouBot, Diffbot

robots.txt strategy for AI crawlers
The critical distinction in AI crawler management is between search bots and training bots. These serve fundamentally different purposes, and your robots.txt policy should treat them differently.
Search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) crawl your site to include your content in AI-generated answers. When someone asks "what is the best tool for X?" and your site has the answer, these bots are what make your content citable. Blocking them removes you from AI search results entirely.
Training bots (GPTBot, ClaudeBot, Google-Extended) crawl your site to ingest content into model training data. Your content becomes part of the model's knowledge but is not attributed to you. Some publishers block these to retain control over their content. Others allow them for broader influence.
Recommended strategy for most businesses: allow all search bots (you want citations), selectively allow or block training bots based on your content strategy, and always allow Googlebot (traditional search remains the largest traffic source for most sites).
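A starting point that implements this split. Treat the training-bot blocks as a policy choice, not a default, and note that the generated robots.txt covers the full 25+ bot list:

```
# AI search bots: allow, so your content stays citable in AI answers
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# AI training bots: blocked here; allow them if you prefer model influence
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Traditional search: always allow
User-agent: Googlebot
Allow: /
```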
FAQ
- What is llms.txt?
- What is the difference between AEO and GEO?
- Which AI crawlers should I allow?
- What is brand.txt?
- What is ai.txt?
- Do these files replace structured data (JSON-LD)?
- How often should I update llms.txt?
- Does this work with any web framework?
Get started
Run a single command to generate all 13 discovery files. The CLI auto-detects your project structure and walks you through the setup interactively. Existing files are never overwritten.
Works with Next.js, React, Vue, Hugo, Gatsby, and any static site. No dependencies to install.
Full documentation on GitHub