veda.ng
Open-Source Project

AI Discovery Standards

Every file, protocol, and technique used to make websites discoverable by AI systems, search engines, and autonomous agents. One command to set up everything.

View on GitHub
npx ai-discovery-standards

Why AI discovery matters now

The way people find information is changing. Google Search is no longer the only gateway to content. Perplexity, ChatGPT Search, Google AI Overviews, and Claude are answering questions directly, pulling from websites and citing sources inline. If your content is not structured for these systems, you are invisible to a growing share of how people discover information.

Traditional SEO optimizes for one system: Google's ranking algorithm. AI discovery optimizes for three simultaneously: traditional search engines (SEO), AI answer engines that cite sources (AEO), and generative models that recommend content (GEO). Each requires different signals, different file formats, and different content structures.

Most websites today have zero AI discovery infrastructure. They have a robots.txt that was last updated in 2019, no llms.txt, no structured AI permissions, and no agent-readable metadata. This is the equivalent of not having a sitemap in 2010. The gap between sites with AI discovery files and sites without will widen as AI search traffic grows.

This project provides every file you need to close that gap. It is not a framework, a library, or a SaaS product. It is a set of static files that any website can deploy in under five minutes.

What it does

Run one command and generate 13 AI discovery files for any web project. The CLI tool auto-detects your public/ or static/ directory, asks for your site details, and creates every file you need. Existing files are never overwritten.

One-command setup

npx ai-discovery-standards generates all 13 files

25+ AI crawlers

Complete robots.txt with every known AI bot

AEO & GEO guides

Answer Engine and Generative Engine optimization

Claude Code skill

Slash command for AI-assisted setup

Discovery Files

Static files you place on your web server to communicate with AI crawlers and agents. Each file serves a specific purpose in the discovery stack.

robots.txt: Crawler access policies for 25+ AI bots
llms.txt: Curated content summary for LLMs
llms-full.txt: Full-text content for AI ingestion
ai.txt: AI usage permissions (training, citation, indexing)
ai.json: Structured content map for AI agents
brand.txt: Brand governance rules for AI systems
ai-plugin.json: ChatGPT plugin manifest
agents.json: A2A agent capability advertisement
security.txt: Vulnerability reporting (RFC 9116)
humans.txt: Team credits and technologies
sitemap.xml: URL index with metadata
manifest.json: PWA metadata and icons
browserconfig.xml: Windows tile configuration

AEO vs GEO: how to optimize for both

Answer Engine Optimization (AEO) is about getting your content selected as the direct answer when someone asks Perplexity, ChatGPT Search, or Google AI Overviews a question. The key is structure: use H2 headings that are literal questions, follow each with a concise 2-3 sentence answer, then provide supporting detail below. AI answer engines preferentially extract from this question-answer pattern because it maps cleanly to user queries.
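A sketch of that pattern in HTML (the heading and answer text here are invented for illustration):

```html
<!-- H2 is a literal question a user might type -->
<h2>What is llms.txt?</h2>

<!-- Concise 2-3 sentence answer directly below the heading -->
<p>
  llms.txt is a Markdown file served at /llms.txt that gives language models
  a curated summary of a site. It lists the site's purpose and links to its
  most important pages.
</p>

<!-- Supporting detail follows, under subordinate headings -->
<h3>How answer engines use it</h3>
<p>Answer engines can quote the paragraph above verbatim as the answer.</p>
```

The question heading plus self-contained answer paragraph gives an answer engine a clean, quotable span to extract.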

Generative Engine Optimization (GEO) targets a different outcome: being cited as a source across AI platforms. When Claude or ChatGPT recommends a tool, a framework, or a company, what determines which ones get mentioned? The answer is authority signals: structured data (JSON-LD), consistent terminology across pages, clear authorship attribution, and machine-readable content summaries like llms.txt.

Practical implementation:

  • Restructure your top 10 pages with question-format H2 headings and concise answer paragraphs
  • Add FAQ schema (JSON-LD) to every page that answers common questions
  • Publish an llms.txt with a clear, factual description of your site and its content
  • Add Organization and Person schema to establish entity authority
  • Use consistent, specific terminology rather than vague descriptions across all pages
  • Ensure every page has a clear, quotable summary in the first paragraph
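The FAQ schema item above can be implemented with a schema.org FAQPage block embedded in the page. A minimal JSON-LD sketch, with placeholder question and answer text:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is llms.txt?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A Markdown file at /llms.txt that gives LLMs a curated summary of your site."
    }
  }]
}
</script>
```

Each question on the page gets one entry in `mainEntity`; the visible page copy should match the schema text.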

The companies that implement both AEO and GEO now will compound their visibility as AI search traffic grows. Sites without these signals will not lose Google traffic immediately, but they will miss the fastest-growing discovery channel of 2026.

AI Crawler Registry

All known AI crawler user-agent strings as of April 2026, organized by company. Your robots.txt should address each of these explicitly.

OpenAI

GPTBot, OAI-SearchBot, ChatGPT-User

Anthropic

ClaudeBot, Claude-SearchBot, Claude-User

Google

Googlebot, Google-Extended, GoogleOther

Perplexity

PerplexityBot, Perplexity-User

Meta

meta-externalagent, meta-externalfetcher

Apple

Applebot, Applebot-Extended

Amazon

Amazonbot

ByteDance

Bytespider, TikTokSpider

Others

CCBot, cohere-ai, CopilotBot, YouBot, Diffbot

robots.txt strategy for AI crawlers

The critical distinction in AI crawler management is between search bots and training bots. These serve fundamentally different purposes, and your robots.txt policy should treat them differently.

Search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) crawl your site to include your content in AI-generated answers. When someone asks "what is the best tool for X?" and your site has the answer, these bots are what make your content citable. Blocking them removes you from AI search results entirely.

Training bots (GPTBot, ClaudeBot, Google-Extended) crawl your site to ingest content into model training data. Your content becomes part of the model's knowledge but is not attributed to you. Some publishers block these to retain control over their content. Others allow them for broader influence.

Recommended strategy for most businesses: allow all search bots (you want citations), selectively allow or block training bots based on your content strategy, and always allow Googlebot (traditional search remains the largest traffic source for most sites).
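That strategy might translate into a robots.txt like the following sketch, which blocks the training bots; adjust those rules to your own content policy:

```txt
# Search bots: allow (these make your content citable in AI answers)
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Training bots: block or allow per your content strategy
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Traditional search: always allow
User-agent: Googlebot
Allow: /
```

Note that robots.txt is advisory: compliant crawlers honor it, but it is not an access control mechanism.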

FAQ

What is llms.txt?
A Markdown file at /llms.txt that gives LLMs a curated summary of your site. It includes a title, a one-paragraph description, and organized links to your key pages. Created by Jeremy Howard (Answer.AI) in 2024. Adopted by Anthropic, Stripe, Vercel, and Cloudflare.
What is the difference between AEO and GEO?
AEO (Answer Engine Optimization) targets question-answer extraction by AI systems like ChatGPT and Perplexity. GEO (Generative Engine Optimization) targets citation rate and "Share of AI Voice" across all AI platforms. AEO is about being the answer. GEO is about being the cited source.
Which AI crawlers should I allow?
Separate training bots (GPTBot, ClaudeBot, Google-Extended) from search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot). Blocking training bots prevents your content from being absorbed into model weights. Blocking search bots removes you from AI-generated answers entirely.
What is brand.txt?
A plain-text file that tells AI systems how to represent your brand: correct name capitalization, preferred terminology, prohibited terms, tone guidance, and competitor disambiguation. Reduces hallucinations about your brand identity.
What is ai.txt?
A plain-text file declaring what AI systems may do with your content: training, indexing, citation, or summarization. Works alongside robots.txt but with AI-specific granularity. Not yet standardized but gaining adoption.
Do these files replace structured data (JSON-LD)?
No. Discovery files and structured data serve different purposes. JSON-LD tells search engines and AI systems what type of content a page contains (Article, FAQ, Product). Discovery files tell AI systems what your site is about overall and how they may use it. You need both.
How often should I update llms.txt?
Update it whenever you add or remove major content sections, launch new products, or change your site structure. For most sites, a monthly review is sufficient. The file should reflect the current state of your site, not a historical archive.
Does this work with any web framework?
Yes. The CLI auto-detects public/ (Next.js, React, Vue), static/ (Hugo, Gatsby), and root directories. Files are plain text and JSON, framework-agnostic. They work with any web server that serves static files.

Get started

Run a single command to generate all 13 discovery files. The CLI auto-detects your project structure and walks you through the setup interactively. Existing files are never overwritten.

$ npx ai-discovery-standards

Works with Next.js, React, Vue, Hugo, Gatsby, and any static site. No dependencies to install.

Full documentation on GitHub