How to Write an llms.txt for Your Documentation

What does an llms.txt file actually look like, and what should you put in it?

An llms.txt is a plain markdown file at the root of your domain (yoursite.com/llms.txt) that gives large language models a structured overview of your content. It has an H1 title, a short blockquote summary, and lists of links to your most important pages grouped under H2 sections. Think of it as a curated table of contents written for an LLM that has a limited context window and no use for your marketing nav. It exists because crawling a full site is expensive and lossy (navigation, scripts, and layout markup all get in the way), and because assistants such as ChatGPT, Claude, and Perplexity increasingly answer questions straight from docs sites, so you want them reading the right pages.

The spec, briefly

The llms.txt format was proposed by Jeremy Howard of Answer.AI in September 2024. It is deliberately simple: it is markdown, so humans and models can both read it, and its fixed structure means ordinary code can parse it too:

An H1 with the site or project name (the only required element)
A blockquote giving a one- or two-sentence summary
Optional paragraphs or lists of additional context (any markdown except headings)
Any number of H2 sections, each containing a markdown list of links in the format [Title](url): optional description

A common companion convention, llms-full.txt, inlines the full text of key pages instead of just linking to them, for cases where the LLM should have everything at once. Start with the short version. Most sites never need the full one.
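If you do need the full variant, it is essentially concatenation. There is no single agreed layout for llms-full.txt, so the one below (an H2 per page followed by its source URL and body) is an assumption, and `Page` and `build_llms_full` are hypothetical names. It also assumes you already have each page's body as markdown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    markdown: str  # the page body, already converted to markdown


def build_llms_full(project: str, summary: str, pages: list[Page]) -> str:
    """Inline every page's full text under its own H2, in the order given."""
    parts = [f"# {project}", "", f"> {summary}", ""]
    for page in pages:
        parts += [f"## {page.title}", "", f"Source: {page.url}", "", page.markdown.strip(), ""]
    return "\n".join(parts)
```

One caveat: headings inside the page bodies are copied as-is, so you may want to demote them a level or two so they do not compete with the per-page H2s.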

A complete example

Here is what a working llms.txt looks like for a hypothetical SaaS called Acme:

```markdown
# Acme

> Acme is a transactional email API for developers. This file links the pages
> an AI assistant should read to answer questions about the product, the API,
> integrations, and pricing.

Acme sends emails over SMTP or HTTPS, with SDKs for Node, Python, Go, Ruby,
and PHP. The free tier covers 3,000 emails per month on a single verified
domain. All pricing and limits below reflect 2026.

## Docs

- [Quickstart](https://acme.dev/docs/quickstart): Send your first email in five minutes
- [Authentication](https://acme.dev/docs/auth): API keys, signing, and rotation
- [Sending emails](https://acme.dev/docs/send): The core POST /emails endpoint
- [Webhooks](https://acme.dev/docs/webhooks): Delivery, bounce, and open events
- [Domain verification](https://acme.dev/docs/domains): SPF, DKIM, DMARC setup

## API reference

- [REST API](https://acme.dev/api): Full endpoint reference with request and response schemas
- [SDKs](https://acme.dev/sdks): Official libraries and install instructions
- [Rate limits](https://acme.dev/docs/limits): Per-key, per-domain, and burst limits

## Guides

- [Migrating from SendGrid](https://acme.dev/guides/migrate-sendgrid)
- [Marketing vs transactional](https://acme.dev/guides/marketing-vs-transactional)
- [Handling bounces](https://acme.dev/guides/bounces)

## Optional

- [Changelog](https://acme.dev/changelog)
- [Status page](https://status.acme.dev)
- [Pricing](https://acme.dev/pricing)
```

That is the entire file. Ninety seconds to read, zero ambiguity about what Acme does and where to find the details.
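If you keep the page list as data (a dict, a YAML file, a CMS query), you can render the file from it so the structure never drifts. This is a minimal sketch with a hypothetical `render_llms_txt` function; the section order is simply the dict's insertion order, so put "Optional" last yourself.

```python
from __future__ import annotations

Link = tuple[str, str, str]  # (title, url, optional description; "" for none)


def render_llms_txt(title: str, summary: str, details: str, sections: dict[str, list[Link]]) -> str:
    lines = [f"# {title}", "", f"> {summary}", ""]
    if details:
        lines += [details, ""]
    for name, links in sections.items():
        lines += [f"## {name}", ""]
        for link_title, url, desc in links:
            entry = f"- [{link_title}]({url})"
            # The ": description" suffix is optional in the format.
            lines.append(f"{entry}: {desc}" if desc else entry)
        lines.append("")
    return "\n".join(lines)
```

For example, `render_llms_txt("Acme", "Acme is a transactional email API for developers.", "", {"Docs": [...], "Optional": [...]})` reproduces the skeleton of the Acme file above.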

What actually belongs in it

The common mistake is to dump every URL from your sitemap into the file. Don't. An llms.txt is editorial, not exhaustive. The goal is to help an LLM answer the ten or twenty questions people actually ask about your product, which means you prioritize the pages that contain those answers.

Call this the 80/20 llms.txt: cover the top 20% of your docs that answer 80% of user questions. For most SaaS products, that is the quickstart, the authentication page, the core endpoint or feature reference, the pricing page, and a handful of guides that cover the common integration paths. Leave changelog entries, blog posts, marketing landing pages, and legal copy out of the main sections, or put them under an "Optional" heading that LLMs are allowed to skip when context is tight.
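If you have any signal about which pages answer which questions, the 80/20 cut can be made mechanically: sort pages by how many questions they answer and stop once you have covered 80% of them. The sketch assumes a hypothetical `question_hits` mapping (page URL to a count), which you might build from support tickets, docs search logs, or questions users ask your own assistant.

```python
from __future__ import annotations


def pages_for_coverage(question_hits: dict[str, int], target: float = 0.8) -> list[str]:
    """Most-hit pages first, stopping once they cover `target` of all question hits."""
    total = sum(question_hits.values())
    chosen: list[str] = []
    covered = 0
    for url, hits in sorted(question_hits.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered / total >= target:
            break
        chosen.append(url)
        covered += hits
    return chosen
```

Taking the highest counts first gives the fewest pages that reach the target, which is exactly the "top 20%" you want in the main sections; everything else is a candidate for Optional or for leaving out.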

Write the blockquote summary as the answer to "what is this, in one sentence?" Not the tagline from your homepage: a real description, with the shape of the product in it. "Acme is a transactional email API for developers" tells a model more than "Acme: email, reimagined" ever will.

Where the file goes

At the root of your domain: yoursite.com/llms.txt. The spec technically allows a subpath, but the root is the only location a tool can guess without being told, so skip /docs/llms.txt and /public/llms.txt, and never put the file behind auth. The whole point is that a model or a scraper can find it with a single GET request to a predictable location, the same way crawlers find robots.txt. If your docs live on a subdomain like docs.yoursite.com, put a file there too, because a crawler that lands on your docs subdomain will check for it there.
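A quick way to check placement is to derive the expected URL from any page on the host and fetch it anonymously. This is a small sketch using only the standard library; `llms_txt_url` and `llms_txt_is_public` are hypothetical helper names, and "the body starts with an H1" is used as a rough stand-in for "this is really the llms.txt and not a login page".

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """The root llms.txt for whichever host (domain or subdomain) serves page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def llms_txt_is_public(page_url: str, timeout: float = 10.0) -> bool:
    """True if an anonymous GET returns 200 and the body starts with a markdown H1."""
    try:
        with urllib.request.urlopen(llms_txt_url(page_url), timeout=timeout) as resp:
            head = resp.read(2048).decode("utf-8", errors="replace")
            # A redirect to an HTML login page returns 200 but fails this check.
            return resp.status == 200 and head.lstrip().startswith("# ")
    except OSError:
        # HTTP errors (401, 403, 404), DNS failures, and timeouts are all OSError subclasses.
        return False
```

Running `llms_txt_is_public("https://docs.yoursite.com/guides/anything")` checks the subdomain copy, not the apex domain, which is the case people most often forget.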

This is a separate file from robots.txt and sitemap.xml, which serve different purposes. robots.txt tells crawlers which paths they may and may not crawl. A sitemap lists every URL for indexing. llms.txt is curated context for generation. You want all three, and they should be part of your documentation strategy, not an afterthought.

Common mistakes to avoid

A few patterns that make the file less useful:

Linking to a login-walled page. If the LLM can't fetch it, the link is noise. Public pages only.
Generic titles. "Documentation" as a link title is worthless. "Sending emails: the POST /emails endpoint" tells the model what it will get.
Too many links. I've seen llms.txt files with 400 entries. That defeats the purpose. Aim for twenty to forty.
Stale URLs. The file rots the moment you restructure your docs. It needs to regenerate on publish, not once a quarter by hand.
Copy-pasting the nav. Your top nav is built for humans skimming on a phone. The llms.txt audience is a model reading the whole thing in one pass. Reorder accordingly.
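Several of these mistakes can be caught in CI before the file ships. The linter below is a sketch, not an established tool: `lint_llms_txt` is a hypothetical name, the `GENERIC_TITLES` set and the 40-link ceiling are assumptions taken from the advice above, and `published` is an optional set of live URLs (for example, from your current sitemap) used to catch stale links. Login walls still need a live fetch to detect.

```python
from __future__ import annotations

import re

LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
# Hypothetical list of titles that tell a model nothing; extend it for your own docs.
GENERIC_TITLES = {"documentation", "docs", "here", "link", "read more", "click here"}


def lint_llms_txt(text: str, max_links: int = 40, published: set[str] | None = None) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: list[str] = []
    links = LINK_RE.findall(text)
    if len(links) > max_links:
        problems.append(f"{len(links)} links; aim for {max_links} or fewer")
    seen: set[str] = set()
    for title, url in links:
        if title.strip().lower() in GENERIC_TITLES:
            problems.append(f"generic link title: {title!r}")
        if url in seen:
            problems.append(f"duplicate URL: {url}")
        seen.add(url)
        if published is not None and url not in published:
            problems.append(f"not in current sitemap: {url}")
    return problems
```

Failing the build when the list is non-empty turns "the file rots" into a red check on the pull request that restructures the docs.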

How to automate it

Writing llms.txt by hand is fine for the first version. Keeping it current by hand is where teams give up, which is why generating it should be an automated step in your publish pipeline, not a manual chore.

At Docsio, every published site gets an auto-generated llms.txt at the root on every publish. The generator uses Gemini 2.5 Flash through the AI Gateway with a 15-second timeout, and a deterministic fallback kicks in if the model call fails, so publishes never block on it. We learned two things worth sharing from running this in production:

First, the model-written summary is usually better than the one a developer would write under deadline, because the model has read the whole docs set and picks the one sentence that describes the product's shape. Second, the page list needs filtering rules that a human sets. The model cheerfully includes changelog entries, marketing pages, and blog posts unless you bias the prompt toward docs and reference pages only. Our generator is prompted to exclude everything under /blog/, /changelog/, and /pricing from the main sections, with changelog and pricing relegated to Optional.
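The "15-second timeout plus deterministic fallback" pattern described above is worth copying even if your model and gateway differ. This sketch does not show Docsio's actual integration: `call_model` is a hypothetical zero-argument callable standing in for whatever client you use, and `fallback` is whatever deterministic summary you can build without a model (for example, the first sentence of your docs home page).

```python
from __future__ import annotations

import concurrent.futures
from typing import Callable


def summary_with_fallback(call_model: Callable[[], str], fallback: str,
                          timeout_s: float = 15.0) -> str:
    """Return the model's summary, or `fallback` on timeout, error, or empty output."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_model)
    try:
        text = future.result(timeout=timeout_s)
        return text.strip() or fallback
    except Exception:
        # Timeouts, network errors, and malformed responses must never block a publish.
        return fallback
    finally:
        # Return right away; a hung call keeps running in its worker thread
        # (Python cannot kill threads), but nothing here waits for it.
        pool.shutdown(wait=False)
```

Note that the interpreter still joins executor threads at exit, so a truly hung call can delay a short-lived publish process from exiting. Setting a timeout on the HTTP client inside `call_model` as well closes that gap.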

If you're hand-rolling your own generator, the rough algorithm is: walk your sitemap, filter out the URL patterns that are not docs, group the rest by category (quickstart, reference, guides, integrations, optional), and pass the list plus a three-sentence product description to a small model with a template prompt. Cache the output. Regenerate on publish, not on request.
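Here is that algorithm as a compact sketch. Everything specific in it is an assumption: the `SECTIONS` mapping presumes a URL layout like Acme's (blog URLs are dropped, changelog and pricing go to Optional, mirroring the rules above, but implemented as a deterministic filter rather than a prompt instruction), the prompt wording is illustrative, and `call_model` is a hypothetical callable that takes a prompt and returns text. The in-memory cache is keyed on the inputs, so output only changes when a publish changes the sitemap or description.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical URL layout: first path segment -> llms.txt section. Unmapped
# segments (blog, marketing, legal) are excluded entirely.
SECTIONS = {"docs": "Docs", "api": "API reference", "sdks": "API reference",
            "guides": "Guides", "changelog": "Optional", "pricing": "Optional"}
SECTION_ORDER = ["Docs", "API reference", "Guides", "Optional"]  # Optional goes last

PROMPT = """Write an llms.txt file in the llmstxt.org format for this product:
{description}

Use exactly these H2 sections and URLs, adding a short title and description per link:
{sections}"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> from a standard sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def group_urls(urls: list[str]) -> dict[str, list[str]]:
    """Drop non-docs URLs and bucket the rest by section, in SECTION_ORDER."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        section = SECTIONS.get(segments[0]) if segments else None
        if section is not None:
            groups.setdefault(section, []).append(url)
    return {name: groups[name] for name in SECTION_ORDER if name in groups}


def build_prompt(description: str, groups: dict[str, list[str]]) -> str:
    sections = "\n\n".join(
        f"## {name}\n" + "\n".join(f"- {url}" for url in urls) for name, urls in groups.items()
    )
    return PROMPT.format(description=description, sections=sections)


_cache: dict[str, str] = {}


def generate_llms_txt(sitemap_xml: str, description: str,
                      call_model: Callable[[str], str]) -> str:
    """Call the model only when the inputs change, i.e. on publish, not per request."""
    key = hashlib.sha256(f"{description}\n{sitemap_xml}".encode("utf-8")).hexdigest()
    if key not in _cache:
        prompt = build_prompt(description, group_urls(sitemap_urls(sitemap_xml)))
        _cache[key] = call_model(prompt)
    return _cache[key]
```

In a real pipeline you would write the result to the static output directory as llms.txt rather than hold it in memory, and wrap the model call in a timeout with a fallback so the publish never waits on it.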

What to do next

Start with a fifteen-minute draft. Open a markdown file, write the H1 and blockquote, list your fifteen most-read docs pages, and drop it at the root of your domain. Ship it, then improve it when you have more data on what LLMs are actually asking about your product. You'll get more value from a rough version live today than a perfect one live next month.

If your docs site already runs on a platform that auto-generates llms.txt on publish, let it. If not, add the generation step to your deploy pipeline and move on. The file is boring infrastructure, and boring infrastructure is the kind you want to set up once and forget.