# Production agent docs patterns — Vercel, Cloudflare, HumanLayer, Anthropic
## Headline finding (Vercel eval, Jan 2026)

Vercel ran a formal eval measuring agent success rates across documentation strategies on hardened Build/Lint/Test workflows:
| Strategy | Pass rate |
|---|---|
| Baseline (no AGENTS.md) | 53% |
| Skills, default loaded | 53% (no improvement) |
| Skills with explicit invocation instructions | 79% |
| AGENTS.md with embedded 8 KB compressed docs index | 100% |
Vercel’s quote on why: “Prefer retrieval-led reasoning over pre-training-led reasoning.” Passive context (facts the agent reads without deciding to retrieve) beats active retrieval steps where the agent might mis-route.
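A sketch of what the winning strategy, an index embedded directly in AGENTS.md rather than linked out, might look like. The paths and one-line summaries below are illustrative, not Vercel's actual index:

```markdown
## Docs index (compressed, ~8 KB budget, embedded — no retrieval step)
- build: docs/build.md — turbo pipeline, remote cache, env vars read at build time
- testing: docs/testing.md — vitest; integration tests need a deployed preview
- cli: docs/cli.md — command table, flag conventions, telemetry opt-out
```

The point of the shape: the agent reads the facts passively on every turn instead of deciding whether to fetch them, which is the decision point the eval showed gets mis-routed.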
## Production AGENTS.md shapes

### Vercel (vercel/vercel/AGENTS.md)

- ~140 lines, 11 sections. Not aspirational prose — a rulebook.
- Sections: Repository Structure, Essential Commands, Changesets (with mandatory rules), Code Style, Testing Patterns, Package Development, Runtime Packages, CLI Development, Common Pitfalls.
- Ends with a numbered “Common Pitfalls” list — the literal anti-patterns agents hit in this repo.
### Cloudflare (cloudflare/cloudflare-docs/AGENTS.md)

- 733 lines — an outlier driven by MDX rendering quirks.
- A verbatim “MDX gotchas — the #1 cause of build failures” table mapping `{`, `}`, `<`, `>` to specific fixes.
- A 12-item “Common mistakes to avoid” numbered list.
- CI-vs-local split: “`npm run build` will time out in CI environments… use `npm run check` and linters only — do not run a full build.”
### Vercel Labs (vercel-labs/agent-skills/AGENTS.md)

- Compiled, not hand-written. Source is individual rule files in a repo; CI composes AGENTS.md.
- Hallucinated facts fail `npm run check`, not human review.
## CLAUDE.md specifics (HumanLayer, followed by Anthropic internal)

- ≤ 60 lines. Claude Code’s system prompt already ships ~50 instructions; frontier models track ~150–200 reliably.
- Everything else goes in `agent_docs/*.md`, loaded on demand.
- Claude Code auto-reads CLAUDE.md + `.claude/skills/*/SKILL.md`. It does NOT auto-read `llms.txt`, `docs/index.md`, or frontmatter `queries:` fields — those are inert to the CLI.
- Quote: “Never send an LLM to do a linter’s job.” Don’t put code-style rules in CLAUDE.md.
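A minimal sketch of the CLAUDE.md shape these rules imply. The file names and rules are illustrative, not HumanLayer's actual file:

```markdown
# CLAUDE.md — keep under ~60 lines; the system prompt already carries ~50 instructions

## Read on demand (do not inline these)
- Architecture: agent_docs/architecture.md
- Release process: agent_docs/releases.md

## Hard rules
- Run the test suite before claiming a task is done.
- Never edit generated files by hand.

<!-- Code-style rules live in the linter config, not here. -->
```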
## What works (evidence-based)

- Embedded docs index in AGENTS.md, not link-out. Skills-style “go fetch X” loses because it adds a decision point agents frequently mis-route.
- Numbered “Common Pitfalls” lists at the end — both Vercel and Cloudflare do this; it’s the section agents hit when blocked.
- Split CI-vs-local validation matrices.
- Schema-validated YAML frontmatter (Cloudflare’s `pcx_content_type` enum, tag allowlist in `src/schemas/tags.ts`) — build fails on drift, so drift is prevented by the compiler, not by goodwill.
- Two-tier lint in CI — Vale + markdownlint + lychee (Datadog, GitLab, and Fern have all published this stack).
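A sketch of build-time frontmatter validation in the spirit of Cloudflare's `pcx_content_type` enum and tag allowlist. The enum values and tags below are illustrative stand-ins, not Cloudflare's real schema, and the parser handles only the simple `key: value` case:

```python
import re

PCX_CONTENT_TYPES = {"concept", "how-to", "reference", "tutorial"}  # illustrative enum
TAG_ALLOWLIST = {"workers", "pages", "dns"}                         # illustrative allowlist

def parse_frontmatter(text: str) -> dict:
    """Parse a simple `key: value` frontmatter block between --- fences."""
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        raise ValueError("missing frontmatter block")
    fields = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def validate(doc: str) -> list[str]:
    """Return schema violations; an empty list means the doc passes the build."""
    errors = []
    fm = parse_frontmatter(doc)
    if fm.get("pcx_content_type") not in PCX_CONTENT_TYPES:
        errors.append(f"invalid pcx_content_type: {fm.get('pcx_content_type')!r}")
    tags = [t.strip() for t in fm.get("tags", "").strip("[]").split(",") if t.strip()]
    for tag in tags:
        if tag not in TAG_ALLOWLIST:
            errors.append(f"tag not in allowlist: {tag!r}")
    return errors

doc = """---
pcx_content_type: how-to
tags: [workers, dns]
---
# Deploy a Worker
"""
errs = validate(doc)  # empty list: this doc would pass
```

Wired into CI as a hard failure, this is what makes drift a compile error rather than a review comment.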
## What’s vestigial (evidence-based)

- `llmstxt.org` — no production rig in the sample adopted it. AGENTS.md won.
- Code-style rules in CLAUDE.md — “Never send an LLM to do a linter’s job” (HumanLayer).
- Auto-generated `/init` CLAUDE.md — unanimously treated as a starting draft, not a keeper.
- OpenAPI specs as general agent docs — useful for tool surfaces, overkill for project conventions.
## Failure modes documented elsewhere

- Silent rate-limit cascades → hallucinated gap-filling. Petrenko’s 16-agent refactor: “2 of 9 interview agents hit API rate limits and failed silently.” The dashecorp rig’s 529-overloaded rule addresses part of this; extend it: any agent that skipped a doc update must write an explicit “skipped: reason” line.
- Schema drift becoming canon. Industry data: ~7 engineer-days/month lost. Cloudflare prevents this by making an invalid `pcx_content_type` or tag hard-fail the build.
- Doc index bloat (“context rot”). Past ~8 KB, agent success rate measurably regresses (Vercel’s data, Morph’s research). Surge HQ: 693 lines of hallucinations from uncompressed context.
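A sketch of the "explicit skip" rule proposed above: after a multi-agent run, every agent must have either reported its doc update or emitted a `skipped: reason` line, and anything else is flagged as a silent failure. The agent names and log format here are hypothetical:

```python
def audit_run(agent_logs: dict[str, str]) -> list[str]:
    """Return agents that neither updated their doc nor stated why they skipped it.

    agent_logs maps agent name -> the text that agent emitted. An agent is
    compliant if any of its lines starts with `updated:` or `skipped:`.
    """
    silent = []
    for agent, log in agent_logs.items():
        lines = log.splitlines()
        if not any(l.startswith(("updated:", "skipped:")) for l in lines):
            silent.append(agent)  # failed silently: no update, no stated reason
    return silent

logs = {
    "interview-3": "updated: docs/api.md",
    "interview-7": "HTTP 429 Too Many Requests",           # rate-limited, said nothing
    "interview-9": "skipped: rate limited after 3 retries", # explicit skip is fine
}
offenders = audit_run(logs)  # only the silent failure is flagged
```

Run as a post-job check, this turns the Petrenko-style silent failure into a hard error instead of a gap the next agent hallucinates over.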
## Three concrete recommendations beyond the public spec docs

1. Compile, don’t hand-write, AGENTS.md. Vercel’s pattern: canonical facts in `facts/*.yaml` with JSON-Schema validation, compiled into AGENTS.md on each PR. Hallucinated facts fail `npm run check`.
2. Split into three files with hard size budgets. AGENTS.md ≤ 150 lines; CLAUDE.md ≤ 60 lines; `docs/agent-runbooks/*.md` for task-specific material, referenced by `file:line`. Enforce with CI. Memory MCP stays separate.
3. Two-tier lint: deterministic first, LLM-as-judge second at temperature=0. markdownlint + Vale + lychee on every commit; a scheduled job runs a different Claude instance over the docs with a fixed rubric (contradictions, stale claims, orphans, factual drift against `git log` since the last lint). Evidently/Arize guidance: “strict separation between generation and evaluation.”
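A minimal sketch of recommendations 1 and 2 combined: compile AGENTS.md from canonical fact records and hard-fail when the size budget is exceeded. The 150-line budget comes from the text; the fact records are plain dicts here instead of `facts/*.yaml` with JSON-Schema, to keep the sketch stdlib-only:

```python
MAX_LINES = 150  # hard size budget from recommendation 2, enforced in CI

def compile_agents_md(facts: list[dict]) -> str:
    """Render fact records into AGENTS.md sections, enforcing the line budget."""
    out = ["# AGENTS.md (generated — edit facts/, not this file)", ""]
    for fact in facts:
        out.append(f"## {fact['section']}")
        out.extend(f"- {rule}" for rule in fact["rules"])
        out.append("")
    if len(out) > MAX_LINES:
        # A budget overrun is a build failure, not a review comment.
        raise SystemExit(f"AGENTS.md budget exceeded: {len(out)} > {MAX_LINES} lines")
    return "\n".join(out)

# Illustrative fact records, not Vercel's actual facts/ content.
facts = [
    {"section": "Essential Commands", "rules": ["pnpm build", "pnpm vitest run"]},
    {"section": "Common Pitfalls", "rules": ["Never edit generated files by hand"]},
]
agents_md = compile_agents_md(facts)
```

Because the output is generated, a hallucinated fact can only enter via `facts/`, where schema validation catches it; the compiled file itself never accepts hand edits.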