
Production agent docs patterns — Vercel, Cloudflare, HumanLayer, Anthropic

Vercel ran a formal eval measuring agent success rates across documentation strategies on hardened Build/Lint/Test workflows:

| Strategy | Pass rate |
| --- | --- |
| Baseline (no AGENTS.md) | 53% |
| Skills, default loaded | 53% (no improvement) |
| Skills with explicit invocation instructions | 79% |
| AGENTS.md with embedded 8 KB compressed docs index | 100% |

Vercel’s explanation: “Prefer retrieval-led reasoning over pre-training-led reasoning.” The embedded index puts the retrieved facts directly in context: passive context (facts the agent reads without deciding to retrieve) beats active retrieval steps where the agent might mis-route.

Vercel’s own AGENTS.md

  • ~140 lines, 11 sections. Not aspirational prose — a rulebook.
  • Sections: Repository Structure, Essential Commands, Changesets (with mandatory rules), Code Style, Testing Patterns, Package Development, Runtime Packages, CLI Development, Common Pitfalls.
  • Ends with a numbered “Common Pitfalls” list — the literal anti-patterns agents hit in this repo.

Cloudflare (cloudflare/cloudflare-docs/AGENTS.md)

  • 733 lines — an outlier driven by MDX rendering quirks.
  • Verbatim “MDX gotchas — the #1 cause of build failures” table mapping {, }, <, > to specific fixes.
  • 12-item “Common mistakes to avoid” numbered list.
  • CI-vs-local split: “npm run build will time out in CI environments… use npm run check and linters only — do not run a full build.”
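The MDX-gotchas table lends itself to a mechanical pre-commit check. A minimal sketch, assuming HTML-entity replacements for the four characters the table maps ({, }, <, >) — the specific escape choices are placeholders, not Cloudflare’s exact fixes:

```python
# Hypothetical pre-commit helper in the spirit of Cloudflare's "MDX gotchas"
# table: bare {, }, <, > in prose are the #1 cause of MDX build failures,
# so escape them before the MDX compiler parses them as JSX.
MDX_ESCAPES = {
    "{": "&#123;",  # bare brace otherwise starts a JSX expression
    "}": "&#125;",
    "<": "&lt;",    # bare angle bracket otherwise starts a JSX tag
    ">": "&gt;",
}

def escape_mdx_prose(text: str) -> str:
    """Escape characters that MDX would otherwise parse as JSX."""
    return "".join(MDX_ESCAPES.get(ch, ch) for ch in text)
```

Run only on prose regions, not code fences, where the literal characters are intended.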

Vercel Labs (vercel-labs/agent-skills/AGENTS.md)

  • Compiled, not hand-written. Source is individual rule files in the repo; CI composes AGENTS.md from them.
  • Hallucinated facts fail npm run check, not human review.
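The compile step can be sketched in a few lines. The rules/*.md layout and the generated-file header are assumptions about the pattern, not the actual vercel-labs/agent-skills build script:

```python
# Minimal sketch of "compiled, not hand-written": individual rule files are
# the source of truth and CI regenerates AGENTS.md on every PR, so edits to
# the compiled file are overwritten and drift is structurally impossible.
from pathlib import Path

def compile_agents_md(rules_dir: str, out_path: str) -> int:
    parts = ["<!-- GENERATED - edit rules/*.md, not this file -->\n"]
    rule_files = sorted(Path(rules_dir).glob("*.md"))  # sorted = stable order
    for rule in rule_files:
        parts.append(rule.read_text().strip() + "\n")
    Path(out_path).write_text("\n".join(parts))
    return len(rule_files)
```

A CI job would follow this with a `git diff --exit-code AGENTS.md` so any hand edit to the compiled file fails the build.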

CLAUDE.md specifics (HumanLayer, followed by Anthropic internal)

  • ≤ 60 lines. Claude Code’s system prompt already ships ~50 instructions; frontier models track ~150–200 reliably.
  • Everything else goes in agent_docs/*.md loaded on-demand.
  • Claude Code auto-reads CLAUDE.md + .claude/skills/*/SKILL.md. It does NOT auto-read llms.txt, docs/index.md, or frontmatter queries: fields — those are inert to the CLI.
  • Quote: “Never send an LLM to do a linter’s job.” Don’t put code-style rules in CLAUDE.md.
Patterns that won

  1. Embedded docs index in AGENTS.md, not link-out. Skills-style “go fetch X” loses because it adds a decision point agents frequently mis-route.
  2. Numbered “Common Pitfalls” lists at the end — both Vercel and Cloudflare do this; it’s the section agents hit when blocked.
  3. Split CI-vs-local validation matrices.
  4. Schema-validated YAML frontmatter (Cloudflare’s pcx_content_type enum, tag allowlist in src/schemas/tags.ts) — build fails on drift, so drift is prevented by the compiler, not by goodwill.
  5. Two-tier lint in CI — Vale + markdownlint + lychee (Datadog, GitLab, Fern all published this stack).
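Pattern 4 (schema-validated frontmatter) reduces to a hard-fail check before anything renders. A sketch, where the enum values and tag allowlist are placeholders, not Cloudflare’s actual pcx_content_type values or the contents of src/schemas/tags.ts:

```python
# Sketch of "build fails on drift": frontmatter is validated against an
# enum and a tag allowlist, and CI treats any error as a hard build failure,
# so drift is prevented by the compiler, not by goodwill.
CONTENT_TYPES = {"concept", "how-to", "reference", "tutorial"}  # placeholder enum
TAG_ALLOWLIST = {"workers", "pages", "dns"}                     # placeholder allowlist

def validate_frontmatter(fm: dict) -> list[str]:
    errors = []
    if fm.get("pcx_content_type") not in CONTENT_TYPES:
        errors.append(f"invalid pcx_content_type: {fm.get('pcx_content_type')!r}")
    for tag in fm.get("tags", []):
        if tag not in TAG_ALLOWLIST:
            errors.append(f"unknown tag: {tag!r}")
    return errors  # non-empty list => fail the build
```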
Approaches that lost

  1. llmstxt.org — no production rig in the sample adopted it. AGENTS.md won.
  2. Code-style rules in CLAUDE.md: “Never send an LLM to do a linter’s job” (HumanLayer).
  3. Auto-generated /init CLAUDE.md — unanimously treated as a starting draft, not a keeper.
  4. OpenAPI specs as general agent docs — useful for tool surfaces, overkill for project conventions.
Failure modes to watch

  1. Silent rate-limit cascades → hallucinated gap-filling. Petrenko’s 16-agent refactor: “2 of 9 interview agents hit API rate limits and failed silently.” The dashecorp rig’s 529-overloaded rule addresses part of this; extend it: any agent that skipped a doc update must write an explicit “skipped: reason” line.
  2. Schema drift becoming canon. Industry data: ~7 engineer-days/month lost. Cloudflare prevents this by making invalid pcx_content_type or tags hard-fail the build.
  3. Doc index bloat (“context rot”). Past ~8 KB, agent success rate measurably regresses (Vercel’s data, Morph’s research). Surge HQ: 693 lines of hallucinations from uncompressed context.
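The ~8 KB regression point can be turned into a CI guard. A sketch, where the comment markers delimiting the embedded index are an assumed convention, not part of any published spec:

```python
# Sketch of a context-rot guard: fail CI when the embedded docs index in
# AGENTS.md grows past the ~8 KB region where Vercel's eval saw agent
# success rates regress.
def check_index_budget(agents_md: str, budget_bytes: int = 8 * 1024) -> bool:
    start = agents_md.find("<!-- docs-index -->")
    end = agents_md.find("<!-- /docs-index -->")
    if start == -1 or end == -1:
        return True  # no embedded index to police
    index = agents_md[start:end]
    return len(index.encode("utf-8")) <= budget_bytes  # False => fail the build
```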

Three concrete recommendations beyond the public spec docs

  1. Compile, don’t hand-write, AGENTS.md. Vercel’s pattern: canonical facts in facts/*.yaml with JSON-Schema validation, compiled into AGENTS.md on each PR. Hallucinated facts fail npm run check.
  2. Split into three files with hard size budgets. AGENTS.md ≤ 150 lines; CLAUDE.md ≤ 60 lines; docs/agent-runbooks/*.md for task-specific, referenced by file:line. Enforce with CI. Memory MCP stays separate.
  3. Two-tier lint: deterministic first, LLM-as-judge second, temperature=0. markdownlint + vale + lychee every commit; scheduled job runs a different Claude instance over docs with a fixed rubric (contradictions, stale claims, orphans, factual drift against git log since last lint). Evidently/Arize guidance: “strict separation between generation and evaluation.”
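Recommendation 2’s “enforce with CI” is a one-function check. A minimal sketch with the budgets from the text (AGENTS.md ≤ 150 lines, CLAUDE.md ≤ 60); the function name and failure format are assumptions:

```python
# Sketch of hard per-file line budgets enforced in CI: a non-empty return
# value fails the job, so size budgets hold by construction rather than
# by review-time vigilance.
from pathlib import Path

BUDGETS = {"AGENTS.md": 150, "CLAUDE.md": 60}

def check_line_budgets(repo_root: str) -> list[str]:
    failures = []
    for name, limit in BUDGETS.items():
        path = Path(repo_root) / name
        if not path.exists():
            continue  # file absent => nothing to enforce
        lines = path.read_text().count("\n") + 1
        if lines > limit:
            failures.append(f"{name}: {lines} lines (budget {limit})")
    return failures
```

Anything that would push a file over budget gets moved to docs/agent-runbooks/*.md instead of negotiated past the limit.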