
Stage A — Compiled AGENTS.md with Schema Validation


One PR to dashecorp/rig-gitops. ~2 hours of agent work. Replaces hand-written AGENTS.md with a CI-validated, compiled-from-facts, size-budgeted version. Highest-leverage step from the full docs strategy — based on Vercel’s published eval showing agent success rate 53% → 100% when AGENTS.md carries an embedded compressed index under 8 KB. Defer the larger wiki migration (Stage B, ~11 more hours) until we have 5 real assignments’ worth of data showing Stage A moved the needle.

The rig must run on any coding agent — Claude Code is today’s default but the design accommodates GPT-5 CLI, Gemini CLI, Aider, Cursor, and successors. AGENTS.md is the multi-vendor standard (stewarded by the Agentic AI Foundation; joint Google/OpenAI/Factory/Sourcegraph/Cursor). CLAUDE.md in this proposal is strictly optional — only added when Claude Code-specific behavior matters. Equivalent vendor-specific files (.cursorrules, GEMINI.md, CODEX.md) follow the same overlay pattern when their agent is running. The facts/ layer, compiled AGENTS.md, schema validation, and CI enforcement are identical regardless of which agent is reading them.
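For illustration, the optional overlay might look like the sketch below. The `@AGENTS.md` line is Claude Code’s file-import syntax; the specific override bullets are invented placeholders, not decided content:

```markdown
# CLAUDE.md — optional Claude Code overlay (budget: ≤ 60 lines)
@AGENTS.md

## Claude-specific overrides (placeholder examples)
- Prefer built-in Read/Edit tools over shelling out to sed.
- Enter plan mode before multi-file refactors.
```

Vendor-specific files for other agents would carry the same shape: import (or restate) the compiled AGENTS.md, then add only agent-specific deltas.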

```mermaid
flowchart LR
    Y[facts/*.yaml] --> C[compile-agents-md.sh]
    S[facts/schema.json] --> C
    C --> A[AGENTS.md]
    A --> G[CI: docs-check.yml]
    G -->|--check| C
    G -.->|size > 8KB| X[fail]
    G -.->|schema invalid| X
    G -.->|ok| P[merge]
```
Deliverables

  • facts/stack.yaml — canonical tech stack (runtime, package manager, linter, test framework)
  • facts/conventions.yaml — commit format, branch naming, MCP servers in use
  • facts/pitfalls.yaml — numbered anti-patterns agents hit in this repo
  • facts/schema.json — JSON Schema that all three facts/*.yaml files validate against
  • scripts/compile-agents-md.sh — regenerates AGENTS.md from facts/; supports --check mode for CI
  • CLAUDE.md at repo root — ≤60 lines, @-imports AGENTS.md, adds Claude-specific overrides. Optional, Claude Code only — skipped entirely when pod runs a non-Claude agent.
  • AGENTS.md at repo root — hand-written → compiled. Size budget 8 KB enforced by CI.
  • .github/workflows/docs-check.yml — adds compile-agents-md.sh --check, adds size budget checks, removes queries: from frontmatter validation, adds audience: requirement
  • docs/documentation-standard.md — frontmatter spec changes (drop queries, add audience/supersedes/source_refs), new “Compiled AGENTS.md” section, size budgets
Out of scope (deferred to Stage B and later)

  • raw/ and wiki/ directory migration
  • Propagation to other repos
  • LLM-as-judge lint cron
  • File-back rule in character prompts
  • Memory MCP changes
Why this scope

  • Not full strategy: Vercel’s measured gain (53% → 100%) traces to the compiled 8 KB embedded index. Lint crons, file-back, and raw/ population are compounding bets. Do the measured win first.
  • Not Phase 0 (dangerous-command guard): that’s days; Stage A is 2h. Every assignment between now and Phase 0 benefits.
  • Not just fix frontmatter: frontmatter alone doesn’t move agent success rate per Vercel.
  • Not adopt llms.txt too: no production rig uses it. AGENTS.md won.
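To make the compile step concrete, here is a minimal sketch of scripts/compile-agents-md.sh under stated assumptions: facts are flat `key: value` pairs, awk stands in for yq so the sketch runs anywhere, and the sample fact values and AGENTS.md sections are invented, not the decided schema.

```shell
#!/usr/bin/env sh
# Sketch of scripts/compile-agents-md.sh — not the final script. The real
# script would read facts/*.yaml with yq and validate against facts/schema.json
# before compiling; this sketch seeds sample facts so it is self-contained.
set -eu

mkdir -p facts
cat > facts/stack.yaml <<'EOF'
runtime: node20
package_manager: pnpm
linter: eslint
test_framework: vitest
EOF

fact() {  # fact <key> — the real script would use: yq ".$1" facts/stack.yaml
  awk -F': ' -v k="$1" '$1 == k { print $2 }' facts/stack.yaml
}

compile() {
  cat <<EOF
<!-- AGENTS.md is compiled. Edit facts/*.yaml, then run scripts/compile-agents-md.sh -->
# Agent guide
## Stack
- Runtime: $(fact runtime)
- Package manager: $(fact package_manager)
- Linter: $(fact linter)
- Tests: $(fact test_framework)
EOF
}

if [ "${1:-}" = "--check" ]; then
  # CI drift check: fail with a diff if AGENTS.md was hand-edited or facts changed
  compile | diff -u AGENTS.md - >&2 || { echo "AGENTS.md is stale; re-run compile" >&2; exit 1; }
else
  compile > AGENTS.md
fi

# 8 KB size budget, enforced on every compile
[ "$(wc -c < AGENTS.md)" -le 8192 ] || { echo "AGENTS.md over 8 KB budget" >&2; exit 1; }
```

Running it with no arguments regenerates AGENTS.md; `--check` exits non-zero with a diff when the committed file and the facts disagree, which is what CI runs.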
Acceptance criteria

  1. ./scripts/compile-agents-md.sh produces valid AGENTS.md ≤ 8 KB.
  2. ./scripts/compile-agents-md.sh --check on fresh checkout exits 0.
  3. Editing facts/stack.yaml without re-running compile causes CI to fail with diff.
  4. Editing AGENTS.md directly causes CI to fail.
  5. Invalid enum in facts/stack.yaml fails schema validation.
  6. CLAUDE.md present at root, ≤ 60 lines, imports AGENTS.md via @.
  7. Every existing doc has valid audience: field post-migration.
  8. docs-check.yml passes on fresh PR.
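Criteria 2–5 and 8 could be enforced by a docs-check.yml along these lines — a sketch only; the action versions, step names, and the yq/ajv-cli invocations are assumptions, not the decided workflow:

```yaml
# .github/workflows/docs-check.yml — sketch; exact tool invocations are assumptions
name: docs-check
on: pull_request
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate facts/*.yaml against facts/schema.json
        run: |
          for f in facts/*.yaml; do
            yq -o=json "$f" > /tmp/fact.json
            npx --yes --package ajv-cli ajv validate -s facts/schema.json -d /tmp/fact.json
          done
      - name: Fail on drift (facts edited without recompiling, or AGENTS.md hand-edited)
        run: ./scripts/compile-agents-md.sh --check
      - name: Enforce 8 KB size budget on AGENTS.md
        run: test "$(wc -c < AGENTS.md)" -le 8192
```

An invalid enum in facts/stack.yaml fails at the first step; a hand-edit to AGENTS.md fails at the `--check` step with a diff.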
Baseline metrics (captured at T+0)

  • Median turns per cli_completed on issue→PR assignments
  • Median cost per cli_completed
  • agent_stuck events per 100 assignments
  • First-attempt Review-E approval rate

After the five assignments, recompute the same metrics.

If at least 2 of these 4 thresholds are met:

  • Median turns drops ≥ 15%
  • Median cost drops ≥ 15%
  • agent_stuck rate drops ≥ 20%
  • First-attempt approval improves ≥ 15 percentage points

→ proceed with Stage B. Otherwise investigate why Stage A didn’t help, or pivot to Phase 0.

Risks and mitigations

  • Compile script bugs: tests in the same PR covering schema validation, size budget, and drift detection.
  • Frontmatter migration edge cases: idempotent migrate-frontmatter.sh script; dry-run first.
  • 8 KB budget too tight: current hand-written AGENTS.md is 3-5 KB; 8 KB gives ~60% headroom.
  • CLAUDE.md import graph fails on Claude Code: test on live Dev-E pod in staging first.
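The idempotent, dry-run-first migration script could be sketched as below. Assumptions: frontmatter is a flat `---`-delimited block of `key: value` lines, and `audience: agents` is a placeholder default, not the decided value.

```shell
#!/usr/bin/env sh
# Sketch of an idempotent migrate-frontmatter.sh with --dry-run.
# Seeds a sample doc so the sketch is self-contained.
set -eu

mkdir -p docs
cat > docs/sample.md <<'EOF'
---
title: Sample
queries: ["how do I deploy"]
---
Body text.
EOF

migrate() {
  awk '
    /^queries:/  { next }                 # drop the deprecated key
    /^audience:/ { seen = 1 }             # already migrated: do not add twice
    /^---$/ && ++fence == 2 && !seen {    # closing fence, audience still missing
      print "audience: agents"
    }
    { print }
  ' "$1"
}

for f in docs/*.md; do
  if [ "${1:-}" = "--dry-run" ]; then
    migrate "$f" | diff -u "$f" - || true   # preview the change, touch nothing
  else
    migrate "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  fi
done
```

`--dry-run` prints the diff without modifying files; running the migration twice leaves the files unchanged, which is the idempotency property the risk item calls for.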

Rollback: revert the PR. No cluster changes, no data loss.

Timeline

  • T+0: File tracking issue with baseline metrics
  • T+0 to T+2h: Dev-E implements
  • T+2h: PR opened, Review-E reviews
  • T+3h: PR merges
  • T+3h to T+5d: 5 assignments process
  • T+5d: Recompute metrics, decide Stage B
Decisions

  1. facts/ YAML or TOML? → YAML (repo convention + yq present).
  2. Template engine for compile? → No, heredoc is readable.
  3. Propagate to other repos? → No, rig-gitops alone for first measurement.
  4. Exclude first 24h from baseline? → Yes; wait for image propagation to reach steady state.
Status

  • Draft — this state; awaiting human approval via PR merge
  • Approved — PR merged to main with status: approved; triggers create-impl-issues.sh
  • Implementing — GitHub issues created, Dev-E working
  • Done — all child issues merged, metrics recomputed, Stage B decision made