Is the brain pattern a good idea? — prior art in 2025-2026
TL;DR. The rig’s “brain” is a single ~30 KB markdown file, compiled from facts/*.yaml and fetched as step 1 of every agent session. That shape is mainstream 2026 practice: the current industry direction is markdown over RAG/MCP for institutional knowledge (Anthropic Skills, the AGENTS.md convention, Karpathy’s LLM Wiki). Our compile-from-YAML + CI drift check is better than the community norm. But four concrete things are off-trend: we’re 3-5× the recommended always-load size; we don’t emit AGENTS.md, so no non-Claude tool can find us; we use a per-portfolio split where the community uses nested scoping; and we front-load the brain before reading the issue where the current trend is progressive disclosure.
Why this research exists
Before investing more in the brain pattern (per-repo notify workflows, a dashecorp-docs aggregator, a brain map upgrade), sanity-check it against what other teams are actually doing in 2025-2026. Is “brain” a known pattern, a bespoke invention, or a local name for something the industry has already standardized?
What our brain actually is (for contrast)
| Axis | Our rig |
|---|---|
| Storage | One BRAIN.md file per portfolio, served at research.rig.dashecorp.com/BRAIN.md (raw) and docs.rig.dashecorp.com/brain/ (rendered) |
| Size | ~30 KB, CI-enforced budget of 36 KB |
| Source of truth | facts/*.yaml in dashecorp/rig-docs (repos, agents, surfaces, flows, events, whitepaper catalog, backlog); hand-edit of BRAIN.md forbidden |
| Compile step | npm run brain → emits BRAIN.md + public/BRAIN.md + src/content/docs/brain.md; CI brain:check rejects drift |
| When fetched | Step 1 of every agent session. Before reading the issue. Before any other tool call. |
| Two-hop | Rig BRAIN first (invariant), portfolio BRAIN (dashecorp-docs and other portfolios) second if assignment matches |
| Not | RAG. Vector DB. MCP memory. Dynamic retrieval. |
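The compile-and-check loop described in the table can be sketched as follows. This is a minimal illustration, not the rig's actual `npm run brain` or `brain:check` implementation: the function names `render_brain`, `check_drift`, and `check_budget`, the rendered layout, and the in-memory `facts` dict (standing in for parsed facts/*.yaml) are all assumptions. Only the 36 KB budget and the reject-on-drift rule come from the table.

```python
"""Sketch of a compile-from-facts step with drift and size checks.

Hypothetical: names and layout are illustrative, not the rig's real build.
"""

# CI budget from the table; treating 1 KB as 1024 bytes is an assumption.
BUDGET_BYTES = 36 * 1024


def render_brain(facts: dict[str, list[dict[str, str]]]) -> str:
    """Render parsed facts to markdown deterministically (sorted sections and entries)."""
    lines = ["# BRAIN", ""]
    for section in sorted(facts):
        lines.append(f"## {section}")
        for entry in sorted(facts[section], key=lambda e: e["name"]):
            lines.append(f"- {entry['name']}: {entry['summary']}")
        lines.append("")
    return "\n".join(lines)


def check_drift(committed: str, compiled: str) -> bool:
    """True when the committed file equals a fresh compile (no hand edits)."""
    return committed == compiled


def check_budget(text: str, budget: int = BUDGET_BYTES) -> bool:
    """True when the UTF-8 size of the compiled file is within the budget."""
    return len(text.encode("utf-8")) <= budget


if __name__ == "__main__":
    facts = {
        "repos": [{"name": "rig-docs", "summary": "source of truth for facts"}],
        "agents": [{"name": "example-agent", "summary": "hypothetical entry"}],
    }
    compiled = render_brain(facts)
    hand_edited = compiled + "- manual line\n"
    print(check_drift(compiled, compiled))     # a fresh compile matches itself
    print(check_drift(hand_edited, compiled))  # a hand edit is detected
    print(check_budget(compiled))
```

Deterministic ordering is what makes the drift check meaningful: if rendering depended on dict or file iteration order, a byte-for-byte comparison would fail without any real change.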
Prior art — 10 data points
| # | Who | Shape | Size | Key diff from ours |
|---|---|---|---|---|
| 1 | AGENTS.md — cross-vendor convention, Linux Foundation-stewarded | One MD at repo root; nested files override in subdirs | Target ≤150 lines. Codex caps aggregate at 32 KiB via project_doc_max_bytes | Per-repo, not per-portfolio. Hand-written, no compile. Respected by 60k+ OSS projects. |
| 2 | Claude Code CLAUDE.md | Auto-loaded at session start | HumanLayer recommends <300 lines; their own is <60 | Single file, no YAML facts, no drift check. |
| 3 | Cursor .cursor/rules/*.mdc | Multiple scoped rules, path-glob activated | Each file small; progressive | Scope-by-path, not always-loaded. |
| 4 | Aider CONVENTIONS.md | Style/convention file | Arbitrary | Not auto-loaded; the user passes it explicitly with --read. |
| 5 | llms.txt + llms-full.txt (Anthropic, Vercel, Mintlify, OpenAI) | Slim index + full-corpus dump exposed by docs sites | llms-full.txt often MB-scale | Targets external LLM crawlers, not in-session priming. Two-tier pattern maps well. |
| 6 | Karpathy LLM Wiki (Apr 2026) | index.md + log.md + entity pages, LLM-maintained, compiled from raw sources | ~100-200 pages before needing retrieval | Same compile-vs-RAG philosophy. LLM-maintained, not YAML-compiled. Wiki-scale, not single file. |
| 7 | Anthropic Agent Skills (Oct 2025) | SKILL.md index + scripts + resources, loaded on demand | Index tiny; bodies loaded only when triggered | Progressive disclosure — explicit opposite of our “read full brain every session.” |
| 8 | OpenAI Codex openai/openai monorepo | 88 nested AGENTS.md files | Aggregate capped at 32 KiB | Scales via nesting, not portfolio split. |
| 9 | Datadog frontend monorepo (blog) | AGENTS.md hierarchy per package | Per-package | Same nested-scoping pattern. |
| 10 | Builder.io, Factory, Ona, Augment, Gemini CLI | All publish AGENTS.md guides and tooling | 100-300 lines typical | Commercial endorsement of the AGENTS.md convention. |
Tradeoffs the field actually reports
Token budget. HumanLayer: LLMs reliably follow ~150-200 instructions; Claude Code’s system prompt already burns ~50. Every extra paragraph in an always-loaded file uniformly degrades instruction-following. Our ~30 KB is 3-5× the community ceiling and sits right at Codex’s 32 KiB aggregate cap.
Staleness. Hand-curated files rot. AGENTS.md community reports a recurring “feedback loop” where multiple contributors append conflicting opinions, files grow unmaintainably, agent performance drops. Our YAML-compile + CI drift check is a meaningful mitigation the community mostly lacks — we should call this out as a differentiator when explaining the pattern externally.
Hand-curated vs RAG. The industry swing in late 2025 / early 2026 (Anthropic Skills release, The New Stack “Skills vs MCP”) is away from RAG/MCP for knowledge and toward markdown + progressive disclosure. GitHub MCP server used ~50k tokens per session; equivalent SKILL.md used ~200. We’re on the winning side of this argument.
Large codebases. A single 30 KB file doesn’t scale. OpenAI’s own repo has 88 nested AGENTS.md. Karpathy notes the wiki pattern breaks down past ~100-200 pages without a secondary retrieval layer. We’ll hit this wall if portfolios grow.
Multi-portfolio. Almost nobody does portfolio-level brains. The community answer is nested AGENTS.md files scoped by path. Our two-hop (rig brain + portfolio brain) is unusual but coherent.
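For contrast with the two-hop fetch, the nested-scoping convention can be sketched as a walk up the directory tree from the file being edited. This is a generic illustration of "nearest file wins", not the lookup logic of any specific tool; `nearest_agents_md` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def nearest_agents_md(start: Path, repo_root: Path) -> Optional[Path]:
    """Return the closest AGENTS.md at or above `start`, stopping at `repo_root`.

    If `start` lies outside `repo_root`, the walk continues to the filesystem root.
    """
    start = start.resolve()
    repo_root = repo_root.resolve()
    current = start if start.is_dir() else start.parent
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        # Stop at the repo root, or at the filesystem root as a safety net.
        if current == repo_root or current == current.parent:
            return None
        current = current.parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "AGENTS.md").write_text("root rules")
        ui = root / "packages" / "ui"
        ui.mkdir(parents=True)
        (ui / "AGENTS.md").write_text("ui rules")
        api = root / "packages" / "api"
        api.mkdir()
        # A file under packages/ui picks up the nested file.
        print(nearest_agents_md(ui / "button.tsx", root).read_text())
        # A file under packages/api falls back to the repo root.
        print(nearest_agents_md(api / "server.ts", root).read_text())
```

Scope here is implied by the working path, so no assignment-to-portfolio match is needed; that is the main structural difference from the two-hop design.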
Memory vs knowledge. Zep: RAG loses ground for institutional knowledge but wins for conversational memory across sessions (what rig-memory-mcp stores). Our rig-memory-mcp covers the latter; BRAIN covers the former. The separation is correct.
The counter-argument — who hates this and why
Three camps will push back:
- Progressive-disclosure camp (Anthropic Skills, Cursor rules). “Don’t front-load 30 KB every session. Keep the index small (≤60 lines); let the agent pull deeper files on demand. You’re burning context on every run to cover cases that apply to 5% of sessions.”
- Nested-convention camp (mainstream AGENTS.md). “Put AGENTS.md at every relevant directory. The agent reads the nearest one. No global brain needed; scope is implicit in where the agent is working.”
- RAG / agent-memory camp (Zep, sqlite-memory, MCP memory primitives). “Knowledge grows; static files don’t. Use an MCP memory server with semantic retrieval so agents pull exactly what they need.” Losing ground for institutional knowledge in 2026, still winning for conversational memory.
Nobody credible is saying “no context file at all.” The live debate is static-and-full vs dynamic-and-sliced.
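The "dynamic-and-sliced" side of that debate can be illustrated with a small index whose section bodies load only when a trigger matches the issue text. This is a sketch in the spirit of progressive disclosure, not how Anthropic Skills or Cursor rules are implemented; the `Section` type, the keyword triggers, and the section names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Section:
    name: str
    triggers: tuple[str, ...]  # lowercase keywords that make this section relevant
    load: Callable[[], str]    # deferred loader, called only on a match


def select_sections(issue_text: str, index: list[Section]) -> list[str]:
    """Return bodies of sections whose triggers appear in the issue text.

    Uses plain substring matching, so "event" also matches "prevent";
    a real gate would need something stricter.
    """
    text = issue_text.lower()
    return [s.load() for s in index if any(t in text for t in s.triggers)]


# Hypothetical index: only names and triggers are always in context.
INDEX = [
    Section("deploy-flow", ("deploy", "release"), lambda: "deploy flow details"),
    Section("event-catalog", ("event", "webhook"), lambda: "event catalog details"),
]

if __name__ == "__main__":
    print(select_sections("Fix typo in README", INDEX))
    # → []
    print(select_sections("Webhook fails after deploy", INDEX))
    # → ['deploy flow details', 'event catalog details']
```

The trivial issue loads nothing, which is the case the progressive-disclosure camp argues a front-loaded file pays for on every run.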
Comparison table
| Choice | Our rig | Mainstream (AGENTS.md) | Progressive (Skills/Cursor) | Karpathy LLM-Wiki |
|---|---|---|---|---|
| One static file | Yes, ~30 KB | Yes per scope, ≤150 lines | No — index + on-demand | No — many pages |
| Authoring | YAML → compiled, CI drift check | Hand-written | Hand-written | LLM-maintained |
| Size budget in CI | Yes | No (Codex caps at 32 KiB silently) | No | No |
| Per-portfolio split | Yes (separate fetch) | No — nested by path | No — by skill trigger | No |
| Fetched every session | Yes, step 1 | Yes (auto-load) | Only index; body on demand | Queried as needed |
| Retrieval method | Static URL fetch | File read | Tool-triggered load | LLM reads wiki pages |
Verdict
The pattern is defensible and sits in the mainstream 2026 camp (markdown over MCP/RAG for institutional knowledge). The compile-from-YAML + CI drift check is better than what most teams do; keep that.
Four concrete concerns, ordered by severity:
- 30 KB is too big for “step 1 every session.” HumanLayer, AGENTS.md guidance, and Codex’s own 32 KiB cap all converge around ≤8-10 KB for always-loaded context. We’re burning context budget on portfolio-wide detail most sessions don’t need.
  Action: split BRAIN.md into a ~5 KB “always-load index” + on-demand deeper files keyed off the cold-start recipe that’s already in the brain. This is the Anthropic Skills pattern; it maps cleanly onto our YAML compile.
- We’re not on the AGENTS.md name. Every non-Claude tool (Cursor, Codex, Copilot, Factory, Ona, Gemini CLI, Aider) looks for AGENTS.md, not BRAIN.md.
  Action: symlink or emit AGENTS.md from the same compile so non-Claude agents can find us, even if our own agents prefer the “brain” label internally.
- Per-portfolio split is unusual. The industry answer is nested files scoped by path. Portfolio-level isn’t wrong, but verify that our agents actually benefit from the two-hop; if the second fetch is skipped or wrong-portfolio’d often, we’ve paid for nothing.
  Action: log BRAIN fetches per agent for a week; measure whether the portfolio fetch correlates with the repo touched.
- “Fetch before reading the issue” is aggressive. For trivial issues (typo fix, lockfile bump) 30 KB of brain is pure waste.
  Action: add a gate that loads the brain only when the first tool call suggests org-level context is needed. Skills does this naturally via triggers; ours could via a simple first-turn heuristic.
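The first two actions could share one compile step. The sketch below takes already-rendered sections, emits a small always-load index plus per-section files, and writes the same index under the AGENTS.md name. Everything here is an assumption for illustration: the `build_index` and `emit_split` helpers and the `brain/<name>.md` layout are not the rig's existing build. Only the ~5 KB index target comes from the text.

```python
import tempfile
from pathlib import Path

# "~5 KB always-load index" target; 1 KB = 1024 bytes is an assumption.
INDEX_BUDGET_BYTES = 5 * 1024


def build_index(sections: dict[str, str]) -> str:
    """One line per section: name, first line as summary, and a relative path."""
    lines = ["# BRAIN index", ""]
    for name in sorted(sections):
        body = sections[name].strip()
        summary = body.splitlines()[0] if body else ""
        lines.append(f"- {name}: {summary} (see brain/{name}.md)")
    return "\n".join(lines) + "\n"


def emit_split(sections: dict[str, str], out_dir: Path) -> Path:
    """Write the index as BRAIN.md and AGENTS.md, plus one file per section."""
    index = build_index(sections)
    if len(index.encode("utf-8")) > INDEX_BUDGET_BYTES:
        raise ValueError("index exceeds always-load budget")
    (out_dir / "brain").mkdir(parents=True, exist_ok=True)
    for name, body in sections.items():
        (out_dir / "brain" / f"{name}.md").write_text(body, encoding="utf-8")
    # Same content under both names, so tools looking for AGENTS.md find it.
    (out_dir / "BRAIN.md").write_text(index, encoding="utf-8")
    (out_dir / "AGENTS.md").write_text(index, encoding="utf-8")
    return out_dir / "BRAIN.md"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = emit_split({"flows": "Deploy flow\n...", "repos": "Repo list\n..."}, Path(tmp))
        print(path.read_text(encoding="utf-8"))
```

Writing a second file rather than a symlink keeps the output portable across hosts that serve raw files over HTTP, at the cost of one more artifact for the drift check to cover.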
What to verify before investing more
- Measure actual token cost of the session-start fetch across the agent fleet for a week. If it’s >3% of total token spend, we’re overpaying.
- Audit whether agents actually use the deep sections of BRAIN.md. If 80% of sessions only reference the top 5 KB, strip the rest.
- Sanity-check that AGENTS.md exists at each repo root, even if it just points at the brain URL. Otherwise we’re locked out of every non-Claude coding agent, and the dashe-* / multi-vendor story breaks.
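The first two checks reduce to simple ratios once per-session logs exist. The sketch below assumes a hypothetical log record with fields `brain_tokens`, `total_tokens`, and `deepest_byte_referenced`; none of these are existing rig telemetry, and the sample numbers are made up. The 3% threshold and the 5 KB cutoff come from the checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLog:
    brain_tokens: int             # tokens spent on the session-start fetch
    total_tokens: int             # all tokens in the session
    deepest_byte_referenced: int  # furthest BRAIN.md offset the agent actually used


def brain_share(logs: list[SessionLog]) -> float:
    """Fraction of fleet-wide token spend attributable to the brain fetch."""
    total = sum(s.total_tokens for s in logs)
    return sum(s.brain_tokens for s in logs) / total if total else 0.0


def shallow_fraction(logs: list[SessionLog], cutoff_bytes: int = 5 * 1024) -> float:
    """Fraction of sessions that never referenced anything past `cutoff_bytes`."""
    if not logs:
        return 0.0
    return sum(s.deepest_byte_referenced <= cutoff_bytes for s in logs) / len(logs)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    logs = [
        SessionLog(brain_tokens=7_500, total_tokens=100_000, deepest_byte_referenced=3_000),
        SessionLog(brain_tokens=7_500, total_tokens=400_000, deepest_byte_referenced=20_000),
    ]
    print(f"brain share: {brain_share(logs):.1%}")            # 15000 / 500000 → 3.0%
    print(f"shallow sessions: {shallow_fraction(logs):.0%}")  # 1 of 2 → 50%
```

The share is computed over fleet totals rather than averaged per session, so a few very long sessions do not hide the cost paid by many short ones.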
Bottom line
We’re doing a slightly-heavier-than-average version of the thing the field broadly agrees is right. Trim the always-load size, add progressive disclosure, emit AGENTS.md, measure usage: that puts us ahead of the pack rather than just keeping up.
Sources
- AGENTS.md (official)
- agentsmd/agents.md GitHub
- Augment — How to Build Your AGENTS.md (2026)
- OpenAI Codex — Custom instructions with AGENTS.md
- AGENTS.md in monorepos — precedence issue #53
- Datadog — Steering AI Agents in Monorepos with AGENTS.md
- HumanLayer — Writing a good CLAUDE.md
- DeployHQ — How to Configure Every AI Coding Assistant
- The Prompt Shelf — .cursorrules vs CLAUDE.md vs AGENTS.md
- Karpathy LLM Wiki gist
- Level Up Coding — Beyond RAG, Karpathy’s LLM Wiki pattern
- VentureBeat — Karpathy’s LLM Knowledge Base architecture
- Mintlify — Real llms.txt examples
- Mintlify — What is llms.txt?
- Anthropic llms-full.txt
- Vercel llms-full.txt
- BD Tech Talks — Inside Claude Skills
- Marcel Castro — Skills and progressive disclosure
- The New Stack — Running agents on Markdown instead of MCP
- Zep — Stop Using RAG for Agent Memory
- sqlite-memory — markdown-first agent memory