Is the brain pattern a good idea? — prior art in 2025-2026

TL;DR. The rig’s “brain” is a single ~30 KB markdown file, compiled from facts/*.yaml and fetched as step 1 of every agent session. That shape is mainstream 2026 practice: the current industry direction is markdown over RAG/MCP for institutional knowledge (Anthropic Skills, the AGENTS.md convention, Karpathy’s LLM Wiki). Our compile-from-YAML + CI drift check is better than the community norm. But four concrete things are off-trend: we’re 3-5× the recommended always-load size; we don’t emit AGENTS.md, so non-Claude tools can’t find us; we use a per-portfolio split where the community uses nested scoping; and we front-load the brain before reading the issue where the current trend is progressive disclosure.

Why this research exists

Before investing more in the brain pattern (per-repo notify workflows, a dashecorp-docs aggregator, a brain map upgrade), sanity-check it against what other teams are actually doing in 2025-2026. Is “brain” a known pattern, a bespoke invention, or a local name for something the industry has already standardized?

What our brain actually is (for contrast)

Axis	Our rig
Storage	One `BRAIN.md` file per portfolio, served at `research.rig.dashecorp.com/BRAIN.md` (raw) and `docs.rig.dashecorp.com/brain/` (rendered)
Size	~30 KB, CI-enforced budget of 36 KB
Source of truth	`facts/*.yaml` in `dashecorp/rig-docs` (repos, agents, surfaces, flows, events, whitepaper catalog, backlog); hand-edit of BRAIN.md forbidden
Compile step	`npm run brain` → emits BRAIN.md + public/BRAIN.md + src/content/docs/brain.md; CI `brain:check` rejects drift
When fetched	Step 1 of every agent session. Before reading the issue. Before any other tool call.
Two-hop	Rig BRAIN first (invariant), portfolio BRAIN (dashecorp-docs and other portfolios) second if assignment matches
Not	RAG. Vector DB. MCP memory. Dynamic retrieval.
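The compile step and the two CI checks in the table above can be sketched roughly as follows. This is a minimal illustration, not the actual `npm run brain` implementation: the in-memory `facts` dict stands in for the parsed `facts/*.yaml` files, the `name`/`summary` fields are hypothetical, and the 36 KB budget is assumed to mean 36 KiB.

```python
BUDGET_BYTES = 36 * 1024  # assumption: the "36 KB" CI budget is read as 36 KiB


def compile_brain(facts: dict[str, list[dict[str, str]]]) -> str:
    """Render facts deterministically: same input, same bytes, so drift is detectable."""
    lines = ["# BRAIN", ""]
    for section in sorted(facts):  # stable section order
        lines.append(f"## {section}")
        for entry in sorted(facts[section], key=lambda e: e["name"]):
            lines.append(f"- {entry['name']}: {entry['summary']}")
        lines.append("")
    return "\n".join(lines)


def check_brain(facts: dict[str, list[dict[str, str]]], committed: str) -> list[str]:
    """Return CI failures; an empty list means the check passes."""
    compiled = compile_brain(facts)
    problems = []
    if compiled != committed:
        # Someone hand-edited BRAIN.md, or changed facts without recompiling.
        problems.append("drift: committed BRAIN.md differs from compiled output")
    size = len(compiled.encode("utf-8"))
    if size > BUDGET_BYTES:
        problems.append(f"budget: {size} bytes exceeds {BUDGET_BYTES}")
    return problems
```

The key property is determinism: because the output depends only on the facts, a byte-for-byte comparison is enough to catch both hand edits and stale compiles.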

Prior art — 10 data points

#	Who	Shape	Size	Key diff from ours
1	AGENTS.md — cross-vendor convention, Linux Foundation-stewarded	One MD at repo root; nested files override in subdirs	Target ≤150 lines. Codex caps aggregate at 32 KiB via `project_doc_max_bytes`	Per-repo, not per-portfolio. Hand-written, no compile. Respected by 60k+ OSS projects.
2	Claude Code `CLAUDE.md`	Auto-loaded at session start	HumanLayer recommends <300 lines; their own is <60	Single file, no YAML facts, no drift check.
3	Cursor `.cursor/rules/*.mdc`	Multiple scoped rules, path-glob activated	Each file small; progressive	Scope-by-path, not always-loaded.
4	Aider `CONVENTIONS.md`	Style/convention file	Arbitrary	Not auto-loaded; user `--read`s it explicitly.
5	`llms.txt` + `llms-full.txt` (Anthropic, Vercel, Mintlify, OpenAI)	Slim index + full-corpus dump exposed by docs sites	llms-full.txt often MB-scale	Targets external LLM crawlers, not in-session priming. Two-tier pattern maps well.
6	Karpathy LLM Wiki (Apr 2026)	`index.md` + `log.md` + entity pages, LLM-maintained, compiled from raw sources	~100-200 pages before needing retrieval	Same compile-vs-RAG philosophy. LLM-maintained, not YAML-compiled. Wiki-scale, not single file.
7	Anthropic Agent Skills (Oct 2025)	`SKILL.md` index + scripts + resources, loaded on demand	Index tiny; bodies loaded only when triggered	Progressive disclosure — explicit opposite of our “read full brain every session.”
8	OpenAI Codex openai/openai monorepo	88 nested AGENTS.md files	Aggregate capped at 32 KiB	Scales via nesting, not portfolio split.
9	Datadog frontend monorepo (blog)	AGENTS.md hierarchy per package	Per-package	Same nested-scoping pattern.
10	Builder.io, Factory, Ona, Augment, Gemini CLI	All publish AGENTS.md guides and tooling	100-300 lines typical	Commercial endorsement of the AGENTS.md convention.

Tradeoffs the field actually reports

Token budget. HumanLayer: LLMs reliably follow ~150-200 instructions; Claude Code’s system prompt already burns ~50. Every extra paragraph in an always-loaded file degrades instruction-following across the whole file, not just for the new paragraph. Our ~30 KB is 3-5× the community ceiling and sits right at Codex’s 32 KiB aggregate cap.
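The size comparison can be made concrete with a little arithmetic. The sketch below assumes “~30 KB” means 30,000 bytes, uses the ≤8-10 KB always-load range as the community ceiling, and applies a rough 4-bytes-per-token rule of thumb that is an assumption, not a measurement of our file.

```python
BRAIN_BYTES = 30 * 1000                 # assumption: "~30 KB" read as decimal kilobytes
CODEX_CAP_BYTES = 32 * 1024             # 32 KiB aggregate cap
CEILING_LOW, CEILING_HIGH = 8_000, 10_000  # the <=8-10 KB always-load range

ratio_vs_high = BRAIN_BYTES / CEILING_HIGH    # 3.0x a 10 KB ceiling
ratio_vs_low = BRAIN_BYTES / CEILING_LOW      # 3.75x an 8 KB ceiling
share_of_cap = BRAIN_BYTES / CODEX_CAP_BYTES  # about 0.916 of the Codex cap

BYTES_PER_TOKEN = 4                     # assumption: rough heuristic for English prose
est_tokens = BRAIN_BYTES // BYTES_PER_TOKEN   # 7500 tokens per session start
```

Under these assumptions the multiple is 3-3.75×; the upper end of the 3-5× range would correspond to a stricter ceiling of about 6 KB.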

Staleness. Hand-curated files rot. The AGENTS.md community reports a recurring “feedback loop” in which multiple contributors append conflicting opinions, files grow unmaintainable, and agent performance drops. Our YAML-compile + CI drift check is a meaningful mitigation the community mostly lacks; we should call this out as a differentiator when explaining the pattern externally.

Hand-curated vs RAG. The industry swing in late 2025 / early 2026 (Anthropic Skills release, The New Stack “Skills vs MCP”) is away from RAG/MCP for knowledge and toward markdown + progressive disclosure. The reported data point: the GitHub MCP server used ~50k tokens per session; an equivalent SKILL.md used ~200. We’re on the winning side of this argument.

Large codebases. A single 30 KB file doesn’t scale. OpenAI’s own repo has 88 nested AGENTS.md. Karpathy notes the wiki pattern breaks down past ~100-200 pages without a secondary retrieval layer. We’ll hit this wall if portfolios grow.

Multi-portfolio. Almost nobody does portfolio-level brains. The community answer is nested AGENTS.md files scoped by path. Our two-hop (rig brain + portfolio brain) is unusual but coherent.
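For contrast with our two-hop fetch, nested scoping by path can be sketched as below. This is an illustration of the general idea only, assuming files are merged from the repo root down to the working directory and truncated at an aggregate byte cap; it is not a reproduction of any specific tool's resolution logic, and `collect_agents_md` is a hypothetical name.

```python
from pathlib import Path

MAX_BYTES = 32 * 1024  # mirrors the 32 KiB aggregate cap cited for Codex


def collect_agents_md(repo_root: Path, workdir: Path, max_bytes: int = MAX_BYTES) -> str:
    """Concatenate AGENTS.md files from repo root down to workdir, within a byte cap."""
    repo_root, workdir = repo_root.resolve(), workdir.resolve()
    rel = workdir.relative_to(repo_root)  # raises ValueError if workdir is outside the repo

    # Build the directory chain: root, root/a, root/a/b, ...
    dirs = [repo_root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)

    chunks, used = [], 0
    for d in dirs:
        remaining = max_bytes - used
        if remaining <= 0:
            break  # cap reached: deeper files are silently dropped
        f = d / "AGENTS.md"
        if not f.is_file():
            continue
        data = f.read_bytes()[:remaining]  # truncate the file that crosses the cap
        chunks.append(data.decode("utf-8", errors="ignore"))
        used += len(data)
    # The cap counts file bytes only; the separators below are not counted.
    return "\n\n".join(chunks)
```

Scope here is implicit in where the agent is working, which is the property our portfolio split replaces with an explicit second fetch.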

Memory vs knowledge. Zep: RAG loses ground for institutional knowledge but wins for conversational memory across sessions. Our rig-memory-mcp covers the latter; BRAIN covers the former. The separation is correct.

The counter-argument — who hates this and why

Three camps will push back:

Progressive-disclosure camp (Anthropic Skills, Cursor rules). “Don’t front-load 30 KB every session. Keep the index small (≤60 lines); let the agent pull deeper files on demand. You’re burning context on every run to cover cases that apply to 5% of sessions.”
Nested-convention camp (mainstream AGENTS.md). “Put AGENTS.md at every relevant directory. The agent reads the nearest one. No global brain needed; scope is implicit in where the agent is working.”
RAG / agent-memory camp (Zep, sqlite-memory, MCP memory primitives). “Knowledge grows; static files don’t. Use an MCP memory server with semantic retrieval so agents pull exactly what they need.” Losing ground for institutional knowledge in 2026, still winning for conversational memory.

Nobody credible is saying “no context file at all.” The live debate is static-and-full vs dynamic-and-sliced.
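The “dynamic-and-sliced” side of that debate can be sketched in a few lines. This is a toy model of progressive disclosure, assuming keyword triggers; the `Section` type, the trigger words, and the matching rule are all hypothetical, and real systems decide what to load in more sophisticated ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    triggers: tuple[str, ...]  # hypothetical keywords that justify loading the body
    body: str


def build_index(sections: list[Section]) -> str:
    """The always-loaded part: one line per section, no bodies."""
    return "\n".join(
        f"- {s.name}: load when the task mentions {', '.join(s.triggers)}"
        for s in sections
    )


def load_on_demand(sections: list[Section], task_text: str) -> list[str]:
    """Return bodies only for sections whose triggers appear in the task text."""
    text = task_text.lower()
    return [s.body for s in sections if any(t.lower() in text for t in s.triggers)]
```

The index cost is paid every session; each body is paid for only by the sessions that need it, which is the whole argument of the progressive-disclosure camp.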

Comparison table

Choice	Our rig	Mainstream (AGENTS.md)	Progressive (Skills/Cursor)	Karpathy LLM-Wiki
One static file	Yes, ~30 KB	Yes per scope, ≤150 lines	No — index + on-demand	No — many pages
Authoring	YAML → compiled, CI drift check	Hand-written	Hand-written	LLM-maintained
Size budget in CI	Yes	No (Codex caps at 32 KiB silently)	No	No
Per-portfolio split	Yes (separate fetch)	No — nested by path	No — by skill trigger	No
Fetched every session	Yes, step 1	Yes (auto-load)	Only index; body on demand	Queried as needed
Retrieval method	Static URL fetch	File read	Tool-triggered load	LLM reads wiki pages

Verdict

The pattern is defensible and sits in the mainstream 2026 camp (markdown over MCP/RAG for institutional knowledge). The compile-from-YAML + CI drift check is better than what most teams do — keep that.

Four concrete concerns, ordered by severity:

30 KB is too big for “step 1 every session.” HumanLayer and AGENTS.md guidance converge around ≤8-10 KB for always-loaded context, and Codex’s 32 KiB cap is a hard ceiling we already sit just under. We’re burning context budget on portfolio-wide detail most sessions don’t need.
Action: split BRAIN.md into a ~5 KB “always-load index” + on-demand deeper files keyed off the cold-start recipe that’s already in the brain. This is the Anthropic Skills pattern; it maps cleanly onto our YAML compile.
We don’t publish under the AGENTS.md name. Every non-Claude tool (Cursor, Codex, Copilot, Factory, Ona, Gemini CLI) looks for AGENTS.md, not BRAIN.md, and Aider only reads files it is explicitly pointed at.
Action: symlink or emit AGENTS.md from the same compile so non-Claude agents can find us, even if our own agents prefer the “brain” label internally.
Per-portfolio split is unusual. The industry answer is nested files scoped by path. Portfolio-level isn’t wrong, but verify that our agents actually benefit from the two-hop; if the second fetch is skipped or wrong-portfolio’d often, we’ve paid for nothing.
Action: log BRAIN fetches per agent for a week; measure whether the portfolio fetch correlates with the repo touched.
“Fetch before reading the issue” is aggressive. For trivial issues (typo fix, lockfile bump) 30 KB of brain is pure waste.
Action: add a gate — load brain only when the first tool call suggests org-level context is needed. Skills does this naturally via triggers; ours could via a simple first-turn heuristic.
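The first-turn gate in the last action could start as something as simple as the sketch below. It assumes the first tool call is reading the issue, and the keyword patterns are placeholders we would need to tune against real issues; none of this exists in the rig today.

```python
import re

# Hypothetical patterns for issues that rarely need org-level context.
TRIVIAL_PATTERNS = [r"\btypo\b", r"\blockfile\b", r"\bbump\b"]
# Hypothetical signals that org-level context is likely needed.
ORG_PATTERNS = [r"\bcross-repo\b", r"\bportfolio\b", r"\bworkflow\b", r"\bagent\b"]


def should_load_brain(issue_title: str, issue_body: str = "") -> bool:
    """Decide from the issue text alone whether to fetch the brain."""
    text = f"{issue_title}\n{issue_body}".lower()
    if any(re.search(p, text) for p in ORG_PATTERNS):
        return True   # org-level signal wins, even if the issue also looks trivial
    if any(re.search(p, text) for p in TRIVIAL_PATTERNS):
        return False  # trivial and no org signal: skip the fetch
    return True       # unknown shape: default to loading
```

Defaulting to load is a deliberate choice in this sketch: it assumes a session that wrongly skips the brain costs more than one that wastes context, so the gate only skips when it is fairly sure.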

What to verify before investing more

Measure actual token cost of the session-start fetch across the agent fleet for a week. If it’s >3% of total token spend, we’re overpaying.
Audit whether agents actually use the deep sections of BRAIN.md. If 80% of sessions only reference the top 5 KB, strip the rest.
Sanity-check that AGENTS.md exists at each repo root even if it just points at the brain URL — otherwise we’re locked out of every non-Claude coding agent, and the dashe-* / multi-vendor story breaks.
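The three checks above reduce to small calculations once the data exists. The sketch below assumes we can extract per-session token counts and, for each session, the deepest BRAIN.md byte offset the agent referenced; how to derive that offset from session logs is an open question, and all function names are hypothetical.

```python
from pathlib import Path


def fetch_share(fetch_tokens: int, total_tokens: int) -> float:
    """Fraction of total token spend that went to the session-start fetch."""
    return fetch_tokens / total_tokens if total_tokens else 0.0


def shallow_session_share(max_offsets: list[int], top_bytes: int = 5_000) -> float:
    """Fraction of sessions that never referenced anything past the top of the file.

    max_offsets: per session, the deepest BRAIN.md byte offset referenced (assumed available).
    """
    if not max_offsets:
        return 0.0
    return sum(1 for o in max_offsets if o <= top_bytes) / len(max_offsets)


def repos_missing_agents_md(repo_roots: list[Path]) -> list[Path]:
    """Repo checkouts with no AGENTS.md at the root."""
    return [r for r in repo_roots if not (r / "AGENTS.md").is_file()]
```

With these, the thresholds in the list become direct comparisons: `fetch_share(...) > 0.03` means we are overpaying, and `shallow_session_share(...) >= 0.8` means the deep sections are candidates for stripping.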

Bottom line

We’re doing a slightly-heavier-than-average version of the thing the field broadly agrees is right. Trim the always-load size, add progressive disclosure, emit AGENTS.md, measure usage — and we’re ahead of the pack rather than just keeping up.