LLM Wiki pattern — Karpathy analysis
What Karpathy proposes
The “LLM Wiki” gist is not a documentation style guide; it’s an architecture pattern where the LLM maintains a knowledge base rather than RAG-retrieving from raw docs.
Concrete techniques
- Three-layer architecture:
  - `raw/`: immutable sources, LLM-read-only
  - `wiki/`: LLM-owned markdown synthesis
  - Schema file (`CLAUDE.md` or `AGENTS.md`) defining conventions and workflows
- Schema file is the key config. Quoted from the gist:
  “a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki.”
- Three named operations:
  - Ingest: new source → touches 10-15 pages
  - Query: “good answers can be filed back into the wiki as new pages”
  - Lint: health-check for contradictions, stale claims, orphans, missing cross-refs
- Two special files:
  - `index.md`: content-oriented catalog, LLM reads first. Karpathy claims this works up to “~100 sources, ~hundreds of pages” and avoids embedding RAG entirely.
  - `log.md`: append-only chronological. Greppable prefix: `## [2026-04-02] ingest | Article Title`, so `grep "^## \[" log.md | tail -5` works.
- Optional CLI tool escalation: `qmd` (hybrid BM25/vector + LLM rerank, CLI + MCP) when the wiki outgrows `index.md`.
- Frontmatter for Dataview-style queries: “tags, dates, source counts” → dynamic tables.
- Wiki is a git repo. Version history is free.
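The `log.md` convention above can be sketched in a few lines of Python. This is a minimal illustration, not code from the gist: the function names `format_entry` and `recent_entries` are hypothetical, and the regex assumes every entry heading has exactly the shape `## [YYYY-MM-DD] operation | Title` shown in the example.

```python
import re
from datetime import date

# Matches headings such as "## [2026-04-02] ingest | Article Title".
# Assumption: the operation is a single word and the title runs to end of line.
LOG_HEADING = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(day: date, operation: str, title: str) -> str:
    """Build one log heading in the greppable format."""
    return f"## [{day.isoformat()}] {operation} | {title}"


def recent_entries(log_text: str, limit: int = 5) -> list[tuple[str, str, str]]:
    """Return the last `limit` (date, operation, title) tuples, oldest first."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_HEADING.match(line)
        if match:
            entries.append(match.groups())
    # Guard against limit <= 0, where a plain negative slice would return everything.
    return entries[-limit:] if limit > 0 else []
```

`recent_entries` plays the same role as the `grep ... | tail -5` pipeline, but returns structured fields, which is convenient if a scheduled job wants to check when the last ingest or lint ran.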
How this applies to the dashecorp rig
Strengthens
- Root `CLAUDE.md` / `AGENTS.md` as schema file. Karpathy explicitly names these as the config entry point.
- Memory MCP ↔ docs split. Karpathy’s `raw/` vs `wiki/` split is the same shape the rig’s memory-store-vs-git-docs boundary wants to take.
Contradicts (and Karpathy’s call is better for agent rigs)
- `llms.txt` (per llmstxt.org): Karpathy doesn’t use it. His equivalent is `index.md` (LLM-maintained, categorized summaries). For an agent rig that MAINTAINS its own docs, `index.md` is more correct because it’s dynamic.
- “One canonical `docs/` directory”: Karpathy implies three dirs (`raw/`, `wiki/`, schema). Collapsing raw and wiki into one tree loses the provenance layer.
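To make the shape of a categorized `index.md` concrete, here is a minimal sketch that renders one from page records. In Karpathy’s pattern the LLM maintains this file itself, so the script only illustrates the target layout. The record keys (`category`, `path`, `title`, `summary`), the `# Index` heading, and the function name `render_index` are all assumptions made for this example.

```python
from collections import defaultdict


def render_index(pages: list[dict[str, str]]) -> str:
    """Render a categorized catalog: one heading per category, one line per page."""
    by_category: dict[str, list[dict[str, str]]] = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)

    lines = ["# Index", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        # Sort by path so regenerating the index produces stable git diffs.
        for page in sorted(by_category[category], key=lambda p: p["path"]):
            lines.append(f"- [{page['title']}]({page['path']}): {page['summary']}")
        lines.append("")
    return "\n".join(lines)
```

Deterministic ordering is the main design choice here: since the wiki is a git repo, a stable index keeps each ingest’s diff limited to the pages that actually changed.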
Net-new ideas the dashecorp rig is missing
- `log.md` with greppable prefix: cheap audit trail, no DB needed. Dashecorp has none.
- Lint as a scheduled operation: contradictions, stale claims, orphans. Dashecorp has `supersedes:` as a passive field with no process consuming it.
- “Query outputs get filed back”: compounding knowledge. Currently Dev-E’s good analyses die in chat history.
- Avoid embedding RAG at small scale: per Karpathy, `index.md` + grep beats vector DBs under ~hundreds of pages. Dashecorp has rig-memory-mcp with pgvector, which may be premature.
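One of the Lint checks, orphan detection, is cheap enough to run without an LLM. The sketch below is one possible approach under stated assumptions: pages live in a flat directory, they link to each other with relative markdown links such as `[text](page.md)`, and `index.md` and `log.md` are entry points that never count as orphans. The function name `find_orphans` is hypothetical. Contradictions and stale claims would still need an LLM pass.

```python
import re

# Assumption: relative markdown links like [text](page.md) or [text](page.md#section).
LINK_TARGET = re.compile(r"\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")

# Assumption: these entry-point files are never reported as orphans.
DEFAULT_ROOTS = frozenset({"index.md", "log.md"})


def find_orphans(pages: dict[str, str], roots: frozenset[str] = DEFAULT_ROOTS) -> list[str]:
    """Return pages that no other page links to, ignoring entry points in `roots`.

    `pages` maps a file name to its markdown text.
    """
    linked = set()
    for name, text in pages.items():
        for target in LINK_TARGET.findall(text):
            if target != name:  # a self-link does not rescue a page
                linked.add(target)
    return sorted(name for name in pages if name not in linked and name not in roots)
```

Because links from `index.md` count as inbound links, a page reported here is also missing from the index, so a single check covers both kinds of gap.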
Application to this proposal tree
This `research/` directory IS the Karpathy `raw/` + partial `wiki/` pattern:
- `research/*.md`: agent-authored synthesis of external sources (wiki-layer)
- External URLs in `source_refs:`: the `raw/` pointer (until we add a `raw/` archive layer)
- `proposals/*.md`: decisions informed by research
Diagrams live as Mermaid source inline in markdown (Karpathy’s “assets” treated as code). The .mmd source is the canonical artifact; no PNG/SVG is committed.
Open questions
- Do we mirror Karpathy’s three-layer architecture strictly, or keep the simpler two-layer one (`research/` + `proposals/`)? The current rig settled on two layers plus a separate `user-stories/`; see `proposals/2026-04-18-docs-tooling-decision`.
- How aggressive should Lint be? LLM-as-judge is expensive; a weekly schedule is probably right.
- How do agents decide when to “file back”? This needs an explicit rule in character prompts.