# Dashecorp Rig — full content dump

> One-fetch ingestion of the research hub. Sections marked with URL headers.
> Spec: https://llmstxt.org (llms-full.txt variant).
> Canonical brain entry point: https://research.rig.dashecorp.com/brain/

---

## https://research.rig.dashecorp.com/brain/
## https://research.rig.dashecorp.com/BRAIN.md

# Dashecorp Rig — Brain

> Fresh-agent entry point. Read this first. One fetch (~27 KB) gives you the
> repo manifest, deployed surfaces (including rig-conductor's API endpoints and
> built-in Dashboard), agent instances, primary flows, the frontmatter schema,
> 40+ event types (summary; full schemas at [/events.md](./events.md)), the
> 18-whitepaper catalog, and the current backlog with prior_art links. Every
> claim traces to its source file in `facts/`.
>
> Compiled from `facts/*.yaml` + live GitHub state (`gh api /orgs/dashecorp/repos`
> for the repo list; manifest validation for agents). Do not hand-edit BRAIN.md.
> Regenerate with `npm run brain`. CI runs `--check` and fails on drift.

## What this is

The Dashecorp rig is an autonomous coding-agent system. A human posts a user story; agents research, propose, code, review, and ship. Canonical docs live in `dashecorp/rig-docs` (Astro Starlight); operational memory lives in a Postgres + pgvector Memory MCP; deployments are Flux-managed on a k3s cluster running on a GCE VM (`invotek-k3s` in `invotek-github-infra`).

## Published surfaces

### Rig landing — discoverable index of all surfaces

- **URL:** https://rig.dashecorp.com/
- **Type:** html

### Canonical brain entry point (this file, rendered)

- **URL:** https://research.rig.dashecorp.com/brain/
- **Raw:** https://research.rig.dashecorp.com/BRAIN.md
- **Type:** markdown

### Brain map — visual architecture + doc-linkage graph

- **URL:** https://research.rig.dashecorp.com/map/
- **Type:** astro-starlight
- **Note:** Two auto-derived diagrams (architecture from facts/, linkage from doc frontmatter). See the shape of what the rig knows before fetching individual pages.

### LLM site map (research, proposals, user-stories)

- **URL:** https://research.rig.dashecorp.com/llms.txt
- **Type:** llms-txt

### Full content dump (single-shot ingestion)

- **URL:** https://research.rig.dashecorp.com/llms-full.txt
- **Type:** llms-full-txt

### Research, proposals, user-stories (rendered Starlight site)

- **URL:** https://research.rig.dashecorp.com/
- **Type:** astro-starlight
- **Source:** dashecorp/rig-docs

### Aggregated engineering docs (architecture, guides, whitepapers, per-repo docs)

- **URL:** https://docs.rig.dashecorp.com/
- **Type:** mkdocs-material
- **Source:** dashecorp/rig-gitops (docs-site/)
- **Note:** Built by scripts/build-docs.sh in rig-gitops on push + hourly cron. Pulls each rig repo's docs/ via gh api. Different scope from research.rig.dashecorp.com (engineering reference vs. research).

### Sitemap (XML)

- **URL:** https://research.rig.dashecorp.com/sitemap-index.xml
- **Type:** sitemap-xml

### rig-conductor API (cluster-internal)

- **URL:** http://rig-conductor-api.rig-conductor.svc.cluster.local:8080
- **Type:** rest-api
- **Visibility:** cluster-internal-only
- **Endpoints:**
  - `POST /api/events` — Submit any of the 40+ event types — see /events.md
  - `GET /api/assignments/next` — Claim next issue assignment. Query: agentId=dev-e-node
  - `GET /api/reviews/next` — Claim next PR review assignment. Query: agentId=review-e
  - `GET /api/pr-reviews/next` — Claim direct-PR review (no issue) for infra/tooling PRs
  - `GET /api/issues` — List tracked issues. Query: state=open|done|stuck
  - `GET /api/queue` — Current dispatch queue state
  - `GET /api/usage` — Token / cost usage by agent and/or repo. Query: agentId, repo
  - `GET /api/costs/issue` — Cost for a specific issue. Query: repo, issueNumber
  - `GET /api/costs/summary` — Aggregate cost. Query: days (default 7)
  - `GET /api/costs/daily` — Daily cost time series.
    Query: days
  - `GET /api/events/live` — SSE stream of live events (for Dashboard.html)
  - `GET /api/streams/status` — Stream consumer status
  - `POST /api/webhook/github` — GitHub webhook intake — normalizes GH events into the rig-conductor stream
  - `POST /api/merge` — Server-side merge gate
  - `POST /api/execution-logs` — Create execution log envelope
  - `POST /api/execution-logs/{id}/logs` — Append log entries
  - `POST /api/execution-logs/{id}/steps` — Append structured step
  - `POST /api/execution-logs/{id}/complete` — Mark log complete
  - `GET /api/execution-logs/{id}` — Fetch log by id
  - `GET /api/execution-logs/issue` — Logs per issue. Query: repo, issueNumber
  - `GET /api/execution-logs` — List logs. Query: limit, status
  - `POST /api/execution-logs/cleanup` — Prune old logs
  - `GET /api/repo-learnings` — Fetch learnings. Query: repo
  - `POST /api/repo-learnings` — Upsert learning
  - `DELETE /api/repo-learnings` — Delete learning. Query: repo, key
  - `GET /api/guard-blocked` — Guard-block counts per agent. Query: agentId

### rig-conductor Dashboard (the built-in cost/activity UI)

- **URL:** http://rig-conductor-api.rig-conductor.svc.cluster.local:8080/dashboard
- **Type:** html-dashboard
- **Source:** dashecorp/rig-conductor (src/ConductorE.Api/Dashboard.html)
- **Visibility:** cluster-internal-only
- **Note:** 42 KB single-page HTML dashboard — "Engineering Rig — Control Plane". Has Costs, Issues, Agents, Streams tabs. Driven by the /api/costs/*, /api/usage, /api/issues, and /api/streams/* endpoints. No separate Grafana/Starlight dashboard is needed — this one already renders per-agent / per-issue / per-day cost.
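A minimal sketch of what submitting an event to `POST /api/events` involves. The `type` value is from the published catalog, but the envelope and payload field names here are illustrative assumptions — the authoritative per-event schemas live at /events.md.

```javascript
// Sketch: constructing an event envelope for POST /api/events.
// Fields other than `type` are illustrative assumptions, not the real schema.
function buildEvent(type, payload = {}) {
  if (!/^[A-Z][A-Z0-9_]*$/.test(type)) {
    throw new Error(`event type must be SCREAMING_SNAKE_CASE, got: ${type}`);
  }
  return { type, occurredAt: new Date().toISOString(), ...payload };
}

// Example: a token-usage report from an agent (hypothetical field names).
const event = buildEvent("TOKEN_USAGE", {
  agentId: "dev-e-node",
  repo: "dashecorp/rig-docs",
  inputTokens: 12000,
  outputTokens: 3400,
});

// From inside the cluster this would be posted with something like:
//   curl -X POST http://rig-conductor-api.rig-conductor.svc.cluster.local:8080/api/events \
//     -H 'Content-Type: application/json' -d '<JSON body>'
console.log(event.type);
```

Remember the API is cluster-internal only — from outside the cluster (e.g. iBuild-E's Mac Mini) this call does not resolve; see the networking gap in the backlog.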
### Memory MCP (Postgres + pgvector)

- **Type:** mcp-server
- **Package:** @dashecorp/rig-memory-mcp
- **Tools:**
  - `read_memories` — Query prior memory by topic/repo/scope with vector similarity
  - `write_memory` — Persist a new memory with scope/kind/importance/tags
  - `mark_used` — Increment hit_count on a memory that informed a decision

### Discord agent channels (notifications)

- **Type:** discord
- **Channels:** #dev-e, #review-e, #ibuild-e, #admin
- **Note:** Agents post thread updates here; humans watch for stuck / pending state.

## Repos

Live from `gh api /orgs/dashecorp/repos` merged with `facts/repos.yaml` annotations. Archived repos are dropped automatically.

| Repo | Purpose | Language | Depends on | AGENTS.md |
|---|---|---|---|---|
| [`rig-gitops`](https://github.com/dashecorp/rig-gitops) | GitOps manifests (Flux HelmReleases, Kustomize bases) and the canonical AGENTS.md shared by every rig repo via `@dashecorp/rig-gitops/AGENTS | shell | — | compiled |
| [`rig-agent-runtime`](https://github.com/dashecorp/rig-agent-runtime) | The AI agent runtime (Node) — one image that deploys as Dev-E, Review-E, or iBuild-E depending on character file + environment. Handles prom | javascript | rig-memory-mcp, rig-conductor | imports-rig-gitops |
| [`rig-memory-mcp`](https://github.com/dashecorp/rig-memory-mcp) | MCP server backing persistent agent memory with Postgres + pgvector. Exposes `read_memories` / `write_memory` / `mark_used` tools consumed b | javascript | postgres-pgvector | claude-md |
| [`rig-conductor`](https://github.com/dashecorp/rig-conductor) | Event store + dispatch service (C# + Marten + Postgres). Receives PR/issue events, assigns work, tracks turns/cost/stuck state, serves the ` | csharp | postgres, pgvector | imports-rig-gitops |
| [`rig-docs`](https://github.com/dashecorp/rig-docs) | Research, proposals, user-stories, and rig-wide reference (Astro Starlight). This repo — you're reading its BRAIN.md. Deploys to research.ri | astro | — | hand |
| [`rig-tools`](https://github.com/dashecorp/rig-tools) | Shell scripts, Git hooks, and workflow sync for AI-assisted development. Developer tooling, not deployed. The one repo without an AGENTS.md | shell | — | none |
| [`infra`](https://github.com/dashecorp/infra) | OpenTofu/Terraform for GitHub org settings, Cloudflare (DNS, Pages, tunnels), GCP (k3s cluster on a GCE VM (invotek-k3s) hosting the rig), a | hcl | — | imports-rig-gitops |

### Per-repo doc index (token-efficient discovery)

Before cloning a repo to find docs, consult this list to decide which docs are relevant to your issue. Then fetch raw markdown for **only** the relevant ones:

```
gh api repos/dashecorp/<repo>/contents/docs/<doc>.md --header 'Accept: application/vnd.github.raw'
```

Auto-derived per compile via `gh api /repos/dashecorp/<repo>/contents/docs`. Repos without a `docs/` dir are omitted.

- **`rig-gitops`** — architecture-current.md, architecture-proposed-v2.md, architecture-proposed.md, documentation-standard.md, onboarding.md, research-multi-agent-platforms.md, review-e-bootstrap.md, sops.md
- **`rig-agent-runtime`** — architecture.md, configuration.md, dashboard.md, deployment.md, discord-setup.md, heartbeat.md, index.md, memory.md, messaging.md, observability.md, quickstart.md, usage-tracking.md
- **`rig-memory-mcp`** — api.md
- **`rig-conductor`** — api.md, architecture.md, deployment.md, event-store.md, index.md, principles.md
- **`rig-tools`** — agent-workflow.md

## Agents (deployment instances)

### Dev-E — writes code

- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** k3s cluster on GCE VM (invotek-k3s, invotek-github-infra)
- **Manifest:** `dashecorp/rig-gitops/apps/dev-e/`
- **Variants:**
  - node: `apps/dev-e/rig-agent-helmrelease.yaml`
  - python: `apps/dev-e/python-helmrelease.yaml`
  - dotnet: `apps/dev-e/dotnet-helmrelease.yaml`
- **Character:** baked into HelmRelease values
- **Triggers:**
  rig-conductor dispatch (issue.assigned events)

### Review-E — reviews PRs

- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** k3s cluster on GCE VM (invotek-k3s, invotek-github-infra)
- **Manifest:** `dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml`
- **Cron:** `*/5 * * * *`
- **Search filter:** `org:dashecorp is:pr is:open author:app/dev-e-bot author:app/ibuild-e-bot -reviewed-by:app/review-e-bot`
- **Discord:** #review-e

### iBuild-E — macOS / iOS builds

- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** Mac Mini (Oslo, Tailscale 100.92.170.124)
- **Manifest:** `not-in-cluster`
- **Discord:** #ibuild-e
- **Notes:** Apple Silicon host, Xcode + App Store Connect. Auto-reauth cron refreshes OAuth every 5 min. Separate from the GCE-hosted agents because iOS builds require macOS.

## Primary flows

### Epic to merged work

**Trigger:** Human opens a user-story GitHub issue in dashecorp/rig-docs

1. **rig-conductor** — Scans open issues, classifies, dispatches to the appropriate agent
2. **Dev-E** — Reads the issue + relevant research; authors a research / proposal / code PR
3. **Review-E (cron every 5 min)** — Finds the PR, reviews against AGENTS.md + memory, requests changes or approves
4. **Human** — Merges (or Review-E's approval satisfies branch protection; auto-merge fires)
5. **Cloudflare Pages** — Redeploys research.rig.dashecorp.com and docs.rig.dashecorp.com

**Complete when:** issue closed via `Closes #N`

### Research and proposal authoring

**Trigger:** An Epic needs investigation before implementation

1. Author dated research/YYYY-MM-DD-slug.md with user_story frontmatter
2. Author proposals/YYYY-MM-DD-slug.md with source_research frontmatter
3. The user_story file gets research_docs and proposal fields pointing back
4. The RelatedDocs component auto-renders the graph; no manual cross-linking

**Rules:**

- Bidirectional links required
- Schema enforced in src/content.config.ts
- CI rejects PRs missing required fields

### Cold-start agent session

**Trigger:** Fresh agent with blank memory receives an Epic or task

1. WebFetch https://research.rig.dashecorp.com/brain/ (or raw BRAIN.md)
2. Parse the facts/repos.yaml equivalent in BRAIN.md — learn the repo manifest
3. Parse the facts/surfaces.yaml equivalent — learn URLs and endpoints
4. WebFetch https://research.rig.dashecorp.com/llms.txt for the topic index
5. WebFetch relevant research/proposal docs directly via raw URL
6. For the target repo, fetch its AGENTS.md (compiled or imports-rig-gitops)
7. read_memories scoped to repo + topic via the Memory MCP
8. Begin work with full context in ~15 KB total

**Token budget:** ~15 KB read, leaves 200K+ for actual work on Opus

### Docs-memory promotion (weekly Lint)

**Trigger:** Weekly scheduled Lint job

1. Scan the Memory MCP for rows with importance >= 4 AND hit_count >= 5
2. For each candidate, check whether docs already cover the topic (BM25 similarity)
3. If not covered, propose a docs PR with the memory content promoted
4. Human approves the PR; merge triggers redeploy

**Status:** not-yet-built (design in research/2026-04-18-docs-memory-drift-lint)

### Diagram-as-code authoring

**Trigger:** A research / proposal / user-story needs a diagram

**Rule:** Mermaid source inline in a fenced code block. No PNG or SVG ever committed.

**Rendering:** The remark-mermaid plugin wraps each fenced Mermaid block; mermaid.js renders it client-side; the source is preserved post-render for agent readers.

## Frontmatter schema (for authoring rig-docs content)

- **type** (optional): one of `research` | `proposal` | `decision` | `reference` | `user-story` | `runbook`
- **audience** (optional): one of `human` | `agent` | `both` — not a free-form array
- **Required:** `title`, `description`
- **Optional linkage fields** (paths are relative to src/content/docs/, no leading slash, no .md or .mdx extension):
  - `type` — See the type enum above.
  - `audience` — See the audience enum above.
  - `created` — ISO date string YYYY-MM-DD.
  - `updated` — ISO date string YYYY-MM-DD.
  - `topic` — Short slug grouping related docs.
  - `source_refs` — Array of URLs (external sources supporting this doc).
  - `supersedes` — Path to the doc this replaces (no leading slash, no .md extension).
  - `superseded_by` — Path to the newer doc that replaces this (same format).
  - `user_story` — (research/proposal only) Path to the user story this supports.
  - `research_docs` — (user-story only) Array of research doc paths this story spawned.
  - `proposal` — (user-story only) Path to the proposal answering this story.
  - `source_research` — (proposal only) Array of research paths this proposal synthesises.
  - `github_issue` — (user-story only) Full GitHub issue URL. Omit the field entirely if there is no issue — do NOT use an empty string.

Path examples: `user-stories/2026-04-18-docs-memory-strategy`, `research/2026-04-18-docs-tools-evaluation`, `proposals/2026-04-18-docs-tooling-decision`.

Omit a field entirely when it has no value — do **not** use an empty string.

## Whitepapers (private — catalog only)

These whitepapers live at `dashecorp/rig-gitops/docs/whitepaper/*.md` (private repo — requires `gh` auth to fetch). BRAIN.md surfaces their titles + one-line summaries so agents know what exists. Full content must be fetched with: `gh api /repos/dashecorp/rig-gitops/contents/docs/whitepaper/<name>.md --jq .download_url | xargs curl -sL`.
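The two-step fetch above (contents API → `download_url` → curl) can be sketched as pure URL and response handling. The response shape is the standard GitHub contents API for a single file; the file name and sample values are illustrative.

```javascript
// Sketch of the whitepaper fetch: hit the contents API for one file, read
// its download_url, then GET that URL. Pure data handling here — no network.
function contentsUrl(name) {
  return `https://api.github.com/repos/dashecorp/rig-gitops/contents/docs/whitepaper/${name}`;
}

// Abridged shape of what `gh api` returns for a single file:
const sampleResponse = {
  name: "principles.md",
  path: "docs/whitepaper/principles.md",
  download_url:
    "https://raw.githubusercontent.com/dashecorp/rig-gitops/main/docs/whitepaper/principles.md",
};

// Equivalent of `--jq .download_url | xargs curl -sL`:
const rawUrl = sampleResponse.download_url;
// fetch(rawUrl, { headers: { Authorization: "Bearer <token>" } }) — the repo
// is private, so an authenticated request is required.
console.log(contentsUrl("principles.md"));
```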
- **[Whitepaper index](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/index.md)** (`index.md`) — Entry point listing all whitepaper sections and their companion docs.
- **[MVP scope](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/mvp-scope.md)** (`mvp-scope.md`) — What the rig does in the minimum viable release. Gatekeeper for "is this in scope?"
- **[Design principles](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/principles.md)** (`principles.md`) — First principles (measurement precedes trust; honest gaps; provider portability).
- **[Trust model](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/trust-model.md)** (`trust-model.md`) — Who can approve what, which gates exist, human-in-the-loop rules.
- **[Safety](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/safety.md)** (`safety.md`) — Dangerous-command guards, sandboxing, blast-radius containment.
- **[Security](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/security.md)** (`security.md`) — Secrets handling, attestation, audit trail, SOPS+age.
- **[Provider portability](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/provider-portability.md)** (`provider-portability.md`) — Multi-runtime (Claude Code, Codex CLI, Gemini CLI) via OTel GenAI conventions. Swap runtime without changing backend.
- **[Observability — OTel, Langfuse, Prometheus, SLOs](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/observability.md)** (`observability.md`) — Self-hosted Langfuse (agent traces) + Grafana Cloud (infra) + local Prometheus (SLO gates) hybrid. Native OTel via `CLAUDE_CODE_ENABLE_TELEMETRY=1`. OTel Collector runs per-cluster, routes LLM traces to Langfuse, infra to managed. Per implementation-status: OTel Collector "Partial" (deployed for rig-conductor, agents not yet emitting), Langfuse "Planned", cost dashboard "Partial" (TokenUsageProjection exists, no LiteLLM proxy yet).
- **[Cost framework](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/cost-framework.md)** (`cost-framework.md`) — Budget policy, per-model rate tables, cost attribution strategy. Companion to observability.
- **[Self-healing](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/self-healing.md)** (`self-healing.md`) — Automatic recovery loops, StaleHeartbeatService, escalation severity routing.
- **[Memory architecture](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/memory.md)** (`memory.md`) — Memory MCP scope, importance/hit_count model, promotion-to-docs threshold design.
- **[Quality and evaluation](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/quality-and-evaluation.md)** (`quality-and-evaluation.md`) — How the rig evaluates its own output. Judge-agent pattern, fixed rubrics.
- **[Drift detection](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/drift-detection.md)** (`drift-detection.md`) — Schema drift, docs drift, infra drift — detection thresholds and response.
- **[Development process](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/development-process.md)** (`development-process.md`) — Issue → Epic → research → proposal → PR lifecycle, agent-human gates.
- **[Example first story](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/example-first-story.md)** (`example-first-story.md`) — Worked walkthrough of one Epic end-to-end.
- **[Glossary](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/glossary.md)** (`glossary.md`) — Rig-specific terminology (Epic, proposal, rig-conductor, Review-E, etc).
- **[Known limitations](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/limitations.md)** (`limitations.md`) — Honest catalog of what the rig can't do today.
- **[Implementation status](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/implementation-status.md)** (`implementation-status.md`) — Single source of truth for deployed vs planned per capability. 78 tracked across 11 domains; 21 deployed/partial (27%), 44 planned/deferred (56%). Every capability named in the whitepapers gets a row with status + whitepaper section + ticket/evidence.
- **[Tool choices (ADRs)](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/tool-choices.md)** (`tool-choices.md`) — Decision records for tooling. Includes a rejection list with rationale.

**Most agents should start with:** `implementation-status.md` (what's deployed vs planned — 78 tracked capabilities) and whichever domain-specific whitepaper matches the Epic.

## rig-conductor event types (POST /api/events)

All events from the `dashecorp/rig-conductor/src/ConductorE.Core/UseCases/SubmitEvent.cs` MapToEvent switch. Names only here — fetch **[/events.md](https://research.rig.dashecorp.com/events.md)** for full field schemas (no auth required).
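The catalog below can be checked client-side so an agent fails fast on a typo before POSTing. This guard is a sketch of ours, not part of the rig-conductor API; the category map simply mirrors the published lists.

```javascript
// Sketch: look up which category an event type belongs to before submitting.
// The map mirrors the published catalog; the helper itself is illustrative.
const EVENT_CATEGORIES = {
  pipeline: [
    "ISSUE_APPROVED", "ISSUE_ASSIGNED", "ISSUE_UNASSIGNED", "WORK_STARTED",
    "BRANCH_CREATED", "PR_CREATED", "CI_PASSED", "CI_FAILED",
    "REVIEW_ASSIGNED", "REVIEW_PASSED", "REVIEW_DISPUTED",
    "HUMAN_GATE_TRIGGERED", "HUMAN_GATE_REMINDER", "MERGED",
    "MERGE_GATE_WAITING", "MERGE_GATE_MERGED", "MERGE_GATE_TIMEOUT",
    "MAIN_CI_STARTED", "MAIN_CI_PASSED", "MAIN_CI_FAILED",
    "DEPLOYED_STAGING", "DEPLOYED_PRODUCTION", "SMOKE_PASSED", "SMOKE_FAILED",
    "BUILD_FAILED", "VERIFIED", "ISSUE_DONE", "ESCALATED",
    "MILESTONE_COMPLETE", "DUPLICATE_PR_CLOSED",
  ],
  directPr: ["PR_OPENED", "PR_REVIEW_ASSIGNED", "PR_REVIEW_APPROVED", "PR_REVIEW_REJECTED"],
  agentLifecycle: ["AGENT_STARTED", "HEARTBEAT", "AGENT_STUCK"],
  cliSessions: ["CLI_STARTED", "CLI_PROGRESS", "CLI_COMPLETED"],
  observability: ["TOKEN_USAGE", "TOOL_USED"],
  memory: ["MEMORY_WRITE", "MEMORY_READ", "MEMORY_HIT_USED"],
};

function categoryOf(type) {
  for (const [category, names] of Object.entries(EVENT_CATEGORIES)) {
    if (names.includes(type)) return category;
  }
  return null; // unknown — check /events.md before inventing a type
}

console.log(categoryOf("HEARTBEAT")); // → agentLifecycle
```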
**Pipeline (issue → PR → merge → deploy):**
`ISSUE_APPROVED`, `ISSUE_ASSIGNED`, `ISSUE_UNASSIGNED`, `WORK_STARTED`, `BRANCH_CREATED`, `PR_CREATED`, `CI_PASSED`, `CI_FAILED`, `REVIEW_ASSIGNED`, `REVIEW_PASSED`, `REVIEW_DISPUTED`, `HUMAN_GATE_TRIGGERED`, `HUMAN_GATE_REMINDER`, `MERGED`, `MERGE_GATE_WAITING`, `MERGE_GATE_MERGED`, `MERGE_GATE_TIMEOUT`, `MAIN_CI_STARTED`, `MAIN_CI_PASSED`, `MAIN_CI_FAILED`, `DEPLOYED_STAGING`, `DEPLOYED_PRODUCTION`, `SMOKE_PASSED`, `SMOKE_FAILED`, `BUILD_FAILED`, `VERIFIED`, `ISSUE_DONE`, `ESCALATED`, `MILESTONE_COMPLETE`, `DUPLICATE_PR_CLOSED`

**Direct PR path (no issue):** `PR_OPENED`, `PR_REVIEW_ASSIGNED`, `PR_REVIEW_APPROVED`, `PR_REVIEW_REJECTED`

**Agent lifecycle:** `AGENT_STARTED`, `HEARTBEAT`, `AGENT_STUCK`

**CLI sessions:** `CLI_STARTED`, `CLI_PROGRESS`, `CLI_COMPLETED`

**Observability (cost + tooling):** `TOKEN_USAGE`, `TOOL_USED`

**Memory MCP:** `MEMORY_WRITE`, `MEMORY_READ`, `MEMORY_HIT_USED`

## Known gaps (rig backlog)

Cold-start agents should see these so they don't re-discover what's already identified. Each gap links to `prior_art` — existing stubs, research, or PRs that have already touched it. When a gap is being worked, `linked_user_story` points to the user story; when closed, the entry is removed from `facts/backlog.yaml`.

### [observability] Cost tracking mostly deployed — LiteLLM proxy + external access are the remaining gaps

DO NOT propose "build a cost pipeline" — most of it is already shipped:

1. Data pipeline: TokenUsageProjection + CostProjection in rig-conductor consume TOKEN_USAGE + CLI_COMPLETED events. Read models live on Marten/Postgres.
2. API: GET /api/usage, /api/costs/issue, /api/costs/summary, /api/costs/daily on the rig-conductor cluster-internal URL (see BRAIN.md Published surfaces). Query by agent, repo, date range.
3. Dashboard: src/ConductorE.Api/Dashboard.html (~42 KB SPA, "Engineering Rig — Control Plane"). Served at / and /dashboard. Has a Costs tab driven by the /api/costs/* endpoints.
The remaining gaps:

a. LiteLLM proxy — not deployed. Blocks hard budget enforcement (agent ceiling kill-switch).
b. External access — /dashboard is cluster-internal. A human on a laptop can't view it without kubectl port-forward or a Cloudflare tunnel. Consider publishing a read-only projection.
c. Alerting — no Discord webhook on cost threshold breach yet.

Rough current spend: ~$5-15/day fleet-wide (order-of-magnitude only).

**Prior art:**

- rig-conductor cost endpoints and Dashboard.html — dashecorp/rig-conductor src/ConductorE.Api/
- TokenUsageProjection + CostProjection source: dashecorp/rig-conductor src/ConductorE.Api/Adapters/MartenProjections.cs
- TOKEN_USAGE + CLI_COMPLETED events defined and emitted — see /events.md
- Cost framework design: rig-gitops/docs/whitepaper/cost-framework.md (private)
- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- LiteLLM proxy not yet deployed — blocks hard budget enforcement

**Status:** mostly-deployed

### [observability] OTel collector deployed for rig-conductor only — agents not yet emitting

The OpenTelemetry Collector is "Partial": deployed for rig-conductor; agent pods (Dev-E, Review-E, iBuild-E) have not yet enabled native OTel via `CLAUDE_CODE_ENABLE_TELEMETRY=1`. Langfuse (self-hosted) and Grafana Cloud ingest are both "Planned". Full design in the observability whitepaper.
**Prior art:**

- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- Implementation status: whitepaper/implementation-status.md marks OTel Collector 'Partial', Langfuse 'Planned'
- rig-memory-mcp/events.js FUTURE comment: migrate to OTel GenAI spans
- Env var to enable native OTel: CLAUDE_CODE_ENABLE_TELEMETRY=1 + OTEL_EXPORTER_OTLP_ENDPOINT pointed at the in-cluster collector

**Status:** partial

### [docs-memory] Docs-memory drift lint not implemented

Weekly LLM-as-judge pass that promotes memory → docs (when importance ≥ 4 AND hit_count ≥ 5), flags stale research, and catches orphan docs. Designed but no runtime built.

**Prior art:**

- Full design in research/2026-04-18-docs-memory-drift-lint
- Parent user story: user-stories/2026-04-18-docs-memory-strategy
- Principles synthesis: research/2026-04-18-docs-vs-memory-principles

**Linked user story:** `user-stories/2026-04-18-docs-memory-strategy`

**Status:** open

### [docs-surfaces] Two docs surfaces with overlapping scope

docs.rig.dashecorp.com (MkDocs aggregation from rig-gitops/docs-site/) and research.rig.dashecorp.com (Starlight research hub from dashecorp/rig-docs). Both host rig docs; the boundaries are not formalised. Agents currently learn this empirically. Eventually unify or formalise the split.

**Prior art:**

- MkDocs site built by dashecorp/rig-gitops/scripts/build-docs.sh
- Starlight site defined in dashecorp/rig-docs/ (this repo)
- Docs tooling decision: proposals/2026-04-18-docs-tooling-decision (picked Starlight for the research hub; MkDocs kept for aggregation)

**Status:** open

### [agents] Review-E does not scan human-authored PRs

Review-E's cron filter is `author:app/dev-e-bot author:app/ibuild-e-bot`. PRs authored by humans (including operator PRs to rig repos) are invisible to her. Design decision pending — widen the filter or keep separation of concerns (human PRs = human review).
**Prior art:**

- HelmRelease: dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml (cron prompt line: `author:app/dev-e-bot author:app/ibuild-e-bot`)

**Status:** open

### [deployment] CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID not in rig-docs repo secrets

The deploy workflow gracefully skips deploy when the secrets are absent (notice only). Current deploys happen via direct `wrangler pages deploy` from the operator's laptop. Adding the secrets would enable per-PR preview deploys and automatic main-branch publishing.

**Prior art:**

- .github/workflows/deploy.yml has the has_cf_secrets guard
- Cloudflare Pages project already exists: rig-research (created via wrangler)

**Status:** open

### [agents] ATL-E retired, no active coordinator agent

ATL-E (Stig-Johnny/atl-agent) was previously deployed as a k3s CronJob on dell-stig-1 and handled handoff-stall Discord notifications. As of ~2026-03-26 it is no longer deployed (not present in Stig-Johnny/cluster-gitops/apps/). The repo still exists but is dormant. If an Epic needs a coordinator/team-lead role, decide whether to redeploy ATL-E or build a replacement.

**Prior art:**

- Dormant repo: https://github.com/Stig-Johnny/atl-agent (last push 2026-03-26)
- Stig-Johnny/cluster-gitops/apps/ — no atl-agent ArgoCD manifest

**Status:** open

### [networking] iBuild-E cannot reach rig-conductor cluster-internal API

Empirically verified on 2026-04-19: from iBuild-E (Mac Mini, Oslo, Tailscale IP 100.92.170.124), `curl http://rig-conductor-api.rig-conductor.svc.cluster.local:8080/api/health` fails with a DNS resolve timeout. The `*.svc.cluster.local` name only resolves inside the k3s cluster via CoreDNS; Tailscale connects the host but doesn't federate cluster DNS.

Impact: iBuild-E today cannot:

- Send TOKEN_USAGE / HEARTBEAT / CLI_COMPLETED events (`POST /api/events`)
- Pick up assignments (`GET /api/assignments/next`)
- Reach the cost Dashboard or `/api/costs/*`

iBuild-E is effectively disconnected from rig-conductor coordination.
She operates from GitHub issues + Discord channels directly.

Fix options (none implemented):

a. Tailscale subnet router on a cluster node → expose the `*.svc.cluster.local` range
b. Ingress / GCP load balancer for rig-conductor-api with mTLS
c. Cloudflare tunnel into the cluster
d. Accept the gap: iBuild-E never sees rig-conductor; she runs on GitHub-only flows

This has been a chronic "unknown" flagged by every cold-start test (v1 through v5). Now measured.

**Prior art:**

- facts/agents.yaml — iBuild-E: deployed_in: Mac Mini (Oslo, Tailscale 100.92.170.124)
- curl rig-conductor-api.rig-conductor.svc.cluster.local:8080 → DNS resolve timeout after 3s (measured 2026-04-19)
- Every cold-start test session-log flagged 'iBuild-E routing through cluster-internal services — latency unknown'. Not latency — reachability. Zero, not high.

**Status:** open

### [cleanup] Plane residue — uninstall GitHub App + archive workspace

Plane was retired 2026-04-18, but the makeplane GitHub App is still installed on the dashecorp org, and the Plane workspace at app.plane.so is still alive (token revoked). Manual UI action needed.

**Prior art:**

- Retraction proposal: proposals/2026-04-18-docs-tooling-decision (What retires section)
- Retirement commit: dashecorp/infra PR #74

**Status:** open

## Architecture at a glance

```mermaid
flowchart LR
  H[Human]

  subgraph Code["Code repos"]
    RD[rig-docs]
    RG[rig-gitops]
    RAR[rig-agent-runtime]
    CE_R[rig-conductor]
    RMM_R[rig-memory-mcp]
    RT[rig-tools]
    INF[infra]
  end

  subgraph Deployed["Deployed services + agents"]
    direction TB
    CE[rig-conductor svc]
    RMM[rig-memory-mcp svc]
    DE[Dev-E pod]
    RE[Review-E cron]
    IB[iBuild-E — Mac Mini]
  end

  subgraph Publish["Published surfaces"]
    direction TB
    S1[research.rig.dashecorp.com<br/>Astro Starlight]
    S2[docs.rig.dashecorp.com<br/>MkDocs aggregator]
    CFP[Cloudflare Pages]
  end

  %% Authoring + dispatch
  H -->|user-story issue| RD
  RD -->|dispatch| CE
  CE -->|assign issue| DE
  CE -->|assign PR review| RE
  CE -->|assign iOS build| IB
  DE -->|author PR| RD
  RD -->|PR opens| RE
  RE -->|approve / request changes| RD
  RD -->|merge| CFP
  CFP -->|publish| S1
  RG -->|docs aggregation| S2

  %% MCP + memory
  DE -->|tool use| RMM
  RE -->|tool use| RMM
  IB -->|tool use| RMM
  RMM_R -.implements.-> RMM

  %% Flux GitOps
  RG -->|Flux deploys| CE
  RG -->|Flux deploys| RMM
  RG -->|Flux deploys| DE
  RG -->|Flux deploys| RE

  %% Runtime image used by all agent deployments
  RAR -.image.-> DE
  RAR -.image.-> RE
  RAR -.image.-> IB
  CE_R -.image.-> CE

  %% Per-repo docs/ feeding into the MkDocs aggregator
  RG -.docs/.-> S2
  RAR -.docs/.-> S2
  CE_R -.docs/.-> S2
  RMM_R -.docs/.-> S2
  RT -.docs/.-> S2

  %% Infra — outside the loop but manages everything above
  INF -.OpenTofu.-> CFP
```

Legend: solid arrows are runtime flows (dispatch, tool calls, deploys). Dashed arrows are *source-of* relationships — "this repo's image powers that pod" or "this repo's `docs/` feeds that site". Every rig repo from facts/repos.yaml is represented.

## Conventions (rig-wide)

- **Docs are markdown with YAML frontmatter.** Required fields: `title`, `description`, `type`, `audience`, `created`/`updated`, `topic`. See [AGENTS.md](./AGENTS.md) in this repo.
- **Bidirectional linkage.** User story ↔ research ↔ proposal via `research_docs`, `proposal`, `user_story`, `source_research`. The RelatedDocs component renders the graph.
- **Diagrams as code.** Mermaid source inline in markdown. No PNG or SVG committed. Source preserved post-render.
- **Per-repo CLAUDE.md** auto-loads when Claude Code starts a session in that repo's cwd (Claude Code reads `CLAUDE.md`, not `AGENTS.md` — the cross-vendor standard is AGENTS.md, but the loader is CLAUDE.md). Same-repo local `@AGENTS.md` imports work; cross-repo `@owner/repo/file` does **not** fetch from GitHub (filesystem-only, max 5 hops).
- **Rig-wide agent instructions live in TWO places:** (1) each running agent's HelmRelease `character.personality` prompt (authoritative for Dev-E and Review-E in-cluster), (2) each repo's root `CLAUDE.md` (authoritative for interactive sessions). Both include the BRAIN.md fetch at session start.
- **Closes #N required** in PR bodies. Review-E blocks on this.
- **Memory MCP scope:** operational / ephemeral state only. Durable knowledge goes to rig-docs.

## Token-efficient cold start

When you pick up a new Epic with blank memory, the cheapest order of operations:

1. **Fetch this file** (`https://research.rig.dashecorp.com/BRAIN.md`, public, no auth) — ~27 KB.
2. Fetch [`/llms.txt`](https://research.rig.dashecorp.com/llms.txt) for the research hub topic index — ~2 KB.
3. Identify 1-3 relevant research / proposal docs, fetch raw — ~5-15 KB.
4. Fetch the target repo's `AGENTS.md` (each repo's is ≤8 KB) — ~5 KB.
5. `read_memories` from the Memory MCP scoped to repo + topic — ~2 KB.

Total cold-start context: ~35-45 KB. Leaves the rest of the budget for actual work.

## When this file needs updating

Manual fields that live in `facts/*.yaml` — update when the matching reality changes:

- `facts/repos.yaml` — **annotations only** (purpose, depends_on, used_by, agents_md, docs_surface). The repo list itself is auto-derived from `gh api` on every compile. Adding a new annotation, or updating an existing one, happens here.
- `facts/surfaces.yaml` — URLs, API endpoints, MCP tools. Update when an endpoint changes or a new surface is published.
- `facts/agents.yaml` — agent deployment instances.
Compile validates that each `manifest:` path exists on GitHub and warns on drift (this is how the ATL-E retirement was caught).
- `facts/flows.yaml` — documented rig processes. Update after retrospectives.
- `facts/schema.yaml` — mirrors the Zod schema in `src/content.config.ts`. Keep in sync manually when the schema changes.
- `facts/events.yaml` — rig-conductor event types. Keep in sync with `MapToEvent` in the C# source.
- `facts/backlog.yaml` — known gaps. Add when identified; remove when closed.

Then run `npm run brain`. CI (the build workflow) runs `brain:check` and fails on drift.

---

## https://research.rig.dashecorp.com/reference/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/reference.md

# Reference

> Durable reference material — AGENTS.md, facts, conventions.

import SectionIndex from '../../../components/SectionIndex.astro';

Durable reference material for humans and agents: the agent-facing AGENTS.md, compiled facts, and repo-wide conventions.

---

## https://research.rig.dashecorp.com/reference/agents/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/reference/agents.md

# AGENTS.md (rig-docs)

> Agent-facing reference for this repo — where user-stories/research/proposals live, how to author, what the CI schema requires.

Canonical source at `AGENTS.md` in the repo root. This is the wiki-navigable copy.

## Purpose

`dashecorp/rig-docs` is the static documentation site for the rig, built with [Astro Starlight](https://starlight.astro.build/) and deployed to Cloudflare Pages at https://research.rig.dashecorp.com.
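Every doc on this site carries the frontmatter contract described in the authoring rules. A minimal sketch — the values and the `type` vocabulary here are illustrative assumptions; the authoritative schema is the Zod schema in `src/content.config.ts`:

```yaml
---
# Illustrative values only — the enforced schema lives in src/content.config.ts
title: Docs vs memory drift lint
description: Weekly lint that flags memory entries which should become docs.
type: research          # assumed vocabulary; check the Zod schema for the real enum
audience: agents
created: 2026-04-18
updated: 2026-04-18
topic: docs-memory
---
```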
- **User stories** — markdown under `src/content/docs/user-stories/`, one file per story; frontmatter `github_issue:` points to the GitHub issue
- **Research** — markdown under `src/content/docs/research/`, one file per focused question
- **Proposals** — markdown under `src/content/docs/proposals/`, one file per decision
- **Reference** — durable material under `src/content/docs/reference/` (AGENTS.md, facts)
- **Diagrams** — Mermaid source inline in markdown. `.mmd` sources under `public/diagrams/` for direct linking. **No PNG or SVG committed.**

## Authoring rules

1. Every file has YAML frontmatter with at minimum `title`, `description`, `type`, `audience`, `created`, `updated`, `topic`. Schema enforced in `src/content.config.ts`.
2. User stories, research, and proposals are dated (`YYYY-MM-DD-slug.md`).
3. **Linkage is required, bidirectional, and rendered.** Declare in frontmatter:
   - User story: `research_docs: [paths]` + `proposal: path` + `github_issue: url`
   - Research / proposal: `user_story: path`
   - Proposal: `source_research: [paths]` for multi-research synthesis
   - Superseding: `supersedes: path` + `superseded_by: path`
4. A `RelatedDocs` panel renders above every page body showing the declared graph, plus backlinks from other pages. Agents consuming the rendered HTML (or /llms.txt) see the full relationship tree.
5. Diagrams: Mermaid source as fenced code blocks. The `remark-mermaid` plugin wraps them in a `
` containing both the rendered SVG (client-side) AND the source inside `
` — so agents reading post-JS HTML keep access to the source.

## Flow

```mermaid
flowchart LR
  H["Human (mobile/desktop)"] -->|GitHub issue| I["user-stories/YYYY-MM-DD-slug.md
(github_issue: url)"]
  I -->|research_docs:| R1["research/YYYY-MM-DD-slug.md"]
  I -->|research_docs:| R2["research/YYYY-MM-DD-slug.md"]
  R1 -->|user_story:| I
  R2 -->|user_story:| I
  I -->|proposal:| P["proposals/YYYY-MM-DD-decision.md"]
  R1 -.->|source_research:| P
  R2 -.->|source_research:| P
  P -->|Review-E + human gate| M["merged to main"]
  M -->|Cloudflare Pages deploy| S["research.rig.dashecorp.com"]
```

## Local dev

```bash
npm install
npm run dev
```

## Deploy

Merges to `main` publish via `.github/workflows/deploy.yml`. The deploy job is skipped when the `CLOUDFLARE_API_TOKEN` / `CLOUDFLARE_ACCOUNT_ID` secrets are missing.

---

## https://research.rig.dashecorp.com/reference/facts/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/reference/facts.md

# Facts

> Source-of-truth YAML for compiled AGENTS.md. CI fails if AGENTS.md drifts from these files.

YAML files that are the canonical source for `AGENTS.md`. `scripts/compile-agents-md.sh` reads these and produces AGENTS.md; `--check` mode detects drift in CI.

**Pattern:** edit YAML → run the compile script → commit both. CI fails if AGENTS.md is out of sync.

## Files

- `stack.yaml` — runtime, package manager, linters (when this repo has its own runtime; currently content-only)
- `conventions.yaml` — commit format, branch names, MCP servers
- `pitfalls.yaml` — numbered anti-patterns, max 12
- `schema.json` — JSON Schema draft-07 validating all three

## Status

Not yet populated — this repo is docs/content only (no runtime to describe). The compiled AGENTS.md pattern lives in `rig-gitops/`, where it governs all agents reading the rig-wide contract. This directory is a placeholder for rig-docs-specific conventions if they emerge.

---

## https://research.rig.dashecorp.com/user-stories/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories.md

# User stories

> Stories captured from human stakeholders. Each links to its research and (once approved) its proposal.
import SectionIndex from '../../../components/SectionIndex.astro';

User stories captured from human stakeholders. Each story has a GitHub issue (`github_issue:`) and lists its research docs and proposal via frontmatter. Open any story to see the full Related panel.

---

## https://research.rig.dashecorp.com/user-stories/2026-04-18-docs-memory-strategy/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-18-docs-memory-strategy.md

# User story: how should the rig handle docs vs memory (2026-04-18)

> Parent user story tying together the three research docs (DASHE-11/12/13) for the docs/memory strategy refinement.
> Migrated from Plane work item DASHE-14 on 2026-04-18. The original Plane workspace is retired; this is the canonical home.

**As** a rig operator
**I want** a clear, enforced separation between canonical docs (the git source of truth) and operational memory (MCP-backed Postgres), with explicit rules for agents on what goes where
**So that** I can trust what's authoritative when reading on mobile, and agents consistently write knowledge to the right layer without drift.

**Acceptance criteria:**

- Rig-docs and the memory MCP have non-overlapping responsibilities documented in AGENTS.md
- Agents follow the file-back rule: durable learnings go to a rig-docs PR; ephemeral operational state goes to the memory MCP
- A lint operation detects drift (memory that should be docs, docs that duplicate memory)
- Mobile-readable summary of the split on research.rig.dashecorp.com

**Priority:** medium. Not blocking MVP but compounds over time.

### 🔁 Flow (diagram-as-code)

*Mermaid source embedded below. If your viewer renders Mermaid live, you see the diagram. If not, you see the source — which IS the artifact (no PNG, no SVG, no drift).*

```mermaid
flowchart LR
  H["Human (mobile)"] -- "writes user story
(GitHub mobile app)" --> I["GitHub issue in rig-docs"]
  I -- "picked up" --> A["Agent (Dev-E / Review-E)"]
  A -- "read_memories BEFORE work" --> M["Memory MCP (Postgres)"]
  A -- "authors PR
(markdown under src/content/docs/)" --> D["rig-docs"]
  D -- "Review-E gate" --> D2["merged to main"]
  D2 -- "Cloudflare Pages deploy" --> S["research.rig.dashecorp.com"]
  A -- "write_memory AFTER merge" --> M
  L["Lint (weekly)"]
  M -. "hit_count>=5, importance>=4" .-> L
  L -. "promote to docs" .-> D
  L -. "archive if 30d unused" .-> M
```

> **Updated 2026-04-18 (same day):** The original diagram had a `Plane["Plane work item"]` node between the Human and the GitHub issue. The Plane intake was retired the same day; the diagram now shows direct GitHub-issue authoring via the GitHub mobile app. Rationale in `proposals/2026-04-18-docs-tooling-decision.md`.

Render live: mermaid.live (paste source) · source in rig-docs (canonical)

---

## https://research.rig.dashecorp.com/user-stories/2026-04-19-memory-to-docs-promotion/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-19-memory-to-docs-promotion.md

# User story: memory-to-docs promotion pipeline

> Memory accumulates agent experience; the brain holds canonical truth. Close the loop: a weekly lint promotes high-value memories into docs PRs so the brain gets smarter over time.

## User story

**As** a rig operator
**I want** the rig to automatically surface high-value memories for promotion into the brain — a weekly lint that proposes docs PRs when a memory has crossed importance + hit-count thresholds
**So that** durable learnings compound over time: what rig-dev discovered last month becomes what every agent knows next week, without a human having to catch it.

## Context

The [brain/memory whitepaper §1](/whitepapers/2026-04-19-brain-and-memory/#1--why-two-systems) argues that the promotion pipeline is what makes the architecture compound. Without it, memory grows noisy and the brain stays static; both degrade. The design already exists in [research/2026-04-18-docs-memory-drift-lint](/research/2026-04-18-docs-memory-drift-lint/). What's missing is the runtime.

## Acceptance criteria

1.
**Scheduled lint** runs weekly — a new agent variant (or a shared rig-reviewer-like role) reads the Memory MCP and applies the promotion rules.
2. **Promotion rule** (starting point): `importance >= 4 AND hit_count >= 5 AND NOT docs_cover_already(content)`. The last clause is checked via semantic similarity against the current docs.
3. **Output** is a docs PR in the relevant repo (rig-docs or a project brain) — not a silent memory edit. A human reviews and merges.
4. **Archival rule** runs in the same pass: memories with `hit_count = 0 AND age > 30d` are compacted/archived. The brain stays lean.
5. **Metrics** exposed via rig-conductor: memories written/week, promoted/week, archived/week, and time-to-promotion for high-value ones.
6. **Memory pollution guard**: promotion requires that at least 3 distinct agents have used the memory. Single-agent echo chambers don't qualify.

## Blocking dependencies

- [ ] Docs-to-memory embedding pipeline (so the similarity check has something to compare against)
- [ ] LiteLLM proxy (so the lint agent is budgeted separately from work agents)
- [ ] Decision on which role runs the lint (a dedicated rig-lint? a rig-reviewer variant? cron + stateless script?)

## Out of scope

- Auto-merging promoted docs without human review (promotion always goes via PR)
- Retroactive promotion of pre-implementation memories (start fresh after rollout)
- Memory scopes other than `repo` and `rig` (session scope never gets promoted)

## Priority

**Medium.** Not blocking current operation, but directly addresses the "memory vs docs" design gap that was a founding concern of the rig (see the [LLM Wiki analysis](/research/2026-04-18-llm-wiki-pattern-analysis/)).

---

## https://research.rig.dashecorp.com/user-stories/2026-04-19-onboard-tablez-to-rig/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-19-onboard-tablez-to-rig.md

# User story: onboard Tablez to the rig

> Tablez leadership wants dedicated autonomous engineering capacity on its product backlog.
First onboarded project against the multi-project brain design.

## User story

**As** Tablez leadership (CEO / CCO / CPO)
**I want** dedicated autonomous engineering capacity for the Tablez product backlog — using the shared rig, with a per-Tablez project brain and issue dispatch on Tablez repos
**So that** Tablez ships more product per quarter without proportional headcount growth, retains full control over product direction and release decisions, and inherits the rig's measured engineering-cost model (~$0.62 per small task, 20 min issue-to-merge for simple work, verified on Dashecorp work today).

## Context

See the [Brain and Memory whitepaper](/whitepapers/2026-04-19-brain-and-memory/) for the architecture proposal.

**What stays with Tablez:** product direction, architecture decisions, release judgment, customer communication, issue-quality discipline.
**What the rig provides:** dispatch, implementation, review, merge, cost/quality observability.

## Acceptance criteria

1. ✅ **`tablez-docs` repo exists**, scaffolded from the dashecorp-docs template (`facts/*.yaml`, `compile-brain.mjs`, Starlight site, deploy workflows) — shipped at [dashecorp/tablez-docs](https://github.com/dashecorp/tablez-docs).
2. 🟡 **`tablez-docs.pages.dev/BRAIN.md`** — the brain is served (2.5 KB scaffold) but **not yet populated** with Tablez repos / stack / deploy targets. Needs Tablez engineering to fill `facts/repos.yaml`.
3. 🟡 **rig-conductor dispatcher recognises Tablez-org repos** — the character-prompt side of routing landed ([rig-gitops#110](https://github.com/dashecorp/rig-gitops/pull/110) — `tablez-dev/*` now maps to `tablez-docs.pages.dev/BRAIN.md` in the dev-e + review-e prompts). The rig-conductor side (server-side assignment annotation) is still the same deferred AC #5 from [#102](/user-stories/2026-04-19-two-layer-brain-multi-project/).
4.
⏳ **Tablez engineers can file `agent-ready` issues on Tablez repos → merged PR within minutes** — blocked on: the dev-e + review-e GitHub Apps must be installed on the `tablez-dev` org (a dashboard action on the Tablez side), and the conductor dispatch scope must be expanded to include `tablez-dev`.
5. ⏳ **Per-Tablez cost attribution in the Dashboard** — depends on dispatch routing (#3). When agents run on Tablez work, `TOKEN_USAGE` events need a `project=tablez` dimension.
6. ⏳ **Weekly 30-min brain-review cadence** — a process decision for when Tablez engineering starts using the rig.

## What shipped today (2026-04-19)

- [github.com/dashecorp/tablez-docs](https://github.com/dashecorp/tablez-docs) — scaffolded, first deploy live
- [tablez-docs.pages.dev](https://tablez-docs.pages.dev) — brain served, empty repos list, guidance comments for Tablez engineering
- CF Pages project + all secrets (CF token, account ID, issue-read token) configured
- `tablez-dev/*` → `tablez-docs.pages.dev/BRAIN.md` lookup active in dev-e + review-e character prompts (rig-gitops#110)

## What's still needed (Tablez-side)

| Step | Owner | Blocking |
|---|---|---|
| Populate `facts/repos.yaml` with Tablez product repos | Tablez engineering | AC 2 |
| Install `dev-e` GitHub App on `tablez-dev` org | Tablez org admin | AC 4 |
| Install `review-e` GitHub App on `tablez-dev` org | Tablez org admin | AC 4 |
| Expand rig-conductor's PR-watch scope to include `tablez-dev` | Rig maintainer | AC 4 |
| Add `project` dimension to TOKEN_USAGE events for cost attribution | Rig maintainer | AC 5 |
| Schedule the first 30-min brain review | Tablez eng lead | AC 6 |

## First 90 days

| Phase | Days | Goal |
|---|---|---|
| Setup | 1-7 | Scaffold repo, populate facts, App installs, deploy |
| Graduated | 8-30 | Small issues only; calibrate label discipline |
| Scaled | 31-90 | Include feature work; expand to 50-80% of mechanical engineering |

Day 0 = the day Tablez engineering installs the GitHub Apps and populates `facts/repos.yaml`.
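Populating `facts/repos.yaml` is the first Tablez-side step. A hypothetical entry, using the annotation keys the rig brain documents for its own `facts/repos.yaml` (purpose, depends_on, used_by, agents_md, docs_surface) — the repo names and the exact YAML shape are placeholder assumptions for Tablez engineering to replace:

```yaml
# Hypothetical entry — repo names and file shape are for Tablez engineering to define
tablez-api:
  purpose: Core booking and table-management API
  depends_on: [tablez-shared]
  used_by: [tablez-ios]
  agents_md: AGENTS.md
  docs_surface: docs/
```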
## Priority

**High** once Tablez is ready to start. The rig-side infrastructure is now in place; the remaining work is Tablez-side bootstrap.

---

## https://research.rig.dashecorp.com/user-stories/2026-04-19-two-layer-brain-multi-project/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-19-two-layer-brain-multi-project.md

# User story: two-layer brain for multi-project support

> Split the brain into an invariant rig brain (how agents work) and per-project brains (what each project is) so the same agents serve multiple product portfolios (Dashecorp iOS, Tablez, future).

## User story

**As** the rig maintainer
**I want** the brain split into two layers — an invariant rig brain (how agents work) and a per-project brain (what each project is)
**So that** the same agent pods can serve Dashecorp's iOS portfolio, Tablez, and future projects without duplicating agents, while each project owns its own truth about itself.

## Context

Scoping note from delivery: the original framing was "extract Dashecorp-specific content out of the rig brain". Once I started the work I found the rig brain already contained only rig-platform facts (rig-gitops, rig-agent-runtime, rig-conductor, etc.) — those **are** the rig. The Dashecorp **product** repos (cutie, fast-e, drink-e, count-e, star-rewards, nutri-e, ios-link-sdk) weren't in any brain. So this story is really **create project brains from scratch**, not extract-and-split. The rig brain stayed as-is.

See the [whitepaper](/whitepapers/2026-04-19-brain-and-memory/#5--scaling-to-many-projects) for the architecture.

## Acceptance criteria

1. ✅ **Rig brain** at `research.rig.dashecorp.com/BRAIN.md` contains only invariant content. Already true before this story — no Dashecorp-product content was ever in there.
2. ✅ **Dashecorp project brain** at `dashecorp-docs.pages.dev/BRAIN.md` — shipped.
Populated with the 7 iOS app repos, stack (SwiftUI + Xcode Cloud + RevenueCat), deploy targets, and portfolio conventions (opaque RGB icons, release sequencing).
3. ✅ **Template** — `dashecorp-docs` itself is the template. Copy-strip-populate when spinning up a new project brain (as was done for dashecorp-docs: clone the rig-docs scaffold, strip the rig-specific machinery, replace `compile-brain.mjs` with a minimal project-brain version).
4. ✅ **Agent character prompts** — dev-e-node + review-e now fetch both brains at session start when the assignment is on a repo outside the `dashecorp` org. Client-side routing via the prompt, no rig-conductor change (see the AC 5 note).
5. ⏳ **rig-conductor dispatcher annotates assignments** with `projectBrainUrl` — **deferred** until we onboard a second tenant (Tablez, story #103). Client-side prompt routing works for the single-tenant case today; centralised routing becomes useful for per-project cost attribution + dispatch scope expansion.
6. ✅ Documentation in the rig brain explains the two-layer pattern — covered by the brain-and-memory whitepaper §5.

## Design decisions made

- **URL convention:** `<project>-docs.pages.dev/BRAIN.md`. `dashecorp-docs.pages.dev` shipped under that pattern.
- **Ownership:** each project brain is owned by the project's maintainer team. For now both live under the `dashecorp` org; a future split moves per-tenant.
- **Duplicate-fact resolution:** the project brain wins on project-specific facts; the rig brain wins on rig-wide ones; agents consult both, and the prompt makes the precedence explicit.
## What shipped

| Component | Where | PR/commit |
|---|---|---|
| `dashecorp-docs` repo scaffolded | github.com/dashecorp/dashecorp-docs | initial commit, manual push |
| Project brain compiled + served | https://dashecorp-docs.pages.dev/BRAIN.md | first deploy via wrangler |
| Dual-brain fetch in dev-e + review-e prompts | dashecorp/rig-gitops | #106 |
| ConfigMap verified on live pod | dev-e-rig-agent-runtime-0 | live check 2026-04-19 |

## Still open (follow-up)

- rig-conductor-side routing (AC 5) — file as a separate story when Tablez onboarding starts, since Tablez is the first real second tenant that exercises it.
- Dispatch scope expansion: rig-conductor currently monitors `org:dashecorp`. The prompt update enables agents to behave correctly when routed to a Stig-Johnny or tablez-dev repo, but the dispatch itself needs to learn about those orgs — also a #103 concern.

## Priority

**High.** Unblocks Tablez onboarding (#103). Core work is done; full autonomy for multi-tenant is deferred to #103's scope.

---

## https://research.rig.dashecorp.com/user-stories/2026-04-20-agent-observability/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-20-agent-observability.md

# User story: agent observability — one env var + a trace store

> Turn on CLAUDE_CODE_ENABLE_TELEMETRY=1 in agent pods and route OTLP spans to a dual-backend stack: Grafana Cloud (startup credit) for infra + trace waterfalls, Langfuse Cloud (startup discount) for LLM-specific UX. Phoenix stays as the dev-inner-loop store. OpenObserve self-host is the documented fallback if the Grafana credit is denied.
## User story

**As** the rig operator
**I want** every agent LLM call emitted as an OpenTelemetry GenAI span and stored in an LLM-aware trace store — without adding significant RAM pressure to the existing 8 GB VM
**So that** a healthy run and an unhealthy run are distinguishable at a glance (tokens, cost, latency, prompt, response), every other priority on the roadmap has baselines to gate on, and the trace-store decision does not lock us out of scaling later.

## Context

See the [whats-next whitepaper §Priority 2](/whitepapers/2026-04-20-whats-next/#priority-2--agent-observability), the source [`observability.md`](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/observability.md) whitepaper, and two research docs in sequence:

1. [2026-04-20 OTel-native LLM observability options](/research/2026-04-20-otel-llm-observability-options/) — structural comparison of 11 candidates (footprint, LLM UX, lock-in). **Superseded** on the pricing/recommendation question.
2. [2026-04-21 Startup programs + storage economics](/research/2026-04-21-otel-startup-programs-storage-economics/) — factors in **Grafana Cloud for Startups ($100k / 12 mo)** and **Langfuse's 50% first-year discount**, both of which Invotek AS qualifies for. Flips the production-backend recommendation.

The OTel Collector is already deployed for rig-conductor infrastructure traces. Two gaps remain: (a) agent pods don't emit OTel; (b) there is no LLM-aware trace store to render what they would emit.

**Revised vendor path (2026-04-21):** Phoenix remains for the **dev inner loop** (local latency matters during development). The production backend becomes **Grafana Cloud + Langfuse**: Grafana's startup credit makes the full LGTM stack effectively free for 12 months ($22–$47/mo list at our 1.5M–15M spans/mo workload); Langfuse's startup discount halves the LLM-specific UX cost. **OpenObserve self-hosted on GKE** is the documented fallback if the Grafana credit is denied (~$30/mo flat, zero lock-in).
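The dual-export above could look roughly like this in OTel Collector config. This is a sketch only: the endpoints, auth headers, and env-var names are assumptions to verify against the Langfuse and Grafana Cloud docs, and for simplicity it exports all traces to both backends (a real config would split GenAI vs infra spans with a filter processor or routing connector):

```yaml
# Sketch — verify endpoints and auth against the Langfuse / Grafana Cloud OTLP docs
exporters:
  otlphttp/langfuse:
    endpoint: "${env:LANGFUSE_OTLP_ENDPOINT}"     # assumed env var
    headers:
      Authorization: "Basic ${env:LANGFUSE_BASIC_AUTH}"
  otlphttp/grafana:
    endpoint: "${env:GRAFANA_CLOUD_OTLP_ENDPOINT}"
    headers:
      Authorization: "Basic ${env:GRAFANA_CLOUD_TOKEN}"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/langfuse, otlphttp/grafana]
```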
## Acceptance criteria

1. ⏳ **Apply for credits** — Grafana Cloud for Startups and the Langfuse early-stage discount. Both approvals take 1–2 weeks.
2. ⏳ **`CLAUDE_CODE_ENABLE_TELEMETRY=1` set in agent pods** — one `rig-agent-helmrelease.yaml` edit per agent (dev-e, review-e, macos-e). No code change. Verifiable via `kubectl exec -- env | grep TELEMETRY`.
3. ⏳ **OTel Collector dual-export** — GenAI spans → Langfuse Cloud (Hobby free until the discount lands); infra + full OTel → Grafana Cloud (free 50 GB / 14-day until the credit lands).
4. ⏳ **LLM spans visible in Langfuse for all three agents** — token counts, model, latency, prompt/response, session tree. Spot-check 10 random tasks.
5. ⏳ **Infra traces visible in Grafana Cloud** — service topology, request spans, latency histograms.
6. ⏳ **Phoenix kept for the dev inner loop** — a local `docker compose` instance developers run while iterating on prompts or agent code; the OTLP endpoint is configurable via env var.
7. ⏳ **Credits-granted vs fallback ADR** — a short tool-choices ADR. If either credit is denied within 30 days, the ADR specifies the fallback path (Grafana credit denied → OpenObserve self-host; Langfuse discount denied → stay on Cloud Hobby until volume forces a choice).

## What it unblocks

- **Priority 3 (hard cost ceiling)** — cost dashboards need span data; LiteLLM proxy attribution dovetails into per-task cost tracking via `gen_ai.usage.*` attributes.
- **Priority 4 (nightly quality gate)** — regression metrics need baselines; baselines need a trace store.
- **Tier promotion from T1 to T2** — the decision requires a quality signal with more precision than "did CI pass."
- **Debuggability of sick runs** — today a bad run is visible only in Discord threads and event-store projections.
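AC 2's per-agent edit might look like this inside `rig-agent-helmrelease.yaml`. Both the chart's env-var shape and the Collector service address are assumptions; the authoritative layout is whatever the rig-agent chart's values schema defines:

```yaml
# Illustrative — the actual key path depends on the rig-agent chart's values schema
spec:
  values:
    env:
      CLAUDE_CODE_ENABLE_TELEMETRY: "1"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.observability.svc.cluster.local:4318"
```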
## Out of scope

- LiteLLM proxy (Priority 3 — separate user story)
- Langfuse self-hosted migration (only relevant if the Cloud discount is denied AND we outgrow Hobby; covered in the AC 7 ADR)
- Honeycomb-style burn-rate alerts (Phase 5 per implementation-status)
- Agent quality evaluation (Priority 4)

## Priority

**High.** Second in sequence after the Priority 1 safety foundation. Visibility is a prerequisite for every measurement-driven decision that follows.

## Estimated effort

- AC 1 (apply for credits): ~30 min of paperwork, 1–2 week approval wait.
- AC 2 (`CLAUDE_CODE_ENABLE_TELEMETRY=1` + `OTEL_EXPORTER_OTLP_ENDPOINT`): ~1 day. HelmRelease env-var edits across three agents + one SealedSecret for the endpoint auth.
- AC 3 (Collector dual-export): ~2 days. Config change to the existing Collector.
- AC 4 (Langfuse spans): ~1 day. Manual spot-check of 10 tasks.
- AC 5 (Grafana infra traces): ~1 day. Free-tier signup + endpoint config.
- AC 6 (Phoenix dev-loop `docker compose`): ~1 day. One-off dev-setup doc; no cluster work.
- AC 7 (ADR): ~1 day.

Total: **~1 week of focused work** (plus a 1–2 week credit-approval wait that doesn't block AC 2–5 on free tiers).

## Caveats

- **Phoenix OSS has no auth.** If it is ever exposed, a Tailscale ACL is mandatory. Dev-loop use is `docker compose` on the engineer's machine — no public surface.
- **Phoenix is Elastic-2.0 licensed** — source-available, self-hosting permitted for internal use, but not for offering a competing Phoenix-as-a-service. A non-issue for us.
- **The Grafana Cloud credit is not guaranteed** — Invotek AS qualifies on the plain criteria (<$10M funded, <25 FTE) but approval takes 1–2 weeks. The AC 7 ADR covers the fallback path (OpenObserve self-host).
- **The Langfuse Hobby free tier** is 50k observations/mo — enough for today's volume but not for Tablez onboarding. The 50% startup discount makes Pro $99/mo for the first year, so the upgrade path is bounded.
---

## https://research.rig.dashecorp.com/user-stories/2026-04-20-hard-cost-ceiling/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-20-hard-cost-ceiling.md

# User story: hard cost ceiling — LiteLLM proxy with per-agent virtual keys

> Layer-3 enforcement that returns 429 at the proxy before the request reaches the LLM provider. No trust-based limiter; no override flag. A compromised or looping agent cannot exceed its budget because the proxy fails the call.

## User story

**As** the rig operator
**I want** an enforced dollar ceiling on every agent's hourly spend — at the proxy layer, not the agent layer — with per-agent virtual keys and hard 429s before the request reaches the LLM provider
**So that** one looping agent cannot burn the shared rate-limit budget for every other agent, and a new tenant (Tablez today, others later) can be onboarded with a hard per-key cap as a config row, not an engineering project.

## Context

See [whats-next whitepaper §Priority 3](/whitepapers/2026-04-20-whats-next/#priority-3--hard-cost-ceiling) and the source [`cost-framework.md`](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/cost-framework.md) whitepaper.

Today only the cheapest, lowest-guarantee layer exists: a `TokenUsageProjection` aggregates per-agent × per-repo cost after the fact. Layer 3 — the **proxy-level hard ceiling** — is unbuilt. Without it, cost control is trust-based ("don't let agents loop"), which is exactly the failure mode a shared infrastructure cannot accept.

**Honest caveat from the source whitepaper**: [LiteLLM issue #12905](https://github.com/BerriAI/litellm/issues/12905) shows user-level budgets are not enforced inside team configurations. The proxy is the **primary** defense, not an absolute one. Every LiteLLM upgrade needs a synthetic budget-overrun test.

## Acceptance criteria

1. ⏳ **LiteLLM proxy deployed** to the rig cluster. Helm release under `apps/litellm/`.
Postgres-backed config (not in-memory) so virtual keys and budgets survive restarts.
2. ⏳ **Per-agent virtual keys** — one key per agent (dev-e, review-e, macos-e) plus one per onboarded tenant project (Dashecorp, Tablez). Each key has a hard daily budget and an hourly rate limit.
3. ⏳ **Anthropic as the default backend**, with OpenAI and Gemini configured as fallbacks via LiteLLM `fallback_models` (deferred per cost-framework.md — enable only when a multi-provider config exists).
4. ⏳ **Agent pods use virtual keys, not the raw Anthropic key** — the Anthropic account key exists only in the LiteLLM config. Agent pods get their virtual key via SealedSecret.
5. ⏳ **Synthetic budget-overrun test** — a dedicated CronJob deliberately exceeds a test key's daily cap. CI asserts that: (a) the 429 fires, (b) an event is emitted to rig-conductor, (c) the provider is not billed past the cap. Run on every LiteLLM upgrade.
6. ⏳ **Weekly budget-review projection** — `TokenUsageProjection` extended with a `virtual_key` dimension; per-key costs visible in the rig-conductor dashboard.
7. ⏳ **Pre-flight cost prediction** (nice-to-have, Phase 2 of cost-framework.md) — a cheap model (Haiku) estimates tokens before dispatch; abort if the estimated cost exceeds the task's budget fraction.

## What it unblocks

- **Multi-tenant onboarding.** A new tenant becomes a LiteLLM virtual-key config row with `daily_budget: 50` and `models: [sonnet-4.6, haiku-4.5]`. No per-tenant engineering.
- **Priority 1 Cilium egress policy** (AC 5 of safety-foundation) can narrow its allowlist to the LiteLLM proxy service only — no direct LLM egress from agent pods. Defense in depth.
- **Cost attribution by tenant** — `TokenUsage` events carry a `virtual_key` / `project` dimension; billing becomes queryable rather than reconstructed.
- **Circuit breaker on 529 storms** — the proxy is the natural place to implement "pause dispatch for 5 min after 3 consecutive 529s" without per-agent code.
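The "config row" idea above, sketched as the body of a LiteLLM `/key/generate` request. The parameter names (`key_alias`, `max_budget`, `budget_duration`, `rpm_limit`, `models`) follow LiteLLM's key API, but treat the exact fields and values as assumptions to verify against the LiteLLM docs for the deployed version:

```yaml
# Hypothetical per-tenant key — verify field names against LiteLLM's /key/generate docs
key_alias: tablez
max_budget: 50            # hard daily ceiling in USD; proxy returns 429 past this
budget_duration: 1d
rpm_limit: 60             # hourly rate limit expressed per-minute
models: [sonnet-4.6, haiku-4.5]
```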
## Out of scope

- Cross-provider fallback routing (deferred per cost-framework.md; adopt when multi-provider is actually configured)
- Prompt-caching verification (Claude Code does this automatically; verify via the cost dashboard, not a code change)
- flagd/OpenFeature feature flags (YAGNI — Kustomize + env vars cover today's needs)

## Priority

**High.** Sequenced after Priority 2 because cost dashboards need the trace store; before Priority 4 because nightly quality-gate runs need a bounded budget to be safe.

## Estimated effort

- AC 1 (proxy deploy): ~5 days. HelmRelease + Postgres + SealedSecret for the master Anthropic key.
- AC 2 (virtual keys + caps): ~3 days. LiteLLM admin UI or `/key/generate` API; document the key-rotation runbook.
- AC 3 (Anthropic default, fallbacks deferred): ~1 day. Config only.
- AC 4 (agents use virtual keys): ~2 days. SealedSecret rotation per agent; helm-chart plumbing.
- AC 5 (synthetic overrun test): ~2 days. CronJob + assertion + alert. **Load-bearing AC** — the `#12905` caveat requires this to validate reality.
- AC 6 (budget-review projection): ~3 days. Marten projection + dashboard panel.
- AC 7 (pre-flight prediction, nice-to-have): ~5 days. Defer if time-pressured.

Total: ~2.5 weeks focused, ~3.5 weeks with AC 7.

---

## https://research.rig.dashecorp.com/user-stories/2026-04-20-nightly-quality-gate/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-20-nightly-quality-gate.md

# User story: nightly quality gate — golden suite as regression blocker

> A nightly harness runs the rig against a fixed set of 10 internal tasks + a weekly SWE-bench Pro subset + per-incident regression cases. Fails the pipeline on >10% metric regression. ~$3–8/night (~$1.1–2.9k/year). Prerequisite for autonomy tier promotion past T1.
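The >10% rule in the summary above can be sketched as a one-line gate. The threshold matches the story; the sample numbers are illustrative, and a real implementation would read the rolling 7-day baseline from the trace store rather than hard-coding it:

```shell
# Toy version of the regression gate: compare tonight's pass rate to the baseline
baseline=0.90   # rolling 7-day pass rate (illustrative)
tonight=0.78    # tonight's golden-suite pass rate (illustrative)
if awk -v b="$baseline" -v t="$tonight" 'BEGIN { exit !((b - t) / b > 0.10) }'; then
  echo "REGRESSION: fail pipeline"   # (0.90 - 0.78) / 0.90 is ~13%, over the 10% threshold
else
  echo "within tolerance"
fi
```

The same `awk` predicate generalises to the other tracked metrics (median cost, median latency) by flipping the sign convention for metrics where bigger is worse.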
## User story

**As** the rig operator
**I want** a nightly evaluation harness that runs the current rig against a fixed task set and fails the pipeline on >10% regression on any tracked metric
**So that** prompt changes, dependency bumps, and model upgrades cannot silently degrade quality; autonomy-tier promotion (T1 → T2 → T3) has a defensible data source; and the headline claim "~20 min issue→merge, ~$0.62/task" is an invariant with guard rails, not a snapshot.

## Context

See [whats-next whitepaper §Priority 4](/whitepapers/2026-04-20-whats-next/#priority-4--nightly-quality-gate) and the source [`quality-and-evaluation.md`](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/quality-and-evaluation.md) whitepaper.

Today the rig has zero automated quality gates. Every change is a hope — "probably fine, CI was green." The source whitepaper is explicit: **"the agents are doing well" is not evidence; a dashboard line is.** Implementation-status tracks 7 quality-and-evaluation capabilities, **0 deployed.**

The golden suite is not SWE-bench; it's *our* tasks — the ones the rig actually runs in production, fixtured and replayable. SWE-bench Pro is a separate, weekly, broader check for trend lines. Per-incident regression cases grow the suite organically — every bug the rig misses becomes a new test.

## Acceptance criteria

1. ⏳ **Golden suite of 10 internal tasks** — seeded from real merged issues across rig-conductor, rig-docs, rig-gitops. Each task has a fixture (starting SHA, issue body, expected diff scope) and a grading rubric (compiles clean, tests pass, acceptance-criteria checklist).
2. ⏳ **Nightly harness deployed as a CronJob** — runs at 02:00 UTC (low contention with dispatch). Spins up ephemeral agent runtimes against the golden suite; posts results to Phoenix (via Priority 2) and the rig-conductor event store.
3. ⏳ **Grafana dashboard** for the 10-task trend — pass rate, median latency, median cost, median turns.
Available for screenshot in weekly reviews. 4. ⏳ **Regression threshold** — `> 10%` degradation on any of {pass rate, median cost, median latency} versus rolling 7-day baseline **fails the pipeline** and alerts to Discord. 5. ⏳ **Per-incident regression cases** — every production bug the rig misses gets a row in `regression-cases/` + a new task in the nightly suite within the same PR that fixes it. Discipline, not automation. 6. ⏳ **Weekly SWE-bench Pro subset** — ~20 tasks, runs Saturday 02:00 UTC; budget ~$20–40/week; trend line only, does not block the pipeline. Dropped first if budget tightens. 7. ⏳ **Cost dashboard row** — the nightly run's spend is itself tracked via Priority 3 virtual keys; `nightly-eval` becomes a rig user for attribution. ## What it unblocks - **Autonomy tier promotion from T1 to T2.** Per trust-model.md, T2 requires "20 successful runs, zero rollbacks, quality metrics within tolerance." That sentence is meaningless without a fixed definition of "successful run" — the nightly suite **is** that definition. - **Per-PR regression gate** (follow-up Phase 2) can be hung on the same scaffold: PRs touching prompts or brain content trigger a subset of the nightly suite before merge. - **Property-based testing on labeled changes** (Phase 2) uses the same harness plumbing. - **LLM-as-judge sampling** (10% T0, 100% T2) is a second consumer of the nightly pipeline. ## Out of scope - Quarterly LiveCodeBench (deferred per quality-and-evaluation.md — drop first if budget tightens) - Inspect AI adoption (deferred — emerging; re-evaluate in Era 2) - DORA metrics adapted to agents (Phase 2 follow-up; same pipeline, different consumer) ## Priority **Medium-high.** Sequenced after Priority 3 so the nightly run's spend is itself bounded by a virtual key. Can start the golden-suite curation in parallel with Priorities 1–3; the harness wiring waits. ## Estimated effort - AC 1 (golden suite seeding — 10 tasks): ~5 days. 
Task selection, fixture capture, rubric writing. Hardest step — needs real engineering taste. - AC 2 (harness CronJob): ~5 days. Kustomize manifest + agent-runtime invocation loop + Phoenix span emission. - AC 3 (Grafana dashboard): ~2 days. TraceQL over Phoenix/Langfuse span attributes. - AC 4 (regression threshold + alert): ~3 days. Projection + Discord webhook. - AC 5 (regression-cases discipline): ongoing, process only — no code. - AC 6 (SWE-bench Pro weekly): ~3 days — or defer. - AC 7 (cost attribution via virtual key): ~1 day. Total: ~2.5 weeks focused, sequential after Priority 3. ## Budget Per quality-and-evaluation.md: **~$3–8/night × 365 = $1.1–2.9k/year** for the golden suite. **~$20–40/week × 52 = $1.0–2.1k/year** for SWE-bench Pro. Total ceiling ~$5k/year — tracked via Priority 3 virtual key. --- ## https://research.rig.dashecorp.com/user-stories/2026-04-20-safety-foundation/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/user-stories/2026-04-20-safety-foundation.md # User story: safety foundation — block the unrecoverable before higher-trust tiers > Phase-0 runtime guards that sit between agent reasoning and tool execution: dangerous-command blocklist, git worktrees per task, default-deny egress NetworkPolicy, GitHub App tokens with 1h TTL. Prerequisite for autonomy tier promotion past T1. ## User story **As** the rig operator **I want** deterministic runtime guards between agent reasoning and tool execution — dangerous-command blocklist, per-task git worktrees, default-deny egress, short-lived GitHub tokens **So that** a compromised or looping agent cannot do the unrecoverable thing (filesystem destruction, secret exfiltration, force-push to main, long-lived token replay) without a human in the loop, and the rig earns the right to advance autonomy tiers (T1 → T2 → T3). 
## Progress (as of 2026-04-21) | AC | Status | Ship | |---|---|---| | 1 · Dangerous-command guard | ✅ Shipped | rig-agent-runtime #97 + #98 | | 2 · `GuardBlocked` events + dashboard panel | ✅ Shipped | rig-conductor #90, rig-agent-runtime #99, rig-conductor #99 | | 3 · Git worktrees per task | ✅ Shipped | rig-agent-runtime #101 | | 4 · GitHub App 1h tokens | ✅ Shipped | rig-agent-runtime #103 + rig-gitops #119 + #121 | | 5 · Default-deny egress (Phase 1) | 🔴 Attempted + reverted | rig-gitops #137 (shipped) → #143, #144 (reverted) | **Score: 4 / 5 ACs closed across 2026-04-20 → 2026-04-22.** AC 5 Phase 1 was shipped and reverted within hours on 2026-04-22. End-to-end test (this repo's issue #97) caught that `api.anthropic.com` resolves to Cloudflare anycast (162.159.x.x), not Anthropic's published `160.79.104.0/21` — the ipBlock approach cannot work. Leading redesign: route LLM egress through an in-cluster LiteLLM proxy (Priority 3), allowlist only the proxy pod. The GitHub CIDR refresh workflow was also removed (its split NetworkPolicy was not additive — any Egress policy implicitly denies non-matched traffic). Cluster-reality correction (rig-docs #95 — k3s, not GKE) stands. ## Context See [whats-next whitepaper §Priority 1](/whitepapers/2026-04-20-whats-next/#priority-1--safety-foundation) and the source [`safety.md`](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/safety.md) whitepaper (pillars 1–2). Today the rig has zero runtime guards. Trust is prompt-level ("don't do bad things") plus branch protection after the fact. The implementation-status matrix lists 8 safety capabilities with **0 deployed**. This is the highest-leverage first investment because every higher-trust tier depends on it — you cannot promote an agent to T2 (merge-with-approval) if T1 has no floor. ## Acceptance criteria 1. 
✅ **Dangerous-command guard** — PreToolUse hook reads tool-call JSON on stdin, matches `tool_input.command` against a blocklist, exits 2 (block + reason) on match. Minimum blocklist: `sudo`, `rm -rf /` (not `rm -rf ./`), `git push --force` (without `--force-with-lease`), `git reset --hard`, `git clean -f`, `drop table`, `drop database`, `truncate table`, `kubectl delete namespace`, package installers, `chmod 777`, `chmod -R 000`, `curl … | sh`. **No override flag.** Escape hatch is the human running the command outside the agent loop. **Shipped** in [rig-agent-runtime#97](https://github.com/dashecorp/rig-agent-runtime/pull/97) (script + tests + CI) and [#98](https://github.com/dashecorp/rig-agent-runtime/pull/98) (activated by default via baked-in `~/.claude/settings.json`). 43 test cases pass. 2. ✅ **`GuardBlocked` event emission** — every block emits a non-blocking event to rig-conductor; counts visible via `GET /api/guard-blocked` (optional `agentId` filter) **and on the rig-conductor dashboard Safety panel** (header stat + per-agent table with top reason, last command, last-blocked time). **Shipped** in [rig-conductor#90](https://github.com/dashecorp/rig-conductor/pull/90) (event + projection + endpoint, 46/46 tests), [rig-agent-runtime#99](https://github.com/dashecorp/rig-agent-runtime/pull/99) (hook payload shape fix), and [rig-conductor#99](https://github.com/dashecorp/rig-conductor/pull/99) (dashboard Safety panel). 3. ✅ **Git worktrees per agent task** — each dispatched task runs in its own worktree under `/workspace/tasks///`, backed by a shared bare clone at `/workspace/_bare//.git`. One task's workspace cannot reach another's. Cursor 2026 pattern. **Shipped** in [rig-agent-runtime#101](https://github.com/dashecorp/rig-agent-runtime/pull/101) (`task-workspace` helper + 17 tests + CI, wired into the agent task prompt). 4. ✅ **GitHub App installation tokens (1h TTL)** — replaces the classic PAT in agent pods. 
Tokens minted per dispatch, expire in 60 minutes, never persisted to disk. **Shipped** across: - [rig-agent-runtime#103](https://github.com/dashecorp/rig-agent-runtime/pull/103) — removes the PAT fallback when App-mint fails (fail loud, not silent). - [rig-gitops#119](https://github.com/dashecorp/rig-gitops/pull/119) — removes the `GITHUB_PERSONAL_ACCESS_TOKEN` env var from dev-e + review-e pods entirely. - [rig-gitops#121](https://github.com/dashecorp/rig-gitops/pull/121) — implementation-status matrix updated. The only remaining trace is the `github-token` key still present inside the SealedSecrets; pruning requires a re-seal and is deferred to the next rotation. Nothing in the running pods references it. 5. 🔴 **Default-deny egress NetworkPolicy (Phase 1) — attempted + reverted 2026-04-22.** Two-phase rollout story: - **Cluster reality correction (stands)**: the rig runs on **k3s v1.34.6 on a GCE VM** (not GKE as BRAIN.md had drifted to claim). BRAIN + research corrected in rig-docs #95; "GKE Dataplane V2 / FQDNNetworkPolicy" plan was always inapplicable. - **Phase 1 ship (rig-gitops #137)**: plain k8s `NetworkPolicy` on `dev-e` + `review-e` allowing kube-dns, rig-conductor API (8080), valkey (6379), Anthropic `160.79.104.0/21`, GitHub `/meta` CIDRs. Weekly refresh workflow (#139) to prevent drift. - **Revert (rig-gitops #143 + #144)**: end-to-end test on this repo's issue #97 failed — `api.anthropic.com` resolves to Cloudflare anycast (162.159.x.x). Anthropic's published CIDR is origin-level, not client-reachable. Separately: postgres on 5432 was blocked (only 8080+6379 allowed to rig-conductor). The split `{ns}-github-egress` policy was also removed — k8s NetworkPolicy Egress rules are not additive; any matching policy creates default-deny for unmatched traffic. - **Redesign (pending)**: leading candidate is routing LLM egress through an in-cluster LiteLLM proxy (Priority 3) and allowlisting only the proxy pod. 
Hostname-level policy for GitHub and the package registries would then follow, via an egress gateway or a CNI swap. - **Parallel prompt fix (shipped, stands)**: `stream-consumer.js:226` rewritten in [rig-agent-runtime#110](https://github.com/dashecorp/rig-agent-runtime/pull/110) — no more dead `sudo apt-get` advice. ## What it unblocks - **T1 → T2 tier promotion.** Per [trust-model.md](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/trust-model.md), T2 is "agent merges with approval; no prod deploy creds." That policy is meaningless if the agent can `rm -rf` its way around approval. AC 1–3 are what make T2 real. - **Priority 2 observability** can be wired to the `GuardBlocked` event stream as an early signal. - **Priority 3 cost ceiling** — the egress policy (AC 5) is the chokepoint through which the LiteLLM proxy is made mandatory (if the only allowed LLM egress is the proxy, no agent can bypass it). ## Out of scope - Kyverno admission policies (Phase 4 per implementation-status) - Sigstore + cosign + SLSA L3 attestation (Phase 4) - CaMeL trust separation (Phase 6; the only prompt-injection defense with a formal guarantee) - Schema-validated tool use via Pydantic/Instructor (continuous, not phase-gated) ## Priority **High.** Prerequisite for Priorities 2–4. No higher-trust autonomy tier is honest without it. ## Estimated effort - AC 1 (dangerous-command guard): ~1 week. Pattern specified in `safety.md`; reference implementation [Gastown's `tap_guard_dangerous`](https://github.com/gastown-ai/agent-safety). - AC 2 (`GuardBlocked` events): ~1 day. New event type + projection + dashboard panel. - AC 3 (worktrees per task): ~1 week. Well-trodden Cursor 2026 pattern. - AC 4 (GitHub App 1h tokens): ~3 days. Replaces classic PAT; installation-token mint loop in the agent startup. - AC 5 (default-deny egress): Phase 1 attempted + reverted 2026-04-22. Redesign via LiteLLM proxy pattern — estimate bundled with Priority 3. Total: ~5 weeks of focused work, parallelisable across 2 engineers.
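For reference, the AC 1 hook contract (tool-call JSON on stdin, exit 2 to block with a reason) can be sketched as below. The patterns are illustrative stand-ins, not the shipped blocklist from rig-agent-runtime#97, and the `--hook` flag is invented for the sketch:

```javascript
// Illustrative guard sketch; approximates a few blocklist entries from AC 1.
const BLOCKLIST = [
  { re: /\bsudo\b/, reason: "sudo is blocked; run privileged commands outside the agent loop" },
  { re: /\brm\s+-rf\s+\/(\s|$)/, reason: "rm -rf / is unrecoverable (rm -rf ./ is allowed)" },
  { re: /\bgit\s+push\b.*--force(?!-with-lease)/, reason: "use --force-with-lease" },
  { re: /\bdrop\s+(table|database)\b/i, reason: "destructive SQL is blocked" },
];

// Returns the block reason, or null when the command is allowed.
function guard(command) {
  const hit = BLOCKLIST.find(({ re }) => re.test(command));
  return hit ? hit.reason : null;
}

// PreToolUse wrapper: reads the tool-call JSON from stdin; on a match,
// prints the reason and exits 2 so the tool call is blocked.
if (process.argv[2] === "--hook") {
  let input = "";
  process.stdin.on("data", (chunk) => (input += chunk));
  process.stdin.on("end", () => {
    const reason = guard(JSON.parse(input).tool_input?.command ?? "");
    if (reason) {
      console.error(reason); // surfaced back as the block reason
      process.exit(2);
    }
  });
}
```

Note how the `--force-with-lease` carve-out and the `rm -rf /` vs `rm -rf ./` distinction live in the patterns themselves; there is deliberately no override flag, matching the AC.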
## Adjacent ships (context) Work that landed alongside the AC deliverables but isn't a formal AC: - **[rig-agent-runtime#110](https://github.com/dashecorp/rig-agent-runtime/pull/110)** — rewrote the `## Runtime installs` block in `stream-consumer.js` to match guard reality. The old prompt advised `sudo` + `apt-get install`, both blocked by the AC 1 guard, so agents following their own guidance hit `GuardBlocked` and got stuck. Prep for AC 5 as well (primes agents for a future egress policy that denies arbitrary hosts). Surfaced by the [agent runtime-install audit research](/research/2026-04-21-agent-runtime-install-audit/). - **[dashecorp/infra#112](https://github.com/dashecorp/infra/pull/112)** — declarative per-repo provisioning of `RIG_BOT_PAT` via Terraform (`needs_rig_bot_pat = true` in `github/dashecorp/variables.tf:repos`). Not part of this user story, but the 2026-04-20 CI resuscitation that unblocked rig-conductor's `publish-image` → PR-based update-gitops flow depended on a manually-created PAT secret; #112 makes that pattern reproducible so the next dashecorp repo that needs PR-on-main doesn't rediscover the trap. See `dashecorp/infra/BOOTSTRAP.md`. --- ## https://research.rig.dashecorp.com/proposals/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/proposals.md # Proposals > Ship plans — one per decision. Each links to the research it synthesises and the user story it answers. import SectionIndex from '../../../components/SectionIndex.astro'; Proposals that the rig has shipped, is shipping, or rejected. Proposals declare `user_story:` and optionally `source_research:` in their frontmatter; superseded proposals chain via `supersedes:` / `superseded_by:`. 
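As an illustration of the chaining described above, a superseding proposal's frontmatter might look like this — the field names come from this page, but the `user_story:` and `supersedes:` slugs are hypothetical:

```yaml
---
title: "Docs tooling decision: Starlight + Cloudflare Pages"
status: approved
user_story: /user-stories/2026-04-18-docs-platform/        # hypothetical slug
source_research: /research/2026-04-18-docs-tools-evaluation/
supersedes: /proposals/2026-04-17-plane-pages-docs/        # hypothetical slug
---
```

The superseded proposal would carry the mirror-image `superseded_by:` entry, so the chain is walkable from either end.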
--- ## https://research.rig.dashecorp.com/proposals/2026-04-18-docs-tooling-decision/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/proposals/2026-04-18-docs-tooling-decision.md # Docs tooling decision: Starlight + Cloudflare Pages (2026-04-18) > Final tooling decision for Dashecorp rig documentation, superseding the research report's Notion recommendation. ## Decision **Astro Starlight**, sources in `dashecorp/rig-docs`, deployed to Cloudflare Pages at `https://research.rig.dashecorp.com`. ## Context The research report (`research/2026-04-18-docs-tools-evaluation.md`) surveyed 25 tools against 12 criteria and recommended Notion, primarily on mobile-editing quality. Three facts that arrived after the report changed the calculus: 1. **Plane was set up mid-session** without confirming team conventions. The Plane workspace, Terraform module, and DASHE-* items are being retired. 2. **The Tablez enterprise uses Linear** on its own side — it is not a sunk cost for the rig side, and Tablez integration is webhook-based, not tool-coupling. 3. **The rig operator prefers code-first markdown** with YAML frontmatter, on an owned git repo, deployed statically. This aligns with the rig's pre-existing `docs.rig.dashecorp.com` convention (Material for MkDocs, aggregated from `rig-gitops/docs-site/`) and the "all docs must use YAML frontmatter markdown" rule. This site is a sibling Starlight surface, not a replacement — see the retired-decisions note below. With those in place, Notion's mobile-editing advantage (the main reason it beat Outline/GitBook) is no longer load-bearing. Phone-based authoring is acceptable via the GitHub mobile app for occasional edits; the primary authoring surface is desktop. ## Why Starlight specifically Starlight was picked over Material for MkDocs and Docusaurus on four dimensions: - **Active development.** Material for MkDocs entered maintenance mode in November 2025. Starlight is actively developed by the Astro team. 
- **Cloudflare Pages fit.** Astro has first-class Cloudflare support (it is Cloudflare's own docs stack for Agents SDK docs and several customer-facing sites). - **Native Mermaid.** Via a small custom remark plugin (`src/plugins/remark-mermaid.mjs`), fenced `mermaid` code fences render client-side with `mermaid.js`. No build-time rendering, no committed SVG artifacts. - **Mobile-first responsive defaults.** Starlight's reading experience on phones is strong out of the box — matters for anyone reviewing the site on mobile even if authoring is desktop. ## What moves - `dashecorp/rig-docs`: converted from flat markdown to Starlight scaffold. Existing research and proposals moved to `src/content/docs/research/` and `src/content/docs/proposals/` with their frontmatter intact. - Diagrams: PNG/SVG artifacts deleted. Mermaid `.mmd` sources preserved under `public/diagrams/` for direct linking; inline Mermaid fences are now the preferred form. - Cloudflare Pages: a GitHub Action deploys `dist/` on merge to `main` and previews on PRs. Secrets `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` must exist on the repo. ## What retires - The Plane workspace (`dashecorp` / project `DASHE`) — to be archived after research content is migrated. - The `infra/plane/dashecorp/` Terraform module and `TF_VAR_plane_token` plumbing. - BOOTSTRAP.md steps 17-18 (GitHub Project v2 and Plane intake) are superseded by a simpler step: enable GitHub Pages / Cloudflare Pages on the rig-docs repo. - The `makeplane` GitHub App install on the org (not critical — can be left installed or removed). ## What this proposal does not decide - **User-story intake format.** Phone-first intake is deferred. If commute authoring becomes painful, a Telegram-to-PR bridge can be built later. User stories are, for now, GitHub issues in `dashecorp/rig-docs`. - **Linear integration for Tablez handoff.** Out of scope for this proposal. 
When Tablez onboarding starts, a Linear webhook receiver in the rig runtime will translate Linear issues to internal work items. ## Rollback If Starlight proves wrong, the markdown files survive as-is — the `src/content/docs/` tree is standard markdown with frontmatter and can be served by MkDocs, Docusaurus, or any other SSG with minor config changes. Content portability was a first-class concern. --- ## https://research.rig.dashecorp.com/proposals/2026-04-18-stage-a-compiled-agents-md/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/proposals/2026-04-18-stage-a-compiled-agents-md.md # Stage A — Compiled AGENTS.md with Schema Validation > One PR to rig-gitops replacing hand-written AGENTS.md with a compiled, schema-validated, size-budgeted version # Stage A — Compiled AGENTS.md with Schema Validation ## TL;DR One PR to `dashecorp/rig-gitops`. ~2 hours of agent work. Replaces hand-written `AGENTS.md` with a CI-validated, compiled-from-facts, size-budgeted version. Highest-leverage step from the full docs strategy — based on Vercel's published eval showing agent success rate 53% → 100% when AGENTS.md carries an embedded compressed index under 8 KB. Defer the larger wiki migration (Stage B, ~11 more hours) until we have 5 real assignments' worth of data showing Stage A moved the needle. ## AI-vendor-agnostic (design constraint) The rig must run on any coding agent — Claude Code is today's default but the design accommodates GPT-5 CLI, Gemini CLI, Aider, Cursor, and successors. AGENTS.md is the multi-vendor standard (stewarded by the Agentic AI Foundation; joint Google/OpenAI/Factory/Sourcegraph/Cursor). CLAUDE.md in this proposal is **strictly optional** — only added when Claude Code-specific behavior matters. Equivalent vendor-specific files (`.cursorrules`, `GEMINI.md`, `CODEX.md`) follow the same overlay pattern when their agent is running. 
The facts/ layer, compiled AGENTS.md, schema validation, and CI enforcement are identical regardless of which agent reads. ## Flow ```mermaid flowchart LR Y[facts/*.yaml] --> C[compile-agents-md.sh] S[facts/schema.json] --> C C --> A[AGENTS.md] A --> G[CI: docs-check.yml] G -->|--check| C G -.->|size > 8KB| X[fail] G -.->|schema invalid| X G -.->|ok| P[merge] ``` ## Scope ### Adds - `facts/stack.yaml` — canonical tech stack (runtime, package manager, linter, test framework) - `facts/conventions.yaml` — commit format, branch naming, MCP servers in use - `facts/pitfalls.yaml` — numbered anti-patterns agents hit in this repo - `facts/schema.json` — JSON Schema that all three `facts/*.yaml` files validate against - `scripts/compile-agents-md.sh` — regenerates AGENTS.md from facts/; supports `--check` mode for CI - `CLAUDE.md` at repo root — ≤60 lines, `@`-imports AGENTS.md, adds Claude-specific overrides. **Optional, Claude Code only** — skipped entirely when pod runs a non-Claude agent. ### Replaces - `AGENTS.md` at repo root — hand-written → compiled. Size budget 8 KB enforced by CI. ### Updates - `.github/workflows/docs-check.yml` — adds `compile-agents-md.sh --check`, adds size budget checks, removes `queries:` from frontmatter validation, adds `audience:` requirement - `docs/documentation-standard.md` — frontmatter spec changes (drop `queries`, add `audience`/`supersedes`/`source_refs`), new "Compiled AGENTS.md" section, size budgets ### Does NOT do (deferred to Stage B) - `raw/` / `wiki/` directory migration - Propagation to other repos - LLM-as-judge lint cron - File-back rule in character prompts - Memory MCP changes ## Why Stage A over alternatives - **Not full strategy:** Vercel's measured gain (53→100%) traces to compiled 8 KB embedded index. Lint crons, file-back, raw/ populations are compounding bets. Do the measured win first. - **Not Phase 0 (dangerous-command guard):** that's days; Stage A is 2h. Every assignment between now and Phase 0 benefits. 
- **Not just fix frontmatter:** frontmatter alone doesn't move agent success rate per Vercel. - **Not adopt llms.txt too:** no production rig uses it. AGENTS.md won. ## Acceptance criteria 1. `./scripts/compile-agents-md.sh` produces valid AGENTS.md ≤ 8 KB. 2. `./scripts/compile-agents-md.sh --check` on fresh checkout exits 0. 3. Editing `facts/stack.yaml` without re-running compile causes CI to fail with diff. 4. Editing `AGENTS.md` directly causes CI to fail. 5. Invalid enum in `facts/stack.yaml` fails schema validation. 6. `CLAUDE.md` present at root, ≤ 60 lines, imports AGENTS.md via `@`. 7. Every existing doc has valid `audience:` field post-migration. 8. `docs-check.yml` passes on fresh PR. ## Measurement plan ### Baseline (last 30 days from `mt_events`) - Median turns per `cli_completed` on issue→PR assignments - Median cost per `cli_completed` - `agent_stuck` events per 100 assignments - First-attempt Review-E approval rate ### Post-merge (5 real assignments) Recompute same metrics. ### Decision rule for Stage B If 2 of 4 improve: - Median turns drops ≥ 15% - Median cost drops ≥ 15% - `agent_stuck` rate drops ≥ 20% - First-attempt approval improves ≥ 15 percentage points → proceed with Stage B. Otherwise investigate why Stage A didn't help, or pivot to Phase 0. ## Risks and rollback - **Compile script bugs:** tests in same PR covering schema validation, size budget, drift detection. - **Frontmatter migration edge cases:** idempotent `migrate-frontmatter.sh` script; dry-run first. - **8 KB budget too tight:** current hand-written AGENTS.md is 3-5 KB; 8 KB gives ~60% headroom. - **CLAUDE.md import graph fails on Claude Code:** test on live Dev-E pod in staging first. Rollback: revert the PR. No cluster changes, no data loss. 
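The drift and size-budget checks in the acceptance criteria above amount to a two-condition gate. A minimal sketch, with a `compile` parameter standing in for `compile-agents-md.sh` (the real implementation is a shell script; names here are illustrative):

```javascript
const SIZE_BUDGET = 8 * 1024; // compiled AGENTS.md must stay at or under 8 KB

// Mirrors the --check contract: recompile from facts/ and compare against
// the committed AGENTS.md. Any mismatch fails CI, whether facts/ changed
// without recompiling or AGENTS.md was edited by hand.
function checkAgentsMd(committedText, facts, compile) {
  const fresh = compile(facts);
  if (Buffer.byteLength(fresh, "utf8") > SIZE_BUDGET) {
    return { ok: false, error: "size budget exceeded" };
  }
  if (fresh !== committedText) {
    return { ok: false, error: "drift: AGENTS.md does not match compiled output" };
  }
  return { ok: true };
}
```

Both failure paths in ACs 3 and 4 collapse into the single string comparison: the compiled output is the only source of truth, so it does not matter which side drifted.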
## Timeline - T+0: File tracking issue with baseline metrics - T+0 to T+2h: Dev-E implements - T+2h: PR opened, Review-E reviews - T+3h: Merges - T+3h to T+5d: 5 assignments process - T+5d: Recompute metrics, decide Stage B ## Open questions 1. `facts/` YAML or TOML? → YAML (repo convention + `yq` present). 2. Template engine for compile? → No, heredoc is readable. 3. Propagate to other repos? → No, rig-gitops alone for first measurement. 4. Exclude first 24h from baseline? → Yes, image propagation steady-state. ## Evidence (from research) - **Vercel eval:** AGENTS.md with 8 KB embedded index → 100% pass rate on hardened Build/Lint/Test. See [research/2026-04-18-production-docs-patterns.md](../research/2026-04-18-production-docs-patterns.md). - **Karpathy schema-file pattern:** root schema file (AGENTS.md/CLAUDE.md) is the key config. See [research/2026-04-18-llm-wiki-pattern-analysis.md](../research/2026-04-18-llm-wiki-pattern-analysis.md). - **Current state:** rig-gitops frontmatter compliance 96% (exemplar); `queries:` field unread by CI or Claude. See [research/2026-04-18-docs-state-audit.md](../research/2026-04-18-docs-state-audit.md). ## Lifecycle - **Draft** — this state; awaiting human approval via PR merge - **Approved** — PR merged to main with `status: approved`; triggers `create-impl-issues.sh` - **Implementing** — GitHub issues created, Dev-E working - **Done** — all child issues merged, metrics recomputed, Stage B decision made --- ## https://research.rig.dashecorp.com/research/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research.md # Research > All research docs. Dated, one file per focused question. Each links to the user story it supports. import SectionIndex from '../../../components/SectionIndex.astro'; All research in the rig. Each entry shows its type, last-updated date, and summary. Research docs declare `user_story:` in their frontmatter; open the doc for the full Related panel. 
--- ## https://research.rig.dashecorp.com/research/2026-04-18-docs-memory-drift-lint/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-docs-memory-drift-lint.md # Research: anti-drift lint rules between docs and memory (2026-04-18) > Automated lint rules to prevent docs and memory from drifting apart as the rig operates. > Migrated from Plane work item DASHE-13 on 2026-04-18. Original Plane workspace retired; this is the canonical home. ## Anti-drift lint rules between docs and memory Concrete checks a scheduled Lint agent would run to catch docs↔memory drift. ### Promotion triggers (memory → docs) - Memory entries with `importance ≥ 4` AND `hit_count ≥ 5` — signal: this learning matters enough to go canonical ### Auto-archival (memory) - Memory entries older than 30 days with `hit_count = 0` — auto-compact or delete (unused, not worth keeping) ### Broken provenance - Docs with `source_refs` pointing to files that no longer exist in git → flag for cleanup ### Redundancy (memory ↔ docs) - Memory entries duplicating a known doc section (BM25 similarity > 0.85) → flag as redundant ### Stale docs signal - Docs with `updated` older than any memory write on the same topic → review whether the doc has gone stale ### Who runs it Scheduled LLM-as-judge weekly pass — a dedicated Review-E variant or a cronjob. Per Evidently / Arize guidance: *"strict separation between generation and evaluation"* — use a different model from the agents that authored the content. --- ## https://research.rig.dashecorp.com/research/2026-04-18-docs-memory-inventory/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-docs-memory-inventory.md # Research: current docs and memory inventory (2026-04-18) > Snapshot of what lives in docs vs memory at the time the docs-memory strategy arc started. > Migrated from Plane work item DASHE-11 on 2026-04-18.
Original Plane workspace retired; this is the canonical home. > Snapshot represents state at ~09:00 2026-04-18; by end of day this repo had grown to 7 research docs + 2 proposals + 1 user story + reference material. ## Current state of docs and memory in the rig (2026-04-18 morning) ### Inventory - **rig-docs (git)** — 0 GitHub issues yet; seeded with 3 research docs + 1 proposal from the 2026-04-18 docs-strategy arc (this was the initial state at the start of the day). - **rig-gitops/docs/ (git)** — 19 whitepaper pages + 7 root docs + compiled AGENTS.md (3623 bytes). - **Memory MCP (Postgres+pgvector)** — ~11 rows as of yesterday; agents use `read_memories` before work and `write_memory` after merge per prompt instructions. ### Duplication risk zones - Agent character prompts live in HelmReleases (not rig-docs). - Runbooks could go either place without a clear rule. - Decision logs (ADRs) have no clear home yet. ### Question to answer in the proposal What principle determines which system owns what? Candidate principles are in sibling research [Principles for docs vs memory separation](/research/2026-04-18-docs-vs-memory-principles/). --- ## https://research.rig.dashecorp.com/research/2026-04-18-docs-state-audit/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-docs-state-audit.md # Documentation state audit — dashecorp rig (2026-04-18) > Ground-truth snapshot of docs across all 7 active dashecorp repos: frontmatter compliance, CLAUDE.md/AGENTS.md presence, broken patterns # Docs state audit — dashecorp rig (2026-04-18) ## One-paragraph summary ~57 markdown docs across 7 active repos. ~70% frontmatter compliance by file count, ~55% when weighted by repo. Worst offender: `rig-agent-runtime/docs/` at 33% (8 of 12 docs have only `title:`). Only `rig-memory-mcp` has a `CLAUDE.md`; 5 repos have `AGENTS.md`; `rig-tools` has neither (the outlier). Three architecture docs live side-by-side with no `supersedes:` chain.
The CI check enforces 4 of the 5 frontmatter fields (skips `queries`) — confirming `queries` is dead weight. `docs-site/` is **generated** by `scripts/build-docs.sh`, not duplicated. ## Per-repo findings | Repo | CLAUDE.md | AGENTS.md | docs/ | .md count | Frontmatter compliance | Notes | |---|---|---|---|---|---|---| | rig-conductor | — | ✓ | 6 + diagrams/ | 8 | 33% | `api.md`, `index.md` missing `type/queries/updated` | | rig-agent-runtime | — | ✓ | 12 + images/ | 14 | **33%** | Worst; 8/12 docs have only `title:` | | rig-gitops | — | ✓ | 7 + whitepaper/19 | 31 | **96%** | Exemplar | | review-e | — | ✓ | 1 | 3 | 0% | Single doc, zero frontmatter | | rig-tools | — | — | 1 | 2 | 0% | Only repo with NEITHER agent file. Stub starts mid-document. | | rig-memory-mcp | **✓** | — | 1 | 4 | 100% | Only repo with CLAUDE.md | | infra | — | ✓ | (none) | 4 | N/A | No docs/ dir; root files only | ## Top 5 broken patterns 1. **Three architecture docs side by side.** `rig-gitops/docs/architecture-current.md` (10 KB), `architecture-proposed.md` (9 KB), `architecture-proposed-v2.md` (24 KB). All `updated: 2026-04-16`. None links to the others as superseded. Worse: `scripts/build-docs.sh:24-26` only copies `-current` and `-proposed` to the published site — `proposed-v2.md` is invisible to readers of `docs.rig.dashecorp.com`. 2. **`rig-agent-runtime/docs/` frontmatter collapse.** 8 of 12 files carry only `title:`. The 16 KB `configuration.md` has only `title` + `updated`. 3. **`rig-tools/docs/agent-workflow.md`** starts at `## Workflow` (no H1, no frontmatter) and references the old `Stig-Johnny/rig-tools` namespace — contradicts the rig's "no personal namespace deps" posture. 4. **Empty stub directories.** `rig-gitops/docs/cost-dashboard/` contains only `data.json` + `index.html` (no `.md`). `docs-site/docs/components/dev-e/` still published despite `dev-e` archived 2026-04-17. 5. 
**CI checks 4 of 5 frontmatter fields.** `.github/workflows/docs-check.yml` validates `title description type updated` but skips `queries`. The standard requires all 5; the CI is a subset of the standard it cites. ## Findings that contradicted prior assumptions - **`docs-site/` is not a duplicate** — it is generated by `scripts/build-docs.sh`. Deleting `docs-site/` loses nothing because the script reconstructs it. (Running manually; no CI step.) - **AGENTS.md is dominant** (5 repos) vs CLAUDE.md (1 repo). The "CLAUDE.md everywhere" recommendation should be inverted. - **Three repos run `mkdocs.yml`** (`rig-conductor`, `rig-agent-runtime`, `rig-gitops`) — each can publish its own site independently, in addition to the aggregated `docs.rig.dashecorp.com`. Three publication surfaces, not one. ## Method Audit ran via `gh api` against GitHub's contents + trees APIs on 2026-04-18. No local checkout; no destructive action. Results are a point-in-time snapshot; state changes with every PR merge. --- ## https://research.rig.dashecorp.com/research/2026-04-18-docs-tools-evaluation/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-docs-tools-evaluation.md # Documentation tools evaluation (2026-04-18) > Comparison of 25 documentation tools against 12 weighted criteria for the Dashecorp rig. Verdict: Starlight picked over Notion/Outline after Plane rejected. > **Note (2026-04-18):** This report was the initial research. The final decision diverged from the recommendation — see below. The report recommended Notion based on mobile-first user-story authoring. Subsequent conversation established that: (1) Plane was set up mid-session without knowing the team context; (2) the Tablez enterprise uses Linear separately; (3) the rig operator prefers code-first markdown + MkDocs/Starlight on an owned git repo. The final pick was **Astro Starlight on Cloudflare Pages** — this very site. 
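For readers comparing the verdicts: the "12 weighted criteria" scoring the report uses reduces to a weighted mean over 1–5 per-criterion scores. A sketch with illustrative numbers (the weights below are invented for the example; the report's real weights are tabulated in §2):

```javascript
// Weighted score on the report's 1-5 scale. Weights are percentages that
// sum to 100, as in the criteria table; missing scores count as 0.
function weightedScore(scores, weights) {
  let total = 0;
  for (const [criterion, weight] of Object.entries(weights)) {
    total += (scores[criterion] ?? 0) * weight;
  }
  return total / 100; // divide the percentage weights back out
}
```

A tool scoring 4 on a 50%-weight criterion and 5 on the other 50% comes out at 4.5; the same arithmetic produces the 4.2/5, 3.6/5, and 3.5/5 headline figures below.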
**Prepared by: Research & Architecture Team**
**Date: April 18, 2026**
**Classification: Internal -- Shareable with Leadership**

---

## 1. Executive Summary

**Recommendation: Notion** is the best documentation platform for the dashecorp rig today.

**Runner-up: Outline** (self-hosted) offers the strongest open-source alternative and should be revisited if data sovereignty or vendor-lock concerns intensify.

**Key trade-off:** Notion sacrifices self-hosting and true markdown round-trip fidelity in exchange for the best mobile editing experience, native Mermaid rendering, a mature API with official MCP server, and a collaboration surface that works for both AI agents and human operators. Outline matches Notion on Mermaid and API capability but lacks a native mobile app -- a dealbreaker given that the human lead writes user stories from a phone during commutes.

Of 25 tools evaluated across 12 dimensions, Notion scores highest overall (4.0/5 weighted average), followed by Linear Docs (3.5/5) and then Outline, GitBook, and the current baseline Plane Pages (3.3/5 each). Plane Pages is held back primarily by its missing Mermaid support and weaker mobile editing.

Replication to the Tablez workspace is straightforward: Notion's API allows programmatic workspace provisioning, and the team template can be duplicated via the API or manually in under an hour.

---

## 2. Evaluation Criteria

The 12 dimensions are weighted to reflect the dashecorp rig's specific requirements: autonomous AI agents that create and update content via API, a human operator who edits from a phone, and diagram-as-code as a hard requirement.

| # | Criterion | Weight | Justification |
|---|-----------|--------|---------------|
| 1 | Mobile app quality | 15% | Human lead writes user stories from phone during commute. This is a daily workflow, not an occasional convenience. |
| 2 | Native Mermaid / diagram-as-code | 12% | Architecture diagrams and flow charts must be source-as-diagram. PNG artifacts are prohibited.
| | 3 | Code-first authoring | 10% | Content should be authorable as markdown and round-trippable to git where possible. | | 4 | API surface | 12% | Agents (Dev-E, Review-E, ATL-E) must create, read, update user stories and research docs without human intervention. | | 5 | Self-host option | 5% | Nice-to-have for data sovereignty. Not a hard requirement today. | | 6 | SaaS pricing | 8% | Small team (under 10 seats initially). Cost matters but is not the primary driver. | | 7 | GitHub integration | 8% | Issue/PR linking enables traceability from user story to shipped code. | | 8 | Team collaboration | 8% | Comments, mentions, and review workflows keep humans and agents aligned. | | 9 | Search | 5% | Full-text search is table stakes. Semantic/AI search is a differentiator. | | 10 | Data portability | 5% | Must be able to export and migrate if the vendor fails or changes terms. | | 11 | Security / compliance | 7% | SOC 2 and SSO required for enterprise client engagements (Tablez). | | 12 | Longevity / trust | 5% | Company stability, funding runway, or open-source insurance against abandonment. | **Total: 100%** --- ## 3. Tool-by-Tool Deep Dives ### 3.1 Notion **Verdict: RECOMMENDED.** The strongest all-around platform for a mixed human-and-AI team that needs mobile editing, Mermaid diagrams, and API-driven content creation. **Pricing:** Free (limited) | Plus $10/user/mo | Business $20/user/mo | Enterprise custom. AI bundled into Business tier since May 2025.[^1] **Mobile story:** Native iOS app rated 4.8/5 (82K ratings) and Android app rated 4.6/5 (353K ratings). Full editing supported on both platforms. Offline mode launched August 2025 (v2.53) -- pages must be downloaded for offline access; paid plans auto-download recent/favorited pages. One limitation: mobile changes sync only over Wi-Fi, not cellular data.[^2] **Mermaid story:** Native since December 2021. Code blocks with language set to "mermaid" render live diagrams inline. 
Supports code-only, diagram-only, or split view. Some limitations on classDef and arrow styles. Does not support PlantUML natively. Excalidraw and draw.io available via embed only.[^3] **API surface:** REST API. 3 requests/second sustained, burst to 10/sec per integration. Webhooks launched with 2025-09-03 API version (50 subscriptions/integration). Official MCP server (hosted and open-source via `makenotion/notion-mcp-server`). No official Terraform provider. No published OpenAPI spec (third-party specs exist).[^4] **Pros:** - Best-in-class mobile editing experience with offline support - Native Mermaid rendering with no plugins or integrations required - Official MCP server enables direct AI agent integration - Mature ecosystem: 20+ native integrations, $600M ARR, IPO-track company - Synced Databases pull GitHub issues/PRs into Notion databases **Cons:** - No self-host option; cloud-only - Markdown export is lossy (databases become CSV, callouts flatten) - No native Git sync for documentation content (third-party tools required) - Audit log and SAML SSO only on Enterprise plan - API rate limits (3 req/sec) may constrain high-volume agent workflows **Best for:** Teams that need a single platform for docs, project tracking, and knowledge management, with strong mobile and API access. --- ### 3.2 Outline **Verdict: STRONG RUNNER-UP.** Best open-source option with native Mermaid, good API, and self-host capability. Held back by PWA-only mobile experience. **Pricing:** Cloud: Starter $10/mo (1-10 members) | Team $79/mo (11-100) | Business $249/mo (101-200) | Enterprise custom. Self-hosted: free (BSL 1.1 license, converts to Apache 2.0 after 3 years).[^5] **Mobile story:** PWA only -- no native iOS or Android app. Users install to homescreen via Safari. Editing is supported but the experience lacks the polish of a native app. No offline support in the PWA. 
Multiple 2026 reviews cite the missing mobile app as a significant gap.[^6] **Mermaid story:** Native support via `/diagram` slash command or by setting code block language to "Mermaid diagram." Also supports diagrams.net/draw.io natively with inline editor. Does not support PlantUML or Excalidraw natively.[^7] **API surface:** RPC-style POST API (not REST or GraphQL). Published OpenAPI spec at github.com/outline/openapi. Webhooks on Team plan and above. Rate limiting with 429 responses (specific limits not published). Community and official MCP servers available. No Terraform provider.[^8] **Pros:** - Self-hostable with Docker (PostgreSQL + Redis + S3) - Native Mermaid and draw.io rendering - BSL 1.1 license provides open-source insurance - Bootstrapped and profitable -- not dependent on VC runway - Real-time collaborative editing with comments and @mentions **Cons:** - No native mobile app (PWA only) -- a dealbreaker for phone-first workflows - Content stored in PostgreSQL, not plain markdown files - Markdown export is lossy per their own documentation - No SOC 2 certification verified - Requires external auth provider (OIDC, Google, Slack) -- no built-in username/password **Best for:** Teams that prioritize self-hosting and data sovereignty, with desktop-first workflows. --- ### 3.3 GitBook **Verdict: STRONG CONTENDER for developer documentation.** Best Git-native workflow. Weakened by no mobile app and per-site pricing. **Pricing:** Free (1 user) | Premium $65/site/mo + $12/user/mo | Ultimate $249/site/mo + $12/user/mo | Enterprise custom.[^9] **Mobile story:** No native mobile app. No PWA. Content accessible via responsive mobile web only. Not optimized for mobile editing. This is a significant gap for phone-first workflows.[^10] **Mermaid story:** Supported via Mermaid integration (must be installed per organization/space). Code blocks with `mermaid` syntax auto-render when Git-synced. Also has native Excalidraw-based drawing blocks. 
No PlantUML support.[^11] **API surface:** REST API with published OpenAPI spec. Rate limiting via HTTP 429 with standard headers. Official Terraform provider (`GitbookIO/terraform-provider-gitbook`). Auto-generated MCP server for documentation. Webhooks not confirmed.[^12] **Pros:** - True bi-directional Git sync (GitHub and GitLab) -- the gold standard for code-first docs - Content stored as Markdown in your Git repo -- maximum portability - Structured review workflow via "change requests" (PR-like model) - Official Terraform provider and MCP server - Excalidraw drawing blocks built into editor **Cons:** - No mobile app at all -- web-only - Per-site + per-user pricing becomes expensive for multi-project teams - Designed for public-facing documentation, not internal wikis/user stories - Small company ($3.9M revenue, ~35 employees) -- longevity risk - Free tier limited to 1 user **Best for:** Developer teams publishing API documentation or technical guides that live alongside code in Git. --- ### 3.4 Confluence Cloud **Verdict: ADEQUATE but overweight.** Enterprise-grade collaboration but Mermaid requires marketplace apps, mobile editing is limited, and the platform carries Atlassian's complexity tax. **Pricing:** Free (10 users) | Standard ~$5.42/user/mo | Premium ~$10.44/user/mo | Enterprise custom. Data Center (self-hosted) starts at ~$28K/year for 500 users but is being sunset (new subscriptions close March 2026).[^13] **Mobile story:** Native iOS app rated 4.7/5 (3,670 ratings), Android 4.2/5. Basic text editing works but cannot create tables on mobile. Markdown shortcuts (e.g., `*` for bullets, `##` for headings) behave unpredictably. Limited offline support (can edit open pages but not create new ones).[^14] **Mermaid story:** Not native. Available only via third-party Atlassian Marketplace apps (e.g., "Mermaid Diagrams for Confluence," $0-10/mo). Multiple options exist but each adds a dependency and cost. 
Native whiteboards are available but not diagram-as-code.[^15] **API surface:** REST API v1 and v2, plus GraphQL. New points-based rate limiting (65K points/hour) enforced from March 2026. Community Terraform providers exist. Community MCP server (`sooperset/mcp-atlassian`) covers Confluence + Jira.[^16] **Pros:** - Deep Jira integration (if already in Atlassian ecosystem) - Rovo AI for semantic search across Confluence + Jira - Enterprise-grade security (SOC 2 Type II, ISO 27001) - Public company (Atlassian, $17.6B market cap) -- maximum longevity assurance **Cons:** - No native Mermaid -- requires paid marketplace apps - Mobile editing is limited and buggy - Content stored in proprietary XHTML format -- poor markdown round-trip - Data Center self-host being sunset (read-only mode starts 2029) - Complexity tax: the platform is designed for 1,000-person orgs, not 10-person teams **Best for:** Organizations already invested in the Atlassian ecosystem (Jira, Bitbucket) that need enterprise compliance features. --- ### 3.5 Plane Pages (Current Baseline) **Verdict: BASELINE. Good task tracker, weak documentation layer.** Mermaid gap is the primary pain point. **Pricing:** Free (12 seats) | Pro $6/seat/mo | Business $13/seat/mo | Enterprise custom.[^17] **Mobile story:** Native iOS and Android apps. Editing supported on mobile. January 2026 release improved mobile UI, navigation, and performance. Push notifications are cloud-only.[^18] **Mermaid story:** NOT supported. Open GitHub issue #8147 (November 2025) confirms Mermaid code blocks render as plain text. draw.io is natively integrated via `/draw.io` slash command, but this is a graphical tool, not diagram-as-code.[^19] **API surface:** REST API with OAuth 2.0. HMAC-signed webhooks. Typed SDKs in Node.js and Python. Official MCP server with 30+ tools (MIT licensed). 
No Terraform provider.[^20] **Pros:** - Excellent issue/project management integrated with pages - Official MCP server with 30+ tools - Self-hostable (AGPL-3.0) with Docker/K8s - Bi-directional GitHub issue sync with PR status auto-update - SOC 2 Type II certified **Cons:** - No native Mermaid rendering -- the primary dealbreaker - Pages/wiki is a secondary feature, not the core product - Markdown round-trip is not true (block editor internally) - Git sync for page content not available (only issues/PRs sync) - Small company ($4M seed, ~$680K revenue, ~45 employees) **Best for:** Teams that want integrated project management and documentation in one self-hostable tool, if Mermaid is not required. --- ### 3.6 Linear Docs **Verdict: PROMISING but immature as a documentation platform.** Excellent issue tracker with docs bolted on. **Pricing:** Free (unlimited members, 2 teams, 250 issues) | Basic $10/user/mo | Business $16/user/mo | Enterprise custom.[^21] **Mobile story:** Native iOS app rated 4.8/5 (1,508 ratings). Editing issues and viewing docs on mobile is supported. Docs editing on mobile is not confirmed as full-featured. No offline support. Some users report iOS app quality lags behind desktop.[^22] **Mermaid story:** Native via `/diagram` slash command or ` ```mermaid ` code block. Confirmed in Linear editor documentation. No PlantUML, Excalidraw, or draw.io support.[^23] **API surface:** GraphQL API. 5,000 requests/hour. Webhooks for data change events. Community Terraform provider (`terraform-community-providers/linear`). 
Official hosted MCP server with OAuth support.[^24] **Pros:** - Native Mermaid rendering - Best-in-class issue tracking with deep GitHub integration (bi-directional) - Official MCP server and Terraform provider - GraphQL API with generous rate limits - Unicorn company ($1.25B valuation, $134M funding, $100M revenue) **Cons:** - Docs are a secondary feature, not the core product - No self-host option (SaaS-only) - No formal approval/review workflow for documents - Running two project management tools (Plane + Linear) creates friction - Enterprise-only SSO (SAML/SCIM) **Best for:** Teams that want to consolidate issue tracking and documentation in one tool with a developer-first UX. --- ### 3.7 Obsidian (Publish + Sync) **Verdict: EXCELLENT for personal knowledge management. Poor fit for team documentation with AI agents.** **Pricing:** Core app free | Sync $4/user/mo | Publish $8/site/mo.[^25] **Mobile story:** Native iOS app rated 4.5/5 (2,457 ratings), Android 4.2/5 (15.2K ratings). Full editing supported. Fully offline-first. The strongest mobile-offline story of any tool evaluated.[^26] **Mermaid story:** Native support without plugins. Built into the core app. Excalidraw available via popular community plugin. PlantUML and draw.io via community plugins.[^27] **Pros:** - Plain markdown files on disk -- maximum portability, zero lock-in - Excellent mobile app with full offline support - Native Mermaid, strong plugin ecosystem (2,690+ plugins) - Backlinks and knowledge graph are core features - Bootstrapped, profitable, 1.5M+ MAU **Cons:** - No REST/GraphQL API for Publish or Sync (local-only plugin API) - Sync limited to 20 users per shared vault -- not a team platform - No comments, @mentions, or review workflows in core - No SSO, no SOC 2, no audit log - AI agents cannot create/update content via API **Best for:** Individual knowledge workers or small teams that prioritize local-first, offline-capable, plain-file workflows. 
Not suitable for API-driven agent workflows. --- ### 3.8 Other Tools Evaluated (Brief Assessments) **Coda:** Strong spreadsheet-database hybrid. iOS app rated 3.2/5 (poor). Mermaid via third-party Pack only. Acquired by Grammarly (Dec 2024). $30/doc maker/mo for Team plan is expensive. Not a good fit. **ClickUp Docs:** Native iOS app rated 4.7/5. Mermaid support is unclear/incomplete (5+ year feature request still open). Docs are secondary to task management. $12/user/mo Business plan. Official MCP server available. Overly complex for a documentation-first use case. **Slab:** Native Mermaid support. GraphQL API. GitHub integration syncs .md files. No mobile app (web-only). Free for up to 10 users. $6.67/user/mo Startup plan. Small company ($3M revenue). Solid lightweight wiki but lacks mobile story. **Slite:** Native Mermaid, Excalidraw, and draw.io support. Native iOS app (~4.0 stars). REST API with OpenAPI spec. $8/user/mo Standard plan. Good middle-ground option but limited API maturity and no self-host option. **Nuclino:** Native Mermaid. Native mobile apps. Official MCP server. SOC 2 + ISO 27001 certified. $6/user/mo Starter plan. Very lean team (3 employees, bootstrapped, ~$598K revenue) -- longevity risk is the primary concern. **Archbee:** Native Mermaid and draw.io. Bi-directional GitHub sync. No mobile app. $80/mo minimum (Growing plan). Y Combinator S21. Good for API docs but expensive and unproven at scale. **Document360:** REST API with OpenAPI spec. SOC 2 Type II. No native Mermaid (draw.io via embed only). No mobile app. Quote-based pricing (~$99+/mo). Focused on external knowledge base, not internal team docs. **Guru:** Enterprise AI search is the differentiator. No Mermaid support. Limited mobile editing. Custom pricing (no published per-seat rate). 2,400 employees, $63M revenue. Overkill for a small engineering team. **ReadMe.com:** Native Mermaid. CLI + GitHub Action for sync. MCP server on free tier. 
Designed for public API documentation, not internal team wikis. $250/mo Pro plan. Not a fit for internal user stories.

**Tettra:** No mobile app (PWA only). Unverified Mermaid support. Experimental API. $8/user/mo. HTML-only export (no markdown). Small team (7-12 employees, $3.2M revenue). Not a strong contender.

**BookStack:** Self-hosted, MIT licensed. Mermaid via community hack (not native). draw.io native. No mobile app. No SaaS offering. Solo maintainer. Good for simple internal wikis with self-host requirement but lacks API maturity and mobile story.

**Wiki.js:** Self-hosted, AGPL-3.0. Mermaid from v2.3+. GraphQL API. Native Git sync (bi-directional). Community Terraform provider. No mobile app. v3.0 still in beta after years of development. Strong on paper but stalled development is a risk.

**HackMD/CodiMD:** Mermaid native. Real-time collaborative markdown editing. $5/seat/mo Prime. Push-to-GitHub workflow. No mobile app. Limited team features. Best for ephemeral collaborative editing, not persistent documentation.

**Logseq:** Local-first, plain markdown, AGPL-3.0. Mermaid via plugin. iOS app rated 4.4/5. Backlinks are a core strength. No API for agents. Sync limited to early-access beta. Not designed for team documentation.

**AppFlowy:** Self-hosted Notion alternative. Mermaid in code blocks. Native iOS/Android apps. AGPL-3.0. $10/user/mo Pro. Young product with incomplete API and export features. Watch for maturity.

**Docusaurus:** Meta's static site generator. Mermaid via theme plugin. Content is plain MDX in Git. No mobile app (generates static sites). MIT license. Best for public documentation sites, not internal wikis. Note: it is MkDocs Material, not Docusaurus, that is entering maintenance mode.

**MkDocs Material:** Mermaid native. Static site generator. Plain markdown in Git. MIT license. ENTERING MAINTENANCE MODE (November 2025). Creator building new project (Zensical). Bug fixes only until November 2026. Not recommended for new projects.
**Height Docs:** DISCONTINUED. Service shut down September 24, 2025. Removed from evaluation.[^28] **Capacities:** Personal knowledge tool. No team collaboration features. No API maturity. No GitHub integration. 4 employees, bootstrapped. Not suitable for team use. **Craft Docs:** Apple-first, no Android app. Beautiful but $70/user/mo effective team pricing. Limited API and GitHub integration. Not suitable for cross-platform teams. **Almanac:** Document version control and approval workflows are standout features. API in private beta. Mermaid support claimed. $43M Series A. Opaque pricing. Worth watching but not ready for production adoption. --- ## 4. Comparison Matrix Scores are 1-5 (1 = poor/absent, 5 = excellent). Each cell includes a one-line justification. ### 4.1 Top Contenders (Full Scoring) | Criterion (Weight) | Notion | Outline | GitBook | Confluence | Plane Pages | Linear Docs | |---|---|---|---|---|---|---| | **Mobile app (15%)** | **5** -- Native iOS 4.8 stars, full editing, offline | **2** -- PWA only, no offline, mediocre UX | **1** -- No mobile app | **3** -- Native 4.7 stars, limited editing | **4** -- Native apps, improving | **4** -- Native 4.8 stars, docs editing unconfirmed | | **Mermaid/diagrams (12%)** | **4** -- Native Mermaid, no PlantUML/Excalidraw | **4** -- Native Mermaid + draw.io | **3** -- Mermaid via integration install, Excalidraw native | **2** -- Marketplace apps only | **1** -- No Mermaid, draw.io only | **3** -- Native Mermaid, no other diagram types | | **Code-first (10%)** | **3** -- Markdown export lossy, no Git sync native | **3** -- Markdown export lossy, no Git sync | **5** -- Bi-directional Git sync, content as Markdown | **1** -- XHTML format, no Markdown native | **2** -- Block editor, Markdown export only | **3** -- Markdown copy, no Git sync for docs | | **API surface (12%)** | **5** -- REST, webhooks, official MCP, 3 req/sec | **4** -- RPC-style, OpenAPI spec, MCP, webhooks | **4** -- REST, OpenAPI, Terraform, MCP | 
**4** -- REST + GraphQL, points-based limits, community MCP | **4** -- REST, webhooks, official MCP, SDKs | **5** -- GraphQL, 5K/hr, webhooks, official MCP, Terraform | | **Self-host (5%)** | **1** -- Cloud-only | **5** -- Docker, BSL 1.1, PostgreSQL/Redis | **2** -- Frontend renderer only (Apache 2.0) | **3** -- Data Center (sunsetting 2029) | **5** -- AGPL-3.0, Docker/K8s/Helm | **1** -- Cloud-only | | **Pricing (8%)** | **3** -- $10-20/user/mo, AI bundled at Business | **4** -- $10/mo flat for 1-10, free self-host | **2** -- $65/site + $12/user, expensive | **4** -- $5.42/user/mo Standard, good free tier | **5** -- $6/seat/mo Pro, free 12-seat tier | **3** -- $10-16/user/mo, generous free tier | | **GitHub integration (8%)** | **4** -- Synced Databases, PR linking, one-way | **3** -- GitHub App embeds, no content sync | **5** -- Bi-directional Git sync, PR-based reviews | **2** -- Marketplace apps only | **5** -- Bi-directional issue sync, PR status | **5** -- Bi-directional, auto-status, branch detection | | **Collaboration (8%)** | **4** -- Comments, mentions, no formal approval | **4** -- Real-time editing, comments, mentions | **5** -- Change requests, merge rules, roles | **4** -- Comments, mentions, page restrictions | **4** -- Real-time editing, comments, approval on Business | **3** -- Comments, mentions, no doc approval | | **Search (5%)** | **5** -- Full-text, backlinks, vector/semantic AI | **3** -- Full-text, AI answers | **4** -- Full-text, AI search/assistant | **4** -- Full-text, Rovo AI cross-product | **3** -- Full-text, AI features | **3** -- Full-text, Linear Asks (Business) | | **Data portability (5%)** | **3** -- Markdown + CSV export, lossy | **3** -- Markdown/JSON export, JSON full-fidelity | **5** -- Content IS Markdown in Git | **2** -- XHTML format, Markdown via Marketplace | **3** -- Markdown/PDF export | **3** -- Markdown copy, CSV export | | **Security (7%)** | **4** -- SOC 2 II, ISO 27001/27701, SAML (Enterprise) | **3** -- TLS/AES, 
SSO on Team+, no SOC 2 confirmed | **3** -- SOC 2, ISO 27001, SAML Enterprise-only | **5** -- SOC 2 II, ISO 27001, SAML, Atlassian Guard | **4** -- SOC 2 II, GDPR, SAML on Business | **4** -- SOC 2 II, HIPAA, SAML Enterprise | | **Longevity (5%)** | **5** -- $600M ARR, $11B valuation, IPO-track | **3** -- Bootstrapped/profitable, 38K GitHub stars | **2** -- $3.9M revenue, ~35 employees | **5** -- Atlassian public company, $17.6B market cap | **2** -- $4M seed, $680K revenue | **5** -- $134M funding, $1.25B valuation, $100M rev | | **Weighted Score** | **4.0** | **3.3** | **3.3** | **3.1** | **3.3** | **3.5** | ### 4.2 Second-Tier Tools (Summary Scores) | Tool | Weighted Score | Key Strength | Key Weakness | |---|---|---|---| | Obsidian | 3.2 | Best offline/mobile, plain files | No API for agents | | Slite | 3.1 | Mermaid + Excalidraw + draw.io | Limited API, no self-host | | Nuclino | 3.0 | Mermaid, MCP, SOC 2, cheap | 3-person company | | Slab | 2.9 | Mermaid, GitHub .md sync | No mobile app | | ClickUp | 2.8 | Feature-rich PM + docs | Mermaid unclear, complex | | Wiki.js | 2.7 | Git sync, GraphQL, self-host | v3.0 stalled, no mobile | | AppFlowy | 2.6 | Self-host Notion clone | Immature API/export | | BookStack | 2.5 | MIT, self-host, simple | Mermaid not native, solo maintainer | | Archbee | 2.5 | Git sync, Mermaid | No mobile, expensive, small | | Coda | 2.4 | Powerful databases/formulas | Poor mobile (3.2 stars), Mermaid via Pack | | HackMD | 2.3 | Real-time collab markdown | No mobile, limited team features | | Document360 | 2.2 | Enterprise KB, SOC 2 | No Mermaid, no mobile, expensive | | Guru | 2.1 | Enterprise AI search | No Mermaid, opaque pricing | | ReadMe | 2.0 | API docs specialist | Not for internal docs | | Tettra | 1.8 | Slack-first Q&A | No mobile, no Markdown export | | Docusaurus | 2.5 | SSG, MDX, Git-native | No collaboration features | | MkDocs Material | 2.3 | Mermaid, beautiful output | Maintenance mode, no collab | | Logseq | 2.2 | 
Backlinks, local-first | No API, no team features | --- ## 5. Recommendation ### Primary Recommendation: Notion Notion wins because it is the only tool that simultaneously delivers: 1. **A high-quality native mobile app** with editing and offline support (iOS 4.8 stars, 82K ratings) 2. **Native Mermaid rendering** without plugins or marketplace apps 3. **A mature REST API** with webhooks and an official MCP server that AI agents can use to create user stories, attach research, and update status 4. **Team collaboration** with comments, @mentions, and real-time editing 5. **Workspace replication** via API-driven template duplication (enabling Tablez workspace setup without manual UI clicks) For a small team running autonomous AI agents alongside a human operator who writes user stories from a phone, no other tool covers all five requirements. ### Why the Runners-Up Lose **Outline** would be the recommendation if the human lead did not need to edit from a phone. Outline matches Notion on Mermaid, API, and collaboration, and adds self-hosting capability. But a PWA with no offline support is not adequate for a commuter editing workflow. If Outline ships a native iOS app, this recommendation should be revisited. **GitBook** has the best code-first workflow (true bi-directional Git sync), but it has no mobile app at all and is designed for published documentation rather than internal user stories and research. Its per-site pricing model ($65/site + $12/user) also makes it expensive for multi-workspace setups. **Linear Docs** is appealing because it combines issue tracking with documentation and has native Mermaid. However, adopting Linear would mean running two project management tools (Plane + Linear) or migrating away from Plane entirely. Linear Docs is also a secondary feature, not the core product, and lacks formal document review workflows. 
**Confluence** is the enterprise default but fails on two critical dimensions: no native Mermaid (requires marketplace apps) and limited mobile editing quality. The XHTML storage format makes markdown round-tripping painful. Confluence is designed for 1,000-person organizations, not 10-person AI-augmented teams. **Plane Pages** is the current baseline and the team's project management tool, but it fundamentally lacks Mermaid rendering. Waiting for Plane to add Mermaid support is an option, but there is no committed timeline from the Plane team, and the draw.io integration does not satisfy the diagram-as-code requirement. ### When Would You Pick the Runner-Up Instead? Pick **Outline** instead of Notion if: - Data sovereignty becomes a hard requirement (regulated industry, government contract) - The human lead's workflow shifts to desktop-primary (no longer writing from phone) - Notion raises prices significantly or changes API terms unfavorably - The BSL 1.1 license and self-hosting capability become strategically important for Tablez enterprise deployments - You need to run documentation infrastructure in your own VPC or air-gapped environment Pick **GitBook** instead if: - The primary use case shifts to public-facing developer documentation (API docs, guides) - The team adopts a strict "docs live in Git alongside code" policy - Mobile editing is no longer a requirement --- ## 6. Implementation Checklist The first 10 steps to adopt Notion as the dashecorp rig documentation platform: ### Step 1: Create Notion Workspace Create the "Dashecorp Rig" workspace on the Notion Plus plan ($10/user/mo). Add the human operator and set up the workspace icon, cover, and top-level structure. 
### Step 2: Design the Information Architecture Create the following top-level databases and pages: - **User Stories** (database): Properties for status, priority, sprint, assignee, GitHub PR link, Mermaid diagram toggle - **Research** (database): Properties for topic, date, author (human or agent), status - **Architecture** (page tree): System diagrams as Mermaid code blocks, decision records - **Runbooks** (page tree): Operational procedures for each agent ### Step 3: Create Notion Integration (API Token) Go to notion.com/my-integrations. Create an internal integration named "Dashecorp Agents." Grant it read/write access to all content. Store the API token in the team's secrets manager. Share relevant databases with the integration. ### Step 4: Configure the MCP Server Deploy the official Notion MCP server (`makenotion/notion-mcp-server`) in the agents' runtime environment. Configure it with the integration token from Step 3. Verify that Dev-E, Review-E, and ATL-E can create, read, and update pages via MCP. ### Step 5: Set Up GitHub Integration Connect the Notion workspace to the dashecorp GitHub organization. Enable Synced Databases for GitHub Issues and PRs. Configure the GitHub Pull Requests property on the User Stories database so PR status auto-updates story status. ### Step 6: Create Templates Build Notion templates for: - User Story (with Mermaid diagram section, acceptance criteria checklist, agent notes) - Research Document (with source links, findings, recommendations) - Architecture Decision Record (with context, decision, consequences as Mermaid) ### Step 7: Configure Mobile Offline Access On the human lead's iPhone: download the Notion iOS app, log in, navigate to the User Stories database, and enable "Available offline" for the database and its recent pages. Verify that a new user story can be created and edited without Wi-Fi. 
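Steps 3, 4, 8, and 9 all ride on the Notion API. As a minimal sketch of the page-creation half of Step 9 (the database ID and the `Name` title property are hypothetical placeholders; the payload shape follows Notion's public `POST /v1/pages` reference, so verify it against the pinned API version before relying on it), an agent could assemble a user story with an embedded Mermaid block like this:

```python
import json

# Hypothetical values -- substitute the real integration token and database ID
# from Steps 3 and 5. The payload shape targets Notion API version 2022-06-28.
NOTION_VERSION = "2022-06-28"

def build_user_story(database_id: str, title: str, mermaid_src: str) -> dict:
    """Build the JSON body for POST https://api.notion.com/v1/pages:
    a new page in the User Stories database whose body is a Mermaid
    code block (the diagram source survives export verbatim)."""
    return {
        "parent": {"database_id": database_id},
        "properties": {
            # "Name" is Notion's default title property; adjust if renamed.
            "Name": {"title": [{"text": {"content": title}}]},
        },
        "children": [
            {
                "object": "block",
                "type": "code",
                "code": {
                    "language": "mermaid",
                    "rich_text": [
                        {"type": "text", "text": {"content": mermaid_src}}
                    ],
                },
            }
        ],
    }

payload = build_user_story(
    "00000000-0000-0000-0000-000000000000",  # hypothetical database ID
    "Operator can claim the next review from a phone",
    "flowchart LR\n  Operator --> Conductor --> ReviewE",
)
print(json.dumps(payload)[:60])  # the body an agent would POST with its token
```

Sending it is one HTTPS POST with `Authorization: Bearer <token>` and `Notion-Version` headers; at the documented 3 req/sec limit a single story creation is well inside budget, and because the Mermaid source is plain code-block text it round-trips through any later export (see Risk 3).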
### Step 8: Set Up Webhooks for Agent Notifications Using the Notion API (2025-09-03+), create webhook subscriptions for: - New page created in User Stories database - Status property changed - Comment added to any page Route webhook payloads to the agents' event bus so they can react to new stories and status changes. ### Step 9: Test the Agent Workflow End-to-End Have ATL-E create a user story via the API, including a Mermaid architecture diagram in the body. Have Dev-E read the story, create a GitHub branch, link the PR back to Notion, and update the status. Have Review-E add a comment with review findings. Verify the human lead can see and edit all of this from their phone. ### Step 10: Replicate to Tablez Workspace Duplicate the workspace structure to a new "Tablez" workspace: 1. Create a new Notion workspace on the appropriate plan 2. Use the Notion API to export templates and database schemas from Dashecorp 3. Import/duplicate them into the Tablez workspace 4. Create a separate integration token for Tablez agents 5. Configure MCP server with the new token This replication can be scripted via the API -- no manual UI clicks required after initial template design. --- ## 7. Risks and Escape Hatches ### Risk 1: Notion API Rate Limits Constrain Agent Throughput **Probability:** Medium. At 3 requests/second per integration, a burst of agent activity (e.g., creating 50 user stories during a planning sprint) would take 17+ seconds. **Mitigation:** Implement request queuing with exponential backoff in the agent framework. Use webhooks (which don't count against rate limits) instead of polling. Consider multiple integration tokens for parallel agents. ### Risk 2: Notion Changes Pricing or API Terms **Probability:** Low-Medium. Notion has already restructured pricing once (May 2025, bundling AI into Business tier). **Mitigation:** The escape hatch is Outline. 
Maintain a monthly automated export of all Notion content to Markdown (via the API or a tool like NotionBackups). If Notion's terms become unfavorable, the team can migrate to a self-hosted Outline instance. Migration friction: medium (Notion Markdown exports are lossy; databases would need to be reconstructed as Outline collections).

### Risk 3: Markdown Export Is Lossy

**Probability:** Certain. Notion's internal block model is richer than Markdown. Databases export as CSV, callouts flatten, toggles lose structure.

**Mitigation:** Use the API for programmatic backup (preserves block structure as JSON). For critical content (architecture diagrams, user stories), keep Mermaid source in the code block text -- this survives export as-is. Avoid Notion-specific features (synced blocks, database relations) for content that must be portable.

### Risk 4: No Self-Hosting for Regulated Workloads

**Probability:** Low today. Could increase if Tablez operates in regulated sectors.

**Mitigation:** Notion offers an Enterprise plan with data residency options and zero data retention with LLM providers. If full self-hosting becomes mandatory, migrate to Outline (BSL 1.1, Docker deployment). The migration path: export JSON via API, import into Outline (supports Notion import).

### Risk 5: Mobile Offline Sync Limitation (Wi-Fi Only)

**Probability:** High. The human lead creates content during commutes, which may use cellular data.

**Mitigation:** Monitor whether Notion addresses this limitation in future releases (it has been a known complaint since the August 2025 offline launch). Workaround: download pages before leaving Wi-Fi, or use the Notion mobile app's auto-download feature on paid plans. If this becomes a blocking issue, consider Obsidian for personal drafting (sync to Notion via API).

### Vendor-Lock Assessment

| Dimension | Lock-in Level | Escape Path |
|---|---|---|
| Content | **Medium** | API export to JSON (full fidelity) or Markdown (lossy). Monthly automated exports recommended. |
| Databases | **High** | Database schemas, relations, and views are Notion-specific. Would need to be rebuilt in new tool. |
| Integrations | **Medium** | MCP server and webhook integrations are Notion-specific but follow open standards (MCP protocol, HTTP webhooks). |
| Templates | **Low** | Templates are just pages. Export as Markdown and adapt to new tool's template system. |
| Mermaid diagrams | **None** | Mermaid is an open standard. Code blocks export verbatim. Any Mermaid-supporting tool will render them. |
| User data | **Medium** | Notion supports SCIM for user provisioning. User accounts don't transfer to other platforms. |

### Exit Path Summary

If Notion must be abandoned:

1. Run automated API export (JSON format) of all workspaces weekly
2. Target platform: Outline (self-hosted) -- supports Notion import natively
3. Timeline: 1-2 weeks for a team of 10, including database reconstruction
4. What survives cleanly: page content, Mermaid diagrams, file attachments
5. What requires manual rebuild: database schemas, relations, views, automations, webhook configurations

---

## 8. Appendix: Sources

All sources accessed April 18, 2026, unless otherwise noted.

### Notion

[^1]: Notion Pricing. https://www.notion.com/pricing
[^2]: Notion iOS App Store. https://apps.apple.com/us/app/notion-notes-tasks-ai/id1232780281; Notion Offline Mode release notes. https://www.notion.com/releases/2025-08-19
[^3]: Mermaid Diagrams as Code in Notion. https://lukemerrett.com/using-mermaid-flowchart-syntax-in-notion/
[^4]: Notion API Request Limits. https://developers.notion.com/reference/request-limits; Notion MCP Server. https://github.com/makenotion/notion-mcp-server; Notion MCP Blog. https://www.notion.com/blog/notions-hosted-mcp-server-an-inside-look

### Outline

[^5]: Outline Pricing. https://www.getoutline.com/pricing
[^6]: Outline App Download. https://www.getoutline.com/download; Outline PWA Changelog.
https://www.getoutline.com/changelog/progressive-web-app
[^7]: Outline Mermaid Diagrams Changelog. https://www.getoutline.com/changelog/mermaid-diagrams; Outline Diagrams Docs. https://docs.getoutline.com/s/guide/doc/diagrams-KQiKoT4wzK
[^8]: Outline API Documentation. https://www.getoutline.com/developers; Outline OpenAPI Spec. https://github.com/outline/openapi

### GitBook

[^9]: GitBook Pricing. https://www.gitbook.com/pricing
[^10]: GitBook does not list mobile apps on https://www.gitbook.com/ or in App Store/Play Store
[^11]: GitBook Mermaid Integration. https://www.gitbook.com/integrations/mermaid; GitBook Drawings (Excalidraw). https://docs.gitbook.com/content-editor/blocks/drawing
[^12]: GitBook API Rate Limiting. https://docs.gitbook.com/developers/gitbook-api/rate-limiting; GitBook Terraform Provider. https://github.com/GitbookIO/terraform-provider-gitbook

### Confluence

[^13]: Confluence Pricing. https://www.atlassian.com/software/confluence/pricing; Confluence Data Center sunset. https://www.eesel.ai/blog/confluence-pricing
[^14]: Confluence Cloud iOS App Store. https://apps.apple.com/us/app/confluence-cloud/id1006971684; Confluence Mobile App. https://www.atlassian.com/software/confluence/mobile-app
[^15]: Mermaid Diagrams for Confluence (Marketplace). https://marketplace.atlassian.com/apps/1226567/mermaid-diagrams-for-confluence
[^16]: Confluence Rate Limiting. https://developer.atlassian.com/cloud/confluence/rate-limiting/; Atlassian API rate limit evolution. https://www.atlassian.com/blog/platform/evolving-api-rate-limits

### Plane

[^17]: Plane Pricing. https://plane.so/pricing
[^18]: Plane Mobile Changelog. https://plane.so/changelog/mobile
[^19]: Plane Mermaid Issue #8147. https://github.com/makeplane/plane/issues/8147; Plane draw.io Integration. https://www.drawio.com/blog/diagrams-in-plane
[^20]: Plane API. https://developers.plane.so/api-reference/introduction; Plane MCP Server. https://github.com/makeplane/plane-mcp-server

### Linear

[^21]: Linear Pricing. https://linear.app/docs/billing-and-plans
[^22]: Linear Mobile iOS App Store. https://apps.apple.com/us/app/linear-mobile/id1645587184
[^23]: Linear Editor Docs. https://linear.app/docs/editor
[^24]: Linear API. https://linear.app/developers/graphql; Linear Rate Limiting. https://linear.app/developers/rate-limiting; Linear MCP. https://linear.app/docs/mcp; Linear Terraform Provider. https://registry.terraform.io/providers/terraform-community-providers/linear/latest

### Obsidian

[^25]: Obsidian Pricing. https://obsidian.md/pricing
[^26]: Obsidian iOS App Store. https://apps.apple.com/us/app/obsidian-connected-notes/id1557175442; Obsidian Android Play Store. https://play.google.com/store/apps/details?id=md.obsidian
[^27]: Obsidian Mermaid. https://medium.com/obsidian-observer/how-mermaid-diagrams-work-in-obsidian-b7680fe00fa8; Obsidian Excalidraw Plugin. https://github.com/zsviczian/obsidian-excalidraw-plugin

### Other

[^28]: Height shutdown. https://skywork.ai/skypage/en/Height-App-The-Rise-and-Sunset-of-an-AI-Project-Management-Pioneer/1975012339164966912

### General

- Notion Security. https://www.notion.com/security
- Notion Revenue ($600M ARR). https://www.saastr.com/notion-and-growing-into-your-10b-valuation-a-masterclass-in-patience/
- Notion GitHub Integration. https://www.notion.com/help/github
- Notion Export. https://www.notion.com/help/export-your-content
- Notion Vector Search. https://www.notion.com/blog/two-years-of-vector-search-at-notion
- Outline About. https://www.getoutline.com/about
- Outline GitHub (38.2K stars). https://github.com/outline/outline
- Outline Self-Hosting. https://selfhosting.sh/apps/outline/
- GitBook Git Sync. https://gitbook.com/docs/getting-started/git-sync
- GitBook Change Requests. https://gitbook.com/docs/collaboration/change-requests
- GitBook AI Search. https://gitbook.com/docs/publishing-documentation/ai-search
- GitBook Revenue ($3.9M). https://getlatka.com/companies/gitbook.com
- Confluence SOC 2. https://www.atlassian.com/trust/compliance/resources/soc2
- Confluence MCP Server. https://github.com/sooperset/mcp-atlassian
- Atlassian Revenue ($5.75B TTM). https://investors.atlassian.com/
- Plane SOC 2. https://plane.com/blog/pilot-is-soc-2-type-2-compliant
- Linear Security. https://linear.app/security
- Linear Funding ($134M). https://techcrunch.com/2025/06/10/atlassian-rival-linear-raises-82m-at-1-25b-valuation/
- Linear Revenue ($100M). https://getlatka.com/companies/linear.app
- Slab Mermaid Support. https://help.slab.com/en/articles/7045329-mermaid-support
- Slab GitHub Integration. https://slab.com/integrations/github/
- Slite Mermaid. https://slite.com/integrations/mermaid
- Slite Excalidraw. https://slite.com/integrations/excalidraw
- Slite Developers. https://developers.slite.com/
- Nuclino Mermaid. https://www.nuclino.com/apps/mermaid
- Nuclino MCP Server. https://www.nuclino.com/apps/mcp-server
- Nuclino Security. https://www.nuclino.com/security
- Archbee Mermaid. https://www.archbee.com/docs/mermaid-diagrams
- Archbee GitHub 2-Way Sync. https://www.archbee.com/docs/github-2-way-sync
- Document360 SOC 2. https://document360.com/compliance/soc2/
- Guru Security. https://www.getguru.com/security
- ReadMe Mermaid. https://docs.readme.com/main/docs/creating-mermaid-diagrams
- ReadMe Pricing. https://readme.com/pricing
- BookStack Mermaid Hack. https://www.bookstackapp.com/hacks/mermaid-viewer/
- BookStack GitHub (18.6K stars). https://github.com/BookStackApp/BookStack
- Wiki.js GitHub (28.2K stars). https://github.com/requarks/wiki
- Wiki.js Git Storage. https://docs.requarks.io/storage/git
- Wiki.js Terraform Provider. https://registry.terraform.io/providers/tyclipso/wikijs/latest/docs
- HackMD Pricing. https://hackmd.io/pricing
- Logseq GitHub (42.2K stars). https://github.com/logseq/logseq
- AppFlowy GitHub (69.8K stars). https://github.com/AppFlowy-IO/AppFlowy
- Docusaurus GitHub (64.6K stars). https://github.com/facebook/docusaurus
- MkDocs Material Maintenance Mode. https://squidfunk.github.io/mkdocs-material/blog/2025/11/11/insiders-now-free-for-everyone/
- Obsidian Revenue (~$25M ARR). https://www.taskade.com/blog/obsidian-history
- Coda Acquired by Grammarly. https://www.grammarly.com/blog/company/grammarly-to-acquire-coda/
- ClickUp MCP Server. https://developer.clickup.com/docs/connect-an-ai-assistant-to-clickups-mcp-server
- Mermaid Integrations. https://mermaid.ai/open-source/ecosystem/integrations-community.html

---

*This report evaluates 25 documentation tools across 12 dimensions for the dashecorp rig multi-agent software engineering system. Research conducted April 18, 2026. All pricing, ratings, and feature claims reflect publicly available information as of the access date. Claims that could not be independently verified are noted as such throughout the document.*

---

## https://research.rig.dashecorp.com/research/2026-04-18-docs-vs-memory-principles/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-docs-vs-memory-principles.md

# Research: principles for docs vs memory separation (2026-04-18)

> Five candidate principles for deciding what belongs in canonical docs versus in operational memory MCP.
> Migrated from Plane work item DASHE-12 on 2026-04-18. The original Plane content was a stub pointing to a Plane Page; that Page was discarded when the Plane workspace retired. The content below reconstructs the intended research from the sibling docs and the Karpathy LLM Wiki analysis.

## Context

The rig has three candidate homes for knowledge:

- **Docs** — markdown in `dashecorp/rig-docs`, PR-reviewed, durable
- **Memory MCP** — Postgres + pgvector, auto-retrieved by agents, mutable, ephemeral
- **Agent character prompts** — HelmReleases in `dashecorp/rig-gitops`, baked into deployment

Without a rule, these overlap.
Five candidate principles to pick an ownership rule:

## 1. Immutability

**Docs** are durable and reviewed. **Memory** is ephemeral and overwritable.

Rule: if you write it for future-you (or the next human) to read, it's a doc. If you write it so the *next agent run* uses it, it's memory.

## 2. Review gate

**Docs** go through PR review (Review-E + human). **Memory** writes are un-gated — the agent decides inline whether something is worth remembering.

Rule: the review gate prevents hallucinated facts from becoming canon. But review costs time. Memory trades the gate for speed.

## 3. Retrieval pattern

**Docs** are grep'd or read as canonical pages. **Memory** is queried by similarity (semantic) for relevant operational bits mid-task.

Rule: if an agent needs to recall something during work, that's memory. If an agent needs to reference a fixed standard, that's docs.

## 4. Audience

**Docs** are for humans and agents. **Memory** is primarily for agents (humans read it only through a surfacing pipeline).

Rule: humans can't — shouldn't — read raw memory blobs. They read docs. Memory is machine-to-machine operational state.

## 5. Failure tolerance

**Memory loss** is degraded-but-recoverable. Learnings re-accumulate over runs. **Docs loss** breaks the rig until restored.

Rule: treat docs as ground truth requiring backup and version control (git gives this). Treat memory as an eventually-consistent cache that can be wiped.

## Synthesis

All five principles point in the same direction: **docs are the canonical, human-reviewed, durable layer; memory is the ephemeral, agent-operational, vector-queryable layer.** The Karpathy LLM Wiki pattern encodes the same split as `wiki/` (durable, LLM-maintained) versus a separate operational cache.
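The failure-tolerance principle implies that memory entries worth keeping should eventually cross the review gate into docs. A minimal sketch of what a promotion-candidate filter could look like; the JSON shape, field names, and thresholds here are illustrative, not the rig's actual memory schema:

```shell
# Fake memory export; in the rig this would come from the Memory MCP's Postgres.
cat > /tmp/memories.json <<'EOF'
[{"id": 1, "importance": 5, "hit_count": 7, "content": "529s from Anthropic: back off 60s"},
 {"id": 2, "importance": 2, "hit_count": 9, "content": "session-local scratch note"}]
EOF

# Flag high-importance, frequently-hit memories as docs-promotion candidates.
jq '[.[] | select(.importance >= 4 and .hit_count >= 5)]' /tmp/memories.json
```

Only the first record passes both thresholds; the session-local note stays in the wipeable cache.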
The rule operationalised:

- **Docs own** — architecture, conventions, user stories, research, proposals, decisions, runbooks, glossary
- **Memory owns** — per-run context, prior reviewer decisions, encountered pitfalls awaiting promotion, session-local state
- **Agent character prompts own** — role definition, lore, tool list (small, versioned in rig-gitops HelmReleases)

Promotion rule (memory → docs): any memory with `importance ≥ 4` and `hit_count ≥ 5` is a candidate for promotion to a docs entry by a weekly Lint pass. See sibling [anti-drift lint rules](research/2026-04-18-docs-memory-drift-lint).

---

## https://research.rig.dashecorp.com/research/2026-04-18-llm-wiki-pattern-analysis/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-llm-wiki-pattern-analysis.md

# LLM Wiki pattern — Karpathy analysis

> Analysis of Karpathy's LLM Wiki gist and how it applies to an autonomous coding-agent rig

## What Karpathy proposes

The "LLM Wiki" gist is **not a documentation style guide** — it's an architecture pattern where the LLM *maintains* a knowledge base rather than RAG-retrieving from raw docs.

### Concrete techniques

1. **Three-layer architecture:**
   - `raw/` — immutable sources, LLM-read-only
   - `wiki/` — LLM-owned markdown synthesis
   - Schema file (`CLAUDE.md` or `AGENTS.md`) defining conventions and workflows
2. **Schema file is the key config.** Quoted from the gist:
   > "a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki."
3. **Three named operations:**
   - **Ingest** — new source → touches 10-15 pages
   - **Query** — *"good answers can be filed back into the wiki as new pages"*
   - **Lint** — health-check for contradictions, stale claims, orphans, missing cross-refs
4. **Two special files:**
   - `index.md` — content-oriented catalog, LLM reads first. Karpathy claims this works up to *"~100 sources, ~hundreds of pages"* and **avoids embedding RAG entirely**.
   - `log.md` — append-only chronological. Greppable prefix: `## [2026-04-02] ingest | Article Title` so `grep "^## \[" log.md | tail -5` works.
5. **Optional CLI tool escalation:** `qmd` (hybrid BM25/vector + LLM rerank, CLI + MCP) when wiki outgrows `index.md`.
6. **Frontmatter for Dataview-style queries:** *"tags, dates, source counts"* → dynamic tables.
7. **Wiki is a git repo.** Version history is free.

## How this applies to the dashecorp rig

### Strengthens

- **Root `CLAUDE.md` / `AGENTS.md` as schema file.** Karpathy explicitly names these as *the* config entry point.
- **Memory MCP ↔ docs split.** Karpathy's `raw/` vs `wiki/` split is the same shape the rig's memory-store-vs-git-docs boundary wants to take.

### Contradicts (and Karpathy's call is better for agent rigs)

- **`llms.txt` (per llmstxt.org)** — Karpathy doesn't use it. His equivalent is `index.md` (LLM-maintained, categorized summaries). For an agent rig that MAINTAINS its own docs, `index.md` is more correct because it's dynamic.
- **"One canonical `docs/` directory"** — Karpathy implies three dirs (raw/ wiki/ schema). Collapsing raw and wiki into one tree loses the provenance layer.

### Net-new ideas the dashecorp rig is missing

- **`log.md` with greppable prefix** — cheap audit trail, no DB needed. Dashecorp has none.
- **Lint as a scheduled operation** — contradictions, stale claims, orphans. Dashecorp has `supersedes:` as a passive field with no process consuming it.
- **"Query outputs get filed back"** — compounding knowledge. Currently Dev-E's good analyses die in chat history.
- **Avoid embedding RAG at small scale** — `index.md` + grep beats vector DBs under ~hundreds of pages. Dashecorp has rig-memory-mcp with pgvector; may be premature.
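The `log.md` idea costs almost nothing to adopt. A sketch of the append-and-grep loop, using Karpathy's prefix format from the gist (the entry titles here are made up):

```shell
# Append one entry per operation, with the greppable "## [date] op | title" prefix.
printf '## [%s] ingest | Vercel AGENTS.md eval\n' "$(date +%F)" >> log.md
printf '## [%s] lint | weekly pass\n' "$(date +%F)" >> log.md

# Audit-trail query: last five operations, no database involved.
grep '^## \[' log.md | tail -5
```

Because the prefix is fixed, the same one-liner works whether the log has ten entries or ten thousand.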
## Application to this proposal tree

This `research/` directory IS the Karpathy `raw/` + partial `wiki/` pattern:

- `research/*.md` — agent-authored synthesis of external sources (wiki-layer)
- External URLs in `source_refs:` — the `raw/` pointer (until we add a `raw/` archive layer)
- `proposals/*.md` — decisions informed by research

Diagrams live as Mermaid source inline in markdown (Karpathy's "assets" treated as code). The `.mmd` source is the canonical artifact; no PNG/SVG is committed.

## Open questions

1. Do we mirror Karpathy's three-layer strictly, or keep the simpler two-layer (research/ + proposals/)? Current rig settled on two-layer plus a separate `user-stories/` — see `proposals/2026-04-18-docs-tooling-decision`.
2. How aggressive should Lint be? LLM-as-judge is expensive; scheduled weekly probably right.
3. How do agents decide when to "file back"? Needs an explicit rule in character prompts.

---

## https://research.rig.dashecorp.com/research/2026-04-18-production-docs-patterns/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-18-production-docs-patterns.md

# Production agent docs patterns — Vercel, Cloudflare, HumanLayer, Anthropic

> What production coding-agent rigs actually put in their AGENTS.md / CLAUDE.md files, with measured eval data

## Headline finding (Vercel eval, Jan 2026)

Vercel ran a formal eval measuring agent success rates across documentation strategies on hardened Build/Lint/Test workflows:

| Strategy | Pass rate |
|---|---|
| Baseline (no AGENTS.md) | 53% |
| Skills, default loaded | 53% (no improvement) |
| Skills with explicit invocation instructions | 79% |
| **AGENTS.md with embedded 8 KB compressed docs index** | **100%** |

Vercel's quote on why: *"Prefer retrieval-led reasoning over pre-training-led reasoning."* Passive context (facts the agent reads without deciding to retrieve) beats active retrieval steps where the agent might mis-route.
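The winning strategy (an embedded, compiled docs index) can be approximated in a few lines of shell. Everything here, including the paths, marker comments, and one-line-per-doc format, is an illustrative sketch of the pattern, not Vercel's actual build step:

```shell
# Fixture doc so the sketch is self-contained; a real repo already has docs/.
mkdir -p docs
printf '# Deploy guide\n\nHow to ship.\n' > docs/deploy.md

# Compile a compressed index (path + first heading) into AGENTS.md.
{
  echo '<!-- BEGIN DOCS INDEX (auto-generated, do not hand-edit) -->'
  find docs -name '*.md' | sort | while read -r f; do
    printf -- '- %s: %s\n' "$f" "$(grep -m1 '^# ' "$f" | sed 's/^# //')"
  done
  echo '<!-- END DOCS INDEX -->'
} >> AGENTS.md

cat AGENTS.md
```

Regenerating the index on every PR, and failing CI on drift (as the rig's own BRAIN.md pipeline already does), turns the ~8 KB budget into an enforced constraint rather than a guideline.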
## Production AGENTS.md shapes

### Vercel (`vercel/vercel/AGENTS.md`)

- **~140 lines, 11 sections.** Not aspirational prose — a rulebook.
- Sections: Repository Structure, Essential Commands, Changesets (with mandatory rules), Code Style, Testing Patterns, Package Development, Runtime Packages, CLI Development, Common Pitfalls.
- Ends with a **numbered "Common Pitfalls"** list — the literal anti-patterns agents hit in this repo.

### Cloudflare (`cloudflare/cloudflare-docs/AGENTS.md`)

- **733 lines** — an outlier driven by MDX rendering quirks.
- Verbatim *"MDX gotchas — the #1 cause of build failures"* table mapping `{`, `}`, `<`, `>` to specific fixes.
- 12-item "Common mistakes to avoid" numbered list.
- **CI-vs-local split:** *"`npm run build` will time out in CI environments… use `npm run check` and linters only — do **not** run a full build."*

### Vercel Labs (`vercel-labs/agent-skills/AGENTS.md`)

- **Compiled**, not hand-written. Source is individual rule files in a repo; CI composes AGENTS.md.
- Hallucinated facts fail `npm run check`, not human review.

## CLAUDE.md specifics (HumanLayer, followed by Anthropic internal)

- **≤ 60 lines.** Claude Code's system prompt already ships ~50 instructions; frontier models track ~150–200 reliably.
- Everything else goes in `agent_docs/*.md` loaded on-demand.
- Claude Code auto-reads CLAUDE.md + `.claude/skills/*/SKILL.md`. It does NOT auto-read `llms.txt`, `docs/index.md`, or frontmatter `queries:` fields — those are inert to the CLI.
- Quote: *"Never send an LLM to do a linter's job."* Don't put code-style rules in CLAUDE.md.

## What works (evidence-based)

1. **Embedded docs index in AGENTS.md, not link-out.** Skills-style "go fetch X" loses because agents need a decision point they frequently mis-route.
2. **Numbered "Common Pitfalls" lists at the end** — both Vercel and Cloudflare do this; it's the section agents hit when blocked.
3. **Split CI-vs-local validation matrices.**
4. **Schema-validated YAML frontmatter** (Cloudflare's `pcx_content_type` enum, tag allowlist in `src/schemas/tags.ts`) — build fails on drift, so drift is prevented by the compiler, not by goodwill.
5. **Two-tier lint in CI** — Vale + markdownlint + lychee (Datadog, GitLab, Fern all published this stack).

## What's vestigial (evidence-based)

1. **`llmstxt.org`** — no production rig in the sample adopted it. AGENTS.md won.
2. **Code-style rules in CLAUDE.md** — *"Never send an LLM to do a linter's job"* (HumanLayer).
3. **Auto-generated `/init` CLAUDE.md** — unanimously treated as a starting draft, not a keeper.
4. **OpenAPI specs as general agent docs** — useful for tool surfaces, overkill for project conventions.

## Failure modes documented elsewhere

1. **Silent rate-limit cascades → hallucinated gap-filling.** Petrenko's 16-agent refactor: *"2 of 9 interview agents hit API rate limits and failed silently."* The dashecorp rig's 529-overloaded rule addresses part of this; extend it: any agent that skipped a doc update must write an explicit "skipped: reason" line.
2. **Schema drift becoming canon.** Industry data: ~7 engineer-days/month lost. Cloudflare prevents this by making invalid `pcx_content_type` or tags hard-fail the build.
3. **Doc index bloat ("context rot").** Past ~8 KB, agent success rate measurably regresses (Vercel's data, Morph's research). Surge HQ: 693 lines of hallucinations from uncompressed context.

## Three concrete recommendations beyond the public spec docs

1. **Compile, don't hand-write, AGENTS.md.** Vercel's pattern: canonical facts in `facts/*.yaml` with JSON-Schema validation, compiled into AGENTS.md on each PR. Hallucinated facts fail `npm run check`.
2. **Split into three files with hard size budgets.** AGENTS.md ≤ 150 lines; CLAUDE.md ≤ 60 lines; `docs/agent-runbooks/*.md` for task-specific content, referenced by `file:line`. Enforce with CI. Memory MCP stays separate.
3. **Two-tier lint: deterministic first, LLM-as-judge second, temperature=0.** markdownlint + vale + lychee every commit; a scheduled job runs a *different* Claude instance over docs with a fixed rubric (contradictions, stale claims, orphans, factual drift against `git log` since last lint). Evidently/Arize guidance: *"strict separation between generation and evaluation."*

---

## https://research.rig.dashecorp.com/research/2026-04-20-otel-llm-observability-options/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-20-otel-llm-observability-options.md

# OTel-native LLM observability — free and low-cost options for the rig

> Comparison of 11 OTel-compatible LLM observability stacks on deploy footprint, LLM-specific UI, free-tier terms, and maturity. Verdict: Phoenix first (on the 8 GB VM), Langfuse later (at scale). Corrects a premise in the whats-next whitepaper that treated them as interchangeable.

> **Superseded 2026-04-21.** The structural evaluation of candidates below (footprints, LLM UX, lock-in) still stands. The **pricing and recommendation** sections are superseded by [research/2026-04-21-otel-startup-programs-storage-economics](/research/2026-04-21-otel-startup-programs-storage-economics/), which factors in startup credit programs (Grafana Cloud for Startups = $100k/12mo; Langfuse 50% off first year) and storage-at-scale economics this doc missed. The revised verdict moves the production backend from Phoenix-now/Langfuse-later to *Grafana Cloud (credit) + Langfuse (discount)* with self-hosted OpenObserve as the explicit fallback. Phoenix remains for the dev inner loop.

> **TL;DR.** The [whats-next whitepaper](/whitepapers/2026-04-20-whats-next/) treats **Langfuse self-hosted** and **Phoenix** as alternatives chosen on VM size. Real 2026 footprint numbers flip that: Langfuse self-hosted is 1.5–2 GB (six services), Phoenix is 300–500 MB (one container).
For the current 8 GB VM, **Phoenix is the default**; Langfuse is a later migration when multi-tenant/RBAC becomes load-bearing. The other nine tools evaluated are either too heavy (SigNoz, HyperDX), wrong shape (Helicone is proxy-first), at momentum risk (Uptrace), or lack LLM-specific UI (OpenObserve, Jaeger/Tempo, Axiom, Honeycomb).

## Why this research exists

The rig emits OpenTelemetry GenAI spans when `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set. We need a trace store that:

1. Ingests OTLP natively — no vendor SDK lock-in, portable across Claude Code, Codex CLI, Gemini CLI
2. Has **LLM-specific UI** — token counts, cost per model, prompt/response diff, session trees — not just generic trace waterfalls
3. Fits alongside rig-conductor, Postgres, Valkey, and the agents on a **single 8 GB k3s VM** today
4. Has an honest migration path when we outgrow the 8 GB box

Cost-sensitivity is explicit. The question isn't *which premium tool*; it's *which free or near-free OSS tool doesn't paint us into a corner.*

## The candidates (in one paragraph each)

**Langfuse** — MIT core; commercial `ee/` modules hold SSO, RBAC, multi-tenant. Self-hosted stack: web + worker + Postgres + ClickHouse + Redis + MinIO — realistic **1.5–2 GB RAM**, not the 1 GB the whitepaper currently claims. OTel/OTLP GA since v3. Best-in-class LLM UI: sessions, token/cost per model, evaluations, prompts, datasets. Cloud Hobby free: **50k observations/month, 30-day retention.** ~16k stars, YC W23, adopted by Khan Academy, Twilio, SumUp. Depth is the bet; footprint is the cost.

**Arize Phoenix** — Elastic-2.0 (source-available, self-host permitted). **One Python container, ~300–500 MB RAM**, SQLite or Postgres backend. Native OTLP + OpenInference (their extension of OTel GenAI). Strong LLM UI: traces, evals, prompt playground, datasets. **No built-in auth or multi-tenant in OSS** — Arize sells that. ~6k stars, very active. "Drop in and go" on an 8 GB VM. Auth gap is resolvable by putting it behind Tailscale or an auth-proxy sidecar.

**OpenObserve** — AGPL-3. Single Rust binary, ~200 MB RAM idle, local disk or S3. OTel-native (traces + logs + metrics). **No LLM-specific UI** — generic trace waterfalls and SQL over span attributes. Cloud free: 200 GB ingest/month. Fast release cadence. Excellent *infrastructure* backbone; weak for prompt/token inspection.

**SigNoz** — MIT. ClickHouse + Zookeeper + query-service + frontend + OTel collector — **~2–3 GB RAM**, tight on an 8 GB VM sharing with everything else. OTel-native end-to-end. Added LLM/GenAI dashboards in 2025. Cloud: free trial only, paid from ~$199/mo. ~25k stars. Right answer when the rig has its own observability node.

**Uptrace** — BSL-1.1 (converts to Apache-2 after 3 years). Go binary + ClickHouse + Postgres, ~1 GB. OTel-native, decent trace UI, modest AI dashboards. ~2.5k stars, **commit cadence slowed in 2025 — momentum risk.** Pick only with a migration contingency.

**HyperDX** — MIT. ClickHouse + Mongo + app, ~1.5–2 GB. OTel-native, strong trace/log UI, session replay. **Acquired by ClickHouse Inc. mid-2024**, v2 effectively folded into ClickHouse Observability. No LLM-specific UI. Roadmap now ClickHouse-driven — vendor trajectory unpredictable for the standalone OSS version.

**Grafana Cloud Free + Tempo** — 50 GB traces/mo, 14-day retention, 3 users. Tempo is OTel-native but has no LLM UI — you build views in Grafana with TraceQL/attributes. Self-hosted Tempo is ~500 MB but needs object storage. Good if you already run Grafana; weak for prompt inspection.

**Axiom** — SaaS only. Free Personal tier: **0.5 TB ingest/month, 30-day retention, 3 users.** Native OTLP. No LLM UI — fast log/event explorer (APL). Generous free tier, zero LLM affordance out of the box. Pair with a thin instrumentation layer if you go this route.

**Honeycomb** — SaaS. Free: **20M events/month, 60-day retention, 5 users.** OTLP-native, excellent BubbleUp/trace UI. No LLM-specific UI; GenAI semconv attributes render as normal span fields.

**Helicone** — Apache-2. Proxy-first (Postgres + ClickHouse + proxy + web), OTel support experimental. Cloud free: 10k requests/month. Good LLM UI (cost, caching, sessions). **Not an OTel-ingest backend; it's a proxy.** Pick only if agent traffic is routed through Helicone, which re-frames the stack.

**OpenLLMetry / Traceloop SDK** — Apache-2 instrumentation library, **not a backend**. Emits OTel GenAI spans → ship anywhere. Traceloop SaaS free: 50k spans/month. Useful as the emission side for Codex/Gemini CLI if native GenAI semconv isn't there yet.

**Jaeger / Tempo baseline** — CNCF, free, OTel-native, no LLM UI. Baseline only; include as comparison anchor.

## Picking by need

| Need | Pick | Why |
|---|---|---|
| Cheapest free-tier SaaS, OTel-native, LLM UI | **Langfuse Cloud Hobby** | Only free SaaS with proper token/cost/prompt UI over OTLP. 50k observations/month covers a small rig. |
| Best self-hosted on one 8 GB VM (today) | **Arize Phoenix** | 300–500 MB, single container, OTLP-native, strong LLM UX. Keep OTel Collector routing infra traces separately. |
| Best if we scale past 8 GB (later) | **Langfuse self-hosted** (ClickHouse externalised) or **SigNoz** | Langfuse wins on LLM depth; SigNoz wins on unified APM. |

## Recommendation

Deploy **Phoenix** to the 8 GB VM now as the agent trace store. Single container, expose an OTLP endpoint, set `CLAUDE_CODE_ENABLE_TELEMETRY=1` on agent pods, done. Put Phoenix behind Tailscale or a tiny auth proxy (oauth2-proxy) since OSS has no auth — acceptable for internal-only use.
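In concrete terms the deploy is small. A hedged sketch: the image name and ports follow Phoenix's published Docker usage (UI on 6006, OTLP gRPC on 4317), but verify against current Phoenix docs before committing a HelmRelease, and the in-cluster service name below is hypothetical. `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OTel SDK variable.

```shell
# Phoenix as a single container (run on the VM, or translate to a k3s Deployment):
#   docker run -d --name phoenix -p 6006:6006 -p 4317:4317 arizephoenix/phoenix

# Agent-pod environment: emit Claude Code telemetry to Phoenix's OTLP port.
# The service DNS name is a placeholder for whatever the HelmRelease creates.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_EXPORTER_OTLP_ENDPOINT="http://phoenix.observability.svc.cluster.local:4317"
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```

Because the emitters speak plain OTLP, the later Langfuse migration described below only changes this endpoint URL.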
Keep the existing OTel Collector as the single OTLP ingress, fan-out: - **Agent LLM spans** (`gen_ai.*` attributes) → Phoenix - **Infra traces + metrics** → Grafana Cloud Free (50 GB/mo free, 14-day retention) via Tempo/Mimir — or stay on the local Prometheus path for Flagger gates - **Optional later**: OpenObserve as a unified backend if Grafana Cloud's free tier exhausts When Tablez agents come online and multi-tenant / RBAC / prompt-management becomes load-bearing — which Phoenix OSS can't give us — migrate to **Langfuse self-hosted** with ClickHouse moved to managed Postgres/Cloud SQL. The OTLP emitters on agent pods don't change; only the ingest endpoint URL does. That migration risk is the single highest-leverage reason to pick Phoenix now rather than Langfuse-from-day-one: we prove the traces flow, get the LLM UX, and pay zero switching cost when the scale demands ClickHouse anyway. ## Caveats - **Phoenix OSS has no auth**. Do not expose publicly. Tailscale ACL or oauth2-proxy sidecar. - **Phoenix on Elastic-2.0** — source-available, not OSI-OSS. Self-host is explicitly permitted for any purpose **except** offering a competing hosted Phoenix-as-a-service. Fine for internal use; would matter only if we ever try to SaaS-resell it. - **Langfuse commercial features** live under `ee/` in the repo — SSO, SCIM, RBAC, audit logs, multi-tenant quotas. MIT core is enough for single-tenant ops; at Tablez onboarding we'll need `ee/`. - **HyperDX post-acquisition trajectory** — the OSS repo still ships, but roadmap is ClickHouse Inc.'s. Treat as a ClickHouse front-end, not an independent project. - **Uptrace momentum** — fewer commits in H2 2025 than H1. If you pick it, plan a migration contingency. - **Helicone is a proxy, not an OTel backend.** If we route agent HTTP through it we get its UI but break OTel-native emission. Picking it means *replacing* the OTLP path, not augmenting it. Don't pick for that reason. 
- **Free-tier reality check** — all three SaaS free tiers (Langfuse 50k obs, Axiom 0.5 TB, Honeycomb 20M events) handle a small rig today. At ~1k tasks/day with ~50 LLM calls/task = 1.5M spans/mo — still comfortable on all three. A busy Tablez onboarding could push past the Langfuse Hobby limit within a quarter; the self-host is the hedge. ## Supersession This research supersedes the original Observability recommendation in [`whitepapers/2026-04-20-whats-next`](/whitepapers/2026-04-20-whats-next/) Priority 2, which listed Langfuse and Phoenix as interchangeable alternatives. The whitepaper should be amended to name Phoenix as the immediate default and Langfuse as the documented scale-up migration. --- ## https://research.rig.dashecorp.com/research/2026-04-21-agent-runtime-install-audit/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-21-agent-runtime-install-audit.md # Agent runtime installs — what's baked in, what's needed, and why Phase 1 egress CIDR-only breaks agents > Audit of package-install behavior across rig agent pods: base and per-language Dockerfiles bake ~20 tools; the task prompt instructs agents to install more at runtime with sudo + apt-get (both blocked by pretool-guard — contradiction). Any Phase 1 NetworkPolicy with only CIDR allowlists would break `npm install` / `pip install` for customer project deps (npm is Cloudflare-fronted with ~1500 rotating prefixes; nuget sits behind a frequently-changing Azure CDN). Revised recommendation: skip Phase 1 CIDR-only, go direct to FQDNNetworkPolicy; fix the stream-consumer.js runtime-install prompt to match pretool-guard reality. > **TL;DR.** Audit prep work for AC 5 Phase 1 of the [safety-foundation user story](/user-stories/2026-04-20-safety-foundation/). Finds two issues: **(1)** the current agent task prompt in `stream-consumer.js:226` tells agents to install tools at runtime via `sudo` + `apt-get install`, both already blocked by the pretool-guard shipped in AC 1. Dead advice, worth fixing. 
**(2)** Phase 1 of the [egress policy options research](/research/2026-04-21-egress-policy-options/) recommended a CIDR-only NetworkPolicy with npm blocked at the network layer. That breaks **every agent task that runs on a repo with a `package.json`** because `npm install` needs registry access. **Revised plan: skip Phase 1 CIDR-only and go direct to GKE FQDNNetworkPolicy** with an expanded allowlist (npm + pypi + nuget registries in addition to GitHub + Anthropic). Still Preview, still acceptable at 2 agents. ## What's baked into agent images today | Layer | Packages installed at build time | |---|---| | `Dockerfile.base` (common) | `ca-certificates`, `curl`, `jq`, `git`, `openssh-client`, GitHub CLI `gh`, Node 22 (via parent image `node:22-slim`), `@anthropic-ai/claude-code`, `@openai/codex`, `@dashecorp/rig-memory-mcp` | | `Dockerfile.node` (dev-e, review-e stacks) | adds `typescript`, `tsx`, `jest`, `vitest`, `eslint`, `prettier` (global) | | `Dockerfile.python` (if used) | adds `python3`, `python3-pip`, `python3-venv`, `pytest`, `black`, `ruff` | | `Dockerfile.dotnet` (rig-conductor agent) | adds .NET 10 SDK | Total: ~20 common dev tools baked in. Developers working on the rig itself rarely need more. ## What the task prompt tells agents (today) From `src/stream-consumer.js:226`: ``` ## Runtime installs If you need a tool that is not installed, install it yourself (npm install -g, pip install, apt-get install). You have sudo access for apt-get. Prefer global installs so they persist for the session. 
``` This is **doubly broken** today, before any egress policy even lands: | Command in prompt | pretool-guard behavior | Result | |---|---|---| | `sudo apt-get install …` | blocked (sudo + apt install regex) | agent stuck | | `apt-get install …` | blocked (apt install regex) | agent stuck | | `brew install …` | blocked (brew install regex) | agent stuck | | `npm install -g …` | allowed | works today, would break under egress policy | | `pip install …` | allowed | works today, would break under egress policy | So: of the three installer families the prompt recommends (npm, pip, apt-get), the apt-get path — with or without sudo — is already stopped by AC 1's guard. The guard's `GuardBlocked` dashboard panel (shipped yesterday) will surface these attempts. ## What agent tasks actually need at runtime Agents work on **customer repos**, not just rig-agent-runtime itself. A typical task flow: 1. `task-workspace create issue-<number>` → worktree with the customer repo 2. `cd` into the worktree and read the project 3. Implement changes — which frequently requires installing the project's declared deps: - Node repo: `npm install` (or `npm ci`) resolving `package.json` - Python repo: `pip install -r requirements.txt` - .NET repo: `dotnet restore` 4. Run tests: `npm test` / `pytest` / `dotnet test` 5. Open PR Step 3 is the one that egress policy breaks. The project-level `npm install` is **not optional** — it's how dependencies come down to run the project's tests. Baking customer deps into the image is not feasible: every customer repo is different. ## Why Phase 1 (CIDR-only allowlist) doesn't work [The egress policy options research](/research/2026-04-21-egress-policy-options/) recommended: > Phase 1 — default-deny `NetworkPolicy` with CIDR allowlist (Anthropic `160.79.104.0/21`, GitHub `/meta`, kube-dns, `rig-memory-mcp`). Block npm at the network layer; bake deps into agent images. The bake-deps advice was reasonable for **our own agent runtime** (dep list is small, stable). But it collapses as soon as agents operate on customer repos. 
And the Cloudflare problem remains: - `registry.npmjs.org` → Cloudflare (≈1500 prefixes that rotate) - `nuget.org` → Azure CDN (frequently changing) - `pypi.org` → Fastly (less-frequent rotation but still not stable-enough for an `ipBlock`) Putting these in an `ipBlock` NetworkPolicy produces a monthly stream of `npm install` failures as prefixes rotate. ## Revised phase plan **Skip Phase 1 as originally drafted.** Go direct to **Phase 2** — GKE-native `FQDNNetworkPolicy` (Preview in 2026, acceptable for 2 agents): 1. Default-deny egress `NetworkPolicy` on the agent namespace. 2. Allow kube-dns (TCP/UDP 53). 3. Allow `rig-memory-mcp` service via `podSelector`. 4. FQDNNetworkPolicy allowing: - `api.github.com`, `github.com`, `codeload.github.com`, `objects.githubusercontent.com` - `api.anthropic.com` (or route this via the LiteLLM proxy when Priority 3 ships) - `registry.npmjs.org`, `registry.yarnpkg.com` (Yarn registry, alias to npm) - `pypi.org`, `files.pythonhosted.org` - `api.nuget.org`, `*.nuget.org` 5. Everything else denied. FQDN Preview limits per GKE docs: 50 IPs per FQDN resolution, 100 IP/hostname quota, one-label-deep wildcards. `*.nuget.org` matches `api.nuget.org` but not `foo.bar.nuget.org` — acceptable for the registries above. ## Also fix: the prompt contradicts the guard `stream-consumer.js:226` should be rewritten to match what the pretool-guard actually allows. Proposed: ``` ## Runtime installs Most common tools (git, gh, jq, curl, claude, codex, typescript, jest, vitest, eslint, prettier, pytest, black, ruff, dotnet) are pre-installed. For project dependencies, use `npm install`, `pip install`, or `dotnet restore` inside your worktree. Do NOT use `sudo`, `apt-get`, or `brew` — those are blocked by the PreToolUse guard. If you need a tool that is genuinely missing from the image, open a separate PR against rig-agent-runtime/Dockerfile.* to add it; do not try to install it at runtime. 
``` Two benefits: aligns with guard reality today, and primes agents correctly for a future FQDNNetworkPolicy where arbitrary-host network calls are denied. ## Small implementation plan Two PRs, in order: 1. **rig-agent-runtime**: rewrite the `## Runtime installs` block in `src/stream-consumer.js`. One-line test: prompt no longer mentions `sudo` / `apt-get` / `brew install`. ~1 hour. 2. **rig-gitops**: add the FQDNNetworkPolicy (and the default-deny base `NetworkPolicy`) under `apps/dev-e/` and `apps/review-e/`. Needs `kubectl apply --dry-run=server` against the cluster before merge. ~½ day plus cluster validation. Neither PR needs Cilium CRDs; both rely only on GKE Dataplane V2 features available to Invotek today. ## Supersession This research **supersedes the Phase 1 slice** of [research/2026-04-21-egress-policy-options](/research/2026-04-21-egress-policy-options/). The seven-option comparison and the Phase 3 egress-gateway recommendation in that doc still stand; the "Phase 1 CIDR-only first" advice is superseded. The combined plan is now: skip to what that doc called Phase 2, and keep Phase 3 as the later scale-up milestone. ## Sources - `rig-agent-runtime/Dockerfile.base` — base image layers - `rig-agent-runtime/Dockerfile.{node,python,dotnet}` — per-language layers - `rig-agent-runtime/src/stream-consumer.js:226` — runtime-install prompt text - `rig-agent-runtime/hooks/pretool-guard.sh` — blocklist (shipped in rig-agent-runtime#97) - [research/2026-04-21-egress-policy-options](/research/2026-04-21-egress-policy-options/) — superseded Phase-1 recommendation --- ## https://research.rig.dashecorp.com/research/2026-04-21-brain-pattern-prior-art/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-21-brain-pattern-prior-art.md # Is the brain pattern a good idea? — prior art in 2025-2026 > Audit of what the rig's BRAIN.md actually is vs. 
what AGENTS.md, CLAUDE.md, llms.txt, Anthropic Skills, Cursor rules, and Karpathy's LLM Wiki do. Verdict: defensible and in the mainstream markdown-over-RAG camp, but our 30 KB always-load size is 3-5× the community ceiling and we're not emitting AGENTS.md so non-Claude tools can't find it. > **TL;DR.** The rig's "brain" is a single ~30 KB markdown file, compiled from `facts/*.yaml`, fetched as step 1 of every agent session. That shape is mainstream 2026 practice — the current industry direction is *markdown over RAG/MCP* for institutional knowledge (Anthropic Skills, AGENTS.md convention, Karpathy's LLM Wiki). Our compile-from-YAML + CI drift check is *better* than the community norm. But four concrete things are off-trend: **we're 3-5× the recommended always-load size**, **we don't emit `AGENTS.md`** so every non-Claude tool can't find us, we use a **per-portfolio split** where the community uses nested scoping, and we **front-load brain before reading the issue** where the current trend is progressive disclosure. ## Why this research exists Before investing more in the brain pattern (per-repo notify workflows, a dashecorp-docs aggregator, a brain map upgrade), sanity-check it against what other teams are actually doing in 2025-2026. Is "brain" a known pattern, a bespoke invention, or a local name for something the industry has already standardized? ## What our brain actually is (for contrast) | Axis | Our rig | |---|---| | Storage | One `BRAIN.md` file per portfolio, served at `research.rig.dashecorp.com/BRAIN.md` (raw) and `docs.rig.dashecorp.com/brain/` (rendered) | | Size | ~30 KB, CI-enforced budget of 36 KB | | Source of truth | `facts/*.yaml` in `dashecorp/rig-docs` (repos, agents, surfaces, flows, events, whitepaper catalog, backlog); hand-edit of BRAIN.md forbidden | | Compile step | `npm run brain` → emits BRAIN.md + public/BRAIN.md + src/content/docs/brain.md; CI `brain:check` rejects drift | | When fetched | Step 1 of every agent session. 
Before reading the issue. Before any other tool call. | | Two-hop | Rig BRAIN first (invariant), portfolio BRAIN (dashecorp-docs / tablez-docs) second if assignment matches | | Not | RAG. Vector DB. MCP memory. Dynamic retrieval. | ## Prior art — 10 data points | # | Who | Shape | Size | Key diff from ours | |---|---|---|---|---| | 1 | **[AGENTS.md](https://agents.md/)** — cross-vendor convention, Linux Foundation-stewarded | One MD at repo root; nested files override in subdirs | Target ≤150 lines. Codex caps aggregate at 32 KiB via `project_doc_max_bytes` | Per-repo, not per-portfolio. Hand-written, no compile. Respected by 60k+ OSS projects. | | 2 | **Claude Code `CLAUDE.md`** | Auto-loaded at session start | [HumanLayer recommends <300 lines](https://www.humanlayer.dev/blog/writing-a-good-claude-md); their own is <60 | Single file, no YAML facts, no drift check. | | 3 | **Cursor `.cursor/rules/*.mdc`** | Multiple scoped rules, path-glob activated | Each file small; progressive | Scope-by-path, not always-loaded. | | 4 | **Aider `CONVENTIONS.md`** | Style/convention file | Arbitrary | Not auto-loaded; user `--read`s it explicitly. | | 5 | **`llms.txt` + `llms-full.txt`** (Anthropic, Vercel, Mintlify, OpenAI) | Slim index + full-corpus dump exposed by docs sites | llms-full.txt often MB-scale | Targets external LLM crawlers, not in-session priming. Two-tier pattern maps well. | | 6 | **[Karpathy LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)** (Apr 2026) | `index.md` + `log.md` + entity pages, LLM-maintained, compiled from raw sources | ~100-200 pages before needing retrieval | Same compile-vs-RAG philosophy. LLM-maintained, not YAML-compiled. Wiki-scale, not single file. 
| | 7 | **[Anthropic Agent Skills](https://bdtechtalks.com/2025/10/20/anthropic-agent-skills/)** (Oct 2025) | `SKILL.md` index + scripts + resources, loaded on demand | Index tiny; bodies loaded only when triggered | Progressive disclosure — explicit opposite of our "read full brain every session." | | 8 | **OpenAI Codex openai/openai monorepo** | 88 nested AGENTS.md files | Aggregate capped at 32 KiB | Scales via nesting, not portfolio split. | | 9 | **Datadog frontend monorepo** ([blog](https://dev.to/datadog-frontend-dev/steering-ai-agents-in-monorepos-with-agentsmd-13g0)) | AGENTS.md hierarchy per package | Per-package | Same nested-scoping pattern. | | 10 | **Builder.io, Factory, Ona, Augment, Gemini CLI** | All publish AGENTS.md guides and tooling | 100-300 lines typical | Commercial endorsement of the AGENTS.md convention. | ## Tradeoffs the field actually reports **Token budget.** [HumanLayer](https://www.humanlayer.dev/blog/writing-a-good-claude-md): LLMs reliably follow ~150-200 instructions; Claude Code's system prompt already burns ~50. Every extra paragraph in an always-loaded file uniformly degrades instruction-following. Our ~30 KB is 3-5× the community ceiling and sits right at Codex's 32 KiB aggregate cap. **Staleness.** Hand-curated files rot. AGENTS.md community reports a recurring "feedback loop" where multiple contributors append conflicting opinions, files grow unmaintainably, agent performance *drops*. **Our YAML-compile + CI drift check is a meaningful mitigation the community mostly lacks** — we should call this out as a differentiator when explaining the pattern externally. **Hand-curated vs RAG.** The industry swing in late 2025 / early 2026 (Anthropic Skills release, [The New Stack "Skills vs MCP"](https://thenewstack.io/skills-vs-mcp-agent-architecture/)) is *away* from RAG/MCP for knowledge and *toward* markdown + progressive disclosure. GitHub MCP server used ~50k tokens per session; equivalent SKILL.md used ~200. 
**We're on the winning side of this argument.** **Large codebases.** A single 30 KB file doesn't scale. OpenAI's own repo has 88 nested AGENTS.md. Karpathy notes the wiki pattern breaks down past ~100-200 pages without a secondary retrieval layer. We'll hit this wall if portfolios grow. **Multi-portfolio.** Almost nobody does portfolio-level brains. The community answer is *nested* AGENTS.md files scoped by path. Our two-hop (rig brain + portfolio brain) is unusual but coherent. **Memory vs knowledge.** [Zep](https://blog.getzep.com/stop-using-rag-for-agent-memory/): RAG loses ground for *institutional knowledge* (what the rig-memory-mcp stores) but wins for *conversational memory across sessions*. Our rig-memory-mcp covers the latter; BRAIN covers the former. The separation is correct. ## The counter-argument — who hates this and why Three camps will push back: 1. **Progressive-disclosure camp** (Anthropic Skills, Cursor rules). *"Don't front-load 30 KB every session. Keep the index small (≤60 lines); let the agent pull deeper files on demand. You're burning context on every run to cover cases that apply to 5% of sessions."* 2. **Nested-convention camp** (mainstream AGENTS.md). *"Put AGENTS.md at every relevant directory. The agent reads the nearest one. No global brain needed; scope is implicit in where the agent is working."* 3. **RAG / agent-memory camp** (Zep, sqlite-memory, MCP memory primitives). *"Knowledge grows; static files don't. Use an MCP memory server with semantic retrieval so agents pull exactly what they need."* Losing ground for institutional knowledge in 2026, still winning for conversational memory. Nobody credible is saying "no context file at all." The live debate is static-and-full vs dynamic-and-sliced. 
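The compile-plus-drift-check mitigation credited to the rig above can be sketched as a CI job — a hypothetical workflow, assuming the `brain:check` npm script named earlier (the rig's actual CI wiring may differ):

```yaml
# Hypothetical GitHub Actions job: recompile BRAIN.md from facts/*.yaml and
# fail if the committed file has drifted from what the compile would emit.
name: brain-drift-check
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      # brain:check regenerates to a temp location, diffs against BRAIN.md,
      # and exits non-zero on any drift
      - run: npm run brain:check
```

This is the property the hand-written AGENTS.md / CLAUDE.md convention lacks: the always-loaded file cannot silently diverge from its sources.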
## Comparison table | Choice | Our rig | Mainstream (AGENTS.md) | Progressive (Skills/Cursor) | Karpathy LLM-Wiki | |---|---|---|---|---| | One static file | **Yes, ~30 KB** | Yes per scope, ≤150 lines | No — index + on-demand | No — many pages | | Authoring | **YAML → compiled, CI drift check** | Hand-written | Hand-written | LLM-maintained | | Size budget in CI | **Yes** | No (Codex caps at 32 KiB silently) | No | No | | Per-portfolio split | **Yes (separate fetch)** | No — nested by path | No — by skill trigger | No | | Fetched every session | **Yes, step 1** | Yes (auto-load) | Only index; body on demand | Queried as needed | | Retrieval method | Static URL fetch | File read | Tool-triggered load | LLM reads wiki pages | ## Verdict **The pattern is defensible and sits in the mainstream 2026 camp** (markdown over MCP/RAG for institutional knowledge). The compile-from-YAML + CI drift check is *better* than what most teams do — keep that. Four concrete concerns, ordered by severity: 1. **30 KB is too big for "step 1 every session."** HumanLayer, AGENTS.md guidance, and Codex's own 32 KiB cap all converge around ≤8-10 KB for always-loaded context. We're burning context budget on portfolio-wide detail most sessions don't need.
**Action:** split BRAIN.md into a ~5 KB "always-load index" + on-demand deeper files keyed off the cold-start recipe that's already in the brain. This is the Anthropic Skills pattern; it maps cleanly onto our YAML compile. 2. **We're not on the AGENTS.md name.** Every non-Claude tool (Cursor, Codex, Copilot, Factory, Ona, Gemini CLI, Aider) looks for `AGENTS.md`, not `BRAIN.md`.
**Action:** symlink or emit `AGENTS.md` from the same compile so non-Claude agents can find us, even if our own agents prefer the "brain" label internally. 3. **Per-portfolio split is unusual.** The industry answer is nested files scoped by path. Portfolio-level isn't wrong, but verify that our agents actually *benefit* from the two-hop; if the second fetch is skipped or wrong-portfolio'd often, we've paid for nothing.
**Action:** log BRAIN fetches per agent for a week; measure whether the portfolio fetch correlates with the repo touched. 4. **"Fetch before reading the issue" is aggressive.** For trivial issues (typo fix, lockfile bump) 30 KB of brain is pure waste.
**Action:** add a gate — load brain only when the first tool call suggests org-level context is needed. Skills does this naturally via triggers; ours could via a simple first-turn heuristic. ### What to verify before investing more - Measure actual token cost of the session-start fetch across the agent fleet for a week. If it's >3% of total token spend, we're overpaying. - Audit whether agents actually *use* the deep sections of BRAIN.md. If 80% of sessions only reference the top 5 KB, strip the rest. - Sanity-check that `AGENTS.md` exists at each repo root even if it just points at the brain URL — otherwise we're locked out of every non-Claude coding agent, and the `dashe-*` / multi-vendor story breaks. ## Bottom line We're doing a slightly-heavier-than-average version of the thing the field broadly agrees is right. Trim the always-load size, add progressive disclosure, emit `AGENTS.md`, measure usage — and we're ahead of the pack rather than just keeping up. ## Sources - [AGENTS.md (official)](https://agents.md/) - [agentsmd/agents.md GitHub](https://github.com/agentsmd/agents.md) - [Augment — How to Build Your AGENTS.md (2026)](https://www.augmentcode.com/guides/how-to-build-agents-md) - [OpenAI Codex — Custom instructions with AGENTS.md](https://developers.openai.com/codex/guides/agents-md) - [AGENTS.md in monorepos — precedence issue #53](https://github.com/agentsmd/agents.md/issues/53) - [Datadog — Steering AI Agents in Monorepos with AGENTS.md](https://dev.to/datadog-frontend-dev/steering-ai-agents-in-monorepos-with-agentsmd-13g0) - [HumanLayer — Writing a good CLAUDE.md](https://www.humanlayer.dev/blog/writing-a-good-claude-md) - [DeployHQ — How to Configure Every AI Coding Assistant](https://www.deployhq.com/blog/ai-coding-config-files-guide) - [The Prompt Shelf — .cursorrules vs CLAUDE.md vs AGENTS.md](https://thepromptshelf.dev/blog/cursorrules-vs-claude-md/) - [Karpathy LLM Wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) - 
[Level Up Coding — Beyond RAG, Karpathy's LLM Wiki pattern](https://levelup.gitconnected.com/beyond-rag-how-andrej-karpathys-llm-wiki-pattern-builds-knowledge-that-actually-compounds-31a08528665e) - [VentureBeat — Karpathy's LLM Knowledge Base architecture](https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an) - [Mintlify — Real llms.txt examples](https://www.mintlify.com/blog/real-llms-txt-examples) - [Mintlify — What is llms.txt?](https://www.mintlify.com/blog/what-is-llms-txt) - [Anthropic llms-full.txt](https://docs.claude.com/llms-full.txt) - [Vercel llms-full.txt](https://vercel.com/docs/llms-full.txt) - [BD Tech Talks — Inside Claude Skills](https://bdtechtalks.com/2025/10/20/anthropic-agent-skills/) - [Marcel Castro — Skills and progressive disclosure](https://marcelcastrobr.github.io/posts/2026-01-29-Skills-Context-Engineering.html) - [The New Stack — Running agents on Markdown instead of MCP](https://thenewstack.io/skills-vs-mcp-agent-architecture/) - [Zep — Stop Using RAG for Agent Memory](https://blog.getzep.com/stop-using-rag-for-agent-memory/) - [sqlite-memory — markdown-first agent memory](https://github.com/sqliteai/sqlite-memory) --- ## https://research.rig.dashecorp.com/research/2026-04-21-egress-policy-options/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-21-egress-policy-options.md # Default-deny egress on GKE Dataplane V2 — seven options, one layered recommendation > AC 5 of the safety-foundation user story assumes Cilium L7. GKE Dataplane V2 is Cilium-derived but doesn't expose full Cilium CRDs. This doc evaluates seven options (FQDNNetworkPolicy, self-Cilium, sidecar, egress gateway, ipBlock, Anthropic-CIDR, LiteLLM+gateway), recommends a layered rollout starting with a default-deny + CIDR allowlist NetworkPolicy this week, then FQDNNetworkPolicy next sprint, then an egress gateway when agent count grows. 
> **⚠️ RETRACTED 2026-04-22.** This document assumed the rig runs on **GKE Dataplane V2**. It does not — the rig runs on **k3s v1.34.6 on a single GCE VM** (`invotek-k3s` in `invotek-github-infra`), flannel CNI, NetworkPolicy enforced by k3s-embedded kube-router. GKE-specific options (FQDNNetworkPolicy, Dataplane V2, self-Cilium-alongside-DPv2) do not apply. The k8s-native parts of the analysis (default-deny NetworkPolicy + CIDR allowlist, egress gateway, sidecar proxy) still stand. **Phase 1 shipped 2026-04-22** as a plain `NetworkPolicy` with default-deny + CIDR allowlist (Anthropic `160.79.104.0/21` + GitHub `/meta` snapshot). Phase 2 hostname allowlisting is deferred pending fresh research — either a CNI swap (Calico/Cilium) or a plain-Envoy cluster egress gateway. Sections below are kept for historical context; the recommendation is superseded. > > **Original TL;DR (retracted).** The AC 5 spec in the [safety-foundation user story](/user-stories/2026-04-20-safety-foundation/) names "Cilium L7 NetworkPolicy." The rig runs on GKE Dataplane V2 which does not expose full Cilium CRDs. Seven options were evaluated; three viable. Layered recommendation: default-deny + CIDR, then FQDNNetworkPolicy, then egress gateway. ## Why this research exists AC 5 of the safety-foundation user story reads *"Cilium L7 policy per agent namespace — everything else denied."* Good policy, wrong primitive for the cluster we actually run on. GKE Dataplane V2 supports standard Kubernetes NetworkPolicy (L3/L4, IP blocks, pod selectors) but not the Cilium CRDs that would give us `toFQDNs`. Shipping YAML blindly without this distinction would either fail to apply or enforce nothing useful. This doc is the capability check that should have been the first step of AC 5. ## The seven options ### 1. FQDNNetworkPolicy (GKE-native) Google's own CRD (`networking.gke.io/v1alpha1`) that layers FQDN rules on top of Dataplane V2. Still **Preview** in April 2026. 
Requires Dataplane V2 + GKE 1.26.4-gke.500 / 1.27.1-gke.400+ + kube-dns or Cloud DNS (no custom DNS). Enable via `gcloud container clusters update --enable-fqdn-network-policy`; on Standard you must restart `anetd`. Limits: **50 IPs per FQDN resolution**, **100 IP/hostname quota**, wildcards match only **one label deep** (`*.company.com` matches `eu.company.com`, not `eu.api.company.com`), incompatible with inter-node transparent encryption, cannot target ClusterIP/Headless services. Low effort, zero extra infra, exactly the right primitive — but Preview means no SLA. ### 2. Cilium CNI alongside Dataplane V2 **Not supported.** Dataplane V2 is Google's managed Cilium fork; you cannot run upstream Cilium on top. To get `CiliumNetworkPolicy` + `toFQDNs` you'd need a new cluster with Dataplane V2 disabled and self-managed Cilium — DNS-intercept proxy, upgrade churn, zero Google support. Not worth it for a 2-agent team. ### 3. Proxy sidecar per pod Mature pattern (Netflix, Lyft, Airbnb variants via Envoy). `NetworkPolicy` allows egress only to `localhost:<port>`; sidecar does SNI/HTTP host allowlisting. Overhead: ~30–80 MB RAM + ~5–10m CPU per pod plus sidecar lifecycle coupling. Strong defense-in-depth (TLS SNI inspection, per-request logs), but config drift multiplies with pod count. Too much overhead at current scale. ### 4. Cluster-level egress gateway A single Envoy Deployment (2 replicas) as HTTP/HTTPS forward proxy. Agent pods get `HTTPS_PROXY=egress-gw:3128` + NetworkPolicy egress allows only `egress-gw` + kube-dns. Gateway enforces FQDN allowlist via SNI. Less per-pod overhead than sidecars, one config file, single audit log. Downside: single point of failure (mitigated with 2 replicas + PDB), and apps must honor `HTTPS_PROXY` — Node/Python/curl do, some Go clients don't by default. Istio egress gateway works but drags the whole mesh; a plain Envoy Deployment is ~80 lines of YAML. ### 5. 
Pure ipBlock CIDR allowlist Standard Kubernetes NetworkPolicy with `ipBlock` entries. GitHub publishes `api.github.com/meta` (verified — returns stable CIDRs). Anthropic publishes `160.79.104.0/21` outbound at [platform.claude.com/docs/en/api/ip-addresses](https://platform.claude.com/docs/en/api/ip-addresses), stated stable. **npmjs.org is Cloudflare-fronted** — CIDRs are not stable (~1500 prefixes, rotate). Works for GitHub + Anthropic, fails for npm. ### 6. Anthropic-specific CIDR `160.79.104.0/21` outbound, `160.79.104.0/23` inbound. Published at the Anthropic IP addresses page, explicitly promised "will not change without notice." No structured `meta` endpoint, but docs are canonical. ### 7. LiteLLM + separate egress gateway Once Priority 3's LiteLLM proxy ships: LiteLLM becomes the only pod allowed to reach Anthropic; agents reach LiteLLM via ClusterIP. A thin egress gateway only allowlists GitHub + npm. Clean separation, two components to operate — defer until LiteLLM is actually landing. ## Comparison | Option | Impl effort | Ops cost | Defense strength | Lock-in | Fit for DPv2 | |---|---|---|---|---|---| | 1 FQDNNetworkPolicy | Low | Low | Medium (DNS-race, Preview) | GKE | **Native** | | 2 Self-Cilium | Very high | High | High | None | Poor (rebuild cluster) | | 3 Sidecar proxy | Medium | Medium-high | High | None | Good | | 4 Egress gateway | Medium | Low | High | None | Good | | 5 ipBlock only | Low | Medium (drift) | Low–Medium | None | **Native** | | 6 Anthropic CIDR | Trivial | Trivial | Medium | None | **Native** | | 7 LiteLLM + gateway | High | Medium | High | None | Good | ## Recommendation for Invotek **Layered, starting minimal.** ### Phase 1 — this week (~1 hour) Default-deny egress `NetworkPolicy` on the agent namespace. 
Allow: - `kube-dns` (UDP/TCP 53) - Internal `rig-memory-mcp` service (via `podSelector`) - Anthropic `160.79.104.0/21` (via `ipBlock`) - GitHub `/meta` CIDRs (snapshot; refreshed manually quarterly, or via a CronJob that updates a ConfigMap) **Block npm at the network layer entirely at this stage.** Agents shouldn't `npm install` at runtime; bake dependencies into agent images instead. This aligns with the pretool-guard blocklist convention for package installers. ### Phase 2 — next sprint (~½ day) Add **FQDNNetworkPolicy** (Option 1) for `api.github.com`, `api.anthropic.com`, and — if truly needed at runtime — `registry.npmjs.org`. Preview status is acceptable for 2 agents; worst case the Phase-1 CIDR policy keeps the floor. Monitor `anetd` logs. ### Phase 3 — when agent count ≥ 4 or a non-CIDR-stable vendor appears (~2 days) Deploy a **cluster egress gateway** (plain Envoy, 2 replicas, SNI allowlist). Single reasoning point for outbound. Keep FQDNNetworkPolicy as belt-and-braces. Merge with Priority 3's LiteLLM proxy when it ships — LiteLLM handles Anthropic, Envoy handles the rest. ## What NOT to do - **Don't run self-managed Cilium alongside DPv2.** Unsupported, cluster rebuild required, zero benefit at our scale. - **Don't ipBlock Cloudflare ranges for npm.** 1500+ prefixes that rotate; you'll be debugging `npm install` failures monthly. Bake deps into images or use FQDN instead. - **Don't use Istio just for an egress gateway.** Mesh tax is enormous for a 2-pod use case. Plain Envoy Deployment. - **Don't sidecar every pod** at current scale. Revisit only if per-agent policy differentiation becomes a real requirement. - **Don't skip the default-deny NetworkPolicy** while waiting for the "right" long-term solution. Ship the CIDR version this week; FQDN upgrade is additive. ## AC 5 scoping impact The user story AC 5 originally read: > Cilium L7 policy per agent namespace. 
Allowlist: api.github.com, api.anthropic.com (or the LiteLLM proxy once Priority 3 lands), registry.npmjs.org (or pinned registry mirror), the rig-memory-mcp service. Everything else denied. This research refines the scope: - **Phase 1** satisfies the *default-deny + essential allowlist* intent without Cilium CRDs — ship this first and claim partial credit. - **Phase 2** adds hostname-based filtering via the GKE-native FQDNNetworkPolicy — closer to the "L7-ish" layer the AC intended. - **Phase 3** adds the true-L7 enforcement (SNI at an egress gateway) when scale justifies the operational cost. Updating the user story AC wording to reflect this phased interpretation is a small follow-up — the original AC isn't achievable as literally specified on GKE Dataplane V2 in 2026. ## Sources - [GKE FQDN NetworkPolicy docs](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/fqdn-network-policies) - [GKE Dataplane V2 concepts](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2) - [Anthropic API IP addresses](https://platform.claude.com/docs/en/api/ip-addresses) - [GitHub meta endpoint](https://api.github.com/meta) - [DoiT — FQDN policies on GKE DPv2](https://www.doit.com/blog/controlling-pod-egress-traffic-with-fqdn-network-policies-on-gke-dataplane-v2/) --- ## https://research.rig.dashecorp.com/research/2026-04-21-otel-startup-programs-storage-economics/ ## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-21-otel-startup-programs-storage-economics.md # OTel observability — startup programs, storage economics, 2026 price-at-scale > Second-pass research that the 2026-04-20 options note missed. Startup credit programs (Grafana Cloud $100k / 12 mo is the standout; Invotek AS qualifies), long-term storage pricing, and cost at 1.5M→15M spans/month. 
> Revises the Priority 2 recommendation: Grafana Cloud for Startups becomes the default production backend candidate; Langfuse stays for the LLM-specific UX layer.

> **TL;DR.** The [2026-04-20 OTel options note](/research/2026-04-20-otel-llm-observability-options/) compared 11 backends on deploy footprint, free-tier limits, and LLM UX. It did **not** cover startup credit programs or long-term storage economics. Closing both gaps flips the verdict. **Grafana Cloud for Startups gives $100k of credit over 12 months** — Invotek AS qualifies. Langfuse offers **50% off the first year** for early-stage startups, stackable. For our 1.5M–15M spans/month workload the list cost is ~$22–$47/mo on Grafana's Pro tier, so the credit gives effectively unlimited runway. Phoenix-first survives for the dev inner loop; the production backend shifts to **Grafana Cloud (traces + logs + metrics) + Langfuse (LLM layer)**, with self-hosted OpenObserve as the explicit fallback if the credit is denied.

## Why this research exists

The prior note did the structural evaluation — *which tools ingest OTel natively, which have LLM UX, which fit on an 8 GB VM* — but treated pricing as "free tier vs. paid" without checking (a) startup credit programs that flip headline pricing, (b) long-term retention / cold-storage cost, or (c) price at growth scales (1.5M → 15M spans/month). The user's specific question on 2026-04-20 — *"have we done research on OTel storage, price and startup programs?"* — caught that gap. This doc closes it.

## Startup programs — 2026 status

| Vendor | Program | Credit / discount | Eligibility | Duration | Status |
|---|---|---|---|---|---|
| **Grafana Labs** | Grafana Cloud for Startups | **Up to $100k** in Grafana Cloud credits (incl. Enterprise plugins) | <$10M funded, <25 employees (exceptions possible); new or migrating OSS users | 12 mo or until next round | **Active** (launched Sep 2024) |
| **Langfuse** | Early-stage startup discount | **50% off first year** on Core / Pro; OSS credits $300/mo; Research up to 100%; Non-profit $199/mo credits | "Early-stage" (~pre-Series-A); email to apply | 1 year | Active |
| **Datadog** | Datadog for Startups | $30k–$100k of Pro credit | Series A or earlier, **referral from partner VC/accelerator required** | 1 year or until next round | Active — gated |
| **New Relic** | New Relic for Startups | 100 GB/mo free on the full platform (matches public free tier) | Seed–pre-Series B, <10 yrs, <100 employees | Ongoing | Active — no incremental credit |
| **AWS Activate** | Founders / Portfolio tracks | **Up to $100k** AWS credit (usable for CloudWatch, Managed Prometheus, X-Ray) | Pre-Series B, <10 yrs; Portfolio tier needs VC/accelerator Org ID | 2 yrs | Active |
| **Honeycomb** | No public startup program (page returns 404). Free tier is the path. | — | — | — | **None published** |
| **Dash0** | No startup program; 14-day free trial only | — | — | — | None |
| **Elastic** | "Elastic for Startups" page returns 404; must inquire directly | — | — | — | Unclear |
| **Innovation Norway** | Markedsavklaring / Kommersialisering grants | **NOK 50k–500k** cash (vendor-agnostic) | Norwegian company with innovation project | Project-based | Active; TINC accelerator opens Jan–Feb each year |

**Standout**: Grafana Cloud for Startups. $100k over 12 months for an OTel-native unified stack (Tempo traces + Loki logs + Mimir metrics + Grafana + Enterprise plugins). Invotek AS qualifies on the plain criteria.

**Second best**: Langfuse's 50% discount. It stacks cleanly with Grafana — it solves a different layer (LLM-specific prompt diff, eval scoring, datasets).

**Skip**: Datadog (referral-gated, high lock-in); Dash0 (no credits, 30-day retention cap); Honeycomb (no startup program).
## Long-term storage economics

Retention beyond the free tier is where quiet costs accumulate. For LLM traces specifically, 30 days is enough for "is this working today," 90 days is enough for trend analysis, and 1 year is enough for model/prompt regression detection across upgrades.

| Vendor | Hot retention | Extended retention | Price model |
|---|---|---|---|
| **Grafana Cloud Pro (Tempo)** | 30 days | Contact sales | Process $0.05/GB + Write $0.40/GB + **Retain $0.10/GB/mo** |
| **Axiom Cloud** | Configurable | Configurable, single tier | **$0.030/GB/mo** (compressed ~95%) — cheapest hot+warm at scale |
| **Honeycomb Pro** | 60 days | Enterprise only | Bundled into event pricing |
| **Dash0** | 30 days spans/logs; 13 mo metrics | Not published | No cold tier |
| **Langfuse Cloud** | Hobby 30 d / Core 90 d / Pro unlimited | Pro is unlimited | Included in per-unit price |
| **OpenObserve self-host** | Configurable, S3-backed | **Flat S3 cost** (~$0.023/GB/mo AWS Standard; less on GCS/R2) | Parquet columnar + decoupled compute — storage stays flat as volume grows |
| **SigNoz self-host** | ClickHouse, configurable | S3 cold tier supported | Flat; pay GKE compute only |

**Flat-at-scale winners**: OpenObserve self-host and Axiom Cloud. Both stay economical at 10×–100× current volume.

## Price at scale

Workload assumption: **1.5M spans/mo today** (~5 GB at 3.3 KB avg), **15M spans/mo in 12 months** (~50 GB). Real span sizes for OTel GenAI semconv with prompts attached can be larger; treat these as a lower bound.
| Vendor | 1.5M spans/mo | 15M spans/mo | Notes |
|---|---|---|---|
| **Grafana Cloud Pro** | ~$22/mo | ~$47/mo | **$100k startup credit = free for 12 mo** at either scale |
| **Grafana Cloud Free** | $0 | $0 | 14-day retention only |
| **Langfuse Cloud Core** | ~$145/mo | ~$1,079/mo | LLM-specific UX; unit ≠ span (1 trace ≈ 2–5 units) |
| **Langfuse Cloud Pro** | ~$315/mo | ~$1,242/mo | Unlimited retention |
| **Honeycomb Free** | $0 | $0 (under 20M event cap) | No startup program; no LLM UX |
| **Honeycomb Pro** | $130/mo flat | $130/mo flat | Good rates past 20M |
| **Axiom Cloud** | ~$25/mo | ~$25/mo (1 TB free) | Single tier; no LLM UX |
| **Dash0** | ~$0.90/mo | ~$9/mo | Cheapest list price; 30-day retention; no credits |
| **OpenObserve self-host** | ~$31/mo (GKE + GCS) | ~$31/mo | Flat; requires self-op |

## Data egress and lock-in

- **Langfuse** — full SQL/CSV export, no egress fee, documented self-host migration. **Low lock-in.**
- **Grafana Cloud** — Tempo traces exportable via API; OTLP in/out. **Low lock-in.**
- **Axiom** — APL query export, no egress fee. **Low lock-in.**
- **Dash0** — OTLP-native, bulk export via API. **Low lock-in.**
- **Honeycomb** — bulk event export is Enterprise-only. **Medium lock-in.**
- **Datadog** — historical bulk export is paid Enterprise. **High lock-in** (a known tax).
- **OpenObserve / SigNoz self-host** — the data is your Parquet / ClickHouse in your bucket. **Zero lock-in.**

## Revised recommendation

### What changes versus 2026-04-20

1. **Grafana Cloud for Startups shifts the production-backend calculus.** At $22–$47/mo list and $100k of credit over 12 months, it is effectively free well past our 12-month growth horizon and covers the full LGTM stack in one vendor.
2. **Langfuse's 50% discount is stackable** and worth applying for whether or not Grafana is picked — it halves the LLM-specific UX cost.
3. **Phoenix-first for the dev inner loop still holds.** Local latency matters during development; a hosted cloud adds a network hop. But Phoenix is no longer the default *production* backend — Grafana (with Tempo) is.

### Action plan

1. **Apply now** (user action, ~15 min each):
   - Grafana Cloud for Startups — `https://grafana.com/startup-program/`
   - Langfuse early-stage startup discount — email hello@langfuse.com with the 50%-off ask
   - Both approvals take 1–2 weeks
2. **While credits are pending**:
   - Keep **Phoenix local** for dev-loop debugging (unchanged)
   - Begin **dual-export via the existing OTel Collector**: GenAI spans → Langfuse Cloud (Hobby free until approval); full OTel spans → Grafana Cloud Free (50 GB / 14-day)
3. **When the Grafana credit lands**:
   - Promote Grafana Cloud to primary backend for the rig
   - Keep Langfuse for LLM-specific UX (prompt mgmt, eval scoring, datasets)
   - Total cost: **$0 for 12 months**
4. **If the Grafana credit is denied**:
   - Fall back to **OpenObserve self-hosted on GKE** with GCS backing (flat ~$30/mo)
   - Keep Langfuse Cloud Core for the LLM layer (~$145/mo at current volume)
   - Expected all-in: ~$175/mo
5. **Skip regardless**: Datadog (referral-gated, high lock-in), Dash0 (no credits, 30-day retention), Honeycomb (no startup program, weak LLM ergonomics)

## Supersession

This research **supersedes** [research/2026-04-20-otel-llm-observability-options](/research/2026-04-20-otel-llm-observability-options/) on the production-backend question. The structural evaluation in that doc (candidates, deploy footprints, LLM UX) remains valid; the pricing/recommendation section is updated here.

The whats-next whitepaper Priority 2 section should be revised: Phoenix remains as the dev-loop store, but the production backend recommendation becomes *"Grafana Cloud (via startup credit) + Langfuse (discounted)"* with self-hosted OpenObserve as the documented fallback.
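The dual-export step above can be sketched as one Collector receiver fanning out to two trace pipelines. This is a minimal sketch, not the rig's actual config: the endpoint/auth environment variables and the `gen_ai.system` filter condition are assumptions to verify against Grafana Cloud's and Langfuse's OTLP documentation.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  # Keep only GenAI-semconv spans for the Langfuse pipeline. Assumed
  # discriminator: presence of the gen_ai.system span attribute.
  filter/genai-only:
    error_mode: ignore
    traces:
      span:
        - 'attributes["gen_ai.system"] == nil' # drop non-GenAI spans

exporters:
  otlphttp/grafana:
    endpoint: ${env:GRAFANA_OTLP_ENDPOINT} # per-stack URL from Grafana Cloud
    headers:
      Authorization: Basic ${env:GRAFANA_OTLP_AUTH}
  otlphttp/langfuse:
    endpoint: ${env:LANGFUSE_OTLP_ENDPOINT} # check Langfuse's OTel docs
    headers:
      Authorization: Basic ${env:LANGFUSE_AUTH}

service:
  pipelines:
    traces/full: # everything → Grafana Cloud (Tempo)
      receivers: [otlp]
      exporters: [otlphttp/grafana]
    traces/genai: # filtered subset → Langfuse
      receivers: [otlp]
      processors: [filter/genai-only]
      exporters: [otlphttp/langfuse]
```

Both pipelines read from the same `otlp` receiver, so agents keep a single export target and the fan-out lives entirely in the Collector.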
---

## https://research.rig.dashecorp.com/research/2026-04-22-cross-repo-access/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-22-cross-repo-access.md

# Cross-repo Actions access — patterns, tradeoffs, and the OpenTofu-managed answer

> Comparative study of GitHub options for giving one workflow read access to private repos in other repos/orgs: Apps + org secrets, PATs, deploy keys, artifacts, workflow_run, repository_dispatch, reusable workflows, internal visibility. Concludes with a single recommended pattern that is set-up-once, cross-org, Tofu-managed, and inherited by new repos without per-repo config.

## Problem

`dashecorp/dashe-docs` aggregates `docs/` trees from ~7 other private repos (some in `dashecorp`, some in `Stig-Johnny` until cutover). The default `GITHUB_TOKEN` is scoped to the workflow's own repo, so any cross-repo clone 404s. We want:

1. Setup once; new repos inherit without per-repo wiring
2. Cross-org support (dashecorp ↔ Stig-Johnny during migration)
3. Everything declared in OpenTofu — no dashboard clickery
4. Short-lived tokens, no PAT rotation toil

This paper audits the nine options we considered and picks a single answer.
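To make the failure mode concrete: checking out a sibling private repo with the implicit token 404s. A minimal sketch (the source repo name is illustrative):

```yaml
# In dashecorp/dashe-docs — fails with "repository not found" (404):
# the implicit GITHUB_TOKEN only has access to dashe-docs itself.
- uses: actions/checkout@v4
  with:
    repository: Stig-Johnny/some-source-repo # illustrative name
    path: sources/some-source-repo
    # no token: input — defaults to the repo-scoped GITHUB_TOKEN
```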
## Comparison

| # | Option | Cross-org | New-repo inherit | Tofu-managed | Per-new-repo cost | Verdict |
|---|---|---|---|---|---|---|
| 1 | GitHub App + org secret + `actions/create-github-app-token@v1` | Yes | Yes (install on "all repos") | Mostly | ~0 | **Primary** |
| 2 | Fine-grained PAT | Yes | Partial (re-scope PAT) | No (no API) | Edit PAT scope | Reject |
| 3 | Deploy keys per repo-pair | No | No | Yes | 1 keypair/repo | Reject |
| 4 | `actions/download-artifact@v4` cross-repo | Yes (needs token) | Inherits option 1 | — | ~0 | Supplementary |
| 5 | `workflow_run` trigger | **No — same-repo only** | — | — | — | Not viable |
| 6 | `repository_dispatch` | Yes | Inherits option 1 | Yes | ~0 | Notification only |
| 7 | Reusable workflows | Same-org or public | Yes | Yes | 1 caller stub | **Combine with 1** |
| 8 | "Let `GITHUB_TOKEN` reach other repos" setting | **Doesn't exist for private** | — | — | — | Not real |
| 9 | Internal visibility | GHE only, and policy forbids | — | — | — | Blocked |

## Per-option notes

### 1. GitHub App with organization secret

Install one App (`DasheDocs Reader`, or reuse `review-e-bot`, which is already installed on `dashecorp`, `Stig-Johnny`, `cuti-e`, and `tablez-dev` per our agent-runner setup) with `Contents: read` + `Metadata: read`. Set the installation target to **"All repositories"** so every new private repo in the org is covered automatically. Store `APP_ID` and the PEM private key as **organization-level Actions secrets**. Every workflow that needs cross-repo read writes:

```yaml
- uses: actions/create-github-app-token@v1
  id: token
  with:
    app-id: ${{ vars.AGGREGATOR_APP_ID }}
    private-key: ${{ secrets.AGGREGATOR_APP_PRIVATE_KEY }}
    owner: ${{ github.repository_owner }} # or a literal owner for cross-org
```

The action mints a 1-hour installation token scoped to `owner` + an optional `repositories:` list. Cross-org works as long as the same App is installed in the target org — which `review-e-bot` already is.
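A follow-on step can then use the minted token to authenticate plain `git` over HTTPS — a sketch, with the repo list hard-coded for illustration (the real workflow derives it from `facts/repos.yaml`):

```yaml
- name: Clone source repos with the installation token
  env:
    GH_TOKEN: ${{ steps.token.outputs.token }} # 1-hour installation token
  run: |
    # Illustrative list — the real one comes from facts/repos.yaml
    for repo in dashecorp/rig-conductor dashecorp/rig-docs; do
      git clone --depth 1 \
        "https://x-access-token:${GH_TOKEN}@github.com/${repo}.git" \
        "sources/${repo#*/}"
    done
```

The `x-access-token:<token>` basic-auth form is GitHub's documented way to use an App installation token with `git` over HTTPS.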
**OpenTofu coverage:**

- `github_actions_organization_secret` / `_variable` — fully supported.
- `github_app_installation_repositories` ([resource docs](https://registry.terraform.io/providers/integrations/github/latest/docs/resources/app_installation_repositories)) — manages which repos an installation covers *within one org*. Use `selected_repositories` to restrict, or configure "All repositories" in the UI (a one-time manual step per org; the App installation itself is not a Terraform resource).
- App **creation** has no API — it is a one-time manifest flow per App. Acceptable given we only need one.

**Rotation:** the PEM key is rotated in one place (the org secret). Tokens are auto-short-lived.

### 2. Fine-grained PAT as org secret

Fine-grained PATs can't be created, listed, or rotated via API — [explicitly declined on the roadmap](https://github.com/github/roadmap/issues/818). Every scope change is a UI click. Cross-org requires two PATs (one per org). Rotation cost compounds with each repo added. Rejected.

### 3. Deploy keys

`github_repository_deploy_key` terraforms cleanly, but scales O(source repos). Every new repo in scope means a new keypair + a new repo-level secret on the aggregator. Read-only at repo granularity, with no org-level equivalent. Rejected for the "set once, inherit" goal.

### 4. `actions/download-artifact@v4` cross-repo

`v4` supports `repository:`, `run-id:`, and `github-token:` — it can pull artifacts from another repo's run. But **artifacts expire after 90 days by default**, so if a source repo hasn't built recently the aggregator has to fall back to `git archive` via token anyway. Strictly worse than just cloning via App token. Keep it in the toolbox for cases where we need a specific build output, not for docs aggregation.

### 5. `workflow_run` trigger

[`workflow_run`](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run) is **same-repository only**. It cannot fire `dashe-docs` from another repo's workflow completion. Not viable.
### 6. `repository_dispatch`

Max `client_payload` is **65,535 bytes** per the [REST docs](https://docs.github.com/en/rest/repos/repos#create-a-repository-dispatch-event). Enough for a "docs changed, rebuild" ping but nowhere near enough to carry a docs tree. Already our notification mechanism (`docs-update` event type); the aggregator then pulls via App token. Keep as-is.

### 7. Reusable workflows

`uses: dashecorp/dashe-docs/.github/workflows/docs-export.yml@main` works across same-org private repos when the callee's Actions Settings → Access allows "Accessible from repositories in the organization". Cross-org reusable workflows require the callee to be **public or internal** — blocked by our "all repos private" rule, so Stig-Johnny repos can't consume a dashecorp reusable workflow directly.

Useful pattern: a thin per-repo "export docs" workflow in each dashecorp app repo *calling* a reusable workflow at `dashecorp/rig-tools/.github/workflows/notify-docs-aggregator.yml`. Stig-Johnny repos get a copy of the caller (still small — 3–5 lines) managed via `github_repository_file`.

### 8. Org-level "let `GITHUB_TOKEN` reach other repos"

Doesn't exist for private repos. The setting people confuse it with is either **"Allow GitHub Actions to create and approve pull requests"** (a different thing) or the reusable-workflow access scope (option 7). `GITHUB_TOKEN` is always scoped to the running repo. No 2025–2026 change.

### 9. Internal repositories

Internal visibility exists only on **GitHub Enterprise Cloud / Enterprise Server**, and our policy is "all repos private". Even ignoring policy, GHE-only means it doesn't apply to our current plan. Skip.

## Recommendation

**Option 1 (org App) + Option 6 (`repository_dispatch` trigger) + Option 7 (reusable workflow for the export stub).** Concretely:

1. Reuse `review-e-bot` (already installed on `dashecorp` and `Stig-Johnny` with "All repositories"). Grant it `Contents: read` + `Metadata: read`.
2. Declare two organization secrets/variables in `dashecorp/terraform` (module: `cloudflare/dashecorp/.github` or a new `github/dashecorp`):

   ```hcl
   resource "github_actions_organization_variable" "aggregator_app_id" {
     variable_name = "AGGREGATOR_APP_ID"
     visibility    = "all"
     value         = ""
   }

   resource "github_actions_organization_secret" "aggregator_app_private_key" {
     secret_name     = "AGGREGATOR_APP_PRIVATE_KEY"
     visibility      = "all"
     plaintext_value = file("./secrets/review-e-bot.pem") # SOPS-encrypted
   }
   ```

3. The `dashe-docs` build workflow mints the installation token and `git clone`s each source repo declared in `facts/repos.yaml`.
4. Mirror the same two org secrets on `Stig-Johnny` (same PEM — same App, different org install target) so workflows there can also produce docs artifacts if ever needed.
5. Every new dashecorp or Stig-Johnny repo inherits read access for free — the App is already installed org-wide. No per-repo Terraform edit. Per-new-repo cost: **zero**.

## Rejected alternatives — one line each

- **PAT** — no API, manual rotation.
- **Deploy keys** — O(repo) setup.
- **Artifacts as transport** — expiry makes it unreliable vs. just cloning.
- **`workflow_run` cross-repo** — doesn't exist.
- **Org `GITHUB_TOKEN` bypass** — doesn't exist for private repos.
- **Internal visibility** — policy forbids, and it's GHE-only.

## Follow-up work

- Terraform PR in `dashecorp/infra` (or wherever the GitHub provider blocks live) to declare the two org secrets; wire SOPS for the PEM.
- Thin reusable workflow at `dashecorp/rig-tools/.github/workflows/mint-aggregator-token.yml` that abstracts the `actions/create-github-app-token@v1` call so callers just do `uses: dashecorp/rig-tools/.github/workflows/mint-aggregator-token.yml@main`.
- Update `dashe-docs/.github/workflows/deploy.yml` to use the minted token instead of the current `STIG_JOHNNY_READ_TOKEN` placeholder secret.
- Document the one-time manual step: install the App on any new org we add later (e.g. if we create a `tablez-apps` org).

---

## https://research.rig.dashecorp.com/research/2026-04-22-egress-policy-pitfall-cloudflare-fronted-apis/
## https://raw.githubusercontent.com/dashecorp/rig-docs/main/src/content/docs/research/2026-04-22-egress-policy-pitfall-cloudflare-fronted-apis.md

# Pitfall: ipBlock NetworkPolicy cannot allowlist Cloudflare-fronted LLM APIs

> AC 5 Phase 1 shipped a default-deny egress NetworkPolicy with ipBlock allowing Anthropic's published CIDR (160.79.104.0/21). It was reverted the same day: api.anthropic.com resolves to Cloudflare anycast (162.159.x.x), not to the origin CIDR, so the policy silently blocked all agent-to-Anthropic traffic. Future approach: route Anthropic calls through an in-cluster LiteLLM proxy pod — the only egress target the NetworkPolicy needs to allow.

## Retrospective: why Phase 1 was reverted

AC 5 Phase 1 (shipped and reverted 2026-04-22 in [rig-gitops#143](https://github.com/dashecorp/rig-gitops/pull/143) and [rig-gitops#144](https://github.com/dashecorp/rig-gitops/pull/144)) attempted to allowlist `api.anthropic.com` by adding Anthropic's published origin CIDR `160.79.104.0/21` to a default-deny Kubernetes `NetworkPolicy`. The CIDR is real and Anthropic does publish it, but it is **origin-level** — traffic from client pods hits Cloudflare's anycast edge first (`162.159.x.x`), not the origin directly. A client-side `ipBlock` allowlist sees Cloudflare IPs, not `160.79.104.0/21`, so the policy blocked all outbound calls to the Anthropic API while appearing syntactically valid.

The lesson: **any hostname fronted by Cloudflare (or another CDN) cannot be reliably allowlisted via `ipBlock` in a Kubernetes `NetworkPolicy`**, because the resolved IPs are CDN edge nodes that rotate and are shared across millions of tenants.
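The reverted rule had roughly this shape — syntactically valid, semantically dead. A sketch with hypothetical namespace and policy names (the actual manifests are in the reverted PRs):

```yaml
# Shape of the reverted Phase 1 rule (names hypothetical). The CIDR is
# Anthropic's origin range, but client traffic egresses to Cloudflare's
# 162.159.x.x edge, so this match never fires and the default deny wins.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: rig-agents
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 160.79.104.0/21 # origin CIDR — never seen by clients
```

Nothing in the API server or CNI flags this: the policy admits cleanly and simply matches no packets.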
The correct long-term fix is to deploy an **in-cluster LiteLLM proxy pod** as the sole egress target for Anthropic traffic; the `NetworkPolicy` then only needs to allow agent pods → LiteLLM (a stable ClusterIP), and LiteLLM itself runs outside the restricted namespace or is given its own narrow allowlist.

## Summary table

| Fact | Detail |
|---|---|
| Anthropic published CIDR | `160.79.104.0/21` (origin, not client-reachable) |
| Actual resolved IPs | Cloudflare anycast `162.159.x.x` (rotate, shared) |
| ipBlock result | Silent block — policy looks valid, traffic fails |
| Root cause | Cloudflare terminates TLS at the edge; the origin CIDR never appears in client routing |
| Correct Phase 1 fix | In-cluster LiteLLM proxy pod; allow agents → LiteLLM ClusterIP only |
| Reverts | [rig-gitops#143](https://github.com/dashecorp/rig-gitops/pull/143), [rig-gitops#144](https://github.com/dashecorp/rig-gitops/pull/144) |

---
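Under the proxy-based fix, the agent-namespace policy reduces to label selectors toward stable in-cluster targets. A sketch with assumed namespace and label names (the real values depend on how the LiteLLM deployment lands):

```yaml
# Sketch of the proxy-based policy (namespace/label names assumed).
# Agents may reach kube-dns and the LiteLLM proxy; nothing else. LiteLLM
# itself lives outside this namespace and carries the wide egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agents-egress-via-litellm
  namespace: rig-agents
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to: # DNS resolution
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    - to: # LiteLLM proxy — the only allowed API path
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: litellm
          podSelector:
            matchLabels:
              app: litellm
```

Because the selectors match pods rather than IPs, the rule keeps working no matter where Cloudflare's edge moves.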