# Dashecorp Rig — Brain

> Fresh-agent entry point. Read this first. One fetch (~27 KB) gives you the
> repo manifest, deployed surfaces (including rig-conductor's 13 endpoints and
> built-in Dashboard), agent instances, primary flows, frontmatter schema,
> 40+ event types (summary; full schemas at [/events.md](./events.md)),
> 18-whitepaper catalog, and the current backlog with prior_art links. Every
> claim traces to its source file in `facts/`.
>
> Compiled from `facts/*.yaml` + live GitHub state (`gh api /orgs/dashecorp/repos`
> for the repo list; manifest validation for agents). Do not hand-edit BRAIN.md.
> Regenerate with `npm run brain`. CI runs `--check` and fails on drift.

## What this is

The Dashecorp rig is an autonomous coding-agent system. A human posts a user
story; agents research, propose, code, review, and ship. Canonical docs live
in `dashecorp/rig-docs` (Astro Starlight); operational memory lives in a
Postgres + pgvector Memory MCP; deployments are Flux-managed in a dashecorp
GKE cluster.

## Published surfaces

### Canonical brain entry point (this file, rendered)
- **URL:** https://rig-research.pages.dev/brain/
- **Raw:** https://rig-research.pages.dev/BRAIN.md
- **Type:** markdown
- **Note:** The raw URL serves the same bytes as the repo-root BRAIN.md but is publicly accessible (the repo is private, so raw.githubusercontent.com requires an auth token; the Cloudflare Pages URL does not).

### LLM site map (research, proposals, user-stories)
- **URL:** https://rig-research.pages.dev/llms.txt
- **Type:** llms-txt

### Full content dump (single-shot ingestion)
- **URL:** https://rig-research.pages.dev/llms-full.txt
- **Type:** llms-full-txt

### Research, proposals, user-stories (rendered Starlight site)
- **URL:** https://rig-research.pages.dev
- **Type:** astro-starlight
- **Source:** dashecorp/rig-docs

### Aggregated engineering docs (architecture, guides, whitepapers, per-repo docs)
- **URL:** https://rig-docs.pages.dev
- **Type:** mkdocs-material
- **Source:** dashecorp/rig-gitops (docs-site/)
- **Note:** Built by scripts/build-docs.sh which copies from rig-gitops/docs/ and each rig repo's docs/. Different scope from rig-research.pages.dev.

### Sitemap (XML)
- **URL:** https://rig-research.pages.dev/sitemap-index.xml
- **Type:** sitemap-xml

### rig-conductor API (cluster-internal)
- **URL:** http://rig-conductor-api.rig-conductor.svc.cluster.local:8080
- **Type:** rest-api
- **Visibility:** cluster-internal-only
- **Endpoints:**
  - `POST /api/events` — Submit any of the 40+ event types — see /events.md
  - `GET /api/assignments/next` — Claim next issue assignment. Query: agentId=dev-e-node
  - `GET /api/reviews/next` — Claim next PR review assignment. Query: agentId=review-e
  - `GET /api/pr-reviews/next` — Claim direct-PR review (no issue) for infra/tooling PRs
  - `GET /api/issues` — List tracked issues. Query: state=open|done|stuck
  - `GET /api/queue` — Current dispatch queue state
  - `GET /api/usage` — Token / cost usage by agent and/or repo. Query: agentId, repo
  - `GET /api/costs/issue` — Cost for a specific issue. Query: repo, issueNumber
  - `GET /api/costs/summary` — Aggregate cost. Query: days (default 7)
  - `GET /api/costs/daily` — Daily cost time series. Query: days
  - `GET /api/events/live` — SSE stream of live events (for Dashboard.html)
  - `GET /api/streams/status` — Stream consumer status
  - `POST /api/webhook/github` — GitHub webhook intake — normalizes GH events into rig-conductor stream
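
A minimal sketch of how an in-cluster client might assemble request URLs for the endpoints above. The helper name `conductor_url` is hypothetical; the base URL, paths, and query parameter names are copied from the list, and nothing here performs a network call.

```python
from urllib.parse import urlencode

# Base URL from the surface listing above; it resolves only inside the cluster.
BASE_URL = "http://rig-conductor-api.rig-conductor.svc.cluster.local:8080"


def conductor_url(path: str, **query: object) -> str:
    """Join BASE_URL with a path and an optional query string.

    Hypothetical helper for illustration. Parameters whose value is None
    are dropped so callers can pass optional filters unconditionally.
    """
    params = {key: value for key, value in query.items() if value is not None}
    suffix = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{suffix}"


# Claim the next issue assignment for the node variant of Dev-E.
next_assignment = conductor_url("/api/assignments/next", agentId="dev-e-node")

# Aggregate cost over the default 7-day window.
cost_summary = conductor_url("/api/costs/summary", days=7)
```

Keeping URL construction in one place means a future externally reachable base URL would be a one-line change.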

### rig-conductor Dashboard (the built-in cost/activity UI)
- **URL:** http://rig-conductor-api.rig-conductor.svc.cluster.local:8080/dashboard
- **Type:** html-dashboard
- **Source:** dashecorp/rig-conductor (src/ConductorE.Api/Dashboard.html)
- **Visibility:** cluster-internal-only
- **Note:** 42 KB single-page HTML dashboard — "Engineering Rig — Control Plane". Has Costs, Issues, Agents, Streams tabs. Driven by /api/costs/*, /api/usage, /api/issues, /api/streams/* endpoints. No separate Grafana/Starlight dashboard is needed — this one already renders per-agent / per-issue / per-day cost.

### Memory MCP (Postgres + pgvector)
- **Type:** mcp-server
- **Package:** @dashecorp/rig-memory-mcp
- **Tools:**
  - `read_memories` — Query prior memory by topic/repo/scope with vector similarity
  - `write_memory` — Persist a new memory with scope/kind/importance/tags
  - `mark_used` — Increment hit_count on a memory that informed a decision
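
MCP tool invocations travel as JSON-RPC 2.0 `tools/call` requests. The sketch below builds one for `write_memory`. The argument names and values (`content`, `scope`, `kind`, `importance`, `tags`) are assumptions inferred from the tool description above, not the confirmed input schema; check `api.md` in rig-memory-mcp before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids; any unique value works


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request envelope (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and values are hypothetical, not taken from the server schema.
request = tool_call(
    "write_memory",
    {
        "content": "rig-docs CI rejects PRs missing required frontmatter fields",
        "scope": "repo:rig-docs",
        "kind": "lesson",
        "importance": 4,
        "tags": ["ci", "frontmatter"],
    },
)
wire_message = json.dumps(request)  # what would be written to the transport
```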

### Discord agent channels (notifications)
- **Type:** discord
- **Channels:** #dev-e, #review-e, #ibuild-e, #admin
- **Note:** Agents post thread updates here; humans watch for stuck / pending state.

## Repos

Live from `gh api /orgs/dashecorp/repos` merged with `facts/repos.yaml` annotations. Archived repos are dropped automatically.

| Repo | Purpose | Language | Depends on | AGENTS.md |
|---|---|---|---|---|
| [`rig-gitops`](https://github.com/dashecorp/rig-gitops) | GitOps manifests (Flux HelmReleases, Kustomize bases) and the canonical AGENTS.md shared by every rig repo via `@dashecorp/rig-gitops/AGENTS | shell | — | compiled |
| [`rig-agent-runtime`](https://github.com/dashecorp/rig-agent-runtime) | The AI agent runtime (Node) — one image that deploys as Dev-E, Review-E, or iBuild-E depending on character file + environment. Handles prom | javascript | rig-memory-mcp, rig-conductor | imports-rig-gitops |
| [`rig-memory-mcp`](https://github.com/dashecorp/rig-memory-mcp) | MCP server backing persistent agent memory with Postgres + pgvector. Exposes `read_memories` / `write_memory` / `mark_used` tools consumed b | javascript | postgres-pgvector | claude-md |
| [`rig-conductor`](https://github.com/dashecorp/rig-conductor) | Event store + dispatch service (C# + Marten + Postgres). Receives PR/issue events, assigns work, tracks turns/cost/stuck state, serves the ` | csharp | postgres, pgvector | imports-rig-gitops |
| [`rig-docs`](https://github.com/dashecorp/rig-docs) | Research, proposals, user-stories, and rig-wide reference (Astro Starlight). This repo — you're reading its BRAIN.md. Deploys to rig-researc | astro | — | hand |
| [`rig-tools`](https://github.com/dashecorp/rig-tools) | Shell scripts, Git hooks, and workflow sync for AI-assisted development. Developer tooling, not deployed. The one repo without an AGENTS.md  | shell | — | none |
| [`infra`](https://github.com/dashecorp/infra) | OpenTofu/Terraform for GitHub org settings, Cloudflare (DNS, Pages, tunnels), GCP (GKE cluster hosting the rig), and Tailscale ACL/DNS. Plan | hcl | — | imports-rig-gitops |

### Per-repo doc index (token-efficient discovery)

Before cloning a repo to find docs, consult this list to decide which docs are relevant to your issue. Then fetch raw markdown for **only** the relevant ones:

```
gh api repos/dashecorp/<repo>/contents/docs/<file>.md --header 'Accept: application/vnd.github.raw'
```

Auto-derived per compile via `gh api /repos/<r>/contents/docs`. Repos without a `docs/` dir are omitted.

- **`rig-gitops`** — architecture-current.md, architecture-proposed-v2.md, architecture-proposed.md, documentation-standard.md, onboarding.md, research-multi-agent-platforms.md, review-e-bootstrap.md, sops.md
- **`rig-agent-runtime`** — architecture.md, configuration.md, dashboard.md, deployment.md, discord-setup.md, heartbeat.md, index.md, memory.md, messaging.md, observability.md, quickstart.md, usage-tracking.md
- **`rig-memory-mcp`** — api.md
- **`rig-conductor`** — api.md, architecture.md, deployment.md, event-store.md, index.md, principles.md
- **`rig-tools`** — agent-workflow.md

## Agents (deployment instances)

### Dev-E — writes code
- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** GKE cluster (dashecorp)
- **Manifest:** `dashecorp/rig-gitops/apps/dev-e/`
- **Variants:**
  - node: `apps/dev-e/rig-agent-helmrelease.yaml`
  - python: `apps/dev-e/python-helmrelease.yaml`
  - dotnet: `apps/dev-e/dotnet-helmrelease.yaml`
- **Character:** baked into HelmRelease values
- **Triggers:** rig-conductor dispatch (issue.assigned events)

### Review-E — reviews PRs
- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** GKE cluster (dashecorp)
- **Manifest:** `dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml`
- **Cron:** `*/5 * * * *`
- **Search filter:** `org:dashecorp is:pr is:open author:app/dev-e-bot author:app/ibuild-e-bot -reviewed-by:app/review-e-bot`
- **Discord:** #review-e

### iBuild-E — macOS / iOS builds
- **Runtime:** [dashecorp/rig-agent-runtime](https://github.com/dashecorp/rig-agent-runtime)
- **Deployed in:** Mac Mini (Oslo, Tailscale 100.92.170.124)
- **Manifest:** `not-in-cluster`
- **Discord:** #ibuild-e
- **Notes:** Apple Silicon host, Xcode + App Store Connect. Auto-reauth cron refreshes OAuth every 5 min. Separate from the GKE-hosted agents because iOS builds require macOS.

## Primary flows

### Epic to merged work
**Trigger:** Human opens a user-story GitHub issue in dashecorp/rig-docs

1. **rig-conductor** — Scans open issues, classifies, dispatches to appropriate agent
2. **Dev-E** — Reads issue + relevant research; authors research / proposal / code PR
3. **Review-E (cron every 5 min)** — Finds PR, reviews against AGENTS.md + memory, requests changes or approves
4. **Human** — Merges (or Review-E's approval satisfies branch protection; auto-merge fires)
5. **Cloudflare Pages** — Redeploys rig-research.pages.dev and rig-docs.pages.dev
**Complete when:** issue closed via `Closes #N`

### Research and proposal authoring
**Trigger:** An Epic needs investigation before implementation

1. author dated research/YYYY-MM-DD-slug.md with user_story frontmatter
2. author proposals/YYYY-MM-DD-slug.md with source_research frontmatter
3. user_story file gets research_docs and proposal fields pointing back
4. RelatedDocs component auto-renders the graph; no manual cross-linking

**Rules:**
- bidirectional links required
- schema enforced in src/content.config.ts
- CI rejects PRs missing required fields
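
The bidirectional-link rule can be illustrated with a small checker. This is a sketch, not the rig's actual enforcement (that lives in `src/content.config.ts` and CI); the function name `missing_backlinks` and the example paths are hypothetical.

```python
def _as_list(value):
    """Normalise a missing, scalar, or list frontmatter value to a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def missing_backlinks(docs: dict) -> list:
    """Return sorted (source, target) pairs whose target does not link back.

    `docs` maps a doc path to its frontmatter dict.
    """
    problems = set()
    for path, fm in docs.items():
        # research / proposal -> user story: the story must list this doc.
        for story in _as_list(fm.get("user_story")):
            target = docs.get(story, {})
            back = _as_list(target.get("research_docs")) + _as_list(target.get("proposal"))
            if path not in back:
                problems.add((path, story))
        # user story -> research / proposal: each child must point at the story.
        for child in _as_list(fm.get("research_docs")) + _as_list(fm.get("proposal")):
            if docs.get(child, {}).get("user_story") != path:
                problems.add((path, child))
    return sorted(problems)


# Hypothetical docs: the proposal forgot its `user_story` field.
docs = {
    "user-stories/2026-01-01-example": {
        "research_docs": ["research/2026-01-01-example"],
        "proposal": "proposals/2026-01-01-example",
    },
    "research/2026-01-01-example": {"user_story": "user-stories/2026-01-01-example"},
    "proposals/2026-01-01-example": {"source_research": ["research/2026-01-01-example"]},
}
broken = missing_backlinks(docs)
```

Here `broken` contains the single pair pointing from the user story to the proposal, because the proposal never links back.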

### Cold-start agent session
**Trigger:** Fresh agent with blank memory receives an Epic or task

1. WebFetch https://rig-research.pages.dev/brain/ (or raw BRAIN.md)
2. Parse facts/repos.yaml equivalent in BRAIN.md — learn repo manifest
3. Parse facts/surfaces.yaml equivalent — learn URLs and endpoints
4. WebFetch https://rig-research.pages.dev/llms.txt for topic index
5. WebFetch relevant research/proposal docs directly via raw URL
6. For the target repo, fetch its AGENTS.md (compiled or imports-rig-gitops)
7. read_memories scoped to repo + topic via Memory MCP
8. Begin work with full context in ~41-51 KB total (BRAIN.md alone is ~27 KB; see Token-efficient cold start)

**Token budget:** ~41-51 KB read, leaves 200K+ for actual work on Opus

### Docs-memory promotion (weekly Lint)
**Trigger:** Weekly scheduled Lint job

1. Scan Memory MCP for rows with importance >= 4 AND hit_count >= 5
2. For each candidate, check whether the docs already cover the topic (BM25 similarity)
3. If not covered, propose a docs PR with the memory content promoted
4. Human approves PR, merge triggers redeploy
**Status:** not-yet-built (design in research/2026-04-18-docs-memory-drift-lint)
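
Step 1 of this flow is a plain threshold filter. A minimal sketch, assuming a hypothetical in-memory `Memory` record; the real rows live in Postgres and the job itself is not built yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """Hypothetical in-memory view of one Memory MCP row."""
    id: int
    topic: str
    importance: int
    hit_count: int


def promotion_candidates(rows, min_importance=4, min_hits=5):
    """Select rows meeting both thresholds: importance >= 4 AND hit_count >= 5."""
    return [
        row for row in rows
        if row.importance >= min_importance and row.hit_count >= min_hits
    ]


rows = [
    Memory(1, "frontmatter-schema", importance=5, hit_count=9),   # qualifies
    Memory(2, "flaky-ci-retry", importance=4, hit_count=4),       # too few hits
    Memory(3, "discord-threading", importance=3, hit_count=12),   # importance too low
    Memory(4, "closes-keyword", importance=4, hit_count=5),       # exactly on both thresholds
]
candidates = promotion_candidates(rows)
```

Both conditions must hold, so a heavily used but low-importance memory (row 3) is not promoted.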

### Diagram-as-code authoring
**Trigger:** A research / proposal / user-story needs a diagram
**Rule:** Mermaid source inline in fenced code block. No PNG or SVG ever committed.
**Rendering:** remark-mermaid plugin wraps in `<figure>` with `<pre class=mermaid>` and `<details>` source; mermaid.js renders client-side; source preserved post-render for agent readers.

## Frontmatter schema (for authoring rig-docs content)

- **type** (optional): one of `research` | `proposal` | `decision` | `reference` | `user-story` | `runbook`
- **audience** (optional): one of `human` | `agent` | `both` — not a free-form array
- **Required:** `title`, `description`
- **Optional linkage fields** (paths are relative to src/content/docs/, no leading slash, no .md or .mdx extension):
  - `type` — See type enum above.
  - `audience` — See audience enum above.
  - `created` — ISO date string YYYY-MM-DD.
  - `updated` — ISO date string YYYY-MM-DD.
  - `topic` — Short slug grouping related docs.
  - `source_refs` — Array of URLs (external sources supporting this doc).
  - `supersedes` — Path to doc this replaces (no leading slash, no .md extension).
  - `superseded_by` — Path to newer doc that replaces this (same format).
  - `user_story` — (research/proposal only) Path to the user story this supports.
  - `research_docs` — (user-story only) Array of research doc paths this story spawned.
  - `proposal` — (user-story only) Path to the proposal answering this story.
  - `source_research` — (proposal only) Array of research paths this proposal synthesises.
  - `github_issue` — (user-story only) Full GitHub issue URL. Omit the field entirely if there is no issue — do NOT use empty string.

Path examples: `user-stories/2026-04-18-docs-memory-strategy`, `research/2026-04-18-docs-tools-evaluation`, `proposals/2026-04-18-docs-tooling-decision`.

Omit a field entirely when it has no value — do **not** use empty string.
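
The rules above can be restated as a small validator. This is an illustrative Python sketch only; the authoritative schema is the Zod definition in `src/content.config.ts`, and the function name `frontmatter_errors` is hypothetical.

```python
import re

TYPES = ("research", "proposal", "decision", "reference", "user-story", "runbook")
AUDIENCES = ("human", "agent", "both")
REQUIRED = ("title", "description")
DATE_FIELDS = ("created", "updated")
PATH_FIELDS = ("supersedes", "superseded_by", "user_story", "proposal")
PATH_LIST_FIELDS = ("research_docs", "source_research")


def _bad_path(value) -> bool:
    """True if value breaks the path rules: no leading slash, no .md/.mdx."""
    return (
        not isinstance(value, str)
        or value.startswith("/")
        or value.endswith((".md", ".mdx"))
    )


def frontmatter_errors(fm: dict) -> list:
    """Return human-readable problems; an empty list means the dict passes."""
    errors = [f"{key}: required" for key in REQUIRED if key not in fm]
    for key, value in fm.items():
        if value == "":
            errors.append(f"{key}: empty string, omit the field instead")
    if "type" in fm and fm["type"] not in TYPES:
        errors.append("type: not in enum")
    if "audience" in fm and fm["audience"] not in AUDIENCES:
        errors.append("audience: must be one of human | agent | both")
    for key in DATE_FIELDS:
        if key in fm and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(fm[key])):
            errors.append(f"{key}: expected YYYY-MM-DD")
    for key in PATH_FIELDS:
        if key in fm and fm[key] != "" and _bad_path(fm[key]):
            errors.append(f"{key}: bad path")
    for key in PATH_LIST_FIELDS:
        for item in fm.get(key, []):
            if _bad_path(item):
                errors.append(f"{key}: bad path {item!r}")
    return errors


good = {
    "title": "Docs tools evaluation",
    "description": "Comparison of candidate docs tools",
    "type": "research",
    "audience": "both",
    "created": "2026-04-18",
    "user_story": "user-stories/2026-04-18-docs-memory-strategy",
}
bad = {"title": "T", "github_issue": "", "audience": ["human"], "proposal": "/proposals/x.md"}
```

`frontmatter_errors(good)` is empty, while `bad` trips four rules: missing `description`, empty `github_issue`, array-valued `audience`, and a path with a leading slash and extension.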

## Whitepapers (private — catalog only)

These whitepapers live at `dashecorp/rig-gitops/docs/whitepaper/*.md` (private repo — requires `gh` auth to fetch). BRAIN.md surfaces their titles + 1-line summaries so agents know what exists. Full content must be fetched with: `gh api /repos/dashecorp/rig-gitops/contents/docs/whitepaper/<file> --jq .download_url | xargs curl -sL`.

- **[Whitepaper index](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/index.md)** (`index.md`) — Entry point listing all whitepaper sections and their companion docs.
- **[MVP scope](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/mvp-scope.md)** (`mvp-scope.md`) — What the rig does in the minimum viable release. Gatekeeper for "is this in scope?"
- **[Design principles](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/principles.md)** (`principles.md`) — First principles (measurement precedes trust; honest gaps; provider portability).
- **[Trust model](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/trust-model.md)** (`trust-model.md`) — Who can approve what, which gates exist, human-in-the-loop rules.
- **[Safety](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/safety.md)** (`safety.md`) — Dangerous-command guards, sandboxing, blast-radius containment.
- **[Security](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/security.md)** (`security.md`) — Secrets handling, attestation, audit trail, SOPS+age.
- **[Provider portability](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/provider-portability.md)** (`provider-portability.md`) — Multi-runtime (Claude Code, Codex CLI, Gemini CLI) via OTel GenAI conventions. Swap runtime without changing backend.
- **[Observability — OTel, Langfuse, Prometheus, SLOs](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/observability.md)** (`observability.md`) — Self-hosted Langfuse (agent traces) + Grafana Cloud (infra) + local Prometheus (SLO gates) hybrid. Native OTel via `CLAUDE_CODE_ENABLE_TELEMETRY=1`. OTel Collector runs per-cluster, routes LLM traces to Langfuse, infra to managed. Per implementation-status: OTel Collector "Partial" (deployed for rig-conductor, agents not yet emitting), Langfuse "Planned", cost dashboard "Partial" (TokenUsageProjection exists, no LiteLLM proxy yet).
- **[Cost framework](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/cost-framework.md)** (`cost-framework.md`) — Budget policy, per-model rate tables, cost attribution strategy. Companion to observability.
- **[Self-healing](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/self-healing.md)** (`self-healing.md`) — Automatic recovery loops, StaleHeartbeatService, escalation severity routing.
- **[Memory architecture](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/memory.md)** (`memory.md`) — Memory MCP scope, importance/hit_count model, promotion-to-docs threshold design.
- **[Quality and evaluation](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/quality-and-evaluation.md)** (`quality-and-evaluation.md`) — How the rig evaluates its own output. Judge-agent pattern, fixed rubrics.
- **[Drift detection](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/drift-detection.md)** (`drift-detection.md`) — Schema drift, docs drift, infra drift — detection thresholds and response.
- **[Development process](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/development-process.md)** (`development-process.md`) — Issue → Epic → research → proposal → PR lifecycle, agent-human gates.
- **[Example first story](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/example-first-story.md)** (`example-first-story.md`) — Worked walkthrough of one Epic end-to-end.
- **[Glossary](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/glossary.md)** (`glossary.md`) — Rig-specific terminology (Epic, proposal, rig-conductor, Review-E, etc).
- **[Known limitations](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/limitations.md)** (`limitations.md`) — Honest catalog of what the rig can't do today.
- **[Implementation status](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/implementation-status.md)** (`implementation-status.md`) — Single source of truth for deployed vs planned per capability. 78 tracked across 11 domains; 21 deployed/partial (27%), 44 planned/deferred (56%). Every capability named in the whitepapers gets a row with status + whitepaper section + ticket/evidence.
- **[Tool choices (ADRs)](https://github.com/dashecorp/rig-gitops/blob/main/docs/whitepaper/tool-choices.md)** (`tool-choices.md`) — Decision records for tooling. Includes rejection list with rationale.

**Most agents should start with:** `implementation-status.md` (what's deployed vs planned — 78 tracked capabilities) and whichever domain-specific whitepaper matches the Epic.

## rig-conductor event types (POST /api/events)

All event types come from the `MapToEvent` switch in `dashecorp/rig-conductor/src/ConductorE.Core/UseCases/SubmitEvent.cs`. Only names are listed here; fetch **[/events.md](https://rig-research.pages.dev/events.md)** for full field schemas (no auth required).

**Pipeline (issue → PR → merge → deploy):** `ISSUE_APPROVED`, `ISSUE_ASSIGNED`, `ISSUE_UNASSIGNED`, `WORK_STARTED`, `BRANCH_CREATED`, `PR_CREATED`, `CI_PASSED`, `CI_FAILED`, `REVIEW_ASSIGNED`, `REVIEW_PASSED`, `REVIEW_DISPUTED`, `HUMAN_GATE_TRIGGERED`, `HUMAN_GATE_REMINDER`, `MERGED`, `MERGE_GATE_WAITING`, `MERGE_GATE_MERGED`, `MERGE_GATE_TIMEOUT`, `MAIN_CI_STARTED`, `MAIN_CI_PASSED`, `MAIN_CI_FAILED`, `DEPLOYED_STAGING`, `DEPLOYED_PRODUCTION`, `SMOKE_PASSED`, `SMOKE_FAILED`, `BUILD_FAILED`, `VERIFIED`, `ISSUE_DONE`, `ESCALATED`, `MILESTONE_COMPLETE`, `DUPLICATE_PR_CLOSED`

**Direct PR path (no issue):** `PR_OPENED`, `PR_REVIEW_ASSIGNED`, `PR_REVIEW_APPROVED`, `PR_REVIEW_REJECTED`

**Agent lifecycle:** `AGENT_STARTED`, `HEARTBEAT`, `AGENT_STUCK`

**CLI sessions:** `CLI_STARTED`, `CLI_PROGRESS`, `CLI_COMPLETED`

**Observability (cost + tooling):** `TOKEN_USAGE`, `TOOL_USED`

**Memory MCP:** `MEMORY_WRITE`, `MEMORY_READ`, `MEMORY_HIT_USED`
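
Because `POST /api/events` accepts many event types, a client may want to reject a misspelled name before sending. The sketch below restates the summary lists as a lookup table. The category keys and the `category_of` helper are hypothetical, and /events.md remains the reference for field schemas.

```python
# Event names copied from the summary above, grouped by the same headings.
EVENT_CATEGORIES = {
    "pipeline": (
        "ISSUE_APPROVED ISSUE_ASSIGNED ISSUE_UNASSIGNED WORK_STARTED "
        "BRANCH_CREATED PR_CREATED CI_PASSED CI_FAILED REVIEW_ASSIGNED "
        "REVIEW_PASSED REVIEW_DISPUTED HUMAN_GATE_TRIGGERED HUMAN_GATE_REMINDER "
        "MERGED MERGE_GATE_WAITING MERGE_GATE_MERGED MERGE_GATE_TIMEOUT "
        "MAIN_CI_STARTED MAIN_CI_PASSED MAIN_CI_FAILED DEPLOYED_STAGING "
        "DEPLOYED_PRODUCTION SMOKE_PASSED SMOKE_FAILED BUILD_FAILED VERIFIED "
        "ISSUE_DONE ESCALATED MILESTONE_COMPLETE DUPLICATE_PR_CLOSED"
    ).split(),
    "direct_pr": "PR_OPENED PR_REVIEW_ASSIGNED PR_REVIEW_APPROVED PR_REVIEW_REJECTED".split(),
    "agent_lifecycle": "AGENT_STARTED HEARTBEAT AGENT_STUCK".split(),
    "cli_sessions": "CLI_STARTED CLI_PROGRESS CLI_COMPLETED".split(),
    "observability": "TOKEN_USAGE TOOL_USED".split(),
    "memory_mcp": "MEMORY_WRITE MEMORY_READ MEMORY_HIT_USED".split(),
}

# Flatten to name -> category for constant-time lookup.
CATEGORY_OF = {
    name: category
    for category, names in EVENT_CATEGORIES.items()
    for name in names
}


def category_of(event_type: str) -> str:
    """Return the category for a known event type, else raise ValueError."""
    try:
        return CATEGORY_OF[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type!r}") from None
```

For example, `category_of("TOKEN_USAGE")` returns `"observability"`, and a typo such as `"TOKEN_USEAGE"` raises before any request is made.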

## Known gaps (rig backlog)

Cold-start agents should see these so they don't re-discover what's already identified. Each gap links to `prior_art` — existing stubs, research, or PRs that have already touched it. When a gap is being worked, `linked_user_story` points to the user story; when closed, the entry is removed from `facts/backlog.yaml`.

### [observability] Cost tracking mostly deployed — LiteLLM proxy + external access are the remaining gaps

DO NOT propose "build a cost pipeline" — most of it is already shipped:

  1. Data pipeline: TokenUsageProjection + CostProjection in rig-conductor
     consume TOKEN_USAGE + CLI_COMPLETED events. Read models live on
     Marten/Postgres.
  2. API: GET /api/usage, /api/costs/issue, /api/costs/summary,
     /api/costs/daily on the rig-conductor cluster-internal URL (see
     BRAIN.md Published surfaces). Query by agent, repo, date range.
  3. Dashboard: src/ConductorE.Api/Dashboard.html (~42 KB SPA,
     "Engineering Rig — Control Plane"). Served at / and /dashboard.
     Has a Costs tab driven by the /api/costs/* endpoints.

The remaining gaps:
  a. LiteLLM proxy — not deployed. Blocks hard budget enforcement
     (agent ceiling kill-switch).
  b. External access — /dashboard is cluster-internal. A human on
     laptop can't view it without kubectl port-forward or a
     Cloudflare tunnel. Consider publishing a read-only projection.
  c. Alerting — no Discord webhook on cost threshold breach yet.

Rough current spend: ~$5-15/day fleet-wide (order-of-magnitude only).

**Prior art:**
- rig-conductor cost endpoints and Dashboard.html — dashecorp/rig-conductor src/ConductorE.Api/
- TokenUsageProjection + CostProjection source: dashecorp/rig-conductor src/ConductorE.Api/Adapters/MartenProjections.cs
- TOKEN_USAGE + CLI_COMPLETED events defined and emitted — see /events.md
- Cost framework design: rig-gitops/docs/whitepaper/cost-framework.md (private)
- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- LiteLLM proxy not yet deployed — blocks hard budget enforcement

**Status:** mostly-deployed

### [observability] OTel collector deployed for rig-conductor only — agents not yet emitting

OpenTelemetry Collector is "Partial": deployed for rig-conductor; agent
pods (Dev-E, Review-E, iBuild-E) have not yet enabled native OTel via
`CLAUDE_CODE_ENABLE_TELEMETRY=1`. Langfuse (self-hosted) and Grafana
Cloud ingest are both "Planned". Full design in the observability
whitepaper.

**Prior art:**
- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- Implementation status: whitepaper/implementation-status.md marks OTel Collector 'Partial', Langfuse 'Planned'
- rig-memory-mcp/events.js FUTURE comment: migrate to OTel GenAI spans
- Env var to enable native OTel: CLAUDE_CODE_ENABLE_TELEMETRY=1 + OTEL_EXPORTER_OTLP_ENDPOINT pointed at the in-cluster collector

**Status:** partial

### [docs-memory] Docs-memory drift lint not implemented

Weekly LLM-as-judge pass that promotes memory→docs (when importance≥4
AND hit_count≥5), flags stale research, catches orphan docs. Designed
but no runtime built.

**Prior art:**
- Full design in research/2026-04-18-docs-memory-drift-lint
- Parent user story: user-stories/2026-04-18-docs-memory-strategy
- Principles synthesis: research/2026-04-18-docs-vs-memory-principles

**Linked user story:** `user-stories/2026-04-18-docs-memory-strategy`

**Status:** open

### [docs-surfaces] Two docs surfaces with overlapping scope

rig-docs.pages.dev (MkDocs aggregation from rig-gitops/docs-site/) and
rig-research.pages.dev (Starlight research hub from dashecorp/rig-docs).
Both host rig docs; boundaries not formalised. Agents currently learn
this empirically. Eventually unify or formalise the split.

**Prior art:**
- MkDocs site built by dashecorp/rig-gitops/scripts/build-docs.sh
- Starlight site defined in dashecorp/rig-docs/ (this repo)
- Docs tooling decision: proposals/2026-04-18-docs-tooling-decision (picked Starlight for research hub; MkDocs kept for aggregation)

**Status:** open

### [agents] Review-E does not scan human-authored PRs

Review-E's cron filter is `author:app/dev-e-bot author:app/ibuild-e-bot`.
PRs authored by humans (including operator PRs to rig repos) are
invisible to her. Design decision pending — widen filter or keep
separation-of-concerns (human PRs = human review).

**Prior art:**
- HelmRelease: dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml (cron prompt line: `author:app/dev-e-bot author:app/ibuild-e-bot`)

**Status:** open

### [deployment] CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID not in rig-docs repo secrets

The deploy workflow skips the deploy step when the secrets are absent and
only emits a notice. Deploys currently happen via direct `wrangler pages deploy`
from the operator's laptop. Adding the secrets would enable per-PR preview
deploys and automatic main-branch publishing.

**Prior art:**
- .github/workflows/deploy.yml has the has_cf_secrets guard
- Cloudflare Pages project already exists: rig-research (created via wrangler)

**Status:** open

### [agents] ATL-E retired, no active coordinator agent

ATL-E (Stig-Johnny/atl-agent) was previously deployed as a k3s CronJob
on dell-stig-1 and handled handoff-stall Discord notifications. As of
~2026-03-26 it is no longer deployed (not present in
Stig-Johnny/cluster-gitops/apps/). The repo still exists but is dormant.
If an Epic needs a coordinator/team-lead role, decide whether to redeploy
ATL-E or build a replacement.

**Prior art:**
- Dormant repo: https://github.com/Stig-Johnny/atl-agent (last push 2026-03-26)
- Stig-Johnny/cluster-gitops/apps/ — no atl-agent ArgoCD manifest

**Status:** open

### [networking] iBuild-E cannot reach rig-conductor cluster-internal API

Empirically verified on 2026-04-19: from iBuild-E (Mac Mini, Oslo,
Tailscale IP 100.92.170.124), `curl http://rig-conductor-api.rig-conductor.svc.cluster.local:8080/api/health`
fails with DNS resolve timeout. The `*.svc.cluster.local` name only
resolves inside the GKE cluster via CoreDNS; Tailscale connects the
host but doesn't federate cluster DNS.
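
The failure can be reproduced from any host with a name-resolution probe. A minimal sketch, assuming a hypothetical `can_resolve` helper; the resolver is injectable so the logic can be exercised without a network.

```python
import socket

CONDUCTOR_HOST = "rig-conductor-api.rig-conductor.svc.cluster.local"


def can_resolve(host: str, port: int = 8080, resolver=socket.getaddrinfo) -> bool:
    """Return True if `host` resolves to at least one address.

    With the default resolver this blocks for as long as the system
    resolver takes to give up; `resolver` can be swapped for a stub.
    """
    try:
        return bool(resolver(host, port))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Expected to report False outside the GKE cluster, per the measurement above.
    print(CONDUCTOR_HOST, "resolves:", can_resolve(CONDUCTOR_HOST))
```

A `False` result separates "name does not resolve" from "service is slow", which is the distinction this gap turned on.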

Impact: iBuild-E today cannot:
  - Send TOKEN_USAGE / HEARTBEAT / CLI_COMPLETED events (`POST /api/events`)
  - Pick up assignments (`GET /api/assignments/next`)
  - Reach the cost Dashboard or `/api/costs/*`

iBuild-E is effectively disconnected from rig-conductor coordination.
She operates from GitHub issues + Discord channels directly.

Fix options (none implemented):
  a. Tailscale subnet router on a cluster node → advertise the cluster service IP range; `*.svc.cluster.local` names would also need DNS forwarding to cluster DNS
  b. Ingress / GCP load balancer for rig-conductor-api with mTLS
  c. Cloudflare tunnel into the cluster
  d. Accept the gap: iBuild-E never sees rig-conductor; she runs on GitHub-only flows

This has been a chronic "unknown" flagged by every cold-start test
(v1 through v5). Now measured.

**Prior art:**
- facts/agents.yaml — iBuild-E: deployed_in: Mac Mini (Oslo, Tailscale 100.92.170.124)
- curl rig-conductor-api.rig-conductor.svc.cluster.local:8080 → DNS resolve timeout after 3s (measured 2026-04-19)
- Every cold-start test session-log flagged 'iBuild-E routing through cluster-internal services — latency unknown'. Not latency — reachability. Zero, not high.

**Status:** open

### [cleanup] Plane residue — uninstall GitHub App + archive workspace

Plane was retired 2026-04-18 but the makeplane GitHub App is still
installed on the dashecorp org, and the Plane workspace at
app.plane.so is still alive (token revoked). Manual UI action needed.

**Prior art:**
- Retraction proposal: proposals/2026-04-18-docs-tooling-decision (What retires section)
- Retirement commit: dashecorp/infra PR #74

**Status:** open

## Architecture at a glance

```mermaid
flowchart LR
  H[Human] -->|user-story issue| RD[rig-docs]
  RD -->|dispatch| CE[rig-conductor]
  CE -->|assign| DE[Dev-E runtime pod]
  DE -->|MCP tool use| RMM[rig-memory-mcp]
  DE -->|author PR| RD
  RD -->|PR opens| RE[Review-E cron]
  RE -->|MCP tool use| RMM
  RE -->|approve or request changes| RD
  RD -->|merge| CFP[Cloudflare Pages]
  CFP -->|publish| S1[rig-research.pages.dev]
  RG[rig-gitops] -->|Flux deploys| DE
  RG -->|Flux deploys| RE
  RG -->|Flux deploys| CE
  RG -->|docs aggregation| S2[rig-docs.pages.dev]
```

## Conventions (rig-wide)

- **Docs are markdown with YAML frontmatter.** Required fields per [AGENTS.md](./AGENTS.md) in this repo: `title`, `description`, `type`, `audience`, `created`/`updated`, `topic`. The content schema itself enforces only `title` and `description` (see Frontmatter schema above).
- **Bidirectional linkage.** User story ↔ research ↔ proposal via `research_docs`, `proposal`, `user_story`, `source_research`. RelatedDocs component renders the graph.
- **Diagrams as code.** Mermaid source inline in markdown. No PNG or SVG committed. Source preserved post-render via `<details>` blocks.
- **Per-repo CLAUDE.md** auto-loads when Claude Code starts a session in that repo's cwd. Claude Code reads `CLAUDE.md`, not `AGENTS.md`: AGENTS.md is the cross-vendor standard, but CLAUDE.md is the file the loader picks up. Same-repo local `@AGENTS.md` imports work; cross-repo `@owner/repo/file` imports do **not** fetch from GitHub (imports are filesystem-only, max 5 hops).
- **Rig-wide agent instructions live in TWO places:** (1) each running agent's HelmRelease `character.personality` prompt (authoritative for Dev-E, Review-E in-cluster), (2) each repo's root `CLAUDE.md` (authoritative for interactive sessions). Both include the BRAIN.md fetch at session start.
- **Closes #N required** in PR bodies. Review-E blocks on this.
- **Memory MCP scope:** operational / ephemeral state only. Durable knowledge goes to rig-docs.

## Token-efficient cold start

When you pick up a new Epic with blank memory, the cheapest order of operations:

1. **Fetch this file** (`https://rig-research.pages.dev/BRAIN.md`, public, no auth) — ~27 KB.
2. Fetch [`/llms.txt`](https://rig-research.pages.dev/llms.txt) for the research hub topic index — ~2 KB.
3. Identify 1-3 relevant research / proposal docs, fetch raw — ~5-15 KB.
4. Fetch target repo's `AGENTS.md` (each repo's is ≤8 KB) — ~5 KB.
5. `read_memories` from Memory MCP scoped to repo + topic — ~2 KB.

Total cold-start context: ~41-51 KB (27 + 2 + 5-15 + 5 + 2). Leaves the rest of the budget for actual work.
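
Summing the per-step estimates in the list gives a range of 41 to 51 KB. A short sketch of that arithmetic, with a rough token conversion that assumes about 4 bytes per token (a rule of thumb, not a measured figure for this file):

```python
# (low_kb, high_kb) per step, copied from the numbered list above.
STEPS_KB = {
    "BRAIN.md": (27, 27),
    "llms.txt": (2, 2),
    "research / proposal docs": (5, 15),
    "target repo AGENTS.md": (5, 5),
    "read_memories": (2, 2),
}

low_kb = sum(low for low, _ in STEPS_KB.values())
high_kb = sum(high for _, high in STEPS_KB.values())


def approx_tokens(kb: int, bytes_per_token: int = 4) -> int:
    """Rough token estimate; bytes_per_token is an assumed rule of thumb."""
    return kb * 1024 // bytes_per_token


print(f"{low_kb}-{high_kb} KB, roughly "
      f"{approx_tokens(low_kb)}-{approx_tokens(high_kb)} tokens")
```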

## When this file needs updating

Hand-maintained fields live in `facts/*.yaml`. Update them when the reality they describe changes:

- `facts/repos.yaml` — **annotations only** (purpose, depends_on, used_by, agents_md, docs_surface). The repo list itself is auto-derived from `gh api` on every compile. Adding a new annotation, or updating an existing one, happens here.
- `facts/surfaces.yaml` — URLs, API endpoints, MCP tools. Update when an endpoint changes or a new surface is published.
- `facts/agents.yaml` — agent deployment instances. Compile validates each `manifest:` path exists on GitHub and warns on drift (how ATL-E retirement was caught).
- `facts/flows.yaml` — documented rig processes. Update after retrospectives.
- `facts/schema.yaml` — mirrors the Zod schema in `src/content.config.ts`. Keep in sync manually when the schema changes.
- `facts/events.yaml` — rig-conductor event types. Keep in sync with `MapToEvent` in the C# source.
- `facts/backlog.yaml` — known gaps. Add when identified; remove when closed.

Then run `npm run brain`. CI (build workflow) runs `brain:check` and fails on drift.
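
The drift check amounts to regenerating the file and comparing it with the committed copy. The sketch below shows that general technique in Python; it is not the rig's actual `brain` script (which runs under npm and is not shown here), and the function names are hypothetical.

```python
import difflib


def drift(generated: str, committed: str, name: str = "BRAIN.md") -> list:
    """Return unified-diff lines; an empty list means the file is in sync."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            generated.splitlines(),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (regenerated)",
            lineterm="",
        )
    )


def check(generated: str, committed: str) -> int:
    """Exit-code style result: 0 when in sync, 1 on drift (diff is printed)."""
    diff = drift(generated, committed)
    for line in diff:
        print(line)
    return 1 if diff else 0
```

A CI step would pass the freshly compiled text and the committed file contents to `check` and fail the job on a non-zero result.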