
Rig Brain

Fresh-agent entry point. Read this first. One fetch (~18 KB) gives you the repo manifest, deployed surfaces, agent instances, primary flows, frontmatter schema, event types, and the current backlog. Every claim links to its source.

Compiled from facts/*.yaml + live GitHub state (gh api /orgs/dashecorp/repos for the repo list; manifest validation for agents). Do not hand-edit BRAIN.md. Regenerate with npm run brain. CI runs --check and fails on drift.

The Dashecorp rig is an autonomous coding-agent system. A human posts a user story; agents research, propose, code, review, and ship. Canonical docs live in dashecorp/rig-docs (Astro Starlight); operational memory lives in a Postgres + pgvector Memory MCP; deployments are Flux-managed in a dashecorp GKE cluster.

Canonical brain entry point (this file, rendered)


LLM site map (research, proposals, user-stories)


Research, proposals, user-stories (rendered Starlight site)


Aggregated engineering docs (architecture, guides, whitepapers, per-repo docs)

  • URL: https://rig-docs.pages.dev
  • Type: mkdocs-material
  • Source: dashecorp/rig-gitops (docs-site/)
  • Note: Built by scripts/build-docs.sh which copies from rig-gitops/docs/ and each rig repo’s docs/. Different scope from rig-research.pages.dev.

Conductor-E API

  • URL: http://conductor-e-api.conductor-e.svc.cluster.local:8080
  • Type: rest-api
  • Visibility: cluster-internal-only
  • Endpoints:
    • GET /api/reviews/next — Claim next PR review assignment. Query: agentId=review-e
    • POST /api/events — Submit events (REVIEW_PASSED, REVIEW_DISPUTED, HEARTBEAT)
    • GET /api/agents — Status of all registered agents

Memory MCP

  • Type: mcp-server
  • Package: @dashecorp/rig-memory-mcp
  • Tools:
    • read_memories — Query prior memory by topic/repo/scope with vector similarity
    • write_memory — Persist a new memory with scope/kind/importance/tags
    • mark_used — Increment hit_count on a memory that informed a decision

Discord

  • Type: discord
  • Channels: #dev-e, #review-e, #ibuild-e, #admin
  • Note: Agents post thread updates here; humans watch for stuck / pending state.
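A minimal in-memory sketch of the three Memory MCP tools listed above, for orientation only. It assumes a row shape with repo, scope, kind, importance, tags, hit_count, and a precomputed embedding; cosine similarity over plain arrays stands in for pgvector, topic filtering is omitted, and `MemoryRow` and `InMemoryStore` are hypothetical names, not the @dashecorp/rig-memory-mcp API.

```typescript
// Hypothetical row shape; the real rig-memory-mcp schema may differ.
interface MemoryRow {
  id: number;
  repo: string;
  scope: string;
  kind: string;
  importance: number;
  tags: string[];
  content: string;
  embedding: number[];
  hit_count: number;
}

// Cosine similarity stands in for pgvector's distance operators.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical store mirroring write_memory / read_memories / mark_used.
class InMemoryStore {
  private rows: MemoryRow[] = [];
  private nextId = 1;

  // write_memory: persist a new row; hit_count starts at 0.
  writeMemory(row: Omit<MemoryRow, "id" | "hit_count">): number {
    const id = this.nextId++;
    this.rows.push({ ...row, id, hit_count: 0 });
    return id;
  }

  // read_memories: filter by repo and scope, then rank by similarity to the query vector.
  readMemories(query: number[], repo: string, scope: string, limit: number): MemoryRow[] {
    return this.rows
      .filter((r) => r.repo === repo && r.scope === scope)
      .map((r) => ({ r, score: cosine(query, r.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit)
      .map((x) => x.r);
  }

  // mark_used: bump hit_count on a memory that informed a decision.
  markUsed(id: number): void {
    for (const r of this.rows) {
      if (r.id === id) r.hit_count += 1;
    }
  }
}
```

The sketch returns the stored objects rather than copies to stay short; a real server would return fresh rows from a SQL query.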

Live from gh api /orgs/dashecorp/repos merged with facts/repos.yaml annotations. Archived repos are dropped automatically.

| Repo | Purpose | Language | Depends on | AGENTS.md |
| --- | --- | --- | --- | --- |
| rig-gitops | GitOps manifests (Flux HelmReleases, Kustomize bases) and the canonical AGENTS.md shared by every rig repo via @dashecorp/rig-gitops/AGENTS… | shell | | compiled |
| rig-agent-runtime | The AI agent runtime (Node) — one image that deploys as Dev-E, Review-E, or iBuild-E depending on character file + environment. Handles prom… | javascript | rig-memory-mcp, conductor-e | imports-rig-gitops |
| rig-memory-mcp | MCP server backing persistent agent memory with Postgres + pgvector. Exposes read_memories / write_memory / mark_used tools consumed b… | javascript | postgres-pgvector | claude-md |
| conductor-e | Event store + dispatch service (C# + Marten + Postgres). Receives PR/issue events, assigns work, tracks turns/cost/stuck state, serves the … | csharp | postgres, pgvector | imports-rig-gitops |
| rig-docs | Research, proposals, user-stories, and rig-wide reference (Astro Starlight). This repo — you’re reading its BRAIN.md. Deploys to rig-researc… | astro | | hand |
| rig-tools | Shell scripts, Git hooks, and workflow sync for AI-assisted development. Developer tooling, not deployed. The one repo without an AGENTS.md | shell | | none |
| infra | OpenTofu/Terraform for GitHub org settings, Cloudflare (DNS, Pages, tunnels), GCP (GKE cluster hosting the rig), and Tailscale ACL/DNS. Plan… | hcl | | imports-rig-gitops |
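A minimal sketch of how a table like the one above can be compiled: take the live org listing, drop archived repos, and overlay the hand-written annotations. It assumes the GitHub REST "list organization repositories" response fields `name`, `language`, and `archived`, and annotations keyed by repo name; `Annotation`, `RepoRow`, and `mergeRepos` are hypothetical, not the brain generator's code. The language is passed through unchanged, so any mapping to values such as csharp or hcl is not modelled.

```typescript
// Subset of fields returned by GET /orgs/{org}/repos.
interface GhRepo {
  name: string;
  language: string | null;
  archived: boolean;
}

// Hypothetical shape of one facts/repos.yaml entry (annotations only).
interface Annotation {
  purpose?: string;
  depends_on?: string[];
  agents_md?: string;
}

interface RepoRow {
  repo: string;
  purpose: string;
  language: string;
  dependsOn: string[];
  agentsMd: string;
}

function mergeRepos(live: GhRepo[], annotations: Record<string, Annotation>): RepoRow[] {
  return live
    .filter((r) => !r.archived) // archived repos drop out automatically
    .map((r) => {
      const a: Annotation = annotations[r.name] || {};
      return {
        repo: r.name,
        purpose: a.purpose || "(no annotation)",
        language: r.language || "unknown",
        dependsOn: a.depends_on || [],
        agentsMd: a.agents_md || "none",
      };
    });
}
```

Keeping the repo list live and only the annotations hand-written means a new repo shows up on the next compile even before anyone annotates it.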

Dev-E

  • Runtime: dashecorp/rig-agent-runtime
  • Deployed in: GKE cluster (dashecorp)
  • Manifest: dashecorp/rig-gitops/apps/dev-e/
  • Variants:
    • node: apps/dev-e/rig-agent-helmrelease.yaml
    • python: apps/dev-e/python-helmrelease.yaml
    • dotnet: apps/dev-e/dotnet-helmrelease.yaml
  • Character: baked into HelmRelease values
  • Triggers: Conductor-E dispatch (issue.assigned events)

Review-E

  • Runtime: dashecorp/rig-agent-runtime
  • Deployed in: GKE cluster (dashecorp)
  • Manifest: dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml
  • Cron: */5 * * * *
  • Search filter: org:dashecorp is:pr is:open author:app/dev-e-bot author:app/ibuild-e-bot -reviewed-by:app/review-e-bot
  • Discord: #review-e

iBuild-E

  • Runtime: dashecorp/rig-agent-runtime
  • Deployed in: Mac Mini (Oslo, Tailscale 100.92.170.124)
  • Manifest: not-in-cluster
  • Discord: #ibuild-e
  • Notes: Apple Silicon host, Xcode + App Store Connect. Auto-reauth cron refreshes OAuth every 5 min. Separate from the GKE-hosted agents because iOS builds require macOS.
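Review-E's search filter can be read as a predicate over PRs. A minimal sketch, assuming GitHub ORs the two repeated `author:` qualifiers (a PR has a single author, so an AND would match nothing) and that `-reviewed-by:` excludes any PR Review-E has already reviewed. `PullRequest`, `reviewSearchQuery`, and `isReviewCandidate` are hypothetical; the real agent sends the search string to GitHub on its `*/5 * * * *` cron.

```typescript
// Hypothetical, simplified view of a PR for local filtering.
interface PullRequest {
  org: string;
  state: "open" | "closed";
  author: string;
  reviewedBy: string[];
}

const BOT_AUTHORS = ["app/dev-e-bot", "app/ibuild-e-bot"];
const REVIEWER = "app/review-e-bot";

// Reassemble the search string used by the cron prompt.
function reviewSearchQuery(): string {
  const authors = BOT_AUTHORS.map((a) => `author:${a}`).join(" ");
  return `org:dashecorp is:pr is:open ${authors} -reviewed-by:${REVIEWER}`;
}

// Local equivalent of the same filter; assumes OR semantics for the author qualifiers.
function isReviewCandidate(pr: PullRequest): boolean {
  return (
    pr.org === "dashecorp" &&
    pr.state === "open" &&
    BOT_AUTHORS.indexOf(pr.author) !== -1 &&
    pr.reviewedBy.indexOf(REVIEWER) === -1
  );
}
```

A PR opened by a human account fails the author check, which is why human-authored PRs never reach Review-E.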

Trigger: Human opens a user-story GitHub issue in dashecorp/rig-docs

  1. Conductor-E — Scans open issues, classifies, dispatches to appropriate agent
  2. Dev-E — Reads issue + relevant research; authors research / proposal / code PR
  3. Review-E (cron every 5 min) — Finds PR, reviews against AGENTS.md + memory, requests changes or approves
  4. Human — Merges (or Review-E’s approval satisfies branch protection; auto-merge fires)
  5. Cloudflare Pages — Redeploys rig-research.pages.dev and rig-docs.pages.dev

Complete when: issue closed via Closes …

Trigger: An Epic needs investigation before implementation

  1. Author a dated research/YYYY-MM-DD-slug.md with user_story frontmatter
  2. Author proposals/YYYY-MM-DD-slug.md with source_research frontmatter
  3. The user-story file gets research_docs and proposal fields pointing back
  4. The RelatedDocs component auto-renders the graph; no manual cross-linking

Rules:

  • bidirectional links required
  • schema enforced in src/content.config.ts
  • CI rejects PRs missing required fields
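The bidirectional-link rule above lends itself to a mechanical check. A minimal sketch, assuming each doc is reduced to its path plus the linkage fields from the frontmatter schema; `DocLinks` and `findMissingBacklinks` are hypothetical and are not the rig-docs CI check, which enforces the schema through src/content.config.ts.

```typescript
// Hypothetical reduced view of one doc's linkage frontmatter.
interface DocLinks {
  path: string; // e.g. "research/2026-04-18-docs-tools-evaluation"
  user_story?: string; // research/proposal -> story
  research_docs?: string[]; // story -> research
  proposal?: string; // story -> proposal
  source_research?: string[]; // proposal -> research
}

// Returns one message per one-way or dangling link.
function findMissingBacklinks(docs: DocLinks[]): string[] {
  const byPath: Record<string, DocLinks> = {};
  for (const d of docs) byPath[d.path] = d;
  const errors: string[] = [];

  for (const d of docs) {
    // Research/proposal -> story must be mirrored on the story.
    if (d.user_story) {
      const story = byPath[d.user_story];
      if (!story) {
        errors.push(`${d.path}: user_story ${d.user_story} not found`);
      } else if (/^research\//.test(d.path) && (story.research_docs || []).indexOf(d.path) === -1) {
        errors.push(`${story.path}: research_docs is missing ${d.path}`);
      } else if (/^proposals\//.test(d.path) && story.proposal !== d.path) {
        errors.push(`${story.path}: proposal should be ${d.path}`);
      }
    }
    // Story -> research must be mirrored by user_story on the research doc.
    for (const r of d.research_docs || []) {
      const doc = byPath[r];
      if (!doc || doc.user_story !== d.path) errors.push(`${d.path}: ${r} does not link back via user_story`);
    }
    // Proposal -> research must point at docs that exist.
    for (const r of d.source_research || []) {
      if (!byPath[r]) errors.push(`${d.path}: source_research ${r} not found`);
    }
  }
  return errors;
}
```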

Trigger: Fresh agent with blank memory receives an Epic or task

  1. WebFetch https://rig-research.pages.dev/brain/ (or raw BRAIN.md)
  2. Parse facts/repos.yaml equivalent in BRAIN.md — learn repo manifest
  3. Parse facts/surfaces.yaml equivalent — learn URLs and endpoints
  4. WebFetch https://rig-research.pages.dev/llms.txt for topic index
  5. WebFetch relevant research/proposal docs directly via raw URL
  6. For the target repo, fetch its AGENTS.md (compiled or imports-rig-gitops)
  7. read_memories scoped to repo + topic via Memory MCP
  8. Begin work with full context in ~15 KB total.

Token budget: ~15 KB read, leaves 200K+ for actual work on Opus.

Trigger: Weekly scheduled Lint job

  1. Scan Memory MCP for rows with importance >= 4 AND hit_count >= 5
  2. For each candidate, check whether the docs already cover the topic (BM25 similarity)
  3. If not covered, propose a docs PR with the memory content promoted
  4. Human approves the PR; the merge triggers a redeploy

Status: not-yet-built (design in research/2026-04-18-docs-memory-drift-lint)
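Steps 1 and 2 of the lint job can be sketched as a candidate filter followed by a BM25 coverage check. This is a sketch under stated assumptions, since the lint is not built: memory rows are assumed to expose `importance` and `hit_count`, text is tokenised on lowercase alphanumerics, BM25 uses the common defaults k1 = 1.2 and b = 0.75, and `COVERAGE_THRESHOLD` is a hypothetical cut-off. None of these parameters come from the lint design.

```typescript
// Hypothetical memory row; only the fields the gate needs.
interface MemoryCandidate {
  id: number;
  content: string;
  importance: number;
  hit_count: number;
}

// Step 1: the promotion gate (importance >= 4 AND hit_count >= 5).
function promotionCandidates(rows: MemoryCandidate[]): MemoryCandidate[] {
  return rows.filter((r) => r.importance >= 4 && r.hit_count >= 5);
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Okapi BM25 score of `query` against each document in `corpus`.
function bm25Scores(query: string, corpus: string[], k1 = 1.2, b = 0.75): number[] {
  const docs = corpus.map(tokenize);
  const n = docs.length;
  const avgLen = docs.reduce((s, d) => s + d.length, 0) / Math.max(n, 1);
  const terms = tokenize(query);
  return docs.map((doc) => {
    let score = 0;
    for (const t of terms) {
      const tf = doc.filter((w) => w === t).length;
      if (tf === 0) continue;
      const df = docs.filter((d) => d.indexOf(t) !== -1).length;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const COVERAGE_THRESHOLD = 1.0; // hypothetical cut-off, not from the design doc

// Step 2: a memory counts as covered if any doc scores at or above the threshold.
function isCovered(memory: string, docs: string[]): boolean {
  return bm25Scores(memory, docs).some((s) => s >= COVERAGE_THRESHOLD);
}
```

Only candidates for which `isCovered` returns false would go on to step 3 and become a docs PR.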

Trigger: A research / proposal / user-story needs a diagram

Rule: Mermaid source inline in a fenced code block. No PNG or SVG ever committed.

Rendering: the remark-mermaid plugin wraps the block in <figure> with <pre class=mermaid> and a <details> source block; mermaid.js renders client-side; the source is preserved post-render for agent readers.
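A minimal sketch of the wrapping step: HTML-escape the Mermaid source, emit it once inside `<pre class="mermaid">` for mermaid.js to render, and once inside `<details>` so the text survives rendering. The exact markup and the `escapeHtml` / `wrapMermaid` names are assumptions, not the remark-mermaid plugin's actual output.

```typescript
// Escape the characters that would otherwise be parsed as HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical markup: mermaid.js renders the first <pre> into an SVG in the
// browser, while the <details> copy stays as plain text for agent readers.
function wrapMermaid(source: string): string {
  const code = escapeHtml(source.trim());
  return [
    "<figure>",
    `<pre class="mermaid">${code}</pre>`,
    "<details><summary>View Mermaid source</summary>",
    `<pre><code class="language-mermaid">${code}</code></pre>`,
    "</details>",
    "</figure>",
  ].join("\n");
}
```

Emitting the source twice costs a few hundred bytes per diagram but means a raw-HTML fetch never sees only an empty SVG container.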

Frontmatter schema (for authoring rig-docs content)

  • type (optional): one of research | proposal | decision | reference | user-story | runbook
  • audience (optional): one of human | agent | both — not a free-form array
  • Required: title, description
  • Optional fields (path values are relative to src/content/docs/, no leading slash, no .md or .mdx extension):
    • type — See type enum above.
    • audience — See audience enum above.
    • created — ISO date string YYYY-MM-DD.
    • updated — ISO date string YYYY-MM-DD.
    • topic — Short slug grouping related docs.
    • source_refs — Array of URLs (external sources supporting this doc).
    • supersedes — Path to doc this replaces (no leading slash, no .md extension).
    • superseded_by — Path to newer doc that replaces this (same format).
    • user_story — (research/proposal only) Path to the user story this supports.
    • research_docs — (user-story only) Array of research doc paths this story spawned.
    • proposal — (user-story only) Path to the proposal answering this story.
    • source_research — (proposal only) Array of research paths this proposal synthesises.
    • github_issue — (user-story only) Full GitHub issue URL. Omit the field entirely if there is no issue — do NOT use empty string.

Path examples: user-stories/2026-04-18-docs-memory-strategy, research/2026-04-18-docs-tools-evaluation, proposals/2026-04-18-docs-tooling-decision.

Omit a field entirely when it has no value — do not use empty string.
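A hand-rolled sketch of the rules above, for illustration only: the authoritative check is the Zod schema in src/content.config.ts. It assumes the frontmatter has already been parsed from YAML into a plain object and checks only the rules listed here; `validateFrontmatter`, `isDocPath`, and `inEnum` are hypothetical names.

```typescript
// Hypothetical validator; not the Zod schema in src/content.config.ts.
type Frontmatter = Record<string, unknown>;

const TYPES = ["research", "proposal", "decision", "reference", "user-story", "runbook"];
const AUDIENCES = ["human", "agent", "both"];
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;
const ISSUE_URL = /^https:\/\/github\.com\/[^\/]+\/[^\/]+\/issues\/\d+$/;

// Relative to src/content/docs/: no leading slash, no .md/.mdx extension.
function isDocPath(p: unknown): boolean {
  return typeof p === "string" && p.length > 0 && p.charAt(0) !== "/" && !/\.mdx?$/.test(p);
}

function inEnum(v: unknown, allowed: string[]): boolean {
  return typeof v === "string" && allowed.indexOf(v) !== -1;
}

function validateFrontmatter(fm: Frontmatter): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(fm)) {
    if (fm[key] === "") errors.push(`${key}: omit the field instead of using an empty string`);
  }
  for (const key of ["title", "description"]) {
    if (typeof fm[key] !== "string") errors.push(`${key}: required`);
  }
  if (fm["type"] !== undefined && !inEnum(fm["type"], TYPES)) errors.push("type: not in enum");
  if (fm["audience"] !== undefined && !inEnum(fm["audience"], AUDIENCES)) {
    errors.push("audience: must be a single value, human | agent | both");
  }
  for (const key of ["created", "updated"]) {
    const v = fm[key];
    if (v !== undefined && !(typeof v === "string" && ISO_DATE.test(v))) errors.push(`${key}: expected YYYY-MM-DD`);
  }
  for (const key of ["user_story", "proposal", "supersedes", "superseded_by"]) {
    const v = fm[key];
    if (v !== undefined && v !== "" && !isDocPath(v)) errors.push(`${key}: expected a relative doc path`);
  }
  for (const key of ["research_docs", "source_research"]) {
    const v = fm[key];
    if (v !== undefined && !(Array.isArray(v) && v.every(isDocPath))) errors.push(`${key}: expected an array of doc paths`);
  }
  const issue = fm["github_issue"];
  if (issue !== undefined && issue !== "" && !(typeof issue === "string" && ISSUE_URL.test(issue))) {
    errors.push("github_issue: expected a full GitHub issue URL");
  }
  return errors;
}
```

For example, `audience: ["agent"]`, `github_issue: ""`, and `user_story: "/user-stories/x.md"` each produce one error, matching the "single enum value", "omit rather than empty string", and "relative path" rules.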

These whitepapers live at dashecorp/rig-gitops/docs/whitepaper/*.md (private repo — requires gh auth to fetch). BRAIN.md surfaces their titles + 1-line summaries so agents know what exists. Full content must be fetched with: gh api /repos/dashecorp/rig-gitops/contents/docs/whitepaper/<file> --jq .download_url | xargs curl -sL.

  • Whitepaper index (index.md) — Entry point listing all whitepaper sections and their companion docs.
  • MVP scope (mvp-scope.md) — What the rig does in the minimum viable release. Gatekeeper for “is this in scope?”
  • Design principles (principles.md) — First principles (measurement precedes trust; honest gaps; provider portability).
  • Trust model (trust-model.md) — Who can approve what, which gates exist, human-in-the-loop rules.
  • Safety (safety.md) — Dangerous-command guards, sandboxing, blast-radius containment.
  • Security (security.md) — Secrets handling, attestation, audit trail, SOPS+age.
  • Provider portability (provider-portability.md) — Multi-runtime (Claude Code, Codex CLI, Gemini CLI) via OTel GenAI conventions. Swap runtime without changing backend.
  • Observability — OTel, Langfuse, Prometheus, SLOs (observability.md) — Self-hosted Langfuse (agent traces) + Grafana Cloud (infra) + local Prometheus (SLO gates) hybrid. Native OTel via CLAUDE_CODE_ENABLE_TELEMETRY=1. OTel Collector runs per-cluster, routes LLM traces to Langfuse, infra to managed. Per implementation-status: OTel Collector “Partial” (deployed for Conductor-E, agents not yet emitting), Langfuse “Planned”, cost dashboard “Partial” (TokenUsageProjection exists, no LiteLLM proxy yet).
  • Cost framework (cost-framework.md) — Budget policy, per-model rate tables, cost attribution strategy. Companion to observability.
  • Self-healing (self-healing.md) — Automatic recovery loops, StaleHeartbeatService, escalation severity routing.
  • Memory architecture (memory.md) — Memory MCP scope, importance/hit_count model, promotion-to-docs threshold design.
  • Quality and evaluation (quality-and-evaluation.md) — How the rig evaluates its own output. Judge-agent pattern, fixed rubrics.
  • Drift detection (drift-detection.md) — Schema drift, docs drift, infra drift — detection thresholds and response.
  • Development process (development-process.md) — Issue → Epic → research → proposal → PR lifecycle, agent-human gates.
  • Example first story (example-first-story.md) — Worked walkthrough of one Epic end-to-end.
  • Glossary (glossary.md) — Rig-specific terminology (Epic, proposal, Conductor-E, Review-E, etc).
  • Known limitations (limitations.md) — Honest catalog of what the rig can’t do today.
  • Implementation status (implementation-status.md) — Single source of truth for deployed vs planned per capability. 78 tracked across 11 domains; 21 deployed/partial (27%), 44 planned/deferred (56%). Every capability named in the whitepapers gets a row with status + whitepaper section + ticket/evidence.
  • Tool choices (ADRs) (tool-choices.md) — Decision records for tooling. Includes rejection list with rationale.

Most agents should start with: implementation-status.md (what’s deployed vs planned — 78 tracked capabilities) and whichever domain-specific whitepaper matches the Epic.

Conductor-E event types (POST /api/events)


All events from dashecorp/conductor-e/src/ConductorE.Core/UseCases/SubmitEvent.cs MapToEvent switch. Names only here — fetch /events.md for full field schemas (no auth required).

Pipeline (issue → PR → merge → deploy): ISSUE_APPROVED, ISSUE_ASSIGNED, ISSUE_UNASSIGNED, WORK_STARTED, BRANCH_CREATED, PR_CREATED, CI_PASSED, CI_FAILED, REVIEW_ASSIGNED, REVIEW_PASSED, REVIEW_DISPUTED, HUMAN_GATE_TRIGGERED, HUMAN_GATE_REMINDER, MERGED, MERGE_GATE_WAITING, MERGE_GATE_MERGED, MERGE_GATE_TIMEOUT, MAIN_CI_STARTED, MAIN_CI_PASSED, MAIN_CI_FAILED, DEPLOYED_STAGING, DEPLOYED_PRODUCTION, SMOKE_PASSED, SMOKE_FAILED, BUILD_FAILED, VERIFIED, ISSUE_DONE, ESCALATED, MILESTONE_COMPLETE, DUPLICATE_PR_CLOSED

Direct PR path (no issue): PR_OPENED, PR_REVIEW_ASSIGNED, PR_REVIEW_APPROVED, PR_REVIEW_REJECTED

Agent lifecycle: AGENT_STARTED, HEARTBEAT, AGENT_STUCK

CLI sessions: CLI_STARTED, CLI_PROGRESS, CLI_COMPLETED

Observability (cost + tooling): TOKEN_USAGE, TOOL_USED

Memory MCP: MEMORY_WRITE, MEMORY_READ, MEMORY_HIT_USED
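A minimal lookup sketch built only from the event names listed above, grouped the same way. The category keys, `categoryOf`, and `buildEventRequest` are hypothetical, and the request body (a `type` field plus arbitrary extra fields) is an assumption: the real field schemas live in /events.md.

```typescript
// Event names copied from the lists above; category keys are hypothetical.
const EVENT_CATEGORIES: Record<string, string[]> = {
  pipeline: [
    "ISSUE_APPROVED", "ISSUE_ASSIGNED", "ISSUE_UNASSIGNED", "WORK_STARTED", "BRANCH_CREATED",
    "PR_CREATED", "CI_PASSED", "CI_FAILED", "REVIEW_ASSIGNED", "REVIEW_PASSED", "REVIEW_DISPUTED",
    "HUMAN_GATE_TRIGGERED", "HUMAN_GATE_REMINDER", "MERGED", "MERGE_GATE_WAITING", "MERGE_GATE_MERGED",
    "MERGE_GATE_TIMEOUT", "MAIN_CI_STARTED", "MAIN_CI_PASSED", "MAIN_CI_FAILED", "DEPLOYED_STAGING",
    "DEPLOYED_PRODUCTION", "SMOKE_PASSED", "SMOKE_FAILED", "BUILD_FAILED", "VERIFIED", "ISSUE_DONE",
    "ESCALATED", "MILESTONE_COMPLETE", "DUPLICATE_PR_CLOSED",
  ],
  directPr: ["PR_OPENED", "PR_REVIEW_ASSIGNED", "PR_REVIEW_APPROVED", "PR_REVIEW_REJECTED"],
  agentLifecycle: ["AGENT_STARTED", "HEARTBEAT", "AGENT_STUCK"],
  cliSession: ["CLI_STARTED", "CLI_PROGRESS", "CLI_COMPLETED"],
  observability: ["TOKEN_USAGE", "TOOL_USED"],
  memory: ["MEMORY_WRITE", "MEMORY_READ", "MEMORY_HIT_USED"],
};

function categoryOf(eventType: string): string | undefined {
  for (const category of Object.keys(EVENT_CATEGORIES)) {
    if (EVENT_CATEGORIES[category].indexOf(eventType) !== -1) return category;
  }
  return undefined;
}

// Hypothetical request for POST /api/events; check /events.md for the real fields.
function buildEventRequest(base: string, eventType: string, fields: Record<string, unknown>) {
  if (categoryOf(eventType) === undefined) throw new Error(`unknown event type ${eventType}`);
  return {
    url: `${base}/api/events`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ type: eventType, ...fields }),
  };
}
```

Rejecting unknown names locally catches typos before they reach the MapToEvent switch in Conductor-E.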

Cold-start agents should see these so they don’t re-discover what’s already identified. Each gap links to prior_art — existing stubs, research, or PRs that have already touched it. When a gap is being worked, linked_user_story points to the user story; when closed, the entry is removed from facts/backlog.yaml.

[observability] Cost dashboard (visualisation) missing — data pipeline partially exists


Per-agent / per-task cost data IS being collected today: TokenUsageProjection in Conductor-E aggregates TOKEN_USAGE events (AgentId, Repo, IssueNumber, Model, InputTokens, OutputTokens, CostUsd, Category). CLI_COMPLETED events also carry CostUsd + Turns + DurationMs. What’s missing is the dashboard surface and hard-enforcement via LiteLLM proxy. Rough current spend: ~$5-15/day fleet-wide (order-of-magnitude only — instrumented data exists, no visualisation yet). Implementation-status whitepaper marks this “Partial”.

Prior art:

  • TokenUsageProjection already deployed in conductor-e (per implementation-status.md Observability section)
  • TOKEN_USAGE + CLI_COMPLETED events defined and emitted — see facts/events.yaml
  • Cost framework design: rig-gitops/docs/whitepaper/cost-framework.md (private)
  • Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; see facts/whitepapers.yaml for summary)
  • Empty stub directory docs/cost-dashboard/ in rig-gitops (per docs-state-audit)
  • LiteLLM proxy not yet deployed — blocks hard budget enforcement

Status: partial
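The per-agent rollup a cost dashboard would chart can be approximated as a fold over TOKEN_USAGE events. A minimal sketch using the field names listed above; grouping by AgentId and the `costByAgent` name are assumptions, and this is not the C# TokenUsageProjection itself.

```typescript
// Field names from the TOKEN_USAGE description above; types are assumed.
interface TokenUsageEvent {
  AgentId: string;
  Repo: string;
  IssueNumber: number | null;
  Model: string;
  InputTokens: number;
  OutputTokens: number;
  CostUsd: number;
  Category: string;
}

interface AgentTotals {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Sum tokens and cost per agent.
function costByAgent(events: TokenUsageEvent[]): Record<string, AgentTotals> {
  const totals: Record<string, AgentTotals> = {};
  for (const e of events) {
    const t = totals[e.AgentId] || (totals[e.AgentId] = { inputTokens: 0, outputTokens: 0, costUsd: 0 });
    t.inputTokens += e.InputTokens;
    t.outputTokens += e.OutputTokens;
    t.costUsd += e.CostUsd;
  }
  return totals;
}
```

The same fold keyed by Repo or IssueNumber would give per-repo or per-task views.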

[observability] OTel collector deployed for Conductor-E only — agents not yet emitting


OpenTelemetry Collector is “Partial”: deployed for Conductor-E; agent pods (Dev-E, Review-E, iBuild-E) have not yet enabled native OTel via CLAUDE_CODE_ENABLE_TELEMETRY=1. Langfuse (self-hosted) and Grafana Cloud ingest are both “Planned”. Full design in the observability whitepaper.

Prior art:

  • Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
  • Implementation status: whitepaper/implementation-status.md marks OTel Collector “Partial”, Langfuse “Planned”
  • rig-memory-mcp/events.js FUTURE comment: migrate to OTel GenAI spans
  • Env var to enable native OTel: CLAUDE_CODE_ENABLE_TELEMETRY=1 + OTEL_EXPORTER_OTLP_ENDPOINT pointed at the in-cluster collector

Status: partial

[docs-memory] Docs-memory drift lint not implemented


Weekly LLM-as-judge pass that promotes memory→docs (when importance≥4 AND hit_count≥5), flags stale research, catches orphan docs. Designed but no runtime built.

Prior art:

  • Full design in research/2026-04-18-docs-memory-drift-lint
  • Parent user story: user-stories/2026-04-18-docs-memory-strategy
  • Principles synthesis: research/2026-04-18-docs-vs-memory-principles

Linked user story: user-stories/2026-04-18-docs-memory-strategy

Status: open

[docs-surfaces] Two docs surfaces with overlapping scope


rig-docs.pages.dev (MkDocs aggregation from rig-gitops/docs-site/) and rig-research.pages.dev (Starlight research hub from dashecorp/rig-docs). Both host rig docs; boundaries not formalised. Agents currently learn this empirically. Eventually unify or formalise the split.

Prior art:

  • MkDocs site built by dashecorp/rig-gitops/scripts/build-docs.sh
  • Starlight site defined in dashecorp/rig-docs/ (this repo)
  • Docs tooling decision: proposals/2026-04-18-docs-tooling-decision (picked Starlight for research hub; MkDocs kept for aggregation)

Status: open

[agents] Review-E does not scan human-authored PRs


Review-E’s cron filter is author:app/dev-e-bot author:app/ibuild-e-bot. PRs authored by humans (including operator PRs to rig repos) are invisible to her. Design decision pending — widen filter or keep separation-of-concerns (human PRs = human review).

Prior art:

  • HelmRelease: dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml (cron prompt line: author:app/dev-e-bot author:app/ibuild-e-bot)

Status: open

[deployment] CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID not in rig-docs repo secrets


The deploy workflow skips the deploy step when the secrets are absent (it only emits a notice). Current deploys happen via a direct wrangler pages deploy from the operator’s laptop. Adding the secrets would enable per-PR preview deploys and automatic main-branch publishing.

Prior art:

  • .github/workflows/deploy.yml has the has_cf_secrets guard
  • Cloudflare Pages project already exists: rig-research (created via wrangler)

Status: open

[agents] ATL-E retired, no active coordinator agent


ATL-E (Stig-Johnny/atl-agent) was previously deployed as a k3s CronJob on dell-stig-1 and handled handoff-stall Discord notifications. As of ~2026-03-26 it is no longer deployed (not present in Stig-Johnny/cluster-gitops/apps/). The repo still exists but is dormant. If an Epic needs a coordinator/team-lead role, decide whether to redeploy ATL-E or build a replacement.

Prior art:

Status: open

[cleanup] Plane residue — uninstall GitHub App + archive workspace


Plane was retired 2026-04-18 but the makeplane GitHub App is still installed on the dashecorp org, and the Plane workspace at app.plane.so is still alive (token revoked). Manual UI action needed.

Prior art:

  • Retraction proposal: proposals/2026-04-18-docs-tooling-decision (What retires section)
  • Retirement commit: dashecorp/infra PR #74

Status: open

```mermaid
flowchart LR
  H[Human] -->|user-story issue| RD[rig-docs]
  RD -->|dispatch| CE[Conductor-E]
  CE -->|assign| DE[Dev-E runtime pod]
  DE -->|MCP tool use| RMM[rig-memory-mcp]
  DE -->|author PR| RD
  RD -->|PR opens| RE[Review-E cron]
  RE -->|MCP tool use| RMM
  RE -->|approve or request changes| RD
  RD -->|merge| CFP[Cloudflare Pages]
  CFP -->|publish| S1[rig-research.pages.dev]
  RG[rig-gitops] -->|Flux deploys| DE
  RG -->|Flux deploys| RE
  RG -->|Flux deploys| CE
  RG -->|docs aggregation| S2[rig-docs.pages.dev]
```
  • Docs are markdown with YAML frontmatter. Required fields: title, description, type, audience, created/updated, topic. See AGENTS.md in this repo.
  • Bidirectional linkage. User story ↔ research ↔ proposal via research_docs, proposal, user_story, source_research. RelatedDocs component renders the graph.
  • Diagrams as code. Mermaid source inline in markdown. No PNG or SVG committed. Source preserved post-render via <details> blocks.
  • Compiled AGENTS.md at dashecorp/rig-gitops/AGENTS.md, imported by other repos via @dashecorp/rig-gitops/AGENTS.md.
  • Closes #N required in PR bodies. Review-E blocks on this.
  • Memory MCP scope: operational / ephemeral state only. Durable knowledge goes to rig-docs.

When you pick up a new Epic with blank memory, the cheapest order of operations:

  1. Fetch this file (https://rig-research.pages.dev/BRAIN.md, public, no auth) — ~18 KB.
  2. Fetch /llms.txt for the research hub topic index — ~2 KB.
  3. Identify 1-3 relevant research / proposal docs, fetch raw — ~5-15 KB.
  4. Fetch target repo’s AGENTS.md (each repo’s is ≤8 KB) — ~5 KB.
  5. read_memories from Memory MCP scoped to repo + topic — ~2 KB.

Total cold-start context: ~32-42 KB. Leaves the rest of the budget for actual work.
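A worked check of the cold-start total: each step's estimate from the list is treated as a [min, max] range in KB (a single "~N KB" becomes [N, N]) and the ranges are summed. The step labels are paraphrased and `totalKb` is a hypothetical helper.

```typescript
type Range = [number, number];

// Per-step estimates from the cold-start list, in KB.
const steps: Array<[string, Range]> = [
  ["BRAIN.md", [18, 18]],
  ["llms.txt", [2, 2]],
  ["1-3 research/proposal docs", [5, 15]],
  ["target repo AGENTS.md", [5, 5]],
  ["read_memories", [2, 2]],
];

// Sum the lower and upper bounds separately.
function totalKb(parts: Array<[string, Range]>): Range {
  let lo = 0, hi = 0;
  for (const [, [min, max]] of parts) {
    lo += min;
    hi += max;
  }
  return [lo, hi];
}

const [lo, hi] = totalKb(steps);
console.log(`cold-start context: ~${lo}-${hi} KB`); // prints "cold-start context: ~32-42 KB"
```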

Manual fields that live in facts/*.yaml — update when the matching reality changes:

  • facts/repos.yaml — annotations only (purpose, depends_on, used_by, agents_md, docs_surface). The repo list itself is auto-derived from gh api on every compile. Add a new annotation, or update an existing one, here.
  • facts/surfaces.yaml — URLs, API endpoints, MCP tools. Update when an endpoint changes or a new surface is published.
  • facts/agents.yaml — agent deployment instances. Compile validates that each manifest path exists on GitHub and warns on drift (this is how the ATL-E retirement was caught).
  • facts/flows.yaml — documented rig processes. Update after retrospectives.
  • facts/schema.yaml — mirrors the Zod schema in src/content.config.ts. Keep in sync manually when the schema changes.
  • facts/events.yaml — Conductor-E event types. Keep in sync with MapToEvent in the C# source.
  • facts/backlog.yaml — known gaps. Add when identified; remove when closed.

Then run npm run brain. CI (build workflow) runs brain:check and fails on drift.
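The check mode boils down to "regenerate, compare, fail on difference". A minimal sketch of the comparison, assuming the generator's output is available as a string; `checkDrift` and `DriftResult` are hypothetical, and the real script's messages and exit handling may differ.

```typescript
// Hypothetical result shape for a drift check.
interface DriftResult {
  ok: boolean;
  firstDiffLine?: number; // 1-based line where committed and generated first differ
}

function checkDrift(committed: string, generated: string): DriftResult {
  if (committed === generated) return { ok: true };
  const a = committed.split("\n");
  const b = generated.split("\n");
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return { ok: false, firstDiffLine: i + 1 };
  }
  // Not reached: two unequal strings always differ on some line.
  return { ok: false };
}
```

In CI, a result with `ok: false` would be turned into a non-zero exit so the build fails and points at the first stale line of BRAIN.md.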