The Autonomous Engineering Rig
A shared infrastructure for human-supervised autonomous software development.
1. The Problem
Software development at pace has a fundamental coordination cost. An engineer maintaining six services faces the same overhead at every service boundary: context switch, test setup, review back-and-forth, deploy check. This overhead compounds as the number of maintained surfaces grows, but the productive work per surface doesn’t.
LLMs offered a partial answer: they can suggest code, draft tests, explain unfamiliar systems. But copilots are still human-proxies. They wait for direction. They don’t close issues. They don’t open PRs. They don’t iterate on review feedback autonomously. A copilot saves keystrokes; it doesn’t free engineering capacity.
The rig inverts the model. Humans own intent (the issue) and acceptance (merge). Agents own the work between those two events — research, implementation, review, iteration. The human’s job becomes: write clear issues, review PRs, merge. Not: context-switch, diagnose, iterate over multiple sessions.
The three hard problems
Building an autonomous engineering system that works in production surfaces three problems that copilot tools don’t face:
| Problem | Why it’s hard |
|---|---|
| Assignment exclusivity | If two agents pick up the same issue, they fork work and produce conflicting PRs. Must guarantee exactly-once delivery. |
| Acceptance gating | Agent output must pass human-authored quality checks before it can land. Agents cannot approve their own work. |
| Failure recovery | Agents get stuck. Systems go down. A production rig must detect, escalate, and recover without human polling. |
The rig’s architecture is shaped by these three problems.
2. Architecture
The rig runs on a single k3s cluster on a GCE VM (invotek-k3s), managed via Flux GitOps from dashecorp/rig-gitops. There are six deployed components plus two planned:
flowchart TB
subgraph Human["Human operator"]
H[Issues + PR reviews + merges]
end
subgraph Cluster["k3s cluster (GCE invotek-k3s)"]
CE[rig-conductor\nC# / Marten / Postgres]
DE[Dev-E\nNode · Python · .NET]
RE[Review-E\nCron every 5 min]
MEM[rig-memory-mcp\nPostgres + pgvector]
end
subgraph External["External"]
GH[GitHub]
CF[Cloudflare Pages]
ANT[Anthropic API]
IB[iBuild-E\nMac Mini · Oslo]
end
H -->|file issue| GH
GH -->|webhook| CE
CE -->|ISSUE_ASSIGNED| DE
CE -->|REVIEW_ASSIGNED| RE
DE -->|open PR| GH
RE -->|approve / CHANGES_REQUESTED| GH
GH -->|merge| CF
DE -->|read_memories / write_memory| MEM
RE -->|read_memories / write_memory| MEM
DE & RE & IB -->|inference| ANT
IB -->|iOS PRs| GH
2.1 rig-conductor
The event store and dispatch engine. Written in C# with Marten (event-sourced Postgres). Receives GitHub webhooks, normalizes them into 40+ typed events, and maintains the issue lifecycle from ISSUE_APPROVED through ISSUE_DONE.
Key responsibilities:
| Endpoint | Purpose |
|---|---|
| `POST /api/webhook/github` | Ingest GitHub PR and issue events |
| `GET /api/assignments/next` | Claim next issue atomically (exclusivity guarantee) |
| `GET /api/reviews/next` | Claim next PR review |
| `GET /api/usage` | Token / cost usage by agent and repo |
| `GET /api/costs/summary` | Aggregate cost (default: 7 days) |
| `/dashboard` | 42 KB single-page cost + activity UI |
The conductor is stateless in the agent sense — it doesn’t know what the agent is doing inside a session. It tracks lifecycle events (PR opened, review assigned, CI passed) and derives state from the event stream via Marten projections.
2.2 Dev-E
The coding agent. Deployed as three HelmReleases (node, python, dotnet) to handle polyglot work. Each instance follows the same loop:
1. Poll GET /api/assignments/next?agentId=dev-e-node
2. Fetch BRAIN.md + issue context + relevant research
3. Clone repo, create feature/issue-N-slug branch
4. Implement: code + tests + docs
5. Run test suite (npm test / pytest / dotnet test)
6. Push, open PR with `Closes #N`
7. Post progress to Discord thread
8. Iterate on Review-E feedback

Dev-E does not push to main. Every change goes through a PR. Every PR requires Review-E approval.
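Step 1 of the loop can be sketched as a small polling helper. This is a minimal illustration, not the Dev-E implementation: the base URL `CONDUCTOR_URL` is hypothetical, and the response is assumed to be a JSON object on HTTP 200 and an empty body on HTTP 204 (the 204 case is the one described under Exclusivity).

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Hypothetical base URL; the real cluster-internal address is not given here.
CONDUCTOR_URL = "http://rig-conductor.example.internal"


def http_get(url: str) -> tuple[int, bytes]:
    """Plain stdlib GET that returns (status, body) instead of raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def claim_next(
    agent_id: str,
    get: Callable[[str], tuple[int, bytes]] = http_get,
) -> Optional[dict]:
    """Ask the conductor for the next issue; None means nothing is waiting (204)."""
    query = urllib.parse.urlencode({"agentId": agent_id})
    status, body = get(f"{CONDUCTOR_URL}/api/assignments/next?{query}")
    if status == 204:
        return None
    if status != 200:
        raise RuntimeError(f"conductor returned HTTP {status}")
    # Assumed response shape: a JSON object describing the claimed issue.
    return json.loads(body)
```

Passing the transport in as `get` keeps the helper testable without a running conductor; an agent would call `claim_next("dev-e-node")` and sleep between empty polls.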
2.3 Review-E
A cron agent (every 5 minutes) that scans for open PRs authored by Dev-E or iBuild-E. For each PR:
- Reads the diff + AGENTS.md + issue context + memory
- Posts a structured review: approves or files `CHANGES_REQUESTED` with specific line comments
- Resolves threads as she sees fixes committed in subsequent pushes
Review-E is the acceptance gate. She’s scoped to exclude PRs she authored (which is none — she only writes reviews, not code). She catches: missing Closes #N, incorrect frontmatter, missing docs, failing test coverage, broken cross-links, and logic errors in code.
2.4 iBuild-E
The iOS/macOS build agent. Runs on a Mac Mini in Oslo because Xcode requires macOS hardware; it cannot run in the k3s cluster. Handles the operator’s personal-org iOS portfolio: Astro apps, React Native, Swift packages, App Store Connect submissions.
A known gap: iBuild-E cannot reach the conductor’s cluster-internal API (the cluster-internal DNS name only resolves inside k3s via CoreDNS). She operates on GitHub-direct flows — no conductor dispatch, no cost attribution — until the networking gap is resolved (Tailscale subnet router or Cloudflare tunnel).
2.5 rig-memory-mcp
A Memory MCP server backed by Postgres + pgvector. Exposes three tools: read_memories, write_memory, mark_used. All agents share the same store.
| Field | Type | Purpose |
|---|---|---|
| `scope` | enum | `repo` \| `global`: search radius |
| `kind` | enum | `learning` \| `decision` \| `error` |
| `repo` | string | Which repo this memory applies to |
| `importance` | 1-5 | Signal strength (5 = always read) |
| `hit_count` | int | How many times this memory was useful |
Hybrid BM25 + vector search enables both keyword lookup (“XREADGROUP”, “CF 1014”) and semantic similarity. When importance ≥ 4 and hit_count ≥ 5, the weekly lint process promotes the memory to a permanent research doc.
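The promotion rule above can be written as a one-line predicate over the fields in the table. This is a sketch under assumptions: the `Memory` class is a local stand-in for a stored row, and `mark_used` here is a hypothetical in-process version of the tool that simply increments `hit_count`.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    scope: str       # "repo" or "global"
    kind: str        # "learning", "decision", or "error"
    repo: str        # which repo this memory applies to
    importance: int  # 1-5, where 5 means "always read"
    hit_count: int = 0


def mark_used(memory: Memory) -> Memory:
    """Hypothetical stand-in for the mark_used tool: record one more useful hit."""
    memory.hit_count += 1
    return memory


def should_promote(memory: Memory) -> bool:
    """Thresholds taken from the text: importance >= 4 and hit_count >= 5."""
    return memory.importance >= 4 and memory.hit_count >= 5
```

A memory with importance 4 and four hits is not yet promoted; one more `mark_used` call tips it over the threshold.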
2.6 Deployment stack
| Layer | Tool | Where |
|---|---|---|
| Cluster | k3s | GCE VM (invotek-k3s, invotek-github-infra) |
| GitOps | Flux v2 | Watches dashecorp/rig-gitops |
| Cloud infra | OpenTofu | dashecorp/infra |
| DNS + CDN | Cloudflare | Pages, DNS, tunnels |
| Secrets | SOPS + age | Age keys in sealed k8s secrets |
| Container registry | GitHub GHCR | ghcr.io/dashecorp/* |
3. Operational Principles
3.1 Reconciliation
Every agent session begins with a reconciliation step: the agent fetches BRAIN.md (the compiled system state, ~27 KB), cross-references its memory, and aligns its understanding of the current state before making any changes.
Reconciliation prevents the most common cold-start failure: an agent acting on stale context (e.g., proposing a migration that was done last week, or opening a PR against an endpoint that was renamed). BRAIN.md is compiled from facts/*.yaml + live GitHub API state on every push. CI runs npm run brain:check and fails if the compiled output drifts from source facts.
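A drift check of this kind boils down to "recompile, then diff against what is committed". The sketch below illustrates that idea only: `compile_brain` is a hypothetical compiler that renders a flat dict of facts, and nothing here reflects the real `brain:check` script or the actual BRAIN.md format.

```python
import difflib


def compile_brain(facts: dict[str, str]) -> str:
    """Hypothetical compiler: render facts as sorted '- key: value' lines."""
    return "".join(f"- {key}: {facts[key]}\n" for key in sorted(facts))


def drift(committed: str, facts: dict[str, str]) -> list[str]:
    """Return a unified diff between the committed file and a fresh compile.

    An empty list means the committed output still matches the source facts.
    """
    fresh = compile_brain(facts)
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            fresh.splitlines(),
            fromfile="BRAIN.md (committed)",
            tofile="BRAIN.md (recompiled)",
            lineterm="",
        )
    )
```

A CI step built on this would exit non-zero whenever `drift(...)` is non-empty and print the diff so the author can see which fact changed.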
3.2 Exclusivity
The assignment API guarantees exactly-once delivery via optimistic concurrency on the Marten event store. When GET /api/assignments/next is called:
    conductor:
      1. SELECT unassigned issues ORDER BY priority
      2. Attempt to append ISSUE_ASSIGNED event with expected version N
      3. If version conflict → retry from step 1
      4. Return the claimed issue to the caller

A second concurrent caller receives a different issue (or 204 if none remain). This prevents the “duplicate branch” failure mode: two Dev-E instances working the same issue and producing conflicting PRs.
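The claim steps can be illustrated with an in-memory model of versioned event streams. This is a sketch of the optimistic-concurrency idea only; it does not use Marten, the class and function names are hypothetical, and it assumes a lower `priority` number means "more urgent".

```python
from typing import Optional


class VersionConflict(Exception):
    """Raised when a stream changed between our read and our append."""


class IssueStream:
    """In-memory stand-in for one issue's event stream."""

    def __init__(self, issue: int, priority: int) -> None:
        self.issue = issue
        self.priority = priority
        self.events: list[tuple[str, str]] = []

    @property
    def assigned(self) -> bool:
        return any(name == "ISSUE_ASSIGNED" for name, _ in self.events)

    def append(self, event: tuple[str, str], expected_version: int) -> None:
        # The write only succeeds if nobody appended since we read the version.
        if len(self.events) != expected_version:
            raise VersionConflict(self.issue)
        self.events.append(event)


def claim_next_issue(streams: list[IssueStream], agent_id: str) -> Optional[int]:
    while True:
        # Step 1: unassigned issues, most urgent first.
        candidates = sorted(
            (s for s in streams if not s.assigned), key=lambda s: s.priority
        )
        if not candidates:
            return None  # corresponds to the 204 response
        target = candidates[0]
        version = len(target.events)
        try:
            # Step 2: append with the version we observed.
            target.append(("ISSUE_ASSIGNED", agent_id), expected_version=version)
        except VersionConflict:
            continue  # Step 3: someone else won the race, pick again
        return target.issue  # Step 4
```

The loser of a race never overwrites the winner's event: its append fails on the version check, and on retry the issue no longer appears as unassigned.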
For Redis XREADGROUP-based workloads, exclusivity is per-pod: each consumer uses its HOSTNAME env var as the consumer group member ID. This means a pod restart creates a new consumer rather than inheriting the previous pod’s pending entries under a stale ID — the key insight behind the per-pod partitioning pattern.
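A minimal sketch of the per-pod consumer naming, assuming a client object that exposes redis-py's `xreadgroup(groupname, consumername, streams, count, block)` method. The stream and group names in the usage note are hypothetical, and the client is passed in so the sketch does not depend on a running Redis.

```python
import os
from typing import Any, Mapping


def consumer_name(env: Mapping[str, str] = os.environ) -> str:
    """Use the pod's HOSTNAME as the consumer name, so each pod has its own ID."""
    return env["HOSTNAME"]


def read_one(
    client: Any,
    stream: str,
    group: str,
    env: Mapping[str, str] = os.environ,
) -> Any:
    """Read at most one new entry on behalf of this pod's consumer."""
    return client.xreadgroup(
        groupname=group,
        consumername=consumer_name(env),
        streams={stream: ">"},  # ">" asks for entries not yet delivered to the group
        count=1,
        block=5000,  # wait up to 5000 ms for a new entry
    )
```

With a real client this would be called as `read_one(redis.Redis(), "issues", "dev-e")`. Entries left pending under a previous pod's name stay in the group's pending list until another consumer claims them (for example with XAUTOCLAIM); whether the rig does this is not stated here.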
3.3 Acceptance checks
No agent-authored code merges without passing all of:
| Gate | Who enforces | Bypass |
|---|---|---|
| CI (build + lint + test) | GitHub Actions | None — must fix |
| Review-E approval | Review-E (cron) | manual-merge label skips auto-merge |
| `Closes #N` in PR body | rig-conductor webhook | Edit PR body |
| No unresolved threads | rig-conductor merge gate | Resolve threads |
| CODEOWNERS approval | GitHub branch protection | Human approval required |
The order matters: CI runs first (cheap, fast), then Review-E (expensive, thorough), then the conductor merge gate (final consistency check). A failure at any gate holds the PR without blocking the conductor’s ability to assign new work.
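The cheap-first ordering can be sketched as an ordered list of checks that stops at the first failure. The `PullRequest` fields and gate names below are hypothetical, and the CODEOWNERS gate is left out because the table attributes it to GitHub branch protection rather than to rig code.

```python
import re
from typing import Callable, NamedTuple, Optional


class PullRequest(NamedTuple):
    ci_passed: bool
    review_e_approved: bool
    body: str
    unresolved_threads: int


Gate = tuple[str, Callable[[PullRequest], bool]]

# Ordered cheapest-first, following the sequence described in the text.
GATES: list[Gate] = [
    ("ci", lambda pr: pr.ci_passed),
    ("review-e", lambda pr: pr.review_e_approved),
    ("closes-ref", lambda pr: re.search(r"\bCloses #\d+", pr.body) is not None),
    ("threads", lambda pr: pr.unresolved_threads == 0),
]


def first_failing_gate(pr: PullRequest) -> Optional[str]:
    """Return the name of the first gate that holds the PR, or None if all pass."""
    for name, check in GATES:
        if not check(pr):
            return name
    return None
```

Because evaluation stops at the first failing gate, a PR with red CI is reported as held by `ci` even if it would also fail later gates.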
3.4 GitOps split
The rig enforces a clean separation between what agents can change and what requires a human or Flux:
    discover/dashecorp/rig-gitops/   ← Flux-managed; changes here = cluster changes
      apps/dev-e/                    ← HelmRelease for Dev-E (replicas, image, env)
      apps/review-e/                 ← HelmRelease for Review-E
      apps/rig-conductor/            ← HelmRelease for conductor

    dashecorp/infra/                 ← OpenTofu; changes here = cloud resource changes
      cloudflare/pages.tf            ← Pages projects
      github/repos.tf                ← Repo creation
      cloudflare/dns.tf              ← DNS records

Agents write to application repos (rig-docs, rig-conductor, rig-agent-runtime). They don’t modify HelmReleases or OpenTofu configuration directly. That separation gives operators a hard boundary: if the HelmRelease says replicas: 1, only a human (or Ops-E, when deployed) can change it to replicas: 2.
3.5 Governance
Branch protection is the agent gate. Every dashecorp/* repo requires:
- ≥1 PR approval (Review-E counts)
- CI passing
- `Closes #N` in body (conductor webhook check)
- CODEOWNERS approval for declared-sensitive paths
CODEOWNERS files are the human veto layer: if infra/ or AGENTS.md is in CODEOWNERS, a human must approve changes to those paths even if Review-E already approved the rest of the PR. This gives operators surgical control over which paths require human oversight without blocking the whole rig.
The design philosophy: agents are trusted to implement within lanes. The lanes are defined by CODEOWNERS, branch protection, and the conductor’s gate rules. To widen a lane, you edit a config file — you don’t re-train a model.
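The CODEOWNERS veto can be pictured as a path filter over a PR's changed files. The sketch below is a deliberate simplification: real CODEOWNERS files use gitignore-style patterns and are evaluated by GitHub, whereas this hypothetical helper only understands exact file names and directory prefixes ending in `/`.

```python
# The two sensitive paths named in the text; matching rules here are
# simplified and hypothetical, not GitHub's CODEOWNERS pattern syntax.
SENSITIVE_RULES = ("infra/", "AGENTS.md")


def is_sensitive(path: str, rules: tuple[str, ...] = SENSITIVE_RULES) -> bool:
    """A rule ending in '/' matches everything under that directory;
    any other rule must match the path exactly."""
    return any(
        path.startswith(rule) if rule.endswith("/") else path == rule
        for rule in rules
    )


def paths_needing_human(changed: list[str]) -> list[str]:
    """Return the changed files that would require a human approval."""
    return [p for p in changed if is_sensitive(p)]
```

A PR touching only `src/` paths yields an empty list, so Review-E's approval is enough; adding `infra/dns.tf` to the same PR brings a human into the loop for that file.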
4. Tradeoffs
The rig makes explicit tradeoffs. Naming them honestly is part of the design.
| Tradeoff | What we chose | What we gave up |
|---|---|---|
| Operator-hosted | Full control, no vendor lock-in | Operational burden (k3s, Flux, Postgres, wrangler) |
| Event-sourced state | Replay, auditability, time-travel debugging | Schema migration complexity, append-only commits |
| Agent-authored docs | Docs stay current automatically | Occasional inaccuracies require human correction |
| Claude Code CLI | Deep tool use, agentic loops, built-in memory | Anthropic-only (provider portability is planned) |
| Shared Postgres for memory | Simple deployment, no extra infra | Memory queries compete with event store IOPS |
| XREADGROUP for dispatch | Scales horizontally, crash-safe delivery | Consumer group management, HOSTNAME coupling |
| One rig for all projects | Shared learnings, shared tooling | Blast radius if conductor goes down |
| Starlight for docs | Mermaid rendering, MDX, versioned | Overkill for tiny repos; two parallel surfaces exist |
What doesn’t work yet
Honest gaps as of 2026-04-23:
- iBuild-E ↔ conductor: a DNS gap prevents the Mac Mini from calling cluster-internal endpoints. She runs on GitHub-direct flows with no cost attribution.
- LiteLLM proxy: not deployed. This blocks hard budget enforcement (a kill-switch if cost exceeds the daily ceiling).
- Langfuse: self-hosted LLM trace ingestion is planned, not running. Agent traces don’t flow to any observability backend today.
- External dashboard: `/dashboard` is cluster-internal only. A human on a laptop needs `kubectl port-forward` to view it.
- ATL-E retired: the coordination agent was retired ~2026-03-26. There is no active team-lead role. Epics spanning multiple agents require human orchestration.
5. Roadmap
Shipped
| Capability | Notes |
|---|---|
| Issue dispatch (rig-conductor) | Atomic assignment, 40+ event types |
| Dev-E node / python / dotnet | Three HelmReleases, polyglot coverage |
| Review-E cron reviews | Every 5 min, approval gates |
| Memory MCP (read/write/mark_used) | Postgres + pgvector hybrid search |
| BRAIN.md compiled from facts/ | CI-checked on every push |
| Cost dashboard | Conductor built-in, Costs + Issues + Agents tabs |
| Flux GitOps deployment | All rig components Flux-managed |
| CODEOWNERS + branch protection | Per-repo human veto layer |
| Per-pod XREADGROUP partitioning | HOSTNAME-based consumer IDs |
| Dangerous-command guard | pretool-guard.sh blocks sudo, apt, rm -rf |
In progress
| Capability | Status | Notes |
|---|---|---|
| Default-deny egress | Partial | Network policy blocks most outbound; Cloudflare APIs still exempt (rig-docs#57) |
| OTel Collector | Partial | Deployed for conductor; agent pods not yet emitting |
| LiteLLM proxy | Planned | Deployment work in progress |
Planned
| Capability | What it unlocks |
|---|---|
| Architect-E | System-level design proposals; character exists, Helm config TBD |
| Ops-E | Deployment automation; character exists, runtime TBD |
| Cost ceiling kill-switch | Requires LiteLLM proxy first |
| Public conductor dashboard | Requires Cloudflare tunnel or read-only projection |
| Docs-memory drift lint | Weekly LLM pass promoting memory → docs at importance≥4, hit_count≥5 |
| Judge-E | Quality evaluation agent for LLM-as-judge rubric reviews |
6. Getting Started
The rig is not a product you install. It’s a pattern you run. Every component is open-source or open-spec (Marten, Flux, Claude Code, pgvector, Mermaid). The orchestration lives in dashecorp/rig-gitops.
To replicate the pattern at minimum viable scale:
    # 1. Stand up k3s with Flux
    curl -sfL https://get.k3s.io | sh
    flux bootstrap github --owner=your-org --repository=your-gitops

    # 2. Deploy Postgres (for Marten event store + pgvector memory)
    helm install postgres bitnami/postgresql

    # 3. Deploy rig-conductor
    # HelmRelease in gitops repo points to ghcr.io/dashecorp/rig-conductor

    # 4. Wire GitHub webhooks
    # POST dashecorp/rig-conductor /api/webhook/github
    # Events: issues, pull_request, check_run, push

    # 5. Deploy one Dev-E instance
    # Start with replicas: 1, node variant

    # 6. File your first issue with `agent-ready` label
    # The conductor picks it up within seconds

Full deployment is documented in rig-gitops/docs/onboarding.md. BRAIN.md is the runtime entry point; every new agent session starts there.
See also
- Brain (read first): live system state, compiled from facts/
- Implementation status: per-capability deployment status
- Development process: issue → PR lifecycle in full
- Stories: real case studies from production
- Agent platform comparison: how the rig compares to Devin, SWE-agent, Aider
- Cost model research: token attribution, cache economics, KEDA scale-to-zero