The Autonomous Engineering Rig

A shared infrastructure for human-supervised autonomous software development.

1. The Problem

Software development at pace has a fundamental coordination cost. An engineer maintaining six services faces the same overhead at every service boundary: context switch, test setup, review back-and-forth, deploy check. This overhead compounds as the number of maintained surfaces grows — but the productive work per surface doesn’t.

LLMs offered a partial answer: they can suggest code, draft tests, explain unfamiliar systems. But copilots remain proxies for a human. They wait for direction. They don’t close issues. They don’t open PRs. They don’t iterate on review feedback autonomously. A copilot saves keystrokes; it doesn’t free engineering capacity.

The rig inverts the model. Humans own intent (the issue) and acceptance (merge). Agents own the work between those two events — research, implementation, review, iteration. The human’s job becomes: write clear issues, review PRs, merge. Not: context-switch, diagnose, iterate over multiple sessions.

The three hard problems

Building an autonomous engineering system that works in production exposes three problems that copilot tools don’t face:

Problem	Why it’s hard
Assignment exclusivity	If two agents pick up the same issue, they fork work and produce conflicting PRs. Must guarantee exactly-once delivery.
Acceptance gating	Agent output must pass human-authored quality checks before it can land. Agents cannot approve their own work.
Failure recovery	Agents get stuck. Systems go down. A production rig must detect, escalate, and recover without human polling.

The rig’s architecture is shaped by these three problems.

2. Architecture

The rig runs on a single k3s cluster on a GCE VM (invotek-k3s), managed via Flux GitOps from dashecorp/rig-gitops. There are six deployed components plus two planned:

flowchart TB
  subgraph Human["Human operator"]
    H[Issues + PR reviews + merges]
  end

  subgraph Cluster["k3s cluster (GCE invotek-k3s)"]
    CE[rig-conductor\nC# / Marten / Postgres]
    DE[Dev-E\nNode · Python · .NET]
    RE[Review-E\nCron every 5 min]
    MEM[rig-memory-mcp\nPostgres + pgvector]
  end

  subgraph External["External"]
    GH[GitHub]
    CF[Cloudflare Pages]
    ANT[Anthropic API]
    IB[iBuild-E\nMac Mini · Oslo]
  end

  H -->|file issue| GH
  GH -->|webhook| CE
  CE -->|ISSUE_ASSIGNED| DE
  CE -->|REVIEW_ASSIGNED| RE
  DE -->|open PR| GH
  RE -->|approve / CHANGES_REQUESTED| GH
  GH -->|merge| CF
  DE -->|read_memories / write_memory| MEM
  RE -->|read_memories / write_memory| MEM
  DE & RE & IB -->|inference| ANT
  IB -->|iOS PRs| GH

2.1 rig-conductor

The event store and dispatch engine. Written in C# with Marten (event-sourced Postgres). Receives GitHub webhooks, normalizes them into 40+ typed events, and maintains the issue lifecycle from ISSUE_APPROVED through ISSUE_DONE.

Key responsibilities:

Endpoint	Purpose
`POST /api/webhook/github`	Ingest GitHub PR and issue events
`GET /api/assignments/next`	Claim next issue atomically (exclusivity guarantee)
`GET /api/reviews/next`	Claim next PR review
`GET /api/usage`	Token / cost usage by agent and repo
`GET /api/costs/summary`	Aggregate cost (default: 7 days)
`/dashboard`	42 KB single-page cost + activity UI

The conductor keeps no per-session agent state: it doesn’t know what an agent is doing inside a session. It tracks lifecycle events (PR opened, review assigned, CI passed) and derives state from the event stream via Marten projections.
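
Deriving state from an event stream can be pictured as a fold over ordered events. The sketch below is a minimal Python illustration of that idea, not Marten's projection API. The event names come from this document; the state labels (`ready`, `in_progress`, and so on) and the `Event` shape are assumptions made for the example.

```python
from dataclasses import dataclass

# Event type -> derived lifecycle state. Event names are from the text;
# the state labels on the right are illustrative assumptions.
STATE_AFTER = {
    "ISSUE_APPROVED": "ready",
    "ISSUE_ASSIGNED": "in_progress",
    "REVIEW_ASSIGNED": "in_review",
    "ISSUE_DONE": "done",
}


@dataclass(frozen=True)
class Event:
    issue: int
    type: str


def project(events):
    """Fold an ordered event stream into {issue number: current state}."""
    state = {}
    for ev in events:
        # Events without a mapping leave the derived state unchanged.
        if ev.type in STATE_AFTER:
            state[ev.issue] = STATE_AFTER[ev.type]
    return state


stream = [
    Event(7, "ISSUE_APPROVED"),
    Event(7, "ISSUE_ASSIGNED"),
    Event(8, "ISSUE_APPROVED"),
    Event(7, "REVIEW_ASSIGNED"),
]
print(project(stream))  # {7: 'in_review', 8: 'ready'}
```

Because the state is recomputed from events, replaying the same stream always yields the same result, which is what makes replay and audit possible.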

2.2 Dev-E

The coding agent. Deployed as three HelmReleases (node, python, dotnet) to handle polyglot work. Each instance follows the same loop:

1. Poll GET /api/assignments/next?agentId=dev-e-node
2. Fetch BRAIN.md + issue context + relevant research
3. Clone repo, create feature/issue-N-slug branch
4. Implement: code + tests + docs
5. Run test suite (npm test / pytest / dotnet test)
6. Push, open PR with `Closes #N`
7. Post progress to Discord thread
8. Iterate on Review-E feedback

Dev-E does not push to main. Every change goes through a PR. Every PR requires Review-E approval.
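
One pass of the loop above might look like the following sketch. It is a simplification under stated assumptions: the four callables (`claim_next`, `implement`, `run_tests`, `open_pr`) are hypothetical stand-ins for the real HTTP, git, and GitHub steps, and the slug rule in `branch_name` is a guess at how `feature/issue-N-slug` could be derived from an issue title.

```python
import re


def branch_name(issue_number, title):
    """Build a feature/issue-N-slug branch name from an issue title.

    The slug rule (lowercase, runs of non-alphanumerics become "-") is an
    assumption for this sketch, not the rig's documented behaviour.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"feature/issue-{issue_number}-{slug}"


def run_once(claim_next, implement, run_tests, open_pr):
    """Run one pass of the loop.

    Returns the PR body, or None if no issue was claimed or tests failed.
    All four arguments are hypothetical callables injected by the caller.
    """
    issue = claim_next()
    if issue is None:
        return None  # nothing to do; poll again later
    branch = branch_name(issue["number"], issue["title"])
    implement(issue, branch)
    if not run_tests(branch):
        return None  # never open a PR on a failing suite
    body = f"Closes #{issue['number']}"
    open_pr(branch, body)
    return body
```

Injecting the side-effecting steps keeps the control flow testable without a cluster, a repository, or network access.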

2.3 Review-E

A cron agent (every 5 minutes) that scans for open PRs authored by Dev-E or iBuild-E. For each PR:

Reads the diff + AGENTS.md + issue context + memory
Posts a structured review: approves or files CHANGES_REQUESTED with specific line comments
Resolves threads as she sees fixes committed in subsequent pushes

Review-E is the acceptance gate. She’s scoped to exclude PRs she authored (which is none — she only writes reviews, not code). She catches: missing Closes #N, incorrect frontmatter, missing docs, failing test coverage, broken cross-links, and logic errors in code.
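
As an illustration of the simplest of these checks, the sketch below looks for a `Closes #N` reference in a PR body. It is an assumption about how such a check could be written, not Review-E's actual logic; in particular, matching case-insensitively is a choice made for the example.

```python
import re

# Matches "Closes #123". GitHub also recognises other closing keywords;
# this sketch checks only the form the rig asks for.
CLOSES = re.compile(r"\bcloses\s+#(\d+)\b", re.IGNORECASE)


def linked_issues(pr_body):
    """Return the issue numbers referenced via "Closes #N" in a PR body."""
    return [int(n) for n in CLOSES.findall(pr_body or "")]


def has_closing_reference(pr_body):
    """True if the body links at least one issue with "Closes #N"."""
    return bool(linked_issues(pr_body))
```

A body such as "Related to #12" does not count, because it mentions an issue without declaring that the PR closes it.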

2.4 iBuild-E

The iOS/macOS build agent. Runs on a Mac Mini in Oslo because Xcode requires macOS hardware — it cannot run in the k3s cluster. Handles the operator’s personal-org iOS portfolio: Astro apps, React Native, Swift packages, App Store Connect submissions.

A known gap: iBuild-E cannot reach the conductor’s cluster-internal API (the cluster-internal DNS name only resolves inside k3s via CoreDNS). She operates on GitHub-direct flows — no conductor dispatch, no cost attribution — until the networking gap is resolved (Tailscale subnet router or Cloudflare tunnel).

2.5 rig-memory-mcp

A Memory MCP server backed by Postgres + pgvector. Exposes three tools: read_memories, write_memory, mark_used. All agents share the same store.

Field	Type	Purpose
`scope`	enum	`repo` | `global` (search radius)
`kind`	enum	`learning` | `decision` | `error`
`repo`	string	Which repo this memory applies to
`importance`	1-5	Signal strength (5 = always read)
`hit_count`	int	How many times this memory was useful

Hybrid BM25 + vector search enables both keyword lookup (“XREADGROUP”, “CF 1014”) and semantic similarity. When importance ≥ 4 and hit_count ≥ 5, the weekly lint process promotes the memory to a permanent research doc.
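
The promotion rule can be written as a small predicate over the fields in the table above. This is a sketch: the `Memory` dataclass mirrors the documented fields, but its exact shape and the function name `should_promote` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    scope: str        # "repo" or "global"
    kind: str         # "learning", "decision" or "error"
    repo: str         # which repo this memory applies to
    importance: int   # 1-5, where 5 means "always read"
    hit_count: int = 0


def should_promote(memory, min_importance=4, min_hits=5):
    """Apply the documented thresholds: importance >= 4 and hit_count >= 5."""
    return memory.importance >= min_importance and memory.hit_count >= min_hits


candidates = [
    Memory("repo", "learning", "rig-docs", importance=5, hit_count=9),
    Memory("repo", "error", "rig-docs", importance=4, hit_count=2),
    Memory("global", "decision", "", importance=3, hit_count=20),
]
promoted = [m for m in candidates if should_promote(m)]
```

Both conditions must hold, so a frequently used but low-importance memory stays in the store rather than becoming a research doc.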

2.6 Deployment stack

Layer	Tool	Where
Cluster	k3s	GCE VM (`invotek-k3s`, `invotek-github-infra`)
GitOps	Flux v2	Watches `dashecorp/rig-gitops`
Cloud infra	OpenTofu	`dashecorp/infra`
DNS + CDN	Cloudflare	Pages, DNS, tunnels
Secrets	SOPS + age	Age keys in sealed k8s secrets
Container registry	GitHub GHCR	`ghcr.io/dashecorp/*`

3. Operational Principles

3.1 Reconciliation

Every agent session begins with a reconciliation step: the agent fetches BRAIN.md (the compiled system state, ~27 KB), cross-references its memory, and aligns its understanding of the current state before making any changes.

Reconciliation prevents the most common cold-start failure: an agent acting on stale context (e.g., proposing a migration that was done last week, or opening a PR against an endpoint that was renamed). BRAIN.md is compiled from facts/*.yaml + live GitHub API state on every push. CI runs npm run brain:check and fails if the compiled output drifts from source facts.
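
A drift check of this kind boils down to recompiling from source and comparing against the committed output. The sketch below shows the idea in Python; it is not the `brain:check` script, and the output format produced by `compile_brain` is invented for the example.

```python
import json


def compile_brain(facts):
    """Render facts deterministically so equal inputs give equal output.

    The "- key: value" layout is a hypothetical format for this sketch.
    """
    lines = ["# BRAIN"]
    for key in sorted(facts):
        lines.append(f"- {key}: {json.dumps(facts[key], sort_keys=True)}")
    return "\n".join(lines) + "\n"


def has_drifted(facts, committed_text):
    """True if the committed file no longer matches a fresh compile."""
    return compile_brain(facts) != committed_text


facts = {"review_interval_minutes": 5, "dev_e_variants": ["node", "python", "dotnet"]}
committed = compile_brain(facts)
```

Sorting keys matters here: without a deterministic render, the check would report drift even when nothing changed.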

3.2 Exclusivity

The assignment API guarantees exactly-once delivery via optimistic concurrency on the Marten event store. When GET /api/assignments/next is called:

conductor:
  1. SELECT unassigned issues ORDER BY priority
  2. Attempt to append ISSUE_ASSIGNED event with expected version N
  3. If version conflict → retry from step 1
  4. Return the claimed issue to the caller

A second concurrent caller receives a different issue (or 204 if none remain). This prevents the “duplicate branch” failure mode: two Dev-E instances working the same issue and producing conflicting PRs.
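
The four steps above can be sketched with an in-memory stand-in for the event store. This is a Python illustration of optimistic concurrency in general, not the conductor's C# code or Marten's API. The `EventStore` class, the single global version counter, and the `max_retries` limit are all assumptions made to keep the example small.

```python
class VersionConflict(Exception):
    """Raised when an append is based on a stale view of the store."""


class EventStore:
    """In-memory stand-in for an append-only store with version checks."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, event, expected_version):
        # Reject the write if someone else appended since we last read.
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, store at {self.version}")
        self.events.append(event)


def claim_next(store, issues_by_priority, agent_id, max_retries=5):
    """Claim the highest-priority unassigned issue, or return None."""
    for _ in range(max_retries):
        seen = store.version  # the version our read is based on
        assigned = {e["issue"] for e in store.events if e["type"] == "ISSUE_ASSIGNED"}
        candidate = next((i for i in issues_by_priority if i not in assigned), None)
        if candidate is None:
            return None  # the HTTP layer would answer 204 here
        try:
            store.append({"type": "ISSUE_ASSIGNED", "issue": candidate, "agent": agent_id}, seen)
            return candidate
        except VersionConflict:
            continue  # another caller won the race; re-read and retry
    return None
```

The losing caller never overwrites the winner: its append is rejected, it re-reads the stream, sees the issue as assigned, and moves on to the next candidate.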

For Redis XREADGROUP-based workloads, exclusivity is per-pod: each consumer uses its HOSTNAME env var as its consumer name within the consumer group. A pod restart therefore creates a new consumer rather than inheriting the previous pod’s pending entries under a stale ID. This is the key insight behind the per-pod partitioning pattern.
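
A minimal sketch of the per-pod consumer naming follows. The `client` argument is assumed to behave like a redis-py `Redis` object, whose `xreadgroup(groupname, consumername, streams, count, block)` method is called here. The stream name, group name, and default batch settings are hypothetical.

```python
import os


def consumer_name(env=os.environ):
    """Use the pod's HOSTNAME as the consumer name within the group."""
    return env["HOSTNAME"]


def read_batch(client, stream, group, env=os.environ, count=10, block_ms=5000):
    """Read new entries for this pod's consumer.

    `client` is injected and only needs an xreadgroup method shaped like
    redis-py's. The ">" ID asks for entries never delivered to any
    consumer in the group.
    """
    return client.xreadgroup(
        group,
        consumer_name(env),
        {stream: ">"},
        count=count,
        block=block_ms,
    )
```

With redis-py installed, a call might look like `read_batch(redis.Redis(), "rig:events", "dev-e")`, where both names are placeholders. Entries left pending under an old consumer name are not picked up automatically, so a separate claim or cleanup step would still be needed.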

3.3 Acceptance checks

No agent-authored code merges without passing all of:

Gate	Who enforces	Bypass
CI (build + lint + test)	GitHub Actions	None — must fix
Review-E approval	Review-E (cron)	`manual-merge` label skips auto-merge
`Closes #N` in PR body	rig-conductor webhook	Edit PR body
No unresolved threads	rig-conductor merge gate	Resolve threads
CODEOWNERS approval	GitHub branch protection	Human approval required

The order matters: CI runs first (cheap, fast), then Review-E (expensive, thorough), then the conductor merge gate (final consistency check). A failure at any gate holds the PR without blocking the conductor’s ability to assign new work.
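
Evaluating gates in a fixed order and stopping at the first failure can be sketched as below. The gate keys are hypothetical labels for the rows of the table above. Their relative order after Review-E is an assumption, since the text only fixes CI first, Review-E second, and the conductor merge gate afterwards.

```python
# Cheapest checks first. Keys are illustrative names for the table rows.
GATES = ["ci", "review_e", "closes_ref", "threads_resolved", "codeowners"]


def first_blocking_gate(results):
    """Return the first gate that has not passed, or None if all passed.

    `results` maps gate name -> bool. A missing entry counts as not yet
    passed, so a PR is held until every gate has reported success.
    """
    for gate in GATES:
        if not results.get(gate, False):
            return gate
    return None


def can_merge(results):
    """True only when every gate has passed."""
    return first_blocking_gate(results) is None
```

Treating a missing result as a failure keeps the default safe: a gate that never ran cannot be mistaken for a gate that passed.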

3.4 GitOps split

The rig enforces a clean separation between what agents can change and what requires a human or Flux:

dashecorp/rig-gitops/              ← Flux-managed; changes here = cluster changes
  apps/dev-e/                      ← HelmRelease for Dev-E (replicas, image, env)
  apps/review-e/                   ← HelmRelease for Review-E
  apps/rig-conductor/              ← HelmRelease for conductor

dashecorp/infra/                   ← OpenTofu; changes here = cloud resource changes
  cloudflare/pages.tf              ← Pages projects
  github/repos.tf                  ← Repo creation
  cloudflare/dns.tf                ← DNS records

Agents write to application repos (rig-docs, rig-conductor, rig-agent-runtime). They don’t modify HelmReleases or OpenTofu configuration directly. That separation gives operators a hard boundary: if the HelmRelease says replicas: 1, only a human (or Ops-E, when deployed) can change it to replicas: 2.

3.5 Governance

Branch protection is the agent gate. Every dashecorp/* repo requires:

≥1 PR approval (Review-E counts)
CI passing
Closes #N in body (conductor webhook check)
CODEOWNERS approval for declared-sensitive paths

CODEOWNERS files are the human veto layer: if infra/ or AGENTS.md is in CODEOWNERS, a human must approve changes to those paths even if Review-E already approved the rest of the PR. This gives operators surgical control over which paths require human oversight without blocking the whole rig.
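
The veto logic can be illustrated with a deliberately simplified matcher. Real CODEOWNERS files use a richer pattern syntax that GitHub evaluates itself; the sketch below assumes only two kinds of entry, a directory prefix ending in `/` and an exact file path, and the function name is hypothetical.

```python
def needs_human_approval(changed_paths, protected):
    """True if any changed path falls under a protected entry.

    Simplified matching for illustration: entries ending in "/" match
    everything under that directory, other entries match one exact path.
    """
    def hit(path, entry):
        if entry.endswith("/"):
            return path.startswith(entry)
        return path == entry

    return any(hit(path, entry) for path in changed_paths for entry in protected)


protected = ["infra/", "AGENTS.md"]
print(needs_human_approval(["src/app.py", "README.md"], protected))
print(needs_human_approval(["src/app.py", "infra/dns.tf"], protected))
```

The first call involves no protected path, so Review-E's approval would be enough; the second touches `infra/`, so a human must approve as well.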

The design philosophy: agents are trusted to implement within lanes. The lanes are defined by CODEOWNERS, branch protection, and the conductor’s gate rules. To widen a lane, you edit a config file — you don’t re-train a model.

4. Tradeoffs

The rig makes explicit tradeoffs. Naming them honestly is part of the design.

Tradeoff	What we chose	What we gave up
Operator-hosted	Full control, no vendor lock-in	Operational burden (k3s, Flux, Postgres, wrangler)
Event-sourced state	Replay, auditability, time-travel debugging	Schema migration complexity, append-only commits
Agent-authored docs	Docs stay current automatically	Occasional inaccuracies require human correction
Claude Code CLI	Deep tool use, agentic loops, built-in memory	Anthropic-only (provider portability is planned)
Shared Postgres for memory	Simple deployment, no extra infra	Memory queries compete with event store IOPS
XREADGROUP for dispatch	Scales horizontally, crash-safe delivery	Consumer group management, HOSTNAME coupling
One rig for all projects	Shared learnings, shared tooling	Blast radius if conductor goes down
Starlight for docs	Mermaid rendering, MDX, versioned	Overkill for tiny repos; two parallel surfaces exist

What doesn’t work yet

Honest gaps as of 2026-04-23:

iBuild-E ↔ conductor — DNS gap prevents Mac Mini from calling cluster-internal endpoints. She runs on GitHub-direct flows with no cost attribution.
LiteLLM proxy — Not deployed. Blocks hard budget enforcement (kill-switch if cost exceeds daily ceiling).
Langfuse — Self-hosted LLM trace ingestion is planned, not running. Agent traces don’t flow to any observability backend today.
External dashboard — /dashboard is cluster-internal only. An operator on a laptop needs kubectl port-forward to view it.
ATL-E retired — The coordination agent was retired ~2026-03-26. No active team-lead role. Epics spanning multiple agents require human orchestration.

5. Roadmap

Shipped

Capability	Notes
Issue dispatch (rig-conductor)	Atomic assignment, 40+ event types
Dev-E node / python / dotnet	Three HelmReleases, polyglot coverage
Review-E cron reviews	Every 5 min, approval gates
Memory MCP (read/write/mark_used)	Postgres + pgvector hybrid search
BRAIN.md compiled from facts/	CI-checked on every push
Cost dashboard	Conductor built-in, Costs + Issues + Agents tabs
Flux GitOps deployment	All rig components Flux-managed
CODEOWNERS + branch protection	Per-repo human veto layer
Per-pod XREADGROUP partitioning	HOSTNAME-based consumer IDs
Dangerous-command guard	pretool-guard.sh blocks `sudo`, `apt`, `rm -rf`

In progress

Capability	Status	Notes
Default-deny egress	Partial	Network policy blocks most outbound; Cloudflare APIs still exempt (rig-docs#57)
OTel Collector	Partial	Deployed for conductor; agent pods not yet emitting
LiteLLM proxy	Planned	Deployment work in progress

Next

Capability	What it unlocks
Architect-E	System-level design proposals; character exists, Helm config TBD
Ops-E	Deployment automation; character exists, runtime TBD
Cost ceiling kill-switch	Requires LiteLLM proxy first
Public conductor dashboard	Requires Cloudflare tunnel or read-only projection
Docs-memory drift lint	Weekly LLM pass promoting memory → docs at importance≥4, hit_count≥5
Judge-E	Quality evaluation agent for LLM-as-judge rubric reviews

6. Getting Started

The rig is not a product you install. It’s a pattern you run. Every component is open-source or open-spec (Marten, Flux, Claude Code, pgvector, Mermaid). The orchestration lives in dashecorp/rig-gitops.

To replicate the pattern at minimum viable scale:

# 1. Stand up k3s with Flux
curl -sfL https://get.k3s.io | sh
flux bootstrap github --owner=your-org --repository=your-gitops

# 2. Deploy Postgres (for Marten event store + pgvector memory)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres bitnami/postgresql

# 3. Deploy rig-conductor
# HelmRelease in gitops repo points to ghcr.io/dashecorp/rig-conductor

# 4. Wire GitHub webhooks
# POST dashecorp/rig-conductor /api/webhook/github
# Events: issues, pull_request, check_run, push

# 5. Deploy one Dev-E instance
# Start with replicas: 1, node variant

# 6. File your first issue with `agent-ready` label
# The conductor picks it up within seconds

Full deployment is documented in rig-gitops/docs/onboarding.md. BRAIN.md is the runtime entry point — every new agent session starts there.
