
The Autonomous Engineering Rig

A shared infrastructure for human-supervised autonomous software development.


Software development at pace has a fundamental coordination cost. An engineer maintaining six services faces the same overhead at every service boundary: context switch, test setup, review back-and-forth, deploy check. This overhead compounds as the number of maintained surfaces grows — but the productive work per surface doesn’t.

LLMs offered a partial answer: they can suggest code, draft tests, explain unfamiliar systems. But copilots are still human-proxies. They wait for direction. They don’t close issues. They don’t open PRs. They don’t iterate on review feedback autonomously. A copilot saves keystrokes; it doesn’t free engineering capacity.

The rig inverts the model. Humans own intent (the issue) and acceptance (merge). Agents own the work between those two events — research, implementation, review, iteration. The human’s job becomes: write clear issues, review PRs, merge. Not: context-switch, diagnose, iterate over multiple sessions.

Building an autonomous engineering system that works in production surfaces three problems that copilot tools don’t face:

| Problem | Why it’s hard |
| --- | --- |
| Assignment exclusivity | If two agents pick up the same issue, they fork work and produce conflicting PRs. Must guarantee exactly-once delivery. |
| Acceptance gating | Agent output must pass human-authored quality checks before it can land. Agents cannot approve their own work. |
| Failure recovery | Agents get stuck. Systems go down. A production rig must detect, escalate, and recover without human polling. |

The rig’s architecture is shaped by these three problems.


The rig runs on a single k3s cluster on a GCE VM (invotek-k3s), managed via Flux GitOps from dashecorp/rig-gitops. There are six deployed components plus two planned:

flowchart TB
  subgraph Human["Human operator"]
    H[Issues + PR reviews + merges]
  end

  subgraph Cluster["k3s cluster (GCE invotek-k3s)"]
    CE[rig-conductor\nC# / Marten / Postgres]
    DE[Dev-E\nNode · Python · .NET]
    RE[Review-E\nCron every 5 min]
    MEM[rig-memory-mcp\nPostgres + pgvector]
  end

  subgraph External["External"]
    GH[GitHub]
    CF[Cloudflare Pages]
    ANT[Anthropic API]
    IB[iBuild-E\nMac Mini · Oslo]
  end

  H -->|file issue| GH
  GH -->|webhook| CE
  CE -->|ISSUE_ASSIGNED| DE
  CE -->|REVIEW_ASSIGNED| RE
  DE -->|open PR| GH
  RE -->|approve / CHANGES_REQUESTED| GH
  GH -->|merge| CF
  DE -->|read_memories / write_memory| MEM
  RE -->|read_memories / write_memory| MEM
  DE & RE & IB -->|inference| ANT
  IB -->|iOS PRs| GH

rig-conductor is the event store and dispatch engine. Written in C# with Marten (event-sourced Postgres), it receives GitHub webhooks, normalizes them into 40+ typed events, and maintains the issue lifecycle from ISSUE_APPROVED through ISSUE_DONE.

Key responsibilities:

| Endpoint | Purpose |
| --- | --- |
| POST /api/webhook/github | Ingest GitHub PR and issue events |
| GET /api/assignments/next | Claim next issue atomically (exclusivity guarantee) |
| GET /api/reviews/next | Claim next PR review |
| GET /api/usage | Token / cost usage by agent and repo |
| GET /api/costs/summary | Aggregate cost (default: 7 days) |
| /dashboard | 42 KB single-page cost + activity UI |

The conductor is stateless in the agent sense — it doesn’t know what the agent is doing inside a session. It tracks lifecycle events (PR opened, review assigned, CI passed) and derives state from the event stream via Marten projections.
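Deriving state from an event stream can be illustrated with a minimal Python sketch. This is not the conductor's code (the conductor is C# with Marten projections); the `project` helper, the `IssueState` shape, the `agentId` payload field, and the `PR_OPENED` event name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IssueState:
    """Current view of one issue, rebuilt purely from its events."""
    status: str = "unknown"
    assignee: Optional[str] = None
    history: list = field(default_factory=list)


def project(events):
    """Fold an ordered list of events into the current state (a projection)."""
    state = IssueState()
    for event in events:
        kind = event["type"]
        state.history.append(kind)
        if kind == "ISSUE_APPROVED":
            state.status = "approved"
        elif kind == "ISSUE_ASSIGNED":
            state.status = "assigned"
            state.assignee = event["agentId"]
        elif kind == "ISSUE_DONE":
            state.status = "done"
        # Other event types are kept in history but do not change status here.
    return state


events = [
    {"type": "ISSUE_APPROVED"},
    {"type": "ISSUE_ASSIGNED", "agentId": "dev-e-node"},
    {"type": "PR_OPENED"},  # hypothetical event name
]
state = project(events)
print(state.status, state.assignee)
```

Because the state is a pure function of the events, replaying the same stream always yields the same view, which is what makes replay and audit possible.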

Dev-E is the coding agent, deployed as three HelmReleases (node, python, dotnet) to handle polyglot work. Each instance follows the same loop:

1. Poll GET /api/assignments/next?agentId=dev-e-node
2. Fetch BRAIN.md + issue context + relevant research
3. Clone repo, create feature/issue-N-slug branch
4. Implement: code + tests + docs
5. Run test suite (npm test / pytest / dotnet test)
6. Push, open PR with `Closes #N`
7. Post progress to Discord thread
8. Iterate on Review-E feedback
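Step 3 names branches `feature/issue-N-slug`. How the slug is derived is not specified here, so the following Python sketch is one plausible rule, not the rig's actual one: the `branch_name` helper and its 40-character limit are assumptions.

```python
import re


def branch_name(issue_number, title, max_slug_len=40):
    """Build a feature/issue-N-slug branch name from an issue title."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Truncate long titles without leaving a trailing hyphen.
    slug = slug[:max_slug_len].rstrip("-")
    if not slug:
        return f"feature/issue-{issue_number}"
    return f"feature/issue-{issue_number}-{slug}"


print(branch_name(12, "Add cost summary endpoint"))
```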

Dev-E does not push to main. Every change goes through a PR. Every PR requires Review-E approval.

Review-E is a cron agent that runs every 5 minutes and scans for open PRs authored by Dev-E or iBuild-E. For each PR, she:

  1. Reads the diff + AGENTS.md + issue context + memory
  2. Posts a structured review: approves or files CHANGES_REQUESTED with specific line comments
  3. Resolves threads as she sees fixes committed in subsequent pushes

Review-E is the acceptance gate. Her scope excludes PRs she authored, which in practice is none, since she only writes reviews, not code. She catches: missing Closes #N, incorrect frontmatter, missing docs, insufficient test coverage, broken cross-links, and logic errors in code.

iBuild-E is the iOS/macOS build agent. She runs on a Mac Mini in Oslo because Xcode requires macOS hardware, so she cannot run in the k3s cluster. She handles the operator’s personal-org iOS portfolio: Astro apps, React Native, Swift packages, App Store Connect submissions.

A known gap: iBuild-E cannot reach the conductor’s cluster-internal API (the cluster-internal DNS name only resolves inside k3s via CoreDNS). She operates on GitHub-direct flows — no conductor dispatch, no cost attribution — until the networking gap is resolved (Tailscale subnet router or Cloudflare tunnel).

rig-memory-mcp is a Memory MCP server backed by Postgres + pgvector. It exposes three tools: read_memories, write_memory, mark_used. All agents share the same store.

| Field | Type | Purpose |
| --- | --- | --- |
| scope | enum | repo or global (search radius) |
| kind | enum | learning, decision, or error |
| repo | string | Which repo this memory applies to |
| importance | 1-5 | Signal strength (5 = always read) |
| hit_count | int | How many times this memory was useful |

Hybrid BM25 + vector search enables both keyword lookup (“XREADGROUP”, “CF 1014”) and semantic similarity. When importance ≥ 4 and hit_count ≥ 5, the weekly lint process promotes the memory to a permanent research doc.
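The promotion threshold can be written as a small predicate. This Python sketch mirrors the fields in the table above; the `Memory` dataclass and `should_promote` function are illustrative names, not part of rig-memory-mcp.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    scope: str       # "repo" or "global"
    kind: str        # "learning", "decision", or "error"
    repo: str        # empty string for global memories in this sketch
    importance: int  # 1-5
    hit_count: int


def should_promote(memory):
    """Promotion rule from the text: importance >= 4 and hit_count >= 5."""
    return memory.importance >= 4 and memory.hit_count >= 5


memories = [
    Memory("repo", "error", "rig-conductor", importance=5, hit_count=7),
    Memory("global", "learning", "", importance=4, hit_count=2),
    Memory("repo", "decision", "rig-docs", importance=3, hit_count=9),
]
promoted = [m for m in memories if should_promote(m)]
print([m.repo for m in promoted])
```

Both conditions must hold: a memory that is rated important but rarely used, or often used but rated low, stays in the store.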

| Layer | Tool | Where |
| --- | --- | --- |
| Cluster | k3s | GCE VM (invotek-k3s, invotek-github-infra) |
| GitOps | Flux v2 | Watches dashecorp/rig-gitops |
| Cloud infra | OpenTofu | dashecorp/infra |
| DNS + CDN | Cloudflare | Pages, DNS, tunnels |
| Secrets | SOPS + age | Age keys in sealed k8s secrets |
| Container registry | GitHub GHCR | ghcr.io/dashecorp/* |

Every agent session begins with a reconciliation step: the agent fetches BRAIN.md (the compiled system state, ~27 KB), cross-references its memory, and aligns its understanding of the current state before making any changes.

Reconciliation prevents the most common cold-start failure: an agent acting on stale context (e.g., proposing a migration that was done last week, or opening a PR against an endpoint that was renamed). BRAIN.md is compiled from facts/*.yaml + live GitHub API state on every push. CI runs npm run brain:check and fails if the compiled output drifts from source facts.
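The drift check amounts to recompiling from source facts and comparing against the committed output. The Python sketch below shows that idea only; the real check is `npm run brain:check`, and the `compile_brain` rendering, the flat `facts` dictionary, and the sample fact keys are assumptions (the live GitHub API inputs are left out).

```python
import hashlib


def compile_brain(facts):
    """Render facts deterministically: sorted keys keep the output stable."""
    lines = [f"- {key}: {facts[key]}" for key in sorted(facts)]
    return "# BRAIN\n" + "\n".join(lines) + "\n"


def has_drifted(facts, committed_output):
    """True when the committed file no longer matches a fresh compile."""
    fresh = hashlib.sha256(compile_brain(facts).encode()).hexdigest()
    committed = hashlib.sha256(committed_output.encode()).hexdigest()
    return fresh != committed


facts = {"cluster": "invotek-k3s", "gitops_repo": "dashecorp/rig-gitops"}
committed = compile_brain(facts)
print(has_drifted(facts, committed))

facts["review_interval"] = "5 min"  # a fact changed without recompiling
print(has_drifted(facts, committed))
```

A deterministic render matters here: if key order varied between runs, the check would report drift even when no fact changed.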

The assignment API guarantees exactly-once delivery via optimistic concurrency on the Marten event store. When GET /api/assignments/next is called:

conductor:
1. SELECT unassigned issues ORDER BY priority
2. Attempt to append ISSUE_ASSIGNED event with expected version N
3. If version conflict → retry from step 1
4. Return the claimed issue to the caller

A second concurrent caller receives a different issue (or 204 if none remain). This prevents the “duplicate branch” failure mode: two Dev-E instances working the same issue and producing conflicting PRs.
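The claim steps above can be sketched in Python with an in-memory stand-in for the event store. This is an illustration of optimistic concurrency, not Marten's API: `EventStream`, `VersionConflict`, `claim_next`, the one-stream-per-issue layout, and the "lower number = higher priority" convention are all assumptions.

```python
class VersionConflict(Exception):
    """Raised when the stream moved past the version the writer expected."""


class EventStream:
    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, event, expected_version):
        # Optimistic concurrency: reject the write if someone appended first.
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, at {self.version}")
        self.events.append(event)


def claim_next(streams, priorities, agent_id, max_retries=5):
    """Claim the highest-priority unassigned issue, or None (an HTTP 204)."""
    for _ in range(max_retries):
        unassigned = [
            issue for issue, stream in streams.items()
            if not any(e["type"] == "ISSUE_ASSIGNED" for e in stream.events)
        ]
        if not unassigned:
            return None
        issue = min(unassigned, key=lambda i: priorities[i])
        stream = streams[issue]
        expected = stream.version  # version observed at selection time
        try:
            stream.append({"type": "ISSUE_ASSIGNED", "agentId": agent_id}, expected)
            return issue
        except VersionConflict:
            continue  # another caller won the race; select again
    return None


streams = {41: EventStream(), 42: EventStream()}
priorities = {41: 2, 42: 1}
first = claim_next(streams, priorities, "dev-e-node")
second = claim_next(streams, priorities, "dev-e-python")
third = claim_next(streams, priorities, "dev-e-dotnet")
print(first, second, third)
```

The loser of a race never overwrites the winner's event. Its append fails on the version check and it goes back to select a different issue.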

For Redis XREADGROUP-based workloads, exclusivity is per-pod: each consumer uses its HOSTNAME env var as its consumer name within the consumer group. This means a pod restart creates a new consumer rather than inheriting the previous pod’s pending entries under a stale ID, which is the key insight behind the per-pod partitioning pattern.
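A minimal Python sketch of the HOSTNAME-based consumer name follows. It only builds the command arguments and does not talk to Redis; the group name `dev-e`, the stream key `rig:assignments`, the `local-dev` fallback, and the COUNT/BLOCK values are assumptions for illustration.

```python
import os


def consumer_name(env=os.environ, fallback="local-dev"):
    """Use the pod's HOSTNAME as the consumer name within the group."""
    return env.get("HOSTNAME") or fallback


def xreadgroup_args(group, stream, env=os.environ):
    """Arguments for XREADGROUP; '>' asks for entries never delivered before."""
    return [
        "XREADGROUP", "GROUP", group, consumer_name(env),
        "COUNT", "1", "BLOCK", "5000",
        "STREAMS", stream, ">",
    ]


old_pod = xreadgroup_args("dev-e", "rig:assignments", {"HOSTNAME": "dev-e-node-7c9f"})
new_pod = xreadgroup_args("dev-e", "rig:assignments", {"HOSTNAME": "dev-e-node-2b41"})
print(old_pod[3], new_pod[3])
```

Entries still pending under the old pod's name are not picked up automatically by the new consumer; they would need to be claimed explicitly (for example with XAUTOCLAIM).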

No agent-authored code merges without passing all of:

| Gate | Who enforces | Bypass |
| --- | --- | --- |
| CI (build + lint + test) | GitHub Actions | None; must fix |
| Review-E approval | Review-E (cron) | manual-merge label skips auto-merge |
| Closes #N in PR body | rig-conductor webhook | Edit PR body |
| No unresolved threads | rig-conductor merge gate | Resolve threads |
| CODEOWNERS approval | GitHub branch protection | Human approval required |

The order matters: CI runs first (cheap, fast), then Review-E (expensive, thorough), then the conductor merge gate (final consistency check). A failure at any gate holds the PR without blocking the conductor’s ability to assign new work.
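Ordered gate evaluation can be sketched in Python as a list of checks run cheapest-first. The `PullRequest` fields and `first_failing_gate` helper are hypothetical, and placing the CODEOWNERS check last is an assumption; the text only fixes the order CI, then Review-E, then the conductor merge gate.

```python
import re
from dataclasses import dataclass


@dataclass
class PullRequest:
    ci_passed: bool
    review_e_approved: bool
    body: str
    unresolved_threads: int
    codeowners_approved: bool


# Cheap, fast checks first; expensive or human-dependent checks later.
GATES = [
    ("CI", lambda pr: pr.ci_passed),
    ("Review-E approval", lambda pr: pr.review_e_approved),
    ("Closes #N in PR body", lambda pr: re.search(r"\bCloses #\d+", pr.body) is not None),
    ("No unresolved threads", lambda pr: pr.unresolved_threads == 0),
    ("CODEOWNERS approval", lambda pr: pr.codeowners_approved),
]


def first_failing_gate(pr):
    """Return the name of the first gate that holds the PR, or None if all pass."""
    for name, check in GATES:
        if not check(pr):
            return name
    return None


pr = PullRequest(True, True, "Adds retry logic.", 0, True)
print(first_failing_gate(pr))
```

Returning the first failure rather than all failures matches the ordering argument: there is no point spending a Review-E pass on a PR whose CI is red.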

The rig enforces a clean separation between what agents can change and what requires a human or Flux:

discover/dashecorp/rig-gitops/   ← Flux-managed; changes here = cluster changes
  apps/dev-e/                    ← HelmRelease for Dev-E (replicas, image, env)
  apps/review-e/                 ← HelmRelease for Review-E
  apps/rig-conductor/            ← HelmRelease for conductor
dashecorp/infra/                 ← OpenTofu; changes here = cloud resource changes
  cloudflare/pages.tf            ← Pages projects
  github/repos.tf                ← Repo creation
  cloudflare/dns.tf              ← DNS records

Agents write to application repos (rig-docs, rig-conductor, rig-agent-runtime). They don’t modify HelmReleases or Terraform directly. That separation gives operators a hard boundary: if the HelmRelease says replicas: 1, only a human (or Ops-E, when deployed) can change it to replicas: 2.

Branch protection is the agent gate. Every dashecorp/* repo requires:

  • ≥1 PR approval (Review-E counts)
  • CI passing
  • Closes #N in body (conductor webhook check)
  • CODEOWNERS approval for declared-sensitive paths

CODEOWNERS files are the human veto layer: if infra/ or AGENTS.md is in CODEOWNERS, a human must approve changes to those paths even if Review-E already approved the rest of the PR. This gives operators surgical control over which paths require human oversight without blocking the whole rig.
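The veto layer can be illustrated with a Python sketch that finds which changed files fall under human-owned paths. It uses plain prefix matching, which is a simplification: real CODEOWNERS files use gitignore-style patterns and GitHub performs the matching. The `paths_needing_human` helper and the sample file names are hypothetical.

```python
def paths_needing_human(changed_files, protected_paths):
    """Return the changed files that fall under a human-owned path."""
    flagged = []
    for path in changed_files:
        for protected in protected_paths:
            directory_prefix = protected.rstrip("/") + "/"
            # Match the exact file, or anything inside a protected directory.
            if path == protected or path.startswith(directory_prefix):
                flagged.append(path)
                break
    return flagged


changed = ["infra/dns.tf", "src/app.ts", "AGENTS.md", "infrastructure/notes.md"]
print(paths_needing_human(changed, ["infra/", "AGENTS.md"]))
```

A PR touching only `src/app.ts` would need no human approval under this rule, while one that also edits `AGENTS.md` would wait for a human even after Review-E approves.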

The design philosophy: agents are trusted to implement within lanes. The lanes are defined by CODEOWNERS, branch protection, and the conductor’s gate rules. To widen a lane, you edit a config file — you don’t re-train a model.


The rig makes explicit tradeoffs. Naming them honestly is part of the design.

| Tradeoff | What we chose | What we gave up |
| --- | --- | --- |
| Operator-hosted | Full control, no vendor lock-in | Operational burden (k3s, Flux, Postgres, wrangler) |
| Event-sourced state | Replay, auditability, time-travel debugging | Schema migration complexity, append-only commits |
| Agent-authored docs | Docs stay current automatically | Occasional inaccuracies require human correction |
| Claude Code CLI | Deep tool use, agentic loops, built-in memory | Anthropic-only (provider portability is planned) |
| Shared Postgres for memory | Simple deployment, no extra infra | Memory queries compete with event store IOPS |
| XREADGROUP for dispatch | Scales horizontally, crash-safe delivery | Consumer group management, HOSTNAME coupling |
| One rig for all projects | Shared learnings, shared tooling | Blast radius if conductor goes down |
| Starlight for docs | Mermaid rendering, MDX, versioned | Overkill for tiny repos; two parallel surfaces exist |

Honest gaps as of 2026-04-23:

  • iBuild-E ↔ conductor: A DNS gap prevents the Mac Mini from calling cluster-internal endpoints. She runs on GitHub-direct flows with no cost attribution.
  • LiteLLM proxy: Not deployed. Blocks hard budget enforcement (kill-switch if cost exceeds daily ceiling).
  • Langfuse: Self-hosted LLM trace ingestion is planned, not running. Agent traces don’t flow to any observability backend today.
  • External dashboard: /dashboard is cluster-internal only. A human on a laptop needs kubectl port-forward to view it.
  • ATL-E retired: The coordination agent was retired ~2026-03-26. No active team-lead role. Epics spanning multiple agents require human orchestration.

| Capability | Notes |
| --- | --- |
| Issue dispatch (rig-conductor) | Atomic assignment, 40+ event types |
| Dev-E node / python / dotnet | Three HelmReleases, polyglot coverage |
| Review-E cron reviews | Every 5 min, approval gates |
| Memory MCP (read/write/mark_used) | Postgres + pgvector hybrid search |
| BRAIN.md compiled from facts/ | CI-checked on every push |
| Cost dashboard | Conductor built-in, Costs + Issues + Agents tabs |
| Flux GitOps deployment | All rig components Flux-managed |
| CODEOWNERS + branch protection | Per-repo human veto layer |
| Per-pod XREADGROUP partitioning | HOSTNAME-based consumer IDs |
| Dangerous-command guard | pretool-guard.sh blocks sudo, apt, rm -rf |

| Capability | Status | Notes |
| --- | --- | --- |
| Default-deny egress | Partial | Network policy blocks most outbound; Cloudflare APIs still exempt (rig-docs#57) |
| OTel Collector | Partial | Deployed for conductor; agent pods not yet emitting |
| LiteLLM proxy | Planned | Deployment work in progress |

| Capability | What it unlocks |
| --- | --- |
| Architect-E | System-level design proposals; character exists, Helm config TBD |
| Ops-E | Deployment automation; character exists, runtime TBD |
| Cost ceiling kill-switch | Requires LiteLLM proxy first |
| Public conductor dashboard | Requires Cloudflare tunnel or read-only projection |
| Docs-memory drift lint | Weekly LLM pass promoting memory → docs at importance≥4, hit_count≥5 |
| Judge-E | Quality evaluation agent for LLM-as-judge rubric reviews |

The rig is not a product you install. It’s a pattern you run. Every component is open-source or open-spec (Marten, Flux, Claude Code, pgvector, Mermaid). The orchestration lives in dashecorp/rig-gitops.

To replicate the pattern at minimum viable scale:

# 1. Stand up k3s with Flux
curl -sfL https://get.k3s.io | sh
flux bootstrap github --owner=your-org --repository=your-gitops
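# 2. Deploy Postgres (for Marten event store + pgvector memory)
# The bitnami chart repo must be registered before the install can resolve it
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgres bitnami/postgresql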
# 3. Deploy rig-conductor
# HelmRelease in gitops repo points to ghcr.io/dashecorp/rig-conductor
# 4. Wire GitHub webhooks
# POST dashecorp/rig-conductor /api/webhook/github
# Events: issues, pull_request, check_run, push
# 5. Deploy one Dev-E instance
# Start with replicas: 1, node variant
# 6. File your first issue with `agent-ready` label
# The conductor picks it up within seconds

Full deployment is documented in rig-gitops/docs/onboarding.md. BRAIN.md is the runtime entry point — every new agent session starts there.