Event-sourced agent state: Marten + Postgres in rig-conductor

An agent system has three properties that make event sourcing a natural fit:

  1. The “what happened” question matters as much as the “what is” question. When an issue stalls, you want to know: was it assigned? Did CI fail? Was Review-E called? A snapshot-only store answers “what is the state now” but not “what happened between creation and now.”

  2. Multiple consumers need the same facts. The cost projection, the stuck-detection service, the Discord notifier, and the merge gate all need to react to events. An event stream is a natural fan-out.

  3. Replay is essential for debugging. When the rig behaves unexpectedly, you can replay the event stream up to any point in time and inspect the state. No log-diving required.

Marten is a .NET library that turns Postgres into a document + event store. Rig-conductor uses its event store features:

// Appending an event (atomic, versioned)
session.Events.Append(
    issueStreamId,   // stream per issue
    current + 1,     // optimistic concurrency: the version the stream must reach after this append
    new IssueAssignedEvent
    {
        IssueNumber = issue.Number,
        AgentId = agentId,
        AssignedAt = DateTimeOffset.UtcNow
    }
);
await session.SaveChangesAsync(); // commits atomically; a version mismatch surfaces here

If two callers race to claim the same issue, one succeeds (version N → N+1) and the other gets a ConcurrencyException and must retry from the latest state. This is the exclusivity guarantee.
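The race described above can be reproduced without a database. The following is a minimal in-memory sketch of compare-and-append, not Marten's implementation: `VersionedStream`, `VersionConflictException`, and `TryClaim` are hypothetical names invented for illustration, and the check here compares against the version the caller last saw rather than the expected version after the append.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a versioned event stream; not Marten's API.
var stream = new VersionedStream();

// Both callers read version 0 before either one writes.
long seenByA = stream.Version;
long seenByB = stream.Version;

Console.WriteLine(TryClaim(stream, seenByA, "agent-a")); // True: 0 -> 1
Console.WriteLine(TryClaim(stream, seenByB, "agent-b")); // False: version 0 is stale

// The loser reloads the stream, sees the issue is already claimed, and backs off.
Console.WriteLine(stream.Events[stream.Events.Count - 1]); // ISSUE_ASSIGNED agent=agent-a

static bool TryClaim(VersionedStream stream, long seenVersion, string agentId)
{
    try
    {
        stream.Append(seenVersion, $"ISSUE_ASSIGNED agent={agentId}");
        return true;
    }
    catch (VersionConflictException)
    {
        return false; // someone else appended first
    }
}

sealed class VersionConflictException : Exception
{
    public VersionConflictException(long seen, long actual)
        : base($"caller saw version {seen}, stream is at {actual}") { }
}

sealed class VersionedStream
{
    private readonly List<string> _events = new();
    private readonly object _gate = new();

    public long Version { get { lock (_gate) return _events.Count; } }
    public IReadOnlyList<string> Events => _events;

    // Appends only if the caller saw the latest version (compare-and-append).
    public void Append(long seenVersion, string evt)
    {
        lock (_gate)
        {
            if (seenVersion != _events.Count)
                throw new VersionConflictException(seenVersion, _events.Count);
            _events.Add(evt);
        }
    }
}
```

The point of the design is that the check and the write happen as one step. In the real store that step is the Postgres transaction, so no separate lock service is needed to keep two agents off the same issue.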

Each GitHub issue gets its own Marten event stream, keyed by {owner}/{repo}#{number}. The stream accumulates all lifecycle events:

Stream: dashecorp/rig-agent-runtime#88
[0] ISSUE_APPROVED 2026-04-21T09:00:00Z
[1] ISSUE_ASSIGNED 2026-04-21T09:00:05Z agent=dev-e-node
[2] HEARTBEAT 2026-04-21T09:10:00Z
[3] HEARTBEAT 2026-04-21T09:20:00Z
[4] AGENT_STUCK 2026-04-21T09:47:00Z elapsed=47m
[5] ISSUE_UNASSIGNED 2026-04-21T09:47:01Z (re-queued by StaleHeartbeatService)
[6] ISSUE_ASSIGNED 2026-04-21T09:50:00Z agent=dev-e-node (new pod)
[7] BRANCH_CREATED 2026-04-21T09:51:30Z
[8] PR_CREATED 2026-04-21T10:15:00Z
[9] CI_PASSED 2026-04-21T10:22:00Z
[10] REVIEW_ASSIGNED 2026-04-21T10:23:00Z
[11] REVIEW_PASSED 2026-04-21T10:31:00Z
[12] MERGED 2026-04-21T10:32:00Z
[13] ISSUE_DONE 2026-04-21T10:32:05Z

This stream is the complete history. You can answer any question: How long did it take? How many times did it stall? What was the cost?
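As an illustration, the first two questions can be answered with a few lines over the example stream. This is a sketch in plain C#: the `LifecycleEvent` record and the `Utc` helper are hypothetical, and the events are transcribed by hand from the listing above rather than loaded from Marten.

```csharp
using System;
using System.Linq;

// Events transcribed from the example stream above.
var stream = new[]
{
    new LifecycleEvent("ISSUE_APPROVED",   Utc(9, 0, 0)),
    new LifecycleEvent("ISSUE_ASSIGNED",   Utc(9, 0, 5)),
    new LifecycleEvent("HEARTBEAT",        Utc(9, 10, 0)),
    new LifecycleEvent("HEARTBEAT",        Utc(9, 20, 0)),
    new LifecycleEvent("AGENT_STUCK",      Utc(9, 47, 0)),
    new LifecycleEvent("ISSUE_UNASSIGNED", Utc(9, 47, 1)),
    new LifecycleEvent("ISSUE_ASSIGNED",   Utc(9, 50, 0)),
    new LifecycleEvent("BRANCH_CREATED",   Utc(9, 51, 30)),
    new LifecycleEvent("PR_CREATED",       Utc(10, 15, 0)),
    new LifecycleEvent("CI_PASSED",        Utc(10, 22, 0)),
    new LifecycleEvent("REVIEW_ASSIGNED",  Utc(10, 23, 0)),
    new LifecycleEvent("REVIEW_PASSED",    Utc(10, 31, 0)),
    new LifecycleEvent("MERGED",           Utc(10, 32, 0)),
    new LifecycleEvent("ISSUE_DONE",       Utc(10, 32, 5)),
};

// How long did it take?
TimeSpan total = stream.Last().At - stream.First().At;

// How many times did it stall?
int stalls = stream.Count(e => e.Type == "AGENT_STUCK");

// How long did review take?
TimeSpan review = stream.First(e => e.Type == "REVIEW_PASSED").At
                - stream.First(e => e.Type == "REVIEW_ASSIGNED").At;

Console.WriteLine($"approved -> done: {total}"); // 01:32:05
Console.WriteLine($"stalls: {stalls}");          // 1
Console.WriteLine($"review time: {review}");     // 00:08:00

static DateTimeOffset Utc(int hour, int minute, int second) =>
    new DateTimeOffset(2026, 4, 21, hour, minute, second, TimeSpan.Zero);

record LifecycleEvent(string Type, DateTimeOffset At);
```

The cost question works the same way, but it needs token-usage events, which the sample stream above does not show.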

Marten’s projection system builds read models from the event stream. Rig-conductor has several:

| Projection | Read model | Purpose |
| --- | --- | --- |
| IssueProjection | IssueState document | Current state per issue (assigned/stuck/done) |
| TokenUsageProjection | TokenUsage document | Token counts per issue per agent |
| CostProjection | DailyCost document | Aggregated spend by date, agent, repo |
| AgentStatusProjection | AgentStatus document | Last heartbeat per agent instance |

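Most projections run inline: they are updated synchronously, in the same transaction as the event append, so reads are low-latency and never lag behind the stream. The cost projections are the exception. They run asynchronously in Marten’s async daemon because they aggregate across many streams.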
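Conceptually, a projection is a fold: start from an empty read model and apply each event in order. The sketch below shows that idea in plain C#, without Marten's projection API. The event records, the `IssueState` shape, and the "queued" status are assumptions made for illustration; the real rig-conductor types are not shown in this document.

```csharp
using System;

// Hypothetical events for one issue, in stream order.
var events = new object[]
{
    new IssueAssigned("dev-e-node"),
    new AgentStuck(),
    new IssueUnassigned(),
    new IssueAssigned("dev-e-node"),
    new IssueDone(),
};

// Fold the stream into the read model, one event at a time.
var state = new IssueState();
foreach (var e in events)
{
    state = state.Apply(e);
}

Console.WriteLine(state.Status); // done
Console.WriteLine(state.Stalls); // 1

record IssueAssigned(string AgentId);
record AgentStuck();
record IssueUnassigned();
record IssueDone();

// Read model: the current state of one issue.
record IssueState(string Status = "approved", string? AgentId = null, int Stalls = 0)
{
    public IssueState Apply(object e) => e switch
    {
        IssueAssigned a => this with { Status = "assigned", AgentId = a.AgentId },
        AgentStuck      => this with { Status = "stuck", Stalls = Stalls + 1 },
        IssueUnassigned => this with { Status = "queued", AgentId = null },
        IssueDone       => this with { Status = "done" },
        _               => this, // events this read model ignores leave it unchanged
    };
}
```

Because the fold only reads events, a new read model is just another `Apply` function over the same stream. Stopping the loop early gives the state as of any earlier version.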

What you gain:

  • Full audit trail: every state change is permanent and timestamped
  • Time-travel debugging: replay a stream to any version to inspect intermediate state
  • Parallel read models: add a new projection without changing the event append path
  • Atomic exclusivity: optimistic concurrency on append prevents double-assignment
  • Postgres reliability: one backing store, standard ops tooling (pg_dump, replicas)

What you pay:

  • Schema migration complexity: changing an event’s shape requires handling old versions in deserializers (upcasting). Marten supports this, but it requires discipline.
  • Append-only cost: you cannot “edit” history. Corrections require a compensating event.
  • Query patterns: “give me all issues assigned to dev-e-node in the last hour” requires a projection or a cross-stream query. Pure event sourcing doesn’t do ad-hoc queries cheaply.
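To make the upcasting cost concrete, here is a minimal sketch of reading an old event shape into a new one with `System.Text.Json`. The shapes are hypothetical: it assumes an older event stored the agent under `agent`, while the current shape uses `agentId` and adds a `podName` field. It does not use Marten's own upcasting hooks, which are not shown here.

```csharp
using System;
using System.Text.Json;

// An event as it might have been stored before the (hypothetical) schema change.
string oldJson = "{\"issueNumber\":88,\"agent\":\"dev-e-node\"}";
// An event stored after the change.
string newJson = "{\"issueNumber\":88,\"agentId\":\"dev-e-node\",\"podName\":\"pod-2\"}";

Console.WriteLine(Upcast(JsonDocument.Parse(oldJson).RootElement));
Console.WriteLine(Upcast(JsonDocument.Parse(newJson).RootElement));

// Reads either stored shape and always returns the current one.
static IssueAssignedV2 Upcast(JsonElement stored)
{
    int issue = stored.GetProperty("issueNumber").GetInt32();

    // New-shape events carry "agentId"; old ones only have "agent".
    string agent = stored.TryGetProperty("agentId", out var id)
        ? id.GetString()!
        : stored.GetProperty("agent").GetString()!;

    // Fields that did not exist yet get an explicit default.
    string pod = stored.TryGetProperty("podName", out var p)
        ? p.GetString()!
        : "unknown";

    return new IssueAssignedV2(issue, agent, pod);
}

// Current event shape (hypothetical).
record IssueAssignedV2(int IssueNumber, string AgentId, string PodName);
```

The discipline lies in keeping a function like this for every shape an event has ever had, because old events are never rewritten in the store.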

For the rig’s use case (60–100 issues/day, a required full audit trail, and 6 agents producing structured lifecycle events), the tradeoffs land clearly in Marten’s favor. A CRUD store that updates a status column in place would be simpler to query, but it could not answer “what happened and when” without a separate audit log.