What to implement next — raising the floor before raising the ceiling
For engineering leadership deciding what to fund next. The rig runs autonomous issue → merged PR loops today (~20 min, ~$0.62/task, zero human interventions). But the implementation matrix is honest: 21 of 78 tracked capabilities are deployed or partial (27%); 44 are planned (56%). The question isn’t whether the rig works — it’s which gap to close first.
TL;DR: raise the floor before the ceiling. Four investments, in this order: (1) safety foundation — dangerous-command guard, worktrees, egress policy; (2) agent observability — one env var + a trace store; (3) hard cost ceiling — LiteLLM proxy, per-agent virtual keys; (4) nightly quality gate — golden suite as merge blocker. None of these add headline features. All of them make it safe to add headline features later.
The claim in one picture
```mermaid
flowchart LR
  NOW["📍 Today<br/>27% deployed<br/>no hard guards"] --> F1
  F1["🛡 Safety floor<br/>blocks the unrecoverable"] --> F2
  F2["👁 Visibility<br/>we can see what agents do"] --> F3
  F3["💰 Cost ceiling<br/>proxy-enforced, not trust-based"] --> F4
  F4["✅ Quality gate<br/>regression blocks merge"] --> NEXT["🚀 Ready for<br/>ambition investments"]
  style NOW fill:#ffebee,stroke:#c62828
  style F1 fill:#fff3e0,stroke:#ef6c00
  style F2 fill:#fff8e1,stroke:#f9a825
  style F3 fill:#e3f2fd,stroke:#1976d2
  style F4 fill:#e8f5e9,stroke:#388e3c
  style NEXT fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
```

Each layer unblocks the next. Without the floor, the ceiling is aspirational.
Where we are today
```mermaid
pie title 78 tracked capabilities · updated 2026-04-21
  "Deployed" : 17
  "Partial" : 7
  "Planned" : 32
  "Deferred" : 9
  "Rejected" : 13
```

Source: rig-gitops/docs/whitepaper/implementation-status.md, updated every merge. (On 2026-04-21, Priority 1 shipping added +3 Deployed and +1 Partial: dangerous-command-guard, worktrees per task, GuardBlocked events, and partial GitHub-App-1h-tokens.)
What works today. rig-conductor event store (Marten/Postgres, 28 event types, projections live). Valkey per-agent streams + KEDA autoscaling. rig-dev / rig-reviewer / rig-macos runtimes deployed. Memory MCP with pgvector + HNSW. Brain compiled from facts/*.yaml on every merge with CI drift checks. Cost attribution via TokenUsageProjection. SOPS + age + Flux inline decryption (the security foundation). Three autonomous merges on 2026-04-19 in a 70-minute window. Priority 1 safety floor 3.5 of 5 complete as of 2026-04-21: PreToolUse dangerous-command guard active on all agent pods, per-task git worktrees, GuardBlocked events flowing to rig-conductor, GitHub App 1h-token hardening (no PAT fallback on mint failure).
What’s still missing. Default-deny egress policy (AC 5 — the heaviest). All of Priority 2 observability (parked on destination pick + startup-credit decision). Hard cost cap. Regression gate. These are all planned with concrete tickets. “Raising the floor” is one AC from done.
Priority 1 · Safety foundation
```mermaid
flowchart TB
  subgraph P0["🛡 Phase 0: deterministic guards"]
    G1["✅ Dangerous-command guard<br/>PreToolUse hook · shipped"]
    G2["✅ Git worktrees per task<br/>bare + worktree · shipped"]
    G3["🟡 Default-deny egress<br/>phased plan scoped · Phase 1 YAML pending"]
    G4["✅ GitHub App tokens · 1h TTL<br/>no PAT fallback · shipped"]
  end
  A["🤖 Agent tool call"] --> G1
  G1 -->|allow| G2
  G1 -->|block + event| GB["GuardBlocked<br/>→ metrics dashboard"]
  G2 --> G3
  G3 -->|allow host| NET((internet))
  G3 -->|deny| DROP["🚫"]
  style P0 fill:#fff3e0,stroke:#ef6c00
  style GB fill:#ffebee,stroke:#c62828
  style DROP fill:#ffebee,stroke:#c62828
```

Four independent guards that sit between agent reasoning and tool execution: deterministic, no LLM in the loop.
| Guard | What it stops | Cost to build |
|---|---|---|
| Dangerous-command guard | `sudo`, `rm -rf /`, `git push --force`, `drop table`, `chmod 777`, `curl \| sh`, unreviewed package installs | ~1 week. Pattern already specified in safety.md, with Gastown’s tap_guard_dangerous as reference. No override flag. |
| Git worktrees per task | One agent’s mistake reaching another agent’s workspace | ~1 week. Cursor 2026 pattern, well-trodden. |
| Default-deny egress | Data exfiltration via prompt injection | ~2 weeks. Needs Cilium L7; the highest-ROI prompt-injection defense. |
| GitHub App installation tokens (1h TTL) | Replay of a leaked long-lived PAT | ~3 days. Replaces the classic PAT in agent pods. |
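The default-deny decision in the table above can be modeled in a few lines. This is a minimal sketch of the allow/deny logic only, not the Cilium policy itself: the host list and the `egress_allowed` helper are hypothetical, and real enforcement would happen in the network layer rather than in agent code.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical allowlist for illustration; the real policy is the pending Phase 1 YAML.
ALLOWED_HOSTS = ("api.github.com", "*.githubusercontent.com", "api.anthropic.com")


def egress_allowed(host: str, allowlist: Iterable[str] = ALLOWED_HOSTS) -> bool:
    """Default-deny: a host is reachable only if it matches an allowlist entry."""
    normalized = host.strip().lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern) for pattern in allowlist)
```

Anything not explicitly listed is dropped, so an injected prompt that asks the agent to post data to an unknown host fails without the agent having to recognize the attack.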
Why now
No override flag is the non-obvious choice. “Add --confirm-dangerous and it works” becomes a learned pattern in any agent’s training data. The escape hatch is the human running the command manually outside the agent loop; that’s working as intended, not a gap. Every blocked call emits a GuardBlocked event to rig-conductor, so block counts become visible signal: a spike means a prompt-injection attempt or an agent bug worth looking at.
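A deterministic guard of this kind can be sketched as a fixed pattern list with no override path. The patterns, the `check_command` function, and the event fields below are illustrative assumptions, not the shipped PreToolUse hook or the pattern set in safety.md; unreviewed package installs are left out because they need policy beyond a regex.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative deny patterns based on the guard table; deliberately conservative.
DENY_PATTERNS = {
    "sudo": re.compile(r"\bsudo\b"),
    "rm-root": re.compile(r"\brm\s+(-[\w-]+\s+)*/(\s|$)"),
    "force-push": re.compile(r"\bgit\s+push\b.*\s(--force|-f)\b"),
    "drop-table": re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    "chmod-777": re.compile(r"\bchmod\s+(-\w+\s+)*0?777\b"),
    "curl-pipe-sh": re.compile(r"\bcurl\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b"),
}


@dataclass(frozen=True)
class GuardDecision:
    allowed: bool
    rule: Optional[str] = None  # name of the rule that blocked the call, if any


def check_command(command: str) -> GuardDecision:
    """Return a block decision on the first matching rule; there is no override flag."""
    for rule, pattern in DENY_PATTERNS.items():
        if pattern.search(command):
            return GuardDecision(allowed=False, rule=rule)
    return GuardDecision(allowed=True)


def guard_blocked_event(agent: str, command: str, decision: GuardDecision) -> dict:
    """Hypothetical payload shape for the GuardBlocked event."""
    return {"type": "GuardBlocked", "agent": agent, "rule": decision.rule, "command": command}
```

Because the function takes no flag or confirmation argument, the only way past a block is a human running the command outside the agent loop, which matches the intent described above.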
What it unblocks
Every higher-trust tier. The trust model’s T2 and T3 gates depend on “the agent can’t do the unrecoverable thing without a human.” That’s what this priority buys.
Evidence base: safety.md pillars 1–2, implementation-status.md → Safety domain (8 capabilities; today 0 deployed).
Priority 2 · Agent observability
```mermaid
flowchart LR
  A1["rig-dev pod<br/>CLAUDE_CODE_ENABLE_TELEMETRY=1"] -->|OTLP| OC[OTel Collector]
  A2["rig-reviewer pod"] -->|OTLP| OC
  A3["rig-macos pod"] -->|OTLP| OC
  OC -->|LLM traces| LF["🔭 Langfuse Cloud<br/>startup 50% off<br/>LLM-specific UX"]
  OC -->|infra + trace waterfalls| GC2["📊 Grafana Cloud<br/>$100k startup credit · 12 mo<br/>LGTM stack"]
  DEV["🧑 Dev inner loop"] -->|OTLP localhost| PX["🔥 Phoenix<br/>docker compose<br/>instant feedback"]
  OC -->|infra metrics| GC["Grafana Cloud"]
  LF --> UI1["Quality · cost · per-task UI"]
  GC --> UI2["SLO · error budget"]
  style OC fill:#fff3e0,stroke:#ef6c00
  style LF fill:#e3f2fd,stroke:#1976d2
  style GC fill:#e3f2fd,stroke:#1976d2
  style UI1 fill:#e8f5e9,stroke:#388e3c
  style UI2 fill:#e8f5e9,stroke:#388e3c
```

Agents emit OpenTelemetry GenAI spans for every LLM call. Collector already runs per cluster for rig-conductor. Flip the env var, ship the spans to a trace store.
The small step first
Set CLAUDE_CODE_ENABLE_TELEMETRY=1 in agent pods. That single line turns on native OTel emission with GenAI semantic conventions. No code change. One helmrelease.yaml edit per agent.
Then:
- Apply for startup credits (Invotek AS qualifies on the plain criteria):
  - Grafana Cloud for Startups: $100k / 12 mo; covers Tempo traces + Loki logs + Mimir metrics + Enterprise plugins. At our 1.5M–15M spans/mo workload, list price is $22–$47/mo, so the credit gives effectively unlimited runway.
  - Langfuse early-stage discount: 50% off first year on Core/Pro; keeps LLM-specific UX (prompt diff, eval scoring, datasets) affordable.
  - Both approvals take 1–2 weeks; free tiers cover the gap.
- Wire OTel Collector dual-export: `gen_ai.*` spans → Langfuse Cloud (Hobby free until discount lands); infra + full OTel → Grafana Cloud (free 50 GB / 14-day until credit lands). One Collector config change.
- Phoenix stays for the dev inner loop: engineers run a local `docker compose` Phoenix while iterating on prompts or agent code. Local latency matters; no network hop during the tight inner loop.
- Fallback path, documented: if the Grafana credit is denied, drop back to OpenObserve self-hosted on the rig k3s cluster (~$30/mo flat, zero lock-in, S3/GCS-backed). If the Langfuse discount is denied, stay on Hobby free tier until volume forces a decision.
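The dual-export rule in the list above amounts to a small routing decision per span. The sketch below models that decision in Python purely for illustration; the real change would be Collector configuration, and the `export_targets` function and target names are hypothetical.

```python
from typing import Dict, List


def export_targets(span_attributes: Dict[str, object]) -> List[str]:
    """Model of the dual-export rule: everything goes to Grafana Cloud,
    and spans carrying gen_ai.* attributes are also sent to Langfuse Cloud."""
    targets = ["grafana-cloud"]  # infra + full OTel always lands here
    if any(key.startswith("gen_ai.") for key in span_attributes):
        targets.append("langfuse-cloud")  # LLM-specific UX only needs GenAI spans
    return targets
```

For example, a span with a `gen_ai.request.model` attribute would be exported to both destinations, while a plain HTTP span would reach Grafana Cloud only.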
The second-pass research — startup programs + storage economics (2026-04-21) — supersedes the original options doc on pricing; the structural comparison of 11 candidates in the earlier research still stands.
Why now
Right now we know the rig works because three merges landed cleanly on 2026-04-19. We don’t know why a bad run is bad. Every other priority on this list depends on being able to distinguish a healthy agent from an unhealthy one at a glance: cost attribution, quality regression, drift detection, tier promotion, self-healing loops, all of it.
What it unblocks
Priority 3 (cost dashboards need trace data), Priority 4 (regression metrics need baselines), and every principle that contains the word “measure”.
Evidence base: observability.md TL;DR + implementation-status → Observability domain (7 capabilities; today 1 deployed, 2 partial).
Priority 3 · Hard cost ceiling
```mermaid
flowchart LR
  classDef off fill:#ffebee,stroke:#c62828,color:#000
  classDef on fill:#e8f5e9,stroke:#388e3c,color:#000
  classDef work fill:#fff3e0,stroke:#ef6c00,color:#000
  L1["1 · Pre-flight prediction<br/>cheap model<br/>abort if over budget"]:::off
  L2["2 · Dispatch token-bucket<br/>rig-conductor<br/>circuit breaker"]:::work
  L3["3 · LiteLLM proxy<br/>per-agent virtual keys<br/>HARD 429 CEILING"]:::off
  L4["4 · Langfuse attribution<br/>post-hoc per-task cost"]:::off
  L1 --> L2 --> L3 --> LLM["LLM provider"]
  LLM --> L4
  L4 -.->|weekly review| L2
```

Legend: 🔴 not built · 🟡 partial · 🟢 deployed.
Four layers of cost control from pre-flight estimation to post-hoc attribution. Today: only the cheapest, lowest-guarantee layer (TokenUsageProjection) exists. The one that matters — Layer 3, proxy-level hard ceiling — is unbuilt.
The blocking piece
LiteLLM proxy with per-agent virtual keys. No override, no trust-based limiter. Returns 429 before the request reaches the LLM provider. A compromised or looping agent cannot exceed its budget because the call fails at the proxy, not at the agent.
Honest caveat from the cost-framework whitepaper: LiteLLM issue #12905 shows user-level budgets are not enforced inside team configs. Treat the proxy as the primary defense, not an absolute one. Every LiteLLM upgrade needs a synthetic budget-overrun test to verify 429 still fires as configured.
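The synthetic budget-overrun test can be reasoned about with a toy model of the proxy. The `BudgetProxy` class below is a hypothetical stand-in written for illustration; it is not LiteLLM's API, and it assumes the cost of each call is known up front, which a real proxy may only learn after the response.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BudgetProxy:
    """Toy budget-enforcing proxy: one spending cap (in cents) per virtual key."""
    caps_cents: Dict[str, int]
    spent_cents: Dict[str, int] = field(default_factory=dict)
    provider_calls: int = 0  # how many requests actually reached the provider

    def request(self, key: str, cost_cents: int) -> int:
        if key not in self.caps_cents:
            return 401  # unknown virtual key
        spent = self.spent_cents.get(key, 0)
        if spent + cost_cents > self.caps_cents[key]:
            return 429  # rejected at the proxy; the provider never sees the call
        self.provider_calls += 1
        self.spent_cents[key] = spent + cost_cents
        return 200


def synthetic_overrun(proxy: BudgetProxy, key: str, cost_cents: int, attempts: int) -> List[int]:
    """Deliberately push a key past its cap and record every status code."""
    return [proxy.request(key, cost_cents) for _ in range(attempts)]
```

The property worth asserting after every proxy upgrade is that once the cap is hit, the status is 429 and the provider call counter stops increasing.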
Why now
One looping agent on a shared provider burns the hourly rate-limit budget for every other agent. Today that’s a trust-based hope (“don’t let agents loop”), not an enforced guarantee. Phase 2 work (prompt caching, pre-flight prediction, circuit breakers) all depends on having the proxy to route through.
What it unblocks
Multi-tenant operation with confidence. A newly onboarded project can be given a virtual key with a hard dollar cap without any per-tenant engineering. “Stage 1 tenant, $50/day hard cap” becomes a LiteLLM config row, not a feature request.
Evidence base: cost-framework.md four layers, implementation-status.md → Cost framework domain (7 capabilities; today 1 deployed, 1 partial).
Priority 4 · Nightly quality gate
```mermaid
flowchart LR
  N["🌙 Nightly<br/>~$3–8/run"] --> G[Golden suite<br/>10 internal tasks]
  N --> S[SWE-bench Pro<br/>weekly subset<br/>$20–40/week]
  N --> R[Regression cases<br/>one per past incident]
  G & S & R --> EVAL{Regression<br/>over 10%?}
  EVAL -->|yes| BLOCK["🚫 Fails pipeline<br/>alert + merge block"]
  EVAL -->|no| OK["✅ Green · continue"]
  EVAL -.->|all results| LF["Langfuse trends"]
  style N fill:#fff9c4
  style BLOCK fill:#ffebee,stroke:#c62828
  style OK fill:#e8f5e9,stroke:#388e3c
  style LF fill:#e3f2fd,stroke:#1976d2
```

A nightly harness runs the rig against a fixed set of tasks and fails the pipeline if any metric regresses more than 10%. Cost estimate from quality-and-evaluation.md: ~$3–8/night, ~$1.1–2.9k/year. Runs in 30–60 minutes. Catches actual regressions in tasks we care about, not synthetic leaderboard tasks.
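The 10% rule can be sketched as a comparison of tonight's metrics against a stored baseline. This is a minimal sketch under stated assumptions: the threshold is read as a relative change, the metric names and the `regressions` helper are hypothetical, and the pass rate in the usage note is an invented example value.

```python
from typing import Dict, List

# Metrics where a larger value is worse; anything else is treated as higher-is-better.
LOWER_IS_BETTER = {"cost_usd", "duration_min"}


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.10) -> List[str]:
    """Return the metrics that got worse by more than `tolerance` (relative)."""
    failed = []
    for name, base in baseline.items():
        if name not in current:
            failed.append(name)  # a missing metric counts as a failure
            continue
        if base == 0:
            continue  # relative change is undefined; skipped in this sketch
        change = (current[name] - base) / base
        worse_by = change if name in LOWER_IS_BETTER else -change
        if worse_by > tolerance:
            failed.append(name)
    return failed


def gate_passes(baseline: Dict[str, float], current: Dict[str, float]) -> bool:
    """The pipeline stays green only when no metric regressed past the tolerance."""
    return not regressions(baseline, current)
```

With a baseline of 20 minutes and $0.62 per task, a night that comes in at 21 minutes and $0.75 would pass on duration (5% worse) but fail on cost (about 21% worse), so the merge block would fire.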
Why now
Every prompt change, dependency bump, and model upgrade today is a hope that nothing regressed. The brain-and-memory whitepaper claim (“measured today: 20 min issue→merge, $0.62/task”) is a snapshot, not an invariant. Without a nightly gate the numbers drift quietly.
What it unblocks
Autonomy tier promotion. The rig advances from T1 (suggest) to T2 (merge-with-approval) only when 20 successful runs land with zero rollbacks, and that’s measurable only if “successful run” has a fixed definition. The nightly suite is that definition.
Also: property-based tests (Hypothesis) on labeled changes, LLM-as-judge sampling (10% T0 / 100% T2), DORA metrics adapted to agents — all depend on the nightly pipeline existing as a scaffold to hang per-PR gates on.
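The T1 → T2 promotion rule described above reduces to a small predicate once “successful run” is fixed by the nightly suite. The sketch below is one possible reading, assuming the rule applies to a window of recorded runs rather than to consecutive runs; `RunResult` and `eligible_for_t2` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RunResult:
    succeeded: bool
    rolled_back: bool


def eligible_for_t2(runs: Sequence[RunResult], required_successes: int = 20) -> bool:
    """Promotion needs at least 20 successful runs and zero rollbacks in the window."""
    if any(run.rolled_back for run in runs):
        return False
    return sum(run.succeeded for run in runs) >= required_successes
```

Whether a failed run without a rollback should reset the count is a policy decision this sketch leaves open.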
Evidence base: quality-and-evaluation.md + implementation-status → Quality and evaluation domain (7 capabilities; today 0 deployed).
Roadmap
```mermaid
gantt
  title Floor-raising roadmap · next ~60 working days
  dateFormat YYYY-MM-DD
  section 1 · Safety floor
  Dangerous-command guard :active, s1, 2026-04-21, 5d
  Git worktrees per task :s2, after s1, 5d
  GitHub App tokens (1h TTL) :s3, after s1, 3d
  Default-deny egress + Cilium L7 :s4, after s2, 10d
  section 2 · Observability
  CLAUDE_CODE_ENABLE_TELEMETRY=1 :o1, after s3, 1d
  Apply Grafana + Langfuse startup credits :o2, after o1, 1d
  OTel Collector dual-export config :o3, after o2, 2d
  Phoenix local docker compose :o4, after o2, 1d
  Credits-granted vs fallback ADR :o5, after o3, 1d
  section 3 · Cost ceiling
  LiteLLM proxy deploy :c1, after o2, 5d
  Per-agent virtual keys + caps :c2, after c1, 3d
  Synthetic budget-overrun test :c3, after c2, 2d
  Pre-flight prediction (Haiku) :c4, after c3, 5d
  section 4 · Quality gate
  Golden suite · 10 tasks :q1, after c2, 5d
  Nightly harness + alert :q2, after q1, 5d
  Regression blocker in CI :q3, after q2, 3d
```

These are not hard-committed dates; they show shape and sequencing. Actual ticket landing will slip; that’s fine. What matters is the order: floor before ceiling, each layer unblocking the next.
What is explicitly NOT next
Honest deferrals, with the reason to defer:

```mermaid
flowchart TB
  subgraph NOT["🔭 Deliberately not now"]
    N1["Spec-E / Architect-E<br/>(new agents)"]
    N2["Reproduction harness<br/>(self-healing Stage 2)"]
    N3["Sigstore + SLSA L3 + Kyverno<br/>(supply-chain hardening)"]
    N4["flagd + OpenFeature<br/>(feature-flag platform)"]
    N5["Cross-provider fallback<br/>(LiteLLM fallback_models)"]
  end
  N1 -.->|"reason"| R1[Headline feature · needs<br/>floor first]
  N2 -.-> R2[Frontier work ·<br/>needs quality gate first]
  N3 -.-> R3[Phase 4 · right thing<br/>eventually, not this quarter]
  N4 -.-> R4[YAGNI · env vars +<br/>Kustomize cover today]
  N5 -.-> R5[Deferred · only meaningful<br/>with multi-provider config]
  style NOT fill:#f3e5f5,stroke:#7b1fa2
```

These are good ideas. They are not the next idea. Any of them built before the four floor layers above compounds technical debt in a system that doesn’t yet have the observability to tell when the debt has become a problem.
Success criteria
How we know this is done, ~60 working days out:
| Criterion | How measured | Status |
|---|---|---|
| Zero `sudo` / `rm -rf /` / `git push --force` calls land from agent pods | GuardBlocked event count with allow rate 100% on legitimate commands | ✅ Guard shipped + activated; event projection live |
| Tasks run in isolated workspaces with no cross-task leakage | /workspace/tasks/<task-id>/ per task, bare clone reused | ✅ Shipped |
| Agents use only short-lived (≤1h) GitHub credentials | getGitHubToken() returns App-minted token; no PAT fallback when App creds are configured | ✅ Shipped — PAT env var removed from dev-e + review-e pods (2026-04-21) |
| Every agent LLM call is traceable end-to-end in Langfuse Cloud + Grafana Cloud | Spot-check 10 random tasks; all visible with timing, tokens, cost | ⏳ Parked on credit-application + destination pick |
| Proxy enforcement verified by synthetic overrun | Dedicated test job deliberately exceeds a key’s daily cap; 429 fires before provider billed | ⏳ Priority 3 |
| Nightly run completes green 7 days in a row | Grafana dashboard green streak | ⏳ Priority 4 |
| Tier promotion unblocked — T1 → T2 policy engine has data to act on | 20+ successful nightly runs, zero rollbacks, quality metrics within tolerance | ⏳ Depends on Priorities 2–4 |
Then — and only then — the rig is ready for the next class of investment: headline agents (Spec-E, Architect-E), complex refactor capability, multi-runtime portability, reproduction-harness self-healing.
Tracked as user stories
Each priority has a dedicated user story with acceptance criteria, estimated effort, and a GitHub issue:
- Priority 1 — safety foundation · #57
- Priority 2 — agent observability · #58
- Priority 3 — hard cost ceiling · #59
- Priority 4 — nightly quality gate · #60
Further reading
- `/BRAIN.md`: current rig brain (canonical)
- `/whitepapers/2026-04-19-brain-and-memory`: the system this paper builds on
- OTel startup programs + storage economics: current Priority 2 vendor recommendation (Grafana Cloud + Langfuse via startup programs)
- OTel-native LLM observability options: structural comparison of 11 candidates; superseded on pricing
- `implementation-status.md`: 78-row capability matrix with ticket links
- Research: deep dives supporting these priorities
- Proposals: decisions already shipped
Naming note
Same as the brain-and-memory whitepaper: this document uses the target agent names (rig-conductor, rig-dev, rig-reviewer, rig-macos). The running deployment today still uses the original -E suffixes (Dev-E, Review-E, iBuild-E); both forms appear in infrastructure code, Discord channels, and event payloads during the transition.