User story: safety foundation — block the unrecoverable before higher-trust tiers

User story

As the rig operator

I want deterministic runtime guards between agent reasoning and tool execution — dangerous-command blocklist, per-task git worktrees, default-deny egress, short-lived GitHub tokens

So that a compromised or looping agent cannot do the unrecoverable thing (filesystem destruction, secret exfiltration, force-push to main, long-lived token replay) without a human in the loop, and the rig earns the right to advance autonomy tiers (T1 → T2 → T3).

Progress (as of 2026-04-22 evening)

AC	Status	Ship
1 · Dangerous-command guard	✅ Shipped	rig-agent-runtime #97 + #98
2 · `GuardBlocked` events + dashboard panel	✅ Shipped	rig-conductor #90, rig-agent-runtime #99, rig-conductor #99
3 · Git worktrees per task	✅ Shipped	rig-agent-runtime #101
4 · GitHub App 1h tokens	✅ Shipped	rig-agent-runtime #103 + rig-gitops #119 + #121
5 · Default-deny egress (Phase 1)	🟡 Pod-scoped DNS live on review-e (burn-in)	rig-agent-runtime #115, rig-gitops #161 + #162

Score: 4 shipped + 1 partial across 2026-04-20 → 2026-04-22.

Morning (2026-04-22): the AC 5 Phase 1 ipBlock attempt was shipped and reverted (api.anthropic.com is Cloudflare anycast, not the published Anthropic CIDR).

Afternoon: three spikes landed the redesign direction. LiteLLM was ruled out (error wrapping + OAuth incompatibility; bundled with Priority 3). The Envoy SNI gateway was verified end-to-end and shipped to the cluster (rig-gitops #153). Two integration attempts were reverted the same day: HTTPS_PROXY (the SNI inspector doesn’t speak CONNECT, #154 → #155) and a cluster-wide CoreDNS rewrite (it caught Flux’s own github.com fetch, #156 → #158).

Evening: the pod-scoped DNS path shipped. Chart 1.1.0 adds dnsPolicy / dnsConfig pass-through (rig-agent-runtime #115). A dedicated CoreDNS in the egress-gw namespace rewrites each allowlisted public host to the in-cluster Envoy egress gateway; the target is resolved via real kube-dns, so no Envoy ClusterIP is baked into config (rig-gitops #161 + fix #162 for the forward-plugin zone-parse trap). review-e is wired with dnsPolicy: None and dnsConfig.nameservers pointing at the dedicated CoreDNS (kube-dns kept as secondary).

Live verification from a manually scaled review-e pod: Discord gateway WSS, Anthropic, GitHub App token mint, MCP servers, and the Valkey stream were all reached through the new path; rig-conductor sees a fresh heartbeat. Next: 24h burn-in before the dev-e rollout and the default-deny NetworkPolicy that terminates Phase 1. The cluster-reality correction (rig-docs #95: k3s, not GKE) stands.

Context

See whats-next whitepaper §Priority 1 and the source safety.md whitepaper (pillars 1–2).

When this story was opened, the rig had zero runtime guards. Trust was prompt-level (“don’t do bad things”) plus branch protection after the fact. The implementation-status matrix listed 8 safety capabilities with 0 deployed. This is the highest-leverage first investment because every higher-trust tier depends on it: you cannot promote an agent to T2 (merge-with-approval) if T1 has no floor.

Acceptance criteria

✅ Dangerous-command guard — PreToolUse hook reads tool-call JSON on stdin, matches tool_input.command against a blocklist, exits 2 (block + reason) on match. Minimum blocklist: sudo, rm -rf / (not rm -rf ./), git push --force (without --force-with-lease), git reset --hard, git clean -f, drop table, drop database, truncate table, kubectl delete namespace, package installers, chmod 777, chmod -R 000, curl … | sh. No override flag. Escape hatch is the human running the command outside the agent loop. Shipped in rig-agent-runtime#97 (script + tests + CI) and #98 (activated by default via baked-in ~/.claude/settings.json). 43 test cases pass.
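A minimal sketch of how such a PreToolUse guard could look, for illustration only. The regexes cover a small subset of the blocklist above and are assumptions, not the rules shipped in rig-agent-runtime#97; the function names `check_command`, `evaluate`, and `main` are hypothetical.

```python
import json
import re
import sys
from typing import Optional, Tuple

# Illustrative subset of the blocklist. These patterns are assumptions made
# for this sketch, not the shipped rules.
BLOCKLIST = [
    (re.compile(r"(^|[;&|])\s*sudo\b"), "sudo is not allowed"),
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "rm -rf on the filesystem root"),
    (re.compile(r"\bgit\s+push\b(?!.*--force-with-lease).*--force\b"),
     "git push --force without --force-with-lease"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "git reset --hard"),
    (re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE), "destructive SQL"),
    (re.compile(r"\bchmod\s+777\b"), "chmod 777"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason for the first matching blocklist rule, or None."""
    for pattern, reason in BLOCKLIST:
        if pattern.search(command):
            return reason
    return None


def evaluate(payload: dict) -> Tuple[int, Optional[str]]:
    """Map a tool-call payload to (exit_code, reason); exit code 2 means block."""
    tool_input = payload.get("tool_input") or {}
    command = tool_input.get("command", "")
    if not isinstance(command, str):
        command = ""
    reason = check_command(command)
    return (2, reason) if reason else (0, None)


def main() -> int:
    """Read the tool-call JSON from stdin and report a block reason on stderr."""
    exit_code, reason = evaluate(json.load(sys.stdin))
    if reason:
        print(f"Blocked by dangerous-command guard: {reason}", file=sys.stderr)
    return exit_code
```

To use this as a hook script, the entry point would call `sys.exit(main())`. There is deliberately no override argument or environment variable in the sketch, mirroring the "no override flag" rule.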
✅ GuardBlocked event emission — every block emits a non-blocking event to rig-conductor; counts visible via GET /api/guard-blocked (optional agentId filter) and on the rig-conductor dashboard Safety panel (header stat + per-agent table with top reason, last command, last-blocked time). Shipped in rig-conductor#90 (event + projection + endpoint, 46/46 tests), rig-agent-runtime#99 (hook payload shape fix), and rig-conductor#99 (dashboard Safety panel).
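The per-agent Safety table can be thought of as a projection over the GuardBlocked event stream. The sketch below shows one way to compute count, top reason, last command, and last-blocked time per agent. The event fields (`agent_id`, `reason`, `command`, `blocked_at`) and the output keys are assumptions for illustration, not the rig-conductor schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class GuardBlocked:
    # Hypothetical event shape; the real payload may differ.
    agent_id: str
    reason: str
    command: str
    blocked_at: str  # ISO-8601 UTC in one fixed format, so string order is time order


def project(events: Iterable[GuardBlocked],
            agent_id: Optional[str] = None) -> Dict[str, dict]:
    """Fold events into one summary row per agent, optionally filtered by agent."""
    by_agent: Dict[str, List[GuardBlocked]] = defaultdict(list)
    for event in events:
        if agent_id is None or event.agent_id == agent_id:
            by_agent[event.agent_id].append(event)

    rows = {}
    for agent, agent_events in by_agent.items():
        latest = max(agent_events, key=lambda e: e.blocked_at)
        top_reason, _ = Counter(e.reason for e in agent_events).most_common(1)[0]
        rows[agent] = {
            "count": len(agent_events),
            "topReason": top_reason,
            "lastCommand": latest.command,
            "lastBlockedAt": latest.blocked_at,
        }
    return rows
```

Passing `agent_id` corresponds to the optional agentId filter on the endpoint; omitting it yields the full per-agent table, and summing the counts gives the header stat.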
✅ Git worktrees per agent task — each dispatched task runs in its own worktree under /workspace/tasks/<task-id>/<repo>/, backed by a shared bare clone at /workspace/_bare/<owner>/<repo>.git. One task’s workspace cannot reach another’s. Cursor 2026 pattern. Shipped in rig-agent-runtime#101 (task-workspace helper + 17 tests + CI, wired into the agent task prompt).
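The directory layout above can be sketched as a small path helper that also builds the `git worktree add` invocation. This is an illustration under assumptions: the `task/<task-id>` branch name, the `main` start point, and the name validation are hypothetical and not taken from the shipped task-workspace helper.

```python
import re
from pathlib import PurePosixPath
from typing import List

WORKSPACE = PurePosixPath("/workspace")
# Reject empty names, path separators, and leading dots (so ".." cannot escape).
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def _safe(name: str) -> str:
    """Validate a single path component before it is joined into a path."""
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe path component: {name!r}")
    return name


def bare_clone_path(owner: str, repo: str) -> PurePosixPath:
    """Shared bare clone: /workspace/_bare/<owner>/<repo>.git"""
    return WORKSPACE / "_bare" / _safe(owner) / f"{_safe(repo)}.git"


def worktree_path(task_id: str, repo: str) -> PurePosixPath:
    """Per-task checkout: /workspace/tasks/<task-id>/<repo>/"""
    return WORKSPACE / "tasks" / _safe(task_id) / _safe(repo)


def worktree_add_command(task_id: str, owner: str, repo: str,
                         base_ref: str = "main") -> List[str]:
    """Build the git command that creates the task's worktree on a new branch."""
    branch = f"task/{_safe(task_id)}"  # hypothetical branch naming
    return [
        "git", "--git-dir", str(bare_clone_path(owner, repo)),
        "worktree", "add", "-b", branch,
        str(worktree_path(task_id, repo)), base_ref,
    ]
```

Validating each component before joining is one way to keep a crafted task id from resolving into another task's directory. The returned list can be passed to `subprocess.run(..., check=True)` without a shell.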
✅ GitHub App installation tokens (1h TTL) — replaces the classic PAT in agent pods. Tokens minted per dispatch, expire in 60 minutes, never persisted to disk. Shipped across:
- rig-agent-runtime#103 — removes the PAT fallback when App-mint fails (fail loud, not silent).
- rig-gitops#119 — removes the GITHUB_PERSONAL_ACCESS_TOKEN env var from dev-e + review-e pods entirely.
- rig-gitops#121 — implementation-status matrix updated. The only remaining trace is the github-token key still present inside the SealedSecrets; pruning requires a re-seal and is deferred to the next rotation. Nothing in the running pods references it.
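The "fail loud, not silent" and "never persisted to disk" properties can be illustrated with a small in-memory holder. This is a sketch under assumptions: the `mint` callable stands in for the real GitHub App token exchange (not shown), and `TokenMintError`, `InstallationToken`, and `token_for_dispatch` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple


class TokenMintError(RuntimeError):
    """Raised when minting fails. There is deliberately no PAT fallback."""


@dataclass(frozen=True)
class InstallationToken:
    value: str = field(repr=False)  # keep the secret out of logs and tracebacks
    expires_at: float = 0.0         # Unix time reported by the mint step

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def token_for_dispatch(mint: Callable[[], Tuple[str, float]],
                       now: float) -> InstallationToken:
    """Mint a fresh token for one dispatch and hold it in memory only."""
    try:
        value, expires_at = mint()
    except Exception as exc:
        # Surface the failure instead of silently degrading to a long-lived token.
        raise TokenMintError("GitHub App token mint failed; no fallback") from exc
    if not value or expires_at <= now:
        raise TokenMintError("mint returned an empty or already-expired token")
    return InstallationToken(value=value, expires_at=expires_at)
```

Because the token lives only in the returned object, it disappears with the process; a caller would check `is_valid(time.time())` before each use and mint again on the next dispatch.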
🟡 Default-deny egress NetworkPolicy (Phase 1) — pod-scoped DNS path live on review-e; 24h burn-in then dev-e + NetworkPolicy. Five-attempt rollout story:
- Cluster reality correction (stands): the rig runs on k3s v1.34.6 on a GCE VM (not GKE as BRAIN.md had drifted to claim). BRAIN + research corrected in rig-docs #95; “GKE Dataplane V2 / FQDNNetworkPolicy” plan was always inapplicable.
- Attempt 1 — ipBlock allowlist (rig-gitops #137 → reverted #143 + #144, 2026-04-22 AM): plain k8s NetworkPolicy on dev-e + review-e allowing kube-dns, rig-conductor API (8080), valkey (6379), Anthropic 160.79.104.0/21, GitHub /meta CIDRs. Weekly refresh workflow (#139). Reverted because api.anthropic.com resolves to Cloudflare anycast (162.159.x.x), not Anthropic’s published origin CIDR. Side-gap: postgres 5432 was also blocked (rig-conductor rule only permitted 8080 + 6379). The split {ns}-github-egress policy was also removed: allow rules across policies are unioned, but any Egress policy that selects a pod makes that pod default-deny for all unmatched traffic.
- Afternoon spikes (2026-04-22 PM):
  - LiteLLM spike #1 + #2 rule out a quick LiteLLM drop-in: error responses wrapped (breaks Claude Code retry), and the rig’s OAuth subscription tokens aren’t compatible with LiteLLM’s x-api-key forward path. LiteLLM route bundled with Priority 3 instead.
  - Envoy SNI egress gateway spike verified hostname allowlisting via SNI works end-to-end on the k3s cluster.
- Attempt 2 — Envoy gateway standalone (rig-gitops #153, still live): pod healthy, idle. Gateway works in isolation; no agents routed through it yet.
- Attempt 3 — HTTPS_PROXY env var on review-e (rig-gitops #154 → reverted #155): the SNI-inspector listener doesn’t speak HTTP CONNECT; all HTTPS_PROXY requests failed at the CONNECT step.
- Attempt 4 — Cluster-wide CoreDNS rewrite (rig-gitops #156 → reverted #158): rewriting github.com caught Flux’s own source-controller in the loop; emergency delete of the rewrite ConfigMap restored Flux.
- Attempt 5 — Pod-scoped DNS (rig-agent-runtime #115 chart 1.1.0 + rig-gitops #161 + fix #162, 2026-04-22 evening — LIVE for review-e):
  - dashecorp/rig-agent-runtime chart 1.1.0 adds .Values.dnsPolicy + .Values.dnsConfig pass-through on every pod spec (single-mode StatefulSet, split-mode gateway/worker Deployments, CronJob). Defaults empty; existing releases render byte-identical.
  - Dedicated CoreDNS in the egress-gw namespace (2 replicas) rewrites each allowlisted public host to the in-cluster Envoy egress gateway — target resolved via real kube-dns, so no Envoy ClusterIP is baked into config (Service recreation stays survivable).
  - forward plugin parse trap: multi-zone form forward cluster.local in-addr.arpa ip6.arpa 10.43.0.10 is rejected silently; must split one zone per directive. Fix in #162.
  - review-e wired with dnsPolicy: None + dnsConfig.nameservers: [10.43.200.53, 10.43.0.10]. kube-dns kept as secondary for availability fallback.
  - Live verification (manually scaled review-e pod with KEDA paused): /etc/resolv.conf correct; Discord gateway WSS connected; Anthropic reachable; GitHub App token minted; 3 MCP servers connected; Valkey stream consumer attached to rig-conductor. Fresh heartbeat visible in /api/agents. Pod scaled back to 0; KEDA unpaused.
  - Allowlist-drift trap: the egress-dns Corefile rewrite list and the Envoy SNI filter_chain_match.server_names list are two places holding the same allowlist. If they drift, the pod’s DNS lookup succeeds but Envoy resets the TLS connection. A CI check is the next safety follow-up.
- Pending (gated on 24h review-e burn-in, ~2026-04-23 evening):
  - Apply the same dnsPolicy / dnsConfig to dev-e (node + dotnet + python) — values-only PR, no chart work.
  - Default-deny egress NetworkPolicy: pod-selector based (not ipBlock), allowlisting kube-dns (10.43.0.10:53), egress-dns (10.43.200.53:53), Envoy (10.43.79.56:443 + :8443), and rig-conductor pods on 8080, 6379, AND 5432 (Postgres, the gap found in Attempt 1).
- Parallel prompt fix (shipped, stands): stream-consumer.js:226 rewritten in rig-agent-runtime#110 — no more dead sudo apt-get advice.
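The allowlist-drift risk noted under Attempt 5 (Corefile rewrite list versus Envoy `server_names`) lends itself to a simple CI comparison. The sketch below is one possible shape, not the planned check. It assumes rewrite rules of the form `rewrite name [exact] <host> <target>` and a block-style YAML `server_names:` list, and it uses plain text matching rather than a YAML parser, so it would need adapting to the real files. All function names are hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumed rule shape: "rewrite name [exact] <host> <target>" on one line.
REWRITE_RE = re.compile(
    r"^[ \t]*rewrite[ \t]+name[ \t]+(?:exact[ \t]+)?(\S+)[ \t]+\S+[ \t]*$",
    re.MULTILINE,
)


def corefile_hosts(corefile: str) -> Set[str]:
    """Collect the hostnames that the Corefile rewrites to the gateway."""
    return {m.group(1).rstrip(".").lower() for m in REWRITE_RE.finditer(corefile)}


def envoy_server_names(envoy_yaml: str) -> Set[str]:
    """Collect entries of every block-style 'server_names:' list."""
    hosts: Set[str] = set()
    list_indent = None  # indent of the current server_names key, if inside a list
    for line in envoy_yaml.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "server_names:":
            list_indent = indent
        elif list_indent is not None and stripped.startswith("- ") and indent >= list_indent:
            hosts.add(stripped[2:].strip().strip("\"'").lower())
        else:
            list_indent = None  # any other line ends the list
    return hosts


def allowlist_drift(corefile: str, envoy_yaml: str) -> Dict[str, List[str]]:
    """Hosts present on one side only; both lists empty means no drift."""
    dns, sni = corefile_hosts(corefile), envoy_server_names(envoy_yaml)
    return {"dns_only": sorted(dns - sni), "envoy_only": sorted(sni - dns)}
```

A `dns_only` entry corresponds to the failure mode described above: the pod resolves the host to the gateway, but Envoy has no matching filter chain and resets the connection. A CI job could fail whenever either list is non-empty.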

What it unblocks

T1 → T2 tier promotion. Per trust-model.md, T2 is “agent merges with approval; no prod deploy creds.” That policy is meaningless if the agent can rm -rf its way around approval. AC 1–3 are what make T2 real.
Priority 2 observability can be wired to the GuardBlocked event stream as an early signal.
Priority 3 cost ceiling — the egress policy (AC 5) is the chokepoint through which the LiteLLM proxy is made mandatory (if the only allowed LLM egress is the proxy, no agent can bypass it).

Out of scope

Kyverno admission policies (Phase 4 per implementation-status)
Sigstore + cosign + SLSA L3 attestation (Phase 4)
CaMeL trust separation (Phase 6; only prompt-injection defense with a formal guarantee)
Schema-validated tool use via Pydantic/Instructor (continuous, not phase-gated)

Priority

High. Prerequisite for Priorities 2–4. No higher-trust autonomy tier is honest without it.

Estimated effort

AC 1 (dangerous-command guard): ~1 week. Pattern specified in safety.md; reference implementation Gastown’s tap_guard_dangerous.
AC 2 (GuardBlocked events): ~1 day. New event type + projection + dashboard panel.
AC 3 (worktrees per task): ~1 week. Well-trodden Cursor 2026 pattern.
AC 4 (GitHub App 1h tokens): ~3 days. Replaces classic PAT; installation-token mint loop in the agent startup.
AC 5 (default-deny egress): Phase 1 pod-scoped DNS live on review-e as of 2026-04-22 evening after three reverted approaches (ipBlock, HTTPS_PROXY, cluster-wide CoreDNS rewrite); 24h burn-in → dev-e rollout → default-deny NetworkPolicy terminates Phase 1. LiteLLM-based cost/model centralisation bundled with Priority 3.

Total: ~5 weeks of focused work, parallelisable across 2 engineers.

Adjacent ships (context)

Work that landed alongside the AC deliverables but isn’t a formal AC:

rig-agent-runtime#110 — rewrote the ## Runtime installs block in stream-consumer.js to match guard reality. The old prompt advised sudo + apt-get install, both blocked by the AC 1 guard, so agents following their own guidance hit GuardBlocked and got stuck. Prep for AC 5 as well (primes agents for a future egress policy that denies arbitrary hosts). Surfaced by the agent runtime-install audit research.
dashecorp/infra#112 — declarative per-repo provisioning of RIG_BOT_PAT via Terraform (needs_rig_bot_pat = true in github/dashecorp/variables.tf:repos). Not part of this user story, but the 2026-04-20 CI resuscitation that unblocked rig-conductor’s publish-image → PR-based update-gitops flow depended on a manually-created PAT secret; #112 makes that pattern reproducible so the next dashecorp repo that needs PR-on-main doesn’t rediscover the trap. See dashecorp/infra/BOOTSTRAP.md.