
User story: safety foundation — block the unrecoverable before higher-trust tiers

As the rig operator

I want deterministic runtime guards between agent reasoning and tool execution — dangerous-command blocklist, per-task git worktrees, default-deny egress, short-lived GitHub tokens

So that a compromised or looping agent cannot do the unrecoverable thing (filesystem destruction, secret exfiltration, force-push to main, long-lived token replay) without a human in the loop, and the rig earns the right to advance autonomy tiers (T1 → T2 → T3).

AC | Status | Ship
1 · Dangerous-command guard | ✅ Shipped | rig-agent-runtime #97 + #98
2 · GuardBlocked events + dashboard panel | ✅ Shipped | rig-conductor #90, rig-agent-runtime #99, rig-conductor #99
3 · Git worktrees per task | ✅ Shipped | rig-agent-runtime #101
4 · GitHub App 1h tokens | ✅ Shipped | rig-agent-runtime #103 + rig-gitops #119 + #121
5 · Default-deny egress (Phase 1) | 🟡 Pod-scoped DNS live on review-e (burn-in) | rig-agent-runtime #115, rig-gitops #161 + #162

Score: 4 shipped + 1 partial across 2026-04-20 → 2026-04-22.

Morning (2026-04-22): the AC 5 Phase 1 ipBlock attempt was shipped and reverted, because api.anthropic.com is Cloudflare anycast, not the published Anthropic CIDR.

Afternoon: three spikes landed the redesign direction. LiteLLM was ruled out (error wrapping + OAuth incompatibility; bundled with Priority 3). The Envoy SNI gateway was verified end-to-end and shipped to the cluster (rig-gitops #153). Two integration attempts were reverted the same day: HTTPS_PROXY (the SNI inspector doesn’t speak CONNECT, #154 → #155) and a cluster-wide CoreDNS rewrite (it caught Flux’s own github.com fetch, #156 → #158).

Evening: the pod-scoped DNS path shipped. Chart 1.1.0 adds dnsPolicy / dnsConfig pass-through (rig-agent-runtime #115). A dedicated CoreDNS in the egress-gw namespace rewrites each allowlisted public host to the in-cluster Envoy egress gateway; the target is resolved via real kube-dns, so no Envoy ClusterIP is baked into config (rig-gitops #161 + fix #162 for the forward-plugin zone-parse trap). review-e is wired with dnsPolicy: None + dnsConfig.nameservers pointing at the dedicated CoreDNS (kube-dns kept as secondary). Live verification from a manually scaled review-e pod: Discord gateway WSS + Anthropic + GitHub App token mint + MCP servers + Valkey stream all reached through the new path; rig-conductor sees a fresh heartbeat.

Next: 24h burn-in before the dev-e rollout and the default-deny NetworkPolicy that terminates Phase 1. The cluster-reality correction (rig-docs #95: k3s, not GKE) stands.
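The allowlist for this DNS path lives in two places, the egress-dns Corefile rewrite list and Envoy’s filter_chain_match.server_names, and the AC 5 notes flag drift between them as a trap, with a CI check as the follow-up. A minimal sketch of what such a check might look like follows. It assumes the Corefile uses `rewrite name exact HOST TARGET` lines and that the Envoy server names have already been extracted into a list. The function names, the sample hosts other than api.anthropic.com, and the envoy-egress Service name are hypothetical, not taken from the repos.

```python
import re

# Assumption: allowlisted hosts appear as exact-match rewrite lines.
# Other rewrite forms (suffix, regex, ...) are not handled by this sketch.
REWRITE_RE = re.compile(r"^\s*rewrite\s+name\s+(?:exact\s+)?(\S+)\s+\S+", re.MULTILINE)


def corefile_hosts(corefile_text):
    """Return the set of hostnames rewritten by the Corefile."""
    return {m.group(1).rstrip(".").lower() for m in REWRITE_RE.finditer(corefile_text)}


def allowlist_drift(corefile_text, envoy_server_names):
    """Compare the DNS rewrite list with the Envoy SNI allowlist."""
    dns = corefile_hosts(corefile_text)
    sni = {name.rstrip(".").lower() for name in envoy_server_names}
    return {
        # Resolves via egress-dns, but Envoy would reset the TLS connection.
        "dns_only": sorted(dns - sni),
        # Allowed by Envoy, but the pod would never be routed to the gateway.
        "sni_only": sorted(sni - dns),
    }


# Illustrative inputs only; the real Corefile and Envoy config may differ.
COREFILE = """\
.:53 {
    rewrite name exact api.anthropic.com envoy-egress.egress-gw.svc.cluster.local
    rewrite name exact api.github.com envoy-egress.egress-gw.svc.cluster.local
    rewrite name exact gateway.discord.gg envoy-egress.egress-gw.svc.cluster.local
    forward . 10.43.0.10
}
"""
SERVER_NAMES = ["api.anthropic.com", "api.github.com"]

drift = allowlist_drift(COREFILE, SERVER_NAMES)
print(drift)  # {'dns_only': ['gateway.discord.gg'], 'sni_only': []}
```

A CI job could fail the build whenever either list in the result is non-empty.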

See whats-next whitepaper §Priority 1 and the source safety.md whitepaper (pillars 1–2).

When this story was opened, the rig had zero runtime guards. Trust was prompt-level (“don’t do bad things”) plus branch protection after the fact. The implementation-status matrix listed 8 safety capabilities with 0 deployed. This was the highest-leverage first investment because every higher-trust tier depends on it: you cannot promote an agent to T2 (merge-with-approval) if T1 has no floor.

  1. Dangerous-command guard — PreToolUse hook reads tool-call JSON on stdin, matches tool_input.command against a blocklist, exits 2 (block + reason) on match. Minimum blocklist: sudo, rm -rf / (not rm -rf ./), git push --force (without --force-with-lease), git reset --hard, git clean -f, drop table, drop database, truncate table, kubectl delete namespace, package installers, chmod 777, chmod -R 000, curl … | sh. No override flag. Escape hatch is the human running the command outside the agent loop. Shipped in rig-agent-runtime#97 (script + tests + CI) and #98 (activated by default via baked-in ~/.claude/settings.json). 43 test cases pass.
  2. GuardBlocked event emission — every block emits a non-blocking event to rig-conductor; counts visible via GET /api/guard-blocked (optional agentId filter) and on the rig-conductor dashboard Safety panel (header stat + per-agent table with top reason, last command, last-blocked time). Shipped in rig-conductor#90 (event + projection + endpoint, 46/46 tests), rig-agent-runtime#99 (hook payload shape fix), and rig-conductor#99 (dashboard Safety panel).
  3. Git worktrees per agent task — each dispatched task runs in its own worktree under /workspace/tasks/<task-id>/<repo>/, backed by a shared bare clone at /workspace/_bare/<owner>/<repo>.git. One task’s workspace cannot reach another’s. Cursor 2026 pattern. Shipped in rig-agent-runtime#101 (task-workspace helper + 17 tests + CI, wired into the agent task prompt).
  4. GitHub App installation tokens (1h TTL) — replaces the classic PAT in agent pods. Tokens minted per dispatch, expire in 60 minutes, never persisted to disk. Shipped across:
    • rig-agent-runtime#103 — removes the PAT fallback when App-mint fails (fail loud, not silent).
    • rig-gitops#119 — removes the GITHUB_PERSONAL_ACCESS_TOKEN env var from dev-e + review-e pods entirely.
    • rig-gitops#121 — implementation-status matrix updated. The only remaining trace is the github-token key still present inside the SealedSecrets; pruning requires a re-seal and is deferred to the next rotation. Nothing in the running pods references it.
  5. 🟡 Default-deny egress NetworkPolicy (Phase 1) — pod-scoped DNS path live on review-e; 24h burn-in then dev-e + NetworkPolicy. Five-attempt rollout story:
    • Cluster reality correction (stands): the rig runs on k3s v1.34.6 on a GCE VM (not GKE as BRAIN.md had drifted to claim). BRAIN + research corrected in rig-docs #95; “GKE Dataplane V2 / FQDNNetworkPolicy” plan was always inapplicable.
    • Attempt 1, ipBlock allowlist (rig-gitops #137 → reverted #143 + #144, 2026-04-22 AM): plain k8s NetworkPolicy on dev-e + review-e allowing kube-dns, rig-conductor API (8080), valkey (6379), Anthropic 160.79.104.0/21, GitHub /meta CIDRs. Weekly refresh workflow (#139). Reverted because api.anthropic.com resolves to Cloudflare anycast (162.159.x.x), not Anthropic’s published origin CIDR. Side-gap: postgres 5432 was also blocked (rig-conductor rule only permitted 8080 + 6379). The split {ns}-github-egress policy was also removed: a k8s NetworkPolicy cannot act as a standalone “extra allow”, because as soon as any Egress policy selects a pod, all egress not explicitly allowed by some policy is denied.
    • Afternoon spikes (2026-04-22 PM):
      • LiteLLM spike #1 + #2 rule out a quick LiteLLM drop-in: error responses wrapped (breaks Claude Code retry), and the rig’s OAuth subscription tokens aren’t compatible with LiteLLM’s x-api-key forward path. LiteLLM route bundled with Priority 3 instead.
      • Envoy SNI egress gateway spike verified hostname allowlisting via SNI works end-to-end on the k3s cluster.
    • Attempt 2 — Envoy gateway standalone (rig-gitops #153, still live): pod healthy, idle. Gateway works in isolation; no agents routed through it yet.
    • Attempt 3 — HTTPS_PROXY env var on review-e (rig-gitops #154 → reverted #155): the SNI-inspector listener doesn’t speak HTTP CONNECT; all HTTPS_PROXY requests failed at the CONNECT step.
    • Attempt 4 — Cluster-wide CoreDNS rewrite (rig-gitops #156 → reverted #158): rewriting github.com caught Flux’s own source-controller in the loop; emergency delete of the rewrite ConfigMap restored Flux.
    • Attempt 5 — Pod-scoped DNS (rig-agent-runtime #115 chart 1.1.0 + rig-gitops #161 + fix #162, 2026-04-22 evening — LIVE for review-e):
      • dashecorp/rig-agent-runtime chart 1.1.0 adds .Values.dnsPolicy + .Values.dnsConfig pass-through on every pod spec (single-mode StatefulSet, split-mode gateway/worker Deployments, CronJob). Defaults empty; existing releases render byte-identical.
      • Dedicated CoreDNS in the egress-gw namespace (2 replicas) rewrites each allowlisted public host to the in-cluster Envoy egress gateway — target resolved via real kube-dns, so no Envoy ClusterIP is baked into config (Service recreation stays survivable).
      • forward plugin parse trap: multi-zone form forward cluster.local in-addr.arpa ip6.arpa 10.43.0.10 is rejected silently; must split one zone per directive. Fix in #162.
      • review-e wired with dnsPolicy: None + dnsConfig.nameservers: [10.43.200.53, 10.43.0.10]. kube-dns kept as secondary for availability fallback.
      • Live verification (manually scaled review-e pod with KEDA paused): /etc/resolv.conf correct; Discord gateway WSS connected; Anthropic reached; GitHub App token minted; 3 MCP servers connected; Valkey stream consumer attached to rig-conductor. Fresh heartbeat visible in /api/agents. Pod scaled back to 0; KEDA unpaused.
      • Third trap: the egress-dns Corefile rewrite list and the Envoy SNI filter_chain_match.server_names list hold the same allowlist in two places. If they drift, the pod’s DNS lookup still succeeds but Envoy resets the TLS connection. A CI check is the next safety follow-up.
    • Pending (gated on 24h review-e burn-in, ~2026-04-23 evening):
      • Apply the same dnsPolicy / dnsConfig to dev-e (node + dotnet + python) — values-only PR, no chart work.
      • Default-deny egress NetworkPolicy — pod-selector based (not ipBlock), allowlist kube-dns (10.43.0.10:53), egress-dns (10.43.200.53:53), Envoy (10.43.79.56:443 + :8443), and rig-conductor pods on 8080, 6379, AND 5432 (Postgres — the gap from the first spike).
    • Parallel prompt fix (shipped, stands): stream-consumer.js:226 rewritten in rig-agent-runtime#110 — no more dead sudo apt-get advice.
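The pending default-deny policy under AC 5 can be pictured as a single pod-selector-based NetworkPolicy. The sketch below builds such a manifest in Python and prints it as JSON, which kubectl accepts. The ports come from the pending item above. The namespace names other than egress-gw and kube-system, every pod label except the conventional k8s-app=kube-dns, and the policy name are assumptions for illustration; the real manifests in rig-gitops may differ.

```python
import json
from typing import Optional


def peer(namespace: str, pod_labels: Optional[dict] = None) -> dict:
    """Select pods by namespace name and, optionally, by pod labels."""
    selector = {
        "namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": namespace}
        }
    }
    if pod_labels:
        selector["podSelector"] = {"matchLabels": pod_labels}
    return selector


def rule(to: dict, port_numbers: list, protocols=("TCP",)) -> dict:
    """One egress rule: a single peer plus the ports allowed towards it."""
    return {
        "to": [to],
        "ports": [
            {"protocol": proto, "port": number}
            for number in port_numbers
            for proto in protocols
        ],
    }


def build_default_deny_egress(name: str, namespace: str, agent_labels: dict) -> dict:
    """Deny all egress from the selected agent pods except the listed peers."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": agent_labels},
            "policyTypes": ["Egress"],
            "egress": [
                # kube-dns (secondary resolver)
                rule(peer("kube-system", {"k8s-app": "kube-dns"}), [53], ("UDP", "TCP")),
                # dedicated egress-dns CoreDNS (hypothetical label)
                rule(peer("egress-gw", {"app": "egress-dns"}), [53], ("UDP", "TCP")),
                # Envoy SNI egress gateway (hypothetical label)
                rule(peer("egress-gw", {"app": "envoy-egress"}), [443, 8443]),
                # rig-conductor API, Valkey, Postgres (hypothetical namespace)
                rule(peer("rig-conductor"), [8080, 6379, 5432]),
            ],
        },
    }


policy = build_default_deny_egress(
    "review-e-default-deny-egress", "review-e", {"app": "review-e"}
)
print(json.dumps(policy, indent=2))
```

NetworkPolicy ports are generally evaluated against the destination pod’s port rather than the Service port, so these numbers would need checking against each Service’s targetPort before rollout.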
  • T1 → T2 tier promotion. Per trust-model.md, T2 is “agent merges with approval; no prod deploy creds.” That policy is meaningless if the agent can rm -rf its way around approval. AC 1–3 are what make T2 real.
  • Priority 2 observability can be wired to the GuardBlocked event stream as an early signal.
  • Priority 3 cost ceiling — the egress policy (AC 5) is the chokepoint through which the LiteLLM proxy is made mandatory (if the only allowed LLM egress is the proxy, no agent can bypass it).
  • Kyverno admission policies (Phase 4 per implementation-status)
  • Sigstore + cosign + SLSA L3 attestation (Phase 4)
  • CaMeL trust separation (Phase 6; only prompt-injection defense with a formal guarantee)
  • Schema-validated tool use via Pydantic/Instructor (continuous, not phase-gated)
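The dangerous-command guard from AC 1 is what keeps an agent from using rm -rf or a force-push to get around approval. As a rough illustration of the mechanism described there (read the tool-call JSON on stdin, match tool_input.command against a blocklist, exit 2 with a reason), here is a minimal Python sketch. It is not the shipped script from rig-agent-runtime#97: the regexes cover only part of the minimum blocklist, the apt-get pattern stands in for “package installers”, and the names check_command and run_hook are hypothetical. Treating unparsable input as “allow” is also an assumption of this sketch; a stricter guard might fail closed.

```python
import json
import re
import sys

# (pattern, reason) pairs; a partial, illustrative version of the blocklist.
BLOCKLIST = [
    (r"\bsudo\b", "sudo is blocked"),
    # Matches a bare root target ("rm -rf /"), not relative paths like "./".
    (r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(?:\s|\*|$)", "rm -rf / is blocked"),
    # --force-with-lease is allowed; plain --force and -f are not.
    (r"\bgit\s+push\b.*\s(?:--force(?![-\w])|-f\b)", "git push --force is blocked"),
    (r"\bgit\s+reset\b.*--hard\b", "git reset --hard is blocked"),
    (r"\bgit\s+clean\b.*\s-\w*f", "git clean -f is blocked"),
    (r"\b(?:drop\s+(?:table|database)|truncate\s+table)\b", "destructive SQL is blocked"),
    (r"\bkubectl\s+delete\s+namespace\b", "kubectl delete namespace is blocked"),
    (r"\bapt(?:-get)?\s+install\b", "package installers are blocked"),
    (r"\bchmod\s+(?:777|-R\s+000)\b", "unsafe chmod is blocked"),
    (r"\bcurl\b[^|]*\|\s*(?:ba)?sh\b", "curl piped to a shell is blocked"),
]
COMPILED = [(re.compile(p, re.IGNORECASE), reason) for p, reason in BLOCKLIST]


def check_command(command):
    """Return the block reason for a command, or None if it is allowed."""
    for pattern, reason in COMPILED:
        if pattern.search(command):
            return reason
    return None


def run_hook(stdin_text):
    """Return the hook exit code: 2 to block, 0 to allow."""
    try:
        payload = json.loads(stdin_text)
    except json.JSONDecodeError:
        return 0  # Sketch assumption: nothing parsable means nothing to check.
    if not isinstance(payload, dict):
        return 0
    tool_input = payload.get("tool_input") or {}
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    reason = check_command(str(command))
    if reason:
        # The reason goes to stderr so the agent sees why it was blocked.
        print(f"GuardBlocked: {reason}", file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    exit_code = run_hook(sys.stdin.read())
    if exit_code:
        sys.exit(exit_code)
```

There is deliberately no override flag in the sketch, matching the stated design: the escape hatch is a human running the command outside the agent loop.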

High. Prerequisite for Priorities 2–4. No higher-trust autonomy tier is honest without it.

  • AC 1 (dangerous-command guard): ~1 week. Pattern specified in safety.md; reference implementation Gastown’s tap_guard_dangerous.
  • AC 2 (GuardBlocked events): ~1 day. New event type + projection + dashboard panel.
  • AC 3 (worktrees per task): ~1 week. Well-trodden Cursor 2026 pattern.
  • AC 4 (GitHub App 1h tokens): ~3 days. Replaces classic PAT; installation-token mint loop in the agent startup.
  • AC 5 (default-deny egress): Phase 1 pod-scoped DNS live on review-e as of 2026-04-22 evening after four earlier attempts (three reverted; the standalone Envoy gateway is still live); 24h burn-in → dev-e rollout → default-deny NetworkPolicy terminates Phase 1. LiteLLM-based cost/model centralisation bundled with Priority 3.

Total: ~5 weeks of focused work, parallelisable across 2 engineers.

Work that landed alongside the AC deliverables but isn’t a formal AC:

  • rig-agent-runtime#110 — rewrote the ## Runtime installs block in stream-consumer.js to match guard reality. The old prompt advised sudo + apt-get install, both blocked by the AC 1 guard, so agents following their own guidance hit GuardBlocked and got stuck. Prep for AC 5 as well (primes agents for a future egress policy that denies arbitrary hosts). Surfaced by the agent runtime-install audit research.
  • dashecorp/infra#112 — declarative per-repo provisioning of RIG_BOT_PAT via Terraform (needs_rig_bot_pat = true in github/dashecorp/variables.tf:repos). Not part of this user story, but the 2026-04-20 CI resuscitation that unblocked rig-conductor’s publish-image → PR-based update-gitops flow depended on a manually-created PAT secret; #112 makes that pattern reproducible so the next dashecorp repo that needs PR-on-main doesn’t rediscover the trap. See dashecorp/infra/BOOTSTRAP.md.