
Agent runtime installs — what's baked in, what's needed, and why Phase 1 egress CIDR-only breaks agents

TL;DR. Audit prep work for AC 5 Phase 1 of the safety-foundation user story. It finds two issues. (1) The current agent task prompt in stream-consumer.js:226 tells agents to install tools at runtime via sudo + apt-get install, both of which are already blocked by the pretool-guard shipped in AC 1. This is dead advice and worth fixing. (2) Phase 1 of the egress policy options research recommended a CIDR-only NetworkPolicy with npm blocked at the network layer. That breaks every agent task that runs on a repo with a package.json, because npm install needs registry access. Revised plan: skip the CIDR-only Phase 1 and go direct to GKE FQDNNetworkPolicy with an expanded allowlist (npm + pypi + nuget registries in addition to GitHub + Anthropic). FQDNNetworkPolicy is still in Preview, which remains acceptable at 2 agents.

| Layer | Packages installed at build time |
| --- | --- |
| Dockerfile.base (common) | ca-certificates, curl, jq, git, openssh-client, GitHub CLI gh, Node 22 (via parent image node:22-slim), @anthropic-ai/claude-code, @openai/codex, @dashecorp/rig-memory-mcp |
| Dockerfile.node (dev-e, review-e stacks) | adds typescript, tsx, jest, vitest, eslint, prettier (global) |
| Dockerfile.python (if used) | adds python3, python3-pip, python3-venv, pytest, black, ruff |
| Dockerfile.dotnet (rig-conductor agent) | adds .NET 10 SDK |

Total: ~20 common dev tools baked in. Developers working on the rig itself rarely need more.

From src/stream-consumer.js:226:

## Runtime installs
If you need a tool that is not installed, install it yourself
(npm install -g, pip install, apt-get install). You have sudo access
for apt-get. Prefer global installs so they persist for the session.

This is doubly broken today, before any egress policy even lands:

| Command in prompt | pretool-guard behavior | Result |
| --- | --- | --- |
| sudo apt-get install … | blocked (sudo + apt install regex) | agent stuck |
| apt-get install … | blocked (apt install regex) | agent stuck |
| brew install … | blocked (brew install regex) | agent stuck |
| npm install -g … | allowed | works today, would break under egress policy |
| pip install … | allowed | works today, would break under egress policy |

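So: the only system-level installer the prompt recommends, apt-get (with or without sudo), is already stopped by AC 1’s guard, and brew is blocked as well. Only npm install -g and pip install still work, and those depend on registry egress. The guard’s GuardBlocked dashboard panel (shipped yesterday) will surface these attempts.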
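The table's behavior can be approximated with a few regular expressions. The sketch below is a JavaScript re-expression for illustration only: the real guard lives in hooks/pretool-guard.sh, and the patterns and the `classifyCommand` name are assumptions, not copies of the shipped blocklist.

```javascript
// Hypothetical patterns approximating the guard's blocklist; the shipped
// rules in hooks/pretool-guard.sh may differ.
const BLOCKED = [
  { rule: "sudo", pattern: /(^|[\s;&|])sudo\s/ },
  { rule: "apt install", pattern: /(^|[\s;&|])apt(-get)?\s+install\b/ },
  { rule: "brew install", pattern: /(^|[\s;&|])brew\s+install\b/ },
];

// Returns the name of every rule the command trips; an empty array means allowed.
function classifyCommand(command) {
  return BLOCKED.filter(({ pattern }) => pattern.test(command)).map(({ rule }) => rule);
}

const samples = [
  "sudo apt-get install jq",
  "apt-get install jq",
  "brew install jq",
  "npm install -g tsx",
  "pip install ruff",
];
for (const cmd of samples) {
  const hits = classifyCommand(cmd);
  console.log(cmd, "=>", hits.length ? `blocked (${hits.join(" + ")})` : "allowed");
}
```

Under these assumed patterns the first command trips two rules at once, which is the sense in which the old advice is doubly broken.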

Agents work on customer repos, not just rig-agent-runtime itself. A typical task flow:

  1. task-workspace create <owner/repo> issue-<N> → worktree with the customer repo
  2. cd <WORKDIR> and read the project
  3. Implement changes — which frequently requires installing the project’s declared deps:
    • Node repo: npm install (or npm ci) resolving package.json
    • Python repo: pip install -r requirements.txt
    • .NET repo: dotnet restore
  4. Run tests: npm test / pytest / dotnet test
  5. Open PR

Step 3 is the one that egress policy breaks. The project-level npm install is not optional — it’s how dependencies come down to run the project’s tests. Baking customer deps into the image is not feasible: every customer repo is different.
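To make the dependency on registry access concrete, the sketch below maps the manifest files found in a worktree to the install command from step 3 and to the registry hosts that command has to reach. The function name `installStepsFor` and the manifest-detection rules are illustrative assumptions; the host names are the ones discussed in this note.

```javascript
// Pure function: the caller passes the file names found in the worktree root.
function installStepsFor(fileNames) {
  const files = new Set(fileNames);
  const steps = [];
  if (files.has("package.json")) {
    steps.push({
      // npm ci requires a lockfile; fall back to npm install without one.
      command: files.has("package-lock.json") ? "npm ci" : "npm install",
      hosts: ["registry.npmjs.org"],
    });
  }
  if (files.has("requirements.txt")) {
    steps.push({
      command: "pip install -r requirements.txt",
      hosts: ["pypi.org", "files.pythonhosted.org"],
    });
  }
  if (fileNames.some((f) => f.endsWith(".csproj") || f.endsWith(".sln"))) {
    steps.push({ command: "dotnet restore", hosts: ["api.nuget.org"] });
  }
  return steps;
}

// Example: a Node repo with a lockfile needs the npm registry to be reachable.
console.log(installStepsFor(["package.json", "package-lock.json", "README.md"]));
```

Every non-empty result names at least one host outside GitHub and Anthropic, which is why an allowlist limited to those two cannot support step 3.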

Why Phase 1 (CIDR-only allowlist) doesn’t work


The egress policy options research recommended:

Phase 1 — default-deny NetworkPolicy with CIDR allowlist (Anthropic 160.79.104.0/21, GitHub /meta, kube-dns, rig-memory-mcp). Block npm at the network layer; bake deps into agent images.

The bake-deps advice was reasonable for our own agent runtime, where the dep list is small and stable. But it collapses as soon as agents operate on customer repos. And the CDN problem remains, because each registry sits behind address space that changes:

  • registry.npmjs.org → Cloudflare (≈1500 prefixes that rotate)
  • nuget.org → Azure CDN (frequently changing)
  • pypi.org → Fastly (less-frequent rotation but still not stable-enough for an ipBlock)

Putting these in an ipBlock NetworkPolicy produces a monthly stream of npm install failures as prefixes rotate.
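The failure mode is easy to reproduce with a plain CIDR membership check. In this sketch the allowlist snapshot and both resolved addresses are placeholders drawn from the RFC 5737 documentation ranges, not real CDN prefixes; `ipToInt` and `inCidr` are helper names introduced here.

```javascript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True when ip falls inside the IPv4 CIDR block.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  // A shift by 32 is a no-op in JavaScript, so /0 is handled explicitly.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// Placeholder snapshot of an ipBlock allowlist taken when the policy was written.
const allowlistSnapshot = ["192.0.2.0/24"];
const isAllowed = (ip) => allowlistSnapshot.some((cidr) => inCidr(ip, cidr));

console.log(isAllowed("192.0.2.10"));   // address the registry resolved to at snapshot time
console.log(isAllowed("198.51.100.7")); // address after the CDN rotates prefixes
```

The second lookup falls outside the snapshot, so the connection is dropped until someone updates the ipBlock by hand.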

Skip Phase 1 as originally drafted. Go direct to Phase 2 — GKE-native FQDNNetworkPolicy (Preview in 2026, acceptable for 2 agents):

  1. Default-deny egress NetworkPolicy on the agent namespace.
  2. Allow kube-dns (TCP/UDP 53).
  3. Allow rig-memory-mcp service via podSelector.
  4. FQDNNetworkPolicy allowing:
    • api.github.com, github.com, codeload.github.com, objects.githubusercontent.com
    • api.anthropic.com (or route this via the LiteLLM proxy when Priority 3 ships)
    • registry.npmjs.org, registry.yarnpkg.com (Yarn registry, alias to npm)
    • pypi.org, files.pythonhosted.org
    • api.nuget.org, *.nuget.org
  5. Everything else denied.
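As a rough sketch of item 4, the function below builds the FQDNNetworkPolicy manifest as a plain object and prints it as JSON, which kubectl accepts in place of YAML. The apiVersion, kind, and field names (`matches`, `name`, `pattern`) are believed to follow the GKE FQDNNetworkPolicy schema but should be verified against the current GKE docs before use. The policy name, namespace, and pod labels are hypothetical, and port 443 is an assumption about how agents reach these hosts.

```javascript
// Hostnames taken from the allowlist above. Entries containing "*" become
// `pattern` matches; everything else becomes an exact `name` match.
const ALLOWED_FQDNS = [
  "api.github.com", "github.com", "codeload.github.com", "objects.githubusercontent.com",
  "api.anthropic.com",
  "registry.npmjs.org", "registry.yarnpkg.com",
  "pypi.org", "files.pythonhosted.org",
  "api.nuget.org", "*.nuget.org",
];

// ASSUMPTION: schema fields below are not confirmed by this note; check the
// GKE reference before applying. Namespace and labels are placeholders.
function buildFqdnPolicy(namespace, podLabels) {
  return {
    apiVersion: "networking.gke.io/v1alpha1",
    kind: "FQDNNetworkPolicy",
    metadata: { name: "agent-egress-allowlist", namespace },
    spec: {
      podSelector: { matchLabels: podLabels },
      egress: [
        {
          matches: ALLOWED_FQDNS.map((h) => (h.includes("*") ? { pattern: h } : { name: h })),
          ports: [{ protocol: "TCP", port: 443 }],
        },
      ],
    },
  };
}

console.log(JSON.stringify(buildFqdnPolicy("dev-e", { app: "dev-e" }), null, 2));
```

Items 1 to 3 (default-deny, kube-dns, and the rig-memory-mcp podSelector rule) would still live in a standard NetworkPolicy alongside this object.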

FQDN Preview limits per GKE docs: 50 IPs per FQDN resolution, 100 IP/hostname quota, one-label-deep wildcards. *.nuget.org matches api.nuget.org but not foo.bar.nuget.org — acceptable for the registries above.
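The one-label-deep rule can be checked mechanically before a hostname goes into the allowlist. The matcher below is a minimal sketch of that rule as described here, where "*" stands for exactly one DNS label; `matchesPattern` is a name introduced for illustration, and GKE's own matcher may differ in edge cases.

```javascript
// Returns true when host matches pattern, treating "*" as exactly one label.
function matchesPattern(pattern, host) {
  const p = pattern.toLowerCase().split(".");
  const h = host.toLowerCase().split(".");
  // A one-label wildcard can never change the number of labels.
  if (p.length !== h.length) return false;
  return p.every((label, i) => (label === "*" ? h[i].length > 0 : label === h[i]));
}

console.log(matchesPattern("*.nuget.org", "api.nuget.org"));     // true
console.log(matchesPattern("*.nuget.org", "foo.bar.nuget.org")); // false
console.log(matchesPattern("*.nuget.org", "nuget.org"));         // false
```

Under this reading the apex nuget.org is not covered by *.nuget.org either, so it would need its own entry if any tool requests it directly.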

Also fix: the prompt contradicts the guard


stream-consumer.js:226 should be rewritten to match what the pretool-guard actually allows. Proposed:

## Runtime installs
Most common tools (git, gh, jq, curl, claude, codex, typescript, jest,
vitest, eslint, prettier, pytest, black, ruff, dotnet) are pre-installed.
For project dependencies, use `npm install`, `pip install`, or
`dotnet restore` inside your worktree. Do NOT use `sudo`, `apt-get`, or
`brew` — those are blocked by the PreToolUse guard. If you need a tool
that is genuinely missing from the image, open a separate PR against
rig-agent-runtime/Dockerfile.* to add it; do not try to install it at
runtime.

Two benefits: aligns with guard reality today, and primes agents correctly for a future FQDNNetworkPolicy where arbitrary-host network calls are denied.

Two PRs, in order:

  1. rig-agent-runtime: rewrite the ## Runtime installs block in src/stream-consumer.js. One-line test: the prompt no longer recommends sudo, apt-get install, or brew install (the proposed text still names sudo, apt-get, and brew, but only to forbid them). ~1 hour.
  2. rig-gitops: add the FQDNNetworkPolicy (and the default-deny base NetworkPolicy) under apps/dev-e/ and apps/review-e/. Needs kubectl apply --dry-run=server against the cluster before merge. ~½ day plus cluster validation.
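The check for the first PR could look like the sketch below. Because the proposed prompt still names sudo, apt-get, and brew in order to forbid them, it looks for the old install phrases rather than the bare words. The phrase list, the `findStaleInstallAdvice` helper, and the abbreviated prompt excerpts are all assumptions for illustration; how stream-consumer.js exposes the prompt text is not specified in this note.

```javascript
// Phrases from the old prompt that should never reappear.
const STALE_PHRASES = [/apt-get install/i, /brew install/i, /you have sudo access/i];

// Returns the source of every stale phrase found in the prompt text.
function findStaleInstallAdvice(promptText) {
  return STALE_PHRASES.filter((re) => re.test(promptText)).map((re) => re.source);
}

// Abbreviated excerpts of the old and proposed prompt text.
const oldPrompt =
  "If you need a tool that is not installed, install it yourself " +
  "(npm install -g, pip install, apt-get install). You have sudo access for apt-get.";
const newPrompt =
  "Do NOT use `sudo`, `apt-get`, or `brew`; those are blocked by the PreToolUse guard.";

console.log(findStaleInstallAdvice(oldPrompt)); // two stale phrases found
console.log(findStaleInstallAdvice(newPrompt)); // none found
```

In the real test the excerpts would be replaced by whatever string the consumer actually sends to the agent.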

Neither PR needs Cilium CRDs; the rig-gitops PR relies only on GKE Dataplane V2 features available to Invotek today.

This research supersedes the Phase 1 slice of research/2026-04-21-egress-policy-options, specifically its “Phase 1 CIDR-only first” advice. The seven-option comparison and the Phase 3 egress-gateway recommendation in that doc still stand. The combined plan is now: skip to what that doc called Phase 2, and keep Phase 3 as the later scale-up milestone.

  • rig-agent-runtime/Dockerfile.base — base image layers
  • rig-agent-runtime/Dockerfile.{node,python,dotnet} — per-language layers
  • rig-agent-runtime/src/stream-consumer.js:226 — runtime-install prompt text
  • rig-agent-runtime/hooks/pretool-guard.sh — blocklist (shipped in rig-agent-runtime#97)
  • research/2026-04-21-egress-policy-options — superseded Phase-1 recommendation