User story: hard cost ceiling — LiteLLM proxy with per-agent virtual keys
User story
As the rig operator
I want an enforced dollar ceiling on every agent’s hourly spend — at the proxy layer, not the agent layer — with per-agent virtual keys and hard 429s before the request reaches the LLM provider
So that one looping agent cannot burn the shared rate-limit budget for every other agent, and a future tenant can be onboarded with a hard per-key cap as a config row, not an engineering project.
Context
See whats-next whitepaper §Priority 3 and the source cost-framework.md whitepaper.
Today only the cheapest, lowest-guarantee layer exists: a TokenUsageProjection aggregates per-agent × per-repo cost after the fact. Layer 3 — proxy-level hard ceiling — is unbuilt. Without it, cost control is trust-based (“don’t let agents loop”), which is exactly the failure mode a shared infrastructure cannot accept.
Honest caveat from the source whitepaper: LiteLLM issue #12905 shows user-level budgets are not enforced inside team configurations. The proxy is the primary defense, not an absolute one. Every LiteLLM upgrade needs a synthetic budget-overrun test.
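A minimal sketch of the check such a synthetic overrun test might run, assuming the harness has already collected three observations: the HTTP status of the over-cap request, the event types seen by rig-conductor, and the provider-side spend for the test key. The `OverrunObservation` type, the `budget_exceeded` event name, and the overshoot tolerance are hypothetical; none of them come from LiteLLM or rig-conductor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverrunObservation:
    """What one synthetic overrun run observed (all field names are hypothetical)."""
    status_code: int                    # HTTP status of the request sent after the cap was hit
    conductor_events: tuple[str, ...]   # event types received by rig-conductor
    provider_spend_usd: float           # spend the provider reports for the test key
    cap_usd: float                      # daily cap configured on the test key


def overrun_failures(obs: OverrunObservation, tolerance_usd: float = 0.05) -> list[str]:
    """Return the failed checks; an empty list means the ceiling held."""
    failures = []
    if obs.status_code != 429:
        failures.append(f"expected 429, got {obs.status_code}")
    if "budget_exceeded" not in obs.conductor_events:  # hypothetical event name
        failures.append("no budget_exceeded event reached rig-conductor")
    # Allow a small overshoot: the request that crosses the cap may still be billed.
    if obs.provider_spend_usd > obs.cap_usd + tolerance_usd:
        failures.append(
            f"provider billed {obs.provider_spend_usd:.2f} USD past cap {obs.cap_usd:.2f} USD"
        )
    return failures
```

Returning a list of failures rather than a single boolean lets the alert say which of the three guarantees broke after an upgrade.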
Acceptance criteria
- ⏳ LiteLLM proxy deployed to the rig cluster. Helm release under `apps/litellm/`. Postgres-backed config (not in-memory) so virtual keys and budgets survive restarts.
- ⏳ Per-agent virtual keys: one key per agent (dev-e, review-e, macos-e) plus one per onboarded tenant project (Dashecorp today; a virtual key is added for each new tenant). Each key has a hard daily budget and an hourly rate limit.
- ⏳ Anthropic as the default backend, with OpenAI and Gemini configured as fallbacks via LiteLLM `fallback_models` (deferred per cost-framework.md; enable only when multi-provider config exists).
- ⏳ Agent pods use virtual keys, not the raw Anthropic key: the Anthropic account key exists only in the LiteLLM config. Agent pods get their virtual key via SealedSecret.
- ⏳ Synthetic budget-overrun test: a dedicated CronJob deliberately exceeds a test key’s daily cap. CI asserts: (a) 429 fires, (b) event emitted to rig-conductor, (c) provider not billed past the cap. Run on every LiteLLM upgrade.
- ⏳ Weekly budget review projection: `TokenUsageProjection` extended with a `virtual_key` dimension; per-key costs visible in the rig-conductor dashboard.
- ⏳ Pre-flight cost prediction (nice-to-have, Phase 2 of cost-framework.md): a cheap model (Haiku) estimates tokens before dispatch; abort if the estimated cost exceeds the task’s budget fraction.
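As an illustration of the per-agent keys, the sketch below builds the JSON bodies that could be posted to the proxy's `/key/generate` endpoint. The field names (`key_alias`, `max_budget`, `budget_duration`, `rpm_limit`, `metadata`) follow LiteLLM's key-management API as commonly documented and should be verified against the deployed version. The dollar amounts and request rates are placeholders, and expressing the hourly rate limit as a per-minute `rpm_limit` is an assumption. No network call is made here.

```python
import json

# Hypothetical per-agent caps; amounts and rates are placeholders, not values from the source.
AGENT_CAPS = {
    "dev-e": {"daily_budget_usd": 20.0, "requests_per_minute": 30},
    "review-e": {"daily_budget_usd": 10.0, "requests_per_minute": 20},
    "macos-e": {"daily_budget_usd": 10.0, "requests_per_minute": 20},
}


def key_generate_payload(agent: str, daily_budget_usd: float, requests_per_minute: int) -> dict:
    """Build the JSON body for one virtual key with a daily budget and a request-rate limit."""
    if daily_budget_usd <= 0:
        raise ValueError("a hard ceiling needs a positive budget")
    return {
        "key_alias": agent,               # human-readable name for the key
        "max_budget": daily_budget_usd,   # spend ceiling in USD
        "budget_duration": "1d",          # budget resets daily
        "rpm_limit": requests_per_minute, # requests per minute (assumed stand-in for the hourly limit)
        "metadata": {"agent": agent},
    }


def all_payloads() -> list[dict]:
    """One request body per agent in AGENT_CAPS."""
    return [key_generate_payload(agent, **caps) for agent, caps in AGENT_CAPS.items()]


if __name__ == "__main__":
    for body in all_payloads():
        print(json.dumps(body))
```

Keeping the caps in one table means onboarding a tenant is a new row, which matches the "config row, not an engineering project" goal of the story.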
What it unblocks
- Multi-tenant onboarding. A new tenant becomes a LiteLLM virtual-key config row with `daily_budget: 50` and `models: [sonnet-4.6, haiku-4.5]`. No per-tenant engineering.
- Priority 1 Cilium egress policy (AC 5 of safety-foundation) can narrow its allowlist to the LiteLLM proxy service only, with no direct LLM egress from agent pods. Defense in depth.
- Cost attribution by tenant: `TokenUsage` events carry a `virtual_key`/`project` dimension; billing becomes queryable rather than reconstructed.
- Circuit breaker on 529 storms: the proxy is the natural place to implement “pause dispatch for 5 min after 3 consecutive 529s” without per-agent code.
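The circuit-breaker rule quoted above ("pause dispatch for 5 min after 3 consecutive 529s") can be sketched as a small state machine. This is an illustrative, standalone class, not a LiteLLM feature or hook; the name `OverloadBreaker` and its methods are hypothetical, and how it would be wired into the proxy is left open.

```python
from typing import Optional


class OverloadBreaker:
    """Pause dispatch for a cooldown window after N consecutive 529 responses."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 300.0) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._consecutive = 0
        self._open_until: Optional[float] = None

    def record(self, status_code: int, now: float) -> None:
        """Feed in each upstream response status with a monotonic timestamp in seconds."""
        if status_code == 529:
            self._consecutive += 1
            if self._consecutive >= self.threshold:
                self._open_until = now + self.cooldown_s
                self._consecutive = 0  # start a fresh streak after tripping
        else:
            self._consecutive = 0  # any non-529 response breaks the streak

    def allows_dispatch(self, now: float) -> bool:
        """True unless the breaker is inside its cooldown window."""
        return self._open_until is None or now >= self._open_until
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the class deterministic in tests; a single shared instance at the proxy is what removes the need for per-agent code.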
Out of scope
- Cross-provider fallback routing (deferred per cost-framework.md; adopt when multi-provider is actually configured)
- Prompt caching verification (Claude Code does this automatically; verify via cost dashboard, not a code change)
- flagd/OpenFeature feature flags (YAGNI — Kustomize + env vars cover today)
Priority
High. Sequenced after Priority 2 because cost dashboards need the trace store; before Priority 4 because nightly quality-gate runs need a bounded budget to be safe.
Estimated effort
- AC 1 (proxy deploy): ~5 days. HelmRelease + Postgres + SealedSecret for the master Anthropic key.
- AC 2 (virtual keys + caps): ~3 days. LiteLLM admin UI or `/key/generate` API; document the key-rotation runbook.
- AC 3 (Anthropic default, fallbacks deferred): ~1 day. Config only.
- AC 4 (agents use virtual keys): ~2 days. SealedSecret rotation per agent; Helm chart plumbing.
- AC 5 (synthetic overrun test): ~2 days. CronJob + assertion + alert. Load-bearing AC: the #12905 caveat requires this to validate reality.
- AC 6 (budget-review projection): ~3 days. Marten projection + dashboard panel.
- AC 7 (pre-flight prediction, nice-to-have): ~5 days. Defer if time pressured.
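Total: ~2.5 weeks focused, ~3.5 weeks with AC 7. Summed serially, the per-AC estimates above come to ~16 working days (~21 with AC 7), so these totals assume some overlap between ACs.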
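For AC 7, the pre-flight gate reduces to a cost estimate and one comparison. The sketch below assumes "budget fraction" means a share of the key's remaining budget that a single task may consume; that reading, the function names, and the per-million-token prices in the usage note are all hypothetical. The token counts would come from the cheap estimator model, which is not shown.

```python
def estimated_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
) -> float:
    """Convert estimated token counts to USD using per-million-token prices."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def should_dispatch(estimate_usd: float, remaining_budget_usd: float, task_fraction: float) -> bool:
    """Allow dispatch only if the estimate fits within the task's share of the remaining budget."""
    if not 0 < task_fraction <= 1:
        raise ValueError("task_fraction must be in (0, 1]")
    return estimate_usd <= remaining_budget_usd * task_fraction
```

With placeholder prices of 3 USD and 15 USD per million input and output tokens, a task estimated at 200,000 input and 20,000 output tokens costs about 0.90 USD; it would pass against a 10 USD remaining budget at a 10% fraction and be aborted against a 5 USD remaining budget.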