LiteLLM passthrough spike #2 — real key, OAuth incompatibility found

TL;DR. This second spike used the rig’s actual dev-e-secrets.anthropic-api-key and found that the credential is an OAuth subscription token (sk-ant-oat...), not a standard API key. LiteLLM’s passthrough forwards it as x-api-key, and Anthropic’s Messages API rejects OAuth tokens sent that way with a misleading rate_limit_error. Consequence for AC 5: a straight “drop LiteLLM in as the Anthropic egress” change will NOT work with the rig’s current auth. A LiteLLM-based redesign needs either (a) a real Anthropic API key, with dev-e + review-e switched off OAuth, or (b) a tiny shim in front of LiteLLM that translates OAuth tokens to Authorization: Bearer. The error-wrapping issue from spike #1 is also confirmed: the error response envelope has type: "None", with the real Anthropic error JSON nested as an escaped string in message.

What this spike answered

Question	Answer
Does LiteLLM passthrough actually hit api.anthropic.com with a real token?	✅ Response header `x-litellm-model-api-base: https://api.anthropic.com/v1/messages` confirms it
Does the rig’s current Anthropic credential work through LiteLLM?	❌ No. It’s an OAuth subscription token (`sk-ant-oat...`); LiteLLM passes it as `x-api-key` and Anthropic rejects it
Is the response-wrapping issue from spike #1 still present?	✅ Confirmed. The `429` came back as `{"error":{"message":"<escaped upstream JSON>","type":"None","param":"None","code":"429"}}`, so the `type:"rate_limit_error"` signal that Claude Code’s retry logic needs is nested in a string
Does LiteLLM expose the upstream Anthropic `request_id`?	✅ Yes, but only inside the escaped-string message. There is no dedicated header; clients must parse it out

What it did not answer

Prompt cache preservation (cache_read_input_tokens): no successful request to check
SSE streaming fidelity: same, no successful request to check
tool_use round-trip: same, no successful request to check

All three need a third spike with a standard sk-ant-... API key.

Evidence

The spike setup was identical to spike #1, except that ANTHROPIC_API_KEY was sourced from the live dev-e-secrets.anthropic-api-key. The key was never exposed outside the VM: it was copied into a spike-namespace secret via kubectl and read via envFrom.secretRef.

Token prefix: sk-ant-oat... (Claude Code OAuth subscription — matches the existing isOAuthToken check in rig-agent-runtime/src/agent/providers/anthropic-sdk.js:14).

Request:

POST /anthropic/v1/messages
headers:
  x-api-key: sk-spike-dummy                # LiteLLM master key
  anthropic-version: 2023-06-01
body: { model, max_tokens, messages }

Response:

HTTP/1.1 429 Too Many Requests
x-litellm-model-api-base: https://api.anthropic.com/v1/messages
x-litellm-call-id: de2607b9-be04-4a4f-9083-15aade7eecb4
body: {"error":{
  "message":"{\"type\":\"error\",\"error\":{\"type\":\"rate_limit_error\",\"message\":\"Error\"},
             \"request_id\":\"req_011CaJev3s1yA3RRP5HA1j5J\"}",
  "type":"None","param":"None","code":"429"
}}

The x-litellm-model-api-base header proves LiteLLM hit Anthropic’s /v1/messages, and the request_id prefix req_011C... is Anthropic’s. The rate_limit_error on the very first request is what an OAuth token sent in the x-api-key header receives: Anthropic rejects it with a misleading rate-limit code rather than an authentication error.
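Because the upstream error type and request_id only survive inside the escaped message string, a client behind LiteLLM has to unwrap the envelope before its retry logic can see rate_limit_error. A minimal sketch of that unwrapping follows, assuming the envelope always has the shape captured above; the function name unwrap_litellm_error is hypothetical and not part of LiteLLM or the rig.

```python
import json


def unwrap_litellm_error(body: str) -> dict:
    """Recover the upstream Anthropic error from a LiteLLM passthrough envelope.

    Hypothetical helper: assumes the envelope shape observed in this spike,
    {"error": {"message": "<escaped upstream JSON>", "type": ..., "code": ...}}.
    """
    outer = json.loads(body)["error"]
    result = {
        "status": outer.get("code"),
        "type": outer.get("type"),
        "message": outer.get("message"),
        "request_id": None,
    }
    try:
        # The real Anthropic error is a JSON document escaped into a string.
        inner = json.loads(outer["message"])
    except (TypeError, ValueError, KeyError):
        # Message is missing or not JSON: fall back to the envelope's own fields.
        return result
    if isinstance(inner, dict) and isinstance(inner.get("error"), dict):
        result["type"] = inner["error"].get("type", result["type"])
        result["message"] = inner["error"].get("message", result["message"])
        result["request_id"] = inner.get("request_id")
    return result


# Rebuild the response body captured in this spike.
upstream = {
    "type": "error",
    "error": {"type": "rate_limit_error", "message": "Error"},
    "request_id": "req_011CaJev3s1yA3RRP5HA1j5J",
}
sample = json.dumps(
    {"error": {"message": json.dumps(upstream), "type": "None", "param": "None", "code": "429"}}
)
unwrapped = unwrap_litellm_error(sample)
```

With the captured body, unwrapped["type"] is the nested rate_limit_error rather than the envelope's "None", which is the value a retry decision would key on.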

Why OAuth rejection matters

The rig uses OAuth subscription tokens for Dev-E + Review-E because the agents authenticate through the Claude Code OAuth flow (claude auth login), not a purchased API key. This is both cheaper (subscription vs metered) and how the agents currently work. Any AC 5 redesign that routes through LiteLLM breaks this model, which leaves three options:

Switch agent pods to a real Anthropic API key, set on LiteLLM only and billed at metered rates. Dev-E/Review-E would lose their OAuth subscription auth. The cost projection in rig-conductor’s TokenUsageProjection must then be fed from LiteLLM callbacks rather than agent-emitted usage events. Operationally plausible, but a material change.
Front LiteLLM with a tiny auth-translating proxy. A sidecar rewrites x-api-key: sk-ant-oat... → Authorization: Bearer sk-ant-oat... before LiteLLM, and does the reverse on the other side. This adds a component and a fragility risk on every LiteLLM version upgrade.
Abandon LiteLLM for this layer and use an Envoy egress gateway: a transparent SNI allowlist on api.anthropic.com. This loses the cost-centralization win but sidesteps both the OAuth issue and the error-wrapping issue entirely.
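The header rewrite at the core of the auth-translating proxy option can be sketched as a pure function. This is an illustration under assumptions, not the rig's code: is_oauth_token is a hypothetical stand-in for the isOAuthToken check in anthropic-sdk.js (assumed here to be a prefix test on sk-ant-oat), translate_auth_headers is a made-up name, and the spike did not verify whether Anthropic needs anything beyond Authorization: Bearer to accept an OAuth token.

```python
from typing import Dict

# Assumption: OAuth subscription tokens are recognizable by this prefix,
# as the token prefix observed in the spike suggests.
OAUTH_PREFIX = "sk-ant-oat"


def is_oauth_token(token: str) -> bool:
    """Hypothetical stand-in for the rig's isOAuthToken check."""
    return token.startswith(OAUTH_PREFIX)


def translate_auth_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers with an OAuth token moved from
    x-api-key to Authorization: Bearer. Other keys pass through unchanged."""
    out = dict(headers)
    # HTTP header names are case-insensitive, so match on the lowercased name.
    key_name = next((k for k in out if k.lower() == "x-api-key"), None)
    if key_name is None or not is_oauth_token(out[key_name]):
        return out  # standard API key or no key: nothing to translate
    token = out.pop(key_name)
    # Drop any pre-existing Authorization header so only one is sent.
    for name in [k for k in out if k.lower() == "authorization"]:
        del out[name]
    out["Authorization"] = "Bearer " + token
    return out


# Placeholder token values, not real credentials.
oauth_request = {"X-Api-Key": "sk-ant-oat-PLACEHOLDER", "anthropic-version": "2023-06-01"}
translated = translate_auth_headers(oauth_request)
```

Keeping the rewrite as a side-effect-free function makes the fragile part of the shim easy to unit-test across LiteLLM upgrades, independent of whichever proxy framework hosts it.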

Recommendation

Fork the AC 5 redesign into two tracks:

Short-term (this month): Envoy egress gateway. Restores AC 5 Phase 1 with a real, byte-transparent allowlist that works against Cloudflare-fronted APIs. No auth translation, no cost accounting, no prompt-cache risk. Deploy as one Envoy pod + a NetworkPolicy that allows egress only to that pod + DNS + internal services. This ships the safety benefit without the LiteLLM integration burden.
Longer-term (when Priority 3 activates): LiteLLM with real API key. Bundle the “acquire real API key, migrate dev-e/review-e off OAuth, wire cost accounting through LiteLLM callbacks” work with Priority 3’s cost-ceiling story. This is a cost-model change, not just a network change.

Neither track blocks the other. Ship Envoy first; LiteLLM lands when the accounting story is ready.

Sources

Spike #1 findings: /research/2026-04-22-litellm-passthrough-spike/
AC 5 retrospective: /research/2026-04-22-egress-policy-pitfall-cloudflare-fronted-apis/
Earlier egress options (retracted): /research/2026-04-21-egress-policy-options/
OAuth token handling in current rig: dashecorp/rig-agent-runtime/src/agent/providers/anthropic-sdk.js:12-22