LiteLLM passthrough spike #2 — real key, OAuth incompatibility found
TL;DR. Second spike, using the rig’s actual `dev-e-secrets.anthropic-api-key`. Discovered that the credential is an OAuth subscription token (`sk-ant-oat...`), not a standard API key. LiteLLM’s passthrough forwards it as `x-api-key`, which Anthropic’s Messages API rejects for OAuth tokens with a (misleading) `rate_limit_error`. Consequence for AC 5: a straight “drop LiteLLM in as the Anthropic egress” change will NOT work with the rig’s current auth. The redesign needs either (a) acquiring a real Anthropic API key and switching dev-e + review-e off OAuth, or (b) putting a tiny shim in front of LiteLLM that translates OAuth tokens to `Authorization: Bearer`. The error-wrapping issue from spike #1 is also confirmed: the error response envelope has `type: "None"`, with the real Anthropic error JSON nested as an escaped string in `message`.
What this spike answered
| Question | Answer |
|---|---|
| Does LiteLLM passthrough actually hit api.anthropic.com with a real token? | ✅ Response header x-litellm-model-api-base: https://api.anthropic.com/v1/messages confirms it |
| Does the rig’s current Anthropic credential work through LiteLLM? | ❌ No — it’s an OAuth subscription token (sk-ant-oat...), LiteLLM passes it as x-api-key, Anthropic rejects |
| Is the response-wrapping issue from spike #1 still present? | ✅ Confirmed. 429 came back as {"error":{"message":"<escaped upstream JSON>","type":"None","param":"None","code":"429"}} — the type:"rate_limit_error" signal that Claude Code’s retry logic needs is nested in a string |
| Does LiteLLM expose the upstream Anthropic request_id? | ✅ Yes, inside the escaped-string message. Not in a dedicated header; clients must parse it out |
What it did not answer
- Prompt cache preservation (`cache_read_input_tokens`): no successful request to check
- SSE streaming fidelity: same
- `tool_use` round-trip: same
- These all need a third spike with a standard `sk-ant-...` API key
Evidence
Spike setup was identical to spike #1, except that `ANTHROPIC_API_KEY` was sourced from the live `dev-e-secrets.anthropic-api-key` (never exposed outside the VM: copied into a spike namespace secret via kubectl, read via `envFrom.secretRef`).
Token prefix: sk-ant-oat... (Claude Code OAuth subscription — matches the existing isOAuthToken check in rig-agent-runtime/src/agent/providers/anthropic-sdk.js:14).
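The prefix check is simple enough to sketch. The version below is a guess at the shape of such a check, not a copy of the code in `anthropic-sdk.js`; only the name `isOAuthToken` and the `sk-ant-oat` / `sk-ant-` prefixes come from this note, while `classifyCredential` and the sample token strings are hypothetical.

```javascript
// Hypothetical sketch: NOT the actual implementation in anthropic-sdk.js.
// Prefixes are the ones quoted in this note.
const OAUTH_PREFIX = "sk-ant-oat";
const API_KEY_PREFIX = "sk-ant-";

// True when the credential looks like a Claude Code OAuth subscription token.
function isOAuthToken(token) {
  return typeof token === "string" && token.startsWith(OAUTH_PREFIX);
}

// Hypothetical helper: label a credential so a spike can fail fast
// before sending an OAuth token through a passthrough that expects an API key.
function classifyCredential(token) {
  if (isOAuthToken(token)) return "oauth-subscription";
  if (typeof token === "string" && token.startsWith(API_KEY_PREFIX)) {
    return "api-key";
  }
  return "unknown";
}
```

The OAuth check has to run before the generic `sk-ant-` check, because the OAuth prefix is itself a special case of the API-key prefix.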
Request:

```
POST /anthropic/v1/messages
headers:
  x-api-key: sk-spike-dummy   # LiteLLM master key
  anthropic-version: 2023-06-01
body: { model, max_tokens, messages }
```

Response:

```
HTTP/1.1 429 Too Many Requests
x-litellm-model-api-base: https://api.anthropic.com/v1/messages
x-litellm-call-id: de2607b9-be04-4a4f-9083-15aade7eecb4
body: {"error":{
  "message":"{\"type\":\"error\",\"error\":{\"type\":\"rate_limit_error\",\"message\":\"Error\"}, \"request_id\":\"req_011CaJev3s1yA3RRP5HA1j5J\"}",
  "type":"None","param":"None","code":"429"}}
```

The `x-litellm-model-api-base` header proves LiteLLM hit Anthropic’s `/v1/messages`. The `request_id` prefix `req_011C...` is Anthropic’s. The `rate_limit_error` on the very first request is what an OAuth token sent in the `x-api-key` header gets back: Anthropic’s public behavior is to reject such tokens with a misleading rate-limit code.
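Because the upstream error arrives as an escaped JSON string, a client has to parse twice to recover the `type` and `request_id`. A minimal sketch of that unwrapping follows, assuming the envelope always has the shape captured in the response above; `unwrapLiteLLMError` is a hypothetical helper name, not part of LiteLLM or the rig.

```javascript
// Sketch: recover the upstream Anthropic error from the wrapped envelope
// observed in this spike. Returns null when the body does not match that shape.
function unwrapLiteLLMError(bodyText) {
  let outer;
  try {
    outer = JSON.parse(bodyText);
  } catch {
    return null; // body was not JSON at all
  }
  const message = outer?.error?.message;
  if (typeof message !== "string") return null;

  let inner;
  try {
    inner = JSON.parse(message); // second parse: the escaped upstream JSON
  } catch {
    return null; // message was plain text, not nested JSON
  }
  return {
    status: Number(outer.error.code) || null,
    type: inner?.error?.type ?? null,
    message: inner?.error?.message ?? null,
    requestId: inner?.request_id ?? null,
  };
}

// Sample rebuilt from the response captured above.
const sample = JSON.stringify({
  error: {
    message: JSON.stringify({
      type: "error",
      error: { type: "rate_limit_error", message: "Error" },
      request_id: "req_011CaJev3s1yA3RRP5HA1j5J",
    }),
    type: "None",
    param: "None",
    code: "429",
  },
});

console.log(unwrapLiteLLMError(sample));
```

Returning `null` on any shape mismatch keeps the helper safe to call on every non-2xx body; a caller can fall back to the outer envelope when unwrapping fails.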
Why OAuth rejection matters
The rig uses OAuth subscription tokens for Dev-E + Review-E because the authentication is the Claude Code OAuth flow (`claude auth login`), not an API-key purchase. This is both cheaper (subscription vs metered) and how the agents currently work. Any AC 5 redesign that routes through LiteLLM breaks this model. The options are:

- Switch to a real Anthropic API key for agent pods, set on LiteLLM only, pay metered. Dev-E/Review-E would lose their OAuth-sub-style auth. Cost projection from the `TokenUsageProjection` in rig-conductor must then shift to come from LiteLLM callbacks rather than agent-emitted usage events. Operationally plausible but a material change.
- Front LiteLLM with a tiny auth-translating proxy. A sidecar that rewrites `x-api-key: sk-ant-oat...` → `Authorization: Bearer sk-ant-oat...` before LiteLLM, and the reverse on the other side. Adds a component; fragility risk on every LiteLLM version upgrade.
- Abandon LiteLLM for this layer; use Envoy egress gateway. Transparent SNI allowlist on `api.anthropic.com`. Loses the cost-centralization win but sidesteps both the OAuth issue and the error-wrapping issue entirely.
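For the auth-translating proxy option, the header rewrite itself is small. The sketch below shows only that rewrite as a pure function; `translateAuthHeaders` is a hypothetical name, the surrounding proxy is omitted, and whether Anthropic needs any further headers for OAuth requests was not tested in this spike.

```javascript
// Hypothetical header rewrite for the auth-translating shim.
// Moves an OAuth subscription token from x-api-key to Authorization: Bearer
// and leaves every other credential untouched.
function translateAuthHeaders(headers) {
  // Copy with lower-cased names so lookups are case-insensitive.
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name.toLowerCase()] = value;
  }
  const key = out["x-api-key"];
  if (typeof key === "string" && key.startsWith("sk-ant-oat")) {
    delete out["x-api-key"];
    out["authorization"] = `Bearer ${key}`;
  }
  return out;
}
```

The function returns a new object rather than mutating its input, so a proxy can log the original headers (with the token redacted) alongside the rewritten ones when debugging.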
Recommendation
Fork the AC 5 redesign into two tracks:
- Short-term (this month): Envoy egress gateway. Restores AC 5 Phase 1 with a real, byte-transparent allowlist that works against Cloudflare-fronted APIs. No auth translation, no cost accounting, no prompt-cache risk. Deploy as one Envoy pod + a NetworkPolicy that allows egress only to that pod + DNS + internal services. This ships the safety benefit without the LiteLLM integration burden.
- Longer-term (when Priority 3 activates): LiteLLM with real API key. Bundle the “acquire real API key, migrate dev-e/review-e off OAuth, wire cost accounting through LiteLLM callbacks” work with Priority 3’s cost-ceiling story. This is a cost-model change, not just a network change.
Neither track blocks the other. Ship Envoy first; LiteLLM lands when the accounting story is ready.
Sources
- Spike #1 findings: /research/2026-04-22-litellm-passthrough-spike/
- AC 5 retrospective: /research/2026-04-22-egress-policy-pitfall-cloudflare-fronted-apis/
- Earlier egress options (retracted): /research/2026-04-21-egress-policy-options/
- OAuth token handling in current rig: dashecorp/rig-agent-runtime/src/agent/providers/anthropic-sdk.js:12-22