Skip to content

LiteLLM passthrough spike #2 — real key, OAuth incompatibility found

TL;DR. Second spike, using the rig’s actual dev-e-secrets.anthropic-api-key. Discovered that credential is an OAuth subscription token (sk-ant-oat...), not a standard API key. LiteLLM’s passthrough forwards it as x-api-key, which Anthropic’s Messages API rejects for OAuth tokens with a rate_limit_error (misleading). Consequence for AC 5: a straight “drop LiteLLM in as the Anthropic egress” change will NOT work with the rig’s current auth. The redesign needs either (a) acquire a real Anthropic API key and switch dev-e + review-e off OAuth, or (b) put a tiny shim in front of LiteLLM that translates OAuth tokens to Authorization: Bearer. Error-wrapping issue from spike #1 also confirmed (error response envelope has type: "None" with the real Anthropic error JSON nested as an escaped string in message).

QuestionAnswer
Does LiteLLM passthrough actually hit api.anthropic.com with a real token?✅ Response header x-litellm-model-api-base: https://api.anthropic.com/v1/messages confirms it
Does the rig’s current Anthropic credential work through LiteLLM?❌ No — it’s an OAuth subscription token (sk-ant-oat...), LiteLLM passes it as x-api-key, Anthropic rejects
Is the response-wrapping issue from spike #1 still present?✅ Confirmed. 429 came back as {"error":{"message":"<escaped upstream JSON>","type":"None","param":"None","code":"429"}} — the type:"rate_limit_error" signal that Claude Code’s retry logic needs is nested in a string
Does LiteLLM expose the upstream Anthropic request_id?✅ Yes, inside the escaped-string message. Not in a dedicated header; clients must parse
  • Prompt cache preservation (cache_read_input_tokens) — no successful request to check
  • SSE streaming fidelity — same
  • tool_use round-trip — same
  • These all need a third spike with a standard sk-ant-... API key

Spike setup identical to spike #1 except ANTHROPIC_API_KEY sourced from the live dev-e-secrets.anthropic-api-key (never exposed outside the VM — copied into a spike namespace secret via kubectl, read via envFrom.secretRef).

Token prefix: sk-ant-oat... (Claude Code OAuth subscription — matches the existing isOAuthToken check in rig-agent-runtime/src/agent/providers/anthropic-sdk.js:14).

Request:

POST /anthropic/v1/messages
headers:
x-api-key: sk-spike-dummy # LiteLLM master key
anthropic-version: 2023-06-01
body: { model, max_tokens, messages }

Response:

HTTP/1.1 429 Too Many Requests
x-litellm-model-api-base: https://api.anthropic.com/v1/messages
x-litellm-call-id: de2607b9-be04-4a4f-9083-15aade7eecb4
body: {"error":{
"message":"{\"type\":\"error\",\"error\":{\"type\":\"rate_limit_error\",\"message\":\"Error\"},
\"request_id\":\"req_011CaJev3s1yA3RRP5HA1j5J\"}",
"type":"None","param":"None","code":"429"
}}

The x-litellm-model-api-base header proves LiteLLM hit Anthropic’s /v1/messages. The request_id prefix req_011C... is Anthropic’s. The rate_limit_error on the first request is the actual signal that OAuth tokens used on the x-api-key header receive — Anthropic’s public behavior is to reject them with a misleading rate-limit code.

The rig uses OAuth subscription tokens for Dev-E + Review-E because the authentication is the Claude Code OAuth flow (claude auth login), not an API-key purchase. This is both cheaper (subscription vs metered) and how the agents currently work. Any AC 5 redesign that routes through LiteLLM breaks this model unless:

  • Switch to a real Anthropic API key for agent pods, set on LiteLLM only, pay metered. Dev-E/Review-E would lose their OAuth-sub-style auth. Cost projection from the TokenUsageProjection in rig-conductor must then shift to come from LiteLLM callbacks rather than agent-emitted usage events. Operationally plausible but a material change.
  • Front LiteLLM with a tiny auth-translating proxy. A sidecar that rewrites x-api-key: sk-ant-oat...Authorization: Bearer sk-ant-oat... before LiteLLM, and the reverse on the other side. Adds a component; fragility risk on every LiteLLM version upgrade.
  • Abandon LiteLLM for this layer; use Envoy egress gateway. Transparent SNI allowlist on api.anthropic.com. Loses the cost-centralization win but sidesteps both the OAuth issue and the error-wrapping issue entirely.

Fork the AC 5 redesign into two tracks:

  1. Short-term (this month): Envoy egress gateway. Restores AC 5 Phase 1 with a real, byte-transparent allowlist that works against Cloudflare-fronted APIs. No auth translation, no cost accounting, no prompt-cache risk. Deploy as one Envoy pod + a NetworkPolicy that allows egress only to that pod + DNS + internal services. This ships the safety benefit without the LiteLLM integration burden.
  2. Longer-term (when Priority 3 activates): LiteLLM with real API key. Bundle the “acquire real API key, migrate dev-e/review-e off OAuth, wire cost accounting through LiteLLM callbacks” work with Priority 3’s cost-ceiling story. This is a cost-model change, not just a network change.

Neither track blocks the other. Ship Envoy first; LiteLLM lands when the accounting story is ready.