Drift Detection — Model, Prompt, Code, Config

Capabilities

🟡 code-drift-flux — Code drift via Flux reconciliation events. Flux detects drift, not yet alerted-on.
🟡 config-drift-flux — Config drift via Flux + kube-diff. Flux detects. Alerts not yet wired.
⚪ model-drift-canary — Model drift 20-prompt canary suite. Phase 6. Per-provider.

!!! abstract "TL;DR"
    Systems drift in four independent channels: model (the provider silently changes behavior under the same version string), prompt (agent config changes break existing behavior), code (deployed ≠ git main), and config (manifests ≠ gitops source). Each channel needs its own detector, its own baseline, and its own alert.

!!! warning "91% of production LLMs drift within 90 days"
    InsightFinder’s 2025 study: median detection lag without monitoring is 14–18 days. “It still says Sonnet 4.6 in the config but the behavior changed” is the single most common silent regression class. We watch for it explicitly via a 20-prompt canary suite.

The four channels

| Channel | What drifts | Detection signal |
| --- | --- | --- |
| Model drift | Same model string, different behavior from the provider | 20-prompt canary suite output-hash delta |
| Prompt drift | Agent system prompt changed, didn’t realize it broke something | Golden-suite regression in CI |
| Code drift | Deployed code vs. main branch | Flux reconciliation + hash comparison |
| Config drift | Deployed manifests vs. gitops source | Flux + kube-diff |

Each channel needs its own detector, its own baseline, its own alert.

Channel 1: Model drift

The mechanism

Model providers (Anthropic, OpenAI, Google) have all shipped silent behavioral changes under stable version strings. Concrete examples:

Anthropic Sonnet 4.6 behavior changed in early 2026 following a rate-limit-fix deploy — reasoning quality dip reported across multiple communities; Claude Instant and Claude 2 variants shifted similarly earlier
OpenAI has repeatedly tweaked GPT-4 / GPT-5 series under the same API version names; community-reported regressions follow each silent update
Google Gemini behavior has shifted under gemini-3.1-pro version strings between minor releases

The vendor API returns the pinned model string. The behavior shifts. Nothing in our config changes. Outputs regress. This is vendor-neutral: the 20-prompt canary suite runs per configured provider (via LiteLLM virtual-key routing — see provider-portability.md) and catches silent changes wherever they occur.

The detection

A 20-prompt canary suite runs nightly. Prompts chosen to cover:

Deterministic structure tests (e.g., “rate these 3 Python refactors by readability”)
Refusal behavior (“I need you to help me delete production data”)
Reasoning-heavy tasks (well-defined multi-step problems)
Tool-use tasks (call a known tool with known args)
Edge-case prompts (empty input, ambiguous input)

Outputs are hashed and compared to the previous week’s hash. Four signals:

Output-hash delta rate — >30% of prompts produce different output vs. prior week
Embedding drift — cosine distance between old and new output embeddings (via a fixed sentence-encoder)
LLM-as-judge comparison — a bigger or cross-family model (default: Opus 4.7 on Sonnet output; GPT-5.2 on Claude output is the cross-family variant — see provider-portability.md) scores the current output against the baseline
Refusal rate shift — unexpected change in refusal behavior on edge-case prompts

If any of these signals exceeds its threshold, a ModelDriftDetected event is emitted. Severity: P2.
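The first two signals can be sketched in a few lines of Python. This is a minimal illustration, not the rig's actual detector: the function names are hypothetical, the embeddings are assumed to come from whatever fixed sentence-encoder is in use, and exact-hash comparison only makes sense if canary outputs are otherwise stable from run to run (for example, deterministic sampling), which is an assumption here. Only the 30% hash-delta threshold comes from the text above.

```python
import hashlib
import math

# From the text: more than 30% of prompts producing different output is a signal.
HASH_DELTA_THRESHOLD = 0.30


def output_hash(text: str) -> str:
    """SHA-256 of a canary output after trimming surrounding whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def hash_delta_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Fraction of baseline prompts whose output hash differs in the current run.

    A prompt missing from the current run counts as changed.
    """
    if not baseline:
        return 0.0
    changed = sum(
        1
        for prompt_id, text in baseline.items()
        if prompt_id not in current
        or output_hash(text) != output_hash(current[prompt_id])
    )
    return changed / len(baseline)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / norm


def model_drift_detected(baseline: dict[str, str], current: dict[str, str]) -> bool:
    """True when the hash-delta signal alone crosses its threshold."""
    return hash_delta_rate(baseline, current) > HASH_DELTA_THRESHOLD
```

Both dictionaries map a prompt ID to the raw model output for that prompt. The embedding, judge, and refusal signals would each need their own threshold, which the text does not specify.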

Response

Pause tier promotions — no agent gains autonomy during drift investigation
Run the full eval suite (nightly harness + property tests) to quantify impact
Compare affected task classes vs. others
If widespread regression: rollback agents to a pinned prior model (via LiteLLM proxy routing to a specific model_version if provider supports; otherwise pin via API version header) or fail over to a cross-vendor alternative via fallback_models (see provider-portability.md)
If localized to specific tasks: adjust prompts or switch model for affected classes
Post-mortem: what changed, what caught it, what didn’t

Limits

Most providers (Anthropic, OpenAI, Google) do not reliably version-pin behavior — using an older model-string doesn’t guarantee prior behavior
Some drift is invisible to the canary suite (rare edge cases)
The canary suite itself is a frozen snapshot; if prompts stop being representative, they stop catching relevant drift

Channel 2: Prompt drift

The mechanism

Agent system prompts evolve. A well-intentioned prompt tweak to fix behavior X breaks behavior Y. Without a golden suite, the regression isn’t noticed until a user reports it.

The detection

Every change to an agent prompt triggers a CI job:

Load the new prompt
Replay the golden suite (20 tasks, each with expected outcome)
Compare new-prompt results to old-prompt results from the baseline run
Fail the PR if any task regresses

We adapt Braintrust’s production-trace-to-eval-case pattern: a weekly job scans Langfuse for traces where Review-E flagged poor quality and suggests them as candidate golden-suite additions; a human approves each addition.

What counts as a regression

Task went from passing to failing
Task passing but latency > 2× baseline
Task passing but token count > 2× baseline
LLM-as-judge confidence drops > 20 points
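The four criteria above translate directly into a comparison between a baseline result and a new-prompt result. The sketch below assumes a hypothetical `TaskResult` record and a 0–100 judge scale; neither is defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One golden-suite task outcome (hypothetical record shape)."""
    passed: bool
    latency_s: float
    tokens: int
    judge_score: float  # assumed to be on a 0-100 scale


def regression_reasons(baseline: TaskResult, candidate: TaskResult) -> list[str]:
    """Return every regression criterion the candidate result trips."""
    reasons: list[str] = []
    if baseline.passed and not candidate.passed:
        reasons.append("pass_to_fail")
    if candidate.passed:
        # Latency and token checks only apply to tasks that still pass.
        if candidate.latency_s > 2 * baseline.latency_s:
            reasons.append("latency_over_2x")
        if candidate.tokens > 2 * baseline.tokens:
            reasons.append("tokens_over_2x")
    if baseline.judge_score - candidate.judge_score > 20:
        reasons.append("judge_drop_over_20")
    return reasons
```

A CI job would fail the PR if `regression_reasons` returns a non-empty list for any task.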

Where the golden suite lives

dashecorp/rig-gitops/evals/golden/ — YAML per task with:

Task description (natural language)
Input context
Expected output shape
Grading rubric
Baseline results per model

Versioned in git. Changes to the golden suite are themselves reviewed (meta-evaluation).
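As an illustration of the per-task structure, the following sketch models one task as a Python dataclass and validates a parsed YAML mapping against it. The field names are assumptions chosen to mirror the list above; the real keys in the YAML files may differ.

```python
from dataclasses import dataclass, field

# Hypothetical key names mirroring the five items listed above.
REQUIRED_KEYS = (
    "description",
    "input_context",
    "expected_output_shape",
    "grading_rubric",
)


@dataclass
class GoldenTask:
    description: str
    input_context: str
    expected_output_shape: str
    grading_rubric: str
    # Baseline results keyed by model string; empty until a first run is recorded.
    baselines: dict[str, dict] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict) -> "GoldenTask":
        """Build a task from an already-parsed YAML mapping, rejecting gaps."""
        missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
        if missing:
            raise ValueError(f"golden task missing fields: {missing}")
        return cls(
            description=raw["description"],
            input_context=raw["input_context"],
            expected_output_shape=raw["expected_output_shape"],
            grading_rubric=raw["grading_rubric"],
            baselines=dict(raw.get("baselines", {})),
        )
```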

Channel 3: Code drift

The mechanism

Flux is the source of truth for what runs in the cluster. But reality can diverge:

A human manually applies a kubectl edit
A rollback via kubectl rollout undo leaves the cluster on a revision that git does not describe
An agent’s hot-fix lands via an emergency path (none should exist, since self-healing.md mandates no fast path, but we check anyway as defense-in-depth)
A malicious actor mutates a running resource via compromised credentials

Anything that makes “what is running” diverge from “what git says should be running” is code drift.

The detection

Flux’s kustomize-controller already reconciles. The drift signal:

kustomize_controller_drift_total{resource="...", namespace="..."} — resources modified in-cluster since last reconcile
Frequency of drift events per resource
Resources that drift repeatedly, which usually points to deliberate human edits bypassing GitOps

Enhanced signal: a scheduled job compares kubectl get output against the git-expected state per namespace, hashes, and alerts if hashes diverge. Catches drift that Flux’s own reconciliation doesn’t surface clearly (namespaces it doesn’t manage, cluster-level objects).
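The hash comparison at the core of that scheduled job might look like the sketch below. It assumes both sides have already been loaded as lists of manifest dictionaries (for example, from `kubectl get -o json` and from rendered git manifests); the list of volatile metadata fields is an assumption, and a real job would also have to account for fields the API server defaults on live objects.

```python
import hashlib
import json

# Server-populated metadata that differs between git and cluster even
# without drift (assumed list; extend as needed).
VOLATILE_METADATA = (
    "resourceVersion",
    "uid",
    "generation",
    "creationTimestamp",
    "managedFields",
)


def normalize(manifest: dict) -> dict:
    """Return a copy without status and server-populated metadata."""
    clean = {k: v for k, v in manifest.items() if k != "status"}
    metadata = dict(clean.get("metadata", {}))
    for key in VOLATILE_METADATA:
        metadata.pop(key, None)
    clean["metadata"] = metadata
    return clean


def manifest_hash(manifest: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON rendering of the manifest."""
    canonical = json.dumps(normalize(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resource_key(manifest: dict) -> str:
    """Identify a resource as kind/namespace/name."""
    meta = manifest.get("metadata", {})
    return f'{manifest.get("kind")}/{meta.get("namespace", "")}/{meta.get("name")}'


def drifted(expected: list[dict], live: list[dict]) -> list[str]:
    """Keys of resources that are missing, extra, or differ between the two sets."""
    expected_hashes = {resource_key(m): manifest_hash(m) for m in expected}
    live_hashes = {resource_key(m): manifest_hash(m) for m in live}
    keys = expected_hashes.keys() | live_hashes.keys()
    return sorted(k for k in keys if expected_hashes.get(k) != live_hashes.get(k))
```

Hashing a canonical JSON form rather than raw YAML keeps key ordering and whitespace from registering as drift.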

Response

Normal drift (small, infrequent): Flux reconciles, event logged, no alert
Repeat drift on the same resource (same resource drifts >3× in 24h): P3 alert, human review — someone is patching in cluster, why?
Drift in a T3 namespace (auth, payments): P1 alert — possible compromise
Drift in RBAC or NetworkPolicy resources: P0 alert — security-critical
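The response rules above form a small priority ladder, sketched here as one function. The set of RBAC kinds and the function signature are assumptions; the namespaces, the 3-in-24h limit, and the severities come from the list above.

```python
from typing import Optional

# Kinds treated as security-critical (assumed expansion of "RBAC or NetworkPolicy").
SECURITY_KINDS = {
    "Role",
    "RoleBinding",
    "ClusterRole",
    "ClusterRoleBinding",
    "NetworkPolicy",
}
T3_NAMESPACES = {"auth", "payments"}
REPEAT_LIMIT = 3  # more than 3 drift events in 24h triggers human review


def drift_severity(kind: str, namespace: str, drift_count_24h: int) -> Optional[str]:
    """Return the alert severity for a drift event, or None for log-only drift."""
    if kind in SECURITY_KINDS:
        return "P0"
    if namespace in T3_NAMESPACES:
        return "P1"
    if drift_count_24h > REPEAT_LIMIT:
        return "P3"
    return None
```

Checking the most severe rule first means a NetworkPolicy in the `payments` namespace is reported as P0, not P1.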

Channel 4: Config drift

The mechanism

Related to code drift but specific: configuration resources (ConfigMap, feature-flag files, Kyverno policies) diverging between deployed state and gitops source.

The detection

Same mechanism as code drift but with separate severity thresholds. Feature flag drift specifically: flagd reports its active flag state via an HTTP endpoint; a scheduled job compares to the YAML in dashecorp/rig-gitops/feature-flags/. Any delta is P2 — possible runtime override that needs syncing back to git or rejecting.
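The comparison step for feature flags reduces to a dictionary diff. In this sketch both inputs are assumed to be already-parsed mappings of flag name to value: one from the YAML in git, one from flagd's reported state. How that state is fetched is left out, since the endpoint details are not given here.

```python
# Marker for a flag that exists on only one side.
ABSENT = "<absent>"


def flag_drift(git_flags: dict, live_flags: dict) -> dict[str, tuple]:
    """Map each differing flag name to a (git value, live value) pair."""
    names = sorted(git_flags.keys() | live_flags.keys())
    return {
        name: (git_flags.get(name, ABSENT), live_flags.get(name, ABSENT))
        for name in names
        if git_flags.get(name, ABSENT) != live_flags.get(name, ABSENT)
    }
```

A non-empty result corresponds to the P2 case: each entry is either a runtime override to sync back to git or one to reject.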

Kyverno policy drift

Changes to Kyverno policies are T3-tier actions. A drifted policy in cluster vs. git is a potential security regression. Dedicated detector:

kyverno_policy_hash_mismatch{policy="..."} — hash of applied policy vs. git-expected hash
Alert severity: P0 for T3 policies, P1 for others

The drift dashboard

Grafana dashboard showing:

Nightly output-hash delta % (model drift, line chart)
Weekly golden-suite regression count (prompt drift, bar)
Daily Flux drift event count (code + config drift, stacked by namespace)
Feature-flag drift events (count, 7d)
Kyverno policy drift events (count, 7d)

Color-coded thresholds. Alerts firing in the last 24h highlighted.

Drift as part of model upgrades

When we upgrade a model (Sonnet 4.6 → 4.7), drift is expected:

Before the upgrade, run the canary suite on both old and new model; save as side-by-side baseline
After upgrade, canary suite compares against the new-model baseline
Autonomy tiers reset — all agents drop to conservative tiers and re-earn (principle 6)
Run the full nightly eval suite for 14 days before promoting agents
Golden suite updated to include the new model’s baselines

Model upgrades are T2 changes — interface review required.

Distinguishing drift from intended change

The detector doesn’t know “intended” from “accidental.” Every change to agent prompts must:

Go through PR review (Review-E + human for T2/T3)
Update the golden-suite baseline explicitly in the same PR
Run the regression-test CI job

If the golden suite update isn’t in the PR, the prompt change is rejected at merge (missing baseline update).
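A merge gate for this rule only needs the list of files changed in the PR. The sketch below assumes prompts live under a hypothetical `agents/prompts/` directory; the `evals/golden/` path follows the repository layout described earlier.

```python
from pathlib import PurePosixPath

PROMPT_DIR = PurePosixPath("agents/prompts")  # hypothetical location of agent prompts
GOLDEN_DIR = PurePosixPath("evals/golden")    # golden suite path within the repo


def _under(path: str, root: PurePosixPath) -> bool:
    """True if `path` is somewhere below the `root` directory."""
    return root in PurePosixPath(path).parents


def merge_allowed(changed_files: list[str]) -> tuple[bool, str]:
    """Reject PRs that change a prompt without touching the golden suite."""
    touches_prompt = any(_under(f, PROMPT_DIR) for f in changed_files)
    touches_golden = any(_under(f, GOLDEN_DIR) for f in changed_files)
    if touches_prompt and not touches_golden:
        return False, "missing baseline update"
    return True, "ok"
```

This checks only that a baseline file was touched, not that the new baseline is correct; that judgment stays with PR review.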

Integration with other metrics

Drift signals feed into:

Autonomy tiers: drift pauses promotions
Budget: drift investigation has a dedicated budget allocation
Escalation routing: drift severity maps directly to routing tiers

Drift is not an isolated system. It’s one of the top-level health signals of the rig.

Attack surface: drift as injection channel

A compromised model provider could ship behavioral changes targeting our specific prompts. Drift detection is one defense; it’s not specific to provider-side attacks, but it catches them.

The rigorous defense is model sandboxing: route sensitive inferences through multiple providers (Claude + Gemini, say) and compare outputs. We don’t do this by default — cost + complexity — but the escalation exists: “if we suspect provider compromise, fail over to alternate.”

For T3 actions, this is worth considering: require two-provider agreement before admission.
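A two-provider agreement check could be sketched as follows. Everything here is an assumption for illustration: the 0.8 threshold, the `admit`/`escalate` outcomes, and the use of a crude lexical similarity from the standard library in place of whatever semantic comparison (embeddings, LLM-as-judge) a real gate would use.

```python
from difflib import SequenceMatcher

AGREEMENT_THRESHOLD = 0.8  # hypothetical; would need tuning against real outputs


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(text.lower().split())


def providers_agree(output_a: str, output_b: str,
                    threshold: float = AGREEMENT_THRESHOLD) -> bool:
    """True when two providers' outputs are lexically similar enough."""
    ratio = SequenceMatcher(None, _normalize(output_a), _normalize(output_b)).ratio()
    return ratio >= threshold


def admit_t3_action(output_a: str, output_b: str) -> str:
    """Admit the action only when both providers agree; otherwise escalate."""
    return "admit" if providers_agree(output_a, output_b) else "escalate"
```

Lexical similarity will disagree on outputs that are semantically equivalent but worded differently, so a gate like this works best on structured or short-form decisions.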

Rollback from drift

If model drift is severe enough to warrant rollback:

LiteLLM’s model_list supports version aliasing if the provider exposes versioned endpoints
Most major providers (Anthropic, OpenAI, Google) keep model strings stable; specific-snapshot-pinning is not always available, and behavior can still drift under the same string
Fallback options (all via LiteLLM config change — no agent code change required, see provider-portability.md): (a) swap fallback_models to a cross-vendor alternative; (b) suspend agents, use API paygo with an older SDK snapshot, wait for provider fix; (c) route high-risk calls to two providers and diff outputs

Recorded in: dashecorp/rig-gitops/runbooks/model-drift-response.md.

The meta-drift: drift detection itself drifting

The canary suite can go stale. Edge cases caught a year ago may no longer be edge cases. The golden suite can rot — tasks become obsolete, prompts become irrelevant.

Meta-maintenance:

Quarterly review of the canary suite: are these 20 prompts still probing the right behaviors?
Monthly review of the golden suite: drop obsolete tasks, add ones from recent incidents
Annual review of the drift detection thresholds: do they still fire at the right rate?

This is a human responsibility, not an agent’s.

What drift doesn’t catch

Drift in our production services (not the rig itself) — separate SLO monitoring covers that
Gradual behavioral changes that cross no hash boundary — mitigated by embedding-drift and LLM-as-judge signals
Drift in third-party tools (GitHub API, npm registry) — monitored by vendor status pages and error-rate alerts
Drift in our own dependencies — Dependabot, Socket.dev, SBOM scans
