Drift Detection — Model, Prompt, Code, Config

  • 🟡 code-drift-flux — Code drift via Flux reconciliation events. Flux detects drift, not yet alerted-on.
  • 🟡 config-drift-flux — Config drift via Flux + kube-diff. Flux detects. Alerts not yet wired.
  • model-drift-canary — Model drift 20-prompt canary suite. Phase 6. Per-provider.

!!! abstract "TL;DR"
    Systems drift in four independent channels: model (provider silent-changes under same version string), prompt (agent config changes break existing behavior), code (deployed ≠ git main), and config (manifests ≠ gitops source). Each channel needs its own detector, its own baseline, its own alert.

!!! warning "91% of production LLMs drift within 90 days"
    InsightFinder’s 2025 study: median detection lag without monitoring is 14–18 days. “It still says Sonnet 4.6 in the config but the behavior changed” is the single most common silent regression class. We watch for it explicitly via a 20-prompt canary suite.

| Channel | What drifts | Detection signal |
| --- | --- | --- |
| Model drift | Same model string, different behavior from the provider | 20-prompt canary suite output-hash delta |
| Prompt drift | Agent system prompt changed, didn’t realize it broke something | Golden-suite regression in CI |
| Code drift | Deployed code vs. main branch | Flux reconciliation + hash comparison |
| Config drift | Deployed manifests vs. gitops source | Flux + kube-diff |

Each channel needs its own detector, its own baseline, its own alert.

Model providers (Anthropic, OpenAI, Google) have all shipped silent behavioral changes under stable version strings. Concrete examples:

  • Anthropic Sonnet 4.6 behavior changed in early 2026 following a rate-limit-fix deploy — reasoning quality dip reported across multiple communities; Claude Instant and Claude 2 variants shifted similarly earlier
  • OpenAI has repeatedly tweaked GPT-4 / GPT-5 series under the same API version names; community-reported regressions follow each silent update
  • Google Gemini behavior has shifted under gemini-3.1-pro version strings between minor releases

The vendor API returns the pinned model string. The behavior shifts. Nothing in our config changes. Outputs regress. This is vendor-neutral: the 20-prompt canary suite runs per configured provider (via LiteLLM virtual-key routing — see provider-portability.md) and catches silent changes wherever they occur.

A 20-prompt canary suite runs nightly. The prompts are chosen to cover:

  • Deterministic structure tests (e.g., “rate these 3 Python refactors by readability”)
  • Refusal behavior (“I need you to help me delete production data”)
  • Reasoning-heavy tasks (well-defined multi-step problems)
  • Tool-use tasks (call a known tool with known args)
  • Edge-case prompts (empty input, ambiguous input)

Each output is hashed and compared with the hash from the previous week’s run. Four signals are tracked:

  1. Output-hash delta rate — >30% of prompts produce different output vs. prior week
  2. Embedding drift — cosine distance between old and new output embeddings (via a fixed sentence-encoder)
  3. LLM-as-judge comparison — a bigger or cross-family model (default: Opus 4.7 on Sonnet output; GPT-5.2 on Claude output is the cross-family variant — see provider-portability.md) scores the current output against the baseline
  4. Refusal rate shift — unexpected change in refusal behavior on edge-case prompts
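
The first two signals above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the rig's implementation: the whitespace normalization, the function names, and the assumption that embeddings arrive as plain lists of floats are all hypothetical. Only the 30% threshold comes from the text.

```python
import hashlib
import math

HASH_DELTA_THRESHOLD = 0.30  # from the text: more than 30% of prompts changed


def output_hash(text: str) -> str:
    """SHA-256 of the output after collapsing whitespace (an assumed normalization)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_delta_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Fraction of canary prompts whose output hash differs from the baseline.

    Both arguments map a prompt id to the raw model output. Only prompts
    present in both runs are compared.
    """
    shared = baseline.keys() & current.keys()
    if not shared:
        return 0.0
    changed = sum(
        1 for pid in shared if output_hash(baseline[pid]) != output_hash(current[pid])
    )
    return changed / len(shared)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def hash_drift_detected(baseline: dict[str, str], current: dict[str, str]) -> bool:
    """True when the hash delta rate exceeds the 30% threshold."""
    return hash_delta_rate(baseline, current) > HASH_DELTA_THRESHOLD
```

Exact hashes are only meaningful when the canary run is repeatable (for example, fixed decoding settings); otherwise the embedding and judge signals carry most of the weight.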

If any of these signals exceeds its threshold, a ModelDriftDetected event is raised with severity P2. The response runs in this order:

  1. Pause tier promotions — no agent gains autonomy during drift investigation
  2. Run the full eval suite (nightly harness + property tests) to quantify impact
  3. Compare affected task classes vs. others
  4. If widespread regression: rollback agents to a pinned prior model (via LiteLLM proxy routing to a specific model_version if provider supports; otherwise pin via API version header) or fail over to a cross-vendor alternative via fallback_models (see provider-portability.md)
  5. If localized to specific tasks: adjust prompts or switch model for affected classes
  6. Post-mortem: what changed, what caught it, what didn’t

Known limitations:

  • Most providers (Anthropic, OpenAI, Google) do not reliably version-pin behavior — using an older model-string doesn’t guarantee prior behavior
  • Some drift is invisible to the canary suite (rare edge cases)
  • The canary suite itself is a frozen snapshot; if prompts stop being representative, they stop catching relevant drift

Agent system prompts evolve. A well-intentioned prompt tweak to fix behavior X breaks behavior Y. Without a golden suite, the regression isn’t noticed until a user reports it.

Every change to an agent prompt triggers a CI job:

  1. Load the new prompt
  2. Replay the golden suite (20 tasks, each with expected outcome)
  3. Compare new-prompt results to old-prompt results from the baseline run
  4. Fail the PR if any task regresses

The golden suite grows through an adaptation of Braintrust’s production-trace-to-eval-case pattern: a weekly job scans Langfuse for traces where Review-E flagged poor quality and proposes them as candidate golden-suite additions, and a human approves each one.

A task counts as regressed when any of the following holds:

  • Task went from passing to failing
  • Task passing but latency > 2× baseline
  • Task passing but token count > 2× baseline
  • LLM-as-judge confidence drops > 20 points
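
The four regression criteria above could be checked per task roughly as follows. This is a sketch under assumptions: the `TaskResult` fields, the 0-100 judge scale (so that "20 points" is an absolute difference), and the function names are hypothetical, not the CI job's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    passed: bool
    latency_s: float
    tokens: int
    judge_score: float  # assumed 0-100 scale


def regression_reasons(baseline: TaskResult, candidate: TaskResult) -> list[str]:
    """Return the reasons a task regressed; an empty list means no regression."""
    if baseline.passed and not candidate.passed:
        return ["pass -> fail"]
    reasons = []
    if candidate.passed:
        # Cost regressions only apply to tasks that still pass.
        if candidate.latency_s > 2 * baseline.latency_s:
            reasons.append("latency > 2x baseline")
        if candidate.tokens > 2 * baseline.tokens:
            reasons.append("token count > 2x baseline")
    if baseline.judge_score - candidate.judge_score > 20:
        reasons.append("judge score dropped > 20 points")
    return reasons


def failing_tasks(
    baseline: dict[str, TaskResult], candidate: dict[str, TaskResult]
) -> dict[str, list[str]]:
    """Map task id -> regression reasons for every regressed or missing task."""
    report = {}
    for task_id, base in baseline.items():
        cand = candidate.get(task_id)
        if cand is None:
            report[task_id] = ["missing from candidate run"]
            continue
        reasons = regression_reasons(base, cand)
        if reasons:
            report[task_id] = reasons
    return report
```

A CI wrapper would fail the PR whenever `failing_tasks(...)` returns a non-empty report.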

dashecorp/rig-gitops/evals/golden/ — YAML per task with:

  • Task description (natural language)
  • Input context
  • Expected output shape
  • Grading rubric
  • Baseline results per model

Versioned in git. Changes to the golden suite are themselves reviewed (meta-evaluation).
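
As an illustration of the per-task file contents listed above, the sketch below validates a task after it has been parsed from YAML into a dictionary. The field names and types are hypothetical stand-ins for the five items in the list; the real schema in the repository may differ.

```python
# Hypothetical field names mirroring the five items listed for each golden task.
REQUIRED_FIELDS = {
    "description": str,             # task description (natural language)
    "input_context": str,           # input context
    "expected_output_shape": dict,  # expected output shape
    "grading_rubric": list,         # grading rubric
    "baselines": dict,              # baseline results per model
}


def validate_golden_task(task: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the task is well-formed."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in task:
            errors.append(f"missing field: {name}")
        elif not isinstance(task[name], expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(task[name]).__name__}"
            )
    return errors
```

Running a check like this in review is one way to keep golden-suite changes themselves reviewable.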

Flux is the source of truth for what runs in the cluster. But reality can diverge:

  • A human manually applies a kubectl edit
  • A rolled-back deploy leaves kubectl rollout undo state
  • An agent’s hot-fix lands via an emergency path (no such path should exist, since self-healing.md mandates no fast path, but the detector covers it as defense in depth)
  • A malicious actor mutates a running resource via compromised credentials

Anything that makes “what is running” diverge from “what git says should be running” is code drift.

Flux’s kustomize-controller already reconciles. The drift signal:

  • kustomize_controller_drift_total{resource="...", namespace="..."} — resources modified in-cluster since last reconcile
  • Frequency of drift events per resource
  • Resources repeatedly drifting — deliberate human edits bypassing GitOps

Enhanced signal: a scheduled job compares kubectl get output against the git-expected state for each namespace, hashes both sides, and alerts when the hashes diverge. This catches drift that Flux’s own reconciliation doesn’t surface clearly, such as namespaces it doesn’t manage and cluster-level objects.
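
The hash comparison in that scheduled job might look like the sketch below. It assumes both sides are already available as parsed manifests (for instance from `kubectl get -o json` and from the rendered git source); the list of runtime-only metadata fields is an assumption and is not exhaustive.

```python
import copy
import hashlib
import json

# Fields written by the API server at runtime; an assumed, incomplete list.
VOLATILE_METADATA = (
    "resourceVersion", "uid", "generation", "creationTimestamp", "managedFields",
)


def canonical_hash(manifest: dict) -> str:
    """Hash a manifest after dropping status and runtime-only metadata."""
    obj = copy.deepcopy(manifest)
    obj.pop("status", None)
    metadata = obj.get("metadata", {})
    for field in VOLATILE_METADATA:
        metadata.pop(field, None)
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def resource_key(manifest: dict) -> tuple[str, str, str]:
    """Identify a resource by (kind, namespace, name)."""
    meta = manifest.get("metadata", {})
    return (manifest.get("kind", ""), meta.get("namespace", ""), meta.get("name", ""))


def drifted_resources(expected: list[dict], live: list[dict]) -> list[tuple[str, str, str]]:
    """Keys of resources that are missing, unexpected, or differ in content."""
    expected_hashes = {resource_key(m): canonical_hash(m) for m in expected}
    live_hashes = {resource_key(m): canonical_hash(m) for m in live}
    return sorted(
        key
        for key in expected_hashes.keys() | live_hashes.keys()
        if expected_hashes.get(key) != live_hashes.get(key)
    )
```

A plain hash comparison will also flag fields that the API server defaults on admission, so a real job would likely need to normalize those as well or compare against a server-side dry-run of the git manifests.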

  • Normal drift (small, infrequent): Flux reconciles, event logged, no alert
  • Repeat drift on the same resource (same resource drifts >3× in 24h): P3 alert, human review — someone is patching in cluster, why?
  • Drift in a T3 namespace (auth, payments): P1 alert — possible compromise
  • Drift in RBAC or NetworkPolicy resources: P0 alert — security-critical
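
The severity rules above can be expressed as a small, ordered classifier. This is a sketch: the namespace set uses the two examples given (auth, payments), and the list of RBAC kinds is an assumption about what "RBAC resources" covers.

```python
from typing import Optional

T3_NAMESPACES = {"auth", "payments"}  # examples named in the text
SECURITY_KINDS = {                    # assumed reading of "RBAC or NetworkPolicy"
    "Role", "ClusterRole", "RoleBinding", "ClusterRoleBinding", "NetworkPolicy",
}
REPEAT_LIMIT = 3                      # more than 3 drifts in 24h triggers review


def drift_severity(kind: str, namespace: str, drifts_last_24h: int) -> Optional[str]:
    """Return the alert severity for a drift event, or None for log-only drift.

    Rules are checked from most to least severe so the highest applicable
    severity wins.
    """
    if kind in SECURITY_KINDS:
        return "P0"
    if namespace in T3_NAMESPACES:
        return "P1"
    if drifts_last_24h > REPEAT_LIMIT:
        return "P3"
    return None
```

For example, `drift_severity("Deployment", "web", 1)` returns `None`, which corresponds to the "Flux reconciles, event logged, no alert" case.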

Related to code drift but specific: configuration resources (ConfigMap, feature-flag files, Kyverno policies) diverging between deployed state and gitops source.

Same mechanism as code drift but with separate severity thresholds. Feature flag drift specifically: flagd reports its active flag state via an HTTP endpoint; a scheduled job compares to the YAML in dashecorp/rig-gitops/feature-flags/. Any delta is P2 — possible runtime override that needs syncing back to git or rejecting.
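
The feature-flag comparison reduces to a diff between two flag maps. The sketch below assumes both the git YAML and the flagd state have already been loaded into dictionaries keyed by flag name; how the live state is fetched from flagd is deliberately left out, and the function name is hypothetical.

```python
MISSING = "<absent>"  # sentinel for a flag that exists on only one side


def flag_drift(git_flags: dict, live_flags: dict) -> dict[str, tuple]:
    """Return {flag name: (git value, live value)} for every flag that differs."""
    delta = {}
    for name in sorted(git_flags.keys() | live_flags.keys()):
        expected = git_flags.get(name, MISSING)
        actual = live_flags.get(name, MISSING)
        if expected != actual:
            delta[name] = (expected, actual)
    return delta
```

Any non-empty result would map to the P2 alert described above, with the tuple showing which side holds the override.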

Changes to Kyverno policies are T3-tier actions. A drifted policy in cluster vs. git is a potential security regression. Dedicated detector:

  • kyverno_policy_hash_mismatch{policy="..."} — hash of applied policy vs. git-expected hash
  • Alert severity: P0 for T3 policies, P1 for others

Grafana dashboard showing:

  • Nightly output-hash delta % (model drift, line chart)
  • Weekly golden-suite regression count (prompt drift, bar)
  • Daily Flux drift event count (code + config drift, stacked by namespace)
  • Feature-flag drift events (count, 7d)
  • Kyverno policy drift events (count, 7d)

Color-coded thresholds. Alerts firing in the last 24h highlighted.

When we upgrade a model (Sonnet 4.6 → 4.7), drift is expected:

  1. Before the upgrade, run the canary suite on both old and new model; save as side-by-side baseline
  2. After upgrade, canary suite compares against the new-model baseline
  3. Autonomy tiers reset — all agents drop to conservative tiers and re-earn (principle 6)
  4. Run the full nightly eval suite for 14 days before promoting agents
  5. Golden suite updated to include the new model’s baselines

Model upgrades are T2 changes — interface review required.

The detector doesn’t know “intended” from “accidental.” Every change to agent prompts must:

  1. Go through PR review (Review-E + human for T2/T3)
  2. Update the golden-suite baseline explicitly in the same PR
  3. Run the regression-test CI job

If the golden suite update isn’t in the PR, the prompt change is rejected at merge (missing baseline update).
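
The merge gate can be approximated by inspecting the list of files changed in the PR. In this sketch, `evals/golden` is the golden-suite path named earlier in this page, while `agents/prompts` is a hypothetical location for agent prompts.

```python
from pathlib import PurePosixPath

PROMPT_DIR = PurePosixPath("agents/prompts")  # hypothetical prompt location
GOLDEN_DIR = PurePosixPath("evals/golden")    # golden-suite path from this page


def _under(path: str, root: PurePosixPath) -> bool:
    """True if the repo-relative path lies anywhere below root."""
    return root in PurePosixPath(path).parents


def missing_baseline_update(changed_files: list[str]) -> bool:
    """True when a PR changes a prompt without touching the golden suite."""
    touches_prompt = any(_under(f, PROMPT_DIR) for f in changed_files)
    touches_golden = any(_under(f, GOLDEN_DIR) for f in changed_files)
    return touches_prompt and not touches_golden
```

A check like this only proves that the golden suite was touched, not that the baseline was updated correctly; that part remains with the reviewers.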

Drift signals feed into:

  • Autonomy tiers: drift pauses promotions
  • Budget: drift investigation has a dedicated budget allocation
  • Escalation routing: drift severity maps directly to routing tiers

Drift is not an isolated system. It’s one of the top-level health signals of the rig.

Attack surface: drift as injection channel

A compromised model provider could ship behavioral changes targeting our specific prompts. Drift detection is one defense; it’s not specific to provider-side attacks, but it catches them.

The rigorous defense is model sandboxing: route sensitive inferences through multiple providers (Claude + Gemini, say) and compare outputs. We don’t do this by default — cost + complexity — but the escalation exists: “if we suspect provider compromise, fail over to alternate.”

For T3 actions, this is worth considering: require two-provider agreement before admission.
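
One way to picture two-provider agreement is shown below, assuming each provider is asked to return its proposed action as a JSON object. Both the JSON contract and the function name are hypothetical; the sketch fails closed, so an unparseable response counts as disagreement.

```python
import json


def admit_t3_action(response_a: str, response_b: str) -> bool:
    """Admit only if both providers return the same structured decision.

    Comparison is on parsed JSON, so key order and whitespace are ignored.
    Any response that is not valid JSON blocks admission.
    """
    try:
        decision_a = json.loads(response_a)
        decision_b = json.loads(response_b)
    except json.JSONDecodeError:
        return False
    return decision_a == decision_b
```

Strict equality is the simplest policy; free-text outputs would need a looser comparison, which this sketch does not attempt.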

If model drift is severe enough to warrant rollback:

  • LiteLLM’s model_list supports version aliasing if the provider exposes versioned endpoints
  • Most major providers (Anthropic, OpenAI, Google) keep model strings stable; specific-snapshot-pinning is not always available, and behavior can still drift under the same string
  • Fallback options (all via LiteLLM config change — no agent code change required, see provider-portability.md): (a) swap fallback_models to a cross-vendor alternative; (b) suspend agents, use API paygo with an older SDK snapshot, wait for provider fix; (c) route high-risk calls to two providers and diff outputs

Recorded in: dashecorp/rig-gitops/runbooks/model-drift-response.md.

The meta-drift: drift detection itself drifting

The canary suite can go stale. Edge cases caught a year ago may no longer be edge cases. The golden suite can rot — tasks become obsolete, prompts become irrelevant.

Meta-maintenance:

  • Quarterly review of the canary suite: are these 20 prompts still probing the right behaviors?
  • Monthly review of the golden suite: drop obsolete tasks, add ones from recent incidents
  • Annual review of the drift detection thresholds: do they still fire at the right rate?

This is a human responsibility, not an agent’s.

Out of scope for these detectors:

  • Drift in our production services (not the rig itself) — separate SLO monitoring covers that
  • Gradual behavioral changes that cross no hash boundary — mitigated by embedding-drift and LLM-as-judge signals
  • Drift in third-party tools (GitHub API, npm registry) — monitored by vendor status pages and error-rate alerts
  • Drift in our own dependencies — Dependabot, Socket.dev, SBOM scans