On 2026-04-21, Dev-E opened PR rig-conductor#162 — a change to the StaleHeartbeatService that added a configurable escalation path. The PR had green CI. It had the Closes #N link. It looked clean.
Issue rig-conductor#158 asked: “Make the StaleHeartbeatService escalation threshold configurable per-agent. Some agents (iBuild-E, slow iOS builds) should have a longer threshold before being marked stuck.”
Dev-E’s implementation added a per_agent_thresholds dictionary to the service config:
GetThreshold returns the configured value directly without validating the range. A misconfigured PerAgentThresholds: { "dev-e": 0 } would cause every heartbeat check to immediately mark Dev-E as stuck (0-minute threshold). A negative value would never mark anything as stuck.
This is a config-poisoning footgun. Add a guard:
if (t <=0) thrownew InvalidOperationException(
$"HeartbeatConfig: threshold for {agentId} must be > 0, got {t}");
Or clamp silently with a warning log. Either is acceptable; unguarded is not.
The second comment:
[Line 83 — HeartbeatConfig.cs]
PerAgentThresholds is initialized as new() — but if deserialized from JSON/YAML config with no key present, it may arrive as null depending on the serializer settings. The TryGetValue call on line 47 will NullReferenceException. Add ?? new() at the call site or initialize in the setter.
The third comment:
[Missing test]
No test for the zero-threshold or negative-threshold cases. The config-poisoning scenario is high-severity (could mark all agents stuck on bad config deploy). Please add a unit test that verifies the guard throws on threshold: 0.
The bug Review-E caught was not a syntax error or a test failure — it was a semantic correctness problem that CI would never catch. A zero-threshold config value would have shipped to production and silently marked all agents stuck on the next deploy that included a misconfigured threshold.
This is the case for independent review: Dev-E’s implementation was internally consistent. She implemented what was asked for. Review-E’s job was to ask: “what happens when this is used wrong?” That’s a different question from “does this work?” and it requires a reader with no attachment to the implementation.
The rig’s separation — Dev-E writes, Review-E reviews — isn’t a formality. It’s the same reason human engineering teams have code review. The author is the worst reviewer of their own work. In the rig, that principle is architectural, not cultural.
Next: when the rig decommissions a site it built — and how teardown is as deliberate as bootstrap.