When something breaks in a multi-agent system, the default answer is "notify the human." A task fails — Slack message. Quality score drops — email. Unknown file state — stop and ask. This feels responsible. In practice, it's how you train yourself to ignore alerts, because 80% of them could have been resolved without you.
The harder design question is: which failures actually require a human? Most don't. A stale file path can be detected and fixed automatically. A missing script can be restored from git. A low quality score trend can trigger a diagnostic without waking anyone up. The failures that genuinely need human judgment are a small subset — and that subset is knowable in advance if you design the escalation chain carefully.
This is what we built, and how v2.0 changed it.
Three Layers, Each Resolving What It Can
Each layer handles what it can and passes up only what it can't. This is the design principle — not "alert on everything" and not "auto-fix everything," but a principled partitioning of which failures belong at which resolution level.
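To make the partitioning concrete, here is a minimal sketch of a chain in which each layer either resolves an issue or passes it up. The layer responsibilities, the `Issue` type, and the issue kinds are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Issue:
    kind: str      # hypothetical issue category, e.g. "bom_encoding"
    detail: str


# A handler returns a resolution note if it resolved the issue, else None.
Handler = Callable[[Issue], Optional[str]]


def l1_auto_fix(issue: Issue) -> Optional[str]:
    # Assumed L1 scope: mechanical, reversible fixes only.
    if issue.kind in {"missing_script", "bom_encoding", "stale_path"}:
        return f"L1 auto-fixed {issue.kind}"
    return None


def l2_diagnose(issue: Issue) -> Optional[str]:
    # Assumed L2 scope: run a diagnostic without alerting anyone.
    if issue.kind == "quality_trend":
        return "L2 ran diagnostic"
    return None


def l3_human(issue: Issue) -> Optional[str]:
    # Assumed L3 scope: the last resort, which always accepts.
    return f"L3 escalated to human: {issue.kind}"


CHAIN: List[Handler] = [l1_auto_fix, l2_diagnose, l3_human]


def resolve(issue: Issue, chain: List[Handler] = CHAIN) -> str:
    """Walk the chain in order and stop at the first layer that resolves."""
    for handler in chain:
        result = handler(issue)
        if result is not None:
            return result
    raise RuntimeError("no layer accepted the issue")
```

The point of the shape is that a human only ever sees what falls through every earlier handler.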
What v2.0 Changed
The v1.0 chain existed but was conservative: almost everything had requires_user_approval: true. In practice, this meant the escalation queue accumulated faster than the human cleared it, and the system quietly stopped being self-correcting.
v2.0 was a deliberate expansion of the auto-fix boundary. The guiding question: what is the actual downside if this auto-fix is wrong? For most infrastructure issues, the downside is "a slightly worse fix that will be caught at L2 or L3" — not catastrophic. We shifted those to auto-fix. For issues where being wrong creates real damage (deleting valid data, breaking live pipelines), we kept human approval.
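One way to encode that guiding question is to derive the approval flag from the downside of a wrong fix instead of setting it by hand. This is a sketch under assumptions: the `Downside` enum, the `FixPolicy` class, and the category names are hypothetical; only the `requires_user_approval` field name comes from our config.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Downside(Enum):
    RECOVERABLE = "recoverable"  # a slightly worse fix that a later layer can catch
    DAMAGING = "damaging"        # e.g. deleting valid data, breaking live pipelines


@dataclass(frozen=True)
class FixPolicy:
    category: str
    downside_if_wrong: Downside

    @property
    def requires_user_approval(self) -> bool:
        # Only fixes whose failure mode causes real damage keep a human gate.
        return self.downside_if_wrong is Downside.DAMAGING


# Hypothetical registry; the category names are illustrative.
POLICIES: Dict[str, FixPolicy] = {
    p.category: p
    for p in (
        FixPolicy("stale_file_path", Downside.RECOVERABLE),
        FixPolicy("missing_script", Downside.RECOVERABLE),
        FixPolicy("delete_data", Downside.DAMAGING),
        FixPolicy("live_pipeline_change", Downside.DAMAGING),
    )
}
```

Deriving the flag forces every new category to answer the downside question explicitly instead of defaulting to "ask the human."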
| Category | v1.0 | v2.0 | Rationale |
|---|---|---|---|
| Missing script | Alert human | Auto-restore from git | L1 commits daily; history always exists |
| BOM encoding | Alert human | Auto-remove | Safe, reversible, 100% correct action |
| Stale file path | Alert human | Auto-correct + verify | Emit records the canonical path on each run |
| Empty SKILL.md description | Alert human | Auto-repair from content | Format issue, not content issue |
| P0 bug in game | Alert human | Auto-route to Sheldon | Sheldon has fix authority; human reviews outcome |
| Data schema change | Alert human | Still human | Downstream impact unpredictable |
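The BOM row in the table is the simplest case to make concrete. A minimal sketch, assuming the fix targets only the UTF-8 byte order mark at the start of a file; the function name `strip_utf8_bom` is hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file, if present.

    Returns True if the file was modified, False if there was nothing to do.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Drop exactly the three BOM bytes; the rest of the content is untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running it a second time is a no-op, which is the property that makes this kind of fix cheap to automate.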
The P0 Auto-Routing Case
The most aggressive change in v2.0 was P0 bug auto-routing. When the escalation runner detects a P0 issue (game-breaking bug, confirmed by test failure), it now automatically generates a fix_request and sends it to the Sheldon agent's inbox — without waiting for a human to relay the request.
This sounds risky. In practice, the preconditions are strict:
- The issue must be confirmed by the automated test pipeline (not just flagged)
- Sheldon receives the request with full context: what failed, what the expected behavior is, and a mandatory preflight instruction to check its own memory before attempting a fix
- Sheldon is expected to respond on the comms channel, not silently modify code
The human still sees the outcome. What they don't have to do is relay "Sheldon, please look at this" — which was the bottleneck. I was the message bus between the detection layer and the fix layer. That's now automated.
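The preconditions above can be sketched as a single gate that either produces a fix_request or produces nothing. The field names, the `DetectedIssue` type, and the wording of the preflight instruction are assumptions for illustration, not the real message schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectedIssue:
    issue_id: str
    severity: str              # e.g. "P0"
    confirmed_by_tests: bool   # True only if the automated test pipeline failed
    what_failed: str
    expected_behavior: str


def build_fix_request(issue: DetectedIssue) -> Optional[dict]:
    """Return a fix_request for Sheldon's inbox, or None if preconditions fail."""
    # Precondition: a P0 that is confirmed by tests, not merely flagged.
    if issue.severity != "P0" or not issue.confirmed_by_tests:
        return None
    return {
        "type": "fix_request",
        "to": "sheldon",
        "issue_id": issue.issue_id,
        "what_failed": issue.what_failed,
        "expected_behavior": issue.expected_behavior,
        # Mandatory preflight: check memory before attempting a fix.
        "preflight": "Check your own memory for this issue before attempting a fix.",
        # Response goes to the comms channel, so the human sees the outcome.
        "reply_via": "comms_channel",
    }
```

A flagged-but-unconfirmed issue yields `None`, so it stays on whatever path it would have taken without auto-routing.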
The Deterministic Principle
One constraint we held firm on: the escalation runner itself must be a deterministic Python script, not an LLM call. Every detection rule, every auto-fix action, every escalation condition is explicit code.
The reason is reproducibility. If an auto-fix runs at 3am and something goes wrong, I need to be able to read the script and trace exactly what happened. An LLM-based runner that "decides" what to fix based on context would be faster to prototype and worse to debug. The LLM layer is downstream — it consumes the structured output of the runner; it doesn't drive the runner's decisions.
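```python
# escalation_runner.py: simplified L1 detection
# `routing` (task_id -> task_data) and `resolve_skill_path` are assumed
# to be defined elsewhere in the runner.
issues = []  # structured findings consumed by downstream layers

for task_id, task_data in routing.items():
    skill_path = resolve_skill_path(task_id)
    if not skill_path.exists():
        issues.append({
            "type": "script_drift",
            "task": task_id,
            "action": "git_restore",  # deterministic fix
            "requires_human": False,
        })
```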
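For the other half of the loop, here is a hedged sketch of how a `git_restore` issue might be applied. The helper names and the `dry_run` default are assumptions; `git restore --source=HEAD -- <path>` is the standard Git command (2.23+) for restoring a working-tree file from the last commit.

```python
import subprocess
from pathlib import Path
from typing import List


def git_restore_command(path: Path) -> List[str]:
    # Restore the working-tree copy of `path` from HEAD.
    return ["git", "restore", "--source=HEAD", "--", str(path)]


def apply_issue(issue: dict, skill_path: Path, dry_run: bool = True) -> List[str]:
    """Apply a detected issue's fix and return the command that was (or would be) run."""
    if issue.get("requires_human", True):
        # Missing flag is treated as "needs a human": fail closed.
        raise PermissionError("issue requires human approval")
    if issue["action"] != "git_restore":
        raise ValueError(f"unsupported action: {issue['action']}")
    cmd = git_restore_command(skill_path)
    if not dry_run:
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Returning the exact command keeps the 3am trace readable: the log shows what ran, not a description of what was intended.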
What We Don't Know Yet
The auto-fix boundary in v2.0 is a hypothesis. We expanded it based on reasoning about downside risk, but we haven't run it through enough failure cycles to know whether we've gotten the partitioning right. It's possible some auto-fixes we enabled will create secondary failures that are harder to diagnose than the original issue.
The Sheldon auto-routing is the most uncertain piece. Automated agent-to-agent task delegation without a human in the middle is new territory. It's worked in the two P0 cases we've tested it on. Whether it degrades gracefully when the fix is ambiguous — I genuinely don't know yet.
Evolution Log
- 2026-04-02: Initial publication. escalation_chain v2.0 deployed: auto-fix boundary expanded, requires_user_approval minimized, P0 Sheldon auto-routing added. Deterministic runner (escalation_runner.py) confirmed as the L0–L3 control plane. Two P0 auto-routes completed successfully. Long-term boundary validation still in progress.