← All Notes

Escalation Chains: How AI Systems Learn to Fix Themselves

When something breaks in a multi-agent system, the default answer is "notify the human." A task fails — Slack message. Quality score drops — email. Unknown file state — stop and ask. This feels responsible. In practice, it's how you train yourself to ignore alerts, because 80% of them could have been resolved without you.

The harder design question is: which failures actually require a human? Most don't. A stale file path can be detected and fixed automatically. A missing script can be restored from git. A low quality score trend can trigger a diagnostic without waking anyone up. The failures that genuinely need human judgment are a small subset — and that subset is knowable in advance if you design the escalation chain carefully.

This is what we built, and how v2.0 changed it.

Three Layers, Each Resolving What It Can

Layer 0 — All Tasks Every task calls task_event_emit.py on completion. Quality + insight + output path recorded. Layer 1 — daily-repo-sync Silent failures, low quality trends, script drift. Auto-fixes: git restore missing scripts, BOM removal, stale file path correction. Unresolved → queued for L2. Layer 2 — rapid-diagnostics Consumes L1 queue. Deeper analysis: drift pattern recognition, SKILL.md auto-repair, empty description recovery. Unresolved → queued for L3. Layer 3 — sprint-orchestra All unresolved issues from L1+L2. Requires human decision. Goal: empty queue most weeks. Also: strategic ecosystem review, bi-weekly. unresolved unresolved human needed

Each layer handles what it can and passes up only what it can't. This is the design principle — not "alert on everything" and not "auto-fix everything," but a principled partitioning of which failures belong at which resolution level.

What v2.0 Changed

The v1.0 chain existed but was conservative: almost everything had requires_user_approval: true. In practice, this meant the escalation queue accumulated faster than the human cleared it, and the system quietly stopped being self-correcting.

v2.0 was a deliberate expansion of the auto-fix boundary. The guiding question: what is the actual downside if this auto-fix is wrong? For most infrastructure issues, the downside is "a slightly worse fix that will be caught at L2 or L3" — not catastrophic. We shifted those to auto-fix. For issues where being wrong creates real damage (deleting valid data, breaking live pipelines), we kept human approval.

Categoryv1.0v2.0Rationale
Missing scriptAlert humangit restore autoL1 commits daily; history always exists
BOM encodingAlert humanAuto-removeSafe, reversible, 100% correct action
Stale file pathAlert humanAuto-correct + verifyEmit records canonical path each run
Empty SKILL.md descriptionAlert humanAuto-repair from contentFormat issue, not content issue
P0 bug in gameAlert humanAuto-route to SheldonSheldon has fix authority; human reviews outcome
Data schema changeAlert humanStill humanDownstream impact unpredictable

The P0 Auto-Routing Case

The most aggressive change in v2.0 was P0 bug auto-routing. When the escalation runner detects a P0 issue (game-breaking bug, confirmed by test failure), it now automatically generates a fix_request and sends it to the Sheldon agent's inbox — without waiting for a human to relay the request.

This sounds risky. In practice, the preconditions are strict:

  1. The issue must be confirmed by the automated test pipeline (not just flagged)
  2. Sheldon receives the request with full context: what failed, what the expected behavior is, and a mandatory preflight instruction to check its own memory before attempting a fix
  3. Sheldon is expected to respond to the comms channel, not silently modify code

The human still sees the outcome. What they don't have to do is relay "Sheldon, please look at this" — which was the bottleneck. I was the message bus between the detection layer and the fix layer. That's now automated.

The Deterministic Principle

One constraint we held firm on: the escalation runner itself must be a deterministic Python script, not an LLM call. Every detection rule, every auto-fix action, every escalation condition is explicit code.

The reason is reproducibility. If an auto-fix runs at 3am and something goes wrong, I need to be able to read the script and trace exactly what happened. An LLM-based runner that "decides" what to fix based on context would be faster to prototype and worse to debug. The LLM layer is downstream — it consumes the structured output of the runner; it doesn't drive the runner's decisions.

# escalation_runner.py — simplified L1 detection
for task_id, task_data in routing.items():
    skill_path = resolve_skill_path(task_id)
    if not skill_path.exists():
        issues.append({
            "type": "script_drift",
            "task": task_id,
            "action": "git_restore",          # deterministic fix
            "requires_human": False
        })

What We Don't Know Yet

The auto-fix boundary in v2.0 is a hypothesis. We expanded it based on reasoning about downside risk, but we haven't run it through enough failure cycles to know whether we've gotten the partitioning right. It's possible some auto-fixes we enabled will create secondary failures that are harder to diagnose than the original issue.

The Sheldon auto-routing is the most uncertain piece. Automated agent-to-agent task delegation without a human in the middle is new territory. It's worked in the two P0 cases we've tested it on. Whether it degrades gracefully when the fix is ambiguous — I genuinely don't know yet.

Evolution Log

  • 2026-04-02 — Initial publication. escalation_chain v2.0 deployed: auto-fix boundary expanded, requires_user_approval minimized, P0 Sheldon auto-routing added. deterministic runner (escalation_runner.py) confirmed as L0–L3 control plane. Two P0 auto-routes successfully completed. Long-term boundary validation still in progress.