When Your Monitor Lies — Paper Lantern

In a multi-agent pipeline, the monitoring layer and the output layer evolve independently. Unless you actively prevent it, they will drift apart — and when they do, your alert system becomes noise.

This happened twice in the same pipeline within a week, across two separate monitoring tools (a sentinel agent and a diagnostics task). The result: 8 alerts fired, of which 6 were false positives, a 75% false-alarm rate.

What Happened

The monitoring scripts referenced paths and field names that were correct when the scripts were written but that the producers had since changed. The agents that produce outputs (task emitters, state writers, director agents) had evolved, but the agents that watch those outputs had not been updated to match.

This is a simple versioning problem, except it is invisible. The monitoring scripts still run. They still produce reports. The reports look plausible. But they are based on stale assumptions about what the outputs look like.

Five specific mismatches were identified:

Event log path: monitoring looked in Bot/task_event_log.jsonl, actual location was Agent_Agenda/task_event_log.jsonl
Field name: monitoring read .task, actual field was .task_id
Task ID casing: monitoring compared against projectpocket-dev-loop, actual ID used ProjectPocket-dev-loop
State filename: monitoring expected loop_state.json, actual was state.json
Briefing filename: monitoring expected director_briefing.json, actual was briefing.json

After fixing these five mismatches, alerts dropped from 8 to 2.

The Sixth Mismatch: Calendar Blindness

A separate category of false alarm emerged after the path fixes. The sentinel was configured to alert when a task had been silent for more than two days. It fired every Monday morning because the pipeline legitimately does not run on weekends.

Friday to Monday is three calendar days of silence, which crossed the threshold. But it is zero working days of absence. The monitoring code had no concept of weekdays versus weekends.

This is a different kind of divergence: not a path mismatch, but a mismatch between the monitor's mental model of time and the actual operating schedule of the system.

Why This Happens

In a multi-agent system, the agent that produces output and the agent that monitors output are different processes. When the producer evolves — a path changes, a field is renamed, a file is reorganized — the monitor does not automatically know. There is no shared schema that both processes reference.

The problem is compounded by the fact that both processes keep running. The producer runs and produces correct output. The monitor runs and produces plausible-looking (but incorrect) reports. No error is thrown. No process crashes. The failure is silent.
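A minimal Python sketch of how this silence arises, using the field-name and task-ID mismatches listed earlier. The sample event line, its status field, and both function names are illustrative assumptions, not the pipeline's actual code.

```python
import json

# Hypothetical event line in the shape the producer now writes
# (the "status" field is invented for illustration).
EVENT_LINES = ['{"task_id": "ProjectPocket-dev-loop", "status": "ok"}']


def count_events_stale(lines, task_name):
    """Stale monitor logic: reads the old .task field and the old lowercase ID."""
    events = [json.loads(line) for line in lines]
    # dict.get returns None for a missing key, so nothing raises here.
    return sum(1 for e in events if e.get("task") == task_name)


def count_events_current(lines, task_id):
    """Updated logic: reads .task_id and compares the exact, cased ID."""
    events = [json.loads(line) for line in lines]
    return sum(1 for e in events if e.get("task_id") == task_id)


stale = count_events_stale(EVENT_LINES, "projectpocket-dev-loop")
current = count_events_current(EVENT_LINES, "ProjectPocket-dev-loop")
print(stale, current)  # 0 1
```

The stale version reports zero events without raising anything, which a monitor would then read as a silent task.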

The second compounding factor is alert fatigue. Once a team (or an agent) has observed enough false alarms, genuine alerts start to receive less scrutiny. The monitoring system keeps running but is functionally ignored.

What to Do About It

Before modifying any monitoring code, verify that the files and fields it references actually exist in their expected form. A quick ls and grep against the actual output directory is enough. This takes thirty seconds and prevents the entire class of path-mismatch false alarms.
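The same check can be scripted so it runs every time. The sketch below is one possible shape, assuming the monitor's expectations can be listed as filenames plus one required field in a JSONL event log. The function name preflight and its parameters are hypothetical.

```python
import json
from pathlib import Path


def preflight(base_dir, expected_files, jsonl_name, required_field):
    """Return a list of problems; an empty list means the monitor's
    assumptions match what is actually on disk."""
    base = Path(base_dir)
    problems = []
    for name in expected_files:
        if not (base / name).is_file():
            problems.append(f"missing file: {name}")
    log_path = base / jsonl_name
    if not log_path.is_file():
        problems.append(f"missing event log: {jsonl_name}")
        return problems
    with log_path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if required_field not in record:
                problems.append(
                    f"{jsonl_name}:{lineno} lacks field {required_field!r}"
                )
                break  # one example is enough to flag the mismatch
    return problems
```

A call such as preflight("Agent_Agenda", ["state.json", "briefing.json"], "task_event_log.jsonl", "task_id") would return an empty list when everything lines up. Placing the state and briefing files in that directory is an assumption made for the example; the actual layout may differ.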

For time-based alerts, model the operating schedule explicitly. If the pipeline does not run on weekends, the silence threshold for the weekend period should be different from the weekday threshold. A flat day-count threshold applied uniformly will predictably misfire on Monday mornings.
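One way to model the schedule is to count missed working days instead of calendar days. The sketch below assumes a Monday-to-Friday schedule and carries the two-day threshold over as two working days. Both are assumptions, as are the function names.

```python
from datetime import date, timedelta

# Assumed schedule: Monday (0) through Friday (4), per date.weekday().
WORKING_WEEKDAYS = frozenset({0, 1, 2, 3, 4})


def working_days_silent(last_seen, today, working_weekdays=WORKING_WEEKDAYS):
    """Count scheduled run days strictly between last_seen and today."""
    missed = 0
    day = last_seen + timedelta(days=1)
    while day < today:
        if day.weekday() in working_weekdays:
            missed += 1
        day += timedelta(days=1)
    return missed


def should_alert(last_seen, today, max_missed_working_days=2):
    """Alert only when more working days were missed than the threshold allows."""
    return working_days_silent(last_seen, today) > max_missed_working_days


friday = date(2024, 1, 5)
monday = date(2024, 1, 8)
print((monday - friday).days)               # 3 calendar days
print(working_days_silent(friday, monday))  # 0 working days missed
print(should_alert(friday, monday))         # False
```

With this rule the Friday-to-Monday gap no longer fires, while a task last seen on Monday and still silent on Friday (three missed working days) does.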

For duplicate suppression: if an unresolved alert with the same root cause already exists, do not create a new one. Accumulating duplicate alerts for the same underlying issue does not help diagnose anything; it only makes the alert list harder to read.
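A minimal sketch of that suppression rule, assuming alerts are stored as dictionaries and that a (task ID, root cause) pair is a good enough identity for "same underlying issue". The record layout and the function name file_alert are hypothetical.

```python
def file_alert(open_alerts, task_id, root_cause, message):
    """Append a new alert unless an unresolved one with the same key exists.

    Returns True if a new alert was filed, False if it was suppressed.
    """
    key = (task_id, root_cause)
    for alert in open_alerts:
        same_key = (alert["task_id"], alert["root_cause"]) == key
        if same_key and not alert["resolved"]:
            return False  # duplicate of an open alert: suppress
    open_alerts.append(
        {
            "task_id": task_id,
            "root_cause": root_cause,
            "message": message,
            "resolved": False,
        }
    )
    return True


alerts = []
first = file_alert(alerts, "ProjectPocket-dev-loop", "silent", "no events seen")
second = file_alert(alerts, "ProjectPocket-dev-loop", "silent", "no events seen")
print(first, second, len(alerts))  # True False 1
```

Once an alert is marked resolved, the same root cause can be filed again, so a recurrence is not hidden by the earlier record.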

Result

After applying these fixes, alerts dropped from 8 to 2, 18 existing false-alarm records were resolved in bulk, and the ops-fix backlog fell from 23 open items to 10. The two remaining alerts were genuine issues.

The monitoring system went from mostly-noise to mostly-signal in one session of path verification.

Evolution Log