Governing 14 Autonomous Agents with Three Contracts

I run 14 autonomous agent tasks: weekly distillations, daily syncs, career analysis, security audits, diagnostics, game dev loops. Each task used to be its own silo. There was no shared contract for what a task should do before it starts, how it should behave, or what it should record when it finishes.

The result was predictable: monitoring diverged from actual outputs (75% false-alarm rate, documented separately), knowledge accumulated in isolated task-specific logs rather than a shared store, and a single failed task was invisible to everything else in the system. No task could learn from another task's failures.

The fix was to stop thinking about individual tasks and start thinking about a protocol every task agrees to follow.

The Three Contracts

Every task, regardless of which platform runs it (scheduled job, interactive session, CI hook), agrees to three behaviors:

Preflight means the task checks what happened last time before doing anything. Did the previous run succeed? What was the quality score? Are there known failure patterns in the ops-fix log that affect this work? This takes about thirty seconds and prevents repeating a known mistake.

Execute is the task's actual work. The protocol says nothing about what this is — it is entirely task-specific. The only constraint is that it reads from the shared knowledge store and respects prior decisions recorded there.

Emit means the task records a structured result when it finishes. Not a free-form log entry — a structured event: task ID, quality score (1–5), output file path, one-line insight. This goes into a shared append-only log that every other task can query.

What Changes When All 14 Tasks Follow This

Before	After	Mechanism
Monitoring watched stale file paths	Emit records canonical path every run	Path drift eliminated at source
Each task re-learned the same failure	Preflight surfaces ops-fix log entries	Knowledge propagates across tasks
No visibility into task health	Quality scores accumulate over time	Declining quality → visible before failure
Silent task failures	Failed runs show 0-score in shared log	Failure is observable without polling

Platform Independence Is Not Accidental

The protocol does not care which platform executes the task. I run tasks in two environments: an interactive coding environment with filesystem access, and a cloud-based agent runtime. The same event log is written to by both. The same knowledge store is queried by both.

This was a deliberate design decision: the protocol had to work whether a task was running in a local session or a scheduled remote job. Any protocol that only worked in one environment would immediately create a split where some tasks were ecosystem citizens and others were not.

The Platform Field

Each task is registered in a routing file with a platform field: "claude-code" or "cowork". This is how meta-tasks (the diagnostics and orchestration agents) know how to invoke and monitor each other. Without this field, the meta-layer cannot distinguish between tasks that run locally and tasks that run remotely.

{
  "weekly-codex-distill": {
    "platform": "claude-code",
    "notify_companion": true,
    "payload_keys": ["file_path", "insight", "posts_published"]
  },
  "operator-health-loop": {
    "platform": "cowork",
    "notify_companion": false
  }
}

Adding this field to all 14 tasks was the final step in making the protocol complete. Before this, a meta-task analyzing the system had to guess which environment each task ran in.

What This Looks Like in Practice

I run a weekly distillation task that reads the git log, extracts technical learnings, and generates a report. Under the old system, this task ran in isolation. Under the protocol:

Preflight checks the last 3 runs: was quality consistently above 3? Did any monitoring path mismatch occur last week? (It had — the Preflight step surfaced the ops-fix entry before the task even started.)
Execute runs the distillation normally.
Emit records: quality 4/5, output path, insight "monitoring path divergence root-caused and fixed."

The diagnostics task that runs the next morning reads that Emit record and can skip re-investigating the monitoring issue because the resolution is already recorded.

What I Would Do Differently

The Emit step should have been the first thing I standardized, not the last. I added Preflight first because it felt like the highest-value step. But Preflight can only query history if history has been recorded — and without Emit, that history was sparse and inconsistent. If I were starting over, I would define the event log schema first and wire Emit into every task before adding anything else.

Evolution Log

2026-04-01 — Initial observation. Ecosystem Protocol v1.0 established after 14 tasks migrated from Cowork to Claude Code. Platform field standardized across all task registrations. Three-contract system formalized.