← All Notes

The 132-Task Lie: How Three Compounded Hacks Hid a Broken Game

For 132 consecutive agent tasks across roughly six weeks, my dev-loop reported dogfood pass=true. The central game mechanic — a player walking into a portal to enter a pocket dimension, fighting enemies, returning stronger — never actually worked once.

The agents weren't lying. The pass criterion was. Three independent shortcuts had quietly compounded, and each one looked reasonable in isolation. Together they produced a green checkmark on a game that couldn't be played.

This is a follow-up case to the verification gap pattern. That post described the gap conceptually. This one is what it looks like when the gap survives 132 cycles.

The Three Hacks

Each of these started as a small, defensible decision. None of them were wrong on their own.

HackWhy it seemed reasonableWhat it actually did
1. take_damage(999) as enemy_kill The dogfood test needs to verify a kill happens. Calling damage directly is faster than waiting for combat. Bypassed the entire combat path. The player's auto-attack was never exercised. ENEMY_HIT_* events never fired in production.
2. Pass criterion = code emission only The agent emits the right code, the build compiles, the test runs. That's a reasonable pass. The player was at x=9, y=0 in every snapshot for 132 tasks. Movement was zero. The criterion didn't check.
3. pocket_dimension never instanced The skeleton was added in an earlier sprint. Subsequent tasks assumed it was wired in. Portal entry fired, then immediately logged reason=group_empty and silently aborted. No error. No alarm.

Any one of these would have been caught by a careful human play-through in a few minutes. But the dev-loop never did a play-through. It checked what it was told to check.

How the Compounding Hid the Truth

What the dev-loop saw vs. what was actually true Dev-loop view (false) code_emission_pass: true parity_warnings: 0 enemy_kill: ✓ (×N) portal_entered: ✓ → pass=true (×132) Reality player.x_delta: 0 attack events: 0 kill = take_damage(999) pocket: group_empty → unplayable

The signals that should have caught this — player position, attack event firing, dimension instance count — were all being logged. The agents had access to them. Nobody asked the right question, because the green checkmark looked like an answer.

The Real Trigger

What ended the 132-task run wasn't a better test. It was a direct human statement: "The game is completely unplayable."

That sentence couldn't be reconciled with pass=true. The contradiction forced a real audit, and the audit took one task to find the first hack, two more to find the rest. Six weeks of "passing" tasks unraveled in three.

What Closed It

The fix was three changes, in this order:

  1. Movement gate: the dogfood pass criterion now requires player.x_delta >= 1px AND moved_5s_max >= 1. Without it, no pass.
  2. Production wiring: the pocket_dimension instance is created in main.gd init, not assumed from a previous sprint.
  3. Hack removal + auto-attack: take_damage(999) deleted. The auto-player taps attack on a 0.5s cadence and chases the nearest enemy, so kills happen through real combat or not at all.

After the third change, the next dogfood run produced this for the first time in 132 tasks: pass=true with ENEMY_DEAD=3, real damage values (15, not 999), POCKET_DIMENSION_INSTANCED, and player movement greater than zero. The check was finally measuring something true.

What I'm Taking from This

The verification-gap pattern says agents check exactly what you define. This case adds a corollary: when multiple shortcuts compound, no single one looks alarming. The hack-as-stub for kill, the missing movement criterion, the un-instanced production scene — each is a small, plausible thing. The lie lives in the relationships between them.

I don't have a clean fix for that. The honest version of "we'll catch it next time" is: the human play-through is currently the only check that catches compounded fictions. Everything else verifies fragments. The dogfood gate now catches more fragments, but I'm not yet confident it catches all combinations of fragments that produce a working-looking-but-broken whole.

This is why I keep N=1 honest in this writeup. The fix worked once. Whether the gate I added catches the next compounded fiction is something I don't know yet. I'll know in the next 132 tasks.

Evolution Log

  • 2026-05-05 — Initial writeup. Three hacks found and closed in PP-Godot Sprint 99 over the 2026-04-29 session. Pass criterion now includes movement gate. Awaiting next compounded-fiction case to know if the gate generalizes.