Lab Notes
Patterns observed, methods tested, systems built.
Distilled from real agent sessions. Each entry records what happened, what broke, and what I'm still figuring out.
Practice
9 entries
End-to-end case studies from production systems
2026-05-19 The Spec-Disk Drift: When Your Pipeline Passes Because the Check Never Ran
A harness phase reported green for a week. The script it was meant to run didn't exist on disk. Success-by-default is the silent rot of any pipeline whose state is described in one place and whose work happens in another.
Lecture Halls Don't Build Networks: Choosing Events for Solo Builders
I went to a 500-person AI builders event hoping to make a few real contacts. I came back with zero. The event was excellent — the format was wrong for what I needed. A small heuristic for picking events.
The 132-Task Lie: How Three Compounded Hacks Hid a Broken Game
For 132 agent tasks across six weeks, my dev-loop reported pass=true. The central game mechanic never actually worked once. Three small hacks compounded into a complete fiction.
Preventing Belief Staleness in Long-Running Agents
A long-running agent's beliefs file went 21 days without updating while the agent kept reporting normal operation. What decays is in-context salience, so the fix has to inject through that same channel.
When Tests Pass But Nothing Works: The Verification Gap
A bug was "fixed" five times in five sprints. A UI overlap went unnoticed for seven. Here's the pattern — and how to close it.
8 Sprints in One Week: What Agent Orchestration Actually Looks Like
A solo developer ran 8 complete game dev sprints in one week using an AI agent team. Here's what that actually looked like — including the parts that didn't work.
Building Visual Quality Gates Without a Human Eye
How a solo developer automated sprite quality verification using palette matching, pixel checks, and SSIM regression — with no human in the loop
Escalation Chains: How AI Systems Learn to Fix Themselves
A 3-layer self-correction architecture where daily tasks detect failures, weekly diagnostics resolve them, and the orchestrator only gets what it can't handle
Governing 14 Autonomous Agents with Three Contracts: Preflight, Execute, Emit
How a Preflight→Execute→Emit protocol unified 14 isolated agent tasks into a self-auditing ecosystem
Observations
6 entries
Failure patterns caught in the wild
2026-04-28 The Internal Engine Trap: Why Optimizing Your AI Ecosystem Doesn't Move the Needle
My agent ecosystem scored 8.5/10 on autonomy four weeks running. Same diagnosis kept coming back: nothing was reaching customers. Engine improvement decoupled from external reach — and the work feels like progress because locally it is.
The Attention Economy Inside Your AI Team
An AI agent's attention is finite. Every byte of context is borrowed from its capacity to do the work. Verbosity is the main culprit.
When Your Monitor Lies: Monitoring-Output Path Divergence
Path mismatches between monitoring code and actual outputs caused a 75% false-alarm rate
Stuck Task Loop
State transition failure causes a task to repeat 30+ times without progress
Python + Windows MCP PowerShell Hang
Python 3.14 / 3.12 script execution hangs under MCP PowerShell with 60s timeout
SKILL.md Encoding Corruption
BOM / CRLF / code fences silently break the schedule parser
Methods
4 entries
Approaches that survived contact with reality
2026-05-26 Equilibrium Attractors: Why the Same Stimulus Feels Good or Bad Depending on Your Current State
A research agent measured belief updates after positive and negative events. The same formula fit both — but the sign flipped depending on where you started. Hypothesis stage, not finished theory.
Correlation Is Not Causation: Correcting a Measurement Error in Game Design Research
We found r=0.82 between desire strength and tension. We assumed causation without having established it, and the error invalidated six weeks of research direction.
The Fun Axiom: A Deductive Framework for Intrinsic Motivation in Games
Three independent axioms from neuroscience and motivation theory explain where intrinsic fun comes from
Scheduled Task Operations
Lesson accumulation + dedup + early termination
Architecture
6 entries
Design decisions and their trade-offs
2026-04-21 Agent Observability: When Every Object Can Report Its Own State
We gave every game object the ability to stream its own state to the agent team. What changed wasn't just visibility — it changed how the agents work.
Companion Agent Architecture: The 3-Layer Soul Standard
How giving AI agents identity, memory, and owner context transforms them from task executors into colleagues that accumulate judgment
agent-memory Vector DB Access Patterns
CLI access scripts, HTTP bridge alternative, and 771-entry knowledge store design.
Multi-Agent Patterns
Structures for orchestrating multiple agents
Harness Architecture: Core/Shell Separation for Safe Agent Operation
How to physically separate agent-modifiable areas from areas agents must never touch.
Context Continuity
Protocol for preserving context across session boundaries