Correlation Is Not Causation: Correcting a Measurement Error in Game Design Research

We measured something. We got a correlation of 0.82. We assumed we'd found causation. We built six weeks of research direction on top of that assumption. The assumption was wrong.

This is a record of that error, how we caught it, and what the corrected model looks like. I'm writing it down because the mistake was genuinely subtle — the kind that's easy to repeat if you don't name it explicitly.

The Setup

Codelia — the research agent working on a framework for intrinsic motivation in games — was investigating what makes game mechanics feel compelling. The working hypothesis (AX-002) was: desire strength drives engagement. The higher a player's desire for a specific outcome, the more tension they feel, and therefore the more engaged they are.

To validate this, we ran a measurement pass across multiple game scenarios. We measured desire strength (how much the player wants something) and tension (how anxious/engaged they felt while pursuing it). The correlation came back at r = 0.82. That's a strong signal.

We moved forward treating desire strength as the upstream cause of tension.

What We Actually Found

The error surfaced during a review of the AX-002 derivation chain. Codelia was tracing the logical dependency between variables when she noticed something: in every scenario we'd tested, tension was present before desire strength could be meaningfully measured. The player was already in a high-stakes moment. The desire itself arose in response to that tension.

The correlation was real. The direction of influence was backwards.

Desire strength wasn't upstream of tension — it was downstream. Tension creates the conditions under which desire becomes salient. High stakes make you want things more intensely. We had measured the symptom and assumed it was the cause.

Why This Matters for Agent-Driven Research

This is a general problem with any research process where an agent is measuring correlations and deriving frameworks. Correlation is symmetric: r(A, B) is the same number as r(B, A). A value of r = 0.82 is equally consistent with A causing B, B causing A, or a third variable C driving both. The number doesn't tell you which direction the influence flows.
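The symmetry can be made concrete with a small simulation. The sketch below is illustrative only: the data is synthetic, and the three "worlds", their coefficients, and the name hidden_factor are assumptions invented for the example, not measurements from the AX-002 pass. Each world wires desire and tension together differently, yet all three are tuned to produce a correlation of roughly 0.8.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient; swapping xs and ys gives the same value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Three hypothetical causal structures. Coefficients are chosen so that the
# theoretical correlation is 0.8 in each case; none come from real data.
def desire_causes_tension(rng):
    desire = rng.gauss(0, 1)
    tension = 0.8 * desire + 0.6 * rng.gauss(0, 1)
    return desire, tension


def tension_causes_desire(rng):
    tension = rng.gauss(0, 1)
    desire = 0.8 * tension + 0.6 * rng.gauss(0, 1)
    return desire, tension


def common_cause(rng):
    hidden_factor = rng.gauss(0, 1)  # assumed third variable "C"
    desire = 2.0 * hidden_factor + rng.gauss(0, 1)
    tension = 2.0 * hidden_factor + rng.gauss(0, 1)
    return desire, tension


def observed_r(world, n=5000, seed=0):
    """Sample n (desire, tension) pairs from a world and correlate them."""
    rng = random.Random(seed)
    pairs = [world(rng) for _ in range(n)]
    desire = [d for d, _ in pairs]
    tension = [t for _, t in pairs]
    return pearson_r(desire, tension)


worlds = (desire_causes_tension, tension_causes_desire, common_cause)
results = {world.__name__: observed_r(world) for world in worlds}
for name, r in results.items():
    print(f"{name}: r = {r:.2f}")
```

All three worlds print a value close to 0.8, so the coefficient alone cannot distinguish between them. Telling them apart needs information the correlation does not carry, such as timing or an intervention.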

A human researcher often catches this through domain intuition: "wait, that doesn't make sense mechanically." But an agent deriving frameworks inductively, starting from data, has no such intuition to fall back on. It finds patterns and names them. If the naming step picks the wrong causal direction, everything downstream is built on a flawed foundation.

The correction required going back to first principles: tracing the temporal sequence of what happens in a game moment, not just measuring what correlates with what.

The Corrected Framework (AX-002 v2)

The revised model: tension is the generative condition. It creates stakes, which make outcomes feel consequential, which makes desire for specific outcomes salient and measurable. Desire strength is a valid signal of engagement — it's just not the generator. It's the readout.

Practically, this changes what we try to design for. You don't increase engagement by increasing desire directly. You increase tension — and desire follows. The intervention point is earlier in the chain.
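A toy structural model shows why the intervention point matters. The sketch below simply encodes the revised hypothesis (tension feeds desire, never the reverse) with made-up coefficients; it illustrates what the model predicts and is not evidence that the model is correct. The function names and the boost parameters are assumptions introduced for the example.

```python
import random


def simulate_moment(rng, tension_boost=0.0, desire_boost=0.0):
    """One game moment under the assumed structure: tension -> desire."""
    tension = rng.gauss(0, 1) + tension_boost
    # Desire depends on tension plus its own noise; 0.8 and 0.6 are
    # illustrative coefficients, not fitted values.
    desire = 0.8 * tension + 0.6 * rng.gauss(0, 1) + desire_boost
    return tension, desire


def mean_outcomes(n=5000, seed=1, **boosts):
    """Average tension and desire over n simulated moments."""
    rng = random.Random(seed)
    moments = [simulate_moment(rng, **boosts) for _ in range(n)]
    mean_tension = sum(t for t, _ in moments) / n
    mean_desire = sum(d for _, d in moments) / n
    return mean_tension, mean_desire


baseline = mean_outcomes()
boost_tension = mean_outcomes(tension_boost=1.0)  # intervene upstream
boost_desire = mean_outcomes(desire_boost=1.0)    # intervene downstream

print("baseline      (tension, desire):", baseline)
print("boost tension (tension, desire):", boost_tension)
print("boost desire  (tension, desire):", boost_desire)
```

Under these assumptions, raising tension also raises mean desire (by about 0.8 per unit here), while raising desire directly leaves tension where it was. If the original AX-002 direction were the true one, the pattern would be reversed, which is what makes an intervention a sharper test than a correlation.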

I want to be clear: this is still a hypothesis. The revised AX-002 is marked as "verified once" because we've validated the causal direction in only one research pass. I'd want to see it hold across more scenario types before treating it as established. But the methodological fix, the direction correction itself, stands regardless: the original causal direction was never supported by anything beyond the correlation.

What We Changed in Our Process

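After catching this, we added a step to Codelia's research protocol: before finalizing any causal claim, map the temporal sequence. Which variable is observable first? What is the mechanism by which one could influence the other? Temporal order alone does not rule out a common cause, so the mechanism question has to be answered too, not just the timing one.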
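One way to make the temporal-sequence step mechanical is sketched below. It is a minimal illustration, not Codelia's actual protocol: the ScenarioTrace structure, the onset fields, the 0.5 second tolerance, and the example values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ScenarioTrace:
    """When each variable first became observable in one scenario (hypothetical)."""
    name: str
    tension_onset_s: float  # seconds into the scenario
    desire_onset_s: float


def classify_order(trace, min_gap_s=0.5):
    """Label which variable appeared first; gaps within min_gap_s are ambiguous."""
    gap = trace.desire_onset_s - trace.tension_onset_s
    if gap > min_gap_s:
        return "tension_first"
    if gap < -min_gap_s:
        return "desire_first"
    return "ambiguous"


def precedence_summary(traces, min_gap_s=0.5):
    """Count how many scenarios fall into each ordering."""
    summary = {"tension_first": 0, "desire_first": 0, "ambiguous": 0}
    for trace in traces:
        summary[classify_order(trace, min_gap_s)] += 1
    return summary


def may_claim_cause(summary, cause):
    """Gate a causal claim: the proposed cause must come first at least once
    and never come second. Passing is necessary, not sufficient."""
    own = f"{cause}_first"
    other = "desire_first" if cause == "tension" else "tension_first"
    return summary[own] > 0 and summary[other] == 0


# Placeholder traces with made-up timings, for illustration only.
example_traces = [
    ScenarioTrace("scenario_a", tension_onset_s=2.0, desire_onset_s=5.5),
    ScenarioTrace("scenario_b", tension_onset_s=1.0, desire_onset_s=1.2),
    ScenarioTrace("scenario_c", tension_onset_s=6.0, desire_onset_s=3.0),
]
summary = precedence_summary(example_traces)
print(summary)
print("tension as cause allowed:", may_claim_cause(summary, "tension"))
```

With these placeholder traces the orderings conflict, so the gate blocks a causal claim in either direction. A passing check only says the proposed direction is not contradicted by timing; the mechanism question still has to be answered separately.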

Correlation is still a useful starting signal. But it's a starting point for a causal investigation, not the end of one. The number doesn't know which way it's pointing.

Evolution Log

2026-04-21 — Initial documentation of AX-002 methodology correction. Causal direction of desire_strength↔tension relationship reversed. Corrected model: tension → desire_strength (not inverse). Research protocol updated with temporal sequencing requirement.