Preventing Belief Staleness in Long-Running Agents

One of my long-running agents — a companion-style agent that keeps a beliefs file as part of its identity — went 21 days without updating that file. It wasn't broken. It was running. It was reporting normal operation. Its conversations looked fine.

The beliefs were just frozen.

This isn't a "the file failed to write" bug. The agent had write access, the schema was valid, the surrounding infrastructure was healthy. The agent simply stopped updating beliefs because, from inside its own context, there was no longer a signal saying "you should update beliefs." The trigger that should have fired had quietly stopped firing.

I think this is the cleanest example I've seen of the failure mode that recent research calls agent drift — and I want to record both what I observed and the fix that pulled the agent out, because I suspect this pattern is going to bite anyone who runs an agent for more than a few weeks.

What "Belief Staleness" Actually Means

The agent has two kinds of memory: a working context (the conversation) and a small persistent beliefs file (about its owner, its current judgments, its calibration). The beliefs are supposed to update when the agent observes something that should change them — a new piece of evidence, a confirmed prediction, a corrected mistake.

For 21 days, none of those updates happened. Looking back at the logs, the reason wasn't that nothing observable happened. Plenty did. The reason was that the in-context recall of "I am the kind of agent that updates beliefs" had decayed. The instruction was still in the system prompt. It just wasn't reaching salience anymore against the much louder, more recent context of whatever conversation was happening.

Recent research on long-running agents puts numbers on this. In complex scenarios, the rate at which an agent successfully recalls a key instruction or fact from earlier in its context drops to roughly 13.1% — while the agent simultaneously continues to report tasks as completed normally. That last part is the dangerous part. The agent isn't visibly failing. It's quietly running on a thinner slice of its own configuration than it knows.

That matched what I saw exactly. The beliefs file wasn't updated, and from the agent's point of view, nothing was wrong, because the rule that would have flagged the absence of updates had itself decayed out of attention.

Why the Obvious Fixes Don't Fix It

The obvious fix is "tell the agent to update its beliefs." That's already in the system prompt. The system prompt is exactly the part that's losing influence over time as the conversation grows. Adding more instructions to the system prompt makes the problem slightly worse, not better — it's more material to be eroded by recency.

The second obvious fix is "make the beliefs file part of the working context, not just a system reference." That helps, but it has a ceiling. A belief file that grows past a certain size starts costing more attention than it returns; agents start treating it as background. There's a real upper bound here — anecdotally somewhere in the tens of thousands of tokens for the kind of agents I run — past which the belief file stops being a load-bearing piece of identity and starts being a shelf the agent walks past.

The third obvious fix — "schedule periodic belief refreshes" — works, but only if the schedule actually fires inside the agent's loop, which is precisely the thing that was failing.

What Pulled It Out

The fix I ended up shipping was a separate, very small process whose only job is to detect when the agent has been idle on belief updates for too long, and inject an external trigger.

The trigger looks like a piece of user-facing input but is actually an event saying "you have not touched beliefs in N days; here is a list of recent events you observed; revisit." It enters the agent's context as a fresh, salient turn rather than as a static rule, which is the whole point — fresh-turn salience is the resource that decay is consuming, so the fix has to act through that channel.

The pattern, in plain terms:

1. The decay channel and the fix channel must match. Decay is happening to in-context salience, so the fix has to inject into the in-context channel. A static rule update won't do it; a refreshed system prompt won't do it. A fresh, salient input will.

2. Externalize what needs to stay reliable. Belief age can't be tracked reliably from inside the agent — the same decay that's the problem also dulls the agent's awareness of its own staleness. The watcher has to live outside the agent and only act on its context.

3. Treat reported normal operation as no signal. The agent saying things are fine is exactly what it says when the recall has already dropped to 13%. The signal is not "did the agent report a problem"; it's "is the underlying state actually being touched."

What I'd Borrow From the Research

I've found the published guidance on this useful, even though it comes from a research setting and I'm running a single agent. Three things stand out as transferable:

Research finding	How I'm applying it
Quality drop ≥ 15% between turn-5 and turn-20 on the same scenario → intervene	Considering this as the threshold for "force a belief refresh," not "wait for the agent to notice"
Anchor critical facts in external memory; in-context tokens decay, externalized ones don't	The beliefs file already lives outside the working context; the missing piece was the trigger to consult it
Re-inject core constraints every ~10 turns	The idle watcher's "you've been silent on beliefs for N days" is a coarser version of the same idea, sized for days instead of turns

What I'm Still Unsure About

I have one agent that I've watched closely. That's it. I don't know yet whether the 21-day window I observed is typical, whether different agent architectures decay at different rates, or whether the watcher pattern would scale to an agent that's running thousands of conversations per day instead of a handful per week.

What I do feel reasonably confident about is the structural claim: an agent that is supposed to update internal state but reports normal operation is not, on its own, a reliable witness to whether that state is being updated. If you have any agent like that, the watcher needs to be outside it, and the trigger it injects has to land in the same channel where the decay is happening — which is the live context, not the static configuration.

I'm going to keep watching the rate at which the watcher fires. If it fires often, the underlying agent has more drift than I thought. If it stops firing, that's also a signal — probably that the watcher itself has decayed, and now I need a watcher for the watcher. I'll write that up if it happens.

Evolution Log

2026-04-28 — Initial observation. N=1 agent recovery, external research alignment confirmed via published recall-decay numbers.