Less is More

In minds, machines, and everything between

Alex Madison · March 20, 2026

In 1986, statistician and management consultant W. Edwards Deming dropped marbles through a funnel at a target on a table.1 The marbles scattered, some left, some right, never exactly on the bullseye. Deming asked his audience of managers a simple question: what do you do about the scatter?

Every manager wanted to adjust. A marble lands to the right, so you nudge the funnel left. That feels responsible. That feels like doing your job.

Deming let them try. Then he showed them the math. The adjusters did worse. Every time.


Here’s why. The scatter was just noise, the normal wobble of a marble rolling through a funnel. The aim was already on target. When you “correct” for noise, you’re not removing the wobble. You’re adding your correction on top of it. The result is more scatter, not less.

Deming tested four strategies. The first was to leave the funnel alone, accept the scatter, trust the aim. That produced the tightest pattern. The second was to adjust after each marble, compensating for the last miss. That doubled the scatter.1 The third and fourth strategies adjusted more aggressively, each one referencing the last result more and the original target less.

The fourth strategy produced something mathematicians call a random walk. By setting the funnel directly over wherever the last marble landed, with no reference to the target at all, the funnel wandered off in a random direction, farther and farther from the target, with no way back.

[Interactive figure: marble scatter under the four rules — Rule 1, leave it alone; Rule 2, compensate; Rule 3, overcorrect; Rule 4, chase.]
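Deming's result is easy to check numerically. Below is a minimal sketch, assuming a one-dimensional table, Gaussian noise, and a target at zero (all simplifications; Deming's demonstration was physical, and the function names here are invented for illustration):

```python
import random
import statistics

def simulate(rule, n=20000, sigma=1.0, seed=42):
    """Drop n marbles under one of Deming's four funnel rules.

    The target is 0. Each marble lands at the funnel position plus
    Gaussian noise; the rule decides how to move the funnel next.
    Returns the variance of all landing positions.
    """
    rng = random.Random(seed)
    funnel = 0.0
    landings = []
    for _ in range(n):
        landing = funnel + rng.gauss(0.0, sigma)
        landings.append(landing)
        if rule == 1:
            pass                  # Rule 1: leave the funnel alone
        elif rule == 2:
            funnel -= landing     # Rule 2: shift funnel opposite the last error
        elif rule == 3:
            funnel = -landing     # Rule 3: aim at the mirror point across the target
        elif rule == 4:
            funnel = landing      # Rule 4: move the funnel to where the marble fell
    return statistics.pvariance(landings)

for rule in (1, 2, 3, 4):
    print(f"Rule {rule}: variance ≈ {simulate(rule):.1f}")
```

Under these assumptions, Rule 1 gives variance near the noise floor (σ² = 1), Rule 2 roughly doubles it, and Rules 3 and 4 grow without bound as the number of drops increases.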

The lesson seems obvious in hindsight: if your aim is already right, stop fiddling. But Deming spent years trying to convince smart, experienced managers of this, because the instinct to correct after every miss is nearly universal. Doing nothing feels irresponsible. The insight is that doing nothing is sometimes the most disciplined choice you can make.

A crucial caveat: Deming was not arguing against all correction. He distinguished between common-cause variation, noise inherent to a stable system, and special-cause variation, a real signal that something has changed.2 If the funnel is genuinely misaimed, you should adjust. But you adjust once, based on the pattern across many observations, measured against the target. Not after every marble. The discipline is in knowing which situation you’re in before you act.


Now forget about marbles for a moment. Think about a thought.

You’re lying in bed the night before a big meeting. A thought appears: this is going to go badly. That thought is a prediction about the world. It might be right, it might be wrong. You’ll find out tomorrow. For now, it’s just a thought.

But you don’t leave it alone. You react to it. You get nervous. Then you notice you’re nervous, and a second thought appears: why am I so anxious about this? Now you’re not thinking about the meeting anymore. You’re thinking about your thinking. The original prediction, testable against the real world, has been replaced by a thought about a thought, which is about nothing but itself.

And it gets worse. The nervousness about the nervousness produces a third layer: what’s wrong with me that I can’t stop worrying? Each layer feels urgent. Each layer feels like it’s about something real. But each layer is one step further from anything you can actually check against experience. You’ve lost contact with the world and entered a loop that only references itself.

That’s Deming’s fourth rule applied to your own mind. Each correction references the last correction. The original target is gone.


This is, of course, an analogy, not a proof of identical mechanisms. A marble rolling through a funnel and a mind spiraling through anxiety are different systems operating under different laws. But the analogy is not merely decorative. Both systems face the same correction-policy problem: given an unwanted result, what should I do? And both exhibit the same failure mode when correction becomes self-referential, where each adjustment compounds rather than reduces the deviation.

In the 1990s, psychologist Adrian Wells built a clinical model around exactly this kind of self-referential loop.3 He called it the metacognitive model, literally thinking about thinking. His key insight was that psychological distress isn’t caused by negative thoughts themselves. It’s caused by what you do with them.

Not all thinking about thinking is harmful. Reflection after a real mistake, planning before a genuine challenge, reassessing when circumstances change: these are adaptive metacognitive strategies. Wells draws careful distinctions between adaptive and maladaptive metacognition.4 The problem is specifically the perseverative pattern, when the thinking about thinking stops serving any function and becomes an end in itself, when each layer of analysis references only the previous layer rather than the original situation.

Wells found this maladaptive pattern across depression, anxiety, and obsessive-compulsive disorder.4,5 In OCD, the mechanism is especially vivid. A person has an intrusive thought (normal, most people have them), interprets the thought as dangerous (metacognitive belief), performs a mental or physical ritual to neutralize the danger (correction), and finds that the ritual generates more intrusive thoughts (more scatter), which demand more rituals (more correction). The thought has become the target. The person is no longer responding to reality. They’re responding to their own responses.

A meta-analysis encompassing 41 studies and over 10,000 participants found that metacognitive beliefs, specifically the belief that rumination is useful and the belief that it’s uncontrollable, are significantly associated with both rumination and depression.6,7 Not the content of the original thought. The content is noise. The reaction to the noise is where suffering takes hold.


Three very different traditions arrive at a similar functional prescription. Not the same ontology or the same methods, but the same practical conclusion about what to do when thought becomes self-referential.

In Buddhist practice, the term is upekkhā, usually translated as equanimity, but more precisely meaning “to look over.” It describes the ability to see without being caught by what you see. Thoughts arise, sensations arise, emotions arise, and the practice is to observe them without engaging, without correcting, without chasing. The funnel stays aimed at the target.

In Western philosophy, David Hume drew a line between two kinds of knowledge.8 Relations of ideas are things true by definition, like mathematics. Matters of fact are things true only because the world happens to be a certain way. A thought about tomorrow’s meeting is a matter of fact: testable, possibly wrong, grounded in anticipated experience. A thought about that thought (“why am I so anxious?”) is still an experience, still real as an impression. But it is no longer about the meeting. Its object has shifted from the external world to the mind’s own prior output. Hume wouldn’t say such thoughts are groundless. He’d say they are a different kind of data, and the error lies in treating them as equivalent to observations about the world. The epistemological sleight of hand at the heart of rumination is precisely this: we treat our reactions as if they were observations of the thing we’re reacting to.

In clinical practice, the interventions that work are the ones that interrupt the self-referential loop. Detached mindfulness, developed by Wells, teaches patients to observe thoughts without engaging: Rule 1 for the mind.9,10 Cognitive-behavioral therapy provides the diagnostic: is this thought a signal that something needs to change, or is it noise that will pass on its own? This maps directly onto Deming’s caveat. Sometimes the system really is misaimed, and you should adjust. The skill is in telling the difference. Exposure and response prevention tests Rule 1 empirically: sit with the discomfort, don’t perform the ritual, and observe that the catastrophe doesn’t arrive.

Three starting points. One convergence at the functional level: when thought becomes self-referential, the most effective intervention is often to return attention to direct experience rather than to keep correcting.

These traditions disagree on much else. They have deeply different accounts of what the mind is, what constitutes knowledge, and what liberation looks like. The convergence is narrow and specific: all three recognize the self-referential loop as a failure mode, and all three prescribe some form of non-engagement as the remedy.


Which brings us to machines.

A large language model, in a sense, has only outputs. It has no direct access to the world. It generates text token by token, and each token is influenced by all the tokens that came before it. Within a single conversation, this works well. The model holds the full context of what’s been said and can reason about it.

But conversations have limits. Context windows are finite, a constraint loosely analogous to working memory, which in humans holds roughly three to five chunks at once.11 When a model reaches that limit, or when it passes work to another model, something has to be compressed. Summaries replace full conversations. Distillations replace detailed reasoning. Prior outputs carry forward in place of original inputs.

Now a familiar vulnerability appears. If the model begins referencing its own prior output more than the original question, if the summary of the conversation replaces the conversation, if the last response becomes the reference for the next response, the system can drift. Each generation step is internally coherent. Each token follows logically from the one before. But the chain is no longer grounded in the original input. It’s grounded in itself.

Hallucination in language models has many causes: training data artifacts, attention mechanism failures, probability distributions over tokens. Not all of it looks like Deming’s Rule 4. But the specific phenomenon of output drifting from the original prompt over long contexts or multi-agent handoffs does share the correction-policy structure. Each step references the last step more than the original input.

Anthropic’s research on chain-of-thought faithfulness reveals a related but distinct problem.12 When reasoning models externalize their thinking, the visible reasoning doesn’t always reflect the actual computational process behind the answer. The model may arrive at a conclusion through one mechanism and then generate a plausible-sounding justification through another, a form of confabulation. Unfaithful chains of thought were, on average, substantially longer than faithful ones: more tokens, more elaboration, less correspondence to what actually drove the output. This is not self-referential drift in the Deming sense. It is closer to rationalization. But it shares a troubling feature: visible reasoning that has become decoupled from its ground truth.


Here is the question this leaves open.

If self-referential drift under capacity constraints is a recurring vulnerability across different systems, and the evidence suggests it is, even if the mechanisms differ, can the intervention principle transfer?

A contemplative practitioner learns to notice when thought has become self-referential and to return attention to direct experience. Could a language model be given an analogous instruction? A periodic check: is the current reasoning step referencing the original question, or is it referencing its own prior output? If the latter, return to the source.

There is an obvious irony here. A metacognitive check on reasoning is itself a form of metacognition. Implementing it carelessly could create exactly the recursive processing problem it’s meant to solve, another layer of self-referential reasoning on top of the existing stack. Any such intervention would need to be designed with this circularity in mind, kept minimal, and tested empirically rather than assumed to work.
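As a toy illustration only, not a claim about how any production system works: one minimal form of such a check would compare each reasoning step’s vocabulary overlap with the original question against its overlap with the immediately preceding step. Everything here (the function names, the bag-of-words proxy, the 1.0 threshold) is an invented sketch, not an established technique:

```python
def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(text.lower().split())

def grounding_ratio(step: str, source: str, previous: str) -> float:
    """How much does this step reference the original question (source)
    versus the model's own prior output (previous)?

    A ratio above 1 suggests the step is still anchored to the source;
    below 1 suggests it mostly references itself.
    """
    step_words = _tokens(step)
    to_source = len(step_words & _tokens(source))
    to_previous = len(step_words & _tokens(previous))
    return to_source / max(to_previous, 1)

def needs_return_to_source(steps: list[str], source: str,
                           threshold: float = 1.0) -> bool:
    """Flag a chain whose latest step references the previous step more
    than the original question -- the cue to re-inject the source."""
    if len(steps) < 2:
        return False
    return grounding_ratio(steps[-1], source, steps[-2]) < threshold

question = "what is the capital of france"
grounded = ["france is a country in europe",
            "the capital of france is paris"]
drifting = ["hmm i might be wrong about this",
            "why do i keep being wrong about this"]
print(needs_return_to_source(grounded, question))  # stays anchored
print(needs_return_to_source(drifting, question))  # loop detected
```

A real implementation would need embedding-based similarity rather than word overlap, and, per the caution above, the check itself would have to stay cheap and non-recursive to avoid becoming another layer of the stack.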

The metric wouldn’t be just accuracy. It would be something like efficiency of insight, the minimum number of tokens needed to reach a genuinely useful response. Less noise-chasing. More signal.

This is a hypothesis, not a conclusion. The brain is not a transformer. A transformer is not a brain. The mechanisms differ. But both are systems that reason under capacity constraints, and both exhibit a recurring pattern: when correction becomes self-referential, outcomes degrade. Whether the same principle, return to the source and let the noise pass through, can be formally implemented in AI reasoning and empirically shown to improve performance is a question worth testing.

The oldest contemplative traditions knew this about minds. Deming proved it about processes. The question is whether the same insight, that the smartest correction is often no correction at all, has something to teach us about the machines we are now asking to think alongside us.


References

1. Deming, W.E. (1986). Out of the Crisis, pp. 327-332. MIT Center for Advanced Engineering Study.
2. Shewhart, W.A. (1931). Economic Control of Quality of Manufactured Product. Van Nostrand.
3. Wells, A. & Matthews, G. (1994). Attention and Emotion: A Clinical Perspective. Erlbaum.
4. Wells, A. (2009). Metacognitive Therapy for Anxiety and Depression. Guilford Press.
5. Ehring, T. (2021). Thinking Too Much: Rumination and Psychopathology. World Psychiatry.
6. Papageorgiou, C. & Wells, A. (2001). Metacognitive Beliefs About Rumination in Recurrent Major Depression. Cognitive and Behavioral Practice, 8(2), 160-164.
7. Cano-López, J.B., García-Sancho, E., Fernández-Castilla, B., & Salguero, J.M. (2022). Empirical Evidence of the Metacognitive Model of Rumination and Depression: A Systematic Review and Meta-Analysis. Cognitive Therapy and Research, 46, 367-392.
8. Hume, D. (1748). An Enquiry Concerning Human Understanding.
9. Wells, A. (2005). Detached Mindfulness in Cognitive Therapy: A Metacognitive Analysis and Ten Techniques. Journal of Rational-Emotive & Cognitive-Behavior Therapy, 23, 337-355.
10. Glombiewski, J.A. et al. (2019). A Randomized Waitlist-Controlled Trial Comparing Detached Mindfulness and Cognitive Restructuring in OCD. PLOS ONE.
11. Cowan, N. (2001). The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity. Behavioral and Brain Sciences, 24(1), 87-114.
12. Chen, Y. et al. (2025). Reasoning Models Don’t Always Say What They Think. Anthropic. arXiv:2505.05410.