Reflexivity and the Dark Room

Part II · The Recurring Pattern — Chapter 8

This is the fragile assumption (A5) made into a chapter, because it is the design's central theoretical risk — and the place where the math delivers its most reassuring surprise.

The problem: a market that believes its own lies

The truth machine pays forecasters for predicting resolver consensus. But what if resolvers, when deciding their verdicts, look at the published forecast? Then the forecast is partly predicting itself. Push that far enough and the system can lock onto a self-fulfilling fiction: a belief everyone holds because everyone holds it, decoupled from reality.

The Free Energy Principle has a name for this failure: the dark room. An agent can trivially minimize "surprise" by sitting in a dark, predictable room doing nothing — perfectly predicting a trivial world. For us the dark room appears at three scales as one failure:

the truth machine's stable self-fulfilling fiction = governance's captured commons = the social echo chamber.

The reassuring result

Model a resolver's verdict as a mix: a fraction $λ$ driven by deference to the published forecast $p$ , and the rest by their own private signal. Forecasters then publish the fixed point $p^{*} = λ g (p^{*}) + (1 - λ) π$ , where $π$ is the honest posterior and $g$ is how resolvers respond to the forecast.

If deference is linear ( $g (p) = p$ ) and $λ < 1$ , the fixed point is exactly $p^{*} = π$ . The influence passes straight through and cancels; the forecast remains the honest posterior. Reflexivity does not bias the forecast at all in the linear regime.

So mere influence is harmless. The beauty-contest pathology requires non-linearity — specifically, a conformity response strong enough that the deference slope

$L = λ \cdot sup ∣ g^{'} ∣$

exceeds 1. Below $L = 1$ the honest belief is the unique outcome; above it, the system bifurcates and fiction equilibria appear. The whole reflexivity question collapses to one measurable scalar and one safety condition: $L < 1$ .

Three levers that control $L$

Blind resolution (information design). Don't show the current consensus to a resolver while they record a verdict. You can't make $λ = 0$ (knowledge leaks), but you remove the highest-bandwidth channel from forecast to verdict. A UI choice with a theorem behind it.
Reality coupling (incentive design). Pay resolvers a small bonus on their own verdicts scored against later-arriving ground truth. This rewards using the private signal, directly lowering $λ$ . At the largest scale, the reality-coupling mechanism is exit — a self-deluding network loses members to one that isn't deluded (The Living System).
Epistemic value (reward routing). Steer reward budget toward questions where resolving uncertainty matters, so optimization power flows to live questions instead of comfortable ones — without touching the scoring rule's properness.

⚠️ Honest caveat. $L < 1$ is measurable and controllable, but the shape of the human deference curve $g$ in this setting is unknown until a real pilot measures it — and the coupled, whole-system stability proof is still open.

📐 Formal version: The Mathematical Core §2 — the convergence propositions, the bifurcation, continuous degradation below threshold, and perturbation audits that estimate $L$ directly. The closely related causal version of this trap is in Futarchy and Causality.

Dither Ideas

Reflexivity and the Dark Room

The problem: a market that believes its own lies

The reassuring result

Three levers that control L

Three levers that control $L$