Feedback Loops in Opinion Modeling | Danielle Ensign

TL;DR

Self-feeding feedback loops can destabilize opinion modeling by driving models toward either low-diversity collapse or high-entropy randomness.

Briefing Cornell Notes

Briefing

Feedback loops in opinion modeling can drive language models toward “collapse” or runaway randomness when their generated outputs get fed back into future training or sampling. The core finding is that iterative self-feeding—especially under temperature-based sampling—reduces diversity over time, either locking the system onto a narrow set of outputs (mode collapse) or pushing it toward near-maximum entropy behavior. That matters for AI safety and fairness because it can reshape preferences and the “opinion ecosystem” without any explicit intent, potentially amplifying technical bias (and possibly real-world bias) as models become increasingly tuned to their own quirks.

The work starts by framing why opinion modeling needs dynamics, not just static snapshots. Traditional language modeling learns from large datasets to represent a moment in time, but it misses how preferences evolve. Prior approaches include simulation-style studies (e.g., social or code ecosystems), agent-based/system-theory models that struggle with choosing the right granularity, and network-based analyses that track how data moves through opinion structures. Against that backdrop, the study targets a specific mechanism: models that output data which then becomes training or sampling input for later models.

A formal setup repeats a simple loop: train a model on some data, generate new data, use that generated data to train another model (or fine-tune), then repeat. To build intuition, the presenter analyzes a discrete “coin flip” toy model. With repeated sampling and retraining, the system can drift until it becomes stuck on one outcome—an absorbing state—because the model may repeatedly output the same token, making that token increasingly likely. Two key theoretical insights emerge: more data tends to reduce the effective step size (slowing change), and stochastic dynamics can still end in extreme collapse.

Temperature—the sampling knob that reshapes probabilities—turns out to be central. Temperature effectively pushes probabilities above 0.5 upward and below 0.5 downward, which can perpetuate existing biases and accelerate collapse. In the toy analysis, collapse corresponds to the system converging to a single path with no cycles; cycles represent competing trajectories that eventually get eliminated.

When the same question is tested on n-gram models, the expected collapse pattern appears: after many iterations, the process converges to a narrow sequence with no cycling. Transformers introduce complications because collapse is harder to measure directly, so the study uses entropy as a proxy. By generating many sentences and estimating entropy from word-by-word probabilities, the analysis finds two regimes. In one, entropy shoots toward high randomness and the model behaves like a near-uniform generator. In the other, entropy rises initially and then falls into collapse, accompanied by repetitive loops in generated text (e.g., short back-and-forth repetitions or repeated phrases). The system becomes “used to” outputs that are weirder than its original inputs, and once a loop appears, it becomes self-reinforcing.

Temperature thresholds align with the observed behavior: below roughly 1.0, collapse happens quickly; above 1.0, the system tends toward maximum entropy. The presenter also discusses mitigation and next steps, including “grounding” feedback—mixing real data with model-generated data so the random walk is biased toward a stable distribution. Questions raised in the Q&A emphasize open issues: whether entropy always increases then decreases, how grounding would behave in real language models, and how much mixing is needed when only a few feedback iterations occur.

Cornell Notes

Iterative opinion modeling that feeds a model’s own generated outputs back into later training or sampling can produce two unstable outcomes: diversity collapse or runaway randomness. In discrete toy settings, repeated self-feeding can drive the system toward absorbing states where one token dominates, and temperature accelerates this by sharpening probability mass. N-gram experiments show collapse into a single path, while transformer experiments use entropy estimates to reveal two regimes: entropy can either rise toward maximum randomness or rise then fall into collapse with repetitive cycles. Temperature below about 1.0 leads to faster collapse; above about 1.0 tends toward high-entropy behavior. The work suggests that grounding—mixing real data into the feedback loop—could stabilize the process, though it remains unvalidated for full language models.

What is the mechanism behind “collapse” in self-feeding opinion modeling?

The loop trains on data, generates new outputs, and then uses those outputs as future training/sampling input. In discrete settings, once a token starts being produced repeatedly, it becomes increasingly likely to be produced again, creating an absorbing state (e.g., all heads or all tails in the coin example). In graph terms, collapse corresponds to converging to a single path; cycles represent competing trajectories that eventually get eliminated because one direction becomes dominant.

How does temperature change the dynamics of feedback loops?

Temperature reshapes sampling probabilities: values above 0.5 get pushed upward and values below 0.5 get pushed downward. That sharpening can perpetuate technical bias and accelerate collapse by making the model more likely to keep sampling the same high-probability outputs. Empirically, transformer experiments show a threshold-like behavior: temperature below about 1.0 collapses quickly, while temperature above 1.0 tends toward maximum entropy (high randomness).

Why do transformer experiments rely on entropy rather than a direct “collapse” metric?

Unlike n-gram models, transformers make it harder to measure collapse as a simple structural convergence. The study estimates entropy by generating many sentences, computing each sentence’s probability as the product of word probabilities, and averaging to approximate the model’s entropy. Tracking entropy over feedback iterations reveals whether the system is moving toward high randomness or toward a low-entropy repetitive regime.

What two regimes appear when transformer outputs are fed back into training/sampling?

One regime shows entropy shooting up toward a near-uniform, high-randomness generator. The other shows entropy rising initially and then falling into collapse, with repetitive cycles in generated text (examples include short repeated patterns like “hi hello hi hello” and repeated phrases such as repeatedly saying “Twitter” or “ally,” and in other runs “enemy” over and over). Once a loop appears, it becomes more likely to recur because the model trains on its own outputs.

How might “grounding” mitigate feedback-loop instability?

Grounding mixes real, ground-truth data into the feedback process rather than replacing training data entirely with model samples. In small engram experiments, biasing the random walk toward the real distribution helps keep the system from drifting into collapse or extreme entropy. The presenter notes this is promising in principle for language models, but it has not yet been validated there.

What does the Q&A suggest about practical relevance and iteration counts?

The discussion highlights that real deployments likely involve only a few feedback iterations, not thousands. In that context, even a small percentage of grounding data could matter substantially because it can steer the system before it fully converges to collapse or maximum-entropy behavior.

Review Questions

In the coin-flip toy model, what causes the system to reach an absorbing state, and how does that relate to token repetition in language generation?
How does temperature influence whether feedback loops lead to entropy growth versus entropy decline in transformer experiments?
What role does grounding (mixing real data with model-generated data) play in stabilizing the feedback process, and why might it be especially important when only a few iterations occur?

Key Points

1
Self-feeding feedback loops can destabilize opinion modeling by driving models toward either low-diversity collapse or high-entropy randomness.
2
Temperature acts as a probability-sharpening mechanism that can accelerate collapse by making dominant outputs more self-reinforcing.
3
Discrete toy dynamics show how repeated sampling can create absorbing states where one token/outcome dominates.
4
N-gram models exhibit collapse into a single path with no cycles, matching theoretical expectations about eliminating competing trajectories.
5
Transformer experiments reveal two entropy regimes when generated outputs are fed back: high-entropy runaway or entropy rise followed by collapse with repetitive cycles.
6
Grounding—mixing real data into the feedback loop—can bias the system toward stable distributions and may prevent collapse, though results for full language models remain open.
7
Practical deployments likely run only a few feedback steps, so even small amounts of grounding data could have outsized effects.

Highlights

Temperature below about 1.0 leads to rapid collapse in transformer feedback loops; temperature above 1.0 pushes behavior toward maximum entropy.

Collapse in generated text shows up as self-reinforcing cycles—once a repetitive loop appears, it becomes more likely to recur after retraining/sampling.

In discrete settings, repeated self-feeding can produce absorbing states (e.g., all heads or all tails) because the model keeps outputting the same token.

Grounding real data into the loop is proposed as a stabilizer, with early evidence from small engram models and open validation for transformers.

Topics

Opinion Modeling
Feedback Loops
Temperature Sampling
Entropy Collapse
Grounding Data

Mentioned

Danielle Ensign
Jeff

Feedback Loops in Opinion Modeling | Danielle Ensign | OpenAI Scholars Demo Day 2021