A Universal Theory of Brain Function
Based on Artem Kirsanov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A universal theory of brain function frames perception as an ongoing act of hypothesis testing: the brain predicts what caused incoming sensory signals, then updates those predictions to minimize a quantity called variational free energy. That approach explains why perception can feel “real” even when it conflicts with the physical stimulus—such as the classic mask illusion where a concave face is perceived as convex. The key point is that the nervous system is not a passive receiver of data; it is an active model-builder optimized by evolution to reduce uncertainty in a noisy world.
The theory starts with the survival problem brains evolved to solve. Simple organisms can react to stimuli with basic biochemistry, but complex environments deliver ambiguous, partial, and noisy information. When a retinal pattern only partially matches a learned threat like a tiger, a purely pattern-matching system may fail—potentially getting an organism killed. Brains instead infer hidden causes: they combine sensory evidence with prior knowledge to propose plausible explanations, such as a full tiger occluded by a tree.
This inference is described through a “judge on a scale” metaphor. On one side sits raw sensory input from eyes, ears, and other modalities; on the other side sits prior beliefs built through evolution and experience. The brain seeks a balance that reduces “tension” in the system, formalized as variational free energy. Explanations that fit both the data and expectations lower free energy; explanations that contradict strong priors raise it. In the tiger example, seeing a “half-tiger” pattern creates a dilemma: treating it as a novel, asymmetric creature clashes with the expectation that tigers are whole and symmetric, whereas interpreting it as a whole tiger partially hidden fits both evidence and world structure.
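The "scale" can be made concrete with a toy calculation. In the sketch below (all probabilities invented for illustration), each candidate explanation is scored by its negative log joint probability, which is what variational free energy reduces to for a single point hypothesis: an explanation that fits both the evidence and the priors gets the lowest score.

```python
import math

# Observation o: a "half-tiger" retinal pattern.
# Hypothesis A: a novel half-tiger creature is standing there.
# Hypothesis B: a whole tiger, partially occluded by a tree.
# All numbers are made up for illustration.
hypotheses = {
    "novel half-tiger": {"prior": 1e-6, "likelihood": 0.90},
    "occluded whole tiger": {"prior": 1e-2, "likelihood": 0.80},
}

def free_energy(prior: float, likelihood: float) -> float:
    """For a point hypothesis h, free energy reduces to the
    negative log joint: F = -log p(o | h) - log p(h)."""
    return -math.log(likelihood) - math.log(prior)

for name, p in hypotheses.items():
    print(f"{name}: F = {free_energy(**p):.2f}")

best = min(hypotheses, key=lambda h: free_energy(**hypotheses[h]))
print("lowest free energy:", best)
```

The half-tiger hypothesis matches the retinal data slightly better, but its vanishingly small prior dominates the score, so the occlusion explanation wins.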
To make this computationally feasible, the brain uses latent (hidden) representations. Neurons directly tied to sensory inputs are paired with other neurons that encode higher-level causes—like “object occlusion” or “tiger”—without direct ground truth. A generative model maps latent causes to predicted sensory patterns, effectively acting like a decoder that can “render” what the world would look like under a given cause. Priors encode how likely different causes are in different contexts (a tiger in a city park is far less likely than a striped shirt; on a safari, the balance shifts).
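A minimal sketch of the decoder-plus-priors idea (feature vectors, noise scale, and prior values are all invented): a toy generative model "renders" the sensory pattern each latent cause predicts, and context-dependent priors decide which cause wins for the very same observation.

```python
import numpy as np

# A toy "decoder": each latent cause renders a predicted pattern over
# three sensory features (stripes, motion, fur texture).
render = {
    "tiger":         np.array([0.9, 0.8, 0.9]),
    "striped shirt": np.array([0.9, 0.3, 0.1]),
}

# Context-dependent priors over causes.
priors = {
    "city park": {"tiger": 0.001, "striped shirt": 0.999},
    "safari":    {"tiger": 0.6,   "striped shirt": 0.4},
}

observation = np.array([0.9, 0.7, 0.8])  # striped, moving, furry

def likelihood(cause):
    # Gaussian-style match between the rendered prediction and the data.
    err = render[cause] - observation
    return float(np.exp(-np.sum(err ** 2) / (2 * 0.05)))

posteriors = {}
for context, prior in priors.items():
    joint = {c: likelihood(c) * prior[c] for c in render}
    z = sum(joint.values())
    posteriors[context] = {c: p / z for c, p in joint.items()}
    print(context, posteriors[context])
```

With identical sensory input, the posterior favors "striped shirt" in the city park and "tiger" on safari, which is exactly the context-dependent shift described above.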
Inference is handled by a recognition model that runs in the opposite direction, mapping sensory observations to a distribution over latent causes. Exact inversion of the generative model would be too expensive, so the recognition model provides a fast approximation, which perception then refines through iterative back-and-forth between recognition and generation until predictions match the sensory input closely enough to minimize free energy.
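One common way to cash out this loop is gradient descent on free energy. The sketch below is an assumption-laden toy (linear generative model, invented sizes and weights), not the video's exact algorithm: a cheap recognition step (here a pseudoinverse) proposes an initial latent estimate, and iteration refines it against both the prediction error and the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear generative model: a latent cause z "renders" the
# observation as W @ z (shapes and values invented for illustration).
W = rng.normal(size=(4, 2))
z_true = np.array([1.0, -0.5])
o = W @ z_true + 0.01 * rng.normal(size=4)  # noisy observation

mu_prior = np.zeros(2)  # prior expectation about the latent cause

def free_energy(z):
    """Prediction error plus a weakly weighted prior-mismatch term."""
    pred_err = o - W @ z
    prior_err = z - mu_prior
    return 0.5 * pred_err @ pred_err + 0.05 * prior_err @ prior_err

# Recognition step: a cheap one-shot guess at the latent cause.
z_init = np.linalg.pinv(W) @ o
z = z_init.copy()

# Iterative refinement: gradient descent on free energy.
lr = 0.05
for _ in range(200):
    grad = -W.T @ (o - W @ z) + 0.1 * (z - mu_prior)
    z = z - lr * grad

print("initial F:", free_energy(z_init), "refined F:", free_energy(z))
```

Because this free energy is a convex quadratic, the loop settles at the balance point between data fit and the prior; a nonlinear decoder would use the same loop with gradients from autodiff.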
The mask illusion follows directly from this optimization. The retinal image suggests a concave, inward-curving face, but a deeply learned prior favors convex faces. Minimizing free energy leads the brain to preserve the prior by attributing the discrepancy to unusual lighting rather than accepting a hollow face. Even conscious knowledge doesn’t easily break the illusion, because the underlying circuitry is evolutionarily conserved.
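The illusion can be reproduced numerically with the same point-hypothesis scoring. In this toy model (all numbers invented), the brain scores joint hypotheses over face shape and lighting; the very strong convex prior makes "convex face under unusual lighting" a lower-free-energy explanation than "concave face under normal lighting".

```python
import math

# Very strong learned prior for convex faces; lighting priors milder.
p_shape = {"convex": 0.999, "concave": 0.001}
p_light = {"normal": 0.9, "unusual": 0.1}

# p(observed shading | shape, lighting): the hollow-mask image is what
# a concave face looks like under normal light, but also roughly what a
# convex face looks like under unusual (e.g., bottom-up) lighting.
lik = {
    ("concave", "normal"): 0.9,
    ("convex", "unusual"): 0.8,
    ("convex", "normal"): 0.01,
    ("concave", "unusual"): 0.1,
}

def F(shape, light):
    # Free energy of a point hypothesis: negative log joint probability.
    return -(math.log(lik[(shape, light)])
             + math.log(p_shape[shape]) + math.log(p_light[light]))

best = min(lik, key=lambda h: F(*h))
print("percept:", best)
```

The winning percept is the convex face plus an odd-lighting explanation, mirroring how the visual system "explains away" the conflicting shading cues.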
Overall, the framework unifies perception and learning: rapid inference reduces uncertainty in the moment, while slower learning tunes the recognition and generative models so future predictions and priors become more accurate. The result is a brain that continually compresses sensory data into explanations that best reconcile what is seen with what is expected—an approach that also points toward how machine learning systems might build generative, predictive models of their own.
Cornell Notes
The free energy principle portrays the brain as a prediction machine that infers hidden causes of sensory input. It balances raw evidence against prior beliefs, selecting explanations that minimize variational free energy—so “best fit” is not just about matching the eyes and ears, but also about respecting what the brain expects to be likely. The brain uses a generative model to predict sensory consequences of latent causes and a recognition model to propose those causes from observations, often iterating until predictions align. Strong priors can dominate perception, producing illusions like a concave mask being seen as convex. Over longer timescales, learning adjusts connections so both models improve, tightening the match between predictions and the world.
Why does the brain treat perception as inference rather than direct reading of sensory data?
What does “variational free energy” measure in this framework?
How do latent neurons and generative models make prediction computationally manageable?
Why is a recognition model needed, and what role does iteration play?
How do priors explain context-dependent perception (city park vs. safari)?
Why doesn’t knowing the mask is concave reliably eliminate the illusion?
Review Questions
- How does the framework distinguish between minimizing sensory mismatch and minimizing variational free energy?
- What are the functional roles of the generative model and recognition model in the recognition–generation loop?
- In the mask illusion, which prior dominates, and what alternative explanation does the brain choose to reduce free energy?
Key Points
1. Perception is treated as active inference: the brain predicts causes of sensory input and tests those predictions against what it receives.
2. Variational free energy formalizes the trade-off between sensory evidence and prior beliefs, with lower values corresponding to better explanations.
3. Latent neurons represent hidden causes at multiple abstraction levels, enabling compressed representations of complex sensory data.
4. A generative model predicts sensory consequences of latent causes, while a recognition model approximates which causes likely produced the observation.
5. Because exact inference is computationally intractable, the system relies on approximations and iterative recognition–generation cycles to improve explanations.
6. Strong priors can override sensory details, producing robust illusions such as perceiving a concave mask as convex.
7. Learning tunes both recognition and generative models over time so future predictions and priors better match the environment.