The Brain’s Learning Algorithm Isn’t Backpropagation
Based on Artem Kirsanov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Backpropagation’s need for phase-separated computation and precise global coordination conflicts with biological constraints like continuous processing and local autonomy.
Briefing
Backpropagation’s core mechanics clash with how brains can plausibly operate—especially because it needs tightly coordinated, phase-separated computation. Predictive coding offers a different learning framework: neurons continuously generate top-down predictions, compare them to incoming activity, and use the resulting local prediction errors to update both neural activity and synaptic strengths. The payoff is a model that fits biological constraints like local autonomy and continuous (not paused) processing, while also matching known forms of synaptic plasticity, such as Hebbian-like learning rules.
The central technical obstacle in any neural learning system is credit assignment: when many adjustable parameters (synaptic weights) influence an output, how does the system determine which parameters to change and by how much? Standard artificial neural networks solve this with automatic differentiation and the chain rule, yielding backpropagation. But backpropagation relies on two features that are biologically hard to reconcile. First, it effectively runs in separated forward and backward phases: neurons would need to “freeze” feedforward activity for hundreds of milliseconds so error signals can propagate backward without disrupting ongoing computation. Second, it requires global coordination—an orchestrated backward pass in a precise temporal order—so that each neuron’s error computation waits for downstream errors to be ready. Biological tissue instead runs as a massively parallel, locally autonomous system, with coordination mechanisms (oscillations, neuromodulators like dopamine, attention) operating at coarser scales than the cell-by-cell timing backprop demands.
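To make the phase separation concrete, here is a minimal sketch (in Python/NumPy; the network, sizes, and names are invented for illustration, not taken from the video) of a two-layer network trained with backprop. Note how the forward pass must store intermediate activations unchanged, and how the backward pass consumes them in strict reverse order:

```python
import numpy as np

# Illustrative 2-layer tanh network with squared loss, showing the two
# phases the text describes. All sizes and values are arbitrary.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(2, 4))
x, y = rng.normal(size=3), rng.normal(size=2)

# Phase 1: forward pass -- intermediate activations are stored ("frozen")
# because the backward pass will need them unchanged.
h = np.tanh(W1 @ x)
y_hat = W2 @ h
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Phase 2: backward pass -- errors propagate in strict reverse order; each
# layer's gradient waits on the downstream error and the frozen activations.
delta2 = y_hat - y                          # output-layer error
grad_W2 = np.outer(delta2, h)
delta1 = (W2.T @ delta2) * (1.0 - h ** 2)   # chain rule through tanh
grad_W1 = np.outer(delta1, x)

# One small gradient step reduces the loss.
lr = 0.01
W1 -= lr * grad_W1
W2 -= lr * grad_W2
```

The biological objection in the text maps directly onto this code: the "frozen" `h` between phases and the strict `delta2`-before-`delta1` ordering are exactly the phase separation and global coordination that neural tissue lacks.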
Predictive coding reframes the brain’s job as prediction rather than mere stimulus processing. Higher layers form an internal model that predicts the activity of lower layers; discrepancies become prediction errors. In a hierarchical arrangement, top-down connections carry predictions downward, while bottom-up connections carry errors upward. When predictions are accurate, little extra work is needed; when they fail, error signals drive the system to revise its internal state.
To make this concrete, the framework is built as an energy-based model. Each network state is assigned an “energy” equal to the total squared prediction errors across layers. As the system evolves, it “rolls downhill” on this energy landscape: neuron activities adjust to reduce their own mismatch with predictions and to improve the predictions they help generate for the layer below. Crucially, the derived dynamics imply that each representational neuron should be inhibited by its local error signal and excited by error signals from the layer below—suggesting a direct circuit-level implementation with dedicated error units.
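The energy and the resulting activity dynamics can be written directly. Below is a minimal NumPy sketch under the simplifying assumption of linear predictions (layer sizes and variable names are invented): each unclamped layer descends the energy gradient, being pushed down by its own prediction error and pulled by the error it helps explain in the layer below.

```python
import numpy as np

# Illustrative predictive-coding relaxation (linear predictions assumed).
# Layer 0 is sensory and stays clamped; W[l] generates the top-down
# prediction of layer l from layer l+1. Energy = 0.5 * sum of squared errors.
rng = np.random.default_rng(1)
sizes = [8, 6, 4]                                   # bottom -> top
W = [rng.normal(scale=0.3, size=(sizes[l], sizes[l + 1]))
     for l in range(len(sizes) - 1)]

x = [rng.normal(size=n) for n in sizes]             # initial activities
# (x[0] remains clamped to the "sensory" input throughout)

def errors(x, W):
    # eps[l] = actual activity minus top-down prediction
    return [x[l] - W[l] @ x[l + 1] for l in range(len(W))]

def energy(x, W):
    return 0.5 * sum(np.sum(e ** 2) for e in errors(x, W))

E_start = energy(x, W)
eta = 0.1
for _ in range(50):                                 # gradient descent on E
    eps = errors(x, W)
    for l in range(1, len(x)):
        local = eps[l] if l < len(eps) else 0.0     # top layer has no local error
        # inhibited by the local error, excited by the error from below
        x[l] = x[l] - eta * (local - W[l - 1].T @ eps[l - 1])
E_end = energy(x, W)
```

Every signal in the update for `x[l]` is locally available to that layer: its own error `eps[l]` and the error `eps[l-1]` arriving from the layer it predicts, which is the biological selling point of the scheme.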
Learning then updates synaptic weights using local information. Weight changes scale with the product of pre- and post-synaptic activities and the relevant prediction error, resembling Hebbian plasticity (“fire together, wire together”) while remaining tied to error reduction. A known complication is the weight transport problem: backprop-like rules would require symmetric, effectively shared weights for forward and backward pathways. Predictive coding mitigates this by allowing feedback and feedforward synapses to learn independently yet converge toward similar values through comparable update structure; perfect symmetry may not be necessary, especially once nonlinearities are included.
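The local weight update can be sketched in a few lines (an illustrative example with invented sizes, assuming a linear prediction): the change to a top-down weight matrix is the outer product of the local prediction error and the presynaptic activity of the layer above, so a single update shrinks the error on the same input.

```python
import numpy as np

# Local, Hebbian-like weight update (illustrative sketch).
rng = np.random.default_rng(2)
x_above = rng.normal(size=4)              # presynaptic activity, higher layer
x_below = rng.normal(size=6)              # actual activity, lower layer
W = rng.normal(scale=0.3, size=(6, 4))    # top-down prediction weights

eps = x_below - W @ x_above               # local prediction error
lr = 0.05
# weight change = product of error and presynaptic activity (all local signals)
W = W + lr * np.outer(eps, x_above)

new_eps = x_below - W @ x_above           # the same input is now predicted better
```

Nothing in this rule requires copying weights from another pathway; a separate feedforward matrix trained with the structurally analogous rule can drift toward similar values on its own, which is how the weight transport problem is softened.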
Finally, the model avoids trivial solutions by clamping sensory-driven bottom layers (and, for supervised tasks, clamping the top layer to the target label). Training proceeds via iterative relaxation toward an equilibrium that minimizes energy for each example. After learning, unclamping and running to equilibrium enables generative behavior, while clamping supports classification. The bottom line: predictive coding replaces global, phase-separated error backprop with continuous, local error-driven computation—an approach that could both explain biological learning and improve efficiency or robustness in artificial networks, potentially reducing issues like catastrophic forgetting.
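The full loop described above can be sketched end to end (an illustrative NumPy toy with invented sizes, linear predictions, and a single training example): clamp the sensory and label layers, relax the hidden activities toward an energy minimum, apply the local weight updates, and at test time clamp only the sensory layer and read out the settled top layer.

```python
import numpy as np

# Toy supervised predictive-coding loop (illustrative; linear predictions).
rng = np.random.default_rng(3)
sizes = [5, 8, 3]                         # sensory -> hidden -> label layer
W = [rng.normal(scale=0.2, size=(sizes[l], sizes[l + 1]))
     for l in range(len(sizes) - 1)]

def errors(x):
    return [x[l] - W[l] @ x[l + 1] for l in range(len(W))]

def energy(x):
    return 0.5 * sum(np.sum(e ** 2) for e in errors(x))

def relax(x, clamp_top, steps=30, eta=0.1):
    # iterative relaxation: unclamped layers descend the energy gradient
    for _ in range(steps):
        eps = errors(x)
        for l in range(1, len(x)):
            if clamp_top and l == len(x) - 1:
                continue                  # label layer held fixed during training
            local = eps[l] if l < len(eps) else 0.0
            x[l] = x[l] - eta * (local - W[l - 1].T @ eps[l - 1])
    return x

s = rng.normal(size=sizes[0])             # one "sensory" pattern
label = np.array([1.0, 0.0, 0.0])         # its one-hot "label"

E_hist = []
for _ in range(50):                       # training: clamp both ends, then learn
    x = [s, np.zeros(sizes[1]), label.copy()]
    x = relax(x, clamp_top=True)
    eps = errors(x)
    for l in range(len(W)):               # local Hebbian-like weight updates
        W[l] = W[l] + 0.05 * np.outer(eps[l], x[l + 1])
    E_hist.append(energy(x))

# Inference: clamp only the sensory layer and let the top layer settle.
x = [s, np.zeros(sizes[1]), np.zeros(sizes[2])]
x = relax(x, clamp_top=False)
prediction = x[-1]
```

Note how clamping does the work the text describes: with both ends fixed, relaxation cannot collapse to the trivial all-zero state, and the equilibrium errors are exactly what the weight updates consume; unclamping the top afterward turns the same machinery into readout.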
Cornell Notes
Predictive coding replaces backpropagation with a biologically friendlier learning mechanism built around local prediction errors. Neurons in a hierarchy generate top-down predictions of lower-layer activity and send bottom-up error signals when predictions fail. The system is formulated as an energy minimization problem where total energy equals the sum of squared prediction errors across layers, and neuron activities update by descending the energy gradient. Synaptic weights then change using local rules tied to prediction errors, resembling Hebbian plasticity while avoiding the need for a separate backward pass. Clamping sensory (and sometimes label) layers prevents trivial zero-energy solutions and lets the network settle into an equilibrium that supports inference, classification, and generation.
Why does backpropagation struggle to map onto biological neural tissue?
What is “credit assignment,” and how does predictive coding address it differently than backprop?
How does predictive coding formalize learning as an energy minimization process?
What circuit-level neuron roles does predictive coding require?
How are synaptic weights updated, and what problem does predictive coding face with forward/backward pathways?
How does the model perform inference and learning without trivial solutions?
Review Questions
- What two biological constraints make backpropagation’s standard forward/backward phase separation difficult to reconcile with neurophysiology?
- In predictive coding, how do representational neurons and error neurons interact, and what signals drive each population?
- How do clamping choices (sensory-only vs sensory plus label) change what equilibrium corresponds to in inference, classification, and generation?
Key Points
1. Backpropagation’s need for phase-separated computation and precise global coordination conflicts with biological constraints like continuous processing and local autonomy.
2. Predictive coding treats learning as minimizing prediction error in a hierarchical model where higher layers predict lower-layer activity.
3. Formulating predictive coding as an energy-based model makes neuron dynamics correspond to gradient descent on total squared errors across layers.
4. Local circuit implementation requires explicit error units that compare actual activity to top-down predictions and feed back to modulate representational neurons.
5. Synaptic learning rules update weights using local products of activities and prediction errors, resembling Hebbian plasticity while remaining tied to error reduction.
6. The weight transport problem is mitigated because feedback and feedforward pathways can learn independently yet converge toward similar values through structurally related update rules.
7. Clamping sensory (and optionally label) layers prevents trivial solutions and enables equilibrium-based inference for classification and generation.