The Brain’s Learning Algorithm Isn’t Backpropagation
Based on Artem Kirsanov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Backpropagation’s need for phase-separated computation and precise global coordination conflicts with biological constraints like continuous processing and local autonomy.
Briefing
Backpropagation’s core mechanics clash with how brains can plausibly operate—especially because it needs tightly coordinated, phase-separated computation. Predictive coding offers a different learning framework: neurons continuously generate top-down predictions, compare them to incoming activity, and use the resulting local prediction errors to update both neural activity and synaptic strengths. The payoff is a model that fits biological constraints like local autonomy and continuous (not paused) processing, while also matching known forms of synaptic plasticity, such as Hebbian-like learning rules.
The central technical obstacle in any neural learning system is credit assignment: when many adjustable parameters (synaptic weights) influence an output, how does the system determine which parameters to change and by how much? Standard artificial neural networks solve this with automatic differentiation and the chain rule, yielding backpropagation. But backpropagation relies on two features that are biologically hard to reconcile. First, it effectively runs in separated forward and backward phases: neurons would need to “freeze” feedforward activity for hundreds of milliseconds so error signals can propagate backward without disrupting ongoing computation. Second, it requires global coordination—an orchestrated backward pass in a precise temporal order—so that each neuron’s error computation waits for downstream errors to be ready. Biological tissue instead runs as a massively parallel, locally autonomous system, with coordination mechanisms (oscillations, neuromodulators like dopamine, attention) operating at coarser scales than the cell-by-cell timing backprop demands.
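To make the phase separation concrete, here is a minimal sketch (in Python/NumPy; the network, sizes, and names are invented for illustration, not taken from the video) of a two-layer network trained with backprop. Note how the forward pass must store intermediate activations unchanged, and how the backward pass consumes them in strict reverse order:

```python
import numpy as np

# Illustrative 2-layer tanh network with squared loss, showing the two
# phases the text describes. All sizes and values are arbitrary.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(2, 4))
x, y = rng.normal(size=3), rng.normal(size=2)

# Phase 1: forward pass -- intermediate activations are stored ("frozen")
# because the backward pass will need them unchanged.
h = np.tanh(W1 @ x)
y_hat = W2 @ h
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Phase 2: backward pass -- errors propagate in strict reverse order; each
# layer's gradient waits on the downstream error and the frozen activations.
delta2 = y_hat - y                          # output-layer error
grad_W2 = np.outer(delta2, h)
delta1 = (W2.T @ delta2) * (1.0 - h ** 2)   # chain rule through tanh
grad_W1 = np.outer(delta1, x)

# One small gradient step reduces the loss.
lr = 0.01
W1 -= lr * grad_W1
W2 -= lr * grad_W2
```

The biological objection in the text maps directly onto this code: the "frozen" `h` between phases and the strict `delta2`-before-`delta1` ordering are exactly the phase separation and global coordination that neural tissue lacks.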
Predictive coding reframes the brain’s job as prediction rather than mere stimulus processing. Higher layers form an internal model that predicts the activity of lower layers; discrepancies become prediction errors. In a hierarchical arrangement, top-down connections carry predictions downward, while bottom-up connections carry errors upward. When predictions are accurate, little extra work is needed; when they fail, error signals drive the system to revise its internal state.
To make this concrete, the framework is built as an energy-based model. Each network state is assigned an “energy” equal to the total squared prediction errors across layers. As the system evolves, it “rolls downhill” on this energy landscape: neuron activities adjust to reduce their own mismatch with predictions and to improve the predictions they help generate for the layer below. Crucially, the derived dynamics imply that each representational neuron should be inhibited by its local error signal and excited by error signals from the layer below—suggesting a direct circuit-level implementation with dedicated error units.
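The energy and the resulting activity dynamics can be written directly. Below is a minimal NumPy sketch under the simplifying assumption of linear predictions (layer sizes and variable names are invented): each unclamped layer descends the energy gradient, being pushed down by its own prediction error and pulled by the error it helps explain in the layer below.

```python
import numpy as np

# Illustrative predictive-coding relaxation (linear predictions assumed).
# Layer 0 is sensory and stays clamped; W[l] generates the top-down
# prediction of layer l from layer l+1. Energy = 0.5 * sum of squared errors.
rng = np.random.default_rng(1)
sizes = [8, 6, 4]                                   # bottom -> top
W = [rng.normal(scale=0.3, size=(sizes[l], sizes[l + 1]))
     for l in range(len(sizes) - 1)]

x = [rng.normal(size=n) for n in sizes]             # initial activities
# (x[0] remains clamped to the "sensory" input throughout)

def errors(x, W):
    # eps[l] = actual activity minus top-down prediction
    return [x[l] - W[l] @ x[l + 1] for l in range(len(W))]

def energy(x, W):
    return 0.5 * sum(np.sum(e ** 2) for e in errors(x, W))

E_start = energy(x, W)
eta = 0.1
for _ in range(50):                                 # gradient descent on E
    eps = errors(x, W)
    for l in range(1, len(x)):
        local = eps[l] if l < len(eps) else 0.0     # top layer has no local error
        # inhibited by the local error, excited by the error from below
        x[l] = x[l] - eta * (local - W[l - 1].T @ eps[l - 1])
E_end = energy(x, W)
```

Every signal in the update for `x[l]` is locally available to that layer: its own error `eps[l]` and the error `eps[l-1]` arriving from the layer it predicts, which is the biological selling point of the scheme.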
Learning then updates synaptic weights using local information. Weight changes scale with the product of pre- and post-synaptic activities and the relevant prediction error, resembling Hebbian plasticity (“fire together, wire together”) while remaining tied to error reduction. A known complication is the weight transport problem: backprop-like rules would require symmetric, effectively shared weights for forward and backward pathways. Predictive coding mitigates this by allowing feedback and feedforward synapses to learn independently yet converge toward similar values through comparable update structure; perfect symmetry may not be necessary, especially once nonlinearities are included.
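The local weight update can be sketched in a few lines (an illustrative example with invented sizes, assuming a linear prediction): the change to a top-down weight matrix is the outer product of the local prediction error and the presynaptic activity of the layer above, so a single update shrinks the error on the same input.

```python
import numpy as np

# Local, Hebbian-like weight update (illustrative sketch).
rng = np.random.default_rng(2)
x_above = rng.normal(size=4)              # presynaptic activity, higher layer
x_below = rng.normal(size=6)              # actual activity, lower layer
W = rng.normal(scale=0.3, size=(6, 4))    # top-down prediction weights

eps = x_below - W @ x_above               # local prediction error
lr = 0.05
# weight change = product of error and presynaptic activity (all local signals)
W = W + lr * np.outer(eps, x_above)

new_eps = x_below - W @ x_above           # the same input is now predicted better
```

Nothing in this rule requires copying weights from another pathway; a separate feedforward matrix trained with the structurally analogous rule can drift toward similar values on its own, which is how the weight transport problem is softened.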
Finally, the model avoids trivial solutions by clamping sensory-driven bottom layers (and, for supervised tasks, clamping the top layer to the target label). Training proceeds via iterative relaxation toward an equilibrium that minimizes energy for each example. After learning, unclamping and running to equilibrium enables generative behavior, while clamping supports classification. The bottom line: predictive coding replaces global, phase-separated error backprop with continuous, local error-driven computation—an approach that could both explain biological learning and improve efficiency or robustness in artificial networks, potentially reducing issues like catastrophic forgetting.
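The full loop described above can be sketched end to end (an illustrative NumPy toy with invented sizes, linear predictions, and a single training example): clamp the sensory and label layers, relax the hidden activities toward an energy minimum, apply the local weight updates, and at test time clamp only the sensory layer and read out the settled top layer.

```python
import numpy as np

# Toy supervised predictive-coding loop (illustrative; linear predictions).
rng = np.random.default_rng(3)
sizes = [5, 8, 3]                         # sensory -> hidden -> label layer
W = [rng.normal(scale=0.2, size=(sizes[l], sizes[l + 1]))
     for l in range(len(sizes) - 1)]

def errors(x):
    return [x[l] - W[l] @ x[l + 1] for l in range(len(W))]

def energy(x):
    return 0.5 * sum(np.sum(e ** 2) for e in errors(x))

def relax(x, clamp_top, steps=30, eta=0.1):
    # iterative relaxation: unclamped layers descend the energy gradient
    for _ in range(steps):
        eps = errors(x)
        for l in range(1, len(x)):
            if clamp_top and l == len(x) - 1:
                continue                  # label layer held fixed during training
            local = eps[l] if l < len(eps) else 0.0
            x[l] = x[l] - eta * (local - W[l - 1].T @ eps[l - 1])
    return x

s = rng.normal(size=sizes[0])             # one "sensory" pattern
label = np.array([1.0, 0.0, 0.0])         # its one-hot "label"

E_hist = []
for _ in range(50):                       # training: clamp both ends, then learn
    x = [s, np.zeros(sizes[1]), label.copy()]
    x = relax(x, clamp_top=True)
    eps = errors(x)
    for l in range(len(W)):               # local Hebbian-like weight updates
        W[l] = W[l] + 0.05 * np.outer(eps[l], x[l + 1])
    E_hist.append(energy(x))

# Inference: clamp only the sensory layer and let the top layer settle.
x = [s, np.zeros(sizes[1]), np.zeros(sizes[2])]
x = relax(x, clamp_top=False)
prediction = x[-1]
```

Note how clamping does the work the text describes: with both ends fixed, relaxation cannot collapse to the trivial all-zero state, and the equilibrium errors are exactly what the weight updates consume; unclamping the top afterward turns the same machinery into readout.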
Cornell Notes
Predictive coding replaces backpropagation with a biologically friendlier learning mechanism built around local prediction errors. Neurons in a hierarchy generate top-down predictions of lower-layer activity and send bottom-up error signals when predictions fail. The system is formulated as an energy minimization problem where total energy equals the sum of squared prediction errors across layers, and neuron activities update by descending the energy gradient. Synaptic weights then change using local rules tied to prediction errors, resembling Hebbian plasticity while avoiding the need for a separate backward pass. Clamping sensory (and sometimes label) layers prevents trivial zero-energy solutions and lets the network settle into an equilibrium that supports inference, classification, and generation.
Why does backpropagation struggle to map onto biological neural tissue?
What is “credit assignment,” and how does predictive coding address it differently than backprop?
How does predictive coding formalize learning as an energy minimization process?
What circuit-level neuron roles does predictive coding require?
How are synaptic weights updated, and what problem does predictive coding face with forward/backward pathways?
How does the model perform inference and learning without trivial solutions?
Review Questions
- What two biological constraints make backpropagation’s standard forward/backward phase separation difficult to reconcile with neurophysiology?
- In predictive coding, how do representational neurons and error neurons interact, and what signals drive each population?
- How do clamping choices (sensory-only vs sensory plus label) change what equilibrium corresponds to in inference, classification, and generation?
Key Points
1. Backpropagation’s need for phase-separated computation and precise global coordination conflicts with biological constraints like continuous processing and local autonomy.
2. Predictive coding treats learning as minimizing prediction error in a hierarchical model where higher layers predict lower-layer activity.
3. Formulating predictive coding as an energy-based model makes neuron dynamics correspond to gradient descent on total squared errors across layers.
4. Local circuit implementation requires explicit error units that compare actual activity to top-down predictions and feed back to modulate representational neurons.
5. Synaptic learning rules update weights using local products of activities and prediction errors, resembling Hebbian plasticity while remaining tied to error reduction.
6. The weight transport problem is mitigated because feedback and feedforward pathways can learn independently yet converge toward similar values through structurally related update rules.
7. Clamping sensory (and optionally label) layers prevents trivial solutions and enables equilibrium-based inference for classification and generation.