Neural Networks from Scratch - P.3 The Dot Product
Based on sentdex's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
This lesson shifts from hand-built Python list math to the linear-algebra machinery that makes deep learning code work: vectors, matrices, shapes, and the dot product. The core takeaway is that a neuron's output is computed as a dot product (weights with inputs) plus a bias, and once weights become a matrix (multiple neurons), the order and dimensions of the operands start to matter, especially when moving toward batch processing.
The lesson begins by cleaning up raw list-based code for a single layer. Weights and biases are treated as "knobs" that an optimizer tunes later, but the immediate focus is on how they combine with inputs. The simplified loop version computes each neuron output by multiplying inputs by their corresponding weights, summing the results, and then adding the neuron's bias. That same pattern is framed as the familiar line equation y = mx + b: the weight acts like the slope (scaling the input's magnitude), while the bias acts like the intercept (offsetting the value). A concrete numerical example contrasts the two: changing a weight multiplies the input's effect, while changing a bias shifts the entire result, enabling sign changes and output ranges that weight-only adjustments can't reproduce.
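As a sketch, the loop version described above might look like this in plain Python (the specific input, weight, and bias values here are illustrative, not taken from the transcript):

```python
# One layer of three neurons, each with four inputs, computed with plain loops.
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

layer_outputs = []
for neuron_weights, neuron_bias in zip(weights, biases):
    neuron_output = 0.0
    for n_input, weight in zip(inputs, neuron_weights):
        neuron_output += n_input * weight  # weight scales each input (the "m" in mx)
    neuron_output += neuron_bias           # bias shifts the sum (the "b" in mx + b)
    layer_outputs.append(neuron_output)

print(layer_outputs)
```

Note that doubling a weight doubles only that input's contribution, while changing a bias moves the whole output up or down regardless of the inputs.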
From there, the discussion turns to why deep learning frameworks so frequently fail with "shape" errors. Shape is defined as the size of each dimension in an array. A one-dimensional list becomes a vector (shape like (4,)), a list of lists becomes a two-dimensional array (a matrix), and a list of lists of lists becomes a three-dimensional array. The transcript emphasizes that dimensions must be "homologous": every element along a given dimension must have the same size (every row of a matrix must have the same length, for example) for the data to form a valid array at all.
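A quick NumPy sketch of these shape definitions (the array values are arbitrary):

```python
import numpy as np

vector = np.array([1, 2, 3, 4])          # 1D: shape (4,)
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])        # 2D: shape (2, 4), a list of two 4-vectors
tensor3 = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])   # 3D: shape (2, 2, 2)

print(vector.shape, matrix.shape, tensor3.shape)

# Non-homologous rows (different lengths) cannot form a proper matrix:
# np.array([[1, 2, 3], [4, 5]]) does not produce a 2D numeric array.
```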
With shapes clarified, the next step is the dot product as the bridge between math notation and code. For two vectors, the dot product multiplies corresponding elements and sums them, producing a single scalar. Using NumPy, the neuron computation becomes output = np.dot(inputs, weights) + bias (with the operand order later becoming crucial once weights are a matrix). For a single neuron, inputs and weights are both vectors, so swapping them doesn’t change the result. But for a layer of neurons, weights is a matrix whose rows (or vectors inside it) represent different neurons. Passing weights as the first operand makes NumPy compute multiple dot products—one per neuron—returning an array of neuron outputs. The transcript highlights that reversing the order in this case can trigger shape errors, because NumPy’s matrix multiplication rules depend on which dimension represents the “set of neurons.”
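The operand-order point can be sketched with NumPy like so (values are illustrative, not from the transcript):

```python
import numpy as np

inputs = np.array([1.0, 2.0, 3.0, 2.5])               # shape (4,)
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])      # shape (3, 4): three neurons
biases = np.array([2.0, 3.0, 0.5])

# Single neuron: vector . vector is a scalar, and the order is interchangeable.
neuron_out = np.dot(inputs, weights[0]) + biases[0]

# Layer: weights goes first, so NumPy takes one dot product per neuron (row),
# returning an array of three neuron outputs.
layer_out = np.dot(weights, inputs) + biases          # shape (3,)
print(neuron_out, layer_out)

# Reversing the operands here fails, because (4,) . (3, 4) is not aligned:
# np.dot(inputs, weights)  ->  ValueError
```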
By the end, the computation pattern is set up for the next stage: inputs will eventually become a batch (a 2D array), and understanding vectors, matrices, shapes, and dot products is positioned as the prerequisite for making batch math work without confusion. The weights-plus-bias mechanism is also tied to later activation functions, where bias will influence whether a neuron “fires” and how strongly, beyond just scaling the input effect.
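As a preview of that batch case, one common NumPy pattern (an assumption here, since the transcript has not covered it yet) is to put the batch of inputs first and transpose the weight matrix so the inner dimensions line up:

```python
import numpy as np

batch_inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                         [2.0, 5.0, -1.0, 2.0]])      # shape (2, 4): two samples
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])      # shape (3, 4): three neurons
biases = np.array([2.0, 3.0, 0.5])

# (2, 4) . (4, 3) -> (2, 3): one row of neuron outputs per sample,
# with biases added element-wise across each row.
batch_out = np.dot(batch_inputs, weights.T) + biases
print(batch_out.shape)
```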
Cornell Notes
The transcript builds the math foundation for neural networks by showing how a neuron output is computed as a dot product of inputs and weights, then adding a bias. Weights and biases are described as tunable parameters later adjusted by an optimizer, while their immediate role is to scale (weights) and shift (bias) the pre-activation value. A major focus is array shape: vectors are 1D, matrices are 2D (lists of vectors), and higher-dimensional tensors follow the same dimensional-size rules. NumPy’s dot product is used to unify these ideas: vector·vector yields a scalar, while matrix·vector (or vector·matrix, depending on operand order) yields multiple neuron outputs. Correct operand order matters once weights become a matrix, because it determines indexing and prevents shape errors.
Why are weights and biases treated as different “tools” for neuron outputs?
What does “shape” mean, and why does it cause so many deep learning errors?
How does the dot product work for two vectors?
Why does operand order stop being interchangeable once weights become a matrix?
What computation pattern is used for a layer of neurons in NumPy?
Review Questions
- How does adding bias differ from changing weights in terms of shifting vs scaling a neuron’s pre-activation value?
- Given a vector input and a matrix of weights representing multiple neurons, what does the dot product output represent, and why does operand order matter?
- How would you determine the shape of a list of lists, and what does “homologous” shape mean for making it a valid matrix?
Key Points
1. A neuron’s core computation is a dot product of inputs with weights, followed by adding a bias term.
2. Weights primarily scale the input contribution (magnitude), while bias offsets the result (shifts the value), enabling behaviors weight-only changes can’t replicate.
3. Shape is the size of each array dimension; vectors are 1D, matrices are 2D, and valid matrix operations require homologous (matching) dimensions.
4. NumPy’s np.dot unifies vector dot products and matrix products, but the operand order determines indexing and output shape.
5. For a single neuron (vector weights), swapping inputs and weights doesn’t change the dot product result; for a layer (matrix weights), swapping can break shapes or produce the wrong indexing.
6. A layer of neurons can be computed as multiple dot products, one per neuron weight vector, then combined into an output array, with biases added element-wise.