
Matrix multiplication as composition | Chapter 4, Essence of linear algebra

3Blue1Brown · 5 min read

Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.

TL;DR

A linear transformation in 2D is fully determined by the images of i-hat and j-hat, because any vector is a linear combination of them.

Briefing

Matrix multiplication isn’t just a computational trick—it’s a compact way to represent composing linear transformations. A linear transformation is determined entirely by where it sends the basis vectors i-hat and j-hat, because any vector (x, y) can be written as x·i-hat + y·j-hat. Once the transformed positions of i-hat and j-hat are known, the transformed landing spot of any vector follows automatically: x times the transformed i-hat plus y times the transformed j-hat. Recording those transformed basis-vector coordinates as the columns of a matrix turns that geometric rule into matrix-vector multiplication, which is the computational meaning of “apply the transformation.”
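That geometric rule can be sketched in a few lines of Python (not from the video; the basis images here are made up for illustration):

```python
# Columns of a matrix = where i-hat and j-hat land (hypothetical values).
i_hat = (1, -2)  # transformed i-hat
j_hat = (3, 0)   # transformed j-hat

def apply(i_hat, j_hat, v):
    """Apply the transformation to v: x * T(i-hat) + y * T(j-hat)."""
    x, y = v
    return (x * i_hat[0] + y * j_hat[0],
            x * i_hat[1] + y * j_hat[1])

print(apply(i_hat, j_hat, (2, 1)))  # 2*(1,-2) + 1*(3,0) = (5, -4)
```

This is exactly matrix-vector multiplication, written out as a weighted sum of the matrix's columns.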

That framework becomes especially powerful when two transformations must be applied in sequence. If one matrix represents a rotation and another represents a shear, applying them one after the other produces a new linear transformation with its own matrix. The key construction is column-by-column: the first column of the composition matrix is what you get by sending i-hat through the right-hand matrix first, then through the left-hand matrix; the second column comes from the same process for j-hat. This directly matches the rule for matrix multiplication: multiplying two matrices corresponds to applying the right matrix first, then the left matrix. That right-to-left order can feel unintuitive at first, but it’s consistent with function composition—variables are written on the right, so the “later” transformation appears on the right.

A concrete example makes the mechanics clear. One matrix, M1, has columns (1, 1) and (-2, 0), and another, M2, has columns (0, 1) and (2, 0). To build the matrix for “apply M1 then M2,” the first column comes from applying M2 to the vector where i-hat lands after M1: M1 sends i-hat to (1, 1), and then M2 sends (1, 1) to (2, 1). The second column comes from applying M2 to where j-hat lands after M1: M1 sends j-hat to (-2, 0), and M2 sends that to (0, -2). The resulting composition matrix captures the combined effect as a single transformation.
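The worked example above can be checked directly in Python (a quick sketch, not from the video), storing each matrix as a pair of columns and pushing the basis vectors through both maps:

```python
def apply(M, v):
    """M is (column1, column2); return x*col1 + y*col2."""
    (a, c), (b, d) = M
    x, y = v
    return (a * x + b * y, c * x + d * y)

M1 = ((1, 1), (-2, 0))  # columns of M1 from the example
M2 = ((0, 1), (2, 0))   # columns of M2

# "Apply M1 then M2": send each basis vector through M1, then M2.
col1 = apply(M2, apply(M1, (1, 0)))  # i-hat -> (1, 1) -> (2, 1)
col2 = apply(M2, apply(M1, (0, 1)))  # j-hat -> (-2, 0) -> (0, -2)
print(col1, col2)
```

The two printed columns, (2, 1) and (0, -2), are exactly the columns of the composition matrix described in the text.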

Thinking in terms of transformations also resolves two classic algebraic questions about matrix multiplication. Order matters: composing a shear with a 90-degree rotation in one order produces a different final placement of i-hat and j-hat than composing them in the opposite order, so the product depends on which matrix comes first. Meanwhile, associativity holds: for three matrices A, B, and C, the placement of parentheses doesn’t change the outcome because the same sequence of transformations is applied either way—first C, then B, then A. Numerically proving associativity can be messy, but conceptually it’s immediate: both parenthesizations apply the identical chain of operations in the same order. The takeaway is that matrix multiplication’s algebraic properties reflect the geometry of composing linear maps, not just memorized arithmetic rules.

Cornell Notes

Matrix multiplication provides a way to combine linear transformations. A linear transformation is fixed by where it sends the basis vectors i-hat and j-hat, so a matrix’s columns record those images. When two matrices are multiplied, the result represents applying the right-hand transformation first and then the left-hand one; the composition matrix’s columns are found by tracking where i-hat and j-hat land through both steps. This viewpoint makes two key properties intuitive: swapping the order of matrices changes the final transformation, while associativity holds because both parenthesizations apply the same three transformations in the same order. The practical payoff is that multiplication becomes a geometric “apply one map after another” operation rather than a rote algorithm.

Why does a matrix representing a linear transformation only need to know what happens to i-hat and j-hat?

In two dimensions, any vector (x, y) can be written as x·i-hat + y·j-hat. Linearity means the transformation of x·i-hat + y·j-hat equals x times the transformed i-hat plus y times the transformed j-hat. So once the images of i-hat and j-hat are known, the image of every vector follows automatically.
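A tiny numerical check of that linearity argument (the shear below is an assumed example, not taken from the video):

```python
# Assumed transformation: i-hat -> (1, 0), j-hat -> (1, 1)  (a shear)
Ti, Tj = (1, 0), (1, 1)

def transform(v):
    # Linearity: T(x*i-hat + y*j-hat) = x*T(i-hat) + y*T(j-hat)
    x, y = v
    return (x * Ti[0] + y * Tj[0], x * Ti[1] + y * Tj[1])

print(transform((3, 2)))  # 3*(1,0) + 2*(1,1) = (5, 2)
```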

How do you construct the matrix for a composition of two transformations from their matrices?

Take the composition matrix column-by-column. The first column comes from where i-hat ends up after applying the right matrix first and then the left matrix. Concretely: if the right matrix sends i-hat to some vector v, then the left matrix sends v to the final location; that final location becomes the first column. The second column is the same process for j-hat.
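That column-by-column recipe is a complete definition of 2x2 matrix multiplication. A minimal Python sketch (matrices stored as pairs of columns; the rotation and shear values are standard examples, not quoted from the video):

```python
def matvec(M, v):
    """M is (column1, column2); apply M to v."""
    (a, c), (b, d) = M
    x, y = v
    return (a * x + b * y, c * x + d * y)

def matmul(L, R):
    """Each column of L*R is L applied to the corresponding column of R."""
    return tuple(matvec(L, col) for col in R)

rot90 = ((0, 1), (-1, 0))  # 90-degree rotation: columns (0,1) and (-1,0)
shear = ((1, 0), (1, 1))   # shear: columns (1,0) and (1,1)
print(matmul(rot90, shear))  # "shear first, then rotate"
```

The result, columns (0, 1) and (-1, 1), records where i-hat and j-hat land after both steps.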

What does the “right-to-left” order in matrix multiplication mean geometrically?

Multiplying matrices corresponds to function composition: the transformation on the right acts first. So for a product L·R, a vector is first transformed by R, and then the result is transformed by L. This matches the idea that the columns of the product record where basis vectors land after both steps.
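The identity (L·R)v = L(Rv) is easy to confirm numerically; a short sketch using numpy (the specific rotation, shear, and vector are illustrative choices, not from the video):

```python
import numpy as np

L = np.array([[0, -1], [1, 0]])  # 90-degree counterclockwise rotation
R = np.array([[1, 1], [0, 1]])   # shear
v = np.array([2, 3])

# Multiplying by the product L @ R equals applying R first, then L.
print((L @ R) @ v)   # [-3  5]
print(L @ (R @ v))   # [-3  5]
```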

Does matrix multiplication depend on order? Use the transformation viewpoint to justify the answer.

Yes. A shear and a 90-degree rotation do not commute. Doing shear then rotate sends i-hat and j-hat to one pair of locations, while rotating then shearing sends them to different locations (i-hat lands at (0, 1) vs (1, 1), and j-hat lands at (-1, 1) vs (-1, 0) in the described examples). Different final images mean the overall linear transformation differs, so the matrix product changes when the order changes.
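The specific basis-vector landings quoted above can be reproduced with numpy (a sketch assuming the standard shear and counterclockwise 90-degree rotation matrices):

```python
import numpy as np

shear = np.array([[1, 1], [0, 1]])   # shear: j-hat -> (1, 1)
rot90 = np.array([[0, -1], [1, 0]])  # 90-degree counterclockwise rotation

shear_then_rotate = rot90 @ shear  # rightmost factor acts first
rotate_then_shear = shear @ rot90

print(shear_then_rotate)  # columns (0, 1) and (-1, 1)
print(rotate_then_shear)  # columns (1, 1) and (-1, 0)
```

The two products have different columns, so the two orderings are genuinely different transformations.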

Why is matrix multiplication associative, and why is that easier to see with transformations than with raw arithmetic?

Associativity means (A·B)·C = A·(B·C). With transformations, both sides apply the same three maps in the same order to any vector: first C, then B, then A. Since the sequence of operations is identical, the final result must match. Numerical verification can be tedious, but the transformation chain makes the property immediate.
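A quick numerical spot-check of associativity (random integer matrices; this verifies instances, while the transformation argument above is the actual proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.integers(-3, 4, size=(2, 2)) for _ in range(3))

left = (A @ B) @ C
right = A @ (B @ C)
print(np.array_equal(left, right))  # True: same chain of maps either way
```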

Review Questions

  1. Given two matrices, how would you determine the first column of their product using only basis-vector tracking?
  2. What geometric evidence shows that matrix multiplication is not commutative?
  3. Explain associativity of matrix multiplication in terms of applying transformations in sequence.

Key Points

  1. A linear transformation in 2D is fully determined by the images of i-hat and j-hat, because any vector is a linear combination of them.

  2. Placing the transformed coordinates of i-hat and j-hat into the columns of a matrix makes matrix-vector multiplication equal to applying the transformation.

  3. The product L·R represents applying R first and then L, which is why multiplication reads right-to-left.

  4. The composition matrix is built column-by-column: multiply the left matrix by each column of the right matrix to get the corresponding column of the product.

  5. Matrix multiplication is order-dependent: swapping two transformations generally changes where basis vectors end up.

  6. Matrix multiplication is associative because both parenthesizations apply the same sequence of transformations to any vector.

Highlights

Matrix multiplication encodes composition: multiplying matrices corresponds to applying one linear transformation after another.
The first column of a product comes from tracking i-hat through the right matrix and then the left matrix; the second column comes from the same process for j-hat.
Order matters: a shear followed by a 90-degree rotation produces a different result than the reverse order.
Associativity becomes obvious when viewed as applying the same three transformations in the same order, regardless of parentheses.
