Cramer's rule, explained geometrically | Chapter 12, Essence of linear algebra
Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Cramer’s rule gets its power from a geometric fact about determinants: when a linear transformation acts on space, every “coordinate-carrying” area (in 2D) or volume (in 3D and higher) gets scaled by the same factor—the determinant of the transformation matrix. That shared scaling lets the coordinates of the unknown input vector be recovered from the known output, without relying on dot products or orthonormal structure.
The setup starts with a linear system with two unknowns, written as a matrix transformation sending an unknown vector (x, y) to a known output (for example, −4, −2). The columns of the matrix describe where the basis directions land, so the puzzle becomes: which input vector produces the given output? When the determinant is nonzero, the transformation is invertible—each output corresponds to exactly one input.
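The setup can be sketched numerically. The 2×2 matrix below is an illustrative choice (the text only fixes the output (−4, −2)), and NumPy stands in for the linear-algebra bookkeeping:

```python
import numpy as np

# Hypothetical 2x2 system A @ (x, y) = (-4, -2); this matrix is an
# illustrative choice, not the one from the video.
A = np.array([[2.0, -1.0],
              [0.0,  1.0]])
b = np.array([-4.0, -2.0])

# Nonzero determinant => the transformation is invertible, so exactly
# one input vector maps to the given output.
assert not np.isclose(np.linalg.det(A), 0.0)
v = np.linalg.solve(A, b)

# The recovered input really does land on the known output.
assert np.allclose(A @ v, b)
print(v)  # the unique input (-3, -2) for this particular matrix
```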
A tempting but generally wrong idea is to recover x and y via dot products, hoping that dotting the output with the transformed basis vectors would still yield the original coordinates. This fails because most linear transformations do not preserve angles or perpendicularity: vectors that were perpendicular can land non-perpendicular, and dot products that were positive can become negative. Only orthonormal transformations (rotations and reflections) preserve dot products, so coordinate recovery via dot products works cleanly only for them. Since the coefficient matrix of a typical system is not orthonormal, a different invariant is needed.
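A small check of this failure, using a shear as the non-orthonormal transformation and a rotation as the orthonormal one (both matrices are illustrative choices, not taken from the original):

```python
import numpy as np

# A shear is a typical non-orthonormal transformation.
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Before: the basis vectors are perpendicular (dot product 0).
assert np.dot(e1, e2) == 0.0
# After: their images are not -- the shear destroys perpendicularity.
assert not np.isclose(np.dot(shear @ e1, shear @ e2), 0.0)

# An orthonormal transformation (here, a rotation) does preserve
# dot products, so coordinate recovery would work for it.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
u, w = np.array([2.0, 1.0]), np.array([-1.0, 3.0])
assert np.isclose(np.dot(rot @ u, rot @ w), np.dot(u, w))
```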
Determinants supply that invariant. In 2D, the y-coordinate can be reinterpreted as the signed area of a parallelogram spanned by the first standard basis vector (i-hat) and the unknown vector (x, y). Similarly, the x-coordinate corresponds to the signed area of the parallelogram spanned by the unknown vector and the second basis vector (j-hat). Under a linear transformation, these signed areas don’t stay fixed; they all scale by the determinant of the transformation matrix.
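The area reinterpretation is easy to verify numerically; the vector (3, 2) below is an arbitrary example:

```python
import numpy as np

x, y = 3.0, 2.0
v = np.array([x, y])
i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

# Signed area of the parallelogram with columns i-hat and v equals y.
area_iv = np.linalg.det(np.column_stack([i_hat, v]))
assert np.isclose(area_iv, y)

# Signed area of the parallelogram with columns v and j-hat equals x.
area_vj = np.linalg.det(np.column_stack([v, j_hat]))
assert np.isclose(area_vj, x)
```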
This scaling is the engine behind Cramer’s rule. To solve for y, one forms a new matrix by replacing the second column of the original matrix with the known output vector, then takes its determinant. Dividing that determinant by the determinant of the original matrix yields y. The same logic solves for x: replace the first column with the output vector, take the determinant, and divide by the original determinant. The determinants act like “area measurements” in the transformed space, and the determinant of the full matrix serves as the universal conversion factor back to the original coordinates.
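The column-replacement recipe can be sketched as a small function (a minimal 2×2 implementation; the example system is an arbitrary choice):

```python
import numpy as np

def cramer_2x2(A, b):
    """Solve A @ (x, y) = b by column replacement (2x2 Cramer's rule)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("determinant is zero: no unique solution")
    Ax = A.copy(); Ax[:, 0] = b   # replace first column  -> solves for x
    Ay = A.copy(); Ay[:, 1] = b   # replace second column -> solves for y
    return np.array([np.linalg.det(Ax) / d, np.linalg.det(Ay) / d])

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([5.0, 5.0])
print(cramer_2x2(A, b))  # x = 1, y = 2; agrees with np.linalg.solve(A, b)
```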
A quick numerical check confirms the method: with a system whose relevant determinants are 6 (for the matrix with its first column replaced) and 2 (for the original), the rule gives x = 6/2 = 3; replacing the second column instead yields determinant 4, so y = 4/2 = 2. The argument generalizes beyond 2D by swapping parallelogram areas for signed parallelepiped volumes in higher dimensions, again relying on the determinant’s role as the common scaling factor for all those oriented volumes. The payoff is both computational and conceptual: Cramer’s rule is not the fastest method in practice, but it reveals how determinants encode the geometry of linear systems in a remarkably direct way.
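One concrete system consistent with the quoted determinants (an assumption, since the original system is not restated in the text) can be used to verify the arithmetic:

```python
import numpy as np

# The text quotes determinants 6 (first column replaced) and 2 (original),
# giving x = 3 and, via the other ratio, y = 2. The matrix below is one
# system consistent with those numbers -- an assumption, not the original.
A = np.array([[2.0, -1.0],
              [0.0,  1.0]])
b = A @ np.array([3.0, 2.0])   # output produced by the true input (3, 2)

d = np.linalg.det(A)           # 2
Ax = A.copy(); Ax[:, 0] = b    # determinant 6
Ay = A.copy(); Ay[:, 1] = b    # determinant 4

assert np.isclose(d, 2.0)
assert np.isclose(np.linalg.det(Ax) / d, 3.0)   # x = 3
assert np.isclose(np.linalg.det(Ay) / d, 2.0)   # y = 2
```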
Cornell Notes
Cramer’s rule turns the problem of solving a linear system into a problem of comparing oriented geometric measurements. In 2D, the coordinates of an unknown vector can be expressed as signed areas of parallelograms built from the unknown vector and standard basis vectors. A linear transformation scales all such signed areas by the same factor: the determinant of the transformation matrix. By replacing one column of the matrix with the known output vector and taking a determinant, the needed scaled area is obtained; dividing by the original determinant recovers the corresponding coordinate. When the determinant is nonzero, this works because the transformation is invertible, so each output corresponds to exactly one input.
Why does using dot products to recover x and y usually fail after a linear transformation?
How can the y-coordinate of (x, y) be interpreted geometrically in 2D?
What is the key determinant fact that makes Cramer’s rule work?
How does replacing a column with the output vector produce the right determinant for solving y?
What changes in the 3D version of the geometric argument?
Review Questions
- In what way is the determinant acting like a universal “conversion factor” for areas or volumes under a linear transformation?
- Why does Cramer’s rule require det(A) ≠ 0, and what geometric meaning does that have for the mapping from inputs to outputs?
- For a 2×2 system, which column replacement corresponds to solving for x versus solving for y, and why does that match the relevant signed area interpretation?
Key Points
1. Cramer’s rule is built on the determinant’s role in scaling all oriented areas/volumes by the same factor under a linear transformation.
2. Dot products are generally not preserved by linear transformations, so coordinate recovery via dot products only works for orthonormal transformations.
3. In 2D, coordinates can be reinterpreted as signed areas of parallelograms formed from the unknown vector and standard basis vectors.
4. To solve for a coordinate, replace the corresponding column of the coefficient matrix with the known output vector, take the determinant, and divide by the determinant of the original matrix.
5. When det(A) ≠ 0, the transformation is invertible, so each output corresponds to exactly one input and Cramer’s rule yields a unique solution.
6. The same geometric logic generalizes to higher dimensions by replacing areas with signed volumes of parallelepipeds/parallelotopes.
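The higher-dimensional generalization in the last key point can be sketched as a general n×n routine (a minimal implementation, checked against NumPy’s solver; the 3×3 example matrix is arbitrary):

```python
import numpy as np

def cramer(A, b):
    """n-dimensional Cramer's rule: each coordinate is a ratio of signed
    volumes (determinants), generalizing the 2D parallelogram picture."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("determinant is zero: no unique solution")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                    # replace column i with the output
        x[i] = np.linalg.det(Ai) / d    # signed-volume ratio
    return x

# 3D check against np.linalg.solve on an arbitrary invertible matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
b = np.array([3.0, 5.0, 6.0])
assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```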