Cramer's rule, explained geometrically | Chapter 12, Essence of linear algebra
Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Cramer’s rule gets its power from a geometric fact about determinants: when a linear transformation acts on space, every “coordinate-carrying” area (in 2D) or volume (in 3D and higher) gets scaled by the same factor—the determinant of the transformation matrix. That shared scaling lets the coordinates of the unknown input vector be recovered from the known output, without relying on dot products or orthonormal structure.
The setup starts with a linear system with two unknowns, written as a matrix transformation sending an unknown vector (x, y) to a known output (for example, −4, −2). The columns of the matrix describe where the basis directions land, so the puzzle becomes: which input vector produces the given output? When the determinant is nonzero, the transformation is invertible—each output corresponds to exactly one input.
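The setup can be sketched numerically. The 2×2 matrix below is an illustrative choice (the text only fixes the output (−4, −2)), and NumPy stands in for the linear-algebra bookkeeping:

```python
import numpy as np

# Hypothetical 2x2 system A @ (x, y) = (-4, -2); this matrix is an
# illustrative choice, not the one from the video.
A = np.array([[2.0, -1.0],
              [0.0,  1.0]])
b = np.array([-4.0, -2.0])

# Nonzero determinant => the transformation is invertible, so exactly
# one input vector maps to the given output.
assert not np.isclose(np.linalg.det(A), 0.0)
v = np.linalg.solve(A, b)

# The recovered input really does land on the known output.
assert np.allclose(A @ v, b)
print(v)  # the unique input (-3, -2) for this particular matrix
```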
A tempting but generally wrong idea is to recover x and y via dot products, hoping that dotting the output with the transformed basis vectors would still yield the original coordinates. This fails because most linear transformations do not preserve angles or perpendicularity: vectors that were perpendicular can land non-perpendicular, and dot products that were positive can become negative. Only orthonormal transformations (rotations and reflections) preserve dot products, so coordinate recovery via dot products works cleanly only for them. Since the coefficient matrix of a typical system is not orthonormal, a different invariant is needed.
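A small check of this failure, using a shear as the non-orthonormal transformation and a rotation as the orthonormal one (both matrices are illustrative choices, not taken from the original):

```python
import numpy as np

# A shear is a typical non-orthonormal transformation.
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Before: the basis vectors are perpendicular (dot product 0).
assert np.dot(e1, e2) == 0.0
# After: their images are not -- the shear destroys perpendicularity.
assert not np.isclose(np.dot(shear @ e1, shear @ e2), 0.0)

# An orthonormal transformation (here, a rotation) does preserve
# dot products, so coordinate recovery would work for it.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
u, w = np.array([2.0, 1.0]), np.array([-1.0, 3.0])
assert np.isclose(np.dot(rot @ u, rot @ w), np.dot(u, w))
```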
Determinants supply that invariant. In 2D, the y-coordinate can be reinterpreted as the signed area of a parallelogram spanned by the first standard basis vector (i-hat) and the unknown vector (x, y). Similarly, the x-coordinate corresponds to the signed area of the parallelogram spanned by the unknown vector and the second basis vector (j-hat). Under a linear transformation, these signed areas don’t stay fixed; they all scale by the determinant of the transformation matrix.
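The area reinterpretation is easy to verify numerically; the vector (3, 2) below is an arbitrary example:

```python
import numpy as np

x, y = 3.0, 2.0
v = np.array([x, y])
i_hat = np.array([1.0, 0.0])
j_hat = np.array([0.0, 1.0])

# Signed area of the parallelogram with columns i-hat and v equals y.
area_iv = np.linalg.det(np.column_stack([i_hat, v]))
assert np.isclose(area_iv, y)

# Signed area of the parallelogram with columns v and j-hat equals x.
area_vj = np.linalg.det(np.column_stack([v, j_hat]))
assert np.isclose(area_vj, x)
```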
This scaling is the engine behind Cramer’s rule. To solve for y, one forms a new matrix by replacing the second column of the original matrix with the known output vector, then takes its determinant. Dividing that determinant by the determinant of the original matrix yields y. The same logic solves for x: replace the first column with the output vector, take the determinant, and divide by the original determinant. The determinants act like “area measurements” in the transformed space, and the determinant of the full matrix serves as the universal conversion factor back to the original coordinates.
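The column-replacement recipe can be sketched as a small function (a minimal 2×2 implementation; the example system is an arbitrary choice):

```python
import numpy as np

def cramer_2x2(A, b):
    """Solve A @ (x, y) = b by column replacement (2x2 Cramer's rule)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("determinant is zero: no unique solution")
    Ax = A.copy(); Ax[:, 0] = b   # replace first column  -> solves for x
    Ay = A.copy(); Ay[:, 1] = b   # replace second column -> solves for y
    return np.array([np.linalg.det(Ax) / d, np.linalg.det(Ay) / d])

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([5.0, 5.0])
print(cramer_2x2(A, b))  # x = 1, y = 2; agrees with np.linalg.solve(A, b)
```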
A quick numerical check confirms the method: with a system whose relevant determinants are 6 (for the matrix with its first column replaced) and 2 (for the original), the rule gives x = 6/2 = 3; replacing the second column instead yields determinant 4, so y = 4/2 = 2. The argument generalizes beyond 2D by swapping parallelogram areas for signed parallelepiped volumes in higher dimensions, again relying on the determinant’s role as the common scaling factor for all those oriented volumes. The payoff is both computational and conceptual: Cramer’s rule is not the fastest method in practice, but it reveals how determinants encode the geometry of linear systems in a remarkably direct way.
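One concrete system consistent with the quoted determinants (an assumption, since the original system is not restated in the text) can be used to verify the arithmetic:

```python
import numpy as np

# The text quotes determinants 6 (first column replaced) and 2 (original),
# giving x = 3 and, via the other ratio, y = 2. The matrix below is one
# system consistent with those numbers -- an assumption, not the original.
A = np.array([[2.0, -1.0],
              [0.0,  1.0]])
b = A @ np.array([3.0, 2.0])   # output produced by the true input (3, 2)

d = np.linalg.det(A)           # 2
Ax = A.copy(); Ax[:, 0] = b    # determinant 6
Ay = A.copy(); Ay[:, 1] = b    # determinant 4

assert np.isclose(d, 2.0)
assert np.isclose(np.linalg.det(Ax) / d, 3.0)   # x = 3
assert np.isclose(np.linalg.det(Ay) / d, 2.0)   # y = 2
```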
Cornell Notes
Cramer’s rule turns the problem of solving a linear system into a problem of comparing oriented geometric measurements. In 2D, the coordinates of an unknown vector can be expressed as signed areas of parallelograms built from the unknown vector and standard basis vectors. A linear transformation scales all such signed areas by the same factor: the determinant of the transformation matrix. By replacing one column of the matrix with the known output vector and taking a determinant, the needed scaled area is obtained; dividing by the original determinant recovers the corresponding coordinate. When the determinant is nonzero, this works because the transformation is invertible, so each output corresponds to exactly one input.
Why does using dot products to recover x and y usually fail after a linear transformation?
How can the y-coordinate of (x, y) be interpreted geometrically in 2D?
What is the key determinant fact that makes Cramer’s rule work?
How does replacing a column with the output vector produce the right determinant for solving y?
What changes in the 3D version of the geometric argument?
Review Questions
- In what way is the determinant acting like a universal “conversion factor” for areas or volumes under a linear transformation?
- Why does Cramer’s rule require det(A) ≠ 0, and what geometric meaning does that have for the mapping from inputs to outputs?
- For a 2×2 system, which column replacement corresponds to solving for x versus solving for y, and why does that match the relevant signed area interpretation?
Key Points
1. Cramer’s rule is built on the determinant’s role in scaling all oriented areas/volumes by the same factor under a linear transformation.
2. Dot products are generally not preserved by linear transformations, so coordinate recovery via dot products only works for orthonormal transformations.
3. In 2D, coordinates can be reinterpreted as signed areas of parallelograms formed from the unknown vector and standard basis vectors.
4. To solve for a coordinate, replace the corresponding column of the coefficient matrix with the known output vector, take the determinant, and divide by the determinant of the original matrix.
5. When det(A) ≠ 0, the transformation is invertible, so each output corresponds to exactly one input and Cramer’s rule yields a unique solution.
6. The same geometric logic generalizes to higher dimensions by replacing areas with signed volumes of parallelepipeds/parallelotopes.
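The higher-dimensional generalization in the last key point can be sketched as a general n×n routine (a minimal implementation, checked against NumPy’s solver; the 3×3 example matrix is arbitrary):

```python
import numpy as np

def cramer(A, b):
    """n-dimensional Cramer's rule: each coordinate is a ratio of signed
    volumes (determinants), generalizing the 2D parallelogram picture."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("determinant is zero: no unique solution")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                    # replace column i with the output
        x[i] = np.linalg.det(Ai) / d    # signed-volume ratio
    return x

# 3D check against np.linalg.solve on an arbitrary invertible matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
b = np.array([3.0, 5.0, 6.0])
assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```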