Visualizing the chain rule and product rule | Chapter 4, Essence of calculus
Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Most derivative rules come from tracking how an infinitesimal change dx propagates through addition, multiplication, and composition.
Briefing
Derivatives of complicated expressions don’t come from memorizing formulas. They come from tracking how tiny input “nudges” propagate through the three basic ways functions combine: adding, multiplying, and composing (putting one function inside another). Once that propagation is understood, the sum rule, product rule, and chain rule become predictable patterns rather than isolated tricks.
Start with addition. If a function is built as f(x) = sin(x) + x^2, then nudging x by dx nudges both pieces: sin(x) changes by d(sin(x)) and x^2 changes by d(x^2). The total change in the sum is the sum of those changes, df = d(sin(x)) + d(x^2) ≈ cos(x)·dx + 2x·dx, so dividing by dx yields df/dx = cos(x) + 2x. This is the familiar result: the derivative of a sum equals the sum of the derivatives. The key intuition is that “change” is linear across addition: each component contributes independently to the final tiny change.
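The additivity of nudges can be checked numerically. The following is a minimal sketch, assuming an arbitrary sample point x = 1.3 and step dx = 1e-6 (neither comes from the video); the helper name `nudge` is hypothetical.

```python
import math

def nudge(f, x, dx):
    """Change in f when the input moves from x to x + dx (hypothetical helper)."""
    return f(x + dx) - f(x)

# Arbitrary sample point and step size, chosen only for illustration.
x, dx = 1.3, 1e-6

d_sin = nudge(math.sin, x, dx)                        # d(sin(x))
d_sq = nudge(lambda t: t**2, x, dx)                   # d(x^2)
d_sum = nudge(lambda t: math.sin(t) + t**2, x, dx)    # d(sin(x) + x^2)

# The nudge of the sum matches the sum of the nudges, up to rounding.
print(d_sum, d_sin + d_sq)

# Dividing by dx approximates the derivative cos(x) + 2x.
print(d_sum / dx, math.cos(x) + 2 * x)
```

The two numbers on each printed line should agree closely; the small remaining gap on the second line shrinks as dx gets smaller, until floating-point rounding takes over.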
Multiplication changes the geometry. For f(x) = sin(x)·x^2, the product is treated as the area of a box with width sin(x) and height x^2. When x shifts by dx, the width changes by d(sin(x)) and the height changes by d(x^2). That produces new area in three pieces: a thin bottom strip with area sin(x)·d(x^2), a thin side strip with area x^2·d(sin(x)), and a small corner with area d(sin(x))·d(x^2). The corner is proportional to dx^2, so even after dividing by dx it is still proportional to dx and vanishes as dx→0. Substituting d(sin(x)) ≈ cos(x)·dx and d(x^2) ≈ 2x·dx into the two strips and dividing by dx gives the product rule: (sin(x)·x^2)' = sin(x)·(2x) + x^2·cos(x). The same reasoning works for any two functions g and h, giving (g·h)' = g·h' + h·g'.
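The area picture can be made concrete with a short numerical sketch. The sample point x = 1.3, the step sizes, and the function name `area_pieces` are assumptions chosen for illustration, not details from the video.

```python
import math

def area_pieces(x, dx):
    """Split the change in the box area sin(x) * x^2 into its three pieces."""
    width, height = math.sin(x), x**2
    d_width = math.sin(x + dx) - width     # d(sin(x))
    d_height = (x + dx)**2 - height        # d(x^2)
    bottom = width * d_height              # strip: sin(x) * d(x^2)
    side = height * d_width                # strip: x^2 * d(sin(x))
    corner = d_width * d_height            # shrinks like dx^2
    return bottom, side, corner

x = 1.3  # arbitrary sample point
# Value predicted by the product rule at x.
exact = math.sin(x) * 2 * x + x**2 * math.cos(x)

for dx in (1e-2, 1e-3, 1e-4):
    bottom, side, corner = area_pieces(x, dx)
    # Strip contribution per unit dx, and the corner's contribution per unit dx.
    print(dx, (bottom + side) / dx, corner / dx)

print("product rule:", exact)
```

As dx shrinks, the strip ratio should settle toward the product-rule value while the corner's contribution per unit dx shrinks roughly in proportion to dx.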
Composition is different again. For f(x) = sin(x^2), the input is first transformed by x^2, and that result then feeds into sin. The transcript uses a three-line “number line” setup: x determines x^2, and x^2 determines sin(x^2). A small change dx causes a change in the intermediate value h = x^2 (call it dh), and that intermediate change causes a change in the final output. The outer function's change is written in terms of the intermediate variable, d(sin(h)) ≈ cos(h)·dh, and only afterward is dh related back to x via dh ≈ 2x·dx, which gives d(sin(x^2)) ≈ cos(x^2)·2x·dx. The cancellation of dh in the ratio of tiny output change to tiny input change, (d(sin(h))/dh)·(dh/dx), is presented as the heart of the chain rule: the derivative of g(h(x)) equals g'(h(x))·h'(x).
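The two-step propagation can be traced numerically. This is a minimal sketch, assuming an arbitrary sample point x = 1.3 and step dx = 1e-6; the variable names are illustrative only.

```python
import math

# Arbitrary sample point and step size, chosen only for illustration.
x, dx = 1.3, 1e-6

h = x**2                                  # intermediate value on the middle line
dh = (x + dx)**2 - h                      # nudge to h caused by dx, about 2x * dx
d_out = math.sin(h + dh) - math.sin(h)    # nudge to sin(h) caused by dh

outer_ratio = d_out / dh                  # approximates cos(h)
inner_ratio = dh / dx                     # approximates 2x

# dh cancels in the product: (d_out / dh) * (dh / dx) = d_out / dx.
chain = outer_ratio * inner_ratio
print(chain, d_out / dx, math.cos(x**2) * 2 * x)
```

The first two printed values agree up to rounding because dh cancels, and both should be close to the chain-rule value cos(x^2)·2x.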
The takeaway is practical: the sum rule, product rule, and chain rule are three “peeling” tools for layered expressions. Fluency still requires practice, since knowing the rules isn’t the same as applying them cleanly in messy problems, but the rules themselves are grounded in how infinitesimal changes flow through addition, multiplication, and composition.
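As an illustration of peeling layers, the sketch below differentiates a hypothetical layered function, f(x) = x^2·sin(x^2) + sin(x), by applying the three rules one at a time and then compares the result with a finite-difference estimate. The function, the sample point x = 0.7, and the step size are assumptions chosen for illustration, not examples from the video.

```python
import math

def f(x):
    # Hypothetical layered expression: a sum containing a product that contains a composition.
    return x**2 * math.sin(x**2) + math.sin(x)

def f_prime(x):
    # Chain rule on sin(x^2): outer derivative cos(x^2) times inner derivative 2x.
    d_sin_of_sq = math.cos(x**2) * 2 * x
    # Product rule on x^2 * sin(x^2): left * d(right) + right * d(left).
    d_product = x**2 * d_sin_of_sq + math.sin(x**2) * 2 * x
    # Sum rule: add the derivative of the remaining term sin(x).
    return d_product + math.cos(x)

x, dx = 0.7, 1e-6
# Central-difference estimate of the derivative, used only as a check.
numeric = (f(x + dx) - f(x - dx)) / (2 * dx)
print(f_prime(x), numeric)
```

Each line of `f_prime` peels one layer, working from the innermost composition outward to the sum.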
Cornell Notes
Derivatives for complicated functions can be built from three combination patterns: adding, multiplying, and composing. For sums, tiny changes add directly, so (g+h)'=g'+h'. For products, the function is treated as an area: one factor controls width and the other controls height, so the first-order area change splits into two terms, producing the product rule. For compositions g(h(x)), a small change in x first creates a change in the intermediate value h, and then the outer function responds to that intermediate change; the chain rule results from relating those two steps. This matters because it turns memorization into a method for “peeling through layers” of any expression built from these operations.
Why does the derivative of a sum become the sum of derivatives?
How does the product rule come from “area change” rather than memorization?
What is the role of the intermediate variable in the chain rule?
Why is the corner term ignored in the product-rule visualization?
How does the transcript justify applying the same reasoning to any functions g and h?
Review Questions
- Given f(x)=g(x)+h(x), what expression for df/dx follows from the idea of tiny nudges, and why?
- Using the area model, identify which two first-order terms survive when differentiating a product g(x)h(x), and explain why the remaining term is negligible.
- For a composition g(h(x)), describe the sequence of tiny changes that leads to g'(h(x))·h'(x).
Key Points
1. Most derivative rules come from tracking how an infinitesimal change dx propagates through addition, multiplication, and composition.
2. For sums, tiny changes add directly, so (g+h)'=g'+h'.
3. For products, viewing g(x)h(x) as an area shows the first-order change splits into two terms: g·h' and h·g'.
4. Second-order corner terms in the product visualization scale like dx^2 and vanish as dx→0.
5. For compositions g(h(x)), the outer derivative is taken with respect to the intermediate value h, then multiplied by the inner derivative h'(x).
6. The chain rule can be understood as a ratio of tiny output change to tiny input change in which the intermediate nudge dh cancels: (dg/dh)·(dh/dx) = dg/dx.
7. Rule fluency still requires practice, but the rules themselves follow from consistent “peeling” of layered function combinations.