Get AI summaries of any video or article — Sign up free
Visualizing the chain rule and product rule | Chapter 4, Essence of calculus thumbnail

Visualizing the chain rule and product rule | Chapter 4, Essence of calculus

3Blue1Brown·
5 min read

Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Most derivative rules come from tracking how an infinitesimal change dx propagates through addition, multiplication, and composition.

Briefing

Derivatives of complicated expressions don’t come from memorizing formulas—they come from tracking how tiny input “nudges” propagate through three basic ways functions combine: adding, multiplying, and composing (putting one function inside another). Once that propagation is understood, the sum rule, product rule, and chain rule become predictable patterns rather than isolated tricks.

Start with addition. If a function is built as f(x)=sine(x)+x^2, then nudging x by dx nudges both pieces: sine(x) changes by d(sine(x)) and x^2 changes by d(x^2). The total change in the sum is just the sum of those changes, so dividing by dx yields the familiar result: the derivative of a sum equals the sum of the derivatives. The key intuition is that “change” is linear across addition—each component contributes independently to the final tiny change.

Multiplication changes the geometry. For f(x)=sine(x)·x^2, the product is treated as an area of a box whose side lengths are the two function values. When x shifts by dx, the width changes by d(sine(x)) and the height changes by d(x^2). That produces new area in two main first-order pieces: a thin bottom strip with area proportional to sine(x)·d(x^2), and a thin side strip with area proportional to x^2·d(sine(x)). The corner term is proportional to dx^2 and becomes negligible as dx→0. After substituting d(sine(x))≈cos(x)·dx and d(x^2)≈2x·dx and then dividing by dx, the product rule emerges: (sine(x)·x^2)' = sine(x)·(2x) + x^2·cos(x). The same reasoning works for any two functions g and h, giving g·h' + h·g' in the standard form.

Composition is different again. For f(x)=sine(x^2), the input first gets transformed by x^2, then that result feeds into sine. The transcript uses a three-line “number line” setup: x determines x^2, and x^2 determines sine(x^2). A small change dx causes a change in the intermediate value (call it dh), and that intermediate change causes a change in the final output. The derivative of the outer function is taken with respect to the intermediate variable (cosine(h)·dh), and only afterward is dh related back to x via dh≈2x·dx. The cancellation of dh in the ratio of tiny output change to tiny input change is presented as the heart of the chain rule: the derivative of g(h(x)) equals g'(h(x))·h'(x).

The takeaway is practical: the sum rule, product rule, and chain rule are three “peeling” tools for layered expressions. Fluency still requires practice—knowing the rules isn’t the same as applying them cleanly in messy problems—but the rules themselves are grounded in how infinitesimal changes flow through addition, multiplication, and composition.

Cornell Notes

Derivatives for complicated functions can be built from three combination patterns: adding, multiplying, and composing. For sums, tiny changes add directly, so (g+h)'=g'+h'. For products, the function is treated as an area: one factor controls width and the other controls height, so the first-order area change splits into two terms, producing the product rule. For compositions g(h(x)), a small change in x first creates a change in the intermediate value h, and then the outer function responds to that intermediate change; the chain rule results from relating those two steps. This matters because it turns memorization into a method for “peeling through layers” of any expression built from these operations.

Why does the derivative of a sum become the sum of derivatives?

With f(x)=g(x)+h(x), nudging x by dx produces tiny changes dg and dh in the two parts. The total tiny change in f is df=dg+dh. When dividing by dx, linearity of addition carries through: df/dx=(dg/dx)+(dh/dx), so (g+h)'=g'+h'. The transcript illustrates this using sine(x)+x^2, where the tiny change in sine contributes cosine(x)·dx and the tiny change in x^2 contributes 2x·dx, and these add.

How does the product rule come from “area change” rather than memorization?

For f(x)=g(x)h(x), the product is visualized as the area of a box with width g(x) and height h(x). A small input shift dx changes the width by dg and the height by dh. The first-order area change has two main pieces: (i) a bottom strip with area proportional to g·dh and (ii) a side strip with area proportional to h·dg. The corner piece is proportional to dg·dh, which is second-order (like dx^2) and becomes negligible as dx→0. Substituting dg≈g'·dx and dh≈h'·dx and dividing by dx yields (gh)'=g'h+gh'.

What is the role of the intermediate variable in the chain rule?

In g(h(x)), a change in x first causes a change in h(x). The transcript renames the intermediate nudge as dh to emphasize that the outer function’s derivative is naturally taken with respect to that intermediate variable. The outer change is d(g)=g'(h)·dh, and the inner change is dh=h'(x)·dx. When forming the ratio df/dx, the dh cancels in the sense that the final proportionality constant becomes g'(h(x))·h'(x). This is why the chain rule multiplies the derivative of the outer function evaluated at the inside by the derivative of the inside.

Why is the corner term ignored in the product-rule visualization?

In the box picture, the corner contribution corresponds to multiplying two tiny changes: one from width (d(sine(x)) or dg) and one from height (d(x^2) or dh). That term scales like dg·dh, which is second-order—effectively proportional to dx^2. Since derivatives are defined via the limit as dx→0, second-order terms vanish compared with first-order terms, leaving only the two linear contributions.

How does the transcript justify applying the same reasoning to any functions g and h?

The argument isn’t tied to sine or x^2. The sum rule relies only on how tiny changes distribute over addition. The product rule relies only on treating g(x)h(x) as an area where width and height each change by small amounts, producing two first-order area terms. The chain rule relies only on two-step propagation: x→h(x) then h(x)→g(h(x)). Replace sine with any outer function and x^2 with any inner function, and the same “propagate tiny nudges through layers” logic produces the corresponding rules.

Review Questions

  1. Given f(x)=g(x)+h(x), what expression for df/dx follows from the idea of tiny nudges, and why?
  2. Using the area model, identify which two first-order terms survive when differentiating a product g(x)h(x), and explain why the remaining term is negligible.
  3. For a composition g(h(x)), describe the sequence of tiny changes that leads to g'(h(x))·h'(x).

Key Points

  1. 1

    Most derivative rules come from tracking how an infinitesimal change dx propagates through addition, multiplication, and composition.

  2. 2

    For sums, tiny changes add directly, so (g+h)'=g'+h'.

  3. 3

    For products, viewing g(x)h(x) as an area shows the first-order change splits into two terms: g·h' and h·g'.

  4. 4

    Second-order corner terms in the product visualization scale like dx^2 and vanish as dx→0.

  5. 5

    For compositions g(h(x)), the outer derivative is taken with respect to the intermediate value h, then multiplied by the inner derivative h'(x).

  6. 6

    The chain rule can be understood as a ratio of tiny output change to tiny input change where intermediate nudge factors cancel in the proportionality.

  7. 7

    Rule fluency still requires practice, but the rules themselves follow from consistent “peeling” of layered function combinations.

Highlights

The sum rule is essentially linearity of tiny changes: nudging x nudges each addend, and the total change is the sum of those contributions.
The product rule emerges from first-order area changes in a box: width changes and height changes create two surviving strips, while the corner term is second-order and disappears.
The chain rule is a two-step propagation story: x changes h(x), and then h(x) changes g(h(x)); multiplying the outer sensitivity by the inner sensitivity produces the final derivative.

Topics