
Multivariable Calculus 7 | Chain, Sum and Factor rule

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Total differentiation is linear: d(f+g)(x̃)=df(x̃)+dg(x̃) and d(λf)(x̃)=λ·df(x̃) when both derivatives exist at x̃.

Briefing

Multivariable calculus keeps the same “algebra of derivatives” from one-variable calculus: total differentiation behaves linearly. If two vector-valued functions f and g (from ℝ^n to ℝ^m) are both totally differentiable at a fixed point x̃, then their sum f+g is also totally differentiable at x̃, and the total derivative satisfies d(f+g)(x̃)=df(x̃)+dg(x̃). Likewise, scaling by a real number λ preserves total differentiability: the function λf is totally differentiable at x̃ with total derivative d(λf)(x̃)=λ·df(x̃). In Jacobian terms, this becomes J_{f+g}(x̃)=J_f(x̃)+J_g(x̃) and J_{λf}(x̃)=λJ_f(x̃). The key takeaway is that addition and scalar multiplication “pull through” total differentiation because the total derivative is a linear map from ℝ^n to ℝ^m.
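These identities are easy to check numerically. The sketch below (assuming NumPy; the maps f and g are made-up examples, not from the video) approximates Jacobians by central differences and confirms that the Jacobian of a sum is the sum of the Jacobians, and that a scalar factor pulls out:

```python
import numpy as np

def jacobian(F, x, eps=1e-6):
    """Approximate the Jacobian of F at x with central differences."""
    x = np.asarray(x, dtype=float)
    m = F(x).shape[0]
    J = np.zeros((m, x.shape[0]))
    for j in range(x.shape[0]):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

# Two hypothetical maps f, g : R^2 -> R^2; any totally differentiable maps work.
f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
g = lambda x: np.array([x[0] ** 2, x[0] + x[1]])
x0 = np.array([1.0, 2.0])

# Sum rule: J_{f+g}(x0) = J_f(x0) + J_g(x0)
sum_lhs = jacobian(lambda x: f(x) + g(x), x0)
sum_rhs = jacobian(f, x0) + jacobian(g, x0)
print(np.allclose(sum_lhs, sum_rhs, atol=1e-5))  # True

# Factor rule: J_{3f}(x0) = 3 * J_f(x0)
print(np.allclose(jacobian(lambda x: 3 * f(x), x0),
                  3 * jacobian(f, x0), atol=1e-5))  # True
```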

The next major result is the multivariable chain rule for totally differentiable maps, now formulated for possibly different dimensions at each step. Consider a composition where g maps ℝ^k to ℝ^n and f maps ℝ^n to ℝ^m, producing f∘g as a map from ℝ^k to ℝ^m. If g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃. The total derivative is obtained by composing the derivatives: the Jacobian matrix of f∘g at x̃ equals the matrix product J_f(g(x̃)) · J_g(x̃). This mirrors the one-dimensional chain rule, but replaces ordinary multiplication with composition of linear maps (or equivalently, matrix multiplication).
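The dimension bookkeeping can be illustrated numerically as well. In this sketch (assuming NumPy; g and f are invented example maps with k=2, n=3, m=2), a finite-difference Jacobian of f∘g is compared against the matrix product J_f(g(x̃))·J_g(x̃):

```python
import numpy as np

def jacobian(F, x, eps=1e-6):
    """Approximate the Jacobian of F at x with central differences."""
    x = np.asarray(x, dtype=float)
    m = F(x).shape[0]
    J = np.zeros((m, x.shape[0]))
    for j in range(x.shape[0]):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

g = lambda x: np.array([x[0] * x[1], x[0] ** 2, x[1]])  # R^2 -> R^3
f = lambda y: np.array([y[0] + y[2], y[1] * y[2]])      # R^3 -> R^2
x0 = np.array([0.5, -1.0])

J_comp = jacobian(lambda x: f(g(x)), x0)       # 2x2 Jacobian of f∘g at x0
J_prod = jacobian(f, g(x0)) @ jacobian(g, x0)  # (2x3) @ (3x2) = 2x2
print(np.allclose(J_comp, J_prod, atol=1e-5))  # True
```

Note that the inner Jacobian J_f must be evaluated at g(x0), not at x0, and the shapes only compose in this order.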

The proof strategy relies on the definition of total differentiability as a linear approximation plus a remainder term that becomes negligible compared with the size of the input change. First, g(x̃+h) is written as g(x̃)+dg(x̃)h plus an error term that goes to 0 faster than ||h||. Next, f is linearized around the point g(x̃), using f(g(x̃)+something)≈f(g(x̃))+df(g(x̃))·(something), again with a remainder that is small relative to the change. Substituting the linear approximation for g into the linearization for f produces the desired linear approximation for f∘g, with the derivative identified as df(g(x̃))∘dg(x̃). The remaining work is to verify that the combined remainder still satisfies the “small-o” condition required by total differentiability: after dividing by ||h||, the new error term tends to 0 as h→0. With that check, the chain rule follows.
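The “small-o” condition in this argument can be made concrete with a numeric experiment. This is not the video's proof, just an illustration (assuming NumPy; f is an invented example map with its analytic Jacobian): the remainder r(h)=f(x̃+h)−f(x̃)−df(x̃)h is computed for shrinking h, and the ratio ||r(h)||/||h|| is seen to shrink with ||h||:

```python
import numpy as np

# Hypothetical map f : R^2 -> R^2 and its analytic Jacobian at x0.
f = lambda x: np.array([x[0] * x[1], np.exp(x[0])])
x0 = np.array([1.0, 2.0])
J = np.array([[x0[1], x0[0]],            # d(x*y)   = [y, x]
              [np.exp(x0[0]), 0.0]])     # d(e^x)   = [e^x, 0]

direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    r = f(x0 + h) - f(x0) - J @ h        # remainder of the linear approximation
    ratio = np.linalg.norm(r) / np.linalg.norm(h)
    print(f"||h||={t:.0e}  ||r||/||h||={ratio:.2e}")  # ratio shrinks roughly like ||h||
```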

Together, the sum rule, factor rule, and chain rule establish a consistent calculus toolkit for vector-valued functions in higher dimensions: total differentiation is linear, and compositions differentiate by composing the corresponding linear approximations. That framework sets up the next step—working through concrete examples of the chain rule in multivariable settings.

Cornell Notes

Total differentiation in multivariable calculus acts like a linear operator. If f and g are totally differentiable at x̃, then f+g is totally differentiable at x̃ and d(f+g)(x̃)=df(x̃)+dg(x̃); similarly, λf is totally differentiable at x̃ with d(λf)(x̃)=λ·df(x̃). The chain rule generalizes to compositions across different dimensions: with g:ℝ^k→ℝ^n and f:ℝ^n→ℝ^m, if g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃. Its Jacobian satisfies J_{f∘g}(x̃)=J_f(g(x̃))·J_g(x̃). The proof uses linear approximations plus remainder terms that vanish faster than ||h||.

Why do the sum rule and factor rule hold for total derivatives in multivariable calculus?

Because the total derivative is a linear map from ℝ^n to ℝ^m. If df(x̃) and dg(x̃) are the linear maps giving the best linear approximations to f and g near x̃, then the linear approximation for f+g is just the sum of those linear maps, and the approximation for λf is λ times the original. In Jacobian form, this becomes J_{f+g}(x̃)=J_f(x̃)+J_g(x̃) and J_{λf}(x̃)=λJ_f(x̃).

What conditions are needed to apply the multivariable chain rule for total differentiability?

The composition f∘g must be built from two functions with compatible differentiability points. Specifically, g:ℝ^k→ℝ^n must be totally differentiable at x̃, and f:ℝ^n→ℝ^m must be totally differentiable at the point g(x̃). Under those assumptions, f∘g is guaranteed to be totally differentiable at x̃.

How does the chain rule look when written with Jacobian matrices?

For g:ℝ^k→ℝ^n and f:ℝ^n→ℝ^m, the Jacobian of the composition at x̃ is the matrix product J_{f∘g}(x̃)=J_f(g(x̃))·J_g(x̃). This matches the idea that composing linear maps corresponds to multiplying their matrices.

What role do the error terms play in proving the chain rule?

Total differentiability is defined via a linear approximation plus a remainder that becomes negligible compared with ||h||. The proof linearizes g near x̃, producing an error term that shrinks appropriately. Then it linearizes f near g(x̃) and substitutes the approximation for g into f. The final step is checking that the combined remainder for f∘g still goes to 0 after division by ||h|| as h→0.

Why isn’t the multivariable chain rule just ordinary multiplication of derivatives?

Because derivatives here are linear maps (or Jacobian matrices), not scalars. When dimensions change across the composition, the correct operation is composition of linear maps, which translates to matrix multiplication in Jacobian form.

Review Questions

  1. State the sum rule for total derivatives of vector-valued functions and express it both in linear-map form and Jacobian form.
  2. For g:ℝ^k→ℝ^n and f:ℝ^n→ℝ^m, write the Jacobian formula for J_{f∘g}(x̃) and specify where each Jacobian is evaluated.
  3. In the chain rule proof, what property must the remainder term satisfy as h→0 for total differentiability to hold?

Key Points

  1. Total differentiation is linear: d(f+g)(x̃)=df(x̃)+dg(x̃) and d(λf)(x̃)=λ·df(x̃) when both derivatives exist at x̃.
  2. For vector-valued functions f,g:ℝ^n→ℝ^m, the Jacobian of a sum equals the sum of Jacobians, and scaling a function scales its Jacobian.
  3. The multivariable chain rule applies to compositions f∘g even when intermediate spaces have different dimensions.
  4. If g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃.
  5. The chain rule for Jacobians is J_{f∘g}(x̃)=J_f(g(x̃))·J_g(x̃), reflecting composition of linear maps.
  6. The proof relies on substituting one linear approximation into another and verifying the combined remainder still vanishes faster than ||h||.
  7. Total differentiability is preserved under addition, scalar multiplication, and composition, provided the required differentiability points are satisfied.

Highlights

Total derivatives behave like linear maps: addition and scalar factors pass straight through differentiation.
The chain rule in multivariable calculus replaces scalar multiplication with composition of linear maps (matrix multiplication).
For f∘g, the Jacobian at x̃ is the product J_f(g(x̃))·J_g(x̃).
The chain rule proof hinges on controlling remainder terms so the error still becomes negligible compared with ||h||.
