Multivariable Calculus 7 | Chain, Sum and Factor rule
Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Multivariable calculus keeps the same “algebra of derivatives” from one-variable calculus: total differentiation behaves linearly. If two vector-valued functions f and g (from ℝ^n to ℝ^m) are both totally differentiable at a fixed point x̃, then their sum f+g is also totally differentiable at x̃, and the total derivative satisfies d(f+g)(x̃)=df(x̃)+dg(x̃). Likewise, scaling by a real number λ preserves total differentiability: λf is totally differentiable at x̃ with d(λf)(x̃)=λ·df(x̃). In Jacobian terms, this becomes J_{f+g}(x̃)=J_f(x̃)+J_g(x̃) and J_{λf}(x̃)=λJ_f(x̃). The key takeaway is that addition and scalar multiplication “pull through” total differentiation because the total derivative is a linear map from ℝ^n to ℝ^m.
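This linearity can be checked numerically with a finite-difference Jacobian. The sketch below uses illustrative choices for f, g, the point x̃, and the scalar λ (none of these come from the video) and confirms both rules up to discretization error:

```python
import numpy as np

def numerical_jacobian(F, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of F: R^n -> R^m at x."""
    x = np.asarray(x, dtype=float)
    m = len(F(x))
    J = np.zeros((m, len(x)))
    for j in range(len(x)):
        e = np.zeros(len(x))
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

# Illustrative maps f, g : R^2 -> R^2 (assumptions for this example)
f = lambda x: np.array([x[0]**2 + x[1], np.sin(x[0])])
g = lambda x: np.array([x[0] * x[1], np.exp(x[1])])

x0 = np.array([0.5, -1.0])
lam = 3.0

# Sum rule: J_{f+g}(x0) = J_f(x0) + J_g(x0)
J_sum = numerical_jacobian(lambda x: f(x) + g(x), x0)
assert np.allclose(J_sum, numerical_jacobian(f, x0) + numerical_jacobian(g, x0), atol=1e-5)

# Factor rule: J_{lam*f}(x0) = lam * J_f(x0)
J_scaled = numerical_jacobian(lambda x: lam * f(x), x0)
assert np.allclose(J_scaled, lam * numerical_jacobian(f, x0), atol=1e-5)
```

The asserts pass because the finite-difference Jacobian is itself a linear operation on F, so the sum and factor rules survive the discretization exactly up to floating-point error.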
The next major result is the multivariable chain rule for totally differentiable maps, now formulated for possibly different dimensions at each step. Consider a composition where g maps ℝ^k to ℝ^n and f maps ℝ^n to ℝ^m, producing f∘g as a map from ℝ^k to ℝ^m. If g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃. The total derivative is obtained by composing the derivatives: the Jacobian matrix of f∘g at x̃ equals the matrix product J_f(g(x̃)) · J_g(x̃). This mirrors the one-dimensional chain rule, but replaces ordinary multiplication with composition of linear maps (or equivalently, matrix multiplication).
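The Jacobian product formula can also be sanity-checked numerically. The sketch below uses illustrative maps g: ℝ² → ℝ³ and f: ℝ³ → ℝ² (these particular functions are assumptions for the example, not taken from the video), so the dimensions genuinely differ at each step:

```python
import numpy as np

def numerical_jacobian(F, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of F: R^n -> R^m at x."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((len(F(x)), len(x)))
    for j in range(len(x)):
        e = np.zeros(len(x))
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

# Illustrative maps: g : R^2 -> R^3, f : R^3 -> R^2
g = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])
f = lambda y: np.array([y[0] + y[1] * y[2], np.exp(y[0])])

x0 = np.array([0.7, -0.3])

# Left side: Jacobian of the composition, a 2x2 matrix
lhs = numerical_jacobian(lambda x: f(g(x)), x0)
# Right side: J_f(g(x0)) (2x3) times J_g(x0) (3x2)
rhs = numerical_jacobian(f, g(x0)) @ numerical_jacobian(g, x0)

assert np.allclose(lhs, rhs, atol=1e-5)
```

Note where each Jacobian is evaluated: the outer Jacobian at the image point g(x0), the inner one at x0 itself; the matrix product is only defined because the inner dimensions (here, 3) match.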
The proof strategy relies on the definition of total differentiability as a linear approximation plus a remainder term that becomes negligible compared with the size of the input change. First, g(x̃+h) is written as g(x̃)+dg(x̃)h plus an error term that goes to 0 faster than ||h||. Next, f is linearized around the point g(x̃), using f(g(x̃)+something)≈f(g(x̃))+df(g(x̃))·(something), again with a remainder that is small relative to the change. Substituting the linear approximation for g into the linearization for f produces the desired linear approximation for f∘g, with the derivative identified as df(g(x̃))∘dg(x̃). The remaining work is to verify that the combined remainder still satisfies the “small-o” condition required by total differentiability: after dividing by ||h||, the new error term tends to 0 as h→0. With that check, the chain rule follows.
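Written out in the notation of the text, with r_g and r_f denoting the two remainder terms, the substitution step looks like this:

```latex
\begin{aligned}
g(\tilde{x}+h) &= g(\tilde{x}) + dg(\tilde{x})\,h + r_g(h),
  &&\frac{\lVert r_g(h)\rVert}{\lVert h\rVert} \to 0 \text{ as } h \to 0,\\
f\bigl(g(\tilde{x})+v\bigr) &= f\bigl(g(\tilde{x})\bigr) + df\bigl(g(\tilde{x})\bigr)\,v + r_f(v),
  &&\frac{\lVert r_f(v)\rVert}{\lVert v\rVert} \to 0 \text{ as } v \to 0.
\end{aligned}
```

Substituting v = dg(x̃)h + r_g(h) into the second line gives

```latex
(f\circ g)(\tilde{x}+h)
  = (f\circ g)(\tilde{x})
  + df\bigl(g(\tilde{x})\bigr)\,dg(\tilde{x})\,h
  + \underbrace{df\bigl(g(\tilde{x})\bigr)\,r_g(h)
      + r_f\bigl(dg(\tilde{x})\,h + r_g(h)\bigr)}_{=:\,r(h)},
```

and the remaining check is precisely that ‖r(h)‖/‖h‖ → 0 as h → 0, which identifies d(f∘g)(x̃) = df(g(x̃)) ∘ dg(x̃).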
Together, the sum rule, factor rule, and chain rule establish a consistent calculus toolkit for vector-valued functions in higher dimensions: total differentiation is linear, and compositions differentiate by composing the corresponding linear approximations. That framework sets up the next step—working through concrete examples of the chain rule in multivariable settings.
Cornell Notes
Total differentiation in multivariable calculus acts like a linear operator. If f and g are totally differentiable at x̃, then f+g is totally differentiable at x̃ and d(f+g)(x̃)=df(x̃)+dg(x̃); similarly, λf is totally differentiable at x̃ with d(λf)(x̃)=λ·df(x̃). The chain rule generalizes to compositions across different dimensions: with g:ℝ^k→ℝ^n and f:ℝ^n→ℝ^m, if g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃. Its Jacobian satisfies J_{f∘g}(x̃)=J_f(g(x̃))·J_g(x̃). The proof uses linear approximations plus remainder terms that vanish faster than ||h||.
- Why do the sum rule and factor rule hold for total derivatives in multivariable calculus?
- What conditions are needed to apply the multivariable chain rule for total differentiability?
- How does the chain rule look when written with Jacobian matrices?
- What role do the error terms play in proving the chain rule?
- Why isn’t the multivariable chain rule just ordinary multiplication of derivatives?
Review Questions
- State the sum rule for total derivatives of vector-valued functions and express it both in linear-map form and Jacobian form.
- For g:ℝ^k→ℝ^n and f:ℝ^n→ℝ^m, write the Jacobian formula for J_{f∘g}(x̃) and specify where each Jacobian is evaluated.
- In the chain rule proof, what property must the remainder term satisfy as h→0 for total differentiability to hold?
Key Points
1. Total differentiation is linear: d(f+g)(x̃)=df(x̃)+dg(x̃) and d(λf)(x̃)=λ·df(x̃) when both derivatives exist at x̃.
2. For vector-valued functions f,g:ℝ^n→ℝ^m, the Jacobian of a sum equals the sum of Jacobians, and scaling a function scales its Jacobian.
3. The multivariable chain rule applies to compositions f∘g even when the intermediate spaces have different dimensions.
4. If g is totally differentiable at x̃ and f is totally differentiable at g(x̃), then f∘g is totally differentiable at x̃.
5. The chain rule for Jacobians is J_{f∘g}(x̃)=J_f(g(x̃))·J_g(x̃), reflecting composition of linear maps.
6. The proof relies on substituting one linear approximation into another and verifying that the combined remainder still vanishes faster than ||h||.
7. Total differentiability is preserved under addition, scalar multiplication, and composition, provided differentiability holds at the required points.