Multivariable Calculus 5 | Total Derivative [dark version]

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Total differentiability replaces the one-variable derivative’s scalar with a linear map L that best approximates f near a point.

Briefing

Total differentiability in several variables generalizes the one-dimensional idea of “best linear approximation” from a number to a linear map. Instead of approximating a function near a point by a line, a function f: ℝⁿ → ℝᵐ is approximated near a point x̃ by a linear transformation L acting on the increment h, with an error term that becomes negligible compared with the size of h. This matters because partial derivatives alone don’t capture how all variables change together; total derivatives do.

In one dimension, differentiability at x̃ means there exists a constant B (the derivative) and a remainder term R(h) such that f(x̃ + h) = f(x̃) + B·h + R(h), where R(h)/h → 0 as h → 0. The transcript then lifts this structure to higher dimensions by replacing the scalar B·h with a linear map L(h). For f: ℝⁿ → ℝᵐ, total differentiability at x̃ requires a linear map L and a remainder function Φ(h) so that f(x̃ + h) = f(x̃) + L(h) + Φ(h), with the condition that Φ(h)/||h|| → 0 as h → 0 (||h|| is the Euclidean norm of the vector h). The remainder must shrink faster than the vector’s length, ensuring the approximation works along any path: if h approaches the zero vector through any sequence, the normalized error still vanishes.
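
To see the remainder condition numerically, here is a minimal Python sketch (not from the video) using an assumed example function f(x, y) = (x², xy), whose Jacobian [[2x, 0], [y, x]] serves as the candidate linear map L; the normalized error ‖Φ(h)‖/‖h‖ shrinks as h does.

```python
import numpy as np

# Assumed example (not from the video): f(x, y) = (x^2, x*y).
def f(v):
    x, y = v
    return np.array([x**2, x * y])

def jacobian(v):
    # Candidate linear map L, written as the Jacobian [[2x, 0], [y, x]].
    x, y = v
    return np.array([[2 * x, 0.0],
                     [y, x]])

x_tilde = np.array([1.0, 2.0])
L = jacobian(x_tilde)

# Remainder Phi(h) = f(x_tilde + h) - f(x_tilde) - L(h).
direction = np.array([0.6, -0.8])                 # a unit vector
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    phi = f(x_tilde + h) - f(x_tilde) - L @ h
    print(t, np.linalg.norm(phi) / np.linalg.norm(h))
# The printed ratios shrink roughly like t itself, i.e. Phi(h)/||h|| -> 0.
```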

The linear map L is the multivariable derivative, often written Df (or DF, depending on the convention). When L is represented as a matrix with respect to the standard bases, that matrix is the Jacobian J_F: it encodes the linear map so that the approximation becomes a matrix–vector multiplication. In the special case n = m = 1, the Jacobian reduces to a 1×1 matrix containing f′(x̃), showing the multivariable definition truly includes the one-dimensional derivative.
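
For the n = m = 1 reduction, a small sketch (an illustration, not the video's material) treats an ordinary one-variable function as a map ℝ¹ → ℝ¹, with the 1×1 Jacobian holding f′(x̃):

```python
import numpy as np

# One-dimensional special case: f(x) = sin(x), so f'(x) = cos(x).
# Viewed as a map from R^1 to R^1, the Jacobian at x_tilde is the
# 1x1 matrix [[cos(x_tilde)]] -- just the ordinary derivative.
x_tilde = 0.5
J = np.array([[np.cos(x_tilde)]])

for t in [1e-1, 1e-2, 1e-3]:
    h = np.array([t])
    phi = np.sin(x_tilde + h) - np.sin(x_tilde) - J @ h
    print(t, abs(phi[0]) / np.linalg.norm(h))     # tends to 0, as in the 1D definition
```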

A concrete two-dimensional example makes the idea tangible. Consider a function that flips coordinates: f: ℝ² → ℝ² with f(h₁, h₂) = (h₂, h₁). At the origin, f(0,0) = (0,0), and the function is already linear in h. That means the remainder term is identically zero, and the total derivative exists with Jacobian matrix [[0, 1], [1, 0]]. So the total derivative at (0,0) is exactly the coordinate-flipping linear transformation, captured by the Jacobian.
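
A short sketch (assuming NumPy; the function itself is the one from the example) confirms that the coordinate flip has Jacobian [[0, 1], [1, 0]] at the origin and an identically zero remainder:

```python
import numpy as np

# Coordinate-flip map from the example: f(h1, h2) = (h2, h1).
def flip(h):
    return np.array([h[1], h[0]])

# The map is linear, so its total derivative at (0, 0) is the map itself,
# represented by the Jacobian [[0, 1], [1, 0]].
J = np.array([[0.0, 1.0],
              [1.0, 0.0]])

rng = np.random.default_rng(seed=0)
for _ in range(3):
    h = rng.normal(size=2) * 1e-3
    phi = flip(h) - flip(np.zeros(2)) - J @ h
    print(phi)                                    # exactly [0. 0.]: the remainder vanishes
```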

The takeaway is that total derivatives are not just “more partial derivatives.” They are linear maps (or Jacobians) that provide a single best linear approximation to a multivariable function near a point, with a rigorously controlled error term relative to the size of the input change.

Cornell Notes

Total differentiability generalizes one-variable differentiability by replacing the derivative’s single number with a linear map. For f: ℝⁿ → ℝᵐ, total differentiability at x̃ means there exists a linear map L such that f(x̃ + h) = f(x̃) + L(h) + Φ(h), where the remainder satisfies Φ(h)/||h|| → 0 as h → 0. The condition uses the Euclidean norm of h, ensuring the error shrinks faster than the size of the input change along any sequence approaching the zero vector. When L is expressed as a matrix, it is the Jacobian matrix J_F, which turns the approximation into a matrix–vector multiplication. In one dimension, the Jacobian collapses to the usual derivative f′(x̃).

How does total differentiability mirror the one-dimensional definition of differentiability?

In one dimension, differentiability at x̃ requires f(x̃ + h) = f(x̃) + B·h + R(h), with R(h)/h → 0 as h → 0. In several variables, the scalar B·h becomes a linear map L(h), and the error term Φ(h) must satisfy Φ(h)/||h|| → 0 as h → 0. The role of “h” in the denominator is taken by the vector length ||h||, so the approximation error is controlled relative to how big the input change is.

Why does the remainder condition use ||h|| instead of dividing by h directly?

Because h is a vector, there is no single scalar “h” to divide by. The transcript uses the Euclidean norm ||h|| = sqrt(h₁² + … + hₙ²) as a scalar measure of the size of the vector increment. Requiring Φ(h)/||h|| → 0 ensures the error becomes negligible compared with the magnitude of the input perturbation.

What does it mean that the condition must hold for any sequence of vectors h → 0?

The definition is path-independent: if h approaches the zero vector through any sequence (not just along a straight line), the normalized remainder Φ(h)/||h|| still goes to zero. That guarantees the linear approximation is uniformly valid in all directions near x̃, not just along one chosen direction.
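
As an illustration of this path-independence, the following sketch (with an assumed test function f(x, y) = (xy, x + y²), not taken from the video) evaluates the normalized remainder along a sequence that spirals into the origin rather than approaching along a straight line:

```python
import numpy as np

# Assumed test function (not from the video): f(x, y) = (x*y, x + y^2),
# with Jacobian [[y, x], [1, 2y]]; at the origin this is [[0, 0], [1, 0]].
def f(v):
    x, y = v
    return np.array([x * y, x + y**2])

x_tilde = np.zeros(2)
J = np.array([[0.0, 0.0],
              [1.0, 0.0]])

# A sequence h_k that spirals toward the zero vector instead of
# approaching along a straight line.
for k in range(1, 6):
    r = 10.0 ** (-k)
    h = r * np.array([np.cos(k), np.sin(k)])
    phi = f(x_tilde + h) - f(x_tilde) - J @ h
    print(k, np.linalg.norm(phi) / np.linalg.norm(h))   # still tends to 0
```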

How are the multivariable derivative and the Jacobian matrix related?

The multivariable derivative is the linear map L = DF at x̃. If that linear map is represented in coordinates as a matrix, the matrix is the Jacobian J_F. With the Jacobian, the linear approximation becomes f(x̃ + h) ≈ f(x̃) + J_F · h, i.e., matrix–vector multiplication.
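
The sketch below (an illustration with an assumed example function and a hypothetical finite-difference helper, not the video's notation) shows the approximation literally as a matrix–vector product f(x̃) + J_F · h:

```python
import numpy as np

# Assumed example function (not from the video): f(x, y) = (exp(x)*y, sin(x + y)).
def f(v):
    x, y = v
    return np.array([np.exp(x) * y, np.sin(x + y)])

def jacobian_fd(func, v, eps=1e-6):
    """Hypothetical helper: central finite-difference estimate of the Jacobian of func at v."""
    m = len(func(v))
    n = len(v)
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (func(v + step) - func(v - step)) / (2 * eps)
    return J

x_tilde = np.array([0.3, 1.1])
J_F = jacobian_fd(f, x_tilde)

h = np.array([0.01, -0.02])
approx = f(x_tilde) + J_F @ h        # f(x~) + J_F * h: a matrix-vector product
exact = f(x_tilde + h)
print(exact, approx, np.linalg.norm(exact - approx))   # the gap is second order in ||h||
```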

In the two-dimensional coordinate-flip example, why is the remainder term zero?

For f(h₁, h₂) = (h₂, h₁), the output is already linear in the input variables. At the origin, f(0,0) = (0,0), and f(h) equals the linear transformation applied to h with no extra higher-order error. That means the approximation is exact, so Φ(h) is identically zero and the Jacobian fully describes the function near (0,0).

Review Questions

  1. State the formal condition for total differentiability of f: ℝⁿ → ℝᵐ at a point x̃, including the role of the Euclidean norm.
  2. Explain the difference between partial derivatives and the total derivative in terms of what the linear approximation captures.
  3. For a function f: ℝ² → ℝ² that flips coordinates, write the Jacobian matrix at (0,0) and justify why the remainder term is zero.

Key Points

  1. Total differentiability replaces the one-variable derivative’s scalar with a linear map L that best approximates f near a point.
  2. For f: ℝⁿ → ℝᵐ, total differentiability at x̃ requires f(x̃ + h) = f(x̃) + L(h) + Φ(h) with Φ(h)/||h|| → 0 as h → 0.
  3. The denominator uses the Euclidean norm ||h|| to measure the size of a vector increment.
  4. The remainder condition must hold along any sequence h → 0, ensuring the approximation works in all directions.
  5. When the linear map is written in coordinates, it becomes the Jacobian matrix J_F, turning the approximation into J_F · h.
  6. In the one-dimensional case, the Jacobian reduces to the usual derivative f′(x̃), showing consistency with classical calculus.

Highlights

Total differentiability is the multivariable version of “best linear approximation,” but the derivative is a linear map, not a number.
The error term must shrink faster than the input change’s length: Φ(h)/||h|| → 0.
The Jacobian matrix is the coordinate representation of the total derivative.
For the coordinate-flip function f(h₁, h₂) = (h₂, h₁), the Jacobian at (0,0) is [[0,1],[1,0]] and the remainder term is zero because the function is already linear.
