Multivariable Calculus 5 | Total Derivative

5 min read

Based on the YouTube video by The Bright Side of Mathematics. If you like this content, support the original creator by watching, liking, and subscribing to their channel.

TL;DR

Total differentiability generalizes the single-variable idea of approximating a function near a point by a linear model.

Briefing

Total differentiability in several variables generalizes the familiar “best linear approximation” from single-variable calculus. Instead of approximating a function near a point by a line with a single slope, multivariable calculus approximates it near a point by a linear map that takes small input changes (vectors) to output changes. The key requirement is that the difference between the function’s actual change and the linear prediction becomes negligible compared with the size of the input change as the input vector approaches the zero vector.

In one dimension, differentiability at a point x₀ means there exists a number B and a remainder term R(h) such that f(x₀+h) = f(x₀) + B·h + R(h), with the remainder shrinking faster than h as h → 0. The multivariable version keeps the same structure but replaces “multiply by a slope” with “apply a linear map.” For a function f: ℝⁿ → ℝᵐ, total differentiability at a point x₀ requires a linear map L (the multivariable derivative) and an error term that becomes small relative to the length of the input perturbation. Concretely, one writes f(x₀+h) = f(x₀) + L(h) + r(h), where the remainder r(h) satisfies r(h)/||h|| → 0 as h → 0. Because h is a vector, the normalization uses the Euclidean (ℓ₂) norm ||h||, i.e., the distance from h to the zero vector. The definition is pointwise: it must hold at the chosen x₀, and the limit must hold no matter how h approaches the zero vector, along any direction or path.
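To make the remainder condition concrete, here is a minimal numerical sketch; Python and the sample map f(x, y) = (xy, x²) are my illustrative choices, not from the video. It checks that r(h)/||h|| shrinks no matter which direction h comes from:

```python
import numpy as np

# Illustrative nonlinear map f: R^2 -> R^2 (my example, not from the video).
def f(v):
    x, y = v
    return np.array([x * y, x ** 2])

x0 = np.array([1.0, 2.0])

# Expanding f(x0 + h) by hand:
#   f(1+h1, 2+h2) = (2 + 2*h1 + h2 + h1*h2, 1 + 2*h1 + h1**2),
# so the linear part is L(h) = J @ h with J below, and r(h) = (h1*h2, h1**2).
J = np.array([[2.0, 1.0],
              [2.0, 0.0]])

# Check r(h)/||h|| -> 0 along several directions as h shrinks.
rng = np.random.default_rng(0)
for d in rng.normal(size=(3, 2)):
    d = d / np.linalg.norm(d)              # unit direction
    for t in (1e-1, 1e-2, 1e-3):
        h = t * d
        r = f(x0 + h) - f(x0) - J @ h      # remainder r(h)
        print(t, np.linalg.norm(r) / np.linalg.norm(h))
# The printed ratios shrink roughly in proportion to t: r(h)/||h|| -> 0.
```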

This linear map L is the “total derivative” and is often denoted Df(x₀). When L is represented as a matrix, that matrix is the Jacobian matrix J_f of f at x₀, and the linear approximation becomes ordinary matrix-vector multiplication. The transcript emphasizes that in higher dimensions the derivative is not merely a number; it is an entire linear transformation capturing how all input directions affect all output components.

The discussion also shows how the definition reduces to the one-dimensional case when n = m = 1: the total derivative becomes a 1×1 matrix whose entry is f′(x₀), and L(h) is just f′(x₀)·h. A two-dimensional example then makes the idea concrete. Consider the function that swaps its two components: f(h₁, h₂) = (h₂, h₁). At the origin, f(0,0) = (0,0), and the function is already linear, so the remainder term is zero. The corresponding Jacobian matrix at (0,0) is [[0, 1], [1, 0]], which, when multiplied by the input vector (h₁, h₂), produces (h₂, h₁). That example illustrates how total differentiability can be checked by finding the linear map (or Jacobian) that matches the function’s first-order behavior near the point.
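A minimal sketch of the swap example (the Python rendering is my addition; the function and matrix are the video’s example):

```python
import numpy as np

# The coordinate swap f(h1, h2) = (h2, h1) is already a linear map.
def f(v):
    return np.array([v[1], v[0]])

# Jacobian at the origin: the permutation matrix that performs the swap.
J = np.array([[0.0, 1.0],
              [1.0, 0.0]])

h = np.array([0.3, -0.7])
r = f(np.zeros(2) + h) - f(np.zeros(2)) - J @ h   # remainder at x0 = (0, 0)
print(r)                                          # [0. 0.]: identically zero
```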

The takeaway is that total differentiability is the formal condition guaranteeing a reliable linear approximation in multiple dimensions, with the Jacobian providing the practical matrix form of the derivative.

Cornell Notes

Total differentiability in several variables is the multivariable version of “best linear approximation.” For f: ℝⁿ → ℝᵐ, being totally differentiable at x₀ means there exists a linear map L such that f(x₀+h) = f(x₀) + L(h) + r(h), where the remainder r(h) satisfies r(h)/||h|| → 0 as h → 0. Because h is a vector, the error is measured relative to the vector’s length (Euclidean norm). The linear map L is the total derivative and can be represented by the Jacobian matrix J_f at x₀. In the 1D case, the Jacobian reduces to the usual derivative f′(x₀), and in a 2D example that swaps coordinates, the Jacobian is the matrix that performs the swap exactly, with zero remainder.

How does the definition of differentiability in one variable translate to several variables?

In one variable, differentiability at x₀ means f(x₀+h) = f(x₀) + B·h + R(h), with R(h)/h → 0 as h → 0. In several variables, h is a vector, so the approximation becomes f(x₀+h) = f(x₀) + L(h) + r(h), where L is a linear map ℝⁿ → ℝᵐ. The remainder must shrink faster than the size of h, so r(h)/||h|| → 0 as h → 0, using the Euclidean norm ||h||.

Why does the multivariable definition divide by ||h|| instead of h?

Because h is not a number but a vector, dividing by it is undefined; there is no single built-in scalar “size” as in one dimension. The natural measure of the perturbation’s magnitude is the Euclidean norm ||h|| = √(h₁² + ⋯ + hₙ²), which is the distance from h to the zero vector. Normalizing by ||h|| ensures the error is negligible compared with the magnitude of the input change in any direction.
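For instance, computing the Euclidean norm with NumPy (an illustrative snippet, not from the video):

```python
import numpy as np

h = np.array([3.0, 4.0])
print(np.linalg.norm(h))  # 5.0 = sqrt(3**2 + 4**2), the distance from h to the zero vector
```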

What exactly is the “total derivative” in this setting?

The total derivative is the linear map L that best approximates the function near x₀: f(x₀+h) ≈ f(x₀) + L(h) for small h. It captures first-order behavior across all input directions. When written in matrix form, L corresponds to the Jacobian matrix J_f at x₀, and the approximation becomes f(x₀+h) ≈ f(x₀) + J_f·h.
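To illustrate how L captures all input directions at once: each column of the Jacobian records the response to a small step in one input direction, which suggests a finite-difference estimate. A minimal sketch, my own construction rather than anything from the video:

```python
import numpy as np

def numerical_jacobian(f, x0, eps=1e-6):
    """Estimate the Jacobian at x0 column by column: column j records how
    the outputs respond to a small step in input direction e_j."""
    y0 = f(x0)
    J = np.zeros((len(y0), len(x0)))
    for j in range(len(x0)):
        e = np.zeros(len(x0))
        e[j] = eps
        J[:, j] = (f(x0 + e) - y0) / eps   # forward difference along e_j
    return J

f = lambda v: np.array([v[0] * v[1], v[0] ** 2])
print(numerical_jacobian(f, np.array([1.0, 2.0])))  # approximately [[2, 1], [2, 0]]
```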

How does the definition recover the usual derivative when n = m = 1?

When n = m = 1, the linear map L: ℝ → ℝ is just multiplication by a number. The Jacobian becomes a 1×1 matrix whose entry is f′(x₀). The linear approximation becomes f(x₀+h) = f(x₀) + f′(x₀)·h + R(h) with R(h)/h → 0, matching the standard single-variable definition of differentiability.
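A quick numerical check of the 1D reduction, with illustrative numbers of my own (f(x) = x², x₀ = 3, f′(x₀) = 6):

```python
# 1D case: the 1x1 "Jacobian" is just the slope f'(3) = 6.
f = lambda x: x ** 2
x0, slope = 3.0, 6.0

for h in (1e-1, 1e-2, 1e-3):
    R = f(x0 + h) - f(x0) - slope * h   # remainder R(h) = h**2 here
    print(h, R / abs(h))                # ratios shrink like h: R(h)/h -> 0
```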

In the coordinate-swap example f(h₁,h₂) = (h₂,h₁), why is the remainder term zero at the origin?

The function is already linear: swapping coordinates can be written exactly as a matrix-vector product. At (0,0), f(0,0) = (0,0), and f(h₁,h₂) equals J_f·(h₁,h₂) with J_f = [[0,1],[1,0]]. Since the function matches its linear approximation exactly, there is no leftover error term, so the remainder is identically zero.

Review Questions

  1. State the formal condition for f: ℝⁿ → ℝᵐ to be totally differentiable at x₀, including how the remainder term behaves.
  2. Explain the role of the Jacobian matrix in representing the total derivative.
  3. Why does the multivariable definition use the Euclidean norm ||h|| when measuring the size of the error?

Key Points

  1. Total differentiability generalizes the single-variable idea of approximating a function near a point by a linear model.
  2. For f: ℝⁿ → ℝᵐ, total differentiability at x₀ requires a linear map L such that f(x₀+h) = f(x₀) + L(h) + r(h).
  3. The remainder must satisfy r(h)/||h|| → 0 as h → 0, using the Euclidean norm of the input perturbation.
  4. The total derivative is the linear map L; when expressed as a matrix, it is the Jacobian matrix J_f at x₀.
  5. In one dimension, the Jacobian reduces to the usual derivative f′(x₀).
  6. A function that is already linear (like swapping two coordinates) is totally differentiable with zero remainder at the point where the linear form matches exactly.

Highlights

Total differentiability guarantees a reliable first-order linear approximation in multiple dimensions: f(x₀+h) ≈ f(x₀) + L(h).
Measuring the error relative to ||h|| is essential because h is a vector, not a scalar.
The Jacobian matrix is the matrix representation of the total derivative (the linear map).
A coordinate-swap function has Jacobian [[0,1],[1,0]] and no remainder because it is exactly linear.