
Multivariable Calculus 8 | Gradient [dark version]

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The gradient ∇f(x̃) is the column vector of partial derivatives of a real-valued function f at x̃, obtained by transposing the 1×n Jacobian.

Briefing

The gradient is introduced as the multivariable tool that turns a totally differentiable real-valued function into a vector field that points in the direction of greatest increase. For a function f defined on an open subset of R^n, the gradient at a point x̃ is built from the partial derivatives of f with respect to each coordinate. Because f outputs a single real number, its Jacobian at x̃ is a 1×n matrix; transposing it yields an n-dimensional column vector, which is exactly ∇f(x̃). The notation ∇f(x̃), read “nabla f,” is emphasized as a compact way to represent this vector of sensitivities.

A concrete example makes the geometry tangible: f(x1, x2) = x1^2 + x2^2. Its graph is a paraboloid, while its contour lines are circles centered at the origin. Computing partial derivatives gives ∂f/∂x1 = 2x1 and ∂f/∂x2 = 2x2, so ∇f(x1, x2) = (2x1, 2x2). In the plane, this gradient can be visualized as arrows attached to each point; they point outward and grow longer as the distance from the origin increases, matching the idea that moving away from the center increases f more strongly.
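This vector field can be sampled directly; the sketch below is a minimal plain-Python illustration (the sample points are chosen arbitrarily):

```python
import math

def grad_f(x1, x2):
    """Gradient of f(x1, x2) = x1**2 + x2**2: the partials (2*x1, 2*x2)."""
    return (2 * x1, 2 * x2)

# The arrow at each point is parallel to the point itself (outward-pointing),
# and its length is twice the distance from the origin:
for p in [(1.0, 0.0), (1.0, 1.0), (2.0, 2.0)]:
    g = grad_f(*p)
    print(p, "->", g, "length", math.hypot(*g))
```

Each printed length is exactly twice the point's distance from the origin, matching the picture of arrows growing as you move away from the center.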

The gradient’s power is then tested through the multivariable chain rule. Consider a curve γ: R → R^2 that traces a circle, for instance γ(t) = (cos t, sin t). Composing f with γ produces a one-variable function t ↦ f(γ(t)). By the multivariable chain rule, the derivative of the composition is the Jacobian product J_f(γ(t)) · J_γ(t). Here J_f(γ(t)) = (2 cos t, 2 sin t) and J_γ(t) = (−sin t, cos t)ᵀ, so the product is −2 cos t sin t + 2 sin t cos t = 0 for all t: the composed quantity f(γ(t)) does not change as the point moves along the circle.
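The zero derivative can be verified numerically; the sketch below differentiates the composition t ↦ f(γ(t)) with a central finite difference (the step size and sample values of t are illustrative choices):

```python
import math

def f(x1, x2):
    return x1**2 + x2**2

def gamma(t):
    return (math.cos(t), math.sin(t))

def derivative(h, t, eps=1e-6):
    """Central finite difference approximating h'(t)."""
    return (h(t + eps) - h(t - eps)) / (2 * eps)

comp = lambda t: f(*gamma(t))
# The derivative of the composition is numerically zero at every sample point:
print([round(derivative(comp, t), 8) for t in (0.0, 0.7, 2.5)])
```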

That zero derivative is reinterpreted geometrically using the gradient. Since ∇f is the transpose of the Jacobian of f, the chain-rule product can be rewritten as an inner product in R^2: ⟨∇f(γ(t)), γ′(t)⟩. In this example, the inner product equals 0, so the gradient vector at each point on the circle is perpendicular to the curve’s tangent direction γ′(t). The takeaway is that the gradient provides an immediate geometric criterion for how a function changes along a path: if the tangent direction is orthogonal to ∇f, the function stays constant along that path. The deeper geometric meaning of this orthogonality is saved for the next installment.
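The same conclusion falls out of the inner-product form; a minimal sketch, assuming the parametrization γ(t) = (cos t, sin t) from the example:

```python
import math

def grad_f(x1, x2):
    return (2 * x1, 2 * x2)

def gamma(t):
    return (math.cos(t), math.sin(t))

def gamma_prime(t):
    """Tangent direction of the circle at parameter t."""
    return (-math.sin(t), math.cos(t))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# <grad f(gamma(t)), gamma'(t)> = -2 cos(t) sin(t) + 2 sin(t) cos(t) = 0,
# so the gradient is perpendicular to the tangent at every sample point:
for t in (0.0, 1.0, 2.5):
    print(t, dot(grad_f(*gamma(t)), gamma_prime(t)))
```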

Cornell Notes

For a real-valued function f on R^n, the gradient ∇f(x̃) is the column vector of partial derivatives at x̃. It arises by taking the Jacobian of f (a 1×n row) and transposing it into an n-dimensional vector. In the example f(x1,x2)=x1^2+x2^2, the gradient is ∇f(x1,x2)=(2x1,2x2), forming an outward-pointing vector field whose arrows grow with distance from the origin. When f is composed with a circular path γ(t)=(cos t, sin t), the multivariable chain rule gives (f∘γ)′(t)=0 for all t. Rewriting that derivative as an inner product shows ⟨∇f(γ(t)), γ′(t)⟩=0, so the gradient is perpendicular to the curve’s tangent direction, explaining why f stays constant along the circle.

How is the gradient ∇f(x̃) constructed from the Jacobian for a real-valued function f: R^n → R?

For a real-valued f, the Jacobian at x̃ is a 1×n matrix containing the partial derivatives: (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn) evaluated at x̃. Transposing this row turns it into an n×1 column vector, which is the gradient ∇f(x̃). In symbols, ∇f(x̃) is the transpose of the Jacobian of f at x̃.
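A hedged sketch of that construction in plain Python (the helper names, the numerical step size, and the example function and point are illustrative; the partials are approximated by central differences rather than computed symbolically):

```python
def jacobian_row(f, x, eps=1e-6):
    """The 1xn Jacobian of a scalar f at point x, as a flat list (a row)."""
    row = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        row.append((f(xp) - f(xm)) / (2 * eps))  # partial df/dx_i
    return row

def gradient(f, x, eps=1e-6):
    """Transpose the 1xn row into an nx1 column: the gradient at x."""
    return [[d] for d in jacobian_row(f, x, eps)]

f = lambda x: x[0]**2 + x[1]**2
print(jacobian_row(f, [1.0, 2.0]))  # 1x2 row, roughly [2.0, 4.0]
print(gradient(f, [1.0, 2.0]))      # 2x1 column, roughly [[2.0], [4.0]]
```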

Why does ∇f(x1,x2) = (2x1, 2x2) point outward for f(x1,x2)=x1^2+x2^2?

The gradient components are ∂f/∂x1=2x1 and ∂f/∂x2=2x2. At a point (x1,x2), the vector (2x1,2x2) has the same direction as (x1,x2), which points away from the origin. Its magnitude increases as x1^2+x2^2 grows, so the arrows get longer farther from the origin.

What does the chain rule predict for (f∘γ)′(t) when γ(t)=(cos t, sin t) and f(x1,x2)=x1^2+x2^2?

Using the multivariable chain rule, (f∘γ)′(t)=J_f(γ(t))·J_γ(t) = (2cos t, 2sin t)·(−sin t, cos t)ᵀ = −2cos t sin t + 2sin t cos t = 0 for every t. Geometrically, this matches the fact that f(cos t, sin t)=cos^2 t+sin^2 t=1, which stays constant along the circle.

How does rewriting the chain-rule product using the gradient turn it into an orthogonality statement?

Since ∇f is the transpose of the Jacobian of f, the chain-rule expression can be rewritten as an inner product: (f∘γ)′(t)=⟨∇f(γ(t)), γ′(t)⟩. In this example the inner product equals 0, so ∇f(γ(t)) is perpendicular to γ′(t). Orthogonality here explains why moving along the circle does not change f.

What geometric relationship is being highlighted between level sets and the gradient?

The example uses a circular path (a level set of f) and shows that the gradient at points on that path is orthogonal to the path’s tangent vector. That pattern signals the general idea: the gradient points in the direction normal to level sets, while tangents to level sets are perpendicular to it.
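This normal-direction pattern can be checked numerically for the example; a sketch assuming circles of illustrative radii, where the unit gradient is compared against the outward unit normal (x1/r, x2/r):

```python
import math

def grad_f(x1, x2):
    return (2 * x1, 2 * x2)

# On the circle x1^2 + x2^2 = r^2 (a level set of f), the normalized
# gradient coincides with the outward unit normal at every sample point:
for r in (1.0, 2.0):
    for t in (0.3, 1.1):
        x = (r * math.cos(t), r * math.sin(t))
        g = grad_f(*x)
        n = math.hypot(*g)
        unit_grad = (g[0] / n, g[1] / n)
        normal = (x[0] / r, x[1] / r)
        print(unit_grad, normal)  # the two directions agree
```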

Review Questions

  1. For a function f: R^n → R, what are the dimensions of its Jacobian and how does transposing it produce the gradient?
  2. In the example f(x1,x2)=x1^2+x2^2, compute ∇f(x1,x2) and describe how its direction changes across the plane.
  3. Why does (f∘γ)′(t)=0 for γ(t)=(cos t, sin t), and how does that translate into an inner product condition involving ∇f and γ′(t)?

Key Points

  1. The gradient ∇f(x̃) is the column vector of partial derivatives of a real-valued function f at x̃, obtained by transposing the 1×n Jacobian.
  2. For f(x1,x2)=x1^2+x2^2, the gradient is ∇f(x1,x2)=(2x1,2x2), forming an outward-pointing vector field.
  3. Contour lines of x1^2+x2^2 are circles centered at the origin, matching the symmetry of the gradient field.
  4. The multivariable chain rule expresses (f∘γ)′(t) as J_f(γ(t))·J_γ(t).
  5. For γ(t)=(cos t, sin t), the chain-rule calculation gives (f∘γ)′(t)=0 for all t, so f stays constant along the circle.
  6. Rewriting the chain-rule result using the gradient yields (f∘γ)′(t)=⟨∇f(γ(t)), γ′(t)⟩, so a zero derivative means orthogonality between ∇f and the curve’s tangent direction.

Highlights

The gradient turns a real-valued multivariable function into a vector field by collecting partial derivatives into a column vector.
For f(x1,x2)=x1^2+x2^2, the gradient is (2x1,2x2), pointing outward and growing with distance from the origin.
Composing with the circular path γ(t)=(cos t, sin t) produces (f∘γ)′(t)=0 for every t.
The zero derivative becomes an orthogonality claim: ⟨∇f(γ(t)), γ′(t)⟩=0, so the gradient is perpendicular to the circle’s tangent.
That orthogonality is presented as the geometric reason the function remains constant along the chosen path.