Measure Theory 16 | Proof of the Substitution Rule for Measure Spaces [dark version]

TL;DR

A measurable map H: X → Y induces an image measure H*μ on Y, defined so that (H*μ)(C) = μ(H^{-1}(C)) for measurable C ⊆ Y.


Briefing

The substitution rule for measure spaces hinges on a measurable map between two spaces and the way measures push forward through that map. Given a measure space (X, μ) and a measurable space Y, a measurable function H: X → Y induces an image measure H*μ on Y. For a measurable function G: Y → ℝ, the rule says the integral of G over Y with respect to the image measure equals the integral of the composition G ∘ H over X with respect to μ, provided at least one of the two integrals exists. The proof builds the formula from characteristic functions, extends it to simple functions by linearity, and finally reaches general functions by approximation using the definition of the Lebesgue integral.
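Written out as a formula, the statement reads as follows. The σ-algebra names 𝒜 and 𝒞 and the subscript notation H_*μ (for what the notes write as H*μ) are assumptions made here for readability.

```latex
% Setting (notation assumed): (X, \mathcal{A}, \mu) a measure space,
% (Y, \mathcal{C}) a measurable space, H : X \to Y measurable,
% G : Y \to \mathbb{R} measurable.
\[
  \int_Y G \, d(H_*\mu) \;=\; \int_X (G \circ H) \, d\mu ,
  \qquad\text{where}\qquad
  (H_*\mu)(C) \;=\; \mu\bigl(H^{-1}(C)\bigr) \quad \text{for } C \in \mathcal{C}.
\]
```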

First comes the characteristic-function case: take G to be the indicator function 𝟙_C of a measurable set C ⊆ Y. On the Y-side, integrating 𝟙_C against the image measure H*μ returns the measure of C under H*μ, which by definition is μ(H^{-1}(C)). On the X-side, integrating 𝟙_C ∘ H equals integrating the indicator of the set where H(x) ∈ C, which is exactly the same preimage H^{-1}(C). Since both sides compute the same measure, the substitution identity holds for characteristic functions.
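The indicator case can be condensed into a single chain of equalities, written here with H_*μ for the image measure:

```latex
\[
  \int_Y \mathbb{1}_C \, d(H_*\mu)
  \;=\; (H_*\mu)(C)
  \;=\; \mu\bigl(H^{-1}(C)\bigr)
  \;=\; \int_X \mathbb{1}_{H^{-1}(C)} \, d\mu
  \;=\; \int_X (\mathbb{1}_C \circ H) \, d\mu .
\]
% The last step uses the pointwise identity
% \mathbb{1}_C(H(x)) = \mathbb{1}_{H^{-1}(C)}(x) for every x in X.
```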

Next, the argument moves to simple functions. Any nonnegative simple function on Y can be written as a finite linear combination of characteristic functions, G = Σ_{i=1}^n λ_i 𝟙_{C_i}. Linearity of the Lebesgue integral lets the coefficients λ_i be pulled out, and each term reduces to the characteristic-function case already proved. After applying the identity term-by-term, the sum can be recombined to show the substitution rule holds for all simple functions.
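As a worked version of this step, with G = Σ λ_i 𝟙_{C_i} and λ_i ≥ 0 as in the paragraph above:

```latex
\begin{align*}
  \int_Y G \, d(H_*\mu)
    &= \sum_{i=1}^{n} \lambda_i \int_Y \mathbb{1}_{C_i} \, d(H_*\mu)
       && \text{(linearity on } Y\text{)} \\
    &= \sum_{i=1}^{n} \lambda_i \int_X (\mathbb{1}_{C_i} \circ H) \, d\mu
       && \text{(indicator case)} \\
    &= \int_X \Bigl( \sum_{i=1}^{n} \lambda_i \, \mathbb{1}_{C_i} \circ H \Bigr) d\mu
       && \text{(linearity on } X\text{)} \\
    &= \int_X (G \circ H) \, d\mu .
\end{align*}
```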

To reach general nonnegative measurable functions, the proof uses the standard construction of the Lebesgue integral: for G ≥ 0, its integral is the supremum of the integrals of nonnegative simple functions S ≤ G. For each such simple S on Y, the composition S ∘ H is again a simple function on X, and the substitution rule already established for simple functions gives equality of the corresponding integrals. Because S ∘ H ≤ G ∘ H pointwise, taking the supremum over S gives ∫_Y G d(H*μ) ≤ ∫_X (G ∘ H) dμ directly; the reverse inequality holds because the functions S ∘ H already suffice to approximate G ∘ H, which follows from monotone convergence along an increasing sequence of simple functions S_n ↑ G. Finally, an arbitrary measurable G can be split into positive and negative parts, and the argument applies to each part separately. The conclusion: if one of the two integrals exists (finite or not, depending on the setup), the other exists as well, and the substitution formula holds.
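One standard way to make the limit step precise uses an increasing sequence of simple functions and the monotone convergence theorem. This is offered as a sketch and is not necessarily the route taken in the lecture; the sequence (S_n) is an assumption introduced here.

```latex
% Assumption: (S_n) are nonnegative simple functions on Y with S_n \uparrow G pointwise.
% Then S_n \circ H \uparrow G \circ H pointwise on X.
\begin{align*}
  \int_Y G \, d(H_*\mu)
    &= \lim_{n \to \infty} \int_Y S_n \, d(H_*\mu)
       && \text{(monotone convergence on } Y\text{)} \\
    &= \lim_{n \to \infty} \int_X (S_n \circ H) \, d\mu
       && \text{(simple-function case)} \\
    &= \int_X (G \circ H) \, d\mu
       && \text{(monotone convergence on } X\text{)} .
\end{align*}
```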

Cornell Notes

A measurable map H: X → Y lets measures “push forward” via the image measure H*μ. The substitution rule then links integrals across the two spaces: for measurable G on Y, integrating G over Y with respect to H*μ matches integrating G ∘ H over X with respect to μ, assuming one side exists. The proof starts with G = 𝟙_C, where both sides reduce to μ(H^{-1}(C)). It extends to simple functions by linearity, since simple functions are finite sums of scaled indicators. For general nonnegative measurable G, the Lebesgue integral is defined as a supremum over simple functions below G; composing those simple functions with H preserves simplicity and the inequality, so the supremum carries through. The final step splits an arbitrary G into positive and negative parts and applies the nonnegative result to each.
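On a finite set every function is simple, so the identity can be checked by direct summation. The following minimal sketch does this with exact fractions; the point masses `mu`, the map `H`, the function `G`, and the helper names `pushforward` and `integrate` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite example: X = {0, ..., 5} with point masses mu({x}) = (x + 1) / 2.
mu = {x: Fraction(x + 1, 2) for x in range(6)}


def H(x):
    """Measurable map X -> Y = {0, 1, 2} (every map on a finite set is measurable)."""
    return x % 3


def G(y):
    """Function on Y; it is negative at y = 0, so both signs occur."""
    return y * y - 1


def pushforward(measure, h):
    """Image measure on points: (H*mu)({y}) = mu(H^{-1}({y}))."""
    image = defaultdict(Fraction)
    for x, mass in measure.items():
        image[h(x)] += mass
    return dict(image)


def integrate(f, measure):
    """Integral of f against a measure given by point masses."""
    return sum(f(p) * mass for p, mass in measure.items())


image_mu = pushforward(mu, H)
lhs = integrate(G, image_mu)            # integral of G over Y w.r.t. H*mu
rhs = integrate(lambda x: G(H(x)), mu)  # integral of G o H over X w.r.t. mu
print(lhs, rhs)
```

Both values agree, as the substitution rule predicts. The example only exercises the simple-function case; it says nothing about the limit step needed for general measurable functions.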

Why does the substitution rule become a statement about preimages when G is a characteristic function?

When G = 𝟙_C for a measurable set C ⊆ Y, the left side computes ∫_Y 𝟙_C d(H*μ). By definition of the image measure, (H*μ)(C) = μ(H^{-1}(C)). On the right side, 𝟙_C ∘ H is 1 exactly when H(x) ∈ C, i.e., when x ∈ H^{-1}(C). Therefore ∫_X (𝟙_C ∘ H) dμ = μ(H^{-1}(C)) as well, so both sides match.

How does proving the rule for characteristic functions automatically extend to simple functions?

A simple function on Y has the form G = Σ_{i=1}^n λ_i 𝟙_{C_i}. Linearity of the Lebesgue integral lets the integral of G be written as Σ λ_i times the integral of each 𝟙_{C_i}. Each term is handled by the characteristic-function case, and recombining the terms yields ∫_Y G d(H*μ) = ∫_X (G ∘ H) dμ for simple G.

What role does the supremum definition of the Lebesgue integral play for nonnegative measurable functions?

For G ≥ 0 on Y, ∫_Y G d(H*μ) is defined as sup{∫_Y S d(H*μ) : S is a nonnegative simple function with S ≤ G}. For each such S on Y, the composition S ∘ H is a simple function on X and satisfies S ∘ H ≤ G ∘ H. Since the substitution rule already holds for simple functions, ∫_Y S d(H*μ) = ∫_X (S ∘ H) dμ. Taking suprema over all S ≤ G, together with monotone convergence along an increasing sequence of simple functions S_n ↑ G, transfers the equality to the integrals of G and G ∘ H.

Why is it legitimate to treat S ∘ H as a simple function on X?

Composing a simple function S (built from finitely many indicator sets C_i on Y) with a measurable map H produces a function on X that is still a finite linear combination of indicator functions, now of the preimages H^{-1}(C_i). Because H is measurable, those preimages are measurable sets in X, so S ∘ H remains a simple function.
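The pointwise identity behind this answer, written out for S = Σ λ_i 𝟙_{C_i}:

```latex
\[
  (S \circ H)(x)
  \;=\; \sum_{i=1}^{n} \lambda_i \, \mathbb{1}_{C_i}\bigl(H(x)\bigr)
  \;=\; \sum_{i=1}^{n} \lambda_i \, \mathbb{1}_{H^{-1}(C_i)}(x)
  \qquad \text{for all } x \in X .
\]
% Each H^{-1}(C_i) is a measurable subset of X because H is measurable.
```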

How does the proof handle a general measurable G that can take negative values?

Any measurable G can be decomposed into its positive and negative parts: G = G^+ − G^−, with G^+, G^− ≥ 0. The substitution rule is proved for nonnegative functions, so it applies separately to G^+ and G^−. If one of the integrals exists, the corresponding other side exists as well, and combining the two parts yields the substitution identity for G.
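The decomposition step can be sketched as follows. It relies on the pointwise identities (G ∘ H)^± = G^± ∘ H and assumes the differences are not of the form ∞ − ∞, which is what existence of the integral means here.

```latex
% Positive and negative parts commute with composition:
% (G \circ H)^{+} = G^{+} \circ H, \qquad (G \circ H)^{-} = G^{-} \circ H.
\begin{align*}
  \int_Y G \, d(H_*\mu)
    &= \int_Y G^{+} \, d(H_*\mu) - \int_Y G^{-} \, d(H_*\mu) \\
    &= \int_X (G^{+} \circ H) \, d\mu - \int_X (G^{-} \circ H) \, d\mu
       && \text{(nonnegative case, twice)} \\
    &= \int_X (G \circ H)^{+} \, d\mu - \int_X (G \circ H)^{-} \, d\mu
     \;=\; \int_X (G \circ H) \, d\mu .
\end{align*}
```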

Review Questions

In the characteristic-function case, what measurable set on X appears naturally, and why?
Which property of the Lebesgue integral is used to move from characteristic functions to simple functions?
How does the supremum over simple functions below G lead to the substitution rule for nonnegative measurable G?

Key Points

1. A measurable map H: X → Y induces an image measure H*μ on Y, defined so that (H*μ)(C) = μ(H^{-1}(C)) for measurable C ⊆ Y.
2. For G = 𝟙_C, the substitution rule holds because both sides equal μ(H^{-1}(C)).
3. Simple functions reduce to characteristic functions via finite linear combinations, and linearity of the integral carries the identity over.
4. For nonnegative measurable G, the Lebesgue integral’s definition as a supremum over simple functions below G lets the substitution rule pass to the limit.
5. Composing simple functions with a measurable map keeps them simple, enabling the supremum argument on X.
6. For general measurable G, splitting into positive and negative parts extends the nonnegative result and ensures existence of the paired integral when one exists.

Highlights

The proof’s backbone is that integrating an indicator function against an image measure turns into measuring a preimage: (H*μ)(C) = μ(H^{-1}(C)).

Linearity is the bridge from characteristic functions to simple functions, turning the substitution rule into a term-by-term identity.

For nonnegative functions, the supremum definition of the Lebesgue integral makes the substitution rule stable under approximation by simple functions.

Handling negative values comes from decomposing G into G^+ and G^− and applying the nonnegative case twice.

Topics

Measure Spaces
Substitution Rule
Image Measure
Lebesgue Integral
Measurable Functions

Mentioned

μ
H*μ