Measure Theory 16 | Proof of the Substitution Rule for Measure Spaces [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
The substitution rule for measure spaces hinges on a measurable map between two spaces and the way a measure pushes forward through that map. Given a measure space (X, μ) and a measurable space Y, a measurable function H: X → Y induces an image measure H*μ on Y, defined by (H*μ)(C) = μ(H^{-1}(C)) for measurable C ⊆ Y. For a measurable function G: Y → ℝ, the rule says that the integral of G over Y with respect to the image measure equals the integral of the composition G ∘ H over X with respect to μ, provided at least one of the two integrals exists. The proof builds the formula from characteristic functions, extends it to simple functions by linearity, and then reaches general measurable functions by approximation, using the definition of the Lebesgue integral.
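Written out as a formula, the statement reads as follows. This is a sketch in LaTeX notation; the subscript form H_*\mu is used here for the image measure that the text writes as H*μ.

```latex
\[
  \int_Y G \, d(H_*\mu) \;=\; \int_X (G \circ H) \, d\mu,
  \qquad\text{where}\qquad
  (H_*\mu)(C) = \mu\bigl(H^{-1}(C)\bigr)
  \ \text{ for measurable } C \subseteq Y.
\]
```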
First comes the characteristic-function case: take G to be the indicator function 𝟙_C of a measurable set C ⊆ Y. On the Y-side, integrating 𝟙_C against the image measure H*μ returns the measure of C under H*μ, which by definition is μ(H^{-1}(C)). On the X-side, integrating 𝟙_C ∘ H equals integrating the indicator of the set where H(x) ∈ C, which is exactly the same preimage H^{-1}(C). Since both sides compute the same measure, the substitution identity holds for characteristic functions.
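The characteristic-function case can be checked by hand on a finite space. The sketch below is purely illustrative: the six-point space X, the weights mu, the map H(x) = x mod 3, and the set C are all hypothetical choices, and on a finite space the integral is just a weighted sum.

```python
from fractions import Fraction

# Hypothetical finite example: X = {0,...,5} with point masses 1/21, ..., 6/21.
mu = {x: Fraction(x + 1, 21) for x in range(6)}

def H(x):
    """Hypothetical measurable map H: X -> Y = {0, 1, 2}."""
    return x % 3

def pushforward(measure, h):
    """Image measure: (h*measure)({y}) = measure(h^{-1}({y}))."""
    nu = {}
    for x, w in measure.items():
        nu[h(x)] = nu.get(h(x), Fraction(0)) + w
    return nu

def integrate(f, measure):
    """Integral of f against a measure given by point masses."""
    return sum(f(p) * w for p, w in measure.items())

C = {0, 2}  # a measurable subset of Y

def indicator_C(y):
    return 1 if y in C else 0

lhs = integrate(indicator_C, pushforward(mu, H))             # (H*mu)(C)
rhs = integrate(lambda x: indicator_C(H(x)), mu)             # integral of 1_C o H
preimage_mass = sum(w for x, w in mu.items() if H(x) in C)   # mu(H^{-1}(C))

print(lhs, rhs, preimage_mass)
```

All three quantities agree because each one adds up the weights of exactly those points x with H(x) ∈ C.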
Next, the argument moves to simple functions. Any nonnegative simple function on Y can be written as a finite linear combination of characteristic functions, G = Σ_{i=1}^n λ_i 𝟙_{C_i}. Linearity of the Lebesgue integral lets the coefficients λ_i be pulled out, and each term reduces to the characteristic-function case already proved. After applying the identity term-by-term, the sum can be recombined to show the substitution rule holds for all simple functions.
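The linearity step can be illustrated the same way. In this sketch the uniform weights, the map H(x) = x mod 3, and the coefficients and sets in `terms` are hypothetical; the Y-side is computed term by term through the indicator case, while the X-side integrates G ∘ H directly.

```python
from fractions import Fraction

# Hypothetical setup: uniform point masses on X = {0,...,5}, H maps into Y = {0, 1, 2}.
mu = {x: Fraction(1, 6) for x in range(6)}

def H(x):
    return x % 3

def image_measure_of(C):
    """(H*mu)(C) = mu(H^{-1}(C))."""
    return sum(w for x, w in mu.items() if H(x) in C)

# Simple function G = 2 * 1_{{0}} + 5 * 1_{{1, 2}} as (coefficient, set) pairs.
terms = [(Fraction(2), {0}), (Fraction(5), {1, 2})]

def G(y):
    return sum(lam for lam, C in terms if y in C)

# Y-side: pull the coefficients out, then use the indicator case on each term.
y_side = sum(lam * image_measure_of(C) for lam, C in terms)

# X-side: integrate the composition G o H against mu directly.
x_side = sum(G(H(x)) * w for x, w in mu.items())

print(y_side, x_side)
```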
To reach general nonnegative measurable functions, the proof uses the standard construction of the Lebesgue integral: for G ≥ 0, the integral is the supremum of the integrals of simple functions S with 0 ≤ S ≤ G. For each such simple S on Y, the composition S ∘ H is again a simple function on X, and the substitution rule already established for simple functions gives equality of the corresponding integrals. Because S ∘ H ≤ G ∘ H pointwise, taking suprema shows that the integral of G with respect to H*μ is at most the integral of G ∘ H with respect to μ. Equality follows by choosing simple functions that increase pointwise to G: their compositions with H increase to G ∘ H, and the monotone convergence theorem passes the identity to the limit on both sides. Finally, an arbitrary measurable G is split into its positive and negative parts, G = G⁺ − G⁻, and the nonnegative result applies to each part, since (G ∘ H)⁺ = G⁺ ∘ H and (G ∘ H)⁻ = G⁻ ∘ H. The conclusion: if one of the two integrals exists, then so does the other, and the substitution formula holds.
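One way to spell out the limit step is through an increasing sequence of simple functions. This is a sketch that assumes the monotone convergence theorem is available; the sequence S_n and the subscript notation H_*\mu are introduced here for illustration.

```latex
% Nonnegative case: choose simple functions 0 \le S_1 \le S_2 \le \dots with S_n \uparrow G.
\begin{align*}
\int_Y G \, d(H_*\mu)
  &= \lim_{n\to\infty} \int_Y S_n \, d(H_*\mu)
     && \text{(monotone convergence, } S_n \uparrow G\text{)} \\
  &= \lim_{n\to\infty} \int_X (S_n \circ H) \, d\mu
     && \text{(simple-function case)} \\
  &= \int_X (G \circ H) \, d\mu
     && \text{(monotone convergence, } S_n \circ H \uparrow G \circ H\text{)}
\end{align*}

% General case: split G = G^+ - G^- and use (G \circ H)^{\pm} = G^{\pm} \circ H.
\[
\int_Y G \, d(H_*\mu)
  = \int_Y G^+ \, d(H_*\mu) - \int_Y G^- \, d(H_*\mu)
  = \int_X (G \circ H)^+ \, d\mu - \int_X (G \circ H)^- \, d\mu
  = \int_X (G \circ H) \, d\mu .
\]
```

The second display is meaningful whenever the two terms on one side are not both infinite, which is the sense in which one integral exists exactly when the other does.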
Cornell Notes
A measurable map H: X → Y lets measures “push forward” via the image measure H*μ. The substitution rule then links integrals across the two spaces: for measurable G on Y, integrating G over Y with respect to H*μ matches integrating G ∘ H over X with respect to μ, assuming one side exists. The proof starts with G = 𝟙_C, where both sides reduce to μ(H^{-1}(C)). It extends to simple functions by linearity, since simple functions are finite sums of scaled indicators. For general nonnegative measurable G, the Lebesgue integral is defined as a supremum over simple functions below G; composing those simple functions with H preserves simplicity and the inequality, so the supremum carries through. The final step splits an arbitrary G into positive and negative parts and applies the nonnegative result to each.
Why does the substitution rule become a statement about preimages when G is a characteristic function?
How does proving the rule for characteristic functions automatically extend to simple functions?
What role does the supremum definition of the Lebesgue integral play for nonnegative measurable functions?
Why is it legitimate to treat S ∘ H as a simple function on X?
How does the proof handle a general measurable G that can take negative values?
Review Questions
- In the characteristic-function case, what measurable set on X appears naturally, and why?
- Which property of the Lebesgue integral is used to move from characteristic functions to simple functions?
- How does the supremum over simple functions below G lead to the substitution rule for nonnegative measurable G?
Key Points
1. A measurable map H: X → Y induces an image measure H*μ on Y, defined so that (H*μ)(C) = μ(H^{-1}(C)) for measurable C ⊆ Y.
2. For G = 𝟙_C, the substitution rule holds because both sides equal μ(H^{-1}(C)).
3. Simple functions reduce to characteristic functions via finite linear combinations, and linearity of the integral carries the identity over.
4. For nonnegative measurable G, the Lebesgue integral’s definition as a supremum over simple functions below G lets the substitution rule pass to the limit.
5. Composing simple functions with a measurable map keeps them simple, enabling the supremum argument on X.
6. For general measurable G, splitting into positive and negative parts extends the nonnegative result and ensures existence of the paired integral when one exists.