Measure Theory 16 | Proof of the Substitution Rule for Measure Spaces [dark version]

TL;DR

The substitution rule equates ∫_Y G d(H_*μ) with ∫_X (G∘H) dμ when H is measurable and the relevant integrals exist.


Briefing

The substitution rule for measure spaces lets integrals be transferred across a measurable map: integrating a function G on Y with respect to the image measure equals integrating the composition G∘H on X. The setting is a measurable map H: X → Y from a measure space (X, μ) into a measurable space Y, where Y carries the image measure ν = H_*μ, defined by ν(C) = μ(H^{-1}(C)) for measurable C ⊂ Y. For measurable G on Y, if at least one of the two integrals exists (in the extended sense), then so does the other, and the two are equal.
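As a finite sanity check of the statement (an illustration only, not part of the proof), the sketch below builds an image measure on a four-point space and compares both sides. The names `pushforward` and `integrate`, the measure `mu`, and the maps `H` and `G` are hypothetical choices made for this example.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, H):
    """Image measure H_*mu of a finite measure stored as {point: mass}."""
    nu = defaultdict(Fraction)      # missing keys start at mass 0
    for x, mass in mu.items():
        nu[H(x)] += mass            # the mass sitting at x lands on H(x)
    return dict(nu)


def integrate(f, measure):
    """Integral of f against a finite discrete measure."""
    return sum(f(p) * mass for p, mass in measure.items())


# Assumed example data: uniform mass on X = {-2, -1, 1, 2}.
mu = {x: Fraction(1, 4) for x in (-2, -1, 1, 2)}


def H(x):
    return x * x                    # H: X -> Y = {1, 4}


def G(y):
    return 3 * y + 1                # a function on Y


nu = pushforward(mu, H)
lhs = integrate(G, nu)                      # integral of G over Y w.r.t. H_*mu
rhs = integrate(lambda x: G(H(x)), mu)      # integral of G∘H over X w.r.t. mu
print(nu, lhs, rhs)
```

Here H is not injective, so mass from two points of X accumulates on each point of Y, and both integrals still agree.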

The proof starts with the simplest measurable functions on Y: characteristic functions χ_C of measurable sets C ⊂ Y. For such functions, the integral on the Y-side is just the measure of C. Because the measure on Y is the image measure H_*μ, that quantity equals μ(H^{-1}(C)). On the X-side, the composed function χ_C∘H takes value 1 exactly when H(x) ∈ C, which is the same as x ∈ H^{-1}(C). Therefore, integrating χ_C∘H over X again produces μ(H^{-1}(C)). This establishes the substitution identity for characteristic functions.
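The characteristic-function step can be checked on a small discrete example. This is a sketch under assumed data: the three-point measure `mu`, the map `H`, and the set `C` are made up for illustration.

```python
from fractions import Fraction

# Assumed example data: a probability measure on X = {0, 1, 2}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}


def H(x):
    return x % 2                    # H: X -> Y = {0, 1}


C = {0}                             # a measurable subset of Y


def chi_C(y):
    return 1 if y in C else 0       # characteristic function of C


# Y-side: build the image measure, then integrate chi_C against it.
nu = {}
for x, mass in mu.items():
    nu[H(x)] = nu.get(H(x), Fraction(0)) + mass
y_side = sum(chi_C(y) * mass for y, mass in nu.items())

# X-side: integrate the composition chi_C∘H against mu.
x_side = sum(chi_C(H(x)) * mass for x, mass in mu.items())

# Both should equal the mu-measure of the preimage H^{-1}(C).
preimage = {x for x in mu if H(x) in C}
mu_preimage = sum(mu[x] for x in preimage)
print(y_side, x_side, mu_preimage)
```

The three quantities are computed along different routes, which mirrors the two halves of the argument above.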

Next comes the extension to simple functions. Any nonnegative simple function can be written as a finite linear combination of characteristic functions of measurable sets, say G = Σ_{i=1}^n λ_i χ_{C_i} with λ_i ≥ 0. Composing with H gives G∘H = Σ_{i=1}^n λ_i (χ_{C_i}∘H), which is again a simple function on X. Linearity of the integral allows the substitution rule to be applied term-by-term, so the equality holds for all nonnegative simple measurable G by combining the already-proved set-level identity with the integral’s linearity.
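The term-by-term argument for simple functions can be sketched numerically as follows. The coefficients, the sets C_i, the measure `mu`, and the map `H` are assumptions chosen only to make the arithmetic visible; the sets deliberately overlap.

```python
from fractions import Fraction

# Assumed example data: uniform mass on X = {0, ..., 5}.
mu = {x: Fraction(1, 6) for x in range(6)}


def H(x):
    return x % 3                    # H: X -> Y = {0, 1, 2}


# G = 2 * chi_{C_1} + 5 * chi_{C_2}, stored as (lambda_i, C_i) pairs.
terms = [(2, {0, 1}), (5, {1, 2})]


def G(y):
    return sum(lam for lam, C in terms if y in C)


# Image measure on Y.
nu = {}
for x, mass in mu.items():
    nu[H(x)] = nu.get(H(x), Fraction(0)) + mass


def nu_of(C):
    """(H_*mu)(C) for a subset C of Y."""
    return sum(nu.get(y, Fraction(0)) for y in C)


# Y-side via linearity: sum of lambda_i * (H_*mu)(C_i).
y_side = sum(lam * nu_of(C) for lam, C in terms)

# X-side: integrate G∘H directly against mu.
x_side = sum(G(H(x)) * mass for x, mass in mu.items())
print(y_side, x_side)
```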

To reach general nonnegative measurable functions, the proof uses the definition of the (extended) Lebesgue integral as a supremum over simple functions. For a nonnegative measurable G on Y, one considers simple functions G-tilde on Y that lie pointwise below G. Composing with H transfers these to X: G-tilde∘H is a simple function on X and satisfies G-tilde∘H ≤ G∘H. Since the substitution rule already holds for each such G-tilde, taking the supremum gives ∫_Y G d(H_*μ) ≤ ∫_X (G∘H) dμ. The supremum on the X-side runs over all simple functions below G∘H, not only those of the form G-tilde∘H, so equality needs one more step; a standard way to get it is to pick simple functions G_n increasing pointwise to G, note that G_n∘H increases pointwise to G∘H, and apply the monotone convergence theorem on both sides.
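The approximation step can be illustrated with the usual dyadic simple functions G_n = min(n, floor(2^n G) / 2^n). On a finite space every function is already simple, so the sketch below only shows the mechanism: each G_n satisfies the identity exactly, and the common values increase toward the integral of G∘H. The data `mu`, `H`, `G` and the helper `dyadic` are hypothetical.

```python
import math
from fractions import Fraction

# Assumed example data: uniform mass on X = {1, 2, 3, 4}.
mu = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}


def H(x):
    return x % 3                    # H: X -> Y = {0, 1, 2}


def G(y):
    return Fraction(y, 3)           # nonnegative, with non-dyadic values


# Image measure on Y.
nu = {}
for x, mass in mu.items():
    nu[H(x)] = nu.get(H(x), Fraction(0)) + mass


def dyadic(g, n):
    """n-th dyadic approximant of g from below: min(n, floor(2^n g) / 2^n)."""
    return lambda p: min(Fraction(n), Fraction(math.floor(2**n * g(p)), 2**n))


def integrate(f, measure):
    return sum(f(p) * mass for p, mass in measure.items())


y_side, x_side = [], []
for n in range(1, 11):
    G_n = dyadic(G, n)
    y_side.append(integrate(G_n, nu))                       # over Y w.r.t. H_*mu
    x_side.append(integrate(lambda x: G_n(H(x)), mu))       # over X w.r.t. mu

limit = integrate(lambda x: G(H(x)), mu)
print(x_side[-1], limit)
```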

Finally, the argument handles arbitrary measurable G by splitting it into positive and negative parts. The substitution rule applies separately to the nonnegative components, and if one of the two integrals exists (in the extended sense), the other exists as well. The result is a complete proof of the substitution rule: integrating G over Y with respect to H_*μ equals integrating G∘H over X with respect to μ, provided the relevant integrability condition is satisfied.
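The splitting into positive and negative parts can be checked in the same finite setting. This is a sketch with assumed data; `pos`, `neg`, `mu`, `H`, and `G` are illustrative names, with G chosen to take both signs.

```python
from fractions import Fraction

# Assumed example data: a non-uniform probability measure on X.
mu = {-2: Fraction(1, 8), -1: Fraction(3, 8), 1: Fraction(1, 4), 2: Fraction(1, 4)}


def H(x):
    return x * x                    # H: X -> Y = {1, 4}


def G(y):
    return y - 2                    # G(1) = -1, G(4) = 2


def pos(f):
    return lambda p: max(f(p), 0)   # positive part f^+


def neg(f):
    return lambda p: max(-f(p), 0)  # negative part f^-


def integrate(f, measure):
    return sum(f(p) * mass for p, mass in measure.items())


# Image measure on Y.
nu = {}
for x, mass in mu.items():
    nu[H(x)] = nu.get(H(x), Fraction(0)) + mass


def GH(x):
    return G(H(x))                  # the composition G∘H on X


y_pos, y_neg = integrate(pos(G), nu), integrate(neg(G), nu)
x_pos, x_neg = integrate(pos(GH), mu), integrate(neg(GH), mu)
print(y_pos - y_neg, x_pos - x_neg)
```

The positive parts agree with each other and the negative parts agree with each other, which is what lets existence of one integral carry over to the other.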

Cornell Notes

The substitution rule transfers integrals across a measurable map H: X → Y. When Y carries the image measure H_*μ, the identity ∫_Y G d(H_*μ) = ∫_X (G∘H) dμ holds whenever one side exists (then the other does too). The proof begins with characteristic functions χ_C of measurable sets C ⊂ Y, where both sides reduce to μ(H^{-1}(C)). It then extends to simple functions by writing them as finite linear combinations of characteristic functions and using linearity of the integral. For general nonnegative measurable G, the integral is defined as a supremum over simple functions below G; composing those simple functions with H preserves simplicity and the inequality, so the supremum matches on both sides. Arbitrary measurable G follows by splitting into positive and negative parts.

Why does the substitution rule become a set-measure identity for characteristic functions χ_C?

For χ_C on Y, the integral ∫_Y χ_C d(H_*μ) equals the measure of C under the image measure, i.e., (H_*μ)(C). By definition of image measure, (H_*μ)(C) = μ(H^{-1}(C)). On the X-side, χ_C∘H equals 1 exactly when H(x) ∈ C, which is equivalent to x ∈ H^{-1}(C); otherwise it is 0. Thus ∫_X (χ_C∘H) dμ also equals μ(H^{-1}(C)), matching the left side.

How does the proof extend from characteristic functions to simple functions?

A simple function G can be expressed as a finite linear combination of characteristic functions: G = Σ_{i=1}^n λ_i χ_{C_i}. Linearity of the integral lets the substitution rule be applied term-by-term: each ∫_Y χ_{C_i} d(H_*μ) becomes ∫_X (χ_{C_i}∘H) dμ. Pulling out coefficients λ_i and summing the resulting equalities yields the substitution identity for the entire simple function G.

What role does the supremum definition of the integral play for nonnegative measurable functions?

For nonnegative measurable G, the integral is defined as the supremum over integrals of simple functions G-tilde that satisfy G-tilde ≤ G pointwise. Composing with H turns each G-tilde into a simple function on X (namely G-tilde∘H), and the inequality transfers: G-tilde∘H ≤ G∘H. Since substitution holds for each simple G-tilde, taking the supremum shows ∫_Y G d(H_*μ) ≤ ∫_X (G∘H) dμ. Equality follows, for example, by choosing the G-tilde as a sequence increasing pointwise to G and applying monotone convergence on both sides.

Why can the proof handle an arbitrary measurable G by splitting into positive and negative parts?

Any measurable G can be decomposed into G = G^+ − G^−, where G^+ = max(G, 0) and G^− = max(−G, 0) are both nonnegative and measurable. The substitution rule is already established for nonnegative functions, so it applies to G^+ and G^− separately. Because (G∘H)^+ = G^+∘H and (G∘H)^− = G^−∘H, the positive parts have the same integral on both sides, and so do the negative parts. Hence one side avoids the undefined form ∞ − ∞ exactly when the other does, and subtracting the two equalities gives the substitution rule for G itself.

What is the minimal measurability/integrability structure needed for the rule to work?

Measurability of H ensures that compositions like χ_C∘H and G∘H are measurable when G is measurable, and that H^{-1}(C) is a measurable subset of X for every measurable C ⊂ Y. The image measure is then defined by (H_*μ)(C) = μ(H^{-1}(C)), so measuring a set in Y means measuring its preimage in X. For the equality, the proof relies on the condition that at least one of the two integrals exists (in the extended sense); then the construction via nonnegative approximations and positive/negative splitting ensures the other side is well-defined as well.

Review Questions

In the characteristic-function case, which set on X determines the value of ∫_X (χ_C∘H) dμ?
How does linearity of the integral combine with the characteristic-function result to prove the substitution rule for simple functions?
Why does the supremum over simple functions below G on Y translate into the same supremum behavior for G∘H on X?

Key Points

1
The substitution rule equates ∫_Y G d(H_*μ) with ∫_X (G∘H) dμ when H is measurable and the relevant integrals exist.
2
For characteristic functions χ_C, both sides reduce to μ(H^{-1}(C)) via the definition of image measure and the preimage characterization of χ_C∘H.
3
Simple functions follow by writing them as finite linear combinations of characteristic functions and applying linearity of the integral term-by-term.
4
Nonnegative measurable functions use the integral’s definition as a supremum over simple functions below G, with composition preserving simplicity and inequalities.
5
Arbitrary measurable functions are handled by decomposing into positive and negative parts and applying the nonnegative case separately.
6
If one of the two integrals exists (extended sense), the other exists as well, because the positive parts have equal integrals on both sides and so do the negative parts.

Highlights

The proof’s backbone is the identity (H_*μ)(C) = μ(H^{-1}(C)), which makes the characteristic-function case immediate.

Once substitution holds for χ_C, linearity upgrades it to all simple functions without additional measure-theoretic machinery.

For general nonnegative G, the supremum definition of the integral turns substitution into a limit-of-simple-functions argument.

Splitting G into G^+ and G^− ensures the rule works beyond the nonnegative setting while preserving existence of integrals.

Topics

Measure Theory
Substitution Rule
Image Measure
Characteristic Functions
Lebesgue Integral