Probability Theory 8 | Bayes's Theorem and Total Probability

4 min read

Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.

TL;DR

Bayes’s theorem is obtained by equating two formulas for the same intersection probability P(A∩B) and rearranging.

Briefing

Bayes’s theorem and the law of total probability are derived from one simple idea: conditional probability is built from intersections. Starting with the definition of conditional probability, P(A|B)=P(A∩B)/P(B), the transcript flips the conditioning to get P(B|A)=P(A∩B)/P(A). Because the intersection term P(A∩B) is the same in both expressions, the two equations combine into Bayes’s theorem, letting probabilities be “reversed” between A given B and B given A. The key practical move is that Bayes’s theorem trades an unknown conditional probability for known ones—at the cost of dividing by P(A) (or P(B), depending on the arrangement).
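The derivation can be checked numerically. The joint and marginal probabilities below are made-up illustrative values, not from the video; the point is only that both conditional probabilities share the same intersection term, so the Bayes rearrangement holds:

```python
# Illustrative (made-up) probabilities for events A and B.
p_a_and_b = 0.12   # P(A ∩ B)
p_b = 0.30         # P(B)
p_a = 0.40         # P(A)

p_a_given_b = p_a_and_b / p_b   # definition: P(A|B) = P(A∩B)/P(B)
p_b_given_a = p_a_and_b / p_a   # flipped:    P(B|A) = P(A∩B)/P(A)

# Bayes's theorem: P(A|B) = P(B|A) P(A) / P(B) — both routes agree.
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
```

The only requirement, as noted below, is that the conditioning event has positive probability so the division is valid.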

The law of total probability then answers a different but related question: how to compute P(A) when A can happen through multiple disjoint “routes.” First, split the sample space Ω using an event B and its complement B^c, which forms a disjoint union Ω = B ∪ B^c. Intersect A with each part, rewrite P(A∩B) and P(A∩B^c) as conditional probabilities times the conditioning event probabilities, and add them. This yields P(A)=P(A|B)P(B)+P(A|B^c)P(B^c). The same logic extends to countably many disjoint sets {B_i} whose union is Ω: if the B_i are disjoint and cover the whole space, then P(A)=∑_i P(A|B_i)P(B_i). In the infinite case, the sum becomes a series, justified by σ-additivity.
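The two-part formula can be sketched in a few lines. The values for P(B) and the conditionals are hypothetical, chosen only to show the weighted-sum mechanics:

```python
# Hypothetical two-part split of the sample space: B and its complement B^c.
p_b = 0.25
p_bc = 1 - p_b          # P(B^c): the two parts cover Ω and are disjoint
p_a_given_b = 0.8       # hypothetical P(A|B)
p_a_given_bc = 0.1      # hypothetical P(A|B^c)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_a = p_a_given_b * p_b + p_a_given_bc * p_bc   # 0.8*0.25 + 0.1*0.75
```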

Those tools are then put to work on the Monty Hall problem, used as a familiar test case for Bayes’s theorem plus the law of total probability. There are three doors: one hides a car (the prize) and two hide goats. A player initially picks a door (say door 1). The show host then opens one of the two remaining doors, always revealing a goat. Finally, the player either switches to the other unopened door or stays with the original choice.

To compute the benefit of switching, the transcript defines events C_j (“the car is behind door j”) and S_j (“the show master opens door j”). Focusing on the case where the host opens door 3: P(S3|C3)=0, because the host never opens the door hiding the car. P(S3|C2)=1, because if the car is behind door 2, the host has no choice but to open door 3 (the only remaining goat door). And if the car is behind door 1, the host chooses between the two goat doors, giving P(S3|C1)=1/2.

Bayes’s theorem is then used to find the probability of the car being behind the switched-to door (door 2) given that the host opened door 3: P(C2|S3)=P(S3|C2)P(C2) / P(S3). The denominator P(S3) is computed via the law of total probability as a sum over the mutually exclusive car locations: P(S3)=P(S3|C1)P(C1)+P(S3|C2)P(C2)+P(S3|C3)P(C3). With a fair game, P(C1)=P(C2)=P(C3)=1/3. Plugging in the conditional values (1/2, 1, and 0) gives P(S3)=1/2, so the Bayes ratio evaluates to (1·1/3)/(1/2)=2/3. The conclusion is that switching yields a 2/3 chance of winning the car, while staying yields 1/3, demonstrating how Bayes’s theorem and total probability turn conditional information into a concrete decision advantage.
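The calculation above translates directly into a few lines of arithmetic. This sketch assumes, as in the transcript, that the player picked door 1 and the host opened door 3:

```python
# Priors P(C_j): the car is equally likely behind each of the three doors.
priors = {1: 1/3, 2: 1/3, 3: 1/3}

# Likelihoods P(S3 | C_j): how the host's rule (always open a goat door,
# never the player's door 1) determines the chance of opening door 3.
host_opens_3 = {1: 1/2, 2: 1.0, 3: 0.0}

# Law of total probability: P(S3) = sum_j P(S3|C_j) P(C_j)
p_s3 = sum(host_opens_3[j] * priors[j] for j in priors)

# Bayes: P(C2|S3) = P(S3|C2) P(C2) / P(S3)  -> the switching door
p_c2_given_s3 = host_opens_3[2] * priors[2] / p_s3
# P(C1|S3) -> the staying door
p_c1_given_s3 = host_opens_3[1] * priors[1] / p_s3
```

Here `p_s3` comes out to 1/2, `p_c2_given_s3` to 2/3, and `p_c1_given_s3` to 1/3, matching the transcript's conclusion.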

Cornell Notes

Bayes’s theorem comes directly from the definition of conditional probability: P(A|B)=P(A∩B)/P(B). Swapping A and B and using the shared intersection term gives a way to “reverse” conditional probabilities, at the cost of dividing by P(A) (or P(B)). The law of total probability computes P(A) by splitting the sample space into disjoint parts. For two parts, Ω=B∪B^c, it becomes P(A)=P(A|B)P(B)+P(A|B^c)P(B^c). For countably many disjoint sets {B_i} that cover Ω, it generalizes to P(A)=∑_i P(A|B_i)P(B_i). The Monty Hall problem applies both: conditional host behavior updates the odds so switching wins with probability 2/3.

How does Bayes’s theorem follow from conditional probability without any extra assumptions?

Conditional probability is defined as P(A|B)=P(A∩B)/P(B). Flipping the conditioning gives P(B|A)=P(A∩B)/P(A). Since both expressions share the same intersection P(A∩B), they can be combined to obtain P(A|B)=P(B|A)P(A)/P(B) (equivalently, P(B|A)=P(A|B)P(B)/P(A)). The only requirement is that the conditioning event has positive probability so the division is valid.

What is the core mechanism behind the law of total probability?

It partitions the sample space into disjoint cases and adds their contributions. For Ω=B∪B^c, the event A splits as A=(A∩B)∪(A∩B^c), a disjoint union. Measure additivity gives P(A)=P(A∩B)+P(A∩B^c). Rewriting each intersection as P(A∩B)=P(A|B)P(B) and P(A∩B^c)=P(A|B^c)P(B^c) yields the two-term formula.

How does the law of total probability extend from two sets to infinitely many?

Replace B and B^c with a countable disjoint family {B_i} whose union is Ω. Then A decomposes as A=⋃_i (A∩B_i) with disjoint pieces, so σ-additivity turns P(A) into a sum (or series): P(A)=∑_i P(A∩B_i)=∑_i P(A|B_i)P(B_i). When there are infinitely many B_i, the result is an infinite series.
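A minimal numerical sketch of the countable case, with made-up weights (not from the video): take P(B_i)=2^{-i} for i=1,2,…, which sum to 1, and a hypothetical conditional P(A|B_i)=1/i. The series is then approximated by its partial sums:

```python
# Truncated series for P(A) = sum_i P(A|B_i) P(B_i) with the
# hypothetical choices P(B_i) = 2^{-i} and P(A|B_i) = 1/i.
def p_a_truncated(n):
    return sum((1 / i) * 2 ** (-i) for i in range(1, n + 1))

# For these weights the series converges to ln(2) ≈ 0.6931,
# illustrating that sigma-additivity produces a well-defined limit.
```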

In Monty Hall, why is P(S3|C3)=0 when the host opens door 3?

S3 means the show master opens door 3. If the car is behind door 3 (event C3), then door 3 is not a goat door. The host’s rule is to open a goat door, so the host would never open the car door. Therefore, given C3, the probability of opening door 3 is zero: P(S3|C3)=0.

Why does the calculation yield P(C2|S3)=2/3 when switching?

Bayes’s theorem gives P(C2|S3)=P(S3|C2)P(C2)/P(S3). Here P(S3|C2)=1 because if the car is behind door 2, the host must open the only remaining goat door (door 3). The denominator uses total probability: P(S3)=P(S3|C1)P(C1)+P(S3|C2)P(C2)+P(S3|C3)P(C3). With a fair game, P(C1)=P(C2)=P(C3)=1/3, and the conditional values are P(S3|C1)=1/2, P(S3|C2)=1, P(S3|C3)=0. So P(S3)=(1/2)(1/3)+(1)(1/3)+0=1/2, making P(C2|S3)=(1)(1/3)/(1/2)=2/3.
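The exact result can also be sanity-checked by simulation. The door numbering and the helper function below are illustrative, not from the transcript; the simulation assumes the player always picks door 1 and the host picks uniformly when both goat doors are available:

```python
import random

def play(switch, rng):
    """Simulate one Monty Hall round; return True if the player wins."""
    car = rng.choice([1, 2, 3])
    pick = 1                                   # player always picks door 1
    goats = [d for d in (2, 3) if d != car]    # doors the host may open
    opened = rng.choice(goats)                 # host reveals a goat
    if switch:
        pick = next(d for d in (1, 2, 3) if d not in (pick, opened))
    return pick == car

rng = random.Random(0)                         # seeded for reproducibility
trials = 100_000
wins_switch = sum(play(True, rng) for _ in range(trials)) / trials
wins_stay = sum(play(False, rng) for _ in range(trials)) / trials
# wins_switch hovers near 2/3 and wins_stay near 1/3.
```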

Review Questions

  1. What algebraic step turns the two conditional probability expressions into Bayes’s theorem?
  2. How does the disjoint union property justify turning P(A∩B)+P(A∩B^c) into P(A)?
  3. In Monty Hall, which conditional probabilities are needed to compute P(C2|S3), and how are they obtained from the host’s behavior?

Key Points

  1. Bayes’s theorem is obtained by equating two formulas for the same intersection probability P(A∩B) and rearranging.

  2. The law of total probability computes P(A) by splitting Ω into disjoint parts and summing P(A|B_i)P(B_i).

  3. For two-part splits, Ω=B∪B^c leads to P(A)=P(A|B)P(B)+P(A|B^c)P(B^c).

  4. For countably many disjoint parts {B_i} covering Ω, the result becomes P(A)=∑_i P(A|B_i)P(B_i).

  5. In Monty Hall, the host’s rule that a goat is always opened forces P(S3|C3)=0 and P(S3|C2)=1.

  6. With fair initial odds and the computed conditional probabilities, switching after the host opens a goat door gives a 2/3 chance of winning the car.

Highlights

Bayes’s theorem is essentially a “conditional probability reversal” produced by the shared intersection term P(A∩B).
Total probability turns an unknown probability P(A) into a weighted sum of conditional probabilities across disjoint cases.
Monty Hall’s 2/3 result drops out once P(S3|C1)=1/2, P(S3|C2)=1, and P(S3|C3)=0 are combined with fair priors.
The denominator in Bayes’s theorem, P(S3), is computed via total probability as a sum over all mutually exclusive car locations.
