Probability Theory 8 | Bayes's Theorem and Total Probability
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Bayes’s theorem and the law of total probability both come from one simple idea: conditional probability is built from intersections. Starting from the definition of conditional probability, P(A|B)=P(A∩B)/P(B) for P(B)>0, the transcript swaps the roles of the two events to get P(B|A)=P(A∩B)/P(A) for P(A)>0. Both equations contain the same intersection term P(A∩B). Solving each for it gives P(A|B)P(B)=P(A∩B)=P(B|A)P(A), and dividing by P(B) yields Bayes’s theorem, P(A|B)=P(B|A)P(A)/P(B). This lets conditional probabilities be “reversed” between A given B and B given A. The key practical move is that Bayes’s theorem trades an unknown conditional probability for known ones, at the cost of dividing by P(B) (or by P(A) when solving for P(B|A) instead).
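As a sanity check of the derivation, the sketch below computes P(A|B) twice on a small finite sample space: once from the definition and once through Bayes’s theorem. The two-dice sample space and the events “the dice sum to 8” and “the first die is even” are hypothetical choices made only for illustration; exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: ordered outcomes of two fair six-sided dice, each equally likely.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform measure on OMEGA; an event is a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond(a, b):
    """P(a | b) = P(a ∩ b) / P(b), the definition of conditional probability (needs P(b) > 0)."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / pb


def sum_is_8(w):
    return w[0] + w[1] == 8  # event A (hypothetical)


def first_even(w):
    return w[0] % 2 == 0  # event B (hypothetical)


direct = cond(sum_is_8, first_even)  # P(A|B) straight from the definition
via_bayes = cond(first_even, sum_is_8) * prob(sum_is_8) / prob(first_even)  # P(B|A)P(A)/P(B)
print(direct, via_bayes)  # 1/6 1/6
```

Here P(B|A)=3/5, P(A)=5/36 and P(B)=1/2, so the “reversed” route lands on the same 1/6 as the direct one.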
The law of total probability then answers a different but related question: how to compute P(A) when A can happen through several disjoint “routes.” First, split the sample space Ω using an event B and its complement B^c, which form a disjoint union Ω = B ∪ B^c. Intersecting A with each part gives A = (A∩B) ∪ (A∩B^c), again a disjoint union, so P(A)=P(A∩B)+P(A∩B^c). Rewriting each intersection as a conditional probability times the probability of the conditioning event yields P(A)=P(A|B)P(B)+P(A|B^c)P(B^c). The same logic extends to countably many sets {B_i}: if the B_i are pairwise disjoint and cover the whole space, then P(A)=∑_i P(A|B_i)P(B_i), where any B_i with P(B_i)=0 can be dropped because then P(A∩B_i)=0. In the infinite case, the sum becomes a series, justified by σ-additivity.
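A minimal finite sketch of the law of total probability follows, reusing a hypothetical two-dice sample space. It checks that both the two-block split {B, B^c} and a six-block partition by the first die’s value recover the directly counted P(A). The helper names `is_partition` and `total_probability` are illustrative, not from the source, and a finite enumeration can only illustrate the countable case, not prove it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite example: two fair six-sided dice, uniform measure.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond(a, b):
    # P(a | b) = P(a ∩ b) / P(b); only called for events with P(b) > 0.
    return prob(lambda w: a(w) and b(w)) / prob(b)


def is_partition(blocks):
    # Disjoint and covering: every outcome lies in exactly one block.
    return all(sum(1 for b in blocks if b(w)) == 1 for w in OMEGA)


def total_probability(a, blocks):
    # Sum of P(a | B_i) P(B_i); blocks with P(B_i) = 0 are skipped since P(a ∩ B_i) = 0 there.
    if not is_partition(blocks):
        raise ValueError("blocks must be pairwise disjoint and cover OMEGA")
    return sum((cond(a, b) * prob(b) for b in blocks if prob(b) > 0), Fraction(0))


def sum_is_8(w):
    return w[0] + w[1] == 8  # event A


def first_even(w):
    return w[0] % 2 == 0  # event B


def first_odd(w):
    return not first_even(w)  # complement B^c


# Six blocks B_i = "first die shows i"; the default argument i=i freezes each value.
by_first_die = [lambda w, i=i: w[0] == i for i in range(1, 7)]

print(prob(sum_is_8))                                        # 5/36
print(total_probability(sum_is_8, [first_even, first_odd]))  # 5/36
print(total_probability(sum_is_8, by_first_die))             # 5/36
```

In the two-block split the routes contribute 1/12 (first die even) and 1/18 (first die odd); in the six-block split the block “first die shows 1” contributes nothing, since no second die value brings the sum to 8.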
Those tools are then put to work on the Monty Hall problem, used as a familiar test case for Bayes’s theorem plus the law of total probability. There are three doors: one hides a car (the prize) and two hide goats. A player initially picks a door (say door 1). The show host then opens one of the two remaining doors, always revealing a goat. Finally, the player either switches to the other unopened door or stays with the original choice.
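Before any formulas, the game itself can be simulated to see which strategy wins more often. The sketch below assumes, as is standard but not spelled out in the source, that the car is placed uniformly at random and that the host picks uniformly among the goat doors the player did not choose. The initial pick is also random here; by symmetry this should not change the win rates. The function names, seed and trial count are arbitrary illustrative choices.

```python
import random


def play(rng, switch):
    """One round of the three-door game; returns True if the player ends up with the car.
    Assumes the host opens a uniformly random goat door that the player did not pick."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    goat_doors = [d for d in doors if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Exactly one door is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    return sum(play(rng, switch) for _ in range(trials)) / trials


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

With this many trials the two printed rates should land close to 1/3 and 2/3 respectively, the values derived analytically in the following steps.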
To compute the benefit of switching, the transcript defines events C_j (“the car is behind door j”) and S_j (“the show master opens door j”), with the player having picked door 1, and considers the case where the host opens door 3. If the car is behind door 3, the host never opens door 3, since he always reveals a goat; hence P(S3|C3)=0. If the car is behind door 2, the host can open neither the player’s door 1 nor the car door 2, so he must open door 3: P(S3|C2)=1. In the remaining case, with the car behind door 1, both doors 2 and 3 hide goats; assuming the host chooses between them uniformly at random, P(S3|C1)=1/2.
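The host’s rule can be written as a small function that returns P(S_j | C_car) for every door j. This is a sketch under the same assumption as above: the host never opens the player’s door or the car door, and picks uniformly among the doors that remain. The function name `host_distribution` is hypothetical.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def host_distribution(player_door, car_door):
    """P(S_j | C_car) for each door j, assuming the host never opens the player's door
    or the car door and picks uniformly among the doors that remain."""
    allowed = [d for d in DOORS if d != player_door and d != car_door]
    return {d: Fraction(1, len(allowed)) if d in allowed else Fraction(0) for d in DOORS}


# Player on door 1: likelihood that the host opens door 3 for each car location.
for car in DOORS:
    print(f"P(S3 | C{car}) = {host_distribution(1, car)[3]}")
```

For the player on door 1 this prints 1/2, 1 and 0 for car locations 1, 2 and 3, matching the three cases above.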
Bayes’s theorem is then used to find the probability that the car is behind the switched-to door (door 2), given that the host opened door 3: P(C2|S3)=P(S3|C2)P(C2)/P(S3). The denominator P(S3) is computed via the law of total probability as a sum over the mutually exclusive car locations: P(S3)=P(S3|C1)P(C1)+P(S3|C2)P(C2)+P(S3|C3)P(C3). In a fair game, P(C1)=P(C2)=P(C3)=1/3. Plugging in the conditional values 1/2, 1 and 0 gives P(S3)=1/6+1/3+0=1/2, and therefore P(C2|S3)=(1·1/3)/(1/2)=2/3. The same computation for staying gives P(C1|S3)=(1/2·1/3)/(1/2)=1/3. Switching therefore wins the car with probability 2/3, while staying wins with probability 1/3: Bayes’s theorem and the law of total probability turn the host’s behavior into a concrete decision advantage.
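The exact calculation above fits in a few lines with exact fractions. The sketch hard-codes the fair prior and the likelihoods P(S3|C_j) stated in the text (player on door 1, host choosing uniformly when he has a choice), computes the denominator by total probability, and then applies Bayes’s theorem to every car location at once.

```python
from fractions import Fraction

prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}     # fair game: P(C_j)
likelihood_s3 = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}   # P(S3 | C_j), player on door 1

# Law of total probability over the disjoint events C1, C2, C3.
p_s3 = sum(likelihood_s3[j] * prior[j] for j in prior)

# Bayes's theorem: P(C_j | S3) = P(S3 | C_j) P(C_j) / P(S3).
posterior = {j: likelihood_s3[j] * prior[j] / p_s3 for j in prior}

print(p_s3)          # 1/2
print(posterior[2])  # 2/3  (switch to door 2)
print(posterior[1])  # 1/3  (stay with door 1)
```

The posteriors sum to 1, and door 3 gets probability 0, as it must once the host has shown a goat there.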
Cornell Notes
Bayes’s theorem comes directly from the definition of conditional probability: P(A|B)=P(A∩B)/P(B). Swapping A and B and using the shared intersection term gives a way to “reverse” conditional probabilities, at the cost of dividing by P(A) (or P(B)). The law of total probability computes P(A) by splitting the sample space into disjoint parts. For two parts, Ω=B∪B^c, it becomes P(A)=P(A|B)P(B)+P(A|B^c)P(B^c). For countably many disjoint sets {B_i} that cover Ω, it generalizes to P(A)=∑_i P(A|B_i)P(B_i). The Monty Hall problem applies both: conditional host behavior updates the odds so switching wins with probability 2/3.
How does Bayes’s theorem follow from conditional probability without any extra assumptions?
What is the core mechanism behind the law of total probability?
How does the law of total probability extend from two sets to infinitely many?
In Monty Hall, why is P(S3|C3)=0, that is, why does the host never open door 3 when the car is behind it?
Why does the calculation yield P(C2|S3)=2/3 when switching?
Review Questions
- What algebraic step turns the two conditional probability expressions into Bayes’s theorem?
- How does the disjoint union property justify turning P(A∩B)+P(A∩B^c) into P(A)?
- In Monty Hall, which conditional probabilities are needed to compute P(C2|S3), and how are they obtained from the host’s behavior?
Key Points
1. Bayes’s theorem is obtained by equating two formulas for the same intersection probability P(A∩B) and rearranging.
2. The law of total probability computes P(A) by splitting Ω into disjoint parts B_i and summing P(A|B_i)P(B_i).
3. For a two-part split, Ω=B∪B^c leads to P(A)=P(A|B)P(B)+P(A|B^c)P(B^c).
4. For countably many disjoint parts {B_i} covering Ω, the result becomes P(A)=∑_i P(A|B_i)P(B_i).
5. In Monty Hall, with the player on door 1, the host’s rule of always opening a goat door other than the player’s forces P(S3|C3)=0 and P(S3|C2)=1.
6. With fair initial odds and the computed conditional probabilities, switching after the host opens a goat door gives a 2/3 chance of winning the car.