Probability Theory 8 | Bayes's Theorem and Total Probability [dark version]

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Bayes’s theorem is obtained by combining two conditional-probability formulas that share the same intersection term P(A∩B).

Briefing

Bayes’s theorem and the law of total probability are presented as two linked tools for turning conditional information into an overall probability—then the Monty Hall problem is used to show how the machinery delivers the classic “switching wins with probability 2/3” result.

Bayes’s theorem is derived directly from conditional probability. The conditional probability of event A given B is P(A|B)=P(A∩B)/P(B). Flipping the conditioning gives P(B|A)=P(A∩B)/P(A). Because the intersection term P(A∩B) appears in both expressions, the two formulas combine into the standard Bayes form: P(A|B)·P(B)=P(B|A)·P(A). The key takeaway is that the conditioning order can be swapped by multiplying by the relevant event probability and, when the target conditional probability must be isolated, dividing by the probability of the new conditioning event.
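To make the reversal concrete, here is a minimal Python sketch of that identity. The function name bayes_reverse and the numeric values are invented for illustration; they do not come from the video:

```python
# Hypothetical illustration of reversing the conditioning with Bayes's theorem.
# None of these numbers come from the video; they are made up for the example.

def bayes_reverse(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) via Bayes's theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Example: P(B|A) = 0.9, P(A) = 0.2, P(B) = 0.3.
print(bayes_reverse(p_b_given_a=0.9, p_a=0.2, p_b=0.3))  # ≈ 0.6
```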

The law of total probability then explains how to “split” a probability by partitioning the sample space. If the sample space Ω is decomposed into two disjoint parts, B and Bᶜ, then for any event A, P(A)=P(A∩B)+P(A∩Bᶜ). Each intersection term is rewritten using conditional probability: P(A∩B)=P(A|B)·P(B) and P(A∩Bᶜ)=P(A|Bᶜ)·P(Bᶜ). The same idea generalizes to countably many disjoint sets {B_i} whose union is Ω. With σ-additivity, the result becomes a series: P(A)=∑_i P(A|B_i)·P(B_i). This framework is what later supplies the denominator in Bayes’s theorem.
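A small sketch of the same splitting, assuming the partition is given as a list of probabilities; the helper total_probability and all numbers are made up for illustration:

```python
# Hypothetical illustration of the law of total probability.
# partition[i] = P(B_i), conditionals[i] = P(A|B_i); values are invented.

def total_probability(conditionals: list[float], partition: list[float]) -> float:
    """Return P(A) = sum_i P(A|B_i) * P(B_i) over a disjoint partition."""
    assert abs(sum(partition) - 1.0) < 1e-9, "partition must cover the sample space"
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in zip(conditionals, partition))

# Three disjoint cases B_1, B_2, B_3 with P(B_i) = 1/2, 1/3, 1/6:
print(total_probability([0.4, 0.1, 0.7], [1/2, 1/3, 1/6]))  # ≈ 0.35
```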

The Monty Hall problem is then framed in probability terms. There are three doors: one hides a car and two hide goats. A player initially picks a door (labeled door 1). The host then opens a different door showing a goat (the transcript focuses on the case where the host opens door 3). Events are defined as C_j = “car is behind door j” and S_j = “the host opens door j in the second step.” Under the condition that the car is behind door 3 (C_3), the host cannot open door 3 to show a goat, so P(S_3|C_3)=0. If the car is behind door 2 (C_2), the host has no choice but to open door 3, so P(S_3|C_2)=1. If the car is behind door 1 (C_1), the host has a choice between the two goat doors, making P(S_3|C_1)=1/2.

To find the probability of winning by switching, the analysis targets P(C_2|S_3): the probability that the car is behind door 2, the unopened door the player would switch to from the initially chosen door 1. Bayes’s theorem requires dividing by P(S_3), which is not directly given. The law of total probability supplies it: P(S_3)=P(S_3|C_1)P(C_1)+P(S_3|C_2)P(C_2)+P(S_3|C_3)P(C_3). With a fair game assumption, each door has probability 1/3 for the car, so P(C_1)=P(C_2)=P(C_3)=1/3 and P(S_3)=1/2·1/3+1·1/3+0·1/3=1/2. Bayes’s theorem then gives P(C_2|S_3)=P(S_3|C_2)·P(C_2)/P(S_3)=(1·1/3)/(1/2)=2/3, so switching gives a 2/3 chance to get the car (staying with door 1 wins only with probability P(C_1|S_3)=1/3). The point isn’t the car; it’s the demonstrated workflow: Bayes’s theorem for reversing conditioning, backed by the total probability formula for computing the missing normalization term.
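The whole Monty Hall computation fits in a few lines of exact arithmetic. This is an illustrative sketch using Python's fractions module, with dictionaries named after the summary's events; it is not code from the video:

```python
from fractions import Fraction

# Priors: fair game, each door equally likely to hide the car.
p_c = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Host behavior given the player picked door 1: P(S_3 | C_j).
p_s3_given_c = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Law of total probability: P(S_3) = sum_j P(S_3|C_j) * P(C_j).
p_s3 = sum(p_s3_given_c[j] * p_c[j] for j in (1, 2, 3))
print(p_s3)  # 1/2

# Bayes: P(C_2|S_3) = P(S_3|C_2) * P(C_2) / P(S_3).
p_c2_given_s3 = p_s3_given_c[2] * p_c[2] / p_s3
print(p_c2_given_s3)  # 2/3 -> switching wins with probability 2/3
```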

Cornell Notes

Bayes’s theorem is derived from the definition of conditional probability: P(A|B)=P(A∩B)/P(B), and swapping conditioning gives P(B|A)=P(A∩B)/P(A). Multiplying by P(B) and P(A) shows the shared intersection term, enabling the conditioning order to be reversed. The law of total probability then computes an event’s probability by partitioning the sample space into disjoint cases {B_i}: P(A)=∑_i P(A|B_i)P(B_i). In the Monty Hall setup, the host’s action S_3 is used to update beliefs about where the car is (events C_1, C_2, C_3). Bayes’s theorem needs P(S_3), and total probability provides it, leading to a 2/3 win probability when switching.

How does Bayes’s theorem follow from the definition of conditional probability?

Start with P(A|B)=P(A∩B)/P(B) and P(B|A)=P(A∩B)/P(A). Since both share the same intersection P(A∩B), multiplying the first by P(B) and the second by P(A) gives P(A|B)·P(B)=P(B|A)·P(A). Rearranging isolates the desired conditional probability, typically requiring division by P(B) (or P(A), depending on the target).
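Written out as one chain of identities, the derivation described above reads:

```latex
% Bayes's theorem from the definition of conditional probability
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
\;\Longrightarrow\;
P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A)
\;\Longrightarrow\;
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.
```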

Why does the law of total probability require a disjoint partition of the sample space?

The formula relies on splitting Ω into disjoint pieces so that probabilities add cleanly. For two pieces, Ω=B∪Bᶜ with B∩Bᶜ=∅, so P(A)=P(A∩B)+P(A∩Bᶜ). For many pieces, Ω=⋃_i B_i as a disjoint union, and σ-additivity turns the sum of intersection probabilities into a series: P(A)=∑_i P(A|B_i)P(B_i).

In Monty Hall, what are the conditional probabilities for the host opening door 3?

Let C_j mean the car is behind door j, and S_3 mean the host opens door 3. If C_3 occurs, the host cannot open door 3 to reveal a goat, so P(S_3|C_3)=0. If C_2 occurs, door 3 must be a goat door, so P(S_3|C_2)=1. If C_1 occurs, both remaining doors are goats, so the host chooses between them, giving P(S_3|C_1)=1/2.
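These values can be sanity-checked by simulation. A Monte Carlo sketch, assuming the standard rules (the player has picked door 1, and the host chooses uniformly at random when both remaining doors hide goats); the helper host_opens is invented for illustration:

```python
import random

def host_opens(car: int) -> int:
    """Door the host opens after the player picks door 1 (standard rules)."""
    goat_doors = [d for d in (2, 3) if d != car]  # host never opens door 1 or the car door
    return random.choice(goat_doors)              # uniform choice when two goats remain

trials = 100_000
for car in (1, 2, 3):
    opens_3 = sum(host_opens(car) == 3 for _ in range(trials))
    print(f"P(S_3 | C_{car}) ≈ {opens_3 / trials:.3f}")
# Expected: ≈ 0.5 for C_1, ≈ 1.0 for C_2, ≈ 0.0 for C_3
```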

Where does the “missing” probability P(S_3) come from in Bayes’s theorem?

Bayes’s theorem needs P(S_3) to normalize the updated belief. Because P(S_3) isn’t given directly, total probability computes it by summing over all mutually exclusive car locations: P(S_3)=P(S_3|C_1)P(C_1)+P(S_3|C_2)P(C_2)+P(S_3|C_3)P(C_3).
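Plugging in the fair-game priors P(C_j)=1/3 and the host conditionals from above, the sum evaluates to:

```latex
P(S_3) = \tfrac{1}{2}\cdot\tfrac{1}{3} + 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{1}{3}
       = \tfrac{1}{6} + \tfrac{1}{3} = \tfrac{1}{2}.
```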

How does the fair-game assumption affect the final 2/3 result?

With three doors and one car, fairness means each car location has probability 1/3: P(C_1)=P(C_2)=P(C_3)=1/3. Plugging these into the total-probability formula along with P(S_3|C_1)=1/2, P(S_3|C_2)=1, and P(S_3|C_3)=0 gives P(S_3)=1/2, and Bayes’s theorem then yields P(C_2|S_3)=2/3, which is exactly the switching advantage.
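Explicitly, the final Bayes step with these numbers is:

```latex
P(C_2 \mid S_3) = \frac{P(S_3 \mid C_2)\,P(C_2)}{P(S_3)}
                = \frac{1 \cdot \tfrac{1}{3}}{\tfrac{1}{2}} = \tfrac{2}{3},
\qquad
P(C_1 \mid S_3) = \frac{\tfrac{1}{2}\cdot\tfrac{1}{3}}{\tfrac{1}{2}} = \tfrac{1}{3}.
```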

Review Questions

  1. Given P(A|B) and P(B|A), what additional probability term is needed to compute one from the other using Bayes’s theorem?
  2. How does the law of total probability change when the partition of Ω uses countably many sets {B_i} instead of just B and Bᶜ?
  3. In the Monty Hall setup, why is P(S_3|C_3)=0 and P(S_3|C_2)=1?

Key Points

  1. Bayes’s theorem is obtained by combining two conditional-probability formulas that share the same intersection term P(A∩B).
  2. Swapping conditioning requires multiplying by the conditioning event’s probability and then dividing by the probability of the new conditioning event.
  3. The law of total probability computes P(A) by summing P(A|B_i)P(B_i) across a disjoint partition {B_i} of the sample space.
  4. For two cases, the partition is B and Bᶜ, giving P(A)=P(A|B)P(B)+P(A|Bᶜ)P(Bᶜ).
  5. For countably many cases, σ-additivity turns the same idea into a series: P(A)=∑_i P(A|B_i)P(B_i).
  6. In Monty Hall, the car’s location fixes the conditional probabilities of the host’s action S_3: P(S_3|C_1)=1/2, P(S_3|C_2)=1, and P(S_3|C_3)=0.
  7. Using Bayes’s theorem with total probability to compute P(S_3) yields a 2/3 chance of winning by switching.

Highlights

Bayes’s theorem emerges cleanly from the identity P(A∩B)=P(A|B)P(B)=P(B|A)P(A).
Total probability is essentially probability additivity over a disjoint partition, generalized to countably many sets via σ-additivity.
In Monty Hall, the host’s behavior forces P(S_3|C_3)=0 and P(S_3|C_2)=1, leaving P(S_3|C_1)=1/2.
The 2/3 switching advantage comes from Bayes’s theorem’s normalization term P(S_3), computed by summing over all car locations.
