Session 40 - Probability Distribution Functions - PDF, PMF & CDF | DSMP 2023

CampusX · 6 min read

Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Random variables map outcomes of a random experiment to values, and their sample space determines what probabilities can be assigned.

Briefing

Probability distributions become the bridge between raw outcomes and usable probability—especially when data analysts need to estimate what values are likely, how they’re spread, and how to compute probabilities for ranges. The session starts by laying the groundwork: random variables replace algebra’s “unknown value” with a variable whose value depends on the outcome of a random experiment. A coin toss example is used to show how a random variable is defined over a sample space, and the session distinguishes discrete random variables (countable outcomes, like die results) from continuous random variables (outcomes over a range, like exam marks).
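A minimal Python sketch of that mapping (the 0/1 encoding of the coin toss is an illustrative choice, not necessarily the session's exact example):

    # A random variable is a mapping from the sample space to numbers.
    sample_space = ["heads", "tails"]        # outcomes of the coin-toss experiment
    X = {"heads": 1, "tails": 0}             # random variable: outcome -> value

    # Each value of X inherits the probability of the outcomes mapped to it.
    P = {outcome: 0.5 for outcome in sample_space}    # fair coin (assumption)
    print(sum(p for o, p in P.items() if X[o] == 1))  # P(X = 1) = 0.5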

From there, the core idea of a probability distribution is introduced as a structured listing of all possible outcomes paired with their probabilities. For discrete cases, this is demonstrated through a table-style probability distribution for sums from dice rolls, highlighting that probabilities differ by outcome (some sums occur more often than others). The session then points out a practical limitation: manually building tables becomes unwieldy as the number of outcomes grows, and it becomes impossible for continuous variables where outcomes are effectively infinite. The solution is to use a mathematical relationship—a probability distribution function—so probabilities can be computed and plotted without enumerating every outcome.
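That table-building step can be sketched by brute-force enumeration; here assuming two fair six-sided dice, as in the session's sum-of-dice example:

    from collections import Counter
    from fractions import Fraction

    # Enumerate all 36 equally likely (die1, die2) outcomes and tally the sums.
    sums = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

    # Table-style probability distribution: sum -> probability.
    distribution = {s: Fraction(c, 36) for s, c in sorted(sums.items())}
    for s, p in distribution.items():
        print(s, p)   # e.g. 7 occurs with probability 6/36, the most likely sum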

The session emphasizes that probability distribution functions (PDF/PMF/CDF) are not just theoretical graphs; they provide two major benefits. First, the shape of the distribution reveals where the data “concentrates” (e.g., most students’ marks cluster around certain values). Second, if the observed distribution matches a known “famous” distribution (like Normal, Uniform, Bernoulli, Binomial), analysts can reuse established results rather than starting from scratch. The session also notes that each distribution comes with parameters (numerical tuning knobs) that control location and scale, changing the graph’s shape.

A significant portion then distinguishes discrete and continuous distribution functions. For discrete random variables, the probability mass function (PMF) gives the probability at each specific outcome, with the session stressing two constraints: probabilities must be nonnegative and sum to 1. It also demonstrates how PMF can be approximated empirically by running simulations (e.g., rolling dice many times) and comparing observed frequencies to theoretical probabilities.
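A hedged sketch of that simulation idea, assuming a single fair die and only Python's standard library:

    import random
    from collections import Counter

    random.seed(0)
    n = 100_000
    rolls = Counter(random.randint(1, 6) for _ in range(n))

    # Empirical PMF from simulated rolls; each value should be close to 1/6.
    pmf = {face: count / n for face, count in sorted(rolls.items())}
    print(pmf)

    # The two constraints the session stresses: nonnegative, summing to 1.
    assert all(p >= 0 for p in pmf.values())
    assert abs(sum(pmf.values()) - 1) < 1e-9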

For continuous random variables, the session introduces the probability density function (PDF) and explains why it cannot be interpreted like PMF. In continuous settings, the probability of any exact single value is effectively zero, so the meaningful quantity is probability over an interval. The session uses the “area under the curve” idea: integrating the PDF over a range yields the probability that the random variable falls within that interval. It also introduces the cumulative distribution function (CDF) as the running probability up to a value (probability of being less than or equal to x), and connects CDF and PDF through calculus intuition: differentiating CDF gives PDF, while integrating PDF gives CDF.
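A sketch of both ideas, using SciPy's standard normal as a stand-in for whatever continuous distribution is plotted in the session:

    from scipy.stats import norm
    from scipy.integrate import quad

    a, b = -1.0, 1.0

    # P(a <= X <= b) as the area under the PDF over [a, b]...
    area, _ = quad(norm.pdf, a, b)

    # ...which equals a difference of CDF values (integrating PDF gives CDF).
    via_cdf = norm.cdf(b) - norm.cdf(a)

    print(round(area, 4), round(via_cdf, 4))   # both ~0.6827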

Finally, the session moves toward practical estimation. When the true distribution is unknown, density estimation techniques are needed to approximate the PDF. Two approaches are contrasted: parametric density estimation (assume a distribution family like Normal and estimate parameters such as mean and standard deviation) and non-parametric density estimation (make fewer assumptions and estimate the density directly from data). The session highlights kernel density estimation (KDE) as a common non-parametric method, where each data point contributes a “bump” (often Gaussian) and the bumps are summed to form a smooth estimated density. The session closes by previewing that future coverage will focus on applying these ideas to real datasets, including practical plotting of distributions and extending from 1D to 2D density plots.
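A minimal from-scratch KDE sketch, assuming Gaussian kernels and an illustrative bandwidth of 0.5 (a real analysis would tune the bandwidth or use a library routine such as scipy.stats.gaussian_kde):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(42)
    data = rng.normal(loc=5.0, scale=1.5, size=200)   # observed sample

    def kde(x, data, bandwidth=0.5):
        # Average of Gaussian "bumps" centred on each data point.
        return norm.pdf((x - data[:, None]) / bandwidth).sum(axis=0) / (len(data) * bandwidth)

    xs = np.linspace(0.0, 10.0, 101)
    density = kde(xs, data)
    print(density.sum() * (xs[1] - xs[0]))   # ~1.0: the estimate integrates to 1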

Cornell Notes

The session builds from random variables to probability distributions, then to the functions used to compute probabilities. A probability distribution lists outcomes and their probabilities, but continuous data can’t be handled by tables, so mathematical functions are used instead. For discrete variables, the PMF gives the probability at each exact outcome; for continuous variables, the PDF describes density and probabilities come from the area under the curve over an interval. The CDF provides cumulative probability up to a value, and its relationship to PDF is tied to calculus (integration/differentiation). When the true distribution is unknown, density estimation—parametric (assume a family) or non-parametric like KDE—approximates the PDF from observed data.

What makes a random variable different from an algebra variable, and why does that matter for probability distributions?

An algebra variable represents an unknown single value in an equation, while a random variable represents an unknown outcome drawn from a random experiment. The random variable is defined over a sample space: the set of all possible outcomes. That mapping (outcome → value of the random variable) is what lets probability distributions assign probabilities to outcomes, either as discrete point probabilities (PMF) or as interval probabilities via density (PDF).

How do discrete and continuous random variables change what “probability” means on a graph?

For discrete random variables (like a die), probabilities attach to specific outcomes (1, 2, 3, 4, 5, 6), so PMF values at those points are directly interpretable as probabilities. For continuous random variables (like exam marks over a range), exact single values have probability effectively zero, so the PDF cannot be read as “probability at x.” Instead, probability comes from integrating the PDF over an interval (e.g., probability of marks between 8 and 9).
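A short sketch of the marks example, assuming a hypothetical Normal(mean=7, sd=1) model for marks (the parameters are illustrative, not from the session):

    from scipy.stats import norm

    marks = norm(loc=7, scale=1)          # hypothetical model of exam marks

    print(marks.pdf(8.5))                 # a density value, NOT a probability
    print(marks.cdf(9) - marks.cdf(8))    # P(8 <= marks <= 9) ~ 0.1359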

What is the practical difference between PMF, PDF, and CDF?

PMF (discrete) gives P(X = x) for each possible outcome x. PDF (continuous) gives a density f(x) whose area over an interval equals probability, so P(a ≤ X ≤ b) = ∫[a,b] f(x) dx. CDF accumulates probability up to a value: F(x) = P(X ≤ x). The session also links them through calculus intuition: integrating PDF yields CDF, and differentiating CDF yields PDF (when conditions allow).
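That calculus link can be checked numerically; a sketch assuming the standard normal, where a finite-difference derivative of the CDF recovers the PDF:

    from scipy.stats import norm

    x, h = 0.5, 1e-6

    # d/dx F(x), approximated by a central difference, matches f(x).
    numeric_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
    print(round(numeric_pdf, 6), round(norm.pdf(x), 6))   # both ~0.352065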

Why does the session say probability density “can’t be probability” at a point?

Because continuous random variables have infinitely many possible values across a range, the probability of any exact value is essentially zero. If one tried to interpret f(x) as P(X = x), the result would be misleading. The correct interpretation is that f(x) measures how densely probability mass is packed near x, and the actual probability is obtained by taking the area under the curve over a finite interval.
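A quick numerical illustration, assuming a standard normal: the probability of landing in ever-smaller intervals around x = 0 shrinks toward zero, even though the density f(0) ≈ 0.3989 stays fixed:

    from scipy.stats import norm

    for width in (1.0, 0.1, 0.01, 0.001):
        p = norm.cdf(width / 2) - norm.cdf(-width / 2)
        print(width, round(p, 6))   # probability ~ f(0) * width, vanishing with width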

How does density estimation help when the true distribution is unknown?

When the distribution family and parameters aren’t known, analysts estimate the PDF from data. Parametric density estimation assumes a specific family (e.g., Normal) and estimates parameters like mean and standard deviation from the sample, then uses the assumed formula to compute density. Non-parametric density estimation avoids assuming a fixed family and estimates density directly from the data; kernel density estimation (KDE) is highlighted as a method where each data point contributes a kernel bump (often Gaussian) and the bumps are summed to form a smooth density curve.
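A sketch of the parametric route, assuming a Normal family and SciPy's maximum-likelihood fit:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(7)
    sample = rng.normal(loc=10.0, scale=2.0, size=500)   # data with unknown density

    # Parametric density estimation: assume Normal, estimate its parameters.
    mu_hat, sigma_hat = norm.fit(sample)
    print(round(mu_hat, 2), round(sigma_hat, 2))         # close to the true 10.0, 2.0

    # The fitted PDF can then be evaluated anywhere, e.g. at x = 10:
    print(norm.pdf(10.0, loc=mu_hat, scale=sigma_hat))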

What role do parameters play in probability distributions?

Each distribution has parameters that control its shape—commonly location (where the mass sits) and scale (how spread out it is). Changing parameters like mean and standard deviation for a Normal distribution shifts and stretches the curve. The session stresses that accurate parameter estimation depends on having enough data; with limited data, the estimated curve can deviate from the true distribution.
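A tiny sketch of those knobs, assuming a Normal distribution evaluated at a fixed point; changing the mean shifts the curve, changing the standard deviation stretches it:

    from scipy.stats import norm

    x = 0.0
    for mu, sigma in [(0, 1), (2, 1), (0, 3)]:
        print(mu, sigma, round(norm.pdf(x, loc=mu, scale=sigma), 4))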

Review Questions

  1. In what way does the interpretation of a graph’s y-axis differ between PMF and PDF?
  2. Given a continuous random variable with PDF f(x), how would you compute P(a ≤ X ≤ b)?
  3. What is the conceptual difference between parametric density estimation and kernel density estimation (KDE)?

Key Points

  1. Random variables map outcomes of a random experiment to values, and their sample space determines what probabilities can be assigned.

  2. Discrete distributions use PMF values at exact outcomes, while continuous distributions use PDF density where probabilities come from integrating over intervals.

  3. A probability distribution function replaces manual probability tables, especially when outcomes are too many or infinite.

  4. Matching an observed distribution to a known family lets analysts reuse established properties and computations rather than starting from scratch.

  5. PMF must be nonnegative and sum to 1 across all discrete outcomes; PDF is interpreted through area under the curve, not point probability.

  6. CDF provides cumulative probability up to a value and relates to PDF through integration/differentiation ideas.

  7. When the true distribution is unknown, density estimation (parametric or non-parametric like KDE) approximates the PDF from data.

Highlights

  • Continuous probability is about intervals: the probability of an exact value is effectively zero, so the PDF’s meaning comes from the area under the curve.
  • PMF and PDF differ fundamentally in interpretation: PMF gives probability at points; PDF gives density whose integral yields probability.
  • If a dataset’s distribution shape matches a known distribution (e.g., Normal), analysts can apply its established results directly.
  • KDE builds a smooth density by centering a kernel (often Gaussian) at each data point and summing the contributions.
  • Parameters act like tuning knobs—changing them shifts and scales the distribution’s shape.

Topics

  • Random Variables
  • Probability Distributions
  • PMF and PDF
  • CDF
  • Density Estimation
  • Kernel Density Estimation

Mentioned

  • PDF
  • PMF
  • CDF
  • KDE
  • DSMP