Python Tutorial: Generate Random Numbers and Data Using the random Module
Based on Corey Schafer's video on YouTube.
Use `random` for simulations, games, and dummy data; switch to `secrets` for security-sensitive randomness.
Briefing
Python’s built-in `random` module makes it easy to generate realistic-looking dummy data—random numbers, random selections from lists, weighted outcomes, shuffling, and unique samples—without installing any third-party packages. The key takeaway is that `random` is ideal for games, simulations, and test data, but it’s not meant for security or cryptography; for that, Python recommends using the `secrets` module instead.
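As a minimal sketch of that split, the snippet below uses `random` for a game-style value and `secrets` for values that must be unpredictable. The variable names (`dice_roll`, `reset_token`, `pin_digit`) are illustrative assumptions and do not come from the tutorial.

```python
import random
import secrets

# Fine for games, simulations, and test data: fast, but the sequence can be
# predicted by anyone who learns the generator's internal state.
dice_roll = random.randint(1, 6)

# For security-sensitive values (tokens, reset links), use secrets, which
# draws from the operating system's cryptographic random source.
reset_token = secrets.token_hex(16)       # 16 random bytes -> 32 hex characters
pin_digit = secrets.choice("0123456789")  # one unpredictable digit

print(dice_roll, reset_token, pin_digit)
```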
The tutorial starts by importing `random` from the standard library, then walks through the most common number-generation functions. `random()` returns a float in the half-open interval [0, 1): 0 can occur, but exactly 1 never does. For floats in a custom range, `uniform(a, b)` returns a value between the two endpoints. For integers, `randint(a, b)` returns a whole number from `a` to `b` with both endpoints included, which suits discrete outcomes such as a six-sided die (`randint(1, 6)`). The same function can model a coin toss (`randint(0, 1)`) by treating 0 as heads and 1 as tails.
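The number-generation functions described above can be sketched as follows. The variable names and the 1 to 10 range passed to `uniform()` are assumptions chosen for illustration; output differs on every run, so no expected values are shown.

```python
import random

value = random.random()          # float in [0.0, 1.0): 0 possible, 1 never
scaled = random.uniform(1, 10)   # float between 1 and 10
die = random.randint(1, 6)       # integer from 1 to 6, both ends included

# Coin toss: treat 0 as heads and 1 as tails.
coin = "Heads" if random.randint(0, 1) == 0 else "Tails"

print(value, scaled, die, coin)
```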
Selection from collections is handled with `choice()` and `choices()`. `choice(sequence)` picks a single random element from a sequence such as a list. `choices(sequence, k=...)` returns a list of `k` picks and can repeat elements, which is useful for simulating repeated events like roulette spins. The tutorial then adds realism with weighted probabilities: by supplying a `weights` list, outcomes can follow a non-uniform distribution. Using roulette as the example, red and black are weighted 18 each while green is weighted 2, matching the pocket counts of an American-style wheel (18 red, 18 black, 2 green); the resulting selections show green appearing far less often.
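A short sketch of single, repeated, and weighted selection follows. The `greetings` list and the spin count of 10,000 are assumptions for illustration; only the colors and the weights 18, 18, and 2 come from the text above.

```python
import random
from collections import Counter

greetings = ["Hello", "Hi", "Hey", "Howdy", "Hola"]  # hypothetical sample data

one = random.choice(greetings)             # a single element
several = random.choices(greetings, k=10)  # 10 picks, repeats allowed

# Weighted roulette: weights are relative, so green gets 2 / (18 + 18 + 2).
colors = ["Red", "Black", "Green"]
spins = random.choices(colors, weights=[18, 18, 2], k=10_000)

counts = Counter(spins)
green_share = counts["Green"] / len(spins)
print(one, several)
print(counts, f"green share: {green_share:.3f}")
```

Over many spins the green share should settle near 2/38, roughly 0.05, though any single run will vary.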
For rearranging data, `shuffle()` randomizes a list in place—illustrated by treating numbers 1 through 52 as a deck of cards and shuffling them. To draw unique cards, `sample(population, k)` is used instead of `choices()`, because `sample()` guarantees no repeats within the draw. The example draws 5 distinct values from the 52-card deck, mirroring a hand of cards.
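The deck example can be sketched like this, assuming the cards are represented simply as the integers 1 through 52, as described above.

```python
import random

deck = list(range(1, 53))  # 52 "cards" numbered 1 through 52

# shuffle() reorders the list in place and returns None,
# so do not assign its result back to the variable.
random.shuffle(deck)

# sample() draws k distinct positions, so a hand never repeats a card.
hand = random.sample(deck, k=5)

print(deck[:10])
print(hand)
```

Using `random.choices(deck, k=5)` here could return the same card twice, which is why `sample()` is the better fit for dealing a hand.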
Finally, the tutorial connects these tools to a practical workflow: generating fake datasets for CSV-style practice. It builds lists of first names, last names, street names, and fake cities and states, then loops to create many records. Phone numbers and addresses are assembled with f-strings, using random integer ranges for the numeric parts and `choice()` for the street, city, and state parts. Emails are generated by combining a random first name with a lowercased last name and appending a fixed bogus domain. In a few dozen lines of code, the approach produces dozens of plausible names, phone numbers, addresses, and emails, which is useful for testing code without needing real personal data.
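The workflow above might look roughly like the sketch below. It is not the tutorial's exact code: every list entry, the `make_record` helper, the phone and address formats, the lowercasing of both name parts in the email, and the `example.com` domain are assumptions made for illustration.

```python
import random

# Hypothetical seed lists; the tutorial uses its own, longer lists.
first_names = ["John", "Jane", "Corey", "Maggie", "Sam"]
last_names = ["Smith", "Doe", "Jenkins", "Robinson", "Davis"]
street_names = ["Main", "High", "Pearl", "Maple", "Park"]
fake_cities = ["Metropolis", "Eerie", "Sunnydale", "Bedrock", "Gotham"]
states = ["AL", "AK", "AZ", "AR", "CA"]


def make_record():
    """Build one fake person record as a dict (hypothetical helper)."""
    first = random.choice(first_names)
    last = random.choice(last_names)

    # Numeric parts come from integer ranges; text parts from choice().
    phone = f"{random.randint(100, 999)}-555-{random.randint(1000, 9999)}"
    address = (
        f"{random.randint(100, 999)} {random.choice(street_names)} St., "
        f"{random.choice(fake_cities)} {random.choice(states)} "
        f"{random.randint(10000, 99999)}"
    )
    # Placeholder domain reserved for documentation use.
    email = f"{first.lower()}{last.lower()}@example.com"

    return {
        "name": f"{first} {last}",
        "phone": phone,
        "address": address,
        "email": email,
    }


records = [make_record() for _ in range(100)]
for record in records[:3]:
    print(record)
```

Returning dicts keeps each record easy to pass to `csv.DictWriter` later if a CSV file is the goal.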
Cornell Notes
The `random` module in Python can generate dummy data for testing and simulations: floats (`random()`, `uniform()`), integers (`randint()`), single and multiple list selections (`choice()`, `choices()`), weighted selections (`choices(..., weights=...)`), in-place shuffling (`shuffle()`), and unique sampling (`sample()`). A crucial limitation is that `random` is not suitable for security or cryptography; Python recommends `secrets` for that purpose. Weighted `choices()` lets outcomes match real-world odds, such as roulette pocket counts. `sample()` is the right tool when repeated picks would be wrong, like drawing unique cards from a deck. These primitives can be combined to generate realistic fake names, phone numbers, addresses, and emails using lists plus f-strings.
Why does `random()` return values in a specific range (and what does “0 inclusive, 1 non-inclusive” mean in practice)?
When should `uniform(a, b)` be used instead of `random()`?
What’s the practical difference between `choice()` and `choices()`?
How do weights change the behavior of `choices()`?
Why use `sample()` instead of `choices()` when drawing a “hand” from a deck?
How can these random tools generate fake CSV-style data?
Review Questions
- If you need a random integer between 1 and 6 inclusive, which method fits best and why?
- In a roulette simulation with weights [18, 18, 2], what approximate selection probability should green have?
- Why would `random.sample()` be preferred over `random.choices()` when selecting 5 cards from a deck?
Key Points
1. Use `random` for simulations, games, and dummy data; switch to `secrets` for security-sensitive randomness.
2. `random()` returns floats in the half-open interval [0, 1), with 0 inclusive and 1 non-inclusive.
3. Use `uniform(a, b)` for floats in a custom range and `randint(a, b)` for inclusive integer ranges.
4. Pick one element with `choice()` and multiple (possibly repeating) elements with `choices(..., k=...)`.
5. Apply non-uniform odds with `choices(..., weights=...)`, where each weight’s relative size determines probability.
6. Shuffle lists in place with `shuffle()` and draw unique items with `sample(population, k)`.
7. Combine lists plus f-strings and integer ranges to generate realistic fake records like names, phone numbers, addresses, and emails.