Python Tutorial: Generate Random Numbers and Data Using the random Module

TL;DR

Use `random` for simulations, games, and dummy data; switch to `secrets` for security-sensitive randomness.

Briefing Cornell Notes

Briefing

Python’s built-in `random` module makes it easy to generate realistic-looking dummy data—random numbers, random selections from lists, weighted outcomes, shuffling, and unique samples—without installing any third-party packages. The key takeaway is that `random` is ideal for games, simulations, and test data, but it’s not meant for security or cryptography; for that, Python recommends using the `secrets` module instead.

The tutorial starts with importing `random` from the standard library. It then walks through the most common number-generation methods. `random()` returns a float between 0 and 1, where 0 is inclusive and 1 is non-inclusive—so exact 1 never appears. For bounded floats, `uniform(a, b)` produces a random floating-point value between two endpoints (by changing the arguments, any range can be used). For integers, `randint(a, b)` simulates discrete outcomes such as a six-sided die, including both endpoints; the same method can model a coin toss by treating 0 as heads and 1 as tails.

Selection from collections is handled with `choice()` and `choices()`. `choice(sequence)` picks a single random element from a list. `choices(sequence, k=...)` returns multiple picks and can repeat elements, which is useful for simulating repeated events like roulette spins. The tutorial then adds realism with weighted probabilities: by supplying a `weights` list, outcomes can match a non-uniform distribution. Using roulette as the example, red and black are weighted 18 each while green is weighted 2, reflecting the typical pocket counts; the resulting selections show green appearing far less often.

For rearranging data, `shuffle()` randomizes a list in place—illustrated by treating numbers 1 through 52 as a deck of cards and shuffling them. To draw unique cards, `sample(population, k)` is used instead of `choices()`, because `sample()` guarantees no repeats within the draw. The example draws 5 distinct values from the 52-card deck, mirroring a hand of cards.

Finally, the tutorial connects these tools to a practical workflow: generating fake datasets for CSV-style practice. It builds lists of first names, last names, street names, and fake cities/states, then loops to create many records. Phone numbers and addresses are assembled with f-strings using random integer ranges for numeric parts and `choice()` for street/city/state parts. Emails are generated by combining a random first name with a lowercased last name and appending a fixed bogus domain. In roughly a few dozen lines, the approach produces dozens of plausible names, phone numbers, addresses, and emails—useful for testing code without needing real personal data.

Cornell Notes

The `random` module in Python can generate dummy data for testing and simulations: floats (`random()`, `uniform()`), integers (`randint()`), single and multiple list selections (`choice()`, `choices()`), weighted selections (`choices(..., weights=...)`), in-place shuffling (`shuffle()`), and unique sampling (`sample()`). A crucial limitation is that `random` is not suitable for security or cryptography; Python recommends `secrets` for that purpose. Weighted `choices()` lets outcomes match real-world odds, such as roulette pocket counts. `sample()` is the right tool when repeated picks would be wrong, like drawing unique cards from a deck. These primitives can be combined to generate realistic fake names, phone numbers, addresses, and emails using lists plus f-strings.

Why does `random()` return values in a specific range (and what does “0 inclusive, 1 non-inclusive” mean in practice)?

`random()` produces a float between 0 and 1 where 0 can occur exactly, but 1 cannot. That means you may see 0.0, but you’ll never see 1.0; the largest values approach 1 without reaching it. In the tutorial, this is demonstrated by calling `random.random()` and printing the result multiple times, always getting a value in that half-open interval.

When should `uniform(a, b)` be used instead of `random()`?

Use `uniform(a, b)` when the goal is a float within a specific range. The tutorial notes that `random()` only gives 0–1, and while you could scale it manually, `uniform()` directly handles the range. For example, `random.uniform(1, 10)` yields a random float between 1 and 10 each run; changing the arguments changes the range.

What’s the practical difference between `choice()` and `choices()`?

`choice()` selects exactly one random element from a list. `choices()` selects multiple elements and returns them as a list, controlled by `k`. Because `choices()` samples with replacement, the same element can appear multiple times—useful for repeated events like simulating multiple roulette spins.

How do weights change the behavior of `choices()`?

Weights adjust the probability of each element being selected. The tutorial uses roulette outcomes with `colors = ['red', 'black', 'green']` and sets `weights` to reflect pocket counts: red 18, black 18, green 2. Since weights sum to 38, red and black each have an 18/38 chance, while green has a 2/38 chance. Running the selection repeatedly shows green appearing far less often.

Why use `sample()` instead of `choices()` when drawing a “hand” from a deck?

A hand requires unique cards. `choices()` can repeat elements because it allows replacement. `sample(deck, k)` prevents duplicates by guaranteeing unique picks from the population. The tutorial models a 52-card deck (numbers 1–52) and draws 5 unique values with `random.sample(deck, 5)`.

How can these random tools generate fake CSV-style data?

The tutorial builds lists of first names, last names, street names, and fake cities/states. Inside a loop, it uses `random.choice()` to pick name and location components, and uses f-strings with `randint()` ranges to create numeric parts like phone numbers and zip codes. It then formats full addresses by combining street number, street name, city, state, and zip, and creates emails by combining a random first name with a lowercased last name plus a fixed bogus domain.

Review Questions

If you need a random integer between 1 and 6 inclusive, which method fits best and why?
In a roulette simulation with weights [18, 18, 2], what approximate selection probability should green have?
Why would `random.sample()` be preferred over `random.choices()` when selecting 5 cards from a deck?

Key Points

1
Use `random` for simulations, games, and dummy data; switch to `secrets` for security-sensitive randomness.
2
`random()` returns floats in the half-open interval [0, 1), with 0 inclusive and 1 non-inclusive.
3
Use `uniform(a, b)` for floats in a custom range and `randint(a, b)` for inclusive integer ranges.
4
Pick one element with `choice()` and multiple (possibly repeating) elements with `choices(..., k=...)`.
5
Apply non-uniform odds with `choices(..., weights=...)`, where each weight’s relative size determines probability.
6
Shuffle lists in place with `shuffle()` and draw unique items with `sample(population, k)`.
7
Combine lists plus f-strings and integer ranges to generate realistic fake records like names, phone numbers, addresses, and emails.

Highlights

`random` is fine for games and test data, but Python explicitly warns against using it for cryptography—use `secrets` instead.

Weighted `choices()` can model real odds: roulette-style weights (18/18/2) make green appear far less frequently.

`shuffle()` randomizes a list in place, while `sample()` is the go-to method for unique draws without repeats.

`choices()` can repeat elements, which makes it suitable for repeated events but wrong for “no duplicates” scenarios like drawing a hand.

Topics

Random Module
Random Numbers
Weighted Choices
Shuffling and Sampling
Dummy Data Generation