Python Tutorial: Generators - How to use them and the benefits you receive
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use yield to turn a function into a generator that produces one value at a time on demand.
Briefing
Generators in Python trade “build everything first” for “produce values on demand,” using the yield keyword to stream results one at a time. That shift matters because it avoids storing large collections in memory and can improve performance when working with big datasets—while still fitting naturally into familiar loops.
The tutorial starts with a straightforward function, square_numbers, that takes a list of numbers, builds a result list by appending each square, and returns the completed list. Calling it with [1, 2, 3, 4, 5] produces [1, 4, 9, 16, 25]. Converting this into a generator removes the result list and the return statement; instead, the function yields each computed square using yield. Once yield is in place, printing the variable no longer shows the full list. It shows a generator object, because the generator doesn’t compute or store all squares up front. Values are generated only when requested.
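The two versions described in the tutorial can be sketched side by side as follows (the `_list` suffix is added here only to keep both definitions in one file; the tutorial reuses the same name):

```python
# List-building version: computes and stores every square before returning.
def square_numbers_list(nums):
    result = []
    for i in nums:
        result.append(i * i)
    return result

# Generator version: no result list, no return; each square is yielded on demand.
def square_numbers(nums):
    for i in nums:
        yield i * i

print(square_numbers_list([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
print(square_numbers([1, 2, 3, 4, 5]))       # <generator object ...>
```

Printing the generator shows only a generator object, because nothing has been computed yet.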
To demonstrate on-demand behavior, the tutorial uses next() to pull values from the generator. The first next() call returns 1, the second returns 4, and subsequent calls continue yielding 9, 16, and 25. A further call triggers a StopIteration exception, signaling that the generator is exhausted. For typical usage, a for loop is preferred: for num in mynums iterates through yielded values and stops cleanly without exposing StopIteration to the user.
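A minimal sketch of that on-demand behavior (`my_nums` is the variable name used in the tutorial):

```python
def square_numbers(nums):
    for i in nums:
        yield i * i

my_nums = square_numbers([1, 2, 3, 4, 5])

print(next(my_nums))  # 1
print(next(my_nums))  # 4
print(next(my_nums))  # 9
print(next(my_nums))  # 16
print(next(my_nums))  # 25
# One more next(my_nums) would raise StopIteration: the generator is exhausted.

# The usual way to consume a generator: the for loop catches StopIteration
# internally and simply stops.
for num in square_numbers([1, 2, 3, 4, 5]):
    print(num)
```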
Readability is presented as an immediate advantage over the list-building approach. The generator version reads like a direct transformation: for each number in the input, yield the square. The tutorial also connects generators to list comprehensions. A list comprehension can compute squares with x * x for x in [1, 2, 3, 4, 5]. Switching from brackets to parentheses creates a generator expression that behaves similarly—still producing values lazily. If all values are needed at once, converting the generator to a list via list(generator) works, but it removes the memory and performance benefits because it forces materialization of every element.
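The comprehension-to-generator-expression switch is just a change of brackets:

```python
nums = [1, 2, 3, 4, 5]

# List comprehension: builds the full list immediately.
squares_list = [x * x for x in nums]
print(squares_list)  # [1, 4, 9, 16, 25]

# Generator expression: same syntax with parentheses, evaluated lazily.
squares_gen = (x * x for x in nums)
print(squares_gen)        # <generator object ...>

# list() forces materialization, giving up the memory advantage.
print(list(squares_gen))  # [1, 4, 9, 16, 25]
```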
The performance section makes the trade-off concrete using a benchmark that generates dictionaries representing people with random names and majors. One function returns a list of 1 million records; another returns a generator that yields the same dictionaries. The list-based approach increases memory by nearly 300 MB and takes about 1.2 seconds, because it allocates and stores all million items. The generator approach keeps memory nearly flat: it pauses at the first yield and produces items only as the loop consumes them, so the call itself returns almost instantly and the real work is deferred until iteration. Finally, the tutorial shows that converting the generator back into a list restores the list-like time and memory cost.
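A stdlib-only sketch of that benchmark, assuming the record shape from the tutorial. The tutorial measures 1 million records with an external memory profiler; this version uses `tracemalloc` and a smaller count, so the absolute numbers will differ by machine, but the gap between the two approaches should still be obvious:

```python
import random
import time
import tracemalloc

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

def people_list(num_people):
    # Allocates and stores every record before returning.
    result = []
    for i in range(num_people):
        person = {'id': i,
                  'name': random.choice(names),
                  'major': random.choice(majors)}
        result.append(person)
    return result

def people_generator(num_people):
    # Pauses at the first yield; produces one record per request.
    for i in range(num_people):
        yield {'id': i,
               'name': random.choice(names),
               'major': random.choice(majors)}

N = 100_000  # smaller than the tutorial's 1 million, same shape

tracemalloc.start()
t0 = time.perf_counter()
people = people_list(N)
list_time = time.perf_counter() - t0
_, list_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
t0 = time.perf_counter()
people = people_generator(N)  # returns immediately; nothing computed yet
gen_time = time.perf_counter() - t0
_, gen_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f'list:      {list_peak / 1024 / 1024:.1f} MiB peak, {list_time:.3f} s')
print(f'generator: {gen_peak / 1024:.1f} KiB peak, {gen_time:.6f} s')
```

Wrapping the generator in `list(people_generator(N))` would bring the memory and timing back in line with the list version, which is the tutorial's closing point.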
Overall, generators offer a practical way to process large sequences efficiently: clearer “yield as you go” code, lower memory usage, and faster startup—provided the results aren’t immediately forced into a list.
Cornell Notes
Generators use the yield keyword to produce values one at a time instead of building a complete list up front. In the square_numbers example, switching from appending to a result list to yielding each square turns the function into a lazy generator that only computes when values are requested (via next() or a for loop). A for loop consumes generators cleanly without needing to handle StopIteration manually. Generators are especially beneficial for large workloads because they avoid storing millions of items in memory. Converting a generator to a list (list(generator)) forces materialization and removes the memory/performance advantages.
How does yield change the behavior of square_numbers compared with returning a list?
What does StopIteration mean when using next() on a generator?
Why is a for loop the preferred way to consume generators in most cases?
How do generator expressions relate to list comprehensions?
When do generators lose their performance and memory advantages?
What benchmark result illustrates the memory benefit of generators?
Review Questions
- In the generator version of square_numbers, what code elements are removed or replaced compared with the list-returning version?
- How would you predict the behavior of a generator if you call next() repeatedly until it raises StopIteration?
- What happens to memory usage when you wrap a generator with list(), and why?
Key Points
1. Use yield to turn a function into a generator that produces one value at a time on demand.
2. Generators don’t compute or store the entire result up front, which changes what you see when you print the generator object.
3. next() pulls individual yielded values and eventually raises StopIteration when the generator is exhausted.
4. for loops consume generators cleanly and stop automatically without exposing StopIteration.
5. Generator expressions (parentheses) mirror list comprehensions (brackets) but remain lazy.
6. Converting a generator to a list forces full materialization, eliminating the memory and performance benefits.
7. For large datasets, generators can dramatically reduce memory usage and improve startup time by deferring work until iteration.