Python Tutorial: Generators - How to use them and the benefits you receive
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use yield to turn a function into a generator that produces one value at a time on demand.
Briefing
Generators in Python trade “build everything first” for “produce values on demand,” using the yield keyword to stream results one at a time. That shift matters because it avoids storing large collections in memory and can improve performance when working with big datasets—while still fitting naturally into familiar loops.
The tutorial starts with a straightforward function, square_numbers, that takes a list of numbers, builds a result list by appending each square, and returns the completed list. Calling it with [1, 2, 3, 4, 5] produces [1, 4, 9, 16, 25]. Converting this into a generator removes the result list and the return statement; instead, the function yields each computed square using yield. Once yield is in place, printing the variable no longer shows the full list. It shows a generator object, because the generator doesn’t compute or store all squares up front. Values are generated only when requested.
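The two versions described in the tutorial can be sketched side by side as follows (the `_list` suffix is added here only to keep both definitions in one file; the tutorial reuses the same name):

```python
# List-building version: computes and stores every square before returning.
def square_numbers_list(nums):
    result = []
    for i in nums:
        result.append(i * i)
    return result

# Generator version: no result list, no return; each square is yielded on demand.
def square_numbers(nums):
    for i in nums:
        yield i * i

print(square_numbers_list([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
print(square_numbers([1, 2, 3, 4, 5]))       # <generator object ...>
```

Printing the generator shows only a generator object, because nothing has been computed yet.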
To demonstrate on-demand behavior, the tutorial uses next() to pull values from the generator. The first next() call returns 1, the second returns 4, and subsequent calls continue yielding 9, 16, and 25. A further call triggers a StopIteration exception, signaling that the generator is exhausted. For typical usage, a for loop is preferred: for num in mynums iterates through yielded values and stops cleanly without exposing StopIteration to the user.
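A minimal sketch of that on-demand behavior (`my_nums` is the variable name used in the tutorial):

```python
def square_numbers(nums):
    for i in nums:
        yield i * i

my_nums = square_numbers([1, 2, 3, 4, 5])

print(next(my_nums))  # 1
print(next(my_nums))  # 4
print(next(my_nums))  # 9
print(next(my_nums))  # 16
print(next(my_nums))  # 25
# One more next(my_nums) would raise StopIteration: the generator is exhausted.

# The usual way to consume a generator: the for loop catches StopIteration
# internally and simply stops.
for num in square_numbers([1, 2, 3, 4, 5]):
    print(num)
```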
Readability is presented as an immediate advantage over the list-building approach. The generator version reads like a direct transformation: for each number in the input, yield the square. The tutorial also connects generators to list comprehensions. A list comprehension can compute squares with x * x for x in [1, 2, 3, 4, 5]. Switching from brackets to parentheses creates a generator expression that behaves similarly—still producing values lazily. If all values are needed at once, converting the generator to a list via list(generator) works, but it removes the memory and performance benefits because it forces materialization of every element.
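The comprehension-to-generator-expression switch is just a change of brackets:

```python
nums = [1, 2, 3, 4, 5]

# List comprehension: builds the full list immediately.
squares_list = [x * x for x in nums]
print(squares_list)  # [1, 4, 9, 16, 25]

# Generator expression: same syntax with parentheses, evaluated lazily.
squares_gen = (x * x for x in nums)
print(squares_gen)        # <generator object ...>

# list() forces materialization, giving up the memory advantage.
print(list(squares_gen))  # [1, 4, 9, 16, 25]
```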
The performance section makes the trade-off concrete using a benchmark that generates dictionaries representing people with random names and majors. One function returns a list of 1 million records; another returns a generator that yields the same dictionaries. The list-based approach increases memory by nearly 300 MB and takes about 1.2 seconds, because it allocates and stores all million items. The generator approach keeps memory nearly flat: it pauses at the first yield and produces items only as the loop consumes them, so the call itself returns almost instantly and the real work is deferred until iteration. Finally, the tutorial shows that converting the generator back into a list restores the list-like time and memory cost.
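A stdlib-only sketch of that benchmark, assuming the record shape from the tutorial. The tutorial measures 1 million records with an external memory profiler; this version uses `tracemalloc` and a smaller count, so the absolute numbers will differ by machine, but the gap between the two approaches should still be obvious:

```python
import random
import time
import tracemalloc

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

def people_list(num_people):
    # Allocates and stores every record before returning.
    result = []
    for i in range(num_people):
        person = {'id': i,
                  'name': random.choice(names),
                  'major': random.choice(majors)}
        result.append(person)
    return result

def people_generator(num_people):
    # Pauses at the first yield; produces one record per request.
    for i in range(num_people):
        yield {'id': i,
               'name': random.choice(names),
               'major': random.choice(majors)}

N = 100_000  # smaller than the tutorial's 1 million, same shape

tracemalloc.start()
t0 = time.perf_counter()
people = people_list(N)
list_time = time.perf_counter() - t0
_, list_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
t0 = time.perf_counter()
people = people_generator(N)  # returns immediately; nothing computed yet
gen_time = time.perf_counter() - t0
_, gen_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f'list:      {list_peak / 1024 / 1024:.1f} MiB peak, {list_time:.3f} s')
print(f'generator: {gen_peak / 1024:.1f} KiB peak, {gen_time:.6f} s')
```

Wrapping the generator in `list(people_generator(N))` would bring the memory and timing back in line with the list version, which is the tutorial's closing point.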
Overall, generators offer a practical way to process large sequences efficiently: clearer “yield as you go” code, lower memory usage, and faster startup—provided the results aren’t immediately forced into a list.
Cornell Notes
Generators use the yield keyword to produce values one at a time instead of building a complete list up front. In the square_numbers example, switching from appending to a result list to yielding each square turns the function into a lazy generator that only computes when values are requested (via next() or a for loop). A for loop consumes generators cleanly without needing to handle StopIteration manually. Generators are especially beneficial for large workloads because they avoid storing millions of items in memory. Converting a generator to a list (list(generator)) forces materialization and removes the memory/performance advantages.
How does yield change the behavior of square_numbers compared with returning a list?
What does StopIteration mean when using next() on a generator?
Why is a for loop the preferred way to consume generators in most cases?
How do generator expressions relate to list comprehensions?
When do generators lose their performance and memory advantages?
What benchmark result illustrates the memory benefit of generators?
Review Questions
- In the generator version of square_numbers, what code elements are removed or replaced compared with the list-returning version?
- How would you predict the behavior of a generator if you call next() repeatedly until it raises StopIteration?
- What happens to memory usage when you wrap a generator with list(), and why?
Key Points
1. Use yield to turn a function into a generator that produces one value at a time on demand.
2. Generators don’t compute or store the entire result up front, which changes what you see when you print the generator object.
3. next() pulls individual yielded values and eventually raises StopIteration when the generator is exhausted.
4. for loops consume generators cleanly and stop automatically without exposing StopIteration.
5. Generator expressions (parentheses) mirror list comprehensions (brackets) but remain lazy.
6. Converting a generator to a list forces full materialization, eliminating the memory and performance benefits.
7. For large datasets, generators can dramatically reduce memory usage and improve startup time by deferring work until iteration.