Python Multiprocessing Tutorial: Run Code in Parallel Using the Multiprocessing Module
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Multiprocessing speeds up Python workloads by running multiple tasks at the same time across separate processes—often cutting wall-clock time dramatically when work can be parallelized. The tutorial starts with a simple “sleep” function to show the difference between synchronous execution (tasks run back-to-back) and parallel execution (tasks run concurrently on different processes), then scales up to a practical image-processing pipeline where parallelism meaningfully reduces runtime.
In the baseline example, a function prints messages, sleeps for one second, and prints again. Running it once takes about one second; running it twice takes about two seconds—clear evidence of synchronous behavior. That sets up the core motivation: if independent tasks can run in parallel, multiprocessing can reduce total runtime from roughly the sum of task durations toward the duration of the slowest tasks.
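That baseline can be sketched as follows (a minimal reconstruction of the tutorial's sleep example; the exact print wording is illustrative):

```python
import time

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping...")

if __name__ == "__main__":
    start = time.perf_counter()
    do_something()
    do_something()  # runs only after the first call has fully finished
    finish = time.perf_counter()
    # Synchronous execution: total time is the sum of the sleeps, roughly 2 seconds.
    print(f"Finished in {round(finish - start, 2)} second(s)")
```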
The tutorial then contrasts threading and multiprocessing through the lens of CPU-bound vs. I/O-bound work. CPU-bound tasks (heavy number-crunching) benefit from multiprocessing because separate processes can use multiple CPU cores simultaneously. I/O-bound tasks (waiting on file or network operations) are often served well by threading, since threads can overlap the waiting. The key takeaway is that multiprocessing spreads work across multiple processors, while threading cannot deliver the same speed-up for CPU-heavy tasks because CPython's global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time.
Next comes the mechanics. The “older way” manually creates multiprocessing.Process objects, starts them with .start(), and waits for completion with .join(). Without join, the main script continues immediately and may report completion before child processes finish—an important pitfall. When join is used correctly, the script waits until all processes complete, producing an accurate runtime measurement.
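A minimal sketch of the manual approach, reusing the sleep-based worker from above:

```python
import multiprocessing
import time

def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping...")

if __name__ == "__main__":
    start = time.perf_counter()

    p1 = multiprocessing.Process(target=do_something)
    p2 = multiprocessing.Process(target=do_something)

    p1.start()
    p2.start()

    # Without these joins, the main script would print "Finished" almost
    # immediately, while both workers are still sleeping in the background.
    p1.join()
    p2.join()

    finish = time.perf_counter()
    # Both processes sleep concurrently, so this reports roughly 1 second.
    print(f"Finished in {round(finish - start, 2)} second(s)")
```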
To run the same function many times, the tutorial shows how to create processes in a loop, store them in a list, and join them afterward so all tasks start before waiting. It also demonstrates passing arguments to worker functions: multiprocessing requires arguments to be pickle-serializable, so values must be convertible for inter-process communication.
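The loop-and-join pattern with an argument might look like this (the 1.5-second sleep and the count of ten are illustrative values):

```python
import multiprocessing
import time

def do_something(seconds):
    print(f"Sleeping {seconds} second(s)...")
    time.sleep(seconds)
    print("Done sleeping...")

if __name__ == "__main__":
    start = time.perf_counter()

    processes = []
    for _ in range(10):
        # Arguments must be pickle-serializable so they can be sent
        # to the child process.
        p = multiprocessing.Process(target=do_something, args=(1.5,))
        p.start()
        processes.append(p)

    # Join in a second loop so all ten workers are already running
    # before the main script starts waiting on any of them.
    for p in processes:
        p.join()

    finish = time.perf_counter()
    print(f"Finished in {round(finish - start, 2)} second(s)")
```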
For a cleaner approach, the tutorial switches to concurrent.futures.ProcessPoolExecutor. Using executor.submit schedules work and returns Future objects; calling future.result() blocks until completion. A context manager ensures processes are cleaned up and joined automatically. For bulk parallelism, executor.map applies a function across an iterable and returns results in input order, while concurrent.futures.as_completed yields futures as they finish—useful when task durations vary.
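A sketch of all three patterns together (`submit`, `as_completed`, and `map` are the real `concurrent.futures` API; the descending sleep times follow the tutorial's style):

```python
import concurrent.futures
import time

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping {seconds} second(s)"

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit() schedules one call and returns a Future immediately.
        future = executor.submit(do_something, 1)
        print(future.result())  # blocks until that one task finishes

        secs = [3, 2, 1]
        futures = [executor.submit(do_something, s) for s in secs]

        # as_completed() yields futures in finish order, not submit order,
        # so the 1-second task comes back first here.
        for f in concurrent.futures.as_completed(futures):
            print(f.result())

        # map() returns results in input order: 3, then 2, then 1.
        for result in executor.map(do_something, secs):
            print(result)
    # Leaving the context manager joins the pool automatically.
```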
The real-world example processes 15 high-resolution images using Pillow: opening each image, applying a Gaussian blur, resizing via thumbnail, and saving to a processed folder. The synchronous version takes about 22 seconds. Rewriting the loop into a single-image worker function and using ProcessPoolExecutor.map to process filenames in parallel reduces runtime to about 7 seconds, with multiple Python processes visible in the system’s activity monitor.
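The single-image worker plus `executor.map` could be sketched like this; the file names, the `processed/` output folder, the blur radius of 15, and the 1200×1200 thumbnail size are assumptions standing in for the tutorial's actual values:

```python
import concurrent.futures
from PIL import Image, ImageFilter

# Hypothetical file names; substitute your own images.
img_names = ["photo-1.jpg", "photo-2.jpg"]

def process_image(img_name):
    size = (1200, 1200)  # assumed thumbnail bound
    img = Image.open(img_name)
    img = img.filter(ImageFilter.GaussianBlur(15))
    img.thumbnail(size)          # resizes in place, preserving aspect ratio
    img.save(f"processed/{img_name}")
    return f"{img_name} was processed"

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each image is handled in its own worker process.
        for result in executor.map(process_image, img_names):
            print(result)
```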
Finally, the tutorial tests the same workload with ThreadPoolExecutor and finds it roughly as fast (about 7.2 seconds versus about 7), reinforcing the rule of thumb: benchmark both threading and multiprocessing for your specific mix of CPU work and disk I/O. The practical conclusion is straightforward: use multiprocessing when parallelism and CPU usage justify it, but validate with measurements, because I/O-heavy pipelines can behave differently than expected.
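Swapping executors is a one-line change, which makes benchmarking both easy. The helper below is a hypothetical wrapper, not part of the tutorial:

```python
import concurrent.futures

def run_with_threads(worker, items):
    """Run worker over items with a thread pool; swap in
    ProcessPoolExecutor to benchmark processes instead."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return list(executor.map(worker, items))
```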
Cornell Notes
Multiprocessing can cut Python runtime by running independent tasks in parallel across multiple processes, which can use multiple CPU cores. The tutorial first demonstrates synchronous execution with a sleep-based function, then shows how .start() and .join() affect timing when using multiprocessing.Process. It then moves to concurrent.futures.ProcessPoolExecutor, where submit() returns Future objects and map() or as_completed() provide structured ways to run many tasks and collect results. A Pillow-based image-processing example shows a real speed-up (about 22 seconds down to about 7 seconds) when processing images in parallel, but a thread-based run can be similarly fast, so benchmarking matters.
- Why does multiprocessing reduce runtime compared with running tasks sequentially?
- What goes wrong if a script starts multiprocessing.Process workers but doesn’t call join()?
- How do you pass arguments to a multiprocessing worker function?
- When should you use ProcessPoolExecutor with submit(), map(), or as_completed()?
- How does the image-processing example demonstrate multiprocessing’s value?
- Why might threading be competitive even when multiprocessing is expected to win?
Review Questions
- In the manual multiprocessing approach, what is the specific role of join() in producing correct timing results?
- How do pickle-serializable arguments constrain what can be passed to multiprocessing workers?
- What behavioral difference between map() and as_completed() affects the order in which results are processed?
Key Points
1. Multiprocessing speeds up independent workloads by running them in parallel across separate processes, often reducing wall-clock time toward the slowest task duration rather than the sum of durations.
2. Synchronous execution runs tasks back-to-back; a sleep-based benchmark makes the difference obvious before moving to real workloads.
3. Manual multiprocessing requires .start() to launch workers and .join() to wait; skipping join can produce misleading “finished” timing.
4. Passing arguments to multiprocessing workers requires pickle-serializable values, since data must be transferred between processes.
5. concurrent.futures.ProcessPoolExecutor simplifies parallelism: submit() returns Futures, map() returns results in input order, and as_completed() yields results as tasks finish.
6. A real Pillow image-processing pipeline can drop from ~22 seconds to ~7 seconds when images are processed in parallel, but threading can still be competitive when disk I/O dominates.
7. The best choice between threads and processes depends on the workload mix and hardware, so testing on a subset of real inputs is the safest path.