
Python Multiprocessing Tutorial: Run Code in Parallel Using the Multiprocessing Module

Corey Schafer · 5 min read

Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Multiprocessing speeds up independent workloads by running them in parallel across separate processes, often reducing wall-clock time toward the slowest task duration rather than the sum of durations.

Briefing

Multiprocessing speeds up Python workloads by running multiple tasks at the same time across separate processes—often cutting wall-clock time dramatically when work can be parallelized. The tutorial starts with a simple “sleep” function to show the difference between synchronous execution (tasks run back-to-back) and parallel execution (tasks run concurrently on different processes), then scales up to a practical image-processing pipeline where parallelism meaningfully reduces runtime.

In the baseline example, a function prints messages, sleeps for one second, and prints again. Running it once takes about one second; running it twice takes about two seconds—clear evidence of synchronous behavior. That sets up the core motivation: if independent tasks can run in parallel, multiprocessing can reduce total runtime from roughly the sum of task durations toward the duration of the slowest task.
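
The synchronous baseline can be sketched like this (function name and messages follow the video; timings are approximate):

```python
import time


def do_something():
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")


start = time.perf_counter()
do_something()
do_something()  # runs only after the first call has finished
finish = time.perf_counter()

elapsed = finish - start
# Roughly 2 seconds: the calls run back-to-back, so durations add up
print(f"Finished in {round(elapsed, 2)} second(s)")
```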

The tutorial then contrasts threading and multiprocessing through the lens of CPU-bound vs. I/O-bound work. CPU-bound tasks (heavy number-crunching) benefit from multiprocessing because separate processes can use multiple CPU cores simultaneously, sidestepping Python's Global Interpreter Lock (GIL), which prevents threads within a single process from executing Python bytecode in parallel. I/O-bound tasks (waiting on file or network operations) may also benefit, though the best choice depends on the workload and hardware. The key takeaway is that multiprocessing spreads work across multiple processors, while threading may not deliver the same speed-up for CPU-heavy tasks.

Next comes the mechanics. The “older way” manually creates multiprocessing.Process objects, starts them with .start(), and waits for completion with .join(). Without join, the main script continues immediately and may report completion before child processes finish—an important pitfall. When join is used correctly, the script waits until all processes complete, producing an accurate runtime measurement.

To run the same function many times, the tutorial shows how to create processes in a loop, store them in a list, and join them afterward so all tasks start before any waiting begins. It also demonstrates passing arguments to worker functions: multiprocessing requires arguments to be picklable, so values must be serializable in order to be sent to the child process and reconstructed there.

For a cleaner approach, the tutorial switches to concurrent.futures.ProcessPoolExecutor. Using executor.submit schedules work and returns Future objects; calling future.result() blocks until completion. A context manager ensures processes are cleaned up and joined automatically. For bulk parallelism, executor.map applies a function across an iterable and returns results in input order, while concurrent.futures.as_completed yields futures as they finish—useful when task durations vary.

The real-world example processes 15 high-resolution images using Pillow: opening each image, applying a Gaussian blur, resizing via thumbnail, and saving to a processed folder. The synchronous version takes about 22 seconds. Rewriting the loop into a single-image worker function and using ProcessPoolExecutor.map to process filenames in parallel reduces runtime to about 7 seconds, with multiple Python processes visible in the system’s activity monitor.

Finally, the tutorial tests the same workload with ThreadPoolExecutor and finds it performs comparably (about 7.2 seconds versus about 7 for processes), reinforcing the rule of thumb: benchmark both threading and multiprocessing for your specific mix of CPU work and disk I/O. The practical conclusion is straightforward—use multiprocessing when parallelism and CPU usage justify it, but validate with measurements because I/O-heavy pipelines can behave differently than expected.

Cornell Notes

Multiprocessing can cut Python runtime by running independent tasks in parallel across multiple processes, which can use multiple CPU cores. The tutorial first demonstrates synchronous execution with a sleep-based function, then shows how .start() and .join() affect timing when using multiprocessing.Process. It then moves to concurrent.futures.ProcessPoolExecutor, where submit() returns Future objects and map() or as_completed() provide structured ways to run many tasks and collect results. A Pillow-based image-processing example shows a real speed-up (about 22 seconds down to about 7 seconds) when processing images in parallel, but a thread-based run can be similarly fast, so benchmarking matters.

Why does multiprocessing reduce runtime compared with running tasks sequentially?

Sequential code runs one task after another, so total time grows roughly with the sum of task durations. With multiprocessing, independent tasks are split across separate processes and can execute simultaneously on different CPU cores. In the tutorial’s sleep example, running the one-second task twice takes about 2 seconds synchronously, while running two instances in parallel completes in about 1 second once processes are started and joined correctly.

What goes wrong if a script starts multiprocessing.Process workers but doesn’t call join()?

Without join(), the main script continues immediately and may compute “finish time” and print completion messages before child processes finish sleeping and printing. The tutorial demonstrates this by showing near-zero reported runtime even though the child processes still run afterward. Adding p1.join() and p2.join() forces the main flow to wait until both processes complete.

How do you pass arguments to a multiprocessing worker function?

Worker functions can accept parameters like seconds, but the arguments must be pickle-serializable so they can be reconstructed in another process. The tutorial modifies do_something(seconds) and passes args via the process creation step using args=[1.5]. The measured runtime changes accordingly (about 1.5 seconds for repeated tasks).

When should you use ProcessPoolExecutor with submit(), map(), or as_completed()?

submit() schedules one task and returns a Future; calling future.result() waits for completion and retrieves the return value. map() applies a function across an iterable and returns results in input order, which can differ from completion order. as_completed() yields futures as they finish, letting code react to faster tasks first—useful when task durations vary, such as sleeping for different numbers of seconds.

How does the image-processing example demonstrate multiprocessing’s value?

The synchronous Pillow pipeline processes 15 images one at a time (open → Gaussian blur → thumbnail resize → save), taking about 22 seconds. Converting the loop into a single-image worker and using ProcessPoolExecutor.map to process filenames in parallel reduces runtime to about 7 seconds, and multiple Python processes appear in the activity monitor during execution.

Why might threading be competitive even when multiprocessing is expected to win?

The tutorial’s image workload includes substantial disk I/O (opening and saving files). Even with some image filtering, the overall bottleneck may be I/O rather than pure CPU computation. In that case, ThreadPoolExecutor can perform similarly or even faster (about 7.2 seconds), so the “threads for I/O, processes for CPU” rule still benefits from benchmarking on real data.

Review Questions

  1. In the manual multiprocessing approach, what is the specific role of join() in producing correct timing results?
  2. How do pickle-serializable arguments constrain what can be passed to multiprocessing workers?
  3. What behavioral difference between map() and as_completed() affects the order in which results are processed?

Key Points

  1. Multiprocessing speeds up independent workloads by running them in parallel across separate processes, often reducing wall-clock time toward the slowest task duration rather than the sum of durations.
  2. Synchronous execution runs tasks back-to-back; a sleep-based benchmark makes the difference obvious before moving to real workloads.
  3. Manual multiprocessing requires .start() to launch workers and .join() to wait; skipping join can produce misleading “finished” timing.
  4. Passing arguments to multiprocessing workers requires pickle-serializable values, since data must be transferred between processes.
  5. concurrent.futures.ProcessPoolExecutor simplifies parallelism: submit() returns Futures, map() returns results in input order, and as_completed() yields results as tasks finish.
  6. A real Pillow image-processing pipeline can drop from ~22 seconds to ~7 seconds when images are processed in parallel, but threading can still be competitive when disk I/O dominates.
  7. The best choice between threads and processes depends on the workload mix and hardware, so testing on a subset of real inputs is the safest path.

Highlights

Skipping join() after starting multiprocessing workers can make the main script report completion before child processes finish, producing incorrect runtime measurements.
ProcessPoolExecutor.map returns results in the order tasks were submitted, while as_completed returns them in the order tasks finish—critical when durations vary.
Parallelizing the Pillow workflow across 15 images reduced runtime from about 22 seconds to about 7 seconds by processing images concurrently.
ThreadPoolExecutor performed similarly (about 7.2 seconds), underscoring that I/O-heavy pipelines may not benefit from multiprocessing as much as CPU-only workloads.

Topics

  • Multiprocessing
  • ProcessPoolExecutor
  • Threading vs Multiprocessing
  • Pillow Image Processing
  • concurrent.futures

Mentioned

  • CPU
  • I/O