
Python Threading Tutorial: Run Code Concurrently Using the Threading Module

Corey Schafer · 5 min read

Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Threading provides speedups primarily for I/O-bound workloads where tasks spend time waiting on external operations like network responses or disk access.

Briefing

Threading in Python delivers real speedups when tasks spend most of their time waiting on input/output—like network requests—because threads overlap that waiting. In contrast, CPU-heavy work often gains little from threading and can even slow down due to the overhead of managing threads. The practical takeaway is to treat threading as an I/O-bound performance tool, not a general-purpose “run everything faster” switch.

The tutorial starts with a baseline: a simple function that sleeps for one second. Run once, it takes about one second; run twice synchronously, it takes about two seconds. That behavior matters because the “work” here isn’t CPU computation—it’s idle waiting. The next step introduces concurrency using the standard library’s threading module. Two Thread objects are created with a target function, started with start(), and synchronized with join() so timing is measured correctly. Without join(), the main script prints its “finished” message immediately after launching threads, even though the threads are still sleeping. With join(), the script waits until both threads complete, producing the expected ~1-second total runtime.
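The baseline and the two-thread version described above can be sketched as follows (the function name and exact prints are illustrative, following the pattern the summary describes rather than the video's exact code):

```python
import threading
import time

def do_something():
    """Simulated I/O-bound work: one second of idle waiting."""
    print("Sleeping 1 second...")
    time.sleep(1)
    print("Done sleeping")

start = time.perf_counter()

# Create two Thread objects with a target function.
t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)

t1.start()
t2.start()

# join() blocks until each thread completes, so the elapsed
# time below measures the real work, not just thread launch.
t1.join()
t2.join()

elapsed = time.perf_counter() - start
print(f"Finished in {elapsed:.2f} second(s)")  # ~1 second, not ~2
```

Running the two sleeps synchronously would take about two seconds; because both threads sleep at the same time, the total is roughly the duration of one sleep.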

To scale up, the tutorial shows how to start many threads (ten) without blocking on each one. Threads are appended to a list, then joined in a second loop. This turns a task that would normally take ~10 seconds sequentially (ten sleeps) into something that completes in about one second, demonstrating how overlapping I/O waits can compress total wall-clock time.
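The ten-thread pattern looks like this: start every thread in one loop, collect them in a list, then join them all in a second loop (a sketch of the pattern, not the video's verbatim code):

```python
import threading
import time

def do_something():
    time.sleep(1)

start = time.perf_counter()

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something)
    t.start()            # start immediately; do NOT join here
    threads.append(t)

# Joining in a separate loop lets all ten sleeps overlap.
# Joining inside the loop above would serialize them.
for t in threads:
    t.join()

elapsed = time.perf_counter() - start
print(f"Finished in {elapsed:.2f} second(s)")  # ~1 second, not ~10
```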

Next comes argument passing: the sleep duration becomes a parameter, and threads are created with args so each thread can run the same function with different inputs (e.g., 1.5 seconds). The tutorial then shifts from manual thread management to a cleaner pattern: concurrent.futures.ThreadPoolExecutor. Using submit() returns Future objects, and future.result() blocks until completion. For batch work, it uses list comprehensions to submit multiple jobs and concurrent.futures.as_completed() to process results in the order tasks finish. It also demonstrates executor.map(), which runs tasks concurrently but yields results in the original input order, and notes that exceptions surface when results are retrieved from the iterator.
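A compact sketch of the executor workflow described above, assuming a parameterized sleep function; the durations are chosen so the difference between as_completed() and map() ordering is visible:

```python
import concurrent.futures
import time

def do_something(seconds):
    time.sleep(seconds)
    return f"Done sleeping {seconds}"

with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit() schedules each call and returns a Future object.
    futures = [executor.submit(do_something, s) for s in [1.5, 1, 0.5]]

    # as_completed() yields futures as they finish,
    # so the 0.5-second task prints first here.
    for f in concurrent.futures.as_completed(futures):
        print(f.result())  # result() blocks until that task is done

    # map() also runs the tasks concurrently, but yields
    # results in the order of the input iterable.
    results = list(executor.map(do_something, [1.5, 1, 0.5]))
    print(results)
```

With manual threads, returning a value from a worker requires extra plumbing; Future objects make retrieving results (and any raised exceptions) part of the API.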

The real-world payoff comes with downloading 15 high-resolution images from Unsplash. A synchronous approach using requests downloads one image at a time and takes about 23 seconds. Refactoring the download loop into a single-image function and mapping it across URLs with ThreadPoolExecutor cuts the runtime to roughly five seconds—an outcome consistent with I/O-bound behavior, where threads keep the program busy while waiting on network responses.
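The refactored downloader follows this shape. The tutorial uses the third-party requests library; this sketch substitutes the standard library's urllib.request to stay dependency-free, and the URL list is a placeholder since the tutorial's 15 Unsplash URLs are not reproduced here:

```python
import concurrent.futures
import time
import urllib.request

# Placeholder: the tutorial's 15 Unsplash photo URLs go here.
img_urls = []

def download_image(img_url):
    """Fetch one image and save it under its last path segment."""
    img_name = img_url.split('/')[-1] + '.jpg'
    with urllib.request.urlopen(img_url) as resp, open(img_name, 'wb') as f:
        f.write(resp.read())
    return img_name

start = time.perf_counter()

# Threads overlap the network waits, so the total time is
# close to the slowest downloads rather than their sum.
with concurrent.futures.ThreadPoolExecutor() as executor:
    for name in executor.map(download_image, img_urls):
        print(f"{name} downloaded")

print(f"Finished in {time.perf_counter() - start:.2f} second(s)")
```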

Finally, the tutorial draws the boundary for when threading is the wrong tool: CPU-bound tasks like resizing and processing images rely on computation, where threads don’t bypass the limitations of the CPU and can add overhead. For that scenario, it points to multiprocessing as the next step, promising a follow-up focused on parallel processing for compute-heavy workloads.

Cornell Notes

Threading speeds up Python programs mainly when tasks are I/O bound—spending most time waiting on network, disk, or other external operations. A synchronous sleep example grows linearly in runtime, while threading overlaps the waiting so multiple sleeps complete in roughly the time of the longest sleep. Manual threading uses Thread(...), start(), and join() to ensure the main program waits for worker threads. For cleaner code and easier scaling, concurrent.futures.ThreadPoolExecutor provides submit(), as_completed(), and map() to run many tasks concurrently while managing synchronization automatically. The tutorial’s Unsplash image download case shows a drop from about 23 seconds to about five seconds when requests are parallelized with a thread pool.

Why does threading help with some tasks but not others?

Threading helps when the program is I/O bound: it repeatedly waits for input/output (e.g., network responses, file reads/writes). While one thread waits, another can run, so wall-clock time shrinks. For CPU-bound tasks—heavy number crunching—threads don’t eliminate the need for computation and can add overhead from creating and coordinating threads, sometimes making performance worse.

What’s the difference between starting threads and waiting for them to finish?

Calling start() launches the thread, but the main script continues immediately. That’s why timing can look wrong (e.g., printing “script finished” while threads are still sleeping). join() is the synchronization step: it blocks until each thread completes, so the program measures and reports completion only after worker threads finish.
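The misleading-timing behavior is easy to reproduce; in this sketch the first measurement is taken right after start(), before any join():

```python
import threading
import time

def do_something():
    time.sleep(1)

start = time.perf_counter()

t = threading.Thread(target=do_something)
t.start()

# No join() yet: this line runs almost immediately, while the
# worker thread is still sleeping, so the time looks near-zero.
elapsed_without_join = time.perf_counter() - start
print(f"Script 'finished' in {elapsed_without_join:.4f} second(s)")

t.join()  # now actually wait for the worker to complete
elapsed_with_join = time.perf_counter() - start
print(f"Thread really finished in {elapsed_with_join:.2f} second(s)")
```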

How do you run the same function concurrently many times without blocking?

Create multiple Thread objects in a loop, start each one, store them in a list, then join them in a second loop. This avoids joining inside the creation loop (which would serialize execution). The tutorial demonstrates this pattern with ten sleep tasks, reducing expected runtime from ~10 seconds to ~1 second.

How do ThreadPoolExecutor tools change the workflow compared with manual threads?

ThreadPoolExecutor uses submit() to schedule work and returns Future objects; future.result() waits for completion and retrieves return values. For handling many tasks, as_completed() yields futures as they finish (useful when completion order matters). executor.map() runs tasks concurrently but yields results in the same order as the input iterable, and exceptions surface when iterating over results.
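The exception-surfacing behavior can be sketched like this (the failing function is hypothetical, chosen only to trigger an error inside a worker):

```python
import concurrent.futures

def may_fail(n):
    if n == 0:
        raise ValueError("bad input")
    return 10 / n

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(may_fail, 0)
    # The exception is raised here, at result(), not at submit().
    try:
        future.result()
    except ValueError as e:
        print(f"caught: {e}")

    # With map(), the error surfaces while iterating the results.
    collected = []
    try:
        for r in executor.map(may_fail, [5, 0, 2]):
            collected.append(r)
    except ValueError:
        print(f"map() raised after yielding {collected}")
```

Note that with map(), results already yielded before the failing task are still collected; the exception interrupts iteration when the failing task's slot is reached.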

How does the Unsplash download example prove the I/O-bound claim?

A synchronous downloader fetches 15 images one at a time and takes about 23 seconds. Refactoring into a single-image download function and using ThreadPoolExecutor.map() to run downloads concurrently reduces runtime to about five seconds. The speedup aligns with network waiting: threads overlap the time spent waiting for responses.

Review Questions

  1. When would join() be necessary to get correct timing or correct program behavior in a threaded program?
  2. What practical difference does as_completed() make compared with executor.map() when collecting results?
  3. Why might threading slow down a program that resizes and processes images instead of downloading them?

Key Points

  1. Threading provides speedups primarily for I/O-bound workloads where tasks spend time waiting on external operations like network responses or disk access.
  2. Manual threading requires start() to launch work and join() to wait for completion; measuring runtime without join() can produce misleading results.
  3. To run many tasks concurrently, start all threads first (or submit all jobs), then join or collect results afterward rather than blocking inside the creation loop.
  4. ThreadPoolExecutor simplifies concurrency with submit() (Future objects), as_completed() (completion-order handling), and map() (input-order results).
  5. ThreadPoolExecutor.map() returns results in the original input order even though tasks run concurrently, while as_completed() yields results as soon as each task finishes.
  6. Exceptions raised inside threaded tasks typically surface when retrieving results (e.g., when calling future.result() or iterating over map() results).
  7. For CPU-bound, computation-heavy work, threading often offers little benefit and can add overhead; multiprocessing is the better next step.

Highlights

Two threads can cut total runtime for sleep-based tasks from ~2 seconds to ~1 second by overlapping waiting time.
Without join(), a program can report completion immediately after starting threads, even while worker threads are still running.
ThreadPoolExecutor.map() can download 15 Unsplash images in about five seconds versus roughly 23 seconds synchronously.
as_completed() lets results be processed in the order tasks finish, which differs from map()’s input-order behavior.
Threading is a strong fit for network and file I/O, but a poor fit for CPU-heavy image processing.

Topics

  • Threading vs Multiprocessing
  • I/O Bound Concurrency
  • ThreadPoolExecutor
  • Future Objects
  • Unsplash Image Downloads
