Python Tutorial: Zip Files - Creating and Extracting Zip Archives

TL;DR

Use `zipfile.ZipFile(path, 'w')` with `write()` to create ZIP archives from specific files.

Briefing Cornell Notes

Briefing

Python can create, compress, and extract ZIP archives using the standard library’s `zipfile` module, and it can also handle archive workflows end-to-end inside automation scripts—downloading archives over the internet, saving them locally, and unpacking their contents. The core takeaway is that ZIP handling in Python is straightforward: open a ZIP for writing, add files with `write()`, then close it (preferably via a context manager). For extraction, open the archive in read mode, inspect contents with `namelist()`, and extract everything with `extractall()` or a single file with `extract()`.

The tutorial starts with local files to build intuition. A text file (`test.txt`) and an image (`thumbnail.png`) are added to a new archive named `files.zip` by opening `zipfile.ZipFile(..., 'w')` and calling `my_zip.write()` for each file. After running the script, the archive appears in the working directory and can be extracted by the operating system to confirm the contents. The workflow is then improved by switching from manual `close()` calls to a context manager (`with zipfile.ZipFile(...) as my_zip:`), which automatically handles closing even if errors occur.

Compression is addressed next. By default, the ZIP created by `zipfile` isn’t necessarily compressed in a way that’s noticeable for small archives. To control compression, the script passes `compression=zipfile.ZIP_DEFLATED` when opening the archive for writing. The same archive can then be read back: change the mode from `'w'` to `'r'`, list members with `my_zip.namelist()`, and extract with `my_zip.extractall('files')`. The tutorial also demonstrates selective extraction—extracting only `thumbnail.png` from a ZIP that contains multiple files.

A second approach uses `shutil`, which simplifies archiving for whole directories. With `shutil.make_archive(base_name, format, root_dir=...)`, the script creates a ZIP from an entire folder in a single call (e.g., producing `another.zip` from the `files` directory). Extraction is equally simple using `shutil.unpack_archive(archive_path, extract_dir)`. `shutil` also supports other archive formats by changing the `format` argument, including `tar`, `gztar` (producing `.tar.gz`), `bztar`, and `xztar`.

Finally, the tutorial moves from local archives to downloading ZIP files online. It uses `requests` to fetch a ZIP file from a URL, writes the response bytes to disk in binary mode, then opens the saved file with `zipfile.ZipFile` to print `namelist()` and (optionally) extract. A hiccup occurs with the Stack Overflow 2019 download link, producing a “bad zip file / file is not a zip file” error, so the tutorial switches to a GitHub repository ZIP download as a working fallback to demonstrate the same pipeline successfully.

Overall, the practical value is automation: scripts can fetch archives, verify their contents, and extract data without manual clicking—ideal for repeatable data ingestion tasks like downloading datasets and preparing them for analysis.

Cornell Notes

ZIP archives are easy to manage in Python with `zipfile` and `shutil`. Use `zipfile.ZipFile('files.zip', 'w')` plus `write()` to add files, and prefer a context manager so the archive closes safely. For extraction, open with mode `'r'`, inspect contents via `namelist()`, then use `extractall()` or `extract()` for a single member. Compression can be enabled by passing `compression=zipfile.ZIP_DEFLATED`. For whole-directory archiving, `shutil.make_archive()` and `shutil.unpack_archive()` provide one-line creation and extraction, and they also support formats like `gztar` (`.tar.gz`). This matters because it enables fully automated download-and-unpack pipelines for datasets.

How does Python create a ZIP archive from individual files using `zipfile`?

Create a `ZipFile` object in write mode: `with zipfile.ZipFile('files.zip', 'w') as my_zip:`. Then add each file with `my_zip.write('test.txt')` and `my_zip.write('thumbnail.png')`. When the `with` block ends, the archive is closed automatically, and the resulting `files.zip` contains those files.

What changes when extracting a ZIP archive with `zipfile`?

Open the existing archive in read mode: `with zipfile.ZipFile('files.zip', 'r') as my_zip:`. To see what’s inside, call `my_zip.namelist()` (note the parentheses). To extract everything, use `my_zip.extractall('files')`. To extract only one file, use `my_zip.extract('thumbnail.png')`.

How is compression enabled when writing a ZIP file?

When opening the archive for writing, pass a compression parameter: `compression=zipfile.ZIP_DEFLATED`. The script demonstrates that without this parameter, the ZIP may not be meaningfully compressed, while adding `ZIP_DEFLATED` produces a compressed archive.

Why use a context manager with `zipfile` instead of manually calling `close()`?

The tutorial shows that manual `my_zip.close()` works, but it’s easy to forget or mishandle in larger scripts. A context manager (`with zipfile.ZipFile(...) as my_zip:`) guarantees the archive is properly closed when the block exits, even if errors occur, making automation more reliable.

How does `shutil` simplify archiving compared with `zipfile`?

`shutil` focuses on whole directories. One call creates an archive: `shutil.make_archive('another', 'zip', root_dir='files')`, producing `another.zip`. Extraction is similarly direct: `shutil.unpack_archive('another.zip', 'another')`. This avoids manually iterating over files when the goal is directory-level archiving.

What’s the workflow for downloading a ZIP over the internet and extracting it?

Use `requests` to download bytes from a URL, then write them to disk in binary mode: `with open('data.zip', 'wb') as f: f.write(r.content)`. After saving, open the file with `zipfile.ZipFile('data.zip', 'r')` and inspect contents with `namelist()`. Extraction uses the same `extractall()` / `extract()` methods as local ZIPs. The tutorial notes a “bad zip file / not a zip file” error for the Stack Overflow link, then demonstrates the same approach with a GitHub ZIP download that works.

Review Questions

When writing a ZIP with `zipfile`, what mode should be used, and how do you add files to the archive?
How do `extractall()` and `extract()` differ in `zipfile` usage?
What `shutil` function pair creates and unpacks archives, and how do you specify the archive format (e.g., `gztar`)?

Key Points

1
Use `zipfile.ZipFile(path, 'w')` with `write()` to create ZIP archives from specific files.
2
Prefer `with zipfile.ZipFile(...) as my_zip:` so the archive closes automatically and safely.
3
Enable compression explicitly with `compression=zipfile.ZIP_DEFLATED` when creating ZIPs.
4
Extract ZIP contents by opening in read mode (`'r'`), listing members with `namelist()`, and using `extractall()` or `extract()` for targeted extraction.
5
Use `shutil.make_archive()` and `shutil.unpack_archive()` for one-line creation and extraction of whole directories.
6
For online ZIPs, download with `requests`, save bytes in binary mode (`'wb'`), then open the saved file with `zipfile.ZipFile` to verify and extract contents.

Highlights

A context manager around `zipfile.ZipFile` removes the need for manual `close()` calls and makes ZIP handling safer in scripts.

Selective extraction is as simple as `my_zip.extract('thumbnail.png')`, even when the archive contains many files.

`shutil.make_archive()` can zip an entire directory in one line, while `shutil.unpack_archive()` extracts it just as directly.

Compression in `zipfile` requires an explicit `compression=zipfile.ZIP_DEFLATED` parameter.

When downloading ZIPs, a “not a zip file” error usually means the saved bytes aren’t actually a valid ZIP—so verifying the download URL/response matters.

Topics

Zipfile Module
Archive Compression
Context Managers
Shutil Archives
Downloading ZIPs

Mentioned

Corey Schafer
ZIP_DEFLATED