Python Tutorial: Zip Files - Creating and Extracting Zip Archives
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use `zipfile.ZipFile(path, 'w')` with `write()` to create ZIP archives from specific files.
Briefing
Python can create, compress, and extract ZIP archives using the standard library’s `zipfile` module, and it can also handle archive workflows end-to-end inside automation scripts—downloading archives over the internet, saving them locally, and unpacking their contents. The core takeaway is that ZIP handling in Python is straightforward: open a ZIP for writing, add files with `write()`, then close it (preferably via a context manager). For extraction, open the archive in read mode, inspect contents with `namelist()`, and extract everything with `extractall()` or a single file with `extract()`.
The tutorial starts with local files to build intuition. A text file (`test.txt`) and an image (`thumbnail.png`) are added to a new archive named `files.zip` by opening `zipfile.ZipFile(..., 'w')` and calling `my_zip.write()` for each file. After running the script, the archive appears in the working directory and can be extracted by the operating system to confirm the contents. The workflow is then improved by switching from manual `close()` calls to a context manager (`with zipfile.ZipFile(...) as my_zip:`), which automatically handles closing even if errors occur.
Compression is addressed next. By default, the ZIP created by `zipfile` isn’t necessarily compressed in a way that’s noticeable for small archives. To control compression, the script passes `compression=zipfile.ZIP_DEFLATED` when opening the archive for writing. The same archive can then be read back: change the mode from `'w'` to `'r'`, list members with `my_zip.namelist()`, and extract with `my_zip.extractall('files')`. The tutorial also demonstrates selective extraction—extracting only `thumbnail.png` from a ZIP that contains multiple files.
A second approach uses `shutil`, which simplifies archiving for whole directories. With `shutil.make_archive(base_name, format, root_dir=...)`, the script creates a ZIP from an entire folder in a single call (e.g., producing `another.zip` from the `files` directory). Extraction is equally simple using `shutil.unpack_archive(archive_path, extract_dir)`. `shutil` also supports other archive formats by changing the `format` argument, including `tar`, `gztar` (producing `.tar.gz`), `bztar`, and `xztar`.
Finally, the tutorial moves from local archives to downloading ZIP files online. It uses `requests` to fetch a ZIP file from a URL, writes the response bytes to disk in binary mode, then opens the saved file with `zipfile.ZipFile` to print `namelist()` and (optionally) extract. A hiccup occurs with the Stack Overflow 2019 download link, producing a “bad zip file / file is not a zip file” error, so the tutorial switches to a GitHub repository ZIP download as a working fallback to demonstrate the same pipeline successfully.
Overall, the practical value is automation: scripts can fetch archives, verify their contents, and extract data without manual clicking—ideal for repeatable data ingestion tasks like downloading datasets and preparing them for analysis.
Cornell Notes
ZIP archives are easy to manage in Python with `zipfile` and `shutil`. Use `zipfile.ZipFile('files.zip', 'w')` plus `write()` to add files, and prefer a context manager so the archive closes safely. For extraction, open with mode `'r'`, inspect contents via `namelist()`, then use `extractall()` or `extract()` for a single member. Compression can be enabled by passing `compression=zipfile.ZIP_DEFLATED`. For whole-directory archiving, `shutil.make_archive()` and `shutil.unpack_archive()` provide one-line creation and extraction, and they also support formats like `gztar` (`.tar.gz`). This matters because it enables fully automated download-and-unpack pipelines for datasets.
How does Python create a ZIP archive from individual files using `zipfile`?
What changes when extracting a ZIP archive with `zipfile`?
How is compression enabled when writing a ZIP file?
Why use a context manager with `zipfile` instead of manually calling `close()`?
How does `shutil` simplify archiving compared with `zipfile`?
What’s the workflow for downloading a ZIP over the internet and extracting it?
Review Questions
- When writing a ZIP with `zipfile`, what mode should be used, and how do you add files to the archive?
- How do `extractall()` and `extract()` differ in `zipfile` usage?
- What `shutil` function pair creates and unpacks archives, and how do you specify the archive format (e.g., `gztar`)?
Key Points
- 1
Use `zipfile.ZipFile(path, 'w')` with `write()` to create ZIP archives from specific files.
- 2
Prefer `with zipfile.ZipFile(...) as my_zip:` so the archive closes automatically and safely.
- 3
Enable compression explicitly with `compression=zipfile.ZIP_DEFLATED` when creating ZIPs.
- 4
Extract ZIP contents by opening in read mode (`'r'`), listing members with `namelist()`, and using `extractall()` or `extract()` for targeted extraction.
- 5
Use `shutil.make_archive()` and `shutil.unpack_archive()` for one-line creation and extraction of whole directories.
- 6
For online ZIPs, download with `requests`, save bytes in binary mode (`'wb'`), then open the saved file with `zipfile.ZipFile` to verify and extract contents.