Python Tutorial: Automate Parsing and Renaming of Multiple Files

Corey Schafer · 4 min read

Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use `os.chdir` and `os.getcwd` to ensure the script operates in the correct folder before renaming anything.

Briefing

A practical Python script can fix messy, alphabetically sorted video filenames by renaming hundreds of files so they play in the intended numeric order on a phone playlist. The core issue is that downloaded course videos use a title-first naming pattern (title + dash + course name + number), which breaks ordering when devices sort strings alphabetically. The solution is to parse each filename, extract the embedded sequence number, and rebuild a new filename that starts with that number—optionally zero-padded—so “1, 2, 3 … 10” sorts correctly.

The workflow starts by using Python’s `os` module to work directly with the filesystem. The script changes into the folder containing the video files (`os.chdir(...)`), verifies the working directory with `os.getcwd()`, and lists the directory contents (`os.listdir(...)`) to confirm the filenames being processed. From there, each filename is split into its base name and extension using `os.path.splitext`, producing two parts: the name without the file type and the extension itself.
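These first steps might look roughly like the sketch below. The filenames are hypothetical, and a temporary directory stands in for the real video folder so the snippet can run anywhere without touching existing files.

```python
import os
import tempfile

original_dir = os.getcwd()
parts = []

# Hypothetical stand-in for the real video folder: a temporary directory
# holding two empty files named in the title-first pattern.
with tempfile.TemporaryDirectory() as video_dir:
    for sample in ("Earth - Our Solar System - #4.mp4",
                   "Mars - Our Solar System - #5.mp4"):
        open(os.path.join(video_dir, sample), "w").close()

    os.chdir(video_dir)      # work inside the folder that holds the files
    try:
        print(os.getcwd())   # confirm the working directory before going further
        for f in sorted(os.listdir()):   # sorted only to keep the output stable
            file_name, file_ext = os.path.splitext(f)
            parts.append((file_name, file_ext))
            print(file_name, file_ext)
    finally:
        os.chdir(original_dir)   # restore the previous working directory
```

In a real run, `os.chdir` would point at the actual folder of downloaded videos, and the temporary-directory setup would not be needed.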

Next comes the parsing logic. The base filename is split on hyphens, yielding three components: the title, the course name, and the numeric sequence token. Those pieces are assigned to variables (e.g., `f_title`, `f_course`, `f_num`) and then recombined using a formatted string. The initial rebuild places the number first, followed by the course name and title, and finally the extension. Splitting on the hyphen alone leaves whitespace attached to each piece, which shows up as minor formatting issues such as a stray space before the extension; calling `.strip()` on each piece corrects them.
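The parsing step can be sketched as follows. The sample filename is an assumption chosen to match the title, course, number pattern described here; the variable names follow the ones mentioned in the text.

```python
import os

# Hypothetical filename in the title-first pattern the summary describes.
f = "Earth - Our Solar System - #4.mp4"

file_name, file_ext = os.path.splitext(f)

# Splitting on '-' yields exactly three pieces for this pattern; a title
# that itself contains a hyphen would break the three-way unpacking.
f_title, f_course, f_num = file_name.split('-')

# Each piece keeps the spaces that surrounded the hyphens, so strip them.
f_title = f_title.strip()
f_course = f_course.strip()
f_num = f_num.strip()

# First rebuild: number, then course, then title, then the extension.
new_name = '{}-{}-{}{}'.format(f_num, f_course, f_title, file_ext)
print(new_name)  # #4-Our Solar System-Earth.mp4
```

At this stage the number token still carries its leading number sign, and single digits are not yet padded.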

Two refinements address ordering and personal preference. First, the number token begins with an unwanted number sign (`#`), so once the surrounding whitespace has been stripped, the script slices off the first character with `f_num[1:]`. Second, filenames are compared character by character, so an unpadded “10” sorts before “2”. To prevent that, the script zero-pads single-digit numbers with `zfill(2)`, turning “1” into “01” and “2” into “02” while leaving “10” unchanged. After that, the course name can be removed entirely from the new filename, leaving a cleaner format that still sorts correctly.
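A small sketch of the two refinements, applied to a hypothetical token as it would come out of the hyphen split. The order of operations matters: if the slice ran before the strip, `[1:]` would drop a space rather than the number sign.

```python
raw_token = " #4"   # hypothetical token, still carrying a leading space

f_num = raw_token.strip()   # '#4' : remove surrounding whitespace first
f_num = f_num[1:]           # '4'  : drop the leading number sign
f_num = f_num.zfill(2)      # '04' : pad with zeros to two characters

# Hypothetical title and extension; the course name is simply left out.
f_title, file_ext = "Earth", ".mp4"
new_name = '{}-{}{}'.format(f_num, f_title, file_ext)
print(new_name)  # 04-Earth.mp4
```

A width of 2 only covers numbers up to 99. A folder with 100 or more numbered files would need a wider pad, such as `zfill(3)`, for the same reason that “10” needs “02” rather than “2”.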

Finally, the script performs the rename operation inside a loop. For each original filename, it computes `new_name` and calls `os.rename(original_file, new_name)`. Running it once updates the entire directory, eliminating the tedium and error risk of manual renaming. The broader takeaway is that small, targeted scripts—built step by step with validation prints—can automate repetitive file-management tasks reliably and can be reused when similar naming problems show up again.
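Putting the pieces together, the whole script could look something like this. It is a sketch under stated assumptions rather than the exact script from the video: the helper name `build_new_name`, the sample filenames, and the temporary directory are all hypothetical.

```python
import os
import tempfile

def build_new_name(f):
    """Turn 'Title - Course - #N.ext' into 'NN-Title.ext' (assumed pattern)."""
    file_name, file_ext = os.path.splitext(f)
    f_title, f_course, f_num = file_name.split('-')  # f_course is dropped
    f_title = f_title.strip()
    f_num = f_num.strip()[1:].zfill(2)
    return '{}-{}{}'.format(f_num, f_title, file_ext)

original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as video_dir:
    # Hypothetical sample files standing in for the downloaded videos.
    for sample in ("Mercury - Our Solar System - #2.mp4",
                   "Earth - Our Solar System - #4.mp4",
                   "Pluto - Our Solar System - #10.mp4"):
        open(os.path.join(video_dir, sample), "w").close()

    os.chdir(video_dir)
    try:
        for f in os.listdir():       # listdir returns a full list up front
            new_name = build_new_name(f)
            print(f, '->', new_name)  # validation print before renaming
            os.rename(f, new_name)
        renamed = sorted(os.listdir())
    finally:
        os.chdir(original_dir)

print(renamed)  # ['02-Mercury.mp4', '04-Earth.mp4', '10-Pluto.mp4']
```

Commenting out the `os.rename` line and reading the printed pairs first is a cheap way to check the new names before committing to them.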

Cornell Notes

The script automates renaming course video files so they sort—and therefore play—in the correct numeric order on a phone. It uses `os` to change directories, list files, split filenames into base name and extension with `os.path.splitext`, then parse the base name by hyphens into title, course, and sequence number. It rebuilds a new filename that starts with the sequence number, removes an unwanted leading character from that number, and uses `zfill(2)` so “01, 02, … 10” sorts properly. The final step loops through every file and applies `os.rename` to replace the old names in one pass, saving time and reducing mistakes.

Why does alphabetic sorting break the intended video order, and how does the rename fix it?

The original filenames place the title first, then a course name, then a sequence token. When a phone playlist sorts filenames alphabetically, the ordering depends on the first characters (the title), not the embedded sequence number. The script rebuilds filenames so the sequence number appears at the beginning, making alphabetical sorting align with the intended watch order.

What is the role of `os.path.splitext` in the script?

`os.path.splitext(filename)` separates the filename into two parts: the base name (everything before the extension) and the extension (like `.mp4`). This lets the script safely rearrange the base name while preserving the original file type when constructing the new name.
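A few illustrative calls show how `os.path.splitext` behaves. The filenames are hypothetical; note that only the last dot-separated suffix counts as the extension.

```python
import os

# The extension keeps its leading dot, so it can be appended back as-is.
video = os.path.splitext("Earth - Our Solar System - #4.mp4")
print(video)     # ('Earth - Our Solar System - #4', '.mp4')

# Only the final suffix is split off.
archive = os.path.splitext("archive.tar.gz")
print(archive)   # ('archive.tar', '.gz')

# A name with no dot yields an empty extension.
plain = os.path.splitext("README")
print(plain)     # ('README', '')
```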

How does the script extract the sequence number from each filename?

It splits the base filename on hyphens (`file_name.split('-')`), producing three elements: title, course name, and the number token. The number token includes a leading character (shown as a number sign in the example), so the script later removes that first character using slicing (`f_num[1:]`).

Why is `zfill(2)` necessary for correct ordering?

Without padding, strings are compared character by character, so “10” sorts before “2” because its first character, “1”, precedes “2”. Using `f_num.zfill(2)` converts single digits into two-character strings (e.g., “1” → “01”, “2” → “02”), so “01, 02, … 10” sorts in the intended numeric sequence.
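The effect is easy to reproduce with plain strings. This short sketch sorts the numbers 1 through 11 as text, first unpadded and then padded:

```python
nums = [str(n) for n in range(1, 12)]   # '1' through '11'

# Unpadded strings sort character by character.
print(sorted(nums))
# ['1', '10', '11', '2', '3', '4', '5', '6', '7', '8', '9']

# Zero-padded strings sort in numeric order.
padded = [n.zfill(2) for n in nums]
print(sorted(padded))
# ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11']
```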

How does the script actually rename files, and where does the loop fit?

After computing `new_name` for each original filename, the script calls `os.rename(f, new_name)` inside a loop over the directory’s contents. That means every file in the folder gets updated in one run, replacing the old filename with the newly formatted one.
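One caveat is that a loop over every directory entry will fail on anything that does not match the expected pattern, since the three-way unpacking raises an error. A more defensive variant, sketched below, skips such entries. The helper `plan_renames` and the sample files are hypothetical, and this version uses full paths instead of changing directories.

```python
import os
import tempfile

def plan_renames(folder):
    """Return (old, new) pairs for files matching 'Title - Course - #N.ext'.

    Hypothetical helper: entries that are not regular files, do not split
    into three hyphen-separated parts, or lack a '#<digits>' token are skipped.
    """
    plan = []
    for f in sorted(os.listdir(folder)):
        if not os.path.isfile(os.path.join(folder, f)):
            continue
        file_name, file_ext = os.path.splitext(f)
        pieces = [p.strip() for p in file_name.split('-')]
        if len(pieces) != 3:
            continue
        f_title, _course, f_num = pieces
        if not (f_num.startswith('#') and f_num[1:].isdigit()):
            continue
        new_name = '{}-{}{}'.format(f_num[1:].zfill(2), f_title, file_ext)
        plan.append((f, new_name))
    return plan

with tempfile.TemporaryDirectory() as folder:
    # One file that matches the pattern and one that does not.
    for sample in ("Earth - Our Solar System - #4.mp4", "notes.txt"):
        open(os.path.join(folder, sample), "w").close()

    plan = plan_renames(folder)
    for old, new in plan:
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
    result = sorted(os.listdir(folder))

print(plan)    # [('Earth - Our Solar System - #4.mp4', '04-Earth.mp4')]
print(result)  # ['04-Earth.mp4', 'notes.txt']
```

Separating the planning step from the renaming step also makes it possible to inspect the full list of changes before any file is touched.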

Review Questions

  1. If the sequence number were already clean (no leading number sign), which line(s) would you remove or change in the script?
  2. What would likely happen to the sort order if you removed `zfill(2)` but kept the sequence number at the beginning of the filename?
  3. How would you modify the formatted string if you wanted the new filename to include the course name again (and still keep correct numeric sorting)?

Key Points

  1. Use `os.chdir` and `os.getcwd` to ensure the script operates in the correct folder before renaming anything.
  2. Validate inputs early by listing directory contents with `os.listdir` and printing filenames during development.
  3. Split extensions with `os.path.splitext` so renaming doesn’t break file types.
  4. Parse structured filenames by splitting on a consistent delimiter (hyphens) and assigning parts to variables.
  5. Remove unwanted characters from the extracted sequence token using string slicing (e.g., `f_num[1:]`).
  6. Pad numeric strings with `zfill(2)` to prevent lexicographic sorting errors like “10” appearing near “1”.
  7. Rename safely in a loop using `os.rename(original, new_name)` after constructing the final filename format.

Highlights

Renaming works because putting the sequence number at the start aligns filename sorting with intended playback order.
`zfill(2)` prevents the classic lexicographic problem where “10” sorts next to “1”.
The script preserves extensions by using `os.path.splitext`, then recombines the extension into the new name.
A single loop with `os.rename` can update hundreds of files in one pass, avoiding manual mistakes.
