Python Tutorial: CSV Module - How to Read, Parse, and Write CSV Files
Based on Corey Schafer's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
CSV stands for comma-separated values, but the delimiter can be commas, tabs, dashes, or other characters as long as the parser matches it.
Briefing
CSV files store structured data as plain text, typically using a delimiter like commas to separate fields on each line. A header row names the columns (e.g., First name, Last name, Email), and each subsequent line holds the corresponding values. That simple format is exactly why CSV parsing matters: without a proper parser, names or fields that contain delimiter characters can break naive string-splitting approaches.
Python’s built-in `csv` module streamlines reading, parsing, and writing CSV data. For reading, the workflow starts by opening the file with a context manager and creating a `csv.reader` object. Iterating over that reader yields each row as a list of values, where the header row appears as the first list. With index-based access, the email is consistently the third element (index 2) when the file is structured as First name, Last name, Email. If the header row isn’t desired, the code can advance the iterator using `next(...)` to skip the first line before processing the remaining records.
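A minimal sketch of that reading workflow (the filename and sample rows are illustrative, not from the original video):

```python
import csv

# Create a small sample file so the sketch is self-contained.
with open('names.csv', 'w', newline='') as f:
    f.write('First name,Last name,Email\n')
    f.write('John,Doe,john-doe@bogusemail.com\n')

with open('names.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # advance past the header row so it isn't treated as data
    for line in csv_reader:
        print(line[2])  # email is the third field (index 2)
```

Remove the `next(csv_reader)` line and the header list `['First name', 'Last name', 'Email']` is yielded as the first row like any other.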
Writing CSV data follows a parallel pattern: open a new output file for writing, create a `csv.writer`, and pass the desired delimiter. When the delimiter changes—such as switching from commas to dashes—the output becomes harder to read, but it demonstrates an important safety feature: the writer automatically quotes fields that contain the delimiter character. In the example, an email containing a dash is wrapped in quotes so the dash inside the email doesn’t get mistaken for a field separator. Similarly, a hyphenated last name is quoted to preserve it as a single value. Using a more common delimiter like tabs (`\t`) produces a cleaner, readable file, and the same delimiter must be specified when reading back the data.
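A sketch of the writing side, assuming illustrative filenames and sample rows; note how a dash delimiter forces the writer to quote any field that itself contains a dash:

```python
import csv

rows = [['First name', 'Last name', 'Email'],
        ['Mary', 'Smith-Robinson', 'maryjacobs@bogusemail.com'],
        ['John', 'Doe', 'john-doe@bogusemail.com']]

# Dash delimiter: fields containing a dash are quoted automatically
# so they aren't mistaken for field separators.
with open('names_dash.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='-').writerows(rows)

# Tab delimiter: cleaner output; the same delimiter must be
# specified when reading the file back.
with open('names_tab.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)
```

Opening the dash-delimited file in a text editor shows `"Smith-Robinson"` and the emails wrapped in quotes, while fields without dashes are left bare.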
A key troubleshooting point emerges when the delimiter is wrong: reading a tab-delimited file with the default comma expectation results in rows that don’t split into multiple fields. Fixing the issue requires explicitly setting `delimiter='\t'` in `csv.reader` so the parser matches the file’s actual structure.
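The symptom and the fix can be demonstrated in a few lines (sample file and names are illustrative):

```python
import csv

# Write a small tab-delimited file to read back.
with open('names_tab.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(
        [['First name', 'Last name', 'Email'],
         ['John', 'Doe', 'john-doe@bogusemail.com']])

# Wrong: the default comma delimiter leaves each tab-separated row unsplit.
with open('names_tab.csv', newline='') as f:
    bad = list(csv.reader(f))
print(len(bad[0]))  # 1 -- the whole line comes back as a single field

# Right: match the file's actual delimiter.
with open('names_tab.csv', newline='') as f:
    good = list(csv.reader(f, delimiter='\t'))
print(len(good[0]))  # 3
```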
For more maintainable code, the tutorial recommends dictionary-based parsing and writing. `csv.DictReader` turns each row into a mapping keyed by the header fields, so accessing values becomes semantic (e.g., `line['Email']`) rather than index-based (e.g., `line[2]`). On the writing side, `csv.DictWriter` requires the field names upfront and can write a header row with `writeheader()`. It also makes column selection straightforward: deleting the `'Email'` key before writing a row drops that column entirely, producing an output file with only the remaining fields (First name and Last name). Overall, the `csv` module avoids brittle parsing logic while handling delimiters, quoting, headers, and structured access to fields.
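A sketch of the dictionary-based round trip, including the column removal; filenames, the tab delimiter on output, and the sample row are illustrative assumptions:

```python
import csv

# Sample input file so the sketch is self-contained.
with open('names.csv', 'w', newline='') as f:
    f.write('First name,Last name,Email\n')
    f.write('John,Doe,john-doe@bogusemail.com\n')

with open('names.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)  # each row becomes a dict keyed by header
    with open('new_names.csv', 'w', newline='') as new_file:
        fieldnames = ['First name', 'Last name']  # email column intentionally omitted
        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames,
                                    delimiter='\t')
        csv_writer.writeheader()  # optional: write the header row
        for line in csv_reader:
            del line['Email']  # drop the key so only the listed fields remain
            csv_writer.writerow(line)
```

Without the `del line['Email']`, `DictWriter` raises a `ValueError` for the extra key, since `'Email'` is not in `fieldnames`.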
Cornell Notes
CSV files store tabular data as plain text, using a delimiter to separate fields on each line, with a header row naming the columns. Python’s `csv.reader` reads rows as lists, where field positions are accessed by index (e.g., email at index 2 for First name, Last name, Email). Writing uses `csv.writer`, and changing the delimiter requires matching it during reading; otherwise, parsing fails (e.g., tab-delimited data read as comma-delimited yields one value per line). For clearer, safer code, `csv.DictReader` and `csv.DictWriter` map each row to dictionaries keyed by header names, enabling direct access like `line['Email']` and easy column removal by deleting keys before writing.
Why is using the `csv` module safer than splitting each line with `str.split(',')`?
How does `csv.reader` represent each row, and how are fields accessed?
What’s the correct way to skip the header row when using `csv.reader`?
What happens if the delimiter used by `csv.reader` doesn’t match the file’s delimiter?
How do `csv.DictReader` and `csv.DictWriter` improve code clarity?
Review Questions
- When using `csv.reader`, what index corresponds to the email field given a header of First name, Last name, Email?
- Why must the delimiter be specified consistently when writing with `csv.writer` and reading with `csv.reader`?
- How does deleting the `'Email'` key before `csv.DictWriter.writerow(...)` change the output file's columns?
Key Points
1. CSV stands for comma-separated values, but the delimiter can be commas, tabs, dashes, or other characters as long as the parser matches it.
2. `csv.reader` returns each row as a list, making index-based access possible but less readable than named-field access.
3. Skipping the header row with `next(reader)` prevents treating column names as data records.
4. `csv.writer` automatically quotes fields containing the delimiter character, preserving values like hyphenated emails or names.
5. Reading with the wrong delimiter causes incorrect parsing (often leaving rows unsplit), so `delimiter` must match the file format.
6. `csv.DictReader` and `csv.DictWriter` map rows to dictionaries keyed by header names, enabling clearer access like `line['Email']` and easy column removal by deleting keys.