Python Pandas Tutorial (Part 10): Working with Dates and Time Series Data
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
Working with date and time data in pandas starts with one non-negotiable step: converting your timestamp column from plain text into a real datetime type. Once that conversion is done correctly—often by supplying an explicit format string—pandas unlocks a full toolkit for weekday extraction, time-based filtering, date slicing, resampling to new time granularities, and time-series plotting.
The tutorial uses historical Ethereum cryptocurrency data stored in an hourly CSV (columns include a “date” field plus open, high, low, close, and volume). After loading the file, the “date” column initially behaves like a string, which breaks datetime-specific methods such as extracting a weekday name. The fix is to convert the column with `pd.to_datetime`. When pandas can’t infer the format automatically, the conversion requires a `format` string that matches the input pattern—here, a year-month-day plus a 12-hour clock with an AM/PM marker. After conversion, the same datetime method calls work: the first timestamp is correctly identified as a Friday, and the “date” values now display as proper datetime objects.
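A minimal sketch of that conversion, assuming timestamps shaped like the tutorial's (e.g., `2020-03-13 08-PM`, i.e., year-month-day plus a 12-hour clock and an AM/PM marker); the sample frame here is illustrative, not the actual ETH dataset:

```python
import pandas as pd

# Tiny stand-in for the tutorial's hourly OHLCV data.
df = pd.DataFrame({
    'Date': ['2020-03-13 08-PM', '2020-03-13 09-PM'],
    'Close': [110.0, 112.5],
})

# At this point df['Date'].dt.day_name() would fail: the column is still text.
# Convert it with an explicit format: %Y-%m-%d for the date, %I for the
# 12-hour clock, and %p for the AM/PM marker.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %I-%p')

print(df['Date'].dt.day_name())  # first timestamp resolves to 'Friday'
```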
The workflow then shows two equivalent conversion strategies. One converts after reading the CSV by running `pd.to_datetime` on the column. The other converts during ingestion by using `read_csv`’s `parse_dates` argument together with a custom parser function (implemented as a lambda) that calls `pd.to_datetime` with the same explicit format. Either approach ensures the column supports pandas’ datetime accessor `.dt`.
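The two strategies can be sketched as follows. Note one hedge: the tutorial-era approach passes a `date_parser` lambda to `read_csv`, but newer pandas (2.0+) deprecates `date_parser` in favor of `date_format`, so this sketch uses the modern argument with the same format string; the in-memory CSV is illustrative:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the tutorial's hourly file.
csv_data = """Date,Close
2020-03-13 08-PM,110.0
2020-03-13 09-PM,112.5
"""

# Strategy 1: convert after reading.
df1 = pd.read_csv(io.StringIO(csv_data))
df1['Date'] = pd.to_datetime(df1['Date'], format='%Y-%m-%d %I-%p')

# Strategy 2: convert during ingestion (pandas 2.0+; older versions
# would pass date_parser=lambda x: pd.to_datetime(x, format=...) instead).
df2 = pd.read_csv(io.StringIO(csv_data), parse_dates=['Date'],
                  date_format='%Y-%m-%d %I-%p')

# Either way the column is now datetime64 and supports the .dt accessor.
```

Converting after reading keeps `read_csv` simple and is easy to debug; converting during ingestion keeps the loading step self-contained when the same file is read in many places.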
With datetime types in place, the tutorial demonstrates three practical analysis patterns. First, it uses `.dt` to compute features across the entire series, such as adding a “day of week” column. Second, it filters data by year using boolean masks (e.g., rows from 2019-01-01 up to, but not including, 2020-01-01, so that late-December timestamps aren’t dropped) and also shows how to avoid manual masks by setting the datetime column as the index and using bracket slicing. Index-based slicing supports both single-year selection and date ranges; an example pulls all data from January through February 2020, correctly spanning the leap day.
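The three patterns together, as a sketch on synthetic hourly data (the date range and values are made up for illustration):

```python
import pandas as pd

# Synthetic hourly series spanning late 2019 into 2020.
dates = pd.date_range('2019-12-30', '2020-03-01', freq='h')
df = pd.DataFrame({'Date': dates, 'Close': range(len(dates))})

# 1) Feature extraction across the whole series via the .dt accessor.
df['DayOfWeek'] = df['Date'].dt.day_name()

# 2) Boolean-mask filter for 2019; the exclusive upper bound keeps
#    every hour of December 31.
mask = (df['Date'] >= '2019-01-01') & (df['Date'] < '2020-01-01')
df_2019 = df.loc[mask]

# 3) Index-based slicing: set the datetime column as the index, then
#    slice with date strings. This range spans the 2020 leap day.
df = df.set_index('Date')
jan_feb_2020 = df.loc['2020-01':'2020-02']
```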
Finally, the tutorial moves from filtering to time-series transformation. Resampling changes the frequency from hourly to daily or weekly. For daily highs, it resamples the “high” column with `resample('1D')` and aggregates using `max`. For richer summaries across multiple columns, it resamples the full DataFrame and uses `agg` with a dictionary mapping columns to different aggregation functions—mean for close, max for high, min for low, and sum for volume—producing a weekly overview. A simple matplotlib line plot is shown for the resampled series, reinforcing how datetime-aware resampling feeds directly into visualization and downstream analysis.
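A sketch of both resampling patterns, assuming a datetime index and random OHLCV-style columns in place of the real data:

```python
import numpy as np
import pandas as pd

# Two weeks of synthetic hourly OHLCV-style data on a datetime index.
rng = np.random.default_rng(0)
idx = pd.date_range('2020-01-01', periods=24 * 14, freq='h')
df = pd.DataFrame({
    'High': rng.uniform(100, 120, len(idx)),
    'Low': rng.uniform(80, 100, len(idx)),
    'Close': rng.uniform(90, 110, len(idx)),
    'Volume': rng.integers(1, 1000, len(idx)),
}, index=idx)

# Single-column resample: daily highs from hourly data.
daily_highs = df['High'].resample('D').max()

# Whole-frame resample with per-column aggregations: a weekly overview.
weekly = df.resample('W').agg({
    'Close': 'mean', 'High': 'max', 'Low': 'min', 'Volume': 'sum'
})

# The resampled series feeds straight into plotting, e.g.:
# daily_highs.plot()  # requires matplotlib
```

Each aggregation matches its column's meaning: a week's high is the max of its hourly highs, its low the min, its volume the sum of hourly volumes, while the close is summarized here with a mean.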
Cornell Notes
The core lesson is that pandas time-series work depends on turning timestamp text into real datetime objects. After converting the “date” column (often requiring `pd.to_datetime` with an explicit `format` string), pandas datetime methods become available via `.dt`. With datetime types, analysts can extract weekday features, filter by date ranges using boolean masks or index slicing, and compute time spans with min/max and timedeltas. The tutorial then demonstrates resampling: aggregating hourly cryptocurrency data into daily or weekly metrics using `resample` plus `max`, `mean`, `min`, and `sum`. This enables both summary analysis and plotting at the desired time granularity.
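The time-span point above can be sketched in two lines: subtracting the earliest timestamp from the latest yields a pandas `Timedelta` (the sample dates are hypothetical):

```python
import pandas as pd

# Hypothetical datetime column; the dataset's span is just the
# difference of its extremes, which is a Timedelta.
s = pd.Series(pd.to_datetime(['2020-01-01 00:00', '2020-03-13 20:00']))
span = s.max() - s.min()
print(span)  # 72 days 20:00:00
```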
Why do datetime methods fail until the “date” column is converted, and how is the conversion done when pandas can’t infer the format?
What are two ways to convert timestamps—after reading the CSV or during `read_csv`—and when would each be useful?
How can weekday information be generated for every row once the column is datetime?
What are two practical ways to filter rows by date range in pandas?
How does resampling change time-series data, and how do different aggregation functions map to financial-style metrics?
Review Questions
- What specific error indicates that the timestamp column is still a string, and what `pd.to_datetime` parameters resolve it when pandas can’t infer the format?
- How do boolean-mask filtering and index slicing differ when selecting data for a specific year or month range?
- When resampling OHLCV data, which aggregation functions make sense for close, high, low, and volume, and why?
Key Points
1. Convert timestamp text into pandas datetime objects before using `.dt` methods like `day_name()`.
2. When automatic parsing fails, supply an explicit `format` string to `pd.to_datetime` that matches the input (including the 12-hour clock and AM/PM marker).
3. Use `read_csv(parse_dates=...)` with a custom parser function if you want datetime conversion to happen during ingestion.
4. Generate weekday features across the entire dataset with `.dt.day_name()` and store them as new columns for quick analysis.
5. Filter by date ranges either with boolean masks on the datetime column or with index slicing after setting the datetime column as the index.
6. Resample hourly data to daily/weekly frequency using `resample()` plus appropriate aggregations (e.g., daily highs via `max`).
7. Use `resample(...).agg({...})` to apply different aggregation functions to different OHLCV columns in one step.