Python Pandas Tutorial (Part 10): Working with Dates and Time Series Data
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
Working with date and time data in pandas starts with one non-negotiable step: converting your timestamp column from plain text into a real datetime type. Once that conversion is done correctly—often by supplying an explicit format string—pandas unlocks a full toolkit for weekday extraction, time-based filtering, date slicing, resampling to new time granularities, and time-series plotting.
The tutorial uses historical Ethereum cryptocurrency data stored in an hourly CSV (columns include a “date” field plus open, high, low, close, and volume). After loading the file, the “date” column initially behaves like a string, which breaks datetime-specific methods such as extracting a weekday name. The fix is to convert the column with `pd.to_datetime`. When pandas can’t infer the format automatically, the conversion requires a `format` string that matches the input pattern—here, a year-month-day plus a 12-hour clock with an AM/PM marker. After conversion, the same datetime method calls work: the first timestamp is correctly identified as a Friday, and the “date” values now display as proper datetime objects.
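A minimal sketch of that conversion, assuming timestamps shaped like the tutorial's (e.g., `2020-03-13 08-PM`, i.e., year-month-day plus a 12-hour clock and an AM/PM marker); the sample frame here is illustrative, not the actual ETH dataset:

```python
import pandas as pd

# Tiny stand-in for the tutorial's hourly OHLCV data.
df = pd.DataFrame({
    'Date': ['2020-03-13 08-PM', '2020-03-13 09-PM'],
    'Close': [110.0, 112.5],
})

# At this point df['Date'].dt.day_name() would fail: the column is still text.
# Convert it with an explicit format: %Y-%m-%d for the date, %I for the
# 12-hour clock, and %p for the AM/PM marker.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %I-%p')

print(df['Date'].dt.day_name())  # first timestamp resolves to 'Friday'
```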
The workflow then shows two equivalent conversion strategies. One converts after reading the CSV by running `pd.to_datetime` on the column. The other converts during ingestion by using `read_csv`’s `parse_dates` argument together with a custom parser function (implemented as a lambda) that calls `pd.to_datetime` with the same explicit format. Either approach ensures the column supports pandas’ datetime accessor `.dt`.
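The two strategies can be sketched as follows. Note one hedge: the tutorial-era approach passes a `date_parser` lambda to `read_csv`, but newer pandas (2.0+) deprecates `date_parser` in favor of `date_format`, so this sketch uses the modern argument with the same format string; the in-memory CSV is illustrative:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the tutorial's hourly file.
csv_data = """Date,Close
2020-03-13 08-PM,110.0
2020-03-13 09-PM,112.5
"""

# Strategy 1: convert after reading.
df1 = pd.read_csv(io.StringIO(csv_data))
df1['Date'] = pd.to_datetime(df1['Date'], format='%Y-%m-%d %I-%p')

# Strategy 2: convert during ingestion (pandas 2.0+; older versions
# would pass date_parser=lambda x: pd.to_datetime(x, format=...) instead).
df2 = pd.read_csv(io.StringIO(csv_data), parse_dates=['Date'],
                  date_format='%Y-%m-%d %I-%p')

# Either way the column is now datetime64 and supports the .dt accessor.
```

Converting after reading keeps `read_csv` simple and is easy to debug; converting during ingestion keeps the loading step self-contained when the same file is read in many places.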
With datetime types in place, the tutorial demonstrates three practical analysis patterns. First, it uses `.dt` to compute features across the entire series, such as adding a “day of week” column. Second, it filters data by year using boolean masks (e.g., rows from 2019-01-01 up to, but not including, 2020-01-01, so that late-December timestamps aren’t dropped) and also shows how to avoid manual masks by setting the datetime column as the index and using bracket slicing. Index-based slicing supports both single-year selection and date ranges; an example pulls all data from January through February 2020, correctly spanning the leap day.
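The three patterns together, as a sketch on synthetic hourly data (the date range and values are made up for illustration):

```python
import pandas as pd

# Synthetic hourly series spanning late 2019 into 2020.
dates = pd.date_range('2019-12-30', '2020-03-01', freq='h')
df = pd.DataFrame({'Date': dates, 'Close': range(len(dates))})

# 1) Feature extraction across the whole series via the .dt accessor.
df['DayOfWeek'] = df['Date'].dt.day_name()

# 2) Boolean-mask filter for 2019; the exclusive upper bound keeps
#    every hour of December 31.
mask = (df['Date'] >= '2019-01-01') & (df['Date'] < '2020-01-01')
df_2019 = df.loc[mask]

# 3) Index-based slicing: set the datetime column as the index, then
#    slice with date strings. This range spans the 2020 leap day.
df = df.set_index('Date')
jan_feb_2020 = df.loc['2020-01':'2020-02']
```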
Finally, the tutorial moves from filtering to time-series transformation. Resampling changes the frequency from hourly to daily or weekly. For daily highs, it resamples the “high” column with `resample('1D')` and aggregates using `max`. For richer summaries across multiple columns, it resamples the full DataFrame and uses `agg` with a dictionary mapping columns to different aggregation functions—mean for close, max for high, min for low, and sum for volume—producing a weekly overview. A simple matplotlib line plot is shown for the resampled series, reinforcing how datetime-aware resampling feeds directly into visualization and downstream analysis.
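A sketch of both resampling patterns, assuming a datetime index and random OHLCV-style columns in place of the real data:

```python
import numpy as np
import pandas as pd

# Two weeks of synthetic hourly OHLCV-style data on a datetime index.
rng = np.random.default_rng(0)
idx = pd.date_range('2020-01-01', periods=24 * 14, freq='h')
df = pd.DataFrame({
    'High': rng.uniform(100, 120, len(idx)),
    'Low': rng.uniform(80, 100, len(idx)),
    'Close': rng.uniform(90, 110, len(idx)),
    'Volume': rng.integers(1, 1000, len(idx)),
}, index=idx)

# Single-column resample: daily highs from hourly data.
daily_highs = df['High'].resample('D').max()

# Whole-frame resample with per-column aggregations: a weekly overview.
weekly = df.resample('W').agg({
    'Close': 'mean', 'High': 'max', 'Low': 'min', 'Volume': 'sum'
})

# The resampled series feeds straight into plotting, e.g.:
# daily_highs.plot()  # requires matplotlib
```

Each aggregation matches its column's meaning: a week's high is the max of its hourly highs, its low the min, its volume the sum of hourly volumes, while the close is summarized here with a mean.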
Cornell Notes
The core lesson is that pandas time-series work depends on turning timestamp text into real datetime objects. After converting the “date” column (often requiring `pd.to_datetime` with an explicit `format` string), pandas datetime methods become available via `.dt`. With datetime types, analysts can extract weekday features, filter by date ranges using boolean masks or index slicing, and compute time spans with min/max and timedeltas. The tutorial then demonstrates resampling: aggregating hourly cryptocurrency data into daily or weekly metrics using `resample` plus `max`, `mean`, `min`, and `sum`. This enables both summary analysis and plotting at the desired time granularity.
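The time-span point above can be sketched in two lines: subtracting the earliest timestamp from the latest yields a pandas `Timedelta` (the sample dates are hypothetical):

```python
import pandas as pd

# Hypothetical datetime column; the dataset's span is just the
# difference of its extremes, which is a Timedelta.
s = pd.Series(pd.to_datetime(['2020-01-01 00:00', '2020-03-13 20:00']))
span = s.max() - s.min()
print(span)  # 72 days 20:00:00
```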
Why do datetime methods fail until the “date” column is converted, and how is the conversion done when pandas can’t infer the format?
What are two ways to convert timestamps—after reading the CSV or during `read_csv`—and when would each be useful?
How can weekday information be generated for every row once the column is datetime?
What are two practical ways to filter rows by date range in pandas?
How does resampling change time-series data, and how do different aggregation functions map to financial-style metrics?
Review Questions
- What specific error indicates that the timestamp column is still a string, and what `pd.to_datetime` parameters resolve it when pandas can’t infer the format?
- How do boolean-mask filtering and index slicing differ when selecting data for a specific year or month range?
- When resampling OHLCV data, which aggregation functions make sense for close, high, low, and volume, and why?
Key Points
1. Convert timestamp text into pandas datetime objects before using `.dt` methods like `day_name()`.
2. When automatic parsing fails, supply an explicit `format` string to `pd.to_datetime` that matches the input (including the 12-hour clock and AM/PM marker).
3. Use `read_csv(parse_dates=...)` with a custom parser function if you want datetime conversion to happen during ingestion.
4. Generate weekday features across the entire dataset with `.dt.day_name()` and store them as new columns for quick analysis.
5. Filter by date ranges either with boolean masks on the datetime column or with index slicing after setting the datetime column as the index.
6. Resample hourly data to daily/weekly frequency using `resample()` plus appropriate aggregations (e.g., daily highs via `max`).
7. Use `resample(...).agg({...})` to apply different aggregation functions to different OHLCV columns in one step.