Python Pandas Tutorial (Part 5): Updating Rows and Columns

TL;DR

Rename all columns by assigning a complete list to `DF.columns`, and rename selected columns with `DF.rename(..., inplace=True)` using a dictionary.

Briefing

Updating existing data in pandas DataFrames hinges on using the right indexing and transformation tools, especially to avoid the “SettingWithCopy” trap (pandas' `SettingWithCopyWarning`). Column names can be changed in bulk or selectively, but row and cell updates should go through explicit indexers like `.loc` (and sometimes `.at`) so pandas can reliably write back to the intended DataFrame.

For columns, the tutorial starts with renaming everything at once by assigning a new list to `DF.columns`. That approach is useful when every column name should change, and the list must contain one name per existing column. For more targeted edits, it uses string-based transformations across all column names: uppercasing via a list comprehension (`DF.columns = [x.upper() for x in DF.columns]`) or replacing spaces with underscores using `DF.columns = DF.columns.str.replace(' ', '_')`. When only a few columns need changes, `DF.rename()` with a dictionary maps old names to new ones. Because `rename` returns a new DataFrame by default, the change persists only when `inplace=True` is set or the result is assigned back to `DF`.
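The three column-renaming approaches described above can be sketched as follows. This is a minimal illustration, not the tutorial's exact code: the sample rows and `example.com` addresses are placeholders, and the result of `rename` is assigned back instead of using `inplace=True`.

```python
import pandas as pd

# Hypothetical sample data; names and addresses are placeholders.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["corey@example.com", "jane@example.com", "john@example.com"],
})

# 1. Rename every column at once (one new name per existing column).
df.columns = ["first name", "last name", "email"]

# 2. Transform all names: uppercase them, then swap spaces for underscores.
df.columns = [name.upper() for name in df.columns]
df.columns = df.columns.str.replace(" ", "_")

# 3. Rename only selected columns with an old-name -> new-name dictionary.
#    Assigning the result back has the same effect as inplace=True.
df = df.rename(columns={"FIRST_NAME": "first", "LAST_NAME": "last"})

print(list(df.columns))
```

Columns missing from the dictionary (here `EMAIL`) keep their current names.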

Row updates begin with single-value edits using `.loc`. After selecting a specific row by index label (e.g., `DF.loc[2]` for “John Doe”), the tutorial shows three assignment patterns: replacing an entire row by providing values for all columns; updating only selected columns by specifying both row and column selectors (e.g., `DF.loc[2, ['last', 'email']] = ...`); and changing one cell by targeting a single row and single column (e.g., `DF.loc[2, 'last'] = 'Smith'`). It also notes pandas’ alternative indexer `.at` for single-cell access, though it emphasizes that `.loc` remains the common choice.
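A short sketch of the three `.loc` assignment patterns, plus `.at`, may help. The `people` DataFrame and its values are assumptions made up for illustration; only the indexing patterns come from the text above.

```python
import pandas as pd

# Hypothetical sample data with the default integer index (labels 0, 1, 2).
people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["corey@example.com", "jane@example.com", "john@example.com"],
})

# Replace an entire row: one value per column, in column order.
people.loc[2] = ["John", "Smith", "john.smith@example.com"]

# Update only selected columns of that row.
people.loc[2, ["last", "email"]] = ["Doe", "john.doe@example.com"]

# Change a single cell.
people.loc[2, "last"] = "Smith"

# .at addresses exactly one cell by row label and column label.
people.at[2, "email"] = "john.smith@example.com"

print(people.loc[2])
```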

A key warning follows: assigning to a filtered DataFrame using bracket indexing can trigger a “SettingWithCopy” warning and may silently fail to update the original data. The tutorial demonstrates filtering rows by condition (like matching an email), then attempting `filtered_df['last'] = 'Smith'`, which produces the warning and leaves the original DataFrame unchanged. The fix is to perform the assignment directly on the original DataFrame using `.loc` with the same filter condition as the row selector, ensuring the update actually lands.
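The difference between writing to a filtered result and writing through `.loc` can be illustrated with a small sketch. The data and the mask name `filt` are hypothetical. The explicit `.copy()` is an addition here: it makes the separation deliberate and keeps the example free of warnings, whereas the pattern described above omits it and triggers the warning.

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["corey@example.com", "jane@example.com", "john@example.com"],
})

# Boolean mask selecting the row(s) to change.
filt = people["email"] == "john@example.com"

# Writing to a filtered result only changes that separate object.
subset = people[filt].copy()
subset["last"] = "Smith"
last_after_copy_write = people.loc[2, "last"]  # original is untouched

# Writing through .loc on the original DataFrame updates it directly.
people.loc[filt, "last"] = "Smith"

print(last_after_copy_write, people.loc[2, "last"])
```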

For multiple-row updates, the tutorial shows direct column assignment after vectorized string operations, such as lowercasing all emails with `DF['email'] = DF['email'].str.lower()`. It then distinguishes four commonly confused transformation methods: `apply` (function on each value of a Series, or on each Series/column when used on a DataFrame), `applymap` (function on each individual element of a DataFrame), `map` (substitute values in a Series using a dictionary), and `replace` (similar substitution but keeps values not present in the mapping rather than converting them to `NaN`). Finally, it applies these ideas to a Stack Overflow survey dataset: renaming `converted_comp` to `salary USD`, converting a `hobbyist` column from “yes/no” to boolean `True/False` using `map`, and reinforcing the habit of checking changes before committing them with `inplace=True` where applicable.
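To make the `map` versus `replace` distinction concrete, here is a minimal sketch on made-up Series. The `names` and `hobbyist` values are stand-ins rather than rows from the actual survey file, and the capitalized `"Yes"`/`"No"` strings are an assumption about how the answers are stored.

```python
import pandas as pd

names = pd.Series(["Corey", "Jane", "John"])
substitutions = {"Corey": "Chris", "Jane": "Mary"}

# map: values missing from the dictionary become NaN.
mapped = names.map(substitutions)

# replace: values missing from the dictionary are left as they are.
replaced = names.replace(substitutions)

# Converting a yes/no column to booleans with map (stand-in data).
hobbyist = pd.Series(["Yes", "No", "Yes"])
hobbyist_bool = hobbyist.map({"Yes": True, "No": False})

print(mapped.isna().tolist())
print(replaced.tolist())
print(hobbyist_bool.tolist())
```

`map` suits a column whose every value is covered by the dictionary. When only some values should change, `replace` avoids introducing `NaN` by accident.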

Overall, the practical takeaway is straightforward: use `.loc` for reliable writes, use vectorized operations for bulk edits, and pick `apply`/`applymap`/`map`/`replace` based on whether the transformation targets Series values, DataFrame elements, or dictionary-based substitutions.

Cornell Notes

The core lesson is how to modify pandas DataFrames safely and precisely. Column names can be renamed wholesale via `DF.columns = [...]`, transformed in bulk with list comprehensions or `DF.columns.str...`, or selectively changed with `DF.rename(..., inplace=True)`. For data values, `.loc` enables reliable updates to single cells, subsets of columns in a row, or entire rows, and it avoids the “SettingWithCopy” warning that can occur when assigning to filtered bracket slices. Bulk row updates are typically done by assigning transformed columns (e.g., `DF['email'] = DF['email'].str.lower()`). When transforming many values, choose between `apply`, `applymap`, `map`, and `replace` based on whether the target is Series values, DataFrame elements, or dictionary substitutions.

How do you rename all columns versus only a few columns in a pandas DataFrame?

To rename every column, assign a full list to `DF.columns` (e.g., `DF.columns = ['first name','last name','email']`). To change only a subset, use `DF.rename()` with a dictionary mapping old names to new names (e.g., `DF.rename(columns={'first name':'first','last name':'last'}, inplace=True)`). The tutorial stresses that `inplace=True` is needed for the rename to persist; without it, `rename` returns a new DataFrame that must be assigned back to keep the change.

What’s the correct way to update a single value in a DataFrame row?

Use `.loc` with both row and column selectors. For example, to change the last name for row index label 2: `DF.loc[2, 'last'] = 'Smith'`. The tutorial also mentions `.at` as an alternative for single-cell access, but `.loc` is presented as the practical default.

Why does pandas sometimes warn “SettingWithCopy,” and how do you avoid it?

The warning appears when updates are made to a temporary filtered object rather than directly to the original DataFrame. In the tutorial, filtering with brackets returns a separate object; assigning to it (`filtered['last'] = 'Smith'`) triggers the warning and may not change the original. The fix is to assign through `.loc` on the original DataFrame using the filter condition, so the write happens in-place on the real DataFrame.

When should you use `apply`, `applymap`, `map`, or `replace`?

`apply` works on Series values (function per element) and on DataFrames by applying a function to each Series/column (or each row if `axis` is changed). `applymap` applies a function to every individual element in a DataFrame; in pandas 2.1 and later it is deprecated in favor of the equivalent `DataFrame.map`. `map` substitutes values in a Series using a dictionary; values not in the dictionary become `NaN`. `replace` performs similar substitutions but leaves unmatched values unchanged. The tutorial demonstrates `map` for converting “yes/no” to booleans and `replace` for preserving other values.
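The different targets of `apply` and `applymap` are easiest to see by passing the same function, `len`, to each. This is a sketch on hypothetical data. The `hasattr` check is an assumption added for portability: it picks `DataFrame.map` where it exists (pandas 2.1+) and falls back to `applymap` on older versions.

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["corey@example.com", "jane@example.com", "john@example.com"],
})

# Series.apply: the function runs on each value (length of each email).
email_lengths = people["email"].apply(len)

# DataFrame.apply: the function runs on each column (a Series),
# so len returns the number of rows in that column.
rows_per_column = people.apply(len)

# With axis="columns" the function runs on each row instead,
# so len returns the number of columns in that row.
columns_per_row = people.apply(len, axis="columns")

# Element-wise: the function runs on every individual cell.
elementwise = people.map if hasattr(people, "map") else people.applymap
cell_lengths = elementwise(len)

print(rows_per_column.tolist(), columns_per_row.tolist())
print(cell_lengths)
```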

How do you update multiple rows efficiently for a single column transformation?

Use vectorized string operations and assign back to the column. For example, to lowercase all emails: `DF['email'] = DF['email'].str.lower()`. The tutorial contrasts this with merely computing `DF['email'].str.lower()` without assignment, which returns results but doesn’t modify the DataFrame.
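The contrast between computing a result and assigning it back can be checked directly. The sample addresses below are placeholders chosen to include uppercase letters.

```python
import pandas as pd

df = pd.DataFrame({"email": ["Corey@Example.com", "JANE@example.com"]})

# Computing the lowercase version returns a new Series; df is not modified.
df["email"].str.lower()
before_assignment = df["email"].tolist()

# Assigning the result back to the column makes the change stick.
df["email"] = df["email"].str.lower()
after_assignment = df["email"].tolist()

print(before_assignment)
print(after_assignment)
```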

Review Questions

What specific difference between using bracket assignment on a filtered DataFrame versus using `.loc` prevents the “SettingWithCopy” problem?
Given a DataFrame with string columns, which method (`apply`, `applymap`, `map`, or `replace`) would you use to lowercase every cell, and why?
How would you update only the `last` and `email` columns for a single row without providing values for all other columns?

Key Points

1. Rename all columns by assigning a complete list to `DF.columns`, and rename selected columns with `DF.rename(..., inplace=True)` using a dictionary.
2. Use `.loc[row_index, column_name]` (or `.loc[row_index, [col1, col2]]`) for reliable updates to DataFrame values.
3. Avoid bracket-based assignment on filtered slices; it can trigger “SettingWithCopy” and fail to update the original DataFrame.
4. For bulk updates, transform a column with vectorized operations (e.g., `DF['email'].str.lower()`) and assign the result back to the column.
5. Choose `apply` vs `applymap` based on whether the function should run per Series/column (`apply`) or per individual element (`applymap`).
6. Use `map` for Series value substitution with dictionary mappings (unmatched values become `NaN`), and use `replace` when unmatched values should remain unchanged.
7. When renaming or transforming real datasets, verify the change before committing it with `inplace=True` to reduce silent mistakes.

Highlights

Column renaming can be done globally with `DF.columns = [...]`, but targeted renames with `DF.rename(...)` persist only with `inplace=True` or when the result is assigned back.

Single-cell updates should go through `.loc` (e.g., `DF.loc[2, 'last'] = 'Smith'`) to ensure pandas writes to the intended DataFrame.

Filtered assignment that triggers “SettingWithCopy” may not update anything; using `.loc` on the original DataFrame with the filter condition prevents that.

`apply` and `applymap` are easy to mix up: `apply` targets Series/columns, while `applymap` targets every element.

`map` and `replace` both substitute values from dictionaries, but `map` turns unmatched entries into `NaN` while `replace` keeps them.

Topics

Updating Columns
Updating Rows
Pandas Indexing
DataFrame Transformations
apply vs map

Mentioned

Corey Schafer
NaN

Python Pandas Tutorial (Part 5): Updating Rows and Columns - Modifying Data Within DataFrames