Python Pandas Tutorial (Part 5): Updating Rows and Columns - Modifying Data Within DataFrames
Based on Corey Schafer's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Updating existing data in pandas DataFrames hinges on using the right indexing and transformation tools, especially to avoid the `SettingWithCopyWarning` trap. Column names can be changed in bulk or selectively, but row and cell updates should go through explicit indexers like `.loc` (and sometimes `.at`) so pandas can reliably write back to the intended DataFrame.
For columns, the tutorial starts by renaming everything at once: assigning a new list to `DF.columns`, which must contain exactly one name per existing column. That approach is useful when every column name should change. To transform all names in the same way, it uses string operations across the column index: uppercasing via a list comprehension (`DF.columns = [x.upper() for x in DF.columns]`) or replacing spaces with underscores using `DF.columns = DF.columns.str.replace(' ', '_')`. When only a few columns need changes, `DF.rename(columns={...})` with a dictionary maps old names to new ones; because `rename` returns a new DataFrame, the change persists only when `inplace=True` is set or the result is assigned back to `DF`.
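The column-renaming options above can be sketched as follows. The sample data and the intermediate column names are assumptions made up for illustration; they are not taken from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Rename every column at once: the list needs one name per column.
df.columns = ["first name", "last name", "email"]

# Transform all names the same way: uppercase with a list comprehension...
df.columns = [x.upper() for x in df.columns]

# ...then replace spaces with underscores via the .str accessor.
df.columns = df.columns.str.replace(" ", "_")

# Rename only selected columns with an old -> new dictionary.
# Without inplace=True, rename() returns a new DataFrame and leaves df alone,
# which makes it easy to inspect the result first.
mapping = {"FIRST_NAME": "first", "LAST_NAME": "last"}
preview = df.rename(columns=mapping)

# Commit the change once the preview looks right.
df.rename(columns=mapping, inplace=True)
```

Assigning the result back (`df = df.rename(columns=mapping)`) has the same effect as `inplace=True` and is an equally common style.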
Row updates begin with single-value edits using `.loc`. After selecting a specific row by index label (e.g., `DF.loc[2]` for “John Doe”), the tutorial shows three assignment patterns: replacing an entire row by providing values for all columns; updating only selected columns by specifying both row and column selectors (e.g., `DF.loc[2, ['last', 'email']] = ...`); and changing one cell by targeting a single row and single column (e.g., `DF.loc[2, 'last'] = 'Smith'`). It also notes pandas' alternative indexer `.at` for single-cell access, though it emphasizes that `.loc` remains the common choice.
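A minimal sketch of these assignment patterns, assuming a small made-up DataFrame with a default integer index in which label 2 is the “John Doe” row:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# 1) Replace an entire row: one value per column, in column order.
df.loc[2] = ["John", "Smith", "JohnSmith@email.com"]

# 2) Update only selected columns of that row.
df.loc[2, ["last", "email"]] = ["Doe", "JohnDoe@email.com"]

# 3) Change a single cell with a row label and a column label.
df.loc[2, "last"] = "Smith"

# .at does the same job for exactly one cell.
df.at[2, "email"] = "JohnSmith@email.com"
```

All four statements write directly into `df`, so no copy is involved and no warning is raised.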
A key warning follows: assigning to a filtered DataFrame using bracket indexing can trigger a `SettingWithCopyWarning` and may silently fail to update the original data, because the filtered result can be a copy rather than a view of the original. The tutorial demonstrates filtering rows by condition (like matching an email), then attempting `filtered_df['last'] = 'Smith'`, which produces the warning and leaves the original DataFrame unchanged. The fix is to perform the assignment directly on the original DataFrame using `.loc` with the same filter condition as the row selector, ensuring the update actually lands.
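The fix can be sketched like this. The sample data and the variable name `filt` are illustrative assumptions; the risky chained form is left commented out so the example runs cleanly.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Boolean mask selecting the rows to change.
filt = df["email"] == "JohnDoe@email.com"

# Risky: df[filt] may be a temporary copy, so the assignment below can be
# lost and pandas may warn about it. Shown for contrast only.
# df[filt]["last"] = "Smith"

# Reliable: one .loc call on the original DataFrame, with the mask as the
# row selector and the column label as the column selector.
df.loc[filt, "last"] = "Smith"
```

Because the mask and the column are passed to a single `.loc` call, pandas knows the write targets `df` itself rather than an intermediate object.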
For multiple-row updates, the tutorial shows direct column assignment after vectorized string operations, such as lowercasing all emails with `DF['email'] = DF['email'].str.lower()`. It then distinguishes four commonly confused transformation methods: `apply` (function on each value of a Series, or on each Series/column when used on a DataFrame), `applymap` (function on each individual element of a DataFrame; deprecated since pandas 2.1 in favor of `DataFrame.map`), `map` (substitute values in a Series using a dictionary, with unmatched values becoming `NaN`), and `replace` (similar substitution, but values not present in the mapping are kept unchanged rather than converted to `NaN`). Finally, it applies these ideas to a Stack Overflow survey dataset: renaming `converted_comp` to `salary USD`, converting a `hobbyist` column from “yes/no” to boolean `True/False` using `map`, and reinforcing the habit of checking changes before committing them with `inplace=True` where applicable.
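The differences between these methods are easier to see side by side. The following is a sketch on made-up sample data; the substitution dictionary and result variable names are assumptions for illustration. Since `applymap` is deprecated in newer pandas, the element-wise call is looked up dynamically so the sketch runs on either side of that change.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Bulk update of one column: vectorized string method, assigned back.
df["email"] = df["email"].str.lower()

# Series.apply: the function runs on each value of the Series.
email_lengths = df["email"].apply(len)

# DataFrame.apply: the function runs on each column (a Series),
# so len() here returns the number of rows per column.
rows_per_column = df.apply(len)

# Element-wise on a DataFrame: DataFrame.map on pandas >= 2.1,
# applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cell_lengths = elementwise(len)

# Series.map with a dictionary: values missing from the dict become NaN.
substitutions = {"Corey": "Chris", "Jane": "Mary"}
mapped = df["first"].map(substitutions)

# Series.replace with the same dictionary: unmatched values are kept.
replaced = df["first"].replace(substitutions)
```

None of these calls modify `df` on their own; as with the email column, a result has to be assigned back (for example `df["first"] = replaced`) to persist.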
Overall, the practical takeaway is straightforward: use `.loc` for reliable writes, use vectorized operations for bulk edits, and pick `apply`/`applymap`/`map`/`replace` based on whether the transformation targets Series values, DataFrame elements, or dictionary-based substitutions.
Cornell Notes
The core lesson is how to modify pandas DataFrames safely and precisely. Column names can be renamed wholesale via `DF.columns = [...]`, transformed in bulk with list comprehensions or `DF.columns.str` methods, or selectively changed with `DF.rename(..., inplace=True)`. For data values, `.loc` enables reliable updates to single cells, subsets of columns in a row, or entire rows, while avoiding the `SettingWithCopyWarning` that can occur when assigning to filtered bracket slices. Bulk row updates are typically done by assigning transformed columns (e.g., `DF['email'] = DF['email'].str.lower()`). When transforming many values, choose between `apply`, `applymap`, `map`, and `replace` based on whether the target is Series values, DataFrame elements, or dictionary substitutions.
How do you rename all columns versus only a few columns in a pandas DataFrame?
What’s the correct way to update a single value in a DataFrame row?
Why does pandas sometimes warn “SettingWithCopy,” and how do you avoid it?
When should you use `apply`, `applymap`, `map`, or `replace`?
How do you update multiple rows efficiently for a single column transformation?
Review Questions
- What specific difference between using bracket assignment on a filtered DataFrame versus using `.loc` prevents the “SettingWithCopy” problem?
- Given a DataFrame with string columns, which method (`apply`, `applymap`, `map`, or `replace`) would you use to lowercase every cell, and why?
- How would you update only the `last` and `email` columns for a single row without providing values for all other columns?
Key Points
1. Rename all columns by assigning a complete list to `DF.columns`, and rename selected columns with `DF.rename(..., inplace=True)` using a dictionary.
2. Use `.loc[row_index, column_name]` (or `.loc[row_index, [col1, col2]]`) for reliable updates to DataFrame values.
3. Avoid bracket-based assignment on filtered slices; it can trigger “SettingWithCopy” and fail to update the original DataFrame.
4. For bulk updates, transform a column with vectorized operations (e.g., `DF['email'].str.lower()`) and assign the result back to the column.
5. Choose `apply` vs `applymap` based on whether the function should run per Series/column (`apply`) or per individual element (`applymap`).
6. Use `map` for Series value substitution with dictionary mappings (unmatched values become `NaN`), and use `replace` when unmatched values should remain unchanged.
7. When renaming or transforming real datasets, verify the change before committing it with `inplace=True` to reduce silent mistakes.