Python Pandas Tutorial (Part 6): Add/Remove Rows and Columns From DataFrames

TL;DR

Add columns by computing a Series and assigning it with bracket notation, e.g., df["full name"] = df["first"] + " " + df["last"].


Briefing

Adding and removing data in pandas DataFrames comes down to a few core operations: assigning new columns from computed Series, using drop to delete columns or rows, and using append (with ignore_index) to add rows. The most practical takeaway is that columns are usually added by bracket-notation assignment, while rows are added with append and removed with drop. For drop, you then decide whether to change the DataFrame in place.

To add a column, assign a Series (or a list or array with one value per row) to a new column label. A common example combines two existing string columns, first name and last name, into a single new column. The workflow is to build a combined Series (e.g., df["first"] + " " + df["last"]) and then assign it to a new column name using bracket notation: df["full name"] = combined_series. The bracket requirement matters: dot notation (df.full_name = ...) is treated as attribute assignment on the DataFrame object rather than a column operation, so no column is created (pandas only emits a warning). Bracket notation is the reliable approach, and it is the only one that works for labels containing spaces, such as "full name". For more complex transformations, the same assignment pattern can be paired with apply, so that values computed from one column populate a new one.
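A minimal sketch of the pattern, assuming a small hypothetical DataFrame with "first" and "last" columns (the sample names and the "nickname" label are made up for illustration). It builds the combined column with bracket notation, then shows that attribute-style assignment triggers a pandas UserWarning without creating a column.

```python
import warnings

import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# String columns concatenate element-wise into a new Series,
# which bracket notation stores as a new column
df["full name"] = df["first"] + " " + df["last"]
print(df["full name"].tolist())  # ['Corey Schafer', 'Jane Doe', 'John Doe']

# Attribute-style assignment sets a plain Python attribute, not a column;
# pandas emits a UserWarning to flag this
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.nickname = ["C", "J", "J"]

print("nickname" in df.columns)  # False
print(any(issubclass(w.category, UserWarning) for w in caught))  # True
```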

Removing columns uses df.drop(columns=[...]) with a list of column labels. By default, drop returns a new DataFrame containing the result and leaves the original untouched; to make the change permanent, set inplace=True (or reassign: df = df.drop(...)). The tutorial also demonstrates reversing a transformation: splitting a combined “full name” column back into “first” and “last.” Calling the string split method on the Series produces a list per row; setting expand=True turns those lists into separate columns. Those columns are then assigned back into two new DataFrame columns via multi-column indexing (df[["first", "last"]] = split_result).
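The sketch below, using hypothetical sample data (including made-up example.com addresses), contrasts the default behavior of drop(columns=[...]) with inplace=True.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["corey@example.com", "jane@example.com"],
})

# By default, drop returns a new DataFrame and df is unchanged
trimmed = df.drop(columns=["first", "last"])
print(list(trimmed.columns))  # ['email']
print(list(df.columns))       # ['first', 'last', 'email']

# inplace=True modifies df directly and returns None
result = df.drop(columns=["first", "last"], inplace=True)
print(result)                 # None
print(list(df.columns))       # ['email']
```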

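Row operations follow a different pattern. Adding a single row uses append with a dictionary or Series of values, but pandas needs an index label for the new row: if the appended object has no name, pandas raises an error unless ignore_index=True is passed, which tells pandas to assign the next index automatically. Appending an entire DataFrame works similarly: df.append(df2, ignore_index=True) stacks the rows of df2 below those of df and renumbers the index, so overlapping index values in the two frames do not produce duplicate labels. If pandas warns about column ordering, passing sort=False avoids sorting the columns and suppresses the warning. Note that append returns a new DataFrame instead of modifying df, so the result must be reassigned (df = df.append(...)). DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat([df, df2], ignore_index=True) does the same job.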
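DataFrame.append no longer exists in pandas 2.0, so this sketch uses pd.concat as the stand-in for both row-adding cases: a single row (wrapped as a one-row DataFrame) and a whole second DataFrame. The sample data is hypothetical. It also shows why ignore_index=True matters when both frames start their index at 0.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# Single row: wrap the dict in a list to build a one-row DataFrame
new_row = pd.DataFrame([{"first": "Tony", "last": "Stark"}])
df = pd.concat([df, new_row], ignore_index=True)  # concat returns a new object
print(df.index.tolist())  # [0, 1, 2]

# A second DataFrame whose index (0, 1) overlaps df's
df2 = pd.DataFrame({"first": ["Steve", "Bruce"], "last": ["Rogers", "Banner"]})

# Without ignore_index, the original labels are kept, producing duplicates
stacked = pd.concat([df, df2])
print(stacked.index.tolist())  # [0, 1, 2, 0, 1]

# With ignore_index=True, the rows are renumbered 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined.index.tolist())     # [0, 1, 2, 3, 4]
print(combined["first"].tolist())  # ['Corey', 'Jane', 'Tony', 'Steve', 'Bruce']
```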

Removing rows mirrors column deletion but targets index labels instead of column labels. Dropping a specific row uses df.drop(index=...) with the row's index value(s); again, inplace=True makes the removal stick. For conditional row removal, pass drop the index of the rows that match a boolean condition, such as rows where the last name equals a target value. Computing that index first (e.g., filt = df[df["last"] == "DOE"].index) and then calling df.drop(index=filt) yields cleaner, more readable code than embedding the condition directly inside the drop call.
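A short sketch of both row-removal styles, using hypothetical data in which two rows have the last name "DOE".

```python
import pandas as pd

# Hypothetical sample data: two rows share the last name "DOE"
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "DOE", "DOE"],
})

# Drop one row by its index label; without inplace, df is unchanged
without_first = df.drop(index=0)
print(without_first["first"].tolist())  # ['Jane', 'John']

# Conditional removal: compute the index of matching rows, then drop it
filt = df[df["last"] == "DOE"].index
df.drop(index=filt, inplace=True)
print(df["first"].tolist())  # ['Corey']
```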

Cornell Notes

Pandas DataFrame edits fall into two buckets: column operations and row operations. New columns are typically created by computing a Series and assigning it with bracket notation (df["new_col"] = ...); dot notation misbehaves because it sets a DataFrame attribute instead of creating a column. Columns are removed with df.drop(columns=[...]), and the removal is made permanent with inplace=True. Rows are added with append, usually paired with ignore_index=True to avoid index/name errors and to renumber cleanly when stacking DataFrames. Rows are removed with df.drop(index=...), using either explicit index values or an index computed from a boolean condition (e.g., df[df["last"] == "DOE"].index).

How do you add a new column that combines two existing columns (like first name and last name)?

Create a combined Series from the two columns (for strings, df["first"] + " " + df["last"]) and assign it to a new column using bracket notation: df["full name"] = combined_series. A Series computed from the DataFrame's own columns shares its index, so it lines up row for row; a plain list or array must contain exactly one value per row. Bracket notation is required because dot notation is interpreted as setting a DataFrame attribute, not a column.
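To illustrate the alignment rule, this sketch (hypothetical data; the "score" column is made up) assigns a too-short list, which pandas rejects, and then a Series covering only some index labels, which pandas aligns by label and pads with NaN.

```python
import pandas as pd

# Hypothetical DataFrame with the default index 0, 1, 2
df = pd.DataFrame({"first": ["Corey", "Jane", "John"]})

# A plain list must supply exactly one value per row
raised = False
try:
    df["score"] = [1, 2]
except ValueError:
    raised = True
print(raised)  # True

# A Series is aligned by index label; the missing label 1 becomes NaN
df["score"] = pd.Series([10, 30], index=[0, 2])
print(df["score"].tolist())  # [10.0, nan, 30.0]
```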

What’s the difference between df.drop(...) returning a result and actually changing the DataFrame?

df.drop(...) returns a new DataFrame by default, holding the would-be result while the original stays untouched. To apply the change to the existing DataFrame, set inplace=True inside drop (drop then returns None). Without inplace=True, the original DataFrame remains unchanged unless you reassign it (e.g., df = df.drop(...)).

How can a combined “full name” column be split back into two columns?

Use the string split method on the Series: df["full name"].str.split(" ", expand=True). With expand=True, the per-row lists produced by split become separate columns. Then assign them back into two DataFrame columns at once, for example: df[["first", "last"]] = split_result.
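This sketch, on a hypothetical two-row DataFrame, shows the difference expand=True makes and then performs the multi-column assignment.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"full name": ["Corey Schafer", "Jane Doe"]})

# Without expand, each cell holds a Python list
as_lists = df["full name"].str.split(" ")
print(as_lists.tolist())  # [['Corey', 'Schafer'], ['Jane', 'Doe']]

# With expand=True, each list element becomes its own column (labeled 0 and 1)
split_result = df["full name"].str.split(" ", expand=True)
print(split_result.shape)  # (2, 2)

# Assign both columns at once; they are matched to the new labels by position
df[["first", "last"]] = split_result
print(df["first"].tolist())  # ['Corey', 'Jane']
print(df["last"].tolist())   # ['Schafer', 'Doe']
```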

Why does appending a single row sometimes raise an error, and how do you fix it?

Appending a single row fails when pandas has no index label to give it. The error indicates pandas can only append a Series if ignore_index=True or if the Series has a name; a dict is converted to an unnamed Series, so it needs ignore_index=True. Setting ignore_index=True tells pandas to ignore the incoming index and assign one automatically, while a named Series uses its name as the new row's index label.
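Because DataFrame.append was removed in pandas 2.0, this sketch reproduces the two options with pd.concat instead. Converting a Series with to_frame().T yields a one-row DataFrame whose index label is the Series name, which mirrors what append did with a named Series. The sample data and the "new" label are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# A Series whose name will serve as the new row's index label
row = pd.Series({"first": "Tony", "last": "Stark"}, name="new")

# to_frame().T turns the Series into a one-row DataFrame indexed by its name
labeled = pd.concat([df, row.to_frame().T])
print(labeled.index.tolist())  # [0, 1, 'new']

# ignore_index=True discards all labels and renumbers the rows
renumbered = pd.concat([df, row.to_frame().T], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```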

What should you do when appending another DataFrame triggers warnings about column order?

If pandas warns about sorting due to mismatched column order, pass sort=False to append. This prevents pandas from sorting columns during the append and removes the warning in the tutorial’s example.
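The column-ordering behavior can be checked with pd.concat, whose sort parameter plays the same role (it defaults to False in current pandas versions, so no warning appears). In this hypothetical example, the second DataFrame lists its columns in a different order and lacks "email".

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "first": ["Corey"],
    "last": ["Schafer"],
    "email": ["corey@example.com"],
})
# Same columns in a different order, and no "email" column
df2 = pd.DataFrame({"last": ["Stark"], "first": ["Tony"]})

# sort=False keeps df's column order; missing values are filled with NaN
combined = pd.concat([df, df2], ignore_index=True, sort=False)
print(list(combined.columns))  # ['first', 'last', 'email']

# sort=True sorts the combined column labels instead
sorted_cols = pd.concat([df, df2], ignore_index=True, sort=True)
print(list(sorted_cols.columns))  # ['email', 'first', 'last']
```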

How do you remove rows conditionally (e.g., drop rows where last name equals a value)?

Compute the index of rows that match the condition, then drop by that index. One readable approach is filt = df[df["last"] == "DOE"].index, followed by df.drop(index=filt). This avoids stuffing the boolean logic directly inside the drop call and keeps the code easier to maintain.

Review Questions

When adding a column in pandas, why is bracket notation (df["col"]) preferred over dot notation (df.col = ...)?
How do expand=True and str.split work together to split a single string column into multiple columns?
What role do ignore_index=True and sort=False play when appending rows or DataFrames?

Key Points

1. Add columns by computing a Series and assigning it with bracket notation, e.g., df["full name"] = df["first"] + " " + df["last"].
2. Use df.drop(columns=[...]) with a list of column labels to remove multiple columns, and set inplace=True to make the deletion permanent.
3. Split a combined string column back into multiple columns with df["full name"].str.split(" ", expand=True), then assign the result into df[["first", "last"]].
4. Add rows with append, and use ignore_index=True when the appended data lacks a usable index/name.
5. Append another DataFrame with df.append(df2, ignore_index=True) to stack rows while reindexing cleanly.
6. If append warns about column ordering, pass sort=False to keep column order stable and suppress sorting behavior.
7. Remove rows with df.drop(index=...) using either explicit index values or an index derived from a boolean filter.

Highlights

Bracket notation is the safe way to create or replace DataFrame columns; dot notation can be interpreted as attribute assignment instead of column assignment.

Splitting “full name” into “first” and “last” becomes straightforward with str.split(..., expand=True), which converts split lists into separate columns.

Appending rows often requires ignore_index=True to avoid index/name errors and to ensure pandas assigns consistent new row indices.

Topics

Pandas DataFrames
Add Columns
Remove Columns
Append Rows
Drop Rows