Python · Feb 12, 2026 · 20 minute read

Pandas Method Chaining: Write Data Transformations That Read Like a Story

Conor Hollern

Write Clean, Composable Data Transformation Pipelines

Most pandas code starts clean… and ends as a pile of intermediate variables. Method chaining fixes that. Instead of mutating the same DataFrame over and over, you build a clear, linear transformation pipeline — one expression, top to bottom, readable like a story. Let’s break down how and why to use it.

1. What Is Method Chaining?

Method chaining means calling multiple DataFrame methods sequentially on the same object. Note: wrap the chain in parentheses so you don't need a backslash at the end of every line.

```python
import pandas

raw_df = pandas.read_csv('myData.csv')

transformed_df = (raw_df
    .dropna(subset=['revenue'])
    .assign(revenue_usd=lambda x: x['revenue'] / 100)
    .query('revenue_usd > 10000')
    .sort_values('revenue_usd', ascending=False)
)
```

This code is much cleaner and reads more easily than the equivalent step-by-step version:

```python
import pandas

raw_df = pandas.read_csv('myData.csv')

transformed_df = raw_df.copy()
transformed_df = transformed_df.dropna(subset=['revenue'])
transformed_df['revenue_usd'] = transformed_df['revenue'] / 100
transformed_df = transformed_df[transformed_df['revenue_usd'] > 10000]
transformed_df = transformed_df.sort_values('revenue_usd', ascending=False)
```

The chained version:

  • Avoids mutation confusion
  • Avoids accidental state reuse
  • Makes transformation order explicit
  • Feels closer to SQL

2. Building the Chain

Nearly every action that you would take to manipulate a dataset can be done via method chaining — although some methods are more intuitive than others.

Let's import our data:

```python
import pandas

raw_df = pandas.read_csv('salesData.csv')
```

From here, we can apply some of the more intuitive data transformations. In the following example, we will sort, then rename columns, select specific columns, and perform a left join with another dataset.

```python
# Define a dictionary to use for renaming AND selecting specific columns
result_columns = {
    'customer__unique_id': 'customer_id',
    'account_name': 'customer_name',
    'contract_id': 'contract_number',
    'reporting_month': 'month',
    'monthly_revenue_usd': 'revenue'
}

transformed_df = (raw_df
    .sort_values(by=['account_name', 'contract_id'], ascending=[False, True])
    .rename(columns=result_columns)
    .filter(result_columns.values())
    .merge(customer_detail_df, on='customer_id', how='left')
)
```

In the example, the following transformations were made:

  • Sort the DataFrame by account_name then contract_id.
  • Rename the DataFrame's columns using a dictionary defined above. Each key in the dictionary will be renamed to the corresponding value.
  • Select only the columns defined in the dictionary. Note: we are using .values() to get an iterable of just the column names after they are modified.
  • Merge (Left Join) with the DataFrame customer_detail_df.

This dictionary enhances method chaining even further by allowing us to define logic once and only once.

We can add additional methods to the chain to continue the transformation process. Most pandas methods make this extremely straightforward, so I won't cover every case. In the example below, we will take transformed_df, group it, aggregate, and return only 10 rows to generate a report of the Top 10 Customers by Revenue and Month.

```python
report_df = (transformed_df
    .groupby(['customer_name', 'month'], as_index=False)
    .agg({'revenue': 'sum'})
    .sort_values('revenue', ascending=False)
    .head(10)
)
```

Sorting, limiting, de-duplicating, dropping nulls, filling nulls, renaming columns, selecting columns, grouping, aggregating, pivoting, un-pivoting, merging (joining): the list goes on. All of these actions fit nicely into the conceptual framework of method chaining. However, two very important and common data transformation operations were left off this list: column mutation and filtering, which are covered below.
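To illustrate, several of the operations just listed can sit side by side in a single chain. This is a minimal sketch using hypothetical sales data (the column names and values are made up for the example):

```python
import pandas

# Hypothetical monthly sales data with a duplicated row and a missing value
sales_df = pandas.DataFrame({
    'customer': ['ACME Corp', 'ACME Corp', 'ACME Corp', 'Globex'],
    'month': ['2025-01', '2025-01', '2025-02', '2025-01'],
    'revenue': [100.0, 100.0, None, 250.0],
})

wide_df = (sales_df
    .drop_duplicates()                                  # remove the repeated row
    .assign(revenue=lambda x: x['revenue'].fillna(0))   # fill nulls
    .pivot_table(index='customer', columns='month',
                 values='revenue', aggfunc='sum')       # pivot months into columns
    .reset_index()
)
```

De-duplication, null filling, and pivoting each occupy one readable line, and the intermediate states never need names.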

3. Performing Column Mutations

The traditional way to create new columns and manipulate existing ones looks like the following:

```python
# Modify the existing revenue column by filling NAs
sales_df['revenue'] = sales_df['revenue'].fillna(value=0)

# Create a margin column
sales_df['margin'] = sales_df['revenue'] - sales_df['monthly_cost']
```

This works fine and is convenient for quick analysis. However, when a column needs to be created or modified in the middle of the data transformation process, this style breaks the method chain.

Use .assign(): This method assigns a new column or overwrites an existing column in a DataFrame.

We can re-write the above examples using assign:

```python
transformed_df = (sales_df
    .assign(revenue=lambda x: x['revenue'].fillna(value=0))    # Overwrite existing
    .assign(margin=lambda x: x['revenue'] - x['monthly_cost']) # Create new column
)
```

Now column mutations can fully integrate into method chaining data pipelines.
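A single .assign() call can also take several keyword arguments at once, and pandas documents that later arguments may refer to columns created or modified earlier in the same call. A small sketch with hypothetical revenue and cost columns:

```python
import pandas

# Hypothetical data; column names are illustrative
sales_df = pandas.DataFrame({
    'revenue': [100.0, None],
    'monthly_cost': [40.0, 10.0],
})

transformed_df = sales_df.assign(
    revenue=lambda x: x['revenue'].fillna(0),           # overwrite existing
    margin=lambda x: x['revenue'] - x['monthly_cost'],  # sees the filled revenue
)
```

Because the keyword arguments are evaluated in order, margin is computed against the already-filled revenue column.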

4. Filtering Data

Many analysts will filter DataFrames in the following manner:

```python
# Filtering on a single column
revenue_df = revenue_df[revenue_df['customer_name'] == 'ACME Corp']

# Filtering on multiple columns
revenue_df = revenue_df[
    (revenue_df['customer_name'] == 'ACME Corp') &
    (revenue_df['invoice_month'] == '2025-01-01')
]
```

Similar to the traditional method for column mutations, this approach works, but it breaks method chaining and is harder to read.

There are two alternatives, and both fit cleanly into method chaining.

1. The .query() method

The .query() method is intuitive and works extremely well. Despite this, it is not my preference, only because it uses a string-based expression syntax, which differs slightly from standard pandas boolean masking.

To provide a very simple example, this method works as follows:

```python
# We can reference this variable in the .query() method
customer_name = 'ACME Corp'

transformed_df = (raw_df
    .query("""
        customer_name == @customer_name and
        invoice_month == '2025-01-01'
    """)
)
```

This approach has several pros:

  • The syntax is clean and clear
  • This line is readable and works in a method chain
  • It feels similar to SQL
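Beyond simple comparisons, .query() also supports membership tests with in and references to local variables via the @ prefix. A runnable sketch with hypothetical data:

```python
import pandas

df = pandas.DataFrame({
    'customer_name': ['ACME Corp', 'Globex', 'Initech'],
    'revenue': [500.0, 120.0, 80.0],
})

target_customers = ['ACME Corp', 'Globex']
min_revenue = 100

# Bare names resolve to columns; @-prefixed names resolve to local variables
filtered_df = df.query('customer_name in @target_customers and revenue >= @min_revenue')
```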

2. The .loc[] method

I prefer using .loc[] because it is explicit and more pythonic. This method uses the same boolean masking that we used in the "bad" example above, but it can flow in a chain of methods.

```python
customer_name = 'ACME Corp'

filtered_df = (raw_df
    .loc[lambda x: (x['customer_name'] == customer_name) &
                   (x['invoice_month'] == '2025-01-01')]
)
```

This approach also has several pros:

  • Native python syntax
  • Full IDE support (autocomplete, linting)
  • Safer for dynamic expressions

Both .query() and .loc[] work well; choose whichever you prefer.

A note on chain-safety

In both the .loc[] filtering example and the .assign() column mutation example, I used lambda functions. This is necessary to ensure we receive the most recently modified version of the DataFrame.

See this example, where we rename account_name to customer_name, then attempt to filter on account_name:

```python
# This won't work
filtered_df = (raw_df
    .rename(columns={'account_name': 'customer_name'})
    .loc[raw_df['customer_name'] == 'ACME Corp']
)

# This will work
filtered_df = (raw_df
    .rename(columns={'account_name': 'customer_name'})
    .loc[lambda x: x['customer_name'] == 'ACME Corp']
)
```

  • The first code snippet will not work, because we are attempting to filter using the original raw_df DataFrame, which does not have a customer_name column.
  • The second code snippet will work, because .loc[] is receiving the latest DataFrame, with account_name renamed to customer_name.

It is my recommendation to default to using lambda when method chaining, even though it isn't technically always necessary.

5. Building Reusable Logic

As your pandas workflows grow, duplication of code becomes a concern. .pipe() solves this by letting you move transformation logic into reusable and composable functions, while keeping your method chain clean.

The core idea

.pipe() does the following:

  • Passes the DataFrame as the first argument
  • Accepts additional arguments
  • Preserves chain flow

```python
example_df.pipe(my_function, some_arg=value)
```

Let's say we have a function that takes a DataFrame and filters for an active status.

```python
import pandas

def filter_active(df: pandas.DataFrame) -> pandas.DataFrame:
    return df.loc[df['status'] == 'Active']
```

We could pass our DataFrame into this function like so:

```python
new_df = filter_active(current_df)
```

But this doesn't fit into the method chaining framework. This is where .pipe() comes into play. .pipe() works just like any other chained method, taking a function that expects and returns a DataFrame.

```python
transformed_df = (raw_df
    .pipe(filter_active)
)
```
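As noted above, .pipe() also forwards additional arguments to the function, which makes piped steps parameterizable. A sketch using a hypothetical filter_by_status function and made-up data:

```python
import pandas

def filter_by_status(df: pandas.DataFrame, status: str) -> pandas.DataFrame:
    """Keep only rows matching the given status."""
    return df.loc[lambda x: x['status'] == status]

# Hypothetical data for the example
raw_df = pandas.DataFrame({
    'customer': ['ACME Corp', 'Globex'],
    'status': ['Active', 'Churned'],
})

transformed_df = (raw_df
    .pipe(filter_by_status, status='Active')  # extra kwargs pass straight through
)
```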

There are many pros to this approach:

  • Modular business logic
  • Reusable definitions for metrics and reporting
  • Cleaner pipelines
  • Removal of duplicated business logic across modules
  • Easier testing
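On the testing point: because each piped step is an ordinary function from DataFrame to DataFrame, it can be unit-tested on a tiny fixture, independent of the full pipeline. A sketch with a hypothetical add_margin helper:

```python
import pandas

def add_margin(df: pandas.DataFrame) -> pandas.DataFrame:
    """Derive margin from revenue and monthly_cost."""
    return df.assign(margin=lambda x: x['revenue'] - x['monthly_cost'])

# The function can be verified in isolation on a one-row fixture...
fixture = pandas.DataFrame({'revenue': [100.0], 'monthly_cost': [40.0]})
assert add_margin(fixture)['margin'].iloc[0] == 60.0

# ...and still composes into a larger chain unchanged
report_df = (fixture
    .pipe(add_margin)
    .sort_values('margin', ascending=False)
)
```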

Final Thoughts

Method chaining improves readability and can support virtually any data transformation, as long as you understand the tools available.