
We know that .map_elements() is slow because it calls a Python function row by row. We know that .group_by().agg() is very fast, but it is limited to Polars' built-in expressions (like sum and mean). In this article, we'll look at how to use Polars' group-wise apply, .group_by().map_groups() (called .apply() in older Polars releases), to handle more complex operations efficiently.
Warning: This is slower than .agg() because each group is handed to a Python function, which takes the work out of the optimized Polars engine. But it is usually much faster than .map_elements() because the function runs once per group, not once per row.
The Goal
Let’s find the sales value for the second transaction in each product group.
```python
import polars as pl

df = pl.DataFrame({
    "product": ["A", "B", "A", "B", "A"],
    "sales": [10, 20, 30, 40, 50],
})
```

The .map_groups() Method (formerly .apply())
The function you pass to .map_groups() receives each group as a full DataFrame (the sub-group) and must return a DataFrame; Polars then concatenates the returned frames into the final result.
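```python
# 1. Define a function that takes one group's DataFrame and returns a DataFrame
def get_second_sale(group_df: pl.DataFrame) -> pl.DataFrame:
    # 'sales' value from the 2nd row (index 1), or None if the group has one row
    second = group_df.item(1, "sales") if group_df.height > 1 else None
    return pl.DataFrame(
        {"product": [group_df.item(0, "product")], "second_sale": [second]},
        # Force Int64 so a None result doesn't produce a Null-typed column
        schema_overrides={"second_sale": pl.Int64},
    )

# 2. Use .group_by().map_groups() (named .apply() in older Polars releases);
# maintain_order=True keeps the groups in order of first appearance
result = df.group_by("product", maintain_order=True).map_groups(get_second_sale)
print(result)
```

Output:

```
shape: (2, 2)
┌─────────┬─────────────┐
│ product ┆ second_sale │
│ ---     ┆ ---         │
│ str     ┆ i64         │
╞═════════╪═════════════╡
│ A       ┆ 30          │
│ B       ┆ 40          │
└─────────┴─────────────┘
```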
This is a powerful tool when you need to run complex logic (such as a small machine learning model or a statistical test) on each group of your data.
Key Takeaways
- The .map_elements() method is slow because it runs a Python function row by row, while .group_by().agg() is fast but limited to built-in expressions.
- For complex functions on entire groups, use .group_by().map_groups() in Polars (named .apply() in older releases).
- .map_groups() calls your Python function once per group, making it faster than .map_elements() but slower than .agg() because of the Python overhead.
- It lets you apply advanced logic, such as small machine learning models or statistical tests, to each group.
- The worked example used this method to find the sales value of the second transaction in each product group.





