
One of the most common data analysis patterns is “Split-Apply-Combine.” In Polars, the group_by operation is the main tool for it:
- Split data into groups (e.g., “by Product”).
- Apply a function (e.g., “sum of Sales”).
- Combine the results.
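In Polars, this is done with group_by() and agg() (spelled groupby() in Polars releases before 0.19). Polars runs the aggregations in parallel across CPU cores, which keeps this fast even on large DataFrames.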
The group_by and agg Syntax
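This is the core of Polars analysis: group_by() picks the grouping column(s), and agg() takes the expressions to compute for each group.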
import polars as pl
df = pl.DataFrame({
    "product": ["A", "B", "A", "B", "C"],
    "region": ["East", "East", "West", "West", "East"],
    "sales": [100, 200, 150, 250, 300],
})
# Get the total sales for EACH product
result = (
    df.group_by("product")
    .agg(pl.col("sales").sum().alias("total_sales"))
)
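print(result)
Output (the row order can differ between runs, because group_by() does not guarantee group order by default):
shape: (3, 2)
┌─────────┬─────────────┐
│ product ┆ total_sales │
│ ---     ┆ ---         │
│ str     ┆ i64         │
╞═════════╪═════════════╡
│ C       ┆ 300         │
│ B       ┆ 450         │
│ A       ┆ 250         │
└─────────┴─────────────┘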
Advanced: Multiple Aggregations
Pass several expressions to agg() to compute multiple aggregations in a single call.
# Get the SUM, MEAN, and COUNT of sales for each region
result = (
    df.group_by("region")
    .agg([
        pl.col("sales").sum().alias("Total Sales"),
        pl.col("sales").mean().alias("Avg Sales"),
        pl.col("sales").count().alias("Num Sales"),
    ])
)
print(result)
Advanced: Window Functions (over)
What if you want to add a column without collapsing the DataFrame? Use over().
# Add a new column showing the average sales FOR THAT product's group
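result = df.with_columns(
    pl.col("sales").mean().over("product").alias("avg_sales_for_product")
)
print(result)
This is the power of the Polars Expression API: the same expressions work in agg(), with_columns(), and over(), they chain cleanly, and they are typically faster than the equivalent pandas code.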





