Polars Window Functions: The over() Method (SQL Partition By)

3D visualization of factory floor partitioned into glass rooms where robots perform independent calculations, representing Polars window functions over method.

We’ve used groupby().agg(), which collapses your data (e.g., 100 rows become 3 rows). In contrast, Polars window functions allow you to compute calculations across groups without collapsing the original data.

what if you want to add a new column that references a group? For example: “What percentage of my department’s total salary is my individual salary?”

This is a Window Function. In SQL, you use OVER(PARTITION BY ...). In Polars, you use .over().

The Setup

import polars as pl
df = pl.DataFrame({
    "department": ["Sales", "Sales", "Eng", "Eng"],
    "employee": ["Alice", "Bob", "Charlie", "David"],
    "salary": [100, 150, 200, 250]
})

The Goal

We want to add a total_salary_by_dept column without collapsing the DataFrame.

df_with_total = df.with_columns(
    pl.col("salary").sum().over("department").alias("dept_total_salary")
)
print(df_with_total)

Output:

shape: (4, 4)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ department โ”† employee โ”† salary โ”† dept_total_salary โ”‚
โ”‚ ---        โ”† ---      โ”† ---    โ”† ---               โ”‚
โ”‚ str        โ”† str      โ”† i64    โ”† i64               โ”‚
โ•žโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•ชโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•ก
โ”‚ Sales      โ”† Alice    โ”† 100    โ”† 250               โ”‚
โ”‚ Sales      โ”† Bob      โ”† 150    โ”† 250               โ”‚
โ”‚ Eng        โ”† Charlie  โ”† 200    โ”† 450               โ”‚
โ”‚ Eng        โ”† David    โ”† 250    โ”† 450               โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

The .over("department") told Polars to create “windows” (partitions) for “Sales” and “Eng,” calculate the sum() for each window, and then map those totals back to the original rows.

Now you can easily calculate that percentage:

df_final = df_with_total.with_columns(
    (pl.col("salary") / pl.col("dept_total_salary")).alias("percent_of_dept_salary")
)
print(df_final)

Key Takeaways

  • The article discusses how to use groupby().agg() to collapse data in Polars.
  • It explains the need to add a new column that references a group, such as total salary per department.
  • Window functions are introduced, using SQL’s OVER(PARTITION BY …) and Polars’ .over() methods.
  • The goal is to add a total_salary_by_dept column without collapsing the DataFrame.
  • The .over(‘department’) creates partitions to calculate and map totals back to the original rows.

Similar Posts

Leave a Reply