A Deep Dive into the Polars Expression API (pl.Expr)

We’ve used the Polars Expression API a lot. But what is an expression?

An expression, or pl.Expr, is a recipe for a calculation. It’s not the final value.

  • pl.col("sales") is an expression that says: “When you run, grab the ‘sales’ column.”
  • pl.col("sales").sum() is an expression that says: “Grab ‘sales’ and sum it.”

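Polars collects these recipes and, within a context such as select or with_columns, can run independent expressions in parallel. In lazy mode, the query optimizer can also rewrite the whole plan before any data is touched. This is a large part of why Polars is fast.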

The Core Components

1. pl.col – The Column Selector

This is the most common expression. It selects one or more columns to act upon.

import polars as pl
df = pl.DataFrame({"A": [1, 2], "B": [3, 4]})

# This is a list of two expressions:
expr_list = [
    pl.col("A"),
    pl.col("B") * 2
]
df.select(expr_list)

2. pl.lit – The Literal

What if you want to compare a column to a number, like 5? Inside an expression, every operand has to be an expression too. pl.lit() (literal) wraps a plain Python value so it can be used in the expression chain.

# Create a new column 'C' where every value is 5
df.with_columns(
    pl.lit(5).alias("C")
)

# Use it in a filter
df.filter(
    pl.col("A") > pl.lit(1)
)

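(Note: Polars is smart. pl.col("A") > 1 is automatically converted to pl.col("A") > pl.lit(1). But pl.lit is essential when the context is ambiguous, most often with strings: in select, with_columns, then, and otherwise, a bare string is read as a column name rather than as a value.)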

3. when/then/otherwise – The Conditional

This is the pl.Expr version of an if/else statement.

df.select(
    pl.when(pl.col("A") > 1)
      .then(pl.lit("A is big"))
      .otherwise(pl.lit("A is small"))
      .alias("description")
)

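By learning to “think in expressions,” you stop describing the work step by step (as is typical in Pandas) and start describing the final result (as in Polars or SQL). That leaves Polars free to optimize and parallelize the computation, which is where much of the performance gain comes from.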


Key Takeaways

  • An expression in the Polars Expression API is a recipe for calculations, not the final value.
  • Common expressions include pl.col for selecting columns and pl.lit for wrapping literals.
  • Use when/then/otherwise for conditional logic, similar to if/else statements.
  • Polars optimizes these expressions and can execute them in parallel, which accounts for much of its performance.
  • Thinking in expressions lets Polars plan the whole computation, rather than running one step at a time as in typical Pandas code.
