Master Polars: A Guide to the Expression API (select, filter, with_columns)


In our Polars vs. Pandas article, we showed that Polars is faster than Pandas. A big part of the reason is its Expression API. In this article, we’ll take a closer look at that API and why it matters for performance.

With Pandas, you usually spell out how to do something step by step. With Polars, you describe what you want as expressions. Polars can then optimize the query and run it in parallel across your CPU cores.

The two magic words are:

  • pl.col("name"): Refers to a column by name.
  • pl.lit("value"): Creates a “literal” value, a constant that Polars treats as an expression.

1. select(): Choosing and Renaming Columns

select() is for columns. It returns a new DataFrame that contains only the expressions you pass in.

import polars as pl
df = pl.DataFrame({
    "temp_c": [10, 20, 30],
    "location": ["A", "B", "A"]
})

# Select a column and rename it with an alias
df.select(
    pl.col("temp_c").alias("temp_in_celsius")
)

2. filter(): Choosing Rows

filter() is for rows. It keeps only the rows where the expression evaluates to true.

# Find all rows where temp_c is greater than 15
df.filter(
    pl.col("temp_c") > 15
)

3. with_columns(): Creating New Columns

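This is the most powerful part. You can create new columns based on expressions while keeping all the existing ones. If a new column has the same name as an existing one, it replaces it.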

# Let's create a new 'temp_f' column
df.with_columns([
    (pl.col("temp_c") * 9/5 + 32).alias("temp_f")
])

Chaining It All Together

This is the “Polars way.” Find all rows from location A with a temperature over 15 °C, convert those temperatures to Fahrenheit, and keep just that column.

results = (
    df.filter(
        (pl.col("temp_c") > 15) & (pl.col("location") == "A")
    )
    .with_columns([
        (pl.col("temp_c") * 9/5 + 32).alias("temp_f")
    ])
    .select("temp_f")
)
print(results)
