Joining DataFrames in Polars: The Blazing Fast join() Method

In Pandas, you use pd.merge() to combine datasets. In Polars, you use the DataFrame join() method, which is built for speed. This guide explains the essentials of joining DataFrames in Polars.

Just like a SQL JOIN, join() combines two tables based on a shared “key” column.

The Setup

Let’s create two DataFrames: one for users and one for their orders.

import polars as pl

users = pl.DataFrame({
    "user_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})

orders = pl.DataFrame({
    "user_id": [1, 1, 2],
    "product": ["Keyboard", "Mouse", "Monitor"],
})

1. The “Inner” Join (Default)

An inner join finds only the rows that have a match in both DataFrames.

# Join 'users' (left) with 'orders' (right) on the "user_id" column
joined_df = users.join(orders, on="user_id")

print(joined_df)

Output:

shape: (3, 3)
┌─────────┬───────┬──────────┐
│ user_id ┆ name  ┆ product  │
│ ---     ┆ ---   ┆ ---      │
│ i64     ┆ str   ┆ str      │
╞═════════╪═══════╪══════════╡
│ 1       ┆ Alice ┆ Keyboard │
│ 1       ┆ Alice ┆ Mouse    │
│ 2       ┆ Bob   ┆ Monitor  │
└─────────┴───────┴──────────┘

Notice Charlie (user_id 3) is gone because he had no orders.

2. The “Left” Join

A left join keeps everything from the left DataFrame (users) and only brings in matches from the right (orders).

left_join_df = users.join(orders, on="user_id", how="left")

print(left_join_df)

Output:

shape: (4, 3)
┌─────────┬─────────┬──────────┐
│ user_id ┆ name    ┆ product  │
│ ---     ┆ ---     ┆ ---      │
│ i64     ┆ str     ┆ str      │
╞═════════╪═════════╪══════════╡
│ 1       ┆ Alice   ┆ Keyboard │
│ 1       ┆ Alice   ┆ Mouse    │
│ 2       ┆ Bob     ┆ Monitor  │
│ 3       ┆ Charlie ┆ null     │
└─────────┴─────────┴──────────┘

Now Charlie is included, but his product is null, which is how Polars marks a missing value. Note that Polars keeps null separate from the floating-point NaN.
