How to Find and Remove Duplicate Rows in Polars (2026 Guide)

Duplicate rows silently distort analysis and machine learning: they inflate counts, skew averages, and can leak examples between training and test sets. Polars provides fast, easy-to-use methods for finding and removing them.

The Setup

Let’s create a DataFrame with duplicate entries.

import polars as pl
df = pl.DataFrame({
    "id": [1, 2, 1, 3],
    "name": ["Alice", "Bob", "Alice", "Charlie"],
    "data": [100, 200, 100, 300]
})
print(df)

shape: (4, 3)
┌─────┬─────────┬──────┐
│ id  ┆ name    ┆ data │
│ --- ┆ ---     ┆ ---  │
│ i64 ┆ str     ┆ i64  │
╞═════╪═════════╪══════╡
│ 1   ┆ Alice   ┆ 100  │
│ 2   ┆ Bob     ┆ 200  │
│ 1   ┆ Alice   ┆ 100  │
│ 3   ┆ Charlie ┆ 300  │
└─────┴─────────┴──────┘

Row 0 and Row 2 are identical.

1. Find Duplicates (`.is_duplicated()`)

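This creates a new boolean column that is true for every row whose id appears more than once. Note that is_duplicated() flags all copies of a duplicated value, including the first occurrence, not only the later repeats. Because it is called on pl.col("id"), it compares the id column only; in this data that happens to match the full-row duplicates.

df.with_columns(
    pl.col("id").is_duplicated().alias("is_duplicate")
)

Output:

shape: (4, 4)
┌─────┬─────────┬──────┬──────────────┐
│ id  ┆ name    ┆ data ┆ is_duplicate │
│ --- ┆ ---     ┆ ---  ┆ ---          │
│ i64 ┆ str     ┆ i64  ┆ bool         │
╞═════╪═════════╪══════╪══════════════╡
│ 1   ┆ Alice   ┆ 100  ┆ true         │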
│ 2   ┆ Bob     ┆ 200  ┆ false        │
│ 1   ┆ Alice   ┆ 100  ┆ true         │
│ 3   ┆ Charlie ┆ 300  ┆ false        │
└─────┴─────────┴──────┴──────────────┘

2. Remove Duplicates (`.unique()`)

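This is the simple, one-step method to get a clean DataFrame. By default, unique() does not guarantee the order of the returned rows, so pass maintain_order=True if you want the original row order, as in the output below.

df.unique(maintain_order=True)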

Output:

shape: (3, 3)
┌─────┬─────────┬──────┐
│ id  ┆ name    ┆ data │
│ --- ┆ ---     ┆ ---  │
│ i64 ┆ str     ┆ i64  │
╞═════╪═════════╪══════╡
│ 1   ┆ Alice   ┆ 100  │
│ 2   ┆ Bob     ┆ 200  │
│ 3   ┆ Charlie ┆ 300  │
└─────┴─────────┴──────┘

3. Advanced: `unique(subset=...)`

What if you only want to deduplicate based on one column (e.g., “Keep the first entry for each id”)?

# Keep the first row it finds for each unique 'id'
df.unique(subset=["id"], keep="first")

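This is a core data cleaning operation that Polars performs incredibly fast. Pass keep explicitly, because the default, keep="any", makes no guarantee about which of the duplicate rows survives.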

Key Takeaways

Duplicate data hinders analysis and machine learning, but Polars offers efficient methods for identifying and removing them.
To find duplicates, use the .is_duplicated() method, which produces a boolean mask that is true for every copy of a duplicated value, including the first occurrence.
You can remove duplicates easily with the .unique() method, which cleans the DataFrame in one step.
The .unique(subset=…) method deduplicates based on specific columns, such as an ‘id’, and its keep parameter controls which of the duplicate rows is kept.
Polars excels at these data cleaning tasks, performing them with impressive speed.
