Cleaning Text in Polars: The .str Expression Namespace

By Ahmed Nabil | April 10, 2026 | March 22, 2026


Text data is almost always messy, and Polars string manipulation is one of the most efficient ways to clean it. In Pandas, you use the .str accessor. Polars has a .str namespace too, but it is part of the Expression API: string operations are written and combined like any other Polars expression, which makes them faster and more consistent.

All string expressions are available under pl.col("my_column").str, which keeps your data wrangling tasks efficient and manageable.

The Setup

import polars as pl
df = pl.DataFrame({
    "email": ["alice@gmail.com", "bob@yahoo.com", "carol@hotmail.com"],
    "notes": ["Item 1", "Item 2", "Item 1, Item 3"]
})

1. `contains()`: Filtering with Text

Find all rows where the “notes” column mentions “Item 1”.

df.filter(
    pl.col("notes").str.contains("Item 1")
)

2. `replace()`: Cleaning Data

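Let’s change “gmail.com” to “google.com”. By default, replace() treats the pattern as a regex and replaces only the first match in each string, so we pass literal=True to make the “.” match a literal dot.

df.with_columns(
    pl.col("email").str.replace("gmail.com", "google.com", literal=True)
)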

3. `extract()`: Using Regex to Get Data

This is the most powerful of the three. Let’s extract just the domain name from the emails. The regex r"@(.+)" captures everything after the @ in group 1, and the second argument to extract() selects which capture group to return.

df.with_columns(
    pl.col("email").str.extract(r"@(.+)", 1).alias("domain")
)

Output:

shape: (3, 3)
┌───────────────────┬──────────────────┬─────────────┐
│ email             ┆ notes            ┆ domain      │
│ ---               ┆ ---              ┆ ---         │
│ str               ┆ str              ┆ str         │
╞═══════════════════╪══════════════════╪═════════════╡
│ alice@gmail.com   ┆ Item 1           ┆ gmail.com   │
│ bob@yahoo.com     ┆ Item 2           ┆ yahoo.com   │
│ carol@hotmail.com ┆ Item 1, Item 3   ┆ hotmail.com │
└───────────────────┴──────────────────┴─────────────┘

Key Takeaways

Text data is often messy, and Polars string manipulation offers an efficient way to clean it.
The .str namespace will feel familiar from Pandas’ .str, but in Polars it is part of the Expression API, which brings speed and consistency.
Key functions include contains() for filtering, replace() for data cleaning, and extract() for regex data extraction.

