
It’s very common to have a column in your data that contains a JSON string. In Pandas, this is often slow and awkward to work with. In Polars, it’s fast and easy, thanks to the JSON methods in the .str namespace.
These methods let you query inside the JSON string without first converting the whole column into structured data.
The Setup
Let’s create a DataFrame with a messy JSON string column.
import polars as pl
df = pl.DataFrame({
    "id": [1, 2],
    "json_data": [
        '{"name": "Alice", "age": 30, "city": "New York"}',
        '{"name": "Bob", "age": 45, "city": "London"}'
    ]
})
1. json_path_match: Check for Data
Let’s find all rows where the person’s name is “Alice”. We use a “JSONPath” expression ($.name) to look inside the string.
# Find rows where the 'name' key inside 'json_data' is "Alice"
result = df.filter(
    pl.col("json_data").str.json_path_match("$.name") == "Alice"
)
print(result)
This is fast because the matching runs inside Polars' native engine rather than in a Python-level loop over the rows.
2. json_decode: Pulling Data Out
What if you want values such as “age” and “city” in their own columns? Use .str.json_decode() (older Polars releases called this json_extract).
result = df.with_columns(
    # Parse the JSON string into a Polars "Struct" (like a dict)
    pl.col("json_data").str.json_decode().alias("parsed_json")
).unnest("parsed_json")
print(result)
This single chain uses .str.json_decode() to parse the string, and then .unnest() to split the JSON keys (name, age, city) into their own separate columns.
Output:
shape: (2, 5)
┌─────┬───────────────────────────┬───────┬─────┬──────────┐
│ id  ┆ json_data                 ┆ name  ┆ age ┆ city     │
│ --- ┆ ---                       ┆ ---   ┆ --- ┆ ---      │
│ i64 ┆ str                       ┆ str   ┆ i64 ┆ str      │
╞═════╪═══════════════════════════╪═══════╪═════╪══════════╡
│ 1   ┆ {"name": "Alice", "age":… ┆ Alice ┆ 30  ┆ New York │
│ 2   ┆ {"name": "Bob", "age": 4… ┆ Bob   ┆ 45  ┆ London   │
└─────┴───────────────────────────┴───────┴─────┴──────────┘




