
Real-world data is often “sparse.” You might have sales data for Monday and Friday, but nothing for Tuesday, Wednesday, or Thursday. Polars' upsample() and interpolate() methods can fill in those missing values.
To analyze data like this, you often need to upsample it (create empty rows for Tue/Wed/Thu) and then interpolate (estimate what the missing values would be).
Step 1: The Setup
Let’s create a sparse DataFrame.
import polars as pl
from datetime import date
df = pl.DataFrame({
    "time": [date(2025, 1, 1), date(2025, 1, 4)],
    "sales": [10, 40],
})
print(df)

shape: (2, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10    │
│ 2025-01-04 ┆ 40    │
└────────────┴───────┘
Step 2: upsample() (Create the Gaps)
First, we use upsample() to create the empty rows for Jan 2 and Jan 3.
# '1d' = 1 day frequency
df_upsampled = df.upsample(time_column="time", every="1d")
print(df_upsampled)

shape: (4, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10    │
│ 2025-01-02 ┆ null  │
│ 2025-01-03 ┆ null  │
│ 2025-01-04 ┆ 40    │
└────────────┴───────┘
Step 3: interpolate() (Fill the Gaps)
Now, we fill those null values. The most common method is “linear” interpolation (draw a straight line between 10 and 40).
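To make the "straight line" idea concrete, here is a small plain-Python sketch of the arithmetic. The helper linear_fill is hypothetical (it is not part of Polars) and assumes the missing rows are evenly spaced, which holds after a daily upsample:

```python
def linear_fill(start: float, end: float, gaps: int) -> list[float]:
    """Return the values strictly between start and end, spaced evenly over `gaps` intervals."""
    step = (end - start) / gaps  # change per interval
    return [start + step * k for k in range(1, gaps)]


# Jan 1 -> Jan 4 spans three one-day intervals, so each day adds (40 - 10) / 3 = 10.
print(linear_fill(10, 40, 3))  # [20.0, 30.0]
```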
df_filled = df_upsampled.interpolate()
print(df_filled)

shape: (4, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10    │
│ 2025-01-02 ┆ 20    │
│ 2025-01-03 ┆ 30    │
│ 2025-01-04 ┆ 40    │
└────────────┴───────┘
Polars filled the gaps with 20 and 30, giving you a complete dataset for analysis. Depending on your Polars version, the interpolated column may come back as f64 (10.0, 20.0, and so on) rather than i64.
Key Takeaways
- Real-world data is often sparse, with no entries for certain days.
- To analyze sparse data, first use upsample() to create rows for the missing days.
- After upsampling, use interpolate() to fill the resulting nulls, typically with linear interpolation.
- Together, these two steps yield a complete dataset for analysis.





