Time-Series in Polars: Filling Gaps with upsample and interpolate

Real-world data is often “sparse.” You might have sales data for Monday and Friday, but nothing for Tuesday, Wednesday, or Thursday. This is where Polars' upsample() and interpolate() methods help you fill in the missing values.

To analyze such data, you often need to upsample it (create the missing rows for Tuesday, Wednesday, and Thursday) and then interpolate (estimate the values those rows would have held).

Step 1: The Setup

Let’s create a sparse DataFrame with sales recorded for January 1 and January 4 only.

import polars as pl
from datetime import date

df = pl.DataFrame({
    "time": [date(2025, 1, 1), date(2025, 1, 4)],
    "sales": [10, 40]
})
print(df)
shape: (2, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10    │
│ 2025-01-04 ┆ 40    │
└────────────┴───────┘

Step 2: upsample() (Create the Gaps)

First, we use upsample() to create the empty rows for Jan 2 and Jan 3.

# '1d' = 1 day frequency
df_upsampled = df.upsample(time_column="time", every="1d")
print(df_upsampled)
shape: (4, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10    │
│ 2025-01-02 ┆ null  │
│ 2025-01-03 ┆ null  │
│ 2025-01-04 ┆ 40    │
└────────────┴───────┘

Step 3: interpolate() (Fill the Gaps)

Now we fill those null values. By default, interpolate() uses linear interpolation: it draws a straight line between the known values (10 and 40) and reads off the points in between.

# Linear interpolation is the default (method="linear")
df_filled = df_upsampled.interpolate()
print(df_filled)
shape: (4, 2)
┌────────────┬───────┐
│ time       ┆ sales │
│ ---        ┆ ---   │
│ date       ┆ f64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 10.0  │
│ 2025-01-02 ┆ 20.0  │
│ 2025-01-03 ┆ 30.0  │
│ 2025-01-04 ┆ 40.0  │
└────────────┴───────┘

Polars filled the gaps with 20.0 and 30.0, giving you a complete daily dataset for analysis. Because linear interpolation can produce fractional values, recent Polars versions return the interpolated integer column as f64.


Key Takeaways

  • Real-world data is often sparse, with no entries for some days.
  • To analyze sparse time series, first use Polars' upsample() to insert rows (holding nulls) for the missing days.
  • Then use interpolate() to fill those nulls; linear interpolation is the default.
  • Together, the two calls turn a sparse series into a complete, evenly spaced dataset ready for analysis.
