Handling Missing Data in Pandas: A Guide to dropna() and fillna()


In the real world, your datasets will have holes. Users forget to fill out forms, sensors break, or data gets corrupted. In Pandas, these missing values show up as NaN (Not a Number).

You cannot ignore them. If you try to do math with a NaN, the result is often just more NaNs. You have two main choices: Drop them or Fill them.
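To see why NaN is hard to ignore, here is a minimal sketch. The Series `s` and the variable names are made up for illustration and are not part of the dataset used later in this guide.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one hole in the middle
s = pd.Series([1.0, np.nan, 3.0])

# Element-wise math: the NaN stays a NaN
doubled = s * 2
print(doubled)

# Aggregations skip NaN by default, which can quietly hide the gap
total_skip = s.sum()

# With skipna=False, a single NaN turns the whole result into NaN
total_strict = s.sum(skipna=False)
print(total_skip, total_strict)
```

Element-wise operations pass the NaN along, while aggregations such as sum() skip it unless you ask otherwise. Either way, it helps to know where the holes are before you compute anything.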

1. Finding Missing Data

First, let’s find out where the missing values are.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [10, 11, 12, 13]
})

# Check for missing values (returns True/False for every cell)
print(df.isnull())

# Count missing values in each column (SUPER USEFUL!)
print(df.isnull().sum())
# Output:
# A    1
# B    2
# C    0
# dtype: int64
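Beyond per-column counts, you may want to see which rows are affected and what share of each column is missing. The sketch below is one way to do that on the same DataFrame; the names `rows_with_gaps` and `missing_share` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [10, 11, 12, 13]
})

# Keep only the rows that contain at least one missing value
rows_with_gaps = df[df.isnull().any(axis=1)]
print(rows_with_gaps)

# Fraction of missing values per column (True counts as 1, False as 0)
missing_share = df.isnull().mean()
print(missing_share)
```

Here rows 1 and 2 are the ones with gaps, and column 'B' is missing half of its values.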

2. Option A: Drop Them (dropna)

If a row has too much missing data to be useful, just get rid of it.

# Drop ANY row that has at least one missing value
clean_df = df.dropna()
print(clean_df)
# Only rows 0 and 3 remain.

You can also drop columns instead of rows by passing axis=1.

# Drop columns with missing values
clean_cols = df.dropna(axis=1)
# Only column 'C' remains.
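Plain dropna() removes a row for a single missing cell, which may be stricter than you want. As a sketch of the "too much missing data" idea, the how, thresh, and subset parameters let you control what gets dropped. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [10, 11, 12, 13]
})

# Drop a row only if EVERY value in it is missing (none are, so all 4 rows stay)
drop_if_all = df.dropna(how='all')

# Keep rows that have at least 2 non-missing values (row 2 has only one)
keep_mostly_full = df.dropna(thresh=2)

# Only look at column 'A' when deciding which rows to drop
drop_by_a = df.dropna(subset=['A'])

print(keep_mostly_full)
```

With thresh=2, row 1 survives even though 'B' is missing there, because 'A' and 'C' are still filled in.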

3. Option B: Fill Them (fillna)

Often, dropping data is too aggressive. It’s better to fill the holes with a reasonable guess, like zero, or the average value of that column.

# Fill ALL missing values with 0
filled_zero = df.fillna(0)

# Fill with the average (mean) of each column
# (This is very common in Data Science)
filled_mean = df.fillna(df.mean())
print(filled_mean)
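Zero and the mean are not the only options. The sketch below shows the column means that fillna(df.mean()) uses, plus two alternatives: a different fill value per column and a forward fill. Which one is reasonable depends on your data, so treat the choices here as examples rather than recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [10, 11, 12, 13]
})

# mean() skips NaN by default: A is (1 + 2 + 4) / 3, B is (5 + 8) / 2
col_means = df.mean()
print(col_means)

# A different fill value per column, passed as a dict
filled_mixed = df.fillna({'A': 0, 'B': df['B'].median()})

# Forward fill: carry the last seen value down each column
filled_forward = df.ffill()
print(filled_forward)
```

Forward fill tends to suit ordered data such as time series, where the previous reading is a sensible stand-in for a missing one.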

Now your dataset, free from missing data, is clean and ready for analysis!

Key Takeaways

  • Datasets often contain missing data due to user errors, sensor issues, or corruption, represented as NaN in Pandas.
  • You cannot ignore missing data in Pandas; you either drop the missing entries or fill them with values.
  • To find missing data, use isnull() to flag each NaN cell and isnull().sum() to count them per column.
  • dropna() removes every row that has at least one missing value; pass axis=1 to remove columns with missing values instead.
  • Alternatively, you can fill in missing data using fillna with values like zero or the column average for cleaner analysis.
