Advanced Pandas: Mastering groupby() and Pivot Tables

3D visualization of a machine sorting and arranging colored blocks into groups and grids, representing Pandas groupby and pivot tables.

Loading data is easy. Summarizing it is where the value lies, and that’s where Pandas groupby can make a big difference. If you have a sales dataset, you don’t want to see every individual sale; you want to see “Total Sales Per Month” or “Average Sales Per Product.”

1. The groupby() Method

This is the most important tool for summarization. It follows the “Split-Apply-Combine” pattern:

  1. Split the data into groups (e.g., by “Category”).
  2. Apply a function to each group (e.g., sum, mean, count).
  3. Combine the results back together.
import pandas as pd
data = {
    'Product': ['A', 'B', 'A', 'B', 'C'],
    'Sales': [100, 200, 150, 250, 300],
    'Region': ['East', 'East', 'West', 'West', 'East']
}
df = pd.DataFrame(data)

# Group by 'Product' and sum their 'Sales'
print(df.groupby('Product')['Sales'].sum())
# Output:
# Product
# A    250
# B    450
# C    300

2. Pivot Tables (pivot_table)

If you come from Excel, you know Pivot Tables. Pandas can do them too, and they are even more powerful.

# Create a table with Products as rows, Regions as columns, showing average sales
pivot = df.pivot_table(values='Sales', index='Product', columns='Region', aggfunc='mean')
print(pivot)

This instantly creates a readable grid showing exactly how each product performs in each region.

Similar Posts

Leave a Reply