
Loading data is easy. Summarizing it is where the value lies, and that’s where Pandas groupby can make a big difference. If you have a sales dataset, you don’t want to see every individual sale; you want to see “Total Sales Per Month” or “Average Sales Per Product.”
1. The groupby() Method
This is the most important tool for summarization. It follows the “Split-Apply-Combine” pattern:
- Split the data into groups (e.g., by “Category”).
- Apply a function to each group (e.g., sum, mean, count).
- Combine the results back together.
import pandas as pd
data = {
'Product': ['A', 'B', 'A', 'B', 'C'],
'Sales': [100, 200, 150, 250, 300],
'Region': ['East', 'East', 'West', 'West', 'East']
}
df = pd.DataFrame(data)
# Group by 'Product' and sum their 'Sales'
print(df.groupby('Product')['Sales'].sum())
# Output:
# Product
# A 250
# B 450
# C 3002. Pivot Tables (pivot_table)
If you come from Excel, you know Pivot Tables. Pandas can do them too, and they are even more powerful.
# Create a table with Products as rows, Regions as columns, showing average sales
pivot = df.pivot_table(values='Sales', index='Product', columns='Region', aggfunc='mean')
print(pivot)This instantly creates a readable grid showing exactly how each product performs in each region.





