Merging DataFrames in Pandas: A Guide to merge() and concat()

3D visualization of two separate data grids being zipped together into one master grid, representing Pandas merge and concat functions.

Real-world data is rarely in one single file. You might have sales data in one CSV and customer info in another. You need to combine them, and that’s where the concept of Pandas Merging DataFrames comes in handy.

Pandas gives you two main ways to do this: Concatenating (stacking) and Merging (joining).

1. Concatenating (pd.concat)

Think of this as stacking DataFrames on top of each other. It’s useful when you have two files with the same columns (e.g., “January Sales” and “February Sales”).

import pandas as pd

df1 = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', 'b2']})
df2 = pd.DataFrame({'A': ['a3', 'a4'], 'B': ['b3', 'b4']})

# Stack them vertically (axis=0 is default)
result = pd.concat([df1, df2])
print(result)
# Output will have 4 rows: a1, a2, a3, a4

2. Merging (pd.merge)

Think of this like a SQL JOIN. It connects data horizontally based on a shared “key” column (like a User ID).

users = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'order_id': [101, 102],
    'user_id': [1, 2],  # Charlie (user 3) has no orders
    'amount': [50, 100]
})

# Merge them on the 'user_id' column
# 'inner' join keeps ONLY rows that match in BOTH frames
merged_df = pd.merge(users, orders, on='user_id', how='inner')

print(merged_df)
# Result: Alice and Bob only. Charlie is dropped because he didn't match.

Try changing how='inner' to how='outer' to keep Charlie!

Similar Posts

Leave a Reply