
We’ve covered regression (predicting prices) and classification (predicting species). Both are supervised learning: they need labeled answers. Now let’s dive into K-Means clustering in Python, a popular unsupervised learning technique.
We have data, but no answers. Our goal is to ask Python: “Can you find any natural groups (clusters) in this data for me?”
This is perfect for finding customer segments (e.g., “High Spenders, Low Activity” vs. “Low Spenders, High Activity”).
Step 1: The Data
We’ll create a simple, fake dataset of customer spending habits.
import pandas as pd
from sklearn.cluster import KMeans
import seaborn as sns
import matplotlib.pyplot as plt
# Fake customer data
data = {
    'Annual_Income_k': [15, 20, 16, 22, 28, 33, 35, 61, 63, 60],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 36, 6, 52, 49]
}
df = pd.DataFrame(data)
Step 2: Train the Model
The only thing we have to tell K-Means is how many clusters (k) to look for. Let’s start by looking for 3 groups.
# Initialize the model to find 3 clusters
model = KMeans(n_clusters=3, n_init=10, random_state=0)
# Train the model (it doesn't use 'y'!)
model.fit(df)
# Get the labels it assigned
cluster_labels = model.labels_
print(cluster_labels)
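# Example output (the cluster numbers are arbitrary and may differ on your machine):
# [1 2 1 2 1 2 1 0 0 0]
The model has automatically assigned each customer to group 0, 1, or 2. The numbers are only names; what matters is which customers share a label.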
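Once the model is fitted, you can also look at where each cluster is centered and ask which cluster a new customer would fall into. The following is a minimal sketch, not part of the original walkthrough: the `new_customer` values are made up for illustration, and the block rebuilds the same fake data so it runs on its own.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Same fake customer data as in Step 1
df = pd.DataFrame({
    'Annual_Income_k': [15, 20, 16, 22, 28, 33, 35, 61, 63, 60],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 36, 6, 52, 49],
})

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(df)

# One row per cluster: the mean income and spending score of its members
centers = pd.DataFrame(model.cluster_centers_, columns=df.columns)
print(centers)

# Hypothetical new customer (values invented for this example)
new_customer = pd.DataFrame({'Annual_Income_k': [25], 'Spending_Score': [80]})
new_label = model.predict(new_customer)[0]
print(f"New customer assigned to cluster {new_label}")
```

`predict` simply assigns the new point to the nearest cluster center, so a low-income, high-spending customer like this one should land in the same cluster as the existing high spenders.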
Step 3: Visualize the Clusters
This is where it all starts to make sense. Let’s plot the data and color each dot by the cluster it was assigned to.
# Add the labels back to our original DataFrame
df['Cluster'] = cluster_labels
# Plot using Seaborn
sns.scatterplot(
    data=df,
    x='Annual_Income_k',
    y='Spending_Score',
    hue='Cluster',  # Color the dots by their cluster!
    palette='Set1'
)
plt.title('Customer Segments')
plt.show()
The plot should show three color-coded groups: lower-income customers with low to moderate spending, lower-income high spenders, and higher-income customers. Each segment can then be targeted with different marketing!
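We picked 3 clusters by guesswork. One common sanity check is the "elbow" method: fit K-Means for several values of k and compare the inertia (the sum of squared distances from each point to its cluster center). The sketch below is an illustration under assumptions, not part of the original walkthrough: the range of k values and the `features` name are arbitrary choices, and it selects the two feature columns explicitly so the `Cluster` column added in Step 3 is never fed back into the model.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Same fake customer data as in Step 1
df = pd.DataFrame({
    'Annual_Income_k': [15, 20, 16, 22, 28, 33, 35, 61, 63, 60],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 36, 6, 52, 49],
})

# Use only the original features (not a previously added 'Cluster' column)
features = df[['Annual_Income_k', 'Spending_Score']]

# Fit one model per candidate k and record its inertia
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(features)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Inertia always shrinks as k grows, so look for the point where the drop levels off rather than for the smallest value. With only ten customers the elbow may not be sharp, so treat it as a rough guide.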





