
In our House Price project, we did Regression (predicting a number). Today, we’ll do Classification (predicting a category), walking through a complete Machine Learning classification project step by step.
Our goal: Build a model that can guess the species of an Iris flower (“setosa”, “versicolor”, or “virginica”) just by looking at its petal and sepal measurements.
Step 1: Load the Data
Scikit-Learn comes with this classic dataset built-in.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = iris.target
```
`X` is the data: 150 flowers × 4 columns (sepal length/width, petal length/width, in cm). `y` is the target: 0, 1, or 2, representing the 3 species.
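Before modelling, it helps to peek at what `load_iris()` actually returns. The short sketch below only uses attributes of the loaded dataset (`data`, `target`, `feature_names`, `target_names`) plus NumPy's `bincount` to count flowers per species. It shows that the dataset is small and perfectly balanced.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # the names of the 4 measurement columns
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))      # [50 50 50]: 50 flowers of each species

# The first flower's measurements and its species name
print(X[0], iris.target_names[y[0]])
```

Because every species has the same number of examples, plain accuracy is a reasonable first metric for this dataset.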
Step 2: Train-Test Split
We split the data so we can train on 80% and test on the unseen 20%.
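To make the 80/20 numbers concrete: with 150 flowers, `test_size=0.2` holds out 30 for testing and keeps 120 for training. The sketch below checks this. It also shows scikit-learn's optional `stratify` argument, which is an addition not used in this tutorial's code. That argument keeps the species ratio identical in both halves, so the test set gets exactly 10 flowers per species.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as the tutorial: 20% of 150 flowers = 30 held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 120 30

# Optional variation (not part of the tutorial): stratify=y preserves the
# 50/50/50 species balance, giving 10 test flowers per species
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))  # [10 10 10]
```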
```python
# random_state=42 fixes the random shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
Step 3: Train the Model
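We’ll use a simple “K-Nearest Neighbors” (KNN) classifier. To classify a flower, it finds the 3 training flowers whose measurements are “closest” to it (by straight-line, i.e. Euclidean, distance, which is scikit-learn’s default) and picks the most common species among them.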
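The “3 closest flowers, majority vote” idea needs only a few lines of NumPy. The sketch below is a toy version for intuition. `knn_predict` is a hypothetical helper written for illustration, not part of scikit-learn. It uses Euclidean distance and a plain majority vote, and it may break ties differently from `KNeighborsClassifier`. For that reason, the agreement printed at the end is expected to be high but is not guaranteed to be perfect.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical toy KNN: majority vote among the k closest training flowers."""
    predictions = []
    for x in X_query:
        # Euclidean distance from this query flower to every training flower
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Most common species among those k neighbours
        vote = Counter(y_train[nearest]).most_common(1)[0][0]
        predictions.append(vote)
    return np.array(predictions)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ours = knn_predict(X_train, y_train, X_test, k=3)
sk = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)
print((ours == sk).mean())  # fraction of test flowers where both versions agree
```

The library version adds fast neighbour search structures and options such as distance weighting, but the core decision rule is the same.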
```python
model = KNeighborsClassifier(n_neighbors=3)  # k = 3 neighbours
model.fit(X_train, y_train)  # for KNN, "training" essentially memorises the training flowers
```
Step 4: Test the Model
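Accuracy is simply the share of test flowers whose predicted species matches the true one. The small sketch below uses made-up labels (hypothetical, not taken from the Iris test set) to show the arithmetic. It also shows scikit-learn’s `confusion_matrix`, which breaks any mistakes down by species.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for 6 test flowers (0=setosa, 1=versicolor, 2=virginica)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat  = np.array([0, 1, 1, 2, 1, 0])  # one virginica mistaken for versicolor

# Accuracy = correct predictions / total predictions = 5 / 6
manual = (y_true == y_hat).mean()
print(manual)
print(accuracy_score(y_true, y_hat))  # same value via scikit-learn

# Rows = true species, columns = predicted species
print(confusion_matrix(y_true, y_hat))
```

On a dataset as clean as Iris the real confusion matrix is often purely diagonal, but on harder problems it shows *which* classes get mixed up, which a single accuracy number hides.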
```python
# Predict the species of the 30 unseen test flowers
y_pred = model.predict(X_test)

# See how accurate it was
acc = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {acc * 100:.2f}%")
# Output: Model Accuracy: 100.00% (It's a very easy dataset!)
```
Step 5: Make a New Prediction
Let’s predict a new flower we just found.
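```python
# [sepal_L, sepal_W, petal_L, petal_W] in cm; the double brackets make it a table with one row
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(new_flower)  # an array holding one numeric label
species_name = iris.target_names[prediction[0]]  # map the label (0, 1, 2) to its name
print(f"Prediction: This is a {species_name}!")
# Output: Prediction: This is a setosa!
```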
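`predict` also accepts several flowers at once. `predict_proba` shows how confident the vote was: with `n_neighbors=3` and the default uniform weights, each probability is the number of neighbours voting for that species divided by 3. In the sketch below, the second measurement row is a hypothetical virginica-like flower chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Two flowers in one call: the tutorial's setosa-like flower and a hypothetical virginica-like one
new_flowers = [[5.1, 3.5, 1.4, 0.2],
               [6.7, 3.0, 5.2, 2.3]]

labels = model.predict(new_flowers)
probabilities = model.predict_proba(new_flowers)  # one row per flower, one column per species

for flower, label, probs in zip(new_flowers, labels, probabilities):
    print(flower, iris.target_names[label], probs.round(2))
```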




