Machine Learning Project: Predicting House Prices with Scikit-Learn


In our Scikit-Learn intro, we used tiny fake data. Now we’ll build a real model in Python that predicts house prices.

We’ll use a classic dataset (Boston Housing) to predict home values (MEDV, the median value of owner-occupied homes in an area, in thousands of dollars) from a feature such as RM, the average number of rooms per dwelling.

Step 1: Load and Prepare Data

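The load_boston helper that once shipped with Scikit-Learn was removed in version 1.2, so we’ll fetch a copy of the dataset from OpenML with fetch_openml instead. This assumes an internet connection the first time it runs.

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data as pandas objects
# (assumes the OpenML dataset named "boston", version 1, is reachable)
boston = fetch_openml(name="boston", version=1, as_frame=True)
df = boston.data.copy()
df['MEDV'] = boston.target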

# 1. Define our Features (X) and Target (y)
# We'll just use 'RM' (Rooms) to predict the 'MEDV' (Value)
X = df[['RM']]
y = df['MEDV']

# 2. Split data into a "training" set and a "testing" set
# We train on 80% of the data, and test on the unseen 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
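
As a quick sanity check on the split, the sketch below runs the same train_test_split call on a small synthetic stand-in and confirms the 80/20 proportions. The made-up rooms and value numbers, and names such as demo, are assumptions for illustration only, not the Boston data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (illustrative values only)
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=100)
value = 9.0 * rooms - 34.0 + rng.normal(0, 3, size=100)
demo = pd.DataFrame({"RM": rooms, "MEDV": value})

X_demo = demo[["RM"]]   # double brackets keep X as a 2-D DataFrame
y_demo = demo["MEDV"]   # single brackets give a 1-D Series

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 100 rows split 80/20
print(X_tr.shape, X_te.shape)

# No row should appear in both sets
print(set(X_tr.index).isdisjoint(X_te.index))
```

Fixing random_state means the same rows land in each set on every run, which makes results reproducible.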

Step 2: Train the Model

model = LinearRegression()
model.fit(X_train, y_train)

print("Model trained!")
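
To see what "trained" means for a linear regression, it can help to look at the fitted line itself. The sketch below is a minimal illustration on synthetic data (the slope of 9 and intercept of -34 used to generate it are assumptions, not values from the Boston dataset); the coef_ and intercept_ attributes hold the line that fit() learned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: value = 9 * rooms - 34, plus a little noise (illustrative only)
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=200).reshape(-1, 1)
value = 9.0 * rooms.ravel() - 34.0 + rng.normal(0, 1, size=200)

demo_model = LinearRegression()
demo_model.fit(rooms, value)

# With one feature, the model is a straight line: value = intercept + slope * rooms
slope = demo_model.coef_[0]
intercept = demo_model.intercept_
print(f"value = {intercept:.1f} + {slope:.1f} * rooms")

# Predict for a hypothetical 6-room home; predict() expects a 2-D input
six_room = demo_model.predict(np.array([[6.0]]))[0]
print(f"Predicted value for 6 rooms: {six_room:.1f}")
```

On the real data, the same two attributes on model would show how many thousands of dollars each additional room adds to the predicted value.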

Step 3: Test the Model

Now we use the “unseen” X_test data to make predictions and see how close they are to the real prices (y_test).

# Make predictions on the test set
y_pred = model.predict(X_test)

# Let's see the first 5 predictions vs actuals
print("PREDICTED | ACTUAL")
for i in range(5):
    print(f"${y_pred[i]*1000:.0f}  | ${y_test.iloc[i]*1000:.0f}")
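
Eyeballing five rows only goes so far, so a common next step is to summarize the error over the whole test set. The sketch below is one way to do that with mean_absolute_error and r2_score from sklearn.metrics; it uses synthetic data as a stand-in (an assumption, so it runs without downloading anything), but the same two calls would work on y_test and y_pred above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (illustrative values only)
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=200).reshape(-1, 1)
value = 9.0 * rooms.ravel() - 34.0 + rng.normal(0, 3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    rooms, value, test_size=0.2, random_state=42
)
demo_model = LinearRegression().fit(X_tr, y_tr)
pred = demo_model.predict(X_te)

# MAE: average absolute gap between prediction and actual, in the target's units
mae = mean_absolute_error(y_te, pred)
# R^2: share of the target's variance explained by the model (1.0 is perfect)
r2 = r2_score(y_te, pred)
print(f"MAE: {mae:.2f}  R^2: {r2:.2f}")
```

Because MEDV is measured in thousands of dollars, an MAE of 5 on the real data would mean predictions are off by about $5,000 on average.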

You now have a working model that can predict a house’s value based on its number of rooms!
