
In our Scikit-Learn intro, we used tiny fake data. Now we’ll build a model on real data and use it to predict house prices.
We’ll use a classic dataset (Boston Housing) to predict a home’s value (MEDV, the median home value in thousands of dollars) from its features (like RM, the average number of rooms per dwelling).
Step 1: Load and Prepare Data
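The Boston Housing data used to ship with Scikit-Learn, but load_boston was removed in version 1.2. To keep the code working on current versions, we fetch the same dataset from OpenML instead (this needs an internet connection the first time it runs).
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load the data (fetch_openml replaces load_boston, which was removed in scikit-learn 1.2)
boston = fetch_openml(name="boston", version=1, as_frame=True)
# With as_frame=True, boston.data is already a DataFrame and boston.target a Series
df = boston.data.copy()
df['MEDV'] = boston.target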
# 1. Define our Features (X) and Target (y)
# We'll just use 'RM' (Rooms) to predict the 'MEDV' (Value)
X = df[['RM']]
y = df['MEDV']
# 2. Split data into a "training" set and a "testing" set
# We train on 80% of the data, and test on the unseen 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 2: Train the Model
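Before fitting anything, it can help to confirm what the split from Step 1 actually produced. The sketch below is only an illustration: it uses 506 placeholder rows (the Boston data has 506 rows) rather than the real features, so the values themselves are assumptions and only the sizes carry over.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# ASSUMPTION: placeholder rows standing in for the 506-row Boston data
X = np.arange(506).reshape(-1, 1)
y = np.arange(506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # prints: 404 102

# Reusing the same random_state reproduces the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))  # prints: True
```

Fixing random_state is what makes the tutorial's results repeatable from run to run; without it, each run would train and test on different rows.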
model = LinearRegression()
model.fit(X_train, y_train)
print("Model trained!")
Step 3: Test the Model
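Before scoring the model, it is worth a quick look at what fit learned. With a single feature, LinearRegression fits a straight line, and its slope and intercept are exposed as coef_ and intercept_. The sketch below uses hypothetical, noise-free numbers (value = 9 * rooms - 34 is an assumption chosen for illustration, not a result from the Boston data) so the recovered line is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# ASSUMPTION: made-up data that follows value = 9 * rooms - 34 exactly
rooms = np.array([[4.0], [5.0], [6.0], [7.0], [8.0]])
value = 9.0 * rooms.ravel() - 34.0

model = LinearRegression().fit(rooms, value)
slope = model.coef_[0]        # change in value per extra room
intercept = model.intercept_  # value where the line crosses rooms = 0

print(f"value = {slope:.1f} * rooms + {intercept:.1f}")

# Predict for a new, hypothetical home with 6.5 rooms
new_value = model.predict(np.array([[6.5]]))[0]
print(f"Predicted value for 6.5 rooms: {new_value:.1f}")
```

On the tutorial's model the same two attributes apply. Because that model was fitted on a DataFrame, passing new inputs as a DataFrame with an RM column, such as pd.DataFrame({'RM': [6.5]}), avoids a feature-name warning.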
Now we use the “unseen” X_test data to make predictions and see how close they are to the real prices (y_test).
# Make predictions on the test set
y_pred = model.predict(X_test)
# Let's see the first 5 predictions vs actuals
print("PREDICTED | ACTUAL")
for i in range(5):
    print(f"${y_pred[i]*1000:.0f} | ${y_test.iloc[i]*1000:.0f}")
You now have a working model that can predict a house’s value based on its number of rooms!
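Comparing five rows by eye gives a feel for the model, but a summary over the whole test set is more dependable. One common way to do that, sketched here, is with mean_absolute_error and r2_score from sklearn.metrics. To keep the block runnable offline it uses synthetic stand-in data (the room counts, the 9 and -34 coefficients, and the noise level are all assumptions), so the printed numbers say nothing about the real Boston model. Swap in the tutorial's y_test and y_pred to score that one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data, not the Boston dataset
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=200)                      # hypothetical room counts
value = 9.0 * rooms - 34.0 + rng.normal(0, 3, size=200)  # hypothetical values in $1000s

X = rooms.reshape(-1, 1)
y = value

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Average size of the miss, in the same units as y ($1000s)
mae = mean_absolute_error(y_test, y_pred)
# 1.0 is a perfect fit; 0.0 is no better than always predicting the mean
r2 = r2_score(y_test, y_pred)

print(f"MAE: ${mae * 1000:,.0f}")
print(f"R^2: {r2:.2f}")
```

MAE is often the easier number to explain because it is in dollars, while R^2 is handy for comparing models, for example one trained on RM alone against one trained on more features.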




