
In our Scikit-Learn intro, we used tiny fake data. Now we’ll build a real model in Python that predicts house prices.
We’ll use a classic dataset (Boston Housing) to predict a home’s value (MEDV, the median value in thousands of dollars) from its features (such as RM, the average number of rooms).
Step 1: Load and Prepare Data
We’ll use a sample dataset included with Scikit-Learn for simplicity.
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

# 1. Define our Features (X) and Target (y)
# We'll just use 'RM' (Rooms) to predict the 'MEDV' (Value)
X = df[['RM']]
y = df['MEDV']

# 2. Split data into a "training" set and a "testing" set
# We train on 80% of the data, and test on the unseen 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
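To make the 80/20 split more concrete, here is a minimal sketch that runs without the housing data. The 100 rows below are synthetic stand-ins generated with NumPy, and the names `demo`, `X_demo`, and `y_demo` are made up for this illustration. It shows the sizes that come out of `train_test_split` and that a fixed `random_state` reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset): 100 made-up rows
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=100)
demo = pd.DataFrame({
    "RM": rooms,
    "MEDV": 9 * rooms - 34 + rng.normal(0, 5, size=100),
})

X_demo = demo[["RM"]]  # double brackets keep a 2-D DataFrame for the features
y_demo = demo["MEDV"]  # single brackets give a 1-D Series for the target

# 80% of rows go to training, 20% are held back for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (80, 1) (20, 1)

# Repeating the call with the same random_state selects the same rows
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.index.equals(X_tr2.index))  # True
```

Fixing `random_state` is mostly a convenience for tutorials: it keeps the numbers you see stable from run to run.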
Step 2: Train the Model
model = LinearRegression()
model.fit(X_train, y_train)
print("Model trained!")
Step 3: Test the Model
Now we use the “unseen” X_test data to make predictions and see how close they are to the real prices (y_test).
# Make predictions on the test set
y_pred = model.predict(X_test)
# Let's see the first 5 predictions vs actuals
print("PREDICTED | ACTUAL")
for i in range(5):
    print(f"${y_pred[i]*1000:.0f} | ${y_test.iloc[i]*1000:.0f}")
You now have a working model that can predict a house’s value based on its number of rooms!
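Comparing five rows by eye is a start, but a summary number makes "how close" easier to judge. The sketch below is one possible way to do that, not part of the original walkthrough: it fits the same kind of one-feature model on synthetic stand-in data (so it runs on its own) and reports the learned line, the mean absolute error, and R². The data, the variable names, and the choice of these two metrics are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset): value rises with rooms, plus noise
rng = np.random.default_rng(0)
rooms = rng.uniform(4, 9, size=200).reshape(-1, 1)  # 2-D feature array
value = 9.0 * rooms.ravel() - 34.0 + rng.normal(0, 3, size=200)  # in $1000s

X_tr, X_te, y_tr, y_te = train_test_split(
    rooms, value, test_size=0.2, random_state=42
)
demo_model = LinearRegression().fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

# The fitted line: value = slope * rooms + intercept
slope = demo_model.coef_[0]
intercept = demo_model.intercept_

# Average size of the error, in the same units as the target ($1000s)
mae = mean_absolute_error(y_te, y_hat)
# Share of the variance in the test targets that the model explains
r2 = r2_score(y_te, y_hat)

print(f"value = {slope:.2f} * rooms + {intercept:.2f}")
print(f"MAE: ${mae * 1000:,.0f}")
print(f"R^2: {r2:.2f}")
```

With the tutorial's own variables, the same calls would be `mean_absolute_error(y_test, y_pred)` and `r2_score(y_test, y_pred)`, and the learned slope is available as `model.coef_[0]`.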





