Your First Machine Learning Model: Linear Regression with Scikit-Learn

Machine Learning (ML) often sounds like magic, but at its core, it is just math: finding patterns in data and using them to make predictions. If you are starting your journey, building a linear regression model with Scikit-Learn is one of the best ways to begin.

Today, we will build the simplest possible ML model using Python. Imagine you have data on house sizes and their prices. Linear regression tries to draw a straight line through that data to predict the price of a new house based on its size.

Step 1: Install Scikit-Learn

pip install scikit-learn

Note: For more details on installation, you can visit the official Scikit-Learn installation guide.
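To confirm the install worked, a quick check like the one below can help. This is a minimal sketch that assumes the package was installed into the same Python environment you are running; the version number printed will depend on your setup.

```python
# Import the package (note: it installs as "scikit-learn" but imports as "sklearn")
import sklearn

# Print the installed version; the exact number varies by environment
print("scikit-learn version:", sklearn.__version__)
```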

Step 2: The Data Setup

We will use a tiny, made-up dataset for simplicity. In a real-world scenario, you would likely load this data from a CSV file using Pandas, but here we will hardcode it to keep things understandable.

We need to import numpy to handle the arrays, and the LinearRegression class from the sklearn.linear_model module.

import numpy as np
from sklearn.linear_model import LinearRegression

# Features (X): House size in 1000s of sq ft
# Scikit-learn needs a 2D array for features, hence the double brackets
X = np.array([[1], [2], [3], [4], [5]])

# Target (y): Price in $100,000s
y = np.array([1.5, 2.8, 3.6, 4.5, 5.0])
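For comparison, here is a rough sketch of how the same X and y might be loaded with Pandas instead of being hardcoded. The column names size_k_sqft and price_100k are made up for illustration, and an in-memory string stands in for a real CSV file so the example runs on its own (it assumes pandas is installed).

```python
import io
import pandas as pd

# Hypothetical CSV contents holding the same five houses as above.
# With a real file you would pass its path to pd.read_csv instead.
csv_text = """size_k_sqft,price_100k
1,1.5
2,2.8
3,3.6
4,4.5
5,5.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Double brackets select a DataFrame, so X stays 2D: shape (5, 1)
X = df[["size_k_sqft"]].to_numpy()

# Single brackets select a Series, so y is 1D: shape (5,)
y = df["price_100k"].to_numpy()

print(X.shape, y.shape)  # (5, 1) (5,)
```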

Step 3: Train the Model

This is where the “learning” happens. The model looks at the data and finds the best-fitting line.
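"Best-fitting" here means ordinary least squares: the line that minimizes the sum of squared differences between the actual and predicted prices. As a rough illustration of the math involved for a single feature, the slope and intercept can be computed by hand with NumPy. This is only a sketch of the idea, not a description of how Scikit-Learn computes it internally.

```python
import numpy as np

# Same toy data as in Step 2, with sizes as a flat 1D array for the hand calculation
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.5, 2.8, 3.6, 4.5, 5.0])

# Least-squares slope for one feature: co-variation of x and y over variation of x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# The least-squares line always passes through the point (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.87, intercept=0.87
```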

With Scikit-Learn's LinearRegression class, the process is standard: you create the model and then call .fit() on your data.

model = LinearRegression()
model.fit(X, y)

print("Model trained successfully!")
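If you are curious what the model actually learned, the fitted line is stored on the model object. The sketch below repeats the setup so it runs on its own, then reads the coef_ and intercept_ attributes and uses .score() to report R², a rough measure of how well the line fits the training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as in Step 2
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.5, 2.8, 3.6, 4.5, 5.0])

model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is where the line crosses size 0
print(f"slope: {model.coef_[0]:.2f}")        # slope: 0.87
print(f"intercept: {model.intercept_:.2f}")  # intercept: 0.87

# R^2 on the training data (1.0 would be a perfect fit)
r2 = model.score(X, y)
print(f"R^2: {r2:.3f}")                      # R^2: 0.977
```

In the units of this dataset, a slope of 0.87 means each extra 1,000 sq ft adds roughly $87,000 to the predicted price.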

Step 4: Make a Prediction

Now we can ask our model: “How much should a 6,000 sq ft house cost?”

This is done using the .predict() method.

# Predict for a house size of 6 (i.e. 6,000 sq ft); .predict() also expects a 2D array
new_house = np.array([[6]])
prediction = model.predict(new_house)

print(f"Predicted price for a 6,000 sq ft house: ${prediction[0]*100000:,.0f}")
# Output: Predicted price for a 6,000 sq ft house: $609,000
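The .predict() method also accepts several rows at once, which is how it is typically used. Below is a small sketch that repeats the setup and predicts three sizes in one call; the sizes are arbitrary values picked for illustration. Keep in mind that anything above 5 lies outside the range of the training data, so those predictions assume the straight-line trend continues.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data and model as in the steps above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.5, 2.8, 3.6, 4.5, 5.0])
model = LinearRegression().fit(X, y)

# One row per house, one column per feature (size in 1000s of sq ft)
sizes = np.array([[2.5], [6], [7.5]])
predictions = model.predict(sizes)

for size, price in zip(sizes[:, 0], predictions):
    print(f"{size * 1000:,.0f} sq ft -> ${price * 100000:,.0f}")
# 2,500 sq ft -> $304,500
# 6,000 sq ft -> $609,000
# 7,500 sq ft -> $739,500
```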

Conclusion

You just built a machine learning model. It’s simple, but it follows the same basic workflow used for much larger models: Load Data -> Train Model -> Make Predictions.
