Lesson 6: Regression Basics
This lesson introduces regression, the way models predict continuous numeric values, as one step in a structured machine learning path. It begins with intuition, moves on to workflow thinking, and then walks through a practical Python example with notes.
Concept and intuition
Regression is a core topic in machine learning because it shapes how we frame a problem, choose tools, and judge results. It applies whenever the target is a number, such as price, demand, temperature, or waiting time, and it is one of the most widely used machine learning patterns in business and science.
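To make "learning a number from inputs" concrete, here is a minimal sketch of the simplest regression model: a straight line fitted by ordinary least squares to one feature. It reuses the size and price values from this lesson's main example and computes the slope and intercept directly with NumPy, using the standard closed-form formulas rather than a library model.

```python
import numpy as np

# Toy data reused from the lesson's example: house size (input) and price (target).
size = np.array([800, 950, 1200, 1300, 1500, 1800], dtype=float)
price = np.array([180000, 220000, 275000, 290000, 340000, 410000], dtype=float)

# Ordinary least squares for a single feature:
#   slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = size.mean(), price.mean()
slope = np.sum((size - x_mean) * (price - y_mean)) / np.sum((size - x_mean) ** 2)
intercept = y_mean - slope * x_mean

def predict(x):
    """Return the fitted line's price estimate for a given size."""
    return intercept + slope * x

print(f"price = {intercept:.0f} + {slope:.1f} * size")
print("estimate for size 1400:", round(predict(1400)))
```

The "model" here is just two numbers, the slope and the intercept; training means choosing them so the line's squared errors on the known examples are as small as possible.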
When learning regression, do not focus only on formulas. The more important habit is to ask what the model is trying to learn, what assumptions it makes, and what could go wrong when the data is noisy, incomplete, or biased.
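As an illustration of what can go wrong with noisy data, the sketch below fits a least-squares line (via `np.polyfit` with degree 1) to hypothetical data that follows y = 3x + 2 exactly, then repeats the fit after corrupting a single target value. The data and the "data-entry error" are invented for the example.

```python
import numpy as np

# Hypothetical data that follows y = 3x + 2 exactly.
x = np.arange(10, dtype=float)
y_clean = 3 * x + 2

# Simulate one data-entry error in the last target value.
y_dirty = y_clean.copy()
y_dirty[-1] = 100.0  # the correct value would be 29

# np.polyfit with deg=1 fits a least-squares line and returns [slope, intercept].
clean_slope, clean_intercept = np.polyfit(x, y_clean, 1)
dirty_slope, dirty_intercept = np.polyfit(x, y_dirty, 1)

print(f"clean slope: {clean_slope:.2f}")               # 3.00
print(f"slope with one bad value: {dirty_slope:.2f}")  # 6.87
```

One wrong value out of ten more than doubles the learned slope, because least squares weights large errors heavily. This is why inspecting and cleaning data matters as much as choosing the model.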
How it fits into a workflow
In a real project, a regression model sits inside a larger workflow: define the problem, prepare data, choose features, train a model, evaluate it carefully, and improve the system over time. Strong machine learning practice is iterative rather than one-shot.
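Several of these workflow steps can be expressed directly in scikit-learn. The sketch below is one possible arrangement, not the only one: it chooses features and a target, bundles a preparation step (`StandardScaler`) with the model in a pipeline, and evaluates with 3-fold cross-validation on the lesson's toy data. For plain linear regression, standardizing the features does not change the predictions; the scaler is included only to show where a preparation step would sit.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as the lesson's main example.
df = pd.DataFrame({
    "rooms": [2, 3, 4, 4, 5, 6],
    "size": [800, 950, 1200, 1300, 1500, 1800],
    "price": [180000, 220000, 275000, 290000, 340000, 410000],
})
X = df[["rooms", "size"]]  # choose features
y = df["price"]            # define the target

# Preparation + model as one object, so preprocessing is refit inside each fold.
pipeline = make_pipeline(StandardScaler(), LinearRegression())

# Evaluate with shuffled 3-fold cross-validation. scikit-learn reports *negative*
# MAE because its scorers follow a "higher is better" convention.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="neg_mean_absolute_error")
mae_per_fold = -scores

print("MAE per fold:", mae_per_fold.round(0))
print("mean MAE:", mae_per_fold.mean().round(0))
```

With only six rows, each fold is scored on two points, so the numbers are a demonstration of the workflow rather than a trustworthy estimate.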
This means you should connect your regression model to practical questions such as: What data is available? How will predictions be used? Which errors are most costly? How will the system be monitored after deployment? These questions matter as much as model accuracy.
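The question "which errors are most costly?" affects which metric you should watch. This sketch, using invented prices, compares two sets of predictions with the same mean absolute error (MAE). Root mean squared error (RMSE), computed here as the square root of scikit-learn's `mean_squared_error`, exposes the difference.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true prices and two sets of predictions.
y_true = np.array([200_000, 250_000, 300_000, 350_000])
preds_a = y_true + np.array([10_000, -10_000, 10_000, -10_000])  # several small errors
preds_b = y_true + np.array([0, 0, 0, 40_000])                   # one large error

results = {}
for name, preds in [("A", preds_a), ("B", preds_b)]:
    mae = mean_absolute_error(y_true, preds)
    rmse = np.sqrt(mean_squared_error(y_true, preds))
    results[name] = (mae, rmse)
    print(f"{name}: MAE={mae:,.0f}  RMSE={rmse:,.0f}")
# A: MAE=10,000  RMSE=10,000
# B: MAE=10,000  RMSE=20,000
```

If a single large miss is much worse for the business than many small ones, RMSE reflects that preference better than MAE.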
Common mistakes and practical advice
A common beginner mistake is to treat regression as a purely technical task. In practice, success depends on data quality, evaluation design, and the clarity of the business goal. Even a sophisticated model can fail if the data pipeline is weak or the target is poorly defined.
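Checking data quality before training can start very simply. The sketch below runs two basic checks with pandas on a hypothetical, deliberately flawed dataset; the helper name `basic_checks` and its `positive_cols` parameter are invented for illustration, and real projects usually need many more checks.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: a missing feature,
# an impossible size, and a row without a target value.
raw = pd.DataFrame({
    "rooms": [2, 3, np.nan, 4, 5, 6],
    "size": [800, 950, 1200, -1, 1500, 1800],
    "price": [180000, 220000, 275000, 290000, None, 410000],
})

def basic_checks(df, positive_cols):
    """Return a list of human-readable data-quality problems (a minimal sketch)."""
    problems = []
    missing = df.isna().sum()
    for col, count in missing[missing > 0].items():
        problems.append(f"{col}: {count} missing value(s)")
    for col in positive_cols:
        if (df[col] <= 0).any():
            problems.append(f"{col}: non-positive values found")
    return problems

for problem in basic_checks(raw, positive_cols=["size"]):
    print(problem)

# One simple remedy (not always appropriate): drop rows with no target or an
# impossible size; missing features could be imputed in a later step instead.
clean = raw.dropna(subset=["price"])
clean = clean[clean["size"] > 0]
print(len(raw), "->", len(clean), "rows")  # 6 -> 4 rows
```

Rows without a target cannot be used for supervised training at all, whereas a missing feature value can often be filled in, so the two problems deserve different treatment.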
As you read the code example in this lesson, pay attention to how the inputs are shaped, how training and prediction are separated, and how the output is interpreted. Good coding habits make machine learning work more reliable, explainable, and easier to improve.
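Input shape is a frequent source of errors. scikit-learn estimators expect features as a 2-D array of shape (n_samples, n_features), even when there is only one feature. The sketch below, again using the lesson's size and price values, shows the reshape and keeps training and prediction as separate calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([800, 950, 1200, 1300, 1500, 1800])  # shape (6,)
prices = np.array([180000, 220000, 275000, 290000, 340000, 410000])

# A single feature column must be reshaped from (6,) to (6, 1).
X = sizes.reshape(-1, 1)
print(sizes.shape, "->", X.shape)  # (6,) -> (6, 1)

# Training: the model learns its parameters from X and the target only.
model = LinearRegression().fit(X, prices)

# Prediction: new inputs must have the same layout (one row per case, one column).
new_sizes = np.array([[1000], [1600]])
print(model.predict(new_sizes).round(0))
```

Passing the 1-D `sizes` array directly to `fit` would raise an error asking you to reshape the data, which is why the reshape step is made explicit.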
Three practical examples
- A model estimates property price from location, size, and number of rooms.
- A manufacturer predicts next month's order volume.
- A utility estimates future electricity usage.
Training a simple regression model
This code example focuses on clarity rather than production scale. Read through the code, then study the notes below to understand why each step matters.
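```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
import pandas as pd

# A tiny toy dataset: two numeric features and a continuous target.
df = pd.DataFrame({
    "rooms": [2, 3, 4, 4, 5, 6],
    "size": [800, 950, 1200, 1300, 1500, 1800],
    "price": [180000, 220000, 275000, 290000, 340000, 410000]
})

# Features (a 2-D table) and target (a 1-D column).
X = df[["rooms", "size"]]
y = df["price"]

# Hold out part of the data for testing. With the default test_size of 0.25,
# 2 of the 6 rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the held-out rows and measure the average absolute error (in price units).
preds = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, preds))
```

Code walkthrough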
- The target variable is `price`, which is continuous, so this is a regression task.
- `rooms` and `size` act as input features for the model.
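- `mean_absolute_error` reports the average absolute difference between predictions and true values, in the target's original unit (here, price).
- The goal is not just to fit the training data but to estimate unseen cases reasonably well, which is why the model is scored on held-out test rows. With only six rows, that score rests on two test points, so treat it as illustrative rather than reliable.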
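To connect the walkthrough to what the model actually learned, this sketch repeats the lesson's training code, prints the fitted coefficients (`coef_`) and intercept (`intercept_`), and recomputes MAE by hand to show it is simply the mean of the absolute errors. With only four training rows and two correlated features, the individual coefficients may be unstable, so read them as a description of this tiny fit rather than as real-world price effects.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "rooms": [2, 3, 4, 4, 5, 6],
    "size": [800, 950, 1200, 1300, 1500, 1800],
    "price": [180000, 220000, 275000, 290000, 340000, 410000],
})
X = df[["rooms", "size"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

# The learned parameters: one coefficient per feature plus an intercept.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:,.1f}")
print(f"intercept: {model.intercept_:,.1f}")

# MAE by hand: the mean of the absolute differences between truth and prediction.
manual_mae = np.mean(np.abs(y_test.to_numpy() - preds))
print("manual MAE:", round(manual_mae, 1))
print("sklearn MAE:", round(mean_absolute_error(y_test, preds), 1))
print("test rows:", len(X_test))  # default test_size=0.25 -> 2 of 6 rows
```

A prediction is the intercept plus each feature value multiplied by its coefficient, which is what makes linear regression comparatively easy to explain.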
Summary and key takeaways
- Regression predicts numbers, not categories.
- Choose regression when the business answer is a continuous value.
- Error metrics such as MAE help interpret how wrong the model tends to be.
- Feature choice strongly affects regression quality.
Exercises
- Give three new examples of regression problems.
- Why is house-price prediction not a classification problem?
- Add a new feature column named `age_of_house` and imagine how it might help.
- Explain what MAE means in plain language.