Lesson 8: Training, Validation, and Test Sets
This lesson explains why data should be split so that models are tuned and evaluated fairly. It begins with intuition, moves into workflow thinking, and then presents a practical Python example with clear notes.
Concept and intuition
Splitting data into training, validation, and test sets is a core topic in machine learning because it shapes how we frame the problem, choose tools, and judge results. If evaluation is not designed carefully, you may believe a model is better than it really is. Splitting data properly measures generalization rather than memorization.
When learning how to split data, do not focus only on mechanics. The more important habit is to ask what the model is trying to learn, what assumptions it makes, and what could go wrong when the data is noisy, incomplete, or biased.
How it fits into a workflow
In a real project, data splitting sits inside a larger workflow: define the problem, prepare data, choose features, train a model, evaluate it carefully, and improve the system over time. Strong machine learning practice is iterative rather than one-shot.
This means you should connect splitting decisions to practical questions such as: What data is available? How will predictions be used? Which errors are most costly? How will the system be monitored after deployment? Those questions matter as much as model accuracy.
Common mistakes and practical advice
A common beginner mistake is to treat data splitting as a purely technical task. In practice, success depends on data quality, evaluation design, and the clarity of the business goal. Even a sophisticated model can fail if the data pipeline is weak or the target is poorly defined.
As you read the code example in this lesson, pay attention to how the inputs are shaped, how training and prediction are separated, and how the output is interpreted. Good coding habits make machine learning work more reliable, explainable, and easier to improve.
Three practical examples
Training set: the model learns its parameters from historical examples.
Validation set: a data scientist compares different settings before finalizing the model.
Test set: the final score is reported on data that was not used for tuning.
Creating train and test splits
This code example focuses on clarity rather than production scale. Read the comments, then study the notes below to understand why each step matters.
from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.DataFrame({
    "feature": range(10),
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

X = df[["feature"]]
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train size:", len(X_train))
print("Test size:", len(X_test))
Code walkthrough
- Training data is used to fit the model parameters.
- Validation data, when used, helps choose settings without touching the test set.
- The test set should represent new, unseen data as honestly as possible.
- `stratify=y` helps preserve class balance when splitting classification data.
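The walkthrough mentions a validation set, which the main example does not create. One common way to carve one out is to call train_test_split twice; the sketch below uses illustrative 60/20/20 proportions on the same toy data, not a required recipe.

```python
from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.DataFrame({
    "feature": range(10),
    "target": [0, 1] * 5
})
X = df[["feature"]]
y = df["target"]

# First split off the test set (20% of all rows).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Then split the remainder into train and validation.
# 0.25 of the remaining 80% equals 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

print(len(X_train), len(X_val), len(X_test))  # 6 2 2
```

The model is fit on X_train, settings are compared on X_val, and X_test is touched only once, at the very end.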
Summary and key takeaways
- Separate evaluation data is essential for trustworthy model assessment.
- The test set should not guide model tuning decisions.
- Stratified splits are useful for classification problems with class labels.
- Good experimental design prevents misleadingly optimistic results.
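To see why stratified splits matter for classification, the small check below compares label proportions with and without stratification on an imbalanced toy dataset; the 80/20 class ratio is illustrative, not from the lesson's example.

```python
from sklearn.model_selection import train_test_split
import pandas as pd

# Imbalanced labels: 80% class 0, 20% class 1.
X = pd.DataFrame({"feature": range(100)})
y = pd.Series([0] * 80 + [1] * 20)

# Stratified split: the test set keeps exactly the 20% class-1 ratio.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(y_test_strat.mean())  # 0.2

# Plain random split: the ratio may drift depending on the draw.
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(y_test_plain.mean())
```

With rare classes, an unstratified split can even leave a class entirely out of the test set, which makes the reported score meaningless for that class.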
Exercises
- Why is it risky to evaluate on the same data used for training?
- What role does a validation set play?
- When is stratified splitting helpful?
- Change the test size and observe how the train and test counts change.