
Lesson 7: Classification Basics

Level: Beginner · Course position: 7 of 30 · Track: Machine Learning Tutorials

This lesson introduces classification: how models assign items to categories, or classes. It starts with intuition, moves on to how classification fits into a project workflow, and then walks through a practical Python example with notes.

Concept and intuition

Classification is a core topic in machine learning because it shapes how we frame a problem, choose tools, and judge results. It is central to spam detection, diagnosis support, fraud alerts, sentiment analysis, defect detection, and many other real-world tasks.

When learning classification, do not focus only on formulas. The more important habit is to ask what the model is trying to learn, what assumptions it makes, and what could go wrong when the data is noisy, incomplete, or biased.
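One way to see what can go wrong with noisy data is to simulate it. This minimal sketch assumes random label flipping as a stand-in for labeling mistakes (real-world label noise is often less random than this) and compares test accuracy, always measured against the clean test labels, as the noise rate grows.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

rng = np.random.default_rng(0)
scores = {}
for noise_rate in [0.0, 0.2, 0.4]:
    # Simulated noise (an assumption): flip a random fraction of the 0/1 training labels.
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise_rate
    y_noisy[flip] = 1 - y_noisy[flip]

    model = LogisticRegression(max_iter=5000).fit(X_train, y_noisy)
    # Score against the clean test labels to see the effect of noisy training data.
    scores[noise_rate] = model.score(X_test, y_test)
    print(f"label noise {noise_rate:.0%}: test accuracy {scores[noise_rate]:.3f}")
```

The exact numbers depend on the random seed; the point is to build the habit of asking how sensitive a model is to the quality of its labels.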

How it fits into a workflow

In a real project, classification sits inside a larger workflow: define the problem, prepare the data, choose features, train a model, evaluate it carefully, and improve the system over time. Strong machine learning practice is iterative rather than one-shot.
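The workflow steps above can be sketched in a few lines of scikit-learn. This is a minimal sketch, assuming the breast cancer dataset used in this lesson's main example; the `StandardScaler` step, the stratified split, and 5-fold cross-validation are illustrative choices rather than a fixed recipe.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Define the problem: predict a tumor class label from numeric measurements.
data = load_breast_cancer()
X, y = data.data, data.target

# 2. Prepare data: hold out a test set that stays untouched during development.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3-4. Choose preprocessing and a model, bundled into one pipeline.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5. Evaluate carefully: cross-validation on the training set gives a
# more stable estimate than a single split.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
print("Cross-validation accuracy:", cv_scores.mean())

# Final check on the held-out test set once development choices are settled.
pipeline.fit(X_train, y_train)
print("Test accuracy:", pipeline.score(X_test, y_test))
```

Bundling preprocessing and the model into one pipeline means each cross-validation fold refits the scaler on its own training portion, so information from the evaluation data does not leak into preprocessing.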

This means you should connect classification to practical questions such as: What data is available? How will predictions be used? Which errors are most costly? How will the system be monitored after deployment? These questions matter as much as model accuracy.
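A confusion matrix is one common way to see which kinds of errors a classifier makes, which helps with the question of which errors are most costly. This sketch reuses the breast cancer setup from the lesson's main example. Treating a malignant case predicted as benign as the costlier mistake is an assumption for illustration; in practice that judgment belongs to domain experts.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are true classes, columns are predicted classes,
# ordered as in data.target_names: 0 = "malignant", 1 = "benign".
cm = confusion_matrix(y_test, predictions, labels=[0, 1])
print(cm)

# Assumed to be the costlier error in this illustration.
missed_malignant = cm[0, 1]
# Benign cases flagged as malignant (false alarms).
false_alarms = cm[1, 0]
print("Malignant cases predicted benign:", missed_malignant)
print("Benign cases predicted malignant:", false_alarms)
```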

Common mistakes and practical advice

A common beginner mistake is to treat classification as a purely technical task. In practice, success depends on data quality, evaluation design, and a clearly defined business goal. Even a sophisticated model can fail if the data pipeline is weak or the target is poorly defined.
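One evaluation habit that helps expose a weak pipeline or a poorly defined target is comparing a model against a trivial baseline. A minimal sketch, assuming scikit-learn's `DummyClassifier` as that baseline: it ignores the features entirely and always predicts the most frequent class seen in training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Trivial baseline: always predicts the most common training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# Real model trained on the same data.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print("Baseline accuracy:", baseline_acc)
print("Model accuracy:   ", model_acc)
```

If a model barely beats a baseline like this, the data and the target definition usually deserve attention before any model tuning.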

As you read the code example in this lesson, pay attention to how the inputs are shaped, how training and prediction are separated, and how the output is interpreted. Good coding habits make machine learning work more reliable, explainable, and easier to improve.
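To make input shapes and the separation of training and prediction concrete, here is a minimal sketch using the same breast cancer dataset as the lesson's main example. Predicting a single reshaped sample is an illustrative addition.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
# X is 2-D: one row per sample, one column per feature.
# y is 1-D: one class label per sample.
print("X shape:", data.data.shape)
print("y shape:", data.target.shape)
print("Classes:", data.target_names)

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Training only ever sees the training split.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# predict() expects 2-D input, so one sample is reshaped to (1, n_features).
one_sample = X_test[0].reshape(1, -1)
label = model.predict(one_sample)[0]
print("Predicted class:", data.target_names[label])
```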

Three practical examples

Spam detection

Emails are classified as spam or not spam.
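A minimal sketch of the spam example, assuming a tiny set of emails invented for illustration. The bag-of-words features (`CountVectorizer`) and the naive Bayes model (`MultinomialNB`) are common choices for text classification, not the only ones.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails and labels, invented for illustration only.
emails = [
    "win a free prize now",
    "claim your free money today",
    "cheap prize offer click now",
    "meeting moved to monday",
    "please review the project report",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Convert each email into word counts, then fit a naive Bayes classifier.
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

print(spam_model.predict(["free prize click now", "project meeting on monday"]))
# → ['spam' 'not spam']
```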

Customer churn

A customer is classified as likely to leave or likely to stay.
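A minimal churn sketch, assuming two hypothetical features (months as a customer and support tickets in the last quarter) and a handful of invented customers. Real churn data would have far more rows and features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical customer features, invented for illustration:
# [months as a customer, support tickets in the last quarter]
X = np.array([
    [24, 0], [36, 1], [48, 0], [30, 1],   # long tenure, few tickets
    [2, 4], [3, 5], [1, 3], [4, 6],       # short tenure, many tickets
])
# 0 = likely to stay, 1 = likely to leave (churn)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

churn_model = LogisticRegression().fit(X, y)

new_customers = np.array([[40, 0], [2, 5]])
print(churn_model.predict(new_customers))        # class labels
print(churn_model.predict_proba(new_customers))  # [P(stay), P(leave)] per row
```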

Image recognition

An image is classified as one category among several possible classes.
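Real image recognition systems usually work with larger images and deep learning models. This sketch swaps in scikit-learn's small built-in handwritten digits dataset (`load_digits`) to show a multiclass problem with ten possible classes, using logistic regression as a simple stand-in model.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Each sample is an 8x8 grayscale digit image, flattened to 64 pixel values.
digits = load_digits()
print("Image shape:", digits.images.shape[1:], "-> features:", digits.data.shape[1])
print("Classes:", digits.target_names)

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
digit_model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# The model picks one of the ten classes for each image.
print("Predicted digits:", digit_model.predict(X_test[:5]))
print("Accuracy:", digit_model.score(X_test, y_test))
```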

Training a simple classifier

This example favors clarity over production scale. Read through the code, then study the notes below to understand why each step matters.

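```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a built-in binary dataset: 30 numeric features per tumor,
# labels 0 = malignant, 1 = benign.
data = load_breast_cancer()

# Hold out 20% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train on the training split only. max_iter is raised so the solver
# has enough iterations to converge on these unscaled features.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Predict labels for unseen data and compare them with the true labels.
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```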

Code walkthrough

  • The target is a class label (0 = malignant, 1 = benign) rather than a continuous number.
  • Logistic regression is a common baseline for classification tasks.
  • Many classification models, including logistic regression, can output either class labels (`predict`) or class probabilities (`predict_proba`).
  • Accuracy is a useful first measure, but later lessons will show why it is not always enough.
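To see the difference between labels and probabilities, this sketch rebuilds the lesson's model and compares `predict` with `predict_proba`. The 0.9 threshold is an arbitrary illustrative value, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

labels = model.predict(X_test)        # one class label per row
probs = model.predict_proba(X_test)   # one column per class; each row sums to 1
print("Column order of probs:", model.classes_)
print("Probability array shape:", probs.shape)

# A stricter, illustrative rule: call a case benign (class 1)
# only when the model gives it at least 90% probability.
strict_labels = (probs[:, 1] >= 0.9).astype(int)
print("Predicted benign (default rule):", labels.sum())
print("Predicted benign (0.9 threshold):", strict_labels.sum())
```

Raising the threshold trades fewer confident "benign" calls for more cases sent to the other class, which is the kind of trade-off probability scores make possible.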

Summary and key takeaways

  • Classification predicts categories, classes, or labels.
  • Binary classification has two classes; multiclass classification has more than two.
  • A simple baseline model is still valuable in classification tasks.
  • Model outputs can often be interpreted as either labels or probabilities.

Exercises

  • Name two binary classification tasks and two multiclass tasks.
  • What is the difference between a regression output and a classification output?
  • Why might a business care about probability scores instead of only class labels?
  • Try printing `model.predict_proba(X_test[:3])` and inspect the output.
