Lesson 8: Features and Labels

Learning objectives

Define features and labels clearly
Recognize how feature choice affects model quality
Practice identifying inputs and outputs in supervised tasks

Introduction

In supervised learning, the model learns a mapping from inputs to known outputs. The input variables are commonly called features, while the output you want the model to predict is called the label or target.
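To make the distinction concrete, here is a minimal Python sketch that splits labeled records into features and a label. The field names and values are hypothetical; in practice the rows would come from a file or database, and the label column would be whatever value you want the model to predict.

```python
# Hypothetical labeled training rows for a house price model.
training_rows = [
    {"square_feet": 1400, "rooms": 3, "age_years": 20, "price": 250_000},
    {"square_feet": 2100, "rooms": 4, "age_years": 5, "price": 410_000},
]

LABEL = "price"  # the target value the model should learn to predict


def split_features_label(row, label=LABEL):
    """Return (features, label) for one training example."""
    features = {name: value for name, value in row.items() if name != label}
    return features, row[label]


X, y = [], []  # conventional names: X for features, y for labels
for row in training_rows:
    features, target = split_features_label(row)
    X.append(features)
    y.append(target)

print(X[0])  # {'square_feet': 1400, 'rooms': 3, 'age_years': 20}
print(y)     # [250000, 410000]
```

The key point is that the label column is removed from the inputs: a model that could see the price while predicting the price would learn nothing useful.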

Good feature selection can make a model more accurate, more interpretable, and more efficient. Poor feature selection can introduce noise, misleading patterns, or hidden bias.

A strong beginner habit is to identify the features and label in any supervised learning problem before thinking about algorithms.

What counts as a feature

A feature is any measurable input the model uses to learn. In a house price model, features may include square footage, number of rooms, location, and age of the property. In a spam filter, features might include suspicious keywords, number of links, sender reputation, or unusual formatting.
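As an illustration of turning raw input into features, the sketch below converts the text of an email into three numerical spam features. The keyword list is a hypothetical placeholder, and sender reputation is left out because it would come from outside the message itself.

```python
import re

# Hypothetical keyword list; a real filter would use a larger, curated set.
SUSPICIOUS_KEYWORDS = {"winner", "free", "urgent", "prize"}


def extract_spam_features(email_text):
    """Turn raw email text into a few numerical features (illustrative only)."""
    words = re.findall(r"[a-z']+", email_text.lower())
    letters = [c for c in email_text if c.isalpha()]
    return {
        # How many words appear in the suspicious-keyword list
        "suspicious_keyword_count": sum(1 for w in words if w in SUSPICIOUS_KEYWORDS),
        # Rough link count: occurrences of "http://" or "https://"
        "link_count": len(re.findall(r"https?://", email_text)),
        # Share of letters that are uppercase, a simple "unusual formatting" signal
        "uppercase_ratio": (sum(c.isupper() for c in letters) / len(letters)) if letters else 0.0,
    }


features = extract_spam_features(
    "URGENT: You are a WINNER! Claim your free prize at http://example.com"
)
print(features["suspicious_keyword_count"])  # 4
print(features["link_count"])                # 1
```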

Features can be numerical, categorical, text-derived, image-derived, or engineered from raw data. In deep learning especially, models often learn useful internal features directly from raw inputs such as pixels or words, rather than relying only on hand-crafted ones.
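The difference between raw and engineered features, and how a categorical feature becomes numbers, can be sketched in a few lines. This assumes a hypothetical house record with a build year and a location drawn from a fixed, made-up set of categories; the location is one-hot encoded, meaning one 0/1 column per category.

```python
# Hypothetical raw house record, as it might arrive from a listings database.
raw_house = {"square_feet": 1400, "year_built": 2004, "location": "suburb"}

# Hypothetical fixed set of location categories seen in the training data.
LOCATIONS = ["city_center", "suburb", "rural"]


def engineer_features(raw, current_year):
    # Numerical raw feature, used as-is
    features = {"square_feet": raw["square_feet"]}
    # Engineered feature: property age derived from the raw build year
    features["age_years"] = current_year - raw["year_built"]
    # One-hot encode the categorical location into 0/1 indicator features
    for loc in LOCATIONS:
        features[f"location_{loc}"] = 1 if raw["location"] == loc else 0
    return features


print(engineer_features(raw_house, current_year=2024))
# {'square_feet': 1400, 'age_years': 20, 'location_city_center': 0, 'location_suburb': 1, 'location_rural': 0}
```

Passing `current_year` explicitly keeps the result reproducible; reading the system clock inside the function would make the same record produce different features on different days.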

What counts as a label

The label is the correct answer in the training data. In a classification task, it could be a category such as spam or not spam. In a regression task, it could be a numerical value such as price, temperature, or demand.

Labels are often expensive to obtain because they may require expert review, manual annotation, or waiting for real-world outcomes to occur.

Feature quality matters

Not all features are useful. Some may be irrelevant, redundant, or unstable. Others may cause data leakage: they reveal information that would not actually be available at prediction time, so the model looks accurate during testing but fails when used on real, new cases.
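Here is a small, hypothetical churn dataset that shows why leakage is dangerous. The `cancellation_reason` field is only filled in after a customer has already left, so a rule based on it looks perfect on historical data but cannot be used to predict future churn. The field names and the list of fields available at prediction time are assumptions made for illustration.

```python
# Hypothetical churn records. "cancellation_reason" is recorded only AFTER a
# customer leaves, so it will not exist when a prediction is needed.
records = [
    {"orders_last_90d": 5, "complaints": 0, "cancellation_reason": None, "churned": 0},
    {"orders_last_90d": 1, "complaints": 3, "cancellation_reason": "price", "churned": 1},
    {"orders_last_90d": 0, "complaints": 1, "cancellation_reason": "service", "churned": 1},
    {"orders_last_90d": 4, "complaints": 2, "cancellation_reason": None, "churned": 0},
]

# Hypothetical list of fields known to be available at prediction time.
AVAILABLE_AT_PREDICTION = {"orders_last_90d", "complaints"}


def leaky_rule_accuracy(rows):
    """Accuracy of the rule 'churned if a cancellation reason exists'."""
    correct = sum((r["cancellation_reason"] is not None) == bool(r["churned"]) for r in rows)
    return correct / len(rows)


def safe_features(row):
    """Keep only the fields that exist at prediction time."""
    return {k: v for k, v in row.items() if k in AVAILABLE_AT_PREDICTION}


print(leaky_rule_accuracy(records))  # 1.0, suspiciously perfect
print(safe_features(records[1]))     # {'orders_last_90d': 1, 'complaints': 3}
```

A near-perfect score from a single feature is a common warning sign worth investigating before trusting a model.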

In applied work, good features should be relevant, measurable, reliable, and available when the model is used.

Examples

Student performance prediction

Features might include attendance, assignment completion, quiz scores, and study hours. The label could be pass or fail.

Weather forecast model

Features may include humidity, pressure, wind speed, and past rainfall. The label could be tomorrow’s rainfall amount.

Online retail churn prediction

Features could include recent purchase frequency, average order value, support complaints, and time since last purchase. The label is whether the customer churns, that is, stops purchasing within a defined period.
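Several of these churn features are engineered from a raw purchase history. The sketch below assumes a hypothetical list of (date, order value) pairs and computes features only from purchases on or before a cutoff date `as_of`, so no information from the future leaks into the inputs. The 60-day window is an arbitrary illustrative choice.

```python
from datetime import date

# Hypothetical purchase history for one customer: (purchase date, order value).
purchases = [
    (date(2024, 1, 10), 40.0),
    (date(2024, 2, 3), 25.0),
    (date(2024, 3, 15), 55.0),
]


def churn_features(purchases, support_complaints, as_of, window_days=60):
    """Compute churn features using only purchases made on or before `as_of`."""
    past = [(d, v) for d, v in purchases if d <= as_of]
    recent = [(d, v) for d, v in past if (as_of - d).days <= window_days]
    return {
        # Recent purchase frequency: purchases within the last `window_days`
        "recent_purchase_count": len(recent),
        # Average order value over all past purchases
        "average_order_value": sum(v for _, v in past) / len(past) if past else 0.0,
        "support_complaints": support_complaints,
        # Time since last purchase, in days (None if no purchases yet)
        "days_since_last_purchase": (as_of - max(d for d, _ in past)).days if past else None,
    }


print(churn_features(purchases, support_complaints=1, as_of=date(2024, 4, 1)))
# {'recent_purchase_count': 2, 'average_order_value': 40.0, 'support_complaints': 1, 'days_since_last_purchase': 17}
```

The label for this example would then be determined from what the customer does after `as_of`, keeping a clean separation between inputs and the outcome being predicted.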

Exercises

Identify the features and label in a credit approval system.
Create your own supervised learning example with at least four features and one label.
Why can a feature that looks useful still be risky?
Explain the difference between raw features and engineered features.
Give one example of label collection being expensive or difficult.

Key takeaway

A supervised learning problem becomes much clearer when you identify exactly what information goes in as features and what result comes out as the label.