Lesson 11: Reinforcement Learning

Learning objectives
Understand the agent-environment-reward framework
Recognize where reinforcement learning is useful
Differentiate reinforcement learning from supervised learning

Introduction

Reinforcement learning is a form of machine learning in which an agent interacts with an environment and learns by receiving rewards or penalties. Rather than being shown the correct answer directly, the agent tries actions and gradually improves its strategy.

This approach is useful in settings where decisions unfold over time, where each action affects future opportunities, and where success can be measured through rewards. Examples include robotics, game playing, recommendation policies, and resource optimization.

Reinforcement learning is conceptually exciting because it resembles learning through experience. However, it is also challenging because trial-and-error learning can require many interactions and careful design of the reward signal.

Core ideas: agent, action, state, reward

The agent is the learner or decision-maker. The environment is the world the agent interacts with. At each step, the agent observes a state, chooses an action, and receives a reward signal. Over time, it tries to choose actions that maximize long-term reward.
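The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: the CorridorEnv class, its reset and step methods, the reward of 1.0 at the goal, and the two policies are all hypothetical names and values chosen for this example, not part of any particular library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the position is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: only reaching the goal pays off
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode; return the total reward and the number of steps taken."""
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


rng = random.Random(0)


def random_policy(state):
    # An agent that has learned nothing yet: it picks a direction at random.
    return rng.choice([-1, 1])


def always_right(state):
    # A hand-written policy that happens to be optimal in this corridor.
    return 1


print("always right:", run_episode(CorridorEnv(), always_right))
print("random      :", run_episode(CorridorEnv(), random_policy))
```

Both policies use the same observe-act-reward loop. The only difference is how the action is chosen, and that choice is the part a reinforcement learning method would try to improve.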

A crucial idea is that not every good action gives an immediate reward. Some choices pay off only later, so the agent must learn to judge an action by the sequence of rewards that follows it, not just by the reward it receives right away.
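One common way to compare reward sequences is a discounted sum, in which later rewards count a little less than earlier ones. The sketch below assumes a discount factor gamma of 0.9 and two made-up reward sequences. The lesson itself does not specify either, so treat them as illustrative values only.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards so that a reward received k steps later is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences for two choices made at the same starting point.
quick_payoff = [1.0, 0.0, 0.0, 0.0]    # small reward right away
delayed_payoff = [0.0, 0.0, 0.0, 5.0]  # larger reward three steps later

print("quick  :", discounted_return(quick_payoff))
print("delayed:", discounted_return(delayed_payoff))
```

Under these assumed numbers the delayed sequence has the higher discounted return (about 3.645 versus 1.0), even though its first three rewards are zero. An agent that looked only at the immediate reward would pick the worse option.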

Why trial and error is powerful

Reinforcement learning allows systems to discover strategies that humans might not have programmed directly. In complex games or control tasks, this can lead to surprisingly effective behavior.

However, trial and error can also be expensive or risky in the real world. Training a robot by letting it fail thousands of times may be slow or unsafe, so simulations are often used.
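To make trial and error in a simulation concrete, here is a minimal sketch using tabular Q-learning. The lesson does not name a specific algorithm, so Q-learning is used here only as one well-known example. The corridor environment, the reward of 1.0 at the goal, and the values of alpha, gamma, and epsilon are all assumptions made for illustration.

```python
import random

ACTIONS = (-1, 1)  # move left, move right
GOAL = 4           # hypothetical corridor: positions 0..4, reward only at position 4


def simulate_step(state, action):
    """Simulated environment: cheap to run, and failures cost nothing."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    # Explore with probability epsilon; otherwise exploit, breaking ties at random.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=200):
    rng = random.Random(seed)
    # q[state][action] estimates the long-term reward of taking action in state.
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = simulate_step(state, action)
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action.
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print("greedy action per position:", greedy_policy)
```

After training, the greedy action at every non-goal position should be to move right. The agent made many wasted moves along the way, which is harmless in a simulation but could be slow or unsafe on physical hardware.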

Comparison with other learning styles

Unlike supervised learning, reinforcement learning does not rely on labeled input-output pairs. Unlike unsupervised learning, it is guided by a reward objective. It is especially suited to sequential decision problems.

This makes it a natural fit for problems that call for choices adapting over time rather than for one-off predictions.

Examples

Game-playing AI

An agent learns which moves lead to higher chances of winning by receiving positive rewards for successful game outcomes.

Warehouse robotics

A robot learns efficient routes for picking items while avoiding collisions and delays in a simulated warehouse.

Adaptive recommendations

A content platform tests which recommendation strategy keeps users engaged longer while balancing relevance and diversity.
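A heavily simplified version of this idea is an epsilon-greedy test over a few strategies. In the sketch below, the strategy names and their engagement probabilities are invented for the simulation and are not measured data. The sketch also ignores the relevance-versus-diversity trade-off and treats each round as independent, which a real sequential system would not.

```python
import random

# Hypothetical probability that a user stays engaged under each strategy.
# These numbers are invented so the simulation has something to discover.
TRUE_ENGAGEMENT = {"strategy_a": 0.2, "strategy_b": 0.5, "strategy_c": 0.8}


def run_bandit(rounds=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    names = list(TRUE_ENGAGEMENT)
    counts = {name: 0 for name in names}
    estimates = {name: 0.0 for name in names}
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(names)                          # explore: try any strategy
        else:
            choice = max(names, key=lambda n: estimates[n])     # exploit: best estimate so far
        # Simulated user response: reward 1.0 if the user stayed engaged, else 0.0.
        reward = 1.0 if rng.random() < TRUE_ENGAGEMENT[choice] else 0.0
        counts[choice] += 1
        # Incremental average of the rewards observed for this strategy.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return counts, estimates


counts, estimates = run_bandit()
best = max(estimates, key=estimates.get)
print("times each strategy was shown:", counts)
print("strategy with best estimate  :", best)
```

The platform never sees the true probabilities. It only observes rewards for the strategies it actually tries, which is why some exploration is needed before it can settle on the best one.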

Exercises

Define agent, environment, state, action, and reward.
Why is reinforcement learning useful for sequential decisions?
Give one example where simulation would be safer than real-world trial and error.
How is reinforcement learning different from supervised learning?
Describe a simple game that could be used to teach reinforcement learning concepts.

Key takeaway

Reinforcement learning is about learning good decisions through interaction and reward, especially in problems where actions affect future outcomes.