Embeddings and Semantic Similarity
Understand embeddings as vector representations of meaning and learn why they power search, clustering, recommendations, and RAG.
Explanation
Embeddings turn text into numeric vectors so that semantically similar items land close together in vector space.
They are useful when exact keyword matching is not enough.
Semantic similarity enables better search over concepts, not just literal word overlap.
Why this topic matters in practice
In generative AI products, the model is only one part of the system. The surrounding workflow determines whether the output is useful, safe, and maintainable. This lesson matters because embeddings are the retrieval layer behind tasks such as tutoring, search, copilots, business assistants, and production automation.
Examples
Search
A query about tuition fees can match a document that uses the phrase 'student charges' even without exact wording.
Recommendations
A tutorial system can recommend lessons that are conceptually related to the one a learner just completed.
Deduplication
Similar product descriptions can be flagged for review based on embedding distance.
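As a sketch of the deduplication idea: pairs of items whose embeddings exceed a similarity threshold get flagged. The vectors and threshold below are illustrative assumptions, not output from a real embedding model.

```python
import math

def cosine_similarity(a, b):
    # Dot product normalized by both magnitudes, giving a score in [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Hand-written stand-ins for embeddings of three product descriptions.
descriptions = {
    "Wireless mouse, 2.4 GHz": [0.80, 0.10, 0.10],
    "2.4 GHz cordless mouse":  [0.78, 0.12, 0.11],
    "USB-C charging cable":    [0.10, 0.90, 0.20],
}

THRESHOLD = 0.98  # an assumed cutoff; tune it on your own data

items = list(descriptions.items())
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        (text_a, vec_a), (text_b, vec_b) = items[i], items[j]
        if cosine_similarity(vec_a, vec_b) > THRESHOLD:
            print("Possible duplicate:", text_a, "<->", text_b)
```

In this toy data only the two mouse descriptions cross the threshold, which mirrors how near-duplicate wording tends to produce near-identical embeddings.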
Cosine similarity with simple vectors
The code below is intentionally concise so the underlying pattern stays clear. It focuses on the application logic you can reuse, even if you later switch model providers or deployment environments.
import math

def cosine_similarity(a, b):
    # Dot product measures how strongly the vectors point the same way;
    # dividing by both magnitudes normalizes the score to [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

vector_a = [0.2, 0.9, 0.1]
vector_b = [0.1, 0.85, 0.15]
print("Similarity:", round(cosine_similarity(vector_a, vector_b), 4))
How the coding section works
- Real embeddings come from a model; the hand-written vectors here stand in for them, but the comparison step is the same cosine similarity.
- High similarity suggests the texts mean related things.
- Embedding-based systems often store vectors in a vector database for fast retrieval.
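The retrieval pattern in the last bullet can be sketched with an in-memory dictionary standing in for a vector database. The document embeddings and query vector below are illustrative assumptions; in production both would come from the same embedding model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Toy "vector store": document text -> embedding (made-up values).
store = {
    "Tuition fees and student charges":  [0.9, 0.2, 0.1],
    "Campus parking rules":              [0.1, 0.8, 0.3],
    "Scholarship application deadlines": [0.7, 0.3, 0.2],
}

def top_k(query_vector, k=2):
    # Rank every stored document by similarity to the query, highest first.
    ranked = sorted(store.items(),
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

# A query like "how much does tuition cost?" would embed near the first document.
print(top_k([0.85, 0.25, 0.15]))
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, but the interface, query vector in, ranked documents out, is the same.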
Implementation advice
When turning this lesson into a real feature, think beyond the code snippet itself. Decide what inputs should be allowed, how you will validate outputs, how you will recover from errors, and how you will measure whether the feature is actually helping users. Those surrounding choices often determine whether an AI feature feels polished or unreliable.
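As one concrete example of that advice, a minimal sketch of input validation around the similarity computation: the plain version divides by zero on an all-zero vector and silently truncates mismatched dimensions, so guarding both is a cheap reliability win. The function name and error messages here are hypothetical choices.

```python
import math

def safe_cosine_similarity(a, b):
    # Validate inputs before computing: mismatched dimensions or
    # zero-magnitude vectors would otherwise give wrong answers or crash.
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    if mag_a == 0 or mag_b == 0:
        raise ValueError("Zero-magnitude vector has no direction")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (mag_a * mag_b)
```

Failing loudly at this boundary makes upstream bugs (an embedding call that returned an empty or zeroed vector) visible immediately instead of surfacing as strange rankings later.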
Summary / key takeaways
- Embeddings capture meaning in vector form.
- Semantic retrieval finds related content beyond literal keyword overlap.
- Embeddings are foundational for modern search and RAG workflows.
Exercises
- Explain semantic similarity in simple terms.
- Where would embeddings be useful on an educational website?
- Change one vector in the code and observe how the similarity score changes.