Tokens, Context Windows, and Prompts
Understand how text is split into tokens, why context windows matter, and how prompt size affects quality and cost.
Explanation
Models do not see raw sentences the way humans do; they process text as a sequence of chunks called tokens.
A context window is the amount of prompt and response text a model can consider at one time.
Good prompts fit the available context and prioritize the most useful instructions and evidence.
Why this topic matters in practice
In generative AI products, the model is only one part of the system; the surrounding workflow determines whether the output is useful, safe, and maintainable. Understanding tokens and context windows helps you apply these ideas to real tasks such as tutoring, search, copilots, business assistants, and production automation.
Examples
Long documents
If a document exceeds the context window, you may need chunking, summarization, or retrieval.
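As a sketch of the chunking approach, the helper below splits a long document on paragraph boundaries so each chunk stays under a token budget. The 500-token budget and the 4-characters-per-token heuristic are illustrative assumptions, not provider guarantees.

```python
def rough_token_estimate(text: str) -> int:
    # Roughly 4 characters per token is a common rule of thumb for English prose
    return max(1, len(text) // 4)

def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text on paragraph boundaries so each chunk fits the budget."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = (current + "\n\n" + paragraph) if current else paragraph
        if rough_token_estimate(candidate) > max_tokens and current:
            chunks.append(current)
            current = paragraph  # a single oversize paragraph still becomes its own chunk
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# A synthetic "long document" of five large paragraphs
document = "\n\n".join(f"Paragraph {i}: " + "details " * 200 for i in range(5))
chunks = chunk_text(document, max_tokens=500)
print(len(chunks), "chunks:", [rough_token_estimate(c) for c in chunks])
```

Each chunk can then be summarized independently, or indexed for retrieval so only the most relevant chunks enter the prompt.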
Instruction priority
A short, clear system instruction often works better than a long, repetitive prompt.
Conversation history
When chat history grows too long, older messages may need summarization or trimming.
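One simple trimming strategy keeps the system message plus the most recent turns that fit a token budget, dropping older turns (a fuller version would summarize them instead). The message format and the 1,000-token budget are illustrative assumptions.

```python
def rough_token_estimate(text: str) -> int:
    # Roughly 4 characters per token for English prose
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], max_tokens: int = 1000) -> list[dict]:
    """Keep the system message plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(rough_token_estimate(m["content"]) for m in system)
    kept = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = rough_token_estimate(message["content"])
        if cost > budget:
            break  # older messages beyond this point are dropped
        budget -= cost
        kept.append(message)
    return system + list(reversed(kept))

# Build an overly long tutoring conversation
history = [{"role": "system", "content": "You are a tutor."}]
for i in range(20):
    history.append({"role": "user", "content": f"Question {i}: " + "word " * 100})
    history.append({"role": "assistant", "content": f"Answer {i}: " + "word " * 100})

trimmed = trim_history(history, max_tokens=1000)
print(len(history), "messages ->", len(trimmed), "messages")
```

Note that the system instruction is always preserved; it is usually the highest-priority text in the window.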
Estimate token-heavy prompts with simple heuristics
The code below is intentionally concise so the underlying pattern stays clear. It focuses on the application logic you can reuse, even if you later switch model providers or deployment environments.
def rough_token_estimate(text: str) -> int:
    # Roughly 4 characters per token is a common rule of thumb for English prose
    return max(1, len(text) // 4)

prompt = """You are a tutor.
Explain machine learning to a beginner.
Use 5 bullet points and one real-world example.
"""

estimated_tokens = rough_token_estimate(prompt)
print("Estimated tokens:", estimated_tokens)

How the coding section works
- This rough estimate is only a planning tool, not a replacement for a provider tokenizer.
- Token awareness helps you control prompt size, cost, and context usage.
- In production, use the tokenizer for the exact model you deploy.
Implementation advice
When turning this lesson into a real feature, think beyond the code snippet itself. Decide what inputs should be allowed, how you will validate outputs, how you will recover from errors, and how you will measure whether the feature is actually helping users. Those surrounding choices often determine whether an AI feature feels polished or unreliable.
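As one concrete instance of that advice, a pre-flight check can validate prompt size before any model call and fail loudly instead of silently truncating. The 8,000-token limit and 1,000-token response reserve below are assumed example values; real limits depend on the model you deploy.

```python
def rough_token_estimate(text: str) -> int:
    # Roughly 4 characters per token for English prose
    return max(1, len(text) // 4)

CONTEXT_LIMIT = 8000      # assumed example limit; check your model's documentation
RESPONSE_RESERVE = 1000   # tokens kept free for the model's answer

def validate_prompt(prompt: str) -> str:
    """Raise early if a prompt risks overflowing the context window."""
    estimated = rough_token_estimate(prompt)
    budget = CONTEXT_LIMIT - RESPONSE_RESERVE
    if estimated > budget:
        raise ValueError(
            f"Prompt estimated at {estimated} tokens exceeds the {budget}-token input budget."
        )
    return prompt

validate_prompt("Explain tokens in two sentences.")  # fits comfortably
try:
    validate_prompt("x" * 40000)  # roughly 10,000 tokens, over budget
except ValueError as exc:
    print("Rejected:", exc)
```

Rejecting oversized input at the boundary is easier to debug and measure than letting the provider truncate the prompt unpredictably.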
Summary / key takeaways
- Tokens are the units models actually process.
- Context windows limit how much information the model can use at once.
- Prompt design must balance clarity, completeness, and length.
Exercises
- Rewrite a long instruction into a shorter prompt without losing meaning.
- Why might an overloaded context window reduce answer quality?
- Estimate the token length of three prompts you might use on your site.