A token is the smallest unit of text an LLM processes. A token is not a word: a word is often split into multiple tokens, and sometimes a token is only part of a word. As a rough rule of thumb, 100 tokens correspond to about 75 English words, so a 1,000-token context holds roughly 750 words. Every model has a context window, the maximum number of tokens it can process at once. GPT-4 Turbo offers a 128k-token window, which is roughly 96,000 words; Llama 2 has 4k tokens. This context window limit matters.
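The words-to-tokens conversion above is just arithmetic on the rule of thumb. A minimal sketch, assuming the approximate 0.75 words-per-token ratio (real counts depend entirely on the tokenizer and the text):

```python
# Rough rule-of-thumb conversion between tokens and English words.
# WORDS_PER_TOKEN = 0.75 is an approximation, not a tokenizer fact.
WORDS_PER_TOKEN = 0.75

def words_to_tokens(word_count: int) -> int:
    """Estimate how many tokens a given word count will occupy."""
    return round(word_count / WORDS_PER_TOKEN)

def tokens_to_words(token_count: int) -> int:
    """Estimate how many words fit in a given token budget."""
    return round(token_count * WORDS_PER_TOKEN)

print(tokens_to_words(1000))     # ~750 words
print(tokens_to_words(128_000))  # ~96,000 words
```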
It determines how much text you can give the model at once. If you're using retrieval-augmented generation to ground responses in documents, the context window caps how much retrieved text you can include. If you're building a long-form reasoning system, the context window is your binding constraint. A small window means long documents must be truncated or chunked before the model can see them.
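The retrieval constraint above often shows up in practice as a budgeting step: deciding which ranked documents fit in the window alongside the prompt and the expected response. A minimal sketch, where `estimate_tokens` is a crude word-count stand-in for a real tokenizer and the `reserved` parameter is a hypothetical allowance for prompt and response:

```python
def estimate_tokens(text: str) -> int:
    # Crude estimate: ~0.75 words per token, so tokens ~= words / 0.75.
    return round(len(text.split()) / 0.75)

def pack_documents(docs: list[str], context_window: int,
                   reserved: int = 1000) -> list[str]:
    """Greedily include documents (in ranked order) until the window,
    minus tokens reserved for the prompt and response, is used up."""
    budget = context_window - reserved
    selected, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # next document would overflow the context window
        selected.append(doc)
        used += cost
    return selected
```

With a 4k window like Llama 2's, only a couple of medium-sized documents fit once room for the prompt and answer is reserved.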
Larger context windows let you feed entire codebases into the model. Token counting itself isn't straightforward: different tokenizers produce different token counts for the same text, and the tokenizer affects everything from cost to context budgeting. A common word like "the" is typically a single token, while an uncommon word might be split into three or four. Prompt engineering sometimes means rephrasing a prompt simply to reduce its token count.
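The tokenizer-dependence point can be made concrete with two toy tokenizers. Neither resembles a production BPE tokenizer; they only demonstrate that the same text yields very different counts under different schemes:

```python
# Toy illustration: the same text, two tokenizers, two token counts.
def whitespace_tokenize(text: str) -> list[str]:
    # One token per whitespace-separated word.
    return text.split()

def char_pair_tokenize(text: str) -> list[str]:
    # Naive sub-word scheme: split every word into 2-character chunks,
    # so long or uncommon words cost several tokens each.
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return tokens

text = "the antidisestablishmentarianism"
print(len(whitespace_tokenize(text)))  # 2 tokens
print(len(char_pair_tokenize(text)))   # 16 tokens
```

A real BPE vocabulary would keep "the" as one token but split the rare word into several learned sub-word pieces, which is why uncommon words are more expensive.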
LLM API calls are billed by the token, so understanding token economics changes how you structure your prompts and systems.
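Per-token billing makes cost a simple function of input and output token counts. A back-of-the-envelope sketch, where the per-token prices are hypothetical placeholders rather than any provider's actual pricing:

```python
# Hypothetical prices, for illustration only -- check your provider's
# real price sheet. Output tokens are often priced higher than input.
PRICE_PER_1K_INPUT = 0.01   # $ per 1,000 input (prompt) tokens
PRICE_PER_1K_OUTPUT = 0.03  # $ per 1,000 output (completion) tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the placeholder rates above."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A 4,000-token prompt with a 500-token reply:
print(f"${call_cost(4000, 500):.3f}")  # $0.055
```

At rates like these, trimming a long system prompt or retrieved context directly shrinks the per-call bill, which is why token-conscious prompt design pays off at scale.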