A Large Language Model is a neural network trained on massive text datasets to predict and generate human-like text. Examples include GPT-4, Claude, and Gemini. The key insight is that scale determines capability. An LLM with billions of parameters can perform tasks that smaller models cannot.
It can reason through multi-step problems, write code, analyze documents, and engage in complex dialogue. The training process is simple to state: show the model a sequence of words and have it predict the next one. Repeat this billions of times across text drawn from much of the internet, and something emerges.
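The next-word objective can be illustrated with a toy sketch. The counting "model" below is hypothetical and nothing like a real LLM (which is a neural network trained by gradient descent over tokens, not words), but the objective is the same: given the preceding context, predict what comes next.

```python
# Toy next-word predictor: count which word follows which in a tiny
# corpus, then predict the most frequent continuation. A hypothetical
# illustration of the training objective, not a real LLM.
from collections import defaultdict, Counter

def train_bigram(corpus: str) -> dict:
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model: dict, word: str) -> str:
    # Return the continuation seen most often after `word` in training.
    return model[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once -> cat
```

A real model replaces the count table with billions of learned parameters and conditions on the full preceding context rather than a single word, which is what lets longer-range patterns emerge.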
The model develops an internal understanding of language, logic, and concepts. It learns that certain word sequences correlate with other sequences, and it absorbs patterns in how humans think and write. This emergent behavior, where capability arises from scale without explicit programming, is what makes LLMs different from previous AI systems. LLMs don't follow hand-written rules.
They generate text token by token, sampling each next token from a learned probability distribution over continuations. This is why they can hallucinate plausible-sounding falsehoods: a fluent, statistically likely continuation is not necessarily a true one. They're pattern-matching machines, not knowledge databases.
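Token-by-token generation can be sketched as follows. The model emits a score (logit) for every token in its vocabulary; a softmax turns those scores into probabilities, and the next token is sampled from that distribution. The vocabulary and logits below are made-up stand-ins for a real model's output.

```python
# Sketch of one generation step: logits -> softmax -> sample.
# The three-token vocabulary and the logit values are hypothetical.
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["Paris", "London", "banana"]
logits = [4.0, 2.0, -1.0]  # imagined scores after "The capital of France is"
probs = softmax(logits)

random.seed(0)
choice = random.choices(vocab, weights=probs)[0]
print([round(p, 3) for p in probs], "->", choice)
```

Because the step is a weighted draw rather than a lookup, even a low-probability token like "banana" can occasionally be emitted, which is one mechanism behind plausible-sounding but false output.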