Zero-shot learning is an AI system's ability to perform tasks it was never explicitly trained on, guided only by natural language instructions. It is an emergent capability of large models: GPT-4 can translate languages it rarely saw in its training data, write essays on topics absent from its training set, and produce code in programming languages that appear only sparsely in training.
Why? Because the model has learned generalizable patterns about how language, logic, and structure work. When you ask it to do something new, it applies these meta-patterns to the novel problem. Zero-shot means no examples: you just ask. Few-shot learning, which adds a handful of worked examples to the prompt, usually improves performance further; zero-shot is surprisingly capable but less reliable. These capabilities emerge from scale.
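The difference between zero-shot and few-shot is entirely in how the prompt is built. A minimal sketch, using a hypothetical sentiment-classification task (the task, reviews, and labels here are illustrative, not from any specific benchmark):

```python
# Sketch: constructing zero-shot vs. few-shot prompts for the same task.
# Either string would be sent as the user message to a chat-style LLM.

TASK = "Classify the sentiment of this review as positive or negative."

def zero_shot_prompt(review: str) -> str:
    # Zero-shot: the instruction alone, no worked examples.
    return f"{TASK}\n\nReview: {review}\nSentiment:"

def few_shot_prompt(review: str, examples: list[tuple[str, str]]) -> str:
    # Few-shot: the same instruction, preceded by labeled examples
    # that demonstrate the input/output format.
    demos = "\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in examples
    )
    return f"{TASK}\n\n{demos}\n\nReview: {review}\nSentiment:"

examples = [
    ("Great battery life and a sharp screen.", "positive"),
    ("Stopped working after two days.", "negative"),
]

print(zero_shot_prompt("Fast shipping, works as advertised."))
print(few_shot_prompt("Fast shipping, works as advertised.", examples))
```

The few-shot version gives the model concrete demonstrations of the expected format and label set, which is why it tends to be more reliable; the zero-shot version relies entirely on the model's general instruction-following ability.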
Smaller models handle zero-shot tasks poorly; larger models handle them well. This suggests that scale itself teaches the model something fundamental about reasoning and transfer. You don't need to fine-tune a new model for every task: write a good prompt, and the general model adapts.