Zero-shot learning is when an AI system performs tasks it was never explicitly trained on, using only natural language instructions. This is an emergent capability of large models. GPT-4 can translate between language pairs it rarely saw in its training data. It can write essays about topics absent from its training set. It can often adapt its coding ability to languages and libraries that appeared only sparsely in training.
This shouldn't work, but it does. The reason is that the model has learned generalizable patterns about how language, logic, and structure work. When you ask it to do something new, it applies these meta-patterns to the novel problem. This is different from few-shot learning, where you show the model a handful of examples before posing the task. Zero-shot means no examples. Just ask.
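The contrast between the two prompting styles can be made concrete. Here is a minimal sketch in Python that builds a zero-shot prompt (instruction only) and a few-shot prompt (the same instruction plus worked examples); the sentiment task and the example strings are illustrative, not from the text above.

```python
def zero_shot_prompt(task: str, text: str) -> str:
    """Zero-shot: just the instruction and the input, no examples."""
    return f"{task}\n\nInput: {text}\nOutput:"


def few_shot_prompt(task: str, examples: list[tuple[str, str]], text: str) -> str:
    """Few-shot: the same instruction, preceded by worked input/output pairs."""
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{task}\n\n{shots}\n\nInput: {text}\nOutput:"


task = "Classify the sentiment of the input as positive or negative."

# Zero-shot: the model must infer the format from the instruction alone.
print(zero_shot_prompt(task, "I loved this film."))

# Few-shot: two demonstrations make the expected output format explicit.
print(few_shot_prompt(
    task,
    [("Great service!", "positive"), ("Terrible food.", "negative")],
    "I loved this film.",
))
```

Either string would be sent to the model as-is; the only difference is whether demonstrations precede the query.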
Few-shot learning typically improves performance, especially on tasks where the expected output format is ambiguous. Zero-shot is surprisingly capable but less reliable. These capabilities emerge from scale: smaller models can't do zero-shot learning well, while larger models can. This suggests that scale itself is teaching the model something fundamental about reasoning and transfer. Zero-shot capability is one reason LLMs are so versatile.
You don't need to fine-tune a new model for every task. You just write a good prompt and the general model adapts.
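The "one model, many prompts" workflow can be sketched as a small dispatcher that swaps prompt templates per task. Everything here is a hypothetical illustration: `call_model` is a placeholder for a real LLM API call, and the templates are made up for the example.

```python
# Task-specific prompt templates: the only thing that changes per task.
PROMPTS = {
    "summarize": "Summarize the following text in one sentence:\n{text}",
    "translate": "Translate the following text into French:\n{text}",
    "classify": "Label the following text as spam or not spam:\n{text}",
}


def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an
    # LLM endpoint and return the completion.
    return f"<model response to {len(prompt)} chars of prompt>"


def run_task(task: str, text: str) -> str:
    """Run any supported task with the same general model,
    selecting behavior purely through the prompt."""
    prompt = PROMPTS[task].format(text=text)
    return call_model(prompt)


print(run_task("summarize", "Zero-shot learning lets one model handle many tasks."))
print(run_task("classify", "You have won a free cruise, click here!"))
```

Adding a new task means adding one template string, not training a new model; that is the practical payoff of zero-shot generality.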