
Top-p Sampling

Top-p sampling, also called nucleus sampling, dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p, adapting the candidate pool size based on the model's confidence distribution. After computing probabilities from logits, top-p sorts tokens by probability and includes tokens from highest to lowest until their cumulative probability reaches p.
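The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; the function and parameter names are made up for the example.

```python
import math

def top_p_filter(logits, p=0.9):
    """Return the token ids in the nucleus and their renormalized probabilities."""
    # Softmax: convert logits to probabilities (subtract the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token ids by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Include tokens until the cumulative probability reaches p (always keep one).
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize the kept probabilities so they sum to 1 before sampling.
    mass = sum(probs[i] for i in nucleus)
    return nucleus, [probs[i] / mass for i in nucleus]
```

Sampling then draws from the returned ids weighted by the renormalized probabilities, rather than from the full vocabulary.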

If a single token carries probability above the threshold (say 0.9 with p = 0.9), only that token is considered. If probability is spread across many tokens, more are included to reach the threshold. This adaptive behavior is the key advantage over top-k: the candidate set expands when the model is uncertain and contracts when it's confident. A value of p = 0.9 typically works well across diverse tasks: it focuses on the probability mass that matters while allowing occasional surprises.
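The adaptive pool size is easy to see with a small self-contained example. The two distributions below are invented for illustration; the helper just counts how many tokens the nucleus would contain.

```python
def nucleus_size(probs, p=0.9):
    """Count how many tokens the nucleus contains for a given distribution.

    Illustrative helper, not from any library.
    """
    count, cumulative = 0, 0.0
    # Walk tokens from most to least probable until mass p is covered.
    for q in sorted(probs, reverse=True):
        count += 1
        cumulative += q
        if cumulative >= p:
            break
    return count

# Confident model: one token dominates, so the nucleus is a single token.
print(nucleus_size([0.92, 0.03, 0.02, 0.02, 0.01]))  # 1
# Uncertain model: mass is spread evenly, so many tokens are needed.
print(nucleus_size([1 / 8] * 8))                     # 8
```

A fixed top-k would keep the same number of candidates in both cases; top-p keeps one token when the model is confident and eight when it is not.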

Lower values of p make output more focused and predictable, while higher values (0.95+) allow more creativity and variation. Top-p was introduced in the 2019 paper 'The Curious Case of Neural Text Degeneration' (Holtzman et al.), which showed it produces more human-like text than top-k or pure temperature sampling. Most production systems combine top-p with temperature: temperature shapes the distribution, then top-p truncates to the nucleus of likely tokens.

This combination provides reliable generation across diverse prompts.
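The temperature-then-top-p pipeline can be sketched end to end as follows. This is a minimal, self-contained illustration of the ordering described above; names are made up for the example and do not come from any specific library.

```python
import math
import random

def sample_with_temperature_and_top_p(logits, temperature=0.8, p=0.9, rng=random):
    # Step 1: temperature shapes the distribution by scaling the logits.
    scaled = [x / temperature for x in logits]
    # Softmax on the scaled logits (subtract the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Step 2: top-p truncates to the nucleus of most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Step 3: sample from the renormalized nucleus.
    mass = sum(probs[i] for i in nucleus)
    weights = [probs[i] / mass for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

Note the order of operations: temperature is applied to the logits first, and top-p filtering operates on the resulting probabilities, matching the combination described above.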