
Prefix Tuning


Prefix tuning is a parameter-efficient fine-tuning method that adapts a frozen language model by learning a small set of continuous vectors prepended to the hidden states at every layer, steering model behavior without updating any of the original model weights.

Instead of fine-tuning billions of parameters, prefix tuning trains only the prefix vectors, typically tens to a few hundred per layer, totaling well under 1% of the model's parameters. These learned prefix vectors act as soft prompts that condition all subsequent computation.
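As a concrete sketch of that parameter split, here is a minimal PyTorch example; the sizes are illustrative (roughly GPT-2-small-shaped), and the encoder below merely stands in for a pretrained model that would normally be loaded from a checkpoint:

```python
import torch
import torch.nn as nn

hidden_dim, num_layers, prefix_len = 768, 12, 10

# Stand-in for a pretrained transformer; in practice this would be loaded
# from a checkpoint rather than randomly initialized.
base_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=12, batch_first=True),
    num_layers=num_layers,
)
for p in base_model.parameters():
    p.requires_grad = False  # freeze every original weight

# One trainable prefix of `prefix_len` vectors per layer.
prefix = nn.Parameter(torch.randn(num_layers, prefix_len, hidden_dim) * 0.02)

frozen = sum(p.numel() for p in base_model.parameters())
trainable = prefix.numel()
print(f"frozen: {frozen:,}  trainable: {trainable:,}  "
      f"({100 * trainable / (frozen + trainable):.3f}% of the total)")
```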

The model's attention mechanism attends to the prefix positions ahead of the actual input, so every token receives task-specific context that guides its processing. During training, gradients flow back through the frozen model but update only the prefix parameters.
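To make the mechanism concrete, here is a minimal sketch of one attention layer with a prefix, following the formulation in which the prefix directly parameterizes extra key/value states; the names, shapes, and single-head simplification are illustrative, not a specific library's API:

```python
import torch
import torch.nn.functional as F

def prefix_attention(x, prefix_k, prefix_v, W_q, W_k, W_v):
    # x: (batch, seq, dim) hidden states; prefix_k, prefix_v: (prefix_len, dim)
    # learned key/value states; W_*: frozen (dim, dim) projection weights.
    q = x @ W_q                                                  # queries come from real tokens only
    B = x.size(0)
    k = torch.cat([prefix_k.expand(B, -1, -1), x @ W_k], dim=1) # prefix keys sit first
    v = torch.cat([prefix_v.expand(B, -1, -1), x @ W_v], dim=1)
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5         # (batch, seq, prefix_len + seq)
    return F.softmax(scores, dim=-1) @ v                         # (batch, seq, dim)
```

Because W_q, W_k, and W_v belong to the frozen model (requires_grad=False), backpropagation through this function yields gradients only for prefix_k and prefix_v.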

Training only the prefix greatly reduces memory requirements (no optimizer states are kept for the frozen weights) and enables multi-task learning (different prefixes for different tasks, same base model). Prefix tuning was introduced by Li and Liang in 2021 as an alternative to full fine-tuning and prompt engineering. It sits between them in controllability: more flexible than discrete prompts, more efficient than full fine-tuning.
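A hedged sketch of what that multi-task reuse can look like; the task names and the per-layer key/value layout are assumptions for illustration:

```python
import torch
import torch.nn as nn

num_layers, prefix_len, dim = 12, 10, 768

def make_prefix() -> nn.Parameter:
    # Learned key and value states for every layer: (layers, 2, prefix_len, dim).
    return nn.Parameter(torch.randn(num_layers, 2, prefix_len, dim) * 0.02)

# One small prefix per task; the frozen base model is loaded exactly once.
task_prefixes = {task: make_prefix() for task in ("summarize", "translate", "qa")}

per_task = task_prefixes["summarize"].numel()
print(f"{per_task:,} parameters swapped per task; the base model never changes")
```

Switching tasks means swapping a tensor of a few hundred thousand values instead of reloading billions of weights.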

Related methods include prompt tuning (prefixes only at the input layer), adapter layers (small trainable modules inserted throughout), and LoRA (low-rank weight modifications). Prefix tuning works because transformers' attention mechanisms are highly sensitive to context, and learned context can be as effective as learned weights.
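For contrast with the first of those methods, here is a minimal sketch of prompt tuning, where the only learned vectors sit at the input embedding layer; shapes and names are illustrative:

```python
import torch
import torch.nn as nn

prompt_len, dim = 10, 768
soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

def with_soft_prompt(token_embeddings: torch.Tensor) -> torch.Tensor:
    # Prompt tuning prepends learned embeddings once, at the input layer only;
    # prefix tuning instead injects learned states into every layer.
    batch = token_embeddings.size(0)  # token_embeddings: (batch, seq, dim)
    return torch.cat([soft_prompt.expand(batch, -1, -1), token_embeddings], dim=1)
```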

Interactive Visualizer

The visualizer shows how prefix tuning adapts a language model by learning only small prefix vectors while keeping the original model frozen.

Model Architecture

Embedding
+2
frozen
Prefix vectors: 10 × 768 dims = 7,680 trainable params
Layer 1
+2
frozen
Layer 2
+2
frozen
Layer 3
+2
frozen
Output
+2
frozen
Parameter Efficiency
Total Model Parameters:175,000,000
Trainable Prefixes:38,400
Efficiency:0.022%
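The panel's arithmetic is easy to reproduce; the values below simply restate the visualizer's toy configuration:

```python
prefix_len, dim, stages, total = 10, 768, 5, 175_000_000
per_stage = prefix_len * dim           # 7,680 trainable params per stage
trainable = per_stage * stages         # 38,400 in total
print(f"{trainable:,} / {total:,} = {100 * trainable / total:.3f}%")  # 0.022%
```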
Input Processing Flow

In the flow panel, five prefix vectors (P0–P4) are prepended to the input tokens "The cat sat on the mat", and the combined sequence produces the adapted output: the prefix vectors guide the model's behavior for the input tokens.