
An adapter is a small trainable module inserted into a frozen pretrained model, enabling task-specific adaptation while keeping the vast majority of parameters fixed. The typical adapter architecture is a bottleneck: a down-projection to a lower dimension, a nonlinearity, and an up-projection back to the original dimension.
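The bottleneck structure can be sketched in a few lines of PyTorch. This is an illustrative module, not any particular library's implementation; the dimensions, the ReLU nonlinearity, and the zero-initialized up-projection (a common trick so the adapter starts as an identity function) are assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection around the whole module."""
    def __init__(self, d_model: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # down-projection
        self.up = nn.Linear(bottleneck, d_model)    # up-projection
        nn.init.zeros_(self.up.weight)              # zero-init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns only what to *add*.
        return x + self.up(torch.relu(self.down(x)))

adapter = Adapter(d_model=768, bottleneck=32)
h = torch.randn(1, 10, 768)   # (batch, sequence, hidden)
out = adapter(h)              # same shape as the input
```

Because the up-projection is zero-initialized, `out` equals `h` before any training, so inserting the adapter does not perturb the pretrained model's behavior at step zero.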

The module is inserted after the attention or feedforward sublayer, with a residual connection so it learns only what to add to the frozen computation. Adapters typically add around 0.5–5% of the parameters of the layers they modify, yet can match full fine-tuning performance on many tasks.

The parameter efficiency enables several practical benefits: different adapters for different tasks share the same base model; adapters can be trained on limited hardware; and adapters can be combined through interpolation or mixture. Training adapters is faster and cheaper than full fine-tuning because only adapter parameters require gradients and optimizer states.
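The training setup behind this efficiency can be sketched as follows: freeze every base parameter, then hand only the adapter parameters to the optimizer. The tiny two-layer "base" model here is a stand-in for a real pretrained network, and the sizes are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen pretrained model (illustrative sizes).
base = nn.Sequential(nn.Linear(768, 768), nn.Linear(768, 768))
# A bottleneck adapter with hidden dimension 32.
adapter = nn.Sequential(nn.Linear(768, 32), nn.ReLU(), nn.Linear(32, 768))

for p in base.parameters():
    p.requires_grad = False  # frozen: no gradients, no optimizer state

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in base.parameters())
print(f"trainable: {trainable:,} of {total:,} ({trainable / total:.1%})")

# The optimizer only ever sees the adapter's parameters, which is why
# training is cheaper: AdamW keeps moment estimates only for these.
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```

Swapping tasks then amounts to swapping the small `adapter` state dict while the frozen `base` weights are shared.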

The base model's forward pass is unchanged, making adapters complementary to quantization and other efficiency techniques. Adapter methods have evolved into a family: LoRA modifies weight matrices with low-rank updates; prefix tuning prepends learnable tokens; prompt tuning learns input embeddings.
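For contrast with the bottleneck adapter, here is a minimal sketch of the LoRA idea: a frozen linear layer plus a trainable low-rank update. The rank `r`, the `alpha/r` scaling, and the zero-initialized `B` matrix follow common LoRA conventions, but this is a simplified illustration, not the reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W plus a trainable low-rank update (B @ A)."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad = False  # pretrained weight stays frozen
        self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # rank-r down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # zero-init: no change at step zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path + scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768, r=8)
x = torch.randn(4, 768)
y = layer(x)  # identical to the frozen layer's output before training
```

Note the trainable parameter count here is `r * (d_in + d_out)`, so a rank-8 update to a 768x768 matrix needs about 12K parameters instead of ~590K.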

The common principle is keeping most parameters frozen while learning a small, targeted modification. This has democratized LLM customization.

Interactive Visualizer

Adapter Architecture: an interactive visualization of adapter modules inserted between the layers of a six-layer pretrained model. Clicking a layer explores the adapter architecture, and adjusting the bottleneck dimension shows the parameter-efficiency trade-off: in the default configuration, a 110M-parameter base model carries about 590K adapter parameters, a trainable ratio of roughly 0.5%. Adapters enable task-specific fine-tuning while keeping 95%+ of the original parameters frozen.

Related Terms