
Long Short-Term Memory (LSTM) is a recurrent neural network architecture designed to capture long-range dependencies in sequential data by using gating mechanisms that control information flow through a persistent cell state. Standard RNNs struggle with sequences longer than a few dozen steps because gradients either vanish or explode during backpropagation through time.

LSTMs solve this with three gates: the forget gate decides what to discard from the cell state, the input gate decides what new information to store, and the output gate decides what to emit from the cell state. The cell state acts as a highway for information, protected by gates from the multiplicative gradient decay that plagues vanilla RNNs.
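The three gates described above can be sketched as a single scalar LSTM step. This is a minimal illustration, not a production implementation; the weight layout (a per-gate tuple of input weight, recurrent weight, and bias) is a hypothetical convention chosen for readability:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step with scalar input and state.

    W maps each gate name to a (input weight, recurrent weight, bias)
    tuple: "f" forget, "i" input, "o" output, "g" candidate.
    """
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])   # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])   # input gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])   # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2]) # candidate value
    c = f * c_prev + i * g        # cell state: additive "highway" update
    h = o * math.tanh(c)          # hidden state: gated view of the cell state
    return h, c
```

Note that the cell state update is additive: when the forget gate saturates near 1 and the input gate near 0, `c` passes through unchanged, which is exactly how gradients survive across many steps.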

Because the cell state is updated additively, important information can persist across hundreds of time steps, while a separate hidden state captures short-term context. Introduced by Hochreiter and Schmidhuber in 1997, LSTMs enabled practical sequence modeling for machine translation, speech recognition, and text generation before transformers emerged.

The Gated Recurrent Unit (GRU) simplifies the LSTM by merging the cell and hidden states and using only two gates, while achieving comparable performance on many tasks. Although transformers have largely replaced LSTMs for natural language processing, LSTMs remain relevant for time series forecasting, online sequence modeling, and applications where the autoregressive structure of RNNs is advantageous.
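For comparison, here is the same scalar sketch for a GRU step. The two gates are the update gate (how much of the old state to replace) and the reset gate (how much of the old state to use when forming the candidate); the weight layout is the same hypothetical per-gate tuple as assumed above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W):
    """One GRU step with scalar input and state.

    W maps each gate name to a (input weight, recurrent weight, bias)
    tuple: "z" update, "r" reset, "g" candidate.
    """
    z = sigmoid(W["z"][0] * x + W["z"][1] * h_prev + W["z"][2])       # update gate
    r = sigmoid(W["r"][0] * x + W["r"][1] * h_prev + W["r"][2])       # reset gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * r * h_prev + W["g"][2]) # candidate
    return (1 - z) * h_prev + z * g  # interpolate between old state and candidate
```

Unlike the LSTM, there is no separate cell state: the single state `h` serves as both memory and output, which is the source of the GRU's parameter savings.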

Understanding LSTMs provides insight into the vanishing gradient problem and the importance of information highways in deep networks.

Interactive Visualizer

LSTM Cell: explore how gates control information flow through an LSTM cell.

[Interactive widget: step through a 5-value input sequence (0.8, -0.3, 0.6, -0.9, 0.4) while adjusting the forget, input, and output gate weights (default 0.5 each). The panel displays the candidate value, the cell state (C_t, long-term memory), and the hidden state (h_t, output and short-term context) at each of the 5 steps.]
Adjust the gate weights and input sequence to see how LSTM cells selectively forget, remember, and output information. The forget gate controls what to discard from the cell state, the input gate decides what new information to store, and the output gate determines which parts of the cell state to output.
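The widget's default behavior, with all three gate activations held at 0.5 and the candidate taken as tanh of the current input (an assumption about the widget's internals), can be sketched as:

```python
import math

seq = [0.8, -0.3, 0.6, -0.9, 0.4]  # the widget's input sequence
f = i = o = 0.5                    # fixed gate activations (widget defaults)
c = h = 0.0                        # initial cell and hidden state

for t, x in enumerate(seq, start=1):
    g = math.tanh(x)      # candidate value from the current input (assumed)
    c = f * c + i * g     # forget half the old state, add half the candidate
    h = o * math.tanh(c)  # output gate exposes part of the cell state
    print(f"step {t}: candidate={g:+.3f}  C_t={c:+.3f}  h_t={h:+.3f}")
```

With the gates pinned at 0.5, each step halves the old cell state and blends in half of the new candidate, so the cell state becomes an exponentially decaying average of past inputs. Raising the forget gate toward 1 makes earlier inputs persist longer.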