Explore how transformer models attend to different tokens
Work in Progress
This demo is under development and may contain bugs or incomplete features.
Attention Visualization
Multi-Head Comparison
Attention Weight: 0.0 (low) to 1.0 (high)
How to Use
Select a transformer model from the dropdown
Enter or modify the input text
Click "Analyze Attention" to process the text
Use the layer and head selectors to explore different attention patterns (a code sketch of this extraction follows the list)
Switch between visualization types to see different perspectives
Hover over elements to see detailed attention weights
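Behind the scenes, "Analyze Attention" amounts to running the input text through the model with attention outputs enabled and indexing the returned weights by the selected layer and head. The sketch below is a minimal, hypothetical example using the Hugging Face transformers library; the model name, variable names, and printout are illustrative and not tied to this demo's actual implementation.

# Minimal sketch: extract per-layer, per-head attention weights with
# Hugging Face transformers. Model choice and names are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"          # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
layer, head = 0, 0                         # values chosen in the UI selectors
attn = outputs.attentions[layer][0, head]  # (seq_len, seq_len) weight matrix
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

# Row i holds how much token i attends to every other token.
for i, tok in enumerate(tokens):
    top = attn[i].argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]} ({attn[i, top].item():.2f})")

Changing the layer and head indices here corresponds to moving the demo's selectors: each (layer, head) pair yields its own seq_len-by-seq_len weight matrix.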
Understanding Attention
Attention weights show how much each token "pays attention" to other tokens when computing its representation. Higher weights (darker colors, thicker lines) indicate stronger attention.
Multi-head attention allows the model to focus on different aspects of the input simultaneously. Each head can learn different patterns.
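For intuition, the weights rendered above are the output of scaled dot-product attention, softmax(QK^T / sqrt(d_k)), computed independently for each head. The NumPy sketch below builds this weight matrix from scratch for a single head; the dimensions and random inputs are placeholders for illustration only.

# Minimal sketch of scaled dot-product attention weights (single head).
# Shapes and random inputs are placeholders, not real model activations.
import numpy as np

def attention_weights(Q, K):
    """Return the (seq_len, seq_len) matrix softmax(Q K^T / sqrt(d_k)).
    Row i shows how token i distributes its attention over all tokens."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # raw similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))   # queries, one row per token
K = rng.normal(size=(seq_len, d_k))   # keys, one row per token

W = attention_weights(Q, K)
print(W.round(2))        # the matrix a heatmap view would color-code
print(W.sum(axis=-1))    # each row sums to 1.0

Multi-head attention repeats this computation with separate learned projections per head, so each head produces its own weight matrix over the same tokens; the Multi-Head Comparison view shows these matrices side by side.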
Visualization Types:
Heatmap: Color-coded matrix showing attention from queries (rows) to keys (columns)