Explore how transformer models attend to different tokens
Work in Progress
This demo is under development and may contain bugs or incomplete features.
Attention Visualization
Multi-Head Comparison
Attention Weight: 0.0 (low) to 1.0 (high)
How to Use
Select a transformer model from the dropdown
Enter or modify the input text
Click "Analyze Attention" to process the text
Use the layer and head selectors to explore different attention patterns (a code sketch of this extraction follows the list)
Switch between visualization types to see different perspectives
Hover over elements to see detailed attention weights
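Behind the scenes, "Analyze Attention" amounts to running the input text through the model with attention outputs enabled and indexing the returned weights by the selected layer and head. The sketch below is a minimal, hypothetical example using the Hugging Face transformers library; the model name, variable names, and printout are illustrative and not tied to this demo's actual implementation.

# Minimal sketch: extract per-layer, per-head attention weights with
# Hugging Face transformers. Model choice and names are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"          # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
layer, head = 0, 0                         # values chosen in the UI selectors
attn = outputs.attentions[layer][0, head]  # (seq_len, seq_len) weight matrix
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

# Row i holds how much token i attends to every other token.
for i, tok in enumerate(tokens):
    top = attn[i].argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]} ({attn[i, top].item():.2f})")

Changing the layer and head indices here corresponds to moving the demo's selectors: each (layer, head) pair yields its own seq_len-by-seq_len weight matrix.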
Understanding Attention
Attention weights show how much each token "pays attention" to other tokens when computing its representation. Higher weights (darker colors, thicker lines) indicate stronger attention.
Multi-head attention allows the model to focus on different aspects of the input simultaneously. Each head can learn different patterns.
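For intuition, the weights rendered above are the output of scaled dot-product attention, softmax(QK^T / sqrt(d_k)), computed independently for each head. The NumPy sketch below builds this weight matrix from scratch for a single head; the dimensions and random inputs are placeholders for illustration only.

# Minimal sketch of scaled dot-product attention weights (single head).
# Shapes and random inputs are placeholders, not real model activations.
import numpy as np

def attention_weights(Q, K):
    """Return the (seq_len, seq_len) matrix softmax(Q K^T / sqrt(d_k)).
    Row i shows how token i distributes its attention over all tokens."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # raw similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))   # queries, one row per token
K = rng.normal(size=(seq_len, d_k))   # keys, one row per token

W = attention_weights(Q, K)
print(W.round(2))        # the matrix a heatmap view would color-code
print(W.sum(axis=-1))    # each row sums to 1.0

Multi-head attention repeats this computation with separate learned projections per head, so each head produces its own weight matrix over the same tokens; the Multi-Head Comparison view shows these matrices side by side.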
Visualization Types:
Heatmap: Color-coded matrix showing attention from queries (rows) to keys (columns)