Transformer Attention Visualizer

Explore how transformer models attend to different tokens

Work in Progress

This demo is under development and may contain bugs or incomplete features.

How to Use

  1. Select a transformer model from the dropdown
  2. Enter or modify the input text
  3. Click "Analyze Attention" to process the text (a code sketch of the equivalent computation follows this list)
  4. Use layer and head selectors to explore different attention patterns
  5. Switch between visualization types to see different perspectives
  6. Hover over elements to see detailed attention weights
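
Behind a step like "Analyze Attention", a visualizer typically runs the text through the model with attention outputs enabled and then indexes into the layer and head you selected. Here is a minimal sketch of that lookup, assuming a Hugging Face Transformers backend and bert-base-uncased; both are assumptions for illustration, not necessarily what this demo uses.

    from transformers import AutoModel, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions is a tuple with one tensor per layer,
    # each shaped (batch, num_heads, seq_len, seq_len).
    layer, head = 0, 0                          # the "layer and head selectors"
    attn = outputs.attentions[layer][0, head]   # (seq_len, seq_len) weight matrix
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])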

Understanding Attention

Attention weights show how much each token "pays attention" to other tokens when computing its representation. Higher weights (darker colors, thicker lines) indicate stronger attention.
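
As a rough illustration of where these weights come from (assuming standard scaled dot-product attention), each row of the attention matrix is a softmax over similarity scores between a query token and every key token, so the weights in a row sum to 1:

    import numpy as np

    def attention_weights(Q, K):
        """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d_k)), row-wise."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len)
        scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
        weights = np.exp(scores)
        return weights / weights.sum(axis=-1, keepdims=True)

    # Row i shows how token i distributes its attention over every token.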

Multi-head attention allows the model to focus on different aspects of the input simultaneously. Each head can learn different patterns.
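
A toy sketch of the multi-head idea, with made-up dimensions and reusing attention_weights from the sketch above: the model splits its hidden size into several smaller subspaces and computes one attention map per head, so each head can attend to different things.

    import numpy as np

    def split_heads(x, num_heads):
        """Reshape (seq_len, d_model) into (num_heads, seq_len, d_head)."""
        seq_len, d_model = x.shape
        d_head = d_model // num_heads
        return x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    rng = np.random.default_rng(0)
    Q = rng.standard_normal((6, 64))      # 6 tokens, model dim 64 (toy numbers)
    K = rng.standard_normal((6, 64))
    per_head = [attention_weights(q, k)   # one (6, 6) attention map per head
                for q, k in zip(split_heads(Q, 8), split_heads(K, 8))]
    print(len(per_head), per_head[0].shape)   # 8 heads, each (6, 6)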

Visualization Types:

  • Heatmap: Color-coded matrix showing attention from queries (rows) to keys (columns); a plotting sketch follows this list
  • Arc Diagram: Curved lines connecting tokens, with thickness representing attention strength
  • Matrix View: Detailed numerical matrix with exact attention values
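
For reference, a heatmap like the one in the demo can be reproduced offline with matplotlib from the attention matrix and token list extracted earlier. This is only a sketch; plot_attention_heatmap is a hypothetical helper, not part of the demo.

    import matplotlib.pyplot as plt

    def plot_attention_heatmap(attn, tokens):
        """Plot a (seq_len, seq_len) attention matrix: rows = queries, cols = keys."""
        fig, ax = plt.subplots(figsize=(6, 6))
        im = ax.imshow(attn, cmap="viridis")
        ax.set_xticks(range(len(tokens)))
        ax.set_xticklabels(tokens, rotation=90)
        ax.set_yticks(range(len(tokens)))
        ax.set_yticklabels(tokens)
        ax.set_xlabel("Key (attended-to token)")
        ax.set_ylabel("Query (attending token)")
        fig.colorbar(im, ax=ax, label="attention weight")
        plt.tight_layout()
        plt.show()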