Transformer Overview

About the Transformer

The Transformer architecture, introduced in "Attention is All You Need" (Vaswani et al., 2017), revolutionized natural language processing by relying entirely on attention mechanisms, dispensing with the recurrence and convolutions used in earlier sequence models.
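
To make that core mechanism concrete, here is a minimal sketch (not part of the original visualization) of scaled dot-product attention, the operation that multi-head attention runs several times in parallel. It assumes PyTorch; the function name, toy shapes, and 64-dimensional heads (512 / 8 heads) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        """q, k, v: tensors of shape (batch, seq_len, d_k)."""
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len) similarity scores
        weights = F.softmax(scores, dim=-1)              # each query's distribution over positions
        return weights @ v                               # weighted sum of the value vectors

    # Toy example: batch of 2 sequences, 10 positions, 64-dim heads (illustrative)
    q = k = v = torch.randn(2, 10, 64)
    out = scaled_dot_product_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 10, 64])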

Key Components:

  • Embedding Layer: Converts tokens to vectors
  • Positional Encoding: Adds position information
  • Multi-Head Attention: Parallel attention mechanisms
  • Feed-Forward Network: Position-wise transformations
  • Layer Normalization: Stabilizes training
  • Residual Connections: Enables deep networks

Base Model Configuration:

  • Layers: 6
  • Attention Heads: 8
  • Model Dimension: 512
  • FFN Dimension: 2048
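
The components and dimensions above correspond to the base model from the paper. As a minimal sketch of how they fit together, assuming a recent PyTorch version and its built-in nn.TransformerEncoderLayer / nn.TransformerEncoder modules (variable names and the toy input shape are illustrative):

    import torch
    import torch.nn as nn

    # Hyperparameters listed in the overview (base model, Vaswani et al., 2017)
    d_model, n_heads, d_ff, n_layers = 512, 8, 2048, 6

    layer = nn.TransformerEncoderLayer(
        d_model=d_model,          # model (embedding) dimension
        nhead=n_heads,            # parallel attention heads
        dim_feedforward=d_ff,     # hidden size of the position-wise FFN
        batch_first=True,         # inputs shaped (batch, seq, feature)
    )
    encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    # Stand-in for embedded tokens with positional encoding already added
    x = torch.randn(2, 10, d_model)   # (batch, sequence length, d_model)
    out = encoder(x)                  # residuals and layer norm applied inside each layer
    print(out.shape)                  # torch.Size([2, 10, 512])

Each layer applies multi-head self-attention followed by the position-wise feed-forward network, with residual connections and layer normalization around both sub-layers, matching the component list above.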