Explore how different tokenization algorithms break down text
What is tokenization? Tokenization breaks text into smaller units (tokens) that language models can process. Different models use different algorithms: GPT-2 uses BPE, BERT uses WordPiece, and T5 uses SentencePiece. The ␣ symbol indicates a leading space in the original text.
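If you want to reproduce this comparison outside the page, here is a minimal sketch using the HuggingFace transformers library. The model names are the standard public checkpoints and the sample sentence is only illustrative; the "chars per token" line is just one simple compression metric, not necessarily the exact statistic the page reports.

```python
# Minimal sketch: tokenize the same sentence with GPT-2 (BPE), BERT (WordPiece),
# and T5 (SentencePiece) and compare tokens and IDs.
# Assumes the `transformers` package is installed and can download the models.
from transformers import AutoTokenizer

text = "Tokenization breaks text into smaller units."  # illustrative sample

for name in ["gpt2", "bert-base-uncased", "t5-small"]:
    tok = AutoTokenizer.from_pretrained(name)
    tokens = tok.tokenize(text)
    ids = tok.convert_tokens_to_ids(tokens)
    print(f"{name}: vocab size = {tok.vocab_size}")
    print("  tokens:", tokens)   # GPT-2 shows Ġ, BERT shows ##, T5 shows ▁
    print("  ids:   ", ids)
    print("  chars per token:", round(len(text) / len(tokens), 2))
```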
GPT-2 (BPE)
OpenAI
Vocab: 50,257 · Tokens: - · Ratio: -
Token IDs
BERT (WordPiece)
Google
Vocab: 30,522 · Tokens: - · Ratio: -
Token IDs
T5 (SentencePiece)
Google
Vocab: 32,128 · Tokens: - · Ratio: -
Token IDs
Comparison Statistics
Byte Pair Encoding (BPE) starts from individual characters and iteratively merges the most frequent adjacent pair of symbols. Watch the algorithm work step by step (a minimal code sketch follows the mode options below):
Start Visualization: Begin the algorithm and see each merge step
Next Step: Advance one merge at a time to understand each decision
Auto Play: Watch the algorithm run automatically
BPE Mode:
Mode: Simplified (Educational) - 16 common patterns
• Simplified: Uses 16 hardcoded common English patterns for easy learning
• Real GPT-2: Uses 5,000 actual merge rules from OpenAI's GPT-2 tokenizer
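To make the merge loop concrete, here is a small educational sketch of BPE training on a tiny corpus. The corpus and the number of merge steps are illustrative assumptions; this is the classic word-level formulation, not GPT-2's byte-level variant or its real merge table.

```python
# Educational BPE sketch: repeatedly merge the most frequent adjacent pair of
# symbols. Each word starts as a character sequence plus an end-of-word mark.
from collections import Counter

corpus = ["low", "lower", "lowest", "newer", "wider"]  # illustrative corpus
words = [list(w) + ["</w>"] for w in corpus]

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0] if pairs else None

def merge_pair(words, pair):
    a, b = pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == (a, b):
                out.append(a + b)   # merge the pair into a single symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

for step in range(8):               # a handful of merge steps for illustration
    best = most_frequent_pair(words)
    if best is None:
        break
    (a, b), count = best
    print(f"step {step + 1}: merge {a!r} + {b!r} (count = {count})")
    words = merge_pair(words, (a, b))

print("final segmentations:", [" ".join(w) for w in words])
```

Each printed merge corresponds to one row of the merge history shown by the visualization; the final segmentations are the "current state" after all merges so far.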
BPE Algorithm Steps
Click "Load Text" to begin the BPE visualization
Merge History
Current State
Visualization will appear here
Current Step: 0
Token Count: 0
Pairs Found: 0
BPE Merge Tree
Explore Tokenizer Vocabularies
Browse the complete vocabulary of each tokenizer. Each model uses different strategies:
GPT-2: 50,257 tokens using Byte Pair Encoding. Leading spaces shown as ␣ (original: Ġ).
BERT: 30,522 tokens using WordPiece. Subword continuations shown as ·· (original: ##). Includes 994 [unused0]–[unused993] placeholder tokens reserved for fine-tuning.
T5: 32,128 tokens using SentencePiece. Leading spaces shown as ␣ (original: ▁).
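A sketch of how you might browse these vocabularies yourself, again assuming the transformers package. The checkpoint names are the standard public ones, the slice of IDs shown is an arbitrary choice to skip special/unused/byte tokens, and the display substitutions mirror the markers described above.

```python
# Minimal sketch: peek into each tokenizer's vocabulary and show the raw marker
# characters (Ġ for GPT-2, ## for BERT, ▁ for T5) next to a cleaned display form.
from transformers import AutoTokenizer

def preview_vocab(name, n=10):
    tok = AutoTokenizer.from_pretrained(name)
    vocab = tok.get_vocab()                      # maps token string -> id
    print(f"{name}: {len(vocab)} entries")
    # Skip the lowest ids (special, unused, and byte tokens) for a more
    # representative slice of ordinary subwords.
    for token, idx in sorted(vocab.items(), key=lambda kv: kv[1])[1000:1000 + n]:
        display = token.replace("Ġ", "␣").replace("▁", "␣").replace("##", "··")
        print(f"  {idx:>6}  raw={token!r:<15} display={display!r}")

for name in ["gpt2", "bert-base-uncased", "t5-small"]:
    preview_vocab(name)
```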