1966: ELIZA
Rule-Based Pattern Matching
About ELIZA
Creator: Joseph Weizenbaum (MIT)
Method: Pattern matching and substitution rules
Innovation: First chatbot to simulate human conversation
Famous for: Rogerian psychotherapist simulation
How It Works
- Scans input for keywords and patterns
- Applies transformation rules
- Reflects statements back as questions
- No understanding; pure pattern matching (see the sketch below)
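A minimal Python sketch of this keyword-and-reflection loop. The rules and pronoun reflections below are illustrative stand-ins, not Weizenbaum's original DOCTOR script:

```python
import re

# Pronoun reflections applied to the captured fragment
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "you": "I"}

# (pattern, response template) rules, checked in order
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"my (.*)",     "Tell me more about your {0}."),
    (r"(.*)",        "Please go on."),          # catch-all fallback
]

def reflect(fragment: str) -> str:
    # Swap first/second-person words so the statement can be mirrored back
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def eliza(user_input: str) -> str:
    text = user_input.lower().strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*[reflect(g) for g in match.groups()])

print(eliza("I feel anxious about my job"))
# -> "Why do you feel anxious about your job?"
```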
1972: PARRY
State Machine with Emotions
About PARRY
Creator: Kenneth Colby (Stanford)
Method: State machines with emotional modeling
Innovation: Simulated a patient with paranoid schizophrenia
Famous for: "Conversation" with ELIZA in 1972
How It Works
- Maintains internal emotional state
- Anger, fear, mistrust levels
- Responses vary based on emotional state
- More complex than ELIZA's reflection (a toy version is sketched below)
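A toy Python sketch of the idea: trigger words push internal anger, fear, and mistrust levels up, and the current levels decide which response pool is used. The triggers, thresholds, and canned replies are invented for illustration; Colby's actual model was far more elaborate:

```python
import re

# Trigger words raise the corresponding emotional level
TRIGGERS = {
    "anger":    {"crazy", "stupid", "liar"},
    "fear":     {"police", "hospital", "doctor"},
    "mistrust": {"why", "prove", "really"},
}

RESPONSES = {
    "calm":    "I went to the races at Bay Meadows a while back.",
    "guarded": "I'd rather not talk about that.",
    "hostile": "You have no right to say that about me.",
}

class Parry:
    def __init__(self):
        # Internal emotional state (starting values are arbitrary)
        self.state = {"anger": 0.0, "fear": 0.0, "mistrust": 0.3}

    def respond(self, user_input: str) -> str:
        words = set(re.findall(r"[a-z]+", user_input.lower()))
        # Raise each emotion whose trigger words appear in the input
        for emotion, triggers in TRIGGERS.items():
            if words & triggers:
                self.state[emotion] = min(1.0, self.state[emotion] + 0.4)
        # The dominant emotion selects the response pool
        if self.state["anger"] > 0.5:
            return RESPONSES["hostile"]
        if self.state["fear"] > 0.5 or self.state["mistrust"] > 0.5:
            return RESPONSES["guarded"]
        return RESPONSES["calm"]

bot = Parry()
print(bot.respond("Hello, how are you feeling?"))   # no triggers -> calm reply
print(bot.respond("Why won't you see a doctor?"))   # mistrust and fear rise -> guarded reply
```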
1995: A.L.I.C.E.
AIML Pattern Language
About A.L.I.C.E.
Creator: Richard Wallace
Method: AIML (Artificial Intelligence Markup Language)
Innovation: Won the Loebner Prize three times (2000, 2001, 2004)
Famous for: Large rule database (40,000+ patterns)
How It Works
- XML-based pattern matching language
- Recursive pattern matching
- Context and topic tracking
- Still rule-based, but more sophisticated (see the sketch below)
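A rough Python sketch of the two ideas AIML layers on top of simple keyword matching: wildcard patterns and recursive redirection (AIML's <srai> tag). The categories below are illustrative, not taken from the real A.L.I.C.E. brain files:

```python
import re

# Each "category" pairs a pattern (with * wildcards) and a template.
# A template starting with "SRAI:" redirects to another pattern,
# the way AIML's <srai> tag reduces inputs to a canonical form.
CATEGORIES = [
    ("HELLO *",           "Hi there! How are you today?"),
    ("HI *",              "SRAI: HELLO {0}"),
    ("WHAT IS YOUR NAME", "My name is A.L.I.C.E."),
    ("WHO ARE YOU",       "SRAI: WHAT IS YOUR NAME"),
    ("*",                 "Interesting. Tell me more."),
]

def to_regex(pattern: str) -> str:
    # '*' matches one or more words; everything else is literal
    return "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"

def respond(user_input: str, depth: int = 0) -> str:
    text = re.sub(r"[^\w\s]", "", user_input).upper().strip()
    for pattern, template in CATEGORIES:
        match = re.match(to_regex(pattern), text)
        if match:
            if template.startswith("SRAI:") and depth < 5:
                # Recursive redirect, substituting the captured wildcard text
                return respond(template[5:].strip().format(*match.groups()), depth + 1)
            return template
    return "I have no answer for that."

print(respond("Hi there, Alice!"))  # HI * redirects to HELLO * -> greeting response
print(respond("Who are you?"))      # redirects to WHAT IS YOUR NAME
```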
2014: Neural Seq2Seq
Neural Machine Translation
About Seq2Seq
Innovation: First end-to-end neural chatbots
Method: Encoder-decoder (seq2seq) networks, originally RNN-based; the demo uses a transformer
Breakthrough: Learns from data, not rules
Model: BlenderBot Small (90M parameters)
How It Works
- Real neural conversation model
- Trained on conversational datasets
- Generates contextual responses
- Shows characteristic neural-model behavior: fluent, but sometimes generic or inconsistent
- May take time to load initially
Using BlenderBot Small (90M), a real neural conversation model; it loads on the first message (~30s).
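The page runs the model client-side, but the same public checkpoint (facebook/blenderbot_small-90M) can be used with the Hugging Face transformers library in Python. A minimal sketch; the generated reply varies from run to run:

```python
from transformers import BlenderbotSmallTokenizer, BlenderbotSmallForConditionalGeneration

model_name = "facebook/blenderbot_small-90M"
tokenizer = BlenderbotSmallTokenizer.from_pretrained(model_name)
model = BlenderbotSmallForConditionalGeneration.from_pretrained(model_name)

# Encode the user turn, run the encoder-decoder, decode the reply
user_turn = "I'm feeling a bit stressed about work lately."
inputs = tokenizer(user_turn, return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```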
Encoder-Decoder Transformer Architecture
Key Concepts
Encoder
Processes the entire input sequence at once. Uses bidirectional self-attention to understand context from both directions.
Decoder
Generates output one token at a time. Uses masked self-attention (can only see previous tokens) and cross-attention to the encoder output.
Cross-Attention
Allows the decoder to "look at" the encoder's representation of the input while generating each output token.
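A bare-bones NumPy sketch of a single cross-attention head: the decoder's current state supplies the query, the encoder outputs supply the keys and values. The weights are random and the sizes are chosen only to show the shapes involved:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 512                                 # hidden size
enc_out = np.random.randn(7, d)         # encoder output: 7 input tokens
dec_state = np.random.randn(1, d)       # decoder state for the token being generated

# Learned projections (random here, just to show the shapes)
W_q, W_k, W_v = (np.random.randn(d, d) * 0.02 for _ in range(3))

Q = dec_state @ W_q                     # (1, d)  query from the decoder
K = enc_out @ W_k                       # (7, d)  keys from the encoder
V = enc_out @ W_v                       # (7, d)  values from the encoder

scores = (Q @ K.T) / np.sqrt(d)         # (1, 7)  how strongly to attend to each input token
weights = softmax(scores)               # rows sum to 1
context = weights @ V                   # (1, d)  encoder information blended into this decoder step

print(weights.round(3))                 # attention distribution over the 7 input tokens
```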
Tokenization
Breaks text into subword units (BPE). "Hello" might become ["Hel", "lo"]. Enables handling of any text.
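The demo model's own tokenizer can be used to inspect BPE subword splits. Exact pieces depend on the trained vocabulary, so treat the printed output as an example rather than a fixed result:

```python
from transformers import BlenderbotSmallTokenizer

tokenizer = BlenderbotSmallTokenizer.from_pretrained("facebook/blenderbot_small-90M")

for text in ["Hello", "unbelievable", "chatbots are fun"]:
    tokens = tokenizer.tokenize(text)              # subword pieces from the BPE vocabulary
    ids = tokenizer.convert_tokens_to_ids(tokens)  # integer IDs the model actually consumes
    print(f"{text!r} -> {tokens} -> {ids}")
```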
BlenderBot Small Specifications
| Spec | BlenderBot Small 90M |
|---|---|
| Parameters | 90 Million |
| Architecture | Encoder-Decoder Transformer |
| Layers | 6 encoder + 6 decoder |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Training Data | Blended Skill Talk, ConvAI2, Empathetic Dialogues |
| Year | 2020 (Facebook AI) |
2020s: GPT & Transformers
Instruction-Tuned Models
About SmolLM2
Creator: HuggingFace (2024)
Models: 135M and 360M parameters
Method: Decoder-only transformer with chat templates
Innovation: Optimized for browser/edge deployment
How It Works
- Auto-selects model based on your device RAM
- Pre-trained on web text, code, and reasoning data
- Instruction-tuned for helpful conversations
- Runs entirely in your browser (WebGPU/WASM)
Model auto-selected based on your device RAM. First message loads the model (~30-60s).
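In the browser the demo runs via WebGPU/WASM (e.g. with transformers.js); the equivalent flow with Python transformers, using the public HuggingFaceTB/SmolLM2-135M-Instruct checkpoint, looks roughly like this (sampling settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The chat template wraps messages in the special tokens the model was tuned on
messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain what a chat template is in one sentence."},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output = model.generate(inputs, max_new_tokens=80, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the newly generated reply
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```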
Decoder-Only Transformer (SmolLM2)
Key Concepts
Instruction Tuning
Models are fine-tuned on instruction-response pairs, learning to follow user requests and generate helpful outputs.
Chat Templates
System prompts and message formatting guide model behavior, enabling multi-turn conversations with context.
Grouped-Query Attention
GQA reduces memory usage by sharing key-value heads across query heads, enabling efficient browser inference.
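A shape-level NumPy sketch of grouped-query attention: several query heads share one key/value head, so far fewer K/V tensors have to be computed and cached. The 9-query / 3-KV split below is just an illustrative configuration:

```python
import numpy as np

seq_len, head_dim = 16, 64
n_q_heads, n_kv_heads = 9, 3              # illustrative GQA split: 3 query heads per KV head
group = n_q_heads // n_kv_heads

Q = np.random.randn(n_q_heads, seq_len, head_dim)
K = np.random.randn(n_kv_heads, seq_len, head_dim)   # only 3 K/V heads are computed and cached
V = np.random.randn(n_kv_heads, seq_len, head_dim)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

outputs = []
for h in range(n_q_heads):
    kv = h // group                       # query head h reuses KV head kv
    scores = Q[h] @ K[kv].T / np.sqrt(head_dim)
    outputs.append(softmax(scores) @ V[kv])

out = np.stack(outputs)                   # (9, seq_len, head_dim), with a 3x smaller KV cache
print(out.shape)
```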
Quantization (q4)
4-bit quantization shrinks model size by ~4x while preserving quality, essential for browser deployment.
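A toy sketch of the core idea behind 4-bit quantization: store weights as small integers plus a scale factor and dequantize on the fly. Real q4 formats quantize in groups with zero-points and packed storage; this only shows the gist:

```python
import numpy as np

def quantize_q4(weights: np.ndarray):
    # Symmetric 4-bit quantization: map floats to integers in [-8, 7] with one scale
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)  # 4 bits of range, held in int8 here
    return q, scale

def dequantize_q4(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights at inference time
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32) * 0.1
q, scale = quantize_q4(w)
w_hat = dequantize_q4(q, scale)

print("max abs error:", np.abs(w - w_hat).max())   # small reconstruction error, ~4x less storage
```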
SmolLM2 Model Family
| Spec | SmolLM2 135M | SmolLM2 360M |
|---|---|---|
| Parameters | 135 Million | 360 Million |
| Layers | 30 | 32 |
| Hidden Size | 576 | 960 |
| Download (q4) | ~172 MB | ~368 MB |
| Min RAM | 2 GB | 4 GB |
| Context Length | 8,192 tokens | 8,192 tokens |