Chatbot Evolution Timeline

Journey through nearly 60 years of conversational AI, from pattern matching to neural language models

  • 1966 - ELIZA
  • 1972 - PARRY
  • 1995 - A.L.I.C.E.
  • 2020 - BlenderBot
  • 2024 - SmolLM2

1966: ELIZA

Rule-Based Pattern Matching

About ELIZA

Creator: Joseph Weizenbaum (MIT)

Method: Pattern matching and substitution rules

Innovation: First chatbot to simulate human conversation

Famous for: Rogerian psychotherapist simulation

How It Works

  • Scans input for keywords and patterns
  • Applies transformation rules
  • Reflects statements back as questions
  • No understanding, pure pattern matching
ELIZA's Memory: no memories stored yet.
Try the demo: enter text to see ELIZA's pattern matching and watch how keywords, patterns, and transformations work together.
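
To make "pure pattern matching" concrete, here is a minimal Python sketch in the spirit of ELIZA, not Weizenbaum's original DOCTOR script: the keyword rules, reflections, and fallback replies are invented for illustration.

```python
import random
import re

# Toy ELIZA-style rules (illustrative, not the original script): each pattern
# captures the rest of the sentence, which gets reflected back as a question.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "you": "I"}
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
]

def reflect(phrase: str) -> str:
    # Swap first- and second-person words so the statement can be mirrored back.
    return " ".join(REFLECTIONS.get(word, word) for word in phrase.lower().split())

def eliza(text: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, text.lower())
        if match:
            return template.format(reflect(match.group(1)))
    # No keyword matched: fall back to a content-free prompt, much as ELIZA did.
    return random.choice(["Please go on.", "How does that make you feel?"])

print(eliza("I am worried about my exams"))
# -> "How long have you been worried about your exams?"
```

The real script added ranked keywords and a small memory queue for "MY ..." statements, but the mechanism is the same: no understanding, only transformation rules.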

1972: PARRY

State Machine with Emotions

About PARRY

Creator: Kenneth Colby (Stanford)

Method: State machines with emotional modeling

Innovation: Simulated paranoid schizophrenia

Famous for: "Conversation" with ELIZA in 1972

How It Works

  • Maintains internal emotional state
  • Anger, fear, mistrust levels
  • Responses vary based on emotional state
  • More complex than ELIZA's reflection

Current Emotional State: Anger 5/20, Fear 8/20, Mistrust 10/15
Try the demo: enter text to see PARRY's emotional processing and watch how emotional states influence responses.
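
A toy sketch of the same idea, with made-up trigger words and thresholds (Colby's actual model was far more elaborate): emotional variables accumulate across turns, and the reply depends on the current state rather than on the input alone.

```python
# Illustrative PARRY-style state machine; starting values match the display above,
# trigger words and thresholds are invented for this sketch.
state = {"anger": 5, "fear": 8, "mistrust": 10}

TRIGGERS = {
    "anger": ["crazy", "liar", "stupid"],    # perceived insults raise anger
    "fear": ["police", "mafia", "follow"],   # perceived threats raise fear
}

def respond(text: str) -> str:
    lowered = text.lower()
    for emotion, cues in TRIGGERS.items():
        if any(cue in lowered for cue in cues):
            state[emotion] = min(state[emotion] + 5, 20)
            state["mistrust"] = min(state["mistrust"] + 1, 15)
    # The same question gets different answers depending on the accumulated state.
    if state["anger"] > 12:
        return "You have no right to say that."
    if state["fear"] > 12:
        return "I'd rather not talk about them."
    if state["mistrust"] > 12:
        return "Why do you want to know?"
    return "I went to the races last week."

print(respond("Are the police following you?"))  # -> "I'd rather not talk about them."
print(state)                                     # fear and mistrust have risen
```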

1995: A.L.I.C.E.

AIML Pattern Language

About A.L.I.C.E.

Creator: Richard Wallace

Method: AIML (Artificial Intelligence Markup Language)

Innovation: Won Loebner Prize 3 times

Famous for: Large rule database (40,000+ patterns)

How It Works

  • XML-based pattern matching language
  • Recursive pattern matching
  • Context and topic tracking
  • Still rule-based but more sophisticated

AIML Context

Topic: general
That (last response): (none)
User Name: (unknown)
Try the demo: enter text to see AIML pattern matching and watch how categories, wildcards, and context work together.
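
A rough Python sketch of how AIML-style matching behaves, with categories held in a plain dictionary instead of real AIML/XML; the patterns and the "srai:" redirect marker are invented stand-ins for <category>, the * wildcard, and <srai>.

```python
# Toy AIML-style matcher: uppercase patterns, a single trailing "*" wildcard,
# and srai-style recursion that rewrites one input into another category.
CATEGORIES = {
    "HELLO *": "Hi there! How are you?",
    "WHAT IS YOUR NAME": "My name is A.L.I.C.E.",
    "WHO ARE YOU": "srai:WHAT IS YOUR NAME",   # redirect, like <srai> in AIML
}

def matches(pattern: str, words: list[str]) -> bool:
    parts = pattern.split()
    if "*" not in parts:
        return parts == words
    head = parts[: parts.index("*")]
    return words[: len(head)] == head          # the wildcard absorbs the rest

def respond(text: str) -> str:
    words = text.upper().split()
    for pattern, template in CATEGORIES.items():
        if matches(pattern, words):
            if template.startswith("srai:"):   # recursive rewrite to another category
                return respond(template[len("srai:"):])
            return template
    return "I do not have an answer for that."

print(respond("Who are you"))   # -> "My name is A.L.I.C.E."
```

Real AIML adds <that>, <topic>, and user variables such as the name, which is exactly the context tracked above.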

2014: Neural Seq2Seq

Encoder-Decoder Sequence Models (from Neural Machine Translation)

About Seq2Seq

Innovation: First end-to-end neural chatbots

Method: Encoder-decoder sequence models (RNN-based in 2014; BlenderBot uses a transformer)

Breakthrough: Learns from data, not rules

Model: BlenderBot Small (90M parameters)

How It Works

  • Real neural conversation model
  • Trained on conversational datasets
  • Generates contextual responses
  • Shows real neural characteristics
  • May take time to load initially

Using BlenderBot Small (90M) - a real neural conversation model. Loads on first message (~30s).
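
For comparison outside the browser, here is a minimal sketch of querying the same model from Python with the Hugging Face transformers library; the checkpoint id facebook/blenderbot_small-90M and the generation settings are assumptions of this sketch, not part of the demo.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint id for BlenderBot Small (90M).
name = "facebook/blenderbot_small-90M"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# The encoder reads the whole utterance; the decoder generates the reply token by token.
inputs = tokenizer("How are you?", return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```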

Encoder-Decoder Transformer Architecture

Input ("How are you?") → Tokenizer (text → token IDs) → Encoder (self-attention + feed-forward, ×6 layers) → Decoder (masked self-attention + cross-attention + feed-forward, ×6 layers) → Output ("I'm doing well!")

Key Concepts

Encoder

Processes the entire input sequence at once. Uses bidirectional self-attention to understand context from both directions.

Decoder

Generates output one token at a time. Uses masked self-attention (can only see previous tokens) and cross-attention to the encoder output.

Cross-Attention

Allows the decoder to "look at" the encoder's representation of the input while generating each output token.

Tokenization

Breaks text into subword units (BPE). "Hello" might become ["Hel", "lo"]. Enables handling of any text.
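
A quick way to see subword tokenization in practice, again assuming the facebook/blenderbot_small-90M checkpoint; the exact splits depend on the learned BPE vocabulary, so rare words break into more pieces than common ones.

```python
from transformers import AutoTokenizer

# Assumed checkpoint; any BPE-based tokenizer illustrates the same idea.
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
print(tokenizer.tokenize("How are you?"))    # subword pieces (common words usually stay whole)
print(tokenizer("How are you?").input_ids)   # the integer IDs the encoder actually sees
```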

BlenderBot Small Specifications

Parameters: 90 Million
Architecture: Encoder-Decoder Transformer
Layers: 6 encoder + 6 decoder
Hidden Size: 512
Attention Heads: 8
Training Data: Blended Skill Talk, ConvAI2, Empathetic Dialogues
Year: 2020 (Facebook AI)

2020s: GPT & Transformers

Instruction-Tuned Models

About SmolLM2

Creator: HuggingFace (2024)

Models: 135M and 360M parameters

Method: Decoder-only transformer with chat templates

Innovation: Optimized for browser/edge deployment

How It Works

  • Auto-selects model based on your device RAM
  • Pre-trained on web text, code, and reasoning data
  • Instruction-tuned for helpful conversations
  • Runs entirely in your browser (WebGPU/WASM)

Model auto-selected based on your device RAM. First message loads the model (~30-60s).
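
A minimal sketch of the same kind of instruction-tuned chat from Python with the transformers library rather than the in-browser runtime; the checkpoint id HuggingFaceTB/SmolLM2-135M-Instruct and the system prompt are assumptions of this sketch.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; the 360M variant is used the same way.
name = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# The chat template wraps the messages in the special tokens used during instruction tuning.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is AI?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```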

Decoder-Only Transformer (SmolLM2)

Input prompt ("What is AI?") → Tokenizer + RoPE (token IDs + rotary position encoding) → Transformer decoder stack (grouped-query attention, SwiGLU FFN, RMSNorm; ×30 layers for 135M, ×32 for 360M) → Generated response ("AI is...", produced token by token)

Key Concepts

Instruction Tuning

Models are fine-tuned on instruction-response pairs, learning to follow user requests and generate helpful outputs.

Chat Templates

System prompts and message formatting guide model behavior, enabling multi-turn conversations with context.

Grouped-Query Attention

GQA reduces memory usage by sharing key-value heads across query heads, enabling efficient browser inference.
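
A shape-level PyTorch sketch of grouped-query attention, with illustrative sizes (9 query heads sharing 3 key-value heads) and no causal mask; it only shows how a small number of K/V heads is broadcast across groups of query heads, which is what shrinks the KV cache.

```python
import torch

# Illustrative sizes, not SmolLM2's exact configuration.
batch, seq_len, head_dim = 1, 8, 64
n_q_heads, n_kv_heads = 9, 3                     # 3 query heads share each K/V head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)   # only 3 heads live in the KV cache
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Broadcast each K/V head to its group of query heads, then attend as usual.
group = n_q_heads // n_kv_heads
k_exp = k.repeat_interleave(group, dim=1)        # [1, 9, 8, 64]
v_exp = v.repeat_interleave(group, dim=1)

scores = (q @ k_exp.transpose(-2, -1)) / head_dim ** 0.5
attention = torch.softmax(scores, dim=-1) @ v_exp
print(attention.shape)                           # torch.Size([1, 9, 8, 64])
```

Here the cache stores 3 K/V heads instead of 9, roughly a 3× memory saving on cached keys and values.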

Quantization (q4)

4-bit quantization shrinks model size by ~4x while preserving quality, essential for browser deployment.

SmolLM2 Model Family

Spec           | SmolLM2 135M | SmolLM2 360M
Parameters     | 135 Million  | 360 Million
Layers         | 30           | 32
Hidden Size    | 576          | 960
Download (q4)  | ~172 MB      | ~368 MB
Min RAM        | 2 GB         | 4 GB
Context Length | 8,192 tokens | 8,192 tokens
Model Links: SmolLM2 135M | SmolLM2 360M