1966: ELIZA
Rule-Based Pattern Matching
About ELIZA
Creator: Joseph Weizenbaum (MIT)
Method: Pattern matching and substitution rules
Innovation: First chatbot to simulate human conversation
Famous for: Rogerian psychotherapist simulation
How It Works
- Scans input for keywords and patterns
- Applies transformation rules
- Reflects statements back as questions
- No understanding; pure pattern matching (see the sketch below)
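A minimal Python sketch of this keyword-and-reflection loop. The rules and pronoun reflections below are illustrative stand-ins, not Weizenbaum's original DOCTOR script:

```python
import re

# Pronoun reflections applied to the captured fragment
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "you": "I"}

# (pattern, response template) rules, checked in order
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"my (.*)",     "Tell me more about your {0}."),
    (r"(.*)",        "Please go on."),          # catch-all fallback
]

def reflect(fragment: str) -> str:
    # Swap first/second-person words so the statement can be mirrored back
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def eliza(user_input: str) -> str:
    text = user_input.lower().strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*[reflect(g) for g in match.groups()])

print(eliza("I feel anxious about my job"))
# -> "Why do you feel anxious about your job?"
```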
1972: PARRY
State Machine with Emotions
About PARRY
Creator: Kenneth Colby (Stanford)
Method: State machines with emotional modeling
Innovation: Simulated a patient with paranoid schizophrenia
Famous for: "Conversation" with ELIZA in 1972
How It Works
- Maintains internal emotional state
- Anger, fear, mistrust levels
- Responses vary based on emotional state
- More complex than ELIZA's reflection (a toy version is sketched below)
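A toy Python sketch of the idea: trigger words push internal anger, fear, and mistrust levels up, and the current levels decide which response pool is used. The triggers, thresholds, and canned replies are invented for illustration; Colby's actual model was far more elaborate:

```python
import re

# Trigger words raise the corresponding emotional level
TRIGGERS = {
    "anger":    {"crazy", "stupid", "liar"},
    "fear":     {"police", "hospital", "doctor"},
    "mistrust": {"why", "prove", "really"},
}

RESPONSES = {
    "calm":    "I went to the races at Bay Meadows a while back.",
    "guarded": "I'd rather not talk about that.",
    "hostile": "You have no right to say that about me.",
}

class Parry:
    def __init__(self):
        # Internal emotional state (starting values are arbitrary)
        self.state = {"anger": 0.0, "fear": 0.0, "mistrust": 0.3}

    def respond(self, user_input: str) -> str:
        words = set(re.findall(r"[a-z]+", user_input.lower()))
        # Raise each emotion whose trigger words appear in the input
        for emotion, triggers in TRIGGERS.items():
            if words & triggers:
                self.state[emotion] = min(1.0, self.state[emotion] + 0.4)
        # The dominant emotion selects the response pool
        if self.state["anger"] > 0.5:
            return RESPONSES["hostile"]
        if self.state["fear"] > 0.5 or self.state["mistrust"] > 0.5:
            return RESPONSES["guarded"]
        return RESPONSES["calm"]

bot = Parry()
print(bot.respond("Hello, how are you feeling?"))   # no triggers -> calm reply
print(bot.respond("Why won't you see a doctor?"))   # mistrust and fear rise -> guarded reply
```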
1995: A.L.I.C.E.
AIML Pattern Language
About A.L.I.C.E.
Creator: Richard Wallace
Method: AIML (Artificial Intelligence Markup Language)
Innovation: Won the Loebner Prize three times (2000, 2001, 2004)
Famous for: Large rule database (40,000+ patterns)
How It Works
- XML-based pattern matching language
- Recursive pattern matching
- Context and topic tracking
- Still rule-based, but more sophisticated (see the sketch below)
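A rough Python sketch of the two ideas AIML layers on top of simple keyword matching: wildcard patterns and recursive redirection (AIML's <srai> tag). The categories below are illustrative, not taken from the real A.L.I.C.E. brain files:

```python
import re

# Each "category" pairs a pattern (with * wildcards) and a template.
# A template starting with "SRAI:" redirects to another pattern,
# the way AIML's <srai> tag reduces inputs to a canonical form.
CATEGORIES = [
    ("HELLO *",           "Hi there! How are you today?"),
    ("HI *",              "SRAI: HELLO {0}"),
    ("WHAT IS YOUR NAME", "My name is A.L.I.C.E."),
    ("WHO ARE YOU",       "SRAI: WHAT IS YOUR NAME"),
    ("*",                 "Interesting. Tell me more."),
]

def to_regex(pattern: str) -> str:
    # '*' matches one or more words; everything else is literal
    return "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"

def respond(user_input: str, depth: int = 0) -> str:
    text = re.sub(r"[^\w\s]", "", user_input).upper().strip()
    for pattern, template in CATEGORIES:
        match = re.match(to_regex(pattern), text)
        if match:
            if template.startswith("SRAI:") and depth < 5:
                # Recursive redirect, substituting the captured wildcard text
                return respond(template[5:].strip().format(*match.groups()), depth + 1)
            return template
    return "I have no answer for that."

print(respond("Hi there, Alice!"))  # HI * redirects to HELLO * -> greeting response
print(respond("Who are you?"))      # redirects to WHAT IS YOUR NAME
```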
2014: Neural Seq2Seq
Neural Machine Translation
About Seq2Seq
Innovation: First end-to-end neural chatbots
Method: Encoder-decoder (seq2seq) networks, originally RNN-based; the demo uses a transformer
Breakthrough: Learns from data, not rules
Model: BlenderBot Small (90M parameters)
How It Works
- Real neural conversation model
- Trained on conversational datasets
- Generates contextual responses
- Shows characteristic neural-model behavior: fluent, but sometimes generic or inconsistent
- May take time to load initially
Using BlenderBot Small (90M), a real neural conversation model; it loads on the first message (~30s).
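The page runs the model client-side, but the same public checkpoint (facebook/blenderbot_small-90M) can be used with the Hugging Face transformers library in Python. A minimal sketch; the generated reply varies from run to run:

```python
from transformers import BlenderbotSmallTokenizer, BlenderbotSmallForConditionalGeneration

model_name = "facebook/blenderbot_small-90M"
tokenizer = BlenderbotSmallTokenizer.from_pretrained(model_name)
model = BlenderbotSmallForConditionalGeneration.from_pretrained(model_name)

# Encode the user turn, run the encoder-decoder, decode the reply
user_turn = "I'm feeling a bit stressed about work lately."
inputs = tokenizer(user_turn, return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```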
Encoder-Decoder Transformer Architecture
Key Concepts
Encoder
Processes the entire input sequence at once. Uses bidirectional self-attention to understand context from both directions.
Decoder
Generates output one token at a time. Uses masked self-attention (can only see previous tokens) and cross-attention to the encoder output.
Cross-Attention
Allows the decoder to "look at" the encoder's representation of the input while generating each output token.
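A bare-bones NumPy sketch of a single cross-attention head: the decoder's current state supplies the query, the encoder outputs supply the keys and values. The weights are random and the sizes are chosen only to show the shapes involved:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 512                                 # hidden size
enc_out = np.random.randn(7, d)         # encoder output: 7 input tokens
dec_state = np.random.randn(1, d)       # decoder state for the token being generated

# Learned projections (random here, just to show the shapes)
W_q, W_k, W_v = (np.random.randn(d, d) * 0.02 for _ in range(3))

Q = dec_state @ W_q                     # (1, d)  query from the decoder
K = enc_out @ W_k                       # (7, d)  keys from the encoder
V = enc_out @ W_v                       # (7, d)  values from the encoder

scores = (Q @ K.T) / np.sqrt(d)         # (1, 7)  how strongly to attend to each input token
weights = softmax(scores)               # rows sum to 1
context = weights @ V                   # (1, d)  encoder information blended into this decoder step

print(weights.round(3))                 # attention distribution over the 7 input tokens
```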
Tokenization
Breaks text into subword units (BPE). "Hello" might become ["Hel", "lo"]. Enables handling of any text.
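The demo model's own tokenizer can be used to inspect BPE subword splits. Exact pieces depend on the trained vocabulary, so treat the printed output as an example rather than a fixed result:

```python
from transformers import BlenderbotSmallTokenizer

tokenizer = BlenderbotSmallTokenizer.from_pretrained("facebook/blenderbot_small-90M")

for text in ["Hello", "unbelievable", "chatbots are fun"]:
    tokens = tokenizer.tokenize(text)              # subword pieces from the BPE vocabulary
    ids = tokenizer.convert_tokens_to_ids(tokens)  # integer IDs the model actually consumes
    print(f"{text!r} -> {tokens} -> {ids}")
```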
BlenderBot Small Specifications
| Spec | BlenderBot Small 90M |
|---|---|
| Parameters | 90 Million |
| Architecture | Encoder-Decoder Transformer |
| Layers | 6 encoder + 6 decoder |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Training Data | Blended Skill Talk, ConvAI2, Empathetic Dialogues |
| Year | 2020 (Facebook AI) |
2020s: GPT & Transformers
Instruction-Tuned Models
About SmolLM2
Creator: HuggingFace (2024)
Models: 135M and 360M parameters
Method: Decoder-only transformer with chat templates
Innovation: Optimized for browser/edge deployment
How It Works
- Auto-selects model based on your device RAM
- Pre-trained on web text, code, and reasoning data
- Instruction-tuned for helpful conversations
- Runs entirely in your browser (WebGPU/WASM)
Model auto-selected based on your device RAM. First message loads the model (~30-60s).
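In the browser the demo runs via WebGPU/WASM (e.g. with transformers.js); the equivalent flow with Python transformers, using the public HuggingFaceTB/SmolLM2-135M-Instruct checkpoint, looks roughly like this (sampling settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The chat template wraps messages in the special tokens the model was tuned on
messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain what a chat template is in one sentence."},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output = model.generate(inputs, max_new_tokens=80, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the newly generated reply
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```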
Decoder-Only Transformer (SmolLM2)
Key Concepts
Instruction Tuning
Models are fine-tuned on instruction-response pairs, learning to follow user requests and generate helpful outputs.
Chat Templates
System prompts and message formatting guide model behavior, enabling multi-turn conversations with context.
Grouped-Query Attention
GQA reduces memory usage by sharing key-value heads across query heads, enabling efficient browser inference.
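A shape-level NumPy sketch of grouped-query attention: several query heads share one key/value head, so far fewer K/V tensors have to be computed and cached. The 9-query / 3-KV split below is just an illustrative configuration:

```python
import numpy as np

seq_len, head_dim = 16, 64
n_q_heads, n_kv_heads = 9, 3              # illustrative GQA split: 3 query heads per KV head
group = n_q_heads // n_kv_heads

Q = np.random.randn(n_q_heads, seq_len, head_dim)
K = np.random.randn(n_kv_heads, seq_len, head_dim)   # only 3 K/V heads are computed and cached
V = np.random.randn(n_kv_heads, seq_len, head_dim)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

outputs = []
for h in range(n_q_heads):
    kv = h // group                       # query head h reuses KV head kv
    scores = Q[h] @ K[kv].T / np.sqrt(head_dim)
    outputs.append(softmax(scores) @ V[kv])

out = np.stack(outputs)                   # (9, seq_len, head_dim), with a 3x smaller KV cache
print(out.shape)
```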
Quantization (q4)
4-bit quantization shrinks model size by ~4x while preserving quality, essential for browser deployment.
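A toy sketch of the core idea behind 4-bit quantization: store weights as small integers plus a scale factor and dequantize on the fly. Real q4 formats quantize in groups with zero-points and packed storage; this only shows the gist:

```python
import numpy as np

def quantize_q4(weights: np.ndarray):
    # Symmetric 4-bit quantization: map floats to integers in [-8, 7] with one scale
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)  # 4 bits of range, held in int8 here
    return q, scale

def dequantize_q4(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights at inference time
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32) * 0.1
q, scale = quantize_q4(w)
w_hat = dequantize_q4(q, scale)

print("max abs error:", np.abs(w - w_hat).max())   # small reconstruction error, ~4x less storage
```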
SmolLM2 Model Family
| Spec | SmolLM2 135M | SmolLM2 360M |
|---|---|---|
| Parameters | 135 Million | 360 Million |
| Layers | 30 | 32 |
| Hidden Size | 576 | 960 |
| Download (q4) | ~172 MB | ~368 MB |
| Min RAM | 2 GB | 4 GB |
| Context Length | 8,192 tokens | 8,192 tokens |