PSYC 51.07: Models of Language and Communication

Lecture 20: Applications of Encoder Models

Week 6, Lecture 3 - From Theory to Practice

Winter 2026

Today's Agenda 📋

  1. 🚀 Real-World Applications: Where BERT shines
  2. 🧠 Cognitive Neuroscience: Brain-model parallels
  3. 🤔 Understanding vs. Pattern Matching: The big debate
  4. ⚠️ Limitations: What BERT can't do
  5. 💡 Practical Tips: Deployment and optimization
  6. 🔮 Future Directions: Where are we heading?

Goal: Connect BERT to real applications and understand broader implications


BERT Applications 🚀

BERT excels at understanding tasks:

Classification Tasks:

  • Sentiment Analysis
  • Topic Classification
  • Spam Detection
  • Intent Recognition

Token-Level Tasks:

  • Named Entity Recognition (NER)
  • Part-of-Speech Tagging
  • Word Sense Disambiguation

Span-Level Tasks:

  • Question Answering
  • Extractive Summarization
  • Information Extraction

Sentence-Pair Tasks:

  • Semantic Similarity
  • Natural Language Inference
  • Paraphrase Detection
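
All four sentence-pair tasks feed BERT the same input format: the two sentences are packed into one sequence, separated by [SEP] and distinguished by segment (token type) IDs, and a classification head reads the [CLS] position. A minimal sketch of that encoding (the example sentences are arbitrary; no fine-tuned pair-classification head is shown):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encode a sentence pair: [CLS] sentence A [SEP] sentence B [SEP]
enc = tokenizer(
    "A soccer game with multiple males playing.",  # sentence A (premise)
    "Some men are playing a sport.",               # sentence B (hypothesis)
    return_tensors="pt",
)

print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()))
# ['[CLS]', 'a', 'soccer', ..., '[SEP]', 'some', 'men', ..., '[SEP]']
print(enc["token_type_ids"][0])  # 0s for sentence A, 1s for sentence B

A model fine-tuned for NLI or paraphrase detection then maps the [CLS] representation to an entailment or paraphrase label.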
Industry Impact

BERT powers:

  • Google Search (understanding queries)
  • Customer service chatbots
  • Content moderation
  • Document understanding

Case Study: Google Search 🔍

BERT revolutionized search in 2019


# Why word order matters: BERT understands prepositions!
query = "2019 brazil traveler to usa need a visa"

# Before BERT (bag-of-words matching):
keywords = ["brazil", "traveler", "usa", "visa"]
# Matches both: "US traveler to Brazil" AND "Brazil traveler to US"

# With BERT (contextual understanding):
bert_understanding = {
    "subject": "brazil traveler",      # WHO is traveling
    "destination": "usa",              # WHERE they're going
    "direction": "brazil → usa",       # The preposition "to" is key!
    "intent": "visa requirements"
}
# BERT correctly ranks: "Brazil citizen visa requirements for USA"
More Examples of Context-Sensitive Queries
Query | Before BERT | With BERT
"can you get medicine for someone pharmacy" | Generic pharmacy results | Picking up prescriptions for others
"do estheticians stand a lot at work" | Esthetician job listings | Physical demands of the job
"parking on a hill with no curb" | Parking tickets, curb info | How to park safely without a curb

Google reported BERT improved 1 in 10 searches in English


Question Answering with BERT 💬

Extractive QA: Find answer span in passage

Example

Context: "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."

Question: "In what country is Normandy located?"

Answer: France


from transformers import pipeline

# Load QA pipeline with BERT
qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")

# Ask question
result = qa_pipeline(
    question="In what country is Normandy located?",
    context="The Normans were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."
)

print(result)
# e.g. {'answer': 'France', 'score': 0.987, 'start': 104, 'end': 110}

BERT predicts start and end positions of the answer span!
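
Under the hood, the pipeline wraps a span-prediction head: the model produces a start logit and an end logit for every token, and the answer is the span between the two argmax positions. A minimal sketch of those raw outputs, using the same SQuAD-fine-tuned checkpoint as above (the context is shortened for brevity):

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(name)
model = BertForQuestionAnswering.from_pretrained(name)

question = "In what country is Normandy located?"
context = "The Normans gave their name to Normandy, a region in France."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = outputs.start_logits.argmax()   # index of the answer's first token
end = outputs.end_logits.argmax()       # index of the answer's last token
answer_ids = inputs["input_ids"][0][start:end + 1]
print(tokenizer.decode(answer_ids))     # "france"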


Named Entity Recognition 🏷️

Token-level classification task

Example

Input: "Apple Inc. is headquartered in Cupertino, California."

Output:

  • Apple Inc. → {ORGANIZATION}
  • Cupertino → {LOCATION}
  • California → {LOCATION}

from transformers import pipeline

# Load NER pipeline
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER")

# Extract entities
text = "Apple Inc. is headquartered in Cupertino, California."
entities = ner_pipeline(text)

for entity in entities:
    print(f"{entity['word']}: {entity['entity']} (score: {entity['score']:.2f})")

# Output:
# Apple: B-ORG (score: 0.99)
# Inc: I-ORG (score: 0.99)
# Cupertino: B-LOC (score: 0.99)
# California: B-LOC (score: 0.99)

Sentiment Analysis 😊😐😢

Sequence classification task

Examples
  • "This movie was absolutely amazing!" → {POSITIVE}
  • "The product broke after one week." → {NEGATIVE}
  • "The weather is cloudy today." → {NEUTRAL}

from transformers import pipeline

# Load sentiment analysis pipeline
# Note: this SST-2 model is binary (POSITIVE/NEGATIVE); it has no NEUTRAL label
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Analyze sentiments
texts = [
    "This movie was absolutely amazing!",
    "The product broke after one week.",
    "The weather is cloudy today."
]

for text in texts:
    result = sentiment_pipeline(text)[0]
    print(f"{text}")
    print(f"  → {result['label']} (confidence: {result['score']:.2f})\n")

Applications: Customer reviews, social media monitoring, brand sentiment


Semantic Similarity 🔗

Measuring sentence similarity with BERT embeddings


from transformers import BertTokenizer, BertModel
import torch
import torch.nn.functional as F

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def get_sentence_embedding(sentence):
    inputs = tokenizer(sentence, return_tensors='pt', padding=True, truncation=True)
    outputs = model(**inputs)
    # Use [CLS] token embedding as sentence representation
    return outputs.last_hidden_state[:, 0, :]

# Compare sentences
sent1 = "The cat is sleeping on the couch"
sent2 = "A feline is resting on the sofa"
sent3 = "The weather is nice today"

emb1 = get_sentence_embedding(sent1)
emb2 = get_sentence_embedding(sent2)
emb3 = get_sentence_embedding(sent3)

# Compute cosine similarities
sim_12 = F.cosine_similarity(emb1, emb2).item()
sim_13 = F.cosine_similarity(emb1, emb3).item()

print(f"Similarity (1-2): {sim_12:.3f}")  # High (paraphrases)
print(f"Similarity (1-3): {sim_13:.3f}")  # Low (different topics)
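
One caveat worth noting: the raw [CLS] vector of an un-fine-tuned BERT is a fairly weak sentence representation, so the absolute similarity values above should be read with caution. Mean pooling over the token embeddings (ignoring padding) is usually more reliable, and dedicated sentence encoders such as Sentence-BERT do better still. A mean-pooling variant of get_sentence_embedding, reusing the tokenizer and model defined above:

def get_sentence_embedding_mean(sentence):
    inputs = tokenizer(sentence, return_tensors='pt', padding=True, truncation=True)
    outputs = model(**inputs)
    token_embeddings = outputs.last_hidden_state           # (1, seq_len, hidden_dim)
    mask = inputs['attention_mask'].unsqueeze(-1).float()  # (1, seq_len, 1)
    # Average only over real (non-padding) tokens
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)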

Cognitive Neuroscience Perspective 🧠

How do brains and models process language?

Predictive Processing in the Brain:

  • Brain constantly predicts upcoming input
  • N400: Neural response to unexpected words
  • P600: Syntactic anomaly detection
  • Context shapes predictions
  • Prediction errors drive learning

Key Brain Regions:

  • Left IFG: Syntax processing
  • Left STG/MTG: Semantic processing
  • ATL: Conceptual knowledge

Predictive Processing in Models:

  • BERT: Predict masked words
  • GPT: Predict next word
  • Both use context to predict
  • Surprise = high loss
  • Gradient descent = learning

Similarities:

  • Both hierarchical
  • Both context-sensitive
  • Both predictive
  • Both learn from errors

References: Kuperberg & Jaeger (2016), Willems et al. (2016), Hagoort & Indefrey (2014)


Prediction in Brains vs. Language Models 🧠🤖

Parallels between neural and artificial systems

Phenomenon | Human Brain | Transformer Models
Surprise | N400 amplitude (EEG) | Cross-entropy loss
Hierarchy | sounds → words → sentences | tokens → phrases → meaning
Context | Prior discourse, world knowledge | Self-attention over sequence
Representation | Population coding (neurons) | Distributed embeddings (vectors)

# Concrete example: Surprise/N400 parallel (schematic pseudocode)
sentence_a = "I take my coffee with cream and sugar"  # Expected
sentence_b = "I take my coffee with cream and socks"  # Surprising

# Brain: N400 amplitude higher for "socks"
# Model: Higher loss for "socks"
loss_a = model.compute_loss("sugar", context)  # Low loss
loss_b = model.compute_loss("socks", context)  # High loss

# Both systems encode "surprisal" = -log P(word | context)
surprisal = -np.log(model.predict_prob("socks", context))
# Correlates with N400 amplitude in EEG studies!
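
The block above is schematic (model.compute_loss and model.predict_prob stand in for whatever scoring API a model exposes). A runnable version of the same idea uses BERT's fill-mask head; the targets argument restricts scoring to the two candidate completions:

import numpy as np
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

results = unmasker(
    "I take my coffee with cream and [MASK].",
    targets=["sugar", "socks"],
)
for r in results:
    surprisal = -np.log(r["score"])
    print(f"{r['token_str']:>6}: P = {r['score']:.4f}, surprisal = {surprisal:.2f}")
# "sugar" gets high probability (low surprisal); "socks" gets the opposite,
# mirroring the larger N400 response to the anomalous word.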

Question: Are these superficial analogies or deep connections?

Reference: Kuperberg & Jaeger (2016) - "What do we mean by prediction in language comprehension?"


Neural Encoding with Language Models 🔬

Can we predict brain activity from language models?


# Neural encoding experiment workflow (schematic)
import numpy as np
from transformers import BertModel, BertTokenizer

# 1. Participant reads sentences while in fMRI scanner
sentences = ["The dog chased the cat", "She opened the door", ...]
brain_activity = fmri_scanner.record(sentences)  # (n_sentences, n_voxels)

# 2. Extract BERT representations for same sentences
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert_embeddings = []
for sent in sentences:
    outputs = bert(**tokenizer(sent, return_tensors="pt"))
    # Use layer 8 (found to correlate best with semantic areas)
    bert_embeddings.append(outputs.hidden_states[8].mean(dim=1).squeeze(0).detach().numpy())
bert_embeddings = np.stack(bert_embeddings)  # (n_sentences, hidden_dim)

# 3. Train encoding model: BERT → Brain
from sklearn.linear_model import Ridge
encoder = Ridge().fit(bert_embeddings[:80], brain_activity[:80])
# 4. Predict brain activity for new sentences
predictions = encoder.predict(bert_embeddings[80:])
correlation = np.corrcoef(predictions.ravel(), brain_activity[80:].ravel())[0, 1]
# Correlation ~ 0.3-0.5 in language areas (significant!)

Key finding: BERT layer 8 best predicts semantic areas; layers 2-4 predict phonological areas

Reference: Caucheteux & King (2022) - "Brains and algorithms partially converge"


Discussion: What Does the Model "Understand"? 🤔

Does BERT understand language?

Evidence FOR understanding:

  • Captures syntax and semantics
  • Resolves ambiguity
  • Handles long-range dependencies
  • Generalizes to new examples
  • Predicts brain activity
  • Solves complex tasks

"If it acts like it understands, maybe it does?"

Evidence AGAINST understanding:

  • No grounding in physical world
  • No sensory experience
  • No social context
  • Brittle to adversarial examples
  • No common sense reasoning
  • Only pattern matching?

"Understanding requires more than statistical patterns"

Key Questions
  • What is the difference between understanding and correlation?
  • Can meaning exist without grounding?
  • Is human understanding fundamentally different?

Class Discussion: What do YOU think?


Adversarial Examples and Brittleness ⚠️

BERT can be fooled easily


from transformers import pipeline
classifier = pipeline("sentiment-analysis")

# Works correctly
classifier("This movie was absolutely wonderful!")
# → [{'label': 'POSITIVE', 'score': 0.9998}]

# Adding irrelevant negative words flips prediction!
classifier("This movie was absolutely wonderful! [SEP] bad bad bad bad")
# → [{'label': 'NEGATIVE', 'score': 0.9234}]  # WRONG!

# Synonym substitution can break it
classifier("The food was good")   # → POSITIVE (0.99)
classifier("The food was fine")   # → POSITIVE (0.72)  # Less confident
classifier("The food was ok")     # → NEGATIVE (0.51)  # WRONG!

# Typos cause problems
classifier("This is amazign!")    # Might work
classifier("Thsi si amzaign!")    # Likely wrong prediction
Implications for Deployment
  • Adversarial attacks: Malicious users can manipulate predictions
  • Robustness testing: Always test with perturbed inputs (a minimal sketch follows below)
  • Defense strategies: Adversarial training, input validation, ensemble methods

Key insight: Models learn statistical patterns, which can include spurious correlations
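
The robustness-testing point above can be made concrete with a small perturbation check; the perturbations here (a typo, appended distractor tokens, lowercasing) are illustrative choices, not a standard benchmark:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def perturbations(text):
    """Yield simple perturbed variants of the input text."""
    yield text                                      # original
    yield text.replace("wonderful", "wonderfull")   # typo
    yield text + " bad bad bad"                     # appended distractor tokens
    yield text.lower()                              # casing change

original = "This movie was absolutely wonderful!"
base_label = classifier(original)[0]["label"]

for variant in perturbations(original):
    pred = classifier(variant)[0]
    flag = "" if pred["label"] == base_label else "  <-- prediction flipped!"
    print(f"{pred['label']:>8} ({pred['score']:.2f})  {variant}{flag}")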


Limitations of Current Models ⚠️

Despite impressive performance, transformers have limitations:

  1. Quadratic Complexity
    • Self-attention scales as O(n²) with sequence length n (see the sketch after this list)
    • Limited context windows (512-4096 tokens)
    • Cannot process very long documents efficiently
  2. No True Understanding
    • Pattern matching vs. comprehension
    • Lack of common sense
    • No world model
  3. Data Efficiency
    • Requires massive training data
    • Humans learn language with much less data
    • Not biologically plausible
  4. Biases and Fairness
    • Inherits biases from training data
    • Can amplify stereotypes
    • Ethical concerns
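
To make the quadratic cost concrete, a back-of-the-envelope calculation of how the attention score matrix grows with context length (figures are per attention head, per layer, float32):

# Self-attention builds an n x n score matrix, so cost grows quadratically in length n
for n in [512, 2048, 4096, 16384]:
    scores = n * n                  # entries in the attention matrix
    mem_mb = scores * 4 / 1e6       # float32 bytes -> MB (one head, one layer)
    print(f"n = {n:>6}: {scores:>12,} scores (~{mem_mb:.1f} MB)")
# 512 tokens -> 262,144 scores; an 8x longer input (4096 tokens) needs 64x more.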

Bias in Language Models ⚖️

Models reflect and can amplify societal biases


from transformers import pipeline
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Gender bias in occupations
unmasker("The doctor said [MASK] would be late.")
# → [('he', 0.62), ('she', 0.18), ('it', 0.08), ...]

unmasker("The nurse said [MASK] would be late.")
# → [('she', 0.71), ('he', 0.15), ('it', 0.06), ...]

# Racial bias (different sentiment for names)
classifier = pipeline("sentiment-analysis")
classifier("Emily is a brilliant scientist.")  # POSITIVE: 0.98
classifier("Jamal is a brilliant scientist.")  # POSITIVE: 0.94  # Lower!

# Where does bias come from?
# Training data (books, Wikipedia) contains historical biases
# Model learns and sometimes amplifies these patterns
Mitigation Strategies
  1. Data-level: Balanced training corpora, counterfactual augmentation (sketched below)
  2. Model-level: Debiasing loss functions, fine-tuning on balanced data
  3. Output-level: Post-hoc filtering, human review for sensitive applications
  4. Evaluation: Regular bias audits using standardized benchmarks (WinoBias, etc.)
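
As a concrete illustration of the data-level strategy, counterfactual augmentation duplicates training examples with gendered terms swapped; the word list below is a toy example (real implementations use curated lexicons and handle casing and grammar properly):

# Toy counterfactual data augmentation: swap gendered terms to balance the corpus
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def counterfactual(sentence):
    words = sentence.split()
    return " ".join(SWAPS.get(w.lower(), w) for w in words)

example = "The doctor said he would review his notes"
augmented = [example, counterfactual(example)]
print(augmented[1])  # "The doctor said she would review her notes"
# Training on both versions weakens the doctor -> "he" association.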

Practical Tips for Working with Transformers 💡

  1. Start with Pre-trained Models
    • Don't train from scratch (too expensive!)
    • Use HuggingFace Model Hub
    • Choose appropriate model size
  2. Fine-tuning Best Practices (a config sketch follows this list)
    • Use small learning rate (1e-5 to 5e-5)
    • Add warmup steps
    • Monitor for overfitting
    • Freeze early layers if data is limited
  3. Computational Efficiency
    • Use mixed precision training (FP16)
    • Gradient accumulation for larger batch sizes
    • Consider DistilBERT for faster inference
    • Use FlashAttention when available
  4. Evaluation
    • Use task-specific metrics
    • Test on out-of-distribution data
    • Check for biases
    • Visualize attention for interpretability
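
The fine-tuning recommendations above translate fairly directly into a Trainer configuration. A hedged sketch: dataset loading and metrics are omitted, train_ds/val_ds are assumed tokenized datasets, and the hyperparameters are typical starting points rather than prescribed values.

from transformers import (AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Freeze the lower encoder layers if labeled data is limited
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

args = TrainingArguments(
    output_dir="bert-sentiment",
    learning_rate=2e-5,                 # small LR (1e-5 to 5e-5)
    warmup_steps=500,                   # LR warmup
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,      # effective batch size 32
    fp16=True,                          # mixed precision
    evaluation_strategy="epoch",        # monitor for overfitting
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds,   # assumed: tokenized training split
                  eval_dataset=val_ds)      # assumed: tokenized validation split
trainer.train()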

Deployment Considerations 🚀

Moving from research to production


# Example: Optimizing BERT for production deployment (simplified sketch)
from transformers import BertModel, BertTokenizer
import torch
import onnxruntime

# Step 1: Load model
model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Step 2: Quantize for speed (INT8 instead of FP32)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# Result: 4x smaller, 2x faster on CPU

# Step 3: Export to ONNX for production
dummy_input = tokenizer("Hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy_input["input_ids"], dummy_input["attention_mask"]),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
)

# Step 4: Use ONNX Runtime for inference
session = onnxruntime.InferenceSession("bert.onnx")
# ~1.5x faster than PyTorch, works on any platform
Optimization | Size | Latency | Quality
Original (FP32) | 420 MB | 50 ms | 100%
Quantized (INT8) | 110 MB | 25 ms | 99.5%
ONNX + Quantized | 110 MB | 20 ms | 99.5%
DistilBERT + ONNX | 65 MB | 12 ms | 97%

Future Directions 🔮

Where is the field heading?

  1. Longer Context
    • Efficient attention mechanisms (linear, sparse)
    • Models with 100K+ token context
    • Better long-document understanding
  2. Multimodal Models
    • Vision + Language (CLIP, DALL-E)
    • Audio + Language (Whisper)
    • Grounded understanding
  3. Better Pre-training
    • More efficient objectives
    • Curriculum learning
    • Continual learning
  4. Smaller, More Efficient Models
    • Better compression techniques
    • Lottery ticket hypothesis
    • Edge deployment
  5. Addressing Limitations
    • Debiasing and fairness
    • Robustness and adversarial training
    • Common sense reasoning
    • Interpretability and explainability

Encoder vs Decoder Models Revisited 🔄

Different models for different tasks

Encoder (BERT) | Decoder (GPT)
Understanding tasks: | Generation tasks:
• Classification | • Text completion
• NER, QA | • Dialogue
• Similarity | • Creative writing

Interesting observation:

  • Decoder-only models (GPT-3, LLaMA) can also do classification via prompting! (see the sketch below)
  • "In-context learning" blurs the distinction
  • Trend toward unified decoder-only architectures
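
A small illustration of that point: a decoder-only model can perform sentiment classification purely through its next-token predictions, with no task-specific head. The sketch below uses gpt2 only because it is small; a model this size gives rough results, and comparing the first subword token of each label is a simplification.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def classify(review):
    # Frame classification as next-word prediction
    prompt = f"Review: {review}\nSentiment (positive or negative):"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]                 # logits for the next token
    candidates = {label: tokenizer.encode(" " + label)[0]  # first subword of each label
                  for label in ["positive", "negative"]}
    return max(candidates, key=lambda lab: logits[candidates[lab]])

print(classify("This movie was absolutely wonderful!"))
print(classify("The product broke after one week."))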

Discussion Questions 💭

  1. Understanding vs. Pattern Matching:
    • Where do you draw the line?
    • Is there a test for "true" understanding?
    • Does it matter for applications?
  2. Brain-Model Parallels:
    • How useful are these comparisons?
    • What can neuroscience learn from AI?
    • What can AI learn from neuroscience?
  3. Bias and Fairness:
    • Who is responsible for addressing bias?
    • Can we ever have completely unbiased models?
    • How do we balance accuracy and fairness?
  4. Future of NLP:
    • Will encoder models remain relevant?
    • Are decoder-only models the future?
    • What's the next big breakthrough?

Assignment 4: Context-Aware Models 📝

Hands-on experience with transformers!

Tasks:

  1. Implement Attention Mechanism
    • Build scaled dot-product attention from scratch (a reference sketch appears below)
    • Visualize attention weights
  2. Fine-tune BERT
    • Load pre-trained BERT
    • Fine-tune on sentiment analysis
    • Compare to baseline models
  3. Analyze Contextual Embeddings
    • Extract embeddings for polysemous words
    • Visualize how context changes representations
    • Compare BERT vs Word2Vec
  4. Explore Different Architectures
    • Compare BERT (encoder) vs GPT (decoder)
    • Test on different tasks
    • Analyze strengths/weaknesses
  5. Research Component
    • Read one paper from references
    • Write brief summary & critical analysis

Due: Check course website for deadline
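
For Task 1, a minimal reference sketch of scaled dot-product attention (single head, optional masking), mainly as a reminder of the shapes involved:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V: (batch, seq_len, d_k). Returns (output, attention_weights)."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5      # (batch, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                # each row sums to 1
    return weights @ V, weights

# Tiny sanity check: 1 sentence of 5 tokens, 8-dim embeddings
x = torch.randn(1, 5, 8)
out, attn = scaled_dot_product_attention(x, x, x)      # self-attention: Q = K = V
print(out.shape, attn.shape)                           # (1, 5, 8) and (1, 5, 5)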


Summary: Weeks 5-6 🎯

What we learned:

  1. Evolution of Context
    • Seq2Seq → Attention → Transformers
    • From bottleneck to full parallelization
  2. Transformer Architecture
    • Self-attention, multi-head attention
    • Positional encoding, layer norm, residuals
    • Encoder-only (BERT), Decoder-only (GPT), Both (T5)
  3. BERT & Variants
    • Masked Language Modeling
    • Pre-train then fine-tune paradigm
    • RoBERTa, ALBERT, DistilBERT, ELECTRA
  4. Applications
    • Classification, NER, QA, similarity
    • Real-world impact (Google Search, etc.)
  5. Broader Implications
    • Brain-model parallels
    • Understanding vs. pattern matching
    • Limitations and future directions

Resources & Further Reading 📚

Key Papers:

  • Vaswani et al. (2017) - Attention Is All You Need
  • Devlin et al. (2019) - BERT: Pre-training of Deep Bidirectional Transformers
  • Liu et al. (2019) - RoBERTa
  • Sanh et al. (2019) - DistilBERT
  • Dao et al. (2022) - FlashAttention

Cognitive Neuroscience:

  • Hagoort & Indefrey (2014) - The neurobiology of language beyond single words
  • Kuperberg & Jaeger (2016) - What do we mean by prediction in language comprehension?
  • Willems et al. (2016) - Prediction during natural language comprehension

Tutorials:


Looking Forward in the Course 🔮

Where do we go from here?

Upcoming Topics:

  • Week 7: Decoder models and text generation (GPT family)
  • Week 8: Scaling laws and large language models
  • Week 9: Prompting, in-context learning, and instruction tuning
  • Week 10: Alignment, RLHF, and ethical considerations

The Journey Continues:

  • From understanding (BERT) to generation (GPT)
  • From supervised learning to few-shot learning
  • From narrow tasks to general-purpose models
  • From academic research to societal impact

The transformer revolution continues! 🚀


Questions? 🙋

Discussion Time

Topics to discuss:

  • BERT applications
  • Understanding vs. pattern matching
  • Cognitive neuroscience connections
  • Limitations and future work
  • Assignment 4 questions

Thank you! 🙏

See you in Week 7 for GPT and text generation!
