Assignment 4: Context-Aware Customer Service Chatbot
Timeline: 1 Week
Overview
In this assignment, you will build a customer service chatbot that uses retrieval-augmented generation (RAG) to answer questions about a fictional company called TechCo. Your system will:
- Retrieve relevant FAQ entries using semantic search (sentence embeddings + cosine similarity)
- Compare semantic search against a TF-IDF keyword baseline
- Generate helpful responses based on the retrieved context
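
The retrieval step above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: the 3-dimensional vectors are hand-made stand-ins for real sentence embeddings, and the names `cosine_similarity`, `top_k`, and the `faq_*` IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, faq_vecs, k=3):
    """Return (faq_id, score) pairs for the k most similar FAQ vectors."""
    scored = [
        (faq_id, cosine_similarity(query_vec, vec))
        for faq_id, vec in faq_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (assumption): in the assignment these vectors would come from
# a sentence embedding model rather than being written by hand.
toy_faq_vecs = {
    "faq_billing": [0.9, 0.1, 0.0],
    "faq_shipping": [0.0, 0.2, 0.9],
    "faq_account": [0.1, 0.9, 0.1],
}
toy_query = [0.8, 0.2, 0.1]

results = top_k(toy_query, toy_faq_vecs, k=2)
for faq_id, score in results:
    print(f"{faq_id}: {score:.3f}")
```

With real embeddings the structure stays the same: embed the query, score it against every FAQ embedding, and keep the highest-scoring entries.
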
Learning Objectives
- Apply sentence embeddings for semantic search
- Implement cosine similarity retrieval (the same approach as in Lecture 17)
- Compare semantic search vs. keyword matching baselines
- Build a working RAG-style chatbot with template-based response generation
- Evaluate retrieval quality using Accuracy@k and Mean Reciprocal Rank (MRR)
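
For reference, the two retrieval metrics can be sketched as follows. This assumes one common definition of each: a query counts as a hit at k if any relevant FAQ ID appears in its top k results, and its reciprocal rank is 1 divided by the position of the first relevant result (0 if none is retrieved). The notebook's exact definitions may differ, and the function names and toy IDs here are hypothetical.

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant ID in the top k."""
    if not ranked_ids:
        return 0.0
    hits = 0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        if any(faq_id in relevant for faq_id in ranked[:k]):
            hits += 1
    return hits / len(ranked_ids)


def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant result (0 if none found)."""
    if not ranked_ids:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        for rank, faq_id in enumerate(ranked, start=1):
            if faq_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)


# Toy example (assumption): three queries, each with a ranked list of
# retrieved FAQ IDs and a set of ground-truth relevant IDs.
toy_rankings = [[3, 7, 1], [5, 2, 9], [4, 8, 6]]
toy_relevant = [{3}, {9}, {10}]

acc1 = accuracy_at_k(toy_rankings, toy_relevant, k=1)
acc3 = accuracy_at_k(toy_rankings, toy_relevant, k=3)
mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
print(f"Accuracy@1={acc1:.3f}  Accuracy@3={acc3:.3f}  MRR={mrr:.3f}")
```

In the toy data the first query is answered at rank 1, the second at rank 3, and the third not at all, so MRR averages 1, 1/3, and 0.
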
Dataset
We provide a 50-entry FAQ knowledge base for TechCo covering: account access, billing, product features, shipping and returns, technical support, and general information.
We also provide 20 test queries with ground-truth relevant FAQ IDs for evaluation.
The notebook downloads the data automatically.
Your Tasks
Required (100 points)
| Component | Points | Description |
|---|---|---|
| Semantic Search | 20 | Implement semantic_search() using sentence-transformers |
| TF-IDF Baseline | 10 | Implement tfidf_search() using scikit-learn |
| Response Generation | 15 | Implement generate_response() with confidence logic |
| Evaluation | 20 | Implement evaluate_retrieval() and create bar chart |
| Error Analysis | 10 | 8+ examples with explanations |
| Interactive Demo | 5 | Implement chat() function |
| Code Quality | 10 | Clean, well-commented code |
| Reflection Essay | 10 | 300-500 word reflection |
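
As an illustration of what "confidence logic" in template-based response generation could look like, here is a minimal sketch. The thresholds, the wording of the templates, and the function name `generate_response_sketch` are all assumptions made for this example; the notebook defines the actual signature and expected behavior of generate_response().

```python
# Hypothetical thresholds on the top similarity score; tune on real data.
HIGH_CONFIDENCE = 0.7
LOW_CONFIDENCE = 0.4

FALLBACK = (
    "I'm sorry, I couldn't find an answer to that. "
    "Please contact TechCo support."
)


def generate_response_sketch(retrieved):
    """Pick a response template based on the best retrieval score.

    `retrieved` is a list of (answer_text, score) pairs sorted by
    descending score (an assumed input format).
    """
    if not retrieved or retrieved[0][1] < LOW_CONFIDENCE:
        return FALLBACK  # nothing relevant enough to show
    answer, score = retrieved[0]
    if score >= HIGH_CONFIDENCE:
        return answer  # confident: answer directly
    return "I think this might help: " + answer  # uncertain: hedge


confident = generate_response_sketch([("Reset it from the login page.", 0.85)])
hedged = generate_response_sketch([("Returns are free for 30 days.", 0.55)])
fallback = generate_response_sketch([("Unrelated entry.", 0.12)])
print(confident, hedged, fallback, sep="\n")
```

The point of the tiers is that a low similarity score is a signal to admit uncertainty rather than return a loosely related FAQ entry as if it were the answer.
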
Optional Bonus (up to +8 points)
| Bonus | Points | Description |
|---|---|---|
| Multi-Turn Conversation | +2 | Implement ConversationManager for context tracking |
| LLM-Based Generation | +3 | Use FLAN-T5 for response generation |
| Hybrid Search | +2 | Combine TF-IDF and semantic scores |
| UMAP Visualization | +1 | Visualize the FAQ embedding space |
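
One possible way to combine the two score types for the hybrid search bonus is a weighted sum after min-max normalization, sketched below. The weighting scheme, the `alpha` parameter, and the function names are assumptions for illustration; other combination strategies are equally valid.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(semantic, tfidf, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the semantic part."""
    if len(semantic) != len(tfidf):
        raise ValueError("score lists must have the same length")
    sem_n = min_max(semantic)
    tfidf_n = min_max(tfidf)
    return [alpha * s + (1 - alpha) * t for s, t in zip(sem_n, tfidf_n)]


# Toy scores (assumption) for three FAQ entries against one query.
toy_semantic = [0.82, 0.40, 0.61]
toy_tfidf = [0.10, 0.55, 0.00]

combined = hybrid_scores(toy_semantic, toy_tfidf, alpha=0.7)
best = max(range(len(combined)), key=combined.__getitem__)
print(combined, best)
```

Normalizing first matters because cosine similarities over embeddings and over TF-IDF vectors tend to fall in different ranges, so a raw sum would let one method dominate.
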
Technical Requirements
- The notebook installs all required libraries
- CPU is sufficient (Colab free tier works)
- No API keys required
- No external servers required
Submission
Push your completed notebook to your GitHub Classroom repository before the deadline.
Checklist
- All required functions implemented (no NotImplementedError)
- Evaluation metrics computed and plotted as a bar chart
- Error analysis with 8+ examples
- Reflection written (300-500 words)
- All cells execute without errors
Questions?
- Check this README and notebook markdown cells
- Review Lecture 17 slides
- Post in the course Discord
- Attend office hours