Assignment 4: Context-Aware Customer Service Chatbot

Due: February 6, 2026 at 11:59 PM EST
Timeline: 1 Week

Overview

In this assignment, you will build a sophisticated, context-aware customer service chatbot that uses modern transformer models and retrieval-augmented generation (RAG) techniques. Unlike traditional rule-based chatbots, your system will leverage semantic understanding to match customer queries with relevant knowledge base entries and generate contextually appropriate responses.

This is a 1-week assignment designed to be achievable with GenAI assistance (ChatGPT, Claude, GitHub Copilot). You can focus on system design, integration, and evaluation rather than getting bogged down in low-level implementation details.

You will implement a complete RAG pipeline that encodes customer queries, retrieves relevant knowledge base entries, and generates grounded responses. This assignment simulates a real-world application where customers need accurate, context-aware assistance, and where hallucinated or incorrect information could damage user trust.

Learning Objectives

By completing this assignment, you will develop the following skills:

  1. Semantic Understanding: Apply transformer-based models (BERT, Sentence-BERT) to encode text into meaningful vector representations
  2. Information Retrieval: Implement efficient semantic search using vector similarity and libraries like FAISS
  3. Retrieval-Augmented Generation (RAG): Combine retrieval and generation to produce grounded, factual responses
  4. Evaluation Design: Develop metrics to assess chatbot quality, including retrieval accuracy and response relevance
  5. System Architecture: Design and implement a complete end-to-end conversational AI system
  6. Baseline Comparison: Understand the importance of baselines by comparing against keyword-matching approaches
  7. Production Considerations: Handle edge cases, multi-turn conversations, and system scalability

Background

Context-Aware Language Understanding

Traditional customer service systems often rely on keyword matching or simple pattern recognition (like your Assignment 1 ELIZA chatbot). However, customers express the same need in many different ways: "I can't get into my account", "I forgot my password", and "help, my login isn't working" may all call for the same answer. Transformer-based models like BERT can understand that these queries are semantically similar, even when they share few keywords. This is achieved through:
  1. Contextualized Embeddings: BERT produces vector representations where semantically similar text has similar vectors
  2. Semantic Similarity: Using cosine similarity or other distance metrics to find relevant knowledge base entries
  3. Dense Retrieval: Unlike sparse keyword methods (TF-IDF, BM25), dense vector representations capture deeper semantic meaning
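
For intuition, here is a quick sanity check (a minimal sketch, assuming sentence-transformers is installed) showing that paraphrases with few shared keywords still land close together in embedding space:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

queries = [
    "I can't get into my account",    # paraphrase 1
    "How do I reset my password?",    # paraphrase 2 (few shared words)
    "What are your shipping rates?",  # unrelated query
]
embeddings = model.encode(queries, convert_to_tensor=True)

# Cosine similarity: the first two queries should score much higher
# with each other than either does with the shipping question.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high
print(util.cos_sim(embeddings[0], embeddings[2]))  # low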

Retrieval-Augmented Generation (RAG)

RAG systems combine the strengths of retrieval (finding relevant information) and generation (producing natural language). The typical pipeline:

  1. Encode: Convert the user query into a vector representation
  2. Retrieve: Find the most similar entries in your knowledge base
  3. Augment: Include retrieved context with the query
  4. Generate: Produce a response grounded in the retrieved information

This approach helps prevent hallucination and ensures responses are factually grounded in your knowledge base.
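
As a rough sketch of how the four steps fit together (the index, model, and kb_answers arguments are components you will build in the tasks below, and call_llm is a hypothetical placeholder for your generator):

def rag_answer(query, index, kb_answers, model, k=3):
    # 1. Encode: convert the user query into a vector
    query_vec = model.encode([query])

    # 2. Retrieve: find the k most similar knowledge-base entries
    distances, indices = index.search(query_vec, k)
    contexts = [kb_answers[i] for i in indices[0]]

    # 3. Augment: combine the retrieved context with the query
    prompt = "Context:\n" + "\n".join(contexts) + f"\n\nQuestion: {query}"

    # 4. Generate: produce a response grounded in the context
    return call_llm(prompt)  # placeholder for your generator (see Task 4)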

Key Papers and Concepts

See Resources and References at the end of this README for the key papers behind these techniques (BERT, Sentence-BERT, RAG, Dense Passage Retrieval, ColBERT).

Dataset

You will use a customer service FAQ dataset. We recommend one of the following options:

Option 1: Use an Existing Dataset

Use the customer_support_twitter dataset or similar customer service datasets from HuggingFace:

from datasets import load_dataset

# Load customer support conversations
dataset = load_dataset("salesken/customer_support_twitter")

Alternatively, explore other customer service datasets on HuggingFace.

Option 2: Create Your Own Knowledge Base

You can create a synthetic knowledge base for a specific domain (e-commerce, banking, tech support, etc.):

knowledge_base = [
    {
        "question": "How do I reset my password?",
        "answer": "To reset your password, click 'Forgot Password' on the login page. Enter your email address, and we'll send you a reset link. Follow the link to create a new password.",
        "category": "account_access"
    },
    {
        "question": "What is your return policy?",
        "answer": "We offer a 30-day return policy for most items. Products must be in original condition with tags attached. Refunds are processed within 5-7 business days.",
        "category": "returns"
    },
    # Add 100+ entries for a meaningful knowledge base
]
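
Whichever domain you choose, keep the fields you embed separate from the fields you return to the user; for example:

# Embed the questions; return the paired answers at query time.
questions = [entry["question"] for entry in knowledge_base]
answers = [entry["answer"] for entry in knowledge_base]
categories = [entry["category"] for entry in knowledge_base]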

Option 3: Web Scraping

Scrape FAQs from public customer support pages (ensure compliance with terms of service):

# Example: Parse FAQ pages from a website
import requests
from bs4 import BeautifulSoup

# Your scraping code here
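
A rough sketch under stated assumptions: a hypothetical FAQ page at example.com where each entry is a div with class "faq-item" (adjust the URL and CSS selectors to the actual site you scrape):

import requests
from bs4 import BeautifulSoup

url = "https://example.com/faq"  # hypothetical FAQ page
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

knowledge_base = []
for item in soup.select("div.faq-item"):  # hypothetical CSS selector
    question = item.select_one(".question").get_text(strip=True)
    answer = item.select_one(".answer").get_text(strip=True)
    knowledge_base.append({"question": question, "answer": answer})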

Requirements: Whichever option you choose, aim for a knowledge base of 100+ entries (see the note in Option 2).

Your Tasks

1. Build a Semantic Search System

Implement a semantic search system that can find relevant FAQ entries given a customer query.

Requirements:

a) Encode the Knowledge Base:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Your code here

b) Implement Efficient Search:
import faiss
import numpy as np

# Build FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
c) Query Processing: encode each incoming query and retrieve the top-k entries (a combined sketch of (a)-(c) follows).
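
Putting (a)-(c) together, a minimal sketch, assuming the questions and answers lists from the Dataset section (this uses the L2 index from the snippet above; see Common Pitfalls for the cosine-similarity variant):

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# a) Encode the knowledge base (FAISS expects float32)
kb_embeddings = model.encode(questions).astype(np.float32)

# b) Build the index
index = faiss.IndexFlatL2(kb_embeddings.shape[1])
index.add(kb_embeddings)

# c) Process a query: encode it and retrieve the top-k entries
def semantic_search(query, k=3):
    query_vec = model.encode([query]).astype(np.float32)
    distances, indices = index.search(query_vec, k)
    return [(questions[i], answers[i], float(d))
            for i, d in zip(indices[0], distances[0])]

print(semantic_search("I forgot my login credentials"))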

2. Implement a Baseline (Keyword Matching)

Create a simple baseline using traditional keyword matching to demonstrate the value of semantic search.

Requirements:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Your baseline implementation
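
One possible baseline sketch, using TF-IDF over the same questions and answers lists as the semantic system:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

vectorizer = TfidfVectorizer(stop_words="english")
kb_tfidf = vectorizer.fit_transform(questions)

def keyword_search(query, k=3):
    query_tfidf = vectorizer.transform([query])
    scores = cosine_similarity(query_tfidf, kb_tfidf)[0]
    top_k = np.argsort(scores)[::-1][:k]
    return [(questions[i], answers[i], float(scores[i])) for i in top_k]

# Paraphrases with few overlapping keywords should expose
# the baseline's weakness relative to semantic search.
print(keyword_search("I can't get into my account"))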

3. Build the Retrieval Mechanism

Develop a complete retrieval system that:

a) Handles Query Variations: retrieves the right entry however the query is phrased
b) Context Filtering: optionally restricts retrieval (e.g., by category)
c) Re-ranking (Advanced): re-scores top candidates with a more precise model (see the sketch below)
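
For (c), one common approach (not required) is to over-retrieve with the fast bi-encoder and re-score with a cross-encoder. A sketch reusing the semantic_search function from Task 1; the model name shown is one publicly available option, not a requirement:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank(query, candidates, k=3):
    # Score each (query, candidate question) pair jointly,
    # then keep the k highest-scoring candidates.
    scores = reranker.predict([(q, query) for q, _, _ in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)
    return [cand for _, cand in ranked[:k]]

# Over-retrieve with the bi-encoder, then refine the order.
candidates = semantic_search("How do I get my money back?", k=10)
print(rerank("How do I get my money back?", candidates, k=3))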

4. Generate Contextual Responses

Use the retrieved context to generate helpful responses.

Requirements:

a) Template-Based Generation (Minimum): (see the sketch after item (c) below)
b) LLM-Based Generation (Recommended):
# Example prompt structure
def generate_response(query, retrieved_contexts):
    prompt = f"""You are a helpful customer service assistant.

Customer Question: {query}

Relevant Information: {retrieved_contexts}

Provide a helpful, accurate response based on the information above. Do not make up information not present in the context."""

    response = call_llm(prompt)  # call your LLM here; call_llm is a placeholder
    return response

c) Response Quality: ground every response in the retrieved context and handle no-match cases gracefully.
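
A minimal sketch of option (a), reusing semantic_search from Task 1: it wraps the top retrieved answer in a template and falls back gracefully when nothing scores well enough (the distance threshold is illustrative, not prescribed):

def template_response(query, k=1, max_distance=1.0):  # threshold is illustrative
    results = semantic_search(query, k=k)
    # With L2 distance, lower is better; a large distance means no good match.
    if not results or results[0][2] > max_distance:
        return ("I'm not sure about that. Could you rephrase, "
                "or contact a human agent?")
    question, answer, _ = results[0]
    return f"Here's what I found about '{question}':\n\n{answer}"

print(template_response("What's your refund policy?"))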

5. Handle Multi-Turn Conversations

Extend your system to maintain context across multiple conversation turns.

Requirements:

a) Conversation State:
b) Context Integration: (a simple sketch of (a) and (b) follows the example flow)
c) Example Multi-Turn Flow:

User: "I want to return an item"
Bot: "Our return policy allows returns within 30 days..."

User: "How do I start the process?"
Bot: [Uses context that this is about returns]

User: "What about shipping costs?"
Bot: [Understands this relates to return shipping]

6. Evaluate Response Quality

Develop comprehensive evaluation metrics for your system.

Required Metrics:

a) Retrieval Metrics: e.g., recall@k and mean reciprocal rank (a sketch follows this list)
b) Response Quality:
c) Baseline Comparison:
d) Error Analysis:
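
For (a), a minimal sketch, reusing the semantic_search and keyword_search functions from the earlier sketches with a small hand-labeled test set (the example query and gold index are illustrative):

def evaluate_retrieval(test_set, search_fn, k=3):
    """test_set: list of (query, gold_index) pairs labeled by hand."""
    hits, reciprocal_ranks = 0, []
    for query, gold_index in test_set:
        retrieved = [questions.index(q) for q, _, _ in search_fn(query, k=k)]
        if gold_index in retrieved:
            hits += 1
            reciprocal_ranks.append(1.0 / (retrieved.index(gold_index) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return {"recall@k": hits / len(test_set),
            "mrr": sum(reciprocal_ranks) / len(test_set)}

test_set = [("I forgot my login credentials", 0)]  # expand to 30+ labeled queries
print(evaluate_retrieval(test_set, semantic_search))  # semantic system
print(evaluate_retrieval(test_set, keyword_search))   # keyword baseline (c)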

7. Advanced Features (Optional Bonus)

Implement one or more of these for extra credit:

Technical Requirements

Required Libraries

# Core ML/NLP
transformers>=4.30.0
sentence-transformers>=2.2.0
torch>=2.0.0

# Vector Search
faiss-cpu>=1.7.4  # or faiss-gpu for GPU support

# Traditional IR (baseline)
scikit-learn>=1.3.0
rank-bm25>=0.2.2

# Data Handling
datasets>=2.14.0
pandas>=2.0.0
numpy>=1.24.0

# Visualization
matplotlib>=3.7.0
seaborn>=0.12.0
plotly>=5.14.0

# Optional: LLM Integration
openai>=0.27.0  # if using OpenAI
# or use Ollama for local LLMs
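
In a Colab notebook, you can install the core packages at the top, for example:

!pip install transformers sentence-transformers faiss-cpu rank-bm25 datasets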

Sentence Encoders (choose one or compare multiple): e.g., all-MiniLM-L6-v2, as used above
Cross-Encoders (for re-ranking): e.g., cross-encoder/ms-marco-MiniLM-L-6-v2
Generation Models (optional): e.g., an API model (OpenAI) or a local model via Ollama

Computational Requirements

The entire assignment should run in Google Colaboratory with a free tier GPU and is designed to be completable in 1 week with GenAI assistance.

Deliverables

Submit a Google Colaboratory notebook that includes:

1. Code Implementation (60%)

2. Documentation (20%)

3. Evaluation and Analysis (15%)

4. Examples and Demo (5%)

Required Sections in Notebook

  1. Introduction: Overview of your system
  2. Data Loading: Load and explore the knowledge base
  3. Semantic Search Implementation: Encoder model and FAISS
  4. Baseline Implementation: TF-IDF or BM25
  5. Response Generation: Template or LLM-based
  6. Multi-Turn Handling: Conversation state management
  7. Evaluation: Metrics, comparisons, and analysis
  8. Examples: Interactive demos
  9. Conclusion: Findings, limitations, future improvements

Evaluation Criteria

Your assignment will be graded on the following criteria:

Technical Implementation (40 points)

Evaluation and Analysis (25 points)

Code Quality and Documentation (20 points)

Examples and Presentation (10 points)

Creativity and Innovation (5 points)

Total: 100 points

Grading Rubric

Tips for Success

Getting Started (1-Week Timeline)

  1. Start Simple: Begin with a small knowledge base (20-30 FAQs) to test your pipeline
  2. Incremental Development: Build and test each component separately before integration
  3. Use Examples: Work through concrete examples at each step
  4. Validate Early: Check that embeddings and retrieval make sense before moving to generation
  5. Leverage GenAI: Use ChatGPT, Claude, or GitHub Copilot to accelerate implementation. Ask for help understanding libraries, debugging errors, and optimizing code.

Common Pitfalls to Avoid

  1. Ignoring Normalization: Normalize embeddings for cosine similarity
  2. Wrong Distance Metric: FAISS L2 distance requires normalized vectors for cosine similarity, or use IndexFlatIP (see the snippet after this list)
  3. Memory Issues: Batch embedding generation for large knowledge bases
  4. Overfitting to Examples: Test on diverse, unseen queries
  5. Hallucination: Always ground responses in retrieved context
  6. Ignoring Edge Cases: Handle no-match scenarios gracefully
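
To address pitfalls (1) and (2) concretely, a minimal variant of the Task 1 index (assuming the model and questions defined earlier) that normalizes vectors and uses an inner-product index, since inner product over unit vectors equals cosine similarity:

import faiss
import numpy as np

embeddings = model.encode(questions).astype(np.float32)
faiss.normalize_L2(embeddings)  # in-place L2 normalization

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine
index.add(embeddings)

query_vec = model.encode(["I can't log in"]).astype(np.float32)
faiss.normalize_L2(query_vec)
scores, indices = index.search(query_vec, 3)  # scores are cosine similarities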

Debugging Strategies

  1. Print Similarities: Inspect actual similarity scores to understand retrieval
  2. Manual Inspection: Look at retrieved documents for sample queries
  3. Embedding Visualization: Use t-SNE/UMAP to visualize embedding space
  4. Start Small: Debug with 10 FAQs before scaling to 100+

Performance Optimization

  1. Cache Embeddings: Don't re-encode the knowledge base every time (see the snippet after this list)
  2. Batch Processing: Encode multiple queries at once
  3. FAISS GPU: Use GPU-accelerated FAISS for large knowledge bases
  4. Model Selection: Balance model size with quality needs
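
For (1) and (2), a simple caching pattern (the cache filename is arbitrary; model and questions are as defined earlier):

import os
import numpy as np

CACHE = "kb_embeddings.npy"  # arbitrary cache filename

if os.path.exists(CACHE):
    embeddings = np.load(CACHE)
else:
    # Batch-encode once, then reuse across notebook runs.
    embeddings = model.encode(questions, batch_size=64, show_progress_bar=True)
    np.save(CACHE, embeddings)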

Going Beyond Requirements

Resources and References

Key Papers

  1. BERT: Devlin et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805
  2. Sentence-BERT: Reimers & Gurevych (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". arXiv:1908.10084
  3. RAG: Lewis et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". arXiv:2005.11401
  4. Dense Passage Retrieval: Karpukhin et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering". arXiv:2004.04906
  5. ColBERT: Khattab & Zaharia (2020). "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT". arXiv:2004.12832

Documentation and Tutorials

Code Examples

Datasets

Tools and Libraries

Additional Reading

Submission Guidelines

GitHub Classroom Submission

This assignment is submitted via GitHub Classroom. Follow these steps:

  1. Accept the assignment: Click the assignment link provided in Canvas or by your instructor
  2. Clone your repository:
   git clone https://github.com/ContextLab/customer-service-bot-llm-course-YOUR_USERNAME.git
  3. Complete your work:
    • Work in Google Colab, Jupyter, or your preferred environment
    • Save your notebook to the repository
  4. Commit and push your changes:
   git add .
   git commit -m "Complete customer service chatbot assignment"
   git push
  5. Verify submission: Check that your latest commit appears in your GitHub repository before the deadline

Deadline: February 6, 2026 at 11:59 PM EST

Notebook Requirements

  1. Runtime: The notebook must run from start to finish without errors
  2. Permissions: Ensure the notebook is accessible (include in your GitHub repository)
  3. Dependencies: All required packages should be installed in the notebook
  4. Data: Include code to automatically download any required datasets
  5. Output: Keep cell outputs visible in your submission

Before Submission Checklist

Academic Integrity

You are encouraged to use GenAI tools (ChatGPT, Claude, GitHub Copilot) to accelerate your implementation, as described above. You must still understand, and be able to explain, every part of the work you submit. Violations of academic integrity will result in a failing grade for the assignment and potential course-level consequences.

Questions?

If you have questions about the assignment:
  1. Check this README thoroughly
  2. Review the resources and references section
  3. Post questions in the course forum
  4. Attend office hours
  5. Email the instructor/TA with specific questions

Good luck, and have fun building your customer service chatbot!

This assignment is designed to give you hands-on experience with modern NLP techniques used in production systems. The skills you develop here—semantic search, RAG, and evaluation—are directly applicable to real-world AI applications.