Due: March 9, 2026 at 11:59 PM EST

Final Project: Your LLM Research Capstone

Overview

The final project is your opportunity to synthesize everything you've learned throughout this course into an ambitious, cutting-edge project that showcases your mastery of large language models and conversational AI. This is a capstone experience where you'll tackle a challenging problem that would be nearly impossible in a traditional course—but is achievable with the help of modern GenAI tools like Claude, ChatGPT, and GitHub Copilot.

Unlike the structured problem sets, this project is open-ended and research-oriented. You have the creative freedom to explore novel applications, replicate and extend published research, build innovative systems, or conduct rigorous evaluations of LLM capabilities. The goal is to produce work that is intellectually sophisticated, technically ambitious, and potentially publishable or deployable in the real world.

This is your chance to:

You will work in teams of 2-3 students over 4 weeks (Weeks 7-10), leveraging your combined skills and interests to tackle problems that would be challenging for any individual. The collaborative nature of this project mirrors real-world research and industry practices.

Learning Objectives

By completing this project, you will:

  1. Synthesize course concepts: Integrate knowledge from all assignments—from rule-based systems (ELIZA) to modern transformers (GPT), from embeddings to fine-tuning.
  2. Design and execute research: Formulate research questions, design methodologies, implement solutions, and critically evaluate results.
  3. Master LLM tools and frameworks: Work extensively with Hugging Face, OpenAI/Anthropic APIs, PyTorch, and other modern ML frameworks.
  4. Leverage GenAI for ambitious projects: Use AI coding assistants to implement complex systems that would traditionally take months.
  5. Communicate technical work: Present findings clearly through code, visualizations, presentations, and written reports.
  6. Think critically about AI: Evaluate limitations, biases, ethical implications, and societal impacts of your work.
  7. Collaborate effectively: Work in a team to divide tasks, integrate components, and produce cohesive results.

Project Scope

Your project should represent approximately 4 weeks of focused work for a team of 2-3 students (Weeks 7-10). This translates to roughly 15-20 hours per person per week, though the actual time investment may vary based on your background and the project's complexity.
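Over the four weeks, those estimates add up as follows. This is plain arithmetic on the figures above, not an additional requirement:

```latex
\begin{aligned}
\text{per person} &= 4\ \text{weeks} \times (15\text{--}20)\ \text{h/week} = 60\text{--}80\ \text{h} \\
\text{team of 2}  &= 2 \times (60\text{--}80)\ \text{h} = 120\text{--}160\ \text{h} \\
\text{team of 3}  &= 3 \times (60\text{--}80)\ \text{h} = 180\text{--}240\ \text{h}
\end{aligned}
```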

Scale and Ambition

With GenAI assistance, you can tackle projects that would have been impossible just a few years ago. However, be realistic about scope:

Appropriate scale:

Too small:

Too ambitious (probably):

Finding the Right Balance

A good rule of thumb: your project should involve at least 3-4 substantial technical components (e.g., custom data processing + model fine-tuning + evaluation framework + analysis), where each component requires thoughtful design and implementation.

Project Types

Your project should fall into one or more of the following categories:

1. Novel Applications of LLMs

Design and implement a new application that leverages LLMs in creative ways. Focus on solving a real problem or creating something genuinely useful. Examples:

2. Research Replication and Extension

Reproduce results from a recent research paper, then extend the work with new experiments, datasets, or analyses. Examples:

3. New Model Architectures or Training Approaches

Explore modifications to existing architectures or novel training procedures. This is advanced but achievable with GenAI help. Examples:

4. Analysis and Evaluation of LLMs

Conduct rigorous empirical studies of LLM capabilities, limitations, or behaviors. Examples:

5. Multimodal and Agent-Based Systems

Build systems that integrate multiple modalities (text, image, audio) or implement autonomous agents with reasoning and tool use. Examples:

6. Safety, Alignment, and Ethics

Investigate critical issues around AI safety, bias, fairness, or societal impact. Examples:

Project Ideas

Here are 22 diverse project ideas to inspire you. These span different difficulty levels and interest areas. Feel free to use them as starting points, combine elements, or create something entirely new.

RAG and Retrieval Systems

  1. Domain-Specific RAG System: Build a retrieval-augmented generation system for a specialized domain (e.g., medical literature, legal documents, historical archives). Implement custom embeddings, retrieval strategies, and evaluate against baselines.
  2. Conversational Search Engine: Create a multi-turn conversational search system that maintains context, asks clarifying questions, and provides source citations. Compare different retrieval and reranking strategies.
  3. Hybrid Retrieval Architecture: Combine dense embeddings, sparse retrieval (BM25), and knowledge graphs for enhanced retrieval. Systematically evaluate each component's contribution.

Fine-Tuning and Adaptation

  1. Low-Resource Language Adaptation: Fine-tune an LLM for a low-resource language or dialect using limited data. Explore techniques like cross-lingual transfer, data augmentation, and parameter-efficient fine-tuning.
  2. Scientific Writing Assistant: Fine-tune a model to help researchers write better papers by learning from high-quality publications in a specific field. Implement style transfer and citation generation.
  3. Personalized Conversation Models: Develop methods to adapt dialogue models to individual user preferences and communication styles while preserving helpfulness and safety.

Dialogue and Interaction

  1. Multi-Turn Reasoning Dialogues: Build a conversational system that can engage in extended reasoning tasks (math problems, logic puzzles, strategic planning) while explaining its thought process.
  2. Empathetic Support Chatbot: Create a psychologically informed chatbot for emotional support or mental health check-ins. Evaluate empathy, safety, and effectiveness using human evaluation and automated metrics.
  3. Socratic Tutoring System: Implement an educational assistant that uses Socratic questioning to guide students through problem-solving rather than providing direct answers.

Code and Technical Applications

  1. Code Documentation Generator: Build a system that generates comprehensive documentation for code repositories, including docstrings, README files, and tutorials. Test on diverse codebases and programming languages.
  2. Bug Detection and Repair: Develop an LLM-based system for finding and fixing bugs in code. Create a benchmark and compare against existing tools.
  3. Technical Interview Practice System: Create an interactive system that conducts realistic technical interviews, provides feedback, and adapts difficulty based on performance.

Creative and Cultural Applications

  1. Interactive Fiction Engine: Build a system for collaborative storytelling where the model maintains narrative coherence, character consistency, and player agency across long interactions.
  2. Cross-Cultural Communication Assistant: Develop a tool that helps bridge cultural communication gaps by explaining context, suggesting phrasing, and highlighting potential misunderstandings.
  3. Historical Dialogue Simulation: Create historically accurate conversational agents representing figures from different time periods, trained on period-appropriate texts and evaluated for authenticity.

Cognitive Science and Modeling

  1. LLMs as Cognitive Models: Systematically evaluate whether LLMs exhibit human-like cognitive biases, heuristics, and reasoning patterns. Compare model behavior to psychological literature.
  2. Language Acquisition Simulation: Investigate how LLMs "learn" linguistic structures by analyzing their learning trajectories and comparing them to child language acquisition data.
  3. Theory of Mind in LLMs: Design experiments to test whether models exhibit theory of mind capabilities—understanding beliefs, intentions, and mental states of others.

Interpretability and Analysis

  1. Mechanistic Interpretability Study: Use techniques like attention visualization, activation probing, and causal interventions to understand how models perform specific tasks (e.g., factual recall, arithmetic, syntax).
  2. Failure Mode Taxonomy: Systematically categorize and analyze LLM failures across multiple models and tasks. Develop predictors for when failures occur and potential mitigations.

Advanced Reasoning and Agents

  1. Multi-Step Planning Agent: Build an agent that can decompose complex goals, create plans, execute actions with tools, and adapt based on feedback. Test on challenging benchmarks.
  2. Scientific Hypothesis Generator: Create a system that reads research papers, identifies gaps, and proposes testable hypotheses. Evaluate novelty and plausibility with domain experts.

Requirements

Your final project must include four key deliverables:

1. Project Proposal (Due Week 8)

Submit a 1-2 page proposal that includes:

The proposal helps you clarify your ideas and allows us to provide early feedback to ensure your project is appropriately scoped.

2. Code Implementation (Due Week 10)

Submit a well-documented Jupyter notebook that:

Code quality matters: Your notebook should be something you'd be proud to share publicly or include in a portfolio.

3. Presentation (Due Week 10)

Create a 10-12 minute video presentation and present it in class:

The presentation should be accessible to your classmates—assume they're smart but may not know the specific technical details of your domain.

4. Writeup (Due Week 10)

Submit a 2-5 page writeup (not including references or appendices) structured as:

  • Introduction (0.5-1 page)
  • Approach (1-2 pages)
  • Results (1-1.5 pages)
  • Discussion (0.5-1 page)
  • References

Write clearly and concisely. Think of this as a mini research paper suitable for a workshop or course anthology.

Timeline and Milestones

Week 7: Team Formation and Brainstorming

Week 8: Proposal, Feedback, and Implementation Starts

Week 9: Core Development and Iteration

Week 10: Final Push and Presentations

Grading Rubric

Your project will be evaluated holistically, with the final grade broken down as:

Proposal (10%)

Implementation (40%)

Results and Analysis (20%)

Presentation (15%)

Writeup (10%)

Teamwork and Collaboration (5%)
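The breakdown can be read as a weighted average. This reading assumes each component is scored on the same 0-100 scale, which the rubric does not state explicitly, and it sets aside the holistic judgment and any bonus points. The symbols S (component scores) and G (final grade) are notation introduced here for illustration only:

```latex
\begin{aligned}
G &= 0.10\,S_{\text{proposal}} + 0.40\,S_{\text{impl}} + 0.20\,S_{\text{results}}
   + 0.15\,S_{\text{pres}} + 0.10\,S_{\text{writeup}} + 0.05\,S_{\text{team}} \\
  &\text{where } 0.10 + 0.40 + 0.20 + 0.15 + 0.10 + 0.05 = 1.00
\end{aligned}
```

Under this reading, implementation and results together account for 60% of the grade.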

Note: Exceptional projects that go above and beyond expectations may receive bonus points. Projects that demonstrate novel insights, publishable quality, or significant real-world applicability will be highlighted.

Resources and Support

Course Materials

Review all assignments and lectures—they contain techniques, tools, and insights directly applicable to your project:

Key Tools and Frameworks

Finding Datasets

Recent Research

Stay current with cutting-edge work:

Getting Help

Tips for Success

Start Early

Don't underestimate the time required. Even with GenAI assistance, projects take longer than expected. Starting early gives you time to iterate, handle unexpected challenges, and produce polished results.

Embrace Failure and Iteration

Not everything will work on the first try. That's normal in research. Build in time to try multiple approaches, debug issues, and refine your methods based on what you learn.

Use GenAI Aggressively

This is where you can be truly ambitious. Use tools like Claude, ChatGPT, and GitHub Copilot to:

Remember: You're responsible for understanding and validating what AI generates. Always test, verify, and critically evaluate AI-generated code.

Scope Appropriately

It's better to do one thing really well than three things poorly. If you find your project is too ambitious, narrow the focus rather than delivering incomplete work.

Communicate Clearly

Technical sophistication is important, but so is clear communication. Make sure your code, writeup, and presentation are understandable to intelligent non-experts.

Document Everything

Keep track of experiments, decisions, and results as you go. This makes writing the final report much easier and helps you understand what worked and why.

Leverage Your Team's Strengths

Different team members may excel at different aspects (coding, experimentation, writing, visualization). Divide labor strategically but ensure everyone understands the full project.

Make It Yours

Choose a project you're genuinely excited about. Passion and curiosity will sustain you through challenges and lead to better results.

Example Projects from Prior Years

Note: As this is a new course, we don't yet have prior student projects to showcase. However, here are examples of the caliber of work we hope to see:

Your project can match or exceed these examples. With modern tools and GenAI assistance, sophisticated projects are within reach.

Submission Guidelines

GitHub Classroom Submission

This assignment is submitted via GitHub Classroom. Follow these steps:

  1. Accept the assignment: Click the assignment link provided in Canvas or by your instructor.
  2. Clone your repository and move into it:
   git clone https://github.com/ContextLab/final-project-llm-course-YOUR_USERNAME.git
   cd final-project-llm-course-YOUR_USERNAME
  3. Complete your work:
    • Work in Google Colab, Jupyter, or your preferred environment
    • Save your notebook, writeup, and presentation materials to the repository
  4. Commit and push your changes:
   git add .
   git commit -m "Complete final project"
   git push
  5. Verify submission: Check that your latest commit appears in your GitHub repository before the deadline.
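You can also do the verification step from the command line by comparing your local `HEAD` with the upstream branch it tracks. The function below is a minimal sketch and is not part of the course tooling. The name `verify_pushed` is hypothetical. The sketch assumes you run it inside your cloned repository, on a branch that tracks `origin`, which is the default after `git clone`:

```shell
# Hypothetical helper (not course tooling): checks that the local HEAD
# matches the upstream branch on origin and that nothing is left uncommitted.
verify_pushed() {
  # Refresh our view of the remote without touching local files.
  git fetch --quiet origin || return 2

  # Any modified, staged, or untracked file shows up here.
  if [ -n "$(git status --porcelain)" ]; then
    echo "Uncommitted or untracked changes present: commit them first."
    return 1
  fi

  local local_head remote_head
  local_head=$(git rev-parse HEAD)
  # '@{u}' resolves to the tracking branch, e.g. origin/main.
  remote_head=$(git rev-parse '@{u}') || return 2

  if [ "$local_head" = "$remote_head" ]; then
    echo "OK: $(git log -1 --format='%h %s') is on the remote."
    return 0
  fi
  echo "Local and remote differ: pull teammates' commits and/or push yours."
  return 1
}
```

Run `verify_pushed` from inside the repository after `git push`. A return status of 0 means your latest commit is on the remote. Checking the repository page on GitHub is still the final confirmation.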

Deadlines

Technical Requirements

Google Colaboratory Compatibility

Your project must run in Google Colab. This means:

Model and Data Accessibility

Reproducibility

Documentation

Final Thoughts

This final project is your chance to create something remarkable. You've learned about the full spectrum of language models—from simple pattern matching to sophisticated transformers. You've implemented these systems, evaluated them, and thought critically about their capabilities and limitations.

Now it's time to apply that knowledge to a problem you care about. With GenAI tools at your disposal, you can tackle projects that would have been impossible for a student team just a few years ago. Take advantage of this unique moment in history.

The best projects will:

We're excited to see what you create. Good luck, and remember: ambitious goals combined with thoughtful execution lead to extraordinary results.

Questions? Use the course Discord or attend office hours. We're here to help you succeed.

Ready to start? Form your team, start brainstorming, and prepare to build something amazing.