Programming basics

PSYC 81.09: Storytelling with Data

Jeremy R. Manning
Dartmouth College
Spring 2026

Today's agenda

  1. Two practice problems — hands-on worked examples
  2. Git and GitHub — forking, the workflow you'll use for this course
  3. Everything you need to submit Assignment 3 on Monday

Open the companion notebook in Colab so you can run through the examples yourself.

Problem 1: exploring a real dataset

You have a list of songs, each a dict with title, artist, genre, and plays:


songs = [
    {'title': 'Blinding Lights',   'artist': 'The Weeknd', 'genre': 'pop',  'plays': 4_200_000_000},
    {'title': 'Shape of You',      'artist': 'Ed Sheeran', 'genre': 'pop',  'plays': 3_800_000_000},
    {'title': 'Bohemian Rhapsody', 'artist': 'Queen',      'genre': 'rock', 'plays': 2_400_000_000},
    # ... many more ...
]

Find the three genres with the highest average plays per song.

This looks simple, but it's a complete analysis pipeline: group, aggregate, sort, slice. You'll likely do some version of this in every Part II project.

Problem 1: one reasonable solution

# Step 1: group plays by genre
plays_by_genre = {}
for song in songs:
    g = song['genre']
    plays_by_genre.setdefault(g, []).append(song['plays'])

# Step 2: compute the average plays per genre
avg_by_genre = {
    g: sum(plays) / len(plays)
    for g, plays in plays_by_genre.items()
}

# Step 3: sort genres by average plays, descending
ranked = sorted(avg_by_genre.items(), key=lambda item: item[1], reverse=True)

# Step 4: take the top 3
top_three = ranked[:3]
print(top_three)

Every step is named and small. When something breaks, you can print any intermediate variable and see exactly what's happening. This is the core pattern behind most data analysis.
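The same four steps can be wrapped in a reusable function and checked on data small enough to verify by hand. This is a minimal sketch, assuming each song dict has the keys shown above. The function name top_genres_by_avg_plays and the toy_songs dataset are made up for illustration. It uses collections.defaultdict in place of setdefault for the grouping step.

```python
from collections import defaultdict


def top_genres_by_avg_plays(songs, n=3):
    """Return the n genres with the highest average plays per song (hypothetical helper)."""
    # Group: collect every song's play count under its genre
    plays_by_genre = defaultdict(list)
    for song in songs:
        plays_by_genre[song['genre']].append(song['plays'])

    # Aggregate: average plays per genre
    avg_by_genre = {g: sum(p) / len(p) for g, p in plays_by_genre.items()}

    # Sort (highest average first), then slice off the top n
    ranked = sorted(avg_by_genre.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# A tiny made-up dataset whose answer is easy to check by hand
toy_songs = [
    {'title': 'A', 'artist': 'X', 'genre': 'pop',  'plays': 10},
    {'title': 'B', 'artist': 'Y', 'genre': 'pop',  'plays': 20},
    {'title': 'C', 'artist': 'Z', 'genre': 'rock', 'plays': 40},
    {'title': 'D', 'artist': 'X', 'genre': 'jazz', 'plays': 5},
    {'title': 'E', 'artist': 'Y', 'genre': 'folk', 'plays': 1},
]
print(top_genres_by_avg_plays(toy_songs))
# [('rock', 40.0), ('pop', 15.0), ('jazz', 5.0)]
```

Because pop averages (10 + 20) / 2 = 15, you can confirm the ranking without running anything. Once the toy case checks out, swap in the real songs list.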

Problem 2: find and fix the bug

This function is supposed to return a dict mapping each student to their average grade. It runs without errors — but the numbers are wrong.


def student_averages(grades):
    result = {}
    for student, scores in grades.items():
        total = 0
        for s in scores:
            total += s
        result[student] = total / len(grades)
    return result

grades = {'Ada': [90, 85, 92], 'Grace': [78, 88], 'Alan': [95, 91, 88, 84]}
print(student_averages(grades))

Problem 2: the bug


result[student] = total / len(grades)   # BUG: divides by # of students
result[student] = total / len(scores)   # FIX: divide by # of scores

The code runs — no error, no crash. It just returns wrong numbers. These are the hardest bugs to catch. Always test with a small example you can verify by hand.

Ada has three scores (90, 85, 92) averaging 89. The buggy function happens to return the right answer for her (267/3 = 89) only because she has exactly three scores and there are also three students. Grace's average comes out wrong: 166/3 ≈ 55.3 instead of 166/2 = 83.
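Here is one way to write the fixed function together with the hand-checked test the slide recommends. This is a sketch, not the only correct version: it replaces the manual running total with the built-in sum, and the expected values come from doing the arithmetic by hand (267/3, 166/2, 358/4).

```python
def student_averages(grades):
    """Map each student to the mean of their own scores."""
    result = {}
    for student, scores in grades.items():
        # Divide by the number of this student's scores, not the number of students
        result[student] = sum(scores) / len(scores)
    return result


grades = {'Ada': [90, 85, 92], 'Grace': [78, 88], 'Alan': [95, 91, 88, 84]}
averages = student_averages(grades)

# Hand-checked expectations: 267/3, 166/2, 358/4
assert averages['Ada'] == 89.0
assert averages['Grace'] == 83.0
assert averages['Alan'] == 89.5
print(averages)  # {'Ada': 89.0, 'Grace': 83.0, 'Alan': 89.5}
```

Grace and Alan are the useful test cases here because their score counts differ from the number of students. A student with an empty score list would raise ZeroDivisionError. Whether to skip such students or record None is a design choice this sketch leaves open.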

Part 2: Git and GitHub

Git and GitHub are everyday tools for working data scientists. For this course, you'll use GitHub to:

  • Fork the course repo to your own account
  • Submit every assignment via a pull request
  • Collaborate on your Part II data stories
  • Track issues, discussions, and feedback

Git vs GitHub

Git is a tool on your computer that tracks changes to files in a folder (a repository).

GitHub is a website that hosts Git repositories online — where you share, collaborate, and back up your work.

You need a free GitHub account. Use your Dartmouth email and pick a professional username.

Key terms

  • Repository ("repo") — a project folder tracked by Git
  • Fork — your own copy of someone else's repo on GitHub
  • Clone — download a repo from GitHub to your computer
  • Commit — a snapshot of your project with a message describing the change
  • Push — upload your commits to GitHub
  • Pull — download the latest changes from GitHub and merge them into your local copy
  • Pull request (PR) — a proposal to merge your fork's changes into the original repo
  • Issue — a tracked task, bug, or question

The forking workflow

  1. Fork the course repo on GitHub — creates yourname/storytelling-with-data
  2. Clone your fork to your computer: git clone <your-fork-url>
  3. Edit files, add your assignment work
  4. Stage and commit with a clear message: git add <files>, then git commit -m "add assignment 3 demo"
  5. Push commits to your fork: git push
  6. Open a pull request on GitHub, proposing your changes to the course repo

Claude Code runs all of these for you — your job is to review what it did and confirm the commit messages are clear.

Good commit messages

Bad: update stuff · fixed it · asdf

Good:

  • add word frequency function to text_utils.py
  • fix off-by-one error in average() calculation
  • update README with installation instructions

The rule: write the one sentence you would use to describe this commit to a teammate.

Merge conflicts: don't panic

A merge conflict happens when two sets of changes touch the same lines of the same file and Git can't decide which to keep. Git marks the conflicting region in the file and asks you to pick:


<<<<<<< HEAD
version from main
=======
version from your branch
>>>>>>> my-changes

Steps: open the file, decide what the final version should be, delete the <<<<<<<, =======, and >>>>>>> marker lines, save, stage the file with git add, commit, push.

Claude Code is excellent at resolving merge conflicts — show it the conflict and ask which version to keep.
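Before you commit a resolved file, check that no marker lines survived. This is a minimal sketch of a hypothetical helper, find_conflict_markers (it is not part of Git), that flags any line starting with one of the three seven-character markers. It can give false positives. For example, a Markdown heading underlined with ======= would also be flagged.

```python
def find_conflict_markers(text):
    """Return (line_number, line) pairs that look like leftover merge-conflict markers."""
    markers = ('<<<<<<<', '=======', '>>>>>>>')
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # str.startswith accepts a tuple and matches any of its prefixes
        if line.startswith(markers):
            hits.append((lineno, line))
    return hits


resolved = "final version\nmore text\n"
unresolved = "<<<<<<< HEAD\nversion from main\n=======\nversion from your branch\n>>>>>>> my-changes\n"

print(find_conflict_markers(resolved))    # []
print(find_conflict_markers(unresolved))  # [(1, '<<<<<<< HEAD'), (3, '======='), (5, '>>>>>>> my-changes')]
```

To check a real file, read it with open(path).read() and pass the string in. An empty list means the markers are gone.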

.gitignore: keep junk and secrets out

A .gitignore file lists files Git should never track:

  • Large data files (CSVs, images, raw datasets)
  • Secrets (API keys, passwords, .env files)
  • Build artifacts and caches (__pycache__/, .ipynb_checkpoints/)
  • System and editor files (.DS_Store, .vscode/)

Once something is in Git history, it's very hard to remove. When in doubt, add it to .gitignore first.
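One common way to keep an API key out of the repo is to read it from an environment variable at runtime instead of typing it into your code. This is a minimal sketch. The variable name MY_API_KEY and the helper get_api_key are hypothetical placeholders. If you keep keys in a .env file instead, that file itself belongs in .gitignore.

```python
import os


def get_api_key(var_name='MY_API_KEY'):
    """Read an API key from an environment variable instead of hardcoding it (hypothetical helper)."""
    # The key lives outside the repo (e.g. exported in your shell), so it never gets committed
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# BAD: the moment you commit this file, the key is in Git history
# api_key = "paste-your-real-key-here"

# BETTER: the code only contains the variable's name, never its value
# api_key = get_api_key()
```

Raising an error when the variable is missing makes a forgotten setup step fail loudly instead of silently sending requests without a key.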

Summary

Programming:

  • Name every step, test with small examples, read code out loud when debugging
  • Use AI to explain, debug, and unstick yourself — but verify what it produces

Git and GitHub:

  • Fork the course repo, clone your fork, commit, push, open a PR
  • Write clear commit messages (future you will thank you)
  • Never commit secrets — use .gitignore

Get started now: fork the course repo and clone it before Friday.

Questions? Want to chat more?

📧 Email me
💬 Join our Slack
💁 Come to office hours
  • Thursday X-hour: Introduction to vibe coding
  • Friday: Assignment 3 brainstorm + release