Problem 1: exploring a real dataset

You have a list of songs, each a dict with title, artist, genre, and plays:


songs = [
    {'title': 'Blinding Lights',   'artist': 'The Weeknd', 'genre': 'pop',  'plays': 4_200_000_000},
    {'title': 'Shape of You',      'artist': 'Ed Sheeran', 'genre': 'pop',  'plays': 3_800_000_000},
    {'title': 'Bohemian Rhapsody', 'artist': 'Queen',      'genre': 'rock', 'plays': 2_400_000_000},
    # ... many more ...
]

Find the three genres with the highest average plays per song.

This looks simple, but it is a complete analysis pipeline: group, aggregate, sort, slice. You'll likely do some version of this in every Part II project.
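One way to sketch the group, aggregate, sort, slice steps is shown here. This is a minimal illustration rather than the official solution: the function name `top_genres_by_average_plays` and its parameter `n` are hypothetical, and the list contains only the three songs shown above, so it returns two genres instead of three.

```python
from collections import defaultdict

# Only the three songs listed above; the real list is longer.
songs = [
    {'title': 'Blinding Lights',   'artist': 'The Weeknd', 'genre': 'pop',  'plays': 4_200_000_000},
    {'title': 'Shape of You',      'artist': 'Ed Sheeran', 'genre': 'pop',  'plays': 3_800_000_000},
    {'title': 'Bohemian Rhapsody', 'artist': 'Queen',      'genre': 'rock', 'plays': 2_400_000_000},
]


def top_genres_by_average_plays(songs, n=3):
    """Return up to n (genre, average_plays) pairs, highest average first."""
    # Group: collect the play counts for each genre.
    plays_by_genre = defaultdict(list)
    for song in songs:
        plays_by_genre[song['genre']].append(song['plays'])

    # Aggregate: reduce each genre's play counts to a mean.
    averages = {
        genre: sum(plays) / len(plays)
        for genre, plays in plays_by_genre.items()
    }

    # Sort: order genres by average plays, largest first.
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

    # Slice: keep only the first n entries.
    return ranked[:n]


top_three = top_genres_by_average_plays(songs)
for genre, average in top_three:
    print(f"{genre}: {average:,.0f} average plays")
```

Keeping each of the four steps as its own statement makes it easy to print the intermediate values (`plays_by_genre`, `averages`, `ranked`) while debugging.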

Programming basics

PSYC 81.09: Storytelling with Data

Today's agenda

Problem 1: exploring a real dataset

Problem 1: one reasonable solution

Problem 2: find and fix the bug

Problem 2: the bug

Part 2: Git and GitHub

Git vs GitHub

Key terms

The forking workflow

Good commit messages

Merge conflicts: don't panic

`.gitignore`: keep junk and secrets out

Summary

Questions? Want to chat more?
