CPSC 203, 2025 W2
February 5, 2026
Asking harder questions:
.isin() to filter by a listGiven a year of charts,
What do the paths of the number ones look like?
Do most songs debut at the top? How fast do they fall?
We can quantify these things, and analyze them, but a picture will help us see if there’s anything interesting in the data.
Draw a very loose sketch of the paths through the charts of all the songs that hit #1.
What are some reasonable axes?
Imagine what the path of one song looks like.
How many paths are there for all songs?
Tuesday: single snapshot (100 songs, 1 week)
Each row is one song on one week’s chart:
We can track a song’s journey through the charts!
First, filter to rows where rank == 1:
Many songs stay at #1 for multiple weeks. We want the unique song titles:
We have a list of song titles that hit #1.
Now we want all rows for those songs — their complete chart history.
How do we filter for existence in a list?
.isin() — Filter by List Membership.isin(list) returns True for rows where the value appears in the list.
We want to plot rank over time for each #1 song.
Our data looks like this (one row per song per week):
| date | title | rank |
|---|---|---|
| 2025-01-04 | Song A | 1 |
| 2025-01-04 | Song B | 15 |
| 2025-01-11 | Song A | 3 |
| 2025-01-11 | Song B | 1 |
But plotting wants one column per song:
| date | Song A | Song B |
|---|---|---|
| 2025-01-04 | 1 | 15 |
| 2025-01-11 | 3 | 1 |
groupby organizes rows into groups — here, one group per (date, title) pair.
Now we have a Series with a multi-level index (date, title).
.sum() might seem odd — but each group has exactly one row, so sum just extracts that value.
unstack()unstack() takes the inner index level (title) and makes it into columns.
Now each column is a song, each row is a date!
Open the Billboard (Visualization) activity, and load STUDENT_viz_nb.py.
Does a strong debut mean a song will stick around longer?
Or do slow climbers have more staying power?
Let’s investigate with a scatter plot!
To answer this, we need two pieces of info for each song:
The isNew column is True when a song first appears:
The weeks column shows how many weeks a song has been on the chart.
We want the maximum for each song:
Now we need to merge debut position with staying power:
| Concept | Code |
|---|---|
| Filter by condition | df[df['rank'] == 1] |
| Get unique values | df['title'].unique() |
| Filter by list | df[df['title'].isin(list)] |
| Reshape for plotting | .groupby([...]).unstack() |
https://pymotw.com/2/datetime/
https://www.dataschool.io/best-python-pandas-resources/
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
https://queirozf.com/entries/pandas-dataframe-plot-examples-with-matplotlib-pyplot