Programming, problem solving, and algorithms

CPSC 203, 2025 W2

February 5, 2026

Announcements

Lab 5 is this week

Today

Asking harder questions:

Working with multiple weeks of data
Visualizing song trajectories
Using .isin() to filter by a list

Today’s challenge…

Given a year of charts,

What do the paths of the number ones look like?
Do most songs debut at the top? How fast do they fall?

We can quantify and analyze these things, but a picture will help us see whether there’s anything interesting in the data.

Sketch

Draw a very loose sketch of the paths through the charts of all the songs that hit #1.

What are some reasonable axes?
Imagine what the path of one song looks like.
How many paths are there for all songs?

Key steps:

Assemble the data we want to analyze.
Find the list of Number 1 songs.
For each of those songs, make a list of its ranks over time.
Plot each song’s trajectory!

Step 1: One Week vs. One Year

Tuesday: single snapshot (100 songs, 1 week)

Step 1: Structure of Multi-Week Data

Each row is one song on one week’s chart:
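As a rough sketch of that layout, the block below builds a tiny table by hand. The rows are made up for illustration (they are not real chart data), and the name `charts` is just a placeholder for whatever the activity's DataFrame is called.

```python
import pandas as pd

# Made-up rows for illustration only; the real data covers a full year of charts.
charts = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"]),
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
})

# All rows for one title, in date order, trace that song's path through the charts.
song_a = charts[charts["title"] == "Song A"].sort_values("date")
print(song_a)
```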

We can track a song’s journey through the charts!

Step 2: Which Songs Hit #1?

First, filter to rows where rank == 1:
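A minimal sketch of that filter, assuming the DataFrame is called `charts` and has `title` and `rank` columns (the rows here are invented for illustration):

```python
import pandas as pd

# Made-up rows for illustration only.
charts = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"],
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
})

# charts["rank"] == 1 is a boolean Series; indexing with it keeps only the True rows.
# Use brackets rather than charts.rank, because .rank is already a DataFrame method.
number_ones = charts[charts["rank"] == 1]
print(number_ones)
```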

Step 2: Get Unique Titles

Many songs stay at #1 for multiple weeks. We want the unique song titles:
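One way this might look, assuming the filtered rows are stored in a DataFrame named `number_ones` (a placeholder name, with invented rows):

```python
import pandas as pd

# Made-up rows: Song A holds #1 for two weeks, then Song B takes over.
number_ones = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-11", "2025-01-18"],
    "title": ["Song A", "Song A", "Song B"],
    "rank": [1, 1, 1],
})

# .unique() returns each distinct title once, in order of first appearance.
top_titles = number_ones["title"].unique()
print(top_titles)
```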

Step 3: The Problem

We have a list of song titles that hit #1.

Now we want all rows for those songs — their complete chart history.

How do we filter for existence in a list?

Step 3: `.isin()` — Filter by List Membership

.isin(list) returns True for rows where the value appears in the list.
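A small sketch of the idea, with invented rows and the placeholder names `charts`, `top_titles`, and `histories`:

```python
import pandas as pd

# Made-up rows: Song C is on the chart but never reaches #1.
charts = pd.DataFrame({
    "date": ["2025-01-04"] * 3 + ["2025-01-11"] * 3,
    "title": ["Song A", "Song B", "Song C"] * 2,
    "rank": [1, 15, 40, 3, 1, 38],
})

# Titles that reached #1 at least once.
top_titles = charts.loc[charts["rank"] == 1, "title"].unique()

# .isin() gives True wherever the title is one of top_titles,
# so this keeps every week of every #1 song, not just the weeks at #1.
histories = charts[charts["title"].isin(top_titles)]
print(histories)
```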

Step 4: Plot One Line Per Song

We want to plot rank over time for each #1 song.

X-axis: _________
Y-axis: _________ (note: _________)
One line per song

Step 4: Reshaping for Plotting

Our data looks like this (one row per song per week):

date	title	rank
2025-01-04	Song A	1
2025-01-04	Song B	15
2025-01-11	Song A	3
2025-01-11	Song B	1

But plotting wants one column per song:

date	Song A	Song B
2025-01-04	1	15
2025-01-11	3	1

Step 4A: Breaking Down the Chain

groupby organizes rows into groups — here, one group per (date, title) pair.
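To see what the grouping does on its own, here is a sketch using the four sample rows from the table above (the name `histories` is an assumption):

```python
import pandas as pd

# The four sample rows from the slide's long-format table.
histories = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"],
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
})

# Grouping alone computes nothing yet; it only records which rows belong together.
grouped = histories.groupby(["date", "title"])

print(grouped.ngroups)  # number of distinct (date, title) pairs
print(grouped.size())   # number of rows in each group
```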

Step 4B: Extract the rank

Now we have a Series with a multi-level index (date, title).

.sum() might seem odd — but each group has exactly one row, so sum just extracts that value.
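A sketch of this step on the same four sample rows, assuming the column is named `rank` as in the table above:

```python
import pandas as pd

# The four sample rows from the slide's long-format table.
histories = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"],
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
})

# Select the rank column from each group, then collapse each one-row group to its value.
ranks = histories.groupby(["date", "title"])["rank"].sum()
print(ranks)

# Look up one value with a (date, title) pair.
print(ranks.loc[("2025-01-11", "Song A")])
```

Any aggregation that returns the single value unchanged (for example `.max()` or `.first()`) would give the same result here; `.sum()` only works as an "extract" because each group has exactly one row.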

Step 4C: Pivot with `unstack()`

unstack() takes the inner index level (title) and makes it into columns.

Now each column is a song, each row is a date!
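Putting the chain together on the four sample rows gives the wide table shown earlier. This is a sketch, with `histories` and `wide` as placeholder names:

```python
import pandas as pd

# The four sample rows from the slide's long-format table.
histories = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"],
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
})

ranks = histories.groupby(["date", "title"])["rank"].sum()

# Move the inner index level (title) into columns: one row per date, one column per song.
wide = ranks.unstack()
print(wide)
```

If a song is missing from some week's chart, its cell for that date comes out as NaN, which shows up as a gap in that song's line when plotted.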

Step 4D: The Plot
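One possible version of the plot, sketched with pandas' built-in plotting (which uses matplotlib). The wide table is rebuilt by hand from the sample values above, and flipping the y-axis so that rank 1 sits at the top is a presentation choice assumed here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Wide table from the sample values: one row per date, one column per song.
wide = pd.DataFrame(
    {"Song A": [1, 3], "Song B": [15, 1]},
    index=pd.to_datetime(["2025-01-04", "2025-01-11"]),
)
wide.index.name = "date"

# DataFrame.plot() draws one line per column, with the index on the x-axis.
ax = wide.plot()
ax.invert_yaxis()  # smaller rank is better, so put #1 at the top
ax.set_xlabel("Date")
ax.set_ylabel("Chart rank")

# In a plain script, call plt.show() to open the figure window.
```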

Let’s Write Code

Open the Billboard (Visualization) activity, and load STUDENT_viz_nb.py.

PrairieLearn Activity

Discussion

What patterns do you see in the #1 songs?
Do most songs debut at the top?
How quickly do they typically fall?
Any songs with unusual trajectories?

Part 2: Staying Power

The Question

Does a strong debut mean a song will stick around longer?

Or do slow climbers have more staying power?

Let’s investigate with a scatter plot!

What We Need

To answer this, we need two pieces of info for each song:

Debut position — where did it first appear on the chart?
Total weeks — how long did it stay on the chart?

Finding Debut Position

The isNew column is True when a song first appears:
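A sketch of how that column might be used, assuming `isNew` holds real booleans and the rank column is named `rank` (the rows are invented):

```python
import pandas as pd

# Made-up rows: each song is new in its first week only.
charts = pd.DataFrame({
    "date": ["2025-01-04", "2025-01-04", "2025-01-11", "2025-01-11"],
    "title": ["Song A", "Song B", "Song A", "Song B"],
    "rank": [1, 15, 3, 1],
    "isNew": [True, True, False, False],
})

# A boolean column can be used directly as a filter.
debuts = charts[charts["isNew"]]

# The rank in that first week is the debut position; index by title for easy lookup.
debut_rank = debuts.set_index("title")["rank"]
print(debut_rank)
```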

Finding Total Weeks

The weeks column shows how many weeks a song has been on the chart.

We want the maximum for each song:
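A minimal sketch, assuming the columns are named `title` and `weeks` (the values are invented):

```python
import pandas as pd

# Made-up rows: the weeks counter grows each week a song stays on the chart.
charts = pd.DataFrame({
    "title": ["Song A", "Song B", "Song A", "Song B", "Song A"],
    "weeks": [5, 1, 6, 2, 7],
})

# One group per song; the largest weeks value is that song's total time on the chart.
total_weeks = charts.groupby("title")["weeks"].max()
print(total_weeks)
```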

Combining the Data

Now we need to merge debut position with staying power:
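One way to combine them is `DataFrame.merge` on the shared `title` column. The two small tables and the column names `debut_rank` and `total_weeks` are assumptions made for this sketch:

```python
import pandas as pd

# Made-up per-song summaries, one row per title in each table.
debuts = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "debut_rank": [1, 15, 40],
})
longevity = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "total_weeks": [12, 30, 3],
})

# Match rows that share a title, giving one row per song with both measurements.
combined = debuts.merge(longevity, on="title")
print(combined)
```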

The Scatter Plot
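A sketch of the scatter plot using pandas' built-in plotting. The `combined` table and its column names are assumed, and the values are invented:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up per-song summary: debut position and total weeks on the chart.
combined = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "debut_rank": [1, 15, 40],
    "total_weeks": [12, 30, 3],
})

# One dot per song: debut position on the x-axis, staying power on the y-axis.
ax = combined.plot.scatter(x="debut_rank", y="total_weeks")
ax.set_xlabel("Debut position")
ax.set_ylabel("Total weeks on chart")

# In a plain script, call plt.show() to open the figure window.
```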

What Do You See?

Is there a correlation between debut position and staying power?
Any outliers — songs that debuted low but stayed forever?
Or songs that debuted at #1 but disappeared quickly?

Summary

Concept	Code
Filter by condition	`df[df['rank'] == 1]`
Get unique values	`df['title'].unique()`
Filter by list	`df[df['title'].isin(titles)]`
Reshape for plotting	`.groupby([...])['rank'].sum().unstack()`

Resources

https://pymotw.com/2/datetime/

https://www.dataschool.io/best-python-pandas-resources/

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

https://queirozf.com/entries/pandas-dataframe-plot-examples-with-matplotlib-pyplot