Data Structures and Algorithms for Data Science

DSCI 221, 2025 W2

February 11, 2026

Announcements

Dictionaries

Dictionaries are Python’s key-value lookup structure.

Beware: KeyError!
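A minimal sketch of the pitfall, using made-up fruit data: indexing a key that is not stored raises KeyError, while a membership test or `.get()` with a default avoids the exception.

```python
fruit = {"apple": "red", "banana": "yellow"}  # illustrative data

# Indexing with a key that is not stored raises KeyError.
try:
    fruit["mango"]
    raised = False
except KeyError:
    raised = True

# Two common ways to avoid the exception:
has_mango = "mango" in fruit                   # membership test, no exception
mango_colour = fruit.get("mango", "unknown")   # default value for a missing key
```

Which guard to use is a judgment call: `in` fits when a missing key needs its own branch, `.get()` when a default value is good enough.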

Why Dictionaries?

Lists: Access by position (index)

fruit = [("apple","red"), ("banana","yellow"), ("cherry","red")]
fruit[0], fruit[1], fruit[2]  # How do you look for "apple"?

Dictionaries: Access by meaningful key

fruit = {"apple":"red", "banana":"yellow", "cherry":"red"}
fruit["apple"], fruit["banana"], fruit["cherry"]  # Self-documenting!

Dictionaries are also fast: looking up a key takes O(1) time on average, however many keys are stored.

Dictionary Patterns

Pattern 0: Records

Alternative to a data frame or named tuple:
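One way such a record might look, as a sketch. The field names and values below are hypothetical and chosen only for illustration.

```python
# Hypothetical visitor record; field names and values are illustrative.
visitor = {"name": "Ada", "age": 36, "gallery": "A"}

visitor["age"] += 1            # fields are read and updated by name
visitor["ticket"] = "member"   # new fields can be added at any time

# A list of such records behaves like the rows of a small table.
visitors = [visitor, {"name": "Grace", "age": 45, "gallery": "B"}]
names = [v["name"] for v in visitors]
```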

Pattern 1: Membership

Find everyone who was in Gallery A and Gallery B.

The Naive Approach: Check Every Pair
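A sketch of the check-every-pair approach. The visitor names and the helper `in_both_naive` are assumptions made for illustration, not part of the original activity.

```python
# Hypothetical visitor lists; the names are made up for illustration.
gallery_a = ["Ana", "Ben", "Cy", "Dee"]
gallery_b = ["Dee", "Eve", "Ana", "Flo"]

def in_both_naive(first_list, second_list):
    """Compare every pair: len(first_list) * len(second_list) comparisons."""
    both = []
    for a in first_list:
        for b in second_list:
            if a == b:
                both.append(a)
    return both

common = in_both_naive(gallery_a, gallery_b)
```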

This works… but what if the lists are LONG?

The Problem with Nested Loops

With 10 people in Gallery A and 12 in Gallery B: 10 × 12 = 120 comparisons

With 10,000 in each: 100,000,000 comparisons!

Pattern 1: Membership with a Dictionary
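A minimal sketch of the dictionary version, assuming two hypothetical visitor lists: one pass stores Gallery A as keys, and a second pass looks up each Gallery B person.

```python
# Hypothetical visitor lists; the names are made up for illustration.
gallery_a = ["Ana", "Ben", "Cy", "Dee"]
gallery_b = ["Dee", "Eve", "Ana", "Flo"]

# Step 1: one pass over Gallery A, storing each person as a key.
seen_in_a = {}
for person in gallery_a:
    seen_in_a[person] = True

# Step 2: one pass over Gallery B, with one dictionary lookup per person.
in_both = [person for person in gallery_b if person in seen_in_a]
```

The result follows the order of the second list, since that is the list being scanned in Step 2.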

Why Is This Faster?

  • Step 1: Look at each Gallery A person once → 10 operations
  • Step 2: Look at each Gallery B person once → 12 operations
  • Total: 10 + 12 = 22 operations (not 120!)

Dictionary lookup takes roughly constant time on average, so it barely matters how many keys are stored!
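The operation counts above can be checked with a small sketch. The data is synthetic, sized to match the example (10 and 12 people), and the counters are added only to tally the work.

```python
# Synthetic data sized like the example: 10 and 12 people.
gallery_a = [f"a{i}" for i in range(10)]
gallery_b = [f"b{i}" for i in range(12)]

# Nested loops: one comparison for every (a, b) pair.
nested_ops = 0
for a in gallery_a:
    for b in gallery_b:
        nested_ops += 1

# Dictionary: one store per Gallery A person, one lookup per Gallery B person.
dict_ops = 0
seen = {}
for a in gallery_a:
    seen[a] = True
    dict_ops += 1
for b in gallery_b:
    _ = b in seen
    dict_ops += 1
```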

The Pattern: Set Intersection

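from collections import defaultdict

# Step 1: record every item of first_list as a key.
seen_in_first = defaultdict(bool)
for item in first_list:
    seen_in_first[item] = True

# Step 2: keep the items of second_list that were seen.
# Missing keys default to False (and are added to the dictionary as a side effect).
in_both = [item for item in second_list if seen_in_first[item]]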

Use when: Finding common elements, detecting duplicates

Pattern 2: Counting Frequencies

Use case: Count how many times each item appears.
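A sketch of frequency counting on made-up data, first by hand with `.get()` and then with `collections.Counter` from the standard library.

```python
from collections import Counter

items = ["red", "yellow", "red", "green", "red"]  # illustrative data

# Manual version: start each count at 0, then add 1 per occurrence.
counts = {}
for item in items:
    counts[item] = counts.get(item, 0) + 1

# Counter builds the same mapping in one call.
counter = Counter(items)
top = counter.most_common(1)   # the most frequent item with its count
```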

Pattern 3: Finding Complements

Use case: Find two numbers that add up to a target.

As we scan, we remember what we’ve seen — and check if the complement exists!
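The scan-and-remember idea might be sketched as follows. The function name `find_pair` and the sample numbers are assumptions for illustration.

```python
def find_pair(numbers, target):
    """Return two numbers from `numbers` that sum to `target`, or None."""
    seen = {}  # number -> index where it was last seen
    for i, x in enumerate(numbers):
        complement = target - x
        if complement in seen:     # the partner appeared earlier in the scan
            return complement, x
        seen[x] = i                # remember x for the numbers still to come
    return None

pair = find_pair([8, 3, 11, 5, 7], 12)
```

Checking for the complement before storing the current number keeps a single value from being paired with itself.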

Pattern 4: Grouping by Category

Use case: Organize items into categories
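A minimal grouping sketch with `defaultdict(list)`, assuming the items arrive as hypothetical (name, category) tuples.

```python
from collections import defaultdict

# Hypothetical (name, category) tuples.
items = [("apple", "fruit"), ("carrot", "vegetable"), ("cherry", "fruit")]

groups = defaultdict(list)        # a missing category starts as an empty list
for name, category in items:      # unpack each tuple
    groups[category].append(name)
```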

Grouping with Dictionaries

What if our items are dictionaries, not tuples?

Same pattern — just access the key with suspect["alibi"] instead of unpacking a tuple!
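A sketch of the same grouping when each item is a dictionary. Only the "alibi" key comes from the text above; the "name" key and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical suspect records; only the "alibi" key comes from the slide.
suspects = [
    {"name": "Mo", "alibi": "cafe"},
    {"name": "Lu", "alibi": "library"},
    {"name": "Kit", "alibi": "cafe"},
]

by_alibi = defaultdict(list)
for suspect in suspects:
    # Read the grouping key by name instead of unpacking a tuple.
    by_alibi[suspect["alibi"]].append(suspect["name"])
```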

Let’s Practice!

Open the Data Heist activity. You’ll solve 4 puzzles with LARGE datasets.

Inspired by Advent of Code puzzles!

PrairieLearn Activity

The Dictionary Patterns

Pattern      Use Case                Technique
-----------  ----------------------  ---------------------------
Record       Store named fields      {"name": ..., "age": ...}
Membership   Track what we’ve seen   seen[x] = True
Counting     Count occurrences       Counter(items)
Complement   Find pairs              Store & check complements
Grouping     Organize by category    defaultdict(list)