Discrete Math for Data Science

DSCI 220, 2025 W1

October 19, 2025

Announcements

Sets

Sets = Filters on Rows

 

Let \(U\) be all rows of a dataframe df.

  • A set \(A\subseteq U\) selects rows by a predicate \(P_A(x)\).
  • In code, a set is a Boolean mask: A_mask = P_A(row).

 

Key Ideas

  • Membership: \(x\in A \iff P_A(x)\iff\) the mask entry for row \(x\) is True
  • Cardinality: \(|A| = \sum_{x\in U} \mathbf{1}_A(x)\), computed in code as A_mask.sum()
  • Subset: \(A\subseteq B \iff \forall x, P_A(x)\Rightarrow P_B(x)\iff\) ((A & ~B).sum()==0)
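The three ideas above can be tried on a tiny dataframe. This is a minimal sketch: the values are hypothetical, and B_mask is built as a superset of A_mask on purpose so that the subset test has something to confirm.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "caffeine_mg": [150, 200, 175, 65, 45, 150],
})

A_mask = df["is_iced"]                               # P_A(x): x is iced
B_mask = df["is_iced"] | (df["caffeine_mg"] >= 150)  # a superset of A by construction

# Membership: is the row labelled 1 in A?
row_in_A = bool(A_mask.loc[1])

# Cardinality: True counts as 1 when summed.
size_A = int(A_mask.sum())

# Subset: A ⊆ B exactly when no row is in A but outside B.
A_subset_B = bool((A_mask & ~B_mask).sum() == 0)

print(row_in_A, size_A, A_subset_B)
```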

From subsets to dataframe masks

A Boolean mask over rows is an indicator vector \(\in\{0,1\}^{|U|}\).

  • Choosing rows is the same as selecting a subset of the universe \(U\).
  • \(|\text{mask}|\) (the number of True entries) = cardinality of that subset.
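To see the indicator vector directly, a mask can be converted to 0/1 entries. The sketch below assumes a made-up calories column; the point is only that the vector has one entry per row of \(U\) and that its sum matches the number of selected rows.

```python
import pandas as pd

# Hypothetical calorie values, one per row of U.
df = pd.DataFrame({"calories": [190, 5, 290, 5, 0, 130]})

mask = df["calories"] <= 150
indicator = mask.astype(int).tolist()   # one 0/1 entry per row of U

print(indicator)
print(len(indicator) == len(df))        # the vector lives in {0,1}^{|U|}
print(int(mask.sum()) == len(df[mask])) # sum of Trues = size of the selected subset
```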

Math ⇄ Predicate ⇄ Mask

Let \(U\) be all rows in df.

  • Math (set): a subset \(A\subseteq U\)
  • Predicate: \(A=\{\,x\in U \mid P_A(x)\,\}\)
  • Dataframe: A_mask = P_A(row) then A_df = df[A_mask]

We’ll use these café predicates:

  • \(A=\) “Iced drinks”
    \(A=\{x\in U \mid x.\mathrm{is\_iced}=\text{True}\}\)
    A = df['is_iced']

  • \(B=\) “Non-dairy milk (Oat or Almond)”
    \(B=\{x\in U \mid x.\mathrm{milk}\in\{\text{'Oat','Almond'}\}\}\)
    B = df['milk'].str.strip().isin(['Oat','Almond'])

  • \(C=\) “High caffeine (≥150 mg)”
    \(C=\{x\in U \mid x.\mathrm{caffeine\_mg}\ge 150\}\)
    C = df['caffeine_mg'] >= 150

  • \(D=\) “Low calorie (≤150 cal)”
    \(D=\{x\in U \mid x.\mathrm{calories} \le 150\}\)
    D = df['calories'] <= 150
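The four masks can be tried on a small stand-in for the café data. The rows below are invented for illustration (the course dataset is not shown here); only the column names is_iced, milk, caffeine_mg, and calories come from the definitions above, and the drink column is an added label for readability.

```python
import pandas as pd

# Hypothetical café orders; only the column names come from the slides.
df = pd.DataFrame({
    "drink": ["Latte", "Cold Brew", "Iced Mocha", "Espresso", "Iced Tea", "Cappuccino"],
    "is_iced": [False, True, True, False, True, False],
    "milk": ["Whole", "None", "Oat ", "None", "None", " Almond"],  # note stray spaces
    "caffeine_mg": [150, 200, 175, 65, 45, 150],
    "calories": [190, 5, 290, 5, 0, 130],
})

A = df["is_iced"]                                    # iced drinks
B = df["milk"].str.strip().isin(["Oat", "Almond"])   # non-dairy milk
C = df["caffeine_mg"] >= 150                         # high caffeine
D = df["calories"] <= 150                            # low calorie

A_df = df[A]   # the rows of U that belong to A

# Print each set's cardinality and its members.
for name, mask in {"A": A, "B": B, "C": C, "D": D}.items():
    print(name, int(mask.sum()), df.loc[mask, "drink"].tolist())
```

In this toy data the milk values carry stray spaces, which is why the definition of B calls str.strip() before isin; without it, "Oat " and " Almond" would not match.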

Operations: Three Ways to Say the Same Thing

Let \(A,B\subseteq U\).

Operation      Math (set)          Predicate (logic)                                                          Dataframe (mask)
Union          \(A\cup B\)         \(\{x: P_A(x)\lor P_B(x)\}\)                                               (A | B)
Intersection   \(A\cap B\)         \(\{x: P_A(x)\land P_B(x)\}\)                                              (A & B)
Difference     \(A-B\)             \(\{x: P_A(x)\land \neg P_B(x)\}\)                                         (A & ~B)
Complement     \(A^c\)             \(\{x: \neg P_A(x)\}\)                                                     ~A
Symm. diff.    \(A\triangle B\)    \(\{x:(P_A(x)\lor P_B(x))\land \neg(P_A(x)\land P_B(x))\}\)                A ^ B

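Precedence note: &, |, and ~ bind more tightly than comparisons such as >= and ==, so wrap every comparison and sub-expression in parentheses: (A & B) | (~C). Use these operators rather than Python's and, or, not, which do not work elementwise on masks.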
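Each mask operation can be run on a small example. This sketch uses made-up values and the sets A (iced) and C (high caffeine); it also checks that A ^ C agrees with the predicate form of symmetric difference.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "caffeine_mg": [150, 200, 175, 65, 45, 150],
})

A = df["is_iced"]
C = df["caffeine_mg"] >= 150

union = A | C
intersection = A & C
difference = A & ~C        # A - C
complement = ~A            # U - A
sym_diff = A ^ C           # in exactly one of A, C

# Symmetric difference written out as in its predicate form.
sym_diff_long = (A | C) & ~(A & C)

# When comparisons are written inline, each one needs its own parentheses.
iced_and_strong = (df["is_iced"]) & (df["caffeine_mg"] >= 150)

for name, m in [("union", union), ("intersection", intersection),
                ("difference", difference), ("complement", complement),
                ("sym_diff", sym_diff)]:
    print(name, int(m.sum()))
```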

Union Example

Let
\(A=\{x\mid x.\mathrm{is\_iced}\}\) and
\(B=\{x\mid x.\mathrm{milk}\in\{\text{Oat, Almond}\}\}\).

  • Math: \(A\cup B\)
  • Predicate: \(\{\,x\in U \mid x.\mathrm{is\_iced}\ \lor\ x.\mathrm{milk}\in\{\text{Oat, Almond}\}\,\}\)
  • Mask: (df['is_iced']) | (df['milk'].str.strip().isin(['Oat','Almond']))
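As a sanity check, the predicate form and the mask form should pick out the same rows. The sketch below applies the predicate row by row to hypothetical data, then compares the result with the vectorized mask.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "milk": ["Whole", "None", "Oat ", "None", "None", " Almond"],
})

# Predicate form: test each row x in U and collect the labels that pass.
by_predicate = {
    idx for idx, x in df.iterrows()
    if x["is_iced"] or x["milk"].strip() in {"Oat", "Almond"}
}

# Mask form: the same condition, evaluated on whole columns at once.
union_mask = (df["is_iced"]) | (df["milk"].str.strip().isin(["Oat", "Almond"]))
by_mask = set(df[union_mask].index)

print(sorted(by_predicate))
print(sorted(by_mask))
print(by_predicate == by_mask)
```

The row-by-row loop is only for illustration; on real data the mask form is the idiomatic and faster choice.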

Café Orders

Cardinality

Identity

Experiment using the code below, and then complete the observation:

For any sets \(A\) and \(B\): ___________ = ___________

Inclusion-Exclusion

Experiment using the code below, and then complete the observation:

For any sets \(A\) and \(B\): \(|A\cup B| =\) _____________
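One possible experiment, assuming a small made-up dataframe in place of the café data: compute the four counts below, then change the data or the predicates and look for a relationship that always holds.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "milk": ["Whole", "None", "Oat ", "None", "None", " Almond"],
})

A = df["is_iced"]
B = df["milk"].str.strip().isin(["Oat", "Almond"])

counts = {
    "A": int(A.sum()),
    "B": int(B.sum()),
    "A_and_B": int((A & B).sum()),
    "A_or_B": int((A | B).sum()),
}
print(counts)
```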

Equality?

Two sets are equal if they contain exactly the same elements. Think of a way to test set equality using set operations.

Are sets \(X\) and \(Y\) equal?
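One possible test, among several: two sets are equal exactly when their symmetric difference is empty. In the sketch below the data are hypothetical, sets_equal is an illustrative helper name, and X and Y are chosen as the two sides of a De Morgan law.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "caffeine_mg": [150, 200, 175, 65, 45, 150],
})

A = df["is_iced"]
C = df["caffeine_mg"] >= 150

def sets_equal(X, Y):
    """True when no row lies in exactly one of X and Y."""
    return not (X ^ Y).any()

X = ~(A | C)     # outside the union
Y = ~A & ~C      # outside A and outside C

print(sets_equal(X, Y))
print(sets_equal(A, C))
```

An equivalent check is mutual containment: \(X\subseteq Y\) and \(Y\subseteq X\).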

Subsets

\(A\subseteq B \iff \forall x, P_A(x)\Rightarrow P_B(x)\iff\) ((A & ~B).sum()==0)

Are any of our sets \(A\), \(B\), \(C\), or \(D\) subsets of one another?
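The subset test can be applied to every ordered pair of sets. This is a sketch on invented rows, so any containment it reports is a property of the toy data and says nothing about the real café dataset; is_subset is an illustrative helper name.

```python
from itertools import permutations

import pandas as pd

# Hypothetical café orders; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "milk": ["Whole", "None", "Oat ", "None", "None", " Almond"],
    "caffeine_mg": [150, 200, 175, 65, 45, 150],
    "calories": [190, 5, 290, 5, 0, 130],
})

masks = {
    "A": df["is_iced"],
    "B": df["milk"].str.strip().isin(["Oat", "Almond"]),
    "C": df["caffeine_mg"] >= 150,
    "D": df["calories"] <= 150,
}

def is_subset(X, Y):
    """X ⊆ Y exactly when no row is in X but outside Y."""
    return bool((X & ~Y).sum() == 0)

# Check every ordered pair (P, Q) with P != Q.
found = [(p, q) for p, q in permutations(masks, 2)
         if is_subset(masks[p], masks[q])]
print(found)
```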

A Venn Puzzle

Fill in the Venn Diagram using the values for \(|U|\), \(|A|\), \(|B|\), \(|A\cup B|\). Verify your counts are correct by finding \(|A-B|\), \(|A\cap B|\), \(|B-A|\), and \(|U-(A\cup B)|\), via code.
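The four region counts can be computed with masks as sketched below, again on hypothetical rows. Because the four regions partition \(U\), their counts should add up to \(|U|\).

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slides.
df = pd.DataFrame({
    "is_iced": [False, True, True, False, True, False],
    "milk": ["Whole", "None", "Oat ", "None", "None", " Almond"],
})

A = df["is_iced"]
B = df["milk"].str.strip().isin(["Oat", "Almond"])

regions = {
    "A_minus_B": int((A & ~B).sum()),
    "A_and_B": int((A & B).sum()),
    "B_minus_A": int((B & ~A).sum()),
    "outside": int((~(A | B)).sum()),   # U - (A ∪ B)
}
print(regions)
print(sum(regions.values()) == len(df))
```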

[Figure: Venn diagram of sets A and B inside universe U, with regions A - B, A ∩ B, B - A, and U - (A ∪ B).]

Bounds on \(|A\cap B|\)

Give upper and lower bounds on \(|A\cap B|\) for any \(A,B\subseteq U\).

 

__________________ \(\leq |A\cap B|\leq\) __________________
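A brute-force experiment can suggest the bounds before you prove them. The sketch below (plain Python, no dataframe; intersection_extremes is an illustrative helper name) enumerates every pair of subsets with given sizes in a small universe and records the smallest and largest intersection it sees. Compare the printed extremes with your guess in terms of \(|A|\), \(|B|\), and \(|U|\).

```python
from itertools import combinations

def intersection_extremes(n, a, b):
    """Smallest and largest |A ∩ B| over all A, B ⊆ U with |U|=n, |A|=a, |B|=b."""
    U = range(n)
    sizes = [
        len(set(A) & set(B))
        for A in combinations(U, a)
        for B in combinations(U, b)
    ]
    return min(sizes), max(sizes)

n = 5
for a in range(n + 1):
    for b in range(a, n + 1):   # b >= a suffices, since A and B play symmetric roles
        lo, hi = intersection_extremes(n, a, b)
        print(f"|U|={n} |A|={a} |B|={b}: {lo} <= |A∩B| <= {hi}")
```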

 

[Figure: Venn diagram of sets A and B inside universe U, with regions A - B, A ∩ B, B - A, and U - (A ∪ B).]