Discrete Math for Data Science

DSCI 220, 2025 W1

October 28, 2025

Announcements

Regular Expressions and DFAs

Goals

  • Recall that a regex describes a set of strings (a language).
  • Build small DFAs that recognize those sets.
  • Use closure tricks (∪, ∩, complement) to combine DFAs.

Two views on Regular Expressions

  • Full-string match
    • Python: re.fullmatch(r'(bb+|c[ac]*|b?)', s)
    • Pandas: df['col'].str.fullmatch(r'(bb+|c[ac]*|b?)')
    • Anchors: use ^ and $, as in r'^(...)$'
  • Substring search
    • Python: re.search(r'(bb+|c[ac]*|b?)', s)
    • Pandas: df['col'].str.contains(r'(bb+|c[ac]*|b?)')
    • Anchors: none
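A small sketch comparing the two views, using only Python's standard re module (the helper name classify and the test strings are hypothetical). It checks each string three ways: re.fullmatch, re.search with ^ and $ added, and a plain re.search. Because the alternative b? can match the empty string, the plain search succeeds on every input, even 'xyz'; str.contains in pandas searches the same way, so it would also return True for every non-missing string.

```python
import re

# Pattern from the table: bb+ (two or more b's), c[ac]* (a c followed by any a's/c's),
# or b? (an optional single b, which also matches the empty string).
PATTERN = r'(bb+|c[ac]*|b?)'

def classify(s):
    """Return (fullmatch, anchored search, plain search) results for s."""
    full = re.fullmatch(PATTERN, s) is not None
    anchored = re.search(r'^' + PATTERN + r'$', s) is not None
    substring = re.search(PATTERN, s) is not None
    return full, anchored, substring

for s in ['bbb', 'cac', '', 'abba', 'xyz']:
    print(repr(s), classify(s))
```

The first two columns always agree on these inputs; only the unanchored search differs.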

Warm-Up

Pattern: ^(0|1)*11(0|1)*$

Which strings are in the set?

  • 10101
  • 11010
  • 10001
  • 10110
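To check your answers, a minimal sketch using re.search with the anchored pattern exactly as written (the list name candidates is just for illustration):

```python
import re

WARM_UP = r'^(0|1)*11(0|1)*$'
candidates = ['10101', '11010', '10001', '10110']

# With ^ and $ on both ends, re.search only succeeds if the whole string matches.
in_set = [s for s in candidates if re.search(WARM_UP, s)]
print(in_set)
```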

Contains 11

Alphabet: {0,1}

States:

  • S0: no 11 yet, and the last symbol read was not a 1 (start)
  • S1: no 11 yet, and the last symbol read was a 1
  • S2: have seen the substring 11 (accepting)

DFA Sketch:
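The "contains 11" machine written as a transition table, plus a hypothetical trace helper that records the state after each prefix. This is one possible encoding (a dict keyed by (state, symbol)), not the only one.

```python
# "Contains 11" DFA over {0, 1}
# S0: no 11 yet, last symbol not 1 (start); S1: no 11 yet, last symbol 1;
# S2: 11 seen (accepting, and it loops on both symbols)
DELTA = {
    ('S0', '0'): 'S0', ('S0', '1'): 'S1',
    ('S1', '0'): 'S0', ('S1', '1'): 'S2',
    ('S2', '0'): 'S2', ('S2', '1'): 'S2',
}
START, ACCEPTING = 'S0', {'S2'}

def trace(s):
    """Return the list of states visited while reading s left to right."""
    states = [START]
    for ch in s:
        states.append(DELTA[(states[-1], ch)])
    return states

for s in ['0110', '1010']:
    path = trace(s)
    verdict = 'accept' if path[-1] in ACCEPTING else 'reject'
    print(f"{s}: {' -> '.join(path)} ({verdict})")
# 0110: S0 -> S0 -> S1 -> S2 -> S2 (accept)
# 1010: S0 -> S1 -> S0 -> S1 -> S0 (reject)
```

Note how reading 10 sends the machine back to S0: seeing a 1 earlier does not matter once a 0 breaks the run.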

DFA Ideas

Process a string from left to right, one character at a time.

  • State: everything we need to remember about the prefix read so far.

  • Accept iff we finish in an accepting state.

  • Deterministic: at most one transition per symbol from every state.

  • Total: every state has a transition on every symbol.

DFA Definition

A Deterministic Finite Automaton consists of the following:

  1. a finite set of states \(S\)
  2. a finite alphabet \(\Sigma\)
  3. a transition function \(f: S \times \Sigma \to S\) that assigns a next state \(f(s, i)\) to each pair \((s, i)\) with \(s\in S\) and \(i\in \Sigma\)
  4. a start state \(s_0 \in S\)
  5. a set of accepting states \(F\subseteq S\)
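The five parts of the definition map directly onto a small Python class. This is a sketch under our own naming (DFA, is_valid, accepts are hypothetical helpers); is_valid checks that \(s_0 \in S\), \(F \subseteq S\), and that \(f\) is defined on every \((s, i)\) pair, i.e. that the DFA is total. The example is the even-number-of-1s machine.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    """The five parts of a DFA, mirroring items 1-5 of the definition."""
    states: set      # 1. finite set of states S
    alphabet: set    # 2. finite alphabet Sigma
    f: dict          # 3. transition function: (state, symbol) -> state
    start: str       # 4. start state s_0
    accepting: set   # 5. accepting states F

    def is_valid(self):
        """s_0 in S, F subset of S, and f maps every (s, i) pair into S."""
        return (self.start in self.states
                and self.accepting <= self.states
                and all(self.f.get((s, i)) in self.states
                        for s in self.states for i in self.alphabet))

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.f[(state, symbol)]
        return state in self.accepting

# Example: even number of 1s over {0, 1}
parity = DFA(
    states={'EVEN', 'ODD'},
    alphabet={'0', '1'},
    f={('EVEN', '0'): 'EVEN', ('EVEN', '1'): 'ODD',
       ('ODD', '0'): 'ODD', ('ODD', '1'): 'EVEN'},
    start='EVEN',
    accepting={'EVEN'},
)
print(parity.is_valid(), parity.accepts('1001'), parity.accepts('111'))  # True True False
```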

Regex = DFA

Kleene's theorem: a language can be described by a regular expression if and only if it is recognized by some DFA. Such languages are called regular.
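A quick empirical sanity check, not a proof: compare the "contains 11" DFA against re.fullmatch on every binary string up to a chosen length. Here the DFA is encoded with integer states 0, 1, 2 standing for S0, S1, S2; the helper names dfa_contains_11 and agree_up_to are hypothetical.

```python
import re
from itertools import product

def dfa_contains_11(s):
    state = 0                      # 0 = S0, 1 = S1, 2 = S2
    for ch in s:
        if state == 2:
            break                  # S2 is absorbing, so we can stop early
        state = state + 1 if ch == '1' else 0
    return state == 2

def agree_up_to(n, pattern):
    """Compare the DFA with re.fullmatch on every binary string of length <= n."""
    for length in range(n + 1):
        for chars in product('01', repeat=length):
            s = ''.join(chars)
            if dfa_contains_11(s) != (re.fullmatch(pattern, s) is not None):
                return False
    return True

print(agree_up_to(10, r'(0|1)*11(0|1)*'))  # True
```

Agreement on 2047 strings is evidence; the theorem is what guarantees the two descriptions match on all strings.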

You Try: “Ends with Tea”

Target: strings that end with tea
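One possible solution to check your design against, assuming any character may appear in the input and matching is case-sensitive. The state is the length of the longest suffix read so far that is also a prefix of tea (0, 1 = "t", 2 = "te", 3 = "tea", accepting). Since the letters of tea are all different, a t always restarts the match and every other mismatch falls back to 0. The function name ends_with_tea is hypothetical.

```python
def ends_with_tea(s):
    # 0: no useful suffix, 1: "t", 2: "te", 3: "tea" (accepting)
    state = 0
    for ch in s:
        if state == 1 and ch == 'e':
            state = 2
        elif state == 2 and ch == 'a':
            state = 3
        elif ch == 't':
            state = 1      # a 't' always starts a fresh attempt
        else:
            state = 0      # any other character breaks the match
    return state == 3

for s in ['iced tea', 'teapot', 'tetea']:
    print(s, ends_with_tea(s))
```

Unlike a prefix DFA, there is no SINK here: a later tea can always still arrive.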

Prefix Example: “Starts with Iced”

Plan:

  • Chain one state per character of the prefix; before the whole prefix is matched, any mismatch → SINK.
  • Once the whole prefix is matched, stay in ACC on every character.

Diagram:
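The plan above as a sketch, generalized to any prefix. States are the integers 0 to len(word) (how many prefix characters have matched) plus 'SINK'; state len(word) plays the role of ACC. We assume the target is the lowercase string iced and that matching is case-sensitive; make_prefix_dfa is a hypothetical helper.

```python
def make_prefix_dfa(word):
    """DFA for 'starts with word': states 0..len(word) plus 'SINK'; ACC = len(word)."""
    acc = len(word)

    def step(state, ch):
        if state == 'SINK' or state == acc:
            return state                                     # SINK and ACC loop on every character
        return state + 1 if ch == word[state] else 'SINK'   # any mismatch -> SINK

    def accepts(s):
        state = 0
        for ch in s:
            state = step(state, ch)
        return state == acc

    return accepts

starts_with_iced = make_prefix_dfa('iced')
for s in ['iced tea', 'ice', 'diced']:
    print(s, starts_with_iced(s))
```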

One Last Example

Build the DFA corresponding to ^0(1*|0*)1$
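One way to build it, sketched with hypothetical state names. Since 1* and 0* can both be empty, the language is \(\{0 1^n : n \geq 1\} \cup \{0^n 1 : n \geq 1\}\), and 01 lies in both parts. Any transition not listed goes to SINK. The last lines compare the DFA with re.fullmatch on all binary strings up to length 8 as a sanity check.

```python
import re
from itertools import product

# q0: start; A: read "0"; B: read 0 1^n (n >= 1), accepting;
# C: read 0^n (n >= 2); D: read 0^n 1 (n >= 2), accepting; SINK: dead state
DELTA = {
    ('q0', '0'): 'A',
    ('A', '1'): 'B', ('A', '0'): 'C',
    ('B', '1'): 'B',
    ('C', '0'): 'C', ('C', '1'): 'D',
}
ACCEPTING = {'B', 'D'}

def accepts(s):
    state = 'q0'
    for ch in s:
        # unlisted transitions (including every move out of SINK) go to SINK
        state = DELTA.get((state, ch), 'SINK')
    return state in ACCEPTING

matches_regex = all(
    accepts(s) == (re.fullmatch(r'0(1*|0*)1', s) is not None)
    for n in range(9)
    for s in map(''.join, product('01', repeat=n))
)
print(matches_regex)  # True
print([s for n in range(5) for s in map(''.join, product('01', repeat=n)) if accepts(s)])
# ['01', '001', '011', '0001', '0111']
```

Counting SINK, this machine has six states; B and D cannot be merged because B may still read more 1s while D may not.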

Closure Tricks

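What’s closure? Regular languages are closed under the operations below: if DFAs recognize \(L_1\) and \(L_2\), we can build DFAs for each combination directly.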

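  • Complement: make the DFA total (add a SINK state if needed), then swap accepting and non-accepting states.
  • Union / Intersection: run both DFAs (over the same alphabet) in parallel.
    States are pairs \((p, q)\), with transitions \(((p, q), a) \mapsto (f_1(p, a), f_2(q, a))\)
    • Union accepts if \(p \in F_1\) or \(q \in F_2\)
    • Intersection accepts if \(p \in F_1\) and \(q \in F_2\)
  • Difference: \(L_1 - L_2 = L_1 \cap L_2^c\)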
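A compact sketch of all three constructions, assuming each DFA is a total 4-tuple (states, delta, start, accepting) over the shared alphabet {0, 1}. The tuple layout and function names are our own choices for illustration. The demo takes the difference "contains 11" minus "even number of 1s", i.e. strings that contain 11 and have an odd number of 1s.

```python
from itertools import product

ALPHABET = '01'

def complement(dfa):
    """Swap accepting and non-accepting states (dfa must be total)."""
    states, delta, start, accepting = dfa
    return states, delta, start, states - accepting

def product_dfa(d1, d2, mode):
    """Run d1 and d2 in parallel on pair states; mode is 'union' or 'intersection'."""
    s1, f1, q1, a1 = d1
    s2, f2, q2, a2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), c): (f1[(p, c)], f2[(q, c)])
             for (p, q) in states for c in ALPHABET}
    if mode == 'union':
        accepting = {(p, q) for (p, q) in states if p in a1 or q in a2}
    else:
        accepting = {(p, q) for (p, q) in states if p in a1 and q in a2}
    return states, delta, (q1, q2), accepting

def difference(d1, d2):
    """L1 - L2 = L1 intersected with the complement of L2."""
    return product_dfa(d1, complement(d2), 'intersection')

def accepts(dfa, word):
    _, delta, state, accepting = dfa
    for c in word:
        state = delta[(state, c)]
    return state in accepting

CONTAINS_11 = ({'S0', 'S1', 'S2'},
               {('S0', '0'): 'S0', ('S0', '1'): 'S1',
                ('S1', '0'): 'S0', ('S1', '1'): 'S2',
                ('S2', '0'): 'S2', ('S2', '1'): 'S2'},
               'S0', {'S2'})
EVEN_ONES = ({'EVEN', 'ODD'},
             {('EVEN', '0'): 'EVEN', ('EVEN', '1'): 'ODD',
              ('ODD', '0'): 'ODD', ('ODD', '1'): 'EVEN'},
             'EVEN', {'EVEN'})

odd_and_11 = difference(CONTAINS_11, EVEN_ONES)
for w in ['0111', '011', '10101']:
    print(w, accepts(odd_and_11, w))
```

This version builds every pair in \(S_1 \times S_2\), even pairs the machine can never reach; that is simple and still correct.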

Mini-Exercise

Build a DFA for the union of:

  • L1: contains 11 (use the machine we built)
  • L2: ends with 00 (three states: Tε, T0, T00)

Hint: sketch product states (S_i, T_j) and mark accept pairs for union.
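To check your sketch, a possible approach (helper names are hypothetical) explores only the product states reachable from the start pair (S0, Tε) with a breadth-first search, then marks the union's accepting pairs. Tε, T0, and T00 mean "no trailing 0", "ends in exactly one 0", and "ends in 00".

```python
from collections import deque

# L1: contains 11 (accepting S2)
F1 = {('S0', '0'): 'S0', ('S0', '1'): 'S1',
      ('S1', '0'): 'S0', ('S1', '1'): 'S2',
      ('S2', '0'): 'S2', ('S2', '1'): 'S2'}
# L2: ends with 00 (accepting T00)
F2 = {('Tε', '0'): 'T0', ('Tε', '1'): 'Tε',
      ('T0', '0'): 'T00', ('T0', '1'): 'Tε',
      ('T00', '0'): 'T00', ('T00', '1'): 'Tε'}
START = ('S0', 'Tε')

def reachable_pairs():
    """BFS over product states; returns the reachable pairs and their transitions."""
    seen, queue, delta = {START}, deque([START]), {}
    while queue:
        p, q = queue.popleft()
        for c in '01':
            nxt = (F1[(p, c)], F2[(q, c)])
            delta[((p, q), c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, delta

pairs, delta = reachable_pairs()
union_accepting = {(p, q) for (p, q) in pairs if p == 'S2' or q == 'T00'}

def in_union(word):
    state = START
    for c in word:
        state = delta[(state, c)]
    return state in union_accepting

print(len(pairs), sorted(union_accepting))
```

Only 7 of the 9 pairs are reachable: (S1, T0) and (S1, T00) would require the last symbol to be both 1 and 0.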

Parity Demo

Language: even number of 1s over {0,1}.

  • EVEN (start, accept), ODD
  • on 1: toggle; on 0: stay

Try: which strings of length \(\leq 4\) are accepted?
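After working it by hand, a short sketch can enumerate every binary string of length 0 through 4 with itertools.product and run each through the parity DFA (EVEN is the start and accepting state; 1 toggles, 0 stays). The variable names are our own.

```python
from itertools import product

# Parity DFA: EVEN (start, accepting), ODD; on 1 toggle, on 0 stay
DELTA = {('EVEN', '0'): 'EVEN', ('EVEN', '1'): 'ODD',
         ('ODD', '0'): 'ODD', ('ODD', '1'): 'EVEN'}

def accepts(s):
    state = 'EVEN'
    for ch in s:
        state = DELTA[(state, ch)]
    return state == 'EVEN'

accepted = []
for n in range(5):                       # lengths 0 through 4
    for chars in product('01', repeat=n):
        s = ''.join(chars)
        if accepts(s):
            accepted.append(s)

print(len(accepted), accepted)
```

Note that the empty string is accepted: zero 1s is an even number.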