DSCI 220, 2025 W1
October 28, 2025
| Meaning | In Python/Pandas | Anchors |
|---|---|---|
| Full-string match | `re.fullmatch(r'(bb+\|c[ac]*\|b?)', s)`<br>`df['col'].str.fullmatch(r'(bb+\|c[ac]*\|b?)')` | Use ^ and $: `r'^(...)$'` |
| Substring search | `re.search(r'(bb+\|c[ac]*\|b?)', s)`<br>`df['col'].str.contains(r'(bb+\|c[ac]*\|b?)')` | none |
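To see the difference between the two rows of the table, here is a small sketch using only the standard `re` module. The helper names (`full_match`, `substring_match`, `anchored_search`) are made up for illustration. Note that because `b?` can match the empty string, a substring search with this pattern succeeds on every input.

```python
import re

PATTERN = r'(bb+|c[ac]*|b?)'

def full_match(s):
    """True if the whole string s matches the pattern."""
    return re.fullmatch(PATTERN, s) is not None

def substring_match(s):
    """True if the pattern matches somewhere inside s."""
    return re.search(PATTERN, s) is not None

def anchored_search(s):
    """Substring search with ^ and $ added around the pattern."""
    return re.search(r'^' + PATTERN + r'$', s) is not None

for s in ["", "b", "bbb", "cacc", "cab", "xyz"]:
    print(repr(s), full_match(s), anchored_search(s), substring_match(s))
```

On these sample strings the anchored search agrees with `re.fullmatch`, while the plain search is always true.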
Pattern: ^(0|1)*11(0|1)*$
Which strings are in the set?
10101110101000110110
Alphabet: {0,1}
States:
S0: have seen no 1s (start)
S1: just saw a 1
S2: saw substring 11 (accepting)
DFA Sketch:
Process a string from left to right, one character at a time.
State: what the prefix tells us so far.
Accept iff we finish in an accepting state.
Deterministic: at most one transition per symbol from every state.
Total: every state has a transition on every symbol.
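The rules above can be sketched in a few lines of Python for the "contains 11" machine. The transition table below is an assumption reconstructed from the state descriptions (S0, S1, S2), not copied from the diagram, and the helper names are illustrative only.

```python
import re

# Assumed transitions for the "contains 11" machine.
DELTA = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S0", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S2",
}
STATES = {"S0", "S1", "S2"}
ALPHABET = {"0", "1"}
START = "S0"
ACCEPTING = {"S2"}

def is_total(delta, states, alphabet):
    """True if every state has a transition on every symbol."""
    return all((q, a) in delta for q in states for a in alphabet)

def accepts(s):
    """Read s left to right; accept iff we finish in an accepting state."""
    state = START
    for ch in s:
        state = DELTA[(state, ch)]
    return state in ACCEPTING

def regex_accepts(s):
    """Same language, checked with the pattern ^(0|1)*11(0|1)*$."""
    return re.fullmatch(r'(0|1)*11(0|1)*', s) is not None

print(is_total(DELTA, STATES, ALPHABET))
print(accepts("0110"), accepts("1010"))
```

Because the table is a dictionary keyed by (state, symbol), it can hold at most one transition per pair, which is the deterministic property; `is_total` checks the other one.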
A Deterministic Finite Automaton consists of the following:
A language is regular if and only if it is recognized by some DFA.
Target: strings that end with tea
Plan:
SINK.
ACC.
Diagram:
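One possible way to think about this machine, as a rough sketch: let the state be how much of "tea" the most recent characters have matched (0, 1, 2, or 3). The function names and the numeric states below are illustrative assumptions and need not line up with the SINK and ACC labels in the plan.

```python
TARGET = "tea"

def step(progress, ch):
    """Next state: length of the longest prefix of TARGET that ends the input so far."""
    candidate = TARGET[:progress] + ch
    # Drop leading characters until what remains is a prefix of TARGET.
    while candidate and not TARGET.startswith(candidate):
        candidate = candidate[1:]
    return len(candidate)

def ends_with_tea(s):
    """Run the 4-state machine; accept iff all of TARGET has just been matched."""
    progress = 0
    for ch in s:
        progress = step(progress, ch)
    return progress == len(TARGET)

for s in ["tea", "steam", "teatea", "ttea", "te"]:
    print(s, ends_with_tea(s))
```

The results should agree with Python's built-in `s.endswith("tea")`, which gives a quick way to test a hand-drawn machine.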
Build the DFA corresponding to ^0(1*|0*)1$
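Before drawing the machine, it can help to list the short strings in the language. This sketch uses `re.fullmatch` as the reference; `language_up_to` is a hypothetical helper name.

```python
import re
from itertools import product

PATTERN = r'0(1*|0*)1'  # fullmatch plays the role of ^ and $

def language_up_to(max_len, alphabet="01"):
    """All strings over the alphabet, up to max_len, that match PATTERN."""
    members = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(PATTERN, s):
                members.append(s)
    return members

print(language_up_to(4))  # ['01', '001', '011', '0001', '0111']
```

A DFA for the pattern should accept exactly these strings among those of length at most 4.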
What’s closure?
Product states: pairs (p,q)
Union: accept (p,q) if p ∈ F1 or q ∈ F2
Intersection: accept (p,q) if p ∈ F1 and q ∈ F2
Difference: L1 - L2 = \(L1 \cap L2^c\)
Build a DFA for the union of:
L1: contains 11 (use the machine we built)
L2: ends with 00 (three states: Tε, T0, T00)
Hint: sketch product states (S_i, T_j) and mark accept pairs for union.
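The product construction can be sketched in code as a way to check a hand-drawn answer. The two transition tables below are assumptions reconstructed from the state descriptions, and `product_dfa` and `run` are hypothetical helper names.

```python
from itertools import product

ALPHABET = "01"

# L1: contains 11 (accept S2). Assumed transitions.
D1 = {("S0", "0"): "S0", ("S0", "1"): "S1",
      ("S1", "0"): "S0", ("S1", "1"): "S2",
      ("S2", "0"): "S2", ("S2", "1"): "S2"}
F1 = {"S2"}

# L2: ends with 00 (accept T00). Assumed transitions.
D2 = {("Tε", "0"): "T0", ("Tε", "1"): "Tε",
      ("T0", "0"): "T00", ("T0", "1"): "Tε",
      ("T00", "0"): "T00", ("T00", "1"): "Tε"}
F2 = {"T00"}

def product_dfa(d1, f1, q1, d2, f2, q2, mode):
    """Build the product machine; mode picks which pairs (p, q) accept."""
    states1 = {p for p, _ in d1}
    states2 = {q for q, _ in d2}
    pairs = list(product(states1, states2))
    # Both machines step in parallel on the same symbol.
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for p, q in pairs for a in ALPHABET}
    if mode == "union":
        accepting = {(p, q) for p, q in pairs if p in f1 or q in f2}
    elif mode == "intersection":
        accepting = {(p, q) for p, q in pairs if p in f1 and q in f2}
    elif mode == "difference":
        accepting = {(p, q) for p, q in pairs if p in f1 and q not in f2}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return delta, (q1, q2), accepting

def run(delta, start, accepting, s):
    """Simulate the machine on s and report acceptance."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

union = product_dfa(D1, F1, "S0", D2, F2, "Tε", "union")
print(run(*union, "0110"), run(*union, "100"), run(*union, "1010"))
```

This builds all nine pairs (S_i, T_j), including any that are unreachable from the start pair; a hand sketch would usually draw only the reachable ones.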
Language: even number of 1s over {0,1}.
States: EVEN (start, accept), ODD
On 1: toggle; on 0: stay
Try: which strings of length \(\leq\) 4 are accepted?
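To check an answer to the "Try" question, a short sketch of the two-state parity machine follows. The helper names `accepts` and `accepted_up_to` are illustrative.

```python
from itertools import product

# On 1: toggle between EVEN and ODD; on 0: stay.
DELTA = {
    ("EVEN", "0"): "EVEN", ("EVEN", "1"): "ODD",
    ("ODD", "0"): "ODD", ("ODD", "1"): "EVEN",
}

def accepts(s):
    """Accept iff s contains an even number of 1s."""
    state = "EVEN"  # start state, also the only accepting state
    for ch in s:
        state = DELTA[(state, ch)]
    return state == "EVEN"

def accepted_up_to(max_len):
    """All accepted strings over {0,1} with length at most max_len."""
    result = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if accepts(s):
                result.append(s)
    return result

print(accepted_up_to(2))       # ['', '0', '00', '11']
print(len(accepted_up_to(4)))  # 16
```

The empty string is accepted because zero is an even count.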