DSCI 220, 2025 W1
November 24, 2025
What is a function?
You’ve seen things like:
Informal idea:
A function takes an input and gives a single output.
Today: we’ll make this more precise in a way that specifically applies to data science.
Let \(X = \{1,2,3\}\) and \(Y = \{a,b,c\}\).
Then the Cartesian product \(X \times Y\) is:
\[ X \times Y = \{(1,a), (1,b), \dots, (3,c)\} \]
All possible ordered pairs with first component in \(X\) and second in \(Y\).
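To make this concrete, here is a minimal Python sketch that enumerates the Cartesian product of these two small sets with `itertools.product`; the variable names are just illustrative.

```python
from itertools import product

X = {1, 2, 3}
Y = {"a", "b", "c"}

# All ordered pairs (x, y) with first component in X and second in Y.
cartesian = list(product(sorted(X), sorted(Y)))
print(cartesian)       # [(1, 'a'), (1, 'b'), ..., (3, 'c')]
print(len(cartesian))  # |X| * |Y| = 9
```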
A function \(f : X \to Y\) is a subset of \(X \times Y\) with a special property:
- For every \(x \in X\),
- there is exactly one \(y \in Y\)
such that \((x, y)\) is in the subset \(f\).
We have special names for \(X\) and \(Y\): \(X\) is the domain of \(f\), and \(Y\) is the codomain.
With \(X = \{1,2,3\}, Y = \{a,b\}\), which of these are functions \(X \to Y\)?
Discuss with a neighbor: which ones violate “every” or “exactly one”?
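One way to check the two conditions ("every" and "exactly one") mechanically is to represent a candidate function as a set of pairs and test it. A minimal sketch, with made-up candidates rather than the ones from the slide:

```python
def is_function(pairs, X, Y):
    """Return True if `pairs` (a set of (x, y) tuples) is a function X -> Y."""
    if not all(x in X and y in Y for x, y in pairs):
        return False
    for x in X:
        outputs = [y for (xx, y) in pairs if xx == x]
        if len(outputs) != 1:   # "every x" and "exactly one y"
            return False
    return True

X, Y = {1, 2, 3}, {"a", "b"}
print(is_function({(1, "a"), (2, "b"), (3, "a")}, X, Y))            # True
print(is_function({(1, "a"), (2, "b")}, X, Y))                      # False: 3 has no output
print(is_function({(1, "a"), (1, "b"), (2, "a"), (3, "a")}, X, Y))  # False: 1 has two outputs
```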
Let \(X = \{1,2,3\}\), \(Y = \{a,b\}\).
How many functions \(f : X \to Y\) are there?
Let \(|X| = n\), \(|Y| = m\).
For each of the \(n\) inputs in \(X\), there are \(m\) possible outputs in \(Y\) to choose from.
Choices are independent, so:
\[ \#\{f : X \to Y\} = m^n. \]
Special case: if \(Y = \{0,1\}\), then there are \(2^n\) different functions \(X \to \{0,1\}\).
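As a sanity check, we can enumerate every function from a small \(X\) to a small \(Y\) by brute force. A sketch using \(X = \{1,2,3\}\) and \(Y = \{a,b\}\), where we expect \(2^3 = 8\):

```python
from itertools import product

X = [1, 2, 3]
Y = ["a", "b"]

# Each function X -> Y is one choice of output per input,
# i.e. one element of Y x Y x Y.
functions = [dict(zip(X, outputs)) for outputs in product(Y, repeat=len(X))]
print(len(functions))  # 2**3 = 8
print(functions[0])    # e.g. {1: 'a', 2: 'a', 3: 'a'}
```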
Work with a neighbor:
Algebra view: a function is given by a formula that turns one number into another.
Data science view: a function is a prediction rule that turns a feature vector (the measurements we have) into a prediction (a label or a number).
It’s still a function: inputs to outputs.
Example: recommending a drink based on conditions.
Define a rule that assigns each combination of conditions to exactly one drink.
This is a function:
\[ f : [0,40] \times [0,10] \to \{\text{hot drink}, \text{iced drink}\}. \]
Each pair of numbers gets exactly one label.
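A minimal sketch of a rule with this shape, assuming the first input is a temperature in \([0,40]\) and the second is some other feature in \([0,10]\); the names and the threshold are purely illustrative, not the rule from the slide.

```python
def recommend_drink(temperature, x2):
    """Map one point of [0, 40] x [0, 10] to exactly one label.

    The threshold below is a hypothetical rule, chosen only to show
    that every input pair gets exactly one output.
    """
    if temperature >= 20 and x2 < 5:
        return "iced drink"
    return "hot drink"

print(recommend_drink(30, 1))  # 'iced drink'
print(recommend_drink(10, 8))  # 'hot drink'
```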
Suppose our feature vectors are length-3 binary:
\[ X = \{0,1\}^3 = \{(0,0,0), (0,0,1), \dots, (1,1,1)\}. \]
There are \(|X| = 2^3 = 8\) possible inputs.
Let \(Y = \{0,1\}\) (e.g., “no / yes”, “up / down”, “negative / positive”).
A classifier here is just a function
\[ f : X \to \{0,1\}. \]
How many such classifiers are there?
We already know:
\[ \#\{f : X \to \{0,1\}\} = 2^{|X|}. \]
Here \(|X| = 8\), so:
\[ \#\text{classifiers} = 2^8 = 256. \]
Even in this tiny universe with 3 binary features, there are 256 different labeling rules.
Each one says, for each of the 8 feature vectors, whether the label is 0 or 1.
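We can confirm this count by enumeration too; a short sketch that builds every classifier on the 8 feature vectors as a dictionary from input to label:

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))              # the 8 feature vectors
labelings = product([0, 1], repeat=len(inputs))       # one label per input

classifiers = [dict(zip(inputs, labels)) for labels in labelings]
print(len(classifiers))  # 2**8 = 256
```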
In ML, we never see the whole function \(f\); we only see a finite dataset of labeled examples:
\[ D = \{(x_1, y_1), \dots, (x_n, y_n)\} \]
where each \(y_i = f(x_i)\).
Use the 3-bit feature space:
\[ X = \{0,1\}^3 \]
and labels in \(Y = \{0,1\}\).
Suppose the true function \(f\) is unknown, but we see data:
That’s 4 labeled points out of the 8 possible inputs.
Question:
How many different functions \(h : X \to \{0,1\}\) agree with these 4 labeled examples?
On the 4 labeled inputs, \(h\) is forced to agree with the data; on the remaining \(8 - 4 = 4\) inputs, \(h\) is free to output either label. So there are
\[ 2^4 = 16 \]
different functions that fit the data perfectly.
The data does not uniquely determine the function, even in this tiny world.
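The same enumeration idea shows this directly: fix 4 labeled points (the ones below are hypothetical, since the actual dataset is not listed here) and count the classifiers that agree with all of them.

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))

# Hypothetical training data: 4 of the 8 inputs, each with a label.
data = {(0, 0, 0): 0, (0, 1, 1): 1, (1, 0, 1): 1, (1, 1, 0): 0}

consistent = 0
for labels in product([0, 1], repeat=len(inputs)):
    h = dict(zip(inputs, labels))
    if all(h[x] == y for x, y in data.items()):
        consistent += 1

print(consistent)  # 2**(8 - 4) = 16
```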
In general, suppose \(X\) has \(N\) possible inputs (feature vectors) and \(Y\) has \(m\) possible labels.
The set of all possible classifiers is:
\[ \{ f : X \to Y \} \]
and its size is:
\[ \#\{f : X \to Y\} = m^N. \]
For binary labels \(Y = \{0,1\}\): \(2^N\) possible classifiers.
This is called the hypothesis space if we allow all functions.
In practice, ML algorithms restrict to a much smaller family.
Example:
We don’t need to know the exact number; just that it’s unimaginably large.
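For a rough sense of scale, here is a sketch assuming a feature space of 20 binary features (the choice of 20 is illustrative): then \(N = 2^{20}\) inputs, and \(2^{2^{20}}\) possible binary classifiers.

```python
import math

# Hypothetical example: 20 binary features.
d = 20
N = 2 ** d                            # number of possible feature vectors = 1,048,576
digits = int(N * math.log10(2)) + 1   # decimal digits in 2**N, without building the number

print(N)       # 1048576
print(digits)  # about 315,653 digits in the count of classifiers 2**N
```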
Key point:
Even for modest feature spaces, the set of all possible labelings (all functions) is enormous.
A learning algorithm can’t search all functions — it explores a tiny, structured subset.
Today: functions made precise as subsets of \(X \times Y\), counting how many functions exist (\(m^n\)), and viewing classifiers as functions that a finite dataset cannot pin down uniquely.
Next time: