DSCI 220: Discrete Math for Data Science – Representation as Encoding

Goals

DON’T FORGET THE MAGIC TRICK!

Consider an encoding into binary:

\[(b_{k-1}\dots b_1 b_0)_2=\sum_{i=0}^{k-1} b_i 2^i,\quad b_i\in\{0,1\}\]

This should not flummox us, because decimal notation works the same way:

\[(d_{k-1}\dots d_1 d_0)_{10}=\sum_{i=0}^{k-1} d_i 10^i,\quad d_i\in\{0,1,\ldots,9\}\]
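To make the parallel between the two sums concrete, here is a minimal sketch that evaluates a digit string in any base. The function name `positional_value` and the sample digits are illustrative choices, not part of the course materials.

```python
def positional_value(digits, base):
    """Evaluate digits (most significant first) in the given base."""
    if any(not 0 <= d < base for d in digits):
        raise ValueError("every digit must lie in 0..base-1")
    # Pair each digit with its exponent: the last digit gets exponent 0.
    return sum(d * base**i for i, d in enumerate(reversed(digits)))

# The same digit string means different numbers in different bases.
binary_value = positional_value([1, 0, 1, 1], 2)    # 1*8 + 0*4 + 1*2 + 1*1
decimal_value = positional_value([1, 0, 1, 1], 10)  # 1*1000 + 0*100 + 1*10 + 1*1
print(binary_value, decimal_value)  # prints: 11 1011
```

Only the `base` argument changes between the two calls, which is the sense in which binary is "just" positional notation with 2 in place of 10.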

How many unique values can we store in \(n\) bits?

How many bits to represent \(m\) values?

How many codes with…

How many bits to count to…

Claim: With \(b\) bits we can represent \(2^b\) values.
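The claim can be checked by brute force for small \(b\). The sketch below lists every bit string of length \(b\) and decodes each one as an unsigned value; the helper names `all_codes` and `decode` are hypothetical.

```python
from itertools import product

def all_codes(b):
    """Return every bit string of length b as a tuple of 0s and 1s."""
    return list(product((0, 1), repeat=b))

def decode(bits):
    """Unsigned value of a bit tuple, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

for b in range(1, 6):
    codes = all_codes(b)
    values = sorted(decode(c) for c in codes)
    # Each of the b positions has 2 independent choices, giving 2**b codes,
    # and the codes decode to 0, 1, ..., 2**b - 1 with no repeats.
    assert len(codes) == 2**b
    assert values == list(range(2**b))
    print(b, len(codes), values[0], values[-1])
```

A check like this is evidence for small cases, not a proof; the general argument is the counting one noted in the comment (two choices per position, \(b\) positions).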

int (pure Python): arbitrary precision (variable width)
NumPy / Pandas arrays: fixed-width datatypes (int8, int32, int64, uint64, …)

Demo:
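One way such a demo might go, as a sketch only. It uses the standard-library `ctypes` module as a stand-in for a fixed-width type, which is an assumption of this example; NumPy's own dtypes and their overflow behavior are not shown here.

```python
import ctypes

# Pure Python int: the width grows with the value.
big = 2**100
print(big.bit_length())                 # 101 bits are needed for 2**100

# Fixed width: c_uint8 always occupies exactly one byte (8 bits).
print(ctypes.sizeof(ctypes.c_uint8))    # 1

largest = ctypes.c_uint8(255).value     # 2**8 - 1 still fits
wrapped = ctypes.c_uint8(256).value     # does not fit; only the low 8 bits are kept
print(largest, wrapped)                 # prints: 255 0
```

The Python `int` simply grows to hold \(2^{100}\), while the 8-bit type cannot hold 256 and stores 0 instead.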

Consider addition, with 4 bits:
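A minimal sketch of 4-bit unsigned addition, assuming the common convention that a carry out of the top bit is simply dropped (the name `add_4bit` and the sample operands are illustrative):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1   # 0b1111, the largest 4-bit unsigned value (15)

def add_4bit(x, y):
    """Add two 4-bit unsigned values; return (stored result, overflow flag)."""
    true_sum = x + y
    stored = true_sum & MASK          # keep only the low 4 bits
    return stored, true_sum > MASK    # overflow if the true sum needs a 5th bit

for x, y in [(5, 6), (9, 9), (15, 1)]:
    stored, overflow = add_4bit(x, y)
    print(f"{x:04b} + {y:04b} = {stored:04b}  overflow={overflow}")
```

With these operands, 5 + 6 = 11 fits in 4 bits, while 9 + 9 = 18 and 15 + 1 = 16 do not, so the stored results are 2 and 0 respectively.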

Punchline: Overflow occurs when the number you want to represent is too big to fit into the space you’ve reserved for it.

Unsigned integers use the binary encoding discussed above.

\(b\) bits typically represent the values \(0, 1, \ldots, 2^b-1\).

Signed integers have a different encoding (which we won’t discuss), but we can still guess which values can be encoded:

\(b\) bits typically represent values ____________.

For each, compute bits needed or max value:

In words: what does \(\lceil\log_2 N\rceil\) tell you?
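For checking answers to exercises of this kind, here is a small sketch covering the unsigned case. The function names and the sample inputs (26, 365, 1000) are hypothetical and are not taken from the exercise list.

```python
import math

def bits_needed(n_values):
    """Smallest b with 2**b >= n_values, using integer arithmetic only."""
    if n_values < 1:
        raise ValueError("need at least one value")
    # The codes are 0 .. n_values-1, so count the bits of the largest code.
    return (n_values - 1).bit_length()

def max_unsigned(b):
    """Largest unsigned value that fits in b bits."""
    return 2**b - 1

for n in (26, 365, 1000):
    # The integer computation agrees with the ceiling-of-log formula.
    assert bits_needed(n) == math.ceil(math.log2(n))
    print(n, bits_needed(n))   # prints 26 5, then 365 9, then 1000 10

print(max_unsigned(8), max_unsigned(16))   # prints: 255 65535
```

The integer version avoids floating-point rounding in `math.log2` for very large inputs, which is why it is used here instead of the formula directly.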