
Lecture 2: Terminology, Baselines, Decision Trees

UBC 2025-26

Imports, Announcements, LOs

Imports

import os
import re
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sys.path.append(os.path.join(os.path.abspath(".."), "code"))
import graphviz
import IPython
import mglearn
from IPython.display import HTML, display
from plotting_functions import *
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_graphviz
from utils import *

plt.rcParams["font.size"] = 16
pd.set_option("display.max_colwidth", 200)
%matplotlib inline

DATA_DIR = '../data/' 



Learning outcomes

By the end of this lesson, you will be able to:

  • Define key machine learning terminology:
    features, targets, predictions, training, error, classification vs. regression, supervised vs. unsupervised learning, hyperparameters vs. parameters, baselines, decision boundaries

  • Build a simple machine learning model in scikit-learn, explaining the fit-predict workflow and evaluating performance with the score method

  • Describe at a high level how decision trees are trained (fitting) and how they make predictions

  • Implement and visualize decision trees in scikit-learn using DecisionTreeClassifier and DecisionTreeRegressor





Terminology [video]

You will see a lot of varied terminology in machine learning and statistics. Let’s familiarize ourselves with some of the basic terminology used in ML.

Big picture and datasets

In this lecture, we’ll talk about our first machine learning model: Decision trees. We will also familiarize ourselves with some common terminology in supervised machine learning.

Toy datasets

Later in the course we will use larger datasets, for instance from Kaggle. But for our first couple of lectures, we will be working with three toy datasets: the quiz2 grade classification data, the quiz2 grade regression data, and a housing price dataset.

I’ll be using the following grade prediction toy dataset to demonstrate the terminology. Imagine that you are taking a course with four homework assignments and two quizzes. You and your friends are quite nervous about your quiz2 grades and you want to know how you will do based on your previous performance and some other attributes. So you decide to collect some data from friends who took the course last year and train a supervised machine learning model for quiz2 grade prediction.

classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
print(classification_df.shape)
classification_df.head()
(21, 8)
Loading...

Recap: Supervised machine learning

Tabular data

In supervised machine learning, the input data is typically organized in a tabular format, where rows are examples and columns are features. One of the columns is typically the target.

Features
Features are relevant characteristics of the problem, usually suggested by experts. Features are typically denoted by $X$ and the number of features is usually denoted by $d$.
Target
The target is the feature we want to predict (typically denoted by $y$).
Example
A row of feature values. When people refer to an example, it may or may not include the target corresponding to the feature values, depending upon the context. The number of examples is usually denoted by $n$.
Training
The process of learning the mapping between the features ($X$) and the target ($y$).
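To tie this notation to code, here is a small illustrative check on the quiz2 toy data loaded above (the variable names n and d here just mirror the notation):

# Tie the notation above to the quiz2 toy data loaded earlier.
n = classification_df.shape[0]       # n = number of examples (rows)
d = classification_df.shape[1] - 1   # d = number of features (columns excluding the target)
print("n = %d examples, d = %d features" % (n, d))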

Example: Tabular data for grade prediction

The tabular data usually contains both the features (X) and the target (y).

classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
classification_df.head()
Loading...

So the first step in training a supervised machine learning model is separating X and y.

X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]
X.head()
Loading...
y.head()
0        A+
1    not A+
2    not A+
3        A+
4        A+
Name: quiz2, dtype: object

Example: Tabular data for the housing price prediction

Here is an example of tabular data for housing price prediction. You can download the data from here.

housing_df = pd.read_csv(DATA_DIR + "kc_house_data.csv")
housing_df.drop(["id", "date"], axis=1, inplace=True)
HTML(housing_df.head().to_html(index=False))
Loading...
X = housing_df.drop(columns=["price"])
y = housing_df["price"]
X.head()
Loading...
y.head()
0    221900.0
1    538000.0
2    180000.0
3    604000.0
4    510000.0
Name: price, dtype: float64
X.shape
(21613, 18)



Alternative terminology for examples, features, targets, and training

  • examples = rows = samples = records = instances

  • features = inputs = predictors = explanatory variables = regressors = independent variables = covariates

  • targets = outputs = outcomes = response variable = dependent variable = labels (if categorical).

  • training = learning = fitting



Supervised learning vs. Unsupervised learning

In supervised learning, training data comprises a set of features ($X$) and their corresponding targets ($y$). We wish to find a model function $f$ that relates $X$ to $y$, and then use that function to predict the targets of new examples.

In unsupervised learning, training data consists of observations ($X$) without any corresponding targets. Unsupervised learning could be used to group similar things together in $X$ or to provide a concise summary of the data. We’ll learn more about this topic in later videos.

Supervised machine learning is about function approximation, i.e., finding the mapping function between $X$ and $y$, whereas unsupervised machine learning is about concisely describing the data.



Classification vs. Regression

In supervised machine learning, there are two main kinds of learning problems based on what they are trying to predict.

  • Classification problem: predicting among two or more discrete classes

    • Example 1: Predict whether a patient has a liver disease or not

    • Example 2: Predict whether a student would get an A+ or not in quiz2.

  • Regression problem: predicting a continuous value

    • Example 1: Predict housing prices

    • Example 2: Predict a student’s score in quiz2.

# quiz2 classification toy data
classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
classification_df.head(4)
Loading...
# quiz2 regression toy data
regression_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-regression.csv")
regression_df.head(4)
Loading...
classification_df
Loading...
classification_df.shape
(21, 8)

❓❓ Questions for you

Exercise 2.1

  1. How many examples and features are there in the housing price data above? You can use df.shape to get the number of rows and columns in a dataframe.

  2. For each of the following examples what would be the relevant features and what would be the target?

    1. Sentiment analysis

    2. Fraud detection

    3. Face recognition



iClicker Exercise 2.2 Supervised vs unsupervised

Select all of the following statements which are examples of supervised machine learning

  • (A) Finding groups of similar properties in a real estate data set.

  • (B) Predicting whether someone will have a heart attack or not on the basis of demographic, dietary, and clinical measurements.

  • (C) Grouping articles on different topics from different news sources (something like the Google News app).

  • (D) Detecting credit card fraud based on examples of fraudulent and non-fraudulent transactions.

  • (E) Given some measure of employee performance, identify the key factors which are likely to influence their performance.



iClicker Exercise 2.3 Classification vs regression

Select all of the following statements which are examples of regression problems

  • (A) Predicting the price of a house based on features such as number of bedrooms and the year built.

  • (B) Predicting if a house will sell or not based on features like the price of the house, number of rooms, etc.

  • (C) Predicting percentage grade in CPSC 330 based on past grades.

  • (D) Predicting whether you should bicycle tomorrow or not based on the weather forecast.

  • (E) Predicting appropriate thermostat temperature based on the wind speed and the number of people in a room.





Baselines [video]

Supervised learning (Reminder)

  • Training data → Machine learning algorithm → ML model

  • Unseen test data + ML model → predictions

Let’s build a very simple supervised machine learning model for the quiz2 grade prediction problem.

classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
classification_df.head()
Loading...
classification_df['quiz2'].value_counts()
quiz2 not A+ 11 A+ 10 Name: count, dtype: int64

Seems like “not A+” occurs more frequently than “A+”. What if we predict “not A+” all the time?

Baselines

Baseline
A simple machine learning algorithm based on simple rules of thumb.
  • For example, the most frequent baseline always predicts the most frequent label in the training set.

  • Baselines provide a way to sanity check your machine learning model.
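To make this concrete, here is a minimal from-scratch sketch of a most-frequent baseline (illustrative only; MostFrequentBaseline, X_toy, and y_toy are made-up names, and this is not sklearn's implementation):

# A minimal from-scratch sketch of a most-frequent baseline (not sklearn's implementation).
class MostFrequentBaseline:
    def fit(self, X, y):
        # Remember the most common label in the training targets.
        self.most_frequent_ = pd.Series(y).value_counts().idxmax()
        return self

    def predict(self, X):
        # Predict that label for every example, ignoring the features entirely.
        return np.array([self.most_frequent_] * len(X))

X_toy = classification_df.drop(columns=["quiz2"])
y_toy = classification_df["quiz2"]
MostFrequentBaseline().fit(X_toy, y_toy).predict(X_toy)[:3]  # always predicts "not A+" here

sklearn's DummyClassifier below does essentially this, plus a few other simple strategies.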

DummyClassifier

  • sklearn’s baseline model for classification

  • Let’s train DummyClassifier on the grade prediction dataset.

Steps to train a classifier using sklearn

  1. Read the data

  2. Create X and y

  3. Create a classifier object

  4. fit the classifier

  5. predict on new examples

  6. score the model

Reading the data

classification_df.head()
Loading...

Create X and y

  • X → Feature vectors

  • y → Target

X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]

Create a classifier object

  • import the appropriate classifier

  • Create an object of the classifier

from sklearn.dummy import DummyClassifier # import the classifier

dummy_clf = DummyClassifier(strategy="most_frequent") # Create a classifier object

fit the classifier

  • The “learning” is carried out when we call fit on the classifier object.

dummy_clf.fit(X, y); # fit the classifier

predict the target of given examples

  • We can predict the target of examples by calling predict on the classifier object.

dummy_clf.predict(X) # predict using the trained classifier
array(['not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+'], dtype='<U6')

score your model

  • How do you know how well your model is doing?

  • For classification problems, by default, score gives the accuracy of the model, i.e., proportion of correctly predicted targets.

    $\text{accuracy} = \frac{\text{correct predictions}}{\text{total examples}}$

print("The accuracy of the model on the training data: %0.3f" % (dummy_clf.score(X, y)))
The accuracy of the model on the training data: 0.524
  • Sometimes you will also see people reporting error, which is usually $1 - \text{accuracy}$

  • score

    • calls predict on X

    • compares predictions with y (true targets)

    • returns the accuracy in case of classification.
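In other words, score is roughly doing the following (a sketch of the idea, not sklearn's exact code):

# A sketch of what `score` does for classifiers.
predictions = dummy_clf.predict(X)                      # predict on X
accuracy = np.mean(predictions == y)                    # proportion of correct predictions
print("Manually computed accuracy: %0.3f" % accuracy)   # matches dummy_clf.score(X, y)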

print(
    "The error of the model on the training data: %0.3f" % (1 - dummy_clf.score(X, y))
)
The error of the model on the training data: 0.476

fit, predict, and score summary

Here is the general pattern when we build ML models using sklearn.

# Create `X` and `y` from the given data
X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]

clf = DummyClassifier(strategy="most_frequent") # Create a class object
clf.fit(X, y) # Train/fit the model
print(clf.score(X, y)) # Assess the model

new_examples = [[0, 1, 92, 90, 95, 93, 92], [1, 1, 92, 93, 94, 92, 91]]  # two hypothetical new students (7 feature values each)
clf.predict(new_examples) # Predict on some new data using the trained model
0.5238095238095238
array(['not A+', 'not A+'], dtype='<U6')

DummyRegressor

You can also do the same thing for regression problems using DummyRegressor, which predicts the mean, median, or a constant value computed from the training targets for all examples.

  • Let’s build a regression baseline model using sklearn.

from sklearn.dummy import DummyRegressor

regression_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-regression.csv") # Read data 
X = regression_df.drop(columns=["quiz2"]) # Create `X` and `y` from the given data
y = regression_df["quiz2"]
reg = DummyRegressor() # Create a class object
reg.fit(X, y) # Train/fit the model
reg.score(X, y) # Assess the model
new_examples = [[0, 1, 92, 90, 95, 93, 92], [1, 1, 92, 93, 94, 92, 91]]  # two hypothetical new students (7 feature values each)
reg.predict(new_examples) # Predict on some new data using the trained model
array([86.28571429, 86.28571429])
  • The fit and predict paradigms are similar to classification. The score method in the context of regression returns something called the $R^2$ score. (More on this in later videos.)

    • The maximum $R^2$ is 1, for perfect predictions.

    • DummyRegressor predicts the mean of the training y values for every example, which gives an $R^2$ of 0 on the training data.

reg.score(X, y)
0.0
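To see why the mean predictor gets exactly 0, here is the $R^2$ formula computed by hand (a sketch; we will look at $R^2$ properly in later lectures):

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
y_hat = reg.predict(X)                  # DummyRegressor predicts the mean of y for every example
ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
print(1 - ss_res / ss_tot)              # 0.0, because y_hat is exactly the mean of y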



❓❓ Questions for you

Exercise 2.4

  1. Order the steps below to build ML models using sklearn.

    • score to evaluate the performance of a given model

    • predict on new examples

    • Creating a model instance

    • Creating X and y

    • fit





Break (5 min)

  • We will try to take a 5-minute break half way through every class.





Decision trees [video]

Writing a traditional program to predict quiz2 grade

  • Can we do better than the baseline?

  • Forget about ML for a second. If you are asked to write a program to predict whether a student gets an A+ or not in quiz2, how would you go about it?

  • For simplicity, let’s binarize the feature values.

  • Is there a pattern that distinguishes the A+ examples from the not A+ examples, and can we write it down as a set of rules?

  • How about a rule-based algorithm with a number of if else statements?

    if class_attendance == 1 and quiz1 == 1:
        quiz2 = "A+"
    elif class_attendance == 1 and lab3 == 1 and lab4 == 1:
        quiz2 = "A+"
    ...
  • How many possible rule combinations could there be with the given 7 binary features? (See the back-of-the-envelope calculation below.)

    • This gets unwieldy pretty quickly.
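Here is a back-of-the-envelope calculation of the scale (one way to count; the exact number depends on what you consider a “rule”):

# With 7 binary features there are 2**7 distinct feature combinations,
# and each combination could be mapped to either "A+" or "not A+",
# so there are 2**(2**7) possible rule-based classifiers to choose from.
print(2 ** 7)          # 128 distinct feature combinations
print(2 ** (2 ** 7))   # ~3.4e38 possible classifiers -- far too many to enumerate by hand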

Decision tree algorithm

  • A machine learning algorithm to derive such rules from data in a principled way.

  • Have you ever played the 20-questions game? Decision trees are based on the same idea!

  • Let’s fit a decision tree using scikit-learn and predict with it.

  • Recall that scikit-learn uses the term fit for training or learning and uses predict for prediction.

Building decision trees with sklearn

Let’s binarize our toy dataset for simplicity.

classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]

X_binary = X.copy()
columns = ["lab1", "lab2", "lab3", "lab4", "quiz1"]
for col in columns:
    X_binary[col] = X_binary[col].apply(lambda x: 1 if x >= 90 else 0)
X_binary.head()
Loading...
y.head()
0        A+
1    not A+
2    not A+
3        A+
4        A+
Name: quiz2, dtype: object

DummyClassifier on quiz2 grade prediction toy dataset

dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X_binary, y)
dummy_clf.score(X_binary, y)
0.5238095238095238

DecisionTreeClassifier on quiz2 grade prediction toy dataset

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier() # Create a decision tree
model.fit(X_binary, y) # Fit a decision tree
model.score(X_binary, y) # Assess the model
0.9047619047619048

The decision tree classifier is giving much higher accuracy than the dummy classifier. That’s good news!

# Call the custom_plot_tree function to visualize the customized tree
width=12 
height = 8
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,
                 fontsize=10,)
<Figure size 1200x800 with 1 Axes>

Here is some commonly used terminology for a typical representation of decision trees.

A root node
represents the first condition to check or question to ask
A branch
connects a node (condition) to the next node (condition) in the tree. Each branch typically represents either true or false.
An internal node
represents conditions within the tree
A leaf node
represents the predicted class/value when the path from root to the leaf node is followed.
Tree depth
The number of edges on the path from the root node to the farthest away leaf node.
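Some of these quantities can be read directly off the fitted model; for example, sklearn decision trees provide get_depth and get_n_leaves:

print("Tree depth:", model.get_depth())            # edges from the root to the deepest leaf
print("Number of leaves:", model.get_n_leaves())   # number of leaf (prediction) nodes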

How does predict work?

new_example = np.array([[0, 1, 0, 0, 1, 1, 1]])
pd.DataFrame(data=new_example, columns=X.columns)
Loading...
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,
                 fontsize=10)
<Figure size 1200x800 with 1 Axes>

What’s the prediction for the new example?

model.predict(new_example)
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names
  warnings.warn(
array(['A+'], dtype=object)

In summary, given a learned tree and a test example, during prediction time,

  • Start at the top of the tree. Ask binary questions at each node and follow the appropriate path in the tree. Once you are at a leaf node, you have the prediction.

  • Note that the model only considers the features which are in the learned tree and ignores all other features.
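If you are curious, sklearn lets you inspect this routing with decision_path and apply (a small illustration; you do not need this for the course):

# Trace the path the new example takes from the root to a leaf.
node_indicator = model.decision_path(new_example)  # sparse indicator of the nodes visited
leaf_id = model.apply(new_example)                 # id of the leaf node the example lands in
print("Nodes visited:", node_indicator.indices)
print("Leaf node id:", leaf_id)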

How does fit work?

  • The decision tree algorithm is inspired by the 20-questions game.

  • Each node either represents a question or an answer. The terminal nodes (called leaf nodes) represent answers.

plot_fruit_tree() # defined in code/plotting_functions.py
<Figure size 640x480 with 1 Axes>

How does fit work?

  • Which features are most useful for classification?

  • Minimize impurity at each question

  • Common criteria for measuring impurity: Gini index, information gain, cross entropy
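As an example of measuring impurity, here is a small sketch of the Gini impurity of a set of labels (gini_impurity is a made-up helper name, not part of sklearn):

# Gini impurity of a set of labels: 1 - sum_k p_k^2, where p_k is the fraction
# of examples with label k. A pure node (only one label) has impurity 0.
def gini_impurity(labels):
    proportions = pd.Series(labels).value_counts(normalize=True)
    return 1 - np.sum(proportions ** 2)

print(gini_impurity(y))                   # impurity of the full quiz2 target
print(gini_impurity(["A+", "A+", "A+"]))  # a pure set has impurity 0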

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier() # Create a decision tree
model.fit(X_binary, y) # Fit a decision tree
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 fontsize=10)
<Figure size 1200x800 with 1 Axes>

Decision trees with continuous features

X.head()
Loading...
model = DecisionTreeClassifier()
model.fit(X, y)
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,
                 fontsize=10,)
<Figure size 1200x800 with 1 Axes>

Decision tree for regression problems

  • We can also use the decision tree algorithm for regression.

  • Instead of Gini impurity, we use other criteria for splitting. A common one is mean squared error (MSE). (More on this in later videos.)

  • scikit-learn supports regression using decision trees with DecisionTreeRegressor

    • The fit and predict paradigms are similar to classification.

    • score returns something called the $R^2$ score.

      • The maximum $R^2$ is 1, for perfect predictions.

      • It can be negative, which is very bad (worse than DummyRegressor).

regression_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-regression.csv")
regression_df.head()
Loading...
X = regression_df.drop(["quiz2"], axis=1)
y = regression_df["quiz2"]

depth = 2
reg_model = DecisionTreeRegressor(max_depth=depth)
reg_model.fit(X, y); 
regression_df["predicted_quiz2"] = reg_model.predict(X)
print("R^2 score on the training data: %0.3f\n\n" % (reg_model.score(X, y)))
regression_df.head()
R^2 score on the training data: 0.989


Loading...



❓❓ Questions for you

iClicker Exercise 2.5: Baselines and decision trees

Select all of the following statements which are TRUE.

  • (A) Change in features (i.e., binarizing features above) would change DummyClassifier predictions.

  • (B) predict takes only X as argument whereas fit and score take both X and y as arguments.

  • (C) For the decision tree algorithm to work, the feature values must be binary.

  • (D) The prediction in a decision tree works by routing the example from the root to the leaf.





More terminology [video]

  • Parameters and hyperparameters

  • Decision boundary

Parameters

  • The decision tree algorithm primarily learns two things:

    • the best feature to split on

    • the threshold for the feature to split on at each node

  • These are called parameters of the decision tree model.

  • When predicting on new examples, we need the parameters of the model.
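After fit, these learned values are stored inside the model object. For example, sklearn exposes them through the tree_ attribute (a peek under the hood; toy_model is just an illustrative name, and the exact attribute layout is an sklearn implementation detail):

# Peek at the learned parameters of a small tree fit on the binarized data.
toy_model = DecisionTreeClassifier(max_depth=2)
toy_model.fit(X_binary, classification_df["quiz2"])
print(toy_model.tree_.feature)    # index of the feature each node splits on (-2 marks a leaf)
print(toy_model.tree_.threshold)  # split threshold at each node (-2.0 marks a leaf)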

classification_df = pd.read_csv(DATA_DIR + "quiz2-grade-toy-classification.csv")
X = classification_df.drop(columns=["quiz2"])
y = classification_df["quiz2"]
model = DecisionTreeClassifier()
model.fit(X, y);
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,
                 fontsize=10,)
<Figure size 1200x800 with 1 Axes>
  • With the default setting, the nodes are expanded until all leaves are “pure”.

  • The decision tree ends up creating very specific rules, some based on just a single example from the data.

  • Is it possible to control the learning in any way?

    • Yes! One way to do it is by controlling the depth of the tree, which is the length of the longest path from the tree root to a leaf.

Decision tree with max_depth=1

Decision stump
A decision tree with only one split (depth=1) is called a decision stump.
model = DecisionTreeClassifier(max_depth=1)
model.fit(X, y)
width=8;height=2
plt.figure(figsize=(width, height))
custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,                
                 fontsize=12)
<Figure size 800x200 with 1 Axes>

max_depth is a hyperparameter of DecisionTreeClassifier.

Decision tree with max_depth=3

model = DecisionTreeClassifier(
    max_depth=3
)  # Let's try another value for the hyperparameter
model.fit(X, y)
width=10;height=5
plt.figure(figsize=(width, height))

custom_plot_tree(model, 
                 feature_names=X_binary.columns.tolist(), 
                 class_names=['A+', 'not A+'],
                 impurity=False,                
                 fontsize=12)
<Figure size 1000x500 with 1 Axes>

Parameters and hyperparameters: Summary

Parameters
When you call fit, a bunch of values get set, like the features to split on and split thresholds. These are called parameters. These are learned by the algorithm from the data during training. We need them during prediction time.
Hyperparameters
Even before calling fit on a specific data set, we can set some “knobs” that control the learning. These are called hyperparameters. These are specified based on: expert knowledge, heuristics, or systematic/automated optimization (more on this in the coming lectures).

Above we looked at the max_depth hyperparameter. Some other commonly used hyperparameters of decision trees are:

  • min_samples_split

  • min_samples_leaf

  • max_leaf_nodes
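All of these are set when creating the model object, before calling fit. For example (the specific values here are arbitrary, just for illustration):

model = DecisionTreeClassifier(
    max_depth=3,          # maximum depth of the tree
    min_samples_split=4,  # a node needs at least 4 examples to be split further
    min_samples_leaf=2,   # every leaf must contain at least 2 examples
    max_leaf_nodes=8,     # grow at most 8 leaves in total
)
model.fit(X, y);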

Decision boundary

What do we do with learned models? So far we have been using them to predict the class of a new instance. Another way to think about them is to ask: what sort of test examples will the model classify as positive, and what sort will it classify as negative?

Example 1: quiz 2 grade prediction

For visualization purposes, let’s consider a subset of the data with only two features.

X_subset = X[["lab4", "quiz1"]]
X_subset.head()
Loading...
Decision boundary for max_depth=1
depth = 1  # decision stump
model = DecisionTreeClassifier(max_depth=depth)
model.fit(X_subset.values, y)
plot_tree_decision_boundary_and_tree(
    model, X_subset, y, x_label="lab4", y_label="quiz1", fontsize=15
)
<Figure size 1600x600 with 2 Axes>

We take a geometric view of the data. Here, the red region corresponds to the “not A+” class and the blue region corresponds to the “A+” class. The line separating the red region from the blue region is called the decision boundary of the model. Different models have different kinds of decision boundaries. For decision tree models with only two features, the decision boundary is made up of horizontal and vertical lines. In the example above, the decision boundary is created by asking a single question: lab4 <= 84.5.
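Under the hood, plots like the one above are typically made by predicting on a dense grid of points and colouring each point by the predicted class (a sketch of the idea, not the plotting code used above; the grid ranges are arbitrary):

# Predict on a dense grid of (lab4, quiz1) values and colour by the predicted class.
xx, yy = np.meshgrid(np.linspace(0, 100, 200), np.linspace(0, 100, 200))
grid = np.c_[xx.ravel(), yy.ravel()]                          # every grid point as a (lab4, quiz1) pair
Z = (model.predict(grid) == "A+").astype(int).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)                            # shade the "A+" vs "not A+" regions
plt.xlabel("lab4")
plt.ylabel("quiz1");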

Decision boundary for max_depth=2
model = DecisionTreeClassifier(max_depth=2)
model.fit(X_subset.values, y)
plot_tree_decision_boundary_and_tree(
    model, X_subset, y, x_label="lab4", y_label="quiz1", fontsize=12
)
<Figure size 1600x600 with 2 Axes>

The decision boundary (i.e., the model) gets a bit more complicated.

Decision boundary for max_depth=5
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_subset.values, y)
plot_tree_decision_boundary_and_tree(
    model, X_subset, y, x_label="lab4", y_label="quiz1", fontsize=8
)
<Figure size 1600x600 with 2 Axes>

The decision boundary (i.e., the model) gets even more complicated with max_depth=5.



Example 2: Predicting country using the longitude and latitude

Imagine that you are given the longitude and latitude of some border cities of the USA and Canada, along with which country each city belongs to. Using this training data, you are supposed to come up with a classification model to predict whether a given longitude/latitude combination is in the USA or Canada.

### US Canada cities data
df = pd.read_csv(DATA_DIR + "canada_usa_cities.csv")
df
Loading...
X = df[["longitude", "latitude"]]
y = df["country"]
mglearn.discrete_scatter(X.iloc[:, 0], X.iloc[:, 1], y)
plt.xlabel("longitude")
plt.ylabel("latitude");
<Figure size 640x480 with 1 Axes>
Real boundary between Canada and USA

In real life, we know the boundary between the USA and Canada.

Source

Here we pretend that we do not know this boundary and want to infer it from the limited training examples given to us.

model = DecisionTreeClassifier(max_depth=1)
model.fit(X.values, y)
plot_tree_decision_boundary_and_tree(
    model,
    X,
    y,
    height=6,
    width=16,
    fontsize=15,
    eps=10,
    x_label="longitude",
    y_label="latitude",
)
<Figure size 1600x600 with 2 Axes>
model = DecisionTreeClassifier(max_depth=2)
model.fit(X.values, y)
plot_tree_decision_boundary_and_tree(
    model,
    X,
    y,
    height=6,
    width=16,
    fontsize=12,
    eps=10,
    x_label="longitude",
    y_label="latitude",
)
<Figure size 1600x600 with 2 Axes>

Practice exercises

  • If you want more practice, check out module 2 in this online course. All the sections without a video or notes symbol are exercises.





Final comments, summary, and reflection

What did we learn today?

  • There is a lot of terminology and jargon used in ML. Some of the basic terminology includes:

    • Features, target, examples, training

    • Supervised vs. Unsupervised machine learning

    • Classification and regression

    • Accuracy and error

    • Parameters and hyperparameters

    • Decision boundary

  • Baselines and steps to train a supervised machine learning model

    • Baselines serve as reference points in the ML workflow.

  • Decision trees

    • are models that make predictions by sequentially looking at features and checking whether they are above/below a threshold

    • learn a hierarchy of if/else questions, similar to questions you might ask in a 20-questions game.

    • learn axis-aligned decision boundaries (vertical and horizontal lines with 2 features)

    • One way to control the complexity of decision tree models is by using the depth hyperparameter (max_depth in sklearn).