Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Lecture 12: Feature importances and model transparency

Lecture 12: Feature importances and model transparency

UBC 2025-26

Imports, LOs

Imports

import os
import string
import sys
from collections import deque

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sys.path.append(os.path.join(os.path.abspath(".."), "code"))
import seaborn as sns
from plotting_functions import *
from sklearn import datasets
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    cross_val_score,
    cross_validate,
    train_test_split,
)
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier
from utils import *

DATA_DIR = os.path.join(os.path.abspath(".."), "data/")
%matplotlib inline

Learning outcomes

From this lecture, students are expected to be able to:

  • Interpret the coefficients of linear regression for ordinal, one-hot encoded categorical, and scaled numeric features.

  • Explain why interpretability is important in ML.

  • Use feature_importances_ attribute of sklearn models and interpret its output.

  • Apply SHAP to assess feature importances and interpret model predictions.

  • Explain force plot, summary plot, and dependence plot produced with shapely values.

import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)



I’m using seaborn in this lecture for easy heatmap plotting, which is not in the course environment. You can install it as follows.

> conda activate cpsc330
> conda install -c anaconda seaborn
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)

Data

In the first part of this lecture, we’ll be using Kaggle House Prices dataset, the dataset we used in lecture 10. As usual, to run this notebook you’ll need to download the data. Unzip the data into a subdirectory called data. For this dataset, train and test have already been separated. We’ll be working with the train portion in this lecture.

df = pd.read_csv(DATA_DIR + "housing-kaggle/train.csv")
train_df, test_df = train_test_split(df, test_size=0.10, random_state=123)
train_df.head()
Loading...
  • The prediction task is predicting SalePrice given features related to properties.

  • Note that the target is numeric, not categorical.

train_df.shape
(1314, 81)

Let’s separate X and y

X_train = train_df.drop(columns=["SalePrice"])
y_train = train_df["SalePrice"]

X_test = test_df.drop(columns=["SalePrice"])
y_test = test_df["SalePrice"]

Let’s identify feature types

drop_features = ["Id"]
numeric_features = [
    "BedroomAbvGr",
    "KitchenAbvGr",
    "LotFrontage",
    "LotArea",
    "OverallQual",
    "OverallCond",
    "YearBuilt",
    "YearRemodAdd",
    "MasVnrArea",
    "BsmtFinSF1",
    "BsmtFinSF2",
    "BsmtUnfSF",
    "TotalBsmtSF",
    "1stFlrSF",
    "2ndFlrSF",
    "LowQualFinSF",
    "GrLivArea",
    "BsmtFullBath",
    "BsmtHalfBath",
    "FullBath",
    "HalfBath",
    "TotRmsAbvGrd",
    "Fireplaces",
    "GarageYrBlt",
    "GarageCars",
    "GarageArea",
    "WoodDeckSF",
    "OpenPorchSF",
    "EnclosedPorch",
    "3SsnPorch",
    "ScreenPorch",
    "PoolArea",
    "MiscVal",
    "YrSold",
]
ordinal_features_reg = [
    "ExterQual",
    "ExterCond",
    "BsmtQual",
    "BsmtCond",
    "HeatingQC",
    "KitchenQual",
    "FireplaceQu",
    "GarageQual",
    "GarageCond",
    "PoolQC",
]
ordering = [
    "Po",
    "Fa",
    "TA",
    "Gd",
    "Ex",
]  # if N/A it will just impute something, per below
ordering_ordinal_reg = [ordering] * len(ordinal_features_reg)
ordering_ordinal_reg
[['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex'], ['Po', 'Fa', 'TA', 'Gd', 'Ex']]
ordinal_features_oth = [
    "BsmtExposure",
    "BsmtFinType1",
    "BsmtFinType2",
    "Functional",
    "Fence",
]
ordering_ordinal_oth = [
    ["NA", "No", "Mn", "Av", "Gd"],
    ["NA", "Unf", "LwQ", "Rec", "BLQ", "ALQ", "GLQ"],
    ["NA", "Unf", "LwQ", "Rec", "BLQ", "ALQ", "GLQ"],
    ["Sal", "Sev", "Maj2", "Maj1", "Mod", "Min2", "Min1", "Typ"],
    ["NA", "MnWw", "GdWo", "MnPrv", "GdPrv"],
]
categorical_features = list(
    set(X_train.columns)
    - set(numeric_features)
    - set(ordinal_features_reg)
    - set(ordinal_features_oth)
    - set(drop_features)
)
categorical_features
['LandSlope', 'Heating', 'Condition1', 'Condition2', 'Alley', 'MSSubClass', 'Foundation', 'LandContour', 'MoSold', 'HouseStyle', 'SaleCondition', 'Utilities', 'Street', 'MasVnrType', 'BldgType', 'MiscFeature', 'MSZoning', 'PavedDrive', 'GarageFinish', 'Electrical', 'CentralAir', 'Neighborhood', 'RoofStyle', 'LotShape', 'GarageType', 'LotConfig', 'SaleType', 'Exterior1st', 'RoofMatl', 'Exterior2nd']
from sklearn.compose import ColumnTransformer, make_column_transformer

numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
ordinal_transformer_reg = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OrdinalEncoder(categories=ordering_ordinal_reg),
)

ordinal_transformer_oth = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OrdinalEncoder(categories=ordering_ordinal_oth),
)

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore", sparse_output=False),
)

preprocessor = make_column_transformer(
    ("drop", drop_features),
    (numeric_transformer, numeric_features),
    (ordinal_transformer_reg, ordinal_features_reg),
    (ordinal_transformer_oth, ordinal_features_oth),
    (categorical_transformer, categorical_features),
    verbose_feature_names_out=False
)
preprocessor.fit(X_train)
Loading...
new_columns = preprocessor.get_feature_names_out()
new_columns
array(['BedroomAbvGr', 'KitchenAbvGr', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'YrSold', 'ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond', 'HeatingQC', 'KitchenQual', 'FireplaceQu', 'GarageQual', 'GarageCond', 'PoolQC', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Functional', 'Fence', 'LandSlope_Gtl', 'LandSlope_Mod', 'LandSlope_Sev', 'Heating_Floor', 'Heating_GasA', 'Heating_GasW', 'Heating_Grav', 'Heating_OthW', 'Heating_Wall', 'Condition1_Artery', 'Condition1_Feedr', 'Condition1_Norm', 'Condition1_PosA', 'Condition1_PosN', 'Condition1_RRAe', 'Condition1_RRAn', 'Condition1_RRNe', 'Condition1_RRNn', 'Condition2_Artery', 'Condition2_Feedr', 'Condition2_Norm', 'Condition2_PosA', 'Condition2_PosN', 'Condition2_RRAe', 'Condition2_RRAn', 'Condition2_RRNn', 'Alley_Grvl', 'Alley_Pave', 'Alley_missing', 'MSSubClass_20', 'MSSubClass_30', 'MSSubClass_40', 'MSSubClass_45', 'MSSubClass_50', 'MSSubClass_60', 'MSSubClass_70', 'MSSubClass_75', 'MSSubClass_80', 'MSSubClass_85', 'MSSubClass_90', 'MSSubClass_120', 'MSSubClass_160', 'MSSubClass_180', 'MSSubClass_190', 'Foundation_BrkTil', 'Foundation_CBlock', 'Foundation_PConc', 'Foundation_Slab', 'Foundation_Stone', 'Foundation_Wood', 'LandContour_Bnk', 'LandContour_HLS', 'LandContour_Low', 'LandContour_Lvl', 'MoSold_1', 'MoSold_2', 'MoSold_3', 'MoSold_4', 'MoSold_5', 'MoSold_6', 'MoSold_7', 'MoSold_8', 'MoSold_9', 'MoSold_10', 'MoSold_11', 'MoSold_12', 'HouseStyle_1.5Fin', 'HouseStyle_1.5Unf', 'HouseStyle_1Story', 'HouseStyle_2.5Fin', 'HouseStyle_2.5Unf', 'HouseStyle_2Story', 'HouseStyle_SFoyer', 'HouseStyle_SLvl', 'SaleCondition_Abnorml', 'SaleCondition_AdjLand', 'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal', 'SaleCondition_Partial', 'Utilities_AllPub', 'Utilities_NoSeWa', 'Street_Grvl', 'Street_Pave', 'MasVnrType_BrkCmn', 'MasVnrType_BrkFace', 'MasVnrType_Stone', 'MasVnrType_missing', 'BldgType_1Fam', 'BldgType_2fmCon', 'BldgType_Duplex', 'BldgType_Twnhs', 'BldgType_TwnhsE', 'MiscFeature_Gar2', 'MiscFeature_Othr', 'MiscFeature_Shed', 'MiscFeature_TenC', 'MiscFeature_missing', 'MSZoning_C (all)', 'MSZoning_FV', 'MSZoning_RH', 'MSZoning_RL', 'MSZoning_RM', 'PavedDrive_N', 'PavedDrive_P', 'PavedDrive_Y', 'GarageFinish_Fin', 'GarageFinish_RFn', 'GarageFinish_Unf', 'GarageFinish_missing', 'Electrical_FuseA', 'Electrical_FuseF', 'Electrical_FuseP', 'Electrical_Mix', 'Electrical_SBrkr', 'Electrical_missing', 'CentralAir_N', 'CentralAir_Y', 'Neighborhood_Blmngtn', 'Neighborhood_Blueste', 'Neighborhood_BrDale', 'Neighborhood_BrkSide', 'Neighborhood_ClearCr', 'Neighborhood_CollgCr', 'Neighborhood_Crawfor', 'Neighborhood_Edwards', 'Neighborhood_Gilbert', 'Neighborhood_IDOTRR', 'Neighborhood_MeadowV', 'Neighborhood_Mitchel', 'Neighborhood_NAmes', 'Neighborhood_NPkVill', 'Neighborhood_NWAmes', 'Neighborhood_NoRidge', 'Neighborhood_NridgHt', 'Neighborhood_OldTown', 'Neighborhood_SWISU', 'Neighborhood_Sawyer', 'Neighborhood_SawyerW', 'Neighborhood_Somerst', 'Neighborhood_StoneBr', 'Neighborhood_Timber', 'Neighborhood_Veenker', 'RoofStyle_Flat', 'RoofStyle_Gable', 'RoofStyle_Gambrel', 'RoofStyle_Hip', 'RoofStyle_Mansard', 'RoofStyle_Shed', 'LotShape_IR1', 'LotShape_IR2', 'LotShape_IR3', 'LotShape_Reg', 'GarageType_2Types', 'GarageType_Attchd', 'GarageType_Basment', 'GarageType_BuiltIn', 'GarageType_CarPort', 'GarageType_Detchd', 'GarageType_missing', 'LotConfig_Corner', 'LotConfig_CulDSac', 'LotConfig_FR2', 'LotConfig_FR3', 'LotConfig_Inside', 'SaleType_COD', 'SaleType_CWD', 'SaleType_Con', 'SaleType_ConLD', 'SaleType_ConLI', 'SaleType_ConLw', 'SaleType_New', 'SaleType_Oth', 'SaleType_WD', 'Exterior1st_AsbShng', 'Exterior1st_AsphShn', 'Exterior1st_BrkComm', 'Exterior1st_BrkFace', 'Exterior1st_CBlock', 'Exterior1st_CemntBd', 'Exterior1st_HdBoard', 'Exterior1st_ImStucc', 'Exterior1st_MetalSd', 'Exterior1st_Plywood', 'Exterior1st_Stone', 'Exterior1st_Stucco', 'Exterior1st_VinylSd', 'Exterior1st_Wd Sdng', 'Exterior1st_WdShing', 'RoofMatl_ClyTile', 'RoofMatl_CompShg', 'RoofMatl_Membran', 'RoofMatl_Metal', 'RoofMatl_Roll', 'RoofMatl_Tar&Grv', 'RoofMatl_WdShake', 'RoofMatl_WdShngl', 'Exterior2nd_AsbShng', 'Exterior2nd_AsphShn', 'Exterior2nd_Brk Cmn', 'Exterior2nd_BrkFace', 'Exterior2nd_CBlock', 'Exterior2nd_CmentBd', 'Exterior2nd_HdBoard', 'Exterior2nd_ImStucc', 'Exterior2nd_MetalSd', 'Exterior2nd_Other', 'Exterior2nd_Plywood', 'Exterior2nd_Stone', 'Exterior2nd_Stucco', 'Exterior2nd_VinylSd', 'Exterior2nd_Wd Sdng', 'Exterior2nd_Wd Shng'], dtype=object)
X_train_enc = pd.DataFrame(
    preprocessor.transform(X_train), index=X_train.index, columns=new_columns
)
X_train_enc
Loading...
X_train_enc.shape
(1314, 262)
lr_pipe = make_pipeline(preprocessor, Ridge())
scores = cross_validate(lr_pipe, X_train, y_train, return_train_score=True)
pd.DataFrame(scores)
Loading...





Feature importances

  • How does the output depend upon the input?

  • How do the predictions change as a function of a particular feature?

  • If the model is bad interpretability does not make sense.

SimpleFeature correlations

  • Let’s look at the correlations between various features with other features and the target in our encoded data (first row/column).

  • In simple terms here is how you can interpret correlations between two variables XX and YY:

    • If YY goes up when XX goes up, we say XX and YY are positively correlated.

    • If YY goes down when XX goes up, we say XX and YY are negatively correlated.

    • If YY is unchanged when XX changes, we say XX and YY are uncorrelated.

Let’s examine the correlations among different columns, including the target column.

cor = pd.concat((y_train, X_train_enc), axis=1).iloc[:, :10].corr()
plt.figure(figsize=(8, 6))
sns.set(font_scale=0.8)
sns.heatmap(cor, annot=True, cmap=plt.cm.Blues);
<Figure size 800x600 with 2 Axes>
  • We can immediately see that SalePrice is highly correlated with OverallQual.

  • This is an early hint that OverallQual is a useful feature in predicting SalePrice.

  • However, this approach is extremely simplistic.

    • It only looks at each feature in isolation.

    • It only looks at linear associations:

      • What if SalePrice is high when BsmtFullBath is 2 or 3, but low when it’s 0, 1, or 4? They might seem uncorrelated.

cor = pd.concat((y_train, X_train_enc), axis=1).iloc[:, 10:15].corr()
plt.figure(figsize=(4, 4))
sns.set(font_scale=0.8)
sns.heatmap(cor, annot=True, cmap=plt.cm.Blues);
<Figure size 400x400 with 2 Axes>
  • Looking at this diagram also tells us the relationship between features.

    • For example, 1stFlrSF and TotalBsmtSF are highly correlated.

    • Do we need both of them?

    • If our model says 1stFlrSF is very important and TotalBsmtSF is very unimportant, do we trust those values?

    • Maybe TotalBsmtSF only “becomes important” if 1stFlrSF is removed.

    • Sometimes the opposite happens: a feature only becomes important if another feature is added.



Feature importances in linear models

  • With linear regression we can look at the coefficients for each feature.

  • Overall idea: predicted price = intercept + i\sum_i coefficient i ×\times feature i.

lr = make_pipeline(preprocessor, Ridge())
lr.fit(X_train, y_train);

Let’s look at the coefficients.

lr_coefs = pd.DataFrame(
    data=lr.named_steps["ridge"].coef_, index=new_columns, columns=["Coefficient"]
)
lr_coefs.head(20)
Loading...

Let’s try to interpret coefficients for different types of features.

Ordinal features

  • The ordinal features are easiest to interpret.

print(ordinal_features_reg)
['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond', 'HeatingQC', 'KitchenQual', 'FireplaceQu', 'GarageQual', 'GarageCond', 'PoolQC']
lr_coefs.loc["ExterQual"]
Coefficient 4236.969653 Name: ExterQual, dtype: float64
  • Increasing by one category of exterior quality (e.g. good -> excellent) increases the predicted price by $4195\sim\$4195.

    • Wow, that’s a lot!

    • Remember this is just what the model has learned. It doesn’t tell us how the world works.

one_example = X_test[:1]
one_example[["ExterQual"]]
Loading...

Let’s perturb the example and change ExterQual to Ex.

one_example_perturbed = one_example.copy()
one_example_perturbed["ExterQual"] = "Ex"  # Change Gd to Ex
one_example_perturbed[["ExterQual"]]
Loading...

How does the prediction change after changing ExterQual from Gd to Ex?

print("Prediction on the original example: ", lr.predict(one_example))
print("Prediction on the perturbed example: ", lr.predict(one_example_perturbed))
print(
    "After changing ExterQual from Gd to Ex increased the prediction by: ",
    lr.predict(one_example_perturbed) - lr.predict(one_example),
)
Prediction on the original example:  [224865.34161762]
Prediction on the perturbed example:  [229102.31127015]
After changing ExterQual from Gd to Ex increased the prediction by:  [4236.96965253]

That’s exactly the learned coefficient for ExterQual!

lr_coefs.loc["ExterQual"]
Coefficient 4236.969653 Name: ExterQual, dtype: float64

So our interpretation is correct!

  • Increasing by one category of exterior quality (e.g. good -> excellent) increases the predicted price by $4195\sim\$4195.



Categorical features

  • What about the categorical features?

  • We have created a number of columns for each category with OHE and each category gets it’s own coefficient.

print(categorical_features)
['LandSlope', 'Heating', 'Condition1', 'Condition2', 'Alley', 'MSSubClass', 'Foundation', 'LandContour', 'MoSold', 'HouseStyle', 'SaleCondition', 'Utilities', 'Street', 'MasVnrType', 'BldgType', 'MiscFeature', 'MSZoning', 'PavedDrive', 'GarageFinish', 'Electrical', 'CentralAir', 'Neighborhood', 'RoofStyle', 'LotShape', 'GarageType', 'LotConfig', 'SaleType', 'Exterior1st', 'RoofMatl', 'Exterior2nd']
lr_coefs_landslope = lr_coefs[lr_coefs.index.str.startswith("LandSlope")]
lr_coefs_landslope
Loading...
  • We can talk about switching from one of these categories to another by picking a “reference” category:

lr_coefs_landslope - lr_coefs_landslope.loc["LandSlope_Gtl"]
Loading...
  • If you change the category from LandSlope_Gtl to LandSlope_Mod the prediction price goes up by $6963\sim\$6963

  • If you change the category from LandSlope_Gtl to LandSlope_Sev the prediction price goes down by $8334\sim\$8334

Note that this might not make sense in the real world but this is what our model decided to learn given this small amount of data.

one_example = X_test[:1]
one_example[['LandSlope']]
Loading...

Let’s perturb the example and change LandSlope to Mod.

one_example_perturbed = one_example.copy()
one_example_perturbed["LandSlope"] = "Mod"  # Change Gd to Ex
one_example_perturbed[["LandSlope"]]
Loading...

How does the prediction change after changing LandSlope from Gtl to Mod?

print("Prediction on the original example: ", lr.predict(one_example))
print("Prediction on the perturbed example: ", lr.predict(one_example_perturbed))
print(
    "After changing ExterQual from Gd to Ex increased the prediction by: ",
    lr.predict(one_example_perturbed) - lr.predict(one_example),
)
Prediction on the original example:  [224865.34161762]
Prediction on the perturbed example:  [231815.62688064]
After changing ExterQual from Gd to Ex increased the prediction by:  [6950.28526302]

Our interpretation above is correct!

lr_coefs.sort_values(by="Coefficient")
Loading...
  • For example, the above coefficient says that “If the roof is made of clay or tile, the predicted price is \$191K less”?

  • Do we believe these interpretations??

    • Do we believe this is how the predictions are being computed? Yes.

    • Do we believe that this is how the world works? No.



Interpreting coefficients of numeric features

Let’s look at coefficients of PoolArea, LotFrontage, LotArea.

lr_coefs.loc[["PoolArea", "LotFrontage", "LotArea"]]
Loading...

Intuition:

  • Tricky because numeric features are scaled!

  • Increasing PoolArea by 1 scaled unit increases the predicted price by $2822\sim\$2822.

  • Increasing LotArea by 1 scaled unit increases the predicted price by $5109\sim\$5109.

  • Increasing LotFrontage by 1 scaled unit decreases the predicted price by $1578\sim\$1578.

Does that sound reasonable?

  • For PoolArea and LotArea, yes.

  • For LotFrontage, that’s surprising. Something positive would have made more sense?

It might be the case that LotFrontage is correlated with some other variable, which might have a larger positive coefficient.

BTW, let’s make sure the predictions behave as expected:

Example showing how can we interpret coefficients of scaled features.

  • What’s one scaled unit for LotArea?

  • The scaler subtracted the mean and divided by the standard deviation.

  • The division actually changed the scale!

  • For the unit conversion, we don’t care about the subtraction, but only the scaling.

scaler = preprocessor.named_transformers_["pipeline-1"]["standardscaler"]
lr_scales = pd.DataFrame(
    data=np.sqrt(scaler.var_), index=numeric_features, columns=["Scale"]
)
lr_scales.head()
Loading...
  • It seems like LotArea was divided by 8994.471032 sqft.

lr_coefs.loc[["LotArea"]]
Loading...
  • The coefficient tells us that if we increase the scaled LotArea by one scaled unit the price would go up by $5118\approx\$5118.

  • One scaled unit represents 8994\sim 8994 sqft in the original scale.

Let’s examine whether this behaves as expected.

X_test_enc = pd.DataFrame(
    preprocessor.transform(X_test), index=X_test.index, columns=new_columns
)
one_ex_preprocessed = X_test_enc[:1]
one_ex_preprocessed
Loading...
orig_pred = lr.named_steps["ridge"].predict(one_ex_preprocessed)
orig_pred
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2742: UserWarning: X has feature names, but Ridge was fitted without feature names
  warnings.warn(
array([224865.34161762])
one_ex_preprocessed_perturbed = one_ex_preprocessed.copy()
one_ex_preprocessed_perturbed["LotArea"] += 1  # we are adding one to the scaled LotArea
one_ex_preprocessed_perturbed
Loading...

We are expecting an increase of $5118.03516073 in the prediction compared to the original value of LotArea.

perturbed_pred = lr.named_steps["ridge"].predict(one_ex_preprocessed_perturbed)
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2742: UserWarning: X has feature names, but Ridge was fitted without feature names
  warnings.warn(
perturbed_pred - orig_pred
array([5118.03516073])

Our interpretation is correct!

  • Humans find it easier to think about features in their original scale.

  • How can we interpret this coefficient in the original scale?

  • If I increase original LotArea by one square foot then the predicted price would go up by this amount:

5118.03516073 / 8994.471032 # Coefficient learned on the scaled features / the scaling factor for this feature
0.5690201394302518
one_example = X_test[:1]
one_example
Loading...

Let’s perturb the example and add 1 to the LotArea.

one_example_perturbed = one_example.copy()
one_example_perturbed["LotArea"] += 1

if we add 8994.471032 to the original LotArea, the housing price prediction should go up by the coefficient 5109.35671794.

one_example_perturbed
Loading...

Prediction on the original example.

lr.predict(one_example)
array([224865.34161762])

Prediction on the perturbed example.

lr.predict(one_example_perturbed)
array([224865.91063776])
  • What’s the difference between predictions?

  • Does the difference make sense given the coefficient of the feature?

lr.predict(one_example_perturbed) - lr.predict(one_example)
array([0.56902014])

Yes! Our interpretation is correct.

  • That said don’t read too much into these coefficients without statistical training.

Interim summary

  • Correlation among features might make coefficients completely uninterpretable.

  • Fairly straightforward to interpret coefficients of ordinal features.

  • In categorical features, it’s often helpful to consider one category as a reference point and think about relative importance.

  • For numeric features, relative importance is meaningful after scaling.

  • You have to be careful about the scale of the feature when interpreting the coefficients.

  • Remember that explaining the model \neq explaining the data or explaining how the world works.

  • The coefficients tell us only about the model and they might not accurately reflect the data.





Transparency and explainability of ML models: Motivation

Activity (~5 mins)

Suppose you have a machine learning model which gives you a 98% cross-validation score (with the metric of your interest) and 97% test score on a reasonably sized train and test sets. Since you have impressive cross-validation and test scores, you decide to just trust the model and use it as a black box, ignoring why it’s making certain predictions.

Give some scenarios when this might or might not be problematic.





Why model transparency/interpretability?

  • Ability to interpret ML models is crucial in many applications such as banking, healthcare, and criminal justice.

  • It can be leveraged by domain experts to diagnose systematic errors and underlying biases of complex ML systems.

Source

What is model interpretability?



Data

adult_df_large = pd.read_csv(DATA_DIR + "adult.csv")
train_df, test_df = train_test_split(adult_df_large, test_size=0.2, random_state=42)
train_df_nan = train_df.replace("?", np.nan)
test_df_nan = test_df.replace("?", np.nan)
train_df_nan.head()
Loading...
numeric_features = ["age", "capital.gain", "capital.loss", "hours.per.week"]
categorical_features = [
    "workclass",
    "marital.status",
    "occupation",
    "relationship",
    "native.country",
]
ordinal_features = ["education"]
binary_features = ["sex"]
drop_features = ["race", "education.num", "fnlwgt"]
target_column = "income"
education_levels = [
    "Preschool",
    "1st-4th",
    "5th-6th",
    "7th-8th",
    "9th",
    "10th",
    "11th",
    "12th",
    "HS-grad",
    "Prof-school",
    "Assoc-voc",
    "Assoc-acdm",
    "Some-college",
    "Bachelors",
    "Masters",
    "Doctorate",
]
assert set(education_levels) == set(train_df["education"].unique())
numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
tree_numeric_transformer = make_pipeline(SimpleImputer(strategy="median"))

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

ordinal_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OrdinalEncoder(categories=[education_levels], dtype=int),
)

binary_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(drop="if_binary", dtype=int),
)

preprocessor = make_column_transformer(
    ("drop", drop_features),
    (numeric_transformer, numeric_features),
    (ordinal_transformer, ordinal_features),
    (binary_transformer, binary_features),
    (categorical_transformer, categorical_features),
    verbose_feature_names_out=False
)
X_train = train_df_nan.drop(columns=[target_column])
y_train = train_df_nan[target_column]

X_test = test_df_nan.drop(columns=[target_column])
y_test = test_df_nan[target_column]
# encode categorical class values as integers for XGBoost
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y_train_num = label_encoder.fit_transform(y_train)
y_test_num = label_encoder.transform(y_test)

Do we have class imbalance?

  • There is class imbalance. But without any context, both classes seem equally important.

  • Let’s use accuracy as our metric.

train_df_nan["income"].value_counts(normalize=True)
income <=50K 0.757985 >50K 0.242015 Name: proportion, dtype: float64
scoring_metric = "accuracy"

Let’s store all the results in a dictionary called results.

results = {}

We are going to use models outside sklearn. Some of them cannot handle categorical target values. So we’ll convert them to integers using LabelEncoder.

y_train_num
array([0, 0, 0, ..., 1, 1, 0], shape=(26048,))

Baseline

dummy = DummyClassifier()
results["Dummy"] = mean_std_cross_val_scores(
    dummy, X_train, y_train_num, return_train_score=True, scoring=scoring_metric
)

Different models

from lightgbm.sklearn import LGBMClassifier
from xgboost import XGBClassifier

pipe_lr = make_pipeline(
    preprocessor, LogisticRegression(max_iter=2000, random_state=123)
)
pipe_rf = make_pipeline(preprocessor, RandomForestClassifier(random_state=123))
pipe_xgb = make_pipeline(
    preprocessor, XGBClassifier(random_state=123, eval_metric="logloss", verbosity=0)
)
pipe_lgbm = make_pipeline(preprocessor, LGBMClassifier(random_state=123, verbose=-1))
classifiers = {
    "logistic regression": pipe_lr,
    "random forest": pipe_rf,
    "XGBoost": pipe_xgb,
    "LightGBM": pipe_lgbm,
}
for (name, model) in classifiers.items():
    results[name] = mean_std_cross_val_scores(
        model, X_train, y_train_num, return_train_score=True, scoring=scoring_metric
    )
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
/Users/kvarada/miniforge3/envs/cpsc330/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LGBMClassifier was fitted with feature names
  warnings.warn(
pd.DataFrame(results).T
Loading...
  • Logistic regression is giving reasonable scores but not the best ones.

  • XGBoost and LightGBM are giving us the best CV scores.

  • Often simple models (e.g., linear models) are interpretable but not very accurate.

  • Complex models (e.g., LightGBM) are more accurate but less interpretable.

Source

Feature importances in linear models

Let’s create and fit a pipeline with preprocessor and logistic regression.

pipe_lr = make_pipeline(preprocessor, LogisticRegression(max_iter=2000, random_state=2))
pipe_lr.fit(X_train, y_train_num);
feature_names = preprocessor.get_feature_names_out()
feature_names[:15]
array(['age', 'capital.gain', 'capital.loss', 'hours.per.week', 'education', 'sex_Male', 'workclass_Federal-gov', 'workclass_Local-gov', 'workclass_Never-worked', 'workclass_Private', 'workclass_Self-emp-inc', 'workclass_Self-emp-not-inc', 'workclass_State-gov', 'workclass_Without-pay', 'workclass_missing'], dtype=object)
feature_names = preprocessor.get_feature_names_out()
data = {
    "coefficient": pipe_lr.named_steps["logisticregression"].coef_.flatten().tolist(),
    "magnitude": np.absolute(
        pipe_lr.named_steps["logisticregression"].coef_.flatten().tolist()
    ),
}
coef_df = pd.DataFrame(data, index=feature_names).sort_values(
    "magnitude", ascending=False
)
coef_df[:10]
Loading...
  • Increasing capital.gain is likely to push the prediction towards “>50k” income class

  • Whereas occupation of private house service is likely to push the prediction towards “<=50K” income.

Can we get feature importances for non-linear models?



Model interpretability beyond linear models

  • We will be looking at interpretability in terms of feature importances.

  • Note that there is no absolute or perfect way to get feature importances. But it’s useful to get some idea on feature importances. So we just try our best.

We will be looking at two ways to get feature importances.

  • sklearn’s feature_importances_ and permutation_importance

  • SHAP

sklearn’s feature_importances_ and permutation_importance

Feature importance or variable importance is a score associated with a feature which tells us how “important” the feature is to the model.

Activity (~5 mins)

Linear models learn a coefficient associated with each feature which tells us the importance of the feature to the model.

  • What might be some reasonable ways to calculate feature importances of the following models?

    • Decision trees

    • Linear SVMs

    • KNNs, RBF SVMs

  • Suppose you have correlated features in your dataset. Do you need to be careful about this when you examine feature importances?

Discuss with your neighbour.





Do we have correlated features?

X_train_enc = preprocessor.fit_transform(X_train).todense()
corr_df = pd.DataFrame(X_train_enc, columns=feature_names).corr().abs()
corr_df[corr_df == 1] = 0 # Set the diagonal to 0. 
  • Let’s look at columns where any correlation number is > 0.80.

  • 0.80 is an arbitrary choice

high_corr = [column for column in corr_df.columns if any(corr_df[column] > 0.80)]
print(high_corr)
['workclass_missing', 'marital.status_Married-civ-spouse', 'occupation_missing', 'relationship_Husband']

Seems like there are some columns which are highly correlated.

corr_df['occupation_missing']['workclass_missing']
np.float64(0.9977957422135844)
corr_df['marital.status_Married-civ-spouse']['relationship_Husband']
np.float64(0.8937442459553657)
  • When we look at the feature importances, we should be mindful of these correlated features.

  • Remember the limitations of looking at simple linear correlations.

  • You should probably investigate multi-colinearity with more sophisticated approaches (e.g., variance inflation factors (VIF)).

sklearn’s feature_importances_ attribute vs permutation_importance

  • Feature importances can be

    • algorithm dependent, i.e., calculated based on the information given by the model algorithm (e.g., gini importance)

    • model agnostic (e.g., by measuring increase in prediction error after permuting feature values).

  • Different measures give insight into different aspects of your data and model.

Here you will find some drawbacks of using feature_importances_ attribute in the context of tree-based models.

Decision tree feature importances

pipe_dt = make_pipeline(preprocessor, DecisionTreeClassifier(max_depth=3))
pipe_dt.fit(X_train, y_train_num);
data = {
    "Importance": pipe_dt.named_steps["decisiontreeclassifier"].feature_importances_,
}
pd.DataFrame(data=data, index=feature_names,).sort_values(
    by="Importance", ascending=False
)[:10]
Loading...
custom_plot_tree(pipe_dt.named_steps["decisiontreeclassifier"], feature_names = feature_names, fontsize=8)
<Figure size 640x480 with 1 Axes>

Let’s explore permutation importance.

  • For each feature this method evaluates the impact of permuting feature values

from sklearn.inspection import permutation_importance
def get_permutation_importance(model):
    X_train_perm = X_train.drop(columns=["race", "education.num", "fnlwgt"])
    result = permutation_importance(model, X_train_perm, y_train_num, n_repeats=10, random_state=123)
    perm_sorted_idx = result.importances_mean.argsort()
    plt.boxplot(
        result.importances[perm_sorted_idx].T,
        vert=False,
        labels=X_train_perm.columns[perm_sorted_idx],
    )
    plt.xlabel('Permutation feature importance')
    plt.show()
get_permutation_importance(pipe_dt)
/var/folders/b3/g26r0dcx4b35vf3nk31216hc0000gr/T/ipykernel_27559/3038344082.py:6: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
  plt.boxplot(
<Figure size 640x480 with 1 Axes>

Decision tree is primarily making all decisions based on three features: marital.status, education, and capital.gain.

Let’s create and fit a pipeline with preprocessor and random forest.

Random forest feature importances

pipe_rf = make_pipeline(preprocessor, RandomForestClassifier(random_state=2))
pipe_rf.fit(X_train, y_train_num);

Which features are driving the predictions the most?

data = {
    "Importance": pipe_rf.named_steps["randomforestclassifier"].feature_importances_,
}
rf_imp_df = pd.DataFrame(
    data=data,
    index=feature_names,
).sort_values(by="Importance", ascending=False)
rf_imp_df[:8]
Loading...
np.sum(pipe_rf.named_steps["randomforestclassifier"].feature_importances_)
np.float64(1.0)
get_permutation_importance(pipe_rf)
/var/folders/b3/g26r0dcx4b35vf3nk31216hc0000gr/T/ipykernel_27559/3038344082.py:6: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
  plt.boxplot(
<Figure size 640x480 with 1 Axes>

Random forest is using more features in the model compared to decision trees.

Key point

  • Unlike the linear model coefficients, feature_importances_ do not have a sign!

    • They tell us about importance, but not an “up or down”.

    • Indeed, increasing a feature may cause the prediction to first go up, and then go down.

    • This cannot happen in linear models, because they are linear.

How can we get feature importances for non sklearn models?

  • One way to do it is by using a tool called eli5.

Unfortunately, this is not compatible with the latest version of sklearn, which we are using.

conda install -c conda-forge eli5
  • Another popular way is using SHAP. You can install it using the following in the course conda environment.

conda install -c conda-forge shap





SHAP (SHapley Additive exPlanations) introduction

Explaining a prediction

Source

SHAP (SHapley Additive exPlanations) intuition

  • Based on a concept from game theory called Shapley values.

  • Imagine you and your friends are playing a team game where everyone contributes differently to the win. After the game, you want to figure out who contributed the most.

  • Shapley values help distribute the total win fairly based on each player’s contribution.

In the context of machine learning

  • SHAP assigns each feature a Shapley value which tells us how much that feature pushed the model’s output higher or lower.

  • A shapely value is created for each example and each feature.

  • Can explain the prediction of an example by computing the contribution of each feature to the prediction.

Our focus

Using the SHAP package

  • You should have shap in the course conda environment. If not, install it in the course conda environment.

pip install shap

or

conda install -c conda-forge shap

X_train
Loading...
  • Let’s try it out to explain predictions from our best performing LGBM model.

  • Let’s create train and test dataframes with our transformed features.

X_train_enc = pd.DataFrame(
    data=preprocessor.transform(X_train).toarray(),
    columns=feature_names,
    index=X_train.index,
)
X_train_enc.head()
Loading...
X_test_enc = pd.DataFrame(
    data=preprocessor.transform(X_test).toarray(),
    columns=feature_names,
    index=X_test.index,
)
X_test_enc.shape
(6513, 85)

Let’s train the model with the encoded data.

model = pipe_lgbm.named_steps["lgbmclassifier"]
model.fit(X_train_enc, y_train)
Loading...

Creating SHAP values

  • Let’s get SHAP values for train and test data.

import shap 

explainer = shap.TreeExplainer(model) # define the shap explainer 
train_shap_values = explainer(X_train_enc) # train shap values
test_shap_values = explainer(X_test_enc) # test shap values 
train_shap_values.shape
(26048, 85)
test_shap_values.shape
(6513, 85)
train_shap_values
.values = array([[-4.08151507e-01, -2.82025568e-01, -4.70162085e-02, ..., 1.03017665e-03, 0.00000000e+00, 1.69027185e-03], [-5.46019608e-01, -2.77536150e-01, -4.69698010e-02, ..., 9.00720988e-04, 0.00000000e+00, 6.78058051e-04], [ 4.39095422e-01, -2.50475372e-01, -6.51137414e-02, ..., 9.02446630e-04, 0.00000000e+00, 3.54676006e-04], ..., [ 1.05137470e+00, -1.89706451e-01, 2.74798624e+00, ..., 1.13229595e-03, 0.00000000e+00, 1.31449687e-04], [ 6.32247597e-01, -3.01432486e-01, -8.99744241e-02, ..., 1.03411038e-03, 0.00000000e+00, -4.04709519e-04], [-1.15559528e+00, -2.32397724e-01, -5.55862988e-02, ..., 1.05290827e-03, 0.00000000e+00, 8.11336331e-04]], shape=(26048, 85)) .base_values = array([-2.33641142, -2.33641142, -2.33641142, ..., -2.33641142, -2.33641142, -2.33641142], shape=(26048,)) .data = array([[-0.92195464, -0.14716638, -0.21767954, ..., 0. , 0. , 0. ], [-1.06915047, -0.14716638, -0.21767954, ..., 0. , 0. , 0. ], [-0.18597545, -0.14716638, -0.21767954, ..., 0. , 0. , 0. ], ..., [ 1.212385 , -0.14716638, 4.42104086, ..., 0. , 0. , 0. ], [ 0.18201414, -0.14716638, -0.21767954, ..., 0. , 0. , 0. ], [-1.21634631, -0.14716638, -0.21767954, ..., 0. , 0. , 0. ]], shape=(26048, 85))
  • For each example and each feature, we have a SHAP value.

  • SHAP values tell us how to fairly distribute the prediction among features.



SHAP plots

SHAP waterfall plot

# load JS visualization code to notebook
shap.initjs()
Loading...
  • Designed to display explainations for individual predictions.

  • Let’s explain the LGBM’s predictions for two examples from the test set

    • Example 1: index 10 (<=50K)

    • Example 2: index 68 (>50K)

ex1_idx = 10
ex2_idx = 68

Example 1

X_test_enc.iloc[ex1_idx]
age 0.476406 capital.gain -0.147166 capital.loss 4.649658 hours.per.week -0.042081 education 8.000000 ... native.country_Trinadad&Tobago 0.000000 native.country_United-States 1.000000 native.country_Vietnam 0.000000 native.country_Yugoslavia 0.000000 native.country_missing 0.000000 Name: 345, Length: 85, dtype: float64
y_test.iloc[ex1_idx]
'<=50K'
model.predict(X_test_enc)[ex1_idx]
'<=50K'
model.predict_proba(X_test_enc)[ex1_idx]
array([0.99240562, 0.00759438])
  • The model seems quite confident about the prediction.

  • If we want to know more, for example, which feature values are playing a role in this specific prediction, we can use SHAP plots.

  • Remember that we have SHAP values per feature per example.

shap.plots.waterfall(test_shap_values[ex1_idx])
<Figure size 800x650 with 3 Axes>
  • E[f(x)]E[f(x)] is the average predicted value for all examples in our training set.

    • proportion in the context of classification

    • average in the context of regression

  • f(x)f(x) is the model prediction for this specific example.

  • SHAP values tell us how each feature contributes to the prediction when compared to the average prediction.

  • The numbers on the y-axis are feature values.

  • The 76 least impactful features have been collapsed into a single term so that we don’t have too many rows. This can be changed using the max_display argument

model.classes_
array(['<=50K', '>50K'], dtype=object)

We can get the raw model output by passing raw_score=True in predict.

model.predict(X_test_enc, raw_score=True)
array([-1.76270194, -7.61912405, -0.45555535, ..., 1.13521135, -6.62873917, -0.84062193], shape=(6513,))

What’s the raw score of the example above we are trying to explain?

model.predict(X_test_enc, raw_score=True)[ex1_idx]
np.float64(-4.872722908439952)
  • The score matches with what we see in the force plot.

  • The base score above is the mean raw score. Our example has a lower raw score compared to the average raw score and the force plot tries to explain which feature values are bringing this score to a lower value.

model.predict(X_train_enc, raw_score=True).mean()
np.float64(-2.336411423367732)
explainer.expected_value  # on average this is the raw score for class 1
np.float64(-2.3364114233677307)

Note: a nice thing about SHAP values is that the feature importances sum to the prediction:

test_shap_values.values[ex1_idx, :].sum() + explainer.expected_value 
np.float64(-4.8727229084399575)



Example 2

X_test_enc.iloc[ex2_idx]
age 0.402808 capital.gain -0.147166 capital.loss -0.217680 hours.per.week -0.690778 education 9.000000 ... native.country_Trinadad&Tobago 0.000000 native.country_United-States 1.000000 native.country_Vietnam 0.000000 native.country_Yugoslavia 0.000000 native.country_missing 0.000000 Name: 23011, Length: 85, dtype: float64
y_test.iloc[ex2_idx]
'>50K'
model.predict(X_test_enc)[ex2_idx]
'>50K'
shap.plots.waterfall(test_shap_values[ex2_idx])
<Figure size 800x650 with 3 Axes>

The predicted f(x)f(x) is bigger than the expected f(x)f(x).

SHAP Force plot

We can also display explanations of individual predictions using force plots.

shap.force_plot(explainer.expected_value, test_shap_values.values[ex1_idx, :], X_test_enc.iloc[ex1_idx, :])
Loading...
shap.force_plot(explainer.expected_value, test_shap_values.values[ex2_idx, :], X_test_enc.iloc[ex2_idx, :])
Loading...

SHAP: visualizing multiple predictions

  • We can also visualize multiple predictions

  • Let’s examine predictions of the first 4 examples from the training data.

shap.force_plot(explainer.expected_value, train_shap_values.values[:4, :], X_train_enc.iloc[:4, :])
Loading...
  • Let’s visualize predictions for 1,000 individuals.

shap.force_plot(explainer.expected_value, train_shap_values.values[:1000, :], X_train_enc.iloc[:1000, :])
Loading...



SHAP summary plot

Let’s look at the average SHAP values associated with each feature.

From the documentation:

Rather than use a typical feature importance bar chart, we use a density scatter plot of SHAP values for each feature to identify how much impact each feature has on the model output for individuals in the validation dataset. Features are sorted by the sum of the SHAP value magnitudes across all samples.

shap.summary_plot(train_shap_values, X_train_enc)
<Figure size 800x950 with 2 Axes>

The plot shows the most important features for predicting the class. It also shows the direction of how it’s going to drive the prediction.

  • Presence of the marital status of Married-civ-spouse seems to have bigger SHAP values for class 1 and absence seems to have smaller SHAP values for class 1.

  • Higher levels of education seem to have bigger SHAP values for class 1 whereas smaller levels of education have smaller SHAP values.

We can also plot the summary of shap values as a bar chart.

shap.summary_plot(train_shap_values, X_train_enc, plot_type="bar")
<Figure size 800x950 with 1 Axes>

SHAP Dependence plot

From the documentation:

SHAP dependence plots show the effect of a single feature across the whole dataset. They plot a feature’s value vs. the SHAP value of that feature across many samples.

shap.dependence_plot("age", train_shap_values.values, X_train_enc)
<Figure size 750x500 with 2 Axes>

The plot above shows effect of age feature on the prediction.

  • Each dot is a single prediction for examples above.

  • The x-axis represents values of the feature age (scaled).

  • The y-axis is the SHAP value for that feature, which represents how much knowing that feature’s value changes the output of the model for that example’s prediction.

  • Lower values of age have smaller SHAP values for class “>50K”.

  • Similarly, higher values of age also have a bit smaller SHAP values for class “>50K”, which makes sense.

  • There is some optimal value of age between scaled age of 1 which gives highest SHAP values for for class “>50K”.

  • Ignore the colour for now. The color corresponds to a second feature (education feature in this case) that may have an interaction effect with the feature we are plotting.



Here, we explore SHAP’s TreeExplainer. It also provides explainer for different kinds of models.

  • Can also be used to explain text classification and image classification

  • Example: In the picture below, red pixels represent positive SHAP values that increase the probability of the class, while blue pixels represent negative SHAP values the reduce the probability of the class.

Source

Other tools

  • lime is another package.

If you’re not already impressed, keep in mind:

  • So far we’ve only used sklearn models.

  • Most sklearn models have some built-in measure of feature importances.

  • On many tasks we need to move beyond sklearn, e.g. LightGBM, deep learning.

  • These tools work on other models as well, which makes them extremely useful.

Why do we want this information?

Possible reasons:

  • Identify features that are not useful and maybe remove them.

  • Get guidance on what new data to collect.

    • New features related to useful features -> better results.

    • Don’t bother collecting useless features -> save resources.

  • Help explain why the model is making certain predictions.

    • Debugging, if the model is behaving strangely.

    • Regulatory requirements.

    • Fairness / bias. See this.

    • Keep in mind this can be used on deployment predictions!

Here are some guidelines and important points to remember when you work on a prediction problem where you also want to understand which features are influencing the predictions.

  • Examine multicoliniarity in your dataset using methods such as VIF.

  • If you observe high correlations in your dataset, either get rid of redundant features or be mindful of these correlations during interpretation.

  • Be mindful that feature relevance is not clearly defined. Adding/removing features can change feature importance/unimportance. Also, feature importances do not give us causal relationships.

  • Don’t be overconfident. Always take feature importance values with a grain of salt.