partialDependence

Compute partial dependence

Description

pd = partialDependence(Mdl,Vars) computes the partial dependence pd between the predictor variables listed in Vars and the responses predicted by using the regression model Mdl, which contains predictor data.

pd = partialDependence(Mdl,Vars,Labels) computes the partial dependence pd between the predictor variables listed in Vars and the scores for the classes specified in Labels by using the classification model Mdl, which contains predictor data.

pd = partialDependence(___,Data) uses new predictor data in Data. You can specify Data in addition to any of the input argument combinations in the previous syntaxes.

pd = partialDependence(___,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, if you specify 'UseParallel',true, the partialDependence function uses parallel computing to perform the partial dependence calculations.

[pd,x,y] = partialDependence(___) also returns x and y, which contain the query points of the first and second predictor variables in Vars, respectively. If you specify one variable in Vars, then partialDependence returns an empty matrix ([]) for y.
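
For example, a minimal sketch (assuming Mdl is a full regression model whose first predictor variable is numeric):

[pd,x,y] = partialDependence(Mdl,1); % one variable in Vars
isempty(y)                           % ans = logical 1, because y is []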

Examples

Train a naive Bayes classification model with the fisheriris data set, and compute partial dependence values that show the relationship between the predictor variable and the predicted scores (posterior probabilities) for multiple classes.

Load the fisheriris data set, which contains species (species) and measurements (meas) on sepal length, sepal width, petal length, and petal width for 150 iris specimens. The data set contains 50 specimens from each of three species: setosa, versicolor, and virginica.

load fisheriris

Train a naive Bayes classification model with species as the response and meas as predictors.

Mdl = fitcnb(meas,species,'PredictorNames',["Sepal Length","Sepal Width","Petal Length","Petal Width"]);

Compute the partial dependence of the scores predicted by Mdl on the third predictor variable (petal length) for all three classes of species. Specify the class labels by using the ClassNames property of Mdl.

[pd,x] = partialDependence(Mdl,3,Mdl.ClassNames);

pd contains the partial dependence values for the query points x. You can plot the computed partial dependence values by using plotting functions such as plot and bar. Plot pd against x by using the bar function.

bar(x,pd)
legend(Mdl.ClassNames)
xlabel("Petal Length")
ylabel("Scores")
title("Partial Dependence Plot")

According to this model, the probability of virginica increases with petal length. The probability of setosa is about 0.33 while petal length is between 0 and roughly 2.5, and then drops to almost 0.

Alternatively, you can use the plotPartialDependence function to compute and plot partial dependence values.

plotPartialDependence(Mdl,3,Mdl.ClassNames)

Train an ensemble of classification models and compute partial dependence values on two variables for multiple classes. Then plot the partial dependence values for each class.

Load the census1994 data set, which contains US yearly salary data, categorized as <=50K or >50K, and several demographic variables.

load census1994

Extract a subset of variables to analyze from the table adultdata.

X = adultdata(1:500,{'age','workClass','education_num','marital_status','race', ...
   'sex','capital_gain','capital_loss','hours_per_week','salary'});

Train a random forest of classification trees by using fitcensemble and specifying 'Method' as 'Bag'. For reproducibility, use a template of trees created by using templateTree with the 'Reproducible' option.

rng('default')
t = templateTree('Reproducible',true);
Mdl = fitcensemble(X,'salary','Method','Bag','Learners',t);

Inspect the class names in Mdl.

Mdl.ClassNames
ans = 2×1 categorical
     <=50K 
     >50K 

Compute partial dependence values of the scores on the predictors age and education_num for both classes (<=50K and >50K). Specify the number of observations to sample as 100.

[pd,x,y] = partialDependence(Mdl,{'age','education_num'},Mdl.ClassNames,'NumObservationsToSample',100);

Create a surface plot of the partial dependence values for the first class (<=50K) by using the surf function.

figure
surf(x,y,squeeze(pd(1,:,:)))
xlabel('age')
ylabel('education\_num')
zlabel('Score of class <=50K')
title('Partial Dependence Plot')
view([130 30]) % Modify the viewing angle

Create a surface plot of the partial dependence values for the second class (>50K).

figure
surf(x,y,squeeze(pd(2,:,:)))
xlabel('age')
ylabel('education\_num')
zlabel('Score of class >50K')
title('Partial Dependence Plot')
view([130 30]) % Modify the viewing angle

The two plots show different partial dependence patterns depending on the class.

Train a support vector machine (SVM) regression model using the carsmall data set, and compute the partial dependence on two predictor variables. Then, create a figure that shows the partial dependence on the two variables along with the histogram on each variable.

Load the carsmall data set.

load carsmall

Create a table that contains Weight, Cylinders, Displacement, and Horsepower.

Tbl = table(Weight,Cylinders,Displacement,Horsepower);

Train an SVM regression model using the predictor variables in Tbl and the response variable MPG. Use a Gaussian kernel function with an automatic kernel scale.

Mdl = fitrsvm(Tbl,MPG,'ResponseName','MPG', ...
    'CategoricalPredictors','Cylinders','Standardize',true, ...
    'KernelFunction','gaussian','KernelScale','auto');

Compute the partial dependence of the predicted response (MPG) on the predictor variables Weight and Horsepower. Specify query points to compute the partial dependence by using the 'QueryPoints' name-value pair argument.

numPoints = 10;
ptX = linspace(min(Weight),max(Weight),numPoints)';
ptY = linspace(min(Horsepower),max(Horsepower),numPoints)';
[pd,x,y] = partialDependence(Mdl,{'Weight','Horsepower'},'QueryPoints',[ptX ptY]);

Create a figure that contains a 5-by-5 tiled chart layout. Plot the partial dependence on the two variables by using the imagesc function. Then draw the histogram for each variable by using the histogram function. Specify the edges of the histograms so that the centers of the histogram bars align with the query points. Change the axes properties to align the axes of the plots.

t = tiledlayout(5,5,'TileSpacing','compact');

ax1 = nexttile(2,[4,4]);
imagesc(x,y,pd)
title('Partial Dependence Plot')
colorbar('eastoutside')
ax1.YDir = 'normal';

ax2 = nexttile(22,[1,4]);
dX = diff(ptX(1:2));
edgeX = [ptX-dX/2;ptX(end)+dX];
histogram(Weight,edgeX);
xlabel('Weight')
xlim(ax1.XLim);

ax3 = nexttile(1,[4,1]);
dY = diff(ptY(1:2));
edgeY = [ptY-dY/2;ptY(end)+dY];
histogram(Horsepower,edgeY)
xlabel('Horsepower')
xlim(ax1.YLim);
ax3.XDir = 'reverse';
camroll(-90)

Each element of pd specifies the color for one pixel of the image plot. The histograms aligned with the axes of the image show the distribution of the predictors.

Input Arguments

Mdl — Machine learning model, specified as a full or compact regression or classification model object, as given in the following tables of supported models.

Regression Model Object

  • Bootstrap aggregation for ensemble of decision trees: TreeBagger, CompactTreeBagger

  • Ensemble of regression models: RegressionEnsemble, RegressionBaggedEnsemble, CompactRegressionEnsemble

  • Gaussian kernel regression model using random feature expansion: RegressionKernel

  • Gaussian process regression: RegressionGP, CompactRegressionGP

  • Generalized linear mixed-effect model: GeneralizedLinearMixedModel

  • Generalized linear model: GeneralizedLinearModel, CompactGeneralizedLinearModel

  • Linear mixed-effect model: LinearMixedModel

  • Linear regression: LinearModel, CompactLinearModel

  • Linear regression for high-dimensional data: RegressionLinear

  • Nonlinear regression: NonLinearModel

  • Regression tree: RegressionTree, CompactRegressionTree

  • Support vector machine regression: RegressionSVM, CompactRegressionSVM

Classification Model Object

  • Discriminant analysis classifier: ClassificationDiscriminant, CompactClassificationDiscriminant

  • Multiclass model for support vector machines or other classifiers: ClassificationECOC, CompactClassificationECOC

  • Ensemble of learners for classification: ClassificationEnsemble, CompactClassificationEnsemble, ClassificationBaggedEnsemble

  • Gaussian kernel classification model using random feature expansion: ClassificationKernel

  • k-nearest neighbor classifier: ClassificationKNN

  • Linear classification model: ClassificationLinear

  • Multiclass naive Bayes model: ClassificationNaiveBayes, CompactClassificationNaiveBayes

  • Support vector machine classifier for one-class and binary classification: ClassificationSVM, CompactClassificationSVM

  • Binary decision tree for multiclass classification: ClassificationTree, CompactClassificationTree

  • Bagged ensemble of decision trees: TreeBagger, CompactTreeBagger

If Mdl is a compact model object, you must provide the input argument Data.

partialDependence does not support a model object trained with a sparse matrix. When you train a model, use a full numeric matrix or table for predictor data where rows correspond to individual observations.

Vars — Predictor variables, specified as a vector of positive integers, character vector, string scalar, string array, or cell array of character vectors. You can specify one or two predictor variables, as shown in the following tables.

One Predictor Variable

  • A positive integer: index value corresponding to a column of the predictor data.

  • A character vector or string scalar: name of a predictor variable. The name must match the entry in Mdl.PredictorNames.

Two Predictor Variables

  • A vector of two positive integers: index values corresponding to columns of the predictor data.

  • A string array or cell array of character vectors: names of the predictor variables. Each element in the array is the name of a predictor variable, and the names must match the entries in Mdl.PredictorNames.

Example: {'x1','x3'}

Data Types: single | double | char | string | cell
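
For instance, a sketch assuming a full model trained on a table whose first predictor variable is named Weight (as in the carsmall example above), these two calls select the same variable:

pd1 = partialDependence(Mdl,1);         % select by column index
pd2 = partialDependence(Mdl,'Weight');  % select by name in Mdl.PredictorNames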

Labels — Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. The values and data types in Labels must match those of the class names in the ClassNames property of Mdl (Mdl.ClassNames).

You can specify one or multiple class labels.

This argument is valid only when Mdl is a classification model object.

Example: {'red','blue'}

Example: Mdl.ClassNames([1 3]) specifies Labels as the first and third classes in Mdl.

Data Types: single | double | logical | char | cell | categorical
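
For example, with the fisheriris naive Bayes model from the first example (a sketch), you can request scores for a subset of classes:

pd = partialDependence(Mdl,'Petal Length',Mdl.ClassNames([1 3]));
% pd contains one set of partial dependence values per requested class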

Data — Predictor data, specified as a numeric matrix or table. Each row of Data corresponds to one observation, and each column corresponds to one variable.

Data must be consistent with the predictor data that trained Mdl, stored in either Mdl.X or Mdl.Variables.

  • If you trained Mdl using a numeric matrix, then Data must be a numeric matrix, and its columns must contain the same predictor variables, in the same order, as the matrix used to train Mdl.

  • If you trained Mdl using a table (for example, Tbl), then Data must be a table. All predictor variables in Data must have the same variable names and data types as the names and types in Tbl. However, the column order of Data does not need to correspond to the column order of Tbl.

  • partialDependence does not support a sparse matrix.

If Mdl is a compact model object, you must provide Data. If Mdl is a full model object that contains predictor data and you specify this argument, then partialDependence does not use the predictor data in Mdl and uses Data only.

Data Types: single | double | table
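
For example, a sketch using the SVM model from the carsmall example above: a compact model stores no predictor data, so you must supply it.

CMdl = compact(Mdl);                       % compact version without stored data
pd = partialDependence(CMdl,'Weight',Tbl); % Tbl supplies the predictor data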

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: partialDependence(Mdl,Vars,Data,'NumObservationsToSample',100,'UseParallel',true) computes the partial dependence values by using 100 sampled observations in Data and executing for-loop iterations in parallel.

Number of observations to sample, specified as the comma-separated pair consisting of 'NumObservationsToSample' and a positive integer. The default value is the total number of observations in either Mdl or Data. If you specify a value larger than the total number of observations, then partialDependence uses all observations.

partialDependence samples observations without replacement by using the datasample function and uses the sampled observations to compute partial dependence.

Example: 'NumObservationsToSample',100

Data Types: single | double
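
The effect is roughly the same as subsampling the predictor data yourself before the call, as in this sketch using the carsmall model above (the internal sample generally differs):

idx = datasample(1:height(Tbl),50,'Replace',false); % sample 50 row indices
pd = partialDependence(Mdl,'Weight',Tbl(idx,:));    % use only the sampled rows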

Points to compute partial dependence for numeric predictors, specified as the comma-separated pair consisting of 'QueryPoints' and a numeric column vector, a numeric two-column matrix, or a cell array of two numeric column vectors.

  • If you select one predictor variable in Vars, use a numeric column vector.

  • If you select two predictor variables in Vars:

    • Use a numeric two-column matrix to specify the same number of points for each predictor variable.

    • Use a cell array of two numeric column vectors to specify a different number of points for each predictor variable.

The default value is a numeric column vector or a numeric two-column matrix, depending on the number of selected predictor variables. Each column contains 100 evenly spaced points between the minimum and maximum values of the sampled observations for the corresponding predictor variable.

You cannot modify 'QueryPoints' for a categorical variable. The partialDependence function uses all categorical values in the selected variable.

If you select one numeric variable and one categorical variable, you can specify query points for the numeric variable only, by using a cell array consisting of a numeric column vector and an empty array.

Example: 'QueryPoints',{pt,[]}

Data Types: single | double | cell
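
For example, a sketch using the carsmall SVM model above: a cell array lets the two selected variables use different numbers of query points.

ptW = linspace(min(Weight),max(Weight),20)';         % 20 query points for Weight
ptH = linspace(min(Horsepower),max(Horsepower),10)'; % 10 query points for Horsepower
pd = partialDependence(Mdl,{'Weight','Horsepower'},'QueryPoints',{ptW,ptH});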

Flag to run in parallel, specified as the comma-separated pair consisting of 'UseParallel' and true or false. If you specify 'UseParallel' as true, the partialDependence function executes for-loop iterations in parallel by using parfor when predicting responses or scores for each observation and averaging them.

Example: 'UseParallel',true

Data Types: logical
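
A sketch, again assuming the carsmall SVM model and table (actual parallel execution requires Parallel Computing Toolbox; without it, the iterations run serially):

p = gcp; % get the current parallel pool, starting one if necessary (requires PCT)
pd = partialDependence(Mdl,'Weight',Tbl,'UseParallel',true);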

Output Arguments

pd — Partial dependence values, returned as a numX-by-numY numeric matrix (for a regression model) or a numLabels-by-numX-by-numY numeric array (for a classification model). numX and numY are the number of query points of the first and second variables in Vars, respectively. numLabels is the number of class labels in Labels.

The value in pd(i,j,k) is the partial dependence value at the query points x(j) and y(k) for the ith class label. x(j) is the jth query point of the first predictor variable, and y(k) is the kth query point of the second predictor variable.
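
For example, for the two-variable classification example above (a sketch), you can extract the surface for one class:

pdFirstClass = squeeze(pd(1,:,:)); % numX-by-numY values for the first class label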

x — Query points of the first predictor variable in Vars, returned as a numeric or categorical column vector.

If the predictor variable is numeric, then you can specify the query points by using the 'QueryPoints' name-value pair argument.

Data Types: single | double | categorical

y — Query points of the second predictor variable in Vars, returned as a numeric or categorical column vector.

If the predictor variable is numeric, then you can specify the query points by using the 'QueryPoints' name-value pair argument.

Data Types: single | double | categorical

More About

Partial Dependence for Regression Models

Partial dependence[1] represents the relationships between predictor variables and predicted responses in a trained regression model. partialDependence computes the partial dependence of predicted responses on a subset of predictor variables by marginalizing over the other variables.

Consider the partial dependence on a subset X_S of the whole predictor variable set X = {x_1, x_2, …, x_m}. The subset X_S includes either one variable or two variables: X_S = {x_{S1}} or X_S = {x_{S1}, x_{S2}}. Let X_C be the complementary set of X_S in X. A predicted response f(X) depends on all variables in X:

f(X) = f(X_S, X_C).

The partial dependence of predicted responses on X_S is defined by the expectation of predicted responses with respect to X_C:

f_S(X_S) = E_C[f(X_S, X_C)] = \int f(X_S, X_C) \, p_C(X_C) \, dX_C,

where p_C(X_C) is the marginal probability of X_C, that is, p_C(X_C) = \int p(X_S, X_C) \, dX_S. Assuming that each observation is equally likely, and that the dependence between X_S and X_C and the interactions of X_S and X_C in the responses are not strong, partialDependence estimates the partial dependence by using the observed predictor data as follows:

f_S(X_S) \approx \frac{1}{N} \sum_{i=1}^{N} f(X_S, X_{iC}),    (1)

where N is the number of observations and X_i = (X_{iS}, X_{iC}) is the ith observation.

When you call the partialDependence function, you can specify a trained model (f(·)) and select variables (XS) by using the input arguments Mdl and Vars, respectively. partialDependence computes the partial dependence at 100 evenly spaced points of XS or the points that you specify by using the 'QueryPoints' name-value pair argument. You can specify the number (N) of observations to sample from given predictor data by using the 'NumObservationsToSample' name-value pair argument.
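
The following sketch evaluates Equation 1 directly for a single numeric predictor, assuming a regression model Mdl trained on a numeric matrix X (all variable names here are illustrative):

s = 3;                                        % column index of the selected variable
xq = linspace(min(X(:,s)),max(X(:,s)),100)';  % 100 evenly spaced query points
pd = zeros(numel(xq),1);
for j = 1:numel(xq)
    Xtmp = X;
    Xtmp(:,s) = xq(j);                % fix the selected variable at the query point
    pd(j) = mean(predict(Mdl,Xtmp));  % average predictions over the observed X_C
end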

Partial Dependence for Classification Models

In the case of classification models, partialDependence computes the partial dependence in the same way as for regression models, with one exception: instead of using the predicted responses from the model, the function uses the predicted scores for the classes specified in Labels.
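
Concretely, a sketch that mirrors the regression sketch above, assuming a classification model Mdl trained on a numeric matrix X:

s = 3;                                        % column index of the selected variable
xq = linspace(min(X(:,s)),max(X(:,s)),100)';
pd = zeros(numel(xq),numel(Mdl.ClassNames));  % one column of values per class
for j = 1:numel(xq)
    Xtmp = X;
    Xtmp(:,s) = xq(j);
    [~,scores] = predict(Mdl,Xtmp);   % predicted class scores for each observation
    pd(j,:) = mean(scores,1);         % average the scores over the observed X_C
end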

Weighted Traversal Algorithm

The weighted traversal algorithm[1] is a method to estimate partial dependence for a tree-based model. The estimated partial dependence is the weighted average of response or score values corresponding to the leaf nodes visited during the tree traversal.

Let XS be a subset of the whole variable set X and XC be the complementary set of XS in X. For each XS value to compute partial dependence, the algorithm traverses a tree from the root (beginning) node down to leaf (terminal) nodes and finds the weights of leaf nodes. The traversal starts by assigning a weight value of one at the root node. If a node splits by XS, the algorithm traverses to the appropriate child node depending on the XS value. The weight of the child node becomes the same value as its parent node. If a node splits by XC, the algorithm traverses to both child nodes. The weight of each child node becomes a value of its parent node multiplied by the fraction of observations corresponding to each child node. After completing the tree traversal, the algorithm computes the weighted average by using the assigned weights.
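
A minimal recursive sketch for a single regression tree and one selected numeric predictor, assuming the documented tree properties Children, CutPredictor, CutPoint, NodeSize, and NodeMean (categorical splits and missing values are ignored):

function pdv = weightedTraversal(T,sname,v)
% T: trained regression tree; sname: name of the selected predictor x_S;
% v: query value for x_S. Returns the weighted average of leaf responses.
pdv = recurse(1,1);                            % start at the root with weight 1
    function val = recurse(n,w)
        kids = T.Children(n,:);
        if all(kids == 0)                      % leaf node: weighted response
            val = w*T.NodeMean(n);
        elseif strcmp(T.CutPredictor{n},sname) % split on x_S: follow one child
            if v < T.CutPoint(n)
                val = recurse(kids(1),w);
            else
                val = recurse(kids(2),w);
            end
        else                                   % split on X_C: follow both children,
                                               % weighted by observation fractions
            f = T.NodeSize(kids)/T.NodeSize(n);
            val = recurse(kids(1),w*f(1)) + recurse(kids(2),w*f(2));
        end
    end
end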

For an ensemble of bagged trees, the estimated partial dependence is an average of the weighted averages over the individual trees.

Algorithms

partialDependence uses a predict function to predict responses or scores. partialDependence chooses the proper predict function according to Mdl and runs predict with its default settings. For details about each predict function, see the predict functions in the following two tables. If Mdl is a tree-based model (not including a boosted ensemble of trees), then partialDependence uses the weighted traversal algorithm instead of the predict function. For details, see Weighted Traversal Algorithm.

Regression Model Object

For each of the following regression model objects, partialDependence calls the corresponding predict function to predict responses.

  • Bootstrap aggregation for ensemble of decision trees: TreeBagger, CompactTreeBagger

  • Ensemble of regression models: RegressionEnsemble, RegressionBaggedEnsemble, CompactRegressionEnsemble

  • Gaussian kernel regression model using random feature expansion: RegressionKernel

  • Gaussian process regression: RegressionGP, CompactRegressionGP

  • Generalized linear mixed-effect model: GeneralizedLinearMixedModel

  • Generalized linear model: GeneralizedLinearModel, CompactGeneralizedLinearModel

  • Linear mixed-effect model: LinearMixedModel

  • Linear regression: LinearModel, CompactLinearModel

  • Linear regression for high-dimensional data: RegressionLinear

  • Nonlinear regression: NonLinearModel

  • Regression tree: RegressionTree, CompactRegressionTree

  • Support vector machine regression: RegressionSVM, CompactRegressionSVM

Classification Model Object

For each of the following classification model objects, partialDependence calls the corresponding predict function to predict labels and scores.

  • Discriminant analysis classifier: ClassificationDiscriminant, CompactClassificationDiscriminant

  • Multiclass model for support vector machines or other classifiers: ClassificationECOC, CompactClassificationECOC

  • Ensemble of learners for classification: ClassificationEnsemble, CompactClassificationEnsemble, ClassificationBaggedEnsemble

  • Gaussian kernel classification model using random feature expansion: ClassificationKernel

  • k-nearest neighbor classifier: ClassificationKNN

  • Linear classification model: ClassificationLinear

  • Multiclass naive Bayes model: ClassificationNaiveBayes, CompactClassificationNaiveBayes

  • Support vector machine classifier for one-class and binary classification: ClassificationSVM, CompactClassificationSVM

  • Binary decision tree for multiclass classification: ClassificationTree, CompactClassificationTree

  • Bagged ensemble of decision trees: TreeBagger, CompactTreeBagger

Alternative Functionality

You can use the plotPartialDependence function to compute and plot partial dependence values in a single step.

References

[1] Friedman, Jerome. H. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29, no. 5 (2001): 1189-1232.

[2] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. New York, NY: Springer New York, 2009.

Extended Capabilities

Automatic Parallel Support: You can run the partial dependence computation in parallel by specifying 'UseParallel',true. Parallel execution requires Parallel Computing Toolbox.

Introduced in R2020b