
plotPartialDependence(Mdl,Vars) computes and plots the partial dependence between the predictor variables listed in Vars and the responses predicted by using the regression model Mdl, which contains predictor data.

If you specify one variable in Vars, the function creates a line plot of the partial dependence against the variable.
If you specify two variables in Vars, the function creates a surface plot of the partial dependence against the two variables.
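Here is a rough illustration of partial dependence for one predictor. At each query point, that predictor's column is overwritten with the query value for every observation. The model then predicts a response for each modified observation, and the predictions are averaged. The base-MATLAB sketch below uses a hypothetical model, the function handle f, in place of a trained model object. It is not the implementation used by plotPartialDependence.

```matlab
% Hypothetical stand-in for a trained regression model: a function handle
% that maps an n-by-p predictor matrix to an n-by-1 vector of predictions.
f = @(X) 2*X(:,1) + X(:,2).^2;

rng('default')                                % For reproducibility
X = rand(50,2);                               % 50 observations, 2 predictors
j = 1;                                        % Predictor of interest
q = linspace(min(X(:,j)),max(X(:,j)),10)';    % Query points for predictor j

pd = zeros(size(q));
for k = 1:numel(q)
    Xk = X;
    Xk(:,j) = q(k);          % Set predictor j to the query value for all rows
    pd(k) = mean(f(Xk));     % Average the predictions over all observations
end
```

Because f is additive in this sketch, the partial dependence on x1 is a straight line with slope 2, shifted by the average of x2.^2.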

plotPartialDependence(Mdl,Vars,Labels) computes and plots the partial dependence between the predictor variables listed in Vars and the scores for the classes specified in Labels by using the classification model Mdl, which contains predictor data.

If you specify one variable in Vars and one class in Labels, the function creates a line plot of the partial dependence against the variable for the specified class.
If you specify one variable in Vars and multiple classes in Labels, the function creates a line plot for each class on one figure.
If you specify two variables in Vars and one class in Labels, the function creates a surface plot of the partial dependence against the two variables.
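For a classification model, the same averaging is applied to the predicted class scores, one class at a time. In this sketch, scoreFcn is a hypothetical stand-in for a binary classifier whose two score columns are posterior probabilities. Each column of pd then corresponds to one class, which is analogous to the one-line-per-class plot described above.

```matlab
% Hypothetical binary classifier: n-by-2 scores (posterior probabilities)
p2 = @(X) 1./(1 + exp(-(3*X(:,1) - X(:,2))));   % Logistic score for class 2
scoreFcn = @(X) [1 - p2(X), p2(X)];

rng('default')                 % For reproducibility
X = rand(40,2);
j = 1;                         % Predictor of interest
q = linspace(0,1,5)';          % Query points

pd = zeros(numel(q),2);        % One column per class
for k = 1:numel(q)
    Xk = X;
    Xk(:,j) = q(k);
    pd(k,:) = mean(scoreFcn(Xk),1);   % Average each class score separately
end
```

The scores in each row sum to 1, so the averaged curves for the two classes also sum to 1 at every query point.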

plotPartialDependence(___,Data) uses new predictor data Data. You can specify Data in addition to any of the input argument combinations in the previous syntaxes.

plotPartialDependence(___,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, if you specify 'Conditional','absolute', the plotPartialDependence function creates a figure that includes a partial dependence plot (PDP), a scatter plot of the selected predictor variable and predicted responses or scores, and an individual conditional expectation (ICE) plot for each observation.

ax = plotPartialDependence(___) returns the axes of the plot.

Examples

Create Partial Dependence Plot

Train a regression tree using the carsmall data set, and create a PDP that shows the relationship between a feature and the predicted responses in the trained regression tree.

Load the carsmall data set.

load carsmall

Specify Weight, Cylinders, and Horsepower as the predictor variables (X), and MPG as the response variable (Y).

X = [Weight,Cylinders,Horsepower];
Y = MPG;

Train a regression tree using X and Y.

Mdl = fitrtree(X,Y);

View a graphical display of the trained regression tree.

view(Mdl,'Mode','graph')

Create a PDP of the first predictor variable, Weight.

plotPartialDependence(Mdl,1)

The plotted line represents averaged partial relationships between Weight (labeled as x1) and MPG (labeled as Y) in the trained regression tree Mdl. The x-axis minor ticks represent the unique values in x1.

The regression tree viewer shows that the first decision is whether x1 is smaller than 3085.5, and the PDP also shows a large change near x1 = 3085.5. The tree viewer visualizes each decision at each node based on predictor variables. Several nodes split on the values of x1, but determining the dependence of Y on x1 from the tree alone is not easy. In contrast, plotPartialDependence plots the average predicted responses against x1, so you can clearly see the partial dependence of Y on x1.

The labels x1 and Y are the default values of the predictor names and the response name. You can modify these names by specifying the name-value pair arguments 'PredictorNames' and 'ResponseName' when you train Mdl using fitrtree. You can also modify axis labels by using the xlabel and ylabel functions.
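The following sketch shows both approaches with the carsmall tree from this example. The first sets the names at training time. The second keeps the default names and relabels the axes returned by plotPartialDependence. The variable names Mdl2 and ax are illustrative.

```matlab
load carsmall
X = [Weight,Cylinders,Horsepower];
Y = MPG;

% Option 1: set the names when training the model
Mdl = fitrtree(X,Y,'PredictorNames',{'Weight','Cylinders','Horsepower'}, ...
    'ResponseName','MPG');
figure
plotPartialDependence(Mdl,'Weight')

% Option 2: keep the default names (x1, Y) and relabel the axes afterward
Mdl2 = fitrtree(X,Y);
figure
ax = plotPartialDependence(Mdl2,1);
xlabel(ax,'Weight')
ylabel(ax,'MPG')
```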

Create Partial Dependence Plot for Multiple Classes

Train a naive Bayes classification model with the fisheriris data set, and create a PDP that shows the relationship between the predictor variable and the predicted scores (posterior probabilities) for multiple classes.

Load the fisheriris data set, which contains species (species) and measurements (meas) on sepal length, sepal width, petal length, and petal width for 150 iris specimens. The data set contains 50 specimens from each of three species: setosa, versicolor, and virginica.

load fisheriris

Train a naive Bayes classification model with species as the response and meas as predictors.

Mdl = fitcnb(meas,species);

Create a PDP of the scores predicted by Mdl for all three classes of species against the third predictor variable x3. Specify the class labels by using the ClassNames property of Mdl.

plotPartialDependence(Mdl,3,Mdl.ClassNames);

According to this model, the probability of virginica increases with x3. The probability of setosa is about 0.33 for values of x3 from 0 to around 2.5, and then drops to almost 0.

Create Individual Conditional Expectation Plots

Train a Gaussian process regression model using generated sample data where a response variable includes interactions between predictor variables. Then, create ICE plots that show the relationship between a feature and the predicted responses for each observation.

Generate sample predictor data x1 and x2.

rng('default') % For reproducibility
n = 200;
x1 = rand(n,1)*2-1;
x2 = rand(n,1)*2-1;

Generate response values that include interactions between x1 and x2.

Y = x1-2*x1.*(x2>0)+0.1*rand(n,1);

Train a Gaussian process regression model using [x1 x2] and Y.

Mdl = fitrgp([x1 x2],Y);

Create a figure including a PDP (red line) for the first predictor x1, a scatter plot (circle markers) of x1 and predicted responses, and a set of ICE plots (gray lines) by specifying 'Conditional' as 'centered'.

plotPartialDependence(Mdl,1,'Conditional','centered')

When 'Conditional' is 'centered', plotPartialDependence offsets plots so that all plots start from zero, which is helpful in examining the cumulative effect of the selected feature.

A PDP shows averaged relationships, so it can hide dependencies, especially when the responses include interactions between features. The ICE plots, however, clearly show two different dependencies of the responses on x1.
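The relationship between ICE curves, the PDP, and the 'centered' option can be sketched in base MATLAB. The sketch uses the noise-free response function f as a hypothetical stand-in for the Gaussian process model, so the two dependencies are exact. Each ICE curve has slope 1 when x2 <= 0 and slope -1 when x2 > 0. Subtracting each curve's value at the first query point mimics the offset that 'centered' applies.

```matlab
% Noise-free version of the generated response (hypothetical stand-in for Mdl)
f = @(X) X(:,1) - 2*X(:,1).*(X(:,2) > 0);

rng('default')                  % For reproducibility
n = 200;
X = rand(n,2)*2 - 1;            % x1 and x2 uniform on [-1,1]
q = linspace(-1,1,21);          % Query points for x1 (row vector)

ice = zeros(n,numel(q));        % Row i is the ICE curve of observation i
for k = 1:numel(q)
    Xk = X;
    Xk(:,1) = q(k);
    ice(:,k) = f(Xk);
end

pd = mean(ice,1);               % The PDP is the average of the ICE curves
iceCentered = ice - ice(:,1);   % Centered ICE: every curve starts from zero
```

About half of the observations have x2 > 0, so the average slope, 1 - 2*mean(X(:,2) > 0), is close to zero. This is why the PDP alone hides the two opposite dependencies.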

Use New Predictor Data for Partial Dependence Plot

Train an ensemble of classification models and create two PDPs, one using the training data set and the other using a new data set.

Load the census1994 data set, which contains US yearly salary data, categorized as <=50K or >50K, and several demographic variables.

load census1994

Extract a subset of variables to analyze from the tables adultdata and adulttest.

X = adultdata(:,{'age','workClass','education_num','marital_status','race', ...
   'sex','capital_gain','capital_loss','hours_per_week','salary'});
Xnew = adulttest(:,{'age','workClass','education_num','marital_status','race', ...
   'sex','capital_gain','capital_loss','hours_per_week','salary'});

Train an ensemble of classifiers with salary as the response and the remaining variables as predictors by using the function fitcensemble. By default, for binary classification, fitcensemble aggregates 100 classification trees using the LogitBoost method.

Mdl = fitcensemble(X,'salary');

Inspect the class names in Mdl.

Mdl.ClassNames

ans = 2×1 categorical
     <=50K 
     >50K

Create a partial dependence plot of the scores predicted by Mdl for the second class of salary (>50K) against the predictor age using the training data.

plotPartialDependence(Mdl,'age',Mdl.ClassNames(2))

Create a PDP of the scores for class >50K against age using new predictor data from the table Xnew.

plotPartialDependence(Mdl,'age',Mdl.ClassNames(2),Xnew)

The two plots show similar shapes for the partial dependence of the predicted score of high salary (>50K) on age. Both plots indicate that the predicted score of high salary rises quickly until the age of 30, stays almost flat until the age of 60, and then drops quickly. However, the plot based on the new data produces slightly higher scores for ages over 65.
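To compare the two curves numerically rather than visually, you can compute them with the partialDependence function at a common set of query points. By default, the query points are derived from each data set separately, so the two default grids need not match. The sketch below assumes that 50 evenly spaced ages across the range of the training data are adequate.

```matlab
load census1994
vars = {'age','workClass','education_num','marital_status','race', ...
    'sex','capital_gain','capital_loss','hours_per_week','salary'};
X = adultdata(:,vars);
Xnew = adulttest(:,vars);
Mdl = fitcensemble(X,'salary');

% Common query points: 50 ages spanning the training data (an assumption)
q = linspace(min(X.age),max(X.age),50)';

% Partial dependence of the >50K score, from training data and from new data
pdTrain = partialDependence(Mdl,'age',Mdl.ClassNames(2),'QueryPoints',q);
pdNew = partialDependence(Mdl,'age',Mdl.ClassNames(2),Xnew,'QueryPoints',q);

figure
plot(q,pdTrain,q,pdNew)
legend('Training data','New data')
xlabel('age')
ylabel('Partial dependence of >50K score')
maxDiff = max(abs(pdTrain(:) - pdNew(:)))   % Largest gap between the curves
```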

Compare Importance of Predictor Variables

Train a regression ensemble using the carsmall data set, and create a PDP and ICE plots for each predictor variable using a new data set, carbig. Then, compare the figures to analyze the importance of predictor variables. Also, compare the results with the estimates of predictor importance returned by the predictorImportance function.

Load the carsmall data set.

load carsmall

Specify Weight, Cylinders, Horsepower, and Model_Year as the predictor variables (X), and MPG as the response variable (Y).

X = [Weight,Cylinders,Horsepower,Model_Year];
Y = MPG;

Train a regression ensemble using X and Y.

Mdl = fitrensemble(X,Y, ...
    'PredictorNames',{'Weight','Cylinders','Horsepower','Model Year'}, ...
    'ResponseName','MPG');

Compare the importance of predictor variables by using the plotPartialDependence and predictorImportance functions. The plotPartialDependence function visualizes the relationships between a selected predictor and predicted responses. predictorImportance summarizes the importance of a predictor with a single value.

Create a figure including a PDP (red line) and ICE plots (gray lines) for each predictor by using plotPartialDependence and specifying 'Conditional','absolute'. Each figure also includes a scatter plot (circle markers) of the selected predictor and predicted responses. Also, load the carbig data set and use it as new predictor data, Xnew. When you provide Xnew, the plotPartialDependence function uses Xnew instead of the predictor data in Mdl.

load carbig
Xnew = [Weight,Cylinders,Horsepower,Model_Year];

figure
t = tiledlayout(2,2,'TileSpacing','compact');
title(t,'Individual Conditional Expectation Plots')

for i = 1 : 4
    nexttile
    plotPartialDependence(Mdl,i,Xnew,'Conditional','absolute')
    title('')
end

Compute estimates of predictor importance by using predictorImportance. This function sums changes in the mean squared error (MSE) due to splits on every predictor, and then divides the sum by the number of branch nodes.

imp = predictorImportance(Mdl);
figure
bar(imp)
title('Predictor Importance Estimates')
ylabel('Estimates')
xlabel('Predictors')
ax = gca;
ax.XTickLabel = Mdl.PredictorNames;

The variable Weight has the most impact on MPG according to predictor importance. The PDP of Weight also shows that MPG has high partial dependence on Weight. The variable Cylinders has the least impact on MPG according to predictor importance. The PDP of Cylinders also shows that MPG does not change much depending on Cylinders.
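You can also read the bar chart as a ranked list. This short sketch retrains the same ensemble and sorts the importance estimates in descending order. The variable names order and ranked are illustrative.

```matlab
load carsmall
X = [Weight,Cylinders,Horsepower,Model_Year];
Mdl = fitrensemble(X,MPG, ...
    'PredictorNames',{'Weight','Cylinders','Horsepower','Model Year'}, ...
    'ResponseName','MPG');
imp = predictorImportance(Mdl);

[~,order] = sort(imp,'descend');    % Predictor indices, most important first
ranked = Mdl.PredictorNames(order)  % Predictor names in that order
```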

Extract Partial Dependence Estimates from Plots

Train a support vector machine (SVM) regression model using the carsmall data set, and create a PDP for two predictor variables. Then, extract partial dependence estimates from the output of plotPartialDependence. Alternatively, you can get the partial dependence values by using the partialDependence function.

Load the carsmall data set.

load carsmall

Specify Weight, Cylinders, Displacement, and Horsepower as the predictor variables (Tbl).

Tbl = table(Weight,Cylinders,Displacement,Horsepower);

Construct an SVM regression model using Tbl and the response variable MPG. Use a Gaussian kernel function with an automatic kernel scale.

Mdl = fitrsvm(Tbl,MPG,'ResponseName','MPG', ...
    'CategoricalPredictors','Cylinders','Standardize',true, ...
    'KernelFunction','gaussian','KernelScale','auto');

Create a PDP that visualizes partial dependence of predicted responses (MPG) on the predictor variables Weight and Cylinders. Specify query points to compute the partial dependence for Weight by using the 'QueryPoints' name-value pair argument. You cannot specify the 'QueryPoints' value for Cylinders because it is a categorical variable. plotPartialDependence uses all categorical values.

pt = linspace(min(Weight),max(Weight),50)';
ax = plotPartialDependence(Mdl,{'Weight','Cylinders'},'QueryPoints',{pt,[]});
view(140,30) % Modify the viewing angle

The PDP shows an interaction effect between Weight and Cylinders. The partial dependence of MPG on Weight changes depending on the value of Cylinders.

Extract the estimated partial dependence of MPG on Weight and Cylinders. The XData, YData, and ZData values of ax.Children are x-axis values (the first selected predictor values), y-axis values (the second selected predictor values), and z-axis values (the corresponding partial dependence values), respectively.

xval = ax.Children.XData;
yval = ax.Children.YData;
zval = ax.Children.ZData;

Alternatively, you can get the partial dependence values by using the partialDependence function.

[pd,x,y] = partialDependence(Mdl,{'Weight','Cylinders'},'QueryPoints',{pt,[]});

pd contains the partial dependence values for the query points x and y.

If you specify 'Conditional' as 'absolute', plotPartialDependence creates a figure including a PDP, a scatter plot, and a set of ICE plots. ax.Children(1) and ax.Children(2) correspond to the PDP and scatter plot, respectively. The remaining elements of ax.Children correspond to the ICE plots. The XData and YData values of ax.Children(i) are x-axis values (the selected predictor values) and y-axis values (the corresponding partial dependence values), respectively.
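The following sketch retrains the SVM model from this example and creates the three-part figure for Weight. It then reads the PDP, scatter, and ICE data from ax.Children, following the ordering described above. When several graphics objects are queried at once, get returns their YData values as a cell array with one element per ICE curve.

```matlab
load carsmall
Tbl = table(Weight,Cylinders,Displacement,Horsepower);
Mdl = fitrsvm(Tbl,MPG,'ResponseName','MPG', ...
    'CategoricalPredictors','Cylinders','Standardize',true, ...
    'KernelFunction','gaussian','KernelScale','auto');

figure
ax = plotPartialDependence(Mdl,'Weight','Conditional','absolute');

pdpX = ax.Children(1).XData;       % Query points of the PDP (red line)
pdpY = ax.Children(1).YData;       % Partial dependence values
scatterX = ax.Children(2).XData;   % Selected predictor values (circle markers)
scatterY = ax.Children(2).YData;   % Predicted responses (circle markers)
iceY = get(ax.Children(3:end),'YData');   % One cell per ICE curve (gray lines)
```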

Input Arguments

`Mdl` — Machine learning model
regression model object | classification model object

Machine learning model, specified as a full or compact regression or classification model object, as given in the following tables of supported models.

Regression Model Object

Model	Full or Compact Regression Model Object
Bootstrap aggregation for ensemble of decision trees	`TreeBagger`, `CompactTreeBagger`
Ensemble of regression models	`RegressionEnsemble`, `RegressionBaggedEnsemble`, `CompactRegressionEnsemble`
Gaussian kernel regression model using random feature expansion	`RegressionKernel`
Gaussian process regression	`RegressionGP`, `CompactRegressionGP`
Generalized linear mixed-effect model	`GeneralizedLinearMixedModel`
Generalized linear model	`GeneralizedLinearModel`, `CompactGeneralizedLinearModel`
Linear mixed-effect model	`LinearMixedModel`
Linear regression	`LinearModel`, `CompactLinearModel`
Linear regression for high-dimensional data	`RegressionLinear`
Nonlinear regression	`NonLinearModel`
Regression tree	`RegressionTree`, `CompactRegressionTree`
Support vector machine regression	`RegressionSVM`, `CompactRegressionSVM`

Classification Model Object

Model	Full or Compact Classification Model Object
Discriminant analysis classifier	`ClassificationDiscriminant`, `CompactClassificationDiscriminant`
Multiclass model for support vector machines or other classifiers	`ClassificationECOC`, `CompactClassificationECOC`
Ensemble of learners for classification	`ClassificationEnsemble`, `CompactClassificationEnsemble`, `ClassificationBaggedEnsemble`
Gaussian kernel classification model using random feature expansion	`ClassificationKernel`
k-nearest neighbor classifier	`ClassificationKNN`
Linear classification model	`ClassificationLinear`
Multiclass naive Bayes model	`ClassificationNaiveBayes`, `CompactClassificationNaiveBayes`
Support vector machine classifier for one-class and binary classification	`ClassificationSVM`, `CompactClassificationSVM`
Binary decision tree for multiclass classification	`ClassificationTree`, `CompactClassificationTree`
Bagged ensemble of decision trees	`TreeBagger`, `CompactTreeBagger`

If Mdl is a compact model object, you must provide the input argument Data.

plotPartialDependence does not support a model object trained with a sparse matrix. When you train a model, use a full numeric matrix or table for predictor data where rows correspond to individual observations.
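The following minimal sketch covers both requirements, using a regression tree trained on carsmall. compact removes the training data from the model, so the predictor data must be passed as Data. If your predictors are stored in a sparse matrix, convert them with full before training.

```matlab
load carsmall
X = [Weight,Cylinders,Horsepower];
% If the predictors were stored in a sparse matrix S, train on full(S) instead.
Mdl = fitrtree(X,MPG);

CMdl = compact(Mdl);              % Compact model: no stored predictor data
figure
plotPartialDependence(CMdl,1,X)   % Data is therefore required
```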

`Vars` — Predictor variables
vector of positive integers | character vector | string scalar | string array | cell array of character vectors

Predictor variables, specified as a vector of positive integers, character vector, string scalar, string array, or cell array of character vectors. You can specify one or two predictor variables, as shown in the following tables.

One Predictor Variable

Value	Description
positive integer	Index value corresponding to the column of the predictor data.
character vector or string scalar	Name of a predictor variable. The name must match the entry in `Mdl.PredictorNames`.

Two Predictor Variables

Value	Description
vector of two positive integers	Index values corresponding to the columns of the predictor data.
string array or cell array of character vectors	Names of predictor variables. Each element in the array is the name of a predictor variable. The names must match the entries in `Mdl.PredictorNames`.

Example: {'x1','x3'}

Data Types: single | double | char | string | cell
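A model trained on a numeric matrix without 'PredictorNames' uses the default names x1, x2, and so on. For such a model, selecting predictors by index is equivalent to selecting them by name. This sketch uses a regression tree trained on carsmall.

```matlab
load carsmall
X = [Weight,Cylinders,Horsepower];
Mdl = fitrtree(X,MPG);           % Default predictor names: x1, x2, x3

% These two calls select the same predictors (columns 1 and 3)
figure
plotPartialDependence(Mdl,[1 3])
figure
plotPartialDependence(Mdl,{'x1','x3'})
```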

`Labels` — Class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. The values and data types in Labels must match those of the class names in the ClassNames property of Mdl (Mdl.ClassNames).

You can specify multiple class labels only when you specify one variable in Vars and specify 'Conditional' as 'none' (default).
Use partialDependence if you want to compute the partial dependence for multiple variables and multiple class labels in one function call.

This argument is valid only when Mdl is a classification model object.

Example: {'red','blue'}

Example: Mdl.ClassNames([1 3]) specifies Labels as the first and third classes in Mdl.

`Data` — Predictor data
numeric matrix | table

Predictor data, specified as a numeric matrix or table. Each row of Data corresponds to one observation, and each column corresponds to one variable.

Data must be consistent with the predictor data that trained Mdl, stored in either Mdl.X or Mdl.Variables.

If you trained Mdl using a numeric matrix, then Data must be a numeric matrix. The variables making up the columns of Data must have the same number and order as the predictor variables that trained Mdl.
If you trained Mdl using a table (for example, Tbl), then Data must be a table. All predictor variables in Data must have the same variable names and data types as the names and types in Tbl. However, the column order of Data does not need to correspond to the column order of Tbl.
plotPartialDependence does not support a sparse matrix.

If Mdl is a compact model object, you must provide Data. If Mdl is a full model object that contains predictor data and you specify this argument, then plotPartialDependence does not use the predictor data in Mdl and uses Data only.

Data Types: single | double | table
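For example, a table with the same variable names and types as the training table is valid Data, even if its columns are in a different order. This sketch assumes the carsmall SVM regression setup from the examples, simplified to the default kernel.

```matlab
load carsmall
Tbl = table(Weight,Cylinders,Displacement,Horsepower);
Mdl = fitrsvm(Tbl,MPG,'Standardize',true);

% Same variable names and data types as Tbl, different column order
Data = Tbl(:,{'Horsepower','Weight','Displacement','Cylinders'});
figure
plotPartialDependence(Mdl,'Weight',Data)
```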

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: plotPartialDependence(Mdl,Vars,Data,'NumObservationsToSample',100,'UseParallel',true) creates a PDP by using 100 sampled observations in Data and executing for-loop iterations in parallel.

`'Conditional'` — Plot type
`'none'` (default) | `'absolute'` | `'centered'`

Plot type, specified as the comma-separated pair consisting of 'Conditional' and 'none', 'absolute', or 'centered'.

'none'

plotPartialDependence creates a PDP. The plot type depends on the number of predictor variables specified in Vars and the number of class labels specified in Labels (for a classification model).

One predictor variable and one class label — plotPartialDependence computes partial dependence at the query points, and creates a 2-D line plot of the partial dependence.
One predictor variable and multiple class labels — plotPartialDependence creates one figure containing multiple 2-D line plots for the selected classes.
Two predictor variables and one class label — plotPartialDependence creates a surface plot of partial dependence against the two variables.

'absolute'

plotPartialDependence creates a figure including the following three types of plots:

PDP with a red line
Scatter plot of the selected predictor variable and predicted responses or scores with circle markers
ICE plot for each observation with a gray line

This value is valid when you select only one predictor variable in Vars and one class label in Labels (for a classification model).

'centered'

plotPartialDependence creates a figure including the same three types of plots as 'absolute'. The function offsets plots so that all plots start from zero.

This value is valid when you select only one predictor variable in Vars and one class label in Labels (for a classification model).

Example: 'Conditional','absolute'

`'NumObservationsToSample'` — Number of observations to sample
number of total observations (default) | positive integer

Number of observations to sample, specified as the comma-separated pair consisting of 'NumObservationsToSample' and a positive integer. The default value is the number of total observations in either Mdl or Data. If you specify a value larger than the number of total observations, then plotPartialDependence uses all observations.

plotPartialDependence samples observations without replacement by using the datasample function and uses the sampled observations to compute partial dependence.

plotPartialDependence displays minor tick marks at the unique values of the sampled observations.

If you specify 'Conditional' as either 'absolute' or 'centered', plotPartialDependence creates a figure including an ICE plot for each sampled observation.

Example: 'NumObservationsToSample',100

Data Types: single | double
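The sampling step can be sketched in base MATLAB. plotPartialDependence uses datasample for this step. The sketch below substitutes randperm, which also draws indices without replacement. As described above, a request larger than the number of observations falls back to all observations.

```matlab
rng('default')                     % For reproducibility
X = rand(30,2);                    % 30 observations of 2 predictors
k = 100;                           % Requested 'NumObservationsToSample'
k = min(k,size(X,1));              % Requests larger than n use all observations
idx = randperm(size(X,1),k);       % k distinct row indices (no replacement)
Xs = X(idx,:);                     % Sampled observations
tickValues = unique(Xs(:,1));      % Minor tick locations for predictor 1
```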

`'Parent'` — Axes in which to plot
`gca` (default) | axes object

Axes in which to plot, specified as the comma-separated pair consisting of 'Parent' and an axes object. If you do not specify the axes and if the current axes are Cartesian, then plotPartialDependence uses the current axes (gca). If axes do not exist, plotPartialDependence plots in a new figure.

Example: 'Parent',ax
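For example, to place the PDPs of two predictors side by side, create the axes first and pass each one through 'Parent'. This sketch uses subplot and a regression tree trained on carsmall. Cartesian axes created by other means should work the same way.

```matlab
load carsmall
X = [Weight,Cylinders,Horsepower];
Mdl = fitrtree(X,MPG);

figure
ax1 = subplot(1,2,1);
ax2 = subplot(1,2,2);
plotPartialDependence(Mdl,1,'Parent',ax1)   % Weight (x1) in the left axes
plotPartialDependence(Mdl,3,'Parent',ax2)   % Horsepower (x3) in the right axes
```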

`'QueryPoints'` — Points to compute partial dependence
numeric column vector | numeric two-column matrix | cell array of two numeric column vectors

Points to compute partial dependence for numeric predictors, specified as the comma-separated pair consisting of 'QueryPoints' and a numeric column vector, a numeric two-column matrix, or a cell array of two numeric column vectors.

If you select one predictor variable in Vars, use a numeric column vector.
If you select two predictor variables in Vars:
- Use a numeric two-column matrix to specify the same number of points for each predictor variable.
- Use a cell array of two numeric column vectors to specify a different number of points for each predictor variable.

The default value is a numeric column vector or a numeric two-column matrix, depending on the number of selected predictor variables. Each column contains 100 evenly spaced points between the minimum and maximum values of the sampled observations for the corresponding predictor variable.

If 'Conditional' is 'absolute' or 'centered', then the software adds the predictor data values (Data or predictor data in Mdl) of the selected predictors to the query points.

You cannot modify 'QueryPoints' for a categorical variable. The plotPartialDependence function uses all categorical values in the selected variable.

If you select one numeric variable and one categorical variable, you can specify 'QueryPoints' for the numeric variable by using a cell array consisting of a numeric column vector and an empty array.

Example: 'QueryPoints',{pt,[]}

Data Types: single | double | cell
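You can approximate the default query points described above with linspace. This sketch assumes that X already holds the sampled observations. It does not include the extra data values added for 'absolute' or 'centered'. The cell array qCell shows the format for specifying a different number of points for each of two predictors.

```matlab
rng('default')                      % For reproducibility
X = rand(25,2);                     % Sampled observations, 2 predictors

% One predictor: 100-by-1 column vector
qOne = linspace(min(X(:,1)),max(X(:,1)),100)';

% Two predictors: 100-by-2 matrix, one column per predictor
qTwo = [linspace(min(X(:,1)),max(X(:,1)),100)', ...
        linspace(min(X(:,2)),max(X(:,2)),100)'];

% Custom points: 20 values for the first predictor, 5 for the second
qCell = {linspace(0,1,20)', linspace(0,1,5)'};
```

With a two-predictor model, qCell could then be passed as 'QueryPoints',qCell.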

`'UseParallel'` — Flag to run in parallel
`false` (default) | `true`

Flag to run in parallel, specified as the comma-separated pair consisting of 'UseParallel' and true or false. If you specify 'UseParallel' as true, the plotPartialDependence function executes for-loop iterations in parallel by using parfor when predicting responses or scores for each observation and averaging them.

Example: 'UseParallel',true

Data Types: logical

Output Arguments

`ax` — Axes of the plot
axes object

Axes of the plot, returned as an axes object. For details on how to modify the appearance of the axes and extract data from plots, see Axes Appearance and Extract Partial Dependence Estimates from Plots.

More About