Class: TreeBagger
Predict responses using ensemble of bagged decision trees
Yfit = predict(B,X)
Yfit = predict(B,X,Name,Value)
[Yfit,stdevs] = predict(___)
[Yfit,scores] = predict(___)
[Yfit,scores,stdevs] = predict(___)
Yfit = predict(B,X) returns a vector of predicted responses for the predictor data in the table or matrix X, based on the ensemble of bagged decision trees B. Yfit is a cell array of character vectors for classification and a numeric array for regression. By default, predict takes a democratic (nonweighted) average vote from all trees in the ensemble.
B is a trained TreeBagger model object, that is, a model returned by TreeBagger.
X is a table or matrix of predictor data used to generate responses. Rows represent observations and columns represent variables.
If X is a numeric matrix:
The variables making up the columns of X must have the same order as the predictor variables that trained B.
If you trained B using a table (for example, Tbl), then X can be a numeric matrix if Tbl contains all numeric predictor variables. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors using the CategoricalPredictors name-value pair argument of TreeBagger.
If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.
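The numeric-matrix case described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes Statistics and Machine Learning Toolbox is installed and uses its bundled fisheriris sample data, and the variable name Xnew is made up for the example.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox and its
% bundled fisheriris data (meas: 150-by-4 numeric, species: cell array
% of character vectors).
load fisheriris
rng(1)  % make the bootstrap sampling repeatable

% Train on a numeric matrix; the column order of meas defines the
% predictor order that predict expects later.
B = TreeBagger(50, meas, species, 'Method', 'classification');

% New observations keep the same column order as the training matrix.
Xnew = meas([1 51 101], :);
Yfit = predict(B, Xnew);   % 3-by-1 cell array of character vectors

disp(Yfit)
```

Because the model was trained on a matrix, nothing ties a column to a name at prediction time, so a permuted column order would be silently misinterpreted rather than rejected.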
If X is a table:
predict does not support multi-column variables and cell arrays other than cell arrays of character vectors.
If you trained B using a table (for example, Tbl), then all predictor variables in X must have the same variable names and be of the same data types as those that trained B (stored in B.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Tbl and X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.
If you trained B using a numeric matrix, then the predictor names in B.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of TreeBagger. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.
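As a hedged sketch of the table case described above, the following trains on a table and then predicts from a table whose columns are reordered and which still carries the response variable. It assumes Statistics and Machine Learning Toolbox and the fisheriris sample data; the variable names SL, SW, PL, PW, and Species are invented for this illustration.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox and the
% bundled fisheriris data. Variable names below are hypothetical.
load fisheriris
Tbl = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});
Tbl.Species = species;

rng(1)  % make the bootstrap sampling repeatable
% All variables other than the response are used as predictors.
B = TreeBagger(30, Tbl, 'Species', 'Method', 'classification');

% Columns are reordered and the response is still present; predictors
% are matched by variable name and the extra variable is ignored.
Xnew = Tbl([1 51 101], {'PW', 'Species', 'SL', 'PL', 'SW'});
Yfit = predict(B, Xnew);

disp(B.PredictorNames)
disp(Yfit)
```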
Yfit = predict(B,X,Name,Value) specifies additional options using one or more name-value pair arguments:
'Trees': Array of tree indices to use for computation of responses. The default is 'all'.
'TreeWeights': Array of NTrees weights for weighting votes from the specified trees, where NTrees is the number of trees in the ensemble.
'UseInstanceForTree': Logical matrix of size Nobs-by-NTrees indicating which trees to use to make predictions for each observation, where Nobs is the number of observations. By default, all trees are used for all observations.
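The three options above might be exercised as in the sketch below. It assumes Statistics and Machine Learning Toolbox and the fisheriris sample data; the local names nTrees, w, and useTree are made up for the example, and passing the weights as a column vector is an assumption rather than a documented requirement.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox and the
% bundled fisheriris data.
load fisheriris
rng(1)  % make the bootstrap sampling repeatable
nTrees = 20;
B = TreeBagger(nTrees, meas, species, 'Method', 'classification');
X = meas([1 51 101], :);

% 'Trees': use only the first 10 trees for every observation.
YfitSubset = predict(B, X, 'Trees', 1:10);

% 'TreeWeights': one weight per tree; the first five trees count double.
% (Column orientation is an assumption made for this sketch.)
w = ones(nTrees, 1);
w(1:5) = 2;
YfitWeighted = predict(B, X, 'TreeWeights', w);

% 'UseInstanceForTree': Nobs-by-NTrees logical mask. Here observation 1
% is predicted from trees 1 to 10 only; the others use every tree.
useTree = true(size(X, 1), nTrees);
useTree(1, 11:end) = false;
YfitMasked = predict(B, X, 'UseInstanceForTree', useTree);
```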
For regression, [Yfit,stdevs] = predict(___) also returns standard deviations of the computed responses over the ensemble of the grown trees, using any of the input argument combinations in the previous syntaxes.
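A minimal regression sketch of the two-output syntax follows. It assumes Statistics and Machine Learning Toolbox and its bundled carsmall sample data; rows with missing values are dropped up front only to keep the illustration simple, and the names X, Y, and keep are invented for the example.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox and the
% bundled carsmall data (Weight, Horsepower, MPG are numeric vectors).
load carsmall
X = [Weight, Horsepower];
Y = MPG;

% Drop rows with missing values so the example stays simple.
keep = ~isnan(Y) & ~any(isnan(X), 2);
X = X(keep, :);
Y = Y(keep);

rng(1)  % make the bootstrap sampling repeatable
B = TreeBagger(50, X, Y, 'Method', 'regression');

% stdevs has one row per observation: the spread of the per-tree
% predictions around the ensemble response.
[Yfit, stdevs] = predict(B, X(1:4, :));
disp([Yfit, stdevs])
```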
For classification, [Yfit,scores] = predict(___) also returns scores for all classes. scores is a matrix with one row per observation and one column per class. For each observation and each class, the score generated by each tree is the probability of the observation originating from the class, computed as the fraction of observations of the class in a tree leaf. predict averages these scores over all trees in the ensemble.
[Yfit,scores,stdevs] = predict(___) also returns standard deviations of the computed scores for classification. stdevs is a matrix with one row per observation and one column per class, with standard deviations taken over the ensemble of the grown trees.
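The classification outputs described above can be inspected with a short sketch like the one below. It assumes Statistics and Machine Learning Toolbox and the fisheriris sample data. Reading the column order of scores from B.ClassNames is how this sketch labels the columns.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox and the
% bundled fisheriris data.
load fisheriris
rng(1)  % make the bootstrap sampling repeatable
B = TreeBagger(50, meas, species, 'Method', 'classification');

X = meas([1 51 101], :);
[Yfit, scores, stdevs] = predict(B, X);

% One row per observation, one column per class.
disp(B.ClassNames')
disp(scores)
disp(stdevs)

% With the default equal weights, each row of scores is an average of
% per-tree class probabilities, so it should sum to 1 up to round-off.
disp(sum(scores, 2))
```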
For regression problems, the predicted response for an observation is the weighted average of the predictions using selected trees only. That is,
$$\hat{y}_{\text{bag}} = \frac{1}{\sum_{t=1}^{T} \alpha_t I(t \in S)} \sum_{t=1}^{T} \alpha_t \hat{y}_t I(t \in S).$$
$\hat{y}_t$ is the prediction from tree t in the ensemble.
S is the set of indices of selected trees that comprise the prediction (see 'Trees' and 'UseInstanceForTree'). $I(t \in S)$ is 1 if t is in the set S, and 0 otherwise.
$\alpha_t$ is the weight of tree t (see 'TreeWeights').
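To make the weighted average concrete, here is a small arithmetic sketch in base MATLAB. It does not call TreeBagger; the per-tree predictions, weights, and selection mask are made-up numbers, and the variable names are hypothetical.

```matlab
% Hypothetical values for four trees (not produced by any real model).
yhat  = [10 12 14 20];            % per-tree predictions for one observation
alpha = [1 1 2 1];                % tree weights
inS   = [true true true false];   % indicator: is tree t in the selected set S?

% Weighted average over the selected trees only.
ybag = sum(alpha .* yhat .* inS) / sum(alpha .* inS);

disp(ybag)   % (10 + 12 + 2*14) / (1 + 1 + 2) = 12.5
```

The fourth tree contributes nothing because its indicator is 0, and the denominator renormalizes by the weights of the trees that were actually used.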
For classification problems, the predicted class for an observation is the class that yields the largest weighted average of the class posterior probabilities (that is, classification scores) computed using selected trees only. Specifically:
For each class c ∊ C and each tree t = 1,...,T, predict computes $\hat{P}_t(c \mid x)$, which is the estimated posterior probability of class c given observation x using tree t. C is the set of all distinct classes in the training data. For more details on classification tree posterior probabilities, see fitctree and predict.
predict computes the weighted average of the class posterior probabilities over the selected trees:
$$\hat{P}_{\text{bag}}(c \mid x) = \frac{1}{\sum_{t=1}^{T} \alpha_t I(t \in S)} \sum_{t=1}^{T} \alpha_t \hat{P}_t(c \mid x) I(t \in S).$$
The predicted class is the class that yields the largest weighted average:
$$\hat{y}_{\text{bag}} = \arg\max_{c \in C} \hat{P}_{\text{bag}}(c \mid x).$$
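The same steps can be traced by hand with a small base MATLAB sketch. It does not call TreeBagger; the posterior matrix, weights, selection mask, and class names are all invented for illustration.

```matlab
% Hypothetical per-tree posteriors for one observation:
% rows are trees, columns are classes.
classNames = {'c1', 'c2', 'c3'};
P = [0.9 0.1 0.0;
     0.2 0.7 0.1;
     0.4 0.5 0.1;
     0.0 0.0 1.0];
alpha = [1; 1; 2; 1];   % tree weights
inS   = [1; 1; 1; 0];   % indicator: is tree t in the selected set S?

% Weighted average of the posteriors over the selected trees.
w    = alpha .* inS;
Pbag = (w' * P) / sum(w);        % 1-by-3 row of averaged scores

% The predicted class has the largest averaged score.
[~, idx] = max(Pbag);
predicted = classNames{idx};

disp(Pbag)        % [0.475 0.450 0.075]
disp(predicted)   % c1
```

Tree 4 would vote for c3 with certainty, but it is excluded from S, so it has no effect on the result.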