Bag of decision trees
TreeBagger bags an ensemble of decision trees for either classification or regression. Bagging stands for bootstrap aggregation. Every tree in the ensemble is grown on an independently drawn bootstrap replica of the input data. Observations not included in this replica are "out of bag" for this tree.
TreeBagger relies on the ClassificationTree and RegressionTree functionality for growing individual trees. In particular, ClassificationTree and RegressionTree accept the number of features selected at random for each decision split as an optional input argument. That is, TreeBagger implements the random forest algorithm [1].
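As a minimal sketch of the workflow, the following grows a classification forest on the fisheriris sample data set shipped with the toolbox (the number of trees and the random seed are arbitrary choices for illustration):

```matlab
% Grow a random forest of 50 classification trees on Fisher's iris data.
% 'OOBPrediction','on' records which observations are out of bag for each
% tree, so out-of-bag methods such as oobError can be used afterward.
load fisheriris
rng(1);  % for reproducibility of the bootstrap replicas
B = TreeBagger(50, meas, species, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on');

% Out-of-bag classification error as a function of the number of trees
err = oobError(B);
plot(err)
xlabel('Number of grown trees')
ylabel('Out-of-bag classification error')
```

Because each tree sees a different bootstrap replica, the out-of-bag error gives an estimate of generalization error without a separate validation set.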
For regression problems, TreeBagger supports mean and quantile regression (that is, quantile regression forest [2]).
To predict mean responses or estimate the mean-squared error given data, pass a TreeBagger model and the data to predict or error, respectively. To perform similar operations for out-of-bag observations, use oobPredict or oobError.
To estimate quantiles of the response distribution or the quantile error given data, pass a TreeBagger model and the data to quantilePredict or quantileError, respectively. To perform similar operations for out-of-bag observations, use oobQuantilePredict or oobQuantileError.
TreeBagger | Create bag of decision trees |
append | Append new trees to ensemble |
compact | Compact ensemble of decision trees |
error | Error (misclassification probability or MSE) |
fillprox | Proximity matrix for training data |
growTrees | Train additional trees and add to ensemble |
margin | Classification margin |
mdsprox | Multidimensional scaling of proximity matrix |
meanMargin | Mean classification margin |
oobError | Out-of-bag error |
oobMargin | Out-of-bag margins |
oobMeanMargin | Out-of-bag mean margins |
oobPredict | Ensemble predictions for out-of-bag observations |
oobQuantileError | Out-of-bag quantile loss of bag of regression trees |
oobQuantilePredict | Quantile predictions for out-of-bag observations from bag of regression trees |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
predict | Predict responses using ensemble of bagged decision trees |
quantileError | Quantile loss using bag of regression trees |
quantilePredict | Predict response quantile using bag of regression trees |
ClassNames | A cell array containing the class names for the response variable. This property is empty for regression trees. |
ComputeOOBPrediction | A logical flag specifying whether out-of-bag predictions for training observations should be computed. The default is false. If this flag is true, the OOBIndices and OOBInstanceWeight properties are filled in, and the out-of-bag methods (oobError, oobMargin, oobMeanMargin, and oobPredict) can be called. |
ComputeOOBPredictorImportance | A logical flag specifying whether out-of-bag estimates of variable importance should be computed. The default is false. If this flag is true, ComputeOOBPrediction is true as well, and the OOBPermuted* importance properties are filled in. |
Cost | Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. This property is read-only and is empty for ensembles of regression trees. |
DefaultYfit | Default value returned by predict and oobPredict when a prediction cannot be made (for example, when computing an out-of-bag prediction for an observation that is in bag for all trees). |
DeltaCriterionDecisionSplit | A numeric array of size 1-by-Nvars of changes in the split criterion summed over splits on each variable, averaged across the entire ensemble of grown trees. |
InBagFraction | Fraction of observations that are randomly selected with replacement for each bootstrap replica. The size of each replica is Nobs×InBagFraction, where Nobs is the number of observations in the training data. The default value is 1. |
MergeLeaves | A logical flag specifying whether decision tree leaves with the same parent are merged for splits that do not decrease the total risk. The default value is false. |
Method | Method used by trees. The possible values are 'classification' for classification ensembles and 'regression' for regression ensembles. |
MinLeafSize | Minimum number of observations per tree leaf. By default, MinLeafSize is 1 for classification and 5 for regression. |
NumTrees | Scalar value equal to the number of decision trees in the ensemble. |
NumPredictorSplit | A numeric array of size 1-by-Nvars, where every element gives the number of splits on this predictor summed over all trees. |
NumPredictorsToSample | Number of predictor or feature variables to select at random for each decision split. By default, NumPredictorsToSample is the square root of the total number of variables for classification and one third of the total number of variables for regression. |
OOBIndices | Logical array of size Nobs-by-NumTrees, where Nobs is the number of observations in the training data and NumTrees is the number of trees in the ensemble. A true value for element (i,j) indicates that observation i is out of bag for tree j. |
OOBInstanceWeight | Numeric array of size Nobs-by-1 containing the number of trees used for computing the out-of-bag response for each observation. Nobs is the number of observations in the training data used to create the ensemble. |
OOBPermutedPredictorCountRaiseMargin | A numeric array of size 1-by-Nvars containing a measure of variable importance for each predictor variable (feature). For any variable, the measure is the difference between the number of raised margins and the number of lowered margins if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees. |
OOBPermutedPredictorDeltaError | A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the increase in prediction error if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. |
OOBPermutedPredictorDeltaMeanMargin | A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the decrease in the classification margin if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees. |
OutlierMeasure | A numeric array of size Nobs-by-1, where Nobs is the number of observations in the training data, containing outlier measures for each observation. |
Prior | Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. This property is read-only and is empty for ensembles of regression trees. |
Proximity | A numeric matrix of size Nobs-by-Nobs, where Nobs is the number of observations in the training data, containing measures of the proximity between observations. For any two observations, their proximity is defined as the fraction of trees for which these observations land on the same leaf. This is a symmetric matrix with 1s on the diagonal and off-diagonal elements ranging from 0 to 1. |
Prune | The Prune property is true if decision trees are pruned and false if they are not. Pruning decision trees is not recommended for ensembles. The default value is false. |
SampleWithReplacement | A logical flag specifying if data are sampled for each decision tree with replacement. This property is true if TreeBagger samples data with replacement and false otherwise. The default value is true. |
TreeArguments | Cell array of arguments for fitctree or fitrtree. TreeBagger uses these arguments in growing new trees for the ensemble. |
Trees | A cell array of size NumTrees-by-1 containing the trees in the ensemble. |
SurrogateAssociation | A matrix of size Nvars-by-Nvars with predictive measures of variable association, averaged across the entire ensemble of grown trees. If you grew the ensemble setting 'Surrogate' to 'on', this matrix is filled with predictive measures of association averaged over the surrogate splits. Otherwise, SurrogateAssociation is diagonal. |
PredictorNames | A cell array containing the names of the predictor variables (features). |
W | Numeric vector of weights of length Nobs, where Nobs is the number of observations (rows) in the training data. |
X | A table or numeric matrix of size Nobs-by-Nvars, where Nobs is the number of observations (rows) and Nvars is the number of variables (columns) in the training data. If you train the ensemble using a table of predictor values, then X is a table. |
Y | An array of size Nobs-by-1 containing the response data. Elements of Y correspond to the rows of X. For classification, Y is the set of class labels; for regression, Y is a numeric vector. |
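Several of the properties above (OOBIndices, OOBInstanceWeight, and the OOBPermuted* importance measures) are filled in only when the corresponding compute flags are enabled at training time. A sketch, again using the fisheriris sample data:

```matlab
% Grow a forest with out-of-bag predictor importance enabled, then
% read the properties documented above directly from the model object.
load fisheriris
rng(1);
B = TreeBagger(50, meas, species, ...
    'Method', 'classification', ...
    'OOBPredictorImportance', 'on');  % also enables out-of-bag prediction

imp    = B.OOBPermutedPredictorDeltaError;  % 1-by-Nvars importance vector
oobIdx = B.OOBIndices;                      % Nobs-by-NumTrees logical array
nTrees = B.NumTrees;                        % number of trees grown
```

Larger values in OOBPermutedPredictorDeltaError indicate predictors whose permutation degrades out-of-bag accuracy the most, i.e., the more important features.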
Copy semantics: Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB® Object-Oriented Programming documentation.
For a TreeBagger model object B, the Trees property stores a cell vector of B.NumTrees CompactClassificationTree or CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter view(B.Trees{t}).
Statistics and Machine Learning Toolbox™ offers three objects for bagging and random forest:
ClassificationBaggedEnsemble, created by fitcensemble for classification
RegressionBaggedEnsemble, created by fitrensemble for regression
TreeBagger, created by TreeBagger for classification and regression
For details about the differences between TreeBagger and bagged ensembles (ClassificationBaggedEnsemble and RegressionBaggedEnsemble), see Comparison of TreeBagger and Bagged Ensembles.
[1] Breiman, L. "Random Forests." Machine Learning, Vol. 45, 2001, pp. 5–32.
[2] Meinshausen, N. “Quantile Regression Forests.” Journal of Machine Learning Research, Vol. 7, 2006, pp. 983–999.
compact | CompactTreeBagger | error | oobError | oobPredict | predict | TreeBagger | view