CompactTreeBagger class

Compact ensemble of decision trees grown by bootstrap aggregation

Description

CompactTreeBagger is a lightweight class that contains the trees grown using TreeBagger. CompactTreeBagger does not preserve any information about how TreeBagger grew the decision trees. It does not contain the input data used for growing the trees, nor does it contain training parameters such as the minimal leaf size or the number of variables sampled at random for each decision split. You can use CompactTreeBagger only for predicting the response of the trained ensemble given new data X, and for related functions that build on those predictions.

CompactTreeBagger lets you save the trained ensemble to disk, or use it in any other way, while discarding the training data and the parts of the training configuration that are irrelevant for predicting the response of the fully grown ensemble. This reduces storage and memory requirements, especially for ensembles trained on large data sets.
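
As a minimal sketch of the save-to-disk workflow described above, the following trains a small ensemble, compacts it, writes it to a MAT-file, and reloads it for prediction. The ensemble size of 20, the random seed, and the temporary file name are illustrative assumptions, not recommended settings.

```matlab
load ionosphere                       % X is 351-by-34, Y is a cell array of 'b'/'g' labels
rng(1)                                % fix the seed so the bootstrap samples are repeatable

Mdl  = TreeBagger(20,X,Y,'Method','classification');
CMdl = compact(Mdl);                  % drop training data and training parameters

matFile = [tempname '.mat'];          % hypothetical temporary file name
save(matFile,'CMdl')                  % store only the compact ensemble
clear CMdl

S = load(matFile,'CMdl');             % reload into a struct
labels = predict(S.CMdl,X(1:5,:));    % predict with the reloaded ensemble
delete(matFile)                       % clean up the temporary file
```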

Construction

CompactTreeBagger        Create CompactTreeBagger object

CMdl = compact(Mdl) creates a compact version of Mdl, a TreeBagger model object. You can predict responses using CMdl exactly as you can using Mdl. However, because CMdl does not contain training data, you cannot perform some actions, such as making out-of-bag predictions using oobPredict.
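
The difference can be checked directly. The sketch below assumes an ensemble trained with out-of-bag prediction enabled (the 'OOBPrediction' option of TreeBagger) and uses ismethod to test whether the compact object exposes oobPredict. The ensemble size and seed are arbitrary choices for illustration.

```matlab
load ionosphere
rng(1)                                          % repeatable bootstrap samples

% Full model: keeps the training data, so out-of-bag prediction is available.
Mdl = TreeBagger(20,X,Y,'Method','classification','OOBPrediction','on');
oobLabels = oobPredict(Mdl);

% Compact model: no training data, so it does not offer oobPredict.
CMdl   = compact(Mdl);
hasOOB = ismethod(CMdl,'oobPredict');

% Ordinary prediction on supplied data works for the compact model.
newLabels = predict(CMdl,X);
```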

Object Functions

combine                  Combine two ensembles
error                    Error (misclassification probability or MSE)
margin                   Classification margin
mdsprox                  Multidimensional scaling of proximity matrix
meanMargin               Mean classification margin
outlierMeasure           Outlier measure for data
partialDependence        Compute partial dependence
plotPartialDependence    Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                  Predict responses using ensemble of bagged decision trees
proximity                Proximity matrix for data
setDefaultYfit           Set default value for predict
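
As one illustration of these functions, combine can merge two compact ensembles that were trained separately on the same predictors and classes. The sketch below assumes two small ensembles of 10 and 15 trees; the sizes are arbitrary.

```matlab
load ionosphere
rng(1)

% Two independently grown ensembles, compacted immediately after training.
C1 = compact(TreeBagger(10,X,Y,'Method','classification'));
C2 = compact(TreeBagger(15,X,Y,'Method','classification'));

% combine returns a new ensemble holding the trees of both inputs.
C = combine(C1,C2);
disp(C.NumTrees)
```

Because CompactTreeBagger is a value class, combine does not modify C1 in place; the merged ensemble is the returned object.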

Properties

ClassNames

The ClassNames property is a cell array containing the class names for the response variable Y supplied to TreeBagger. This property is empty for regression trees.

DefaultYfit

The DefaultYfit property controls the predicted value that CompactTreeBagger returns when no prediction is possible, for example, when the predict method must predict for an observation that has only false values in the matrix supplied through the 'useifort' argument.

For classification, you can set this property to either '' or 'MostPopular'. If you choose 'MostPopular' (default), the property value becomes the name of the most probable class in the training data.

For regression, you can set this property to any numeric scalar. The default is the mean of the response for the training data.
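
A short sketch of changing this property for a classification ensemble follows. It uses setDefaultYfit from the function list above; the ensemble size and seed are illustrative assumptions.

```matlab
load ionosphere
rng(1)
CMdl = compact(TreeBagger(20,X,Y,'Method','classification'));

% Inspect the default, which is derived from the most probable training class.
disp(CMdl.DefaultYfit)

% Return an empty label instead when no prediction is possible.
% The result must be assigned back because CompactTreeBagger is a value class.
CMdl = setDefaultYfit(CMdl,'');
```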

DeltaCriterionDecisionSplit

The DeltaCriterionDecisionSplit property is a numeric array of size 1-by-Nvars of changes in the split criterion summed over splits on each variable, averaged across the entire ensemble of grown trees.

Method

The Method property is 'classification' for classification ensembles and 'regression' for regression ensembles.

NumPredictorSplit

The NumPredictorSplit property is a numeric array of size 1-by-Nvars, where each element gives the number of splits on the corresponding predictor, summed over all trees.

NumTrees

The NumTrees property is a scalar equal to the number of decision trees in the ensemble.

PredictorNames

The PredictorNames property is a cell array containing the names of the predictor variables (features). These names are taken from the optional 'names' parameter supplied to TreeBagger. The default names are 'x1', 'x2', etc.
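
The per-predictor properties DeltaCriterionDecisionSplit, NumPredictorSplit, and PredictorNames share the same ordering, so they can be lined up in one table. The sketch below ranks predictors by the change in split criterion and lists the top five. The ensemble size, the cutoff of five, and the table variable names are assumptions made for illustration.

```matlab
load ionosphere
rng(1)
CMdl = compact(TreeBagger(50,X,Y,'Method','classification'));

% Rank predictors by the ensemble-averaged change in split criterion.
[~,order] = sort(CMdl.DeltaCriterionDecisionSplit,'descend');
top = order(1:5);

% Collect names, criterion changes, and split counts as column vectors.
summary = table( ...
    reshape(CMdl.PredictorNames(top),[],1), ...
    reshape(CMdl.DeltaCriterionDecisionSplit(top),[],1), ...
    reshape(CMdl.NumPredictorSplit(top),[],1), ...
    'VariableNames',{'Predictor','DeltaCriterion','NumSplits'});
disp(summary)
```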

SurrogateAssociation

The SurrogateAssociation property is a matrix of size Nvars-by-Nvars with predictive measures of variable association, averaged across the entire ensemble of grown trees. If you grew the ensemble setting 'surrogate' to 'on', this matrix for each tree is filled with predictive measures of association averaged over the surrogate splits. If you grew the ensemble setting 'surrogate' to 'off' (default), SurrogateAssociation is diagonal.

Trees

The Trees property is a cell array of size NumTrees-by-1 containing the trees in the ensemble.

Examples

Create a compact bag of trees for efficiently making predictions on new data.

Load the ionosphere data set.

load ionosphere

Train a bag of 100 classification trees using all measurements.

Mdl = TreeBagger(100,X,Y,'Method','classification')
Mdl = 
  TreeBagger
Ensemble with 100 bagged decision trees:
                    Training X:             [351x34]
                    Training Y:              [351x1]
                        Method:       classification
                 NumPredictors:                   34
         NumPredictorsToSample:                    6
                   MinLeafSize:                    1
                 InBagFraction:                    1
         SampleWithReplacement:                    1
          ComputeOOBPrediction:                    0
 ComputeOOBPredictorImportance:                    0
                     Proximity:                   []
                    ClassNames:             'b'             'g'

  Properties, Methods

Mdl is a TreeBagger model object that contains the training data, among other things.

Create a compact version of Mdl.

CMdl = compact(Mdl)
CMdl = 
  CompactTreeBagger
Ensemble with 100 bagged decision trees:
              Method:       classification
       NumPredictors:                   34
          ClassNames: 'b' 'g'

  Properties, Methods

CMdl is a CompactTreeBagger model object. CMdl is almost the same as Mdl. One exception is that it does not store the training data.

Compare the amounts of space consumed by Mdl and CMdl.

mdlInfo = whos('Mdl');
cMdlInfo = whos('CMdl');
[mdlInfo.bytes cMdlInfo.bytes]
ans = 1×2

     1115742      976936

Mdl consumes more space than CMdl.

CMdl.Trees stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl.

Display a graph of the first tree in the compact model.

view(CMdl.Trees{1},'Mode','graph');

By default, TreeBagger grows deep trees.

Predict the label of the mean of X using the compact ensemble.

predMeanX = predict(CMdl,mean(X))
predMeanX = 1x1 cell array
    {'g'}
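
Beyond a single label, predict can also return class scores, and error reports the misclassification probability as trees are added. The following sketch assumes a smaller ensemble of 30 trees and evaluates on the training data, so the error shown is optimistic compared with what held-out data would give.

```matlab
load ionosphere
rng(1)
CMdl = compact(TreeBagger(30,X,Y,'Method','classification'));

% Labels and class scores for the first five observations.
% The columns of scores follow the order of CMdl.ClassNames.
[labels,scores] = predict(CMdl,X(1:5,:));
disp(CMdl.ClassNames)
disp(scores)

% Misclassification probability; by default one value per cumulative
% number of trees, so the last element uses the whole ensemble.
err = error(CMdl,X,Y);
disp(err(end))
```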

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB® Object-Oriented Programming documentation.

Tips

The Trees property of CMdl stores a cell vector of CMdl.NumTrees CompactClassificationTree or CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(CMdl.Trees{t})
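
Because Trees is an ordinary cell vector, cellfun can summarize the individual trees. The sketch below counts the nodes of each tree through the NumNodes property of the compact tree objects. The ensemble size is an arbitrary choice.

```matlab
load ionosphere
rng(1)
CMdl = compact(TreeBagger(20,X,Y,'Method','classification'));

% Number of nodes in each tree of the ensemble.
numNodes = cellfun(@(t) t.NumNodes, CMdl.Trees);
fprintf('Mean nodes per tree: %.1f\n', mean(numNodes));

% Textual display of the largest tree.
[~,t] = max(numNodes);
view(CMdl.Trees{t})
```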