Fit ensemble of learners for classification and regression
fitensemble can boost or bag decision tree learners or discriminant analysis classifiers. The function can also train random subspace ensembles of KNN or discriminant analysis classifiers.
For simpler interfaces that fit classification and regression ensembles, instead use fitcensemble and fitrensemble, respectively. Also, fitcensemble and fitrensemble provide options for Bayesian optimization.
Mdl = fitensemble(Tbl,ResponseVarName,Method,NLearn,Learners) returns a trained ensemble model object that contains the results of fitting an ensemble of NLearn classification or regression learners (Learners) to all variables in the table Tbl. ResponseVarName is the name of the response variable in Tbl. Method is the ensemble-aggregation method.
Mdl = fitensemble(___,Name,Value) trains an ensemble using additional options specified by one or more Name,Value pair arguments and any of the previous syntaxes. For example, you can specify the class order, implement 10-fold cross-validation, or set the learning rate.
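For instance, a minimal sketch along these lines shows one way to pass such options. The table name Tbl, the response name 'Y', the binary classification problem, and the parameter values are all hypothetical.
% Hypothetical data: Tbl is a table whose response variable is 'Y' (binary).
% Boost 100 trees with a reduced learning rate and 10-fold cross-validation.
Mdl = fitensemble(Tbl,'Y','LogitBoost',100,'Tree', ...
    'LearnRate',0.1,'CrossVal','on');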
NLearn can vary from a few dozen to a few thousand. Usually, an ensemble with good predictive power requires from a few hundred to a few thousand weak learners. However, you do not have to train an ensemble for that many cycles at once. You can start by growing a few dozen learners, inspect the ensemble performance, and then, if necessary, train more weak learners using the resume method of the trained classification or regression ensemble.
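The following minimal sketch illustrates this incremental approach; the table Tbl, the response name 'Y', and the binary classification setting are assumptions.
% Hypothetical data: Tbl is a table whose response variable is 'Y' (binary).
Mdl = fitensemble(Tbl,'Y','AdaBoostM1',50,'Tree');   % start with 50 learners
L = resubLoss(Mdl);                                  % inspect performance
Mdl = resume(Mdl,50);                                % add 50 more learners if needed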
Ensemble performance depends on the ensemble settings and the settings of the weak learners. That is, if you specify weak learners with default parameters, then the ensemble can perform poorly. Therefore, as with the ensemble settings, it is good practice to adjust the parameters of the weak learners using templates, and to choose values that minimize generalization error.
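For example, a sketch along these lines compares the cross-validated loss of two tree templates; the data names Tbl and 'Y' and the MinLeafSize values are hypothetical.
% Hypothetical data: Tbl is a table whose response variable is 'Y'.
t1 = templateTree('MinLeafSize',1);     % deeper trees
t2 = templateTree('MinLeafSize',20);    % shallower trees
Mdl1 = fitensemble(Tbl,'Y','Bag',100,t1,'Type','classification','CrossVal','on');
Mdl2 = fitensemble(Tbl,'Y','Bag',100,t2,'Type','classification','CrossVal','on');
[kfoldLoss(Mdl1) kfoldLoss(Mdl2)]       % prefer the template with the lower loss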
If you specify to resample using Resample, then it is good practice to resample the entire data set. That is, use the default setting of 1 for FResample.
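A minimal sketch of this setting, again assuming a table Tbl with response variable 'Y' and a binary classification problem:
% Hypothetical data: Tbl is a table whose response variable is 'Y' (binary).
% Enable resampling and keep FResample at its default of 1, so every
% resample draws from the entire data set.
Mdl = fitensemble(Tbl,'Y','AdaBoostM1',100,'Tree', ...
    'Resample','on','FResample',1);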
In classification problems (that is, Type is 'classification'), if the ensemble-aggregation method (Method) is 'bag' and the misclassification cost (Cost) is highly imbalanced, then, for in-bag samples, the software oversamples unique observations from the class that has a large penalty. Similarly, if the class prior probabilities (Prior) are highly skewed, the software oversamples unique observations from the class that has a large prior probability.
For smaller sample sizes, these combinations can result in a low relative frequency of out-of-bag observations from the class that has a large penalty or prior probability. Consequently, the estimated out-of-bag error is highly variable and can be difficult to interpret. To avoid large estimated out-of-bag error variances, particularly for small sample sizes, set a more balanced misclassification cost matrix using Cost or a less skewed prior probability vector using Prior.
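As a rough illustration, a sketch like the following passes a more balanced cost matrix; the cost values and data names are hypothetical.
% Hypothetical two-class problem: Tbl is a table whose response is 'Y'.
% A strongly skewed cost matrix can make the out-of-bag error hard to
% interpret; a more balanced matrix mitigates this.
costBalanced = [0 2; 1 0];
Mdl = fitensemble(Tbl,'Y','Bag',200,'Tree', ...
    'Type','classification','Cost',costBalanced);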
Because the order of some input and output arguments corresponds to the distinct classes in the training data, it is good practice to specify the class order using the ClassNames name-value pair argument. To determine the class order quickly, remove all observations from the training data that are unclassified (that is, have a missing label), obtain and display an array of all the distinct classes, and then specify the array for ClassNames.
For example, suppose the response variable (Y) is a cell array of labels. This code specifies the class order in the variable classNames.
Ycat = categorical(Y);
classNames = categories(Ycat)
categorical assigns <undefined> to unclassified observations, and categories excludes <undefined> from its output. Therefore, if you use this code for cell arrays of labels, or similar code for categorical arrays, then you do not have to remove observations with missing labels to obtain a list of the distinct classes.
To specify the class order from lowest-represented label to most-represented, quickly determine the distinct classes (as described previously), and then arrange the classes in the list by frequency before passing the list to ClassNames. Following from the previous example, this code specifies the class order from lowest- to most-represented in classNamesLH.
Ycat = categorical(Y);
classNames = categories(Ycat);
freq = countcats(Ycat);
[~,idx] = sort(freq);
classNamesLH = classNames(idx);
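You can then pass the ordered list directly to fitensemble; for instance, in a sketch like this, where the table Tbl, the response name 'Y', and the choice of method are hypothetical:
% Hypothetical data: Tbl is a table whose response variable is 'Y'.
Mdl = fitensemble(Tbl,'Y','RUSBoost',100,'Tree','ClassNames',classNamesLH);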
For details of ensemble-aggregation algorithms, see Ensemble Algorithms.
If you specify Method to be a boosting algorithm and Learners to be decision trees, then the software grows stumps by default. A decision stump is one root node connected to two terminal, leaf nodes. You can adjust tree depth by specifying the MaxNumSplits, MinLeafSize, and MinParentSize name-value pair arguments using templateTree.
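For example, a sketch like the following grows deeper boosted trees; the split and leaf-size values, table name, and response name are hypothetical.
% Hypothetical data: Tbl is a table whose response variable is 'Y' (binary).
t = templateTree('MaxNumSplits',10,'MinLeafSize',5);   % deeper than a stump
Mdl = fitensemble(Tbl,'Y','AdaBoostM1',200,t);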
fitensemble generates in-bag samples by oversampling classes with large misclassification costs and undersampling classes with small misclassification costs. Consequently, out-of-bag samples have fewer observations from classes with large misclassification costs and more observations from classes with small misclassification costs. If you train a classification ensemble using a small data set and a highly skewed cost matrix, then the number of out-of-bag observations per class can be low. Therefore, the estimated out-of-bag error can have a large variance and can be difficult to interpret. The same phenomenon can occur for classes with large prior probabilities.
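You can compute the out-of-bag error of a bagged classification ensemble with oobLoss; a minimal sketch, with hypothetical data names:
% Hypothetical data: Tbl is a table whose response variable is 'Y'.
Mdl = fitensemble(Tbl,'Y','Bag',200,'Tree','Type','classification');
oobErr = oobLoss(Mdl);   % can be noisy for small, cost- or prior-skewed data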
For the RUSBoost ensemble-aggregation method (Method), the name-value pair argument RatioToSmallest specifies the sampling proportion for each class with respect to the lowest-represented class. For example, suppose that there are two classes in the training data: A and B. A has 100 observations and B has 10 observations. Also, suppose that the lowest-represented class has m observations in the training data.
If you set 'RatioToSmallest',2, then s*m = 2*10 = 20. Consequently, fitensemble trains every learner using 20 observations from class A and 20 observations from class B. If you set 'RatioToSmallest',[2 2], then you obtain the same result.
If you set 'RatioToSmallest',[2,1], then s1*m = 2*10 = 20 and s2*m = 1*10 = 10. Consequently, fitensemble trains every learner using 20 observations from class A and 10 observations from class B.
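A minimal sketch of the second setting; the table name, response name, and class labels are hypothetical.
% Hypothetical data: Tbl is a table whose response variable is 'Y' with
% classes 'A' (100 observations) and 'B' (10 observations).
% Each learner sees 2*10 = 20 observations of 'A' and 1*10 = 10 of 'B'.
Mdl = fitensemble(Tbl,'Y','RUSBoost',100,'Tree', ...
    'ClassNames',{'A','B'},'RatioToSmallest',[2,1]);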
For ensembles of decision trees, and for dual-core
systems and above, fitensemble
parallelizes training
using Intel® Threading Building Blocks (TBB). For details on Intel TBB,
see https://software.intel.com/en-us/intel-tbb.
ClassificationBaggedEnsemble | ClassificationEnsemble | ClassificationPartitionedEnsemble | RegressionBaggedEnsemble | RegressionEnsemble | RegressionPartitionedEnsemble | templateDiscriminant | templateKNN | templateTree