Class: CompactTreeBagger
Error (misclassification probability or MSE)
err = error(B,TBLnew,Ynew)
err = error(B,Xnew,Ynew)
err = error(B,TBLnew,Ynew,'param1',val1,'param2',val2,...)
err = error(B,Xnew,Ynew,'param1',val1,'param2',val2,...)
err = error(B,TBLnew,Ynew) computes the misclassification probability for classification trees or the mean squared error (MSE) for regression trees, for each tree, for the predictors contained in the table TBLnew given the true response Ynew. You can omit Ynew if TBLnew contains the response variable. If you trained B using sample data contained in a table, then the input data for this method must also be in a table.
err = error(B,Xnew,Ynew) computes the misclassification probability for classification trees or the mean squared error (MSE) for regression trees, for each tree, for the predictors contained in the matrix Xnew given the true response Ynew. If you trained B using sample data contained in a matrix, then the input data for this method must also be in a matrix.
For classification, Ynew can be a numeric vector, character matrix, string array, cell array of character vectors, categorical vector, or logical vector. For regression, Ynew must be a numeric vector. err is a vector with one error measure for each of the NTrees trees in the ensemble B.
err = error(B,TBLnew,Ynew,'param1',val1,'param2',val2,...) or err = error(B,Xnew,Ynew,'param1',val1,'param2',val2,...) specifies optional parameter name-value pairs:
'Mode' | Character vector or string scalar indicating how the method computes errors. If set to
'cumulative' (default), error computes cumulative
errors and err is a vector of length NTrees, where the
first element gives the error from trees(1), the second element gives the error
from trees(1:2), and so on, up to trees(1:NTrees). If set to
'individual', err is a vector of length
NTrees, where each element is the error from one tree in the ensemble.
If set to 'ensemble', err is a scalar showing the
cumulative error for the entire ensemble. |
'Weights' | Vector of observation weights to use for error averaging. By
default the weight of every observation is 1. The length of this vector
must be equal to the number of rows in the input data (Xnew or TBLnew). |
'Trees' | Vector of indices indicating which trees to include in this
calculation. By default, this argument is set to 'all' and
the method uses all trees. If 'Trees' is a numeric
vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes,
where NTrees is the number of elements in the input
vector, and a scalar for 'ensemble' mode. For example,
in the 'cumulative' mode, the first element gives the
error from trees(1), the second element gives the error
from trees(1:2), and so on. |
'TreeWeights' | Vector of tree weights. This vector must have the same length
as the 'Trees' vector. The method uses these weights
to combine output from the specified trees by taking a weighted average
instead of the simple non-weighted majority vote. You cannot use this
argument in the 'individual' mode. |
'UseInstanceForTree' | Logical matrix of size Nobs-by-NTrees indicating
which trees should be used to make predictions for each observation.
By default the method uses all trees for all observations. |
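As an illustration of the name-value pairs above, the following sketch trains a small classification ensemble on the fisheriris sample data set, compacts it, and calls error in each mode. The ensemble size (30 trees), the 100/50 train/test split, and all variable names are arbitrary choices made for this example, and the code assumes Statistics and Machine Learning Toolbox is installed.

```matlab
% Illustrative sketch; variable names and sizes are assumptions of this example.
load fisheriris                      % meas (150-by-4 matrix), species (150-by-1 cell array)
rng(1);                              % make the split and bootstrap samples reproducible
idx      = randperm(150);
trainIdx = idx(1:100);
testIdx  = idx(101:end);

% Train on a matrix, so the test predictors must also be a matrix.
B = compact(TreeBagger(30, meas(trainIdx,:), species(trainIdx)));

Xnew = meas(testIdx,:);
Ynew = species(testIdx);

errCum = error(B, Xnew, Ynew);                        % default 'cumulative': one value per tree
errInd = error(B, Xnew, Ynew, 'Mode', 'individual');  % error of each tree on its own
errEns = error(B, Xnew, Ynew, 'Mode', 'ensemble');    % single value for the whole ensemble
errSub = error(B, Xnew, Ynew, 'Trees', 1:10);         % cumulative error over the first 10 trees
```

Per the table above, errCum and errInd should have one element per tree, errSub one element per entry of 'Trees', and errEns should be a scalar.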
When estimating the ensemble error:
Using the 'Mode' name-value pair argument, you can return the error in any of these three ways:
    The error for individual trees in the ensemble
    The cumulative error over all trees
    The error for the entire ensemble
Using the 'Trees' name-value pair argument, you can specify which trees to use in the ensemble error calculations.
Using the 'UseInstanceForTree' name-value pair argument, you can specify which observations in the input data (X and Y) to use in the ensemble error calculation for each selected tree.
Using the 'Weights' name-value pair argument, you can assign a weight to each observation. For the formulae that follow, $w_j$ is the weight of observation j.
Using the 'TreeWeights' name-value pair argument, you can assign a weight to each tree.
For regression problems, error estimates the weighted MSE of the ensemble of bagged regression trees for predicting Y given X using selected trees and observations.
error predicts responses for selected observations in X using the selected regression trees in the ensemble.
The MSE estimate depends on the value of 'Mode'.
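If you specify 'Mode','Individual', then the weighted MSE for tree t is
$$\mathrm{MSE}_t = \frac{1}{\sum_{j=1}^{n} w_j} \sum_{j=1}^{n} w_j \left(y_j - \hat{y}_{tj}\right)^2.$$
Here, $y_j$ is the observed response of observation j and $\hat{y}_{tj}$ is the predicted response of observation j from selected regression tree t. error sets any unselected observations within a selected tree to the weighted sample average of the observed training data responses.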
If you specify 'Mode','Cumulative', then the weighted MSE is a vector of size $T^*$ containing cumulative, weighted MSEs over the $T^* \le T$ selected trees. error follows these steps to estimate $\mathrm{MSE}_t^*$, the cumulative, weighted MSE using the first t selected trees.
1. For selected observation j, j = 1,...,n, error estimates $\hat{y}_{\mathrm{bag},tj}$, the weighted average of the predictions among the first t selected trees (for details, see predict). For this computation, error uses the tree weights.
2. error estimates the cumulative, weighted MSE through tree t:
$$\mathrm{MSE}_t^* = \frac{1}{\sum_{j=1}^{n} w_j} \sum_{j=1}^{n} w_j \left(y_j - \hat{y}_{\mathrm{bag},tj}\right)^2.$$
error sets observations that are unselected for all selected trees to the weighted sample average of the observed training data responses.
If you specify 'Mode','Ensemble', then the weighted MSE is the last element of the cumulative, weighted MSE vector.
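The three regression modes can be sketched numerically without an ensemble object. The code below is only an illustration of the weighted-MSE arithmetic described above, not the toolbox implementation: the toy data, the matrix Yhat of per-tree predictions, and every variable name are assumptions of this example, and it treats all observations as selected for all trees (no 'UseInstanceForTree').

```matlab
% Toy inputs (all hypothetical): 3 observations, 2 selected trees.
y    = [1; 2; 3];          % observed responses y_j
Yhat = [1 3; 2 2; 5 1];    % Yhat(j,t): prediction of tree t for observation j
w    = [1; 1; 2];          % observation weights w_j ('Weights')
tw   = [1; 1];             % tree weights ('TreeWeights')

T      = size(Yhat, 2);
mseInd = zeros(1, T);      % analogue of 'Mode','individual'
mseCum = zeros(1, T);      % analogue of 'Mode','cumulative'
for t = 1:T
    % Weighted MSE of tree t on its own.
    mseInd(t) = sum(w .* (y - Yhat(:,t)).^2) / sum(w);
    % Tree-weighted average prediction of the first t trees.
    yBag = Yhat(:,1:t) * tw(1:t) / sum(tw(1:t));
    % Weighted MSE of that averaged prediction.
    mseCum(t) = sum(w .* (y - yBag).^2) / sum(w);
end
mseEns = mseCum(end);      % analogue of 'Mode','ensemble'
```

With these toy numbers, mseInd is [2 3] and mseCum is [2 0.25]: averaging the two trees gives a smaller error than either tree alone, and mseEns equals the last cumulative value.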
For classification problems, error estimates the weighted misclassification rate of the ensemble of bagged classification trees for predicting Y given X using selected trees and observations.
If you specify 'Mode','Individual', then the weighted misclassification rate for tree t is
$$e_t = \frac{1}{\sum_{j=1}^{n} w_j} \sum_{j=1}^{n} w_j \, I\left(y_j \ne \hat{y}_{tj}\right).$$
Here, $y_j$ is the true class of observation j, $I$ is the indicator function, and $\hat{y}_{tj}$ is the predicted class for selected observation j from selected classification tree t. error sets any unselected observations within a selected tree to the predicted, weighted, most popular class over all training responses. If there are multiple most popular classes, error considers the one listed first in the ClassNames property of the TreeBagger model the most popular.
If you specify 'Mode','Cumulative', then the weighted misclassification rate is a vector of size $T^*$ containing cumulative, weighted misclassification rates over the $T^* \le T$ selected trees. error follows these steps to estimate $e_t^*$, the cumulative, weighted misclassification rate using the first t selected trees.
1. For selected observation j, j = 1,...,n, error estimates $\hat{y}_{\mathrm{bag},tj}$, the weighted, most popular class among the first t selected trees (for details, see predict). For this computation, error uses the tree weights.
2. error estimates the cumulative, weighted misclassification rate through tree t:
$$e_t^* = \frac{1}{\sum_{j=1}^{n} w_j} \sum_{j=1}^{n} w_j \, I\left(y_j \ne \hat{y}_{\mathrm{bag},tj}\right).$$
error sets any observations that are unselected for all selected trees to the predicted, weighted, most popular class over all training responses. If there are multiple most popular classes, error considers the one listed first in the ClassNames property of the TreeBagger model the most popular.
If you specify 'Mode','Ensemble', then the weighted misclassification rate is the last element of the cumulative, weighted misclassification rate vector.
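The classification modes can be sketched the same way. This is a simplified illustration under stated assumptions, not the toolbox implementation: classes are coded as integers 1..K, each tree casts a hard vote weighted by its tree weight, all observations are treated as selected for all trees, and the toy data and variable names are invented for the example. In this sketch a tied vote goes to the lowest class index, because max returns the first maximum.

```matlab
% Toy inputs (all hypothetical): 4 observations, 3 selected trees, 2 classes.
y  = [1; 2; 2; 1];                    % true classes y_j, coded 1..K
P  = [1 2 1; 2 2 1; 1 2 2; 1 1 2];    % P(j,t): class predicted by tree t for observation j
w  = [1; 1; 1; 1];                    % observation weights w_j ('Weights')
tw = [1; 1; 1];                       % tree weights ('TreeWeights')
K  = 2;                               % number of classes

[n, T] = size(P);
errInd = zeros(1, T);                 % analogue of 'Mode','individual'
errCum = zeros(1, T);                 % analogue of 'Mode','cumulative'
votes  = zeros(n, K);                 % running tree-weighted vote totals per class
for t = 1:T
    % Weighted misclassification rate of tree t on its own.
    errInd(t) = sum(w .* (P(:,t) ~= y)) / sum(w);
    % Add tree t's weighted vote to the running totals.
    for k = 1:K
        votes(:,k) = votes(:,k) + tw(t) * (P(:,t) == k);
    end
    % Most popular class among the first t trees (ties go to the lower index).
    [~, yBag] = max(votes, [], 2);
    errCum(t) = sum(w .* (yBag ~= y)) / sum(w);
end
errEns = errCum(end);                 % analogue of 'Mode','ensemble'
```

With these toy numbers, errInd is [0.25 0.25 0.5] and errCum is [0.25 0.25 0], so the three-tree vote classifies every observation correctly even though each individual tree makes at least one mistake.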