resubEdge

Find classification edge for support vector machine (SVM) classifier by resubstitution

Description

e = resubEdge(SVMModel) returns the resubstitution Classification Edge (e) for the support vector machine (SVM) classifier SVMModel using the training data stored in SVMModel.X and the corresponding class labels stored in SVMModel.Y.

The classification edge is a scalar value that represents the weighted mean of the classification margins.

Examples

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Standardize the data and specify that 'g' is the positive class.

SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

SVMModel is a trained ClassificationSVM classifier.

Estimate the resubstitution edge, which is the mean of the training sample margins.

e = resubEdge(SVMModel)
e = 5.1000
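
As a consistency check (a sketch, assuming the default unit observation weights and empirical priors, under which the weighted mean reduces to a simple mean), you can recover the edge from the resubstitution margins:

m = resubMargin(SVMModel);  % one margin per training observation
eCheck = mean(m)            % should match resubEdge under the default weights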

Perform feature selection by comparing training sample edges from multiple models. Based solely on this comparison, the classifier with the highest edge is the best classifier.

Load the ionosphere data set. Define two data sets:

  • fullX contains all predictors (except the removed column of 0s).

  • partX contains the last 21 predictors.

load ionosphere
fullX = X;
partX = X(:,end-20:end);

Train SVM classifiers for each predictor set.

FullSVMModel = fitcsvm(fullX,Y);
PartSVMModel = fitcsvm(partX,Y);

Estimate the training sample edge for each classifier.

fullEdge = resubEdge(FullSVMModel)
fullEdge = 3.3653
partEdge = resubEdge(PartSVMModel)
partEdge = 2.0471

The edge for the classifier trained on the complete data set is greater, suggesting that the classifier trained with all the predictors has a better in-sample fit.
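
To extend this comparison (a sketch; the candidate subsets below are arbitrary illustrations, not part of the original example), you can loop over several predictor subsets and rank them by their in-sample edges:

subsets = {1:size(X,2), size(X,2)-20:size(X,2), 1:10};  % hypothetical candidates
edges = zeros(1,numel(subsets));
for k = 1:numel(subsets)
    Mdl = fitcsvm(X(:,subsets{k}),Y);
    edges(k) = resubEdge(Mdl);
end
[~,best] = max(edges)  % index of the candidate subset with the highest edge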

Input Arguments

SVMModel
Full, trained SVM classifier, specified as a ClassificationSVM model trained with fitcsvm.

More About

Classification Edge

The edge is the weighted mean of the classification margins.

The weights are the prior class probabilities. If you supply weights, then the software normalizes them to sum to the prior probabilities in the respective classes. The software uses the renormalized weights to compute the weighted mean.
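
As an illustrative sketch of this renormalization (the weights here are arbitrary, not part of this page), you can supply observation weights to fitcsvm and observe their effect on the reported edge, with the ionosphere data loaded as above:

rng(1)                 % for reproducible random weights
w = rand(numel(Y),1);  % hypothetical observation weights
WMdl = fitcsvm(X,Y,'Weights',w,'Standardize',true,'ClassNames',{'b','g'});
eW = resubEdge(WMdl)   % weighted mean of the margins under the renormalized weights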

One way to choose among multiple classifiers, for example, to perform feature selection, is to choose the classifier that yields the highest edge.

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

$m = 2yf(x).$

x is an observation. If the true label of x is the positive class, then y is 1; otherwise, y is –1. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as $m = yf(x)$.

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
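
For a concrete check (a sketch using the ionosphere model SVMModel trained above; yNum and mManual are illustrative helper names), the margins follow directly from the resubstitution scores:

[~,score] = resubPredict(SVMModel);  % score columns ordered as SVMModel.ClassNames
yNum = 2*strcmp(Y,'g') - 1;          % +1 for the positive class 'g', -1 for 'b'
mManual = 2*yNum.*score(:,2);        % m = 2yf(x), per the definition above
max(abs(mManual - resubMargin(SVMModel)))  % should be near zero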

Classification Score

The SVM classification score for classifying observation x is the signed distance from x to the decision boundary ranging from -∞ to +∞. A positive score for a class indicates that x is predicted to be in that class. A negative score indicates otherwise.

The positive class classification score f(x) is the trained SVM classification function. f(x) is also the numerical predicted response for x, or the score for predicting x into the positive class.

$f(x) = \sum_{j=1}^{n} \alpha_j y_j G(x_j,x) + b,$

where $(\alpha_1, \ldots, \alpha_n, b)$ are the estimated SVM parameters, $G(x_j,x)$ is the dot product in the predictor space between x and the support vectors, and the sum includes the training set observations. The negative class classification score for x, or the score for predicting x into the negative class, is –f(x).
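
As a numeric illustration of this formula (a sketch, assuming a Gaussian kernel and the convention that fitcsvm applies the kernel scale by dividing the predictors; RbfMdl and x0 are illustrative names), with the ionosphere data loaded:

RbfMdl = fitcsvm(X,Y,'KernelFunction','rbf','Standardize',false);
sv = RbfMdl.SupportVectors;
sc = RbfMdl.KernelParameters.Scale;
x0 = X(1,:);                          % one observation to score
G = exp(-sum(((sv - x0)/sc).^2, 2));  % Gaussian kernel between x0 and each support vector
fx = sum(RbfMdl.Alpha.*RbfMdl.SupportVectorLabels.*G) + RbfMdl.Bias;
[~,s0] = predict(RbfMdl,x0);
abs(fx - s0(2))                       % should be near zero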

If $G(x_j,x) = x_j'x$ (the linear kernel), then the score function reduces to

$f(x) = (x/s)'\beta + b.$

s is the kernel scale and β is the vector of fitted linear coefficients.
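
Similarly for the linear kernel (a sketch; LinMdl and fManual are illustrative names, and standardization is turned off so that the raw predictors match the stored coefficients):

LinMdl = fitcsvm(X,Y,'KernelFunction','linear','Standardize',false);
sL = LinMdl.KernelParameters.Scale;
fManual = (X/sL)*LinMdl.Beta + LinMdl.Bias;  % f(x) = (x/s)'*beta + b
[~,scoreL] = resubPredict(LinMdl);
max(abs(fManual - scoreL(:,2)))              % should be near zero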

For more details, see Understanding Support Vector Machines.

Algorithms

For binary classification, the software defines the margin for observation j, mj, as

$m_j = 2 y_j f(x_j),$

where $y_j \in \{-1,1\}$, and $f(x_j)$ is the predicted score of observation j for the positive class. However, $m_j = y_j f(x_j)$ is commonly used to define the margin.


Introduced in R2014a