Predict labels for Gaussian kernel classification model
Predict the training set labels using a binary kernel classification model, and display the confusion matrix for the resulting classification.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
Train a binary kernel classification model that identifies whether the radar return is bad ('b') or good ('g').
rng('default') % For reproducibility
Mdl = fitckernel(X,Y);
Mdl is a ClassificationKernel model.
Predict the training set, or resubstitution, labels.
label = predict(Mdl,X);
Construct a confusion matrix.
ConfusionTrain = confusionchart(Y,label);
The model misclassifies one radar return for each class.
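The counts shown in the chart can also be summarized as a resubstitution error rate. The following is a minimal sketch, assuming the same data and default training settings as above. The variable name resubError is introduced here for illustration only.

```matlab
load ionosphere                 % X: numeric predictors, Y: cell array of 'b'/'g'
rng('default')                  % For reproducibility
Mdl = fitckernel(X,Y);
label = predict(Mdl,X);

% Fraction of training observations whose predicted label differs from Y.
% strcmp compares the two cell arrays of character vectors element by element.
resubError = mean(~strcmp(label,Y))
```

Because this error is measured on the data used for training, it is likely to be optimistic compared with the error on new data.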
Predict the test set labels using a binary kernel classification model, and display the confusion matrix for the resulting classification.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
Partition the data set into training and test sets. Specify a 15% holdout sample for the test set.
rng('default') % For reproducibility
Partition = cvpartition(Y,'Holdout',0.15);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition); % Indices for the test set
Train a binary kernel classification model using the training set. A good practice is to define the class order.
Mdl = fitckernel(X(trainingInds,:),Y(trainingInds),'ClassNames',{'b','g'});
Predict the training set labels and the test set labels.
labelTrain = predict(Mdl,X(trainingInds,:));
labelTest = predict(Mdl,X(testInds,:));
Construct a confusion matrix for the training set.
ConfusionTrain = confusionchart(Y(trainingInds),labelTrain);
The model misclassifies only one radar return for each class.
Construct a confusion matrix for the test set.
ConfusionTest = confusionchart(Y(testInds),labelTest);
The model misclassifies one bad radar return as being a good return, and five good radar returns as being bad returns.
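To work with the counts numerically rather than reading them off the chart, one option is confusionmat. The following is a sketch under the same partition and training settings as above. The names C and testError are illustrative and do not appear in the original example.

```matlab
load ionosphere
rng('default')                                   % For reproducibility
Partition = cvpartition(Y,'Holdout',0.15);
trainingInds = training(Partition);              % Indices for the training set
testInds = test(Partition);                      % Indices for the test set
Mdl = fitckernel(X(trainingInds,:),Y(trainingInds),'ClassNames',{'b','g'});
labelTest = predict(Mdl,X(testInds,:));

% Rows are true classes and columns are predicted classes,
% both in the order {'b','g'}.
C = confusionmat(Y(testInds),labelTest,'Order',{'b','g'})

% Off-diagonal entries are misclassifications.
testError = 1 - sum(diag(C))/sum(C(:))
```

Fixing the order with 'Order' keeps the rows and columns of C aligned with the class order given to fitckernel.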
Estimate posterior class probabilities for a test set, and determine the quality of the model by plotting a receiver operating characteristic (ROC) curve. Kernel classification models return posterior probabilities for logistic regression learners only.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
Partition the data set into training and test sets. Specify a 30% holdout sample for the test set.
rng('default') % For reproducibility
Partition = cvpartition(Y,'Holdout',0.30);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition); % Indices for the test set
Train a binary kernel classification model. Fit logistic regression learners.
Mdl = fitckernel(X(trainingInds,:),Y(trainingInds), ...
    'ClassNames',{'b','g'},'Learner','logistic');
Predict the posterior class probabilities for the test set.
[~,posterior] = predict(Mdl,X(testInds,:));
Because Mdl has one regularization strength, the output posterior is a matrix with two columns and rows equal to the number of test-set observations. Column i contains posterior probabilities of Mdl.ClassNames(i) given a particular observation.
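A quick sanity check on the shape and contents of posterior can be sketched as follows, assuming the same partition and logistic learner as above. The names nTest and rowSumError are illustrative.

```matlab
load ionosphere
rng('default')                                   % For reproducibility
Partition = cvpartition(Y,'Holdout',0.30);
trainingInds = training(Partition);
testInds = test(Partition);
Mdl = fitckernel(X(trainingInds,:),Y(trainingInds), ...
    'ClassNames',{'b','g'},'Learner','logistic');
[~,posterior] = predict(Mdl,X(testInds,:));

nTest = nnz(testInds);                 % Number of test-set observations
size(posterior)                        % Expected to be nTest-by-2

% The two class probabilities for each observation should sum to 1,
% up to floating-point rounding.
rowSumError = max(abs(sum(posterior,2) - 1))
```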
Obtain false and true positive rates, and estimate the area under the curve (AUC). Specify that the second class is the positive class.
[fpr,tpr,~,auc] = perfcurve(Y(testInds),posterior(:,2),Mdl.ClassNames(2));
auc
auc = 0.9042
The AUC is close to 1, which indicates that the model predicts labels well.
Plot an ROC curve.
figure;
plot(fpr,tpr)
h = gca;
h.XLim(1) = -0.1;
h.YLim(2) = 1.1;
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC Curve')
Mdl - Binary kernel classification model
ClassificationKernel model object
Binary kernel classification model, specified as a ClassificationKernel model object. You can create a ClassificationKernel model object using fitckernel.
X - Predictor data to be classified
Predictor data to be classified, specified as a numeric matrix or table.
Each row of X corresponds to one observation, and each column corresponds to one variable.
For a numeric matrix:
The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
If you trained Mdl using a table (for example, Tbl) and Tbl contains all numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitckernel. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.
For a table:
predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitckernel. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
Data Types: table | double | single
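The table rules above can be illustrated with a short sketch. It assumes a release in which fitckernel accepts table input, as the description of X implies. The table Tbl, the variable names p1 through p5, Response, and Extra are all hypothetical and chosen only for this illustration.

```matlab
load ionosphere
rng('default')                  % For reproducibility

% Hypothetical training table built from the first five predictors.
Tbl = array2table(X(:,1:5),'VariableNames',{'p1','p2','p3','p4','p5'});
Tbl.Response = Y;
Mdl = fitckernel(Tbl,'Response');

% New data with the same variable names and types, a different column
% order, and one extra variable that was not used for training.
Tnew = Tbl(1:10,{'p5','p3','p1','p2','p4'});
Tnew.Extra = (1:10)';

labelReordered = predict(Mdl,Tnew);
labelOriginal = predict(Mdl,Tbl(1:10,:));

% According to the rules above, column order and extra variables
% should not affect the predictions.
sameLabels = isequal(labelReordered,labelOriginal)
```

Matching by variable name rather than by position is what makes the reordered table acceptable here. With a numeric matrix, by contrast, the column order must match the training order.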
Label - Predicted class labels
Predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors.
Label has n rows, where n is the number of observations in X, and has the same data type as the observed class labels (Y) used to train Mdl. (The software treats string arrays as cell arrays of character vectors.)
predict classifies observations into the class yielding the highest score.
Score - Classification scores
Classification scores, returned as an n-by-2 numeric array, where n is the number of observations in X. Score(i,j) is the score for classifying observation i into class j. Mdl.ClassNames stores the order of the classes.
If Mdl.Learner is 'logistic', then classification scores are posterior probabilities.
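The relationship between the two outputs can be checked directly. The following is a minimal sketch, assuming the ionosphere data and an explicitly specified class order. The names idx, labelFromScore, and agree are illustrative.

```matlab
load ionosphere
rng('default')                  % For reproducibility
Mdl = fitckernel(X,Y,'ClassNames',{'b','g'});
[label,score] = predict(Mdl,X);

% For each observation, find the column holding the larger score.
% Column j corresponds to Mdl.ClassNames(j).
[~,idx] = max(score,[],2);
labelFromScore = Mdl.ClassNames(idx);

% Compare the labels reconstructed from the scores with the labels
% returned by predict (exact ties in the scores could differ).
agree = isequal(labelFromScore(:),label(:))
```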
For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by
f(x) = T(x)β + b,
where:
T(·) is a transformation of an observation for feature expansion.
β is the estimated column vector of coefficients.
b is the estimated scalar bias.
The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.
If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
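The effect of the 'logit' transformation can be sketched with plain arithmetic. The raw scores in f below are hypothetical values chosen for illustration and are not produced by a trained model. The sketch assumes the transformation is the logistic function 1/(1 + exp(-x)), applied to f(x) for the positive class and to -f(x) for the negative class.

```matlab
% Hypothetical raw scores f(x) for the positive class
f = [-2; 0.5; 3];

% Logistic transformation of the raw scores
pPos = 1./(1 + exp(-f));        % Positive class, from f(x)
pNeg = 1./(1 + exp(f));         % Negative class, from -f(x)

% The two transformed scores sum to 1 for every observation
sumCheck = max(abs(pPos + pNeg - 1))

% A positive raw score corresponds to a posterior probability above 0.5,
% so both rules pick the same class.
sameDecision = isequal(f > 0, pPos > 0.5)
```

Because the logistic function is monotonic, transforming the scores changes their scale but not the predicted class.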
Usage notes and limitations:
predict does not support tall table data.
For more information, see Tall Arrays.
ClassificationKernel | confusionchart | fitckernel | perfcurve | resume