predict

Predict labels using k-nearest neighbor classification model

Description

label = predict(mdl,X) returns a vector of predicted class labels for the predictor data in the table or matrix X, based on the trained k-nearest neighbor classification model mdl. See Predicted Class Label.


[label,score,cost] = predict(mdl,X) also returns:

  • A matrix of classification scores (score) indicating the likelihood that a label comes from a particular class. For k-nearest neighbor, scores are posterior probabilities. See Posterior Probability.

  • A matrix of expected classification costs (cost). For each observation in X, the predicted class label corresponds to the minimum expected classification cost among all classes. See Expected Cost.

Examples


Create a k-nearest neighbor classifier for Fisher's iris data, where k = 5. Evaluate some model predictions on new data.

Load the Fisher iris data set.

load fisheriris
X = meas;
Y = species;

Create a classifier for five nearest neighbors. Standardize the noncategorical predictor data.

mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',true);

Predict the classifications for flowers with minimum, mean, and maximum characteristics.

Xnew = [min(X);mean(X);max(X)];
[label,score,cost] = predict(mdl,Xnew)
label = 3×1 cell
    {'versicolor'}
    {'versicolor'}
    {'virginica' }

score = 3×3

    0.4000    0.6000         0
         0    1.0000         0
         0         0    1.0000

cost = 3×3

    0.6000    0.4000    1.0000
    1.0000         0    1.0000
    1.0000    1.0000         0

The second and third rows of the score and cost matrices contain only 0s and 1s, which means all five nearest neighbors of the mean and maximum flower measurements have identical classifications. In that case, the posterior probability is 1 for a single class, and with the default cost matrix the expected costs are exactly 0 or 1.

Input Arguments


mdl - k-nearest neighbor classifier model, specified as a ClassificationKNN object.

X - Predictor data to be classified, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

  • For a numeric matrix:

    • The variables that make up the columns of X must have the same order as the predictor variables used to train mdl.

    • If you train mdl using a table (for example, Tbl), then X can be a numeric matrix if Tbl contains all numeric predictor variables. k-nearest neighbor classification requires homogeneous predictors. Therefore, to treat all numeric predictors in Tbl as categorical during training, set 'CategoricalPredictors','all' when you train using fitcknn. If Tbl contains heterogeneous predictors (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • For a table:

    • predict does not support multicolumn variables and cell arrays other than cell arrays of character vectors.

    • If you train mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those used to train mdl (stored in mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Both Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them. For an example, see the sketch after this list.

    • If you train mdl using a numeric matrix, then the predictor names in mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitcknn. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
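
For example, the following sketch (the variable names are illustrative) trains on a table and then predicts on a table whose columns are in a different order:

load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;
mdlT = fitcknn(Tbl,'Species','NumNeighbors',5);
% The column order of the new table does not need to match Tbl
XnewT = Tbl(1:3,{'PW','PL','SW','SL'});
label = predict(mdlT,XnewT)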

If you set 'Standardize',true in fitcknn to train mdl, then the software standardizes the columns of X using the corresponding means in mdl.Mu and standard deviations in mdl.Sigma.
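
As a rough illustration (not a required step; predict performs this internally), the standardization is equivalent to centering and scaling by the stored model properties:

% Standardize new observations using the stored training statistics
% (mdl.Mu and mdl.Sigma are populated only when 'Standardize' is true)
Xstd = bsxfun(@rdivide,bsxfun(@minus,Xnew,mdl.Mu),mdl.Sigma);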

Data Types: double | single | table

Output Arguments


label - Predicted class labels for the observations (rows) in X, returned as a categorical array, character array, logical vector, vector of numeric values, or cell array of character vectors. label has length equal to the number of rows in X. Each label is the class with the minimal expected cost. See Predicted Class Label.

score - Predicted class scores or posterior probabilities, returned as a numeric matrix of size n-by-K. n is the number of observations (rows) in X, and K is the number of classes (in mdl.ClassNames). score(i,j) is the posterior probability that observation i in X is of class j in mdl.ClassNames. See Posterior Probability.

Data Types: single | double

cost - Expected classification costs, returned as a numeric matrix of size n-by-K. n is the number of observations (rows) in X, and K is the number of classes (in mdl.ClassNames). cost(i,j) is the cost of classifying row i of X as class j in mdl.ClassNames. See Expected Cost.

Data Types: single | double

Algorithms


Predicted Class Label

predict classifies by minimizing the expected classification cost:

$$\hat{y} = \underset{y=1,\ldots,K}{\arg\min} \sum_{j=1}^{K} \hat{P}(j \mid x)\, C(y \mid j),$$

where

  • $\hat{y}$ is the predicted classification.

  • K is the number of classes.

  • $\hat{P}(j \mid x)$ is the posterior probability of class j for observation x.

  • $C(y \mid j)$ is the cost of classifying an observation as y when its true class is j.
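
You can reproduce this rule from the outputs of predict. A minimal sketch, assuming the trained model and query data from the example above:

[label,score,cost] = predict(mdl,Xnew);
% The predicted label minimizes the expected cost across classes
[~,idx] = min(cost,[],2);
labelCheck = mdl.ClassNames(idx)   % matches label, up to tie-breaking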

Posterior Probability

Consider a vector (single query point) xnew and a model mdl.

  • k is the number of nearest neighbors used in prediction, mdl.NumNeighbors.

  • nbd(mdl,xnew) specifies the k nearest neighbors to xnew in mdl.X.

  • Y(nbd) specifies the classifications of the points in nbd(mdl,xnew), namely mdl.Y(nbd).

  • W(nbd) specifies the weights of the points in nbd(mdl,xnew).

  • prior specifies the priors of the classes in mdl.Y.

If the model contains a vector of prior probabilities, then the observation weights W are normalized by class to sum to the priors. This process might involve a calculation for the point xnew, because weights can depend on the distance from xnew to the points in mdl.X.

The posterior probability p(j|xnew) is

$$p(j \mid x_{\mathrm{new}}) = \frac{\sum_{i \in \mathrm{nbd}} W(i)\,\mathbf{1}\{Y(X(i)) = j\}}{\sum_{i \in \mathrm{nbd}} W(i)}.$$

Here, $\mathbf{1}\{Y(X(i)) = j\}$ is 1 when mdl.Y(i) = j, and 0 otherwise.
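
As an unweighted illustration of this formula, the following sketch recovers the neighborhood with knnsearch and computes the class fractions. It assumes equal observation weights, default priors, the default Euclidean distance, and no standardization, so its results can differ from the posteriors of a standardized or weighted model:

k = mdl.NumNeighbors;
idx = knnsearch(mdl.X,xnew,'K',k);    % indices of the k nearest training points
nbrLabels = mdl.Y(idx);               % class labels of those neighbors
post = zeros(1,numel(mdl.ClassNames));
for j = 1:numel(mdl.ClassNames)
    % Fraction of neighbors in class j (all weights W(i) equal)
    post(j) = mean(strcmp(nbrLabels,mdl.ClassNames{j}));
end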

True Misclassification Cost

Two costs are associated with KNN classification: the true misclassification cost per class and the expected misclassification cost per observation.

You can set the true misclassification cost per class by using the 'Cost' name-value pair argument when you run fitcknn. The value Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.
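
For example, here is a sketch of supplying a custom cost matrix to fitcknn; the values are illustrative only:

% C(i,j) is the cost of classifying into class j when the true class is i
C = [0 1 2;
     1 0 1;
     2 1 0];   % confusing class 1 with class 3 costs twice as much
mdlC = fitcknn(X,Y,'NumNeighbors',5,'Cost',C);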

Expected Cost

The third output of predict is the expected misclassification cost per observation.

Suppose you have Nobs observations that you want to classify with a trained classifier mdl, and you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(mdl,Xnew)

returns a matrix cost of size Nobs-by-K, among other outputs. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,j) is

$$\sum_{i=1}^{K} \hat{P}(i \mid X_{\mathrm{new}}(n))\, C(j \mid i),$$

where

  • K is the number of classes.

  • $\hat{P}(i \mid X_{\mathrm{new}}(n))$ is the posterior probability of class i for observation $X_{\mathrm{new}}(n)$.

  • $C(j \mid i)$ is the true misclassification cost of classifying an observation as j when its true class is i.
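
Equivalently, each row of cost is the corresponding row of posterior probabilities multiplied by the cost matrix stored in the model. A quick check, assuming the model and data from the example above:

[label,score,cost] = predict(mdl,Xnew);
costCheck = score*mdl.Cost;           % expected costs from the posteriors
max(abs(cost(:) - costCheck(:)))      % (near) zero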


Introduced in R2012a