createns

Create nearest neighbor searcher object

Syntax

NS = createns(X)

NS = createns(X,Name,Value)

Description

NS = createns(X) creates either an ExhaustiveSearcher or KDTreeSearcher model object using the n-by-K numeric matrix of the training data X.

NS = createns(X,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify NSMethod to determine which type of object to create.

Examples

Train Default Exhaustive Nearest Neighbor Searcher

Load Fisher's iris data set.

load fisheriris
X = meas;
[n,k] = size(X)

n = 150

k = 4

X has 150 observations and 4 predictors.

Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.

Mdl1 = ExhaustiveSearcher(X)

Mdl1 = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl1 is an ExhaustiveSearcher model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can prepare an exhaustive nearest neighbor searcher by using createns and specifying 'exhaustive' as the search method.

Mdl2 = createns(X,'NSMethod','exhaustive')

Mdl2 = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl2 is also an ExhaustiveSearcher model object, and it is equivalent to Mdl1.

To search X for the nearest neighbors to a batch of query data, pass the ExhaustiveSearcher model object and the query data to knnsearch or rangesearch.
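As a minimal sketch of that last step, the following passes an exhaustive searcher and a small query batch to knnsearch. The query matrix Q is a hypothetical illustration (the first three training rows, shifted slightly) and is not part of the original example.

```matlab
% Sketch: query an ExhaustiveSearcher built from the iris measurements.
load fisheriris
X = meas;
Mdl = createns(X,'NSMethod','exhaustive');

% Hypothetical query batch: the first three observations, shifted by 0.05.
Q = X(1:3,:) + 0.05;

% idx holds row indices into Mdl.X; D holds the matching distances.
% Each row corresponds to one query point, nearest neighbor first.
[idx,D] = knnsearch(Mdl,Q,'K',2);
```

Each row of idx lists the two nearest training observations for the corresponding row of Q, in order of ascending distance.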

Grow Default Kd-Tree

Grow a four-dimensional Kd-tree that uses the Euclidean distance.

Load Fisher's iris data set.

load fisheriris
X = meas;
[n,k] = size(X)

n = 150

k = 4

X has 150 observations and 4 predictors.

Grow a four-dimensional Kd-tree using the entire data set as training data.

Mdl1 = KDTreeSearcher(X)

Mdl1 = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl1 is a KDTreeSearcher model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can grow a Kd-tree by using createns.

Mdl2 = createns(X)

Mdl2 = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl2 is also a KDTreeSearcher model object, and it is equivalent to Mdl1. Because X has four columns and the default distance metric is Euclidean, createns creates a KDTreeSearcher model by default.

To find the nearest neighbors in X to a batch of query data, pass the KDTreeSearcher model object and the query data to knnsearch or rangesearch.
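A radius search on the Kd-tree could look like the sketch below. The query point Q (the column means of X) and the radius r are assumptions chosen only for illustration.

```matlab
% Sketch: radius search with a KDTreeSearcher.
load fisheriris
X = meas;
Mdl = createns(X);          % KDTreeSearcher for this data, as described above

Q = mean(X,1);              % hypothetical query point: the centroid of X
r = 1;                      % hypothetical search radius

% rangesearch returns cell arrays with one cell per query row.
[idx,D] = rangesearch(Mdl,Q,r);
numWithin = numel(idx{1});  % number of training points within distance r of Q
```

Unlike knnsearch, rangesearch can return a different number of neighbors for each query point, which is why its outputs are cell arrays.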

Grow Kd-Tree Using Minkowski Distance Metric

Grow a Kd-tree that uses the Minkowski distance with an exponent of five.

Load Fisher's iris data set. Create a variable for the petal dimensions.

load fisheriris
X = meas(:,3:4);

Grow a Kd-tree. Specify the Minkowski distance with an exponent of five.

Mdl = createns(X,'Distance','minkowski','P',5)

Mdl = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 5
                X: [150x2 double]

Because X has two columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default.

Search for Nearest Neighbors of Query Data Using Mahalanobis Distance

Create an exhaustive searcher object by using the createns function. Pass the object and query data to the knnsearch function to find the k nearest neighbors of each query point.

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng('default');             % For reproducibility
n = size(meas,1);           % Sample size
qIdx = randsample(n,5);     % Indices of query data
X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);

Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.

Mdl = createns(X,'Distance','mahalanobis')

Mdl = 
  ExhaustiveSearcher with properties:

         Distance: 'mahalanobis'
    DistParameter: [4x4 double]
                X: [145x4 double]

Because the distance metric is Mahalanobis, createns creates an ExhaustiveSearcher model object by default.

The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use Mdl.DistParameter.

Mdl.DistParameter

ans = 4×4

    0.6547   -0.0368    1.2320    0.5026
   -0.0368    0.1914   -0.3227   -0.1193
    1.2320   -0.3227    3.0671    1.2842
    0.5026   -0.1193    1.2842    0.5800

Find the indices of the training data (Mdl.X) that are the two nearest neighbors of each point in the query data (Y).

IdxNN = knnsearch(Mdl,Y,'K',2)

IdxNN = 5×2

     5     6
    98    95
   104   128
   135    65
   102   115

Each row of IdxNN corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of Y(3,:) is X(128,:).
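To see how the stored covariance matrix enters the computation, the sketch below requests the distances as a second output and recomputes one of them by hand. The variable names delta and dManual are introduced here for illustration, and the manual formula assumes the standard Mahalanobis definition sqrt((y - x) * inv(C) * (y - x)').

```matlab
% Sketch: recompute one Mahalanobis distance returned by knnsearch.
load fisheriris
rng('default');                        % same setup as the example above
n = size(meas,1);
qIdx = randsample(n,5);
X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);
Mdl = createns(X,'Distance','mahalanobis');

% Second output D holds the distances that match the indices in IdxNN.
[IdxNN,D] = knnsearch(Mdl,Y,'K',2);

% Manual distance between Y(3,:) and its second nearest neighbor.
% delta / C is equivalent to delta * inv(C) without forming the inverse.
delta = Y(3,:) - Mdl.X(IdxNN(3,2),:);
dManual = sqrt(delta / Mdl.DistParameter * delta');
```

Under that assumption, dManual should agree with D(3,2) up to floating-point round-off.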

Input Arguments

`X` — Training data
numeric matrix

Training data, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: NS = createns(X,'Distance','mahalanobis') creates an ExhaustiveSearcher model object that uses the Mahalanobis distance metric when searching for nearest neighbors.

For Exhaustive and Kd-Tree Nearest Neighbor Searchers

`'NSMethod'` — Nearest neighbor search method
`'kdtree'` | `'exhaustive'`

Nearest neighbor search method used to define the type of object created, specified as the comma-separated pair consisting of 'NSMethod' and 'kdtree' or 'exhaustive'.

- 'kdtree' — createns creates a KDTreeSearcher model object using the Kd-tree algorithm.
- 'exhaustive' — createns creates an ExhaustiveSearcher model object using the exhaustive search algorithm.

The default value is 'kdtree' when these three conditions are true:

- The number of columns of X (K) is less than or equal to 10 (that is, K ≤ 10).
- X is not sparse.
- Distance is 'euclidean', 'cityblock', 'chebychev', or 'minkowski'.

Otherwise, the default value is 'exhaustive'.
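The default selection rule can be checked with a short sketch. The random matrices Xlow and Xhigh are hypothetical data used only to vary the number of columns.

```matlab
% Sketch: which searcher type createns picks when NSMethod is not specified.
rng(0);                                 % for reproducibility
Xlow  = rand(100,5);                    % K = 5, full (not sparse) matrix
Xhigh = rand(100,12);                   % K = 12, more than 10 columns

clsLow    = class(createns(Xlow));                       % all three conditions hold
clsHigh   = class(createns(Xhigh));                      % K > 10
clsCosine = class(createns(Xlow,'Distance','cosine'));   % metric not supported by Kd-tree
```

According to the conditions listed above, clsLow is 'KDTreeSearcher', and both clsHigh and clsCosine are 'ExhaustiveSearcher'.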

Example: 'NSMethod','exhaustive'

`'Distance'` — Distance metric
`'euclidean'` (default) | character vector or string scalar of distance metric name | custom distance function

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as the comma-separated pair consisting of 'Distance' and either a character vector or string scalar containing a distance metric name, or a function handle.

For both types of nearest neighbor searchers, createns supports these distance metrics.

Value	Description
`'chebychev'`	Chebychev distance (maximum coordinate difference).
`'cityblock'`	City block distance.
`'euclidean'`	Euclidean distance.
`'minkowski'`	Minkowski distance. The default exponent is 2. To specify a different exponent, use the `'P'` name-value pair argument.

If createns uses the exhaustive search algorithm ('NSMethod' is 'exhaustive'), then createns also supports these distance metrics.

Value	Description
`'correlation'`	One minus the sample linear correlation between observations (treated as sequences of values)
`'cosine'`	One minus the cosine of the included angle between observations (treated as row vectors)
`'hamming'`	Hamming distance, which is the percentage of coordinates that differ
`'jaccard'`	One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ
`'mahalanobis'`	Mahalanobis distance
`'seuclidean'`	Standardized Euclidean distance
`'spearman'`	One minus the sample Spearman's rank correlation between observations (treated as sequences of values)

If createns uses the exhaustive search algorithm ('NSMethod' is 'exhaustive'), then you can also specify a function handle for a custom distance metric by using @ (for example, @distfun). A custom distance function must:

- Have the form function D2 = distfun(ZI,ZJ).
- Take as arguments:
  - A 1-by-K vector ZI containing a single row from X or from the query points Y, where K is the number of columns in X.
  - An m-by-K matrix ZJ containing multiple rows of X or Y, where m is a positive integer.
- Return an m-by-1 vector of distances D2, where D2(j) is the distance between the observations ZI and ZJ(j,:).
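A custom distance that meets these requirements might look like the following sketch. The weight vector w and the handle name wdist are hypothetical, and the expression ZJ - ZI relies on implicit expansion, which assumes MATLAB R2016b or later.

```matlab
% Sketch: weighted Euclidean distance as a custom distance function.
load fisheriris
X = meas;

w = [1 1 2 2];   % hypothetical per-predictor weights (1-by-K)

% ZI is 1-by-K and ZJ is m-by-K; summing along dimension 2 gives m-by-1.
wdist = @(ZI,ZJ) sqrt(sum(w .* (ZJ - ZI).^2, 2));

% Custom distance functions require the exhaustive search method.
Mdl = createns(X,'NSMethod','exhaustive','Distance',wdist);

% Query with the first three training rows; each row finds itself at distance 0.
[idx,D] = knnsearch(Mdl,X(1:3,:),'K',2);
```

An anonymous function keeps the sketch short; a function file with the signature function D2 = distfun(ZI,ZJ) works the same way when passed as @distfun.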

For more details, see Distance Metrics.

Example: 'Distance','minkowski'

`'P'` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',3

Data Types: single | double
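One way to sanity-check the exponent is to use the fact that the Minkowski distance with exponent 1 coincides with the city block distance. The query point Q below is a hypothetical value chosen for illustration.

```matlab
% Sketch: Minkowski distance with P = 1 matches the city block distance.
load fisheriris
X = meas(:,3:4);                         % petal dimensions

MdlMink = createns(X,'Distance','minkowski','P',1);
MdlCity = createns(X,'Distance','cityblock');

Q = [4 1.5];                             % hypothetical query point
[~,dMink] = knnsearch(MdlMink,Q,'K',3);
[~,dCity] = knnsearch(MdlCity,Q,'K',3);

% Largest difference between the two sets of sorted neighbor distances.
maxDiff = max(abs(dMink - dCity));
```

The two distance vectors should agree up to round-off, so maxDiff is expected to be at or near zero.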

For Exhaustive Nearest Neighbor Searchers

`'Cov'` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a K-by-K positive definite matrix, where K is the number of columns in X. This argument is valid only if 'Distance' is 'mahalanobis'.

Example: 'Cov',eye(3)

Data Types: single | double

`'Scale'` — Scale parameter value for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | nonnegative numeric vector

Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative numeric vector of length K, where K is the number of columns in X. The software scales each difference between the training and query data using the corresponding element of Scale. This argument is valid only if 'Distance' is 'seuclidean'.

Example: 'Scale',quantile(X,0.75) - quantile(X,0.25)

Data Types: single | double
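As a sketch of the 'Scale' argument, the following uses the interquartile range of each predictor as the scale, mirroring the example value above. The variable name s is introduced here for illustration.

```matlab
% Sketch: standardized Euclidean distance with a user-supplied scale.
load fisheriris
X = meas;

% Interquartile range of each column, used in place of the default std(X,'omitnan').
s = quantile(X,0.75) - quantile(X,0.25);

% 'seuclidean' is available only with the exhaustive search method.
Mdl = createns(X,'Distance','seuclidean','Scale',s);

% The scale values are stored in the DistParameter property.
storedScale = Mdl.DistParameter;
```

A robust scale such as the interquartile range can be a reasonable choice when a predictor contains outliers that would inflate its standard deviation.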

For Nearest Neighbor Searchers Using Kd-Tree

`'BucketSize'` — Maximum number of data points in each leaf node
`50` (default) | positive integer

Maximum number of data points in each leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer.

This argument is valid only when you create a KDTreeSearcher model object.

Example: 'BucketSize',10

Data Types: single | double
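The sketch below grows two Kd-trees that differ only in bucket size and compares their search results. The value 10 and the query rows are illustrative assumptions.

```matlab
% Sketch: BucketSize changes the tree layout, not the search results.
load fisheriris
X = meas;

MdlSmall   = createns(X,'NSMethod','kdtree','BucketSize',10);
MdlDefault = createns(X,'NSMethod','kdtree');    % default BucketSize is 50

Q = X(1:5,:);                                    % hypothetical query rows
[~,dSmall]   = knnsearch(MdlSmall,Q,'K',3);
[~,dDefault] = knnsearch(MdlDefault,Q,'K',3);

% Largest difference between the neighbor distances from the two trees.
maxDiff = max(abs(dSmall(:) - dDefault(:)));
```

Because the Kd-tree search is exact, both searchers should report the same neighbor distances. The bucket size mainly affects how deep the tree is, and therefore the time needed to build and search it.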

Output Arguments

`NS` — Nearest neighbor searcher
`ExhaustiveSearcher` model object | `KDTreeSearcher` model object

Nearest neighbor searcher, returned as an ExhaustiveSearcher model object or a KDTreeSearcher model object.

After you create a nearest neighbor searcher model object, you can find the points in the training data that are closest to the query data by performing a nearest neighbor search using knnsearch or a radius search using rangesearch.
