Create nearest neighbor searcher object
creates either an NS
= createns(X
)ExhaustiveSearcher
or KDTreeSearcher
model object using the
n-by-K numeric matrix of the training data
X
.
specifies additional options using one or more name-value pair arguments. For
example, you can specify NS
= createns(X
,Name,Value
)NSMethod
to determine which type of
object to create.
Load Fisher's iris data set.
load fisheriris
X = meas;
[n,k] = size(X)
n = 150
k = 4
X
has 150 observations and 4 predictors.
Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.
Mdl1 = ExhaustiveSearcher(X)
Mdl1 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl1
is an ExhaustiveSearcher
model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.
Alternatively, you can prepare an exhaustive nearest neighbor searcher by using createns
and specifying 'exhaustive'
as the search method.
Mdl2 = createns(X,'NSMethod','exhaustive')
Mdl2 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl2
is also an ExhaustiveSearcher
model object, and it is equivalent to Mdl1
.
To search X
for the nearest neighbors to a batch of query data, pass the ExhaustiveSearcher
model object and the query data to knnsearch
or rangesearch
.
Grow a four-dimensional Kd-tree that uses the Euclidean distance.
Load Fisher's iris data set.
load fisheriris
X = meas;
[n,k] = size(X)
n = 150
k = 4
X
has 150 observations and 4 predictors.
Grow a four-dimensional Kd-tree using the entire data set as training data.
Mdl1 = KDTreeSearcher(X)
Mdl1 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl1
is a KDTreeSearcher
model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.
Alternatively, you can grow a Kd-tree by using createns
.
Mdl2 = createns(X)
Mdl2 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl2
is also a KDTreeSearcher
model object, and it is equivalent to Mdl1
. Because X
has four columns and the default distance metric is Euclidean, createns
creates a KDTreeSearcher
model by default.
To find the nearest neighbors in X
to a batch of query data, pass the KDTreeSearcher
model object and the query data to knnsearch
or rangesearch
.
Grow a Kd-tree that uses the Minkowski distance with an exponent of five.
Load Fisher's iris data set. Create a variable for the petal dimensions.
load fisheriris
X = meas(:,3:4);
Grow a Kd-tree. Specify the Minkowski distance with an exponent of five.
Mdl = createns(X,'Distance','minkowski','P',5)
Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 5 X: [150x2 double]
Because X
has two columns and the distance metric is Minkowski, createns
creates a KDTreeSearcher
model object by default.
Create an exhaustive searcher object by using the createns
function. Pass the object and query data to the knnsearch
function to find k-nearest neighbors.
Load Fisher's iris data set.
load fisheriris
Remove five irises randomly from the predictor data to use as a query set.
rng('default'); % For reproducibility n = size(meas,1); % Sample size qIdx = randsample(n,5); % Indices of query data X = meas(~ismember(1:n,qIdx),:); Y = meas(qIdx,:);
Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.
Mdl = createns(X,'Distance','mahalanobis')
Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [4x4 double] X: [145x4 double]
Because the distance metric is Mahalanobis, createns
creates an ExhaustiveSearcher
model object by default.
The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use Mdl.DistParameter
.
Mdl.DistParameter
ans = 4×4
0.6547 -0.0368 1.2320 0.5026
-0.0368 0.1914 -0.3227 -0.1193
1.2320 -0.3227 3.0671 1.2842
0.5026 -0.1193 1.2842 0.5800
Find the indices of the training data (Mdl.X
) that are the two nearest neighbors of each point in the query data (Y
).
IdxNN = knnsearch(Mdl,Y,'K',2)
IdxNN = 5×2
5 6
98 95
104 128
135 65
102 115
Each row of IdxNN
corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of Y(3,:)
is X(128,:)
.
X
— Training dataTraining data, specified as a numeric matrix. X
has
n rows, each corresponding to an observation (that
is, an instance or example), and K columns, each
corresponding to a predictor (that is, a feature).
Data Types: single
| double
Specify optional
comma-separated pairs of Name,Value
arguments. Name
is
the argument name and Value
is the corresponding value.
Name
must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
NS = createns(X,'Distance','mahalanobis')
creates an
ExhaustiveSearcher
model object that uses the Mahalanobis
distance metric when searching for nearest neighbors.'NSMethod'
— Nearest neighbor search method'kdtree'
| 'exhaustive'
Nearest neighbor search method used to define the type of object
created, specified as the comma-separated pair consisting of
'NSMethod'
and 'kdtree'
or
'exhaustive'
.
'kdtree'
—
createns
creates a KDTreeSearcher
model object using the Kd-tree
algorithm.
'exhaustive'
—
createns
creates an ExhaustiveSearcher
model object using the
exhaustive search algorithm.
The default value is 'kdtree'
when these three
conditions are true:
Otherwise, the default value is
'exhaustive'
.
Example: 'NSMethod','exhaustive'
'Distance'
— Distance metric'euclidean'
(default) | character vector or string scalar of distance metric name | custom distance functionDistance metric used when you call knnsearch
or rangesearch
to find nearest
neighbors for future query points, specified as the comma-separated pair
consisting of 'Distance'
and a character vector or
string scalar of distance metric name or function handle.
For both types of nearest neighbor searchers,
createns
supports these distance
metrics.
Value | Description |
---|---|
'chebychev' | Chebychev distance (maximum coordinate difference). |
'cityblock' | City block distance. |
'euclidean' | Euclidean distance. |
'minkowski' | Minkowski distance. The default exponent is 2. To specify a different exponent, use the
'P' name-value pair argument. |
If createns
uses the exhaustive search
algorithm ('NSMethod'
is
'exhaustive'
), then
createns
also supports these distance
metrics.
Value | Description |
---|---|
'correlation' | One minus the sample linear correlation between observations (treated as sequences of values) |
'cosine' | One minus the cosine of the included angle between observations (treated as row vectors) |
'hamming' | Hamming distance, which is the percentage of coordinates that differ |
'jaccard' | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |
'mahalanobis' | Mahalanobis distance |
'seuclidean' | Standardized Euclidean distance |
'spearman' | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |
If createns
uses the exhaustive search algorithm
('NSMethod'
is
'exhaustive'
), then you can also specify a function
handle for a custom distance metric by using @
(for
example, @distfun
). A custom distance function must:
Have the form function D2 =
distfun(ZI,ZJ)
.
Take as arguments:
A 1-by-K vector
ZI
containing a single row from
X
or from the query points
Y
, where K
is the number of columns in
X
.
An
m-by-K
matrix ZJ
containing multiple
rows of X
or
Y
, where m
is a positive integer.
Return an m-by-1 vector of distances
D2
, where
D2(
is the distance between the observations
j
)ZI
and
ZJ(
.j
,:)
For more details, see Distance Metrics.
Example: 'Distance','minkowski'
'P'
— Exponent for Minkowski distance metric2
(default) | positive scalarExponent for the Minkowski distance metric, specified as the comma-separated pair consisting
of 'P'
and a positive scalar. This argument is valid only if
'Distance'
is 'minkowski'
.
Example: 'P',3
Data Types: single
| double
'Cov'
— Covariance matrix for Mahalanobis distance metriccov(X,'omitrows')
(default) | positive definite matrixCovariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair
consisting of 'Cov'
and a
K-by-K positive definite matrix, where
K is the number of columns in X
. This
argument is valid only if 'Distance'
is
'mahalanobis'
.
Example: 'Cov',eye(3)
Data Types: single
| double
'Scale'
— Scale parameter value for standardized Euclidean distance metricstd(X,'omitnan')
(default) | nonnegative numeric vectorScale parameter value for the standardized Euclidean distance metric, specified as the
comma-separated pair consisting of 'Scale'
and a nonnegative numeric
vector of length K, where K is the number of
columns in X
. The software scales each difference between the
training and query data using the corresponding element of Scale
.
This argument is valid only if 'Distance'
is
'seuclidean'
.
Example: 'Scale',quantile(X,0.75) - quantile(X,0.25)
Data Types: single
| double
'BucketSize'
— Maximum number of data points in each leaf node50
(default) | positive integerMaximum number of data points in each leaf node of the
Kd-tree, specified as the comma-separated pair
consisting of 'BucketSize'
and a positive
integer.
This argument is valid only when you create a
KDTreeSearcher
model object.
Example: 'BucketSize',10
Data Types: single
| double
NS
— Nearest neighbor searcherExhaustiveSearcher
model object | KDTreeSearcher
model objectNearest neighbor searcher, returned as an ExhaustiveSearcher
model object
or a KDTreeSearcher
model
object.
Once you create a nearest neighbor searcher model object, you can find the
neighboring points in the training data to the query data by performing a
nearest neighbor search using knnsearch
or a radius search
using rangesearch
.
ExhaustiveSearcher
| KDTreeSearcher
| knnsearch
| rangesearch
You have a modified version of this example. Do you want to open this example with your edits?