Rank key features by class separability criteria
[IDX, Z] = rankfeatures(X, Group)
[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)
[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)
[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)
[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)
[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)
[IDX, Z] = rankfeatures(X, Group) ranks the features in X using an independent evaluation criterion for binary classification. X is a matrix where every column is an observed vector and the number of rows corresponds to the original number of features. Group contains the class labels.

IDX is the list of indices to the rows in X with the most significant features. Z is the absolute value of the criterion used (see below).

Group can be a numeric vector, a cell array of character vectors, or a string vector. numel(Group) must equal the number of columns in X, and Group must have only two unique values. If it contains any NaN values, the function ignores the corresponding observation vector in X.
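The basic call can be sketched as follows. This is a minimal example on synthetic data, assuming Bioinformatics Toolbox (which provides rankfeatures) is installed; the matrix sizes, the seed, and the shifted feature are illustrative assumptions, not part of the function description.

```matlab
rng(0);                                   % fixed seed for repeatable synthetic data (assumption)
numFeatures = 50;
numObs = 40;
X = randn(numFeatures, numObs);           % rows are features, columns are observations
Group = [zeros(1, 20), ones(1, 20)];      % exactly two unique class labels

% Shift one feature in the second class so it separates the groups (illustrative).
X(7, Group == 1) = X(7, Group == 1) + 4;

% Rank all features with the default criterion.
[IDX, Z] = rankfeatures(X, Group);

% IDX lists row indices of X, most significant first; feature 7 should lead here.
topFeature = IDX(1);
```

Because no weighting is requested, IDX is expected to contain one index per feature.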
[IDX, Z] = rankfeatures(X, Group, ...'PropertyName', PropertyValue, ...) calls rankfeatures with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
[IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...) sets the criterion used to assess the significance of every feature for separating two labeled groups. Choices are:

'ttest' (default): Absolute value of the two-sample t-test statistic with pooled variance estimate.
'entropy': Relative entropy, also known as Kullback-Leibler distance or divergence.
'bhattacharyya': Minimum attainable classification error or Chernoff bound.
'roc': Area between the empirical receiver operating characteristic (ROC) curve and the random classifier slope.
'wilcoxon': Absolute value of the standardized u-statistic of a two-sample unpaired Wilcoxon test, also known as Mann-Whitney.
Note
'ttest', 'entropy', and 'bhattacharyya' assume normally distributed classes, while 'roc' and 'wilcoxon' are nonparametric tests. All tests are feature independent.
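To make the 'ttest' criterion concrete, the sketch below computes an absolute two-sample t-statistic with a pooled variance estimate, one value per row, using only base MATLAB functions. It is a hand-written illustration of the statistic as described, not the internal code of rankfeatures, and the data and variable names are assumptions.

```matlab
rng(1);                                   % fixed seed for repeatable synthetic data (assumption)
X = randn(30, 24);                        % 30 features (rows), 24 observations (columns)
Group = [zeros(1, 12), ones(1, 12)];      % two class labels
X(5, Group == 1) = X(5, Group == 1) + 4;  % make feature 5 separable (illustrative)

A = X(:, Group == 0);                     % observations of the first class
B = X(:, Group == 1);                     % observations of the second class
nA = size(A, 2);
nB = size(B, 2);

% Pooled variance estimate, one value per feature (row).
sp2 = ((nA - 1) .* var(A, 0, 2) + (nB - 1) .* var(B, 0, 2)) ./ (nA + nB - 2);

% Absolute two-sample t-statistic per feature.
tAbs = abs(mean(A, 2) - mean(B, 2)) ./ sqrt(sp2 .* (1 / nA + 1 / nB));

% Sort so that the most separable features come first.
[~, order] = sort(tAbs, 'descend');
```

Each row is scored on its own, which is what "feature independent" means here: no score depends on any other feature.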
[IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...) uses correlation information to outweigh the Z value of potential features using Z * (1-ALPHA*(RHO)), where RHO is the average of the absolute values of the cross-correlation coefficient between the candidate feature and all previously selected features. ALPHA sets the weighting factor. It is a scalar value between 0 and 1. When ALPHA is 0 (default), potential features are not weighted. A large value of RHO (close to 1) outweighs the significance statistic; this means that features that are highly correlated with the features already picked are less likely to be included in the output list.
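The weighting Z * (1-ALPHA*(RHO)) can be worked through numerically. The sketch below computes RHO with the base MATLAB function corrcoef for two hypothetical candidates, one nearly identical to an already selected feature and one unrelated, and applies the formula by hand. All values and variable names are illustrative assumptions, and the loop is not the internal code of rankfeatures.

```matlab
rng(2);                                   % fixed seed for repeatable synthetic data (assumption)
nObs = 40;
selected = randn(2, nObs);                % two previously selected features (rows)

% Candidate 1 is almost a copy of a selected feature; candidate 2 is unrelated.
candidates = [selected(1, :) + 0.1 .* randn(1, nObs); randn(1, nObs)];

ALPHA = 0.7;                              % weighting factor between 0 and 1
Z = 5;                                    % same raw criterion value for both candidates

weightedZ = zeros(2, 1);
for c = 1:2
    r = zeros(size(selected, 1), 1);
    for k = 1:size(selected, 1)
        R = corrcoef(candidates(c, :), selected(k, :));
        r(k) = abs(R(1, 2));              % absolute correlation with selected feature k
    end
    RHO = mean(r);                        % average over all previously selected features
    weightedZ(c) = Z * (1 - ALPHA * RHO);
end
```

Although both candidates start with the same Z, the redundant one ends with the smaller weighted value and is less likely to be picked next.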
[IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...) uses regional information to outweigh the Z value of potential features using Z * (1-exp(-(DIST/BETA).^2)), where DIST is the distance (in rows) between the candidate feature and previously selected features. BETA sets the weighting factor. It is greater than or equal to 0. When BETA is 0 (default), potential features are not weighted. A small DIST (close to 0) outweighs the significance statistics of only close features. This means that features that are close to already picked features are less likely to be included in the output list. This option is useful for extracting features from time series with temporal correlation.
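As a small worked example of Z * (1-exp(-(DIST/BETA).^2)), the sketch below evaluates the weight for several distances with a fixed BETA. The numbers are illustrative assumptions chosen only to show the shape of the weighting.

```matlab
Z = 5;                                    % raw criterion value (illustrative)
BETA = 10;                                % weighting factor, in rows
DIST = [0 1 5 10 20 50];                  % distance in rows to a previously selected feature

% Features at distance 0 are fully suppressed; distant features keep almost all of Z.
weightedZ = Z .* (1 - exp(-(DIST ./ BETA).^2));
```

The weighted value grows with distance and approaches Z once DIST is several times BETA, so BETA acts as the width of the neighborhood that is penalized.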
BETA can also be a function of the feature location, specified using @ or an anonymous function. In both cases, rankfeatures passes the row position of the feature to BETA() and expects back a value greater than or equal to 0.
Note
You can use 'CCWeighting' and 'NWeighting' together.
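A call that combines both weightings, with BETA given as an anonymous function of the row position, might look like the sketch below. It assumes Bioinformatics Toolbox is installed; the data, the ALPHA value, the hypothetical betaFcn, and the requested number of indices are illustrative assumptions.

```matlab
rng(3);                                   % fixed seed for repeatable synthetic data (assumption)
X = randn(200, 30);                       % 200 features (rows), 30 observations (columns)
Group = [zeros(1, 15), ones(1, 15)];      % two class labels

% Hypothetical location-dependent BETA: the penalized neighborhood widens
% for features at higher row positions. It always returns a value >= 0.
betaFcn = @(pos) 2 + 0.05 .* pos;

% Use correlation weighting and neighborhood weighting together.
[IDX, Z] = rankfeatures(X, Group, ...
    'CCWeighting', 0.7, ...
    'NWeighting', betaFcn, ...
    'NumberOfIndices', 10);
```

Setting 'NumberOfIndices' explicitly keeps the length of IDX predictable when weighting is active.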
[IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...) sets the number of output indices in IDX. Default is the same as the number of features when ALPHA and BETA are 0, or 20 otherwise.
[IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...) applies independent normalization across the observations for every feature. Cross-normalization ensures comparability among different features, although it is not always necessary because the selected criterion might already account for this. Choices are:

'none' (default): Intensities are not cross-normalized.
'meanvar': x_new = (x - mean(x))/std(x)
'softmax': x_new = (1+exp((mean(x)-x)/std(x)))^-1
'minmax': x_new = (x - min(x))/(max(x)-min(x))
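The three normalization formulas can be reproduced by hand, one feature (row) at a time, as in the sketch below. It uses only base MATLAB functions and relies on implicit expansion, which requires R2016b or later; the data and variable names are illustrative assumptions, and this is not the internal code of rankfeatures.

```matlab
rng(4);                                   % fixed seed for repeatable synthetic data (assumption)
X = 10 + 3 .* randn(4, 25);               % 4 features (rows), 25 observations (columns)

mu = mean(X, 2);                          % per-feature mean across observations
sigma = std(X, 0, 2);                     % per-feature standard deviation
lo = min(X, [], 2);                       % per-feature minimum
hi = max(X, [], 2);                       % per-feature maximum

xMeanvar = (X - mu) ./ sigma;                     % 'meanvar': zero mean, unit std per row
xSoftmax = 1 ./ (1 + exp((mu - X) ./ sigma));     % 'softmax': squashed into (0, 1)
xMinmax  = (X - lo) ./ (hi - lo);                 % 'minmax': rescaled to [0, 1]
```

Each row is normalized with its own statistics, so features measured on different scales become comparable.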
[1] Theodoridis, S., and Koutroumbas, K. (1999). Pattern Recognition, Academic Press, 341-342.
[2] Liu, H., Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers.
[3] Ross, D.T. et al. (2000). Systematic Variation in Gene Expression Patterns in Human Cancer Cell Lines. Nature Genetics. 24 (3), 227-235.
classify | classperf | crossvalind | randfeatures | sequentialfs