Dimensionality Reduction and Feature Extraction

PCA, factor analysis, feature selection, feature extraction, and more

Feature transformation techniques reduce the dimensionality of the data by transforming the data into new features. Feature selection techniques are preferable when transformation of variables is not possible, e.g., when there are categorical variables in the data. For a feature selection technique that is specifically suitable for least-squares fitting, see Stepwise Regression.

Functions


Feature Selection

`fscchi2`	Univariate feature ranking for classification using chi-square tests
`fscmrmr`	Rank features for classification using minimum redundancy maximum relevance (MRMR) algorithm
`fscnca`	Feature selection using neighborhood component analysis for classification
`fsrftest`	Univariate feature ranking for regression using F-tests
`fsrnca`	Feature selection using neighborhood component analysis for regression
`fsulaplacian`	Rank features for unsupervised learning using Laplacian scores
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`oobPermutedPredictorImportance`	Predictor importance estimates by permutation of out-of-bag predictor observations for random forest of classification trees
`oobPermutedPredictorImportance`	Predictor importance estimates by permutation of out-of-bag predictor observations for random forest of regression trees
`predictorImportance`	Estimates of predictor importance for classification tree
`predictorImportance`	Estimates of predictor importance for classification ensemble of decision trees
`predictorImportance`	Estimates of predictor importance for regression tree
`predictorImportance`	Estimates of predictor importance for regression ensemble
`relieff`	Rank importance of predictors using ReliefF or RReliefF algorithm
`sequentialfs`	Sequential feature selection using custom criterion
`stepwiselm`	Perform stepwise regression
`stepwiseglm`	Create generalized linear regression model by stepwise regression

Feature Extraction

`rica`	Feature extraction by using reconstruction ICA
`sparsefilt`	Feature extraction by using sparse filtering
`transform`	Transform predictors into extracted features

t-SNE Multidimensional Visualization

`tsne`	t-Distributed Stochastic Neighbor Embedding

PCA and Canonical Correlation

`barttest`	Bartlett’s test
`canoncorr`	Canonical correlation
`pca`	Principal component analysis of raw data
`pcacov`	Principal component analysis on covariance matrix
`pcares`	Residuals from principal component analysis
`ppca`	Probabilistic principal component analysis

Factor Analysis

`factoran`	Factor analysis
`rotatefactors`	Rotate factor loadings

Nonnegative Matrix Factorization

`nnmf`	Nonnegative matrix factorization

Multidimensional Scaling

`cmdscale`	Classical multidimensional scaling
`mahal`	Mahalanobis distance
`mdscale`	Nonclassical multidimensional scaling
`pdist`	Pairwise distance between pairs of observations
`squareform`	Format distance matrix

Procrustes Analysis

`procrustes`	Procrustes analysis

Objects


Feature Selection

`FeatureSelectionNCAClassification`	Feature selection for classification using neighborhood component analysis (NCA)
`FeatureSelectionNCARegression`	Feature selection for regression using neighborhood component analysis (NCA)

Feature Extraction

`ReconstructionICA`	Feature extraction by reconstruction ICA
`SparseFiltering`	Feature extraction by sparse filtering

Topics

Feature Selection

Introduction to Feature Selection

Learn about feature selection algorithms and explore the functions available for feature selection.

Sequential Feature Selection

This topic introduces sequential feature selection and provides an example that selects features sequentially using a custom criterion and the `sequentialfs` function.

Neighborhood Component Analysis (NCA) Feature Selection

Neighborhood component analysis (NCA) is a non-parametric method for selecting features with the goal of maximizing prediction accuracy of regression and classification algorithms.

Regularize Discriminant Analysis Classifier

Make a more robust and simpler model by removing predictors without compromising the predictive power of the model.

Select Predictors for Random Forests

Select split predictors for random forests using the interaction test algorithm.

Feature Extraction

Feature Extraction

Feature extraction is a set of methods to extract high-level features from data.

Feature Extraction Workflow

This example shows a complete workflow for feature extraction from image data.

Extract Mixed Signals

This example shows how to use `rica` to disentangle mixed audio signals.

t-SNE Multidimensional Visualization

t-SNE

t-SNE is a method for visualizing high-dimensional data by nonlinear reduction to two or three dimensions, while preserving some features of the original data.

Visualize High-Dimensional Data Using t-SNE

This example shows how t-SNE creates a useful low-dimensional embedding of high-dimensional data.

tsne Settings

This example shows the effects of various `tsne` settings.

t-SNE Output Function

Output function description and example for t-SNE.

PCA and Canonical Correlation

Principal Component Analysis (PCA)

Principal Component Analysis reduces the dimensionality of data by replacing several correlated variables with a new set of variables that are linear combinations of the original variables.

Analyze Quality of Life in U.S. Cities Using PCA

Perform a weighted principal components analysis and interpret the results.

Factor Analysis

Factor Analysis

Factor analysis is a way to fit a model to multivariate data to estimate interdependence of measured variables on a smaller number of unobserved (latent) factors.

Analyze Stock Prices Using Factor Analysis

Use factor analysis to investigate whether companies within the same sector experience similar week-to-week changes in stock prices.

Perform Factor Analysis on Exam Grades

This example shows how to perform factor analysis using Statistics and Machine Learning Toolbox™.

Nonnegative Matrix Factorization

Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a dimension-reduction technique based on a low-rank approximation of the feature space.

Perform Nonnegative Matrix Factorization

Perform nonnegative matrix factorization using the multiplicative and alternating least-squares algorithms.

Multidimensional Scaling

Multidimensional Scaling

Multidimensional scaling allows you to visualize how near points are to each other for many kinds of distance or dissimilarity metrics and can produce a representation of data in a small number of dimensions.

Classical Multidimensional Scaling

Use `cmdscale` to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

Classical Multidimensional Scaling Applied to Nonspatial Distances

This example shows how to perform classical multidimensional scaling using the `cmdscale` function in Statistics and Machine Learning Toolbox™.

Nonclassical Multidimensional Scaling

This example shows how to visualize dissimilarity data using nonclassical forms of multidimensional scaling (MDS).

Nonclassical and Nonmetric Multidimensional Scaling

Perform nonclassical multidimensional scaling using `mdscale`.

Procrustes Analysis

Procrustes Analysis

Procrustes analysis minimizes the differences in location between compared landmark data using the best shape-preserving Euclidean transformations.

Compare Handwritten Shapes Using Procrustes Analysis

Use Procrustes analysis to compare two handwritten numerals.

Featured Examples

Selecting Features for Classifying High-dimensional Data

This example shows how to select features for classifying high-dimensional data. More specifically, it shows how to perform sequential feature selection, which is one of the most popular feature selection algorithms. It also shows how to use holdout and cross-validation to evaluate the performance of the selected features.
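
The sequential selection and holdout evaluation described here can be sketched roughly as follows. This is a minimal illustration, not the featured example itself: it assumes the `fisheriris` sample data set that ships with the toolbox, a linear discriminant classifier via `classify`, and 5-fold cross-validation. The variable names (`critfun`, `holdoutCVP`, `cv5`, `holdoutErr`) are hypothetical.

```matlab
load fisheriris                         % provides meas (150-by-4) and species (150-by-1 cell)
rng(0);                                 % fixed seed so the partitions are repeatable

% Hold out 30% of the observations for the final evaluation.
holdoutCVP = cvpartition(species, 'HoldOut', 0.3);
Xtrain = meas(training(holdoutCVP), :);
ytrain = species(training(holdoutCVP));
Xtest  = meas(test(holdoutCVP), :);
ytest  = species(test(holdoutCVP));

% Custom criterion: number of misclassified test observations.
% sequentialfs sums this value over the folds and divides by the
% total number of test observations.
critfun = @(XT, yT, Xt, yt) sum(~strcmp(yt, classify(Xt, XT, yT, 'linear')));

% Forward sequential selection, scored by 5-fold cross-validation
% on the training portion only.
cv5 = cvpartition(ytrain, 'KFold', 5);
[inmodel, history] = sequentialfs(critfun, Xtrain, ytrain, 'cv', cv5);

% Evaluate the selected columns on the held-out data.
pred = classify(Xtest(:, inmodel), Xtrain(:, inmodel), ytrain, 'linear');
holdoutErr = mean(~strcmp(ytest, pred));
```

Here `inmodel` is a logical row vector marking the selected columns of `meas`. Keeping the holdout set out of the selection loop is what makes `holdoutErr` a fair estimate for the chosen features.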


Partial Least Squares Regression and Principal Components Regression

This example shows how to apply Partial Least Squares Regression (PLSR) and Principal Components Regression (PCR), and discusses the effectiveness of the two methods. PLSR and PCR are both methods to model a response variable when there is a large number of predictor variables, and those predictors are highly correlated or even collinear. Both methods construct new predictor variables, known as components, as linear combinations of the original predictor variables, but they construct those components in different ways. PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all. PLSR, on the other hand, does take the response variable into account, and therefore often leads to models that fit the response variable with fewer components. Whether that ultimately translates into a more parsimonious model, in terms of its practical use, depends on the context.
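
To make the contrast concrete, the following sketch fits both models with the same number of components. It is an illustration under assumptions, not the featured example: the data are synthetic (correlated predictors built from three hidden variables), the component count `k = 2` is arbitrary, and all variable names are hypothetical. PCR is assembled by hand from `pca` plus a least-squares fit on the leading scores, while PLSR uses `plsregress`.

```matlab
rng(1);                                   % repeatable synthetic data
n = 100; p = 10; k = 2;                   % observations, predictors, components
Z = randn(n, 3);
X = Z * randn(3, p) + 0.05 * randn(n, p); % highly correlated predictors
y = X(:, 1) - 2 * X(:, 2) + 0.1 * randn(n, 1);

% PLSR: components are chosen using both X and y.
[~, ~, ~, ~, betaPLS] = plsregress(X, y, k);   % (p+1)-by-1, intercept first
yfitPLS = [ones(n, 1) X] * betaPLS;

% PCR: components come from X alone, then y is regressed on the scores.
[coeff, score] = pca(X);
gammaPCR = score(:, 1:k) \ (y - mean(y));      % least squares on the first k scores
betaPCR  = coeff(:, 1:k) * gammaPCR;           % map back to the original predictors
betaPCR  = [mean(y) - mean(X, 1) * betaPCR; betaPCR];
yfitPCR  = [ones(n, 1) X] * betaPCR;

% In-sample R-squared for each fit.
TSS    = sum((y - mean(y)).^2);
rsqPLS = 1 - sum((y - yfitPLS).^2) / TSS;
rsqPCR = 1 - sum((y - yfitPCR).^2) / TSS;
```

Comparing `rsqPLS` and `rsqPCR` for a few values of `k` shows how quickly each method captures the response. In-sample fit alone does not settle which model is more useful, so cross-validated error is the more informative comparison in practice.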


Fitting an Orthogonal Regression Using Principal Components Analysis

Use Principal Components Analysis (PCA) to fit a linear regression. PCA minimizes the perpendicular distances from the data to the fitted model. This is the linear case of what is known as Orthogonal Regression or Total Least Squares, and is appropriate when there is no natural distinction between predictor and response variables, or when all variables are measured with error. This is in contrast to the usual regression assumption that predictor variables are measured exactly, and only the response variable has an error component.
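
As a rough two-dimensional illustration of this idea (the data, noise level, and variable names below are assumptions made for the sketch), the first principal component gives the direction of the fitted line, and the second gives its normal, so the perpendicular residuals are the projections of the centered data onto that normal.

```matlab
rng(0);                                   % repeatable synthetic data
t = linspace(-2, 2, 50)';
X = [t, 0.5 * t] + 0.1 * randn(50, 2);    % noisy points near y = 0.5*x, error in both variables

[coeff, score, latent] = pca(X);          % columns of coeff are the principal directions
meanX  = mean(X, 1);                      % the fitted line passes through the data mean
dirVec = coeff(:, 1);                     % direction of the fitted line
normal = coeff(:, 2);                     % unit normal to the fitted line

% Signed perpendicular distance of each point from the fitted line.
resid = (X - meanX) * normal;

% Slope of the fitted line (the sign ambiguity of coeff cancels in the ratio).
slope = dirVec(2) / dirVec(1);

% The sum of squared perpendicular distances equals the variance along
% the second component times (n - 1).
sse = sum(resid.^2);
```

An ordinary least-squares fit of the second column on the first would instead minimize vertical distances, which is the distinction the paragraph above draws.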

