audioFeatureExtractor

Streamline audio feature extraction

Description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Syntax

aFE = audioFeatureExtractor()

aFE = audioFeatureExtractor(Name,Value)

Description

aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.

aFE = audioFeatureExtractor(Name,Value) specifies nondefault properties for aFE using one or more name-value pair arguments.
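
As a minimal sketch of the Name,Value syntax, the following creates an extractor with two nondefault properties and then enables one more feature by setting its property to true. The 16 kHz sample rate and the white-noise test signal are assumptions chosen for illustration only.

```matlab
% Create an extractor with nondefault properties set at construction.
fs = 16e3;                                   % assumed sample rate in Hz
aFE = audioFeatureExtractor("SampleRate",fs,"spectralCentroid",true);

% Enable another feature afterward by setting its property to true.
aFE.spectralSpread = true;

% One second of white noise stands in for a real recording.
audioIn = randn(fs,1);
features = extract(aFE,audioIn);             % one column per enabled feature
idx = info(aFE);                             % maps feature names to columns
```

Here idx.spectralCentroid and idx.spectralSpread give the columns of features that hold each descriptor.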

Properties

Main Properties

`Window` — Analysis window
`hamming(1024,"periodic")` (default) | real vector

Analysis window, specified as a real vector.

Data Types: single | double

`OverlapLength` — Overlap length of adjacent analysis windows
`512` (default) | integer in the range [0, `numel(Window)`)

Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).

Data Types: single | double

`FFTLength` — FFT length
`[]` (default) | positive integer

FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the window length, numel(Window).

Data Types: single | double

`SampleRate` — Input sample rate (Hz)
`44100` (default) | nonnegative scalar

Input sample rate in Hz, specified as a nonnegative scalar.

Data Types: single | double

`SpectralDescriptorInput` — Input to spectral descriptors
`"linearSpectrum"` (default) | `"melSpectrum"` | `"barkSpectrum"` | `"erbSpectrum"`

Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

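Spectral descriptors affected by this property are: spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.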

For example, if you set "SpectralDescriptorInput" to "barkSpectrum", and "spectralCentroid" to true, then aFE returns the centroid of the default Bark spectrum.

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
aFE = audioFeatureExtractor("SampleRate",fs, ...
                            "SpectralDescriptorInput","barkSpectrum", ...
                            "spectralCentroid",true);
barkSpectralCentroid = extract(aFE,audioIn);

If you specify a nondefault barkSpectrum using setExtractorParams, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParams(aFE,"barkSpectrum","NumBands",40), then aFE returns the centroid of a 40-band Bark spectrum.

setExtractorParams(aFE,"barkSpectrum","NumBands",40)
bark40SpectralCentroid = extract(aFE,audioIn);

Data Types: char | string

Features to Extract

`linearSpectrum` — Extract linear spectrum
`false` (default) | `true`

Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"linearSpectrum","Name",Value)

Settable parameters for the linear spectrum extraction are:

"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

Data Types: logical
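
The following sketch shows one way these two parameters might be set together. The 16 kHz sample rate, the 0 to 4000 Hz range, and the white-noise input are assumptions for illustration, not values from this page.

```matlab
fs = 16e3;                                   % assumed sample rate in Hz
aFE = audioFeatureExtractor("SampleRate",fs,"linearSpectrum",true);

% Limit the extracted spectrum to 0-4 kHz and return magnitude, not power.
setExtractorParams(aFE,"linearSpectrum", ...
    "FrequencyRange",[0 4000], ...
    "SpectrumType","magnitude");

audioIn = randn(fs,1);                       % one second of white noise
spec = extract(aFE,audioIn);                 % numHops-by-numBins matrix
```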

`melSpectrum` — Extract mel spectrum
`false` (default) | `true`

Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"melSpectrum","Name",Value)

Settable parameters for the mel spectrum extraction are:

"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of mel bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical

`barkSpectrum` — Extract Bark spectrum
`false` (default) | `true`

Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"barkSpectrum","Name",Value)

Settable parameters for the Bark spectrum extraction are:

"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of Bark bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical

`erbSpectrum` — Extract ERB spectrum
`false` (default) | `true`

Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"erbSpectrum","Name",Value)

Settable parameters for the ERB spectrum extraction are:

"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of ERB bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical

`mfcc` — Extract mel-frequency cepstral coefficients (MFCC)
`false` (default) | `true`

Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParams:

setExtractorParams(aFE,"mfcc","Name",Value)

Settable parameters for the MFCC extraction are:

"NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the mfccDelta and mfccDeltaDelta features.
"Rectification" –– Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".

The mel-frequency cepstral coefficients are calculated using the melSpectrum.

Data Types: logical
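
As an illustrative sketch, the following requests 20 coefficients instead of the default 13 and uses info to confirm how many columns each feature occupies. The sample rate, the coefficient count, and the white-noise input are assumptions chosen for the example.

```matlab
fs = 16e3;                                   % assumed sample rate in Hz
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "mfccDelta",true);

% Parameters set on mfcc also apply to the delta features.
setExtractorParams(aFE,"mfcc","NumCoeffs",20);

audioIn = randn(fs,1);                       % one second of white noise
features = extract(aFE,audioIn);

idx = info(aFE);
numMfcc  = numel(idx.mfcc);                  % columns holding the MFCC
numDelta = numel(idx.mfccDelta);             % columns holding the delta MFCC
```

Because the delta is computed from the extracted MFCC, numMfcc and numDelta are expected to be equal.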

`mfccDelta` — Extract delta of MFCC
`false` (default) | `true`

Extract delta of MFCC, specified as true or false.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.

Data Types: logical

`mfccDeltaDelta` — Extract delta-delta of MFCC
`false` (default) | `true`

Extract delta-delta of MFCC, specified as true or false.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

Data Types: logical

`gtcc` — Extract gammatone cepstral coefficients (GTCC)
`false` (default) | `true`

Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParams:

setExtractorParams(aFE,"gtcc","Name",Value)

Settable parameters for the GTCC extraction are:

"NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the gtccDelta and gtccDeltaDelta features.
"Rectification" –– Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".

The gammatone cepstral coefficients are calculated using the erbSpectrum.

Data Types: logical

`gtccDelta` — Extract delta of GTCC
`false` (default) | `true`

Extract delta of GTCC, specified as true or false.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.

Data Types: logical

`gtccDeltaDelta` — Extract delta-delta of GTCC
`false` (default) | `true`

Extract delta-delta of GTCC, specified as true or false.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

Data Types: logical

`spectralCentroid` — Extract spectral centroid
`false` (default) | `true`

Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralCrest` — Extract spectral crest
`false` (default) | `true`

Extract spectral crest, specified as true or false.

The spectral crest is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralDecrease` — Extract spectral decrease
`false` (default) | `true`

Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralEntropy` — Extract spectral entropy
`false` (default) | `true`

Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralFlatness` — Extract spectral flatness
`false` (default) | `true`

Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralFlux` — Extract spectral flux
`false` (default) | `true`

Extract spectral flux, specified as true or false.

The spectral flux is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral flux extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralFlux","Name",Value)

Settable parameters for the spectral flux extraction are:

"NormType" –– Norm type used to calculate the spectral flux, specified as the comma-separated pair consisting of "NormType" and 1 or 2. If unspecified, NormType defaults to 2.

Data Types: logical

`spectralKurtosis` — Extract spectral kurtosis
`false` (default) | `true`

Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralRolloffPoint` — Extract spectral rolloff point
`false` (default) | `true`

Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral rolloff point extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralRolloffPoint","Name",Value)

Settable parameters for the spectral rolloff point extraction are:

"Threshold" –– Threshold of the rolloff point, specified as the comma-separated pair consisting of "Threshold" and a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.

Data Types: logical

`spectralSkewness` — Extract spectral skewness
`false` (default) | `true`

Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralSlope` — Extract spectral slope
`false` (default) | `true`

Extract spectral slope, specified as true or false.

The spectral slope is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralSpread` — Extract spectral spread
`false` (default) | `true`

Extract spectral spread, specified as true or false.

The spectral spread is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`pitch` — Extract pitch
`false` (default) | `true`

Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParams:

setExtractorParams(aFE,"pitch","Name",Value)

Settable parameters for the pitch extraction are:

"Method" –– Method used to calculate the pitch, specified as the comma-separated pair consisting of "Method" and "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
"Range" –– Range within which to search for the pitch, in Hz, specified as the comma-separated pair consisting of "Range" and a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
"MedianFilterLength" –– Median filter length used to smooth pitch estimates over time, specified as the comma-separated pair consisting of "MedianFilterLength" and a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).

Data Types: logical
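
The following sketch sets all three pitch parameters in one call. The 220 Hz test tone, the 16 kHz sample rate, and the specific parameter values are assumptions for illustration. No claim is made here about how accurately any particular method tracks this signal.

```matlab
fs = 16e3;                                   % assumed sample rate in Hz
t = (0:fs-1)'/fs;                            % one second of samples
audioIn = sin(2*pi*220*t);                   % 220 Hz tone as a stand-in signal

aFE = audioFeatureExtractor("SampleRate",fs,"pitch",true);

% Switch the method, narrow the search range, and smooth over 3 estimates.
setExtractorParams(aFE,"pitch", ...
    "Method","PEF", ...
    "Range",[60 300], ...
    "MedianFilterLength",3);

f0 = extract(aFE,audioIn);                   % one pitch estimate per hop
```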

`harmonicRatio` — Extract harmonic ratio
`false` (default) | `true`

Extract harmonic ratio, specified as true or false.

Data Types: logical

Object Functions

`extract`	Extract audio features
`setExtractorParams`	Set nondefault parameter values for individual feature extractors
`info`	Output mapping and individual feature extractor parameters

Examples

Extract Multiple Audio Features

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, and spectral centroid of an audio signal. Use a 30 ms analysis window with 20 ms overlap.

aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true, ...
    "pitch",true, ...
    "spectralCentroid",true);

Call extract to extract the audio features from the audio signal.

features = extract(aFE,audioIn);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(aFE)

idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41

Plot the detected pitch over time.

t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title('Pitch')
xlabel('Time (s)')
ylabel('Frequency (Hz)')

Extract Features from Dataset

Create an audio datastore that points to audio samples included with Audio Toolbox®.

folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);

Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.

keepFile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads = subset(ads,keepFile);

Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

adsTall = tall(ads)

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).

adsTall =

  Mx1 tall cell array

    { 539648x1 double}
    { 227497x1 double}
    {   8000x1 double}
    { 685056x1 double}
    { 882688x2 double}
    {1116283x2 double}
    { 505726x2 double}
    {3195904x2 double}
        :         :
        :         :

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor('SampleRate',44.1e3, ...
    'melSpectrum',true, ...
    'barkSpectrum',true, ...
    'erbSpectrum',true, ...
    'linearSpectrum',true);

Use cellfun to apply extract to each cell of the tall array, so that audio features are extracted from every file. Call gather to evaluate the tall array.

specsTall = cellfun(@(x)extract(aFE,x),adsTall,"UniformOutput",false);
specs = gather(specsTall);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 12 sec
Evaluation completed in 12 sec

The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the total number of feature values requested from the audio data.

numFiles = numel(specs)

numFiles = 12

[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})

numHops1 = 1053

numFeaturesFile1 = 620

numChannelsFile1 = 1

[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})

numHops2 = 443

numFeaturesFile2 = 620

numChannelsFile2 = 1
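
The reported sizes can be checked with a short calculation. This sketch assumes frames are taken without padding, so that the hop count is floor((numSamples - OverlapLength)/hopLength). That framing rule is an assumption, although it reproduces the values shown above. The ERB band count is inferred from the reported total of 620 rather than computed.

```matlab
winLength     = 1024;                        % default Window length
overlapLength = 512;                         % default OverlapLength
hopLength     = winLength - overlapLength;   % samples between frame starts

% Lengths of the first two files in the tall array display.
numSamples = [539648 227497];
numHops = floor((numSamples - overlapLength)/hopLength)   % returns [1053 443]

% Feature count: one-sided linear spectrum plus the three auditory spectra.
numLinear = winLength/2 + 1;                 % 513 bins when FFT length = 1024
numMel    = 32;                              % default mel NumBands
numBark   = 32;                              % default Bark NumBands
numERB    = 620 - (numLinear + numMel + numBark)          % returns 43 (inferred)
```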

Algorithms

The audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediate representations. Some intermediate representations, such as the spectra and the cepstral coefficients from which the delta features are calculated, can also be output as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as:

 aFE = audioFeatureExtractor( ...
     "SpectralDescriptorInput","barkSpectrum", ...
     "spectralCentroid",true, ...
     "spectralFlux",true, ...
     "pitch",true, ...
     "harmonicRatio",true, ...
     "mfccDeltaDelta",true)

aFE = 

  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

This configuration corresponds to the highlighted feature extraction pipeline:

[Figure: audioFeatureExtractor pipeline with the stages used by this configuration highlighted]

Note

Because audioFeatureExtractor reuses intermediate representations, the features output from audioFeatureExtractor might not match the output of the corresponding individual feature extractors in their default configurations.

Introduced in R2019b
