Streamline audio feature extraction
audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.
aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.
aFE = audioFeatureExtractor(Name,Value) specifies nondefault properties for aFE using one or more name-value pair arguments.
Window - Analysis window
hamming(1024,"periodic") (default) | real vector
Analysis window, specified as a real vector.
Data Types: single | double
OverlapLength - Overlap length of adjacent analysis windows
512 (default) | integer in the range [0, numel(Window))
Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).
Data Types: single | double
FFTLength - FFT length
[] (default) | positive integer
FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the window length, numel(Window).
Data Types: single | double
SampleRate - Input sample rate (Hz)
44100 (default) | nonnegative scalar
Input sample rate in Hz, specified as a nonnegative scalar.
Data Types: single | double
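The following sketch shows how the four general properties above fit together. The 16 kHz sample rate and the 25 ms / 15 ms window and overlap durations are illustrative assumptions, not values taken from this page, and the hop length is derived here as the window length minus the overlap length.

```matlab
% Illustrative values only: 16 kHz audio, 25 ms window, 15 ms overlap.
fs = 16e3;
win = hamming(round(0.025*fs),"periodic");   % 400-sample analysis window

aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",win, ...
    "OverlapLength",round(0.015*fs), ...     % 240 samples
    "FFTLength",512, ...                     % zero-pads each 400-sample frame
    "linearSpectrum",true);

% Hop between successive analysis windows, in samples.
hopLength = numel(aFE.Window) - aFE.OverlapLength;
```

With these assumed values the hop is 160 samples, that is, 10 ms at 16 kHz.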
SpectralDescriptorInput - Input to spectral descriptors
"linearSpectrum" (default) | "melSpectrum" | "barkSpectrum" | "erbSpectrum"
Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".
Spectral descriptors affected by this property are: spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.
The spectrum input to the spectral descriptors is the same as the output from the corresponding feature: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
For example, if you set "SpectralDescriptorInput" to "barkSpectrum" and "spectralCentroid" to true, then aFE returns the centroid of the default Bark spectrum.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true);
barkSpectralCentroid = extract(aFE,audioIn);
If you specify a nondefault barkSpectrum using setExtractorParams, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParams(aFE,"barkSpectrum","NumBands",40), then aFE returns the centroid of a 40-band Bark spectrum.
setExtractorParams(aFE,"barkSpectrum","NumBands",40)
bark40SpectralCentroid = extract(aFE,audioIn);
Data Types: char | string
linearSpectrum - Extract linear spectrum
false (default) | true
Extract the one-sided linear spectrum, specified as true or false.
To set parameters of the linear spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"linearSpectrum","Name",Value)
"FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.
Data Types: logical
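As a minimal sketch of the linearSpectrum parameters listed above, the following restricts the spectrum to a band and switches to magnitude. The 100 Hz to 8 kHz range and the random test signal are assumptions chosen for illustration.

```matlab
% Enable the linear spectrum, then override its per-extractor parameters.
aFE = audioFeatureExtractor("SampleRate",44100,"linearSpectrum",true);
setExtractorParams(aFE,"linearSpectrum", ...
    "FrequencyRange",[100 8000], ...      % Hz, must lie within [0, SampleRate/2]
    "SpectrumType","magnitude", ...
    "WindowNormalization",false);

x = randn(44100,1);                       % one second of placeholder audio
linSpec = extract(aFE,x);                 % rows are hops, columns are frequency bins
```

Narrowing FrequencyRange reduces the number of columns returned, because only the bins inside the band are kept.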
melSpectrum - Extract mel spectrum
false (default) | true
Extract the one-sided mel spectrum, specified as true or false.
To set parameters of the mel spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"melSpectrum","Name",Value)
"FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" - Number of mel bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
"WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.
"FilterBankDesignDomain" - Domain in which the filter bank is designed, specified as the comma-separated pair consisting of "FilterBankDesignDomain" and either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
Data Types: logical
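The filter bank parameters above can be combined in one setExtractorParams call. This sketch assumes a 16 kHz signal and a 40-band mel spectrum; both values are illustrative. The same pattern applies to barkSpectrum and erbSpectrum with their own parameter lists.

```matlab
% Configure a 40-band mel spectrum over 50 Hz to 8 kHz (illustrative values).
fs = 16e3;
aFE = audioFeatureExtractor("SampleRate",fs,"melSpectrum",true);
setExtractorParams(aFE,"melSpectrum", ...
    "NumBands",40, ...
    "FrequencyRange",[50 fs/2], ...
    "FilterBankNormalization","none");

x = randn(fs,1);                 % one second of placeholder audio
melSpec = extract(aFE,x);
numBandsOut = size(melSpec,2);   % one column per mel band
```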
barkSpectrum - Extract Bark spectrum
false (default) | true
Extract the one-sided Bark spectrum, specified as true or false.
To set parameters of the Bark spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"barkSpectrum","Name",Value)
"FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" - Number of Bark bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
"WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.
"FilterBankDesignDomain" - Domain in which the filter bank is designed, specified as the comma-separated pair consisting of "FilterBankDesignDomain" and either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
Data Types: logical
erbSpectrum - Extract ERB spectrum
false (default) | true
Extract the one-sided ERB spectrum, specified as true or false.
To set parameters of the ERB spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"erbSpectrum","Name",Value)
"FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" - Number of ERB bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
"FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
"WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.
Data Types: logical
mfcc - Extract mel-frequency cepstral coefficients (MFCC)
false (default) | true
Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.
To set parameters of the MFCC extraction, use setExtractorParams:
setExtractorParams(aFE,"mfcc","Name",Value)
"NumCoeffs" - Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" - Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
"Rectification" - Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".
The mel-frequency cepstral coefficients are calculated using the melSpectrum.
Data Types: logical
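The MFCC parameters above also govern the delta features, so a single setExtractorParams call on "mfcc" configures both. The sketch below assumes 20 coefficients and a delta window of 5; these values and the random test signal are illustrative only.

```matlab
% Extract 20 MFCCs and their deltas; parameters are set on "mfcc" only.
aFE = audioFeatureExtractor("SampleRate",44100, ...
    "mfcc",true, ...
    "mfccDelta",true);
setExtractorParams(aFE,"mfcc", ...
    "NumCoeffs",20, ...
    "DeltaWindowLength",5, ...        % odd integer greater than 2
    "Rectification","cubic-root");

x = randn(44100,1);                   % one second of placeholder audio
feat = extract(aFE,x);
idx = info(aFE);                      % column indices for each enabled feature
mfccCols = feat(:,idx.mfcc);
deltaCols = feat(:,idx.mfccDelta);
```

Indexing through the struct returned by info avoids hard-coding column positions, which shift whenever NumCoeffs or the set of enabled features changes.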
mfccDelta - Extract delta of MFCC
false (default) | true
Extract delta of MFCC, specified as true or false.
The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.
Data Types: logical
mfccDeltaDelta - Extract delta-delta of MFCC
false (default) | true
Extract delta-delta of MFCC, specified as true or false.
The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.
Data Types: logical
gtcc - Extract gammatone cepstral coefficients (GTCC)
false (default) | true
Extract gammatone cepstral coefficients (GTCC), specified as true or false.
To set parameters of the GTCC extraction, use setExtractorParams:
setExtractorParams(aFE,"gtcc","Name",Value)
"NumCoeffs" - Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" - Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
"Rectification" - Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".
The gammatone cepstral coefficients are calculated using the erbSpectrum.
Data Types: logical
gtccDelta - Extract delta of GTCC
false (default) | true
Extract delta of GTCC, specified as true or false.
The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.
Data Types: logical
gtccDeltaDelta - Extract delta-delta of GTCC
false (default) | true
Extract delta-delta of GTCC, specified as true or false.
The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.
Data Types: logical
spectralCentroid - Extract spectral centroid
false (default) | true
Extract spectral centroid, specified as true or false.
The spectral centroid is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralCrest - Extract spectral crest
false (default) | true
Extract spectral crest, specified as true or false.
The spectral crest is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralDecrease - Extract spectral decrease
false (default) | true
Extract spectral decrease, specified as true or false.
The spectral decrease is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralEntropy - Extract spectral entropy
false (default) | true
Extract spectral entropy, specified as true or false.
The spectral entropy is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralFlatness - Extract spectral flatness
false (default) | true
Extract spectral flatness, specified as true or false.
The spectral flatness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralFlux - Extract spectral flux
false (default) | true
Extract spectral flux, specified as true or false.
The spectral flux is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral flux extraction, use setExtractorParams:
setExtractorParams(aFE,"spectralFlux","Name",Value)
"NormType" - Norm type used to calculate the spectral flux, specified as the comma-separated pair consisting of "NormType" and 1 or 2. If unspecified, NormType defaults to 2.
Data Types: logical
spectralKurtosis - Extract spectral kurtosis
false (default) | true
Extract spectral kurtosis, specified as true or false.
The spectral kurtosis is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralRolloffPoint - Extract spectral rolloff point
false (default) | true
Extract spectral rolloff point, specified as true or false.
The spectral rolloff point is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral rolloff point extraction, use setExtractorParams:
setExtractorParams(aFE,"spectralRolloffPoint","Name",Value)
"Threshold" - Threshold of the rolloff point, specified as the comma-separated pair consisting of "Threshold" and a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
Data Types: logical
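Descriptor-specific parameters are set per feature, independently of SpectralDescriptorInput. The sketch below assumes a rolloff threshold of 0.85 and an L1 norm for the flux; both values are illustrative choices within the documented ranges.

```matlab
% Two spectral descriptors computed on the default linear spectrum,
% each with one nondefault parameter (illustrative values).
aFE = audioFeatureExtractor("SampleRate",44100, ...
    "spectralFlux",true, ...
    "spectralRolloffPoint",true);
setExtractorParams(aFE,"spectralFlux","NormType",1);
setExtractorParams(aFE,"spectralRolloffPoint","Threshold",0.85);

x = randn(44100,1);                         % one second of placeholder audio
feat = extract(aFE,x);
idx = info(aFE);
rolloff = feat(:,idx.spectralRolloffPoint); % one value per hop, in Hz
```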
spectralSkewness - Extract spectral skewness
false (default) | true
Extract spectral skewness, specified as true or false.
The spectral skewness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralSlope - Extract spectral slope
false (default) | true
Extract spectral slope, specified as true or false.
The spectral slope is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralSpread - Extract spectral spread
false (default) | true
Extract spectral spread, specified as true or false.
The spectral spread is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
pitch - Extract pitch
false (default) | true
Extract pitch, specified as true or false.
To set parameters of the pitch extraction, use setExtractorParams:
setExtractorParams(aFE,"pitch","Name",Value)
"Method" - Method used to calculate the pitch, specified as the comma-separated pair consisting of "Method" and "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
"Range" - Range within which to search for the pitch, in Hz, specified as the comma-separated pair consisting of "Range" and a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
"MedianFilterLength" - Median filter length used to smooth pitch estimates over time, specified as the comma-separated pair consisting of "MedianFilterLength" and a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
Data Types: logical
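As a sketch of the pitch parameters above, the following selects the "PEF" method, narrows the search range, and applies light median smoothing. The range of 60 Hz to 300 Hz and the filter length of 3 are assumptions chosen for illustration, for example for speech.

```matlab
% Pitch tracking with nondefault method, search range, and smoothing.
aFE = audioFeatureExtractor("SampleRate",44100,"pitch",true);
setExtractorParams(aFE,"pitch", ...
    "Method","PEF", ...
    "Range",[60 300], ...            % Hz, search band for the fundamental
    "MedianFilterLength",3);         % smooth estimates over 3 hops

x = randn(44100,1);                  % placeholder audio; use real speech in practice
f0 = extract(aFE,x);                 % one pitch estimate per hop, in Hz
```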
harmonicRatio - Extract harmonic ratio
false (default) | true
Extract harmonic ratio, specified as true or false.
Data Types: logical
extract | Extract audio features |
setExtractorParams | Set nondefault parameter values for individual feature extractors |
info | Output mapping and individual feature extractor parameters |
generateMATLABFunction | Create MATLAB function compatible with C/C++ code generation |
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, and spectral centroid of an audio signal. Use a 30 ms analysis window with 20 ms overlap.
aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true, ...
    "pitch",true, ...
    "spectralCentroid",true);
Call extract to extract the audio features from the audio signal.
features = extract(aFE,audioIn);
Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.
idx = info(aFE)
idx = struct with fields:
mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
spectralCentroid: 40
pitch: 41
Plot the detected pitch over time.
t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title('Pitch')
xlabel('Time (s)')
ylabel('Frequency (Hz)')
Create an audio datastore that points to audio samples included with Audio Toolbox®.
folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);
Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.
keepFile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads = subset(ads,keepFile);
Convert the data to a tall array. tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adsTall = tall(ads)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
adsTall =
  M×1 tall cell array
    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :        :
        :        :
Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.
aFE = audioFeatureExtractor('SampleRate',44.1e3, ...
    'melSpectrum',true, ...
    'barkSpectrum',true, ...
    'erbSpectrum',true, ...
    'linearSpectrum',true);
Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.
specsTall = cellfun(@(x)extract(aFE,x),adsTall,"UniformOutput",false);
specs = gather(specsTall);
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 12 sec
Evaluation completed in 12 sec
The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the requested number of features from the audio data.
numFiles = numel(specs)
numFiles = 12
[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1
[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})
numHops2 = 443
numFeaturesFile2 = 620
numChannelsFile2 = 1
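The reported sizes can be checked by hand. The sketch below assumes that only complete analysis windows produce a hop (no padding of a trailing partial frame), which agrees with the hop counts shown for the first two files. It also assumes the one-sided linear spectrum has FFTLength/2 + 1 bins. The reported total of 620 features then implies 43 ERB bands alongside 513 linear bins, 32 mel bands, and 32 Bark bands.

```matlab
% Default analysis settings used in the example above.
winLength = 1024;
overlapLength = 512;
fs = 44.1e3;
hopLength = winLength - overlapLength;

% Hop counts for the first two files (lengths taken from the tall array display).
numSamples = [539648 227497];
numHops = floor((numSamples - overlapLength)/hopLength);   % [1053 443]

% Feature count: linear bins plus default mel, Bark, and ERB band counts.
numLinear = winLength/2 + 1;                  % 513 one-sided bins
numMel = 32;
numBark = 32;
numERB = ceil(hz2erb(fs/2) - hz2erb(0));      % default NumBands for erbSpectrum
numFeatures = numLinear + numMel + numBark + numERB;
```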
The audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediary representations. Some intermediary representations can be output as features, such as the linear, mel, Bark, and ERB spectra.
For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as:
aFE = audioFeatureExtractor( ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralFlux",true, ...
    "pitch",true, ...
    "harmonicRatio",true, ...
    "mfccDeltaDelta",true)
aFE =
  audioFeatureExtractor with properties:
   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio
   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
Note
Because audioFeatureExtractor reuses intermediary representations, the features output from audioFeatureExtractor may not correspond with the default configuration of features output by corresponding individual feature extractors.
Behavior changed in R2020b
The audioDelta function is now used to compute mfccDelta, mfccDeltaDelta, gtccDelta, and gtccDeltaDelta. The audioDelta algorithm has a different startup behavior than the previous algorithm. The default window length used to compute the deltas has changed from 2 to 9. A delta window length of 2 is no longer supported.
Usage notes and limitations:
You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the function returned by generateMATLABFunction.
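A minimal sketch of that workflow follows. It assumes the two-argument form of generateMATLABFunction that writes the function to a named file. The file name extractAudioFeatures is hypothetical, and the generated function is assumed to accept the audio signal as its only input. Check the generateMATLABFunction reference page for the exact syntax in your release.

```matlab
% Configure the extractor, then emit an equivalent standalone function.
aFE = audioFeatureExtractor("SampleRate",44100, ...
    "mfcc",true, ...
    "pitch",true);

% "extractAudioFeatures" is a hypothetical file name chosen for this sketch.
generateMATLABFunction(aFE,"extractAudioFeatures");

% The generated function can then be called like any MATLAB function
% and used as the entry point for C/C++ code generation.
audioIn = randn(44100,1);                    % placeholder audio
features = extractAudioFeatures(audioIn);
```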
Audio Labeler | audioDataAugmenter | audioDatastore | Extract Audio Features | vggishFeatures