Streamline audio feature extraction
audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.
aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.
aFE = audioFeatureExtractor(Name,Value) specifies nondefault properties for aFE using one or more name-value pair arguments.
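The two creation syntaxes can be sketched as follows. This is a minimal illustration, not part of the original page: the 16 kHz sample rate, the 512-sample window, and the random test signal are assumptions chosen only to keep the example self-contained.

```matlab
% Default-constructed extractor: all properties take their default values.
aFEdefault = audioFeatureExtractor();

% Extractor with nondefault properties set through name-value pairs.
fs = 16000;                                  % assumed sample rate (Hz)
aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(512,"periodic"), ...    % assumed 512-sample window
    "OverlapLength",256, ...                 % must be less than numel(Window)
    "spectralCentroid",true);

x = randn(fs,1);                             % 1 s of noise as stand-in audio
features = extract(aFE,x);                   % one column: the spectral centroid
```

Note that OverlapLength is set together with Window here, because the default overlap of 512 would not be smaller than the 512-sample window.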
Window - Analysis window
hamming(1024,"periodic") (default) | real vector
Analysis window, specified as a real vector.
Data Types: single | double
OverlapLength - Overlap length of adjacent analysis windows
512 (default) | integer in the range [0, numel(Window))
Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).
Data Types: single | double
FFTLength - FFT length
[] (default) | positive integer
FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the window length, numel(Window).
Data Types: single | double
SampleRate - Input sample rate (Hz)
44100 (default) | nonnegative scalar
Input sample rate in Hz, specified as a nonnegative scalar.
Data Types: single | double
SpectralDescriptorInput - Input to spectral descriptors
"linearSpectrum" (default) | "melSpectrum" | "barkSpectrum" | "erbSpectrum"
Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".
Spectral descriptors affected by this property are spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.
The spectrum input to the spectral descriptors is the same as the output of the corresponding feature: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
For example, if you set "SpectralDescriptorInput" to "barkSpectrum", and "spectralCentroid" to true, then aFE returns the centroid of the default Bark spectrum.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true);
barkSpectralCentroid = extract(aFE,audioIn);
If you set nondefault parameters for barkSpectrum using setExtractorParams, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParams(aFE,"barkSpectrum","NumBands",40), then aFE returns the centroid of a 40-band Bark spectrum.
setExtractorParams(aFE,"barkSpectrum","NumBands",40)
bark40SpectralCentroid = extract(aFE,audioIn);
Data Types: char | string
linearSpectrum - Extract linear spectrum
false (default) | true
Extract the one-sided linear spectrum, specified as true or false.
To set parameters of the linear spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"linearSpectrum","Name",Value)
"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
Data Types: logical
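As a rough sketch of the linear spectrum parameters described above, the following restricts the spectrum to a band and switches to a magnitude spectrum. The band edges [100 8000] and the noise input are illustrative assumptions, not values from this page.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs,"linearSpectrum",true);

% Keep only the bins between 100 Hz and 8 kHz (assumed band) and
% return a magnitude spectrum instead of the default power spectrum.
setExtractorParams(aFE,"linearSpectrum", ...
    "FrequencyRange",[100 8000], ...
    "SpectrumType","magnitude");

magSpec = extract(aFE,x);                 % numHops-by-numBins
size(magSpec)
```

Because the frequency range is narrower than [0, SampleRate/2], the output has fewer columns than the full one-sided spectrum of the default 1024-point FFT.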
melSpectrum - Extract mel spectrum
false (default) | true
Extract the one-sided mel spectrum, specified as true or false.
To set parameters of the mel spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"melSpectrum","Name",Value)
"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of mel bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".
Data Types: logical
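The mel spectrum parameters can be exercised as in the sketch below. The choice of 64 bands, "area" normalization, and a noise input are assumptions made for illustration only.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs,"melSpectrum",true);

% Request 64 mel bands (default is 32) with area-normalized filters.
setExtractorParams(aFE,"melSpectrum", ...
    "NumBands",64, ...
    "Normalization","area");

melSpec = extract(aFE,x);                 % numHops-by-NumBands
numBands = size(melSpec,2)
```

The number of output columns follows NumBands, so downstream code that indexes into the feature matrix should be updated whenever this parameter changes.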
barkSpectrum - Extract Bark spectrum
false (default) | true
Extract the one-sided Bark spectrum, specified as true or false.
To set parameters of the Bark spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"barkSpectrum","Name",Value)
"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of Bark bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".
Data Types: logical
erbSpectrum - Extract ERB spectrum
false (default) | true
Extract the one-sided ERB spectrum, specified as true or false.
To set parameters of the ERB spectrum extraction, use setExtractorParams:
setExtractorParams(aFE,"erbSpectrum","Name",Value)
"FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
"SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
"NumBands" –– Number of ERB bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
"Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth", "area", or "none". If unspecified, Normalization defaults to "bandwidth".
Data Types: logical
mfcc - Extract mel-frequency cepstral coefficients (MFCC)
false (default) | true
Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.
To set parameters of the MFCC extraction, use setExtractorParams:
setExtractorParams(aFE,"mfcc","Name",Value)
"NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the mfccDelta and mfccDeltaDelta features.
"Rectification" –– Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".
The mel-frequency cepstral coefficients are calculated using the melSpectrum.
Data Types: logical
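The sketch below shows how parameters set on mfcc carry over to the delta features. The values 20 and 9 and the noise input are assumptions chosen for illustration.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true,"mfccDelta",true,"mfccDeltaDelta",true);

% Parameters are set once on "mfcc"; the delta features inherit them.
setExtractorParams(aFE,"mfcc", ...
    "NumCoeffs",20, ...                   % assumed value; default is 13
    "DeltaWindowLength",9);               % odd integer, per the description above

features = extract(aFE,x);
idx = info(aFE);                          % column indices per feature
numMfcc  = numel(idx.mfcc)
numDelta = numel(idx.mfccDelta)
```

With NumCoeffs set to 20, each of the three features occupies 20 columns of the output matrix.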
mfccDelta - Extract delta of MFCC
false (default) | true
Extract delta of MFCC, specified as true or false.
The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.
Data Types: logical
mfccDeltaDelta - Extract delta-delta of MFCC
false (default) | true
Extract delta-delta of MFCC, specified as true or false.
The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.
Data Types: logical
gtcc - Extract gammatone cepstral coefficients (GTCC)
false (default) | true
Extract gammatone cepstral coefficients (GTCC), specified as true or false.
To set parameters of the GTCC extraction, use setExtractorParams:
setExtractorParams(aFE,"gtcc","Name",Value)
"NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.
"DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the gtccDelta and gtccDeltaDelta features.
"Rectification" –– Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".
The gammatone cepstral coefficients are calculated using the erbSpectrum.
Data Types: logical
gtccDelta - Extract delta of GTCC
false (default) | true
Extract delta of GTCC, specified as true or false.
The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.
Data Types: logical
gtccDeltaDelta - Extract delta-delta of GTCC
false (default) | true
Extract delta-delta of GTCC, specified as true or false.
The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.
Data Types: logical
spectralCentroid - Extract spectral centroid
false (default) | true
Extract spectral centroid, specified as true or false.
The spectral centroid is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralCrest - Extract spectral crest
false (default) | true
Extract spectral crest, specified as true or false.
The spectral crest is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralDecrease - Extract spectral decrease
false (default) | true
Extract spectral decrease, specified as true or false.
The spectral decrease is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralEntropy - Extract spectral entropy
false (default) | true
Extract spectral entropy, specified as true or false.
The spectral entropy is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralFlatness - Extract spectral flatness
false (default) | true
Extract spectral flatness, specified as true or false.
The spectral flatness is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralFlux - Extract spectral flux
false (default) | true
Extract spectral flux, specified as true or false.
The spectral flux is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral flux extraction, use setExtractorParams:
setExtractorParams(aFE,"spectralFlux","Name",Value)
"NormType" –– Norm type used to calculate the spectral flux, specified as the comma-separated pair consisting of "NormType" and 1 or 2. If unspecified, NormType defaults to 2.
Data Types: logical
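A minimal sketch of changing the flux norm follows. The noise input is an assumption, used only so that the example runs without an audio file.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs,"spectralFlux",true);

% Use the 1-norm of the frame-to-frame spectral difference
% instead of the default 2-norm.
setExtractorParams(aFE,"spectralFlux","NormType",1);

flux = extract(aFE,x);                    % numHops-by-1
```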
spectralKurtosis - Extract spectral kurtosis
false (default) | true
Extract spectral kurtosis, specified as true or false.
The spectral kurtosis is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralRolloffPoint - Extract spectral rolloff point
false (default) | true
Extract spectral rolloff point, specified as true or false.
The spectral rolloff point is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral rolloff point extraction, use setExtractorParams:
setExtractorParams(aFE,"spectralRolloffPoint","Name",Value)
"Threshold" –– Threshold of the rolloff point, specified as the comma-separated pair consisting of "Threshold" and a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
Data Types: logical
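The rolloff threshold can be adjusted as sketched here. The threshold of 0.85 and the noise input are illustrative assumptions.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs,"spectralRolloffPoint",true);

% Report the frequency below which 85% of the spectral energy lies
% (assumed value; the default threshold is 0.95).
setExtractorParams(aFE,"spectralRolloffPoint","Threshold",0.85);

rolloff = extract(aFE,x);                 % numHops-by-1, in Hz for the linear spectrum
```

A lower threshold moves the rolloff point toward lower frequencies, which can make the feature less sensitive to high-frequency noise.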
spectralSkewness - Extract spectral skewness
false (default) | true
Extract spectral skewness, specified as true or false.
The spectral skewness is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralSlope - Extract spectral slope
false (default) | true
Extract spectral slope, specified as true or false.
The spectral slope is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
spectralSpread - Extract spectral spread
false (default) | true
Extract spectral spread, specified as true or false.
The spectral spread is calculated on the spectral representation specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
pitch - Extract pitch
false (default) | true
Extract pitch, specified as true or false.
To set parameters of the pitch extraction, use setExtractorParams:
setExtractorParams(aFE,"pitch","Name",Value)
"Method" –– Method used to calculate the pitch, specified as the comma-separated pair consisting of "Method" and "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
"Range" –– Range within which to search for the pitch in Hz, specified as the comma-separated pair consisting of "Range" and a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
"MedianFilterLength" –– Median filter length used to smooth pitch estimates over time, specified as the comma-separated pair consisting of "MedianFilterLength" and a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
Data Types: logical
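The pitch parameters above can be combined as in the following sketch. The synthetic harmonic tone, the "PEF" method, the search range, and the filter length are all assumptions chosen for illustration. No particular pitch estimate is claimed for this signal.

```matlab
fs = 44100;
t = (0:fs-1)'/fs;
% Assumed test signal: a 200 Hz tone with two harmonics.
x = sin(2*pi*200*t) + 0.5*sin(2*pi*400*t) + 0.25*sin(2*pi*600*t);

aFE = audioFeatureExtractor("SampleRate",fs,"pitch",true);

% Switch from the default "NCF" method to "PEF", narrow the search
% range, and smooth the estimates with a 3-point median filter.
setExtractorParams(aFE,"pitch", ...
    "Method","PEF", ...
    "Range",[60 300], ...
    "MedianFilterLength",3);

f0 = extract(aFE,x);                      % numHops-by-1 pitch estimates in Hz
```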
harmonicRatio - Extract harmonic ratio
false (default) | true
Extract harmonic ratio, specified as true or false.
Data Types: logical
extract | Extract audio features |
setExtractorParams | Set nondefault parameter values for individual feature extractors |
info | Output mapping and individual feature extractor parameters |
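The three object functions are typically used together. The sketch below is illustrative only: the second output of info (named params here) is assumed from the function summary above, and the 40-band setting and noise input are arbitrary choices.

```matlab
fs = 44100;
x = randn(fs,1);                          % 1 s of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "melSpectrum",true,"spectralCentroid",true);

% setExtractorParams: override a parameter of one feature extractor.
setExtractorParams(aFE,"melSpectrum","NumBands",40);

% info: map each enabled feature to its columns in the output matrix.
% The params output is an assumption based on the summary of info.
[idx,params] = info(aFE);

% extract: compute the features, then pick one out by its column index.
features = extract(aFE,x);
melSpec  = features(:,idx.melSpectrum);
centroid = features(:,idx.spectralCentroid);
```

Indexing through the structure returned by info avoids hard-coding column numbers, which shift whenever a feature is enabled, disabled, or resized.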
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, and spectral centroid of an audio signal. Use a 30 ms analysis window with 20 ms overlap.
aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true, ...
    "pitch",true, ...
    "spectralCentroid",true);
Call extract to extract the audio features from the audio signal.
features = extract(aFE,audioIn);
Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.
idx = info(aFE)
idx = struct with fields:
mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
spectralCentroid: 40
pitch: 41
Plot the detected pitch over time.
t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title('Pitch')
xlabel('Time (s)')
ylabel('Frequency (Hz)')
Create an audio datastore that points to audio samples included with Audio Toolbox®.
folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);
Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.
keepFile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads = subset(ads,keepFile);
Convert the data to a tall array. tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adsTall = tall(ads)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).
adsTall =
  Mx1 tall cell array
    { 539648x1 double}
    { 227497x1 double}
    {   8000x1 double}
    { 685056x1 double}
    { 882688x2 double}
    {1116283x2 double}
    { 505726x2 double}
    {3195904x2 double}
        :        :
        :        :
Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.
aFE = audioFeatureExtractor('SampleRate',44.1e3, ...
    'melSpectrum',true, ...
    'barkSpectrum',true, ...
    'erbSpectrum',true, ...
    'linearSpectrum',true);
Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.
specsTall = cellfun(@(x)extract(aFE,x),adsTall,"UniformOutput",false);
specs = gather(specsTall);
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 12 sec
Evaluation completed in 12 sec
The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and number of channels depend on the length and number of channels of the audio file, and the number of features is the requested number of features from the audio data.
numFiles = numel(specs)
numFiles = 12
[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1
[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})
numHops2 = 443
numFeaturesFile2 = 620
numChannelsFile2 = 1
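The reported sizes can be reproduced by hand. The sketch below assumes that hops are counted as full windows only, floor((numSamples - windowLength)/hopLength) + 1, and that the feature count is the sum of the one-sided linear spectrum bins and the default band counts listed in the property descriptions. Both assumptions are consistent with the values shown above, but they are not stated on this page.

```matlab
% Default analysis settings used in the example.
winLen  = 1024;
overlap = 512;
hopLen  = winLen - overlap;

% Lengths of the first two files, from the tall array display.
numSamples = [539648 227497];

% Assumed hop count: number of full windows that fit in the signal.
numHops = floor((numSamples - winLen)/hopLen) + 1

% Assumed feature count: one-sided linear spectrum bins plus the
% default mel, Bark, and ERB band counts for a 44.1 kHz sample rate.
numLinear = winLen/2 + 1;                         % 513 bins
numMel    = 32;
numBark   = 32;
numERB    = ceil(hz2erb(44100/2) - hz2erb(0));    % default ERB band count
numFeatures = numLinear + numMel + numBark + numERB
```

Under these assumptions, numHops evaluates to [1053 443] and numFeatures to 620, matching the sizes returned for the first two files.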
The audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediate representations. Some intermediate representations, such as the linear, mel, Bark, and ERB spectra, can also be output as features.
For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as:
aFE = audioFeatureExtractor( ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralFlux",true, ...
    "pitch",true, ...
    "harmonicRatio",true, ...
    "mfccDeltaDelta",true)
aFE =
  audioFeatureExtractor with properties:
   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio
   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
Because audioFeatureExtractor reuses intermediate representations, the features output from audioFeatureExtractor may not correspond with the default configuration of features output by the corresponding individual feature extractors.
Audio Labeler | Extract Audio Features | audioDataAugmenter | audioDatastore | cellfun | gather | subset | tall