Extract gammatone cepstral coefficients, log-energy, delta, and delta-delta
coeffs = gtcc(___,Name,Value) specifies options using one or more Name,Value pair arguments.
[coeffs,delta,deltaDelta,loc] = gtcc(___) also returns the delta, delta-delta, and location in samples corresponding to each window of data.
Get the gammatone cepstral coefficients for an audio file using default settings. Plot the results.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,~,~,loc] = gtcc(audioIn,fs);
t = loc./fs;
plot(t,coeffs)
xlabel('Time (s)')
title('Gammatone Cepstral Coefficients')
legend('logE','0','1','2','3','4','5','6','7','8','9','10','11','12', ...
    'Location','northeastoutside')
Read in an audio file.
[audioIn,fs] = audioread('Turbine-16-44p1-mono-22secs.wav');
Calculate 20 GTCC using filters equally spaced on the ERB scale between hz2erb(62.5) and hz2erb(12000). Calculate the coefficients using 50 ms periodic Hann windows with 25 ms overlap. Replace the 0th coefficient with the log-energy. Use time-domain filtering.
[coeffs,~,~,loc] = gtcc(audioIn,fs, ...
    'NumCoeffs',20, ...
    'FrequencyRange',[62.5,12000], ...
    'Window',hann(round(0.05*fs),'periodic'), ...
    'OverlapLength',round(0.025*fs), ...
    'LogEnergy','Replace', ...
    'FilterDomain','Time');
Plot the results.
t = loc/fs;
plot(t,coeffs)
xlabel('Time (s)')
title('Gammatone Cepstral Coefficients')
legend('logE','1','2','3','4','5','6','7','8','9','10','11','12','13', ...
    '14','15','16','17','18','19','Location','northeastoutside');
Read in an audio file and convert it to a frequency representation.
[audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
win = hann(1024,"periodic");
S = stft(audioIn,"Window",win,"OverlapLength",512,"Centered",false);
To extract the gammatone cepstral coefficients, call gtcc with the frequency-domain audio. Ignore the log-energy.
coeffs = gtcc(S,fs,"LogEnergy","Ignore");
In many applications, GTCC observations are converted to summary statistics for use in classification tasks. Plot a probability density function for one of the gammatone cepstral coefficients to observe its distribution.
nbins = 60;
coefficientToAnalyze = 4;
histogram(coeffs(:,coefficientToAnalyze+1),nbins,'Normalization','pdf')
title(sprintf("Coefficient %d",coefficientToAnalyze))
audioIn - Input signal
Input signal, specified as a vector, matrix, or 3-D array.
If 'FilterDomain' is set to 'Frequency' (default), then audioIn can be real or complex.
If audioIn is real, it is interpreted as a time-domain signal and must be a column vector or a matrix. Columns of the matrix are treated as independent audio channels.
If audioIn is complex, it is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the number of DFT points, M is the number of individual spectrums, and N is the number of individual channels.
If 'FilterDomain' is set to 'Time', then audioIn must be a real column vector or matrix. Columns of the matrix are treated as independent audio channels.
Data Types: single | double
Complex Number Support: Yes
fs - Sample rate (Hz)
Sample rate of the input signal in Hz, specified as a positive scalar.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: coeffs = gtcc(audioIn,fs,'LogEnergy','Replace') returns gammatone cepstral coefficients for the audio input signal sampled at fs Hz. For each analysis window, the first coefficient in the coeffs vector is replaced with the log energy of the input signal.
'Window' - Window applied in time domain
hamming(round(fs*0.03),'periodic') (default) | vector
Window applied in time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1,size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.
Data Types: single | double
'OverlapLength' - Number of samples overlapped between adjacent windows
round(0.02*fs) (default) | non-negative scalar
'NumCoeffs' - Number of coefficients returned
13 (default) | positive scalar integer
Number of coefficients returned for each window of data, specified as the comma-separated pair consisting of 'NumCoeffs' and an integer in the range [2, v], where v is the number of valid passbands. If unspecified, NumCoeffs defaults to 13.
The number of valid passbands is defined as the number of ERB steps (ERBN) in the frequency range of the filter bank. The frequency range of the filter bank is specified by FrequencyRange.
Data Types: single | double
'FilterDomain' - Domain in which to apply filtering
'Frequency' (default) | 'Time'
Domain in which to apply filtering, specified as the comma-separated pair consisting of 'FilterDomain' and 'Frequency' or 'Time'. If unspecified, FilterDomain defaults to 'Frequency'.
Data Types: string | char
'FrequencyRange' - Frequency range of gammatone filter bank (Hz)
[50 fs/2] (default) | two-element row vector
Frequency range of gammatone filter bank in Hz, specified as the comma-separated pair consisting of 'FrequencyRange' and a two-element row vector of increasing values in the range [0, fs/2]. If unspecified, FrequencyRange defaults to [50, fs/2].
Data Types: single | double
'FFTLength' - Number of bins in DFT
numel(Window) (default) | positive scalar integer
Number of bins used to calculate the discrete Fourier transform (DFT) of windowed input samples. The FFT length must be greater than or equal to the number of elements in the Window.
Data Types: single | double
'Rectification' - Type of nonlinear rectification
'log' (default) | 'cubic-root'
Type of nonlinear rectification applied prior to the discrete cosine transform, specified as 'log' or 'cubic-root'.
Data Types: char | string
'DeltaWindowLength' - Number of coefficients used to calculate delta and delta-delta
9 (default) | odd integer greater than two
Number of coefficients used to calculate the delta and the delta-delta values, specified as the comma-separated pair consisting of 'DeltaWindowLength' and an odd integer greater than two. If unspecified, DeltaWindowLength defaults to 9.
Deltas are computed using the audioDelta function.
Data Types: single | double
'LogEnergy' - Log energy usage
'Append' (default) | 'Replace' | 'Ignore'
Log energy usage, specified as the comma-separated pair consisting of 'LogEnergy' and 'Append', 'Replace', or 'Ignore'. If unspecified, LogEnergy defaults to 'Append'.
'Append' - The function prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' - The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' - The function does not calculate or return the log energy.
Data Types: char | string
coeffs - Gammatone cepstral coefficients
Gammatone cepstral coefficients, returned as an L-by-M matrix or an L-by-M-by-N array, where:
L - Number of analysis windows the audio signal is partitioned into. The input size, Window, and OverlapLength control this dimension: L = floor((size(audioIn,1) - numel(Window))/(numel(Window) - OverlapLength)) + 1.
M - Number of coefficients returned per frame. This value is determined by NumCoeffs and LogEnergy.
When LogEnergy is set to:
'Append' - The function prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' - The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' - The function does not calculate or return the log energy. The length of the coefficients vector is NumCoeffs.
N - Number of input channels (columns). This value is size(audioIn,2).
Data Types: single | double
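As a rough illustration of how the L and M dimensions follow from the settings, the sketch below evaluates the window-count formula and the LogEnergy rule for a time-domain input. The signal length, sample rate, and parameter values are hypothetical and chosen only for illustration; the code uses base MATLAB arithmetic and does not call gtcc.

```matlab
% Hypothetical settings chosen for illustration only.
fs            = 16000;              % sample rate in Hz
numSamples    = 5*fs;               % 5 seconds of single-channel audio
winLength     = round(0.05*fs);     % 50 ms window (800 samples)
overlapLength = round(0.025*fs);    % 25 ms overlap (400 samples)
numCoeffs     = 13;
logEnergy     = 'Append';           % 'Append', 'Replace', or 'Ignore'

% Number of analysis windows (rows of coeffs).
hopLength = winLength - overlapLength;
L = floor((numSamples - winLength)/hopLength) + 1;

% Number of coefficients per window (columns of coeffs).
if strcmp(logEnergy,'Append')
    M = numCoeffs + 1;              % log energy is prepended
else
    M = numCoeffs;                  % 'Replace' and 'Ignore' keep NumCoeffs columns
end

fprintf('coeffs is expected to be %d-by-%d\n',L,M);
```

With these values the hop is 400 samples, so the expected size is 199-by-14.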
delta - Change in coefficients
Change in coefficients from one analysis window to another, returned as an L-by-M matrix or an L-by-M-by-N array. The delta array is the same size and data type as the coeffs array. See coeffs for the definitions of L, M, and N.
Data Types: single | double
loc - Location of the last sample in each analysis window
Location of the last sample in each analysis window, returned as a column vector with the same number of rows as coeffs.
Data Types: single | double
The gtcc function splits the entire data into overlapping segments. The length of each analysis window is determined by Window. The length of overlap between analysis windows is determined by OverlapLength. The algorithm to determine the gammatone cepstral coefficients depends on the filter domain, specified by FilterDomain. The default filter domain is frequency.
Gammatone cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of gammatone cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
The default gammatone filter bank is composed of gammatone filters spaced linearly on the ERB scale between 50 and 8000 Hz. The filter bank is designed by designAuditoryFilterBank.
The information contained in the zeroth gammatone cepstral coefficient is often augmented with or replaced by the log energy. The log energy calculation depends on the input domain.
If the input is a time-domain signal, the log energy is computed using the following equation:
If the input is a frequency-domain signal, the log energy is computed using the following equation:
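The equations themselves are not reproduced on this page. As a hedged sketch only, the code below assumes a common definition: the natural log of the summed squared samples of a frame for time-domain input, and the natural log of the summed squared spectrum magnitudes divided by the FFT length for frequency-domain input. These definitions, the test frame, and all variable names are assumptions for illustration and are not confirmed by this page.

```matlab
% Assumed definitions (not confirmed by this page):
%   time domain:      logE = log(sum(x.^2))
%   frequency domain: logE = log(sum(abs(X).^2)/FFTLength)
x = sin(2*pi*(0:511)'/32);          % one hypothetical frame of audio
fftLength = numel(x);
X = fft(x,fftLength);               % two-sided spectrum of the same frame

logE_time = log(sum(x.^2));
logE_freq = log(sum(abs(X).^2)/fftLength);

% For a two-sided DFT, Parseval's theorem makes the two values agree.
fprintf('time: %.6f, frequency: %.6f\n',logE_time,logE_freq);
```

Under these assumptions, the division by the FFT length is what keeps the frequency-domain value consistent with the time-domain value for the same frame.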
If FilterDomain is specified as 'Time', the gtcc function uses the gammatoneFilterBank to apply time-domain filtering. The basic steps of the gtcc algorithm are outlined by the diagram.
The FrequencyRange and sample rate (fs) parameters are set on the filter bank using the name-value pairs input to the gtcc function. The number of filters in the gammatone filter bank is defined as hz2erb(FrequencyRange(2)) - hz2erb(FrequencyRange(1)). This roughly corresponds to placing a gammatone filter every 0.9 mm in the cochlea.
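To get a feel for the size of the filter bank, the sketch below evaluates the ERB span for the 50 Hz to 8000 Hz range. It does not call hz2erb. Instead, hz2erbSketch is a hypothetical stand-in that uses the Glasberg and Moore ERB-rate formula, so the constants may differ slightly from the toolbox function. Rounding the span to an integer filter count is also an assumption.

```matlab
% Assumption: hypothetical stand-in for hz2erb based on the Glasberg and
% Moore ERB-rate formula. The toolbox function may use slightly
% different constants.
hz2erbSketch = @(hz) 21.4*log10(1 + 0.00437*hz);

frequencyRange = [50 8000];         % Hz
erbSpan = hz2erbSketch(frequencyRange(2)) - hz2erbSketch(frequencyRange(1));

% Rounding to an integer filter count is an assumption of this sketch.
fprintf('ERB span: %.2f (about %d filters)\n',erbSpan,round(erbSpan));
```

Widening FrequencyRange increases the span and therefore the number of filters, which in turn raises the upper limit on NumCoeffs.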
The output from the gammatone filter bank is a multichannel signal. Each channel output from the gammatone filter bank is buffered into overlapped analysis windows, as specified by the Window and OverlapLength parameters. The short-term energy (STE) for each analysis window of data is calculated. The STE values of the channels are concatenated. The concatenated signal is then passed through a logarithm function and transformed to the cepstral domain using a discrete cosine transform (DCT).
The log-energy is calculated on the original audio signal using the same buffering scheme applied to the gammatone filter bank output.
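The buffering, energy, logarithm, and DCT steps described above can be sketched in base MATLAB as follows. This is a minimal illustration under stated assumptions, not the gtcc implementation: the matrix y is a hypothetical stand-in for the filter bank output, the window is rectangular, and the DCT-II is written out explicitly without the scaling a library routine might apply.

```matlab
% Hypothetical stand-in for the filter bank output: numSamples-by-numFilters.
numSamples = 4000;
numFilters = 8;
n = (0:numSamples-1)';
y = sin(2*pi*n*(1:numFilters)/200);  % outer product gives one sinusoid per column

win = ones(400,1);                   % rectangular window, for simplicity
overlapLength = 200;
hop = numel(win) - overlapLength;
numWindows = floor((numSamples - numel(win))/hop) + 1;

% Short-term energy of every channel in every analysis window.
ste = zeros(numWindows,numFilters);
for k = 1:numWindows
    idx = (k-1)*hop + (1:numel(win));
    frame = y(idx,:) .* win;         % apply the window to each channel
    ste(k,:) = sum(frame.^2,1);
end

% Log rectification followed by an unnormalized DCT-II across the filters.
m = 0:numFilters-1;
dctBasis = cos(pi*(m')*(m + 0.5)/numFilters);   % row k holds basis function k
coeffs = log(ste) * dctBasis';       % numWindows-by-numFilters

disp(size(coeffs))
```

In this sketch every DCT output is kept. Keeping only the first NumCoeffs columns would correspond to the truncation controlled by the NumCoeffs parameter.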
Behavior changed in R2020b
The delta and delta-delta calculations are now computed using the audioDelta function, which has a different startup behavior than the previous algorithm. The default value of the DeltaWindowLength parameter has changed from 2 to 9. A delta window length of 2 is no longer supported.
WindowLength will be removed in a future release
Behavior change in future release
The WindowLength parameter will be removed from the gtcc function in a future release. Use the Window parameter instead.
In releases prior to R2020b, you could only specify the length of a time-domain window. The window was always designed as a periodic Hamming window. You can replace instances of the code
coeffs = gtcc(audioIn,fs,'WindowLength',1024);
with this code:
coeffs = gtcc(audioIn,fs,'Window',hamming(1024,'periodic'));
[1] Shao, Yang, Zhaozhang Jin, Deliang Wang, and Soundararajan Srinivasan. "An Auditory-Based Feature for Robust Speech Recognition." IEEE International Conference on Acoustics, Speech and Signal Processing. 2009.
[2] Valero, X., and F. Alias. "Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification." IEEE Transactions on Multimedia. Vol. 14, Issue 6, 2012, pp. 1684–1689.
audioDelta | audioFeatureExtractor | Cepstral Feature Extractor | cepstralCoefficients | detectSpeech | mfcc