Mel spectrogram
S = melSpectrogram(audioIn,fs,Name,Value) specifies options using one or more Name,Value pair arguments.
melSpectrogram(___) plots the mel spectrogram on a surface in the current figure.
Use the default settings to calculate the mel spectrogram for an entire audio file. Print the number of bandpass filters in the filter bank and the number of frames in the mel spectrogram.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
S = melSpectrogram(audioIn,fs);
[numBands,numFrames] = size(S);
fprintf("Number of bandpass filters in filterbank: %d\n",numBands)
Number of bandpass filters in filterbank: 32
fprintf("Number of frames in spectrogram: %d\n",numFrames)
Number of frames in spectrogram: 1551
Plot the mel spectrogram.
melSpectrogram(audioIn,fs)
Calculate the mel spectrums of 2048-point windows with 1024-point overlap. Convert to the frequency domain using a 4096-point FFT. Pass the frequency-domain representation through 64 half-overlapped triangular bandpass filters that span the range 62.5 Hz to 8 kHz.
[audioIn,fs] = audioread('FunkyDrums-44p1-stereo-25secs.mp3');
S = melSpectrogram(audioIn,fs, ...
    'WindowLength',2048, ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3]);
Call melSpectrogram again, this time with no output arguments so that you can visualize the mel spectrogram. The input audio is a multichannel signal. If you call melSpectrogram with a multichannel input and with no output arguments, only the first channel is plotted.
melSpectrogram(audioIn,fs, ...
    'WindowLength',2048, ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3])
melSpectrogram applies a frequency-domain filter bank to audio signals that are windowed in time. You can get the center frequencies of the filters and the time instants corresponding to the analysis windows as the second and third output arguments from melSpectrogram.
Get the mel spectrogram, filter bank center frequencies, and analysis window time instants of a multichannel audio signal. Use the center frequencies and time instants to plot the mel spectrogram for each channel.
[audioIn,fs] = audioread('AudioArray-16-16-4channels-20secs.wav');
[S,cF,t] = melSpectrogram(audioIn,fs);
S = 10*log10(S+eps); % Convert to dB for plotting
for i = 1:size(S,3)
    figure(i)
    surf(t,cF,S(:,:,i),'EdgeColor','none');
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    view([0,90])
    title(sprintf('Channel %d',i))
    axis([t(1) t(end) cF(1) cF(end)])
end
audioIn — Audio input
Audio input, specified as a column vector or matrix. If specified as a matrix, the function treats columns as independent audio channels.
Data Types: single | double
fs — Input sample rate (Hz)
Input sample rate in Hz, specified as a positive scalar.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'WindowLength',1024
'WindowLength' — Analysis window length (samples)
round(0.03*fs) (default) | integer in the range [2, size(audioIn,1)]
Analysis window length in samples, specified as the comma-separated pair consisting of 'WindowLength' and an integer in the range [2, size(audioIn,1)].
Data Types: single | double
'OverlapLength' — Analysis window overlap length (samples)
round(0.02*fs) (default) | integer in the range [0, (WindowLength - 1)]
Analysis window overlap length in samples, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, (WindowLength - 1)].
Data Types: single | double
'FFTLength' — Number of DFT points
WindowLength (default) | positive integer
Number of points used to calculate the DFT, specified as the comma-separated pair consisting of 'FFTLength' and a positive integer greater than or equal to WindowLength. If unspecified, FFTLength defaults to WindowLength.
Data Types: single | double
'NumBands' — Number of mel bandpass filters
32 (default) | positive integer
Number of mel bandpass filters, specified as the comma-separated pair consisting of 'NumBands' and a positive integer.
Data Types: single | double
'FrequencyRange' — Frequency range over which to compute mel spectrogram (Hz)
[0 fs/2] (default) | two-element row vector
Frequency range over which to compute the mel spectrogram in Hz, specified as the comma-separated pair consisting of 'FrequencyRange' and a two-element row vector of monotonically increasing values in the range [0, fs/2].
Data Types: single | double
'SpectrumType' — Type of mel spectrogram
'power' (default) | 'magnitude'
Type of mel spectrogram, specified as the comma-separated pair consisting of 'SpectrumType' and 'power' or 'magnitude'.
Data Types: char | string
S — Mel spectrogram
Mel spectrogram, returned as a column vector, matrix, or 3-D array. The dimensions of S are L-by-M-by-N, where:
L is the number of frequency bins in each mel spectrum. NumBands and fs determine L.
M is the number of frames the audio signal is partitioned into. size(audioIn,1), WindowLength, and OverlapLength determine M.
N is the number of channels such that N = size(audioIn,2).
Trailing singleton dimensions are removed from the output S.
Data Types: single | double
F — Center frequencies of mel bandpass filters (Hz)
Center frequencies of mel bandpass filters in Hz, returned as a row vector with length size(S,1).
Data Types: single | double
T — Location of each window of audio (s)
Location of each analysis window of audio in seconds, returned as a row vector with length size(S,2). The location corresponds to the center of each window.
Data Types: single | double
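The relationship between the three outputs can be checked with a short sketch. It assumes Audio Toolbox is installed and uses a synthetic three-channel noise signal (the variable names and the 16 kHz sample rate are illustrative, not taken from the examples above).

```matlab
% Illustrative check of the output dimensions (requires Audio Toolbox).
fs = 16000;                      % assumed sample rate in Hz
audioIn = randn(2*fs,3);         % two seconds of noise, three channels

[S,F,T] = melSpectrogram(audioIn,fs);

[numBands,numFrames,numChannels] = size(S);
fprintf("S is %d-by-%d-by-%d\n",numBands,numFrames,numChannels)
fprintf("F has %d elements, T has %d elements\n",numel(F),numel(T))
```

With default settings, the first dimension of S should equal the default NumBands of 32, and the third dimension should match the number of input columns.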
The melSpectrogram function follows the general algorithm to compute a mel spectrogram as described in [1].
In this algorithm, the audio input is first buffered into frames of WindowLength number of samples. The frames are overlapped by OverlapLength number of samples. A periodic hamming window is applied to each frame, and then the frame is converted to frequency-domain representation with FFTLength number of points. The frequency-domain representation can be either magnitude or power, specified by SpectrumType. Each frame of the frequency-domain representation passes through a mel filter bank. The spectral values output from the mel filter bank are summed, and then the channels are concatenated so that each frame is transformed to a NumBands-element column vector.
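The buffering, windowing, and FFT steps described above can be sketched in base MATLAB. This is a minimal illustration, not the function's actual implementation: it assumes that trailing samples that do not fill a complete frame are dropped, uses a single-channel noise signal as a stand-in input, and writes the periodic Hamming window out explicitly so that no toolbox is needed. All variable names are hypothetical.

```matlab
% Sketch: frame, window, and transform a signal (base MATLAB, R2016b or later).
fs = 16000;                              % assumed sample rate in Hz
x  = randn(fs,1);                        % one second of noise as a stand-in signal

windowLength  = round(0.03*fs);          % 480 samples, mirrors the documented default
overlapLength = round(0.02*fs);          % 320 samples, mirrors the documented default
fftLength     = windowLength;            % mirrors the documented default
hop           = windowLength - overlapLength;   % 160 samples between frame starts

% Number of complete frames (assumption: partial final frame is discarded)
numFrames = floor((numel(x) - overlapLength)/hop);

% Periodic Hamming window written out explicitly
n   = (0:windowLength-1).';
win = 0.54 - 0.46*cos(2*pi*n/windowLength);

% Build a windowLength-by-numFrames index matrix and extract the frames
idx    = n + 1 + (0:numFrames-1)*hop;
frames = x(idx) .* win;

% One-sided spectrum of each frame
X = fft(frames,fftLength);
X = X(1:floor(fftLength/2)+1,:);

P = abs(X).^2;                           % 'power'; use abs(X) for 'magnitude'
```

Multiplying a NumBands-by-numBins filter bank matrix by P would then give one NumBands-element column per frame.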
The mel filter bank is designed as half-overlapped triangular filters equally spaced on the mel scale. NumBands controls the number of mel bandpass filters. FrequencyRange controls the band edges of the first and last filters in the mel filter bank. The filters are normalized by their bandwidths, so that if white noise is input to the system, each filter outputs an equal amount of energy.
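One way to picture this design is the base-MATLAB sketch below. It rests on two assumptions that the text does not confirm: the Hz-to-mel conversion uses the common formula 2595*log10(1 + f/700), and the bandwidth normalization scales each triangle by 2 divided by its width in Hz. The function's internal conversion and scaling may differ, and all names here are hypothetical.

```matlab
% Sketch: half-overlapped triangular mel filter bank, normalized by bandwidth.
fs        = 16000;            % assumed sample rate in Hz
fftLength = 512;              % assumed number of DFT points
numBands  = 32;               % number of mel bandpass filters
freqRange = [0 fs/2];         % band edges of the first and last filters

% Assumed Hz <-> mel conversions
hz2melFcn = @(f) 2595*log10(1 + f/700);
mel2hzFcn = @(m) 700*(10.^(m/2595) - 1);

% numBands + 2 edges equally spaced on the mel scale, mapped back to Hz
melEdges = linspace(hz2melFcn(freqRange(1)),hz2melFcn(freqRange(2)),numBands + 2);
hzEdges  = mel2hzFcn(melEdges);

% Frequencies of the one-sided DFT bins
numBins = floor(fftLength/2) + 1;
binFreq = (0:numBins-1)*fs/fftLength;

filterBank = zeros(numBands,numBins);
for k = 1:numBands
    lo  = hzEdges(k);         % lower edge = center of the previous filter
    ctr = hzEdges(k+1);       % center frequency of this filter
    hi  = hzEdges(k+2);       % upper edge = center of the next filter
    rising  = (binFreq - lo)/(ctr - lo);
    falling = (hi - binFreq)/(hi - ctr);
    tri = max(0,min(rising,falling));      % unit-height triangle
    filterBank(k,:) = tri*2/(hi - lo);     % assumed bandwidth normalization
end

centerFreq = hzEdges(2:end-1);             % one center frequency per filter
```

Because each filter's edges coincide with its neighbors' centers, adjacent triangles overlap by half. Wider filters at high frequencies receive proportionally smaller weights, which is the intent of normalizing by bandwidth.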
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.