Time-stretch audio
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
sound(audioIn,fs)
Use stretchAudio to apply a speedup factor of 1.5. Listen to the modified audio signal and plot it over time. The sample rate remains the same, but the duration of the signal has decreased.
audioOut = stretchAudio(audioIn,1.5);
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 1.5')
axis tight
grid on
sound(audioOut,fs)
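As a rough check on the change in duration, the expected output length can be estimated by dividing the input duration by the speedup factor. The sketch below assumes a 15-second input sampled at 44.1 kHz (values inferred from the file name, not read from the file) and ignores small differences caused by windowing at the signal edges.

```matlab
% Assumed values, taken from the file name rather than from the file itself.
fs = 44100;          % sample rate in Hz
durationIn = 15;     % input duration in seconds
alpha = 1.5;         % speedup factor

% The output duration scales approximately as 1/alpha.
durationOut = durationIn/alpha;
numSamplesOut = round(durationOut*fs);
fprintf('Expected duration: about %.1f s (%d samples)\n',durationOut,numSamplesOut)
```

To compare the estimate with the actual result, evaluate size(audioOut,1)/fs after calling stretchAudio.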
Slow down the original audio signal by applying a factor of 0.75. Listen to the modified audio signal and plot it over time. The sample rate remains the same as the original audio, but the duration of the signal has increased.
audioOut = stretchAudio(audioIn,0.75);
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 0.75')
axis tight
grid on
sound(audioOut,fs)
stretchAudio supports time-scale modification (TSM) on frequency-domain audio when using the default vocoder method. Applying TSM to frequency-domain audio enables you to reuse your STFT computation for multiple TSM factors.
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
sound(audioIn,fs)
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
Convert the audio signal to the frequency domain.
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);
Speed up the audio signal by a factor of 1.4. Specify the window and overlap length used to create the frequency-domain representation.
alpha = 1.4;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 1.4')
axis tight
grid on
Slow down the audio signal by a factor of 0.8. Specify the window and overlap length used to create the frequency-domain representation.
alpha = 0.8;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 0.8')
axis tight
grid on
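Because the STFT is computed only once, the same frequency-domain representation can be passed to stretchAudio for several TSM factors in a loop. The following is a minimal sketch that uses only the calls shown in this example; the variable names alphas and outputs are illustrative and not part of the function's interface.

```matlab
% Compute the STFT once, using the same file and settings as the example above.
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);

% Reuse S for several TSM factors (illustrative values).
alphas = [0.8 1.2 1.4];
outputs = cell(size(alphas));
for k = 1:numel(alphas)
    outputs{k} = stretchAudio(S,alphas(k),'Window',win,'OverlapLength',ovrlp);
end
```

Each cell of outputs then holds one time-domain result, which can be played with sound(outputs{k},fs).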
The default TSM method (vocoder) enables you to additionally apply phase-locking to increase the fidelity to the original audio.
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");
sound(audioIn,fs)
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
Phase-locking adds a nontrivial computational load to TSM and is not always required. By default, phase-locking is disabled. Apply a speedup factor of 1.8 to the input audio signal. Listen to the audio signal and plot it over time.
alpha = 1.8;
tic
audioOut = stretchAudio(audioIn,alpha);
processingTimeWithoutPhaseLocking = toc
processingTimeWithoutPhaseLocking = 0.0798
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = false')
axis tight
grid on
Apply the same 1.8 speedup factor to the input audio signal, this time enabling phase-locking. Listen to the audio signal and plot it over time.
tic
audioOut = stretchAudio(audioIn,alpha,"LockPhase",true);
processingTimeWithPhaseLocking = toc
processingTimeWithPhaseLocking = 0.1154
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = true')
axis tight
grid on
The waveform similarity overlap-add (WSOLA) TSM method enables you to specify the maximum number of samples to search for the best signal alignment. By default, WSOLA delta is the number of samples in the analysis window minus the number of samples overlapped between adjacent analysis windows. Increasing the WSOLA delta increases the computational load but might also increase fidelity.
Read in an audio signal. Listen to the first 10 seconds of the audio signal.
[audioIn,fs] = audioread('RockGuitar-16-96-stereo-72secs.flac');
sound(audioIn(1:10*fs,:),fs)
Apply a TSM factor of 0.75 to the input audio signal using the WSOLA method. Listen to the first 10 seconds of the resulting audio signal.
alpha = 0.75;
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola");
processingTimeWithDefaultWSOLADelta = toc
processingTimeWithDefaultWSOLADelta = 19.4403
sound(audioOut(1:10*fs,:),fs)
Apply a TSM factor of 0.75 to the input audio signal, this time increasing the WSOLA delta to 1024. Listen to the first 10 seconds of the resulting audio signal.
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola","WSOLADelta",1024);
processingTimeWithIncreasedWSOLADelta = toc
processingTimeWithIncreasedWSOLADelta = 25.5306
sound(audioOut(1:10*fs,:),fs)
audioIn: Input signal
Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets audioIn depends on the complexity of audioIn and the value of Method:
If audioIn is real, audioIn is interpreted as a time-domain signal. In this case, audioIn must be a column vector or matrix. Columns are interpreted as individual channels. This syntax applies when Method is set to 'vocoder' or 'wsola'.
If audioIn is complex, audioIn is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the FFT length, M is the number of individual spectra, and N is the number of channels. This syntax applies only when Method is set to 'vocoder'.
Data Types: single | double
Complex Number Support: Yes
alpha: TSM factor
TSM factor, specified as a positive scalar.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Window',kbdwin(512)
'Method': Method used to time-scale audio
'vocoder' (default) | 'wsola'
Method used to time-scale audio, specified as the comma-separated pair consisting of 'Method' and 'vocoder' or 'wsola'. Set 'Method' to 'vocoder' to use the phase vocoder method. Set 'Method' to 'wsola' to use the WSOLA method.
If 'Method' is set to 'vocoder', audioIn can be real or complex. If 'Method' is set to 'wsola', audioIn must be real.
Data Types: char | string
'Window': Window applied in time domain
sqrt(hann(1024,'periodic')) (default) | real vector
Window applied in the time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1, size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.
Note
If using stretchAudio with frequency-domain input, you must specify Window as the same window used to transform audioIn to the frequency domain.
Data Types: single | double
'OverlapLength': Number of samples overlapped between adjacent windows
round(0.75*numel(Window)) (default) | scalar in the range [0, numel(Window))
Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, numel(Window)).
Note
If using stretchAudio with frequency-domain input, you must specify OverlapLength as the same overlap length used to transform audioIn to a time-frequency representation.
Data Types: single | double
'LockPhase': Apply identity phase-locking
false (default) | true
Apply identity phase-locking, specified as the comma-separated pair consisting of 'LockPhase' and false or true.
To enable this name-value pair argument, set Method to 'vocoder'.
Data Types: logical
'WSOLADelta': Maximum samples used to search for best signal alignment
numel(Window)-OverlapLength (default) | nonnegative scalar
Maximum number of samples used to search for the best signal alignment, specified as the comma-separated pair consisting of 'WSOLADelta' and a nonnegative scalar.
To enable this name-value pair argument, set Method to 'wsola'.
Data Types: single | double
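To make the documented defaults concrete, the following sketch spells them out explicitly in a WSOLA call. It assumes the defaults are exactly as listed in the argument descriptions above, so the call is expected to behave like one that omits these arguments; the file name is borrowed from the first example, and the variable names are illustrative.

```matlab
% Read a time-domain signal (file name taken from the first example).
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
alpha = 0.75;

% Defaults as documented above, written out explicitly.
win = sqrt(hann(1024,'periodic'));
overlapLength = round(0.75*numel(win));     % 768 samples
wsolaDelta = numel(win) - overlapLength;    % 256 samples

audioOut = stretchAudio(audioIn,alpha,'Method','wsola', ...
    'Window',win,'OverlapLength',overlapLength,'WSOLADelta',wsolaDelta);
```

Writing the values out this way makes it easier to vary one parameter, such as WSOLADelta, while holding the others fixed.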
audioOut: Time-scale modified audio
Time-scale modified audio, returned as a column vector or matrix of independent channels.
The phase vocoder algorithm is a frequency-domain approach to TSM [1][2]. The basic steps of the phase vocoder algorithm are:
The algorithm windows a time-domain signal at interval η, where η = numel(Window) - OverlapLength. The windows are then converted to the frequency domain.
To preserve horizontal (across time) phase coherence, the algorithm treats each bin as an independent sinusoid whose phase is computed by accumulating the estimates of its instantaneous frequency.
To preserve vertical (across an individual spectrum) phase coherence, the algorithm locks the phase advance of groups of bins to the phase advance of local peaks. This step applies only if LockPhase is set to true.
The algorithm returns the modified spectrogram to the time domain, with windows spaced at intervals of δ, where δ ≈ η/α. α is the speedup factor specified by the alpha input argument.
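The relationship between the analysis interval η and the synthesis interval δ can be illustrated numerically. The sketch below assumes the default window length (1024) and overlap length (768) and evaluates δ ≈ η/α for a few illustrative factors. How the function rounds δ to a whole number of samples is not stated here, so the values are left unrounded.

```matlab
% Assumed defaults: 1024-sample window with 75% overlap.
windowLength = 1024;
overlapLength = round(0.75*windowLength);   % 768
eta = windowLength - overlapLength;         % analysis interval: 256 samples

% Synthesis interval for several illustrative TSM factors.
alphas = [0.5 0.75 1 1.5 2];
deltas = eta./alphas;
for k = 1:numel(alphas)
    fprintf('alpha = %.2f  ->  delta is about %.2f samples\n',alphas(k),deltas(k))
end
```

Factors below 1 give δ larger than η, which spreads the windows apart and lengthens the signal. Factors above 1 give δ smaller than η and shorten it.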
The WSOLA algorithm is a time-domain approach to TSM [1][2]. WSOLA is an extension of the overlap and add (OLA) algorithm. In the OLA algorithm, a time-domain signal is windowed at interval η, where η = numel(Window) - OverlapLength. To construct the time-scale modified output audio, the windows are spaced at interval δ, where δ ≈ η/α. α is the TSM factor specified by the alpha input argument.
The OLA algorithm does a good job of recreating the magnitude spectra but can introduce phase jumps between windows. The WSOLA algorithm attempts to smooth the phase jumps by searching WSOLADelta samples around the η interval for a window that minimizes phase jumps. The algorithm searches for the best window iteratively, so that each successive window is chosen relative to the previously selected window.
If WSOLADelta is set to 0, then the algorithm reduces to OLA.
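The plain OLA case described above can be sketched in a few lines of base MATLAB. This is an illustrative sketch under stated assumptions, not the implementation used by stretchAudio: the test tone, the rounding of δ, and the normalization by the summed window are all choices made for this example, and the periodic Hann window is computed by formula to avoid a toolbox dependency.

```matlab
% Illustrative OLA time-scale modification (no alignment search).
fs = 8000;
x = sin(2*pi*220*(0:fs-1)'/fs);       % 1-second test tone (assumed input)
alpha = 1.5;                          % TSM (speedup) factor

N = 256;                              % window length
n = (0:N-1)';
win = 0.5 - 0.5*cos(2*pi*n/N);        % periodic Hann window
overlapLength = 192;

eta = N - overlapLength;              % analysis interval
delta = round(eta/alpha);             % synthesis interval, delta ~ eta/alpha

numFrames = floor((numel(x) - N)/eta) + 1;
y = zeros((numFrames-1)*delta + N,1);
wsum = zeros(size(y));
for m = 0:numFrames-1
    frame = x(m*eta + (1:N)).*win;    % window the input at interval eta
    idx = m*delta + (1:N);            % place the window at interval delta
    y(idx) = y(idx) + frame;          % overlap and add
    wsum(idx) = wsum(idx) + win;      % track the accumulated window gain
end
y = y./max(wsum,eps);                 % compensate for overlapping windows
```

The output is roughly 1/alpha times as long as the input. A WSOLA variant would additionally shift each analysis position by up to WSOLADelta samples to find the best-aligned window before adding it.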
[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." Applied Sciences. Vol. 6, Issue 2, 2016.
[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's thesis, Saarland University, Saarbrücken, Germany, 2011.
Using gpuArray (Parallel Computing Toolbox) input with the stretchAudio function is only recommended for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Support by Release (Parallel Computing Toolbox).
For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
audioDataAugmenter | audioTimeScaler | reverberator | shiftPitch