Shift audio pitch
audioOut = shiftPitch(audioIn,nsemitones) shifts the pitch of the audio input by the specified number of semitones, nsemitones.
audioOut = shiftPitch(audioIn,nsemitones,Name,Value) specifies options using one or more Name,Value pair arguments.
Read in an audio file and listen to it.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
sound(audioIn,fs)
Increase the pitch by 3 semitones and listen to the result.
nsemitones = 3;
audioOut = shiftPitch(audioIn,nsemitones);
sound(audioOut,fs)
Decrease the pitch of the original audio by 3 semitones and listen to the result.
nsemitones = -3;
audioOut = shiftPitch(audioIn,nsemitones);
sound(audioOut,fs)
Read in an audio file and listen to it.
[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");
sound(audioIn,fs)
Convert the audio signal to a time-frequency representation using stft. Use a 512-point kbdwin with 75% overlap.
win = kbdwin(512);
overlapLength = 0.75*numel(win);
S = stft(audioIn, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "Centered",false);
Increase the pitch by 8 semitones and listen to the result. Specify the window and overlap length you used to compute the STFT.
nsemitones = 8;
lockPhase = false;
audioOut = shiftPitch(S,nsemitones, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "LockPhase",lockPhase);
sound(audioOut,fs)
Decrease the pitch of the original audio by 8 semitones and listen to the result. Specify the window and overlap length you used to compute the STFT.
nsemitones = -8;
lockPhase = false;
audioOut = shiftPitch(S,nsemitones, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "LockPhase",lockPhase);
sound(audioOut,fs)
Read in an audio file and listen to it.
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
sound(audioIn,fs)
Increase the pitch by 6 semitones and listen to the result.
nsemitones = 6;
lockPhase = false;
audioOut = shiftPitch(audioIn,nsemitones, ...
    'LockPhase',lockPhase);
sound(audioOut,fs)
To increase fidelity, set LockPhase to true. Apply pitch shifting, and listen to the results.
lockPhase = true;
audioOut = shiftPitch(audioIn,nsemitones, ...
    'LockPhase',lockPhase);
sound(audioOut,fs)
Read in the first 11.5 seconds of an audio file and listen to it.
[audioIn,fs] = audioread('Rainbow-16-8-mono-114secs.wav',[1,8e3*11.5]);
sound(audioIn,fs)
Increase the pitch by 4 semitones and apply phase locking. Listen to the results. The resulting audio has a "chipmunk effect" that sounds unnatural.
nsemitones = 4;
lockPhase = true;
audioOut = shiftPitch(audioIn,nsemitones, ...
    "LockPhase",lockPhase);
sound(audioOut,fs)
To increase fidelity, set PreserveFormants to true. Use the default cepstral order of 30. Listen to the result.
cepstralOrder = 30;
audioOut = shiftPitch(audioIn,nsemitones, ...
    "LockPhase",lockPhase, ...
    "PreserveFormants",true, ...
    "CepstralOrder",cepstralOrder);
sound(audioOut,fs)
audioIn - Input signal
Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets audioIn depends on the complexity of audioIn:
If audioIn is real, audioIn is interpreted as a time-domain signal. In this case, audioIn must be a column vector or matrix. Columns are interpreted as individual channels.
If audioIn is complex, audioIn is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the FFT length, M is the number of individual spectra, and N is the number of channels.
Data Types: single | double
Complex Number Support: Yes
nsemitones - Number of semitones to shift audio by
Number of semitones to shift the audio by, specified as a real scalar.
The range of nsemitones depends on the window length (numel(Window)) and the overlap length (OverlapLength):
-12*log2(numel(Window)-OverlapLength) ≤ nsemitones ≤ -12*log2((numel(Window)-OverlapLength)/numel(Window))
Data Types: single | double
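As a rough illustration of the range formula above, the following sketch evaluates both bounds for a 1024-point window with 75% overlap, which matches the documented defaults. The variable names windowLength, hopLength, minShift, and maxShift are illustrative only and are not part of the shiftPitch interface.

```matlab
% Sketch: evaluate the nsemitones bounds for a 1024-point window, 75% overlap.
% windowLength, hopLength, minShift, and maxShift are illustrative names only.
windowLength  = 1024;                          % numel(Window)
overlapLength = round(0.75*windowLength);      % OverlapLength (768 samples)
hopLength     = windowLength - overlapLength;  % 256 samples

minShift = -12*log2(hopLength);                % lower bound: -96 semitones
maxShift = -12*log2(hopLength/windowLength);   % upper bound: 24 semitones

fprintf('nsemitones must lie in [%g, %g]\n', minShift, maxShift);
```

A shorter hop (larger overlap) widens the upward range and narrows the downward range, because both bounds depend on numel(Window)-OverlapLength.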
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Window',kbdwin(512)
'Window' - Window applied in time domain
sqrt(hann(1024,'periodic')) (default) | real vector
Window applied in the time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1, size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.
Note
If using shiftPitch with frequency-domain input, you must specify Window as the same window used to transform audioIn to the frequency domain.
Data Types: single | double
'OverlapLength' - Number of samples overlapped between adjacent windows
round(0.75*numel(Window)) (default) | scalar in the range [0, numel(Window))
Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, numel(Window)).
Note
If using shiftPitch with frequency-domain input, you must specify OverlapLength as the same overlap length used to transform audioIn to a time-frequency representation.
Data Types: single | double
'LockPhase' - Apply identity phase locking
false (default) | true
Apply identity phase locking, specified as the comma-separated pair consisting of 'LockPhase' and false or true.
Data Types: logical
'PreserveFormants' - Preserve formants
false (default) | true
Preserve formants, specified as the comma-separated pair consisting of 'PreserveFormants' and true or false. Formant preservation is attempted using spectral envelope estimation with cepstral analysis.
Data Types: logical
'CepstralOrder' - Cepstral order used for formant preservation
30 (default) | nonnegative integer
Cepstral order used for formant preservation, specified as the comma-separated pair consisting of 'CepstralOrder' and a nonnegative integer.
To enable this name-value pair argument, set PreserveFormants to true.
Data Types: single | double
audioOut - Pitch-shifted audio
Pitch-shifted audio, returned as a column vector or matrix of independent channels.
To apply pitch shifting, shiftPitch modifies the time scale of audio using a phase vocoder and then resamples the modified audio. The time-scale modification algorithm is based on [1] and [2] and is implemented as in stretchAudio.
After time-scale modification, shiftPitch performs sample rate conversion using an interpolation factor equal to the analysis hop length and a decimation factor equal to the synthesis hop length. The interpolation and decimation factors of the resampling stage are selected as follows: The analysis hop length is determined as analysisHopLength = numel(Window)-OverlapLength. The shiftPitch function assumes that there are 12 semitones in an octave, so the speedup factor used to stretch the audio is speedupFactor = 2^(-nsemitones/12). The speedup factor and analysis hop length determine the synthesis hop length for time-scale modification as synthesisHopLength = round((1/speedupFactor)*analysisHopLength).
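To make the hop-length relationships concrete, the sketch below plugs a 3-semitone shift and a 1024-point window with 75% overlap into the formulas above. It only reproduces the documented arithmetic; it is not the toolbox implementation, and realizedShift is a hypothetical back-computation added for illustration.

```matlab
% Sketch: hop lengths implied by a requested shift, using the formulas above.
nsemitones    = 3;                          % requested shift in semitones
windowLength  = 1024;                       % numel(Window)
overlapLength = round(0.75*windowLength);   % OverlapLength (768 samples)

analysisHopLength  = windowLength - overlapLength;               % 256
speedupFactor      = 2^(-nsemitones/12);                         % about 0.841
synthesisHopLength = round((1/speedupFactor)*analysisHopLength); % 304

% Because the synthesis hop is rounded to an integer, the shift implied by
% the two hop lengths can differ slightly from the request (illustration only).
realizedShift = -12*log2(analysisHopLength/synthesisHopLength);  % about 2.98
```

Under these formulas, a positive nsemitones yields a synthesis hop longer than the analysis hop, so the audio is first stretched and then resampled back to its original duration at a higher pitch.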
The achievable pitch shift is determined by the window length (numel(Window)) and OverlapLength. To see the relationship, note that the equation for speedup factor can be rewritten as nsemitones = -12*log2(speedupFactor), and the equation for synthesis hop length can be rewritten as speedupFactor = analysisHopLength/synthesisHopLength. Using simple substitution, nsemitones = -12*log2(analysisHopLength/synthesisHopLength). The practical range of a synthesis hop length is [1, numel(Window)]. The range of achievable pitch shifts is:
Max number of semitones lowered: -12*log2(numel(Window)-OverlapLength)
Max number of semitones raised: -12*log2((numel(Window)-OverlapLength)/numel(Window))
Pitch shifting can alter the spectral envelope of the pitch-shifted signal. To diminish this effect, you can set PreserveFormants to true.
If PreserveFormants is set to true, the algorithm attempts to estimate the spectral envelope using an iterative procedure in the cepstral domain, as described in [3] and [4]. For both the original spectrum, X, and the pitch-shifted spectrum, Y, the algorithm estimates the spectral envelope as follows.
For the first iteration, EnvXa is set to X. Then, the algorithm repeats these two steps in a loop:
Lowpass filters the cepstral representation of EnvXa to get a new estimate, EnvXb. The CepstralOrder parameter controls the quefrency bandwidth.
To update the current best fit, the algorithm takes the element-by-element maximum of the current spectral envelope estimate and the previous spectral envelope estimate: EnvXa = max(EnvXa,EnvXb).
The loop ends if either a maximum number of iterations (100) is reached, or if all bins of the estimated log envelope are within a given tolerance of the original log spectrum. The tolerance is set to log(10^(1/20)).
Finally, the algorithm scales the spectrum of the pitch-shifted audio by the ratio of estimated envelopes, element-wise: Y.*(EnvX./EnvY), where EnvX and EnvY are the final envelope estimates of X and Y.
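The iterative envelope estimate described above can be sketched in base MATLAB as follows. This is one possible reading of the steps, not the toolbox implementation: the test frame, the rectangular lifter, the FFT length, and the exact form of the stopping test are all assumptions made for illustration, and the names logX, lifter, envA, and envB are hypothetical.

```matlab
% Sketch of iterative cepstral envelope estimation for a single frame.
% Assumptions: synthetic noise frame, rectangular lifter, and a stopping
% test that checks whether the log spectrum exceeds the envelope by more
% than the tolerance in any bin.
rng(0);                                   % reproducible illustration
N = 1024;                                 % FFT length (assumed)
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);       % periodic Hann window, base MATLAB
logX = log(abs(fft(randn(N,1).*w)) + eps);% log-magnitude spectrum of the frame

cepstralOrder = 30;                       % documented default
maxIter = 100;                            % documented iteration limit
tol = log(10^(1/20));                     % documented tolerance

% Lowpass lifter: keep quefrencies 0..cepstralOrder and their mirror images.
lifter = zeros(N,1);
lifter([1:cepstralOrder+1, N-cepstralOrder+1:N]) = 1;

envA = logX;                              % first iteration starts from the spectrum
for iter = 1:maxIter
    cep  = real(ifft(envA));              % cepstral representation of envA
    envB = real(fft(cep.*lifter));        % lowpass-liftered estimate
    if all(logX - envB <= tol)            % assumed form of the tolerance test
        break
    end
    envA = max(envA, envB);               % element-by-element maximum update
end
% envB holds the log-envelope estimate for this frame.
```

Running the same procedure on the pitch-shifted spectrum gives a second envelope, and the ratio of the two (after exponentiating the log envelopes) would provide the per-bin scaling applied in the final step.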
[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." Applied Sciences. Vol. 6, Issue 2, 2016.
[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's Thesis. Saarland University, Saarbrücken, Germany, 2011.
[3] Roebel, Axel, and Xavier Rodet. "Efficient Spectral Envelope Estimation and Its Application to Pitch Shifting and Envelope Preservation." International Conference on Digital Audio Effects, pp. 30–35. Madrid, Spain, September 2005. hal-01161334
[4] Imai, S., and Y. Abe. "Spectral Envelope Extraction by Improved Cepstral Method." Electron. and Commun. in Japan. Vol. 62-A, Issue 4, 1979, pp. 10–17.
Using gpuArray (Parallel Computing Toolbox) input with the shiftPitch function is only recommended for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Support by Release (Parallel Computing Toolbox).
For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
audioDataAugmenter | audioTimeScaler | reverberator | stretchAudio