This example shows how to train and use a generative adversarial network (GAN) to generate sounds.
In generative adversarial networks, a generator and a discriminator compete against each other to improve the generation quality.
GANs have generated significant interest in the field of audio and speech processing. Applications include text-to-speech synthesis, voice conversion, and speech enhancement.
This example trains a GAN for unsupervised synthesis of audio waveforms. The GAN in this example generates drumbeat sounds. The same approach can be followed to generate other types of sound, including speech.
Before you train a GAN from scratch, you will use a pretrained GAN generator to synthesize drum beats.
Download the pretrained generator.
matFileName = 'drumGeneratorWeights.mat';
if ~exist(matFileName,'file')
    websave(matFileName,'https://www.mathworks.com/supportfiles/audio/GanAudioSynthesis/drumGeneratorWeights.mat');
end
The function synthesizeDrumBeat calls a pretrained network to synthesize a drumbeat sampled at 16 kHz. The synthesizeDrumBeat function is included at the end of this example.
Synthesize a drumbeat and listen to it.
drum = synthesizeDrumBeat;
fs = 16e3;
sound(drum,fs)
Plot the synthesized drumbeat.
t = (0:length(drum)-1)/fs;
plot(t,drum)
grid on
xlabel('Time (s)')
title('Synthesized Drum Beat')
You can use the drumbeat synthesizer with other audio effects to create more complex applications. For example, you can apply reverberation to the synthesized drum beats.
Create a reverberator (Audio Toolbox) object and open its parameter tuner UI. This UI enables you to tune the reverberator parameters as the simulation runs.
reverb = reverberator('SampleRate',fs);
parameterTuner(reverb);
Create a dsp.TimeScope (DSP System Toolbox) object to visualize the drum beats.
ts = dsp.TimeScope('SampleRate',fs, ...
    'TimeSpanOverrunAction','Scroll', ...
    'TimeSpan',10, ...
    'BufferLength',10*256*64, ...
    'ShowGrid',true, ...
    'YLimits',[-1 1]);
In a loop, synthesize the drum beats and apply reverberation. Use the parameter tuner UI to tune reverberation. If you want to run the simulation for a longer time, increase the value of the loopCount parameter.
loopCount = 20;
for ii = 1:loopCount
    drum = synthesizeDrumBeat;
    drum = reverb(drum);
    ts(drum(:,1));
    soundsc(drum,fs)
    pause(0.5)
end
Now that you have seen the pretrained drumbeat generator in action, you can investigate the training process in detail.
A GAN is a type of deep learning network that generates data with characteristics similar to the training data.
A GAN consists of two networks that train together, a generator and a discriminator:
Generator - Given a vector of random values as input, this network generates data with the same structure as the training data. It is the generator's job to fool the discriminator.
Discriminator - Given batches of data containing observations from both the training data and the generated data, this network attempts to classify the observations as real or generated.
To maximize the performance of the generator, maximize the loss of the discriminator when given generated data. That is, the objective of the generator is to generate data that the discriminator classifies as real. To maximize the performance of the discriminator, minimize the loss of the discriminator when given batches of both real and generated data. Ideally, these strategies result in a generator that generates convincingly realistic data and a discriminator that has learned strong feature representations that are characteristic of the training data.
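The two objectives can also be written in symbols. The notation here is introduced only for illustration: D(x) is the raw score (logit) that the discriminator returns, G(z) is the generator output for a latent vector z, and σ is the logistic sigmoid. The expressions summarize the sigmoid cross-entropy losses implemented in the ganDiscriminatorLoss and ganGeneratorLoss helper functions at the end of this example, with expectations replaced by mini-batch means in practice.

```latex
\begin{aligned}
L_D &= -\tfrac{1}{2}\Big( \mathbb{E}_{x}\big[\log \sigma(D(x))\big]
        + \mathbb{E}_{z}\big[\log\big(1-\sigma(D(G(z)))\big)\big] \Big) \\
L_G &= -\,\mathbb{E}_{z}\big[\log \sigma(D(G(z)))\big]
\end{aligned}
```

Minimizing L_G is the commonly used non-saturating stand-in for maximizing the discriminator loss on generated data: both push the discriminator toward classifying generated samples as real.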
In this example, you train the generator to create fake time-frequency short-time Fourier transform (STFT) representations of drum beats. You train the discriminator to identify real STFTs. You create the real STFTs by computing the STFT of short recordings of real drum beats.
Train a GAN using the Drum Sound Effects dataset [1]. Download and extract the dataset.
url = 'http://deepyeti.ucsd.edu/cdonahue/wavegan/data/drums.tar.gz';
downloadFolder = tempdir;
filename = fullfile(downloadFolder,'drums_dataset.tgz');
drumsFolder = fullfile(downloadFolder,'drums');
if ~exist(drumsFolder,'dir')
    disp('Downloading Drum Sound Effects Dataset (218 MB)...')
    websave(filename,url);
    untar(filename,downloadFolder)
end
Create an audioDatastore (Audio Toolbox) object that points to the drums dataset.
ads = audioDatastore(drumsFolder,'IncludeSubfolders',true);
Define a network that generates STFTs from 1-by-1-by-100 arrays of random values. Create a network that upscales 1-by-1-by-100 arrays to 128-by-128-by-1 arrays using a fully connected layer followed by a reshape layer and a series of transposed convolution layers with ReLU layers.
This figure shows the dimensions of the signal as it travels through the generator. The generator architecture is defined in Table 4 of [1].
The generator network is defined in modelGenerator, which is included at the end of this example.
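The size progression through the generator can be checked with a few lines of arithmetic. This is only a sketch: it assumes that each stride-2 transposed convolution with 'same' cropping doubles the spatial size, and it reads the channel counts off the weight shapes in initializeGeneratorWeights. The variable names are illustrative and do not appear elsewhere in the example.

```matlab
% Sketch: activation sizes after the reshape and after each of the five
% transposed convolutions in the generator (illustrative names).
dim = 64;                                        % base channel count used by the weight initializers
numUpsamplingLayers = 5;                         % Conv1 through Conv5
spatialSize = 4 * 2.^(0:numUpsamplingLayers);    % 4 8 16 32 64 128
numChannels = [dim*[16 8 4 2 1] 1];              % 1024 512 256 128 64 1

for k = 1:numel(spatialSize)
    fprintf('%3d-by-%3d-by-%4d\n',spatialSize(k),spatialSize(k),numChannels(k));
end
```

The first entry corresponds to the 4-by-4-by-1024 array produced by the fully connected and reshape steps, and the last entry to the 128-by-128-by-1 STFT image.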
Define a network that classifies real and generated 128-by-128 STFTs.
Create a network that takes 128-by-128 images and outputs a scalar prediction score using a series of convolution layers with leaky ReLU layers followed by a fully connected layer.
This figure shows the dimensions of the signal as it travels through the discriminator. The discriminator architecture is defined in Table 5 of [1].
The discriminator network is defined in modelDiscriminator, which is included at the end of this example.
Generate STFT data from the drumbeat signals in the datastore.
Define the STFT parameters.
fftLength = 256;
win = hann(fftLength,'periodic');
overlapLength = 128;
To speed up processing, distribute the feature extraction across multiple workers using parfor.
First, determine the number of partitions for the dataset. If you do not have Parallel Computing Toolbox™, use a single partition.
if ~isempty(ver('parallel'))
    pool = gcp;
    numPar = numpartitions(ads,pool);
else
    numPar = 1;
end
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6).
For each partition, read from the datastore and compute the STFT.
parfor ii = 1:numPar

    subds = partition(ads,numPar,ii);
    STrain = zeros(fftLength/2+1,128,1,numel(subds.Files));

    for idx = 1:numel(subds.Files)

        x = read(subds);

        if length(x) > fftLength*64
            % Truncate the signal if it is too long
            x = x(1:fftLength*64);
        end

        % Convert from double-precision to single-precision
        x = single(x);

        % Scale the signal
        x = x ./ max(abs(x));

        % Zero-pad to ensure stft returns 128 windows.
        x = [x ; zeros(overlapLength,1,'like',x)];

        S0 = stft(x,'Window',win,'OverlapLength',overlapLength,'Centered',false);

        % Convert from two-sided to one-sided.
        S = S0(1:129,:);
        S = abs(S);

        STrain(:,:,:,idx) = S;
    end
    STrainC{ii} = STrain;
end
Convert the output to a four-dimensional array with STFTs along the fourth dimension.
STrain = cat(4,STrainC{:});
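The zero-padding in the feature extraction loop exists to make each STFT exactly 128 frames wide. The arithmetic below is a sketch of why, assuming the usual frame-count formula for a non-centered STFT, floor((signalLength - overlapLength)/hopLength), and a signal of fftLength*64 samples as in the truncation step. The variable names are illustrative.

```matlab
% Sketch: number of STFT frames with and without the zero-padding step.
fftLength = 256;
overlapLength = 128;
hopLength = fftLength - overlapLength;            % 128 samples between frames

signalLength = fftLength*64;                      % 16384 samples after truncation
paddedLength = signalLength + overlapLength;      % 16512 samples after zero-padding

framesWithoutPadding = floor((signalLength - overlapLength)/hopLength);
framesWithPadding    = floor((paddedLength - overlapLength)/hopLength);

fprintf('Frames without padding: %d\n',framesWithoutPadding);
fprintf('Frames with padding:    %d\n',framesWithPadding);
```

Without the extra overlapLength zeros the STFT would have 127 frames, which would not fill the 128 columns preallocated in STrain.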
Convert the data to the log scale to better align with human perception.
STrain = log(STrain + 1e-6);
Normalize training data to have zero mean and unit standard deviation.
Compute the STFT mean and standard deviation of each frequency bin.
SMean = mean(STrain,[2 3 4]);
SStd = std(STrain,1,[2 3 4]);
Normalize each frequency bin.
STrain = (STrain-SMean)./SStd;
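The per-bin normalization relies on implicit expansion: the mean and standard deviation are column vectors with one entry per frequency bin, and MATLAB broadcasts them across the time, channel, and observation dimensions. The toy example below is a sketch on synthetic data (three hypothetical "frequency bins" with deliberately different scales), not on the drum dataset.

```matlab
% Sketch: per-frequency-bin normalization on a small synthetic 4-D array.
rng(0)                                              % reproducible toy data
X = randn(3,4,1,5) .* [1; 10; 100] + [0; 5; -2];    % bins with different scales and offsets

XMean = mean(X,[2 3 4]);                            % 3-by-1, one mean per bin
XStd  = std(X,1,[2 3 4]);                           % 3-by-1, one standard deviation per bin
XNorm = (X - XMean)./XStd;                          % implicit expansion over dims 2 to 4

disp(mean(XNorm,[2 3 4]).')                         % approximately zero for every bin
disp(std(XNorm,1,[2 3 4]).')                        % approximately one for every bin
```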
The computed STFTs have unbounded values. Following the approach in [1], make the data bounded by clipping the spectra to 3 standard deviations and rescaling to [-1 1].
STrain = STrain/3;
Y = reshape(STrain,numel(STrain),1);
Y(Y<-1) = -1;
Y(Y>1) = 1;
STrain = reshape(Y,size(STrain));
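To see what the clipping does, consider a handful of normalized values expressed in standard deviations. The sketch below uses made-up numbers and also shows an equivalent one-line form with min and max; that one-liner is an alternative written for illustration, not the code used in this example.

```matlab
% Sketch: clip to 3 standard deviations and rescale to [-1 1] on toy values.
X = single([-4.5 -3 -1.5 0 1.5 3 6]);      % toy normalized values (in standard deviations)

% Logical-indexing form, as in the example
Y = X/3;
Y(Y<-1) = -1;
Y(Y>1) = 1;

% Equivalent one-line form (illustrative alternative)
Yalt = max(min(X/3,1),-1);

disp(Y)
disp(isequal(Y,Yalt))
```

Values beyond 3 standard deviations saturate at -1 or 1, and everything in between is scaled linearly.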
Discard the last frequency bin to force the number of STFT bins to a power of two (which works well with convolutional layers).
STrain = STrain(1:end-1,:,:,:);
Permute the dimensions in preparation for feeding to the discriminator.
STrain = permute(STrain,[2 1 3 4]);
Train with a mini-batch size of 64 for 1000 epochs.
maxEpochs = 1000;
miniBatchSize = 64;
Compute the number of iterations required to consume the data.
numIterationsPerEpoch = floor(size(STrain,4)/miniBatchSize);
Specify the options for Adam optimization. Set the learn rate of the generator and discriminator to 0.0002. For both networks, use a gradient decay factor of 0.5 and a squared gradient decay factor of 0.999.
learnRateGenerator = 0.0002;
learnRateDiscriminator = 0.0002;
gradientDecayFactor = 0.5;
squaredGradientDecayFactor = 0.999;
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA-enabled NVIDIA GPU with compute capability of 3.0 or higher.
executionEnvironment = "auto";
Initialize the generator and discriminator weights. The initializeGeneratorWeights and initializeDiscriminatorWeights functions return random weights obtained using Glorot uniform initialization. The functions are included at the end of this example.
generatorParameters = initializeGeneratorWeights;
discriminatorParameters = initializeDiscriminatorWeights;
Train the model using a custom training loop. Loop over the training data and update the network parameters at each iteration.
For each epoch, shuffle the training data and loop over mini-batches of data.
For each mini-batch:
Generate a dlarray object containing an array of random values for the generator network.
For GPU training, convert the data to a gpuArray (Parallel Computing Toolbox) object.
Evaluate the model gradients using dlfeval and the helper functions modelDiscriminatorGradients and modelGeneratorGradients.
Update the network parameters using the adamupdate function.
Initialize the parameters for Adam.
trailingAvgGenerator = [];
trailingAvgSqGenerator = [];
trailingAvgDiscriminator = [];
trailingAvgSqDiscriminator = [];
You can set saveCheckpoints to true to save the updated weights and states to a MAT file every ten epochs. You can then use this MAT file to resume training if it is interrupted. For the purpose of this example, set saveCheckpoints to false.
saveCheckpoints = false;
Specify the length of the generator input.
numLatentInputs = 100;
Train the GAN. This can take multiple hours to run.
iteration = 0;

for epoch = 1:maxEpochs

    % Shuffle the data.
    idx = randperm(size(STrain,4));
    STrain = STrain(:,:,:,idx);

    % Loop over mini-batches.
    for index = 1:numIterationsPerEpoch

        iteration = iteration + 1;

        % Read mini-batch of data.
        dlX = STrain(:,:,:,(index-1)*miniBatchSize+1:index*miniBatchSize);
        dlX = dlarray(dlX,'SSCB');

        % Generate latent inputs for the generator network.
        Z = 2 * ( rand(1,1,numLatentInputs,miniBatchSize,'single') - 0.5 );
        dlZ = dlarray(Z);

        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlZ = gpuArray(dlZ);
            dlX = gpuArray(dlX);
        end

        % Evaluate the discriminator gradients using dlfeval and the
        % modelDiscriminatorGradients helper function.
        gradientsDiscriminator = ...
            dlfeval(@modelDiscriminatorGradients,discriminatorParameters,generatorParameters,dlX,dlZ);

        % Update the discriminator network parameters.
        [discriminatorParameters,trailingAvgDiscriminator,trailingAvgSqDiscriminator] = ...
            adamupdate(discriminatorParameters,gradientsDiscriminator, ...
            trailingAvgDiscriminator,trailingAvgSqDiscriminator,iteration, ...
            learnRateDiscriminator,gradientDecayFactor,squaredGradientDecayFactor);

        % Generate latent inputs for the generator network.
        Z = 2 * ( rand(1,1,numLatentInputs,miniBatchSize,'single') - 0.5 );
        dlZ = dlarray(Z);

        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlZ = gpuArray(dlZ);
        end

        % Evaluate the generator gradients using dlfeval and the
        % modelGeneratorGradients helper function.
        gradientsGenerator = ...
            dlfeval(@modelGeneratorGradients,discriminatorParameters,generatorParameters,dlZ);

        % Update the generator network parameters.
        [generatorParameters,trailingAvgGenerator,trailingAvgSqGenerator] = ...
            adamupdate(generatorParameters,gradientsGenerator, ...
            trailingAvgGenerator,trailingAvgSqGenerator,iteration, ...
            learnRateGenerator,gradientDecayFactor,squaredGradientDecayFactor);
    end

    % Every 10 epochs, save a training snapshot to a MAT file.
    if saveCheckpoints && mod(epoch,10)==0
        fprintf('Epoch %d out of %d complete\n',epoch,maxEpochs);
        % Save checkpoint in case training is interrupted.
        save('audiogancheckpoint.mat',...
            'generatorParameters','discriminatorParameters',...
            'trailingAvgDiscriminator','trailingAvgSqDiscriminator',...
            'trailingAvgGenerator','trailingAvgSqGenerator','iteration');
    end
end
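The example saves checkpoints but does not show how to resume from one. The snippet below is a hedged sketch of one way to do it, run before the training loop. It assumes the checkpoint file name and variable names used in the save call, and it assumes that numIterationsPerEpoch has not changed since the checkpoint was written. The startEpoch variable is hypothetical: to use it you would change the outer loop to run from startEpoch to maxEpochs and skip the iteration = 0 reset.

```matlab
% Sketch: restore training state from a checkpoint if one exists.
checkpointFile = 'audiogancheckpoint.mat';   % name used by the save call in the training loop
startEpoch = 1;                              % hypothetical variable, not part of the example

if exist(checkpointFile,'file') && exist('numIterationsPerEpoch','var')
    s = load(checkpointFile);

    % Network weights
    generatorParameters = s.generatorParameters;
    discriminatorParameters = s.discriminatorParameters;

    % Adam optimizer state
    trailingAvgGenerator = s.trailingAvgGenerator;
    trailingAvgSqGenerator = s.trailingAvgSqGenerator;
    trailingAvgDiscriminator = s.trailingAvgDiscriminator;
    trailingAvgSqDiscriminator = s.trailingAvgSqDiscriminator;

    % Global iteration counter, needed by adamupdate for bias correction
    iteration = s.iteration;

    % Checkpoints are written at the end of an epoch, so resume at the next one.
    startEpoch = floor(iteration/numIterationsPerEpoch) + 1;
end
```

Restoring the trailing averages and the iteration counter along with the weights keeps the Adam updates consistent with the interrupted run.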
Now that you have trained the network, you can investigate the synthesis process in more detail.
The trained drumbeat generator synthesizes short-time Fourier transform (STFT) matrices from input arrays of random values. An inverse STFT (ISTFT) operation converts the time-frequency STFT to a synthesized time-domain audio signal.
Load the weights of a pretrained generator. These weights were obtained by running the training described in the previous section for 1000 epochs.
load(matFileName,'generatorParameters','SMean','SStd');
The generator takes 1-by-1-by-100 vectors of random values as an input. Generate a sample input vector.
numLatentInputs = 100;
dlZ = dlarray(2 * ( rand(1,1,numLatentInputs,1,'single') - 0.5 ));
Pass the random vector to the generator to create an STFT image. generatorParameters is a structure containing the weights of the pretrained generator.
dlXGenerated = modelGenerator(dlZ,generatorParameters);
Convert the STFT dlarray to a single-precision matrix.
S = dlXGenerated.extractdata;
Transpose the STFT to align its dimensions with the istft function.
S = S.';
The STFT is a 128-by-128 matrix, where the first dimension represents 128 frequency bins linearly spaced from 0 to 8 kHz. The generator was trained to generate a one-sided STFT from an FFT length of 256, with the last bin omitted. Reintroduce that bin by inserting a row of zeros into the STFT.
S = [S ; zeros(1,128)];
Revert the normalization and scaling steps used when you generated the STFTs for training.
S = S * 3;
S = (S.*SStd) + SMean;
Convert the STFT from the log domain to the linear domain.
S = exp(S);
Convert the STFT from one-sided to two-sided.
S = [S; S(end-1:-1:2,:)];
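The mirroring step works because the magnitude spectrum of a real-valued signal is symmetric about the Nyquist bin. The sketch below checks this on a toy signal with an FFT length of 8 instead of 256; the variable names are illustrative.

```matlab
% Sketch: rebuild a two-sided magnitude spectrum from its one-sided half.
rng(1)                                        % reproducible toy signal
N = 8;                                        % toy FFT length (the example uses 256)
x = randn(N,1);                               % real-valued signal

magFull = abs(fft(x));                        % two-sided magnitude, N bins
magOneSided = magFull(1:N/2+1);               % bins 0 to N/2, as kept during training

% Same indexing pattern as S = [S; S(end-1:-1:2,:)]
magRebuilt = [magOneSided; magOneSided(end-1:-1:2)];

maxError = max(abs(magRebuilt - magFull));
fprintf('Largest reconstruction error: %g\n',maxError);
```

The DC and Nyquist bins appear once, and every bin in between is mirrored, so N/2+1 one-sided bins expand back to N two-sided bins.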
Pad with zeros to remove window edge-effects.
S = [zeros(256,100) S zeros(256,100)];
The STFT matrix does not contain any phase information. Use a fast version of the Griffin-Lim algorithm with 20 iterations to estimate the signal phase and produce audio samples.
myAudio = stftmag2sig(S,256, ...
    'FrequencyRange','twosided', ...
    'Window',hann(256,'periodic'), ...
    'OverlapLength',128, ...
    'MaxIterations',20, ...
    'Method','fgla');
myAudio = myAudio./max(abs(myAudio),[],'all');
myAudio = myAudio(128*100:end-128*100);
Listen to the synthesized drumbeat.
sound(myAudio,fs)
Plot the synthesized drumbeat.
t = (0:length(myAudio)-1)/fs;
plot(t,myAudio)
grid on
xlabel('Time (s)')
title('Synthesized GAN Sound')
Plot the STFT of the synthesized drumbeat.
figure
stft(myAudio,fs,'Window',hann(256,'periodic'),'OverlapLength',128);
The modelGenerator function upscales 1-by-1-by-100 arrays (dlX) to 128-by-128-by-1 arrays (dlY). parameters is a structure holding the weights of the generator layers. The generator architecture is defined in Table 4 of [1].
function dlY = modelGenerator(dlX,parameters)

dlY = fullyconnect(dlX,parameters.FC.Weights,parameters.FC.Bias,'DataFormat','SSCB');

dlY = reshape(dlY,[1024 4 4 size(dlY,2)]);
dlY = permute(dlY,[3 2 1 4]);
dlY = relu(dlY);

dlY = dltranspconv(dlY,parameters.Conv1.Weights,parameters.Conv1.Bias,'Stride',2,'Cropping','same','DataFormat','SSCB');
dlY = relu(dlY);

dlY = dltranspconv(dlY,parameters.Conv2.Weights,parameters.Conv2.Bias,'Stride',2,'Cropping','same','DataFormat','SSCB');
dlY = relu(dlY);

dlY = dltranspconv(dlY,parameters.Conv3.Weights,parameters.Conv3.Bias,'Stride',2,'Cropping','same','DataFormat','SSCB');
dlY = relu(dlY);

dlY = dltranspconv(dlY,parameters.Conv4.Weights,parameters.Conv4.Bias,'Stride',2,'Cropping','same','DataFormat','SSCB');
dlY = relu(dlY);

dlY = dltranspconv(dlY,parameters.Conv5.Weights,parameters.Conv5.Bias,'Stride',2,'Cropping','same','DataFormat','SSCB');
dlY = tanh(dlY);
end
The modelDiscriminator function takes 128-by-128 images and outputs a scalar prediction score. The discriminator architecture is defined in Table 5 of [1].
function dlY = modelDiscriminator(dlX,parameters)

dlY = dlconv(dlX,parameters.Conv1.Weights,parameters.Conv1.Bias,'Stride',2,'Padding','same');
dlY = leakyrelu(dlY,0.2);

dlY = dlconv(dlY,parameters.Conv2.Weights,parameters.Conv2.Bias,'Stride',2,'Padding','same');
dlY = leakyrelu(dlY,0.2);

dlY = dlconv(dlY,parameters.Conv3.Weights,parameters.Conv3.Bias,'Stride',2,'Padding','same');
dlY = leakyrelu(dlY,0.2);

dlY = dlconv(dlY,parameters.Conv4.Weights,parameters.Conv4.Bias,'Stride',2,'Padding','same');
dlY = leakyrelu(dlY,0.2);

dlY = dlconv(dlY,parameters.Conv5.Weights,parameters.Conv5.Bias,'Stride',2,'Padding','same');
dlY = leakyrelu(dlY,0.2);

dlY = stripdims(dlY);
dlY = permute(dlY,[3 2 1 4]);
dlY = reshape(dlY,4*4*64*16,numel(dlY)/(4*4*64*16));

weights = parameters.FC.Weights;
bias = parameters.FC.Bias;
dlY = fullyconnect(dlY,weights,bias,'DataFormat','CB');
end
The modelDiscriminatorGradients function takes as input the discriminator and generator parameters, discriminatorParameters and generatorParameters, a mini-batch of input data dlX, and an array of random values dlZ, and returns the gradients of the discriminator loss with respect to the learnable parameters of the discriminator.
function gradientsDiscriminator = modelDiscriminatorGradients(discriminatorParameters,generatorParameters,dlX,dlZ)

% Calculate the predictions for real data with the discriminator network.
dlYPred = modelDiscriminator(dlX,discriminatorParameters);

% Calculate the predictions for generated data with the discriminator network.
dlXGenerated = modelGenerator(dlZ,generatorParameters);
dlYPredGenerated = modelDiscriminator(dlarray(dlXGenerated,'SSCB'),discriminatorParameters);

% Calculate the GAN loss.
lossDiscriminator = ganDiscriminatorLoss(dlYPred,dlYPredGenerated);

% Calculate the gradients of the loss with respect to the discriminator parameters.
gradientsDiscriminator = dlgradient(lossDiscriminator,discriminatorParameters);
end
The modelGeneratorGradients function takes as input the discriminator and generator learnable parameters and an array of random values dlZ, and returns the gradients of the generator loss with respect to the learnable parameters of the generator.
function gradientsGenerator = modelGeneratorGradients(discriminatorParameters,generatorParameters,dlZ)

% Calculate the predictions for generated data with the discriminator network.
dlXGenerated = modelGenerator(dlZ,generatorParameters);
dlYPredGenerated = modelDiscriminator(dlarray(dlXGenerated,'SSCB'),discriminatorParameters);

% Calculate the GAN loss.
lossGenerator = ganGeneratorLoss(dlYPredGenerated);

% Calculate the gradients of the loss with respect to the generator parameters.
gradientsGenerator = dlgradient(lossGenerator,generatorParameters);
end
The objective of the discriminator is to not be fooled by the generator. To maximize the probability that the discriminator successfully discriminates between the real and generated images, minimize the discriminator loss function. The loss function for the discriminator follows the DCGAN approach described in [1].
function lossDiscriminator = ganDiscriminatorLoss(dlYPred,dlYPredGenerated)

fake = dlarray(zeros(1,size(dlYPred,2)));
real = dlarray(ones(1,size(dlYPred,2)));

D_loss = mean(sigmoid_cross_entropy_with_logits(dlYPredGenerated,fake));
D_loss = D_loss + mean(sigmoid_cross_entropy_with_logits(dlYPred,real));
lossDiscriminator = D_loss / 2;
end
The objective of the generator is to generate data that the discriminator classifies as "real". To maximize the probability that images from the generator are classified as real by the discriminator, minimize the generator loss function. The loss function for the generator follows the deep convolutional generative adversarial network (DCGAN) approach described in [1].
function lossGenerator = ganGeneratorLoss(dlYPredGenerated)
real = dlarray(ones(1,size(dlYPredGenerated,2)));
lossGenerator = mean(sigmoid_cross_entropy_with_logits(dlYPredGenerated,real));
end
initializeDiscriminatorWeights initializes discriminator weights using the Glorot algorithm.
function discriminatorParameters = initializeDiscriminatorWeights

filterSize = [5 5];
dim = 64;

% Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 1 dim]);
bias = zeros(1,1,dim,'single');
discriminatorParameters.Conv1.Weights = dlarray(weights);
discriminatorParameters.Conv1.Bias = dlarray(bias);

% Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) dim 2*dim]);
bias = zeros(1,1,2*dim,'single');
discriminatorParameters.Conv2.Weights = dlarray(weights);
discriminatorParameters.Conv2.Bias = dlarray(bias);

% Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 2*dim 4*dim]);
bias = zeros(1,1,4*dim,'single');
discriminatorParameters.Conv3.Weights = dlarray(weights);
discriminatorParameters.Conv3.Bias = dlarray(bias);

% Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 4*dim 8*dim]);
bias = zeros(1,1,8*dim,'single');
discriminatorParameters.Conv4.Weights = dlarray(weights);
discriminatorParameters.Conv4.Bias = dlarray(bias);

% Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 8*dim 16*dim]);
bias = zeros(1,1,16*dim,'single');
discriminatorParameters.Conv5.Weights = dlarray(weights);
discriminatorParameters.Conv5.Bias = dlarray(bias);

% Fully connected
weights = iGlorotInitialize([1,4 * 4 * dim * 16]);
bias = zeros(1,1,'single');
discriminatorParameters.FC.Weights = dlarray(weights);
discriminatorParameters.FC.Bias = dlarray(bias);
end
initializeGeneratorWeights initializes generator weights using the Glorot algorithm.
function generatorParameters = initializeGeneratorWeights

dim = 64;

% Dense 1
weights = iGlorotInitialize([dim*256,100]);
bias = zeros(dim*256,1,'single');
generatorParameters.FC.Weights = dlarray(weights);
generatorParameters.FC.Bias = dlarray(bias);

filterSize = [5 5];

% Trans Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 8*dim 16*dim]);
bias = zeros(1,1,dim*8,'single');
generatorParameters.Conv1.Weights = dlarray(weights);
generatorParameters.Conv1.Bias = dlarray(bias);

% Trans Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 4*dim 8*dim]);
bias = zeros(1,1,dim*4,'single');
generatorParameters.Conv2.Weights = dlarray(weights);
generatorParameters.Conv2.Bias = dlarray(bias);

% Trans Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 2*dim 4*dim]);
bias = zeros(1,1,dim*2,'single');
generatorParameters.Conv3.Weights = dlarray(weights);
generatorParameters.Conv3.Bias = dlarray(bias);

% Trans Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) dim 2*dim]);
bias = zeros(1,1,dim,'single');
generatorParameters.Conv4.Weights = dlarray(weights);
generatorParameters.Conv4.Bias = dlarray(bias);

% Trans Conv2D
weights = iGlorotInitialize([filterSize(1) filterSize(2) 1 dim]);
bias = zeros(1,1,1,'single');
generatorParameters.Conv5.Weights = dlarray(weights);
generatorParameters.Conv5.Bias = dlarray(bias);
end
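Both initialization functions draw weights from a uniform distribution whose half-width depends on the fan-in and fan-out of the layer. The sketch below repeats the arithmetic of the iGlorotInitialize helper for one concrete shape, the 5-by-5-by-1-by-64 first convolution of the discriminator, to make the resulting bound visible. The variable names are illustrative.

```matlab
% Sketch: Glorot uniform bound for a 5-by-5-by-1-by-64 convolution weight array.
sz = [5 5 1 64];
numInputs  = prod(sz(1:3));                    % fan-in:  5*5*1  = 25
numOutputs = prod(sz([1 2 4]));                % fan-out: 5*5*64 = 1600

% sqrt(2/(fanIn+fanOut)) * sqrt(3), as in the helper, equals sqrt(6/(fanIn+fanOut))
bound = sqrt(6/(numInputs + numOutputs));

w = bound * (2*rand(sz,'single') - 1);         % uniform on (-bound, bound)
fprintf('bound = %.4f, largest |w| = %.4f\n',bound,max(abs(w(:))));
```

Layers with more connections get a smaller bound, which keeps the variance of activations and gradients roughly comparable across layers at the start of training.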
synthesizeDrumBeat uses a pretrained network to synthesize drum beats.
function y = synthesizeDrumBeat

persistent pGeneratorParameters pMean pSTD
if isempty(pGeneratorParameters)
    % Load the pretrained weights and normalization statistics.
    % The MAT file must already have been downloaded.
    filename = 'drumGeneratorWeights.mat';
    load(filename,'SMean','SStd','generatorParameters');
    pMean = SMean;
    pSTD = SStd;
    pGeneratorParameters = generatorParameters;
end

% Generate random vector
dlZ = dlarray(2 * ( rand(1,1,100,1,'single') - 0.5 ));

% Generate spectrograms
dlXGenerated = modelGenerator(dlZ,pGeneratorParameters);

% Convert from dlarray to single
S = dlXGenerated.extractdata;

S = S.';
% Reintroduce the frequency bin that was discarded during training
S = [S ; zeros(1,128)];

% Reverse steps from training
S = S * 3;
S = (S.*pSTD) + pMean;
S = exp(S);

% Make it two-sided
S = [S ; S(end-1:-1:2,:)];
% Pad with zeros at end and start to remove window edge effects
S = [zeros(256,100) S zeros(256,100)];

% Reconstruct the signal using a fast Griffin-Lim algorithm.
myAudio = stftmag2sig(gather(S),256, ...
    'FrequencyRange','twosided', ...
    'Window',hann(256,'periodic'), ...
    'OverlapLength',128, ...
    'MaxIterations',20, ...
    'Method','fgla');
myAudio = myAudio./max(abs(myAudio),[],'all');
y = myAudio(128*100:end-128*100);
end
function out = sigmoid_cross_entropy_with_logits(x,z)
out = max(x, 0) - x .* z + log(1 + exp(-abs(x)));
end

function w = iGlorotInitialize(sz)
if numel(sz) == 2
    numInputs = sz(2);
    numOutputs = sz(1);
else
    numInputs = prod(sz(1:3));
    numOutputs = prod(sz([1 2 4]));
end
multiplier = sqrt(2 / (numInputs + numOutputs));
w = multiplier * sqrt(3) * (2 * rand(sz,'single') - 1);
end
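The expression in sigmoid_cross_entropy_with_logits is an algebraically equivalent but numerically safer form of the textbook cross-entropy, -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)). The sketch below compares the two using anonymous functions written for illustration (stableLoss and naiveLoss are hypothetical names): they agree for moderate logits, and only the stable form stays finite for a large logit.

```matlab
% Sketch: stable versus naive sigmoid cross-entropy on logits x with labels z.
stableLoss = @(x,z) max(x,0) - x.*z + log(1 + exp(-abs(x)));
sigmoidFcn = @(x) 1./(1 + exp(-x));
naiveLoss  = @(x,z) -z.*log(sigmoidFcn(x)) - (1-z).*log(1 - sigmoidFcn(x));

% Moderate logits: both forms agree to rounding error.
x = [-2 0 3];
z = [ 0 1 1];
maxDifference = max(abs(stableLoss(x,z) - naiveLoss(x,z)));
fprintf('Largest difference for moderate logits: %g\n',maxDifference);

% Large logit with label 0: the naive form overflows, the stable form does not.
fprintf('Naive loss at x = 100:  %g\n',naiveLoss(100,0));
fprintf('Stable loss at x = 100: %g\n',stableLoss(100,0));
```

In double precision, sigmoid(100) rounds to exactly 1, so the naive form takes log(0) and returns Inf, whereas the stable form returns approximately 100.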
[1] Donahue, C., J. McAuley, and M. Puckette. 2019. "Adversarial Audio Synthesis." ICLR.