Neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using Parallel Computing Toolbox™ to distribute training across multicore CPUs, GPUs, and clusters of computers with multiple CPUs and GPUs.
If you have access to a machine with multiple GPUs, you can train with them by specifying the training option 'ExecutionEnvironment','multi-gpu' using the trainingOptions function. When training with multiple GPUs, each image batch is distributed between the GPUs. For more information about training with multiple GPUs, see Training with Multiple GPUs.
If you want to use more resources, you can scale up deep learning training to clusters or the cloud. To learn more about parallel options, see Scale Up Deep Learning in Parallel and in the Cloud. To try an example, see Train Network in the Cloud Using Automatic Parallel Support.
To use all available GPUs on your machine, specify the training option 'ExecutionEnvironment','multi-gpu'.
To select one of multiple GPUs to use to train a single model, use:

gpuDevice(index)

To train a single model with multiple selected GPUs, open a parallel pool in advance with one worker per GPU, where gpuIndices are the indices of the GPUs:

parpool('local', numel(gpuIndices));
spmd
    gpuDevice(gpuIndices(labindex));
end
When you run trainNetwork with the 'multi-gpu' ExecutionEnvironment option (or 'parallel' for the same result), the training function uses this pool and does not open a new one.

Another option is to select workers using the 'WorkerLoad' option in trainingOptions. For example:

parpool('local', 5);
opts = trainingOptions('sgdm', 'WorkerLoad', [1 1 1 0 1], ...)
In this case, the fourth worker is part of the pool but idle, which is not an ideal use of the parallel resources. It is more efficient to specify GPUs with gpuDevice.
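For example, a minimal sketch that trains one model on the second GPU only; XTrain, YTrain, and layers are again placeholders for your own data and network:

gpuDevice(2); % Select the second GPU for this MATLAB session.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','gpu'); % Train on the currently selected GPU.
net = trainNetwork(XTrain,YTrain,layers,options);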
If you want to train multiple models with one GPU each, start a MATLAB session for each model and select a device using gpuDevice.
Alternatively, use a parfor loop:

parfor i=1:gpuDeviceCount
    trainNetwork(…);
end
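A fuller sketch of this pattern might look like the following; the nets cell array, XTrain, YTrain, and layers are assumed names, and each iteration trains an independent model on its own GPU:

nets = cell(gpuDeviceCount,1);
parfor i = 1:gpuDeviceCount
    gpuDevice(i); % Each iteration selects a different GPU.
    opts = trainingOptions('sgdm', ...
        'ExecutionEnvironment','gpu'); % Train on the selected GPU.
    nets{i} = trainNetwork(XTrain,YTrain,layers,opts); % One independent model per GPU.
end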
This example shows how to train a convolutional neural network using MATLAB automatic support for parallel training. Deep learning training often takes hours or days. With parallel computing, you can speed up training using multiple graphics processing units (GPUs) locally or in a cluster in the cloud. If you have access to a machine with multiple GPUs, then you can complete this example on a local copy of the data. If you want to use more resources, then you can scale up deep learning training to the cloud. To learn more about your options for parallel training, see Scale Up Deep Learning in Parallel and in the Cloud. This example guides you through the steps to train a deep learning network in a cluster in the cloud using MATLAB automatic parallel support.
Requirements
Before you can run the example, you need to configure a cluster and upload data to the cloud. In MATLAB, you can create clusters in the cloud directly from the MATLAB Desktop. On the Home tab, in the Parallel menu, select Create and Manage Clusters. In the Cluster Profile Manager, click Create Cloud Cluster. Alternatively, you can use MathWorks Cloud Center to create and access compute clusters. For more information, see Getting Started with Cloud Center. After that, upload your data to an Amazon S3 bucket and access it directly from MATLAB. This example uses a copy of the CIFAR-10 data set that is already stored in Amazon S3. For instructions, see Upload Deep Learning Data to the Cloud.
Set Up Parallel Pool
Start a parallel pool in the cluster and set the number of workers to the number of GPUs in your cluster. If you specify more workers than GPUs, then the remaining workers are idle. This example assumes that the cluster you are using is set as the default cluster profile. Check the default cluster profile on the MATLAB Home tab, in Parallel > Select a Default Cluster.
numberOfWorkers = 8;
parpool(numberOfWorkers);
Starting parallel pool (parpool) using the 'MyClusterInTheCloud' profile ... connected to 8 workers.
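Optionally, to confirm that each worker in the pool has access to a GPU, run gpuDevice on all workers; this is a quick sanity check, not a required step:

spmd
    gpuDevice % Display the GPU selected by each worker.
end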
Load Data Set from the Cloud
Load the training and test data sets from the cloud using imageDatastore. In this example, you use a copy of the CIFAR-10 data set stored in Amazon S3. To ensure that the workers have access to the datastore in the cloud, make sure that the environment variables for the AWS credentials are set correctly. See Upload Deep Learning Data to the Cloud.
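For example, one way to set the credentials on the client is with setenv; the values below are placeholders, not real credentials, and your cluster must also make them available to the workers:

setenv('AWS_ACCESS_KEY_ID','YOUR_ACCESS_KEY_ID');         % Placeholder value.
setenv('AWS_SECRET_ACCESS_KEY','YOUR_SECRET_ACCESS_KEY'); % Placeholder value.
setenv('AWS_DEFAULT_REGION','us-east-1');                 % Placeholder region.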
imdsTrain = imageDatastore('s3://cifar10cloud/cifar10/train', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

imdsTest = imageDatastore('s3://cifar10cloud/cifar10/test', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
Train the network with augmented image data by creating an augmentedImageDatastore object. Use random translations and horizontal reflections. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
imageSize = [32 32 3];
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augmentedImdsTrain = augmentedImageDatastore(imageSize,imdsTrain, ...
    'DataAugmentation',imageAugmenter, ...
    'OutputSizeMode','randcrop');
Define Network Architecture and Training Options
Define a network architecture for the CIFAR-10 data set. To simplify the code, use the convolutionalBlock helper function, defined at the end of this example, to create blocks of repeated convolution, batch normalization, and ReLU layers. The pooling layers downsample the spatial dimensions.
blockDepth = 4; % blockDepth controls the depth of a convolutional block.
netWidth = 32;  % netWidth controls the number of filters in a convolutional block.

layers = [
    imageInputLayer(imageSize)

    convolutionalBlock(netWidth,blockDepth)
    maxPooling2dLayer(2,'Stride',2)
    convolutionalBlock(2*netWidth,blockDepth)
    maxPooling2dLayer(2,'Stride',2)
    convolutionalBlock(4*netWidth,blockDepth)
    averagePooling2dLayer(8)

    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer
];
Define the training options. Train the network in parallel using the current cluster by setting the execution environment to 'parallel'. When you use multiple GPUs, you increase the available computational resources. Scale up the mini-batch size with the number of GPUs to keep the workload on each GPU constant, and scale the learning rate according to the mini-batch size. Use a learning rate schedule to drop the learning rate as the training progresses. Turn on the training progress plot to obtain visual feedback during training.
miniBatchSize = 256 * numberOfWorkers;
initialLearnRate = 1e-1 * miniBatchSize/256;

options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','parallel', ... % Turn on automatic parallel support.
    'InitialLearnRate',initialLearnRate, ... % Set the initial learning rate.
    'MiniBatchSize',miniBatchSize, ... % Set the MiniBatchSize.
    'Verbose',false, ... % Do not send command line output.
    'Plots','training-progress', ... % Turn on the training progress plot.
    'L2Regularization',1e-10, ...
    'MaxEpochs',50, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsTest, ...
    'ValidationFrequency',floor(numel(imdsTrain.Files)/miniBatchSize), ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.1, ...
    'LearnRateDropPeriod',45);
Train Network and Use for Classification
Train the network in the cluster. During training, the plot displays the progress.
net = trainNetwork(augmentedImdsTrain,layers,options)
net =
  SeriesNetwork with properties:

    Layers: [43×1 nnet.cnn.layer.Layer]
Determine the accuracy of the network by using the trained network to classify the test images on your local machine. Then compare the predicted labels to the actual labels.
YPredicted = classify(net,imdsTest);
accuracy = sum(YPredicted == imdsTest.Labels)/numel(imdsTest.Labels)
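Optionally, to examine the per-class results in more detail, plot a confusion chart of the predictions:

confusionchart(imdsTest.Labels,YPredicted)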
Define Helper Function
Define a function to create a convolutional block in the network architecture.
function layers = convolutionalBlock(numFilters,numConvLayers)
layers = [
    convolution2dLayer(3,numFilters,'Padding','same')
    batchNormalizationLayer
    reluLayer
];
layers = repmat(layers,numConvLayers,1);
end
See Also

imageDatastore | trainingOptions | trainNetwork | gpuDevice (Parallel Computing Toolbox) | spmd (Parallel Computing Toolbox)