Scale Up Deep Learning in Parallel and in the Cloud

Deep Learning on Multiple GPUs

Neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using Parallel Computing Toolbox™ to distribute training across multicore CPUs, graphics processing units (GPUs), and clusters of computers with multiple CPUs and GPUs.

Training deep networks is extremely computationally intensive, and you can usually accelerate training by using a high-performance GPU. If you do not have a suitable GPU, you can train on one or more CPU cores instead, or rent GPUs in the cloud. You can train a convolutional neural network on a single GPU or CPU, on multiple GPUs or CPU cores, or in parallel on a cluster. Using a GPU or any of the parallel options requires Parallel Computing Toolbox.
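Before choosing an execution environment, you can check what hardware MATLAB can see. The following is a minimal sketch using the gpuDeviceCount and gpuDevice functions from Parallel Computing Toolbox:

    % Count the CUDA-capable GPUs visible to MATLAB.
    numGPUs = gpuDeviceCount;
    if numGPUs > 0
        d = gpuDevice;   % select and inspect the default GPU
        fprintf('Found %d GPU(s); default device: %s\n', numGPUs, d.Name);
    else
        disp('No GPU found; training falls back to the CPU.');
    end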

Tip

GPU support is automatic. By default, the trainNetwork function uses a GPU if available.

If you have access to a machine with multiple GPUs, specify the training option 'ExecutionEnvironment','multi-gpu' (see the sketch after this tip).

If you want to use more resources, you can scale up deep learning training to the cloud.
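For example, the multi-GPU option mentioned in the tip above is a single name-value pair passed to trainingOptions. The sketch below assumes that training data XTrain, labels YTrain, and a layer array layers already exist in your workspace:

    % Train on all supported GPUs on the local machine.
    options = trainingOptions('sgdm', ...
        'ExecutionEnvironment','multi-gpu', ...
        'Plots','training-progress');
    net = trainNetwork(XTrain, YTrain, layers, options);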

Deep Learning Built-In Parallel Support

Training Resource | Settings | Learn More
Single GPU on local machine | Automatic. By default, the trainNetwork function uses a GPU if available. | 'ExecutionEnvironment'; Create Simple Deep Learning Network for Classification
Multiple GPUs on local machine | Specify 'ExecutionEnvironment','multi-gpu' with the trainingOptions function. | 'ExecutionEnvironment'; Select Particular GPUs to Use for Training
Multiple CPU cores on local machine | Specify 'ExecutionEnvironment','parallel'. With default settings, 'parallel' uses the local cluster profile. Only use CPUs if you do not have a GPU, because CPUs are generally far slower than GPUs for training. | 'ExecutionEnvironment'
Cluster or in the cloud | After setting a default cluster, specify 'ExecutionEnvironment','parallel' with the trainingOptions function. Training executes on the cluster and returns the built-in progress plot to your local MATLAB®. | Train Network in the Cloud Using Automatic Parallel Support
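If you want to restrict training to particular GPUs, one approach is to open a parallel pool with one worker per selected GPU and assign a device to each worker before training. The following is a sketch; the GPU indices are arbitrary examples, and for details see Select Particular GPUs to Use for Training:

    % Use GPUs 1 and 3 only: one pool worker per selected GPU.
    gpuIndices = [1 3];
    parpool('local', numel(gpuIndices));
    spmd
        gpuDevice(gpuIndices(labindex));   % assign one GPU to each worker
    end
    options = trainingOptions('sgdm','ExecutionEnvironment','multi-gpu');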

Train Multiple Deep Networks in Parallel

Training Scenario | Recommendations | Learn More
Interactively on your local machine or in the cloud | Use a parfor loop to train multiple networks, and plot results using the OutputFcn. Runs locally by default, or choose a different cluster profile. | Use parfor to Train Multiple Deep Learning Networks
In the background on your local machine or in the cloud | Use parfeval to train without blocking your local MATLAB, and plot results using the OutputFcn. Runs locally by default, or choose a different cluster profile. | Train Deep Learning Networks in Parallel; Use parfeval to Train Multiple Deep Learning Networks
On a cluster, after which you can turn off your local machine | Use the batch function to send training code to the cluster. You can close MATLAB and fetch results later. | Send Deep Learning Batch Job to Cluster
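As an illustration of the background scenario, parfeval submits training as an asynchronous task so that your MATLAB session stays responsive. A minimal sketch, again assuming XTrain, YTrain, and layers exist in your workspace:

    % Submit training to a parallel pool worker.
    options = trainingOptions('sgdm');
    f = parfeval(@trainNetwork, 1, XTrain, YTrain, layers, options);
    % ... continue working in MATLAB while training runs ...
    net = fetchOutputs(f);   % blocks only when you ask for the result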

Deep Learning in the Cloud

If your deep learning training takes hours or days, you can rent high-performance GPUs in the cloud to accelerate training. Working in the cloud requires some initial setup, but afterward the cloud can reduce training time or allow you to train more networks in the same time. To try deep learning in the cloud, you can follow example steps to set up your accounts, copy your data into the cloud, and create a cluster. After this initial setup, you can run your training code in the cloud with minimal changes. After setting your default cluster, specify the training option 'ExecutionEnvironment','parallel' to train networks on your cloud cluster on multiple GPUs, as in the sketch below.
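For example, if you have already created a cloud cluster profile in Cloud Center, switching training to the cloud takes only a few lines. The profile name 'MyCloudCluster' below is a hypothetical example:

    % Make the cloud cluster the default, then train in parallel on it.
    parallel.defaultClusterProfile('MyCloudCluster');
    options = trainingOptions('sgdm','ExecutionEnvironment','parallel');
    net = trainNetwork(XTrain, YTrain, layers, options);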

Configure Deep Learning in the Cloud | Notes | Learn More
Set up MathWorks Cloud Center and Amazon accounts | One-time setup. | Getting Started with Cloud Center
Create a cluster | Use Cloud Center to set up and run clusters in the Amazon cloud. For deep learning, choose a machine type with GPUs, such as the P2 or G3 instances. | Create a Cloud Cluster
Upload data to the cloud | To work with data in the cloud, upload it to Amazon S3. Use datastores to access the data in S3 from your desktop client MATLAB, or from your cluster workers, without changing your code. | Upload Deep Learning Data to the Cloud
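Once your data is in S3, the same datastore code runs unchanged on your desktop and on cluster workers. The following is a sketch, assuming a hypothetical bucket name and that your AWS credentials are set in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:

    % Read labeled training images directly from Amazon S3.
    imds = imageDatastore('s3://mybucket/training-data', ...
        'IncludeSubfolders',true, ...
        'LabelSource','foldernames');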

Advanced Support for Fast Multi-Node GPU Communication

If you are using a Linux compute cluster with fast interconnects between machines, such as InfiniBand, or fast interconnects between GPUs on different machines, such as GPUDirect RDMA, you might be able to take advantage of fast multi-node support in MATLAB. Enable this support on all the workers in your pool by setting the environment variable PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION to 1. You can set this environment variable in the Cluster Profile Manager.

This support uses the NVIDIA NCCL library for GPU communication. To configure it, you must set additional environment variables to define the network interface, most importantly NCCL_SOCKET_IFNAME. For more information, see the NCCL documentation, in particular the section on NCCL environment variables.
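As a sketch, you can also set these variables on the client and copy them to the workers when opening the pool. The profile name 'GPUCluster' and the interface name 'eth0' are hypothetical, the 'EnvironmentVariables' option of parpool is assumed to be available in your release, and setting the variables in the Cluster Profile Manager, as described above, remains the documented route:

    % Hypothetical profile and interface names; adjust for your cluster.
    setenv('PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION','1');
    setenv('NCCL_SOCKET_IFNAME','eth0');   % network interface used by NCCL
    pool = parpool('GPUCluster', 'EnvironmentVariables', ...
        {'PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION', ...
         'NCCL_SOCKET_IFNAME'});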
