Neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using Parallel Computing Toolbox™ to distribute training across multicore CPUs, graphics processing units (GPUs), and clusters of computers with multiple CPUs and GPUs.
Training deep networks is extremely computationally intensive, and you can usually accelerate training by using a high-performance GPU. If you do not have a suitable GPU, you can train on one or more CPU cores instead, or rent GPUs in the cloud. You can train a convolutional neural network on a single GPU or CPU, on multiple GPUs or CPU cores, or in parallel on a cluster. Using a GPU or any parallel option requires Parallel Computing Toolbox.
Tip
GPU support is automatic. By default, the trainNetwork function uses a GPU if available.
If you have access to a machine with multiple GPUs, simply specify the training option 'ExecutionEnvironment','multi-gpu'.
If you want to use more resources, you can scale up deep learning training to the cloud.
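The multi-GPU training option mentioned in the tip can be set through trainingOptions. The following is a minimal sketch, not a complete training script: the solver ('sgdm') and the numeric values are illustrative assumptions, and imds and layers are placeholders for your own datastore and layer array.

```matlab
% Sketch: request training on all available GPUs of the local machine.
% The solver and the numeric values below are illustrative assumptions.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ... % the option named in the tip
    'MiniBatchSize',256, ...                % illustrative value
    'MaxEpochs',10);                        % illustrative value

% imds and layers are placeholders for your own data and network.
% net = trainNetwork(imds,layers,options);
```

Because only the ExecutionEnvironment value changes, the rest of an existing training script can usually stay as it is.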
Training Resource | Settings | Learn More |
---|---|---|
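Single GPU on local machine | Automatic. By default, the trainNetwork function uses a GPU if available. | |
Multiple GPUs on local machine | Specify 'ExecutionEnvironment','multi-gpu' with the trainingOptions function. | |
Multiple CPU cores on local machine | Specify 'ExecutionEnvironment','parallel'. With default settings, 'parallel' uses the local cluster profile. | |
Cluster or in the cloud | After setting a default cluster, specify 'ExecutionEnvironment','parallel' with the trainingOptions function. Training executes on the cluster and returns the built-in progress plot to your local MATLAB®. | |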
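As a rough illustration of the parallel rows in the table above, the sketch below makes sure a parallel pool is open on the default cluster profile and then selects parallel execution. The solver and epoch count are assumptions chosen for illustration, and imds and layers are placeholders for your own data and network.

```matlab
% Sketch: train in parallel on the workers of the default cluster profile.
pool = gcp('nocreate');      % reuse an open pool if there is one
if isempty(pool)
    pool = parpool;          % otherwise start one with the default profile and size
end
fprintf('Training will use %d workers.\n',pool.NumWorkers);

options = trainingOptions('sgdm', ...        % solver is an illustrative choice
    'ExecutionEnvironment','parallel', ...
    'MaxEpochs',10);                         % illustrative value

% imds and layers are placeholders for your own data and network.
% net = trainNetwork(imds,layers,options);
```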
Training Scenario | Recommendations | Learn More |
---|---|---|
Interactively on your local machine or in the cloud | Use a parfor loop to train multiple networks, and plot results using the OutputFcn. Runs locally by default, or choose a different cluster profile. | Use parfor to Train Multiple Deep Learning Networks |
In the background on your local machine or in the cloud | Use parfeval to train without blocking your local MATLAB, and plot results using the OutputFcn. Runs locally by default, or choose a different cluster profile. | |
On a cluster, and turn off your local machine | Use the batch function to send training code to the cluster. You can close MATLAB and fetch results later. | Send Deep Learning Batch Job to Cluster |
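The three scenarios in the table above can be sketched as follows. This is only an illustration under stated assumptions: it uses the small digit data set that ships with Deep Learning Toolbox, a deliberately tiny network, and a single epoch so that it finishes quickly. It omits the OutputFcn plotting that the table mentions.

```matlab
% Small built-in data set and a tiny network, chosen only for illustration.
[XTrain,YTrain] = digitTrain4DArrayData;
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,'Padding','same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% Scenario 1: interactive. Train one network per mini-batch size with parfor.
miniBatchSizes = [64 128 256];          % illustrative values
nets = cell(1,numel(miniBatchSizes));
parfor k = 1:numel(miniBatchSizes)
    opts = trainingOptions('sgdm', ...
        'MiniBatchSize',miniBatchSizes(k), ...
        'MaxEpochs',1, ...
        'Verbose',false);
    nets{k} = trainNetwork(XTrain,YTrain,layers,opts);
end

% Scenario 2: background. parfeval returns a future immediately,
% so the MATLAB prompt stays free while a pool worker trains.
options = trainingOptions('sgdm','MaxEpochs',1,'Verbose',false);
f = parfeval(@trainNetwork,1,XTrain,YTrain,layers,options);
netBackground = fetchOutputs(f);        % blocks until training has finished

% Scenario 3: batch job. Close the interactive pool first so that, on a
% local cluster, the job is not left queued behind the pool's workers.
delete(gcp('nocreate'));
job = batch(@trainNetwork,1,{XTrain,YTrain,layers,options});
wait(job);                              % in practice you can close MATLAB here
out = fetchOutputs(job);                % cell array with one element per output
netBatch = out{1};
delete(job);                            % remove the finished job from the cluster
```

In real use you would normally skip the wait call, close MATLAB, and fetch the outputs of the batch job in a later session.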
If your deep learning training takes hours or days, you can rent high-performance GPUs in the cloud to accelerate training. Working in the cloud requires some initial setup, but after that setup, using the cloud can reduce training time or let you train more networks in the same time. To try deep learning in the cloud, you can follow example steps to set up your accounts, copy your data into the cloud, and create a cluster. After this initial setup, you can run your training code in the cloud with minimal changes. After setting up your default cluster, simply specify the training option 'ExecutionEnvironment','parallel' to train networks on your cloud cluster on multiple GPUs.
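One way to make a cloud cluster the default from code, rather than through the user interface, is sketched below. The profile name 'MyCloudCluster' is hypothetical; substitute the name of the profile you created for your own cluster. The solver is also an illustrative assumption.

```matlab
% 'MyCloudCluster' is a hypothetical profile name used for illustration.
profileName = 'MyCloudCluster';

% Only switch the default if a profile with that name actually exists.
allProfiles = parallel.clusterProfiles;
if any(strcmp(allProfiles,profileName))
    parallel.defaultClusterProfile(profileName);
else
    warning('Cluster profile "%s" not found. Keeping the current default.',profileName);
end

% With the default cluster set, the training code needs only this option.
options = trainingOptions('sgdm','ExecutionEnvironment','parallel');
```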
Configure Deep Learning in the Cloud | Notes | Learn More |
---|---|---|
Set up MathWorks Cloud Center and Amazon accounts | One-time setup. | Getting Started with Cloud Center |
Create a cluster | Use Cloud Center to set up and run clusters in the Amazon cloud. For deep learning, choose a machine type with GPUs such as the P2 or G3 instances. | Create a Cloud Cluster |
Upload data to the cloud | To work with data in the cloud, upload to Amazon S3. Use datastores to access the data in S3 from your desktop client MATLAB, or from your cluster workers, without changing your code. | Upload Deep Learning Data to the Cloud |
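As a rough sketch of the data-access step in the table above, a datastore can point at an S3 location in the same way as at a local folder. The bucket name, folder, region, and credential strings below are placeholders, not real values, and the folder-per-class layout is an assumption about how the images were uploaded.

```matlab
% Placeholder credentials and region: replace with your own values.
setenv('AWS_ACCESS_KEY_ID','YOUR_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY','YOUR_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION','us-east-1');

% 's3://my-bucket/training-images' is a hypothetical location that is
% assumed to contain one subfolder per class label.
imds = imageDatastore('s3://my-bucket/training-images', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
```

The variables set here apply to the client MATLAB session only. Cluster workers need the same credentials in their own environment before they can read from the bucket.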
If you are using a Linux compute cluster with fast interconnects between machines, such as InfiniBand, or fast interconnects between GPUs on different machines, such as GPUDirect RDMA, you might be able to take advantage of fast multi-node support in MATLAB. Enable this support on all the workers in your pool by setting the environment variable PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION to 1. Set this environment variable in the Cluster Profile Manager.
This feature is part of the NVIDIA NCCL library for GPU communication. To configure it, you must set additional environment variables to define the network interface protocol, especially NCCL_SOCKET_IFNAME. For more information, see the NCCL documentation and in particular the section on NCCL Environment Variables.
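After setting these variables in the cluster profile, it can be useful to confirm that the workers actually see them. The following is a minimal check, assuming a pool can be opened on your default cluster profile. It only reads the variables and does not set them.

```matlab
% Open a pool on the default cluster profile if none is running.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% Read both variables on every worker. getenv returns an empty
% character vector on a worker where the variable is not set.
spmd
    fastMultinode = getenv('PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION');
    ncclInterface = getenv('NCCL_SOCKET_IFNAME');
    fprintf('fast multi-node = "%s", NCCL_SOCKET_IFNAME = "%s"\n', ...
        fastMultinode,ncclInterface);
end
```

MATLAB labels each printed line with the worker that produced it, so a worker that is missing a setting is easy to spot.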