Training deep networks is computationally intensive; however, neural networks are inherently parallel algorithms. You can usually accelerate training of convolutional neural networks by distributing training in parallel across multicore CPUs, high-performance GPUs, and clusters with multiple CPUs and GPUs. Using GPU or parallel options requires Parallel Computing Toolbox™.
Tip
GPU support is automatic if you have Parallel Computing Toolbox. By default, the trainNetwork function uses a GPU if available. If you have access to a machine with multiple GPUs, then simply specify the training option 'ExecutionEnvironment','multi-gpu'.
You do not need multiple computers to solve problems using data sets too large to fit in memory. You can use the augmentedImageDatastore function to work with batches of data without needing a cluster of machines. For an example, see Train Network with Augmented Images. However, if you have a cluster available, it can be helpful to take your code to the data repository rather than moving large amounts of data around.
| Deep Learning Hardware and Memory Considerations | Recommendations | Required Products |
|---|---|---|
| Data too large to fit in memory | To import data from image collections that are too large to fit in memory, use the augmentedImageDatastore function. This function is designed to read batches of images for faster processing in machine learning and computer vision applications. | MATLAB®, Deep Learning Toolbox™ |
| CPU | If you do not have a suitable GPU, then you can train on a CPU instead. By default, the trainNetwork function uses the CPU if no GPU is available. | MATLAB, Deep Learning Toolbox |
| GPU | By default, the trainNetwork function uses a GPU if available. Requires a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher. Check your GPU using gpuDevice. Specify the execution environment using the trainingOptions function. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox |
| Parallel on your local machine using multiple GPUs or CPU cores | Take advantage of multiple workers by specifying the execution environment with the trainingOptions function. If you have more than one GPU on your machine, specify 'multi-gpu'. Otherwise, specify 'parallel'. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox |
| Parallel on a cluster or in the cloud | Scale up to use workers on clusters or in the cloud to accelerate your deep learning computations. Use trainingOptions and specify 'parallel' to use a compute cluster. For more information, see Deep Learning in the Cloud. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server™ |
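For example, you can check the detected GPU with gpuDevice before choosing an execution environment. This is a minimal sketch; the 'sgdm' solver is only an illustrative choice.

```matlab
% Inspect the default GPU (requires Parallel Computing Toolbox).
gpu = gpuDevice;
disp(gpu.Name)
disp(gpu.ComputeCapability)  % must be 3.0 or higher for training

% Select the execution environment when defining training options.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','gpu');
```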
Tip
To learn more, see Scale Up Deep Learning in Parallel and in the Cloud.
All functions for deep learning training, prediction, and validation in Deep Learning Toolbox perform computations using single-precision, floating-point arithmetic. Functions for deep learning include trainNetwork, predict, classify, and activations.
The software uses single-precision arithmetic whether you train networks on CPUs or on GPUs.
Because single-precision and double-precision performance of GPUs can differ substantially, it is important to know in which precision computations are performed. If you only use a GPU for deep learning, then single-precision performance is one of the most important characteristics of a GPU. If you also use a GPU for other computations using Parallel Computing Toolbox, then high double-precision performance is important. This is because many functions in MATLAB use double-precision arithmetic by default. For more information, see Improve Performance Using Single Precision Calculations (Parallel Computing Toolbox).
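Because these functions compute in single precision, casting input data to single before calling them avoids an implicit conversion. A minimal sketch, assuming net is a trained network and X is your input data (both hypothetical here):

```matlab
% Hypothetical trained network 'net' and input data 'X'.
% Deep learning functions compute in single precision, so casting the
% input up front avoids an implicit double-to-single conversion.
X = single(X);
scores = predict(net, X);
```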
MATLAB supports training a single network using multiple GPUs in parallel, either with multiple GPUs on your local machine, or on a cluster or in the cloud with workers that have GPUs. To speed up training using multiple GPUs, try increasing the mini-batch size and learning rate.
- Enable multi-GPU training on your local machine by setting the 'ExecutionEnvironment' option to 'multi-gpu' with the trainingOptions function.
- On a cluster or in the cloud, set the 'ExecutionEnvironment' option to 'parallel' with the trainingOptions function.
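A minimal sketch of both settings, assuming Parallel Computing Toolbox is installed; the 'sgdm' solver is only an illustrative choice:

```matlab
% Multi-GPU training on the local machine:
optionsLocal = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu');

% Training on a cluster or in the cloud (assumes a parallel pool
% can be opened on your default cluster profile):
optionsCluster = trainingOptions('sgdm', ...
    'ExecutionEnvironment','parallel');
```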
Convolutional neural networks are typically trained iteratively using batches of images, because the whole data set is too large to fit into GPU memory. For optimum performance, you can experiment with the MiniBatchSize option that you specify with the trainingOptions function.
The optimal batch size depends on your exact network, data set, and GPU hardware. When training with multiple GPUs, each image batch is distributed between the GPUs. This effectively increases the total GPU memory available, allowing larger batch sizes. Because a larger batch yields a more reliable gradient estimate per iteration, you can also increase the learning rate. A good general guideline is to increase the learning rate proportionally to the increase in batch size. Depending on your application, a larger batch size and learning rate can speed up training without a decrease in accuracy, up to some limit.
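For example, when scaling from one GPU to four, you might scale the learning rate with the mini-batch size. The values below are illustrative starting points, not tuned settings:

```matlab
% Illustrative values only; tune for your network and hardware.
numGPUs       = 4;
baseBatchSize = 128;   % batch size used on a single GPU
baseLearnRate = 0.01;  % learning rate tuned for baseBatchSize

options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ...
    'MiniBatchSize',baseBatchSize*numGPUs, ...
    'InitialLearnRate',baseLearnRate*numGPUs);
```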
Using multiple GPUs can speed up training significantly. To decide if you expect multi-GPU training to deliver a performance gain, consider the following factors:
- How long is the iteration on each GPU? If each GPU iteration is short, then the added overhead of communication between GPUs can dominate. Try increasing the computation per iteration by using a larger batch size.
- Are all the GPUs on a single machine? Communication between GPUs on different machines introduces a significant communication delay. You can mitigate this if you have suitable hardware. For more information, see Advanced Support for Fast Multi-Node GPU Communication.
To learn more, see Scale Up Deep Learning in Parallel and in the Cloud and Select Particular GPUs to Use for Training.
If you do not have a suitable GPU available for faster training of a convolutional neural network, you can try your deep learning applications with multiple high-performance GPUs in the cloud, such as on Amazon® Elastic Compute Cloud (Amazon EC2®). MATLAB Deep Learning Toolbox provides examples that show you how to perform deep learning in the cloud using Amazon EC2 with P2 or P3 machine instances and data stored in the cloud.
You can accelerate training by using multiple GPUs on a single machine or in a cluster of machines with multiple GPUs. Train a single network using multiple GPUs, or train multiple models at once on the same data.
For more information on the complete cloud workflow, see Deep Learning in Parallel and in the Cloud.
When training a network in parallel, you can fetch and preprocess data in the background. To perform data dispatch in the background, enable background dispatch in the mini-batch datastore used by trainNetwork. You can use a built-in mini-batch datastore, such as augmentedImageDatastore, denoisingImageDatastore (Image Processing Toolbox), or pixelLabelImageDatastore (Computer Vision Toolbox). You can also use a custom mini-batch datastore with background dispatch enabled. For more information on creating custom mini-batch datastores, see Develop Custom Mini-Batch Datastore.
To enable background dispatch, set the DispatchInBackground property of the datastore to true.
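For example, for an augmentedImageDatastore (the image folder and input size here are hypothetical):

```matlab
% Hypothetical image folder and network input size.
imds = imageDatastore('data/images', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
augimds = augmentedImageDatastore([224 224 3], imds);

% Fetch and preprocess batches in the background during parallel training.
augimds.DispatchInBackground = true;
```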
You can fine-tune the training computation and data dispatch loads between workers by specifying the 'WorkerLoad' name-value pair argument of trainingOptions. For advanced options, you can try modifying the number of workers in the parallel pool. For more information, see Specify Your Parallel Preferences (Parallel Computing Toolbox).
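A sketch of dividing load across a pool of four workers; the ratios are illustrative, and it assumes that a worker load of 0 reserves that worker for fetching data in the background:

```matlab
% Illustrative: give the first three workers the full training load and
% reserve the fourth for data dispatch (assumption: a load of 0 dedicates
% that worker to fetching data in the background).
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','parallel', ...
    'WorkerLoad',[1 1 1 0]);
```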
trainingOptions | trainNetwork