With GPU Coder™, you can generate optimized code for prediction of a variety of trained deep learning networks from Deep Learning Toolbox™. The generated code implements the deep convolutional neural network (CNN) by using the architecture, the layers, and the parameters that you specify in the input SeriesNetwork (Deep Learning Toolbox) or DAGNetwork (Deep Learning Toolbox) object.
The code generator takes advantage of the ARM® Compute Library for computer vision and machine learning. To perform deep learning on ARM Mali GPU targets, you generate code on the host development computer. Then, to build and run the executable program, move the generated code to the ARM target platform. For example, the HiKey960 is one of the target platforms that can execute the generated code.
Requirements:
- Deep Learning Toolbox.
- Deep Learning Toolbox Model for MobileNet-v2 Network support package.
- GPU Coder Interface for Deep Learning Libraries support package. To install the support packages, select them from the MATLAB® Add-Ons menu.
- ARM Compute Library for computer vision and machine learning, installed on the target hardware. For information on the supported versions of the compilers and libraries, see Installing Prerequisite Products.
- Environment variables for the compilers and libraries. For more information, see Environment Variables.
Load the pretrained MobileNet-v2 network. You can choose to load a different pretrained network for image classification. If you do not have the required support packages installed, the software provides a download link.
net = mobilenetv2;
The object net contains the DAGNetwork object.
Use the analyzeNetwork (Deep Learning Toolbox) function to display an interactive visualization of the network architecture, to detect errors and issues in the network, and to display detailed information about the network layers. The layer information includes the sizes of layer activations and learnable parameters, the total number of learnable parameters, and the sizes of state parameters of recurrent layers.
analyzeNetwork(net);
The image that you want to classify must have the same size as the input size of the network. For MobileNet-v2, the size of the imageInputLayer (Deep Learning Toolbox) is 224-by-224-by-3. The Classes property of the output classificationLayer (Deep Learning Toolbox) contains the names of the classes learned by the network. View 10 random class names out of the total of 1000.
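classNames = net.Layers(end).Classes;       % class names stored in the output layer
numClasses = numel(classNames);             % total number of classes
disp(classNames(randperm(numClasses,10)))   % display 10 distinct classes chosen at random

     cock
     apiary
     soap dispenser
     titi
     car wheel
     guenon
     muzzle
     agaric
     buckeye
     megalith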
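On the host, MATLAB's imresize function is the usual way to bring an image to the 224-by-224-by-3 input size. If the C++ main function on the target has to do this itself, a minimal sketch of nearest-neighbor resizing is shown below. It assumes an interleaved (height, width, channel) 8-bit RGB buffer; the function name resizeNearest and that memory layout are illustrative assumptions, so check the layout that the generated input buffer actually expects before reusing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: nearest-neighbor resize of an interleaved (HWC) image.
// src holds srcH * srcW * channels bytes; the result holds dstH * dstW * channels bytes.
std::vector<std::uint8_t> resizeNearest(const std::vector<std::uint8_t>& src,
                                        std::size_t srcH, std::size_t srcW,
                                        std::size_t dstH, std::size_t dstW,
                                        std::size_t channels) {
    assert(src.size() == srcH * srcW * channels);
    std::vector<std::uint8_t> dst(dstH * dstW * channels);
    for (std::size_t y = 0; y < dstH; ++y) {
        // Map each destination row to a source row (top-left aligned).
        const std::size_t sy = y * srcH / dstH;
        for (std::size_t x = 0; x < dstW; ++x) {
            // Map each destination column to a source column.
            const std::size_t sx = x * srcW / dstW;
            for (std::size_t c = 0; c < channels; ++c) {
                dst[(y * dstW + x) * channels + c] =
                    src[(sy * srcW + sx) * channels + c];
            }
        }
    }
    return dst;
}
```

For example, resizeNearest(image, 480, 640, 224, 224, 3) produces a buffer of 224 * 224 * 3 bytes. Nearest-neighbor is chosen here only for brevity; bilinear interpolation usually gives slightly better classification accuracy.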
For more information, see List of Deep Learning Layers (Deep Learning Toolbox).
cnncodegen
To generate code with the ARM Compute Library, use the targetlib option of the cnncodegen command. The cnncodegen command generates C++ code for the SeriesNetwork or DAGNetwork network object.
Call cnncodegen with 'targetlib' specified as 'arm-compute-mali'. For example:
net = googlenet;
cnncodegen(net,'targetlib','arm-compute-mali','batchsize',1);
For 'arm-compute-mali', the value of batchsize must be 1.
The 'targetparams' name-value pair argument, which enables you to specify library-specific parameters for the ARM Compute Library, is not applicable when targeting ARM Mali GPUs.
The cnncodegen command generates code, a makefile, cnnbuild_rtw.mk, and other supporting files to build the generated code on the target hardware. The command places all the generated files in the codegen folder.
Write a C++ main function that calls predict. For an example main file that interfaces with the generated code, see Deep Learning Prediction on ARM Mali GPU.
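After predict() returns, the main function typically turns the 1000 class scores into labels. The sketch below returns the indices of the k highest scores, highest first. It assumes you can obtain a pointer to the numClasses output scores as contiguous floats; the function name topK is hypothetical and not part of the generated code. Each returned index corresponds to a position in the class list, in the same order as net.Layers(end).Classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest scores, in descending score order.
// scores points to numClasses contiguous floats, such as the network output.
std::vector<std::size_t> topK(const float* scores, std::size_t numClasses, std::size_t k) {
    assert(scores != nullptr && k <= numClasses);
    std::vector<std::size_t> idx(numClasses);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., numClasses - 1
    const auto middle = idx.begin() + static_cast<std::ptrdiff_t>(k);
    // Order only the first k positions; the rest are left unspecified.
    std::partial_sort(idx.begin(), middle, idx.end(),
                      [scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    idx.resize(k);
    return idx;
}
```

std::partial_sort is used because only the top few classes are usually reported, so fully sorting all 1000 scores is unnecessary.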
Move the generated codegen folder and other files from the host development computer to the ARM hardware by using your preferred Secure Copy Protocol (SCP) and Secure Shell (SSH) client. Then build the executable program on the target.
The DAG network is generated as a C++ class (CnnMain) that contains an array of 103 layer classes. The code generator reduces the number of layers by fusing convolutional and batch normalization layers (layer fusion optimization). A snippet of the class declaration from the cnn_exec.hpp file is shown.
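The layer fusion mentioned above works because batch normalization applied to a convolution output is itself an affine function per output channel, so it can be folded into the convolution. For output channel c, with scale gamma, offset beta, mean mu, variance var, and small constant eps, the fused weights are w'_c = w_c * gamma_c / sqrt(var_c + eps) and the fused bias is b'_c = (b_c - mu_c) * gamma_c / sqrt(var_c + eps) + beta_c. The sketch below applies this standard folding to flat arrays. It is not the code generator's own implementation; the function name foldBatchNorm and the weight layout (all weights of one output channel stored contiguously) are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: fold batch normalization into convolution parameters in place.
// weights holds numOut * weightsPerOut values; the weights of output channel c
// are assumed to be stored contiguously.
void foldBatchNorm(std::vector<float>& weights, std::vector<float>& bias,
                   const std::vector<float>& gamma, const std::vector<float>& beta,
                   const std::vector<float>& mean, const std::vector<float>& variance,
                   float epsilon) {
    const std::size_t numOut = bias.size();
    assert(numOut > 0 && weights.size() % numOut == 0);
    assert(gamma.size() == numOut && beta.size() == numOut &&
           mean.size() == numOut && variance.size() == numOut);
    const std::size_t weightsPerOut = weights.size() / numOut;
    for (std::size_t c = 0; c < numOut; ++c) {
        // Per-channel factor gamma / sqrt(variance + epsilon).
        const float scale = gamma[c] / std::sqrt(variance[c] + epsilon);
        for (std::size_t i = 0; i < weightsPerOut; ++i) {
            weights[c * weightsPerOut + i] *= scale;
        }
        // Shift the bias so the fused layer reproduces the normalized output.
        bias[c] = (bias[c] - mean[c]) * scale + beta[c];
    }
}
```

After folding, a single convolution pass produces the same output as the original convolution followed by batch normalization, which is why the fused network has fewer layers to execute.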
The setup() method of the class sets up handles and allocates memory for each layer of the network object. The predict() method invokes prediction for each of the 103 layers in the network.
The cnn_exec.cpp file contains the definitions of the object functions for the CnnMain class.
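The organization described here, a network class that owns an ordered array of layers, allocates each layer's memory once in setup(), and then runs every layer in order in predict(), can be illustrated in plain C++. Everything below is hypothetical (the Layer, ScaleLayer, and NetworkSketch names, one float buffer per layer, and a simple chain of layers); it is not the CnnMain declaration from cnn_exec.hpp, whose layers call into the ARM Compute Library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical layer interface: each layer owns its output buffer.
class Layer {
public:
    virtual ~Layer() = default;
    virtual void setup(std::size_t inputSize) = 0;           // allocate memory
    virtual void predict(const std::vector<float>& in) = 0;  // compute output
    const std::vector<float>& output() const { return out_; }
protected:
    std::vector<float> out_;
};

// Example layer that multiplies every element by a constant factor.
class ScaleLayer : public Layer {
public:
    explicit ScaleLayer(float factor) : factor_(factor) {}
    void setup(std::size_t inputSize) override { out_.assign(inputSize, 0.0f); }
    void predict(const std::vector<float>& in) override {
        assert(in.size() == out_.size());
        for (std::size_t i = 0; i < in.size(); ++i) out_[i] = in[i] * factor_;
    }
private:
    float factor_;
};

// Hypothetical network class: an ordered array of layers run as a chain.
class NetworkSketch {
public:
    void addLayer(std::unique_ptr<Layer> layer) { layers_.push_back(std::move(layer)); }

    // Allocate buffers for every layer once, before any prediction.
    void setup(std::size_t inputSize) {
        for (auto& layer : layers_) {
            layer->setup(inputSize);
            inputSize = layer->output().size();
        }
    }

    // Run each layer in order, feeding each output into the next layer.
    const std::vector<float>& predict(const std::vector<float>& input) {
        assert(!layers_.empty());
        const std::vector<float>* current = &input;
        for (auto& layer : layers_) {
            layer->predict(*current);
            current = &layer->output();
        }
        return *current;
    }

private:
    std::vector<std::unique_ptr<Layer>> layers_;
};
```

Separating setup() from predict() means that memory allocation happens once, so repeated calls to predict() on a stream of images do no further allocation.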
Binary files are exported for layers with parameters, such as the fully connected and convolution layers in the network. For instance, the files cnn_CnnMain_Conv*_w and cnn_CnnMain_Conv*_b correspond to the weight and bias parameters of the convolutional layers in the network. The code generator places these binary files in the codegen folder. It also builds the library file cnnbuild and places all the generated files in the codegen folder.
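To inspect one of the exported parameter files, such as a cnn_CnnMain_Conv*_w file, on the target, a minimal reader is sketched below. It assumes that the file is a headerless sequence of 32-bit floats in the target's native byte order. That format is an assumption, not something stated here, so compare the element count with the layer's learnable parameter size (for example, as reported by analyzeNetwork) before relying on it. The function name readFloatFile is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader: load a headerless binary file of 32-bit floats.
// Assumes native byte order and a file size that is a multiple of sizeof(float).
std::vector<float> readFloatFile(const std::string& path) {
    static_assert(sizeof(float) == 4, "expects 32-bit float");
    std::ifstream file(path, std::ios::binary | std::ios::ate);  // open at end to get size
    if (!file) throw std::runtime_error("cannot open " + path);
    const std::streamsize bytes = static_cast<std::streamsize>(file.tellg());
    if (bytes < 0 || bytes % static_cast<std::streamsize>(sizeof(float)) != 0)
        throw std::runtime_error("unexpected size for " + path);
    std::vector<float> values(static_cast<std::size_t>(bytes) / sizeof(float));
    file.seekg(0, std::ios::beg);
    if (bytes > 0) {
        file.read(reinterpret_cast<char*>(values.data()), bytes);
        if (!file) throw std::runtime_error("read failed for " + path);
        assert(file.gcount() == bytes);  // every byte was consumed
    }
    return values;
}
```

If the element count does not match the expected number of weights or biases, the assumed format is likely wrong, and the values should not be interpreted further.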
Code generation for the ARM Mali GPU is not supported for a 2-D grouped convolution layer that has the NumGroups property set to 'channel-wise' or to a value greater than two.
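A build script or project tool can flag layers that hit this restriction before you call cnncodegen. The sketch below encodes the rule exactly as stated above: a 2-D grouped convolution layer is rejected when NumGroups is 'channel-wise' or greater than two. The GroupedConvInfo struct and the function names are hypothetical stand-ins for whatever layer description your own tooling extracts; in MATLAB, the corresponding value is the NumGroups property of each groupedConvolution2dLayer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one 2-D grouped convolution layer.
struct GroupedConvInfo {
    std::string name;
    bool channelWise;  // true when NumGroups is 'channel-wise'
    int numGroups;     // numeric NumGroups value when channelWise is false
};

// True if the layer satisfies the ARM Mali restriction stated in the text.
bool isSupportedOnMali(const GroupedConvInfo& layer) {
    assert(layer.channelWise || layer.numGroups >= 1);
    return !layer.channelWise && layer.numGroups <= 2;
}

// Names of all layers that would block code generation for ARM Mali.
std::vector<std::string> unsupportedLayers(const std::vector<GroupedConvInfo>& layers) {
    std::vector<std::string> names;
    for (const auto& layer : layers) {
        if (!isSupportedOnMali(layer)) names.push_back(layer.name);
    }
    return names;
}
```

Reporting every offending layer at once, rather than stopping at the first, makes it easier to decide whether a network can be adapted for this target.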