cnncodegen

Generate code and build static library for Series or DAG Network

Description

cnncodegen(net,'targetlib',libraryname) generates CUDA® C++ code and builds a static library for the specified network object and target library by using default values for all properties.

cnncodegen(net,'targetlib',libraryname,Name,Value) generates CUDA C++ code and builds a static library for the specified network object and target library with additional code generation options specified by one or more Name,Value pair arguments.

Examples

Use cnncodegen to generate C++ code for a pretrained network for deployment to an ARM® processor.

Get the pretrained GoogLeNet model by using the googlenet (Deep Learning Toolbox) function. This function requires the Deep Learning Toolbox™ Model for GoogLeNet Network. If you have not installed this support package, the function provides a download link. Alternatively, see https://www.mathworks.com/matlabcentral/fileexchange/64456-deep-learning-toolbox-model-for-googlenet-network.

net = googlenet;

Generate code by using cnncodegen with 'targetlib' set to 'arm-compute'. For 'arm-compute', you must provide the 'ArmArchitecture' parameter.

cnncodegen(net,'targetlib','arm-compute', ...
    'targetparams',struct('ArmComputeVersion','19.02','ArmArchitecture','armv8'));

Generate CUDA C++ code from a SeriesNetwork object created for the YOLO architecture, trained for classifying the PASCAL dataset. This example requires the GPU Coder™ product and GPU Coder Interface for Deep Learning Libraries.

Get the pretrained YOLO network and convert it into a SeriesNetwork object.

url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/Yolo/yolonet.mat';
websave('yolonet.mat',url);
net = coder.loadDeepLearningNetwork('yolonet.mat');

The SeriesNetwork object net contains 58 layers. These layers are convolution layers followed by leaky ReLU layers, with fully connected layers at the end of the network architecture. You can use net.Layers to see all the layers in this network.

Use the cnncodegen function to generate CUDA code.

cnncodegen(net,'targetlib','cudnn');

The code generator generates the .cu and header files in the '/pwd/codegen' folder. The series network is generated as a C++ class called CnnMain, containing an array of 58 layer classes. The setup() method of this class sets up handles and allocates resources for each layer object. The predict() method invokes prediction for each of the 58 layers in the network. The cleanup() method releases all the memory and system resources allocated for each layer object. All the binary weights (cnn_**_w) and the bias files (cnn_**_b) for the convolution layers of the network are stored in the codegen folder. The files are compiled into the static library cnnbuild.a (on Linux®) or cnnbuild.lib (on Windows®).
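To confirm what the generator produced, you can list the contents of the output folder from MATLAB. The following is a minimal sketch that assumes cnncodegen was run from the current folder, so that the output is in fullfile(pwd,'codegen'). The wildcard patterns are illustrative guesses based on the file names described above, not a documented naming contract.

```matlab
% Assumption: cnncodegen was run from the current folder.
outDir = fullfile(pwd,'codegen');

% List generated CUDA sources and headers (patterns are illustrative).
cuFiles  = dir(fullfile(outDir,'*.cu'));
hdrFiles = dir(fullfile(outDir,'*.h*'));
fprintf('%d .cu files, %d header files\n',numel(cuFiles),numel(hdrFiles));

% Approximate match for the cnn_**_w (weights) and cnn_**_b (bias) files.
weightFiles = dir(fullfile(outDir,'cnn_*_w*'));
biasFiles   = dir(fullfile(outDir,'cnn_*_b*'));
fprintf('%d weight files, %d bias files\n',numel(weightFiles),numel(biasFiles));

% The static library name depends on the host platform.
if ispc
    libName = 'cnnbuild.lib';
else
    libName = 'cnnbuild.a';
end
libFound = exist(fullfile(outDir,libName),'file') == 2;
fprintf('%s found: %d\n',libName,libFound);
```

If the library is reported as not found, check whether the call used the codegenonly option, which skips the build step.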

Input Arguments

net

Pretrained SeriesNetwork or DAGNetwork object.

libraryname

The target library and the target platform to generate code for, specified as one of the values in this table.

Value

Description

'arm-compute'

Target an ARM CPU processor supporting NEON instructions by using the ARM Compute Library for computer vision and machine learning.

Requires the MATLAB® Coder™ Interface for Deep Learning Libraries.

'arm-compute-mali'

Target an ARM GPU processor by using the ARM Compute Library for computer vision and machine learning.

Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.

'cudnn'

Target NVIDIA® GPUs by using the CUDA Deep Neural Network library (cuDNN).

Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.

'mkldnn'

Target an Intel® CPU processor by using the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN).

Requires the MATLAB Coder Interface for Deep Learning Libraries.

'tensorrt'

Target NVIDIA GPUs by using NVIDIA TensorRT™, a high performance deep learning inference optimizer and run-time library.

Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: cnncodegen(net,'targetlib','mkldnn','codegenonly',0,'batchsize',1) generates C++ code for the Intel processor by using MKL-DNN and builds a static library for the network object in net.

General Options

batchsize

Positive integer specifying the number of observations to operate on in a single call to the network predict() method. When you call network->predict(), the size of the input data must match the batchsize value specified in the cnncodegen call.

If libraryname is 'arm-compute' or 'arm-compute-mali', the value of batchsize must be 1.
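As an illustration of the batchsize option, the sketch below shows two alternative calls. It assumes that net is a SeriesNetwork or DAGNetwork object already in the workspace and that the products required for each target are installed. The value 4 is an arbitrary choice for illustration. The two calls are alternatives, not a sequence.

```matlab
% Assumption: net is a pretrained network object in the workspace.

% Alternative 1: cuDNN target whose predict() processes 4 observations
% per call. The input passed to predict() must then hold 4 observations.
cnncodegen(net,'targetlib','cudnn','batchsize',4);

% Alternative 2: ARM Compute target, where batchsize must be 1.
cnncodegen(net,'targetlib','arm-compute','batchsize',1, ...
    'targetparams',struct('ArmArchitecture','armv8'));
```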

codegenonly

Boolean flag that, when enabled, generates CUDA C++ code without generating and building a makefile.
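For instance, to inspect or hand-integrate the generated sources without building the static library, the codegenonly flag can be enabled. This is a minimal sketch that assumes net is already in the workspace. The numeric value 1 mirrors the numeric 0 used in the example under Name-Value Pair Arguments.

```matlab
% Assumption: net is a pretrained network object in the workspace.
% Generate the source files only; no makefile is generated or built.
cnncodegen(net,'targetlib','cudnn','codegenonly',1);
```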

targetparams

Library-specific parameters, specified as a 1-by-1 structure containing the fields described in these tables.

Parameters for ARM Compute Library (CPU)

Field

Description

ArmComputeVersion

Version of ARM Compute Library on the target hardware, specified as '18.05', '18.08', '18.11', '19.02', or '19.05'. The default value is '19.05'. If you set ArmComputeVersion to a version later than '19.05', ArmComputeVersion is set to '19.05'.

ArmArchitecture

ARM architecture supported on the target hardware, specified as 'armv7' or 'armv8'. The specified architecture must be the same as the architecture for the ARM Compute Library on the target hardware.

ArmArchitecture is a required parameter.

Parameters for ARM Compute Library (Mali GPU)

Field

Description

ArmComputeVersion

Version of ARM Compute Library on the target hardware, specified as '19.02' or '19.05'. The default value is '19.05'. If you set ArmComputeVersion to a version later than '19.05', ArmComputeVersion is set to '19.05'.
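A call for the Mali GPU target might look like the following sketch. It assumes that net is already in the workspace and that version 19.02 of the ARM Compute Library is installed on the target. Unlike the CPU target, the Mali table lists no ArmArchitecture field, so only the library version is passed.

```matlab
% Assumption: net is a pretrained network object in the workspace and the
% target board has ARM Compute Library 19.02 installed.
mali = struct('ArmComputeVersion','19.02');

% batchsize must be 1 for the 'arm-compute-mali' target.
cnncodegen(net,'targetlib','arm-compute-mali', ...
    'batchsize',1, ...
    'targetparams',mali);
```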

Parameters for NVIDIA cuDNN Library

Field

Description

AutoTuning

Enable or disable the auto tuning feature. Enabling auto tuning allows the cuDNN library to find the fastest convolution algorithms. This increases performance for larger networks such as SegNet and ResNet. The default value is true.

Note

If AutoTuning is enabled for TensorRT targets, the software generates code with auto tuning disabled. It does so without generating any warning or error messages.

DataType

Specify the precision of the tensor data type input to the network. When performing inference in 32-bit floats, use 'FP32'. For 8-bit integer, use 'INT8'. Default value is 'FP32'.

The computecapability argument must be set to '6.1' or higher if the DataType is set to 'INT8'. Compute capability of 6.2 does not support INT8 precision.

CalibrationResultFile

Location of the MAT-file containing the calibration data. Default value is ''. This option is applicable only when DataType is set to 'INT8'.
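The cuDNN fields above can be combined in a single targetparams structure. The sketch below is illustrative only: the MAT-file name is hypothetical, net is assumed to be in the workspace, and the compute capability is set to '6.1' to satisfy the INT8 requirement stated above.

```matlab
% Hypothetical MAT-file holding calibration data; replace with your own.
calibFile = 'myCalibrationData.mat';

% Library-specific parameters for the cuDNN target.
params = struct( ...
    'AutoTuning',true, ...
    'DataType','INT8', ...
    'CalibrationResultFile',calibFile);

% INT8 requires compute capability 6.1 or higher (6.2 is not supported).
cnncodegen(net,'targetlib','cudnn', ...
    'computecapability','6.1', ...
    'targetparams',params);
```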

Parameters for NVIDIA TensorRT Library

Field

Description

DataType

Specify the precision of the tensor data type input to the network or the tensor output of a layer. When performing inference in 32-bit floats, use 'FP32'. For 8-bit integer, use 'INT8'. For half-precision, use 'FP16'. The default value is 'FP32'.

The computecapability argument must be set to '7.0' or higher if the DataType is set to 'FP16'.

The computecapability argument must be set to '6.1' or higher if the DataType is set to 'INT8'. Compute capability of 6.2 does not support INT8 precision.

DataPath

Location of the image dataset used during recalibration. Default value is ''. This option is applicable only when DataType is set to 'INT8'.

When you select the 'INT8' option, TensorRT quantizes the floating-point data to int8. The recalibration is performed with a reduced set of the calibration data. The calibration data must be present in the image data location specified by DataPath.

NumCalibrationBatches

Numeric value specifying the number of batches for int8 calibration. The software uses the product of batchsize*NumCalibrationBatches to pick a random subset of images from the image dataset to perform calibration. The batchsize*NumCalibrationBatches value must not be greater than the number of images present in the image dataset. Default value is 50. This option is applicable only when DataType is set to 'INT8'.

According to NVIDIA, about 500 images are sufficient for calibration. Refer to the TensorRT documentation for more information.
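The constraint that batchsize*NumCalibrationBatches must not exceed the number of images can be checked before generating code. The following sketch assumes a hypothetical folder of calibration images and uses imageDatastore only to count the files, which is an illustrative choice rather than part of the cnncodegen workflow. It also assumes that net is in the workspace.

```matlab
% Hypothetical folder of calibration images; replace with your own.
dataPath = fullfile(pwd,'calibrationImages');
batchSize = 1;
numCalibrationBatches = 50;

% Count the images available for calibration.
imds = imageDatastore(dataPath);
numImages = numel(imds.Files);

% batchsize*NumCalibrationBatches must not exceed the number of images.
needed = batchSize*numCalibrationBatches;
assert(needed <= numImages, ...
    'Need at least %d calibration images, found %d.',needed,numImages);

% Library-specific parameters for the TensorRT target.
params = struct( ...
    'DataType','INT8', ...
    'DataPath',dataPath, ...
    'NumCalibrationBatches',numCalibrationBatches);

% INT8 requires compute capability 6.1 or higher (6.2 is not supported).
cnncodegen(net,'targetlib','tensorrt', ...
    'batchsize',batchSize, ...
    'computecapability','6.1', ...
    'targetparams',params);
```

With the values shown, the check requires at least 50 images in the folder.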

GPU Options (GPU Coder Only)

computecapability

This property affects GPU targeting only.

Character vector or string scalar specifying the NVIDIA GPU compute capability to compile for. The argument takes the format major#.minor#.

Possible values are '3.2'|'3.5'|'3.7'|'5.0'|'5.2'|'5.3'|'6.0'|'6.1'|'6.2'|'7.0'|'7.1'|'7.2'.

Default value is '3.5'.
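As a sketch of how the option might be passed, the code below checks a chosen value against the list above before calling cnncodegen. The value '6.1' is an arbitrary choice for illustration, and net is assumed to be in the workspace.

```matlab
% Values listed in this reference page.
supported = {'3.2','3.5','3.7','5.0','5.2','5.3', ...
             '6.0','6.1','6.2','7.0','7.1','7.2'};

% Compute capability of the target GPU (illustrative choice).
cc = '6.1';
assert(ismember(cc,supported), ...
    'Compute capability %s is not in the supported list.',cc);

cnncodegen(net,'targetlib','cudnn','computecapability',cc);
```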

Compatibility Considerations

Behavior change in future release

Introduced in R2017b