cnncodegen

Generate code and build static library for Series or DAG Network

Syntax

cnncodegen(net,'targetlib',libraryname)

cnncodegen(net,'targetlib',libraryname,Name,Value)

Description

cnncodegen(net,'targetlib',libraryname) generates CUDA® C++ code and builds a static library for the specified network object and target library by using default values for all properties.

cnncodegen(net,'targetlib',libraryname,Name,Value) generates CUDA C++ code and builds a static library for the specified network object and target library with additional code generation options specified by one or more Name,Value pair arguments.

Examples

Generate C++ Code for a Pretrained Network to Run on an ARM Processor

Use cnncodegen to generate C++ code for a pretrained network for deployment to an ARM® processor.

Get the pretrained GoogLeNet model by using the googlenet (Deep Learning Toolbox) function. This function requires the Deep Learning Toolbox™ Model for GoogLeNet Network. If you have not installed this support package, the function provides a download link. Alternatively, see https://www.mathworks.com/matlabcentral/fileexchange/64456-deep-learning-toolbox-model-for-googlenet-network.

net = googlenet;

Generate code by using cnncodegen with 'targetlib' set to 'arm-compute'. For 'arm-compute', you must provide the 'ArmArchitecture' parameter.

cnncodegen(net,'targetlib','arm-compute', ...
    'targetparams',struct('ArmComputeVersion','19.02','ArmArchitecture','armv8'));

Generate Code for the YOLO Network to Run on NVIDIA GPU

Generate CUDA C++ code from a SeriesNetwork object created for the YOLO architecture, trained for classifying the PASCAL dataset. This example requires the GPU Coder™ product and GPU Coder Interface for Deep Learning Libraries.

Get the pretrained YOLO network and convert it into a SeriesNetwork object.

url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/Yolo/yolonet.mat';
websave('yolonet.mat',url);
net = coder.loadDeepLearningNetwork('yolonet.mat');

The SeriesNetwork object net contains 58 layers. These layers are convolution layers followed by leaky ReLU and fully connected layers at the end of the network architecture. You can use net.Layers to see all the layers in this network.

Use the cnncodegen function to generate CUDA code.

cnncodegen(net,'targetlib','cudnn');

The code generator generates the .cu and header files in the '/pwd/codegen' folder. The series network is generated as a C++ class called CnnMain, containing an array of 58 layer classes. The setup() method of this class sets up handles and allocates resources for each layer object. The predict() method invokes prediction for each of the 58 layers in the network. The cleanup() method releases all the memory and system resources allocated for each layer object. All the binary weights (cnn_**_w) and the bias files (cnn_**_b) for the convolution layers of the network are stored in the codegen folder. The files are compiled into the static library cnnbuild.a (on Linux®) or cnnbuild.lib (on Windows®).

Input Arguments

`net` — Name of the series or DAG network object
SeriesNetwork object | DAGNetwork object

Pretrained SeriesNetwork or DAGNetwork object.

`libraryname` — Deep learning target library
character vector | string scalar

The target library and the target platform to generate code for, specified as one of the values in this table.

Value	Description
`'arm-compute'`	Target an ARM CPU processor supporting `NEON` instructions by using the ARM Compute Library for computer vision and machine learning. Requires the MATLAB® Coder™ Interface for Deep Learning Libraries.
`'arm-compute-mali'`	Target an ARM GPU processor by using the ARM Compute Library for computer vision and machine learning. Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.
`'cudnn'`	Target NVIDIA® GPUs by using the CUDA Deep Neural Network library (cuDNN). Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.
`'mkldnn'`	Target an Intel® CPU processor by using the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN). Requires the MATLAB Coder Interface for Deep Learning Libraries.
`'tensorrt'`	Target NVIDIA GPUs by using NVIDIA TensorRT™, a high-performance deep learning inference optimizer and run-time library. Requires the GPU Coder product and the GPU Coder Interface for Deep Learning Libraries.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: cnncodegen(net,'targetlib','mkldnn','codegenonly',0,'batchsize',1) generates C++ code for the Intel processor by using MKL-DNN and builds a static library for the network object in net.

General Options

`'batchsize'` — Size
1 (default) | positive integer

Positive integer specifying the number of observations to operate on in a single call to the network predict() method. When calling network->predict(), the size of the input data must match the batchsize value specified in the cnncodegen call.

If libraryname is 'arm-compute' or 'arm-compute-mali', the value of batchsize must be 1.
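As an illustration of this option, the following sketch generates cuDNN code whose predict() method processes four observations per call. The batch size of 4 and the choice of GoogLeNet are assumptions made for the example, and the call needs GPU Coder and its deep learning interface to be installed.

```matlab
% Assumption: the GoogLeNet support package is installed, as in the ARM example.
net = googlenet;

% Generate code whose predict() method expects 4 observations per call.
% The value 4 is illustrative; for 'arm-compute' and 'arm-compute-mali'
% the batch size must stay at 1.
cnncodegen(net,'targetlib','cudnn','batchsize',4);
```

The application that calls network->predict() would then need to supply input sized for four observations.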

`'codegenonly'` — Option to generate only code
0 (default) | 1

Boolean flag that, when enabled, generates the code only, without generating a makefile and building the static library.
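A minimal sketch of this flag, assuming the MKL-DNN target and the GoogLeNet network used earlier on this page. One plausible reason to set it (an assumption, not something this page states) is that the generated sources will be compiled separately on another machine.

```matlab
% Assumption: the GoogLeNet support package is installed.
net = googlenet;

% Generate the source files only; no makefile is produced and
% no static library is built.
cnncodegen(net,'targetlib','mkldnn','codegenonly',1);
```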

`'targetparams'` — Library-specific parameters
structure

Library-specific parameters specified as a 1-by-1 structure containing the fields described in these tables.

Parameters for ARM Compute Library (CPU)

Field	Description
`ArmComputeVersion`	Version of ARM Compute Library on the target hardware, specified as `'18.05'`, `'18.08'`, `'18.11'`, `'19.02'`, or `'19.05'`. The default value is `'19.05'`. If you set `ArmComputeVersion` to a version later than `'19.05'`, `ArmComputeVersion` is set to `'19.05'`.
`ArmArchitecture`	ARM architecture supported on the target hardware, specified as `'armv7'` or `'armv8'`. The specified architecture must be the same as the architecture for the ARM Compute Library on the target hardware. `ArmArchitecture` is a required parameter.

Parameters for ARM Compute Library (Mali GPU)

Field	Description
`ArmComputeVersion`	Version of ARM Compute Library on the target hardware, specified as `'19.02'` or `'19.05'`. The default value is `'19.05'`. If you set `ArmComputeVersion` to a version later than `'19.05'`, `ArmComputeVersion` is set to `'19.05'`.

Parameters for NVIDIA cuDNN Library

Field	Description
`AutoTuning`	Enable or disable auto tuning feature. Enabling auto tuning allows the cuDNN library to find the fastest convolution algorithms. This increases performance for larger networks such as SegNet and ResNet. Default value is `true`. Note If `AutoTuning` is enabled for `TensorRT` targets, the software generates code with auto tuning disabled. It does so without generating any warning or error messages.
`DataType`	Specify the precision of the tensor data type input to the network. When performing inference in 32-bit floats, use `'FP32'`. For 8-bit integer, use `'INT8'`. Default value is `'FP32'`. The `computecapability` argument must be set to `'6.1'` or higher if the `DataType` is set to `'INT8'`. Compute capability of 6.2 does not support `INT8` precision.
`CalibrationResultFile`	Location of the MAT-file containing the calibration data. Default value is `''`. This option is applicable only when `DataType` is set to `'INT8'`.
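The cuDNN fields above are passed through the 'targetparams' structure. The following sketch, which assumes the GoogLeNet network from the earlier example, turns auto tuning off; the variable name cudnnParams is chosen only for illustration.

```matlab
% Assumption: the GoogLeNet support package is installed.
net = googlenet;

% Library-specific options go into a 1-by-1 structure.
cudnnParams = struct('AutoTuning',false);

% Generate cuDNN code with auto tuning disabled.
cnncodegen(net,'targetlib','cudnn','targetparams',cudnnParams);
```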

Parameters for NVIDIA TensorRT Library

Field	Description
`DataType`	Specify the precision of the tensor data type input to the network or the tensor output of a layer. When performing inference in 32-bit floats, use `'FP32'`. For 8-bit integer, use `'INT8'`. For half-precision, use `'FP16'`. Default value is `'FP32'`. The `computecapability` argument must be set to `'7.0'` or higher if the `DataType` is set to `'FP16'`. The `computecapability` argument must be set to `'6.1'` or higher if the `DataType` is set to `'INT8'`. Compute capability of 6.2 does not support `INT8` precision.
`DataPath`	Location of the image dataset used during recalibration. Default value is `''`. This option is applicable only when `DataType` is set to `'INT8'`. When you select the `'INT8'` option, TensorRT quantizes the floating-point data to `int8`. The recalibration is performed with a reduced set of the calibration data. The calibration data must be present in the image data location specified by `DataPath`.
`NumCalibrationBatches`	Numeric value specifying the number of batches for `int8` calibration. The software uses the product `batchsize*NumCalibrationBatches` to pick a random subset of images from the image dataset to perform calibration. The `batchsize*NumCalibrationBatches` value must not be greater than the number of images present in the image dataset. Default value is 50. This option is applicable only when `DataType` is set to `'INT8'`. According to NVIDIA, about 500 images are sufficient for calibration. Refer to the TensorRT documentation for more information.
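The size constraint on the calibration set can be checked before generating code. The sketch below uses hypothetical values: a batch size of 10, a dataset of 600 images, and a folder named calibrationImages. None of these come from this page. The cnncodegen call is left commented out because it depends on a network and a dataset that this sketch does not create.

```matlab
% Hypothetical values chosen for illustration.
batchSize = 10;
numCalibrationBatches = 50;   % documented default
numImagesInDataset = 600;     % assumed number of images at DataPath

% The software draws batchsize*NumCalibrationBatches images for calibration.
numImagesUsed = batchSize * numCalibrationBatches;
assert(numImagesUsed <= numImagesInDataset, ...
    'batchsize*NumCalibrationBatches exceeds the number of calibration images.');

% TensorRT parameters for INT8 inference.
targetParams = struct('DataType','INT8', ...
    'DataPath','calibrationImages', ...   % hypothetical folder name
    'NumCalibrationBatches',numCalibrationBatches);

% Uncomment once net and the calibration images exist:
% cnncodegen(net,'targetlib','tensorrt','batchsize',batchSize, ...
%     'computecapability','6.1','targetparams',targetParams);
```

With these values the calibration step would draw 500 images, which is in line with the figure quoted from NVIDIA above.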

GPU Options (GPU Coder Only)

`'computecapability'` — Compute version
character vector | string scalar

This property affects GPU targeting only.

Character vector or string scalar specifying the NVIDIA GPU compute capability to compile for. The argument takes the format major#.minor#.

Possible values are '3.2'|'3.5'|'3.7'|'5.0'|'5.2'|'5.3'|'6.0'|'6.1'|'6.2'|'7.0'|'7.1'|'7.2'.

Default value is '3.5'.
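Because the default of '3.5' is below the minimum that the TensorRT table lists for reduced-precision data types, the compute capability usually has to be set explicitly in those cases. The following sketch assumes the GoogLeNet network from the earlier example and a target GPU with compute capability 7.0.

```matlab
% Assumption: the GoogLeNet support package is installed.
net = googlenet;

% FP16 inference with TensorRT requires compute capability '7.0' or higher.
cnncodegen(net,'targetlib','tensorrt', ...
    'computecapability','7.0', ...
    'targetparams',struct('DataType','FP16'));
```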

Compatibility Considerations

Changes to Target Library Support

Behavior change in future release

In a future release, the cnncodegen function will generate C++ code and build a static library for only the ARM Mali GPU processor. You can continue to use the 'arm-compute-mali' value for the 'targetlib' argument to target an ARM Mali GPU by using the ARM Compute Library for computer vision and machine learning.

For all other targets, use the codegen command.
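A rough sketch of the codegen route for a cuDNN target follows. It assumes an entry-point function named googlenet_predict, which is hypothetical and must exist as a file on the MATLAB path; one possible body is shown in the comments. The 224-by-224-by-3 single-precision input size is the one GoogLeNet expects.

```matlab
% Contents of a hypothetical entry-point file googlenet_predict.m:
%
%   function out = googlenet_predict(in)
%       persistent mynet;
%       if isempty(mynet)
%           mynet = coder.loadDeepLearningNetwork('googlenet');
%       end
%       out = predict(mynet,in);
%   end

% Configure GPU Coder to build a static library in C++ that uses cuDNN.
cfg = coder.gpuConfig('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% Generate code for the entry point with one 224-by-224-by-3 single input.
codegen -config cfg googlenet_predict -args {ones(224,224,3,'single')}
```

Unlike cnncodegen, this workflow takes an entry-point function rather than a network object, so pre- and post-processing code can be generated together with the network.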
