This example demonstrates code generation with batch sizes greater than 1. It contains two parts. The first uses cnncodegen to generate code that takes a batch of images as input. The second uses codegen to create a MEX file and passes a batch of images to it.
CUDA®-enabled NVIDIA® GPU with compute capability 3.2 or higher.
NVIDIA CUDA toolkit and driver.
NVIDIA cuDNN and TensorRT libraries.
OpenCV 3.1.0 libraries for video read and image display operations.
Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).
This example is not supported on MATLAB® Online.
Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'tensorrt';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
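The minimum GPU requirement stated above (compute capability 3.2 or higher) amounts to an ordered comparison of the (major, minor) version pair. The following C++ sketch is hypothetical: meetsMinComputeCapability is not part of GPU Coder or the generated code. In a real program, the major and minor values would come from the CUDA runtime's cudaGetDeviceProperties (the cudaDeviceProp major and minor fields).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: returns true if a GPU with compute capability
// major.minor meets a minimum requirement (3.2 by default).
// The pair is compared lexicographically, so 5.0 passes and 3.0 fails
// even though the minor value 0 is smaller than 2 in both cases.
bool meetsMinComputeCapability(int major, int minor,
                               int reqMajor = 3, int reqMinor = 2)
{
    return std::make_pair(major, minor) >= std::make_pair(reqMajor, reqMinor);
}
```

Comparing pairs rather than a floating-point value such as 3.2 avoids rounding surprises and keeps versions like 7.5 and 8.0 in the correct order.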
The example uses the DAG network ResNet-50 for image classification. A pretrained ResNet-50 model for MATLAB® is available in the ResNet-50 support package of Deep Learning Toolbox. To download and install the support package, use the Add-On Explorer. To learn more about finding and installing add-ons, see Get Add-Ons (MATLAB). Use the analyzeNetwork function to display an interactive visualization of the deep learning network architecture.
net = resnet50;
analyzeNetwork(net);
For an NVIDIA target with TensorRT, code generation and execution are performed on the host development computer. To run the generated code, your development computer must have an NVIDIA GPU with compute capability 3.2 or higher. Use the cnncodegen command with the 'tensorrt' option to generate code for the NVIDIA platform. By default, cnncodegen generates code that uses 32-bit floating-point precision for the tensor inputs to the network. In the predict call, multiple images can be batched into a single call and passed as one input, and predictions for the whole batch are performed in parallel. The default batch size is 1.
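Because the whole batch is passed in one call, the network input is a single contiguous buffer whose size grows linearly with the batch size. A back-of-the-envelope sketch, assuming the ResNet-50 input size of 224x224x3 and the default 32-bit float inputs: each image occupies 224*224*3 = 150,528 floats (602,112 bytes), so a batch of 15 needs 9,031,680 bytes. The function name inputBufferBytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for a batched network input buffer,
// assuming 32-bit float elements (the cnncodegen default described above).
std::size_t inputBufferBytes(std::size_t rows, std::size_t cols,
                             std::size_t channels, std::size_t batchSize)
{
    return rows * cols * channels * batchSize * sizeof(float);
}
```

For example, inputBufferBytes(224, 224, 3, 15) evaluates to 9031680 on platforms where float is 4 bytes.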
You can specify the input batch size by using the 'batchsize' option. The generated code expects this same batch size at run time; passing a different batch size causes errors. This example uses a batch of 15 images.
To generate code that uses cuDNN, specify 'cudnn' instead of 'tensorrt' for the 'targetlib' option.
status = evalc("cnncodegen(net,'targetlib','tensorrt', 'batchsize', 15)");
The presetup() and postsetup() functions perform additional configuration required for TensorRT. Layer classes in the generated code folder call into the TensorRT libraries.
The main file creates the CnnMain network object and sets it up with layers and weights. It uses the OpenCV VideoCapture class to read frames from the input video, performs prediction for each frame, and fetches the output from the final fully connected layer.
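Because the generated network always expects a full batch, frames are consumed in groups of batchSize, and the final group may need padding when the frame count is not a multiple of the batch size (readBatchData replaces empty frames with zero images). A minimal sketch of that bookkeeping, assuming the total frame count is known in advance; BatchPlan and planBatches are hypothetical names, not part of the generated main file.

```cpp
#include <cassert>

// Hypothetical sketch: how many batched predict calls are needed to process
// numFrames frames when the network accepts exactly batchSize images per call.
struct BatchPlan {
    int numCalls;     // number of batched predict calls
    int paddedSlots;  // placeholder frames needed to fill the last call
};

BatchPlan planBatches(int numFrames, int batchSize)
{
    BatchPlan plan;
    plan.numCalls = (numFrames + batchSize - 1) / batchSize;  // ceiling division
    plan.paddedSlots = plan.numCalls * batchSize - numFrames;
    return plan;
}
```

With a batch size of 15, a 31-frame video would need three calls, and the last call would carry one real frame and 14 padded slots.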
Frames obtained from the OpenCV VideoCapture object are converted from packed BGR (OpenCV) format to planar RGB (MATLAB) format. A buffer is allocated and filled with the image data, and this raw buffer is the input to the network.
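// Convert a batch of OpenCV frames (packed BGR, 8-bit) into the planar RGB
// float buffer used as network input. ROWS, COLS, and CH are defined elsewhere
// in the main file (network input height, width, and channel count).
void readBatchData(float *input, vector<Mat>& orig, int batchSize)
{
    for (int i = 0; i < batchSize; i++) {
        if (orig[i].empty()) {
            // Replace a missing frame with a zero image. Guard i == 0 so that
            // orig[i-1] is never read out of bounds. This slot of the input
            // buffer is not rewritten.
            int type = (i > 0) ? orig[i-1].type() : CV_8UC3;
            orig[i] = Mat::zeros(ROWS, COLS, type);
            continue;
        }

        // Resize the frame to the network input size
        Mat tmpIm;
        resize(orig[i], tmpIm, Size(COLS, ROWS));

        for (int j = 0; j < ROWS*COLS; j++) {
            // BGR packed to RGB planar conversion: plane 0 = R, 1 = G, 2 = B
            input[CH*COLS*ROWS*i + 2*COLS*ROWS + j] = (float)(tmpIm.data[j*3+0]);
            input[CH*COLS*ROWS*i + 1*COLS*ROWS + j] = (float)(tmpIm.data[j*3+1]);
            input[CH*COLS*ROWS*i + 0*COLS*ROWS + j] = (float)(tmpIm.data[j*3+2]);
        }
    }
}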
Download the sample video file.
if ~exist('./object_class.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
    websave('object_class.avi.zip', url);
    unzip('object_class.avi.zip');
end
Use the make command to build the resnet_batchSize_exe executable. Run the executable, specifying the batch size as the first argument and the name of the video file as the second argument.
if isunix
    system(['make -f Makefile_resnet_batchsize_linux.mk ', 'tensorrt']);
    system('./resnet_batchSize_exe 15 object_class.avi');
elseif ispc
    system('make_resnet_batchsize_win.bat');
    system('resnet_predict.exe 15 object_class.avi');
end
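Since the network is compiled for one fixed batch size (15 here) and a different value at run time causes errors, a main file can validate its first argument before reading any frames. The check below is a hypothetical sketch; validateArgs is not part of the generated example code, and the usage string simply mirrors the argument order described above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical sketch of command-line validation for an executable invoked as
//   <executable> <batchSize> <videoFile>
// Returns an empty string on success, otherwise an error message.
std::string validateArgs(int argc, const char* const argv[], int compiledBatchSize)
{
    if (argc < 3) {
        return "usage: <executable> <batchSize> <videoFile>";
    }
    // Parse the batch size and reject trailing non-digit characters
    char* end = nullptr;
    long requested = std::strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0') {
        return "batch size must be an integer";
    }
    // The generated code only accepts the batch size used at code generation
    if (requested != compiledBatchSize) {
        return "batch size must be " + std::to_string(compiledBatchSize);
    }
    return "";
}
```

Failing early with a clear message is cheaper than letting a mismatched buffer reach the network.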
resnet_predict Function
To generate CUDA code for the resnet_predict entry-point function, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a TensorRT deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command and specify the input as a 4-D array of size [224,224,3,batchSize]. The first three dimensions correspond to the input layer size of the ResNet-50 network.
batchSize = 5;
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg resnet_predict -args {ones(224,224,3,batchSize,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/resnet_predict/html/report.mldatx').
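MATLAB stores arrays in column-major order, so the [224,224,3,batchSize] uint8 input is one contiguous block in which each image occupies 224*224*3 consecutive elements; concatenating along the fourth dimension therefore places the images back to back. A minimal C++ sketch of the 0-based linear index of element (r, c, ch, n); the function name colMajorOffset is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: 0-based linear offset of element (r, c, ch, n) in a
// column-major (MATLAB-style) array of size rows x cols x channels x batch.
// The first index varies fastest, so image n starts at n*rows*cols*channels.
std::size_t colMajorOffset(std::size_t r, std::size_t c, std::size_t ch, std::size_t n,
                           std::size_t rows, std::size_t cols, std::size_t channels)
{
    return r + rows * (c + cols * (ch + channels * n));
}
```

For a batch of five 224x224x3 images, the last element (223, 223, 2, 4) lands at offset 752639, one less than 5*150528.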
im = imread('peppers.png');
im = imresize(im, [224,224]);
Concatenate five copies of the image along the fourth dimension, because batchSize = 5.
imBatch = cat(4, im, im, im, im, im);
predict_scores = resnet_predict_mex(imBatch);
Get the top five probability scores and their labels for each image in the batch.
[val, indx] = sort(transpose(predict_scores), 'descend');
scores = val(1:5,:)*100;
net = resnet50;
classnames = net.Layers(end).ClassNames;
for i = 1:batchSize
    labels = classnames(indx(1:5,i));
    disp(['Top 5 predictions on image, ', num2str(i)]);
    disp(labels);
end
Top 5 predictions on image, 1
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }
Top 5 predictions on image, 2
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }
Top 5 predictions on image, 3
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }
Top 5 predictions on image, 4
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }
Top 5 predictions on image, 5
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }
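The per-image top-5 selection can also be done on the host in C++ when scores come back as a batchSize x numClasses matrix (one row per image, as in predict_scores). This is an illustrative alternative that uses std::partial_sort instead of a full sort; it is not what the MATLAB code does internally, topK is a hypothetical name, and the returned class indices are 0-based, unlike MATLAB's 1-based indx.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: for each image (row) of a row-major
// batchSize x numClasses score matrix, return the indices of the k highest
// scores, ordered from highest to lowest score.
std::vector<std::vector<std::size_t>> topK(const std::vector<float>& scores,
                                           std::size_t batchSize,
                                           std::size_t numClasses,
                                           std::size_t k)
{
    k = std::min(k, numClasses);
    std::vector<std::vector<std::size_t>> result(batchSize);
    for (std::size_t b = 0; b < batchSize; ++b) {
        const float* row = scores.data() + b * numClasses;
        std::vector<std::size_t> idx(numClasses);
        std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., numClasses-1
        // Sort only the first k positions by descending score
        std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                          [row](std::size_t a, std::size_t c) { return row[a] > row[c]; });
        idx.resize(k);
        result[b] = idx;
    }
    return result;
}
```

partial_sort orders only the first k positions, which is cheaper than sorting all classes when only the top five are needed.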
Clear the static network object that was loaded in memory.
clear mex;