Deep Learning Prediction by Using Different Batch Sizes

This example shows how to generate code for deep learning prediction with batch sizes greater than one. It contains two parts: the first uses cnncodegen to generate code that takes a batch of images as input, and the second uses codegen to create a MEX file and passes a batch of images as input.

Prerequisites

  • CUDA® enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN and TensorRT library.

  • OpenCV 3.1.0 libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • This example is not supported in MATLAB® Online™.

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'tensorrt';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Classification by Using ResNet-50 Network

The example uses the DAG network ResNet-50 for image classification. A pretrained ResNet-50 model for MATLAB® is available in the ResNet-50 support package of Deep Learning Toolbox. To download and install the support package, use the Add-On Explorer. To learn more about finding and installing add-ons, see Get Add-Ons (MATLAB). Use the analyzeNetwork function to display an interactive visualization of the deep learning network architecture.

net = resnet50;
analyzeNetwork(net);

Generate Code for NVIDIA GPUs by Using TensorRT Library

For an NVIDIA target with TensorRT, code generation and execution are performed on the host development computer. To run the generated code, your development computer must have an NVIDIA GPU with compute capability of at least 3.2. Use the cnncodegen command with the 'tensorrt' option to generate code for the NVIDIA platform. By default, the cnncodegen command generates code that uses 32-bit floating-point precision for the tensor inputs to the network. In the predict call, multiple images can be batched together and passed as a single input, and predictions are performed on the batch of inputs in parallel. The default batch size is 1.

You can specify the input batch size by using the 'batchsize' option. During execution, the generated code expects the same batch size to be used; passing a different batch size at runtime causes errors. In this example, the batch size is 15.

To generate code that uses cuDNN, specify 'cudnn' instead of 'tensorrt' for the 'targetlib' option.
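
For reference, a cuDNN version of the command used below might look like this sketch (it is not executed here; this example continues with TensorRT):

% Sketch only: generate code for the same network and batch size by using
% the cuDNN library instead of TensorRT.
cnncodegen(net,'targetlib','cudnn','batchsize',15);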

status = evalc("cnncodegen(net,'targetlib','tensorrt', 'batchsize', 15)");

Generated Code Description

The presetup() and postsetup() functions perform additional configuration required for TensorRT. Layer classes in the generated code folder call into the TensorRT libraries.

Main File

The main file creates and sets up the CnnMain network object with layers and weights. It uses the OpenCV VideoCapture method to read frames from the input video. It performs prediction on each batch of frames and fetches the output from the final fully connected layer.

Frames obtained from the OpenCV VideoCapture object are converted from the packed BGR (OpenCV) format to the planar RGB (MATLAB) format. A buffer is allocated and filled with the image data; this raw buffer is the input to the network.

   void readBatchData(float *input, vector<Mat>& orig, int batchSize)
   {
        for (int i=0; i<batchSize; i++)
        {
           if (orig[i].empty())
           {
               orig[i] = Mat::zeros(ROWS,COLS, orig[i-1].type());
               continue;
           }
           Mat tmpIm;
           resize(orig[i], tmpIm, Size(COLS,ROWS));
           for (int j=0; j<ROWS*COLS; j++)
           {
               // BGR packed to RGB planar conversion
               input[CH*COLS*ROWS*i + 2*COLS*ROWS + j] = (float)(tmpIm.data[j*3+0]);
               input[CH*COLS*ROWS*i + 1*COLS*ROWS + j] = (float)(tmpIm.data[j*3+1]);
               input[CH*COLS*ROWS*i + 0*COLS*ROWS + j] = (float)(tmpIm.data[j*3+2]);
           }
        }
   }

Build and Run Executable

Download the sample video file.

if ~exist('./object_class.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
    websave('object_class.avi.zip',url);
    unzip('object_class.avi.zip');
end

Use the make command to build the resnet_batchSize_exe executable. Run the executable, specifying the batch size as the first argument and the name of the video file as the second argument.

if isunix
   system(['make -f Makefile_resnet_batchsize_linux.mk ','tensorrt']);
   system('./resnet_batchSize_exe 15 object_class.avi');
elseif ispc
   system('make_resnet_batchsize_win.bat');
   system('resnet_predict.exe 15 object_class.avi');
end

Generate CUDA MEX for the resnet_predict Function

To generate CUDA code for the resnet_predict entry-point function, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a TensorRT deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command and specify the input as a 4D matrix of size [224,224,3,batchSize]. This value corresponds to the input layer size of the ResNet-50 network.
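
The resnet_predict entry-point function is not listed in this example. A minimal sketch of such a function, assuming a persistent network object loaded with coder.loadDeepLearningNetwork, might look like this:

function out = resnet_predict(in)
%#codegen
% Sketch of an entry-point function for this example. The persistent
% network object avoids reloading ResNet-50 on every call.
persistent mynet;
if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('resnet50');
end
% in is an image batch of size [224,224,3,batchSize]
out = mynet.predict(in);
end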

batchSize = 5;
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg resnet_predict -args {ones(224,224,3,batchSize,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/resnet_predict/html/report.mldatx').

Perform Prediction on Test Image Batch

im = imread('peppers.png');
im = imresize(im, [224,224]);

Concatenate five copies of the image along the fourth dimension to form a batch, because batchSize is 5.

imBatch = cat(4,im,im,im,im,im);
predict_scores = resnet_predict_mex(imBatch);

Get the top five prediction scores and their labels for each image in the batch.

[val,indx] = sort(transpose(predict_scores), 'descend');
scores = val(1:5,:)*100;
net = resnet50;
classnames = net.Layers(end).ClassNames;
for i = 1:batchSize
    labels = classnames(indx(1:5,i));
    disp(['Top 5 predictions on image, ', num2str(i)]);
    disp(labels);
end
Top 5 predictions on image, 1
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }

Top 5 predictions on image, 2
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }

Top 5 predictions on image, 3
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }

Top 5 predictions on image, 4
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }

Top 5 predictions on image, 5
    {'bell pepper' }
    {'cucumber'    }
    {'lemon'       }
    {'acorn squash'}
    {'hamper'      }

Clear the static network object that was loaded in memory.

clear mex;