This example shows how to generate CUDA® code from a DAGNetwork object and deploy the generated code onto the NVIDIA® Jetson TX2 board using the GPU Coder™ Support Package for NVIDIA GPUs. This example uses the resnet50 deep learning network to classify images from a USB webcam video stream.
Target Board Requirements
NVIDIA Jetson Tegra TX2 embedded platform.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
USB camera to connect to the TX2.
NVIDIA CUDA toolkit installed on the board.
NVIDIA cuDNN library (v5 or higher) on the target.
OpenCV 3.0 (or higher) library on the target for reading and displaying images/video.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards (GPU Coder Support Package for NVIDIA GPUs).
Development Host Requirements
NVIDIA CUDA toolkit and driver.
Deep Learning Toolbox™ to use a DAGNetwork object.
GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.
GPU Coder Support Package for NVIDIA GPUs. To install this support package, use the Add-On Explorer.
Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products. For setting up the environment variables, see Setting Up the Prerequisite Products.
Use the checkHardwareSupportPackageInstall function to verify that the host system can run this example.
checkHardwareSupportPackageInstall();
The GPU Coder Support Package for NVIDIA GPUs uses an SSH connection over TCP/IP to execute commands while building and running the generated CUDA code on the Jetson platform. You must therefore connect the target platform to the same network as the host computer or use an Ethernet crossover cable to connect the board directly to the host computer. Refer to the NVIDIA documentation on how to set up and configure your board.
To communicate with the NVIDIA hardware, you must create a live hardware connection object by using the jetson function. To create this object, you must know the host name or IP address, user name, and password of the target board.
hwobj = jetson('host-name','username','password');
NOTE:
If the connection fails, a diagnostic error message is reported on the MATLAB command line. The most likely cause is an incorrect IP address or host name.
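The connection step can be wrapped so that a failure is reported with a hint before the error is passed on. This is a minimal sketch, not part of the original example; it assumes only that jetson throws a MATLAB error when the connection fails, and the host name and credentials are the same placeholders used above.

```matlab
% Sketch: report a connection failure with a hint, then rethrow.
% 'host-name', 'username', and 'password' are placeholders.
try
    hwobj = jetson('host-name','username','password');
catch connErr
    fprintf('Could not connect to the board: %s\n', connErr.message);
    fprintf('Check the IP address or host name first.\n');
    rethrow(connErr);
end
```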
When there are multiple live connection objects for different targets, the code generator performs the remote build on the target for which the most recent live object was created. To choose a hardware board for the remote build, use the setupCodegenContext() method of the respective live hardware object. If only one live connection object was created, you do not need to call this method.
hwobj.setupCodegenContext;
Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('jetson');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
envCfg.HardwareObject = hwobj;
coder.checkGpuInstall(envCfg);
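Because Quiet is set to 1, the check prints nothing. One way to see what was checked is to capture the return value of coder.checkGpuInstall and display it. The sketch below assumes that a live connection to the board is available; the field names in the returned structure are not listed in this example, so the code simply displays whatever is returned.

```matlab
% Sketch: capture and display the results of the environment check.
% The connection arguments are placeholders, as in the step above.
if ~exist('hwobj','var')
    hwobj = jetson('host-name','username','password');
end

envCfg = coder.gpuEnvConfig('jetson');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
envCfg.HardwareObject = hwobj;

results = coder.checkGpuInstall(envCfg);
disp(results)   % structure summarizing which checks passed
```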
The resnet50_wrapper.m entry-point function uses a pretrained ResNet-50 network to classify images. ResNet-50 is a DAG network trained on more than a million images from the ImageNet database. The output contains the classification score for each class that the image can belong to.
type resnet50_wrapper
function out = resnet50_wrapper(im) %#codegen
% Wrapper function to call ResNet50 predict function.

% Copyright 2019 The MathWorks, Inc.

% This example uses OpenCV for reading frames from a web camera
% and displaying output image. Update buildinfo to link with
% OpenCV library available on target.
opencv_link_flags = '`pkg-config --cflags --libs opencv`';
coder.updateBuildInfo('addLinkFlags',opencv_link_flags);

% To avoid multiple loads of the network for each run, we use
% persistent rnet
persistent rnet;
if isempty(rnet)
    rnet = resnet50();
end
out = rnet.predict(im);

end
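The entry point returns one score per class, so a caller usually needs the indices of the highest scores. The sketch below shows that step on a small synthetic score vector; the scores and the label list are made up for illustration and are not ImageNet data. In practice, the scores would come from a call such as resnet50_wrapper(im) and the labels from the synset words file.

```matlab
% Synthetic stand-in for the 1-by-N score vector returned by the entry point.
scores = [0.02 0.10 0.55 0.03 0.25 0.05];
% Hypothetical labels, one per score.
classNames = {'cat','dog','bell pepper','car','cucumber','tree'};

% Sort the scores in descending order and keep the top K entries.
topK = 3;
[sortedScores, order] = sort(scores, 'descend');
topIdx = order(1:topK);
topScores = sortedScores(1:topK);

% Print rank, label, and score for each of the top K classes.
for k = 1:topK
    fprintf('%d. %-12s %.2f\n', k, classNames{topIdx(k)}, topScores(k));
end
```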
This example uses resnet50_wrapper.m as the entry-point function for code generation. To generate a CUDA executable that can be deployed to an NVIDIA target, create a GPU code configuration object for generating an executable.
cfg = coder.gpuConfig('exe');
Use the coder.hardware function to create a configuration object for the Jetson platform and assign it to the Hardware property of the GPU code configuration object cfg.
cfg.Hardware = coder.hardware('NVIDIA Jetson');
Set the deep learning configuration to 'cudnn' or 'tensorrt'.
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
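For comparison, selecting the other supported library looks as follows. This is a sketch under the assumption that TensorRT is installed on the target; the environment check shown earlier would then also need DeepLibTarget set to 'tensorrt' rather than 'cudnn'.

```matlab
% Sketch: same configuration steps, targeting TensorRT instead of cuDNN.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
```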
In this example, code generation uses a single image as the input. After deployment, however, the executable takes a webcam stream as its input.
Sample image input for code generation
im=single(imread('peppers.png'));
im=imresize(im,[224,224]);
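The sample image passed through -args fixes the size and data type of the entry-point input in the generated code, so it is worth confirming both before running codegen. A minimal check, assuming the 224-by-224-by-3 single-precision input that the steps above prepare:

```matlab
% Prepare the sample input exactly as above.
im = single(imread('peppers.png'));
im = imresize(im, [224, 224]);

% Confirm the size and class that codegen will lock in for the input.
assert(isequal(size(im), [224 224 3]), 'Expected a 224-by-224-by-3 image.');
assert(isa(im, 'single'), 'Expected single-precision data.');
```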
The custom main file takes video as input and classifies each frame in the video sequence. The custom main_resnet50.cu file is a wrapper that calls the predict function in the generated code. Postprocessing steps, such as displaying the output on the input frame, are added in the main file by using OpenCV interfaces.
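% Note: the second assignment overwrites the first, so only
% main_resnet50.cu remains registered as the custom source.
cfg.CustomSource = fullfile('main_resnet50.h');
cfg.CustomSource = fullfile('main_resnet50.cu');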
To generate CUDA code and deploy it onto the target, use the codegen function and pass the GPU code configuration object. After code generation takes place on the host, the generated files are copied over and built on the target in the workspace directory.
codegen -config cfg -args {im} resnet50_wrapper -report
Copy the synsetWords_resnet50 text file from the host computer to the target device by using the putFile method.
hwobj.putFile('synsetWords_resnet50.txt',hwobj.workspaceDir);
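To confirm that the file arrived, the workspace directory on the board can be listed over the same connection. This sketch assumes that the system method of the hardware object is available for running shell commands on the target; the connection arguments are placeholders.

```matlab
% Sketch: list the target workspace directory after copying the file.
if ~exist('hwobj','var')
    hwobj = jetson('host-name','username','password');
end

hwobj.putFile('synsetWords_resnet50.txt', hwobj.workspaceDir);
listing = system(hwobj, ['ls ' hwobj.workspaceDir]);
disp(listing)   % synsetWords_resnet50.txt should appear in the output
```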
Use the runApplication method of the hardware object to launch the application on the target hardware. The application is located in the workspace directory.
hwobj.runApplication('resnet50_wrapper');
Use the killApplication method of the hardware object to stop the running application on the target.
hwobj.killApplication('resnet50_wrapper');
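The two calls are typically used as a pair: launch the application, let it classify frames for a while, then stop it. In the sketch below, the 30-second run time is an arbitrary value chosen for illustration, and the connection arguments are placeholders.

```matlab
% Sketch: run the deployed classifier for a fixed time, then stop it.
if ~exist('hwobj','var')
    hwobj = jetson('host-name','username','password');
end

hwobj.runApplication('resnet50_wrapper');
pause(30);   % arbitrary run time in seconds
hwobj.killApplication('resnet50_wrapper');
```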