This example shows how to integrate the CUDA® code generated for a deep learning network into Simulink®. GPU Coder™ does not support code generation for Simulink blocks, but you can still use the computational power of GPUs in Simulink by generating a dynamically linked library (DLL) with GPU Coder and then integrating it into Simulink as an S-Function block by using the Legacy Code Tool. For more information, see legacy_code. To illustrate this concept, the example builds on the Lane Detection Optimized with GPU Coder example. The original example used a C++ file with OpenCV functions to read the frames, draw lanes, and overlay frame rate information on the video output. This example uses Simulink blocks from the Computer Vision System Toolbox™ to perform the same operations.
CUDA-enabled NVIDIA® GPU with compute capability 3.2 or higher.
NVIDIA CUDA toolkit and driver.
NVIDIA cuDNN library.
Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products. For setting up the environment variables, see Setting Up the Prerequisite Products.
GPU Coder™ Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.
Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
This diagram illustrates the general procedure for using the Legacy Code Tool to integrate the CUDA code generated for a deep learning network into Simulink.
[laneNet,coeffMeans,coeffStds] = getLaneDetectionNetwork();
The architecture of the pretrained SeriesNetwork is similar to AlexNet, except that the last few layers are replaced by a smaller, fully connected layer and a regression output layer. This network takes an image input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation $y = ax^2 + bx + c$. Here, $y$ is the lateral offset and $x$ is the longitudinal distance from the vehicle. The network outputs the three parameters $a$, $b$, and $c$ that describe the parabolic equation for each of the left and right lane boundaries. The variables coeffMeans and coeffStds contain the mean and standard deviation values from the trained network. These values are required during simulation.
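As a rough illustration of why both vectors are needed at simulation time, the following sketch rescales a 1-by-6 network output back to lane coefficients. It assumes that the network was trained on z-score normalized targets and that the first three values belong to the left lane; the source states neither explicitly. The names means, stds, and netOut and all numeric values are made up for illustration.

```matlab
% Hypothetical values, for illustration only (not taken from the trained network).
means  = [0 0 1.8 0 0 -1.8];               % assumed per-coefficient means
stds   = [0.001 0.05 0.5 0.001 0.05 0.5];  % assumed per-coefficient standard deviations
netOut = [0.5 -1 0.2 -0.5 1 -0.2];         % assumed normalized network output

% Undo an assumed z-score normalization: value = normalized * std + mean.
params = netOut .* stds + means;

leftCoeffs  = params(1:3);   % assumed ordering: [a b c] for the left lane
rightCoeffs = params(4:6);   % assumed ordering: [a b c] for the right lane
disp(leftCoeffs);
disp(rightCoeffs);
```

If the network uses a different normalization or coefficient ordering, only the rescaling line and the two index ranges would need to change.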
This example uses the detect_lane.m entry-point function. The detect_lane function computes the $x$ and $y$ coordinates corresponding to the lane positions from the $a$, $b$, and $c$ parameters. The detect_lane function also performs computations that map the $x$ and $y$ coordinates to image coordinates.
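The coordinate computation can be sketched as follows, taking the lane model as the parabola y = a*x^2 + b*x + c with x the longitudinal distance and y the lateral offset. This is a minimal sketch and not the contents of detect_lane.m: the coefficient values and the 3 m to 30 m sampling range are assumptions chosen for illustration, and the mapping to image coordinates is omitted because it depends on camera parameters that are not given here.

```matlab
% Hypothetical parabola coefficients [a b c] for one lane boundary.
coeffs = [0.001 -0.02 1.8];

% Longitudinal distances ahead of the vehicle, in meters (assumed range).
x = 3:30;

% Lateral offset y = a*x^2 + b*x + c, evaluated with polyval.
y = polyval(coeffs, x);

% Each row is one [x y] point in vehicle coordinates.
lanePts = [x(:) y(:)];
disp(size(lanePts));
```

With this assumed range, each lane yields 28 points, or 56 coordinate values. That count happens to equal the length of the y2 and y3 outputs in the function specification used later, although the source does not say how those outputs are laid out.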
To run the detect_lane function on the GPU from Simulink, generate a shared library by using GPU Coder. The inputs to the detect_lane function are the video frame and the mean and standard deviation values. The values passed by using the -args option specify the size and type of these inputs: a 227-by-227-by-3 single-precision frame and two 1-by-6 double-precision vectors. After code generation, copy the generated library to the current working folder.
Isize = single(zeros(227,227));
cfg = coder.gpuConfig('dll');
cfg.TargetLang = 'C++';
cfg.GenerateReport = true;
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -args {ones(227,227,3,'single'),ones(1,6,'double'),ones(1,6,'double')} -config cfg detect_lane

% Copy the generated library to the current working folder
if ispc
    copyfile(fullfile(pwd, 'codegen','dll', 'detect_lane','detect_lane.dll'), pwd);
else
    copyfile(fullfile(pwd, 'codegen','dll', 'detect_lane','detect_lane.so'), pwd);
end
Code generation successful: To view the report, open('codegen/dll/detect_lane/html/report.mldatx').
The lane detection example depends on the NVIDIA CUDA runtime, cuBLAS, and cuDNN libraries. The Legacy Code Tool data structure specifies:
A name for the S-function
Specifications for the existing C++ function
All library and header files required for compilation and the file paths
Options for the generated S-function
After defining the structure, use the legacy_code function to:
Initialize the Legacy Code Tool data structure for the C++ function
Generate an S-function for use during simulation
Compile and link the generated S-function into a dynamically loadable executable (MEX)
Generate a masked S-function block for calling the generated S-function
srcPath = fullfile(pwd, 'codegen', 'dll', 'detect_lane');

if ispc
    % Windows: locate CUDA and cuDNN through environment variables
    cuPath = getenv('CUDA_PATH');
    cudaLibPath = fullfile(cuPath,'lib','x64');
    cudaIncPath = fullfile(cuPath,'include');
    cudnnPath = getenv('NVIDIA_CUDNN');
    cudnnIncPath = fullfile(cudnnPath,'include');
    cudnnLibPath = fullfile(cudnnPath,'lib','x64');
    % cuBLAS ships with the CUDA toolkit on Windows; reuse its folders so
    % that the path lists below are also defined on this platform
    cublasLibPath = cudaLibPath;
    cublasIncPath = cudaIncPath;
    libs = {'detect_lane.lib','cudart.lib','cublas.lib','cudnn.lib'};
else
    % Linux: derive the CUDA root folder from the location of nvcc
    [~,nvccPath] = system('which nvcc');
    nvccPath = regexp(nvccPath, '[\f\n\r]', 'split');
    cuPath = erase(nvccPath{1},'/bin/nvcc');
    cudaLibPath = fullfile(cuPath,'lib64');
    cudaIncPath = fullfile(cuPath,'include');
    cudnnPath = getenv('NVIDIA_CUDNN');
    cudnnIncPath = fullfile(cudnnPath,'include');
    cudnnLibPath = fullfile(cudnnPath,'lib64');
    % Find the cuBLAS library folder from the dynamic linker cache
    [~,cmdout] = system('ldconfig -p | grep "libcublas.so "');
    pathStrIdx = strfind(cmdout,'/usr/');
    cublasLibPath = fileparts(strtrim(cmdout(pathStrIdx(1):end)));
    cublasIncPath = '/usr/include';
    libs = {'detect_lane.so','libcudart.so','libcublas.so','libcudnn.so'};
end

headerPath = {srcPath;cudnnIncPath;cudaIncPath;cublasIncPath};
libPath = {srcPath;cudnnLibPath;cudaLibPath;cublasLibPath};

% Define the Legacy Code Tool data structure
def = legacy_code('initialize');
def.SFunctionName = 'lane_detect_sfun';
def.OutputFcnSpec = 'void detect_lane(single u1[154587],double u2[6],double u3[6],uint8 y1[1],single y2[56],single y3[56])';
def.IncPaths = headerPath;
def.HeaderFiles = {'detect_lane.h'};
def.LibPaths = libPath;
def.HostLibFiles = libs;
def.Options.useTlcWithAccel = false;
def.Options.language = 'C++';

% Generate the S-function source, then compile and link it into a MEX file
legacy_code('sfcn_cmex_generate', def);
status = evalc("legacy_code('compile', def)");
The OutputFcnSpec argument specifies the function that the S-function calls at each time step. The detect_lane.h header file in the codegen folder provides the function specification information. Map the detect_lane function arguments to the Simulink S-Function block by using uniquely numbered u tokens for input ports and y tokens for output ports. The code generation data types defined in tmwtypes.h must also be mapped to the data types that Simulink supports. For more information, see Declaring Legacy Code Tool Function Specifications. Because this example already contains a complete Simulink model, generation of the S-Function block is not performed. To generate the S-Function block, use:
legacy_code('slblock_generate', def);
In the main_lanenet model, all the pre- and post-processing operations from the main_lanenet.cpp file of the original example are moved into Simulink. The Input Video Processing subsystem removes the normalization performed by the multimedia reader block and resizes the input video frame to the input layer size of the lane detection network, 227-by-227-by-3. The subsystem then converts the three-dimensional video frame into the one-dimensional vector required by the detect_lane library. The Lane Points enabled subsystem processes the left and right lane points to make them suitable for the Draw Lanes block. The Simulink model uses a video display to show lane detection on a sample video.
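The conversion from a three-dimensional frame to a one-dimensional vector can be sketched in MATLAB as follows. This is only an illustration of the size arithmetic behind the single u1[154587] input in the function specification; the placeholder frame is random data, and the sketch assumes MATLAB column-major ordering, which may or may not match the element ordering used inside the Input Video Processing subsystem.

```matlab
% Placeholder frame with the network input size (values are arbitrary).
frame = rand(227, 227, 3, 'single');

% Number of elements the S-function input port must carry.
numElems = prod([227 227 3]);   % 154587, the length used in single u1[154587]

% Column-major flattening into a one-dimensional vector.
frameVec = reshape(frame, [], 1);

% The original shape is recoverable as long as the same ordering is used.
restored = reshape(frameVec, 227, 227, 3);
assert(isequal(restored, frame));
disp(numElems);
```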
open_system('main_lanenet');
set_param('main_lanenet', 'SimulationCommand', 'update');
To see lane detection on a sample video, run the simulation.
sim('main_lanenet', 'timeout', 30);
Close the Simulink model.
close_system('main_lanenet');
See also: codegen | coder.CuDNNConfig | coder.DeepLearningConfig | coder.gpuEnvConfig | coder.loadDeepLearningNetwork