View the network prediction and performance data for the layers, convolution module, and fully connected module of your pretrained series network. This example shows how to retrieve the prediction and profiler results for the VGG-19 network.
Create a workflow object by using the dlhdl.Workflow class.
Set a pretrained deep learning network and bitstream for the workflow object.
Create an object of class dlhdl.Target and specify the target vendor and interface.
To deploy the network on a specified target FPGA board, call the deploy method for the workflow object.
Call the predict function for the workflow object. Provide an array of images as the InputImage parameter. To turn on the profiler, set the 'Profile' argument to 'on'.
The labels classifying the images are stored in a structure (struct) and displayed on the screen. The performance parameters of speed and latency are returned in a structure (struct).
Use this image to run the code:
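% Load the pretrained network and configure the target and workflow.
snet = vgg19;
hT = dlhdl.Target('Intel');
hW = dlhdl.Workflow('Net', snet, 'Bitstream', 'arria10soc_single', 'Target', hT);
hW.deploy;

% Read the test image and resize it to the 224-by-224 input size used here.
% The variable is named img to avoid shadowing the built-in image function.
img = imread('zebra.jpeg');
inputImg = imresize(img, [224, 224]);
imshow(inputImg);

% Run prediction with the profiler turned on, then look up the top class.
[prediction, speed] = hW.predict(single(inputImg), 'Profile', 'on');
[val, idx] = max(prediction);
snet.Layers(end).ClassNames{idx}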
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------              -------------         ---------     ---------     ---------
Network                    166206640                   1.10804                1         166206873        0.9
    conv_module            156100737                   1.04067
        conv1_1              2174602                   0.01450
        conv1_2             15580687                   0.10387
        pool1                1976185                   0.01317
        conv2_1              7534356                   0.05023
        conv2_2             14623885                   0.09749
        pool2                1171628                   0.00781
        conv3_1              7540868                   0.05027
        conv3_2             14093791                   0.09396
        conv3_3             14093717                   0.09396
        conv3_4             14094381                   0.09396
        pool3                 766669                   0.00511
        conv4_1              6999620                   0.04666
        conv4_2             13725380                   0.09150
        conv4_3             13724671                   0.09150
        conv4_4             13725125                   0.09150
        pool4                 465360                   0.00310
        conv5_1              3424060                   0.02283
        conv5_2              3423759                   0.02283
        conv5_3              3424758                   0.02283
        conv5_4              3424461                   0.02283
        pool5                 113010                   0.00075
    fc_module               10105903                   0.06737
        fc6                  8397997                   0.05599
        fc7                  1370215                   0.00913
        fc8                   337689                   0.00225
 * The clock frequency of the DL processor is: 150MHz

ans =

    'zebra'
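To see where the time goes, the module-level rows of the profiler output can be compared against the network total. The following is a minimal sketch that hard-codes the cycle counts printed above; the variable names are illustrative only and are not fields returned by predict.

```matlab
% Cycle counts copied by hand from the profiler output of this example.
% These names are illustrative; they are not read from the speed output.
networkCycles = 166206640;   % Network, LastLayerLatency(cycles)
convCycles    = 156100737;   % conv_module, LastLayerLatency(cycles)
fcCycles      = 10105903;    % fc_module, LastLayerLatency(cycles)

% Fraction of the network latency spent in each module.
convShare = convCycles / networkCycles;
fcShare   = fcCycles / networkCycles;

fprintf('conv_module: %.1f%% of network latency\n', 100*convShare);
fprintf('fc_module:   %.1f%% of network latency\n', 100*fcShare);
```

In this run the convolution module accounts for roughly 94% of the network cycles, so the convolution layers are the most likely place to look when tuning for latency.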
The profiler data returns these parameters and their values:
LastLayerLatency(cycles) - Total number of clock cycles for layer or module execution.
Clock frequency - Clock frequency information is retrieved from the bitstream that was used to deploy the network to the target board. For example, the profiler returns "* The clock frequency of the DL processor is: 150MHz". The clock frequency of 150 MHz is retrieved from the arria10soc_single bitstream.
LastLayerLatency(seconds) - Total number of seconds for layer or module execution. The total time is calculated as LastLayerLatency(cycles)/Clock Frequency. For example, the conv_module LastLayerLatency(seconds) is calculated as 156100737/(150*10^6).
FramesNum - Total number of input frames to the network. This value is used in the calculation of Frames/s.
Total Latency - Total number of clock cycles to execute all the network layers and modules for FramesNum input frames.
Frames/s - Number of frames processed in one second by the network. The total Frames/s is calculated as (FramesNum*Clock Frequency)/Total Latency. For example, the Frames/s in this example is calculated as (1*150*10^6)/166206873.
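The two formulas above can be checked against the profiler output with a few lines of arithmetic. This is a minimal sketch that uses values copied by hand from this example; the variable names are illustrative and are not part of the dlhdl API.

```matlab
% Values copied from the profiler output in this example.
clockHz     = 150e6;       % DL processor clock frequency (150 MHz)
convCycles  = 156100737;   % conv_module, LastLayerLatency(cycles)
framesNum   = 1;           % FramesNum
totalCycles = 166206873;   % Total Latency, in clock cycles

% LastLayerLatency(seconds) = LastLayerLatency(cycles) / Clock Frequency
convSeconds = convCycles / clockHz;

% Frames/s = (FramesNum * Clock Frequency) / Total Latency
framesPerSecond = (framesNum * clockHz) / totalCycles;

fprintf('conv_module latency: %.5f s\n', convSeconds);
fprintf('Frames/s: %.1f\n', framesPerSecond);
```

The printed values agree with the conv_module row (1.04067) and the Frames/s column (0.9) of the profiler table. The same arithmetic applies to any other row once its cycle count is substituted.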
See also: dlhdl.Target | dlhdl.Workflow | predict