Profile Inference Run

View the network prediction and performance data for the layers, convolution module, and fully connected module of your pretrained series network. This example shows how to retrieve the prediction and profiler results for the VGG-19 network.

  1. Create a workflow object by using the dlhdl.Workflow class.

  2. Set a pretrained deep learning network and bitstream for the workflow object.

  3. Create an object of class dlhdl.Target and specify the target vendor and interface.

  4. To deploy the network on a specified target FPGA board, call the deploy method for the workflow object.

  5. Call the predict function for the workflow object. Provide an array of images as the InputImage parameter, and set the Profile argument to 'on' to turn on the profiler.

    The labels classifying the images are stored in a struct and displayed on the screen. The performance parameters of speed and latency are returned in a struct.

Use the image zebra.jpeg to run the code:

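snet = vgg19;                                  % pretrained VGG-19 series network
hT = dlhdl.Target('Intel');                    % target object for an Intel board
hW = dlhdl.Workflow('Net', snet, 'Bitstream', 'arria10soc_single', 'Target', hT);
hW.deploy;                                     % deploy the network to the target board
img = imread('zebra.jpeg');                    % named img to avoid shadowing the built-in image function
inputImg = imresize(img, [224, 224]);          % VGG-19 expects 224-by-224 input images
imshow(inputImg);
[prediction, speed] = hW.predict(single(inputImg), 'Profile', 'on');  % run inference with the profiler on
[val, idx] = max(prediction);                  % index of the highest-scoring class
snet.Layers(end).ClassNames{idx}               % display the predicted label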

### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                  166206640                  1.10804                       1          166206873              0.9
    conv_module          156100737                  1.04067 
        conv1_1            2174602                  0.01450 
        conv1_2           15580687                  0.10387 
        pool1              1976185                  0.01317 
        conv2_1            7534356                  0.05023 
        conv2_2           14623885                  0.09749 
        pool2              1171628                  0.00781 
        conv3_1            7540868                  0.05027 
        conv3_2           14093791                  0.09396 
        conv3_3           14093717                  0.09396 
        conv3_4           14094381                  0.09396 
        pool3               766669                  0.00511 
        conv4_1            6999620                  0.04666 
        conv4_2           13725380                  0.09150 
        conv4_3           13724671                  0.09150 
        conv4_4           13725125                  0.09150 
        pool4               465360                  0.00310 
        conv5_1            3424060                  0.02283 
        conv5_2            3423759                  0.02283 
        conv5_3            3424758                  0.02283 
        conv5_4            3424461                  0.02283 
        pool5               113010                  0.00075 
    fc_module             10105903                  0.06737 
        fc6                8397997                  0.05599 
        fc7                1370215                  0.00913 
        fc8                 337689                  0.00225 
 * The clock frequency of the DL processor is: 150MHz



ans =

    'zebra'
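
The module rows in the table above can be compared against the Network row to see where the cycles are spent. The following is a minimal sketch that uses only numbers copied by hand from this particular output; the variable names are illustrative and are not part of the toolbox API.

```matlab
% Cycle counts copied from the profiler table above.
networkCycles = 166206640;   % Network row, LastLayerLatency(cycles)
convCycles    = 156100737;   % conv_module row
fcCycles      = 10105903;    % fc_module row

% Fraction of the network latency spent in each module.
convShare = convCycles / networkCycles;
fcShare   = fcCycles / networkCycles;

fprintf('conv_module: %.1f%% of network cycles\n', 100*convShare);
fprintf('fc_module:   %.1f%% of network cycles\n', 100*fcShare);
```

For this run, the convolution module accounts for roughly 94% of the network cycles, so it is the natural place to look first when tuning performance.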
 

The profiler data returns these parameters and their values:

  • LastLayerLatency(cycles) - Total number of clock cycles for layer or module execution.

  • Clock frequency - Clock frequency information is retrieved from the bitstream that was used to deploy the network to the target board. For example, the profiler returns * The clock frequency of the DL processor is: 150MHz. The clock frequency of 150 MHz is retrieved from the arria10soc_single bitstream.

  • LastLayerLatency(seconds) - Total number of seconds for layer or module execution. The total time is calculated as LastLayerLatency(cycles)/Clock Frequency. For example, the conv_module LastLayerLatency(seconds) is calculated as 156100737/(150*10^6), which is approximately 1.04067 seconds.

  • FramesNum - Total number of input frames to the network. This value is used in the calculation of Frames/s.

  • Total Latency - Total number of clock cycles to execute all the network layers and modules for FramesNum.

  • Frames/s - Number of frames processed in one second by the network. The total Frames/s is calculated as (FramesNum*Clock Frequency)/Total Latency. For example, the Frames/s in this example is calculated as (1*150*10^6)/166206873, which is approximately 0.9.
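
The two formulas in this list can be checked by hand against the profiler output. The following is a minimal sketch that hard-codes values copied from the table above; the variable names are illustrative and are not fields returned by the predict function.

```matlab
% Values copied from the profiler output above.
clockHz      = 150e6;       % DL processor clock frequency (150 MHz)
convCycles   = 156100737;   % conv_module LastLayerLatency(cycles)
framesNum    = 1;           % FramesNum
totalLatency = 166206873;   % Total Latency (cycles)

% LastLayerLatency(seconds) = LastLayerLatency(cycles) / Clock Frequency
convSeconds = convCycles / clockHz;

% Frames/s = (FramesNum * Clock Frequency) / Total Latency
framesPerSecond = (framesNum * clockHz) / totalLatency;

fprintf('conv_module latency: %.5f s\n', convSeconds);
fprintf('Frames/s: %.1f\n', framesPerSecond);
```

The printed values match the conv_module row (1.04067 seconds) and the Frames/s column (0.9) of the profiler table.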
