This example shows how to train, compile, and deploy a modified quantized AlexNet pretrained series network by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Quantization helps reduce the memory requirement of a deep neural network by quantizing the weights, biases, and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results from the target device.
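As a rough illustration of the scaled-integer idea (the numbers below are hypothetical and are not taken from this network), an 8-bit value paired with a power-of-two scale factor can approximate a floating-point weight:

% Hypothetical illustration of a scaled 8-bit integer representation.
% A real-valued weight near 0.89 can be stored as an int8 value with a
% 2^-7 scale factor.
w_int8 = int8(114);
scale  = 2^-7;
w_real = double(w_int8)*scale   % 0.8906, an 8-bit approximation of 0.89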
To run this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.
Create a modified series network by using transfer learning. For more information, see Create Series Network for Quantization.
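If you have not created the network yet, the outline below is a minimal sketch of the transfer-learning step, assuming a labeled training datastore such as the imdsTrain created in a later step of this example. The layer surgery follows the usual AlexNet transfer-learning pattern, and the training hyperparameters are illustrative; see the linked example for the exact steps.

% Minimal transfer-learning sketch (illustrative values). Replace the final
% fully connected, softmax, and classification layers of the pretrained
% AlexNet so the network predicts the logo classes.
snet = alexnet;                         % pretrained series network
layersTransfer = snet.Layers(1:end-3);  % keep all but the last three layers
numClasses = numel(categories(imdsTrain.Labels));
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm','MiniBatchSize',10,'MaxEpochs',5,'InitialLearnRate',1e-4);
augTrain = augmentedImageDatastore([227 227],imdsTrain);  % resize to the AlexNet input size
netTransfer = trainNetwork(augTrain,layers,options);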
Create a dlquantizer object and specify the network to quantize and the ExecutionEnvironment. The netTransfer network is the modified series network created by transfer learning. To create the netTransfer series network, see Create Series Network for Quantization.
dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');
Unzip and load the new images as an image datastore. imageDatastore automatically labels the images based on folder names and stores the data as an ImageDatastore object. An image datastore enables you to store large image data, including data that does not fit in memory, and to efficiently read batches of images during training of a convolutional neural network.
Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. splitEachLabel splits the image datastore into two new datastores.
curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imds = imageDatastore('logos_dataset', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
Use the calibrate function to run the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network. For the best quantization results, the calibration data must be representative of actual inputs to the network during prediction.
imageData = imageDatastore(fullfile(curDir,'logos_dataset'), ...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
dlQuantObj.calibrate(imageData);
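To inspect the range table that calibrate returns, capture its output. A minimal sketch, where the variable name calResults is illustrative:

% Capture and preview the calibration statistics. Each row reports the
% measured minimum and maximum values for a learnable parameter or activation.
calResults = dlQuantObj.calibrate(imageData);
head(calResults)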
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','192.168.1.101');
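If your board is connected over JTAG instead of Ethernet, the equivalent call should look like this sketch; no IP address is needed:

% Alternative: connect to the target device over JTAG.
hTarget = dlhdl.Target('Xilinx','Interface','JTAG');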
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify dlQuantObj as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the int8 data type.
hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'zcu102_int8','Target',hTarget);
Compile the quantized series network.
dn = hW.compile
          offset_name          offset_address     allocated_space
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "48.0 MB"
    "OutputResultOffset"        "0x03000000"     "4.0 MB"
    "SystemBufferOffset"        "0x03400000"     "28.0 MB"
    "InstructionDataOffset"     "0x05000000"     "4.0 MB"
    "ConvWeightDataOffset"      "0x05400000"     "4.0 MB"
    "FCWeightDataOffset"        "0x05800000"     "56.0 MB"
    "EndOffset"                 "0x09000000"     "Total: 144.0 MB"

dn = 
  struct with fields:

       Operators: [1×1 struct]
    LayerConfigs: [1×1 struct]
      NetConfigs: [1×1 struct]
Run the deploy function of the dlhdl.Workflow object to deploy the network on the Xilinx ZCU102 SoC hardware. This function uses the output of the compile function to program the FPGA board by using the programming file and downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and reports the time it takes to deploy the network.
hW.deploy
Load the example images and retrieve the prediction results.
idx = randperm(numel(imdsValidation.Files),4);
figure
for i = 1:4
    subplot(2,2,i)
    I = readimage(imdsValidation,idx(i));
    imshow(I)
    [prediction, speed] = hW.predict(single(I),'Profile','on');
    [val, index] = max(prediction);
    netTransfer.Layers(end).ClassNames{index}
    label = netTransfer.Layers(end).ClassNames{index};
    title(string(label));
end
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615557                  0.05077                    1         7616123         19.7
    conv_module            3123657                  0.02082
        conv1               733903                  0.00489
        norm1               485953                  0.00324
        pool1               108979                  0.00073
        conv2               631639                  0.00421
        norm2               289646                  0.00193
        pool2               115286                  0.00077
        conv3               307112                  0.00205
        conv4               249627                  0.00166
        conv5               176223                  0.00117
        pool5                25404                  0.00017
    fc_module              4491900                  0.02995
        fc6                3083885                  0.02056
        fc7                1370258                  0.00914
        fc                   37755                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'carlsberg'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615364                  0.05077                    1         7615905         19.7
    conv_module            3123385                  0.02082
        conv1               733946                  0.00489
        norm1               485695                  0.00324
        pool1               108971                  0.00073
        conv2               631616                  0.00421
        norm2               289612                  0.00193
        pool2               115363                  0.00077
        conv3               307034                  0.00205
        conv4               249683                  0.00166
        conv5               176216                  0.00117
        pool5                25364                  0.00017
    fc_module              4491979                  0.02995
        fc6                3083961                  0.02056
        fc7                1370258                  0.00914
        fc                   37758                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'pepsi'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615042                  0.05077                    1         7615582         19.7
    conv_module            3123107                  0.02082
        conv1               733949                  0.00489
        norm1               485783                  0.00324
        pool1               108565                  0.00072
        conv2               631567                  0.00421
        norm2               289568                  0.00193
        pool2               115037                  0.00077
        conv3               307355                  0.00205
        conv4               249793                  0.00167
        conv5               176217                  0.00117
        pool5                25388                  0.00017
    fc_module              4491935                  0.02995
        fc6                3083920                  0.02056
        fc7                1370258                  0.00914
        fc                   37755                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'tsingtao'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615303                  0.05077                    1         7615843         19.7
    conv_module            3123324                  0.02082
        conv1               733883                  0.00489
        norm1               485688                  0.00324
        pool1               108995                  0.00073
        conv2               631598                  0.00421
        norm2               289636                  0.00193
        pool2               115351                  0.00077
        conv3               307108                  0.00205
        conv4               249623                  0.00166
        conv5               176193                  0.00117
        pool5                25364                  0.00017
    fc_module              4491979                  0.02995
        fc6                3083961                  0.02056
        fc7                1370258                  0.00914
        fc                   37758                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'singha'
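To go beyond spot checks of individual images, a minimal sketch of scoring the deployed network over the whole validation set follows; the loop structure and the numCorrect and accuracy variable names are illustrative and not part of the original example.

% Hedged sketch: classify every validation image on the FPGA and compute
% top-1 accuracy. Assumes hW, imdsValidation, and netTransfer exist as above.
numImages = numel(imdsValidation.Files);
numCorrect = 0;
for k = 1:numImages
    I = readimage(imdsValidation,k);
    scores = hW.predict(single(I));   % run inference on the board
    [~, index] = max(scores);
    predLabel = netTransfer.Layers(end).ClassNames{index};
    numCorrect = numCorrect + (string(predLabel) == string(imdsValidation.Labels(k)));
end
accuracy = numCorrect/numImages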