This example shows how to train, compile, and deploy a modified quantized AlexNet pretrained series network by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Quantization helps reduce the memory requirement of a deep neural network by quantizing the weights, biases, and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results from the target device.
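As a rough illustration of the scaled-integer idea (the numbers below are hypothetical and are not taken from this network), an 8-bit value paired with a power-of-two scale factor can approximate a floating-point weight:

% Hypothetical illustration of a scaled 8-bit integer representation.
% A real-valued weight near 0.89 can be stored as an int8 value with a
% 2^-7 scale factor.
w_int8 = int8(114);
scale  = 2^-7;
w_real = double(w_int8)*scale   % 0.8906, an 8-bit approximation of 0.89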
To run this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.
Create a modified series network by using transfer learning. For more information, see Create Series Network for Quantization.
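If you have not created the network yet, the outline below is a minimal sketch of the transfer-learning step, assuming a labeled training datastore such as the imdsTrain created in a later step of this example. The layer surgery follows the usual AlexNet transfer-learning pattern, and the training hyperparameters are illustrative; see the linked example for the exact steps.

% Minimal transfer-learning sketch (illustrative values). Replace the final
% fully connected, softmax, and classification layers of the pretrained
% AlexNet so the network predicts the logo classes.
snet = alexnet;                         % pretrained series network
layersTransfer = snet.Layers(1:end-3);  % keep all but the last three layers
numClasses = numel(categories(imdsTrain.Labels));
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm','MiniBatchSize',10,'MaxEpochs',5,'InitialLearnRate',1e-4);
augTrain = augmentedImageDatastore([227 227],imdsTrain);  % resize to the AlexNet input size
netTransfer = trainNetwork(augTrain,layers,options);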
Create a dlquantizer object and specify the network to quantize and the ExecutionEnvironment. The netTransfer network is the modified series network created by transfer learning. To create the netTransfer series network, see Create Series Network for Quantization.
dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');
Unzip and load the new images as an image datastore. imageDatastore automatically labels the images based on folder names and stores the data as an ImageDatastore object. An image datastore enables you to store large image data, including data that does not fit in memory, and to efficiently read batches of images during training of a convolutional neural network.
Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. splitEachLabel splits the image datastore into two new datastores.
curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imds = imageDatastore('logos_dataset', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
Use the calibrate function to run the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network. For the best quantization results, the calibration data must be representative of actual inputs to the network during prediction.
imageData = imageDatastore(fullfile(curDir,'logos_dataset'), ...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
dlQuantObj.calibrate(imageData);
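To inspect the range table that calibrate returns, capture its output. A minimal sketch, where the variable name calResults is illustrative:

% Capture and preview the calibration statistics. Each row reports the
% measured minimum and maximum values for a learnable parameter or activation.
calResults = dlQuantObj.calibrate(imageData);
head(calResults)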
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','192.168.1.101');
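If your board is connected over JTAG instead of Ethernet, the equivalent call should look like this sketch; no IP address is needed:

% Alternative: connect to the target device over JTAG.
hTarget = dlhdl.Target('Xilinx','Interface','JTAG');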
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify dlQuantObj as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the int8 data type.
hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'zcu102_int8','Target',hTarget);
Compile the quantized series network.
dn = hW.compile
          offset_name          offset_address     allocated_space
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "48.0 MB"
    "OutputResultOffset"        "0x03000000"     "4.0 MB"
    "SystemBufferOffset"        "0x03400000"     "28.0 MB"
    "InstructionDataOffset"     "0x05000000"     "4.0 MB"
    "ConvWeightDataOffset"      "0x05400000"     "4.0 MB"
    "FCWeightDataOffset"        "0x05800000"     "56.0 MB"
    "EndOffset"                 "0x09000000"     "Total: 144.0 MB"

dn = 
  struct with fields:

       Operators: [1×1 struct]
    LayerConfigs: [1×1 struct]
      NetConfigs: [1×1 struct]
Run the deploy function of the dlhdl.Workflow object to deploy the network on the Xilinx ZCU102 SoC hardware. This function uses the output of the compile function to program the FPGA board by using the programming file and downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and reports the time it takes to deploy the network.
hW.deploy
Load the example images and retrieve the prediction results.
idx = randperm(numel(imdsValidation.Files),4);
figure
for i = 1:4
    subplot(2,2,i)
    I = readimage(imdsValidation,idx(i));
    imshow(I)
    [prediction, speed] = hW.predict(single(I),'Profile','on');
    [val, index] = max(prediction);
    netTransfer.Layers(end).ClassNames{index}
    label = netTransfer.Layers(end).ClassNames{index};
    title(string(label));
end
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615557                  0.05077                    1         7616123         19.7
    conv_module            3123657                  0.02082
        conv1               733903                  0.00489
        norm1               485953                  0.00324
        pool1               108979                  0.00073
        conv2               631639                  0.00421
        norm2               289646                  0.00193
        pool2               115286                  0.00077
        conv3               307112                  0.00205
        conv4               249627                  0.00166
        conv5               176223                  0.00117
        pool5                25404                  0.00017
    fc_module              4491900                  0.02995
        fc6                3083885                  0.02056
        fc7                1370258                  0.00914
        fc                   37755                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'carlsberg'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615364                  0.05077                    1         7615905         19.7
    conv_module            3123385                  0.02082
        conv1               733946                  0.00489
        norm1               485695                  0.00324
        pool1               108971                  0.00073
        conv2               631616                  0.00421
        norm2               289612                  0.00193
        pool2               115363                  0.00077
        conv3               307034                  0.00205
        conv4               249683                  0.00166
        conv5               176216                  0.00117
        pool5                25364                  0.00017
    fc_module              4491979                  0.02995
        fc6                3083961                  0.02056
        fc7                1370258                  0.00914
        fc                   37758                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'pepsi'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615042                  0.05077                    1         7615582         19.7
    conv_module            3123107                  0.02082
        conv1               733949                  0.00489
        norm1               485783                  0.00324
        pool1               108565                  0.00072
        conv2               631567                  0.00421
        norm2               289568                  0.00193
        pool2               115037                  0.00077
        conv3               307355                  0.00205
        conv4               249793                  0.00167
        conv5               176217                  0.00117
        pool5                25388                  0.00017
    fc_module              4491935                  0.02995
        fc6                3083920                  0.02056
        fc7                1370258                  0.00914
        fc                   37755                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'tsingtao'
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                    7615303                  0.05077                    1         7615843         19.7
    conv_module            3123324                  0.02082
        conv1               733883                  0.00489
        norm1               485688                  0.00324
        pool1               108995                  0.00073
        conv2               631598                  0.00421
        norm2               289636                  0.00193
        pool2               115351                  0.00077
        conv3               307108                  0.00205
        conv4               249623                  0.00166
        conv5               176193                  0.00117
        pool5                25364                  0.00017
    fc_module              4491979                  0.02995
        fc6                3083961                  0.02056
        fc7                1370258                  0.00914
        fc                   37758                  0.00025
 * The clock frequency of the DL processor is: 150MHz

ans = 'singha'
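To go beyond spot checks of individual images, a minimal sketch of scoring the deployed network over the whole validation set follows; the loop structure and the numCorrect and accuracy variable names are illustrative and not part of the original example.

% Hedged sketch: classify every validation image on the FPGA and compute
% top-1 accuracy. Assumes hW, imdsValidation, and netTransfer exist as above.
numImages = numel(imdsValidation.Files);
numCorrect = 0;
for k = 1:numImages
    I = readimage(imdsValidation,k);
    scores = hW.predict(single(I));   % run inference on the board
    [~, index] = max(scores);
    predLabel = netTransfer.Layers(end).ClassNames{index};
    numCorrect = numCorrect + (string(predLabel) == string(imdsValidation.Labels(k)));
end
accuracy = numCorrect/numImages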