Defect Detection

This example shows how to deploy custom-trained series networks to detect defects in objects such as hexagon nuts. The networks were trained by using transfer learning, a technique commonly used in deep learning applications: you take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. You can quickly transfer learned features to a new task by using a smaller number of training images. This example uses two trained series networks, trainedDefNet.mat and trainedBlemDetNet.mat.

Prerequisites

  • Xilinx ZCU102 SoC development kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

Load Pretrained Networks

To download and load the custom pretrained series networks trainedDefNet and trainedBlemDetNet, enter:

if ~isfile('trainedDefNet.mat')
    url = 'https://www.mathworks.com/supportfiles/dlhdl/trainedDefNet.mat';
    websave('trainedDefNet.mat',url);
end
net1 = load('trainedDefNet.mat');
snet_defnet = net1.custom_alexnet
snet_defnet = 
  SeriesNetwork with properties:

         Layers: [25×1 nnet.cnn.layer.Layer]
     InputNames: {'data'}
    OutputNames: {'output'}

Analyze snet_defnet layers.

analyzeNetwork(snet_defnet)

if ~isfile('trainedBlemDetNet.mat')
    url = 'https://www.mathworks.com/supportfiles/dlhdl/trainedBlemDetNet.mat';
    websave('trainedBlemDetNet.mat',url);
end
net2 = load('trainedBlemDetNet.mat');
snet_blemdetnet = net2.convnet
snet_blemdetnet = 
  SeriesNetwork with properties:

         Layers: [12×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}

Analyze snet_blemdetnet layers.

analyzeNetwork(snet_blemdetnet)
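The two MAT-files store their networks under different variable names (custom_alexnet and convnet), so the field that you index after calling load differs per file. The following is a minimal sketch of how you might list the variables in a MAT-file before loading it, using only base MATLAB. The stand-in file demoNet.mat and its placeholder contents are assumptions made so that the snippet runs without the download. In practice, point matFile at trainedDefNet.mat or trainedBlemDetNet.mat.

```matlab
% Create a small stand-in MAT-file (hypothetical; replaces the downloaded file).
matFile = fullfile(tempdir, 'demoNet.mat');
convnet = magic(3);                 % placeholder value, not a real network
save(matFile, 'convnet');

% List the stored variables without loading their data.
info = whos('-file', matFile);
varNames = {info.name};
fprintf('Variables in %s: %s\n', matFile, strjoin(varNames, ', '));

% Load the file and pick the variable by name with a dynamic field reference.
s = load(matFile);
net = s.(varNames{1});
```

The dynamic field reference s.(name) avoids hard-coding the variable name when a script has to handle several MAT-files.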

Create Target Object

Create a target object that has a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use the JTAG connection, install the Xilinx™ Vivado™ Design Suite 2019.2.

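To set the Xilinx Vivado toolpath, uncomment the hdlsetuptoolpath call and adjust the path for your installation. Then create the target object with an Ethernet interface:

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2019.2\bin\vivado.bat');
hT = dlhdl.Target('Xilinx','Interface','Ethernet')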
hT = 
  Target with properties:

       Vendor: 'Xilinx'
    Interface: Ethernet
    IPAddress: '10.10.10.15'
     Username: 'root'
         Port: 22

Create Workflow Object for trainedDefNet Network

Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained network trainedDefNet as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the single data type.

hW = dlhdl.Workflow('Network',snet_defnet,'Bitstream','zcu102_single','Target',hT)
hW = 
  Workflow with properties:

            Network: [1×1 SeriesNetwork]
          Bitstream: 'zcu102_single'
    ProcessorConfig: []
             Target: [1×1 dlhdl.Target]

Compile trainedDefNet Series Network

To compile the trainedDefNet series network, run the compile function of the dlhdl.Workflow object.

hW.compile
          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "8.0 MB"         
    "OutputResultOffset"        "0x00800000"     "4.0 MB"         
    "SystemBufferOffset"        "0x00c00000"     "28.0 MB"        
    "InstructionDataOffset"     "0x02800000"     "4.0 MB"         
    "ConvWeightDataOffset"      "0x02c00000"     "12.0 MB"        
    "FCWeightDataOffset"        "0x03800000"     "84.0 MB"        
    "EndOffset"                 "0x08c00000"     "Total: 140.0 MB"
ans = struct with fields:
       Operators: [1×1 struct]
    LayerConfigs: [1×1 struct]
      NetConfigs: [1×1 struct]
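The offset table printed by compile can be read as a running sum: each region starts where the previous one ends, and EndOffset equals the total allocation. The following sketch only redoes the arithmetic from the table above in base MATLAB and reproduces the printed addresses. The variable names are illustrative and not part of the toolbox.

```matlab
% Region names and sizes in MB, copied from the compile output above.
names   = {'InputDataOffset','OutputResultOffset','SystemBufferOffset', ...
           'InstructionDataOffset','ConvWeightDataOffset','FCWeightDataOffset'};
sizesMB = [8 4 28 4 12 84];

bytesPerMB = 2^20;
% Each region starts at the sum of the sizes of all regions before it.
startBytes = [0 cumsum(sizesMB(1:end-1))] * bytesPerMB;
endBytes   = sum(sizesMB) * bytesPerMB;

for k = 1:numel(names)
    fprintf('%-24s 0x%08x  %5.1f MB\n', names{k}, startBytes(k), sizesMB(k));
end
fprintf('%-24s 0x%08x  Total: %.1f MB\n', 'EndOffset', endBytes, sum(sizesMB));
```

Reading the table this way makes it easy to see which region dominates the footprint. Here the fully connected weights account for 84 of the 140 MB.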

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.

hW.deploy
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.

Run Prediction for One Image

Load an image from the attached testImages folder, resize the image to match the network image input layer dimensions, and run the predict function of the dlhdl.Workflow object to retrieve and display the defect prediction from the FPGA.

wi = uint32(320);
he = uint32(240);
ch = uint32(3);

filename = fullfile(pwd,'ng1.png');
img = imread(filename);
img = imresize(img, [he, wi]);
img = mat2ocv(img);

% Extract ROI for preprocessing
[Iori, imgPacked, num, bbox] = myNDNet_Preprocess(img);

% Row-major to column-major conversion
imgPacked2 = zeros([128,128,4],'uint8');
for c = 1:4
    for i = 1:128
        for j = 1:128
            imgPacked2(i,j,c) = imgPacked((i-1)*128 + (j-1) + (c-1)*128*128 + 1);
        end
    end
end

% Classify detected nuts by using CNN
scores = zeros(2,4);
for i = 1:num
    [scores(:,i), speed] = hW.predict(single(imgPacked2(:,:,i)),'Profile','on');
end
### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   12199544                  0.05545                       1           12199586             18.0
    conv_module            3292478                  0.01497 
        conv1               412777                  0.00188 
        norm1               173433                  0.00079 
        pool1                58705                  0.00027 
        conv2               656607                  0.00298 
        norm2               128094                  0.00058 
        pool2                53221                  0.00024 
        conv3               780491                  0.00355 
        conv4               600179                  0.00273 
        conv5               409095                  0.00186 
        pool5                19991                  0.00009 
    fc_module              8907066                  0.04049 
        fc6                1759795                  0.00800 
        fc7                7030223                  0.03196 
        fc8                 117046                  0.00053 
 * The clock frequency of the DL processor is: 220MHz
Iori = reshape(Iori, [1, he*wi*ch]);
bbox = reshape(bbox, [1,16]);
scores = reshape(scores, [1, 8]);

% Insert an annotation for postprocessing
out = myNDNet_Postprocess(Iori, num, bbox, scores, wi, he, ch);

sz = [he wi ch];
out = ocv2mat(out,sz);
imshow(out)
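The triple loop in the code above reads a row-major buffer (the column index varies fastest within each 128-by-128 plane) into MATLAB's column-major array layout. As a sketch of an equivalent formulation, assuming imgPacked holds exactly 128*128*4 elements as in the loop, the same result can be obtained with reshape followed by permute. The random buffer below is a stand-in for the output of myNDNet_Preprocess.

```matlab
% Stand-in for the packed buffer returned by preprocessing (assumption).
imgPacked = uint8(randi([0 255], 1, 128*128*4));

% Reference: element-by-element copy, as in the example code.
ref = zeros([128,128,4],'uint8');
for c = 1:4
    for i = 1:128
        for j = 1:128
            ref(i,j,c) = imgPacked((i-1)*128 + (j-1) + (c-1)*128*128 + 1);
        end
    end
end

% Vectorized: reshape fills columns first, so each plane comes out
% transposed; swapping the first two dimensions restores it.
imgPacked2 = permute(reshape(imgPacked, [128,128,4]), [2 1 3]);

disp(isequal(ref, imgPacked2))
```

The loop form makes the index arithmetic explicit, while the vectorized form is shorter and avoids 65,536 scalar assignments per image.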

    

Create Workflow Object for trainedBlemDetNet Network

Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained network trainedBlemDetNet as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the single data type.

hW = dlhdl.Workflow('Network',snet_blemdetnet,'Bitstream','zcu102_single','Target',hT)
hW = 
  Workflow with properties:

            Network: [1×1 SeriesNetwork]
          Bitstream: 'zcu102_single'
    ProcessorConfig: []
             Target: [1×1 dlhdl.Target]

Compile trainedBlemDetNet Series Network

To compile the trainedBlemDetNet series network, run the compile function of the dlhdl.Workflow object.

hW.compile
          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "8.0 MB"        
    "OutputResultOffset"        "0x00800000"     "4.0 MB"        
    "SystemBufferOffset"        "0x00c00000"     "28.0 MB"       
    "InstructionDataOffset"     "0x02800000"     "4.0 MB"        
    "ConvWeightDataOffset"      "0x02c00000"     "4.0 MB"        
    "FCWeightDataOffset"        "0x03000000"     "36.0 MB"       
    "EndOffset"                 "0x05400000"     "Total: 84.0 MB"
ans = struct with fields:
       Operators: [1×1 struct]
    LayerConfigs: [1×1 struct]
      NetConfigs: [1×1 struct]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.

hW.deploy
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Loading weights to FC Processor.
### 50% finished, current time is 28-Jun-2020 12:33:36.
### FC Weights loaded. Current time is 28-Jun-2020 12:33:37

Run Prediction for One Image

Load an image from the attached testImages folder, resize the image to match the network image input layer dimensions, and run the predict function of the dlhdl.Workflow object to retrieve and display the defect prediction from the FPGA.

wi = uint32(320);
he = uint32(240);
ch = uint32(3);

filename = fullfile(pwd,'ok1.png');
img = imread(filename);
img = imresize(img, [he, wi]);
img = mat2ocv(img);

% Extract ROI for preprocessing
[Iori, imgPacked, num, bbox] = myNDNet_Preprocess(img);

% Row-major to column-major conversion
imgPacked2 = zeros([128,128,4],'uint8');
for c = 1:4
    for i = 1:128
        for j = 1:128
            imgPacked2(i,j,c) = imgPacked((i-1)*128 + (j-1) + (c-1)*128*128 + 1);
        end
    end
end

% Classify detected nuts by using CNN
scores = zeros(2,4);
for i = 1:num
    [scores(:,i), speed] = hW.predict(single(imgPacked2(:,:,i)),'Profile','on');
end
### Finished writing input activations.
### Running single input activations.
              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    4886257                  0.02221                       1            4886299             45.0
    conv_module            1256664                  0.00571 
        conv_1              467349                  0.00212 
        maxpool_1           191204                  0.00087 
        crossnorm           159553                  0.00073 
        conv_2              397552                  0.00181 
        maxpool_2            41066                  0.00019 
    fc_module              3629593                  0.01650 
        fc_1               3614829                  0.01643 
        fc_2                 14763                  0.00007 
 * The clock frequency of the DL processor is: 220MHz
Iori = reshape(Iori, [1, he*wi*ch]);
bbox = reshape(bbox, [1,16]);
scores = reshape(scores, [1, 8]);

% Insert an annotation for postprocessing
out = myNDNet_Postprocess(Iori, num, bbox, scores, wi, he, ch);

sz = [he wi ch];
out = ocv2mat(out,sz);
imshow(out)
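The profiler columns for this network are related through the processor clock: the latency in seconds is the cycle count divided by the clock frequency, and the frame rate is the clock frequency divided by the total latency in cycles. The sketch below only re-derives the printed figures from the cycle counts in the profiler table above. The variable names are illustrative.

```matlab
clockHz      = 220e6;      % DL processor clock reported by the profiler
lastLayerCyc = 4886257;    % Network row, LastLayerLatency(cycles)
totalCyc     = 4886299;    % Network row, Total Latency
convCyc      = 1256664;    % conv_module
fcCyc        = 3629593;    % fc_module

latencySec = lastLayerCyc / clockHz;     % about 0.02221 s
framesPerS = clockHz / totalCyc;         % about 45.0 frames/s
fcShare    = fcCyc / lastLayerCyc;       % fraction of cycles in the fc module

fprintf('Latency: %.5f s, %.1f frames/s\n', latencySec, framesPerS);
fprintf('conv + fc = %d cycles, fc share = %.1f%%\n', convCyc + fcCyc, 100*fcShare);
```

For this network, the two module totals add up to the network latency, and roughly three quarters of the cycles are spent in the fully connected module.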