Logo Recognition Network

This example shows code generation for a logo classification application that uses deep learning. It uses the codegen command to generate a MEX function that performs prediction on a SeriesNetwork object called LogoNet.

Prerequisites

  • CUDA® enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

The Logo Recognition Network

Logos assist users in brand identification and recognition. Many companies incorporate their logos in advertising, documentation materials, and promotions. The logo recognition network (logonet) was developed in MATLAB® and can recognize 32 logos under various lighting conditions and camera motions. Because this network focuses only on recognition, you can use it in applications where localization is not required.

Training the Network

The network is trained in MATLAB by using training data that contains around 200 images for each logo. Because the number of images for training the network is small, data augmentation increases the number of training samples. Four types of data augmentation are used: contrast normalization, Gaussian blur, random flipping, and shearing. This data augmentation helps in recognizing logos in images captured by different lighting conditions and camera motions. The input size for logonet is [227 227 3]. Standard SGDM trains by using a learning rate of 0.0001 for 40 epochs with a mini-batch size of 45. The trainLogonet.m helper script demonstrates the data augmentation on a sample image, architecture of the logonet, and training options.

Get Pretrained SeriesNetwork

Download the logonet network and save it to LogoNet.mat.

getLogonet();

The saved network contains 22 layers including convolution, fully connected, and the classification output layers.

load('LogoNet.mat');
convnet.Layers
ans = 

  22x1 Layer array with layers:

     1   'imageinput'    Image Input             227x227x3 images with 'zerocenter' normalization and 'randfliplr' augmentations
     2   'conv_1'        Convolution             96 5x5x3 convolutions with stride [1  1] and padding [0  0  0  0]
     3   'relu_1'        ReLU                    ReLU
     4   'maxpool_1'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'        Convolution             128 3x3x96 convolutions with stride [1  1] and padding [0  0  0  0]
     6   'relu_2'        ReLU                    ReLU
     7   'maxpool_2'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'conv_3'        Convolution             384 3x3x128 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'relu_3'        ReLU                    ReLU
    10   'maxpool_3'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    11   'conv_4'        Convolution             128 3x3x384 convolutions with stride [2  2] and padding [0  0  0  0]
    12   'relu_4'        ReLU                    ReLU
    13   'maxpool_4'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    14   'fc_1'          Fully Connected         2048 fully connected layer
    15   'relu_5'        ReLU                    ReLU
    16   'dropout_1'     Dropout                 50% dropout
    17   'fc_2'          Fully Connected         2048 fully connected layer
    18   'relu_6'        ReLU                    ReLU
    19   'dropout_2'     Dropout                 50% dropout
    20   'fc_3'          Fully Connected         32 fully connected layer
    21   'softmax'       Softmax                 softmax
    22   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes

The logonet_predict Entry-Point Function

The logonet_predict.m entry-point function takes an image input and performs prediction on the image using the deep learning network saved in the LogoNet.mat file. The function loads the network object from LogoNet.mat into a persistent variable logonet and reuses the persistent variable on subsequent prediction calls.

type('logonet_predict.m')
function out = logonet_predict(in)
%#codegen

% Copyright 2017-2019 The MathWorks, Inc.

persistent logonet;

if isempty(logonet)
    
    logonet = coder.loadDeepLearningNetwork('LogoNet.mat','logonet');
end

out = logonet.predict(in);

end

Generate CUDA MEX for the logonet_predict Function

Create a GPU configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a CuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. To generate CUDA MEX, use the codegen command and specify the input to be of size [227,227,3]. This value corresponds to the input layer size of the logonet network.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg logonet_predict -args {ones(227,227,3,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/logonet_predict/html/report.mldatx').

Run Generated MEX

Load an input image. Call logonet_predict_mex on the input image.

im = imread('test.png');
imshow(im);
im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(im);

Map the top five prediction scores to words in the Wordnet dictionary synset (logos).

synsetOut = {'adidas', 'aldi', 'apple', 'becks', 'bmw', 'carlsberg', ...
    'chimay', 'cocacola', 'corona', 'dhl', 'erdinger', 'esso', 'fedex',...
    'ferrari', 'ford', 'fosters', 'google', 'guinness', 'heineken', 'hp',...
    'milka', 'nvidia', 'paulaner', 'pepsi', 'rittersport', 'shell', 'singha', 'starbucks', 'stellaartois', 'texaco', 'tsingtao', 'ups'};

[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
top5labels = synsetOut(indx(1:5));

Display the top five classification labels.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow], [top5labels{k},' ',num2str(scores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

 imshow(outputImage);

Clear the static network object that was loaded in memory.

clear mex;

Related Topics