This example shows how to modify a pretrained MobileNet v2 network to create an SSD object detection network.
The procedure to convert a pretrained network into an SSD network is similar to the transfer learning procedure for image classification:
Load the pretrained network.
Select one or more layers from the pretrained network to use for feature extraction.
Remove all layers after the feature extraction layers.
Add new layers to support the object detection task.
Load a pretrained MobileNet v2 network using mobilenetv2. This requires the Deep Learning Toolbox Model for MobileNet v2 Network™ support package. If this support package is not installed, then the function provides a download link. After you load the network, convert the network into a layerGraph object so that you can manipulate the layers.
net = mobilenetv2();
lgraph = layerGraph(net);
Update the network input size to meet the training data requirements. For example, assume the training data are 300-by-300 RGB images. Set the input size.
imageInputSize = [300 300 3];
Next, create a new image input layer with the same name as the original layer.
imgLayer = imageInputLayer(imageInputSize,"Name","input_1");
Replace the old image input layer with the new image input layer.
lgraph = replaceLayer(lgraph,"input_1",imgLayer);
SSD predicts object locations using multiple feature maps. Typically, you choose feature extraction layers with different output sizes to leverage the benefit of multi-scale features. You can use the analyzeNetwork function or the Deep Network Designer app to determine the output sizes of layers within a network. Note that selecting an optimal set of feature extraction layers requires empirical evaluation.
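For example, you can open the Deep Learning Network Analyzer on the pretrained network to read the output (activation) size of every layer; this quick check uses only the function named above.
% Inspect layer output sizes before choosing feature extraction layers.
analyzeNetwork(net)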
For brevity, this example illustrates the use of one feature extraction layer. Set the feature extraction layer to "block_12_add".
featureExtractionLayer = "block_12_add";
Next, remove the layers after the feature extraction layer. You can do so by importing the network into the Deep Network Designer app, manually removing the layers, and exporting the modified network to your workspace.
For this example, load the modified network, which has been added to this example as a supporting file.
modified = load("mobilenetv2Block12Add.mat");
lgraph = modified.mobilenetv2Block12Add;
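Alternatively, you can remove the trailing layers programmatically instead of loading the supporting file. The following lines are a minimal sketch, not part of the original example; they assume that every layer listed after the feature extraction layer in lgraph.Layers belongs to the part of the network being discarded.
% Programmatic alternative (sketch): remove every layer that appears after the
% feature extraction layer in the layer graph of the original network.
layerNames = {lgraph.Layers.Name};
idx = find(strcmp(layerNames,featureExtractionLayer));
lgraph = removeLayers(lgraph,layerNames(idx+1:end));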
Specify the anchor boxes and the number of object classes, and then use anchorBoxLayer to create an anchor box layer.
numClasses = 5;
anchorBoxes = [16 16; 32 16];
anchorBox = anchorBoxLayer(anchorBoxes,"Name","anchors");
Attach the anchor box layer to the feature extraction layer.
lgraph = addLayers(lgraph,anchorBox);
lgraph = connectLayers(lgraph,"block_12_add","anchors");
Create a convolution layer where the number of convolution filters equals numAnchors times (numClasses + 1). The additional class represents the background class.
numAnchors = size(anchorBoxes,1);
numClassesPlusBackground = numClasses + 1;
numClsFilters = numAnchors * numClassesPlusBackground;
filterSize = 3;
conv = convolution2dLayer(filterSize,numClsFilters,...
    "Name","convClassification",...
    "Padding","same");
Add and connect the convolution layer to the anchor box layer.
lgraph = addLayers(lgraph,conv);
lgraph = connectLayers(lgraph,"anchors","convClassification");
Create a convolution layer where the number of convolution filters equals four times the number of anchor boxes.
numRegFilters = 4 * numAnchors;
conv = convolution2dLayer(filterSize,numRegFilters,...
    "Name","convRegression",...
    "Padding","same");
Add and connect the convolution layer to the anchor box layer.
lgraph = addLayers(lgraph,conv);
lgraph = connectLayers(lgraph,"anchors","convRegression");
Create an ssdMergeLayer initialized with the number of classes and the number of feature extraction layers.
numFeatureExtractionLayers = numel(featureExtractionLayer);
mergeClassification = ssdMergeLayer(numClassesPlusBackground,numFeatureExtractionLayers,...
    "Name","mergeClassification");
Add and connect the SSD merge layer to the convClassification layer.
lgraph = addLayers(lgraph,mergeClassification);
lgraph = connectLayers(lgraph,"convClassification","mergeClassification/in1");
Create an ssdMergeLayer initialized with the number of coordinate offsets used to refine anchor box positions and the number of feature extraction layers.
numCoordinates = 4;
mergeRegression = ssdMergeLayer(numCoordinates,numFeatureExtractionLayers,...
    "Name","mergeRegression");
Add and connect the SSD merge layer to the convRegression layer.
lgraph = addLayers(lgraph,mergeRegression);
lgraph = connectLayers(lgraph,"convRegression","mergeRegression/in1");
To complete the classification branch, create and attach a softmax layer and a focal loss layer.
clsLayers = [
    softmaxLayer("Name","softmax")
    focalLossLayer("Name","focalLoss")
    ];
lgraph = addLayers(lgraph,clsLayers);
lgraph = connectLayers(lgraph,"mergeClassification","softmax");
To complete the regression branch, create and attach a box regression layer.
reg = rcnnBoxRegressionLayer("Name","boxRegression");
lgraph = addLayers(lgraph,reg);
lgraph = connectLayers(lgraph,"mergeRegression","boxRegression");
Use analyzeNetwork to check the network.
analyzeNetwork(lgraph)
The SSD network is complete and can be trained using the trainSSDObjectDetector function.
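For reference, a minimal training sketch follows. The datastore name trainingData is hypothetical and stands in for any datastore that returns the images, boxes, and labels that trainSSDObjectDetector expects, and the option values shown are illustrative rather than tuned.
% Training sketch: "trainingData" is a hypothetical datastore of images, boxes, and labels.
options = trainingOptions("sgdm", ...
    "InitialLearnRate",0.001, ...
    "MiniBatchSize",16, ...
    "MaxEpochs",30, ...
    "Shuffle","every-epoch");
[detector,info] = trainSSDObjectDetector(trainingData,lgraph,options);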