This example shows how to modify a pretrained MobileNet v2 network to create an SSD object detection network.
The procedure to convert a pretrained network into an SSD network is similar to the transfer learning procedure for image classification:
Load the pretrained network.
Select one or more layers from the pretrained network to use for feature extraction.
Remove all layers after the feature extraction layers.
Add new layers to support the object detection task.
Load a pretrained MobileNet v2 network using mobilenetv2. This requires the Deep Learning Toolbox Model for MobileNet v2 Network™ support package. If this support package is not installed, then the function provides a download link. After you load the network, convert the network into a layerGraph object so that you can manipulate the layers.
net = mobilenetv2();
lgraph = layerGraph(net);
Update the network input size to meet the training data requirements. For example, assume the training data are 300-by-300 RGB images. Set the input size.
imageInputSize = [300 300 3];
Next, create a new image input layer with the same name as the original layer.
imgLayer = imageInputLayer(imageInputSize,"Name","input_1");
Replace the old image input layer with the new image input layer.
lgraph = replaceLayer(lgraph,"input_1",imgLayer);
SSD predicts object locations using multiple feature maps. Typically, you choose feature extraction layers with different output sizes to leverage the benefit of multi-scale features. You can use the analyzeNetwork function or the Deep Network Designer app to determine the output sizes of layers within a network. Note that selecting an optimal set of feature extraction layers requires empirical evaluation.
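For example, you can open the Deep Learning Network Analyzer on the pretrained network to read the output (activation) size of every layer; this quick check uses only the function named above.
% Inspect layer output sizes before choosing feature extraction layers.
analyzeNetwork(net)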
For brevity, this example illustrates the use of one feature extraction layer. Set the feature extraction layer to "block_12_add".
featureExtractionLayer = "block_12_add";
Next, remove the layers after the feature extraction layer. You can do so by importing the network into the Deep Network Designer app, manually removing the layers, and exporting the modified network to your workspace.
For this example, load the modified network, which has been added to this example as a supporting file.
modified = load("mobilenetv2Block12Add.mat");
lgraph = modified.mobilenetv2Block12Add;
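Alternatively, you can remove the trailing layers programmatically instead of loading the supporting file. The following lines are a minimal sketch, not part of the original example; they assume that every layer listed after the feature extraction layer in lgraph.Layers belongs to the part of the network being discarded.
% Programmatic alternative (sketch): remove every layer that appears after the
% feature extraction layer in the layer graph of the original network.
layerNames = {lgraph.Layers.Name};
idx = find(strcmp(layerNames,featureExtractionLayer));
lgraph = removeLayers(lgraph,layerNames(idx+1:end));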
Specify the anchor boxes and the number of object classes, and then use anchorBoxLayer to create an anchor box layer.
numClasses = 5;
anchorBoxes = [16 16; 32 16];
anchorBox = anchorBoxLayer(anchorBoxes,"Name","anchors");
Attach the anchor box layer to the feature extraction layer.
lgraph = addLayers(lgraph,anchorBox);
lgraph = connectLayers(lgraph,"block_12_add","anchors");
Create a convolution layer where the number of convolution filters equals numAnchors times (numClasses + 1). The additional class represents the background class.
numAnchors = size(anchorBoxes,1);
numClassesPlusBackground = numClasses + 1;
numClsFilters = numAnchors * numClassesPlusBackground;
filterSize = 3;
conv = convolution2dLayer(filterSize,numClsFilters,...
    "Name","convClassification",...
    "Padding","same");
Add and connect the convolution layer to the anchor box layer.
lgraph = addLayers(lgraph,conv);
lgraph = connectLayers(lgraph,"anchors","convClassification");
Create a convolution layer where the number of convolution filters equals four times the number of anchor boxes.
numRegFilters = 4 * numAnchors;
conv = convolution2dLayer(filterSize,numRegFilters,...
    "Name","convRegression",...
    "Padding","same");
Add and connect the convolution layer to the anchor box layer.
lgraph = addLayers(lgraph,conv);
lgraph = connectLayers(lgraph,"anchors","convRegression");
Create an ssdMergeLayer initialized with the number of classes and the number of feature extraction layers.
numFeatureExtractionLayers = numel(featureExtractionLayer);
mergeClassification = ssdMergeLayer(numClassesPlusBackground,numFeatureExtractionLayers,...
    "Name","mergeClassification");
Add and connect the SSD merge layer to the convClassification layer.
lgraph = addLayers(lgraph,mergeClassification);
lgraph = connectLayers(lgraph,"convClassification","mergeClassification/in1");
Create an ssdMergeLayer initialized with the number of coordinate offsets used to refine anchor box positions and the number of feature extraction layers.
numCoordinates = 4;
mergeRegression = ssdMergeLayer(numCoordinates,numFeatureExtractionLayers,...
    "Name","mergeRegression");
Add and connect the SSD merge layer to the convRegression layer.
lgraph = addLayers(lgraph,mergeRegression);
lgraph = connectLayers(lgraph,"convRegression","mergeRegression/in1");
To complete the classification branch, create and attach a softmax layer and a focal loss layer.
clsLayers = [
    softmaxLayer("Name","softmax")
    focalLossLayer("Name","focalLoss")
    ];
lgraph = addLayers(lgraph,clsLayers);
lgraph = connectLayers(lgraph,"mergeClassification","softmax");
To complete the regression branch, create and attach a box regression layer.
reg = rcnnBoxRegressionLayer("Name","boxRegression");
lgraph = addLayers(lgraph,reg);
lgraph = connectLayers(lgraph,"mergeRegression","boxRegression");
Use analyzeNetwork to check the network.
analyzeNetwork(lgraph)
The SSD network is complete and can be trained using the trainSSDObjectDetector function.
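For reference, a minimal training sketch follows. The datastore name trainingData is hypothetical and stands in for any datastore that returns the images, boxes, and labels that trainSSDObjectDetector expects, and the option values shown are illustrative rather than tuned.
% Training sketch: "trainingData" is a hypothetical datastore of images, boxes, and labels.
options = trainingOptions("sgdm", ...
    "InitialLearnRate",0.001, ...
    "MiniBatchSize",16, ...
    "MaxEpochs",30, ...
    "Shuffle","every-epoch");
[detector,info] = trainSSDObjectDetector(trainingData,lgraph,options);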