trainSSDObjectDetector

Train an SSD deep learning object detector

Description

Train a Detector

trainedDetector = trainSSDObjectDetector(trainingData,lgraph,options) trains a single shot multibox detector (SSD) using deep learning. You can train an SSD detector to detect multiple object classes.

This function requires that you have Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

[trainedDetector,info] = trainSSDObjectDetector(___) also returns information on the training progress, such as training loss and accuracy, for each iteration.

Resume Training a Detector

trainedDetector = trainSSDObjectDetector(trainingData,checkpoint,options) resumes training from a detector checkpoint.

Fine-Tune a Detector

trainedDetector = trainSSDObjectDetector(trainingData,detector,options) continues training an SSD multibox object detector with additional fine-tuning options. Use this syntax with additional training data or to perform more training iterations to improve detector accuracy.

Additional Properties

trainedDetector = trainSSDObjectDetector(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments and any of the previous inputs.

Examples

Load the training data for vehicle detection into the workspace.

data = load('vehicleTrainingData.mat');

trainingData = data.vehicleTrainingData;

Specify the directory in which the training samples are stored. Add the full path to the file names in the training data.

dataDir = fullfile(toolboxdir('vision'),'visiondata');
trainingData.imageFilename = fullfile(dataDir,trainingData.imageFilename);

Create an image datastore using the files from the table.

imds = imageDatastore(trainingData.imageFilename);

Create a box label datastore using the label columns from the table.

blds = boxLabelDatastore(trainingData(:,2:end));

Combine the datastores.

ds = combine(imds, blds);

Load a preinitialized SSD object detection network.

net = load('ssdVehicleDetector.mat');
lgraph = net.lgraph
lgraph = 
  LayerGraph with properties:

         Layers: [23×1 nnet.cnn.layer.Layer]
    Connections: [24×2 table]
     InputNames: {'input'}
    OutputNames: {'focalLoss'  'anchorBoxRegression'}

Inspect the layers in the SSD network and their properties. You can also create the SSD network by following the steps given in Create SSD Object Detection Network.

lgraph.Layers
ans = 
  23x1 Layer array with layers:

     1   'input'                 Image Input             300x300x3 images
     2   'conv_1'                Convolution             16 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   'relu_1'                ReLU                    ReLU
     4   'maxpool1'              Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'                Convolution             32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'relu_2'                ReLU                    ReLU
     7   'maxpool2'              Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'conv_3'                Convolution             64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     9   'relu_3'                ReLU                    ReLU
    10   'maxpool3'              Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    11   'conv_4'                Convolution             128 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
    12   'relu_4'                ReLU                    ReLU
    13   'confmerge'             SSD Merge Layer.        SSD Merge Layer.
    14   'locmerge'              SSD Merge Layer.        SSD Merge Layer.
    15   'relu_4_anchorbox'      Anchor Box Layer.       Anchor Box Layer.
    16   'relu_4_mbox_conf'      Convolution             8 3x3x128 convolutions with stride [1  1] and padding [1  1  1  1]
    17   'relu_4_mbox_loc'       Convolution             16 3x3x128 convolutions with stride [1  1] and padding [1  1  1  1]
    18   'relu_3_anchorbox'      Anchor Box Layer.       Anchor Box Layer.
    19   'relu_3_mbox_conf'      Convolution             12 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
    20   'relu_3_mbox_loc'       Convolution             24 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
    21   'anchorBoxSoftmax'      Softmax                 softmax
    22   'focalLoss'             Focal Loss Layer.       Focal Loss Layer.
    23   'anchorBoxRegression'   Box Regression Output   smooth-l1 loss

Configure the network training options.

options = trainingOptions('sgdm',...
          'InitialLearnRate',5e-5,...
          'MiniBatchSize',16,...
          'Verbose',true,...
          'MaxEpochs',50,...
          'Shuffle','every-epoch',...
          'VerboseFrequency',10,...
          'CheckpointPath',tempdir);

Train the SSD network.

[detector,info] = trainSSDObjectDetector(ds,lgraph,options);
*************************************************************************
Training an SSD Object Detector for the following object classes:

* vehicle

Training on single GPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:04 |       250.16 |        197.6 |      5.0000e-05 |
|       4 |          10 |       00:00:23 |        26.19 |         20.7 |      5.0000e-05 |
|       7 |          20 |       00:00:42 |         8.35 |          5.4 |      5.0000e-05 |
|      10 |          30 |       00:01:01 |         3.02 |          2.3 |      5.0000e-05 |
|      14 |          40 |       00:01:25 |         1.82 |          1.0 |      5.0000e-05 |
|      17 |          50 |       00:01:44 |         1.20 |          0.7 |      5.0000e-05 |
|      20 |          60 |       00:02:02 |         1.39 |          0.8 |      5.0000e-05 |
|      24 |          70 |       00:02:26 |         1.17 |          0.6 |      5.0000e-05 |
|      27 |          80 |       00:02:44 |         1.07 |          0.6 |      5.0000e-05 |
|      30 |          90 |       00:03:03 |         1.15 |          0.6 |      5.0000e-05 |
|      34 |         100 |       00:03:27 |         1.02 |          0.5 |      5.0000e-05 |
|      37 |         110 |       00:03:46 |         1.20 |          0.6 |      5.0000e-05 |
|      40 |         120 |       00:04:04 |         1.12 |          0.6 |      5.0000e-05 |
|      44 |         130 |       00:04:29 |         1.16 |          0.6 |      5.0000e-05 |
|      47 |         140 |       00:04:49 |         1.19 |          0.6 |      5.0000e-05 |
|      50 |         150 |       00:05:09 |         1.08 |          0.6 |      5.0000e-05 |
|========================================================================================|
Detector training complete.
*************************************************************************

Inspect the properties of the detector.

detector
detector = 
  ssdObjectDetector with properties:

     ModelName: 'vehicle'
       Network: [1×1 DAGNetwork]
    ClassNames: {'vehicle'  'Background'}

You can verify the training accuracy by inspecting the training loss for each iteration.

figure
plot(info.TrainingLoss)
grid on
xlabel('Number of Iterations')
ylabel('Training Loss for Each Iteration')
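The per-iteration loss can be noisy, which makes the trend harder to read. As a minimal sketch, you could overlay a moving average on the raw curve. The loss values below are placeholders standing in for info.TrainingLoss, and the window length is an arbitrary choice for illustration.

```matlab
% Placeholder loss values standing in for info.TrainingLoss.
trainingLoss = [197.6 20.7 5.4 2.3 1.0 0.7 0.8 0.6 0.6 0.6];

% Hypothetical window length, chosen only for illustration.
windowLength = 3;

% Centered moving average. By default, movmean shrinks the window at the
% endpoints so the output has the same length as the input.
smoothedLoss = movmean(trainingLoss,windowLength);

figure
plot(trainingLoss)
hold on
plot(smoothedLoss,'LineWidth',1.5)
hold off
grid on
legend('Raw loss','Moving average')
xlabel('Number of Iterations')
ylabel('Training Loss')
```

To apply this to a real training run, replace the placeholder vector with info.TrainingLoss and pick a window length that suits the number of iterations.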

Test the SSD detector on a test image.

img = imread('ssdTestDetect.png');

Run the SSD object detector on the image for vehicle detection.

[bboxes,scores] = detect(detector,img);

Display the detection results.

if(~isempty(bboxes))
    img = insertObjectAnnotation(img,'rectangle',bboxes,scores);
end
figure
imshow(img)
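If the detector returns low-confidence boxes, one option is to filter the detections by score before annotating the image. The following sketch uses placeholder boxes and scores in place of the outputs of detect, and the minScore threshold is a hypothetical value chosen for illustration.

```matlab
% Placeholder detections standing in for the outputs of detect.
% Each row of bboxes is [x y width height].
bboxes = [10 20 50 40; 100 80 60 30; 5 5 20 20];
scores = [0.92; 0.35; 0.61];

% Hypothetical confidence threshold, chosen only for illustration.
minScore = 0.5;

% Keep only the detections whose score meets the threshold.
keep = scores >= minScore;
strongBoxes = bboxes(keep,:);
strongScores = scores(keep);
```

You can then pass strongBoxes and strongScores to insertObjectAnnotation in place of the unfiltered outputs.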

Input Arguments

trainingData: Labeled ground truth images, specified as a datastore or a table.

  • If you use a datastore, your data must be set up so that calling the datastore with the read and readall functions returns a cell array or table with two or three columns. When the output contains two columns, the first column must contain bounding boxes, and the second column must contain labels, {boxes,labels}. When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data.

    The three columns, data, boxes, and labels, contain the following:

      • data: The first column can contain data, such as point cloud data or images.

      • boxes: The second column must be a cell array that contains M-by-5 matrices of bounding boxes of the form [xcenter, ycenter, width, height, yaw]. The vectors represent the location and size of bounding boxes for the objects in each image.

      • labels: The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories.

    For more information, see Datastores for Deep Learning (Deep Learning Toolbox).
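Before training, it can help to check that one sample read from the datastore has the expected layout. The following is a minimal sketch, not part of the function itself. The sample cell array is a placeholder standing in for the output of read(ds), and its boxes use four columns, as in the vehicle example, so adjust the placeholder to match your own data.

```matlab
% Placeholder sample standing in for the output of read(ds).
sample = {zeros(300,300,3,'uint8'), ...
          [30 40 100 60; 150 120 80 50], ...
          categorical({'vehicle';'vehicle'})};

% The boxes and labels are the last two columns, whether the sample has
% two or three columns.
boxes = sample{end-1};
labels = sample{end};

% Basic consistency checks on one sample.
assert(isnumeric(boxes) && ismatrix(boxes), ...
    'Bounding boxes must be a numeric matrix.');
assert(iscategorical(labels) && iscolumn(labels), ...
    'Labels must be a categorical column vector.');
assert(size(boxes,1) == numel(labels), ...
    'Each bounding box needs exactly one label.');
```

If you read a sample from a real datastore for this check, call reset on the datastore afterward so that training starts from the first sample.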

lgraph: Layer graph, specified as a LayerGraph object. The layer graph contains the architecture of the SSD multibox network. You can create this network by using the ssdLayers function or create a custom network. For more information, see Getting Started with SSD Multibox Detection.

detector: Previously trained SSD object detector, specified as an ssdObjectDetector object. Use this syntax to continue training a detector with additional training data or to perform more training iterations to improve detector accuracy.

options: Training options, specified as a TrainingOptionsSGDM, TrainingOptionsRMSProp, or TrainingOptionsADAM object returned by the trainingOptions (Deep Learning Toolbox) function. To specify the solver name and other options for network training, use the trainingOptions (Deep Learning Toolbox) function.

Note

The trainSSDObjectDetector function has these limitations on training options:

  • The OutputFcn option is not supported.

  • Datastore inputs are not supported when you set the DispatchInBackground training option to true.

checkpoint: Saved detector checkpoint, specified as an ssdObjectDetector object. To save the detector after every epoch, set the 'CheckpointPath' name-value argument when using the trainingOptions function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the CheckpointPath property of the object specified by options is '/checkpath', you can load a checkpoint MAT-file by using this code.

data = load('/checkpath/ssd_checkpoint__216__2018_11_16__13_34_30.mat');
checkpoint = data.detector;

The name of the MAT-file includes the iteration number and the timestamp of when the detector checkpoint was saved. The detector is saved in the detector variable of the file. Pass the loaded checkpoint back into the trainSSDObjectDetector function:

ssdDetector = trainSSDObjectDetector(trainingData,checkpoint,options);
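The checkpoint file name shown above is specific to one training run. As a minimal sketch, you could locate and load the newest checkpoint in a folder instead of typing the name. This assumes that checkpoint files keep the ssd_checkpoint__ prefix shown above and that the most recently modified file is the one you want. The checkpointDir value is a placeholder for your own CheckpointPath setting.

```matlab
% Placeholder checkpoint folder; replace with your CheckpointPath value.
checkpointDir = tempdir;

% List the files that follow the checkpoint naming pattern shown above.
files = dir(fullfile(checkpointDir,'ssd_checkpoint__*.mat'));

if isempty(files)
    disp('No SSD checkpoint files found.')
else
    % Assumption: the most recently modified file is the latest checkpoint.
    [~,idx] = max([files.datenum]);
    latestFile = fullfile(files(idx).folder,files(idx).name);

    % The checkpoint detector is stored in the variable named detector.
    data = load(latestFile);
    checkpoint = data.detector;
end
```

When a checkpoint is found, pass the checkpoint variable to trainSSDObjectDetector as shown above.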

Output Arguments

trainedDetector: Trained SSD object detector, returned as an ssdObjectDetector object. You can train an SSD object detector to detect multiple object classes.

info: Training progress information, returned as a structure array with the following fields. Each field corresponds to a stage of training.

  • TrainingLoss — Training loss at each iteration is the mean squared error (MSE) calculated as the sum of localization error, confidence loss, and classification loss. For more information about the training loss function, see Training Loss.

  • TrainingAccuracy — Training set accuracy at each iteration.

  • TrainingRMSE — Training root mean squared error (RMSE) is the RMSE calculated from the training loss at each iteration.

  • BaseLearnRate — Learning rate at each iteration.

  • ValidationLoss — Validation loss at each iteration.

  • ValidationAccuracy — Validation accuracy at each iteration.

  • ValidationRMSE — Validation RMSE at each iteration.

  • FinalValidationLoss — Final validation loss at end of the training.

  • FinalValidationRMSE — Final validation RMSE at end of the training.

Each field is a numeric vector with one element per training iteration. Values that have not been calculated at a specific iteration are assigned as NaN. The struct contains ValidationLoss, ValidationAccuracy, ValidationRMSE, FinalValidationLoss, and FinalValidationRMSE fields only when options specifies validation data.
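Because iterations without a computed value hold NaN, it is often convenient to drop those entries before plotting or comparing validation results. The following sketch shows one way to do this. The info struct and its values are made-up placeholders standing in for the real output.

```matlab
% Placeholder struct standing in for the info output; values are made up.
info = struct('ValidationLoss',[NaN NaN 1.8 NaN NaN 1.2 NaN NaN 0.9]);

% Iterations at which the validation loss was actually computed.
validIterations = find(~isnan(info.ValidationLoss));
validLoss = info.ValidationLoss(validIterations);

% Report the iteration with the lowest recorded validation loss.
[bestLoss,idx] = min(validLoss);
fprintf('Lowest validation loss %.2f at iteration %d\n', ...
    bestLoss,validIterations(idx));
```

The same pattern applies to the other validation fields, such as ValidationRMSE.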

References

[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. "SSD: Single Shot MultiBox Detector." European Conference on Computer Vision (ECCV), Springer Verlag, 2016.

Introduced in R2020a