trainRCNNObjectDetector

Train an R-CNN deep learning object detector

Syntax

detector = trainRCNNObjectDetector(trainingData,network,options)

detector = trainRCNNObjectDetector(___,Name,Value)

detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)

[detector,info] = trainRCNNObjectDetector(___)

Description

detector = trainRCNNObjectDetector(trainingData,network,options) trains an R-CNN (regions with convolutional neural networks) based object detector. The function uses deep learning to train the detector to detect multiple object classes.

This implementation of R-CNN does not train an SVM classifier for each object class.

This function requires that you have Deep Learning Toolbox™ and Statistics and Machine Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

detector = trainRCNNObjectDetector(___,Name,Value) returns a detector object with optional input properties specified by one or more Name,Value pair arguments.

detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn) optionally trains an R-CNN detector using a custom region proposal function.

[detector,info] = trainRCNNObjectDetector(___) also returns information on the training progress, such as training loss and accuracy, for each iteration.

Examples

Train R-CNN Stop Sign Detector

Load training data and network layers.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

Add the image directory to the MATLAB path.

imDir = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata',...
  'stopSignImages');
addpath(imDir);

Set the network training options to use a mini-batch size of 32 to reduce GPU memory usage. Lower the InitialLearnRate to reduce the rate at which network parameters change. A low learning rate is beneficial when fine-tuning a pretrained network because it prevents the network from changing too rapidly.

options = trainingOptions('sgdm', ...
  'MiniBatchSize', 32, ...
  'InitialLearnRate', 1e-6, ...
  'MaxEpochs', 10);

Train the R-CNN detector. Training can take a few minutes to complete.

rcnn = trainRCNNObjectDetector(stopSigns, layers, options, 'NegativeOverlapRange', [0 0.3]);

*******************************************************************
Training an R-CNN Object Detector for the following object classes:

* stopSign

Step 1 of 3: Extracting region proposals from 27 training images...done.

Step 2 of 3: Training a neural network to classify objects in training data...

|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            3 |           50 |         9.27 |       0.2895 |       96.88% |     0.000001 |
|            5 |          100 |        14.77 |       0.2443 |       93.75% |     0.000001 |
|            8 |          150 |        20.29 |       0.0013 |      100.00% |     0.000001 |
|           10 |          200 |        25.94 |       0.1524 |       96.88% |     0.000001 |
|=========================================================================================|

Network training complete.

Step 3 of 3: Training bounding box regression models for each object class...100.00%...done.

R-CNN training complete.
*******************************************************************

Test the R-CNN detector on a test image.

img = imread('stopSignTest.jpg');

[bbox, score, label] = detect(rcnn, img, 'MiniBatchSize', 32);

Display strongest detection result.

[score, idx] = max(score);

bbox = bbox(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

detectedImg = insertObjectAnnotation(img, 'rectangle', bbox, annotation);

figure
imshow(detectedImg)

Remove the image directory from the path.

rmpath(imDir);

Resume Training an R-CNN Object Detector

Resume training an R-CNN object detector using additional data. To illustrate this procedure, the detector is first trained using only part of the ground truth data. Training then resumes using all the data.

Load training data and initialize training options.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
    stopSigns.imageFilename);

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'Verbose', false);

Train the R-CNN detector with a portion of the ground truth.

rcnn = trainRCNNObjectDetector(stopSigns(1:10,:), layers, options, 'NegativeOverlapRange', [0 0.3]);

Get the trained network layers from the detector. When you pass in an array of network layers to trainRCNNObjectDetector, they are used as-is to continue training.

network = rcnn.Network;
layers = network.Layers;

Resume training using all the training data.

rcnnFinal = trainRCNNObjectDetector(stopSigns, layers, options);

Create a network for multiclass R-CNN object detection

Create an R-CNN object detector for two object classes: dogs and cats.

objectClasses = {'dogs','cats'};

To be trained using trainRCNNObjectDetector, the network must be able to classify dogs, cats, and a "background" class. In this example, one is added to the number of object classes to account for the background.

numClassesPlusBackground = numel(objectClasses) + 1;

The final fully connected layer of a network defines the number of classes that the network can classify. Set the final fully connected layer to have an output size equal to the number of classes plus a background class.

layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    fullyConnectedLayer(numClassesPlusBackground)
    softmaxLayer()
    classificationLayer()];

These network layers can now be used to train an R-CNN two-class object detector.

Use A Saved Network In R-CNN Object Detector

Create an R-CNN object detector and set it up to use a saved network checkpoint. A network checkpoint is saved every epoch during network training when the trainingOptions 'CheckpointPath' parameter is set. Network checkpoints are useful in case your training session terminates unexpectedly.

Load the stop sign training data.

load('rcnnStopSigns.mat','stopSigns','layers')

Add full path to image files.

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
      stopSigns.imageFilename);

Set the 'CheckpointPath' using the trainingOptions function.

checkpointLocation = tempdir;
options = trainingOptions('sgdm','Verbose',false, ...
    'CheckpointPath',checkpointLocation);

Train the R-CNN object detector with a few images.

rcnn = trainRCNNObjectDetector(stopSigns(1:3,:),layers,options);

Load a saved network checkpoint.

wildcardFilePath = fullfile(checkpointLocation,'convnet_checkpoint__*.mat');
contents = dir(wildcardFilePath);

Load one of the checkpoint networks.

filepath = fullfile(contents(1).folder,contents(1).name);
checkpoint = load(filepath);

checkpoint.net

ans = 

  SeriesNetwork with properties:

    Layers: [15×1 nnet.cnn.layer.Layer]
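
The steps above load the first file that dir returns. As a minimal sketch, one way to pick the most recently written checkpoint instead is to compare the datenum field of the dir output. This assumes that file modification time reflects training order. The folder, the placeholder file names, and the saved stand-in variable below are made up for illustration and use separate variable names so that they do not disturb the example.

```matlab
% Stand-in folder with two placeholder checkpoint files (hypothetical names).
demoFolder = tempname;
mkdir(demoFolder);
demoNames = {'convnet_checkpoint__10__demo.mat', ...
             'convnet_checkpoint__20__demo.mat'};
for k = 1:numel(demoNames)
    net = k; % placeholder value saved in place of a real network
    save(fullfile(demoFolder,demoNames{k}),'net');
end

% List the checkpoint files and fail early if none were written.
demoContents = dir(fullfile(demoFolder,'convnet_checkpoint__*.mat'));
assert(~isempty(demoContents),'No checkpoint files found.');

% Pick the file with the latest modification time (ties return the first).
[~,newest] = max([demoContents.datenum]);
demoFilepath = fullfile(demoContents(newest).folder,demoContents(newest).name);
demoCheckpoint = load(demoFilepath);

% Remove the stand-in folder and its files.
rmdir(demoFolder,'s');
```

With real checkpoints, replace demoFolder with the folder passed as 'CheckpointPath' and skip the placeholder file creation.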

Create a new R-CNN object detector and set it up to use the saved network.

rcnnCheckPoint = rcnnObjectDetector();
rcnnCheckPoint.RegionProposalFcn = @rcnnObjectDetector.proposeRegions;

Set the Network to the saved network checkpoint.

rcnnCheckPoint.Network = checkpoint.net

rcnnCheckPoint = 

  rcnnObjectDetector with properties:

              Network: [1×1 SeriesNetwork]
           ClassNames: {'stopSign'  'Background'}
    RegionProposalFcn: @rcnnObjectDetector.proposeRegions

Input Arguments

`trainingData` — Labeled ground truth images
table

Labeled ground truth images, specified as a table with two or more columns.

The first column of the table must contain image file names with paths. The images must be grayscale or truecolor (RGB), and they can be in any format supported by imread. Each of the remaining columns must be a cell vector that contains M-by-4 matrices that represent a single object class, such as vehicle, flower, or stop sign. Each matrix holds M bounding boxes of type double in the format [x,y,width,height], which specifies the upper-left corner location and size of each bounding box in the corresponding image. The table variable name defines the object class name. Boxes smaller than 32-by-32 are not used for training.

To create a ground truth table, you can use the Image Labeler app or Video Labeler app. To create a table of training data from the generated ground truth, use the objectDetectorTrainingData function.
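
As a rough illustration of the layout described above, the following sketch builds a small training table by hand. The file paths, the class name stopSign, and the box coordinates are hypothetical values chosen for the example.

```matlab
% First column: image file names with paths (hypothetical files).
imageFilename = {'images/stop1.jpg'; 'images/stop2.jpg'};

% Second column: one M-by-4 double matrix of [x,y,width,height] boxes per
% image. The variable name becomes the object class name.
stopSign = {[100 50 40 40]; ...
            [20 30 64 64; 200 80 48 48]};

trainingData = table(imageFilename,stopSign);

% Flag images that contain a box below the 32-by-32 minimum, because such
% boxes are not used for training.
hasSmallBox = cellfun(@(b) any(b(:,3) < 32 | b(:,4) < 32), ...
    trainingData.stopSign);
```

Each additional object class would be one more cell-vector column, with an empty matrix for images that do not contain that class being a reasonable assumption rather than something this page states.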

`network` — Network
`SeriesNetwork` object | array of `Layer` objects | `LayerGraph` object | network name

Network, specified as a SeriesNetwork object, an array of Layer objects, a LayerGraph object, or a network name. The network is trained to classify the object classes defined in the trainingData table. The SeriesNetwork, Layer, and LayerGraph objects are available in Deep Learning Toolbox.

When you specify the network as a SeriesNetwork, an array of Layer objects, or a network name, the function automatically transforms the network into an R-CNN network by adding new classification and regression layers to support object detection.

The array of Layer objects must contain a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. An example of an array of Layer objects:
```
layers = [imageInputLayer([28 28 3])
        convolution2dLayer([5 5],10)
        reluLayer()
        fullyConnectedLayer(10)
        softmaxLayer()
        classificationLayer()];
```
When you specify the network as a SeriesNetwork, an array of Layer objects, or a network name, the weights for the convolution and fully connected layers are initialized to 'narrow-normal'.

The network name must be one of the following valid network names. You must also install the corresponding add-on.
- 'alexnet'
- 'vgg16'
- 'vgg19'
- 'resnet18'
- 'resnet50'
- 'resnet101'
- 'inceptionv3'
- 'googlenet'
- 'inceptionresnetv2'
- 'mobilenetv2'
- 'squeezenet'

The LayerGraph object must be a valid R-CNN object detection network. You can also use a LayerGraph object to train a custom R-CNN network.

See Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN to learn more about how to create an R-CNN network.

`options` — Training options
`trainingOptions` output

Training options, returned by the trainingOptions function from Deep Learning Toolbox. To specify the solver and other options for network training, use trainingOptions.

Note

trainRCNNObjectDetector does not support these training options:

- The ValidationData, ValidationFrequency, and ValidationPatience options
- The OutputFcn option

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'PositiveOverlapRange',[0.5 1]

`'PositiveOverlapRange'` — Positive training sample ratios for range of bounding box overlap
`[0.5 1]` (default) | two-element vector

Positive training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of 'PositiveOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$\frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$

A and B are bounding boxes.
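
To make the ratio concrete, here is a small worked sketch that computes it by hand for two axis-aligned boxes in [x,y,width,height] format. The box values are illustrative, and the arithmetic treats the boxes as continuous rectangles, which is an assumption about the convention.

```matlab
% Two boxes in [x,y,width,height] format (illustrative values).
A = [10 10 40 40];
B = [30 30 40 40];

% Width and height of the intersection rectangle, clamped at zero so that
% non-overlapping boxes give an intersection area of zero.
overlapW = max(0, min(A(1)+A(3), B(1)+B(3)) - max(A(1), B(1)));
overlapH = max(0, min(A(2)+A(4), B(2)+B(4)) - max(A(2), B(2)));

areaIntersection = overlapW * overlapH;                 % 20 * 20 = 400
areaUnion = A(3)*A(4) + B(3)*B(4) - areaIntersection;   % 1600 + 1600 - 400 = 2800
overlapRatio = areaIntersection / areaUnion;            % 400 / 2800, about 0.143
```

A ratio of about 0.143 lies outside the default PositiveOverlapRange of [0.5 1]. For many boxes at once, the Computer Vision Toolbox function bboxOverlapRatio is likely the more convenient choice.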

`'NegativeOverlapRange'` — Negative training sample ratios for range of bounding box overlap
`[0.1 0.5]` (default) | two-element vector

Negative training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of 'NegativeOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.
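
The following sketch shows how overlap ratios could be sorted into positive, negative, and unused samples with the default ranges. It assumes that both ends of each range are inclusive, which this page does not state, so the example ratios avoid the boundary values.

```matlab
% Default ranges from this page.
positiveRange = [0.5 1];
negativeRange = [0.1 0.5];

% Overlap ratios of four hypothetical region proposals with a ground truth box.
overlapRatios = [0.05; 0.20; 0.45; 0.80];

% Assumed inclusive bounds on both ends of each range.
isPositive = overlapRatios >= positiveRange(1) & overlapRatios <= positiveRange(2);
isNegative = overlapRatios >= negativeRange(1) & overlapRatios <= negativeRange(2);

% Proposals in neither range contribute no training sample.
isUnused = ~isPositive & ~isNegative;
```

Here the proposal with ratio 0.05 falls in neither range. Setting 'NegativeOverlapRange' to [0 0.3], as in the stop sign example, would turn it into a negative sample and leave the 0.45 proposal unused.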

`'NumStrongestRegions'` — Maximum number of strongest region proposals
`2000` (default) | integer

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of 'NumStrongestRegions' and an integer. Reduce this value to speed up processing time, although doing so decreases training accuracy. To use all region proposals, set this value to inf.
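
Conceptually, keeping the strongest regions amounts to sorting proposals by score and truncating the list. The sketch below illustrates that idea with made-up boxes and scores. It is not the function's internal implementation, which this page does not describe.

```matlab
% Hypothetical proposals: one [x,y,width,height] row and one score per box.
bboxes = [ 5  5 40 40; ...
          60 20 50 50; ...
          10 80 35 35; ...
          90 90 64 64];
scores = [0.20; 0.90; 0.50; 0.70];

numStrongestRegions = 2;   % use inf to keep every proposal

% Sort by descending score, then keep at most numStrongestRegions boxes.
[~,order] = sort(scores,'descend');
numKeep = min(numStrongestRegions,numel(order));
keep = order(1:numKeep);

strongestBoxes = bboxes(keep,:);
strongestScores = scores(keep);
```

Taking the minimum with numel(order) is what lets a value of inf select all proposals without an indexing error.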

`'RegionProposalFcn'` — Custom region proposal
function handle

Custom region proposal function handle, specified as the comma-separated pair consisting of 'RegionProposalFcn' and a function handle. If you do not specify a custom region proposal function, the default variant of the Edge Boxes algorithm [3], set in rcnnObjectDetector, is used. A custom proposalFcn must have the following functional form:

 [bboxes,scores] = proposalFcn(I)

The input, I, is an image defined in the trainingData table. The function must return rectangular bounding boxes in an M-by-4 array. Each row of bboxes contains a four-element vector, [x,y,width,height], that specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an M-by-1 vector. Higher scores indicate that the bounding box is more likely to contain an object. The scores are used to select the strongest regions, the number of which you can specify in NumStrongestRegions.
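
As a minimal sketch of the required functional form, the hypothetical function below proposes fixed-size windows on a regular grid and scores each window by the standard deviation of its pixel intensities. The name gridProposalFcn, the window size, the stride, and the scoring rule are all assumptions made for illustration. A practical proposal function would use a far stronger objectness measure.

```matlab
function [bboxes,scores] = gridProposalFcn(I)
% gridProposalFcn  Hypothetical region proposal function (illustration only).
% Returns an M-by-4 array of [x,y,width,height] boxes and an M-by-1 score vector.

winSize = 64;   % window width and height in pixels (assumed value)
stride  = 32;   % step between neighboring windows in pixels (assumed value)

[h,w,~] = size(I);

% Upper-left corners of the windows. Small images get a single window.
xs = 1:stride:max(w - winSize + 1,1);
ys = 1:stride:max(h - winSize + 1,1);
[X,Y] = meshgrid(xs,ys);

M = numel(X);
bboxes = [X(:), Y(:), repmat(min(winSize,w),M,1), repmat(min(winSize,h),M,1)];

% Average the color channels so that grayscale and RGB inputs both work.
gray = mean(double(I),3);

% Score each window by its intensity spread (a crude stand-in for objectness).
scores = zeros(M,1);
for k = 1:M
    rows = bboxes(k,2):(bboxes(k,2) + bboxes(k,4) - 1);
    cols = bboxes(k,1):(bboxes(k,1) + bboxes(k,3) - 1);
    window = gray(rows,cols);
    scores(k) = std(window(:));
end
end
```

To try a function like this, save it on the MATLAB path and pass its handle, for example trainRCNNObjectDetector(trainingData,network,options,'RegionProposalFcn',@gridProposalFcn).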

`'BoxRegressionLayer'` — Box regression layer name
`'auto'` (default) | character vector

Box regression layer name, specified as the comma-separated pair consisting of 'BoxRegressionLayer' and a character vector. Valid values are 'auto' or the name of a layer in the input network. The output activations of this layer are used as features to train a regression model for refining the detected bounding boxes.

If the name is 'auto', then trainRCNNObjectDetector automatically selects a layer from the input network based on the type of input network:

- If the input network is a SeriesNetwork or an array of Layer objects, then the function selects the last convolution layer.
- If the input network is a LayerGraph, then the function selects the source of the last fully connected layer.

Output Arguments

`detector` — Trained R-CNN-based object detector
`rcnnObjectDetector` object

Trained R-CNN-based object detector, returned as an rcnnObjectDetector object. You can train an R-CNN detector to detect multiple object classes.

`info` — Training information
structure

Training information, returned as a structure with the following fields. Each field is a numeric vector with one element per training iteration. Values that have not been calculated at a specific iteration are represented by NaN.

- TrainingLoss: Training loss at each iteration. This is the combination of the classification and regression loss used to train the R-CNN network.
- TrainingAccuracy: Training set accuracy at each iteration.
- BaseLearnRate: Learning rate at each iteration.
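
Because some entries can be NaN, it helps to mask them before summarizing the training history. The sketch below uses a stand-in structure with made-up placeholder numbers in place of a real info output, so the values carry no meaning beyond illustrating the field layout.

```matlab
% Stand-in for the info output (placeholder values, not real training results).
info = struct( ...
    'TrainingLoss',     [0.90 0.55 NaN 0.31], ...
    'TrainingAccuracy', [62.5 81.3 NaN 93.8], ...
    'BaseLearnRate',    [1e-6 1e-6 1e-6 1e-6]);

% Iterations at which the loss was actually calculated.
valid = ~isnan(info.TrainingLoss);
iterations = find(valid);

% Most recent calculated loss and mean accuracy over calculated iterations.
finalLoss = info.TrainingLoss(iterations(end));
meanAccuracy = mean(info.TrainingAccuracy(valid));
```

The same mask can be reused when plotting, for example plot(iterations,info.TrainingLoss(valid)).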

Limitations

This implementation of R-CNN does not train an SVM classifier for each object class.

Tips

- To accelerate data preprocessing for training, trainRCNNObjectDetector automatically creates and uses a parallel pool based on your parallel preference settings. This requires Parallel Computing Toolbox.
- VGG-16, VGG-19, ResNet-101, and Inception-ResNet-v2 are large models. Training with large images may produce "Out of Memory" errors. To mitigate these errors, manually resize the images along with the bounding box ground truth data before calling trainRCNNObjectDetector.
- This function supports transfer learning. When you specify a network by name, such as 'resnet50', the function automatically transforms the network into a valid R-CNN network model based on the pretrained resnet50 model. Alternatively, manually specify a custom R-CNN network using the LayerGraph extracted from a pretrained DAG network. See Create R-CNN Object Detection Network.
- Use the trainingOptions function to enable or disable verbose printing.
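
For the tip about resizing images together with their ground truth, the box coordinates must be scaled by the same factor as the image. The following sketch shows only the box arithmetic, under the simplifying assumption that rounding to whole pixels is acceptable and that sub-pixel coordinate conventions can be ignored. The scale factor and box values are illustrative.

```matlab
scale = 0.5;   % assumed resize factor applied to the image

% Ground truth boxes for one image in [x,y,width,height] format.
boxes = [100 50 40 40; ...
          20 30 64 64];

% Scale corner positions and sizes, keeping corners inside the image.
scaledBoxes = [max(round(boxes(:,1:2) * scale),1), ...
               round(boxes(:,3:4) * scale)];

% Boxes smaller than 32-by-32 are not used for training, so check which
% boxes survive the resize.
stillUsable = all(scaledBoxes(:,3:4) >= 32,2);
```

In this example the 40-by-40 box shrinks to 20-by-20 and would no longer contribute to training, which is worth checking before choosing an aggressive resize factor.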

References

[1] Girshick, R., J. Donahue, T. Darrell, and J. Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 580–587.

[2] Girshick, R. “Fast R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1440–1448.

[3] Zitnick, C. Lawrence, and P. Dollar. “Edge Boxes: Locating Object Proposals from Edges.” Computer Vision-ECCV, Springer International Publishing. 2014, pp. 391–405.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set 'UseParallel' to true or enable this by default using the Computer Vision Toolbox™ preferences.

For more information, see Parallel Computing Toolbox Support.
