trainFastRCNNObjectDetector

Train a Fast R-CNN deep learning object detector

Syntax

trainedDetector = trainFastRCNNObjectDetector(trainingData,network,options)

[trainedDetector,info] = trainFastRCNNObjectDetector(___)

trainedDetector = trainFastRCNNObjectDetector(trainingData,checkpoint,options)

trainedDetector = trainFastRCNNObjectDetector(trainingData,detector,options)

trainedDetector = trainFastRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)

trainedDetector = trainFastRCNNObjectDetector(___,Name,Value)

Description

Train a Detector

trainedDetector = trainFastRCNNObjectDetector(trainingData,network,options) trains a Fast R-CNN (regions with convolutional neural networks) object detector using deep learning. You can train a Fast R-CNN detector to detect multiple object classes.

This function requires Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

[trainedDetector,info] = trainFastRCNNObjectDetector(___) also returns information on the training progress, such as training loss and accuracy, for each iteration.

Resume Training a Detector

trainedDetector = trainFastRCNNObjectDetector(trainingData,checkpoint,options) resumes training from a detector checkpoint.

Fine Tune a Detector

trainedDetector = trainFastRCNNObjectDetector(trainingData,detector,options) continues training a detector with additional training data or performs more training iterations to improve detector accuracy.

Custom Region Proposal

trainedDetector = trainFastRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn) optionally trains the detector using a custom region proposal function, proposalFcn, with any of the previous input combinations. If you do not specify a proposal function, then the function uses a variation of the Edge Boxes[2] algorithm.

Additional Properties

trainedDetector = trainFastRCNNObjectDetector(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

Examples

Train Fast R-CNN Stop Sign Detector

Load training data.

data = load('rcnnStopSigns.mat', 'stopSigns', 'fastRCNNLayers');
stopSigns = data.stopSigns;
fastRCNNLayers = data.fastRCNNLayers;

Add the full path to the image file names.

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
    stopSigns.imageFilename);

Randomly shuffle data for training.

rng(0);
shuffledIdx = randperm(height(stopSigns));
stopSigns = stopSigns(shuffledIdx,:);

Create an imageDatastore using the files from the table.

imds = imageDatastore(stopSigns.imageFilename);

Create a boxLabelDatastore using the label columns from the table.

blds = boxLabelDatastore(stopSigns(:,2:end));

Combine the datastores.

ds = combine(imds, blds);

The stop sign training images have different sizes. Preprocess the data to resize the image and boxes to a predefined size.

ds = transform(ds,@(data)preprocessData(data,[920 968 3]));

Set the network training options.

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 10, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 10, ...
    'CheckpointPath', tempdir);

Train the Fast R-CNN detector. Training can take a few minutes to complete.

frcnn = trainFastRCNNObjectDetector(ds, fastRCNNLayers, options, ...
    'NegativeOverlapRange', [0 0.1], ...
    'PositiveOverlapRange', [0.7 1]);

*******************************************************************
Training a Fast R-CNN Object Detector for the following object classes:

* stopSign

--> Extracting region proposals from training datastore...done.

Training on single GPU.
|=======================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |      Rate       |
|=======================================================================================================|
|       1 |           1 |       00:00:29 |       0.3787 |       93.59% |         0.96 |          0.0010 |
|      10 |          10 |       00:05:14 |       0.3032 |       98.52% |         0.95 |          0.0010 |
|=======================================================================================================|

Detector training complete.
*******************************************************************

Test the Fast R-CNN detector on a test image.

img = imread('stopSignTest.jpg');

Run the detector.

[bbox, score, label] = detect(frcnn, img);

Display detection results.

detectedImg = insertObjectAnnotation(img,'rectangle',bbox,score);
figure
imshow(detectedImg)

Supporting Functions

function data = preprocessData(data,targetSize)
% Resize image and bounding boxes to the targetSize.
scale = targetSize(1:2)./size(data{1},[1 2]);
data{1} = imresize(data{1},targetSize(1:2));
bboxes = round(data{2});
data{2} = bboxresize(bboxes,scale);
end

Input Arguments

`trainingData` — Labeled ground truth
datastore | table

Labeled ground truth, specified as a datastore or a table.

Each bounding box must be in the format [x y width height].

If you use a datastore, your data must be set up so that calling the datastore with the read and readall functions returns a cell array or table with two or three columns. When the output contains two columns, the first column must contain bounding boxes, and the second column must contain labels, {boxes,labels}. When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data.

data	boxes	labels
The first column can contain data, such as point cloud data or images.	The second column must be a cell array that contains M-by-4 matrices of bounding boxes of the form [x, y, width, height]. The vectors represent the location and size of bounding boxes for the objects in each image.	The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories.

For more information, see Datastores for Deep Learning (Deep Learning Toolbox).

If you use a table, the table must have two or more columns. The first column of the table must contain image file names with paths. The images must be grayscale or truecolor (RGB) and they can be in any format supported by imread. Each of the remaining columns must be a cell vector that contains M-by-4 matrices that represent a single object class, such as vehicle, flower, or stop sign. The columns contain 4-element double arrays of M bounding boxes in the format [x,y,width,height]. The format specifies the upper-left corner location and size of the bounding box in the corresponding image. To create a ground truth table, you can use the Image Labeler app or Video Labeler app. To create a table of training data from the generated ground truth, use the objectDetectorTrainingData function.
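
As a minimal sketch of the table layout described above, the following builds a two-class ground truth table by hand. The file names, class names, and box coordinates are hypothetical placeholders, not data shipped with the toolbox.

```matlab
% Hypothetical image files; replace with full paths to your own images.
imageFilename = {'images/scene1.jpg'; 'images/scene2.jpg'};

% One column per object class. Each cell holds an M-by-4 matrix of
% [x y width height] boxes for that class in the corresponding image.
vehicle  = {[10 20 50 40; 100 80 30 30]; [5 5 20 20]};
stopSign = {[200 150 40 40]; [60 40 32 32]};

% The first column holds the file names, the remaining columns the boxes.
trainingData = table(imageFilename, vehicle, stopSign);

% Inspect the vehicle boxes for the first image (2-by-4 here).
disp(trainingData.vehicle{1})
```

In practice, the Image Labeler app together with objectDetectorTrainingData produces a table of this shape for you.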

`network` — Network
`SeriesNetwork` object | array of `Layer` objects | `LayerGraph` object | network name

Network, specified as a SeriesNetwork (Deep Learning Toolbox) object, an array of Layer (Deep Learning Toolbox) objects, a LayerGraph (Deep Learning Toolbox) object, or a network name. The network is trained to classify the object classes defined in trainingData. The SeriesNetwork, Layer, and LayerGraph objects are available in Deep Learning Toolbox.

When you specify the network as a SeriesNetwork, an array of Layer objects, or by the network name, the network is automatically transformed into a Fast R-CNN network by adding an ROI max pooling layer, and new classification and regression layers to support object detection. Additionally, the GridSize property of the ROI max pooling layer is set to the output size of the last max pooling layer in the network.

The array of Layer (Deep Learning Toolbox) objects must contain a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. An example of an array of Layer (Deep Learning Toolbox) objects:
```
layers = [imageInputLayer([28 28 3])
        convolution2dLayer([5 5],10)
        reluLayer()
        fullyConnectedLayer(10)
        softmaxLayer()
        classificationLayer()];
```

When you specify the network as a SeriesNetwork, an array of Layer objects, or by the network name, the weights of the additional convolution and fully connected layers that the function adds to create the network are initialized to 'narrow-normal'.
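
The following sketch shows one way to use a Layer array to customize per-layer learning rates. It assumes a single object class, so the final fully connected layer has two outputs (the class plus background). The layer sizes and learn rate factors are illustrative assumptions, and the code requires Deep Learning Toolbox.

```matlab
numClasses = 1;  % assumed: one object class, for example stopSign

% Learn the convolution weights at twice the global learning rate.
convLayer = convolution2dLayer([5 5], 10, 'WeightLearnRateFactor', 2);

% One output per object class, plus one for the background class.
% The larger factors make this new layer adapt faster than the others.
fcLayer = fullyConnectedLayer(numClasses + 1, ...
    'WeightLearnRateFactor', 10, ...
    'BiasLearnRateFactor', 10);

layers = [imageInputLayer([28 28 3])
    convLayer
    reluLayer()
    fcLayer
    softmaxLayer()
    classificationLayer()];
```

The factors multiply the global rate set through 'InitialLearnRate' in trainingOptions, so a factor of 10 with a global rate of 1e-3 gives an effective rate of 1e-2 for that layer.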

The network name must be one of the following valid network names. You must also install the corresponding Add-on.

Network Name	Feature Extraction Layer Name	ROI Pooling Layer OutputSize	Description
`alexnet` (Deep Learning Toolbox)	`'relu5'`	[6 6]	Last max pooling layer is replaced by ROI max pooling layer
`vgg16` (Deep Learning Toolbox)	`'relu5_3'`	[7 7]
`vgg19` (Deep Learning Toolbox)	`'relu5_4'`	[7 7]
`squeezenet` (Deep Learning Toolbox)	`'fire5-concat'`	[14 14]
`resnet18` (Deep Learning Toolbox)	`'res4b_relu'`		ROI pooling layer is inserted after the feature extraction layer.
`resnet50` (Deep Learning Toolbox)	`'activation_40_relu'`
`resnet101` (Deep Learning Toolbox)	`'res4b22_relu'`
`googlenet` (Deep Learning Toolbox)	`'inception_4d-output'`
`mobilenetv2` (Deep Learning Toolbox)	`'block_13_expand_relu'`
`inceptionv3` (Deep Learning Toolbox)	`'mixed7'`	[17 17]
`inceptionresnetv2` (Deep Learning Toolbox)	`'block17_20_ac'`	[17 17]

The LayerGraph object must be a valid Fast R-CNN object detection network. You can also use a LayerGraph object to train a custom Fast R-CNN network.
Tip
If your network is a DAGNetwork, use the layerGraph (Deep Learning Toolbox) function to convert the network to a LayerGraph object. Then, create a custom Fast R-CNN network as described by the Create Fast R-CNN Object Detection Network example.

See Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN to learn more about how to create a Fast R-CNN network.

`options` — Training options
`trainingOptions` output

Training options, returned by the trainingOptions (Deep Learning Toolbox) function from the Deep Learning Toolbox. To specify solver and other options for network training, use trainingOptions.

Note

trainFastRCNNObjectDetector has these training option limitations:

- The OutputFcn option is not supported.
- The 'once' and 'every-epoch' Shuffle options are not supported for combined datastore inputs.
- The 'parallel' and 'multi-gpu' ExecutionEnvironment options are not supported when you use a combined datastore input.
- Datastore inputs are not supported when you set the DispatchInBackground training option to true.
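
A set of training options that stays within the restrictions in this note for a combined datastore input might look like the following sketch. The specific values are illustrative assumptions, and the code requires Deep Learning Toolbox.

```matlab
% Options chosen to respect the restrictions for combined datastores.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 10, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 10, ...
    'Shuffle', 'never', ...               % 'once' and 'every-epoch' are not supported
    'ExecutionEnvironment', 'auto', ...   % avoid 'parallel' and 'multi-gpu'
    'DispatchInBackground', false);       % must stay false for datastore inputs
```

Because shuffling is disabled here, consider shuffling the underlying data once before building the datastores, as the stop sign example does with randperm.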

`checkpoint` — Saved detector checkpoint
`fastRCNNObjectDetector` object

Saved detector checkpoint, specified as a fastRCNNObjectDetector object. To save the detector after every epoch, set the 'CheckpointPath' property when using the trainingOptions function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the 'CheckpointPath' property of options is '/tmp', load a checkpoint MAT-file using:

data = load('/tmp/faster_rcnn_checkpoint__105__2016_11_18__14_25_08.mat');

The name of the MAT-file includes the iteration number and timestamp of when the detector checkpoint was saved. The detector is saved in the detector variable of the file. Pass the loaded detector back into the trainFastRCNNObjectDetector function:

frcnn = trainFastRCNNObjectDetector(stopSigns,...
                           data.detector,options);
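
When several checkpoints have accumulated, you may want the most recent one. The following is a minimal sketch that uses only base MATLAB file functions. The folder (tempdir, as in the stop sign example) and the '*checkpoint*.mat' file name pattern are assumptions inferred from the example file name above, so adjust them to match your own checkpoint files.

```matlab
% Assumption: checkpoints were written to this folder via 'CheckpointPath'.
checkpointDir = tempdir;

% Assumption: checkpoint file names contain the word "checkpoint".
files = dir(fullfile(checkpointDir, '*checkpoint*.mat'));

if isempty(files)
    disp('No checkpoint files found.');
else
    % Pick the file with the latest modification time.
    [~, newest] = max([files.datenum]);
    checkpointFile = fullfile(files(newest).folder, files(newest).name);
    fprintf('Most recent checkpoint: %s\n', checkpointFile);

    % Load it; the detector is stored in the variable named detector.
    data = load(checkpointFile);
end
```

If a checkpoint is found, data.detector can then be passed as the checkpoint input shown above.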

`detector` — Previously trained Fast R-CNN object detector
`fastRCNNObjectDetector` object

Previously trained Fast R-CNN object detector, specified as a fastRCNNObjectDetector object.

`proposalFcn` — Region proposal method
function handle

Region proposal method, specified as a function handle. If you do not specify a region proposal function, the function implements a variant of the Edge Boxes[2] algorithm. The function must have the form:

[bboxes,scores] = proposalFcn(I)

The input, I, is an image defined in the trainingData table. The function must return rectangular bounding boxes, bboxes, in an m-by-4 array. Each row of bboxes contains a four-element vector, [x,y,width,height]. This vector specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an m-by-1 vector. Higher score values indicate that the bounding box is more likely to contain an object. The scores are used to select the strongest n regions, where n is defined by the value of NumStrongestRegions.
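
To illustrate the required signature, here is a minimal sketch of a custom proposal function written in base MATLAB. The function name gridProposals, the fixed window size, the stride, and the edge-based scoring are all hypothetical choices made for illustration. This is a naive sliding-window scheme, not the Edge Boxes algorithm.

```matlab
% Try the function on a synthetic image: a bright square on a dark background.
I = zeros(128, 160, 'uint8');
I(40:90, 60:110) = 255;
[bboxes, scores] = gridProposals(I);
fprintf('%d proposals returned\n', size(bboxes, 1));

function [bboxes, scores] = gridProposals(I)
% gridProposals  Hypothetical sliding-window region proposal function.
%   Returns m-by-4 boxes as [x y width height] and m-by-1 scores.
boxSize = 64;   % window side length in pixels (assumed)
stride  = 32;   % step between neighboring windows in pixels (assumed)

if size(I, 3) > 1
    I = mean(I, 3);            % collapse channels into one intensity plane
end
I = double(I);

% Gradient magnitude serves as a crude measure of edge content.
[gx, gy] = gradient(I);
edgeStrength = hypot(gx, gy);

[imgH, imgW] = size(I);
w = min(boxSize, imgW);
h = min(boxSize, imgH);
xs = 1:stride:max(imgW - w + 1, 1);
ys = 1:stride:max(imgH - h + 1, 1);
[X, Y] = meshgrid(xs, ys);
bboxes = [X(:), Y(:), repmat([w h], numel(X), 1)];

% Score each window by its mean edge strength (higher = more edges).
scores = zeros(size(bboxes, 1), 1);
for k = 1:size(bboxes, 1)
    rows = bboxes(k, 2):(bboxes(k, 2) + h - 1);
    cols = bboxes(k, 1):(bboxes(k, 1) + w - 1);
    window = edgeStrength(rows, cols);
    scores(k) = mean(window(:));
end
end
```

With the function saved on the MATLAB path, it could be supplied to training as 'RegionProposalFcn',@gridProposals.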

Dependencies

If you do not specify a custom proposal function and you use a table for the input training data, the function uses a variation of the Edge Boxes algorithm. If you use a datastore for input training data for multichannel images, you must specify a custom region proposal function.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'PositiveOverlapRange',[0.75 1]

`'PositiveOverlapRange'` — Bounding box overlap ratios for positive training samples
`[0.5 1]` (default) | two-element vector

Bounding box overlap ratios for positive training samples, specified as the comma-separated pair consisting of 'PositiveOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$\frac{\mathrm{area}(A \cap B)}{\mathrm{area}(A \cup B)}$

A and B are bounding boxes.

`'NegativeOverlapRange'` — Bounding box overlap ratios for negative training samples
`[0.1 0.5]` (default) | two-element vector

Bounding box overlap ratios for negative training samples, specified as the comma-separated pair consisting of 'NegativeOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$\frac{\mathrm{area}(A \cap B)}{\mathrm{area}(A \cup B)}$

A and B are bounding boxes.
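
As a worked example of the overlap ratio, the following sketch computes it for two [x y width height] boxes using base MATLAB only, then checks the result against the default ranges. The box coordinates are made up, and treating both range endpoints as inclusive is an assumption, since the text above does not specify the boundary behavior.

```matlab
A = [10 10 40 40];   % hypothetical ground truth box
B = [30 30 40 40];   % hypothetical region proposal

% Width and height of the intersection rectangle (zero if disjoint).
overlapW = max(0, min(A(1) + A(3), B(1) + B(3)) - max(A(1), B(1)));
overlapH = max(0, min(A(2) + A(4), B(2) + B(4)) - max(A(2), B(2)));

interArea = overlapW * overlapH;                  % area(A intersect B) = 400
unionArea = A(3)*A(4) + B(3)*B(4) - interArea;    % area(A union B) = 2800
overlapRatio = interArea / unionArea;
fprintf('Overlap ratio = %.4f\n', overlapRatio);  % prints 0.1429

% Compare against the default ranges (inclusive endpoints assumed).
positiveRange = [0.5 1];
negativeRange = [0.1 0.5];
isPositive = overlapRatio >= positiveRange(1) && overlapRatio <= positiveRange(2);
isNegative = overlapRatio >= negativeRange(1) && overlapRatio <= negativeRange(2);
```

Here the proposal falls in the default negative range. With the ranges used in the stop sign example ([0 0.1] and [0.7 1]), the same proposal would fall in neither range.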

`'NumStrongestRegions'` — Maximum number of strongest region proposals
`2000` (default) | positive integer

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of 'NumStrongestRegions' and a positive integer. Reduce this value to speed up processing time at the cost of training accuracy. To use all region proposals, set this value to Inf.

`'NumRegionsToSample'` — Number of region proposals
`128` (default) | integer

Number of region proposals to randomly sample from each training image, specified as the comma-separated pair consisting of 'NumRegionsToSample' and an integer. Reduce the number of regions to sample to reduce memory usage and speed up training. Reducing the value can also decrease training accuracy.

`'SmallestImageDimension'` — Length of smallest image dimension
`[]` (default) | positive integer

Length of smallest image dimension, either width or height, specified as the comma-separated pair consisting of 'SmallestImageDimension' and a positive integer. Training images are resized such that the length of the shortest dimension is equal to the specified integer. By default, training images are not resized. Resizing training images helps reduce computational costs and memory used when training images are large. Typical values range from 400–600 pixels.

Dependencies

The SmallestImageDimension property only supports table input training data. To resize the input data of a datastore input, use the transform function.
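
The resizing arithmetic can be sketched as follows for a hypothetical 1080-by-1920 image. The sketch assumes that the aspect ratio is preserved and that the longer side is rounded to the nearest pixel; the exact rounding the function uses internally is not stated above.

```matlab
imageSize = [1080 1920];       % hypothetical [height width] of a training image
smallestImageDimension = 600;  % value passed as 'SmallestImageDimension'

% Scale so that the shorter side equals the requested length.
scale = smallestImageDimension / min(imageSize);
resizedSize = round(imageSize * scale);

disp(resizedSize)   % 600 1067
```

The pixel count drops by roughly a factor of (1/scale)^2, about 3.2 in this case, which is where the memory and compute savings come from.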

`'FreezeBatchNormalization'` — Frozen batch normalization
`true` (default) | `false`

Frozen batch normalization during training, specified as the comma-separated pair consisting of 'FreezeBatchNormalization' and true or false. The value indicates whether the input layers to the network are frozen during training. Set this value to true if you are training with a small mini-batch size. Small batch sizes result in poor estimates of the batch mean and variance that are required for effective batch normalization.

If you do not specify a value for 'FreezeBatchNormalization', the function sets the property to:

- true if the 'MiniBatchSize' name-value argument for the trainingOptions (Deep Learning Toolbox) function is less than 8.
- false if the 'MiniBatchSize' name-value argument for the trainingOptions (Deep Learning Toolbox) function is greater than or equal to 8.

You must specify a value for 'FreezeBatchNormalization' to override this default behavior.

Output Arguments

`trainedDetector` — Trained Fast R-CNN object detector
`fastRCNNObjectDetector` object

Trained Fast R-CNN object detector, returned as a fastRCNNObjectDetector object.

`info` — Training progress information
structure array

Training progress information, returned as a structure array with nine fields. Each field corresponds to a stage of training.

TrainingLoss — Training loss at each iteration is the mean squared error (MSE) calculated as the sum of localization error, confidence loss, and classification loss. For more information about the training loss function, see Training Loss.
TrainingAccuracy — Training set accuracy at each iteration.
TrainingRMSE — Training root mean squared error (RMSE) is the RMSE calculated from the training loss at each iteration.
BaseLearnRate — Learning rate at each iteration.
ValidationLoss — Validation loss at each iteration.
ValidationAccuracy — Validation accuracy at each iteration.
ValidationRMSE — Validation RMSE at each iteration.
FinalValidationLoss — Final validation loss at end of the training.
FinalValidationRMSE — Final validation RMSE at end of the training.

Each field is a numeric vector with one element per training iteration. Values that have not been calculated at a specific iteration are assigned as NaN. The struct contains ValidationLoss, ValidationAccuracy, ValidationRMSE, FinalValidationLoss, and FinalValidationRMSE fields only when options specifies validation data.
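
The following sketch shows one way to plot the loss curves from info while skipping the NaN entries. The struct built here is a stand-in with made-up placeholder values so that the snippet runs on its own; in practice, use the info output returned by trainFastRCNNObjectDetector.

```matlab
% Placeholder values for illustration only, not real training results.
info = struct();
info.TrainingLoss   = [0.38 0.35 0.33 0.31 0.30];
info.ValidationLoss = [NaN  NaN  0.36 NaN  0.33];

iterations = 1:numel(info.TrainingLoss);

% Validation is typically computed only at some iterations; the rest are NaN.
hasValidation = ~isnan(info.ValidationLoss);

figure
plot(iterations, info.TrainingLoss, '-')
hold on
plot(iterations(hasValidation), info.ValidationLoss(hasValidation), 'o-')
hold off
xlabel('Iteration')
ylabel('Loss')
legend('Training', 'Validation')
```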

Tips

To accelerate data preprocessing for training, trainFastRCNNObjectDetector automatically creates and uses a parallel pool based on your parallel preference settings. For more details about setting these preferences, see parallel preference settings. Using parallel computing preferences requires Parallel Computing Toolbox.
VGG-16, VGG-19, ResNet-101, and Inception-ResNet-v2 are large models. Training with large images can produce "Out of Memory" errors. To mitigate these errors, try one or more of these options:
- Reduce the size of your images by using the 'SmallestImageDimension' argument.
- Decrease the value of the 'NumRegionsToSample' name-value argument.
This function supports transfer learning. When you input a network by name, such as 'resnet50', then the function automatically transforms the network into a valid Fast R-CNN network model based on the pretrained resnet50 (Deep Learning Toolbox) model. Alternatively, manually specify a custom Fast R-CNN network by using the LayerGraph (Deep Learning Toolbox) extracted from a pretrained DAG network. For more details, see Create Fast R-CNN Object Detection Network.

This table describes how to transform each named network into a Fast R-CNN network. The feature extraction layer name specifies which layer is processed by the ROI pooling layer. The ROI output size specifies the size of the feature maps output by the ROI pooling layer.

Network Name	Feature Extraction Layer Name	ROI Pooling Layer OutputSize	Description
`alexnet` (Deep Learning Toolbox)	`'relu5'`	[6 6]	Last max pooling layer is replaced by ROI max pooling layer
`vgg16` (Deep Learning Toolbox)	`'relu5_3'`	[7 7]
`vgg19` (Deep Learning Toolbox)	`'relu5_4'`	[7 7]
`squeezenet` (Deep Learning Toolbox)	`'fire5-concat'`	[14 14]
`resnet18` (Deep Learning Toolbox)	`'res4b_relu'`		ROI pooling layer is inserted after the feature extraction layer.
`resnet50` (Deep Learning Toolbox)	`'activation_40_relu'`
`resnet101` (Deep Learning Toolbox)	`'res4b22_relu'`
`googlenet` (Deep Learning Toolbox)	`'inception_4d-output'`
`mobilenetv2` (Deep Learning Toolbox)	`'block_13_expand_relu'`
`inceptionv3` (Deep Learning Toolbox)	`'mixed7'`	[17 17]
`inceptionresnetv2` (Deep Learning Toolbox)	`'block17_20_ac'`	[17 17]

To modify and transform a network into a Fast R-CNN network, see Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model.

Use the trainingOptions (Deep Learning Toolbox) function to enable or disable verbose printing.

References

[1] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.

[2] Zitnick, C. Lawrence, and Piotr Dollar. "Edge Boxes: Locating Object Proposals From Edges." Computer Vision-ECCV 2014. Springer International Publishing, 2014, pp. 391–405.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set 'UseParallel' to true or enable this by default using the Computer Vision Toolbox™ preferences.

For more information, see Parallel Computing Toolbox Support.
