objectDetectorTrainingData

Create training data for an object detector

Description

[imds,blds] = objectDetectorTrainingData(gTruth) creates an image datastore and a box label datastore of training data from the specified ground truth.

You can combine the image and box label datastores using combine(imds,blds) to create a datastore needed for training. Use the combined datastore with the training functions, such as trainACFObjectDetector, trainYOLOv2ObjectDetector, trainFastRCNNObjectDetector, trainFasterRCNNObjectDetector, and trainRCNNObjectDetector.
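A minimal end-to-end sketch, assuming a groundTruth object gTruth, a layer graph lgraph, and a trainingOptions object options already exist in the workspace:

[imds,blds] = objectDetectorTrainingData(gTruth);
cds = combine(imds,blds);   % each read returns {image, boxes, labels}
detector = trainYOLOv2ObjectDetector(cds,lgraph,options);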

This function supports parallel computing using multiple MATLAB® workers. Enable parallel computing using the Computer Vision Toolbox Preferences dialog.

trainingDataTable = objectDetectorTrainingData(gTruth) returns a table of training data from the specified ground truth. gTruth is an array of groundTruth objects. You can use the table to train an object detector using the Computer Vision Toolbox™ training functions.

___ = objectDetectorTrainingData(gTruth,Name,Value) returns training data with additional options specified by one or more name-value pair arguments. If you create the groundTruth objects in gTruth using a video file, a custom data source, or an array of imageDatastore objects with different custom read functions, then you can specify any combination of name-value pair arguments. If you create the groundTruth objects from an image collection or image sequence data source, then you can specify only the 'SamplingFactor' name-value pair argument.
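For instance, a sketch of the name-value syntax, assuming gTruth was created from a video file; the folder name 'extractedFrames' and the prefix 'vehicle_' are illustrative:

if ~isfolder('extractedFrames')
    mkdir('extractedFrames')                   % 'WriteLocation' must already exist
end
[imds,blds] = objectDetectorTrainingData(gTruth, ...
    'SamplingFactor',5, ...                    % keep every 5th frame
    'WriteLocation','extractedFrames', ...     % folder for extracted images
    'ImageFormat','png', ...                   % any format imwrite supports
    'NamePrefix','vehicle_');                  % prefix for written file names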

Examples

Train a vehicle detector based on a YOLO v2 network.

Add the folder containing the images to the MATLAB path.

imageDir = fullfile(matlabroot,'toolbox','vision','visiondata','vehicles');
addpath(imageDir);

Load the vehicle ground truth data.

data = load('vehicleTrainingGroundTruth.mat');
gTruth = data.vehicleTrainingGroundTruth;

Load the detector containing the layerGraph object for training.

vehicleDetector = load('yolov2VehicleDetector.mat');
lgraph = vehicleDetector.lgraph
lgraph = 
  LayerGraph with properties:

         Layers: [25×1 nnet.cnn.layer.Layer]
    Connections: [24×2 table]
     InputNames: {'input'}
    OutputNames: {'yolov2OutputLayer'}

Create an image datastore and box label datastore using the ground truth object.

[imds,bxds] = objectDetectorTrainingData(gTruth);

Combine the datastores.

cds = combine(imds,bxds);

Configure training options.

options = trainingOptions('sgdm', ...
       'InitialLearnRate', 0.001, ...
       'Verbose',true, ...
       'MiniBatchSize',16, ...
       'MaxEpochs',30, ...
       'Shuffle','every-epoch', ...
       'VerboseFrequency',10); 

Train the detector.

[detector,info] = trainYOLOv2ObjectDetector(cds,lgraph,options);
*************************************************************************
Training a YOLO v2 Object Detector for the following object classes:

* vehicle

Training on single CPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |         7.50 |         56.2 |          0.0010 |
|       1 |          10 |       00:00:02 |         1.73 |          3.0 |          0.0010 |
|       2 |          20 |       00:00:04 |         1.58 |          2.5 |          0.0010 |
|       2 |          30 |       00:00:06 |         1.36 |          1.9 |          0.0010 |
|       3 |          40 |       00:00:08 |         1.13 |          1.3 |          0.0010 |
|       3 |          50 |       00:00:09 |         1.01 |          1.0 |          0.0010 |
|       4 |          60 |       00:00:11 |         0.95 |          0.9 |          0.0010 |
|       4 |          70 |       00:00:13 |         0.84 |          0.7 |          0.0010 |
|       5 |          80 |       00:00:15 |         0.84 |          0.7 |          0.0010 |
|       5 |          90 |       00:00:17 |         0.70 |          0.5 |          0.0010 |
|       6 |         100 |       00:00:19 |         0.65 |          0.4 |          0.0010 |
|       7 |         110 |       00:00:21 |         0.73 |          0.5 |          0.0010 |
|       7 |         120 |       00:00:23 |         0.60 |          0.4 |          0.0010 |
|       8 |         130 |       00:00:24 |         0.63 |          0.4 |          0.0010 |
|       8 |         140 |       00:00:26 |         0.64 |          0.4 |          0.0010 |
|       9 |         150 |       00:00:28 |         0.57 |          0.3 |          0.0010 |
|       9 |         160 |       00:00:30 |         0.54 |          0.3 |          0.0010 |
|      10 |         170 |       00:00:32 |         0.52 |          0.3 |          0.0010 |
|      10 |         180 |       00:00:33 |         0.45 |          0.2 |          0.0010 |
|      11 |         190 |       00:00:35 |         0.55 |          0.3 |          0.0010 |
|      12 |         200 |       00:00:37 |         0.56 |          0.3 |          0.0010 |
|      12 |         210 |       00:00:39 |         0.55 |          0.3 |          0.0010 |
|      13 |         220 |       00:00:41 |         0.52 |          0.3 |          0.0010 |
|      13 |         230 |       00:00:42 |         0.53 |          0.3 |          0.0010 |
|      14 |         240 |       00:00:44 |         0.58 |          0.3 |          0.0010 |
|      14 |         250 |       00:00:46 |         0.47 |          0.2 |          0.0010 |
|      15 |         260 |       00:00:48 |         0.49 |          0.2 |          0.0010 |
|      15 |         270 |       00:00:50 |         0.44 |          0.2 |          0.0010 |
|      16 |         280 |       00:00:52 |         0.45 |          0.2 |          0.0010 |
|      17 |         290 |       00:00:54 |         0.47 |          0.2 |          0.0010 |
|      17 |         300 |       00:00:55 |         0.43 |          0.2 |          0.0010 |
|      18 |         310 |       00:00:57 |         0.44 |          0.2 |          0.0010 |
|      18 |         320 |       00:00:59 |         0.44 |          0.2 |          0.0010 |
|      19 |         330 |       00:01:01 |         0.38 |          0.1 |          0.0010 |
|      19 |         340 |       00:01:03 |         0.41 |          0.2 |          0.0010 |
|      20 |         350 |       00:01:04 |         0.39 |          0.2 |          0.0010 |
|      20 |         360 |       00:01:06 |         0.42 |          0.2 |          0.0010 |
|      21 |         370 |       00:01:08 |         0.42 |          0.2 |          0.0010 |
|      22 |         380 |       00:01:10 |         0.39 |          0.2 |          0.0010 |
|      22 |         390 |       00:01:12 |         0.37 |          0.1 |          0.0010 |
|      23 |         400 |       00:01:13 |         0.37 |          0.1 |          0.0010 |
|      23 |         410 |       00:01:15 |         0.35 |          0.1 |          0.0010 |
|      24 |         420 |       00:01:17 |         0.29 |      8.3e-02 |          0.0010 |
|      24 |         430 |       00:01:19 |         0.36 |          0.1 |          0.0010 |
|      25 |         440 |       00:01:21 |         0.28 |      7.9e-02 |          0.0010 |
|      25 |         450 |       00:01:22 |         0.29 |      8.1e-02 |          0.0010 |
|      26 |         460 |       00:01:24 |         0.28 |      8.0e-02 |          0.0010 |
|      27 |         470 |       00:01:26 |         0.27 |      7.1e-02 |          0.0010 |
|      27 |         480 |       00:01:28 |         0.25 |      6.3e-02 |          0.0010 |
|      28 |         490 |       00:01:30 |         0.24 |      5.9e-02 |          0.0010 |
|      28 |         500 |       00:01:31 |         0.29 |      8.4e-02 |          0.0010 |
|      29 |         510 |       00:01:33 |         0.35 |          0.1 |          0.0010 |
|      29 |         520 |       00:01:35 |         0.31 |      9.3e-02 |          0.0010 |
|      30 |         530 |       00:01:37 |         0.18 |      3.1e-02 |          0.0010 |
|      30 |         540 |       00:01:38 |         0.22 |      4.6e-02 |          0.0010 |
|========================================================================================|
Detector training complete.
*************************************************************************

Read a test image.

I = imread('detectcars.png');

Run the detector.

[bboxes,scores] = detect(detector,I);

Display the results.

if(~isempty(bboxes))
  I = insertObjectAnnotation(I,'rectangle',bboxes,scores);
end
figure
imshow(I)

Use training data to train an ACF-based object detector for stop signs.

Add the folder containing images to the MATLAB path.

imageDir = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata', 'stopSignImages');
addpath(imageDir);

Load the ground truth data, which contains data for stop signs and cars.

load('stopSignsAndCarsGroundTruth.mat','stopSignsAndCarsGroundTruth')

View the label definitions to see the label types in the ground truth.

stopSignsAndCarsGroundTruth.LabelDefinitions

Select the stop sign data for training.

stopSignGroundTruth = selectLabels(stopSignsAndCarsGroundTruth,'stopSign');

Create the training data for a stop sign object detector.

trainingData = objectDetectorTrainingData(stopSignGroundTruth);
summary(trainingData)
Variables:

    imageFilename: 41×1 cell array of character vectors

    stopSign: 41×1 cell

Train an ACF-based object detector.

acfDetector = trainACFObjectDetector(trainingData,'NegativeSamplesFactor',2);
ACF Object Detector Training
The training will take 4 stages. The model size is 34x31.
Sample positive examples(~100% Completed)
Compute approximation coefficients...Completed.
Compute aggregated channel features...Completed.
--------------------------------------------
Stage 1:
Sample negative examples(~100% Completed)
Compute aggregated channel features...Completed.
Train classifier with 42 positive examples and 84 negative examples...Completed.
The trained classifier has 19 weak learners.
--------------------------------------------
Stage 2:
Sample negative examples(~100% Completed)
Found 84 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 42 positive examples and 84 negative examples...Completed.
The trained classifier has 20 weak learners.
--------------------------------------------
Stage 3:
Sample negative examples(~100% Completed)
Found 84 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 42 positive examples and 84 negative examples...Completed.
The trained classifier has 54 weak learners.
--------------------------------------------
Stage 4:
Sample negative examples(~100% Completed)
Found 84 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 42 positive examples and 84 negative examples...Completed.
The trained classifier has 61 weak learners.
--------------------------------------------
ACF object detector training is completed. Elapsed time is 30.3579 seconds.

Test the ACF-based detector on a sample image.

I = imread('stopSignTest.jpg');
bboxes = detect(acfDetector,I);

Display the detected object.

annotation = acfDetector.ModelName;
I = insertObjectAnnotation(I,'rectangle',bboxes,annotation);

figure 
imshow(I)

Remove the image folder from the path.

rmpath(imageDir); 

Use training data to train an ACF-based object detector for vehicles.

Add the folder containing the images to the MATLAB path.

imageDir = fullfile(matlabroot,'toolbox','driving','drivingdata','vehiclesSequence');
addpath(imageDir);

Load the ground truth data.

load vehicleGroundTruth.mat

Create the training data for an object detector for vehicles.

trainingData = objectDetectorTrainingData(gTruth,'SamplingFactor',2);

Train the ACF-based object detector.

acfDetector = trainACFObjectDetector(trainingData,'ObjectTrainingSize',[20 20]);
ACF Object Detector Training
The training will take 4 stages. The model size is 20x20.
Sample positive examples(~100% Completed)
Compute approximation coefficients...Completed.
Compute aggregated channel features...Completed.
--------------------------------------------
Stage 1:
Sample negative examples(~100% Completed)
Compute aggregated channel features...Completed.
Train classifier with 71 positive examples and 355 negative examples...Completed.
The trained classifier has 68 weak learners.
--------------------------------------------
Stage 2:
Sample negative examples(~100% Completed)
Found 76 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 71 positive examples and 355 negative examples...Completed.
The trained classifier has 120 weak learners.
--------------------------------------------
Stage 3:
Sample negative examples(~100% Completed)
Found 54 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 71 positive examples and 355 negative examples...Completed.
The trained classifier has 170 weak learners.
--------------------------------------------
Stage 4:
Sample negative examples(~100% Completed)
Found 63 new negative examples for training.
Compute aggregated channel features...Completed.
Train classifier with 71 positive examples and 355 negative examples...Completed.
The trained classifier has 215 weak learners.
--------------------------------------------
ACF object detector training is completed. Elapsed time is 28.4547 seconds.

Test the ACF detector on a test image.

I = imread('highway.png');
[bboxes, scores] = detect(acfDetector,I,'Threshold',1);

Select the detection with the highest classification score.

[~,idx] = max(scores);

Display the detected object.

annotation = acfDetector.ModelName;
I = insertObjectAnnotation(I,'rectangle',bboxes(idx,:),annotation);

figure 
imshow(I)

Remove the image folder from the path.

rmpath(imageDir);

Input Arguments

gTruth

Ground truth data, specified as a scalar groundTruth object or an array of groundTruth objects. You can create ground truth objects from existing ground truth data by using the groundTruth object.

If you use custom data sources in groundTruth with parallel computing enabled, then the reader function must be able to read images from the data source on a pool of MATLAB workers in parallel.
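A hedged sketch of such a reader, assuming the custom data source follows the groundTruthDataSource custom-reader signature (source name plus current timestamp); the name readerFcn is illustrative:

function I = readerFcn(sourceName,currentTimestamp)
% Custom reader invoked once per image. With parallel computing
% enabled it runs on multiple MATLAB workers, so it must be
% self-contained and must not rely on shared or persistent state.
    I = imread(sourceName);   % placeholder: a real reader would use
                              % currentTimestamp to locate the frame
end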

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'SamplingFactor',5

'SamplingFactor'

Factor for subsampling images in the ground truth data source, specified as 'auto', an integer, or a vector of integers. For a sampling factor of N, the returned training data includes every Nth image in the ground truth data source. The function ignores ground truth images with empty label data.

  • 'auto': The sampling factor N is 5 for data sources with timestamps, and 1 for a collection of images.

  • Integer: All ground truth data sources in gTruth are sampled with the same sampling factor N.

  • Vector of integers: The kth ground truth data source in gTruth is sampled with a sampling factor of N(k).
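For example, a sketch with a vector of sampling factors, assuming gTruthArray is a 1-by-2 array of groundTruth objects:

% Keep every 2nd image from the first source and every 5th from the second.
trainingData = objectDetectorTrainingData(gTruthArray,'SamplingFactor',[2 5]);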

'WriteLocation'

Folder name to write extracted images to, specified as a string scalar or character vector. The specified folder must exist and have write permissions.

This argument applies only when you create the groundTruth objects from a video file, a custom data source, or an array of imageDatastore objects with different custom read functions.

The function ignores this argument when:

  • The input groundTruth object was created from an image sequence data source.

  • The array of input groundTruth objects all contain image datastores using the same custom read function.

  • Any of the input groundTruth objects that contain datastores use the default read functions.

'ImageFormat'

Image file format, specified as a string scalar or character vector. File formats must be supported by imwrite (the sketch after this list shows how to check which formats your installation can write).

This argument applies only when you create the groundTruth objects from a video file, a custom data source, or an array of imageDatastore objects with different custom read functions.

The function ignores this argument when:

  • The input groundTruth object was created from an image sequence data source.

  • The array of input groundTruth objects all contain image datastores using the same custom read function.

  • Any of the input groundTruth objects that contain datastores use the default read functions.
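A small sketch for checking write support: imformats lists the formats registered on an installation, and its write field is empty for read-only formats.

fmts = imformats;                            % registered image file formats
writable = ~cellfun(@isempty,{fmts.write});  % true where a writer exists
{fmts(writable).ext}                         % extensions with write support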

'NamePrefix'

Prefix for output image file names, specified as a string scalar or character vector. The image files are named as:

<name_prefix><source_number>_<image_number>.<image_format>

The default value uses the name of the data source that the images were extracted from, strcat(sourceName,'_'), for a video or custom data source, or 'datastore' for an image datastore.

This argument applies only when you create the groundTruth objects from a video file, a custom data source, or an array of imageDatastore objects with different custom read functions.

The function ignores this argument when:

  • The input groundTruth object was created from an image sequence data source.

  • The array of input groundTruth objects all contain image datastores using the same custom read function.

  • Any of the input groundTruth objects that contain datastores use the default read functions.

'Verbose'

Flag to display training progress at the MATLAB command line, specified as true or false. This argument applies only to groundTruth objects created using a video file or a custom data source.

Output Arguments

imds

Image datastore, returned as an imageDatastore object containing images extracted from the gTruth objects. The images in imds contain at least one class of annotated labels; the function ignores images that are not annotated.

blds

Box label datastore, returned as a boxLabelDatastore object. The datastore contains categorical vectors of ROI label names and M-by-4 matrices of M bounding boxes. The locations and sizes of the bounding boxes are represented as double M-by-4 matrices in the format [x,y,width,height], where [x,y] specifies the upper-left corner.
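A small sketch of reading from the box label datastore; each read returns a two-column cell array of boxes and labels:

data = read(blds);
bboxes = data{1};   % M-by-4 double, [x y width height] per row
labels = data{2};   % M-by-1 categorical ROI label names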

trainingDataTable

Training data table, returned as a table with two or more columns. The first column contains image file names with paths. The images can be grayscale or truecolor (RGB) and in any format supported by imread. Each remaining column corresponds to one ROI label and contains the locations of the bounding boxes for that label in the image specified in the first column. The bounding boxes are specified as M-by-4 matrices of M bounding boxes in the format [x,y,width,height], where [x,y] specifies the upper-left corner. To create a ground truth table, you can use the Image Labeler app or the Video Labeler app.

The output table ignores any sublabel or attribute data present in the input gTruth object.
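As an illustration of this layout, a small sketch that draws the boxes for the first row of the table; the annotation text 'object' is a placeholder, and the variable name imageFilename matches the summary shown in the examples above:

I = imread(trainingData.imageFilename{1});  % first column: image file with path
bboxes = trainingData{1,2}{1};              % second column: M-by-4 [x y w h] boxes
I = insertObjectAnnotation(I,'rectangle',bboxes,'object');
figure
imshow(I)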

Introduced in R2017a