Evaluate average orientation similarity metric for object detection

metrics = evaluateDetectionAOS(detectionResults,groundTruthData) computes the average orientation similarity (AOS) metric, which measures the detection results detectionResults against the ground truth data groundTruthData. The AOS is a metric for measuring detector performance on rotated rectangle detections.

metrics = evaluateDetectionAOS(detectionResults,groundTruthData,threshold) additionally specifies the overlap threshold for assigning a detection to a ground truth bounding box.
Define ground truth bounding boxes for a vehicle class. Each row defines a rotated bounding box of the form [xcenter, ycenter, width, height, yaw].
gtbbox = [
2 2 10 20 45
80 80 30 40 15
];
gtlabel = "vehicle";
Create a table to hold the ground truth data.
groundTruthData = table({gtbbox},'VariableNames',gtlabel)
groundTruthData=table
vehicle
____________
{2x5 double}
Define detection results for rotated bounding boxes, scores, and labels.
bbox = [
4 4 10 20 20
50 50 30 10 30
90 90 40 50 10
];
scores = [0.9 0.7 0.8]';
labels = [
"vehicle"
"vehicle"
"vehicle"
];
labels = categorical(labels,"vehicle");
Create a table to hold the detection results.
detectionResults = table({bbox},{scores},{labels},'VariableNames',{'Boxes','Scores','Labels'})
detectionResults=1×3 table
Boxes Scores Labels
____________ ____________ _________________
{3x5 double} {3x1 double} {3x1 categorical}
Evaluate the detection results against ground truth by calculating the AOS metric.
metrics = evaluateDetectionAOS(detectionResults,groundTruthData)
metrics=1×5 table
AOS AP OrientationSimilarity Precision Recall
______ _______ _____________________ ____________ ____________
vehicle 0.5199 0.54545 {4x1 double} {4x1 double} {4x1 double}
detectionResults — Detection results
Detection results, specified as a three-column table. The columns contain bounding boxes, scores, and labels. The bounding boxes can be axis-aligned rectangles or rotated rectangles.
Bounding Box | Format | Description |
---|---|---|
Axis-aligned rectangle | [xmin, ymin, width, height] | This type of bounding box is defined in pixel coordinates as an M-by-4 matrix representing M bounding boxes |
Rotated rectangle | [xcenter, ycenter, width, height, yaw] | This type of bounding box is defined in spatial coordinates as an M-by-5 matrix representing M bounding boxes. The xcenter and ycenter coordinates represent the center of the bounding box. The width and height elements represent the length of the box along the x and y axes, respectively. The yaw represents the rotation angle in degrees. The amount of rotation about the center of the bounding box is measured in the clockwise direction. |
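The two box formats relate through a simple center shift. The sketch below (illustrative variable names, not a toolbox function) converts an axis-aligned box into the equivalent rotated-rectangle form with zero yaw:

```matlab
% Convert an axis-aligned box [xmin ymin width height] into the
% rotated-rectangle form [xcenter ycenter width height yaw].
aabb = [10 20 30 40];                 % [xmin ymin width height]
rot = [aabb(1)+aabb(3)/2, ...         % xcenter
       aabb(2)+aabb(4)/2, ...        % ycenter
       aabb(3), aabb(4), 0];         % width, height, yaw = 0 degrees
```

A zero yaw angle makes the rotated rectangle coincide with the axis-aligned box it was derived from.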
groundTruthData — Labeled ground truth images
Labeled ground truth images, specified as a datastore or a table.
If you use a datastore, your data must be set up so that
calling the datastore with the read
and readall
functions returns a cell array or table with two or three columns. When
the output contains two columns, the first column must contain bounding boxes, and the second
column must contain labels, {boxes,labels}. When the
output contains three columns, the second column must contain the bounding boxes, and the third
column must contain the labels. In this case, the first column can contain any type of data. For
example, the first column can contain images or point cloud data.
data | boxes | labels |
---|---|---|
The first column can contain data, such as point cloud data or images. | The second column must be a cell array that contains M-by-5 matrices of bounding boxes of the form [xcenter, ycenter, width, height, yaw]. The vectors represent the location and size of bounding boxes for the objects in each image. | The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories. |
For more information, see Datastores for Deep Learning (Deep Learning Toolbox).
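As a sketch of the two-column {boxes,labels} layout described above, a boxLabelDatastore built from a table of rotated boxes returns data in that shape (gtbbox here reuses the ground truth matrix from the example; the setup is illustrative):

```matlab
% Build a datastore whose read output is a 1-by-2 cell: {boxes, labels}.
gtbbox = [
2 2 10 20 45
80 80 30 40 15
];
gtTable = table({gtbbox},'VariableNames',"vehicle");
blds = boxLabelDatastore(gtTable);
data = read(blds);   % {M-by-5 double boxes, M-by-1 categorical labels}
```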
If you use a table, the table must have two or more columns.
data | boxes | ... |
---|---|---|
The first column can contain data, such as point cloud data or images. | Each of the remaining columns must be a cell vector that contains M-by-5 matrices representing rotated rectangle bounding boxes. Each rotated rectangle must be of the form[xcenter, ycenter, width, height, yaw]. The vectors represent the location and size of bounding boxes for the objects in each image. |
threshold — Overlap threshold
Overlap threshold, specified as a nonnegative scalar. The overlap ratio is defined as the intersection over union (IoU) of a detected bounding box and a ground truth bounding box.
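For example, to require a stricter overlap than the default when assigning detections to ground truth boxes, pass the threshold as the third argument (reusing the tables from the example above):

```matlab
% Require at least 0.7 IoU between a detection and a ground truth box.
metrics = evaluateDetectionAOS(detectionResults,groundTruthData,0.7);
```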
metrics — AOS metrics
AOS metrics, returned as a five-column table. Each row in the table contains the evaluation metrics for a class defined in the ground truth data contained in the groundTruthData input. To get the object class names, query the row names of the metrics table:

metrics.Properties.RowNames

This table describes the columns of the metrics table.
Column | Description |
---|---|
AOS | Average orientation similarity value |
AP | Average precision over all the detection results, returned as a numeric scalar. Precision is a ratio of true positive instances to all positive instances of objects in the detector, based on the ground truth. |
OrientationSimilarity | Orientation similarity values for each detection, returned as an M-element numeric column vector. M is one more than the number of detections assigned to a class. Orientation similarity is a normalized variant of the cosine similarity that measures the similarity between the predicted rotation angle and the ground truth rotation angle. |
Precision | Precision values from each detection, returned as an M-element numeric column vector. M is one more than the number of detections assigned to a class. For example, if your detection results contain 4 detections with class label 'car', then Precision contains 5 elements. Precision is the ratio of true positive instances to all positive instances of objects in the detector, based on the ground truth. |
Recall | Recall values from each detection, returned as an M-element numeric column vector. M is one more than the number of detections assigned to a class. For example, if your detection results contain 4 detections with class label 'car', then Recall contains 5 elements. Recall is the ratio of true positive instances to the sum of true positives and false negatives in the detector, based on the ground truth. |
[1] Geiger, A., P. Lenz, and R. Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012.