evaluateSemanticSegmentation

Evaluate semantic segmentation data set against ground truth

Description


ssm = evaluateSemanticSegmentation(dsResults,dsTruth) computes various metrics to evaluate the quality of the semantic segmentation results, dsResults, against the ground truth segmentation, dsTruth.

ssm = evaluateSemanticSegmentation(dsResults,dsTruth,Name,Value) computes semantic segmentation metrics using one or more Name,Value pair arguments to control the evaluation.

Examples


Evaluate the results of semantic segmentation by computing a confusion matrix and metrics for each class, each image, and the entire data set.

Perform Semantic Segmentation

Label each pixel in a series of images either as an object or as the background. This example uses the triangleImages data set, which has 100 test images of triangles with ground truth labels.

Define the location of the data set, test images, and ground truth labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
testImagesDir = fullfile(dataSetDir,'testImages');
testLabelsDir = fullfile(dataSetDir,'testLabels');

Create an image datastore holding the test images.

imds = imageDatastore(testImagesDir);

Define the class names and their associated label IDs.

classNames = ["triangle","background"];
labelIDs = [255 0];

Create a pixel label datastore holding the ground truth pixel labels for the test images.

pxdsTruth = pixelLabelDatastore(testLabelsDir,classNames,labelIDs);

Load a semantic segmentation network that has been trained on the training images of noisy shapes.

net = load('triangleSegmentationNetwork');
net = net.net;

Run the network on the test images. Predicted labels are written to disk in a temporary folder and returned as a pixelLabelDatastore.

pxdsResults = semanticseg(imds,net,"WriteLocation",tempdir);
Running semantic segmentation network
-------------------------------------
* Processed 100 images.

Compute Confusion Matrix and Segmentation Metrics

Evaluate the prediction results against the ground truth. By default, evaluateSemanticSegmentation computes all available metrics, including the confusion matrix, normalized confusion matrix, data set metrics, class metrics, and image metrics.

metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth)
Evaluating semantic segmentation results
----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processed 100 images.
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.90624          0.95085       0.61588      0.87529        0.40652  
metrics = 
  semanticSegmentationMetrics with properties:

              ConfusionMatrix: [2x2 table]
    NormalizedConfusionMatrix: [2x2 table]
               DataSetMetrics: [1x5 table]
                 ClassMetrics: [2x3 table]
                 ImageMetrics: [100x5 table]

To explore the results, display the classification accuracy, the intersection over union, and the boundary F1 score for each class. These values are stored in the ClassMetrics property. Also, display the normalized confusion matrix.

metrics.ClassMetrics
ans=2×3 table
                  Accuracy      IoU      MeanBFScore
                  ________    _______    ___________

    triangle            1     0.33005     0.028664  
    background     0.9017      0.9017      0.78438  

metrics.NormalizedConfusionMatrix
ans=2×2 table
                  triangle    background
                  ________    __________

    triangle            1            0  
    background     0.0983       0.9017  
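
Per-image scores are stored in the ImageMetrics property, one row per test image. As a further check, the following sketch (which assumes the rows of ImageMetrics follow the read order of imds) lists the five test images with the lowest mean IoU:

imageIoU = metrics.ImageMetrics.MeanIoU;
[~,worstToBest] = sort(imageIoU);          % ascending: worst images first
worstFiles = imds.Files(worstToBest(1:5))  % five lowest-scoring test images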

Input Arguments


Predicted pixel labels resulting from semantic segmentation, specified as a datastore or a cell array of datastore objects. dsResults can be any datastore that returns categorical images, such as PixelLabelDatastore or pixelLabelImageDatastore. Calling read(dsResults) must return a categorical array, a cell array, or a table. If the read function returns a multicolumn cell array or table, the second column must contain categorical arrays.

Ground truth pixel labels, specified as a datastore or a cell array of datastore objects. dsTruth can be any datastore that returns categorical images, such as PixelLabelDatastore or pixelLabelImageDatastore. Calling read(dsTruth) must return a categorical array, a cell array, or a table. If the read function returns a multicolumn cell array or table, the second column must contain categorical arrays.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTruth,'Metrics',"bfscore") computes only the mean BF score of each class, each image, and the entire data set.

Segmentation metrics in semanticSegmentationMetrics to compute, specified as the comma-separated pair consisting of 'Metrics' and a string vector. This argument specifies which variables in the DataSetMetrics, ClassMetrics, and ImageMetrics tables to compute. ConfusionMatrix and NormalizedConfusionMatrix are computed regardless of the value of 'Metrics'.

For each value, the corresponding aggregate data set metric, image metric, and class metric are listed after the description.

"all"

Evaluate all semantic segmentation metrics.

Aggregate data set metrics: all. Image metrics: all. Class metrics: all.

"accuracy"

Accuracy indicates the percentage of correctly identified pixels for each class. Use the accuracy metric if you want to know how well each class correctly identifies pixels.

  • For each class, Accuracy is the ratio of correctly classified pixels to the total number of pixels in that class, according to the ground truth. In other words,

    Accuracy score = TP / (TP + FN)

    TP is the number of true positives and FN is the number of false negatives.

  • For the aggregate data set, MeanAccuracy is the average Accuracy of all classes in all images.

  • For each image, MeanAccuracy is the average Accuracy of all classes in that particular image.

The class accuracy is a simple metric analogous to global accuracy, but it can be misleading. For example, labeling all pixels "car" gives a perfect score for the "car" class (although not for the other classes). Use class accuracy in conjunction with IoU for a more complete evaluation of segmentation results.

Aggregate data set metric: MeanAccuracy. Image metric: MeanAccuracy. Class metric: Accuracy.

"bfscore"

The boundary F1 (BF) contour matching score indicates how well the predicted boundary of each class aligns with the true boundary. Use the BF score if you want a metric that tends to correlate better with human qualitative assessment than the IoU metric.

  • For each class, MeanBFScore is the average BF score of that class over all images.

  • For each image, MeanBFScore is the average BF score of all classes in that particular image.

  • For the aggregate data set, MeanBFScore is the average BF score of all classes in all images.

For more information, see bfscore.

Aggregate data set metric: MeanBFScore. Image metric: MeanBFScore. Class metric: MeanBFScore.

"global-accuracy"

GlobalAccuracy is the ratio of correctly classified pixels, regardless of class, to the total number of pixels. Use the global accuracy metric if you want a quick and computationally inexpensive estimate of the percentage of correctly classified pixels.

Aggregate data set metric: GlobalAccuracy. Image metric: GlobalAccuracy. Class metric: none.

"iou"

Intersection over union (IoU), also known as the Jaccard similarity coefficient, is the most commonly used metric. Use the IoU metric if you want a statistical accuracy measurement that penalizes false positives.

  • For each class, IoU is the ratio of correctly classified pixels to the total number of ground truth and predicted pixels in that class. In other words,

    IoU score = TP / (TP + FP + FN)

TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. (A worked confusion-matrix sketch follows this table.)

  • For each image, MeanIoU is the average IoU score of all classes in that particular image.

  • For the aggregate data set, MeanIoU is the average IoU score of all classes in all images.

For more information, see jaccard.

Aggregate data set metric: MeanIoU. Image metric: MeanIoU. Class metric: IoU.

"weighted-iou"Average IoU of each class, weighted by the number of pixels in that class. Use this metric if images have disproportionally sized classes, to reduce the impact of errors in the small classes on the aggregate quality score.WeightedIoUWeightedIoUnone

Example: metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTruth,'Metrics',["global-accuracy","iou"]) calculates the global accuracy and IoU metrics across the data set, images, and classes.

Data Types: string

Flag to display evaluation progress information in the command window, specified as the comma-separated pair consisting of 'Verbose' and either 1 (true) or 0 (false).

The displayed information includes a progress bar, elapsed time, estimated time remaining, and data set metrics.

Example: metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTruth,'Verbose',0) calculates segmentation metrics without displaying progress information.

Data Types: logical

Output Arguments


Semantic segmentation metrics, returned as a semanticSegmentationMetrics object.
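
For example, the aggregate scores printed during evaluation are stored in the DataSetMetrics property as a one-row table, so individual values can be read off directly (a minimal sketch, assuming the default metrics were computed):

dsMetrics = metrics.DataSetMetrics;  % 1-by-5 table of aggregate scores
dsMetrics.MeanIoU                    % mean IoU over all classes and images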

References

[1] Csurka, G., D. Larlus, and F. Perronnin. "What is a good evaluation measure for semantic segmentation?" Proceedings of the British Machine Vision Conference, 2013, pp. 32.1–32.11.


Introduced in R2017b