Clean Outlier Data

Find, fill, or remove outliers in the Live Editor

Description

The Clean Outlier Data task lets you interactively handle outliers in data. The task automatically generates MATLAB® code for your live script.

Using this task, you can:

  • Find, fill, or remove outliers from data in a workspace variable.

  • Customize the methods for finding and filling outliers.

  • Automatically visualize the outlier data and cleaned data.
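As a rough command-line counterpart to what the task does, the following minimal sketch uses the filloutliers function. The sample vector and the variable names cleanedData and outlierIndices are illustrative assumptions, and the code the task actually generates may differ.

```matlab
% Illustrative sample data (assumed values) with two obvious outliers
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Fill outliers by linear interpolation; detect them with the median method
[cleanedData, outlierIndices] = filloutliers(A, "linear", "median");

% Visualize the input data, the detected outliers, and the cleaned data
plot(A, "DisplayName", "Input data")
hold on
plot(find(outlierIndices), A(outlierIndices), "x", "DisplayName", "Outliers")
plot(cleanedData, "DisplayName", "Cleaned data")
hold off
legend
```

To remove outliers instead of filling them, rmoutliers(A, "median") is the analogous call.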

[Figure: Clean Outlier Data task in Live Editor]

Open the Task

To add the Clean Outlier Data task to a live script in the MATLAB Editor:

  • On the Live Editor tab, select Task > Clean Outlier Data.

  • In a code block in the script, type a relevant keyword, such as outlier or clean. Select Clean Outlier Data from the suggested command completions.

Parameters

Specify the method for filling outliers using one of the following options.

Fill Method | Description
Linear interpolation | Linear interpolation of neighboring, nonoutlier values.
Constant value | Specified scalar value, which is 0 by default.
Center value | Center value determined by the find method.
Clip to threshold value | Fills with the lower threshold value for elements smaller than the lower threshold determined by the find method. Fills with the upper threshold value for elements larger than the upper threshold determined by the find method.
Previous value | Previous nonoutlier value.
Next value | Next nonoutlier value.
Nearest value | Nearest nonoutlier value.
Spline interpolation | Piecewise cubic spline interpolation.
Shape-preserving cubic interpolation (PCHIP) | Shape-preserving piecewise cubic spline interpolation.
Modified Akima cubic interpolation | Modified Akima cubic Hermite interpolation.
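The fill methods above can be tried at the command line with the filloutliers function. This is a minimal sketch: the sample data is illustrative, and the pairing of each task option with a filloutliers option name is assumed here rather than taken from the task's generated code. Each call uses the default find method (median).

```matlab
% Illustrative sample data (assumed values); elements 4 and 9 are outliers
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

filledLinear   = filloutliers(A, "linear");    % Linear interpolation
filledConstant = filloutliers(A, 0);           % Constant value (0 here)
filledCenter   = filloutliers(A, "center");    % Center value from the find method
filledClip     = filloutliers(A, "clip");      % Clip to threshold value
filledPrevious = filloutliers(A, "previous");  % Previous nonoutlier value
filledNext     = filloutliers(A, "next");      % Next nonoutlier value
filledNearest  = filloutliers(A, "nearest");   % Nearest nonoutlier value
filledSpline   = filloutliers(A, "spline");    % Spline interpolation
filledPchip    = filloutliers(A, "pchip");     % Shape-preserving cubic (PCHIP)
filledMakima   = filloutliers(A, "makima");    % Modified Akima cubic
```

Comparing elements 4 and 9 across the results shows how each method replaces the same two outliers.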

Specify the detection method for finding outliers using one of the following options.

Method | Description
Median | Outliers are defined as elements more than the specified threshold number of scaled median absolute deviations (MAD) from the median. The default threshold is 3. For input data A, the scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)), which is approximately 1.4826.
Mean | Outliers are defined as elements more than the specified threshold number of standard deviations from the mean. The default threshold is 3. This method is faster but less robust than Median.
Quartiles | Outliers are defined as elements more than the specified threshold number of interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). The default threshold is 1.5. This method is useful when the input data is not normally distributed.
Grubbs | Outliers are detected using Grubbs’s test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the input data is normally distributed.
Generalized extreme studentized deviate (GESD) | Outliers are detected using the generalized extreme studentized deviate test for outliers. This iterative method is similar to Grubbs, but can perform better when multiple outliers are masking each other.
Moving median | Outliers are defined as elements more than the specified threshold number of local scaled MAD from the local median over a specified window. The default threshold is 3.
Moving mean | Outliers are defined as elements more than the specified threshold number of local standard deviations from the local mean over a specified window. The default threshold is 3.
Percentiles | Outliers are defined as elements outside of the percentile range specified by an upper and lower threshold. The default lower percentile threshold is 10 and the default upper percentile threshold is 90. Valid threshold values are in the interval [0,100].
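To make the Median definition concrete, the sketch below computes the scaled MAD by hand and compares the result with the isoutlier function, then calls the other non-windowed detection methods. The sample data and variable names are illustrative assumptions, and the pairing of task options with isoutlier option names is assumed.

```matlab
% Illustrative sample data (assumed values)
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Scaled MAD, following the formula in the Median row
c = -1/(sqrt(2)*erfcinv(3/2));               % approximately 1.4826
scaledMAD = c*median(abs(A - median(A)));

% Manual median-method detection with the default threshold of 3
manualTF = abs(A - median(A)) > 3*scaledMAD;

% Built-in detection methods
tfMedian      = isoutlier(A, "median");
tfMean        = isoutlier(A, "mean");
tfQuartiles   = isoutlier(A, "quartiles");
tfGrubbs      = isoutlier(A, "grubbs");
tfGesd        = isoutlier(A, "gesd");
tfPercentiles = isoutlier(A, "percentiles", [10 90]);

% A non-default threshold of 5 scaled MADs for the median method
tfMedian5 = isoutlier(A, "median", "ThresholdFactor", 5);
```

On this data the median method flags both 100 and 300, while the mean method flags only 300, because the outliers themselves inflate the standard deviation. This illustrates why Mean is described as less robust.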

Specify the window type and size when the method for detecting outliers is Moving median or Moving mean.

Window | Description
Centered | Window of a specified length centered about the current point.
Asymmetric | Window defined by a specified number of elements before the current point and a specified number of elements after the current point.

Window sizes are relative to the X-axis variable units.
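The following sketch shows how a window expressed in X-axis units might look at the command line, using the SamplePoints option of isoutlier. The sine data, the hourly time vector, and the window lengths are illustrative assumptions.

```matlab
% Sine wave with one corrupted sample (illustrative data)
x = -2*pi:0.1:2*pi;
A = sin(x);
A(47) = 0;

% Hourly time stamps that serve as the X-axis variable
t = datetime(2017,1,1,0,0,0) + hours(0:length(x)-1);

% Centered window: 5 hours centered about the current point
tfCentered = isoutlier(A, "movmedian", hours(5), "SamplePoints", t);

% Asymmetric window: 3 hours before and 2 hours after the current point
tfAsymmetric = isoutlier(A, "movmean", [hours(3) hours(2)], "SamplePoints", t);
```

Because t is in hours, the window lengths are given as durations rather than as element counts.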

Introduced in R2019b