Data preprocessing is the second stage of the workflow for predictive maintenance algorithm development.
Data preprocessing is often necessary to clean the data and convert it into a form from which you can extract condition indicators. Data preprocessing can include:
Outlier and missing-value removal, offset removal, and detrending.
Noise reduction, such as filtering or smoothing.
Transformations between time and frequency domain.
More advanced signal processing such as short-time Fourier transforms and transformations to the order domain.
You can perform data preprocessing on arrays or tables of measured or simulated data that you manage with Predictive Maintenance Toolbox™ ensemble datastores, as described in Data Ensembles for Condition Monitoring and Predictive Maintenance. Generally, you preprocess your data before analyzing it to identify a promising condition indicator, a quantity that changes in a predictable way as system performance degrades. (See Condition Indicators for Monitoring, Fault Detection, and Prediction.) There can be some overlap between the steps of preprocessing and identifying condition indicators. Typically, though, preprocessing results in a cleaned or transformed signal, on which you perform further analysis to condense the signal information into a condition indicator.
Understanding your machine and the kind of data you have can help determine what preprocessing methods to use. For example, if you are filtering noisy vibration data, knowing what frequency range is most likely to display useful features can help you choose preprocessing techniques. Similarly, it might be useful to transform gearbox vibration data to the order domain, which is used for rotating machines when the rotational speed changes over time. However, that same preprocessing would not be useful for vibration data from a car chassis, which is a rigid body.
MATLAB® includes many functions that are useful for basic preprocessing of data in arrays or tables. These include functions for:
Data cleaning, such as fillmissing and filloutliers. Data cleaning uses various techniques for finding, removing, and replacing bad or missing data.
Smoothing data, such as smoothdata and movmean. Use smoothing to eliminate unwanted noise or high variance in data.
Detrending data, such as detrend. Removing a trend from the data lets you focus your analysis on the fluctuations in the data about the trend. While some trends can be meaningful, others are due to systematic effects, and some types of analyses yield better insight once you remove them. Removing offsets is another, similar type of preprocessing.
Scaling or normalizing data, such as rescale. Scaling changes the bounds of the data, and can be useful, for example, when you are working with data in different units.
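As a rough illustration of how these basic functions can be chained, the following sketch cleans, detrends, smooths, and rescales a synthetic signal. The signal, sample rate, injected gap and spike, fill methods, and window length are all assumptions made up for this example; suitable methods and parameters depend on your data.

```matlab
% Synthetic data (assumed for illustration): 5 Hz sine + linear trend + noise
rng(0);                              % fix the random seed for repeatability
fs = 100;                            % assumed sample rate, Hz
N  = 10*fs;                          % 10 seconds of samples
t  = (0:N-1)'/fs;                    % time vector, s
x  = sin(2*pi*5*t) + 0.5*t + 0.1*randn(N,1);

x(200:204) = NaN;                    % simulate dropped samples
x(500)     = 25;                     % simulate a spike

xFilled  = fillmissing(x,'linear');           % interpolate across the NaN gap
xClean   = filloutliers(xFilled,'linear');    % replace detected outliers by interpolation
xDetrend = detrend(xClean);                   % remove the best-fit straight line
xSmooth  = smoothdata(xDetrend,'movmean',5);  % 5-sample moving average
xScaled  = rescale(xSmooth);                  % map the result onto the interval [0,1]
```

The order of the steps here is one reasonable choice, not a rule. For example, you might detrend before detecting outliers when a strong trend would otherwise mask them.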
Another common type of preprocessing is to extract a useful portion of the signal and discard other portions. For instance, you might discard the first five seconds of a signal that is part of some start-up transient, and retain only the data from steady-state operation. For an example that performs this kind of preprocessing, see Using Simulink to Generate Fault Data.
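A minimal sketch of this kind of trimming, assuming a uniformly sampled signal stored in a vector, might look like the following. The sample rate, the synthetic signal, and the variable names are hypothetical.

```matlab
fs = 1000;                            % assumed sample rate, Hz
N  = 20*fs;                           % 20 seconds of samples
t  = (0:N-1)'/fs;                     % time vector, s
x  = (1 - exp(-t)).*sin(2*pi*50*t);   % hypothetical signal with a start-up transient

tStart = 5;                           % discard the first five seconds
nSkip  = round(tStart*fs);            % number of samples to drop

tSteady = t(nSkip+1:end);             % time stamps of the retained portion
xSteady = x(nSkip+1:end);             % retained steady-state data
```

Indexing by sample count avoids relying on exact floating-point comparisons against the time vector.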
For more information on basic preprocessing commands in MATLAB, see Preprocessing Data.
Filtering is another way to remove noise or unwanted components from a signal. Filtering is helpful when you know what frequency range in the data is most likely to display useful features for condition monitoring or prediction. The basic MATLAB function filter lets you filter a signal with a transfer function. You can use designfilt to generate filters for use with filter, such as bandpass, high-pass, and low-pass filters, and other common filter forms. For more information about using these functions, see Digital and Analog Filters.
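For instance, the following sketch designs a lowpass filter with designfilt (which requires Signal Processing Toolbox) and applies it with filter. The test signal, filter order, passband edge, and ripple are assumptions chosen only to make the effect visible.

```matlab
fs = 1000;                                   % assumed sample rate, Hz
N  = 2*fs;                                   % 2 seconds of samples
t  = (0:N-1)'/fs;                            % time vector, s

% Hypothetical signal: a 10 Hz component of interest plus 200 Hz interference
x = sin(2*pi*10*t) + 0.5*sin(2*pi*200*t);

% Lowpass IIR filter with an assumed 50 Hz passband edge
lpFilt = designfilt('lowpassiir','FilterOrder',8, ...
    'PassbandFrequency',50,'PassbandRipple',0.2, ...
    'SampleRate',fs);

y = filter(lpFilt,x);                        % apply the filter to the signal

% RMS level over the last second, after the filter transient has decayed
rmsSteady = sqrt(mean(y(end-fs+1:end).^2));
```

Because filter introduces a phase delay, you might prefer filtfilt when the timing of features in the signal matters.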
If you have a Wavelet Toolbox™ license, you can use wavelet tools for more complex filter approaches. For instance, you can divide your data into subbands, process the data in each subband separately, and recombine them to construct a modified version of the original signal. For more information about such filters, see Filter Banks (Wavelet Toolbox). You can also use the Signal Processing Toolbox™ function emd to separate a mixed signal into components with different time-frequency behavior.
Predictive Maintenance Toolbox and Signal Processing Toolbox provide functions that let you study and characterize vibrations in mechanical systems in the time domain. Use these functions for preprocessing or extraction of condition indicators. For example:
tsa: Remove noise coherently with time-synchronous averaging and analyze wear using envelope spectra. The example Using Simulink to Generate Fault Data uses time-synchronous averaging to preprocess vibration data.
tsadifference: Remove the regular signal, the first-order sidebands and other specific sidebands with their harmonics from a time-synchronous averaged (TSA) signal.
tsaregular: Isolate the known signal from a TSA signal by removing the residual signal and specific sidebands.
tsaresidual: Isolate the residual signal from a TSA signal by removing the known signal components and their harmonics.
ordertrack: Use order analysis to analyze and visualize spectral content occurring in rotating machinery. Track and extract orders and their time-domain waveforms.
rpmtrack: Track and extract the RPM profile from a vibration signal by computing the RPM as a function of time.
envspectrum: Compute an envelope spectrum. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.
For more information on these and related functions, see Vibration Analysis.
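As a small sketch of time-synchronous averaging, the following code averages a noisy periodic signal over shaft rotations using tsa. The shaft rate, the signal, and the evenly spaced tachometer pulse times are all assumptions for illustration; in practice the pulse times come from a tachometer or similar measurement.

```matlab
rng(0);                               % fix the random seed for repeatability
fs = 1000;                            % assumed sample rate, Hz
N  = 5*fs;                            % 5 seconds of samples
t  = (0:N-1)/fs;                      % time vector, s
shaftRate = 10;                       % assumed constant shaft speed, rotations per second

% Hypothetical vibration: shaft-synchronous components buried in noise
x = sin(2*pi*shaftRate*t) + 0.5*sin(2*pi*3*shaftRate*t) + 0.5*randn(1,N);

% Assumed tachometer pulse times, one pulse per rotation
tPulse = (0:49)/shaftRate;

% Average the signal over rotations; noise that is not synchronous
% with the shaft tends to cancel in the average
[ta,tTa] = tsa(x,fs,tPulse);
```

The output ta spans a single rotation by default, so it is much shorter than the input signal.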
For vibrating or rotating systems, fault development can be indicated by changes in frequency-domain behavior such as the changing of resonant frequencies or the presence of new vibrational components. Signal Processing Toolbox provides many functions for analyzing such spectral behavior. Often these are useful as preprocessing before performing further analysis for extracting condition indicators. Such functions include:
pspectrum: Compute the power spectrum, time-frequency power spectrum, or power spectrogram of a signal. The spectrogram contains information about how the power distribution changes with time. The example Multi-Class Fault Detection Using Simulated Data performs data preprocessing using pspectrum.
envspectrum: Compute an envelope spectrum. A fault that causes a repeating impulse or pattern will impose amplitude modulation on the vibration signal of the machinery. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.
orderspectrum: Compute an average order-magnitude spectrum.
modalfrf: Estimate the frequency-response function of a system from measured excitation and response signals.
For more information on these and related functions, see Vibration Analysis.
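The following sketch shows two of these functions on synthetic data: pspectrum to locate the dominant spectral component, and envspectrum to recover the modulation frequency of an amplitude-modulated carrier. The signals, frequencies, and the analysis band passed to envspectrum are assumptions chosen for illustration.

```matlab
fs = 1000;                            % assumed sample rate, Hz
N  = 4*fs;                            % 4 seconds of samples
t  = (0:N-1)'/fs;                     % time vector, s

% Power spectrum of a hypothetical two-tone signal
x = sin(2*pi*60*t) + 0.3*sin(2*pi*180*t);
[p,f] = pspectrum(x,fs);
[~,iMax] = max(p);
fPeak = f(iMax);                      % frequency of the strongest component

% Envelope spectrum of a 300 Hz carrier modulated at 15 Hz,
% a stand-in for a repeating fault impulse pattern
xAM = (1 + 0.5*cos(2*pi*15*t)).*sin(2*pi*300*t);
[es,fEnv] = envspectrum(xAM,fs,'Band',[250 350]);

% Look for the modulation peak away from the lowest frequencies
keep  = fEnv > 5;
esK   = es(keep);
fK    = fEnv(keep);
[~,iEnv] = max(esK);
fMod  = fK(iEnv);                     % estimated modulation frequency
```

In a real application the band for envelope analysis is usually not known in advance; spectral kurtosis is one way to choose it.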
Signal Processing Toolbox includes functions for analyzing systems whose frequency-domain behavior changes with time. Such analysis is called time-frequency analysis, and is useful for analyzing and detecting transient or changing signals associated with changes in system performance. These functions include:
spectrogram: Compute a spectrogram using a short-time Fourier transform. The spectrogram describes the time-localized frequency content of a signal and its evolution over time. The example Condition Monitoring and Prognostics Using Vibration Signals uses spectrogram to preprocess signals and help identify potential condition indicators.
hht: Compute the Hilbert spectrum of a signal. The Hilbert spectrum is useful for analyzing signals that comprise a mixture of signals whose spectral content changes in time. This function computes the spectrum of each component in the mixed signal, where the components are determined by empirical mode decomposition.
emd: Compute the empirical mode decomposition of a signal. This decomposition describes the mixture of signals analyzed in a Hilbert spectrum, and can help you separate a mixed signal to extract a component whose time-frequency behavior changes as system performance degrades. You can use emd to generate the inputs for hht.
kurtogram: Compute the time-localized spectral kurtosis, which characterizes a signal by differentiating stationary Gaussian signal behavior from nonstationary or non-Gaussian behavior in the frequency domain. As preprocessing for other tools such as envelope analysis, spectral kurtosis can supply key inputs such as the optimal band. (See pkurtosis.) The example Rolling Element Bearing Fault Diagnosis uses spectral kurtosis for preprocessing and extraction of condition indicators.
For more information on these and related functions, see Time-Frequency Analysis.
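As an illustrative sketch of time-frequency preprocessing, the following code tracks the rising frequency of a chirp with spectrogram, then uses emd to decompose a mixed signal and passes the result to hht. The signals, window length, overlap, and FFT length are assumptions made for this example.

```matlab
fs = 1000;                            % assumed sample rate, Hz
N  = 2*fs;                            % 2 seconds of samples
t  = (0:N-1)'/fs;                     % time vector, s

% Hypothetical nonstationary signal: frequency sweeps from 50 Hz to 250 Hz
x = chirp(t,50,t(end),250);

% Short-time Fourier transform with a 256-sample Hamming window, 50% overlap
[s,f,tSeg] = spectrogram(x,hamming(256),128,256,fs);

% Dominant frequency in each time segment
[~,iPeak] = max(abs(s),[],1);
fRidge = f(iPeak);

% Mixed signal: the chirp plus a slow 5 Hz oscillation
xMix = x + 0.5*sin(2*pi*5*t);

% Empirical mode decomposition; each column of imf is one intrinsic mode function
[imf,residual] = emd(xMix);

% Hilbert spectrum computed from the intrinsic mode functions
[hs,fH,tH] = hht(imf,fs);
```

The intrinsic mode functions and the residual together sum back to the decomposed signal, so you can discard or keep individual components before further analysis.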