This example shows how to use the Diagnostic Feature Designer app to analyze and select features to diagnose faults in a triplex reciprocating pump.
The example uses simulated pump fault data generated by the Multi-Class Fault Detection Using Simulated Data example. The data has been preprocessed to remove the pump startup transients.
Load the triplex pump fault data. The pump data contains 240 flow and pressure measurements for different fault conditions. There are three fault types (leaking pump cylinder, blocked pump inlet, increased pump bearing friction). The measurements cover conditions where none, one, or multiple faults are present. The data is collected in a table where each row is a different measurement.
load('savedPumpData')
pumpData
pumpData = 240×3 table
       flow             pressure      faultCode
__________________ __________________ _________
{1201x1 timetable} {1201x1 timetable} 0
{1201x1 timetable} {1201x1 timetable} 0
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 0
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
{1201x1 timetable} {1201x1 timetable} 100
⋮
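Before importing the data into the app, it can help to check how many members fall under each fault code. The following is a minimal sketch that uses a small synthetic stand-in table (demoData) rather than the real pumpData. The timetable variable name Data, the 1 kHz sample rate, and the random values are assumptions made only for illustration.

```matlab
% Synthetic stand-in for pumpData: same column layout, far fewer members.
% (Hypothetical values; the real table is loaded from savedPumpData.)
rng(0);                                   % reproducible random data
numMembers = 8;
t = seconds((0:1200)'/1000);              % ASSUMPTION: 1 kHz sampling
flow = cell(numMembers,1);
pressure = cell(numMembers,1);
for k = 1:numMembers
    % ASSUMPTION: each timetable holds one variable named Data
    flow{k} = timetable(t, randn(1201,1), 'VariableNames', {'Data'});
    pressure{k} = timetable(t, randn(1201,1), 'VariableNames', {'Data'});
end
faultCode = [0; 0; 100; 100; 100; 0; 100; 100];
demoData = table(flow, pressure, faultCode);

% Number of members recorded under each fault code
counts = groupcounts(demoData, 'faultCode');
disp(counts)

% Size of the signal stored in the first member
firstFlow = demoData.flow{1};
disp(size(firstFlow))
```

With the real data loaded, the same groupcounts call on pumpData would show how evenly the fault conditions are represented.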
Open Diagnostic Feature Designer by using the diagnosticFeatureDesigner command. Import the pump data into the app. The data is organized as a multi-member ensemble, so use that option for import.
Once we specify pumpData as the variable we want to import, we can review the various signals we are importing. Ensure that the faultCode variable is a condition variable. Condition variables denote the presence or absence of a fault, and the app uses them for grouping and classification.
Plot the flow signal by selecting flow from the Signals & Spectra section of the data browser and clicking Signal Trace in the plot gallery. Plot the pressure signal the same way.
These plots show the pressure and flow signals for all 240 members in the dataset. You can click the Signal Trace tab and select group by fault code to display signals with the same fault code in the same color. Grouping signals in this way can help you to quickly determine if there are any clear differences between signals of different fault types. In this case, the measured signals do not show any clear differences for different fault codes.
Because the measured signals do not show any clear differences for different fault conditions, the next step is to extract time-domain features, such as the signal mean and standard deviation, from the signals. To open the feature extraction dialog box, select Time-Domain Features and then Signal Features. Select the features you would like to extract and click OK. For now, clear the Plot results check box. We will plot results later to see if the features help distinguish different fault conditions. Repeat this process for the pressure signal by changing the signal selected at the top of the dialog box.
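To make the time-domain features concrete, the following sketch computes a few common signal statistics for a single synthetic trace using only base MATLAB functions. The signal, the 1 kHz sample rate, and the field names are assumptions made for illustration. The app computes its own feature set and uses its own naming.

```matlab
rng(1);                                   % reproducible noise
fs = 1000;                                % ASSUMPTION: sample rate in Hz
t = (0:1200)'/fs;
% Hypothetical flow trace: constant level, 50 Hz ripple, measurement noise
x = 35 + 2*sin(2*pi*50*t) + 0.3*randn(size(t));

features.Mean = mean(x);
features.Std = std(x);
features.RMS = sqrt(mean(x.^2));          % root mean square
features.PeakValue = max(abs(x));
features.CrestFactor = features.PeakValue / features.RMS;

% Shape statistics from central moments
xc = x - features.Mean;
features.Skewness = mean(xc.^3) / mean(xc.^2)^1.5;
features.Kurtosis = mean(xc.^4) / mean(xc.^2)^2;

disp(struct2table(features))
```

Each ensemble member yields one row of such values, which is how the feature table in the app grows one column per extracted feature.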
A reciprocating pump uses a drive shaft and cylinders to pump fluid. Because of the mechanical construction of the pump, we expect there to be cyclic fluctuations in the pump flow and pressure. For example, zoom into a section of the flow signals using the signal panner below the signal trace plot.
Computing the frequency spectrum of the flow will highlight the cyclic nature of the flow signal and could give better insight into how the flow signal changes under different fault conditions. Estimate the frequency spectra using the autoregressive method. This method fits an autoregressive model of the prescribed order to the data, and then computes the spectrum of that estimated model. This approach reduces any overfitting to the raw data signal. In this case, specify a model order of 20.
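The idea behind an autoregressive spectrum estimate can be sketched in base MATLAB. The code below fits an order-20 model with the Yule-Walker equations and evaluates the spectrum of the fitted model. Yule-Walker is one of several ways to fit an autoregressive model and is chosen here for brevity; it is not necessarily the algorithm the app uses. The synthetic signal and the 1 kHz sample rate are assumptions.

```matlab
rng(2);                                   % reproducible noise
fs = 1000;                                % ASSUMPTION: sample rate in Hz
N = 1201;
t = (0:N-1)'/fs;
x = sin(2*pi*50*t) + 0.1*randn(N,1);      % hypothetical cyclic signal

p = 20;                                   % model order, as in the text
x = x - mean(x);                          % remove the DC offset

% Biased autocorrelation estimate for lags 0..p
r = zeros(p+1,1);
for k = 0:p
    r(k+1) = sum(x(1:N-k) .* x(1+k:N)) / N;
end

% Yule-Walker equations: toeplitz(r at lags 0..p-1) * a = -(r at lags 1..p)
a = -(toeplitz(r(1:p)) \ r(2:p+1));
noiseVar = r(1) + a' * r(2:p+1);          % prediction-error variance

% Spectrum of the fitted model: noiseVar / |A(f)|^2, scaled per Hz
nfft = 2048;
A = fft([1; a], nfft);
psd = noiseVar ./ (abs(A(1:nfft/2+1)).^2) / fs;
f = (0:nfft/2)' * fs / nfft;

[~, iPeak] = max(psd);
fprintf('Dominant spectral peak near %.1f Hz\n', f(iPeak));
```

Because the spectrum comes from a 20-coefficient model rather than from the raw samples, it is smooth, which is what makes the resonant peaks easy to compare across fault codes.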
Plotting the computed spectra on a linear scale clearly shows resonant peaks. Grouping by fault code highlights how the spectra change for different fault conditions.
Perform the same computations for the pressure signal, because the results provide additional features to help distinguish different fault conditions.
We can now compute spectral features such as peaks, modal coefficients, and band power. We extract these features in a narrower frequency band, from 25 Hz to 250 Hz, because the peaks above 250 Hz are smaller. Note that we are extracting five spectral peaks for each signal. For now, clear the Plot results check box. We will plot results later to see if the features help distinguish different fault conditions. Repeat this process for the pressure signal by changing the signal selected at the top of the dialog box.
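As a rough illustration of two of these spectral features, the sketch below computes band power and the largest peaks inside the 25 Hz to 250 Hz band of a synthetic spectrum. The spectrum shape, the resonance frequencies, and the variable names are hypothetical. The peak search is a plain local-maximum test, not the app's peak-finding algorithm.

```matlab
fs = 1000;                                % ASSUMPTION: sample rate in Hz
f = (0:0.5:fs/2)';                        % frequency grid in Hz

% Hypothetical spectrum: three resonances on a small noise floor
centers = [60 120 310];
heights = [4 2 1];
psd = 0.01 * ones(size(f));
for k = 1:numel(centers)
    psd = psd + heights(k) ./ (1 + ((f - centers(k))/3).^2);
end

band = [25 250];                          % band of interest from the text
inBand = f >= band(1) & f <= band(2);
fb = f(inBand);
pb = psd(inBand);

% Band power: area under the spectrum inside the band
bandPower = trapz(fb, pb);

% Local maxima: samples larger than both neighbours
isPeak = [false; pb(2:end-1) > pb(1:end-2) & pb(2:end-1) > pb(3:end); false];
peakFreq = fb(isPeak);
peakAmp = pb(isPeak);

% Keep at most the five largest peaks
numPeaks = 5;
[peakAmp, order] = sort(peakAmp, 'descend');
peakFreq = peakFreq(order);
keep = 1:min(numPeaks, numel(peakAmp));
peakTable = table(peakFreq(keep), peakAmp(keep), ...
    'VariableNames', {'Frequency_Hz', 'Amplitude'});
disp(peakTable)
fprintf('Band power (25-250 Hz): %.3f\n', bandPower);
```

Restricting the search to the band means the resonance placed at 310 Hz is ignored, which mirrors the reason given above for excluding the smaller high-frequency peaks.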
All the features we have extracted have been collected in a table shown in the Feature Tables browser. To view the computed feature data, select FeatureTable1 from the data browser and click Feature Table View in the plot gallery. The fault code is also displayed in the feature table view as the rightmost column in the table. As more features are computed, more columns get appended to the table.
You can see the distributions of the feature values for different condition variable values (that is, fault types) by viewing the feature table as a histogram. Click Histogram in the plot gallery to create a histogram plot. Use the next and previous buttons to show histograms for different features. Histogram plots grouped by fault code can help to determine if certain features are strong differentiators between fault types. If they are strong differentiators, their distributions are well separated from each other. For the triplex pump data, the feature distributions tend to overlap, and no single feature clearly identifies the faults. The next section looks at using automated ranking to find which features are more useful for fault prediction.
From the Feature Designer tab, click Rank Features and select FeatureTable1. The app gathers all the feature data and ranks the features based on a metric such as ANOVA. The features are then listed in terms of importance based on the metric value. In this case, we can see that the RMS value for the flow signal and the RMS and mean values for the pressure signal are the features that most strongly distinguish different fault types from each other.
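The principle behind ANOVA-based ranking can be sketched with a one-way F-statistic computed per feature: the larger the variation between fault groups relative to the variation within them, the higher the feature ranks. The code below uses base MATLAB and synthetic data. The group labels, feature names, and values are hypothetical, and the score the app reports may be scaled differently from the raw F-statistic.

```matlab
rng(3);                                   % reproducible random data
faultGroup = repelem([0; 1; 2; 3], 30);   % hypothetical group labels
n = numel(faultGroup);
shift = 0.5*(faultGroup == 2) + 1.0*(faultGroup == 3);

% Hypothetical features: strongly informative, weakly informative, noise
featureTable = table( ...
    5 + 3*shift + 0.5*randn(n,1), ...
    2 + 0.5*shift + 1.0*randn(n,1), ...
    randn(n,1), ...
    'VariableNames', {'flow_RMS', 'pressure_Mean', 'noise_Feature'});

groupIdx = findgroups(faultGroup);
numGroups = max(groupIdx);
names = featureTable.Properties.VariableNames;
F = zeros(numel(names),1);
for j = 1:numel(names)
    v = featureTable.(names{j});
    groupMean = accumarray(groupIdx, v, [numGroups 1], @mean);
    groupSize = accumarray(groupIdx, 1, [numGroups 1]);
    ssBetween = sum(groupSize .* (groupMean - mean(v)).^2);
    ssWithin = sum((v - groupMean(groupIdx)).^2);
    % One-way ANOVA F-statistic: between-group over within-group variance
    F(j) = (ssBetween/(numGroups-1)) / (ssWithin/(n-numGroups));
end

ranking = sortrows(table(names', F, 'VariableNames', {'Feature','F'}), ...
    'F', 'descend');
disp(ranking)
```

A feature whose group means barely differ, like the pure-noise column here, ends up at the bottom of the list.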
After we have ranked our features in terms of importance, the next step is to export them so that we can train a classification model based on these features. Click Export, select Export to Classification Learner, and select the features you want to use for classification. In this case, we will export all the features that have a One-way ANOVA metric greater than 1, that is, all the features up to and including pressure_ps_spec/Data_Zeta1. The features are then sent to Classification Learner and can be used to design a classifier to identify different faults.
In Classification Learner, select 5-fold cross validation and start the session.
From Classification Learner, train all available models. The RUSBoosted trees method has the highest classification accuracy, at 81%. A next step could be to iterate on the features, especially the spectral features, and perhaps to modify the spectral computation method, change the bandwidth, or use different frequency peaks to improve the classification accuracy.
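A comparable workflow can be sketched at the command line with fitcensemble, crossval, and kfoldLoss from Statistics and Machine Learning Toolbox. The data below is synthetic, and the class labels, feature values, and ensemble settings are assumptions. The sketch does not replicate the settings of the RUSBoosted trees preset in Classification Learner, so it will not reproduce the accuracy reported above.

```matlab
% Requires Statistics and Machine Learning Toolbox.
rng(4);                                   % reproducible data and folds
classes = repelem([0; 1; 2], [60; 30; 15]);   % hypothetical, imbalanced labels
n = numel(classes);

% Hypothetical feature matrix: two informative columns and one noise column
X = [classes + 0.6*randn(n,1), 0.5*classes + randn(n,1), randn(n,1)];

% RUSBoost ensemble of decision trees (settings chosen for illustration)
mdl = fitcensemble(X, classes, 'Method', 'RUSBoost', ...
    'NumLearningCycles', 50, 'Learners', templateTree('MaxNumSplits', 10));

% 5-fold cross-validation, matching the validation scheme in the text
cvMdl = crossval(mdl, 'KFold', 5);
accuracy = 1 - kfoldLoss(cvMdl);
fprintf('5-fold cross-validated accuracy: %.1f%%\n', 100*accuracy);
```

RUSBoost undersamples the larger classes during boosting, which is one reason it can suit data in which some fault conditions have far fewer members than others.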
This example showed how to use Diagnostic Feature Designer to analyze and select features and create a classifier to diagnose faults in a triplex reciprocating pump.