This example shows how to model the relationship between the number of automobile trips generated from an area and the demographics of the area using the genfis function. Demographic and trip data are from 100 traffic analysis zones in New Castle County, Delaware. Five demographic factors are considered: population, number of dwelling units, vehicle ownership, median household income, and total employment. Hence, the model has five input variables and one output variable.
Load and plot the data.
mytripdata
subplot(2,1,1)
plot(datin)
ylabel('input')
subplot(2,1,2)
plot(datout)
ylabel('output')
The mytripdata command creates several variables in the workspace. Of the original 100 data points, 75 are used as training data (datin and datout) and 25 as checking data, which also serve as test data to validate the model. The checking data input/output pair variables are chkdatin and chkdatout.
Generate a model from the data by applying subtractive clustering with the genfis command.
First, create a genfisOptions option set for subtractive clustering, specifying the ClusterInfluenceRange property. The ClusterInfluenceRange property indicates the range of influence of a cluster when you consider the data space as a unit hypercube. Specifying a small cluster radius usually yields many small clusters in the data and results in many rules. Specifying a large cluster radius usually yields a few large clusters in the data and results in fewer rules.
opt = genfisOptions('SubtractiveClustering','ClusterInfluenceRange',0.5);
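To make the effect of the cluster radius concrete, here is a simplified Python sketch of Chiu-style subtractive clustering, assuming the data is already scaled to the unit hypercube. It is not the genfis implementation: it omits the accept-ratio and distance tests that the full method applies to borderline candidates. The names `subtractive_clustering`, `squash`, and `reject_ratio` are our own; the defaults 1.25 and 0.15 are meant to mirror, as we understand them, the SquashFactor and RejectRatio options. Each point's potential measures how many neighbors fall within the influence radius. The point with the highest potential becomes a center, and its influence is then subtracted from nearby points.

```python
import math

def subtractive_clustering(points, radius, squash=1.25, reject_ratio=0.15):
    """Simplified subtractive clustering on data scaled to the unit hypercube.

    Returns the list of selected cluster centers (each one a data point).
    """
    alpha = 4.0 / radius ** 2              # influence range of each point
    beta = 4.0 / (squash * radius) ** 2    # wider range used to suppress potential

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Potential of each point: density of neighbors within the influence range.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[k]
        # Simplification: stop once the best remaining potential is small
        # relative to the first peak.
        if peak < reject_ratio * first_peak:
            break
        center = points[k]
        centers.append(center)
        # Subtract the new center's influence; its own potential drops to 0,
        # so no point can be selected twice.
        potential = [pot - peak * math.exp(-beta * sqdist(p, center))
                     for pot, p in zip(potential, points)]
    return centers

# Two tight groups of points in the unit square.
pts = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
       (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
print(len(subtractive_clustering(pts, 0.5)))   # large radius
print(len(subtractive_clustering(pts, 0.01)))  # small radius
```

With these two groups, a radius of 0.5 finds one center per group, while a radius of 0.01 treats every point as its own cluster. Because subtractive clustering produces one rule per cluster, the smaller radius would produce more rules.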
Generate the FIS model using the training data and the specified options.
fismat = genfis(datin,datout,opt);
The genfis command uses a one-pass method that does not perform any iterative optimization. The generated FIS object is a first-order Sugeno model with three rules.
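As a rough illustration of what a first-order Sugeno model computes, the following Python sketch evaluates one input vector. It assumes Gaussian input membership functions, product as the AND method, and weighted-average defuzzification. We believe these match the settings genfis typically produces for subtractive clustering, but that is an assumption. The function names and the rule representation are hypothetical.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership grade, same shape as MATLAB's gaussmf(x,[sigma c])."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_first_order(x, rules):
    """Evaluate a first-order Sugeno FIS for one input vector x.

    Each rule is (mfs, coeffs): mfs holds one (sigma, c) pair per input,
    and coeffs = [a1, ..., an, b] defines the rule output a.x + b.
    """
    num = 0.0
    den = 0.0
    for mfs, coeffs in rules:
        # Firing strength: product of the input membership grades.
        w = 1.0
        for xi, (sigma, c) in zip(x, mfs):
            w *= gaussmf(xi, sigma, c)
        # Rule output: a linear function of the inputs plus a constant.
        z = sum(a * xi for a, xi in zip(coeffs[:-1], x)) + coeffs[-1]
        num += w * z
        den += w
    # Weighted average of the rule outputs.
    return num / den
```

For example, two rules with identical membership functions fire equally, so the output is the plain average of their linear outputs: `sugeno_first_order([1.0], [([(1.0, 0.0)], [2.0, 1.0]), ([(1.0, 0.0)], [0.0, 5.0])])` averages 3 and 5 to give about 4.0.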
Verify the model. Here, trnRMSE is the root mean squared error (RMSE) of the model output with respect to the training data.
fuzout = evalfis(fismat,datin);
trnRMSE = norm(fuzout-datout)/sqrt(length(fuzout))
trnRMSE = 0.5276
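The expression norm(fuzout-datout)/sqrt(length(fuzout)) is the usual RMSE written with a vector norm. The 2-norm of the error vector is the square root of the sum of squared errors, so dividing by the square root of the number of samples gives the square root of the mean squared error. A small Python sketch with illustrative function names shows that both forms agree:

```python
import math

def rmse_via_norm(pred, target):
    """Mirror of norm(pred-target)/sqrt(length(pred)) from the MATLAB code."""
    err = [p - t for p, t in zip(pred, target)]
    return math.sqrt(sum(e * e for e in err)) / math.sqrt(len(err))

def rmse_direct(pred, target):
    """Textbook RMSE: square root of the mean of the squared errors."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

print(rmse_via_norm([1, 2, 3], [1, 2, 5]))
print(rmse_direct([1, 2, 3], [1, 2, 5]))
```

Both calls return 2/sqrt(3), about 1.155, because the only nonzero error is 2.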
Next, apply the test data to the FIS to validate the model. In this example, the validation data is used for both checking and testing the FIS parameters. Here, chkRMSE is the RMSE of the model output with respect to the validation data.
chkfuzout = evalfis(fismat,chkdatin);
chkRMSE = norm(chkfuzout-chkdatout)/sqrt(length(chkfuzout))
chkRMSE = 0.6179
Plot the output of the model, chkfuzout, against the validation data, chkdatout.
figure
plot(chkdatout)
hold on
plot(chkfuzout,'o')
hold off
The model output and validation data are shown as circles and a solid blue line, respectively. The plot shows that the model does not perform well on the validation data.
At this point, you can use the optimization capability of anfis to improve the model. First, try a relatively short training period (20 epochs) without validation data, and then test the resulting FIS model against the testing data.
anfisOpt = anfisOptions('InitialFIS',fismat,'EpochNumber',20,...
    'InitialStepSize',0.1);
fismat2 = anfis([datin datout],anfisOpt);
ANFIS info:
    Number of nodes: 44
    Number of linear parameters: 18
    Number of nonlinear parameters: 30
    Total number of parameters: 48
    Number of training data pairs: 75
    Number of checking data pairs: 0
    Number of fuzzy rules: 3

Start training ANFIS ...

1    0.527607
2    0.513727
3    0.492996
4    0.499985
5    0.490585
6    0.492924
Step size decreases to 0.090000 after epoch 7.
7    0.48733
8    0.485037
9    0.480813
Step size increases to 0.099000 after epoch 10.
10   0.475097
11   0.469759
12   0.462516
13   0.451177
Step size increases to 0.108900 after epoch 14.
14   0.447856
15   0.444357
16   0.433904
17   0.433739
Step size increases to 0.119790 after epoch 18.
18   0.420408
19   0.420512
20   0.420275

Designated epoch number reached. ANFIS training completed at epoch 20.

Minimal training RMSE = 0.420275
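The parameter counts in the ANFIS information can be reproduced by hand. Assume that each of the three clustering-generated rules has its own Gaussian membership function (two parameters, sigma and center) on each of the five inputs, and a linear consequent with one coefficient per input plus a constant. Under those assumptions, a short Python check (the helper name is hypothetical) gives:

```python
def anfis_param_counts(n_inputs, n_rules, params_per_mf=2):
    """Count the parameters of a clustering-generated first-order Sugeno FIS.

    Assumes one membership function per rule on every input, each with
    params_per_mf parameters (2 for a Gaussian: sigma and center).
    """
    nonlinear = n_inputs * n_rules * params_per_mf   # premise (MF) parameters
    linear = n_rules * (n_inputs + 1)                # coefficients plus constant
    return linear, nonlinear, linear + nonlinear

print(anfis_param_counts(5, 3))
```

The call returns (18, 30, 48), matching the linear, nonlinear, and total parameter counts reported in the log.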
After the training is complete, validate the model.
fuzout2 = evalfis(fismat2,datin);
trnRMSE2 = norm(fuzout2-datout)/sqrt(length(fuzout2))
trnRMSE2 = 0.4203
chkfuzout2 = evalfis(fismat2,chkdatin);
chkRMSE2 = norm(chkfuzout2-chkdatout)/sqrt(length(chkfuzout2))
chkRMSE2 = 0.5894
The model has improved considerably with respect to the training data (RMSE from 0.5276 to 0.4203), but only slightly with respect to the validation data (RMSE from 0.6179 to 0.5894). Plot the improved model output obtained using anfis against the testing data.
figure
plot(chkdatout)
hold on
plot(chkfuzout2,'o')
hold off
The model output and validation data are shown as circles and a solid blue line, respectively. This example shows that subtractive clustering with genfis can be used either as a standalone, fast method for generating a fuzzy model from data or as a preprocessor that determines the initial rules for anfis training. An important advantage of using a clustering method to find rules is that the resulting rules are more tailored to the input data than those of a FIS generated without clustering. For example, grid partitioning creates a rule for every combination of input membership functions. Clustering therefore mitigates the explosion in the number of rules when the input data has high dimension.
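The rule-count difference can be quantified. Grid partitioning creates one rule for every combination of input membership functions, so the count grows exponentially with the number of inputs, while subtractive clustering creates one rule per cluster. The following is a Python sketch with a hypothetical helper name; it requires Python 3.8 or later for math.prod.

```python
from math import prod

def grid_partition_rules(mfs_per_input):
    """Number of rules when every combination of input MFs gets a rule."""
    return prod(mfs_per_input)

# For example, three membership functions on each of five inputs:
print(grid_partition_rules([3, 3, 3, 3, 3]))
```

With three membership functions on each of the five demographic inputs (an illustrative choice), grid partitioning would yield 3^5 = 243 rules, compared with the three rules found by subtractive clustering here.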
Overfitting can be detected when the checking error starts to increase while the training error continues to decrease.
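This criterion can be expressed as a scan over the per-epoch error histories. The Python sketch below uses a hypothetical function name and returns the first epoch at which the checking error rises while the training error falls. In practice, a single uptick can be noise, so you may prefer to look for a sustained rise.

```python
def overfitting_onset(trn_err, chk_err):
    """Return the first 1-based epoch where the checking error rises while
    the training error falls, or None if that never happens."""
    for k in range(1, len(trn_err)):
        if chk_err[k] > chk_err[k - 1] and trn_err[k] < trn_err[k - 1]:
            return k + 1  # convert the 0-based index to a 1-based epoch
    return None

trn = [1.0, 0.8, 0.6, 0.5, 0.4]
chk = [1.0, 0.9, 0.85, 0.88, 0.9]
print(overfitting_onset(trn, chk))
```

For these toy histories the function returns 4, the first epoch at which the checking error goes up (0.85 to 0.88) while the training error keeps going down.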
To check the model for overfitting, use anfis with validation data to train the model for 200 epochs.
First, configure the ANFIS training options by modifying the existing anfisOptions option set. Specify the epoch number and validation data. Because the number of training epochs is larger, suppress the display of training information in the Command Window.
anfisOpt.EpochNumber = 200;
anfisOpt.ValidationData = [chkdatin chkdatout];
anfisOpt.DisplayANFISInformation = 0;
anfisOpt.DisplayErrorValues = 0;
anfisOpt.DisplayStepSize = 0;
anfisOpt.DisplayFinalResults = 0;
Train the FIS.
[fismat3,trnErr,stepSize,fismat4,chkErr] = anfis([datin datout],anfisOpt);
Here:
- fismat3 is the FIS object at the epoch where the training error reaches its minimum.
- fismat4 is the snapshot FIS object at the epoch where the validation data error reaches its minimum.
- stepSize is the history of the training step sizes.
- trnErr is the RMSE on the training data for each training epoch.
- chkErr is the RMSE on the validation data for each training epoch.
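The way fismat3 and fismat4 are chosen can be sketched as picking the epochs of minimum error in each history. This Python sketch uses hypothetical names and is an illustration, not the anfis implementation. In particular, anfis returns the FIS objects themselves rather than epoch numbers. If several epochs tie, this sketch returns the earliest.

```python
def best_snapshots(trn_err, chk_err):
    """Return the 1-based epochs whose FIS would be kept as the
    minimum-training-error model and the minimum-checking-error model."""
    best_trn = min(range(len(trn_err)), key=lambda i: trn_err[i]) + 1
    best_chk = min(range(len(chk_err)), key=lambda i: chk_err[i]) + 1
    return best_trn, best_chk

trn = [1.0, 0.8, 0.6, 0.5, 0.4]
chk = [1.0, 0.9, 0.85, 0.88, 0.9]
print(best_snapshots(trn, chk))
```

For these toy histories the result is (5, 3). The training error keeps falling until the last epoch, while the checking error bottoms out at epoch 3, so the two snapshots come from different epochs, just as fismat3 and fismat4 do.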
After the training completes, validate the model.
fuzout4 = evalfis(fismat4,datin);
trnRMSE4 = norm(fuzout4-datout)/sqrt(length(fuzout4))
trnRMSE4 = 0.3405
chkfuzout4 = evalfis(fismat4,chkdatin);
chkRMSE4 = norm(chkfuzout4-chkdatout)/sqrt(length(chkfuzout4))
chkRMSE4 = 0.5821
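The error with the training data is the lowest thus far, and the error with the validation data is also slightly lower than before. However, the gap between the two has widened. The validation RMSE now exceeds the training RMSE by about 0.24 (0.5821 versus 0.3405), compared with about 0.17 (0.5894 versus 0.4203) after 20 epochs. This result suggests possible overfitting, which occurs when you fit the fuzzy system to the training data so well that it no longer does a good job of fitting the validation data. The result is a loss of generality.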
View the improved model output. Plot the model output against the checking data.
figure
plot(chkdatout)
hold on
plot(chkfuzout4,'o')
hold off
The model output and validation data are shown as circles and a solid blue line, respectively.
Next, plot the training error, trnErr.
figure
plot(trnErr)
title('Training Error')
xlabel('Number of Epochs')
ylabel('Training Error')
This plot shows that the training error settles at about the 60th epoch.
Plot the checking error, chkErr.
figure
plot(chkErr)
title('Checking Error')
xlabel('Number of Epochs')
ylabel('Checking Error')
The plot shows that the smallest value of the validation data error occurs at the 52nd epoch. After this point, the validation error increases slightly even as anfis continues to minimize the error against the training data all the way to the 200th epoch. Depending on the specified error tolerance, the plot also indicates how well the model generalizes to the test data.
You can also compare the outputs of fismat2 and fismat4 against the validation data, chkdatout.
figure
plot(chkdatout)
hold on
plot(chkfuzout4,'ob')
plot(chkfuzout2,'+r')
hold off
legend('Validation data','fismat4 output','fismat2 output')