Clustering of numerical data forms the basis of many classification and system modeling algorithms. The purpose of clustering is to identify natural groupings of data from a large data set to produce a concise representation of a system's behavior.
Fuzzy Logic Toolbox™ tools allow you to find clusters in input-output training data. You can use the cluster information to generate a Sugeno-type fuzzy inference system (FIS) that models the data behavior using a minimum number of rules. The rules partition themselves according to the fuzzy qualities associated with each of the data clusters. To automatically generate this type of FIS, use the genfis command.
Fuzzy c-means (FCM) is a data clustering technique in which each data point belongs to each cluster to a degree specified by a membership grade. This technique was originally introduced by Jim Bezdek in 1981 [1] as an improvement on earlier clustering methods. It groups data points that populate a multidimensional space into a specified number of clusters.
The command-line function fcm starts with an initial guess for the cluster centers, which are intended to mark the mean location of each cluster. This initial guess is most likely incorrect. Next, fcm assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades, fcm moves the cluster centers toward the correct locations within the data set. This iteration minimizes an objective function that represents the distance from each data point to each cluster center, weighted by that data point's membership grade in the cluster.
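As a rough illustration of the iteration described above, the following Python sketch alternates between updating the membership grades and updating the cluster centers. It is not the toolbox implementation of fcm. The name fcm_sketch, the fuzziness exponent m = 2, the random choice of data points as initial centers, and the stopping tolerance are all assumptions made for this example. The objective it tracks is the sum of squared point-to-center distances, each weighted by the membership grade raised to the power m.

```python
import random


def fcm_sketch(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Illustrative fuzzy c-means loop (hypothetical helper, not the toolbox fcm)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initial guess (an assumption): randomly chosen data points act as centers.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    power = 1.0 / (m - 1.0)
    prev_obj = float("inf")
    U = []
    for _ in range(max_iter):
        # Squared distance from every point to every center (floored to avoid 1/0).
        d2 = [[max(sum((p[k] - c[k]) ** 2 for k in range(dim)), 1e-12)
               for c in centers] for p in points]
        # Membership update: closer centers get higher grades; each row sums to 1.
        U = []
        for row in d2:
            inv = [(1.0 / d) ** power for d in row]
            total = sum(inv)
            U.append([v / total for v in inv])
        # Objective: membership-weighted sum of squared distances.
        obj = sum(U[i][j] ** m * d2[i][j]
                  for i in range(n) for j in range(n_clusters))
        if abs(prev_obj - obj) < tol:
            break  # the objective has stopped improving
        prev_obj = obj
        # Center update: mean of all points, weighted by grade ** m.
        centers = []
        for j in range(n_clusters):
            w = [U[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / sum(w)
                            for k in range(dim)])
    return centers, U
```

For two well-separated groups of points, fcm_sketch(points, 2) should return one center near each group, together with a membership matrix in which each row holds one data point's grades across the clusters.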
The fcm function returns a list of cluster centers and, for each data point, a membership grade in each cluster. You can use this information to help you build a fuzzy inference system by creating membership functions that represent the fuzzy qualities of each cluster. To generate a Sugeno-type fuzzy inference system that models the behavior of input/output data, you can configure the genfis command to use FCM clustering.
If you do not have a clear idea how many clusters there should be for a given set of data, subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers [2]. The subclust function finds the clusters using this method. You can use the resulting cluster estimates to initialize iterative optimization-based clustering methods (such as fcm) and model identification methods (such as anfis).
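The idea behind subtractive clustering can be sketched in Python as follows. This is a simplified reading of the method in [2], not the subclust implementation. The name subtractive_sketch, the parameter names, and the default values for the squash factor and the accept and reject ratios are assumptions chosen for illustration. The sketch also assumes that the data is already scaled so that a single radius is meaningful in every dimension. Each point receives a potential that is high when many other points lie nearby. The point with the highest potential becomes a center, its influence is subtracted from the potentials around it, and the process repeats until the remaining potential is low.

```python
import math


def subtractive_sketch(points, radius, squash=1.25, accept=0.5, reject=0.15):
    """Illustrative subtractive clustering (hypothetical helper, not subclust)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    alpha = 4.0 / radius ** 2
    beta = 4.0 / (squash * radius) ** 2
    # Potential of each point: a sum of Gaussian-like contributions from all points.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak < reject * first_peak:
            break  # the remaining potential is too low to justify another center
        if peak <= accept * first_peak:
            # Grey zone: accept only if the candidate is far from existing centers.
            d_min = min(math.sqrt(sqdist(points[k], c)) for c in centers)
            if d_min / radius + peak / first_peak < 1.0:
                potential[k] = 0.0  # discard this candidate and try the next one
                continue
        centers.append(points[k])
        # Subtract the new center's influence from every point's potential.
        potential = [max(0.0, p - peak * math.exp(-beta * sqdist(x, points[k])))
                     for p, x in zip(potential, points)]
    return centers
```

The number of centers is not passed in; it falls out of the radius and the thresholds. A larger radius generally yields fewer, broader clusters. The returned points could then serve as initial centers for an iterative method.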
To generate a Sugeno-type fuzzy inference system that models the behavior of input/output data, you can configure the genfis command to use subtractive clustering.
[1] Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[2] Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.