A COMPARISON OF TWO FUZZY CLUSTERING TECHNIQUES

In fuzzy clustering, unlike hard clustering, a single object may belong exactly to one cluster or partially to more than one cluster, depending on its membership values. Among the many fuzzy clustering techniques, Bezdek's Fuzzy C-Means and the Gustafson-Kessel clustering techniques are well known; they use the Euclidean distance and the Mahalanobis distance respectively as the measure of similarity. We have applied these two fuzzy clustering techniques to a dataset of individual differences consisting of fifty feature vectors of dimension (feature) three. Based on some validity measures we have examined the performances of these two clustering techniques from three different aspects: first, by initializing the membership values of the feature vectors considering the values of the three features separately, one at a time; secondly, by changing the number of predefined clusters; and thirdly, by changing the size of the dataset.


Introduction
Clustering is a process by which objects with similarities are grouped together; such a group is called a cluster. In hard clustering an object is allowed to belong to exactly one cluster only. With the advent of fuzzy set theory (FST), developed by Zadeh (1965), which deals particularly with situations involving non-probabilistic uncertainty, traditional hard clustering has opened up a new way of clustering known as fuzzy clustering, in which a single object may belong exactly to one cluster or partially to more than one cluster depending on its membership value. A complete presentation of all aspects of FST is available in the work of Zimmermann (1991). Applications of FST to ambiguous problems in which non-probabilistic uncertainty prevails are reflected in the works of Dewit (1982) and Ostaszewski (1993). Baruah (2011a, 2011b) has shown that the membership value of a fuzzy number can be expressed as the difference between the membership function and a reference function, and that therefore the fuzzy membership function and the fuzzy membership value for the complement of a fuzzy set are not the same. Based on this concept, Das (2012) modified the design of Park and Park (2010) and was able to overcome the limitations of their work by visualizing the complement of a fuzzy set correctly. Derrig and Ostaszewski (1995) explained in their research a method of pattern recognition for risk and claim classification. Bezdek (1981) requires in his Fuzzy C-Means (FCM) technique that the data to be analyzed be in the form of numerical vectors called feature vectors, and that the number of clusters be predefined for obtaining the membership values of the feature vectors. Das (2013) tried the FCM algorithm of Bezdek with three different distances, namely the Euclidean, Canberra and Hamming distances, and found that the algorithm produces the fastest and most expected results with the Euclidean distance and the slowest and least expected results with the Canberra distance. Das and Baruah (2013) have shown an application of the FCM algorithm of Bezdek to vehicular pollution. Wang, Huang, Yao, Qian and Jiang (2011) applied the Gustafson-Kessel (GK) clustering algorithm of Gustafson and Kessel (1979) to pattern recognition for gas insulated switchgear (GIS). Although a number of fuzzy clustering techniques are available in the literature, the FCM and GK techniques are well known for analyzing data in the form of numerical vectors. In both techniques the number of clusters must be predefined, but the Euclidean distance and the Mahalanobis distance are used as the measure of similarity in FCM and GK respectively. We have applied these two fuzzy clustering techniques to a dataset of individual differences (see Table 1) consisting of fifty feature vectors of dimension (feature) three. In our present work, based on three validity measures, namely the Partition Coefficient (PC), Clustering Entropy (CE) and Partition Index (PI) (see Section 3), we have examined the performances of these two clustering techniques from three different aspects: first, by initializing the membership values of the feature vectors considering the values of the three features separately, one at a time; secondly, by changing the number of predefined clusters; and thirdly, by changing the size of the dataset.
In Section 2 we provide the steps of the FCM and GK algorithms. In Section 3 we explain our present work. The results and analysis of our work are given in Section 4. Finally, we present the conclusions in Section 5.

Mathematical formulation of the FCM and GK algorithms
The basic task of a clustering technique is to divide n patterns, where n is a natural number, represented by vectors in a p-dimensional Euclidean space, into c, 2 ≤ c < n, categorically homogeneous subsets which are called clusters. Let the dataset be X = {x_1, x_2, ..., x_n}, where x_k = (x_{k1}, x_{k2}, ..., x_{kp}), k = 1, 2, ..., n. Each x_k is called a feature vector and x_{kj}, j = 1, 2, ..., p, is the j-th feature of the k-th feature vector.
A partition of the dataset X into clusters is described by the membership functions of the elements of the clusters. Let S_1, S_2, ..., S_c denote the clusters with corresponding membership functions μ_1, μ_2, ..., μ_c, where μ_i(x_k) ∈ [0, 1] is the degree of membership of x_k in cluster S_i, and

(1) Σ_{i=1}^{c} μ_i(x_k) = 1, for k = 1, 2, ..., n;

(2) 0 < Σ_{k=1}^{n} μ_i(x_k) < n, for i = 1, 2, ..., c.

Condition (1) says that each feature vector x_k has its total membership value 1 divided among all clusters. Condition (2) states that the sum of the membership degrees of the feature vectors in a given cluster is positive and does not exceed the total number of feature vectors.
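The two partition conditions can be illustrated with a small sketch. This is our own illustration, not part of the paper; the helper name `init_fuzzy_partition` and the use of NumPy are assumptions.

```python
import numpy as np

def init_fuzzy_partition(n, c, seed=0):
    """Random c x n fuzzy partition matrix.

    Each column is normalized to sum to 1, which enforces
    condition (1); with random positive entries the row sums
    then lie strictly between 0 and n, so condition (2) holds."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, n))
    return u / u.sum(axis=0, keepdims=True)  # normalize each column

U = init_fuzzy_partition(n=50, c=3)
print(np.allclose(U.sum(axis=0), 1.0))                            # True: condition (1)
print(bool(((U.sum(axis=1) > 0) & (U.sum(axis=1) < 50)).all()))   # True: condition (2)
```

Both FCM and GK start from such a randomly initialized partition matrix and iterate until it stabilizes.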
In sections 2.1 and 2.2 we provide respectively the steps of FCM and GK algorithms.

Bezdek's FCM algorithm
Step 1: Choose the number of clusters c, 2 ≤ c < n, where n is the total number of feature vectors. Choose the weighting exponent m, 1 < m < ∞. Define the vector norm || · || (generally defined by the Euclidean distance), i.e.

d_{ik} = ||x_k − v_i|| = ( Σ_{j=1}^{p} (x_{kj} − v_{ij})^2 )^{1/2},

where x_{kj} is the j-th feature of the k-th feature vector, for k = 1, 2, ..., n; j = 1, 2, ..., p, and v_{ij} is the j-th component of the centre of the i-th cluster, for i = 1, 2, ..., c; j = 1, 2, ..., p. Here n, p and c denote the total number of feature vectors, the number of features in each feature vector and the total number of clusters respectively.

Choose the initial fuzzy partition matrix U^{(0)}. Choose a parameter ε > 0 (this tells us when to stop the iteration). Set the iteration counter l equal to 0.
Step 2: Calculate the fuzzy cluster centers given by

v_{ij}^{(l)} = Σ_{k=1}^{n} (μ_{ik}^{(l)})^m x_{kj} / Σ_{k=1}^{n} (μ_{ik}^{(l)})^m, for i = 1, 2, ..., c; j = 1, 2, ..., p.

Step 3: Calculate the new partition matrix (i.e. membership matrix)

μ_{ik}^{(l+1)} = 1 / Σ_{j=1}^{c} (d_{ik}/d_{jk})^{2/(m−1)}, if d_{ik} > 0; if d_{ik} = 0, set μ_{ik}^{(l+1)} = 1 and the memberships of x_k in the other clusters to 0.

Step 4: Compute Δ = ||U^{(l+1)} − U^{(l)}||. If Δ > ε, set l = l + 1 and repeat Steps 2, 3 and 4. Otherwise, stop at some iteration count l*.

Step 5: The final fuzzy matrix U^{(l*)} is structured for operational use by means of the normalized α-cut, for some 0 < α < 1. All membership values less than α are replaced with zero and the membership function is renormalized (sums to one) to preserve partition condition (1). This fifth step was introduced by Derrig and Ostaszewski (1995) to make the result operational.
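Steps 1 through 4 above can be sketched compactly in NumPy. This is a minimal sketch of the standard FCM iteration, not the authors' implementation; the function name `fcm` is hypothetical, and the α-cut of Step 5 is omitted.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal Fuzzy C-Means sketch following Steps 1-4.
    X: n x p data matrix, c: number of clusters,
    m: weighting exponent, eps: termination tolerance."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                          # Step 1: initial partition
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)           # Step 2: cluster centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # Euclidean distances
        d = np.fmax(d, 1e-12)                                  # guard against d_ik = 0
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)           # Step 3: new partition matrix
        delta = np.linalg.norm(U_new - U)                      # Step 4: termination test
        U = U_new
        if delta < eps:
            break
    return U, V
```

On a 50 x 3 dataset such as that of Table 1 one would call, e.g., `fcm(X, c=3)` and obtain the membership matrix and the cluster centers.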

The GK algorithm
Let n, p, c, m (> 1), ε (> 0) and A denote, for a given dataset, the total number of feature vectors, the number of features in each feature vector, the total number of clusters, the weighting exponent, the termination tolerance and the norm-inducing matrix respectively.
The steps of the GK algorithm are as follows.

Repeat for l = 1, 2, 3, ...

Step 1: Compute the cluster prototypes (centers):

v_i^{(l)} = Σ_{k=1}^{n} (μ_{ik}^{(l−1)})^m x_k / Σ_{k=1}^{n} (μ_{ik}^{(l−1)})^m, for i = 1, 2, ..., c.

Step 2: Compute the cluster covariance matrices:

F_i = Σ_{k=1}^{n} (μ_{ik}^{(l−1)})^m (x_k − v_i^{(l)})(x_k − v_i^{(l)})^T / Σ_{k=1}^{n} (μ_{ik}^{(l−1)})^m, for i = 1, 2, ..., c.

Step 3: Compute the distances (known as the Mahalanobis distance):

D_{ik}^2 = (x_k − v_i^{(l)})^T [ (det F_i)^{1/p} F_i^{−1} ] (x_k − v_i^{(l)}), for i = 1, 2, ..., c; k = 1, 2, ..., n.

Step 4: Update the partition matrix: for k = 1, 2, ..., n,

μ_{ik}^{(l)} = 1 / Σ_{j=1}^{c} (D_{ik}/D_{jk})^{2/(m−1)}, if D_{ik} > 0; if D_{ik} = 0, set μ_{ik}^{(l)} = 1 and the memberships of x_k in the other clusters to 0,

until ||U^{(l)} − U^{(l−1)}|| < ε.

Step 5 is introduced in the same way as in the FCM algorithm for operational use.
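The distinctive part of GK, Steps 2 and 3, can be sketched as follows. This is our own illustrative sketch assuming unit cluster volumes (ρ_i = 1); the helper name `gk_distances` is hypothetical.

```python
import numpy as np

def gk_distances(X, V, U, m=2.0):
    """One pass of GK Steps 2-3: fuzzy covariance matrices F_i and
    the squared Mahalanobis distances they induce.
    X: n x p data, V: c x p centers, U: c x n partition matrix.
    Assumes unit cluster volumes (rho_i = 1)."""
    n, p = X.shape
    c = V.shape[0]
    Um = U ** m
    D2 = np.empty((c, n))
    for i in range(c):
        diff = X - V[i]                                        # deviations from center i
        F = (Um[i][:, None] * diff).T @ diff / Um[i].sum()     # Step 2: covariance F_i
        A = np.linalg.det(F) ** (1.0 / p) * np.linalg.inv(F)   # norm-inducing matrix
        D2[i] = np.einsum('nj,jk,nk->n', diff, A, diff)        # Step 3: squared distances
    return D2
```

Note that `np.linalg.inv(F)` fails when F_i is singular, which can happen when the dataset is very small relative to p; this is the failure mode of GK observed for small datasets in Section 4.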

Our present work
In our present work we have applied the FCM and GK algorithms to a dataset of individual differences (see Table 1) which consists of fifty (50) feature vectors with three features, namely Intelligence Quotient (IQ), Achievement Motivation (AM) and Social Adjustment (SA), in each feature vector. We have examined the performances of these two clustering techniques on the same dataset using three validity measures, PC, CE and PI, from three different aspects: the initial membership values of the feature vectors, the number of predefined clusters, and the size of the dataset. For this purpose, first, we executed both algorithms thrice on the same dataset, initializing the membership values of the feature vectors with the values of the features IQ, AM and SA separately, one at a time, in each round of execution. Secondly, both algorithms were executed thrice on the dataset with the number of predefined clusters taken as four (4), three (3) and two (2) for the first, second and third executions respectively. Thirdly, we reduced the size of the dataset by ten (10) in each round and executed the FCM and GK algorithms three times. To measure the performances of both clustering techniques we used three validity measures, whose mathematical formulae are given in the following.
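The formulae themselves did not survive in this copy of the paper; the sketch below uses the standard definitions of PC, CE and PI from the cluster-validity literature (PC and CE as given by Bezdek, PI in the compactness-over-separation form), which we assume are the ones intended. The function name `validity_measures` is our own.

```python
import numpy as np

def validity_measures(X, V, U, m=2.0):
    """Standard forms of the three validity measures (assumed here,
    since the paper's own formulae are not preserved in this copy).
    U: c x n partition matrix, V: c x p centers, X: n x p data."""
    c, n = U.shape
    pc = (U ** 2).sum() / n                                    # Partition Coefficient
    ce = -(U * np.log(np.fmax(U, 1e-12))).sum() / n            # Clustering Entropy
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)    # c x n squared distances
    sep = ((V[None, :, :] - V[:, None, :]) ** 2).sum(axis=2).sum(axis=1)  # separation per cluster
    comp = ((U ** m) * d2).sum(axis=1)                         # compactness per cluster
    pi = (comp / (U.sum(axis=1) * sep)).sum()                  # Partition Index
    return pc, ce, pi
```

Under these definitions a larger PC and smaller CE and PI indicate a better partition, which is the criterion applied in Section 4.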

Results and analysis
In this section we present the results obtained in our work and an analysis of these results. The graphical representations of the membership values (full or partial) of the feature vectors in the final clusters obtained in the different executions are given in Figures 1 to 20. In Table 2 we record the values of the different validity measures of the FCM algorithm when the membership values of the feature vectors are initialized with different features at different numbers of clusters. The same for the GK algorithm are recorded in Table 3. The values of the different validity measures of the FCM and GK algorithms at different sizes of the dataset are given in Table 4 and Table 5 respectively. Based on the values recorded in Tables 1, 2, 3, 4 and 5 we have analyzed the performance of each algorithm separately from all three aspects, i.e. when the membership values of the feature vectors are initialized with the values of the features one at a time, when the number of predefined clusters is changed, and when the size of the dataset is reduced. Next we have compared the performances of the two algorithms for each of these three aspects. In our analysis we have also considered the number of iterations required by both algorithms to reach the final clusters in the different executions. The graphical representations of all these analyses are given in Figures 21 to 49.
A clustering algorithm which exhibits a greater value of the Partition Coefficient (PC) and smaller values of the Clustering Entropy (CE) and Partition Index (PI) is considered to have the better performance.
The cluster-wise performances of FCM when the membership values of the feature vectors are initialized with different features are shown in Figures 21, 22 and 23. It is seen that when the no. of clusters is 4 FCM performs better with the feature AM, and when the no. of clusters is 3 or 2 it performs better with the feature IQ. Again, the feature-wise performances of FCM at different no. of clusters are shown in Figures 24, 25 and 26. In these figures we see that with the feature IQ FCM performs better when the no. of clusters is 3, while with the features AM and SA it performs better when the no. of clusters is 4. The corresponding performances of the GK algorithm for these two aspects are shown in Figures 27 to 29 and Figures 30 to 32 respectively. These reveal that when the no. of clusters is 4 GK performs better with the feature IQ, when the no. of clusters is 3 it performs better with the feature SA, and when the no. of clusters is 2 it performs better with the feature IQ. Again, with the feature IQ GK performs better when the no. of clusters is 4, while with the features AM and SA it performs better when the no. of clusters is 3.
In Figures 33 to 41 we provide comparisons of the different validity measures of the two algorithms. It is seen that FCM performs better in almost all cases, except one in which GK performs partially better (see Figure 38).
We provide the graphical representations of the different validity measures and the no. of iterations of FCM at different sizes of the dataset in Figures 48 and 49 respectively. It is seen that FCM performs better when the size of the dataset is 30. It also converges faster when the size of the dataset is 40 or 30. On the other hand, the performance of the GK algorithm deteriorates significantly when the size of the dataset becomes very small, as the corresponding norm-inducing matrix becomes singular (see Table 5).
From the point of view of iterations, GK converges faster when initialized with the feature AM and the no. of clusters is 4 (Figures 43 and 45). It also converges faster when initialized with the feature SA and the no. of clusters is 4 (Figures 44 and 45). In the rest of the cases FCM converges faster.

(a) Partition Coefficient (PC): measures the overlap between clusters. (b) Partition Index (PI): the ratio of the sum of the compactness and the separation of the clusters.

Figure 1 :
Figure 1: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature IQ and no. of clusters is 4.

Figure 2 :
Figure 2: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature AM and no. of clusters is 4.

Figure 3 :
Figure 3: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature SA and no. of clusters is 4.

Figure 4 :
Figure 4: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature IQ and no. of clusters is 3.

Figure 5 :
Figure 5: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature AM and no. of clusters is 3.

Figure 6 :
Figure 6: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature SA and no. of clusters is 3.

Figure 7 :
Figure 7: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature IQ and no. of clusters is 2.

Figure 8:
Figure 8: Graphical representation of the membership values of feature vectors produced by GK algorithm when initialized with feature AM and no. of clusters is 2

Figure 11:
Figure 11: Graphical representation of the membership values of feature vectors produced by FCM algorithm when initialized with feature AM and no. of clusters is 4

Figure 14:
Figure 14: Graphical representation of the membership values of feature vectors produced by FCM algorithm when initialized with feature AM and no. of clusters is 3

Figure 20:
Figure 20: Graphical representation of the membership values of feature vectors produced by FCM algorithm when initialized with feature IQ, no. of clusters is 4 and no. of feature vectors is 30

Figure 22 :
Figure 22: Comparison of the values of different validity measures of FCM algorithm with different features for initializing the membership values when the no. of clusters is 3.

Figure 23 :
Figure 23: Comparison of the values of different validity measures of FCM algorithm with different features for initializing the membership values when the no. of clusters is 2.

Figure 24 :
Figure 24: Comparison of the values of different validity measures of FCM algorithm at different no. of clusters when the feature for initializing the membership values is IQ.

Figure 25 :
Figure 25: Comparison of the values of different validity measures of FCM algorithm at different no. of clusters when the feature for initializing the membership values is AM.

Figure 26:
Figure 26: Comparison of the values of different validity measures of FCM algorithm at different no. of clusters when the feature for initializing the membership values is SA.

Figure 27 :
Figure 27: Comparison of the values of different validity measures of GK algorithm with different features for initializing the membership values when the no. of clusters is 4.

Figure 28 :
Figure 28: Comparison of the values of different validity measures of GK algorithm with different features for initializing the membership values when the no. of clusters is 3.

Figure 29 :
Figure 29: Comparison of the values of different validity measures of GK algorithm with different features for initializing the membership values when the no. of clusters is 2.

Figure 30:
Figure 30: Comparison of the values of different validity measures of GK algorithm at different no. of clusters when the feature for initializing the membership values is IQ

Figure 44:
Figure 44: Comparisons of the no. of iterations of FCM and GK algorithm at different no. of clusters when the feature for initializing the membership values is SA

Figure 49 :
Figure 49: Comparisons of the no. of iterations of FCM algorithm at different sizes of dataset.

Table 1 :
Dataset of individual differences of fifty (50) feature vectors with dimension (feature) three (3).

Table 2 :
The values of different validity measures of FCM algorithm with different features for initializing the membership values at different no. of clusters

Table 3 :
The values of different validity measures of GK algorithm with different features for initializing the membership values at different no. of clusters

Table 4 :
The values of different validity measures of FCM at different sizes of dataset.
N: the size of the datasets, C: the no. of predefined clusters, Itn: the no. of iterations to reach the final clusters.

Table 5 :
The values of different validity measures of GK at different sizes of dataset. N: the size of the datasets, C: the no. of predefined clusters, Itn: the no. of iterations to reach the final clusters.