APPLICATION OF FUZZY C-MEANS CLUSTERING TECHNIQUE IN VEHICULAR POLLUTION

In most urban areas around the world, the sharp increase in traffic has made vehicular pollution one of the key contributors to air pollution. As uncertainty prevails in the process of designating the level of pollution of a particular region, a fuzzy method can be applied to obtain the membership values of that region in a number of predefined clusters. Also, because vehicular pollution involves several different pollutants, the data used to represent it are in the form of numerical vectors. In our work, we shall apply the fuzzy c-means technique of Bezdek to a dataset representing vehicular pollution to obtain the membership values of the pollution due to vehicular emissions of a city in one or more predefined clusters. We shall also try to see the benefits of adopting a fuzzy approach over the traditional way of determining the level of pollution of a particular region.


Introduction
Air pollution has been aggravated by the rapid growth of cities, increasing traffic, industrialization and higher levels of energy consumption. Currently, all over the world, air pollution is widespread in urban areas, where vehicles are the major contributors, and in some other areas with a high concentration of industries and thermal power plants. Vehicular emissions are of particular concern since these are ground-level sources and thus have the maximum impact on the general population. Motor vehicles produce harmful air emissions which have various adverse effects not only on human beings but also on the entire environment. It is therefore essential to make the public aware of the present status of pollution, along with its key causes, so as to prevent it from creating dreadful conditions. Various governmental and non-governmental organizations have already taken initiatives in this regard. From the literature, it can be seen that the whole range of pollution has been divided into categories, which can also be considered as clusters, for example from 'Good' to 'Hazardous', and the level of pollution of a particular region or city is bound to belong to exactly one of the predefined clusters. In the present work, we shall try to see the level of pollution of a particular city due to vehicular emissions. As vehicular emissions contribute several different pollutants to the environment, it is not appropriate to assign the level of pollution of a particular region to exactly one of the predefined clusters. Rather, it is appropriate to see the membership values, full or partial, of the level of pollution of a particular region in one or more of the predefined clusters of pollution. The application of a fuzzy technique is therefore convenient here. A review of the literature on some fuzzy applications follows.
Zadeh (1965) developed the concept of fuzzy set theory (FST) particularly to deal with situations pertaining to non-probabilistic uncertainty. A complete presentation of all aspects of FST is available in the work of Zimmermann (1991). Applications of FST to ambiguous problems where non-probabilistic uncertainty prevails are reflected in the works of Dewit (1982) and Ostaszewski (1993). Park and Park (2010) developed a design to visualize fuzzy set operations based on the traditional Zadehian theory of fuzzy sets, in which no distinction is made between the fuzzy membership function and the fuzzy membership value for the complement of a fuzzy set. Baruah (2011a, 2011b) has shown that the membership value of a fuzzy number can be expressed as the difference between the membership function and a reference function, and therefore the fuzzy membership function and the fuzzy membership value for the complement of a fuzzy set are not the same. Based on this concept, Das (2012) modified the design of Park and Park (2010) and overcame its limitations by visualizing the complement of a fuzzy set correctly.
Derring and Ostaszewski (1995) have explained a method of pattern recognition for risk and claim classification. Bezdek (1981) has discussed in his fuzzy c-means technique that the data to be analyzed must be in the form of numerical vectors, called feature vectors, and that the number of clusters must be predefined in order to obtain the membership values of the feature vectors. Das (2013) tried the fuzzy c-means algorithm of Bezdek with three different distances, namely the Euclidean, Canberra and Hamming distances. Of the three, the algorithm was fastest and gave results closest to expectation with the Euclidean distance, and was slowest and gave results furthest from expectation with the Canberra distance.
In the present work, we shall try to see the membership values, full or partial, of the level of pollution due to vehicular emissions of a particular city in one or more predefined clusters of pollution. For this purpose, we shall arbitrarily choose fifty (50) cities (Table 3) and six (06) predefined clusters of pollution (Table 2). Of the different pollutants emitted by vehicles, we shall consider Carbon Monoxide (CO), Ozone (O3) and Sulfur Dioxide (SO2), which have the most adverse effects on the environment (Table 1). Further, because these three pollutants are considered together, the data used to represent vehicular pollution are in the form of numerical vectors. In our work, we shall apply the fuzzy c-means technique of Bezdek (1981) to a dataset representing vehicular pollution (Table 3) to obtain the membership values of the pollution of a city in one or more of the six predefined clusters.
In Section 2, we shall explain the fuzzy c-means algorithm of Bezdek. The application of this algorithm to vehicular pollution is shown in Section 3. In Section 4, we shall show the findings and analysis of our work. Section 5 consists of the conclusions.

Bezdek's fuzzy c-means algorithm
The fuzzy c-means algorithm due to Bezdek, which we have applied to a dataset of vehicular pollution, is explained through the following steps.
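Step 1: Choose the number of clusters, c, 2 ≤ c < n, where n is the total number of feature vectors. Choose m, 1 ≤ m < ∞. Define the vector norm || . ||, generally taken as the Euclidean distance, i.e.
$$\|x_k - v_i\| = \sqrt{\sum_{j=1}^{p} (x_{kj} - v_{ij})^2},$$
where $x_{kj}$ is the j-th feature of the k-th feature vector, for k = 1, 2, ..., n; j = 1, 2, ..., p, and $v_{ij}$ is the j-th coordinate of the centre of the i-th cluster, for i = 1, 2, ..., c; j = 1, 2, ..., p; n, p and c being the total number of feature vectors, the number of features in each feature vector and the total number of clusters respectively.
Choose the initial fuzzy partition $U^{(0)} = [\mu_{ik}^{(0)}]$, 1 ≤ i ≤ c, 1 ≤ k ≤ n, where $\mu_{ik}^{(0)}$ is the initial membership value of the k-th feature vector in the i-th cluster. Choose a parameter ε > 0. This will help us to decide when exactly to stop the iteration. Set the iteration counting parameter l equal to 0.
Step 2: Compute the fuzzy cluster centres, which in the standard form of the algorithm are
$$v_i^{(l)} = \frac{\sum_{k=1}^{n} \left(\mu_{ik}^{(l)}\right)^m x_k}{\sum_{k=1}^{n} \left(\mu_{ik}^{(l)}\right)^m}, \quad i = 1, 2, \ldots, c.$$
Step 3: Compute the new partition matrix, i.e. membership matrix, $U^{(l+1)} = [\mu_{ik}^{(l+1)}]$, which in the standard form of the algorithm (for m > 1) is
$$\mu_{ik}^{(l+1)} = \left[ \sum_{r=1}^{c} \left( \frac{\|x_k - v_i^{(l)}\|}{\|x_k - v_r^{(l)}\|} \right)^{2/(m-1)} \right]^{-1},$$
and set l = l + 1.
Step 4: Compute $\Delta = \|U^{(l)} - U^{(l-1)}\| = \max_{i,k} |\mu_{ik}^{(l)} - \mu_{ik}^{(l-1)}|$. If ∆ > ε, repeat steps 2, 3 and 4. Otherwise, stop at some iteration count $l^*$. To make the result operational, a fifth step was introduced by Derring and Ostaszewski (1995).
Step 5: The final fuzzy matrix $U^{(l^*)}$ is structured for operational use by means of the normalized α-cut, for some 0 < α < 1. All membership values less than α are replaced with zero and the function is renormalized, so that the membership values of each feature vector again sum to one, to preserve the partition condition.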
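The five steps can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes the Euclidean norm, a random initial partition, m > 1, and the textbook update formulas for the centres and memberships. The function names `fuzzy_c_means` and `normalized_alpha_cut`, the default parameter values and the toy data are all hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Return (U, V): memberships of shape (c, n) and centres of shape (c, p)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not (2 <= c < n):
        raise ValueError("need 2 <= c < n")
    if m <= 1:
        raise ValueError("this sketch assumes m > 1")
    rng = np.random.default_rng(seed)
    # Step 1: random initial fuzzy partition; each column sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centres as membership-weighted means.
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 3: new memberships from Euclidean distances to the centres.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # shape (c, n)
        D = np.fmax(D, 1e-12)  # guard against division by zero
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 4: stop once the largest membership change is at most eps.
        delta = np.abs(U_new - U).max()
        U = U_new
        if delta <= eps:
            break
    return U, V

def normalized_alpha_cut(U, alpha):
    """Step 5: zero memberships below alpha, then renormalize each column."""
    U_cut = np.where(U < alpha, 0.0, U)
    totals = U_cut.sum(axis=0, keepdims=True)
    if (totals == 0).any():
        raise ValueError("alpha removes every membership of some feature vector")
    return U_cut / totals

if __name__ == "__main__":
    # Toy data (not from Table 3): two well-separated groups of points.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    U, V = fuzzy_c_means(X, c=2)
    print(np.round(normalized_alpha_cut(U, alpha=0.2), 3))
```

Each column of the returned matrix holds the membership values of one feature vector across the c clusters. A column with a single nonzero entry corresponds to full membership, and a column with several nonzero entries to partial membership.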

Application on vehicular pollution
In our present work we have chosen 50 cities to see the level of pollution of a particular city due to vehicular emissions. Although vehicular emissions contribute different pollutants to the environment, we have considered only three, Carbon Monoxide (CO), Ozone (O3) and Sulfur Dioxide (SO2), which have the most adverse effects on the environment (Table 1). Because all three pollutants are present, it would not be appropriate, unlike in the traditional techniques, to assign the level of pollution of a particular city to exactly one of the predefined clusters. Rather, it would be appropriate to see the membership values, full or partial, of the level of pollution of a particular city in one or more of the predefined clusters of pollution. Therefore the application of a fuzzy technique is more convenient here. For this purpose, we have divided the whole range of pollution into 6 clusters, namely: Good (C1), Moderate (C2), Unhealthy for Sensitive Group (C3), Unhealthy (C4), Very Unhealthy (C5) and Hazardous (C6) (Table 2).

Source: Guidelines for the Reporting of Daily Air Quality: the Air Quality Index (AQI), U.S. Environmental Protection Agency (EPA)
In Table 2, the ranges of the numerical values of each pollutant (which can also be thought of as a feature), i.e. Carbon Monoxide (CO), Ozone (O3) and Sulfur Dioxide (SO2), have been provided for the evaluations needed by Bezdek's algorithm, which we have applied to a dataset of vehicular pollution (Table 3) for the purpose of partitioning it into 6 predefined clusters: Good (C1), Moderate (C2), Unhealthy for Sensitive Group (C3), Unhealthy (C4), Very Unhealthy (C5) and Hazardous (C6). Bezdek (1981) requires in his fuzzy c-means technique that the data to be analyzed be in the form of numerical vectors, called feature vectors, and that the number of clusters be predefined in order to obtain the membership values of the feature vectors. In our present work, the data used to represent the pollution due to vehicular emissions of a city are in the form of a numerical vector. Mathematically, if X represents the pollution level of a city, then X = (x1, x2, x3), where x1, x2 and x3 are the amounts (in ppm) of Carbon Monoxide (CO), Ozone (O3) and Sulfur Dioxide (SO2) respectively present in X (Table 3).
As the dataset we have analyzed satisfies the prerequisite conditions of Bezdek, i.e. it is in the form of numerical vectors and the number of clusters has been predefined (6 in our case), the fuzzy c-means technique of Bezdek (1981) has been considered the most appropriate one to apply to the dataset of our present work.
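To make the data layout concrete, the following minimal sketch stacks one (CO, O3, SO2) triple per city into an n x 3 array and checks the two prerequisites stated above. The helper name `as_feature_matrix` is hypothetical, and the numbers are random placeholders standing in for the 50 cities; they are not the values of Table 3.

```python
import numpy as np

def as_feature_matrix(rows, c):
    """Stack X = (x1, x2, x3) per city into an (n, 3) array and check 2 <= c < n."""
    X = np.asarray(rows, dtype=float)
    if X.ndim != 2 or X.shape[1] != 3:
        raise ValueError("expected one (CO, O3, SO2) triple per city")
    if not np.isfinite(X).all():
        raise ValueError("all pollutant values must be finite numbers")
    n = X.shape[0]
    if not (2 <= c < n):
        raise ValueError(f"need 2 <= c < n, got c={c}, n={n}")
    return X

# Placeholder values only: random numbers, NOT measurements from Table 3.
rng = np.random.default_rng(0)
placeholder_rows = rng.uniform(0.0, 1.0, size=(50, 3))
X = as_feature_matrix(placeholder_rows, c=6)
print(X.shape)  # (50, 3)
```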

Findings and analysis
In the traditional way of designating the level of pollution of a particular region or city, only the pollutant whose value is the maximum among those present is considered; the intensity of pollution of the city is calculated from it, and the city is accordingly placed in exactly one of the predefined clusters. In this method the presence of the other pollutants is completely neglected. In our present work, we have observed that, of the three pollutants CO, O3 and SO2, the amount of CO is the maximum for every city (Table 3). Thus, following the traditional way, we have first considered only CO and have obtained the results given in Table 4. Here we have neglected the presence of the other two pollutants, i.e. O3 and SO2, and therefore every city belongs to exactly one of the 6 predefined clusters of pollution. A summarized form of the results in Table 4 has been given in Table 5. But the fact is that we cannot simply neglect the existence of the other two pollutants. Therefore we have applied the fuzzy c-means technique of Bezdek to our dataset (Table 3), considering all three pollutants, i.e. CO, O3 and SO2, and have tried to see the level of pollution of a city as the combined effect of these three pollutants. The results obtained from this combined effect are given in Table 6. Here it can be observed that not every city belongs to exactly one of the six predefined clusters of pollution. Instead, some cities belong to exactly one cluster while others belong to more than one. The membership value of the level of pollution of a city in a predefined cluster lies between 0 and 1, both inclusive, unlike in the traditional way, where this value is either 0 or 1. Summarized results are shown in Table 7. In Figure 1, we have provided a graphical representation of this result, symbolizing a full membership value by a diamond and a partial membership value by a square.
If we observe the third column of Table 7, we find that the cities with partial membership values lie in two consecutive clusters of pollution, i.e. such a city partially belongs to two clusters which appear successively in the hierarchy of pollution. For example, the city with city_ID 11 belongs to cluster C1 (i.e. Good) with membership value 0.4817 and to cluster C2 (i.e. Moderate) with membership value 0.5183. This means that the pollution level of the city with city_ID 11 is neither fully 'Good' nor fully 'Moderate'. It also indicates that, if adequate measures are taken, there is still approximately a 48% possibility that this city could belong to cluster C1 (which is 'Good'). Similar arguments can be given for the other cities appearing in the third column of Table 7. The city_IDs and the names of the two clusters to which each of these cities partially belongs are shown in Table 8. In Table 8, we have also provided the possibility of shifting the level of pollution of a city to the cluster with the lower average pollution of the two clusters to which that city partially belongs. A graphical representation of the same is given in Figure 2.

Figure 1. Partial and full membership values of the cities in one or more of the six (06) predefined clusters of pollution when all the pollutants are considered.

Table 2: Predefined clusters of pollution

Table 3: The dataset of pollution due to vehicular emission

Table 5: A city belonging to exactly one of the six predefined clusters