Available online at www.sciencedirect.com

SciVerse ScienceDirect

Procedia Technology 6 (2012) 307 - 314

2nd International Conference on Communication, Computing & Security

Scalable Rough-fuzzy Weighted Leader based Non-parametric methods for Large Data Sets

Suresh Veluru (a,*), P. Viswanath (b), Bidyut Kr. Patra (c), V. Jayachandra Naidu (d)

(a) School of Engineering and Mathematical Sciences, City University London, EC1V 0HB, U.K.
(b) Department of CSE, Rajeev Gandhi Memorial College of Eng. & Tech., Nandyal, Andhra Pradesh, India.
(c) Department of CSE, National Institute of Technology Rourkela, Orissa, India.
(d) Department of ECE, Sri Venkateswara College of Eng. & Tech., Chittoor, Andhra Pradesh, India.

Abstract

Popular non-parametric methods, such as the k-nearest neighbor classifier and density-based clustering methods like DBSCAN, perform well when data sets are large. However, finding the density at a point in the data set takes O(n) time, where n is the size of the data set, so these non-parametric methods are not scalable to large data sets. A two-level rough fuzzy weighted leader based classifier has been developed as a scalable and efficient method for classification. However, no generalized model exists for non-parametric density estimation that can be used for both density-based classification and clustering. This paper presents such a generalized model: a single-level rough fuzzy weighted leader clustering method that condenses the data set to reduce the computational burden, and uses the resulting rough-fuzzy weighted leaders to estimate the density at a point for classification and clustering. We show that the proposed rough fuzzy weighted leader based non-parametric methods are fast and efficient compared with related existing methods in terms of accuracy and computational time.

© 2012 Elsevier Ltd. Selection and/or peer-review under responsibility of the Department of Computer Science & Engineering, National Institute of Technology Rourkela

Keywords: k-nearest neighbor classifier; DBSCAN; Rough-fuzzy weighted leaders clustering; non-parametric methods; classification

1. Introduction

The nearest neighbor classifier (NNC) and Parzen-window based density estimation (Duda, Hart & Stork 2000) are popular non-parametric methods. They are more general than parametric methods because they do not assume a parametric form for the distribution from which the data set is generated, and in practice they perform well on large data sets. These methods, either explicitly or implicitly, estimate the probability density at a given point in the feature space by counting the number of points that fall in a small region around that point. Popular classifiers such as NNC and its variant, the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork 2000), use this approach. The asymptotic error rate of NNC is at most twice the Bayes error (Cover & Hart 1967). DBSCAN (Density-Based Spatial Clustering of Applications with Noise) (Ester, Kriegel & Xu 1996), a popular density-based clustering method (Han & Kamber 2000) that finds arbitrarily shaped clusters along with noisy outliers, also uses this approach.

* Corresponding author. Tel.: +44-75773-06019. E-mail address: Suresh.Veluru.1@city.ac.uk

2212-0173 © 2012 Elsevier Ltd. doi: 10.1016/j.protcy.2012.10.037

Non-parametric methods require large data sets to produce accurate results, and such data sets are now readily available in areas such as data mining. The other side of the issue is that, when the data set is large, non-parametric methods suffer from a huge computational burden and are often infeasible to use. The space complexity and the per-pattern classification time complexity are O(n) for both NNC and k-NNC, while DBSCAN has O(n) space complexity and O(n^2) time complexity. Hence, these methods are not scalable to large data sets. To reduce both space and time complexities, several techniques have been developed (Wilson & Martinez 2000), including training set reduction, training set condensation, reference set thinning, and prototype selection methods. These methods find a set of representative patterns, either a subset of the data or a new set of patterns, such that noisy, redundant, or superfluous patterns are eliminated from the given data set (Angiulli 2007). However, these methods are costly because they require evaluating the classifier's performance at each iteration of the elimination process.

Another way of tackling the computational burden is to partition the data set with a fast clustering method and choose a representative for each cluster; one can then work with this set of representatives instead of the whole data set. For example, the leaders-subleaders method (Vijaya, Murty & Subramanian 2004), the l-DBSCAN method (Viswanath & Pinkesh 2006), the counted leaders method (Babu, Viswanath & Murty 2008), and the weighted leaders method (Babu & Viswanath 2007) partition the data set using the leaders clustering method and use the derived leaders as representative patterns of the large data set. The leaders-subleaders method does not preserve density information, which in turn degrades the performance of the classifier, whereas l-DBSCAN, counted leaders, and weighted leaders preserve density information in the form of a weight or count, which improves performance. The count or weight associated with each leader can be used to estimate the probability density at a point in the feature space.
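As an illustration of the leaders idea, the following minimal Python sketch performs a single-scan leaders clustering that keeps a follower count per leader, in the spirit of the counted leaders method. The function name counted_leaders, the threshold name tau, the use of Euclidean distance, and the rule that a pattern joins the first leader within tau are illustrative assumptions rather than the exact procedure of the cited works.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def counted_leaders(data: Sequence[Point], tau: float) -> List[Tuple[Point, int]]:
    """Single-scan leaders clustering that keeps a follower count per leader.

    Each pattern joins the first existing leader (in creation order) whose
    Euclidean distance is below tau; otherwise it becomes a new leader.
    """
    leaders: List[Point] = []
    counts: List[int] = []
    for x in data:
        for idx, leader in enumerate(leaders):
            if math.dist(x, leader) < tau:
                counts[idx] += 1  # x is a follower of this leader
                break
        else:  # no leader within tau: x starts a new cluster
            leaders.append(x)
            counts.append(1)
    return list(zip(leaders, counts))


# The counts sum to n, so count / n estimates the probability mass near a leader.
patterns = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
print(counted_leaders(patterns, tau=0.5))  # [((0.0, 0.0), 3), ((5.0, 5.0), 2)]
```

Only leaders and counts are kept, so later density queries touch |L| representatives instead of all n patterns.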

Uncertainty and vagueness in the data set in turn affect performance. Rough set theory (Pawlak 1982) and fuzzy set theory (Zadeh 1965) are well-known mathematical theories for capturing the uncertainty associated with data, and their combined principles are used in many pattern recognition techniques. For example, Asharaf and Murty developed an adaptive rough fuzzy single pass algorithm for clustering large data sets (Asharaf & Murty 2003) to resolve the uncertainty in the leaders clustering method. Maji and Pal introduced the rough-fuzzy c-medoids algorithm and applied it to the selection of bio-basis for amino acid sequence analysis (Maji & Pal 2007). More recently, Babu and Viswanath developed the rough-fuzzy weighted k-nearest leader classifier for large data sets (Babu & Viswanath 2009), a generalized classification method that overcomes the limitations of the leaders-subleaders method of Vijaya et al. (Vijaya, Murty & Subramanian 2004).

This paper presents a generalized model that proposes a rough fuzzy weighted leaders clustering method, which resolves the uncertainty present in the leaders clustering method. It performs rough fuzzy leaders clustering on a large data set to condense it, and uses scaled rough-fuzzy membership values to calculate the weights of the leaders. The proposed rough-fuzzy weighted leaders clustering method computes rough-fuzzy membership values efficiently and produces clusters in a single scan of the data. The rough fuzzy weighted leaders are then used to estimate density. Finally, classification and clustering are performed on the rough fuzzy leaders set, and the proposed methods are shown to be fast compared with existing related methods.

This paper is organized as follows. Section 2 explains the preliminary background. Section 3 describes the proposed rough fuzzy weighted leaders method. Section 4 develops scalable non-parametric methods based on the rough fuzzy weighted leaders clustering method. Section 5 presents experimental results in comparison with existing related methods. Finally, Section 6 gives concluding remarks.

2. Background

This section describes the background of non-parametric methods. Non-parametric methods, either explicitly or implicitly, estimate an arbitrary density function from the data set and perform classification and clustering using the estimated density. Prominent non-parametric classifiers are NNC and k-NNC, whereas DBSCAN is a popular non-parametric density-based clustering method.

Suppose there are c classes, ω_1, ω_2, ..., ω_c. The training set for class ω_i is D_i, for i = 1 to c, and the total training set is D = D_1 ∪ D_2 ∪ ... ∪ D_c. The k-NNC works as follows. Given a test pattern t, it finds the k nearest patterns to t in the training set D and assigns to t the most frequent class label among these k patterns. If more than one class is most frequent among the k patterns, any one of the most frequent class labels is assigned to t. Theoretically, the approximate posterior probability at t for class ω_i is

$$\hat{P}(\omega_i \mid t) = \frac{m_i}{nV} \tag{1}$$

where m_i is the number of patterns in the region R around t that belong to class ω_i, n is the total number of training patterns, and V is the volume of R. Asymptotically, as n → ∞, m_i → ∞, m_i/n → 0, and V → 0, it can be shown that P̂(ω_i | t) → P(ω_i | t) (Duda, Hart & Stork 2000), where P(ω_i | t) is the posterior probability of class ω_i at t. Since n and V in equation (1) are the same for every class, the class with the largest m_i has the largest estimated posterior probability; the k-NNC therefore estimates m_i from the k nearest patterns when the data set is large and assigns the class label with the maximum estimated posterior probability. Assuming k is small, the time complexity to classify a test pattern is O(n).
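The k-NNC rule and the role of m_i in equation (1) can be sketched in Python as follows. This is a brute-force sketch assuming Euclidean distance; the function name knn_classify and the first-occurrence tie-breaking rule are hypothetical choices.

```python
import heapq
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Labeled = Tuple[Sequence[float], Hashable]


def knn_classify(train: Sequence[Labeled], t: Sequence[float], k: int) -> Hashable:
    """Brute-force k-NNC: one distance per training pattern, i.e. O(n) for small k."""
    # The k training patterns closest to t define the region R around t.
    nearest = heapq.nsmallest(k, train, key=lambda item: math.dist(item[0], t))
    # m_i = number of patterns in R that belong to class omega_i.
    m = Counter(label for _, label in nearest)
    # n and V are shared by all classes, so the largest m_i gives the largest
    # estimated posterior. most_common breaks ties by first occurrence, which is
    # one valid "any of the most frequent labels" choice.
    return m.most_common(1)[0][0]


train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
print(knn_classify(train, (0.2, 0.3), k=3))  # a
```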

Density-based clustering methods such as DBSCAN group data points that are dense and connected into a single cluster. The density at a point is found non-parametrically: the probability density over a small region is assumed to be uniform, and the density is given by m/(nV), where m is the number of the n input data points that fall in a small region around the point and V is the volume of that region. The region is assumed to be a hypersphere of radius ε, so a threshold density can be specified by a parameter MinPts, the minimum number of points that must be present in the region to make the point dense. Given an input data set D and the parameters ε and MinPts, DBSCAN finds a dense point in D and expands it by merging neighboring dense points. Patterns that do not belong to any cluster are called noisy patterns. A non-dense point can be part of a cluster if it is within distance ε of a dense pattern; otherwise it is a noisy outlier (Viswanath & Pinkesh 2006). The time complexity of DBSCAN is O(n^2).
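A plain DBSCAN, as summarized above, might look like the following Python sketch. It assumes Euclidean distance, counts a point as its own neighbor, and uses brute-force neighborhood queries, which is where the O(n^2) cost comes from; the names dbscan and NOISE are hypothetical.

```python
import math
from typing import List, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN with brute-force neighbourhood queries (O(n^2) distances).

    Returns a cluster id (0, 1, ...) per point, or NOISE for noisy outliers.
    """
    n = len(points)
    # eps-neighbourhood of every point (the point itself is included).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A point is dense when at least MinPts points fall in its eps-ball, i.e. when
    # the estimated density m / (nV) reaches the threshold MinPts / (nV).
    dense = [len(nb) >= min_pts for nb in neighbours]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not dense[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster  # start a new cluster at an unlabelled dense point
        stack = [i]
        while stack:
            p = stack.pop()  # p is always dense here
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster  # dense or border point joins the cluster
                    if dense[q]:
                        stack.append(q)  # only dense points expand the cluster
        cluster += 1
    return labels


pts = [(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
print(dbscan(pts, eps=1.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```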

3. Rough fuzzy weighted leaders clustering method

This section describes the proposed rough fuzzy weighted leaders clustering method. We refine the leaders method by applying rough fuzzy principles when assigning patterns to the prototypes (i.e., leaders). A user-defined upper threshold (U_T) and lower threshold (L_T), with L_T < U_T, define the upper and lower approximations of the prototypes. Patterns are assigned to the prototypes as follows.

• If pattern x falls within the lower threshold of a prototype, it is assigned to that prototype without any ambiguity. If there is more than one such prototype, x is assigned to any one of them.

• If there is no such prototype (x is not within the lower threshold of any prototype), the method checks whether x is within the upper threshold of one or more prototypes; if so, x is assigned to each of these prototypes with a rough-fuzzy membership value computed from the proximity of x to these prototypes.

• If x is not within the upper threshold of a prototype, it does not belong to that prototype.

The rough-fuzzy membership values of a pattern that lies in the boundary of one or more prototypes (when there is more than one, we call them overlapping prototypes) are calculated from its proximity to the overlapping prototypes. Suppose there are r overlapping prototypes for a pattern x, say l(1), l(2), ..., l(r). Then the rough-fuzzy membership of x to each of these prototypes is

$$v_j = \frac{1}{\sum_{p=1}^{r} \left( d_j / d_p \right)^{2/(m-1)}}, \qquad j = 1, \ldots, r, \tag{2}$$

where

$$d_j = \lVert x - l(j) \rVert, \qquad j = 1, \ldots, r, \tag{3}$$

and m > 1 is the fuzzifier. From equation (2) we have

$$\sum_{p=1}^{r} v_p = 1. \tag{4}$$
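The membership computation can be sketched in Python as follows, assuming the fuzzy c-means-style form v_j = 1 / Σ_p (d_j/d_p)^(2/(m-1)) with d_j = ||x - l(j)|| and a fuzzifier m > 1 (default 2, a hypothetical choice). Under this assumption the memberships of x over its r overlapping prototypes always sum to 1.

```python
import math
from typing import List, Sequence


def rough_fuzzy_memberships(
    x: Sequence[float], overlapping: Sequence[Sequence[float]], m: float = 2.0
) -> List[float]:
    """Assumed fuzzy-c-means-style memberships of x to r overlapping prototypes.

    Every distance is positive for a boundary pattern, because such a pattern is
    not within the lower threshold L_T > 0 of any prototype.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    d = [math.dist(x, proto) for proto in overlapping]  # d_j = ||x - l(j)||
    expo = 2.0 / (m - 1.0)
    # v_j = 1 / sum_p (d_j / d_p)^expo: the closest prototype gets the largest value.
    return [1.0 / sum((dj / dp) ** expo for dp in d) for dj in d]


v = rough_fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)])
print(v)       # approximately [0.8, 0.2]
print(sum(v))  # 1.0 up to rounding
```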

An example of these kinds of assignments is given in Figure 1. Pattern x1 belongs to the lower threshold of prototype l(1), so x1 is assigned to l(1) only. However, x2 lies in the boundary of three prototypes, viz., l(1), l(2), and l(3), so x2 is assigned to each of these prototypes with a rough-fuzzy membership value. Finally, x3 is not within the upper threshold of l(1), so x3 is not assigned to l(1).

[Fig. 1. Rough fuzzy assignment of patterns to the prototypes. Legend: patterns, leaders, lower threshold, upper threshold.]

To preserve density information, a modified leaders clustering method based on rough-fuzzy set theory, called the Rough-Fuzzy Weighted-Leaders method, is described in Algorithm 1. The method uses a pair of thresholds, an upper threshold U_T and a lower threshold L_T, which define the upper and lower approximations of the prototypes. The proposed method sets L_T = U_T/2 for two reasons: (i) the method is then specified by a single parameter, U_T, and (ii) the lower-threshold regions of two distinct leaders do not intersect in the feature space, because a new leader is created only when no existing leader lies within U_T of it, so any two leaders are at least U_T = 2L_T apart. It is an incremental clustering method that requires a single scan of the data set and has O(n) time complexity, where n is the data set size, when the number of leaders is small compared with n. For a given upper threshold U_T, the rough-fuzzy weighted leaders clustering method is as follows.

The method maintains a set of leaders L, which is initially empty. For each pattern x ∈ D, if there is a leader l ∈ L such that the distance between x and l is less than L_T, then x is assigned to the lower approximation of l and the weight of l is updated as weight(l) = weight(l) + 1. If no such leader exists, x is compared with the p leaders whose distance from x is less than U_T; if p > 0, x is assigned to each of these p leaders with a rough-fuzzy membership value, as described above. If x is assigned to leader l with rough-fuzzy membership value v, the weight of l is updated as weight(l) = weight(l) + v. If p = 0, x itself becomes a new leader whose weight is initialized to 1. As with other variants of the leaders method, the leaders and their weights depend on the order in which the data set is scanned.
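The single-scan procedure described above (Algorithm 1) can be sketched in Python as follows. This sketch assumes Euclidean distance and the fuzzy c-means-style membership form v_j = 1 / Σ_p (d_j/d_p)^(2/(m-1)) with m = 2 by default; the function names are hypothetical. Each step compares x with every current leader, so the cost is O(n|L|), which is effectively linear in n when |L| is small.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(x: Point, protos: Sequence[Point], m: float = 2.0) -> List[float]:
    """Assumed fuzzy-c-means-style memberships of x; they sum to 1 over protos."""
    d = [math.dist(x, proto) for proto in protos]
    expo = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dp) ** expo for dp in d) for dj in d]


def rough_fuzzy_weighted_leaders(
    data: Sequence[Point], upper: float, m: float = 2.0
) -> List[Tuple[Point, float]]:
    """One scan over data; returns (leader, weight) pairs."""
    lower = upper / 2.0  # L_T = U_T / 2
    leaders: List[Point] = []
    weights: List[float] = []
    for x in data:
        dists = [math.dist(x, leader) for leader in leaders]
        # Case 1: x lies within the lower threshold of some leader (take the first).
        inside = next((i for i, d in enumerate(dists) if d < lower), None)
        if inside is not None:
            weights[inside] += 1.0
            continue
        # Case 2: x lies in the boundary of the p leaders within the upper threshold.
        boundary = [i for i, d in enumerate(dists) if d < upper]
        if boundary:
            v = memberships(x, [leaders[i] for i in boundary], m)
            for i, vi in zip(boundary, v):
                weights[i] += vi
        else:
            # Case 3: p = 0, so x becomes a new leader with weight 1.
            leaders.append(x)
            weights.append(1.0)
    return list(zip(leaders, weights))


data = [(0.0, 0.0), (0.2, 0.0), (1.5, 0.0), (0.75, 0.0)]
print(rough_fuzzy_weighted_leaders(data, upper=1.0))
# [((0.0, 0.0), 2.5), ((1.5, 0.0), 1.5)]
```

Because every pattern adds a total of exactly 1 (to one leader, split across boundary leaders with memberships summing to 1, or as a new leader), the weights in this sketch always sum to n.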

4. Scalable non-parametric methods for large data sets

This section describes density-based classification and clustering methods that use the rough fuzzy weighted leaders clustering method.

4.1. Rough fuzzy weighted k-nearest leader classifier

This subsection describes the scalable classifier. Let L_i be the set of rough fuzzy weighted leaders obtained from the patterns of class ω_i, for i = 1 to c, and let L be the set of all leaders, that is, L = L_1 ∪ ... ∪ L_c. For a given query pattern q, the k nearest leaders of q in L are obtained. For each class, the cumulative weight of the leaders of that class among these k leaders is found; let this be W_i for class ω_i, for i = 1 to c. The classifier chooses the class according to argmax_{ω_i} {W_1, ..., W_c}.

Since the weight of a leader approximates the number of patterns it represents, W_i ≈ m_i in equation (1), and hence W_i is approximately proportional to the estimated posterior probability P̂(ω_i | q). The rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC) is given in Algorithm 2. From the above argument, the scalable classifier approximates the k-nearest neighbor classifier. The space and time complexities of RF-wk-NLC are O(|L|), where |L| is the number of all leaders.

Algorithm 1 Rough-Fuzzy-Weighted-Leaders(D, U_T)
  L = ∅; L_T = U_T/2;
  for each x ∈ D do
    if there is a leader l ∈ L such that ||l - x|| < L_T then
      weight(l) = weight(l) + 1;
    else
      Find the set P = {l | l ∈ L, ||l - x|| < U_T};
      if P ≠ ∅ then
        for each l ∈ P do
          {Let v be the rough-fuzzy membership value of x to l, computed using the rough-fuzzy set theory described in Section 3}
          weight(l) = weight(l) + v;
        end for
      else
        L = L ∪ {x}; weight(x) = 1;
      end if
    end if
  end for
  Output L as a set of tuples <l, weight(l)>, where l is a leader and weight(l) is its weight.

Algorithm 2 Rough-Fuzzy-Weighted-k-Nearest-Leader(L, q)
  {L is the set of all leaders derived from all classes; q is the query pattern to be classified}
  Find the k nearest leaders of q in L.
  Among the k nearest leaders, find the cumulative weight of the leaders that belong to each class; let this be W_i for class ω_i, for i = 1 to c.
  Class label assigned to q = argmax_{ω_i} {W_1, ..., W_c}.
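Algorithm 2 can be sketched in Python as below. The sketch assumes each leader is stored as a (leader vector, weight, class label) triple, for example produced by running Algorithm 1 separately on the patterns of each class, and assumes Euclidean distance; the function name rf_wk_nlc and the tie-breaking behavior of max are hypothetical choices.

```python
import heapq
import math
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

WeightedLeader = Tuple[Sequence[float], float, Hashable]  # (leader, weight, class)


def rf_wk_nlc(leaders: Sequence[WeightedLeader], q: Sequence[float], k: int) -> Hashable:
    """Assign q the class whose leaders carry the most weight among its k nearest."""
    # One distance per leader: O(|L|) for small k.
    nearest = heapq.nsmallest(k, leaders, key=lambda item: math.dist(item[0], q))
    cumulative: DefaultDict[Hashable, float] = defaultdict(float)
    for _, weight, label in nearest:
        cumulative[label] += weight  # W_i for class omega_i
    return max(cumulative, key=cumulative.__getitem__)  # argmax_i W_i


leaders = [((0.0, 0.0), 3.0, "a"), ((1.0, 0.0), 0.5, "b"), ((1.2, 0.0), 0.6, "b")]
print(rf_wk_nlc(leaders, (0.9, 0.0), k=3))  # a  (W_a = 3.0 > W_b = 1.1)
```

In this example two of the three nearest leaders belong to class "b", yet the weights make "a" the winner, which is how the cumulative weights stand in for the pattern counts m_i.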

4.2. Rough fuzzy weighted leader based DBSCAN

This subsection describes the scalable clustering method called rough fuzzy weighted leader based DBSCAN (RF-wl-DBSCAN). Let L be the set of leaders derived by applying the rough fuzzy weighted leaders clustering method given in Algorithm 1 to a large data set. Each leader is associated with a rough-fuzzy weight, and these weights are used to obtain a better density estimate. DBSCAN is applied to the set of leaders L as follows.

For each leader l ∈ L, find its ε-neighborhood L_l = {l_j ∈ L | ||l_j - l|| < ε}, where ε is the radius of a hypersphere around l. The cumulative weight of the leaders in L_l is weight(L_l) = Σ_{l' ∈ L_l} weight(l'). This cumulative weight is compared with MinPts, the minimum number of points required in the region of radius ε, to decide whether l is dense: l is dense if weight(L_l) ≥ MinPts. The scalable clustering method applies DBSCAN to the leaders set L: it finds a dense leader and expands it by merging neighboring dense leaders, thereby grouping all leaders that are dense and nearby (within distance ε) into one group. A non-dense leader that is within distance ε of some dense leader is added to that dense leader's group; a non-dense leader that is not near any dense leader is a noisy leader. The method outputs the clustering of the leaders and propagates the cluster label of each leader to its followers. If a leader is a noisy leader, then all its followers are noisy.

Table 1. Details of data sets

Classification data sets
Data set    Number of features    Number of classes    Training patterns    Test patterns
Synthetic   2                     2                    300000               100000
Seismic     50                    3                    78823                19705

Clustering data sets
Data set    Number of features    Number of patterns
Shuttle     9                     58000
Letter      16                    20000

Because it works only with the leaders, the scalable method is an approximation of DBSCAN with a reduced computational burden. Its computational complexity is O(n + |L|^2), where n is the number of patterns in the data set and |L| is the number of leaders.
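RF-wl-DBSCAN can be sketched in Python as follows: DBSCAN is run over the leaders, with density tested on the cumulative weight weight(L_l) instead of a raw point count, and leader labels are then copied to followers. The sketch assumes Euclidean distance, and it assumes every pattern is mapped to a single leader index (for instance the leader that absorbed it, or the one with the largest membership); that mapping and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NOISE = -1


def rf_wl_dbscan(
    leaders: Sequence[Tuple[Sequence[float], float]], eps: float, min_pts: float
) -> List[int]:
    """DBSCAN over weighted leaders: O(|L|^2) distance computations."""
    pts = [leader for leader, _ in leaders]
    n = len(pts)
    # L_l: indices of leaders within eps of leader l (l itself included).
    nbrs = [[j for j in range(n) if math.dist(pts[i], pts[j]) < eps] for i in range(n)]
    # Leader l is dense when weight(L_l), the summed weight in L_l, reaches MinPts.
    dense = [sum(leaders[j][1] for j in nb) >= min_pts for nb in nbrs]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not dense[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster  # new group seeded at an unlabelled dense leader
        stack = [i]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster  # dense or nearby non-dense leader joins
                    if dense[q]:
                        stack.append(q)  # only dense leaders expand the group
        cluster += 1
    return labels


def expand_to_followers(follower_leader: Sequence[int], leader_labels: Sequence[int]) -> List[int]:
    """Each pattern inherits its leader's label; followers of noisy leaders stay noisy."""
    return [leader_labels[i] for i in follower_leader]


weighted = [((0.0, 0.0), 5.0), ((0.5, 0.0), 1.0), ((10.0, 10.0), 1.0)]
leader_labels = rf_wl_dbscan(weighted, eps=1.0, min_pts=4.0)
print(leader_labels)                                     # [0, 0, -1]
print(expand_to_followers([0, 0, 1, 2], leader_labels))  # [0, 0, 0, -1]
```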

5. Experimental Results

Experimental studies are conducted on several data sets: one synthetic data set and one standard data set for classification, and two standard data sets for clustering. The details of the data sets are given in Table 1. The standard data sets are available at http://www.ics.uci.edu/mlearn/MLRepository.html.

5.1. Experimental results for RF-wk-NLC

Experimental studies of RF-wk-NLC are conducted with one synthetic data set and one standard data set, viz., SensIT vehicle (seismic).

Two-dimensional synthetic data for a two-class problem are generated as follows. The first class has 200000 patterns drawn i.i.d. from a normal distribution with mean (0, 0)^T and covariance matrix I_{2×2} (i.e., the identity matrix). The second class also has 200000 patterns, drawn i.i.d. from a normal distribution with mean (2.56, 0)^T and covariance matrix I_{2×2}. The Bayes error rate for this synthetic data set is 10%. The data set is divided randomly into two parts of 300000 and 100000 patterns, which are used as the training and test sets, respectively.
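The synthetic data and its stated Bayes error can be reproduced approximately with the Python sketch below. For two equiprobable Gaussian classes with identity covariance, the Bayes error is Φ(-d/2), where d is the distance between the means; with d = 2.56 this is Φ(-1.28) ≈ 0.1003, consistent with the 10% quoted above. The function names and the random seed are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], int]


def make_synthetic(n_per_class: int, seed: int = 0) -> List[Sample]:
    """Two classes drawn i.i.d. from N((0,0)^T, I) and N((2.56,0)^T, I), shuffled."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for label, mean_x in ((0, 0.0), (1, 2.56)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(mean_x, 1.0), rng.gauss(0.0, 1.0)), label))
    rng.shuffle(data)  # random order, so any prefix/suffix split is a random split
    return data


def bayes_error(mean_distance: float) -> float:
    """Phi(-d/2) for two equiprobable Gaussians with identity covariance."""
    return 0.5 * math.erfc(mean_distance / (2.0 * math.sqrt(2.0)))


print(round(bayes_error(2.56), 4))  # 0.1003
```

For the setting above, make_synthetic(200000) yields 400000 shuffled patterns, and taking the first 300000 and the remaining 100000 gives a random training/test split of the stated sizes.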

The classifiers chosen for the comparative study are: (1) the nearest neighbor classifier (NNC), (2) the k-nearest neighbor classifier (k-NNC), (3) the nearest leader classifier (NLC), (4) the weighted k-nearest leader classifier (wk-NLC), (5) the adaptive rough fuzzy nearest leader classifier (ARFNLC), and (6) the rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC) proposed in this paper.

The experiments are conducted for various values of the leaders' upper threshold U_T. For the Synthetic data set the U_T values chosen are {0.6, 0.5, 0.4, 0.3}, and for the SensIT vehicle (seismic) data set they are {1.2, 1.0, 0.9, 0.8}. The number of leaders, the k value, the design time (time taken to generate the leaders), the classification time, and the accuracy of the related classifiers for the various thresholds are given in Tables 2 and 3 for the Synthetic and SensIT vehicle (seismic) data sets, respectively. Tables 2 and 3 show that the design time of the proposed rough fuzzy weighted k-nearest leader classifier is much lower than that of the adaptive rough fuzzy nearest leader classifier (a few seconds versus hundreds to thousands of seconds), and that the proposed classifier achieves better accuracy.

Table 2. Synthetic data set results

Classifier  Threshold  Number of prototypes  k   Design time (s)  Classification time (s)  Accuracy (%)
NNC         -          -                     1   -                6543                     85.17
k-NNC       -          -                     74  -                7390                     89.61
NLC         0.6        221                   1   0.72             1.52                     82.16
NLC         0.5        306                   1   0.82             2.1                      83.622
NLC         0.4        455                   1   1.04             3.05                     84.954
NLC         0.3        755                   1   1.49             4.93                     85.789
wk-NLC      0.6        221                   25  0.72             7.65                     88.32
wk-NLC      0.5        306                   25  0.82             12.9                     88.63
wk-NLC      0.4        455                   25  1.04             24.59                    88.85
wk-NLC      0.3        755                   25  1.49             41.4                     89.05
ARFNLC      0.6 / 0.5  196 / 25              1   666.97           2.9                      81.06
ARFNLC      0.5 / 0.4  280 / 24              1   1306.67          20.7                     82.043
ARFNLC      0.4 / 0.3  467 / 120             1   3615.46          38.33                    80.777
ARFNLC      0.3 / 0.2  945 / 113             1   1382.32          72.44                    80.143
RF-wk-NLC   0.6        221                   25  2.16             7.65                     89.54
RF-wk-NLC   0.5        306                   25  2.75             12.9                     89.79
RF-wk-NLC   0.4        455                   25  3.85             24.59                    89.85
RF-wk-NLC   0.3        755                   25  6.10             41.4                     89.89

For ARFNLC, the Threshold column lists U_T / L_T and the Number of prototypes column lists #L / #S.

Table 3. SensIT vehicle (seismic) data set results

Classifier  Threshold  Number of prototypes  k   Design time (s)  Classification time (s)  Accuracy (%)
NNC         -          -                     1   -                12924                    65.41
k-NNC       -          -                     30  -                17150                    73.21
NLC         1.2        271                   1   6.22             3.64                     38.38
NLC         1.0        600                   1   6.36             6.49                     45.67
NLC         0.9        [illegible]           1   6.64             9.21                     49.84
NLC         0.8        1490                  1   7.22             14.73                    57.50
wk-NLC      1.2        271                   15  6.22             6.02                     66.82
wk-NLC      1.0        600                   16  6.36             11.29                    66.98
wk-NLC      0.9        [illegible]           20  6.64             18.26                    67.48
wk-NLC      0.8        1490                  20  7.22             27.53                    67.52
ARFNLC      1.2 / 1.0  196 / 25              1   666.97           2.9                      33.65
ARFNLC      1.0 / 0.8  280 / 24              1   1306.67          20.7                     40.47
ARFNLC      0.9 / 0.7  467 / 120             1   3615.46          38.33                    45.88
ARFNLC      0.8 / 0.6  945 / 113             1   1382.32          72.44                    50.59
RF-wk-NLC   1.2        271                   15  6.22             6.02                     67.81
RF-wk-NLC   1.0        600                   16  6.36             11.29                    68.71
RF-wk-NLC   0.9        [illegible]           20  6.64             18.26                    69.26
RF-wk-NLC   0.8        1490                  20  7.22             27.53                    69.79

For ARFNLC, the Threshold column lists U_T / L_T and the Number of prototypes column lists #L / #S.

5.2. Experimental Results for RF-wl-DBSCAN

Experimental studies are done with the following objectives: (i) to compare the clustering result obtained by RF-wl-DBSCAN with those of DBSCAN and l-DBSCAN, and (ii) to compare the time taken by DBSCAN, l-DBSCAN, and RF-wl-DBSCAN. The clustering results of l-DBSCAN and of RF-wl-DBSCAN are each compared with the clustering result of DBSCAN using the Rand Index (Rand 1971; Hubert & Arabie 1985) as the similarity measure. The Rand Index takes values between 0 and 1, where 0 indicates that the two partitions do not agree on any pair of patterns and 1 indicates that the two partitions are exactly the same.

Table 4. Shuttle data set

            Rand Index                                      Time (s)
Threshold   DBSCAN vs l-DBSCAN   DBSCAN vs RF-wl-DBSCAN     l-DBSCAN   RF-wl-DBSCAN   DBSCAN
0.04        0.851                0.880                      8          10             947
0.03        0.970                0.990                      16         17             947
0.02        0.983                0.995                      24         25             947
0.01        0.988                0.999                      54         55             947

Experimental studies are done with two large data sets, viz., Shuttle and Letter. Tables 4 and 5 show the experimental results for the Shuttle and Letter data sets, respectively, giving Rand Index values and computational time for different threshold values. The parameters ε and MinPts are set to the same values for all three methods (DBSCAN, l-DBSCAN, and RF-wl-DBSCAN). At every threshold, RF-wl-DBSCAN agrees more closely with DBSCAN than l-DBSCAN does, because it uses rough fuzzy weighted leaders, while the computational times of RF-wl-DBSCAN and l-DBSCAN are almost the same. For threshold 0.01 on the Shuttle data set, the Rand Index of RF-wl-DBSCAN is 0.999, i.e., its clustering is almost identical to that of DBSCAN, yet it takes only about 6% of DBSCAN's time (55 s versus 947 s). For threshold 0.4 on the Letter data set, the Rand Index of RF-wl-DBSCAN is 0.985, i.e., close to DBSCAN, yet it takes only about 23% of DBSCAN's time (52 s versus 225 s).

Table 5. Letter data set

            Rand Index                                      Time (s)
Threshold   DBSCAN vs l-DBSCAN   DBSCAN vs RF-wl-DBSCAN     l-DBSCAN   RF-wl-DBSCAN   DBSCAN
0.7         0.760                0.810                      12         14             225
0.6         0.850                0.890                      23         25             225
0.5         0.915                0.945                      31         33             225
0.4         0.960                0.985                      51         52             225
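The Rand Index reported in Tables 4 and 5 can be computed as in the Python sketch below, which counts the pattern pairs on which two labelings agree. It is an O(n^2) illustration only, and it assumes that a noise label such as -1 is treated as one more cluster label; the function name rand_index is hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of pattern pairs on which two labelings of the same patterns agree.

    A pair agrees if both labelings put it in the same cluster, or both put it in
    different clusters. Noise labels (e.g. -1) are treated as ordinary labels here.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same patterns")
    n = len(a)
    total = n * (n - 1) // 2
    if total == 0:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in combinations(range(n), 2))
    return agree / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, renamed labels)
print(rand_index([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.0 (no pair treated alike)
```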

6. Conclusion

This paper presented scalable rough fuzzy weighted leader based non-parametric methods for large data sets. It proposed an efficient rough fuzzy weighted leaders clustering method that resolves uncertainty and uses rough fuzzy membership values for better density estimation, together with two non-parametric methods, one for classification and one for clustering, that use the proposed rough fuzzy weighted leaders clustering method. Experimental studies show that the proposed non-parametric methods perform better than the existing related methods and can be applied to large data sets.

References

Angiulli, Fabrizio. 2007. "Fast Nearest Neighbor Condensation for Large Data Sets Classification." IEEE Transactions on Knowledge and Data Engineering 19(11):1450-1464.

Asharaf, S. & M. Narasimha Murty. 2003. "An adaptive rough fuzzy single pass algorithm for clustering large data sets." Pattern Recognition 36(12):3015-3018.

Babu, V. Suresh & P. Viswanath. 2007. "Weighted k-Nearest Leader Classifier for Large Data Sets." In 2nd International Conference on Pattern Recognition (PReMI-07), LNCS 4815, pp. 17-24.

Babu, V. Suresh & P. Viswanath. 2009. "Rough Fuzzy Weighted k-Nearest Leader Classifier for Large Data Sets." Pattern Recognition 42(9):1719-1731.

Babu, V. Suresh, P. Viswanath & M. Narasimha Murty. 2008. "Scalable Non-parametric Methods for Large Data Sets." To appear in Encyclopedia of Data Warehousing and Mining, 2nd Edition. Montclair State University, USA: Idea Group Inc.

Cover, T. M. & P. E. Hart. 1967. "Nearest Neighbor Pattern Classification." IEEE Transactions on Information Theory 13(1):21-27.

Duda, Richard O., Peter E. Hart & David G. Stork. 2000. Pattern Classification. 2nd ed. John Wiley & Sons (Wiley-Interscience).

Ester, M., H. P. Kriegel & X. Xu. 1996. "A density-based algorithm for discovering clusters in large spatial databases with noise." In Proceedings of 2nd ACM SIGKDD, Portland, Oregon, pp. 226-231.

Han, Jiawei & Micheline Kamber. 2000. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.

Hubert, Lawrence & Phipps Arabie. 1985. "Comparing Partitions." Journal of Classification 2:193-218.

Maji, P. & S. K. Pal. 2007. "Rough-Fuzzy C-Medoids Algorithm and Selection of Bio-Basis for Amino Acid Sequence Analysis." IEEE Transactions on Knowledge and Data Engineering 19(6):859-872.

Pawlak, Z. 1982. "Rough sets." International Journal of Computer and Information Sciences 11:341-356.

Rand, W. M. 1971. "Objective Criteria for the Evaluation of Clustering Methods." Journal of the American Statistical Association 66:846-850.

Vijaya, P. A., M. Narasimha Murty & D. K. Subramanian. 2004. "Leaders-Subleaders: An efficient hierarchical clustering algorithm for large data sets." Pattern Recognition Letters 25:505-513.

Viswanath, P. & Rajwala Pinkesh. 2006. "l-DBSCAN: A Fast Hybrid Density Based Clustering Method." In Proceedings of the 18th International Conference on Pattern Recognition (ICPR-06), Vol. 1, Hong Kong: IEEE Computer Society, pp. 912-915.

Wilson, D. R. & T. R. Martinez. 2000. "Reduction Techniques for Instance-Based Learning Algorithms." Machine Learning 38:257-286.

Zadeh, L. A. 1965. "Fuzzy sets." Information and Control 8:338-353.