
Available online at www.sciencedirect.com

Procedia Engineering 30 (2012) 1110 - 1118

www.elsevier.com/locate/procedia

International Conference on Communication Technology and System Design 2011

Classifier fusion based on Bayes aggregation method for Indian sign language datasets

M. Krishnaveni a, V. Radha b,*

a,b Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore-641043, India

Abstract

Automation of sign language recognition systems can greatly facilitate both the vocal and the non-vocal communities, and can become as useful and successful as speech-recognition systems. It offers enhanced communication capabilities for the speech and hearing impaired, promising improved social opportunities and integration. This paper describes a gesture classification scheme which can classify a wide class of hand gestures in a view-based setup. Since the images come from a single camera view, the hardware complexity is kept low; however, a high-accuracy classifier is needed for classification and recognition. The decision making of the system in this work employs a fusion technique over three classifiers, namely KNN, MLP and SVM, to classify isolated signs of sign language. The process involves two layers of classification: first, coarse classification is done by each single classifier; second, the classifications are fused using combination methods. Experimental results demonstrate that the classifier fusion approach can be used reliably to classify some signs of native Indian sign language.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of ICCTSD 2011

Keywords: Indian Sign Language; Classifier fusion; Accuracy rate; Simple Bayes; KNN; Support Vector Machine

1. Introduction

Sign language is important to humankind, and there is increasing research interest in eradicating the barriers faced by differently abled people in communicating with and contributing to society. A functioning sign language recognition system can provide an opportunity for a mute person to communicate with non-signing people without the need for an interpreter [8]. It can be used to generate speech or text, making the mute more independent. Unfortunately, there has not been any system with these capabilities so far: all research in this field has been limited to small-scale systems capable of classifying and recognizing only a minimal subset of a full sign language with stable accuracy [10]. The sign language used in India is commonly known as Indian Sign Language (henceforth called ISL).

* Krishnaveni Marimuthu. Tel.: +91- 9442571571. E-mail address: krishnaveni.rd@gmail.com.


1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.970

Linguistic studies on ISL started around 1978, and it has been found that ISL is a complete natural language, originated in India, having its own morphology, phonology, syntax, and grammar. ISL is used not only by deaf people but also by the hearing parents of deaf children, the hearing children of deaf adults, and hearing deaf educators. Therefore, the need to build an automation system that can associate signs with the words of a spoken language, and which can further be used to learn ISL, is significant. This paper proposes an efficient classifier approach for recognizing Indian sign language using the view-based approach. The fusion technique produces a high-quality recognition system, ease of implementation, and operational robustness. The paper is arranged as follows. Section 2 gives a brief overview of Indian sign language. Section 3 deals with the features extracted from hand gestures. Section 4 gives short descriptions of the classifiers. Section 5 explains the proposed approach based on combination methods for the selected dataset. Section 6 gives the experimental results and performance evaluation of the classifier fusion on ISL datasets. Finally, Section 7 summarizes the framework and the future work that can be adopted.

2. Overview of the Indian sign language

Indian Sign Language is a rich, multi-faceted language, and its full complexity is beyond current gesture recognition technologies [12]. The interpersonal communication problem between the signer and the hearing community could be resolved by building a new communication bridge integrating components for sign. Several significant problems specific to automatic sign recognition are: (i) distinguishing gestures from signs; (ii) context dependency (directional verbs, inflections, etc.); (iii) basic unit modeling (how do we describe them?); (iv) transitions between signs (movements); and (v) repetition (cycles of movement) [11]. An ISL dictionary has been built by Sri Ramakrishna Mission Vidyalaya College of Education, Coimbatore, which splits ISL into five parameters, namely handshape, orientation, location, movement, and facial expression.

3. Feature extraction and data partitioning

The great variability in gestures and signs, in time, size, and position, as well as interpersonal differences, makes the recognition task difficult [7]. By extracting features through an image processing sequence, classification can be done by discriminative classifiers [1]. Gestures, particularly in sign language, involve significant motion of the hands. Thus, in developing a sign language recognition system, it is important to model both the motion (temporal characteristics) and the shape (spatial characteristics) of the hand. In this research work only the spatial characteristics of the hand are of concern. Feature extraction is the process of generating a set of descriptors or characteristic attributes from a binary image [9]. Most of the features used in existing sign language recognition systems focus on only one aspect of the signing, such as hand movements or facial expressions. Figure 2 shows the feature extraction concept.

FVJ" '

yvvVv^

OuLJlcw I ni Li 111

Figure 2: Hand feature extraction

The strategy used for segmentation is a boundary segmentation method, which traces the exterior boundaries of objects, as well as the boundaries of holes inside these objects, in the binary image. This method of sign extraction meets two important needs. First, it keeps the overall complexity of the segmentation process low. Second, it eliminates candidates that may produce smooth continuations but are otherwise inconsistent with the segments in the image. This simplifies the decisions made by the segmentation process and generally leads to more accurate segmentation. The features extracted from the database are mean intensity, area, perimeter, diameter, and centroid.

Mean Intensity:

The mean intensity $u$ in the selected region of interest (ROI) is given in eqn (1), where $N$ is the number of pixels in the region:

$$u = \frac{1}{N} \int_{x,y} \mathrm{ROI}(x, y)\, dx\, dy \quad (1)$$

Area: The area of an object is a relatively noise-immune measure of object size, because every pixel in the object contributes towards the measurement. The area can be defined using eqn (2):

$$A(S) = \iint_S I(x, y)\, dx\, dy \quad (2)$$

where $I(x, y) = 1$ if the pixel is within the shape, $(x, y) \in S$, and $0$ otherwise.

Perimeter:

The perimeter measurement gives the smooth relative boundaries, and the perimeter of the region is defined by eqn (3):

$$P(S) = \int_t \sqrt{\dot{x}^2(t) + \dot{y}^2(t)}\, dt \quad (3)$$

Diameter:

The distance around a selected region is called the circumference, and the distance across a circle through its center is called the diameter. For any circle, dividing the circumference by the diameter gives a value close to $\pi$. This relationship is expressed in eqn (4):

$$\frac{C}{d} = \pi \quad (4)$$

where $C$ is the circumference and $d$ is the diameter.

Centroid:

It specifies the center of mass of the region: the first element is the horizontal coordinate (x-coordinate) of the center of mass, and the second element is the vertical coordinate (y-coordinate). The x-coordinate is written as given in eqn (5):

$$\bar{x} = \frac{\int x\, g(x)\, dx}{\int g(x)\, dx} \quad (5)$$
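As a rough illustration of how these five descriptors can be computed in practice, the following sketch uses scikit-image's regionprops on a binary hand mask; the function name and the synthetic test image are ours, not the paper's.

```python
# A rough illustration of computing the five descriptors above from a
# binary hand mask with scikit-image; the function name and the synthetic
# test image are illustrative assumptions, not taken from the paper.
import numpy as np
from skimage.measure import label, regionprops

def extract_hand_features(mask, gray):
    """Return [mean_intensity, area, perimeter, diameter, cx, cy] for the
    largest connected region in the binary mask."""
    regions = regionprops(label(mask), intensity_image=gray)
    r = max(regions, key=lambda p: p.area)    # keep the largest blob (the hand)
    cy, cx = r.centroid                       # scikit-image returns (row, col)
    return np.array([r.mean_intensity,        # eqn (1)
                     r.area,                  # eqn (2)
                     r.perimeter,             # eqn (3)
                     r.equivalent_diameter,   # diameter of the equal-area circle, cf. eqn (4)
                     cx, cy])                 # eqn (5)

# toy usage: a 64x64 image with one bright square acting as the "hand"
gray = np.zeros((64, 64)); gray[20:40, 20:40] = 0.8
print(extract_hand_features(gray > 0.5, gray))
```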

The data partitioning is handled by the holdout method. Train-and-test is done by dividing the given data set into:

• A training sample for generating the classification model

• A test sample to test the model on independent objects with given classifications (randomly selected, 20-30% of the complete data set)

The holdout method reserves a certain amount of data for testing and uses the remainder for training; the two sets are therefore disjoint. Usually, one third (N/3) of the data is used for testing and the rest (2N/3) for training, and the choice of records for the training and test data is essential. The holdout method randomly partitions the dataset into two independent sets, training and testing: generally, two-thirds of the data are allocated to the training set and the remaining one-third to the test set (Figure 3). The method is pessimistic because only a portion of the initial data is used to derive the model. A minimal code sketch of this partition follows Figure 3.

Figure 3: Dataset split into training and test sets
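A minimal sketch of this holdout partition using scikit-learn; the synthetic data stands in for the real ISL feature vectors, which are not reproduced here, and the one-third test fraction follows this section (the experiments in Section 6 use a 60/40 split instead).

```python
# A minimal sketch of the holdout partition described above. The synthetic
# data is a stand-in for the extracted ISL feature vectors (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# stand-in for the extracted feature vectors (6 features, 3 sign classes)
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=1 / 3,    # one third reserved for testing, two thirds for training
    random_state=42,    # fixed seed => reproducible random partition
    stratify=y,         # keep class proportions similar in both disjoint sets
)
```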

4. Short descriptions of classifiers

Fusion at the classifier level in the current work has been achieved by treating the outputs of the single best classifiers (MLP and SVM in the current case) as new features, with the aim of achieving more reliable and robust results. Every classifier makes a different kind of error on a different region of the input space, so it is hoped that combining the information from more than one classifier will result in better classification rates for a given problem.

4.1 K-Nearest Neighbor Classification

The K-Nearest Neighbor classifier is a supervised learning algorithm where a new instance query is classified based on the majority of the K nearest neighbor categories. More robust models can be achieved by locating k neighbours, where k > 1, and letting the majority vote decide the outcome of the class labelling. A higher value of k results in a smoother, less locally sensitive function. The algorithm is given below.

(1) For each row (case) in the target dataset (the set to be classified), locate the k closest members (the k nearest neighbors) of the training dataset. A Euclidean distance measure is used to calculate how close each member of the training set is to the target row being examined.

(2) Examine the k nearest neighbors to find the most common class among them, and assign this class to the row being examined.

(3) Repeat this procedure for the remaining rows (cases) in the target set.

The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification but make the boundaries between classes less distinct. In the experiments, k was set to 3.
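A minimal sketch of this classifier with the settings named above (Euclidean distance, k = 3); X_train, y_train and X_test are assumed to come from the holdout sketch in Section 3.

```python
# A minimal sketch of the k-NN step above (Euclidean distance, k = 3);
# reuses X_train/y_train/X_test from the holdout sketch in Section 3.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)        # step (1): store the training prototypes
knn_pred = knn.predict(X_test)   # steps (2)-(3): majority vote of the 3 nearest
```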

4.2 Multi Layer Perceptron

The MLP is a special kind of Artificial Neural Network (ANN). The MLP has been chosen because of its well-known learning and generalization abilities, which are necessary for dealing with imprecision in input patterns [6]. The following issues are involved in designing and training a multilayer perceptron network: (1) selecting the number of hidden layers to use in the network; (2) deciding the number of neurons to use in each hidden layer; (3) finding a globally optimal solution that avoids local minima; (4) converging to an optimal solution in a reasonable period of time; and (5) validating the neural network to test for overfitting. Depending on the model of ANN, training is performed either under the supervision of some teacher (i.e., with labeled data of known input-output responses) or without supervision. The MLP used in the present work requires supervised training.
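A hedged sketch of such a supervised MLP; the paper does not report the chosen topology, so the single hidden layer of 32 neurons below is an assumption, and the settings map roughly onto design issues (1)-(5) above.

```python
# A hedged sketch of the supervised MLP; the topology is assumed, not the
# paper's. Reuses X_train/y_train/X_test from the Section 3 sketch.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(32,),  # issues (1)-(2): one hidden layer, 32 neurons (assumed)
    max_iter=1000,             # issue (4): bound the training time
    early_stopping=True,       # issue (5): internal validation guards against overfitting
    random_state=0,            # issue (3): initialization decides which local minimum is found
)
mlp.fit(X_train, y_train)      # supervised training on labeled input-output pairs
mlp_pred = mlp.predict(X_test)
```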

4.3 Support Vector Machine

Support Vector Machines have been used successfully for pattern recognition and regression tasks, formulated under the structural risk minimization principle. For the Support Vector Machine classifier, the open-source LibSVM tool can be used. In general, a classification task involves training and testing data consisting of data instances. Each instance in the training set contains one "target value" (class label) and several "attributes" (features). The goal of the SVM is to produce a model which predicts the target values of the data instances in the testing set, given only their attributes [2]. Before training, each attribute is linearly scaled to the range $[-1, +1]$ or $[0, 1]$. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the support vector machine (SVM) requires the solution of the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i \left( w^T \phi(x_i) - b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad (6)$$

Here the training vectors $x_i$ are mapped into a higher-dimensional space by the function $\phi$. The SVM then finds a linear separating hyperplane with the maximal margin in this higher-dimensional space. $C > 0$ is the penalty parameter of the error term. Furthermore, $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is called the kernel function. Support Vector Machine (SVM) models are closely related to classical multilayer perceptron neural networks [6].
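A hedged sketch of this formulation with scikit-learn's SVC, which wraps the LIBSVM library named above; the RBF kernel and C = 1.0 are illustrative defaults, not values reported in the paper.

```python
# A hedged sketch of the SVM above using SVC (a LIBSVM wrapper); kernel and
# C are illustrative assumptions. Reuses X_train/y_train/X_test as before.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

svm = make_pipeline(
    MinMaxScaler(feature_range=(-1, 1)),         # linear scaling of each attribute to [-1, +1]
    SVC(kernel="rbf", C=1.0, probability=True),  # C is the penalty parameter of eqn (6)
)
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)   # predicted target values from the attributes alone
```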

5. Proposed Classifier Fusion Methodology

When solving a classification problem, one usually tries to choose the classifier that brings the best accuracy [5]. However, determining the best classifier is a time-consuming process, because a classification algorithm may form different decision functions based on different initializations, parameter settings, training sets, or feature selections. For instance, different initializations may result in different neural network classifiers. Different parameter choices can also result in different classifiers, such as the kernel function and regularization parameter in the SVM algorithm, or the number of neighbors in the k-NN algorithm. Even if the best classifier is identified, it might not be an "ideal" choice. A classification algorithm is designed internally around some classifier performance criterion, e.g., training accuracy or classifier complexity, and the "best" classifier is selected according to that criterion. More than one classifier may have the same training accuracy or meet the criterion; however, the learning algorithm simply selects one classifier and discards the others. The discarded classifiers may correctly classify some data examples that are misclassified by the selected best classifier, so potentially valuable information can be lost by discarding their classification results.

Classifier combination methods have proved to be an effective tool for increasing the performance of pattern recognition applications [13]. There are different categories of classifier combination, which suggest more specific directions for future theoretical research. Well-known techniques for classifier combination are the so-called ensemble methods, such as bagging, boosting, and dagging; these methods try to make individual classifiers different by training them with different training sets or by weighting data points differently. Another powerful and general method, called stacked generalisation, can be used to combine lower-level models [3]. Stacking methods for classifier combination use another classifier whose inputs are both the original inputs and the outputs of the individual classifiers. Classifier combination therefore deals with a different problem from those usually handled using ensemble and stacking methods. When using multiple classifiers, a method that combines the results of the various classifiers is needed [3]. Several techniques exist, namely majority voting, maximum, sum, min, average, product, Bayes, decision templates, and behavior knowledge space [14]. This research work uses an ensemble approach for classification based on the majority voting rule and the Bayes method.

5.1 Majority Voting

There are a number of approaches to combining such uncertain information units in order to obtain the best final decision. The binary characteristic function is defined as in eqn (7), where $d_j$ denotes the decision of the $j$-th classifier:

$$B_j(c_i) = \begin{cases} 1 & \text{if } d_j = c_i \\ 0 & \text{if } d_j \neq c_i \end{cases} \quad (7)$$

Then the general voting routine can be defined as in eqn (8):

$$E(d) = \begin{cases} c_i & \text{if } \displaystyle\sum_{j=1}^{n} B_j(c_i) = \max_{l \in \{1,\dots,m\}} \sum_{j=1}^{n} B_j(c_l) \;\ge\; \alpha \cdot m + k(d) \\ r & \text{otherwise} \end{cases} \quad (8)$$

There are three versions of majority voting, where the ensemble chooses the class (i) on which all classifiers agree (unanimous voting); (ii) predicted by at least one more than half the number of classifiers (simple majority); or (iii) that receives the highest number of votes, whether or not the sum of those votes exceeds 50% (plurality voting, or just majority voting). The ensemble decision for plurality voting is to choose class $\omega_j$ if

$$\sum_{t=1}^{T} d_{t,j} = \max_{k = 1,\dots,m} \sum_{t=1}^{T} d_{t,k} \quad (9)$$

where $d_{t,j} \in \{0, 1\}$ is the vote of classifier $t$ for class $\omega_j$. Majority voting is an optimal combination rule under the minor assumptions that: (1) we have an odd number of classifiers for a two-class problem; (2) the probability of each classifier choosing the correct class is $p$ for any instance $x$; and (3) the classifier outputs are independent. Formally, given a set of classifiers $H = \{h_1, \dots, h_T\}$ for a binary classification problem, each individual classifier assigns a data example $x_t \in \mathbb{R}^n$ to a class label $\omega_1$ or $\omega_2$, i.e., $h_t : \mathbb{R}^n \to \Omega$, where $\Omega = \{\omega_1, \omega_2\}$.
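A minimal sketch of plurality voting over the label outputs of the three classifiers, implementing the characteristic-function count of eqns (7)-(9); ties are broken in favor of the first class encountered, and knn_pred, mlp_pred and svm_pred are assumed to come from the sketches in Section 4.

```python
# A minimal sketch of plurality voting, eqns (7)-(9); ties go to the first
# class found. Reuses knn_pred/mlp_pred/svm_pred from the Section 4 sketches.
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: shape (n_classifiers, n_samples). Returns, per sample,
    the class with the highest vote count (plurality voting)."""
    fused = np.empty(predictions.shape[1], dtype=predictions.dtype)
    for i in range(predictions.shape[1]):
        classes, counts = np.unique(predictions[:, i], return_counts=True)
        fused[i] = classes[np.argmax(counts)]   # argmax of sum_j B_j(c_i)
    return fused

vote_pred = majority_vote(np.vstack([knn_pred, mlp_pred, svm_pred]))
```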

5.2 Simple Bayes

Two basic Bayesian fusion methods are introduced. The first, named Bayes Average, is a simple average of posterior probabilities. The second method uses Bayesian methodology to provide a belief measure associated with each classifier output, and eventually integrates all single beliefs into a combined final belief. If the outputs of the multiple classifier system are given as posterior probabilities $P(x \in C_i \mid x)$, $i = 1, \dots, m$, that an input sample $x$ comes from a particular class $C_i$, it is possible to calculate an average posterior probability taken over all $K$ classifiers, as shown in eqn (10):

$$P(x \in C_i \mid x) = \frac{1}{K} \sum_{k=1}^{K} P_k(x \in C_i \mid x) \quad (10)$$

where $i = 1, \dots, m$. The Bayes decision based on the newly estimated posterior probabilities is called an average Bayes classifier. For other classifiers there are a number of methods to estimate the posterior probability. As an example, for the k-NN classifier the transformation is given in eqn (11):

$$P_k(x \in C_i \mid x) = \frac{k_i}{k} \quad (11)$$

where $k_i$ denotes the number of prototype samples from class $C_i$ out of all $k$ nearest prototype samples.
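A minimal sketch of the average Bayes combiner of eqn (10): each classifier's predict_proba output serves as its posterior estimate (for the k-NN member this is exactly the $k_i/k$ fraction of eqn (11)), and the class with the highest averaged posterior is chosen.

```python
# A minimal sketch of the average Bayes combiner, eqn (10); reuses the
# fitted knn/mlp/svm and X_test from the earlier sketches.
import numpy as np

posteriors = np.stack([
    knn.predict_proba(X_test),   # eqn (11): fraction of the k nearest prototypes per class
    mlp.predict_proba(X_test),
    svm.predict_proba(X_test),
])                               # shape (K classifiers, n_samples, m classes)
avg_posterior = posteriors.mean(axis=0)   # eqn (10): average over the K classifiers
bayes_pred = knn.classes_[np.argmax(avg_posterior, axis=1)]
```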

6. Experimental results

A dataset S is first divided into n subsets. One subset is used as the testing data and all the other subsets are used as training data, so there are n groups of testing and training data, and each data example appears in the testing set of exactly one group. The performance of the model is estimated by the average of the n accuracies obtained on the n different testing sets.
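This protocol is standard n-fold cross-validation; below is a minimal sketch with an assumed n = 5 (the text does not fix n), scoring the SVM pipeline from Section 4 on the stand-in data from Section 3.

```python
# A minimal sketch of the n-fold evaluation protocol above; n = 5 is an
# assumption. Reuses svm, X and y from the earlier sketches.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(svm, X, y, cv=5, scoring="accuracy")
print("accuracy per fold:", scores, "mean:", scores.mean())
```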

Table 1: Performance evaluation of classifiers using majority voting

S.No Performance Measures SVM KNN NN KNN+SVM NN+SVM NN+KNN

1. Classified rate 1 1 1 1 1 1

2. Sensitivity 1 1 1 1 1 1

3. Specificity 0 0 0 0 0 0

4. Error rate 0.9286 0.7143 0.7143 0.9286 0.7143 0.7857

5. Inconclusive rate 0 0 0 0 0 0

6. Positive predictive value 0.0714 0.2857 0.2857 0.0714 0.2857 0.2143

7. Negative Predictive Value NaN NaN NaN NaN NaN NaN

8. Negative likelihood NaN NaN NaN NaN NaN NaN

9. Positive Likelihood 1 1 1 1 1 1

10. Prevalence 0.0714 0.2857 0.2857 0.0714 0.2857 0.2143

Table 2: Performance evaluation of classifiers using the Bayes rule

S.No Performance Measures SVM KNN NN SVM+KNN NN+SVM NN+KNN

1. Classified rate 1 1 1 1 1 1

2. Sensitivity 0 1 1 0 0.75 1

3. Specificity 1 0.8333 0.8462 0.8 0.9 0.7273

4. Error rate 0.8571 0.5000 0 0.7143 0.5 0.4286

5. Inconclusive rate 0 0 0.3333 0 0 0

6. Positive predictive value NaN 0.5 1 0 0.75 0.5

7. Negative Predictive Value 0.7857 1 0.6667 0.6667 0.9 1

8. Negative likelihood 1 0 0 1.2500 0.27 0

9. Positive Likelihood NaN 6 6.5 0 7.5 3.6667

10. Prevalence 0.2143 0.1429 0.1429 0.2857 0.2857 0.2143

Figure 4 : Average accuracy of the Classifiers

Figure 5 : Average Error rate and time taken for classification

The proposed fusion works in three steps: (i) train the classifiers with the training feature vectors; (ii) use the selected classifiers to classify the test feature vectors into an output label; (iii) perform aggregation to combine the results and make the final decision. Five features were extracted from the images to create the feature vector used as input during classification; during training, the last column of the feature vector contains the target label of each image. The training and testing datasets were partitioned using the holdout method, which divides the dataset into 60% and 40%. Two aggregation methods, Bayes and the majority voting rule, are used to combine the results of the various classifiers. A performance analysis of the proposed fusion classifiers is done in terms of accuracy, classification error rate, and time, and compares them against their single-classifier counterparts. The KNN+SVM classifier combination gives the better result when the aggregation method is Bayes. The measures listed in Table 1 and Table 2 were used to track performance during validation of the classifiers. Experiments conducted with Indian sign language datasets show the classifiers' performance, with the error rate being low for NN+SVM; the results on accuracy, error rate, and time again show that the KNN+SVM combination produces the best classification accuracy under the Bayes combination rule. Figures 4 and 5 depict these results. A minimal end-to-end sketch of the three steps is given below.
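The following hedged sketch strings together steps (i)-(iii), reusing the classifiers and the majority_vote helper from the earlier sketches; the 60/40 split follows the text, everything else is illustrative.

```python
# A hedged end-to-end sketch of steps (i)-(iii); the 60/40 split follows
# the text, the classifiers and helpers come from the earlier sketches.
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)      # 60% train / 40% test holdout

for clf in (knn, mlp, svm):                   # step (i): train the classifiers
    clf.fit(X_train, y_train)

preds = [clf.predict(X_test) for clf in (knn, mlp, svm)]   # step (ii): classify test vectors

# step (iii): aggregate with both combination rules and compare
vote_pred = majority_vote(np.vstack(preds))
avg_posterior = np.stack(
    [clf.predict_proba(X_test) for clf in (knn, mlp, svm)]).mean(axis=0)
bayes_pred = knn.classes_[np.argmax(avg_posterior, axis=1)]

for name, fused in (("majority vote", vote_pred), ("average Bayes", bayes_pred)):
    print(name, "accuracy:", (fused == y_test).mean())
```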

7. Conclusion

The emergence of information and communication technologies has drastically changed the sign language domain. Experimental data and results are today easy to share and repurpose by connecting to databases containing such information. As a consequence, Indian sign language datasets are very diverse and require high-throughput processing, and the advantages of fusion approaches make them particularly suitable for such databases. When there are many competing approaches to a classification problem, an effort to determine the best one is inevitable. The best algorithm depends on the structure of the available data and on prior knowledge, and the best combination method, just like the best ensemble method, depends greatly on the particular problem. If the accuracies of the classifiers can be reliably estimated, then the majority approaches may be considered. However, there is growing consensus on using the Bayes method due to its consistent performance over a broad spectrum of applications; hence the classifier outputs are made to correctly estimate the posterior probabilities by adopting the Bayes combination method.

References

[1] Joshi, A.J., Porikli, F., Papanikolopoulos, N. (2009), "Multi-Class Active Learning for Image Classification", IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 2372-2379.

[2] Kim, H.C., Pang, S., Je, H.M., Kim, D., and Bang, S.Y. (2003), "Constructing support vector machine ensemble", Pattern Recognition, Vol. 36, No. 12, pp. 2757-2767.

[3] Kuncheva, L.I. (2004), "Combining Pattern Classifiers: Methods and Algorithms", John Wiley & Sons, Hoboken, New Jersey.

[4] Kuncheva, L.I. and Rodriguez, J.J. (2007), "Classifier ensembles with a random linear oracle", IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 4, pp. 500-508.

[5] Kuncheva, L.I. and Whitaker, C.J. (2003), "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy", Machine Learning, Vol. 51, No. 2, pp. 181-207.

[6] Nibaran Das, Brindaban Das (2010), "Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier", Journal of Computing, Vol. 2, Issue 2, February 2010, ISSN 2151-9617.

[7] Jiong June Phu and Yong Haur Tay, "Computer Vision Based Hand Gesture Recognition Using Artificial Neural Network", Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman (UTAR), Malaysia.

[8] Noor Saliza Mohd Salleh, Jamilin Jais, Lucyantie Mazalan, Roslan Ismail, Salman Yussof, Azhana Ahmad, Adzly Anuar, Dzulkifli Mohamad (2006), "Sign Language to Voice Recognition: Hand Detection Techniques for Vision-Based Approach", Current Developments in Technology-Assisted Education.

[9] Kovář, J. Přikryl, and M. Vlček (2003), "Still Image Objective Segmentation Evaluation using Ground Truth", 5th COST 276 Workshop, pp. 9-14.

[10] Mahmoud Elmezain, Ayoub Al-Hamadi, Jörg Appenrodt, and Bernd Michaelis (2009), "A Hidden Markov Model-Based Isolated and Meaningful Hand Gesture Recognition", International Journal of Computer Systems Science and Engineering, 5:2.

[11] Reza Hassanpour, Stephan Wong, Asadollah Shahbahrami (2008), "Vision-Based Hand Gesture Recognition for Human Computer Interaction: A Review", IADIS International Conference on Interfaces and Human Computer Interaction.

[12] Tirthankar Dasgupta, Sambit Shukla, Sandeep Kumar (2008), "A Multilingual Multimedia Indian Sign Language Dictionary Tool", The 6th Workshop on Asian Language Resources.

[13] Dymitr Ruta and Bogdan Gabrys (2000), "An Overview of Classifier Fusion Methods", Computing and Information Systems, pp. 1-10.

[14] Xu, L., Krzyzak, A., and Suen, C.Y. (1992), "Methods of combining multiple classifiers and their applications to handwriting recognition", IEEE Trans. Systems, Man, and Cybernetics, Vol. 22, No. 3, pp. 418-435.