Scholarly article on topic 'Enhancing service discovery using cat swarm optimisation based web service clustering'

Academic journal: Perspectives in Science
License: CC BY-NC-ND
Keywords: Web service discovery; WSDL; CSO; Clustering; Swarm intelligence

Authors: Sunaina Kotekar, Sowmya S. Kamath

ARTICLE IN PRESS

Perspectives in Science (2016) xxx, xxx—xxx

Available online at www.sciencedirect.com

ScienceDirect

journal homepage www.elsevier.com/pisc

Enhancing service discovery using cat swarm optimisation based web service clustering

Sunaina Kotekar, Sowmya S. Kamath *

Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India

Received 20 February 2016; accepted 9 June 2016 Available online xxx

KEYWORDS

Web service discovery; WSDL; CSO; Clustering; Swarm intelligence

Summary Web service discovery is a critical task in service-oriented application development. Owing to the extensive proliferation of available services, it is challenging to obtain all the relevant services for a given task. To retrieve the most relevant Web services, a user has to use the service-specific terms that best describe and match the natural language documentation contained within a service description. This process can be time-intensive because of the functional diversity of the services in a repository. Domain-specific clustering of Web services based on the similarity of their functionalities would greatly boost the ability of a Web service search engine to retrieve the most relevant service. In this paper, we propose a novel technique to cluster service documents into functionally similar service groups using the Cat Swarm Optimisation algorithm. We present experimental results showing that the proposed technique was effective and enhanced the process of service discovery.

© 2016 Published by Elsevier GmbH. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* This article belongs to the special issue on Engineering and Material Sciences.

* Corresponding author. Tel.: +91 9741799088.

E-mail addresses: sunainakotekar@gmail.com (S. Kotekar), sowmyakamath@nitk.ac.in (S.S. Kamath).

http://dx.doi.org/10.1016/j.pisc.2016.06.068

2213-0209/© 2016 Published by Elsevier GmbH.

Introduction

Web services (WS) are client and server applications that communicate over standard Web protocols such as HTTP and HTTPS. The main components of the Web service architecture are the provider, the consumer and a service broker such as UDDI (Universal Description, Discovery and Integration). UDDI holds service descriptions in WSDL (Web Services Description Language) format, which describe the functionality of a particular WS. Searching for WSs relevant to a given requirement is normally based on the service name and the natural language description. However, as reported in several studies (Elgazzar et al., 2010), most services may not have well-described natural language documentation. To overcome this limitation, text mining techniques can be applied to the WSDL to identify useful components that describe the actual functionality of the corresponding WS. Using this functionality-related information, WSDLs can be clustered so as to reduce the search space during service discovery and selection (Zhu et al., 2012).

In document clustering, the chosen algorithm plays a major role. Traditional algorithms include K-means, Hierarchical Agglomerative and Suffix Tree clustering. Ant Colony Optimisation (ACO), Particle Swarm Optimisation (PSO), Genetic Algorithms (GA), etc., are popular swarm intelligence based clustering algorithms (Abraham et al., 2008). In this paper, we adapt a newer approach, the Cat Swarm Optimisation (CSO) algorithm (Chu et al., 2006), for WS clustering; it is based on a cat's social and foraging behaviour in nature. We applied the adapted CSO algorithm to a set of WSs to determine groups of similar services. The clustering accuracy of the CSO algorithm is compared with that of the traditional K-means clustering algorithm, and the results are presented in this paper.

Proposed system

The proposed methodology addresses the problem of extracting the functional information of services and using it to automatically categorise a set of WSs in a domain-specific manner. First, each WSDL document is pre-processed using basic NLP techniques such as stop-word removal and stemming to obtain natural language terms, referred to as 'Attributes'. Each attribute is then weighted using TF-IDF (Term Frequency-Inverse Document Frequency) (Eqs. (1)-(3)), and the dissimilarity between documents is calculated as the Euclidean distance (Eq. (4)).

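$$\text{tf-idf}_{ij} = \text{tf}_{ij} \times \text{idf}_{j} \tag{1}$$

$$\text{tf}_{ij} = \frac{\text{number of times attribute } j \text{ occurs in document } i}{\text{total number of words in document } i} \tag{2}$$

$$\text{idf}_{j} = \log \frac{N}{\left|\{d \in D : j \in d\}\right|} \tag{3}$$

where N is the total number of documents in the collection D.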
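As an illustration of Eqs. (1)-(3), the following is a minimal sketch of building a TF-IDF matrix from already pre-processed documents. The function name `tf_idf`, the toy token lists and the use of the natural logarithm are assumptions made for this example; the paper does not state the logarithm base or give an implementation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns (vocabulary, matrix), one row per document."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents that contain each attribute.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for term in vocab:
            tf = counts[term] / total if total else 0.0  # Eq. (2)
            idf = math.log(n_docs / df[term])            # Eq. (3), natural log assumed
            row.append(tf * idf)                         # Eq. (1)
        matrix.append(row)
    return vocab, matrix

# Hypothetical attributes extracted from three service descriptions.
docs = [["get", "weather", "city"], ["get", "price", "book"], ["book", "flight", "city"]]
vocab, matrix = tf_idf(docs)
```

Each row of `matrix` is the attribute vector of one document; attributes that occur in every document receive a weight of zero, since their idf is log 1 = 0.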

Both the k-means algorithm and the CSO algorithm are then applied to the dataset, and the documents are clustered using the computed distance values. Once a predefined stopping condition is reached, the resulting clusters are examined. The standard datasets chosen are Iris, Glass, Balance scale, Soybean small and Wine. In addition, WSDL documents taken from OWLS-TC4 are processed to create a TF-IDF matrix, with the domain of each service used as its class.

K-means clustering

In the k-means algorithm, the number of clusters to be formed is given by k. Initially, k documents are chosen at random as cluster centres. Every document is assigned to its nearest centre by calculating the Euclidean distance between the centre and the document as per Eq. (4). The mean of the documents in each cluster is then computed and becomes the new cluster centre. The documents are reassigned according to the newly calculated Euclidean distances, and the process continues until no more reassignments are possible, i.e., a stable clustering has been reached.

$$d(x, y) = \lVert x - y \rVert_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{4}$$

where n is the number of attributes.

Figure 1 Flowchart of CSO algorithm.
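The k-means procedure described above can be sketched as follows. This is a minimal illustration of standard k-means under stated assumptions, not the authors' implementation: the function names, the fixed `seed`, the `max_iter` cap and the handling of empty clusters are all choices made for this example.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors (Eq. (4))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: euclidean(p, centres[c]))
            for p in points]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # k randomly chosen documents serve as the initial cluster centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = assign(points, centres)
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centres)
        if new_labels == labels:  # no reassignments: stable clustering
            break
        labels = new_labels
    return labels, centres

# Toy data: two well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centres = k_means(points, k=2)
```

Because the loop stops at the first stable assignment, the result depends on the initial centres; this is the behaviour that the CSO variant tries to improve on by perturbing centres.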

Cat swarm clustering

The CSO algorithm (Santosa and Ningrum, 2009) consists of two sub-procedures modelled on the behaviour of cats hunting prey, termed the "seeking mode" and the "tracing mode". In CSO, the number of cats used in each iteration is initialised first; each cat has a position of D dimensions, a velocity for every dimension, a fitness value that indicates how well the cat satisfies the fitness function, and a flag that identifies the cat's mode (seeking or tracing). The final solution is the best position found by any of the cats. CSO is applied until the best clustering is obtained, i.e. the one with the lowest SSE (sum of squared errors) value. Fig. 1 presents the CSO process; its sub-modules are explained below.

Seeking mode: The four fundamental parameters of the seeking mode are: the seeking memory pool (SMP), the number of copies made of each cluster centre; self-position consideration (SPC), a random Boolean value (0 or 1); the seeking range of the selected dimension (SRD), a mutative ratio in the range [0, 1]; and the counts of dimension to change (CDC).

1. Define the seeking mode parameters (SMP, SPC and SRD).

2. For every cluster centre: replicate the cluster centre position SMP times, find j = SMP - SPC, and determine the shifting value (SRD * cluster centre).

3. Set m = 1. While m is less than j, randomly add the shifting value to, or subtract it from, the centres. (This produces SMP x k cluster centre candidates.)

4. Determine distance, assign data to clusters, then find SSE.

5. Use the roulette wheel selection method to choose a new cluster centre candidate.
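The seeking-mode steps above can be sketched for a single cluster centre as follows. This is a simplified illustration under several assumptions that the text does not fix: every dimension is perturbed (CDC is ignored), j = SMP - SPC copies are moved while any remaining copy stays at the current position, a candidate's fitness is the SSE of the clustering with that candidate swapped in, and roulette-wheel weights are taken as the gap to the worst candidate so that a lower SSE is more likely to be selected. The function names `sse` and `seek` and all default values are hypothetical.

```python
import random

def sse(points, centres):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
               for p in points)

def seek(points, centres, idx, smp=5, spc=True, srd=0.2, rng=random):
    """One seeking-mode step for centres[idx]; returns the chosen candidate."""
    base = centres[idx]
    n_moved = smp - 1 if spc else smp           # j = SMP - SPC
    candidates = [list(base) for _ in range(smp)]
    for cand in candidates[:n_moved]:           # any remaining copy is left unmoved
        for d in range(len(cand)):
            shift = srd * cand[d]               # shifting value = SRD * centre
            cand[d] += shift if rng.random() < 0.5 else -shift
    # Fitness of a candidate: SSE with that candidate replacing centres[idx].
    fitness = [sse(points, centres[:idx] + [c] + centres[idx + 1:])
               for c in candidates]
    worst = max(fitness)
    weights = [worst - f for f in fitness]      # lower SSE -> larger wheel slice
    if sum(weights) == 0:                       # all candidates equally good
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
centres = [[1.5, 1.5], [4.0, 4.0]]
new_centre = seek(points, centres, idx=0, rng=random.Random(1))
```

In a full run this step would be repeated for every centre of every cat flagged as seeking, and the selected candidate would replace the current centre. Note that with a multiplicative shift a coordinate equal to zero never moves; an additive range would avoid that, at the cost of departing from the description above.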

$$\text{SSE} = \sum_{i=1}^{k} \sum_{x \in D_i} \lVert x - m_i \rVert^2 \tag{5}$$

where D_i is the set of documents assigned to cluster i and m_i is the centre of that cluster.

Table 1 Purity of cluster formation for different datasets for k-means and CSO algorithms.

| Dataset name | No. of documents | Attributes | Classes | K-means purity (%) | CSO purity (%) |
|---|---|---|---|---|---|
| Iris | 150 | 4 | 3 | 67 | 90 |
| Glass | 214 | 9 | 6 | 54 | 58 |
| Balance scale | 625 | 4 | 3 | 61 | 78 |
| Soybean small | 47 | 35 | 4 | 79 | 83 |
| Wine | 178 | 13 | 3 | 70 | 72 |
| WSDL documents | 684 | 644 | 9 | 41 | 45 |

Figure 2 Purity comparison of k-means vs CSO for the different datasets.
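Purity, the measure reported in Table 1 and Figure 2, counts for each cluster the documents of its most frequent class, sums these counts over all clusters and divides by the total number of documents (Eq. (8)). A minimal sketch of that calculation is given here; the function name `purity` and the toy labels are assumptions for illustration and are not taken from the experiments.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(class_labels)

# Hypothetical clustering of seven services into two clusters.
clusters = [0, 0, 0, 1, 1, 1, 1]
classes = ["food", "food", "travel", "travel", "travel", "travel", "food"]
score = purity(clusters, classes)
```

In this toy case cluster 0 contributes two "food" documents and cluster 1 contributes three "travel" documents, so five of the seven documents count towards purity.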

Tracing mode: This sub-model depicts cats while tracing prey.

1. For all cluster centres: update the velocity (Eq. (6)), update the position (Eq. (7)) and find the new cluster centre.

2. Determine distance, assign data to clusters, then find SSE.
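The velocity and position updates of Eqs. (6) and (7) can be sketched for one cluster centre as follows. The function name `trace`, the default value of the constant `c1` and the choice to draw a fresh random value for every dimension are assumptions made for this example; the text specifies only that the random value lies between 0 and 1.

```python
import random

def trace(position, velocity, best, c1=2.0, rng=random):
    """One tracing-mode update; returns (new_position, new_velocity)."""
    new_position, new_velocity = [], []
    for x, v, xb in zip(position, velocity, best):
        r = rng.random()                  # random value in [0, 1)
        v_new = v + r * c1 * (xb - x)     # Eq. (6): pull towards the best position
        new_velocity.append(v_new)
        new_position.append(x + v_new)    # Eq. (7): move by the new velocity
    return new_position, new_velocity

# A centre at (1, 1) with zero velocity tracing a best position at (2, 3).
pos, vel = trace([1.0, 1.0], [0.0, 0.0], best=[2.0, 3.0], c1=1.0,
                 rng=random.Random(0))
```

With `c1 = 1.0` and zero initial velocity, each coordinate moves part of the way towards the best position without overshooting it; larger values of `c1` allow overshoot, which many implementations limit with a velocity cap.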

$$v_{k,d} = v_{k,d} + r_1 \times c_1 \,(x_{best,d} - x_{k,d}) \tag{6}$$

$$x_{k,d} = x_{k,d} + v_{k,d} \tag{7}$$

where x_{best,d} is the position of the cat with the best fitness value, x_{k,d} is the position of cat k, v_{k,d} is the velocity of cat k in dimension d, c_1 is a constant and r_1 is a randomly generated value between 0 and 1.

Experimental analysis and results

The experiment clustered the chosen standard datasets as well as the WSDL documents. The clustering purity, or accuracy, was calculated using Eq. (8), where the sum runs over the k clusters and the maximum is taken over the j classes. For the WSDL documents, the domain to which each WS belongs was taken as its class when calculating the accuracy. The domains were "communication", "economy", "education", "food", "geography", "medical", "simulation", "travel" and "weapon". The results are shown in Fig. 2 and Table 1.

$$\text{Purity} = \frac{\sum_{k} \max_{j} \left(\text{documents in cluster } k \text{ belonging to class } j\right)}{\text{Total number of documents}} \tag{8}$$

Conclusion and future work

In this paper, an approach for categorising Web services to deal with their functional diversity was discussed. To cluster the services, both the K-means and the bio-inspired CSO clustering algorithms were applied, and their performance was compared on the standard datasets and on the WSDL dataset. CSO achieved higher purity than K-means on every dataset tested (Table 1). A likely reason is that K-means stops as soon as the documents are stable in their clusters, whereas CSO, in its tracing mode, keeps changing the centres randomly to check for better clusters. We intend to extend the proposed clustering methodology to optimise real-time Web service search engines, in order to improve the time and precision of Web service discovery.

References

Abraham, A., Das, S., Roy, S., 2008. Swarm Intelligence Algorithms for Data Clustering. Soft Computing for Knowledge Discovery and Data Mining, pp. 279-313.

Chu, S.C., Tsai, P.W., Pan, J.S., 2006. Cat Swarm Optimization, LNAI 4099, 3 (1). Springer-Verlag, Berlin/Heidelberg, pp. 854-858.

Elgazzar, K., Hassan, A.E., Martin, P., 2010 July. Clustering WSDL documents to bootstrap the discovery of web services. The 8th IEEE International Conference on Web Services (ICWS'10), Miami, FL, pp. 147-154.

Santosa, B., Ningrum, M.K., 2009. Cat swarm optimization for clustering. In: International Conference of Soft Computing and Pattern Recognition, 2009 (SoCPaR'09). IEEE.

Zhu, J., Kang, Y., Zheng, Z., Lyu, M.R., 2012. A Clustering-Based QoS Prediction Approach for Web Service Recommendation. IEEE Paper.