Web Service Framework Research of Data Mining in E-business Academic research paper on "Economics and business"

CC BY-NC-ND
Academic journal: Procedia Engineering
Keywords: Web service; ID3; Apriori; data mining; framework

Abstract of research paper on Economics and business, author of scientific article — Hong Liu, Jin Hua Xu

Abstract Developing e-business data mining applications more efficiently and reliably is a critical challenge for many enterprises since existing approaches are characteristically complex and costly. Web services have been gaining popularity with simpler service-oriented architectures and the potential for lower development costs. In this paper, we propose a service composition framework to support a Web services-based approach for developing e-business data mining applications. Through a case study, we show the proposed framework is feasible.


Available online at www.sciencedirect.com

SciVerse ScienceDirect

Procedia Engineering 15 (2011) 1968 - 1972

www.elsevier.com/locate/procedia

Advanced in Control Engineering and Information Science

Web Service Framework Research of Data Mining in E-business

Hong Liu a,*, Jin Hua Xu b

a College of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang, China
b College of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang, China


© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [CEIS 2011]

Keywords: web service; ID3; Apriori; data mining; framework

1. Introduction

In a business environment this translates into automatic cooperation between enterprises. Any enterprise requiring a business interaction with another enterprise can automatically discover and select the most appropriate Web services according to its selection policies. The selected services can then be invoked automatically, and payment processes can be initiated. Any necessary mediation is based on data and process ontologies and on the automatic translation of their concepts into each other. An example is a supply chain relationship in which an enterprise manufacturing short-lived goods must frequently and dynamically seek both suppliers and buyers. Instead of employees constantly searching for suppliers and buyers, the Web service infrastructure does so automatically within the defined constraints. Web services are self-contained, modular applications that can be described, published, located, and invoked over the Web. They use open standards and a common infrastructure for their description, discovery and invocation. Web services perform encapsulated business functions ranging from simple request-reply to full business process interactions.

* Corresponding author: Hong Liu. E-mail address: LLH@mail.zjgsu.edu.cn

1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2011.08.367

The main objective of this paper is to propose a service composition framework that adopts a Web services-based approach to integrate e-business with data mining technology. The remainder of this paper is organized as follows. In Section 2, we introduce the architectural model of Web services, explore key issues of using Web services to integrate e-business with data mining technology, and propose a framework that permits the development of new intelligent applications through the structured composition of Web services. In Section 3, we present a case study to illustrate how one might apply the framework to develop real-world e-business applications. In Section 4, we conclude with comments on future research directions.

2. Web service framework of data mining

In this section, we propose a framework for developing Web services-based business integration solutions. In this framework, we take both a "services perspective" and a "system perspective" of an e-business integration solution, which integrates data mining Web services into the enterprise business system. Its structure is shown in Fig. 1.

[Figure 1: three-layer diagram. Service level: Apriori association rule service, ID3 service, other services. System level: business system, ERP system, other systems. Database level: database data, XML data, data of other structures.]

Fig. 1. Web service framework
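The layering in Fig. 1 can be illustrated with a small in-process sketch. It is not the authors' implementation and it does not expose real Web service endpoints: the class `MiningFramework`, its methods, and the loader and service names are all hypothetical, chosen only to show how a system-level caller might pair a database-level data source with a service-level mining routine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]                   # one row, whatever the source format
Loader = Callable[[], Iterable[Record]]   # database level: yields records
Miner = Callable[[List[Record]], object]  # service level: consumes records


@dataclass
class MiningFramework:
    """Hypothetical registry that wires data loaders to mining services."""

    loaders: Dict[str, Loader] = field(default_factory=dict)
    services: Dict[str, Miner] = field(default_factory=dict)

    def register_loader(self, name: str, loader: Loader) -> None:
        self.loaders[name] = loader

    def register_service(self, name: str, service: Miner) -> None:
        self.services[name] = service

    def run(self, source: str, service: str) -> object:
        """System level: apply one named service to one named data source."""
        records = list(self.loaders[source]())
        return self.services[service](records)


# Invented example data and a trivial stand-in "service" that counts rows.
framework = MiningFramework()
framework.register_loader(
    "orders_db",
    lambda: [
        {"order": "1", "book": "33"},
        {"order": "2", "book": "33"},
        {"order": "2", "book": "11"},
    ],
)
framework.register_service("count_rows", len)
result = framework.run("orders_db", "count_rows")
```

In this sketch an ID3 or Apriori routine would be registered in the same way as `count_rows`, and an XML or other loader would be registered alongside `orders_db`, so that the calling business system depends only on names.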

2.1. ID3 algorithm

An ID3 algorithm works as follows. Suppose T = PE ∪ NE, where PE is the set of positive examples and NE is the set of negative examples, with p = |PE| and n = |NE|. An example e is determined to belong to PE with probability p/(p+n) and to NE with probability n/(p+n). By employing the information-theoretic heuristic, a decision tree is considered as a source of a message, PE or NE, with the expected information needed to generate this message given by

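$$I(p, n) = \begin{cases} -\dfrac{p}{p+n}\log_2\dfrac{p}{p+n} - \dfrac{n}{p+n}\log_2\dfrac{n}{p+n}, & \text{when } p \neq 0 \text{ and } n \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$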

If attribute X with value domain {v1, v2, ..., vN} is used for the root of the decision tree, it will partition T into {T1, T2, ..., TN}, where Ti contains those examples in T that have value vi of X. Let Ti contain pi examples of PE and ni examples of NE. The expected information required for the sub-tree for Ti is I(pi, ni). The expected information required for the tree with X as the root, EI(X), is then obtained as a weighted average:

$$EI(X) = \sum_{i=1}^{N} \frac{p_i + n_i}{p + n}\, I(p_i, n_i) \qquad (2)$$

where the weight for the i-th branch is the proportion of the examples in T that belong to Ti. The information gained by branching on X, G(X), is therefore

$$G(X) = I(p, n) - EI(X) \qquad (3)$$

ID3 examines all candidate attributes, chooses X to maximize G(X), constructs the tree, and then uses the same process recursively to construct decision trees for the residual subsets T1, ..., TN. For each Ti (i = 1, ..., N): if all the examples in Ti are positive, it creates a "YES" node and halts; if all the examples in Ti are negative, it creates a "NO" node and halts; otherwise, it selects another attribute in the same way as described above.
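The attribute selection step can be made concrete with a short sketch. It computes I(p, n), the expected information EI(X) and the gain G(X) as defined in Eqs. (1)-(3), assuming base-2 logarithms. The record layout, the field names `discount`, `payment` and `label`, and the toy data are assumptions made for illustration only; they are not taken from the case study.

```python
from collections import Counter
from math import log2


def info(p: int, n: int) -> float:
    """I(p, n) from Eq. (1); zero when either class is empty."""
    if p == 0 or n == 0:
        return 0.0
    total = p + n
    return -(p / total) * log2(p / total) - (n / total) * log2(n / total)


def gain(examples, attribute, label="label", positive="yes"):
    """G(X) = I(p, n) - EI(X) for one candidate attribute X (Eqs. 2 and 3)."""
    p = sum(1 for e in examples if e[label] == positive)
    n = len(examples) - p
    # Count positive and negative examples inside each partition T_i.
    pos, neg = Counter(), Counter()
    for e in examples:
        (pos if e[label] == positive else neg)[e[attribute]] += 1
    expected = sum(
        (pos[v] + neg[v]) / (p + n) * info(pos[v], neg[v])
        for v in set(pos) | set(neg)
    )
    return info(p, n) - expected


# Hypothetical toy data: two candidate attributes and a satisfaction label.
examples = [
    {"discount": "high", "payment": "card", "label": "yes"},
    {"discount": "high", "payment": "cash", "label": "yes"},
    {"discount": "low", "payment": "card", "label": "no"},
    {"discount": "low", "payment": "cash", "label": "no"},
]
best = max(["discount", "payment"], key=lambda a: gain(examples, a))
```

On this toy data `discount` separates the two classes completely, so its gain equals I(2, 2), while `payment` leaves every partition evenly mixed and gains nothing. A full ID3 implementation would place `best` at the root and recurse on each partition.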

2.2. Apriori association rule algorithm

Agrawal and Srikant proposed the Apriori association rule algorithm [8] to discover meaningful itemsets and construct association rules for market analysis. The following is a formal statement of the problem. Let I = {i1, i2, i3, ..., iItemNo} be a set of ItemNo distinct literals, called items. In general, a set of items is called an itemset. The number of items in an itemset is the length of the itemset. An itemset of length k is referred to as a k-itemset. Let D be a set of variable-length transactions, where each transaction T is a set of items such that T ⊆ I. Associated with each transaction is a unique identifier, referred to as its TID. |D| is the number of records in database D. A transaction T is said to support an itemset X ⊆ I if it contains all items of X, i.e. X ⊆ T. The fraction of the transactions in D that support X is called the support of X, denoted support(X). An itemset is large if its support is above some user-specified minimum support threshold, denoted MinSup. An association rule is an implication of the form R: X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The support for rule R is defined as support(X ∪ Y). A confidence factor, defined as support(X ∪ Y)/support(X), is used to evaluate the strength of such association rules.
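As a minimal sketch of these definitions, the following functions compute support(X) and the confidence of a rule X ⇒ Y over a small transaction set. The function names and the transaction data are hypothetical; each transaction is assumed to be the set of book ids in one order.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X) for the rule X => Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Invented order data: four transactions over book ids 11, 22, 33 and 44.
D = [frozenset(t) for t in ([11, 33], [11, 33, 44], [33, 44], [11, 22])]

rule_support = support({11, 33}, D)          # two of the four orders
rule_confidence = confidence({11}, {33}, D)  # two of the three orders with 11
```

Here the rule {11} ⇒ {33} has support 0.5 and confidence 2/3: book 11 appears in three orders, and two of those also contain book 33.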

3. A case study

To illustrate how such a Web service framework can contribute to constructing e-business applications, we present a case study: we use an example SQL Server 2005 database of book trade information to test the two data mining services in our framework.

3.1. ID3 web service model

In the case study, we apply the ID3 Web service model to analyze consumers' degree of satisfaction. The factors influencing satisfaction are separated into six items: consumer age, purchase fund, book discount, carriage, payment mode and education degree. The analysis provides a reference for the bookshop's decision-making, with the aim of raising consumers' satisfaction and purchase desire and shaping their purchase behavior. The influence factors are shown in Table 1.

Table 1. Influence factor

Influence factor              Values
Consumer age                  0-16, 17-25, 26-34, 35-43, more than 44
Purchase fund                 Very abundant, abundant, commonly, little
Carriage and book discount    1-2 days (high), 3-4 days (middle), 5-7 days (low)
Payment mode                  Cash on delivery, payment treasure, internet banking, credit card, rechargeable card
Education degree              Under high school, high school, undergraduate course, undergraduate course upwards

3.2. Apriori web service model

In the case study, the application process of the Apriori Web service model is as follows:

(1) Load the data into the testing database. By scanning the data, the Apriori model obtains all the book information in each order form.

(2) Compute the purchasing frequency of every book, and remove the books whose frequency is less than the minimum support.

(3) Combine the remaining books into pairs, compute how often the two books of each pair are purchased together, and again remove the pairs whose frequency is less than the minimum support.

(4) From the results of (2) and (3), form groups of three books, continue scanning the data, and remove the groups whose frequency is less than the minimum support.

(5) Combine the results of (3) and (4) based on the two monotonicity principles of the Apriori algorithm.
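The steps above can be sketched as a level-wise search. This is a generic illustration of the Apriori idea, not the code of the case study's Web service: the function `apriori`, the absolute threshold `min_count` (a purchase count rather than a fraction) and the order data are assumptions. The pruning step relies on the monotonicity property that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets bought at least min_count times."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k >= min_count}

    # Steps (1)-(2): scan the orders and keep the frequent single books.
    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    size = 2
    while frequent:
        # Steps (3)-(4): join frequent (size-1)-itemsets into size-itemsets.
        candidates = {
            a | b for a in frequent for b in frequent if len(a | b) == size
        }
        # Monotonicity pruning: drop candidates with an infrequent subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        size += 1
    return result


# Invented orders; the book ids echo the case study but the counts do not.
orders = [[11, 33, 44], [11, 33, 44], [11, 33], [33, 44], [22, 55]]
frequent = apriori(orders, min_count=2)
```

With `min_count=2`, books 22 and 55 are removed in the first pass, all three pairs of the remaining books survive, and the triple {11, 33, 44} is kept because it occurs in two orders.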

The results of steps (2), (3) and (4) are shown as histograms in the following figures.

[Figures 2 and 3: histograms; vertical axis: sales volume; horizontal axis: BookID.]

Fig. 2. Sales volume top 3

Fig. 3. Association degree of two books

Fig. 2 shows the sales volume of the top three books; for example, the book with id 33 was sold six times. The association degree of two books is shown in Fig. 3, from which we can see that the books with ids 33 and 11 were bought together 5 times. The association degree of three books is shown in Fig. 4: the books with ids 44, 11 and 33 were sold together 3 times, which is the largest association degree among the books with ids 11, 22, 33, 44 and 55. The model can also analyze a specified book; the association degree of id 11 is shown in Fig. 5.

4. Conclusions

To summarize, this paper proposes a framework to address the key question of how to take advantage of the flexible, standards-based capabilities of Web services to improve the efficiency and reliability of business integration solution development. A case study shows that the framework is feasible.

[Figures 4 and 5: histograms; horizontal axis: BookID. Recoverable axis labels: 44,11,33; 55,11,33; 55,33,44; 33,11,22; 44,11,22 and 11,33; 11,44.]

Fig. 4. Association degree of three books

Fig. 5. Association degree of id 11

Acknowledgements

This paper is supported by the Natural Science Foundation of Zhejiang Province (No. Y1110995) and by the National Natural Science Foundation of China under Grant No. 60903053.

References

[1] S. Graham, S. Simeonov et al., Building Web services with Java, SAMS, 2001.

[2] J. Hagel III, J.S. Brown, Your next IT strategy, Harvard Business Review October (2001).

[3] UDDI.org, UDDI Version 2.0 API Specification, UDDI open draft specification, http://www.uddi.org/specification.html, June 2001.

[4] W3C, Web Services Description Language (WSDL) Version 1.1, W3C standard, http://www.w3.org/TR/wsdl, March 2001.

[5] W3C, SOAP Version 1.2 Part 0-2, W3C working draft, http://www.w3.org/TR/soap12-part0, December 2001.

[6] Xinyu Shao, Guojun Zhang, Peigen Li, Yubao Chen, Application of ID3 algorithm in knowledge acquisition for tolerance design, Journal of Materials Processing Technology 117 (2001), pp. 66-74.

[7] Yuh-Jiuan Tsay, Jiunn-Yann Chiang, CBAR: an efficient method for mining association rules, Knowledge-Based Systems 18 (2005), pp. 99-105.

[8] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proceedings of the 1994 International Conference on VLDB, 1994, pp. 487-499.