
Procedia Computer Science 55 (2015) 1331-1339. doi:10.1016/j.procs.2015.07.117

Information Technology and Quantitative Management (ITQM 2015)

Towards Integrated Study of Data Management and Data Mining

Zhengxin Chen

Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182-0500, USA

e-mail: zchen@unomaha.edu

Abstract

From the very beginning, research and practice of database management systems (DBMSs) have centered on handling granulation and granularities at various levels, thus sharing common interests with granular computing (GrC). Although DBMS and GrC have different focuses, the advent of Big Data has brought these two research areas closer to each other, because Big Data requires an integrated study of data storage and analysis. In this paper, we explore this issue. Starting with an examination of granularities from a database perspective, we discuss new challenges of Big Data. We then turn to data management issues related to GrC. As an example of possible cross-fertilization of these two fields, we examine the recent development of database keyword search (DBKWS). Even though research in DBKWS is largely independent of GrC, DBKWS has to handle various issues related to granularity handling. In particular, aggregation of DBKWS results is closely related to studies of granularities and granulation, which echoes L. Zadeh's famous formula: Granulation = Summarization. We present our proposed approach, termed extended keyword search, which illustrates that an integrated study of data management and data mining/analysis is not restricted to GrC or rough set theory.

© 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Organizing Committee of ITQM 2015

Keywords: Big Data; granular computing; granularity; databases; rough set theory; database keyword search (DBKWS)

1. Introduction

In a broad sense, granular computing (GrC) [10] is the general term for any computing theory/technology that involves elements and granules, with granule, granulated view, granularity, and hierarchy as its key concepts. In GrC, the process of forming granules is referred to as granulation. Since granules have structure, the term granularity is used to refer to the collective properties of granules at a certain level of the granular structure. By focusing on granulation and granularities in an abstract manner, GrC shares some common interests with database management systems (DBMSs), because a DBMS is about storage and retrieval of structured data (at the front end) and maintenance of such data (at the back end) at various levels of granularity.


Yet GrC and DBMS are two very different camps, largely because GrC is about inference from data, which is different from the main interest of management of data (as DBMSs do). However, with the advent of Big Data, the separation between these two fields is rapidly becoming questionable. Of course, each field will thrive in its own right in the Big Data era, but the relationship between the two deserves much attention. In this paper, we examine some aspects of this relationship. In particular, we aim to examine GrC from a DBMS perspective. Continuing this author's previous examination of the philosophical foundation of GrC based on its past, this paper is intended to address two important aspects for the future of GrC: we discuss the importance of data management for GrC by examining how to deal with granularities and storage issues from a database perspective, calling for an integrated study of storage and mining to handle the challenges of Big Data. In addition, we point out the advantage of studying the dynamic nature of granulation and granularity in Big Data management, and use recent studies related to aggregate database keyword search (aggregate DBKWS) to illustrate how Zadeh's famous formula, Granulation = Summarization, can be realized.

The balance of this paper is organized as follows. We start with a discussion of how granularities are handled in traditional DBMSs, followed by an examination of opportunities and challenges of Big Data, with a focus on an integrated study of storage and mining of data (Section 2). We then discuss GrC in the Big Data era (Section 3), including our observations on important and interesting developments from Infobright. In Section 4 we examine the case of aggregate DBKWS and discuss its implications. We wrap up our discussion in Section 5, where we summarize our findings and offer several suggestions.

2. Granulation handling and data management

2.1. Granulations in traditional database management

In order to guarantee consistency, DBMSs impose additional requirements beyond granularities, such as various forms of integrity constraints. The DBMS offers a unique opportunity for studying granulation and granularity handling that is not available elsewhere, because it reveals the various kinds of granularities we have to deal with throughout all DBMS functionalities. These include:

Structural granularity for data modeling: This is the most visible form of granularity. For example, we can talk about tuples or relations in relational databases. As a kind of metadata, the database schema imposes properties that must be satisfied at a certain granularity level. Normal forms impose additional constraints among attributes.

Operational granularity in database processing: Granularities are not restricted to data storage, but are also related to how data and queries are processed. Decades of research and practice in query processing and transaction processing revolve around various forms of granularities in operations. For example, in transaction processing, the granularities involved are not only transactions as a whole, but also other forms of granularities such as the individual instructions within each transaction. These instructions are the granularities used by the transaction manager to form a serializable schedule. The granularities handled by the lock manager are various kinds of data items, as illustrated in various concurrency control protocols.

Constructed or temporary granularity: This may be the richest form of granularity in database management. First of all, relational algebra (RA) operations (or equivalent SQL queries) on relational tables are closed operations - i.e., the results are still relations, but they are constructed temporary tables, where both the schema and the tuples in the instance are constructed on the fly according to the specifications of the query. The resulting tuples are constructed (as well as temporary) granularities. Various data structures employed in database internal processing, such as the B+-tree used for indexing physical data, are also constructed granularities, but they are not temporary.

In addition, we have to consider interactions between different kinds of granularities, for example, relational tables and physical blocks. This concerns file structure and the conceptual-to-physical level mapping.

Parallel and distributed database systems [15] offer additional opportunities for studying how granularities are handled. For example, in interquery parallelism, queries/transactions execute in parallel with one another; here queries are coarse granularities in query processing. In order to achieve a higher level of parallelism, intraquery parallelism can be used, which consists of two subtypes. Interoperation parallelism speeds up processing by parallelizing the execution of different RA operations, so the granularity level is that of RA operations. This is in contrast to intraoperation parallelism, which speeds up processing by parallelizing the execution of each individual RA operation. In this case, the operational granularity is the partition of the data allocated to the processor used by that particular RA operator.
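To make intraoperation parallelism concrete, the following minimal sketch (an illustration under assumed names and data, not taken from [15]) parallelizes a single SUM-by-key aggregation by hash-partitioning its input rows across worker processes; each partition is exactly the operational granularity discussed above.

from collections import defaultdict
from multiprocessing import Pool

def partial_sum(rows):
    # Aggregate one partition locally; this runs on a single worker.
    acc = defaultdict(float)
    for key, value in rows:
        acc[key] += value
    return dict(acc)

def parallel_sum(rows, n_partitions=4):
    # Hash-partition the rows, aggregate each partition in parallel, then merge.
    partitions = [[] for _ in range(n_partitions)]
    for key, value in rows:
        partitions[hash(key) % n_partitions].append((key, value))
    with Pool(n_partitions) as pool:
        partials = pool.map(partial_sum, partitions)
    merged = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            merged[key] += value
    return dict(merged)

if __name__ == "__main__":
    data = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0)]
    print(parallel_sum(data))  # {'a': 4.0, 'b': 2.0, 'c': 4.0} (key order may vary)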

In summary, it is fair to say that data management is about granulation handling. In particular, we have to handle granularities at various levels, and some of them may be dynamically generated. In the rest of the paper, we will further discuss how to handle the granulation process and granularities in the Big Data era. We will also discuss granularity construction through the example of database keyword search. Examining the granulation process in data management can pave the way for an integrated study of data management (including storage and OLTP queries) and analysis (including OLAP queries and various forms of reasoning), which is crucial for understanding today's Big Data.

2.2. Managing today's data: The Big Data way


2.3. Mining and Analysis of Massive Data

Although many data mining algorithms grew out of machine learning and statistical methods, these early methods only considered in-memory data. Reference [6] provided many examples of scaling up various data mining algorithms, including those revised from earlier algorithms developed in machine learning. One popular technique is to make use of various sampling techniques. Another useful technique is to keep certain summary statistics or other forms of metadata, as the BIRCH algorithm [29] does for scalable clustering. But even with sampling, it is not realistic to assume that all needed data can always reside in main memory at the same time. BIRCH can be considered an ancestor of several more advanced studies in cluster analysis, as shown in [14], which provides an excellent overview of storage issues for mining massive data, with a focal concern on how to exploit main memory storage.
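As a concrete illustration of the kind of summary statistics BIRCH maintains, the sketch below keeps a clustering feature CF = (N, LS, SS) per subcluster: the point count, the per-dimension linear sum, and the sum of squared norms. The class and example data are our own simplification; the point is that centroids and radii can be derived, and CFs merged, without rereading the raw points.

import math

class ClusteringFeature:
    # CF = (N, LS, SS): count, linear sum per dimension, sum of squared norms.
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        # Two subclusters can be combined from their CFs alone.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Average distance to the centroid, derivable from (N, LS, SS) alone.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

cf = ClusteringFeature(dim=2)
for p in [(1.0, 2.0), (2.0, 2.0), (3.0, 4.0)]:
    cf.add(p)
print(cf.centroid(), cf.radius())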

The discussion in this section serves as the backdrop for conducting analysis on today's data, particularly Big Data. Of course, all kinds of techniques developed in data mining can be applied, but for the purpose of this paper we will focus below only on how GrC can be used to perform analysis in this Big Data environment.

3. Granular computing at Big Data era

We are now ready to discuss several important aspects of GrC and Big Data. As generally agreed by researchers and practitioners, Big Data is more concerned with analysis (which relies on reasoning) of the data than with the retrieval of individual pieces of data. Therefore, managing today's data is no longer restricted to storage and retrieval. On the other hand, for granular computing to be useful for today's Big Data, it has to respect how today's data is actually stored and managed.

3.1. Complexity related to GrC and rough set theory and new challenges from Big Data

A rough set [12] is a formal approximation of a conventional set, using a pair of sets as the lower and the upper approximations of the original set. Rough sets provide a single-layered granulation structure of the universe. Even though GrC is not about storage and retrieval, it is unrealistic to assume that the datasets to be analysed by rough set theory, or any other GrC theory, can be stored entirely in main memory.
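For readers less familiar with the rough-set vocabulary, the following small sketch (our own illustrative example, with made-up attributes) computes the lower and upper approximations of a target set: objects are partitioned into equivalence classes (granules) by their attribute values; classes fully contained in the target form the lower approximation, and classes that merely intersect it form the upper approximation.

from collections import defaultdict

def approximations(objects, attributes, target):
    # objects: dict id -> dict of attribute values; target: set of object ids.
    classes = defaultdict(set)
    for oid, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(oid)
    lower, upper = set(), set()
    for granule in classes.values():
        if granule <= target:
            lower |= granule      # granule lies entirely inside the target set
        if granule & target:
            upper |= granule      # granule overlaps the target set
    return lower, upper

objs = {1: {"color": "red", "size": "S"},
        2: {"color": "red", "size": "S"},
        3: {"color": "blue", "size": "L"}}
print(approximations(objs, ["color", "size"], target={1, 3}))
# lower = {3}, upper = {1, 2, 3}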

Due to the importance of reducts (minimal subsets of condition attributes that preserve the discernibility of the full attribute set), for decades researchers have made many efforts to develop efficient algorithms (including recent studies such as [7,20,22]). Many methods have been proposed to generate reducts, and the most popular technique used in these methods is the discernibility matrix, whose rows and columns are indexed by the classes and whose entries contain the condition attributes that can be used to discern between the classes in the corresponding row and column. Although the algorithms vary, their time complexity is usually O(c·n²) (where c is the number of attributes and n is the number of rows in the "information system" or decision table), and the space complexity is O(n²). Because n is large in real-world data sets, the algorithms are time consuming, and may be quite challenging even with appropriate database storage and effective sampling techniques.
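The sketch below (again a toy example of our own, not any particular algorithm from [7,20,22]) shows where these bounds come from: building the discernibility matrix requires one scan over all attributes for every pair of rows with different decisions, giving O(c·n²) time, and in the worst case one attribute set per pair, giving O(n²) space.

def discernibility_matrix(rows, condition_attrs, decision_attr):
    # rows: list of dicts mapping attribute names to values.
    n = len(rows)
    matrix = {}
    for i in range(n):
        for j in range(i + 1, n):               # O(n^2) pairs of rows
            if rows[i][decision_attr] == rows[j][decision_attr]:
                continue                        # same decision: nothing to discern
            differing = {a for a in condition_attrs        # O(c) work per pair
                         if rows[i][a] != rows[j][a]}
            matrix[(i, j)] = differing
    return matrix

table = [{"a": 1, "b": 0, "d": "yes"},
         {"a": 1, "b": 1, "d": "no"},
         {"a": 0, "b": 1, "d": "yes"}]
print(discernibility_matrix(table, ["a", "b"], "d"))
# {(0, 1): {'b'}, (1, 2): {'a'}}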

3.2. Integrated storage and reasoning: The Infobright approach

Various attempts have been made to improve the computational complexity of GrC and rough set theory. Yet so far the best discussion of data storage management for a rough set theory implementation comes from Infobright [16,17], which described the use of the paradigms of rough sets and granular computing in the core components of Infobright's database engine. With data stored in the form of compressed blocks of attribute values, the system's query execution methods utilize compact information about those blocks' contents instead of brute-force data decompression. Algorithms were developed to minimize the need to access the compressed data in operations such as filtering, joining and aggregating.
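The following sketch conveys the general idea (it is our simplification, not the actual Infobright engine): each compressed block of an attribute keeps compact metadata, here just the minimum and maximum value, and for a range filter the blocks are classified in rough-set style as irrelevant, fully relevant, or "suspect"; only the suspect blocks need to be decompressed and scanned row by row.

def classify_blocks(block_stats, low, high):
    # block_stats: list of (block_id, min_value, max_value) for one attribute.
    irrelevant, relevant, suspect = [], [], []
    for block_id, bmin, bmax in block_stats:
        if bmax < low or bmin > high:
            irrelevant.append(block_id)   # no row in the block can satisfy the filter
        elif low <= bmin and bmax <= high:
            relevant.append(block_id)     # every row satisfies it; no decompression needed
        else:
            suspect.append(block_id)      # must decompress and check individual rows
    return irrelevant, relevant, suspect

stats = [("pack1", 0, 40), ("pack2", 50, 80), ("pack3", 30, 70)]
print(classify_blocks(stats, low=45, high=90))
# (['pack1'], ['pack2'], ['pack3'])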

Incorporating basic considerations of rough set theory, Infobright's approach underpins a new kind of analytic database engine. It implements a form of adaptive query processing and automates the task of physical database design [8]. By leveraging the MySQL pluggable storage engine architecture and employing various internal mechanisms based on data compression, columnar data storage, and adaptive query processing, both Infobright's open source ICE edition and its commercial IEE edition can run ad hoc analytic queries against terabytes of data [18,19]. As noted, rough set based algorithms and similar techniques can be applied to improve database performance by employing automatically discovered dependencies to better deal with query conditions (thus supporting inductive query optimization). In addition, from the available information, it is possible to calculate rough approximations of the data needed to resolve queries and to assist the database engine in accessing relevant data. Another interesting development at Infobright is Rough SQL, as exemplified in a recent article [8].

4. Database keyword search: An examination from the perspective of granularity

Integrated study of data management and data mining/analysis is of course not restricted to GrC or rough set theory. As an example of possible cross-fertilization of these two fields, we examine the recent development of database keyword search (DBKWS).

4.1. DBKWS overview

In the last decade, database keyword search (DBKWS) has become an active research area. By allowing users to enter keywords for database access, DBKWS not only relieves users of the burden of writing SQL queries, but also offers the potential of unifying access to various forms of data (including unstructured data as in information retrieval, semistructured data such as XML, and structured data as in DBMSs). Typically, a DBKWS system (e.g. DBXplorer [1]) takes user-provided keywords as input, constructs SQL queries, executes the queries, and returns the results to the user (a minimal sketch of this pipeline appears after the list below). Although research in DBKWS has been largely independent of GrC, it has to handle various issues related to granularities. For example, two typical approaches used by DBKWS are based on a schema graph or a data graph [26], which represent granularities at two different levels (i.e., the table schema level or the tuple level). In addition, various intermediate data structures are usually needed, which illustrates dynamically constructed granularities in problem solving. A number of interesting aspects of DBKWS emerge from a perspective of granularities, such as:

• User requested granule construction;

• Different complexities at different granularity levels; e.g., schema graph vs. data graph;

• Aggregation for better understanding of data; and

• Human-centered information processing (users in the loop).
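As promised above, here is a deliberately simplified sketch of the keyword-to-SQL pipeline (the symbol table, table names and column names are hypothetical; real systems such as DBXplorer [1] use much richer symbol tables and candidate-network generation, and we omit join conditions entirely).

def keyword_to_sql(keywords, symbol_table):
    # symbol_table: keyword -> (table, column); a toy stand-in for DBXplorer's symbol table.
    matches = [(kw, *symbol_table[kw]) for kw in keywords if kw in symbol_table]
    tables = sorted({t for _, t, _ in matches})
    where = " AND ".join(f"{t}.{c} LIKE '%{kw}%'" for kw, t, c in matches)
    return f"SELECT * FROM {', '.join(tables)} WHERE {where}"

# Hypothetical example where both keywords happen to fall in one table.
symbols = {"optimization": ("Paper", "title"), "SIGMOD": ("Paper", "venue")}
print(keyword_to_sql(["SIGMOD", "optimization"], symbols))
# SELECT * FROM Paper WHERE Paper.venue LIKE '%SIGMOD%' AND Paper.title LIKE '%optimization%'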

A further examination of granularity in DBKWS can lead to a discussion of the results of XML keyword search, where ranking of the results has to be conducted at the XML element level rather than the XML document level. We will not explore this issue further here.

4.2. DBKWS and aggregation

Although in general the main interest of DBKWS may not be relevant to GrC, considerations related to granularity have appeared in DBKWS, as illustrated in the early prototype system DBXplorer [1], where the pros and cons of column-level versus cell-level granularities were weighed for the symbol table implementation.

As indicated by [13], a problem with DBKWS is that it becomes very difficult for users to obtain any valuable information beyond individual interconnected tuple structures. Such giant or fragmentary results have necessitated new research on better granularities through various forms of aggregation or summarization. This direction of research echoes the famous simple formula given by Zadeh: Granulation = Summarization [28].

An increasing number of authors have tried to tackle this problem with various approaches. A direct attempt is to deal with aggregation. References [30,31] considered answering aggregate keyword queries on relational databases using minimal group-bys, but an obvious limitation of that approach is that the authors only consider one table, leaving plenty of room for improvement, such as merging related joining tuples from multiple tables into a single tuple to reduce redundancy in the results.
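To make the notion of an aggregate keyword query concrete, the runnable sketch below (a hypothetical single-table schema of our own, not the algorithm of [30,31]) shows the flavor of such a query: the keywords select rows, and a group-by produces the summarized answer.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Paper (venue TEXT, year INT, topic TEXT)")
conn.executemany("INSERT INTO Paper VALUES (?, ?, ?)",
                 [("SIGMOD", 2014, "query optimization"),
                  ("SIGMOD", 2015, "query optimization"),
                  ("VLDB",   2015, "query optimization"),
                  ("VLDB",   2015, "data integration")])

# The keywords {query, optimization} filter the rows; grouping by venue summarizes them.
rows = conn.execute("""
    SELECT venue, COUNT(*) AS papers
    FROM Paper
    WHERE topic LIKE '%query%' AND topic LIKE '%optimization%'
    GROUP BY venue
""").fetchall()
print(rows)  # [('SIGMOD', 2), ('VLDB', 1)]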

The good news is that aggregation can be done at a more advanced level. Based on the observation that OLAP tools provide elaborate query languages that allow users to group and aggregate data and explore interesting trends and patterns in it, [25] introduced Keyword-Driven Analytical Processing (KDAP) to combine intuitive keyword search with the power of OLAP. KDAP can handle both categorical and numerical data (but not measures as in the fact table of a star schema), and is able to find exceptions or surprises in the data and identify bellwether regions where local aggregations are highly correlated with global aggregates.

One way to summarize the results other than direct aggregation is to compute structural statistics, as done by [13]. An RDB is viewed as a large directed graph where nodes represent tuples and edges represent links among tuples. Instead of using individual tuples as the members to be grouped, rooted subgraphs are used to represent interconnected tuple structures in which some of the tuples contain keywords. Two rooted subgraphs are grouped into the same group if they are isomorphic based on "dimensional keywords." A query such as "Which conference is good for SQL query optimization?" can be formed as {Conference, SQL, query, optimization}, where Conference is underlined to denote a dimensional keyword. The result could be something like {(SIGMOD, 340.1), (VLDB, 274.5), ...}, where the numbers indicate calculated scores.

Instead of statistics, other forms such as object summaries can be produced to summarize the results of keyword search. Reference [4] introduced what the author called "a novel keyword searching paradigm," which supports automated generation of object summaries from relational databases. In this approach, the data graph is traversed starting from a tuple containing the keyword kw (denoted as tDS), and the traversal continues to neighbouring tuples as long as the data traversed is relevant to tDS. But object summaries (OSs) may still be quite big, so the concept of a size-l OS was further proposed, together with effective and efficient algorithms [5].

The study of aggregation is also related to multifaceted search, where new levels of granularity can be constructed dynamically, as illustrated in [21].

4.3. The STRUCT approach for extended DBKWS

Although keyword search is convenient for users and has the potential of providing a unified approach for retrieval of information in different formats (including databases, documents, Web pages, etc.), it is not problem-free, because keywords are usually out of context, and there could be numerous ways to explain how the keywords are related to each other and how users will use them to satisfy their information needs. Naturally, in parallel to recent research on database keyword search (DBKWS) and XML keyword search, there has been another direction of research, namely, translating (or mapping) natural language (e.g., English) queries to SQL (or XQuery) queries, as exemplified in [9]. Note that although both of these directions of research aim at relieving users of the burden of database access, the basic ideas behind them are completely different: keyword search completely ignores the context of the keywords, whereas natural language translation bypasses keyword search altogether. One may claim that natural language translation is superior to keyword search, but keyword search is not without merit, because it is a very common way to express users' information needs, as widely used in information retrieval (IR) and Web search.

In fact, it is highly unrealistic to require natural language queries as the dominant mode for database access, for very obvious reasons. If writing SQL queries is a burden for many naive database users who want to access the data, then requiring queries to be written in natural language could be an even more significant burden for some database users, particularly for those who are not native speakers of that language. After all, although keyword search has obvious shortcomings, it is still the most explicit way for a user to express his or her information need.

The root of GrC can be traced to Zadeh's early statement which claimed that fuzzy information granulation in an intuitive form underlies human problem solving [27]. This human-centered consideration has inspired us to propose our own approach, extended keyword search: if we offer appropriate interfaces that allow users to submit English-sentence queries (rather than keywords alone), the intention of the user can be better interpreted through the context of the English statement. The basic idea behind our search engine is to extract keywords from the context of the English sentence and to convert the given English query into equivalent SQL queries. As a result, users are no longer confined to conjunctive, disjunctive, and negative semantics, because the system also provides the flexibility of including frequently used aggregate functions (SUM, AVG, COUNT, MIN, MAX) in the English query the user composes. An example of such a query is:

Q. Display the list of computer science students having an average score (over all subjects) greater than 80.

The SQL query generated by STRUCT is:

SELECT students, AVG(score)
FROM Student
WHERE branch = 'computer science'
GROUP BY students
HAVING AVG(score) > 80

Basics of the STRUCT approach can be found in [11].
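To convey the flavor of such a translation, the toy sketch below (our own illustration, not the STRUCT implementation of [11]) detects an aggregate cue word in the English query, maps it to a SQL aggregate function, and emits the corresponding GROUP BY/HAVING clauses; the cue-word list, schema and attribute matching are all hypothetical simplifications.

import re

AGGREGATE_CUES = {"average": "AVG", "total": "SUM", "number of": "COUNT",
                  "maximum": "MAX", "minimum": "MIN"}

def english_to_sql(query, table, group_col, measure_col):
    # Pick the aggregate function from a cue word and the threshold from "greater than <n>".
    func = next((f for cue, f in AGGREGATE_CUES.items() if cue in query.lower()), "COUNT")
    threshold = re.search(r"greater than (\d+)", query.lower())
    sql = f"SELECT {group_col}, {func}({measure_col}) FROM {table} GROUP BY {group_col}"
    if threshold:
        sql += f" HAVING {func}({measure_col}) > {threshold.group(1)}"
    return sql

q = "Display the list of computer science students having average score greater than 80"
print(english_to_sql(q, "Student", "students", "score"))
# SELECT students, AVG(score) FROM Student GROUP BY students HAVING AVG(score) > 80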

We summarize our discussion of DBKWS in this section as follows. As a subfield of database management systems, the study of DBKWS has been largely independent of GrC. Yet there are some overlapping interests between DBKWS and GrC, as indicated in our examination of aggregation. To further facilitate human-centered information processing for integrated data retrieval and analysis, we introduced our proposed STRUCT approach. Although STRUCT by itself does not directly serve the purpose of data mining, it reinforces the role of users in human-centered information processing involving granulation.

5. Conclusion: Making GrC the driving force for integrated study of storage and analysis of Big Data

In this paper, we have examined both traditional data and Big Data from a database perspective. Although research on granularities and granulation in the GrC community has long focused mainly on reasoning, recent progress as exemplified by Infobright has indicated an interesting new trend of integrated study of data management and data analysis. One objective of this paper is to provide a review of related developments and to call for more systematic research on data management and mining/analysis. Our second objective is to show that an integrated study of data management and data mining/analysis is not necessarily restricted to GrC or rough set theory, using database keyword search as an example. Yet GrC can still serve as the driving force for this integrated study, as illustrated by the success of the Infobright approach. More research along this line of thinking is still needed.

While embracing Big Data, we are still dealing with various issues of traditional data. As GrC and DBMS adjust themselves to deal with challenges involving both traditional data and Big Data using their own established approaches, cross-fertilization between the two fields is becoming increasingly inevitable.

Acknowledgements

Research on extended keyword search is joint work with PhD student R. Patil at the Data Science Lab.

References

[1] Agrawal S, Chaudhuri S, Das G. DBXplorer: A system for keyword-based search over relational databases. ICDE 2002: 5-16.
[2] Agneeswaran VS. Big Data: Theoretical, engineering and analytics perspective. In: Srinivasa S, Bhatnagar V (eds.), BDA 2012, LNCS 7678, pp. 8-15, 2012.
[3] Dong XL, Srivastava D. Big Data integration. Tutorial at ICDE 2013 and VLDB 2013.
[4] Fakas GJ. Automated generation of object summaries from relational databases: A novel keyword searching paradigm. ICDE Workshops 2008: 564-567.
[5] Fakas GJ. A novel keyword search paradigm in relational databases: Object summaries. Data Knowl. Eng. 70(2): 208-229, 2011.
[6] Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann, 2012.
[7] Huang L, Liang J, Pan Y, Xian Y. A complete attribute reduction algorithm based on improved FP tree. Proc. Third Pacific-Asia Conference on Circuits, Communications and System (PACCS), 2011.
[8] Kowalski M, Slezak D, Toppin G, Wojna A. Injecting knowledge into RDBMS - compression of alphanumeric data attributes. Proc. ISMIS 2011, LNAI 6804, pp. 386-395.
[9] Li Y, Yang H, Jagadish HV. NaLIX: A generic natural language search environment for XML data. ACM TODS 32(4), 2007.
[10] Lin TY. Granular computing: Structures, representations, and applications. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (Proc. RSFDGrC 2003), LNCS 2639, pp. 16-24, 2003.
[11] Patil R, Chen Z. STRUCT: Incorporating contextual information for English query search on relational databases. SIGMOD KEYS Workshop, 2012.
[12] Pawlak Z. Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, 1991.
[13] Qin L, Yu JX, Chang L. Computing structural statistics by keywords in databases. ICDE 2011.
[14] Rajaraman A, Leskovec J, Ullman JD. Mining of Massive Datasets (2nd ed.), 2012. http://infolab.stanford.edu/~ullman/mmds/book.pdf
[15] Silberschatz A, Korth HF, Sudarshan S. Database System Concepts (6th ed.), 2010.
[16] Slezak D. Rough set approach to scale data processing and mining operations. WIUI 2013.
[17] Slezak D, Synak P, Wroblewski J, Toppin G. Infobright analytic database engine using rough sets and granular computing. GrC 2010: 432-437.
[18] Slezak D, Eastwood V. Data warehouse technology by Infobright. Proc. SIGMOD 2009: 841-845.
[19] Slezak D, Wroblewski J, Eastwood V, Synak P. Rough sets in data warehousing (extended abstract). Proc. RSCTC 2008, LNAI 5306, pp. 505-507.
[20] Tiwari K, Kothari A, Sha R. FPGA implementation of a reduct generation algorithm based on rough set theory, 2013. http://www.irdindia.in/journal_ijaeee/pdfVol2_iss6/9.pdf
[21] Tunkelang D. Dynamic category sets: An approach for faceted search. SIGIR Faceted Search Workshop, 2006.
[22] Wang PC. Highly scalable rough set reducts generation. J. Inf. Sci. Eng. 23: 1281-1298, 2007.
[23] Wikipedia, NewSQL.
[24] Wikipedia, NoSQL.
[25] Wu P, Sismanis Y, Reinwald B. Towards keyword-driven analytical processing. SIGMOD 2007.
[26] Yu JX, Qin L, Chang L. Keyword search in relational databases: A survey. IEEE Data Engineering Bulletin, 2010.
[27] Zadeh LA. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90: 111-127, 1997.
[28] Zadeh LA. Granular computing - computing with uncertain, imprecise and partially true data. Proc. 5th Int'l Symposium on Spatial Data Quality (ISSDQ 2007). http://www.itc.nl/ISSDQ2007/Documents/keynote_Zadeh.pdf
[29] Zhang T, Ramakrishnan R, Livny M. BIRCH: A new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1(2): 141-182, 1997.
[30] Zhou B, Pei J. Answering aggregate keyword queries on relational databases using minimal group-bys. Proc. EDBT 2009.
[31] Zhou B, Pei J. Aggregate keyword search on large relational databases, 2010. https://www.cs.sfu.ca/~jpei/publications/AKS-KAIS.pdf