Scholarly article on topic 'A Compact Bit String Accessibility Map for Secure XML Query Processing'

A Compact Bit String Accessibility Map for Secure XML Query Processing Academic research paper on "Computer and information sciences"

CC BY-NC-ND
Academic journal
Procedia Computer Science
Keywords
{"Access Control" / "Labeling Scheme" / Privacy / "XML Query Processing"}

Abstract of research paper on Computer and information sciences, author of scientific article — Meghdad Mirabi, Hamidah Ibrahim, Nur Izura Udzir, Ali Mamat

Abstract One of the challenging issues in specifying fine-grained access control on XML data is how to implement the accessibility map in a compact format with minimum effect on XML query processing. In this paper, we propose a Compact Bit String Accessibility Map (CBSAM) to implement the accessibility map in a compact format. To achieve secure and efficient XML query processing, the CBSAM is integrated with the region number labeling scheme. The experimental results illustrate that the CBSAM compresses the accessibility map with minimum effect on XML query processing when the access locality among the XML nodes is high.


Available online at www.sciencedirect.com

Procedia Computer Science 10 (2012) 1172 - 1179

International Workshop on Service Discovery and Composition in Ubiquitous and Pervasive Environment (SUPE)

A Compact Bit String Accessibility Map for Secure XML Query Processing

Meghdad Mirabi*, Hamidah Ibrahim, Nur Izura Udzir, Ali Mamat

Department of Computer Science, Faculty of Computer Science, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia


© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [name organizer]

Keywords: Access Control; Labeling Scheme; Privacy; XML Query Processing.

1. Introduction

Access control is the process of determining whether users' requests to a system's resources should be granted or denied. It plays an important role in preventing unauthorized users from accessing private data. In traditional access control mechanisms, access authorizations are specified on whole entities such as tables and views. Therefore, when a user submits a query, all the entities referred to in the query are checked against the access authorizations granted to that user. If one of the entities is inaccessible to the user, the whole query is rejected by the system. This kind of access control is called coarse-grained access control. In contrast, in a fine-grained access control mechanism, access authorizations are specified at a finer granularity, e.g. XML nodes, and each query is answered using the portion of data that is accessible to the user.

* Meghdad Mirabi. Tel.: +6-038-946-6555; fax: +6-038-946-6576. E-mail address: meghdad.mirabi@gmail.com.


1877-0509 © 2012 Published by Elsevier Ltd. doi:10.1016/j.procs.2012.06.169

In general, the accessibility map determines whether a specific subject can access a specific object under a specific action [1]. The accessibility map is often represented as an access control matrix in which the subjects, objects, and actions form the three dimensions. Each element of this matrix is either accessible or inaccessible, depending on whether a specific subject can access a specific object under a specific action. Constructing such an accessibility map supports rapid determination of the accessibility of objects at runtime but incurs higher storage and maintenance costs [1, 2]. Therefore, one of the challenging issues in specifying fine-grained access control for XML data is how to implement the accessibility map in a compact format.

When fine-grained access control is specified on XML data, access authorizations must be checked during XML query processing. Thus, the access control mechanism which checks the accessibility of XML data at runtime must be efficient [3]. This is not generally an issue in relational database systems with coarse-grained access control. In relational database systems, the access authorizations are checked once before processing the query. If all the entities on which a query depends are accessible, the query is processed. Otherwise, it is rejected. However, in XML database systems, each XML query is answered based on the part of the XML data that is accessible to the subject who submits the query.

The aim of this paper is to implement the accessibility map in a compact format with minimum effect on XML query processing. The main contributions of this paper are summarized as follows:

• We propose a Compact Bit String Accessibility Map (CBSAM) to compress the accessibility map with a fine-grained access control on the XML data.

• We integrate the CBSAM with the region number labeling scheme to achieve secure and efficient XML query processing. This integration accelerates access authorization checking during XML querying.

The rest of the paper is organized as follows: in Section 2, previous research on XML access control is reviewed. In Section 3, the CBSAM is described. In Section 4, the process of integrating the CBSAM with the region number labeling scheme is explained. The experimental results are presented in Section 5. Finally, the paper is concluded in Section 6.

2. Related Works

Several studies have addressed specifying and enforcing fine-grained access control on XML data [3-13]. Some of these works [4, 5, 13] define a model to specify fine-grained XML access control, generally using discretionary access control or role-based access control. These studies focus on issues such as the level of granularity at which access authorizations should be specified, the propagation policies, the conflict resolution policies, and the default policies.

Moreover, several mechanisms have been proposed to enforce fine-grained access control on XML data. Some of these works [6, 9] are view-based: they generate and maintain different XML views for different users based on their defined access authorizations. These views are created at compile time and contain the set of XML nodes which the users have permission to access. At runtime, the users' queries are submitted against these views without any further access control enforcement. The problem with view-based XML access control mechanisms is their high maintenance and storage cost, since many views must be created for a large number of users with different access authorizations.

In traditional node filtering mechanisms [4, 5], the access authorizations are determined by labeling the XML nodes with a permission (+) or a denial (-) and then pruning the XML tree based on the associated signs. Such a mechanism therefore requires repeated labeling and pruning of the XML document for each user's query, which degrades query performance at runtime.

In query rewriting access control mechanisms [7, 8, 10, 11], the access authorizations are employed to rewrite potentially unsafe queries into safe ones, which are then evaluated against the original XML dataset. Because the queries must be rewritten at runtime, such a mechanism degrades query performance.

In addition, several works aim to reduce the overall storage cost of the accessibility map while still supporting rapid determination of the accessibility of XML nodes at runtime [1, 2]. The Compressed Accessibility Map (CAM) proposed in [1] compresses the accessibility map in order to reduce the storage space needed for it. In general, the CAM needs to store only a small portion of the accessibility map. Moreover, the Integrated CAM (ICAM) proposed in [2] improves on the original CAM by combining multiple CAMs into one ICAM. The correlation among different actions (e.g. if the action "write" is allowed, the action "read" is automatically allowed as well) is used by the ICAM to compress multiple CAMs into one integrated structure.

3. Compact Bit String Accessibility Map (CBSAM)

In general, an accessibility map can be defined through a function $F: S \times O \times A \rightarrow \{Accessible, Inaccessible\}$ where $S$ is a set of subjects (users or roles), $O$ is a set of objects (XML nodes), and $A$ is a set of actions supported by the system (e.g. read and write).

In the case that subjects can only query the XML data, the accessibility map can be represented by associating a string of bits with each XML node, in which each bit represents the accessibility of the XML node to a specific subject. Note that the number of bits in the bit string is equal to the number of subjects. This is exactly the representation used in the CBSAM. Here, we refer to an XML tree marked according to the accessibility map as a marked XML tree.

Similar to the CAM proposed in [1], we exploit the access locality among the XML nodes in the marked XML tree in order to compact the accessibility map. Access locality among the XML nodes means that XML nodes clustered together have similar accessibility. The access locality can be horizontal, between sibling nodes, or vertical, between ancestor and descendant nodes in a marked XML tree. In contrast to the CAM, which constructs one CAM for each subject, the CBSAM combines the accessibility of XML nodes for multiple subjects into a single structure.

Definition of Authorization Node: An authorization node is a node in the marked XML tree whose accessibility is different from that of its parent node. Note that we assume that the root node of the marked XML tree is an authorization node.

Fig. 1 shows a marked XML tree with three subjects. In Fig. 1, each XML node is divided into three parts where the left, middle, and right parts represent the accessibility of the XML node to the first, second, and third subjects, respectively. The gray rectangles represent accessibility of the XML node to the subject while the white rectangles represent non-accessibility. For example, node "B" is accessible to the first subject but not to the second and third subjects. The CBSAM corresponding to a given marked XML tree is the set of authorization nodes together with their accessibilities, collected when the marked XML tree is traversed in preorder. Therefore, when the descendant nodes of an authorization node have the same accessibility as that node, the CBSAM records only the authorization node. Since there are three subjects in Fig. 1, three bits are needed to represent the accessibility of an XML node in both the accessibility map and the CBSAM. As illustrated in Fig. 1, the CBSAM stores only the five authorization nodes "A", "B", "G", "I", and "J" with their accessibilities while the corresponding accessibility map stores all the XML nodes together with their accessibilities.
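The preorder construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the tree shape is read off the region numbers in Table 2 (a), and the per-node bit strings are an assumption inferred from Table 2 (b) together with the definition of an authorization node (non-authorization nodes share their parent's bits). The names `CHILDREN`, `ACCESS`, and `build_cbsam` are hypothetical.

```python
# Marked XML tree of Fig. 1 (shape inferred from the region numbers in Table 2 (a)).
CHILDREN = {
    "A": ["B", "C", "D"],
    "D": ["E", "F", "G"],
    "G": ["H", "I", "J"],
}

# One bit per subject (first, second, third). Assumed values, inferred from
# Table 2 (b): nodes that are not authorization nodes inherit their parent's bits.
ACCESS = {
    "A": "010", "B": "100", "C": "010", "D": "010", "E": "010",
    "F": "010", "G": "101", "H": "101", "I": "010", "J": "000",
}


def build_cbsam(root):
    """Return [(node, bits)] for the authorization nodes, in preorder."""
    cbsam = []
    stack = [(root, None)]  # (node, accessibility of its parent)
    while stack:
        node, parent_bits = stack.pop()
        bits = ACCESS[node]
        # The root is always an authorization node; any other node is one
        # only if its accessibility differs from its parent's.
        if parent_bits is None or bits != parent_bits:
            cbsam.append((node, bits))
        # Push children in reverse so they are popped in document order.
        for child in reversed(CHILDREN.get(node, [])):
            stack.append((child, bits))
    return cbsam


cbsam = build_cbsam("A")
print(cbsam)
```

Under these assumptions the result contains exactly the five authorization nodes named in the text, so only five bit strings are kept instead of ten.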

Let $|S|$ be the total number of subjects and $|N|$ be the total number of XML nodes in the XML tree. In the case that subjects can only query the XML data, the total number of bits required to represent the accessibility of XML nodes for different subjects in the accessibility map depends on the total number of subjects ($|S|$) and the total number of XML nodes ($|N|$), while the total number of bits required in the CBSAM depends on the total number of subjects ($|S|$) and the total number of authorization nodes in the marked XML tree. The total number of authorization nodes in the marked XML tree depends on the access locality among the XML nodes for different subjects. If the access locality is high, the total number of authorization nodes will be small and, therefore, the total number of bits required to represent the accessibility of XML nodes will be small. In the worst case, however, every XML node is an authorization node and the total number of bits required in the CBSAM is equal to that in the accessibility map. Table 1 compares the overall storage cost of the accessibility map with that of the CBSAM in the worst and best cases. In the real world, we expect the access locality among the XML nodes for different subjects to be strong, which keeps the overall storage cost of the CBSAM low.

Fig. 1 A marked XML tree for multiple subjects

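Table 1. The total number of bits required to represent the accessibility of XML nodes

| Method | Worst Case | Best Case |
|---|---|---|
| Accessibility Map (AM) | $(\lvert S \rvert \times \lvert N \rvert)$ bits | $(\lvert S \rvert \times \lvert N \rvert)$ bits |
| Compact Bit String Accessibility Map (CBSAM) | $(\lvert S \rvert \times \lvert N \rvert)$ bits | $\lvert S \rvert$ bits |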
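As a worked instance of Table 1, consider the marked XML tree of Fig. 1, which has three subjects, ten XML nodes, and five authorization nodes. The count below assumes that the CBSAM cost is the number of subjects times the number of authorization nodes (written $|AN|$ here, a symbol that does not appear in the original), which agrees with the worst and best cases in the table. It counts accessibility bits only and ignores the storage of the Start and End labels.

```latex
\begin{align*}
\text{AM:}    \quad & |S| \times |N|  = 3 \times 10 = 30 \text{ bits} \\
\text{CBSAM:} \quad & |S| \times |AN| = 3 \times 5  = 15 \text{ bits}
\end{align*}
```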

4. Integrating the CBSAM with the Region Number Labeling Scheme

We integrate the CBSAM with the region number labeling scheme [14, 15] in order to achieve secure and efficient XML query processing, since the region number labeling scheme is able to represent the order of XML nodes and to facilitate XML querying [16-18].

In the region number labeling scheme, each node is assigned two values, start and end, based on the positions of the start and end tags of the node in the XML document. For example, the marked XML tree in Fig. 1 is labeled by the region number labeling scheme. In this scheme, the Ancestor-Descendant (A-D) relationship between two arbitrary XML nodes can be determined as follows: node "a" is an ancestor of node "b" iff $start_a < start_b$ and $end_a > end_b$. For example, in Fig. 1, node "D" is an ancestor of node "J" since "6" < "16" and "19" > "17".
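The ancestor test above amounts to an interval containment check. A minimal sketch, using the start and end values listed in Table 2 (a); the names `Region`, `LABELS`, and `is_ancestor` are hypothetical:

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int


# Region labels of the marked XML tree in Fig. 1, as listed in Table 2 (a).
LABELS = {
    "A": Region(1, 20), "B": Region(2, 3), "C": Region(4, 5),
    "D": Region(6, 19), "E": Region(7, 8), "F": Region(9, 10),
    "G": Region(11, 18), "H": Region(12, 13), "I": Region(14, 15),
    "J": Region(16, 17),
}


def is_ancestor(a: Region, b: Region) -> bool:
    """True iff the region of a strictly contains the region of b."""
    return a.start < b.start and a.end > b.end


print(is_ancestor(LABELS["D"], LABELS["J"]))  # True: 6 < 16 and 19 > 17
print(is_ancestor(LABELS["B"], LABELS["J"]))  # False: 3 < 17, B does not contain J
```

The check needs only two integer comparisons per pair of nodes, which is why the scheme suits structural queries such as "//a//b".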

To integrate the CBSAM with the region number labeling scheme, we first store the metadata of the region number labeling scheme as well as the metadata of the CBSAM in a relational database. The metadata of the region number labeling scheme is stored in the REGION-METADATA relation. The schema of this relation is as follows: REGION-METADATA (NodeName, Start, End, Content).

Each tuple in the REGION-METADATA relation represents an XML node in the XML tree. The NodeName attribute represents the tag name of the XML node and the Start and End attributes correspond to the start and end values of each XML node, respectively. Moreover, the Start attribute serves as the primary key of the relation. Besides, the Content attribute represents the value of the XML node if it is a leaf node in the XML tree. An instance of the REGION-METADATA relation for the marked XML tree illustrated in Fig. 1 is given in Table 2 (a). Note that the Content attribute of all the internal XML nodes as well as the root node is "null".

The metadata of CBSAM is stored into CBSAM-METADATA relation. The schema of this relation is as follows: CBSAM-METADATA (Start, End, Accessibility).

Each tuple in the CBSAM-METADATA relation represents an authorization node in the marked XML tree. The Start and End attributes correspond to the start and end values of authorization node, respectively. Moreover, the Start attribute serves as the primary key of the relation. Besides, the Accessibility attribute represents the accessibility of authorization node for a set of subjects in which each bit corresponds to a subject. An instance of CBSAM-METADATA relation for the marked XML tree illustrated in Fig. 1 is given in Table 2 (b).

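Table 2. (a) An instance of REGION-METADATA relation; (b) An instance of CBSAM-METADATA relation

(a)

| NodeName | Start | End | Content |
|---|---|---|---|
| A | 1 | 20 | Null |
| B | 2 | 3 | |
| C | 4 | 5 | |
| D | 6 | 19 | Null |
| E | 7 | 8 | |
| F | 9 | 10 | |
| G | 11 | 18 | Null |
| H | 12 | 13 | |
| I | 14 | 15 | |
| J | 16 | 17 | |

(b)

| Start | End | Accessibility |
|---|---|---|
| 1 | 20 | 010 |
| 2 | 3 | 100 |
| 11 | 18 | 101 |
| 14 | 15 | 010 |
| 16 | 17 | 000 |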

In general, two methods can be utilized in order to improve the efficiency of access authorization checking during the process of XML querying. The first method is to efficiently search the appropriate access authorizations [1, 2, 4-6] and the second method is to reduce the number of access authorization checks during the process of XML querying [3, 7, 11]. We employ both methods in this paper.

According to the definition of authorization node, the accessibility of an XML node for a specific subject can be determined by finding the nearest authorization node of the XML node in the marked XML tree. The nearest authorization node of an XML node is defined as follows:

Definition of Nearest Authorization Node: An authorization node is called the nearest authorization node ($NAN_e$) of node $e$ if it satisfies the following two conditions:

1. $NAN_e$ is an authorization node which can be node $e$ or one of its ancestor nodes;

2. No authorization node exists on the path between node $e$ and node $NAN_e$.

The algorithm of finding the nearest authorization node is shown in Fig. 2.

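FindNearestAuthorizationNode (XNode)

Input: the XML node XNode.

Output: the nearest authorization node NAN_XNode.

NAN_XNode ← Select * From CBSAM-METADATA
    Where Start = (Select Max(Start) From CBSAM-METADATA
        Where (Start <= start_XNode) and (End >= end_XNode));

Note: the comparisons are inclusive so that XNode itself is returned when it is an authorization node, as condition 1 of the definition allows.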

Fig. 2 Algorithm of finding the nearest authorization node
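The lookup in Fig. 2 can be tried out with an in-memory SQLite database, as sketched below. This is an illustration under stated assumptions, not the authors' code: the table and column names (`cbsam_metadata`, `start_pos`, `end_pos`) are hypothetical stand-ins for CBSAM-METADATA (Start, End, Accessibility), chosen because hyphens and `END` are awkward as SQL identifiers. The rows are those of Table 2 (b). The comparisons are inclusive so that a node which is itself an authorization node is returned, as condition 1 of the definition permits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cbsam_metadata (
           start_pos     INTEGER PRIMARY KEY,
           end_pos       INTEGER NOT NULL,
           accessibility TEXT    NOT NULL
       )"""
)
# Authorization nodes of Fig. 1, as listed in Table 2 (b).
conn.executemany(
    "INSERT INTO cbsam_metadata VALUES (?, ?, ?)",
    [(1, 20, "010"), (2, 3, "100"), (11, 18, "101"), (14, 15, "010"), (16, 17, "000")],
)


def find_nearest_authorization_node(start, end):
    """Return (start, end, accessibility) of the nearest authorization node."""
    # Among the authorization nodes whose region encloses (or equals) the
    # node's region, the one with the largest start value is the deepest.
    return conn.execute(
        """SELECT start_pos, end_pos, accessibility
           FROM cbsam_metadata
           WHERE start_pos = (SELECT MAX(start_pos)
                              FROM cbsam_metadata
                              WHERE start_pos <= ? AND end_pos >= ?)""",
        (start, end),
    ).fetchone()


def is_accessible(start, end, subject_index):
    """Check one subject's bit (0-based index) in the nearest authorization node."""
    nan = find_nearest_authorization_node(start, end)
    return nan[2][subject_index] == "1"


print(find_nearest_authorization_node(12, 13))  # node H resolves to G
print(is_accessible(12, 13, 0))                 # first subject on node H
```

Because enclosing regions are nested, taking the maximum start value among them is enough to pick the closest one, and the primary key on the start column lets the outer lookup use an index.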

A solution to reduce the number of access authorization checks during the process of XML querying is to identify a set of XML nodes which have the same accessibility [3, 7, 11]. We utilize this method to improve the efficiency of access authorization checking.

As illustrated in Fig. 3, let $\{B_1, B_2\}$ be two authorization nodes and $\{N_1, N_2, \ldots, N_{200}\}$ be a set of XML nodes retrieved by the XML query processor. In Fig. 3, all the XML nodes from node $N_1$ to node $N_{99}$ have the same accessibility since their nearest authorization node is node $B_1$. They form the set of unauthorized XML nodes. Moreover, the nearest authorization node of all the XML nodes from node $N_{100}$ to node $N_{200}$ is node $B_2$. Therefore, they form the set of authorized XML nodes.

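[Figure: authorization node $B_1$ (-) with region $(start_{B_1}, end_{B_1})$, authorization node $B_2$ (+) with region $(start_{B_2}, end_{B_2})$, and the retrieved XML nodes, starting with $N_1$ at region $(start_{N_1}, end_{N_1})$]

Fig. 3 A set of XML nodes retrieved from the XML query processor with two authorization nodes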

We can define an interval to determine the set of XML nodes with the same accessibility during the process of XML querying. The defined interval represents either the set of authorized XML nodes or the set of unauthorized XML nodes. Using this interval, we are able to reduce the number of access authorization checks during the process of XML querying. In order to construct this interval, we define the nearest override authorization node as follows:

Definition of Nearest Override Authorization Node: Suppose that $NAN_e$ is the nearest authorization node of node $e$. An authorization node is called the nearest override authorization node $NOAN_e$ of the nearest authorization node $NAN_e$ if it satisfies the following two conditions:

1. $NOAN_e$ is a descendant node of node $NAN_e$ with different accessibility;

2. No authorization node exists between node $e$ and node $NOAN_e$ when the marked XML tree is traversed in preorder.

Definition of Interval: Suppose that the nearest authorization node of node $e$ is $NAN_e$ and the nearest override authorization node of $NAN_e$ is $NOAN_e$. The interval of the set of XML nodes which have the same nearest authorization node as node $e$ is defined as follows:

$$
Interval =
\begin{cases}
[start_e, start_{NOAN_e}) & \text{if } NOAN_e \text{ exists} \\
[start_e, end_{NAN_e}) & \text{if } NOAN_e \text{ does not exist}
\end{cases}
$$

For example, in Fig. 3, the nearest authorization node of node $N_1$ is the authorization node $B_1$ and the nearest override authorization node of node $B_1$ is the authorization node $B_2$. Therefore, the interval of the set of XML nodes which have the same accessibility as node $N_1$ is $[start_{N_1}, start_{B_2})$. This means that all the XML nodes whose start values fall in this interval are inaccessible, since node $N_1$ is inaccessible. Thus, instead of finding the nearest authorization node separately for each of the 99 XML nodes from node $N_1$ to node $N_{99}$, a single lookup covers them all, which accelerates access authorization checking.
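The interval definition can be exercised on the Fig. 1 data as follows. This is a sketch under assumptions, not the authors' implementation: the function names are hypothetical, the CBSAM rows are taken from Table 2 (b), and the nearest override authorization node is computed as the first authorization node that starts after the given node and lies inside its nearest authorization node, which is one reading of the two conditions in the definition.

```python
# CBSAM-METADATA rows from Table 2 (b): (start, end, accessibility).
CBSAM = [(1, 20, "010"), (2, 3, "100"), (11, 18, "101"), (14, 15, "010"), (16, 17, "000")]


def nearest_authorization_node(start, end):
    """Deepest authorization node whose region encloses or equals (start, end)."""
    enclosing = [row for row in CBSAM if row[0] <= start and row[1] >= end]
    return max(enclosing, key=lambda row: row[0])


def same_accessibility_interval(start, end):
    """Return ((low, high), bits): nodes with low <= start < high share bits."""
    nan = nearest_authorization_node(start, end)
    # Candidate overrides: authorization nodes after this node in preorder
    # that are still inside the nearest authorization node.
    overrides = [row for row in CBSAM if row[0] > start and row[1] < nan[1]]
    if overrides:
        noan = min(overrides, key=lambda row: row[0])
        return (start, noan[0]), nan[2]
    return (start, nan[1]), nan[2]


def filter_accessible(nodes, subject_index):
    """Keep the accessible nodes; nodes are (start, end) pairs sorted by start."""
    result, low, high, bits = [], 0, 0, ""
    for start, end in nodes:
        if not (low <= start < high):  # outside the cached interval: look up again
            (low, high), bits = same_accessibility_interval(start, end)
        if bits[subject_index] == "1":
            result.append((start, end))
    return result


ALL_NODES = [(1, 20), (2, 3), (4, 5), (6, 19), (7, 8), (9, 10),
             (11, 18), (12, 13), (14, 15), (16, 17)]
print(same_accessibility_interval(4, 5))   # node C
print(filter_accessible(ALL_NODES, 1))     # nodes accessible to the second subject
```

In this run nodes C, D, E, and F fall into one interval, so a single lookup decides all four, which is the saving the interval is meant to provide.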

5. Experimental Results

In our experiments, we used the XMark data set [19] with 720,800 nodes. All the experiments were conducted on a 2.4 GHz Pentium IV processor with 3.23 GB of RAM running Windows XP Professional.

5.1. Experiment on space efficiency

In order to verify the effect of access locality among the XML nodes on how well the CBSAM compacts the accessibility map, we simulated two cases with different access localities. To control the access locality among the XML nodes, two parameters, namely the propagation ratio and the accessibility ratio, were defined. The propagation ratio determines the percentage of XML nodes in the XML data set that are randomly and uniformly chosen as selected nodes, while the accessibility ratio determines the percentage of selected nodes that are accessible. We simulated the access locality among the XML nodes by propagating the accessibility of the selected nodes to their descendants using the "most specific override takes precedence" policy [4, 5]. Moreover, the root of the XML data set was always chosen as a selected node to make sure that every XML node is either accessible or inaccessible.
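The marking procedure above can be sketched on a synthetic tree as follows. Several points here are assumptions and not taken from the paper: the tree is random rather than XMark, each subject is marked independently, the accessibility ratio is applied as a per-node probability, and the compression ratio is taken to be one minus the fraction of nodes that are authorization nodes (the paper does not state its formula). All names are hypothetical.

```python
import random


def mark_subject(parents, propagation_ratio, accessibility_ratio, rng):
    """Return one accessibility flag per node for a single subject."""
    n = len(parents)
    k = min(round(propagation_ratio * n), n - 1)
    # The root (index 0) is always a selected node.
    selected = set(rng.sample(range(1, n), k)) | {0}
    access = [False] * n
    for i in range(n):  # parents precede children, so one pass suffices
        if i in selected:
            # A selected node overrides whatever its ancestors propagate.
            access[i] = rng.random() < accessibility_ratio
        else:
            access[i] = access[parents[i]]
    return access


def compression_ratio(parents, subjects, propagation_ratio, accessibility_ratio, seed=0):
    """Assumed metric: 1 - (authorization nodes / all nodes)."""
    rng = random.Random(seed)
    per_subject = [
        mark_subject(parents, propagation_ratio, accessibility_ratio, rng)
        for _ in range(subjects)
    ]
    marks = list(zip(*per_subject))  # marks[i] = flags of node i for all subjects
    auth = sum(
        1 for i in range(len(parents))
        if i == 0 or marks[i] != marks[parents[i]]
    )
    return 1 - auth / len(parents)


# Synthetic tree: node i > 0 gets a random parent with a smaller index.
tree_rng = random.Random(42)
PARENTS = [-1] + [tree_rng.randrange(i) for i in range(1, 2000)]

for ratio in (0.02, 0.05):
    print(ratio, round(compression_ratio(PARENTS, 4, ratio, 0.5), 4))
```

Under this assumed metric a node can only become an authorization node if at least one subject selected it, so a lower propagation ratio bounds the number of authorization nodes more tightly.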

Fig. 4 shows the compression ratio of the CBSAM for 4 subjects with the propagation ratio 2% and 5% when the accessibility ratio changes from 10% to 90%. It is quite clear that when the access locality among the XML nodes is high (the propagation ratio is 2%), the compression ratio is also high.

[Figure: compression ratio (vertical axis, ticks from 0.92 to 0.98) versus accessibility ratio (horizontal axis, 10% to 90%), with one series for Propagation Ratio = 2% and one for Propagation Ratio = 5%]

Fig. 4 Compression ratio versus accessibility ratio for 4 subjects

5.2. Experiment on XML query processing

We compared the performance of XML query processing with and without the CBSAM to measure the overall overhead of CBSAM for specifying a fine-grained access control on the XML data. Three queries with different complexity were used in the experiment as illustrated in Table 3.

Table 3. Queries used in the experiment

| Query | Query Definition |
|---|---|
| Q1 | //person//interest |
| Q2 | //person//watches//watch |
| Q3 | //site//open-auctions//open-auction//bidder//increase |

Fig. 5 (a) and (b) illustrate the overall overhead of the CBSAM when the propagation ratio is 2% and 5%, respectively. As shown in Fig. 5 (a) and (b), when the access locality among the XML nodes is very high (the accessibility ratio is 10% or 90%), the overall overhead of the CBSAM is very low in all the queries.

Fig. 5 Overall overhead of the CBSAM (a) Propagation Ratio = 2%; (b) Propagation Ratio = 5%

6. Conclusion

In this paper, we have devised a Compact Bit String Accessibility Map called CBSAM to implement the accessibility map in a compact format by exploiting the access locality among the XML nodes in the marked XML tree. The accessibility of XML nodes for different subjects is combined together in the CBSAM to support multiple subjects. The experimental results demonstrated that the CBSAM efficiently compresses the accessibility map in the multi-subject environments. Besides, the CBSAM was integrated with the region number labeling scheme to achieve a secure and efficient XML querying. The experimental results showed that the overall overhead of the CBSAM is very low when the access locality among the XML nodes is high.

References

1. Yu, T., Srivastava, D., Lakshmanan, L.V.S., and Jagadish, H.V., A Compressed Accessibility Map for XML. ACM Transactions on Database Systems 2004; 29(2): 363-402.

2. Jiang, M. and Fu, A.W.-C., Integration and Efficient Lookup of Compressed XML Accessibility Maps. IEEE Transactions on Knowledge and Data Engineering 2005; 17(7): 939-953.

3. Lee, J.-G., Whang, K.-Y., Han, W.-S., and Song, I.-Y., The Dynamic Predicate: Integrating Access Control with Query Processing in XML Databases. The VLDB Journal 2007; 16(3): 371-387.

4. Bertino, E., Castano, S., Ferrari, E., and Mesiti, M., Specifying and Enforcing Access Control Policies for XML Document Sources. World Wide Web 2000; 3(3): 139-151.

5. Damiani, E., Vimercati, S.D.C.d., Paraboschi, S., and Samarati, P., A Fine-Grained Access Control System for XML Documents. ACM Transactions on Information and System Security 2002; 5(2): 169-202.

6. Fan, W., Chan, C.-Y., and Garofalakis, M., Secure XML Querying with Security Views. in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004. Paris, France: ACM.

7. Murata, M., Tozawa, A., Kudo, M., and Hada, S., XML Access Control Using Static Analysis. ACM Transactions on Information and System Security 2006; 9(3): 292-324.

8. Damiani, E., Fansi, M., Gabillon, A., and Marrara, S., A General Approach to Securely Querying XML. Computer Standards & Interfaces 2008; 30(6): 379-389.

9. Kuper, G., Massacci, F., and Rassadko, N., Generalized XML Security Views. International Journal of Information Security 2009; 8(3): 173-203.

10. Byun, C. and Park, S., A Schema Based Approach to Valid XML Access Control. Journal of Information Science and Engineering 2010; 26: 1719-1739.

11. Luo, B., Lee, D., Lee, W.-C., and Liu, P., QFilter: Rewriting Insecure XML Queries to Secure Ones using Non-Deterministic Finite Automata. The VLDB Journal 2011; 20(3): 397-415.

12. Mirabi, M., Ibrahim, H., Mamat, A., and Udzir, N.I., Integrating Access Control Mechanism with EXEL Labeling Scheme for XML Document Updating. in Proceedings of the Third International Conference on Networked Digital Technologies. 2011. Macau, China: Springer-Verlag.

13. Mirabi, M., Ibrahim, H., Fathi, L., Udzir, N.I., and Mamat, A., An Access Control Model for Supporting XML Document Updating. in Proceedings of the Third International Conference on Networked Digital Technologies. 2011. Macau, China: Springer-Verlag.

14. Li, Q. and Moon, B., Indexing and Querying XML Data for Regular Path Expressions. in Proceedings of the 27th International Conference on Very Large Data Bases. 2001. Roma, Italy: Morgan Kaufmann.

15. Amagasa, T., Yoshikawa, M., and Uemura, S., QRS: A Robust Numbering Scheme for XML Documents. in Proceedings of the 19th International Conference on Data Engineering. 2003. Bangalore, India: IEEE Computer Society.

16. Mirabi, M., Ibrahim, H., Udzir, N.I., and Mamat, A., An Encoding Scheme Based on Fractional Number for Querying and Updating XML Data. Journal of Systems and Software 2012; http://dx.doi.org/10.1016/j.jss.2012.02.054.

17. Mirabi, M., Ibrahim, H., Udzir, N.I., and Mamat, A., Label Size Increment of Bit String Based Labeling Scheme in Dynamic XML Updating. in Proceedings of the International Conference on Digital Enterprise and Information Systems. 2011. London, UK: Springer-Verlag.

18. Mirabi, M., Ibrahim, H., Mamat, A., Udzir, N.I., and Fathi, L., Controlling Label Size Increment of Efficient XML Encoding and Labeling Scheme in Dynamic XML Update. Journal of Computer Science 2010; 6(12): 1529-1534.

19. Schmidt, A., Waas, F., Kersten, M., Carey, M.J., Manolescu, I., and Busse, R., XMark: A Benchmark for XML Data Management. in Proceedings of the 28th International Conference on Very Large Data Bases. 2002. Hong Kong, China: VLDB Endowment.