Accepted Manuscript

Achieving privacy-preserving big data aggregation with fault tolerance in smart grid Zhitao Guan, Guanlin Si

PII: S2352-8648(17)30066-4

DOI: 10.1016/j.dcan.2017.08.005

Reference: DCAN 99

To appear in: Digital Communications and Networks

Received Date: 15 February 2017 Revised Date: 3 August 2017 Accepted Date: 10 August 2017

Please cite this article as: Z. Guan, G. Si, Achieving privacy-preserving big data aggregation with fault tolerance in smart grid, Digital Communications and Networks (2017), doi: 10.1016/j.dcan.2017.08.005.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Achieving Privacy-Preserving Big Data Aggregation with Fault Tolerance in Smart Grid

Zhitao Guan, Guanlin Si

School of Control and Computer Engineering, North China Electric Power University, Beijing, 102206 China

*Zhitao Guan (Corresponding author) and Guanlin Si are with the School of Control and Computer Engineering, North China Electric Power University, Beijing, China. E-mail: guanzhitao@126.com, m18811612766@163.com.

Abstract

In smart grid, a huge amount of data is collected for various applications, such as load monitoring and demand response. These data are used to analyze the power state and to create an optimal dispatching strategy. However, the volume, velocity and variety of these big energy data raise consumers' privacy concerns. For instance, in order to optimize energy utilization and support demand response, numerous smart meters are installed at consumers' homes to collect energy consumption data at a fine granularity, but these fine-grained data may disclose appliance consumption patterns and thereby reveal consumers' behaviors at home. In this paper, we propose a privacy-preserving data aggregation scheme based on secret sharing with fault tolerance in smart grid, which ensures that the control center obtains the aggregated data without compromising users' privacy. We also consider fault tolerance and resistance to differential attack during the data aggregation. Finally, we present a security analysis and a performance evaluation of our scheme in comparison with other similar schemes. The analysis shows that our scheme meets the security requirements and performs better than other popular methods.

Keywords: Big data, Smart grid, Privacy-preserving, Fault tolerance

1. Introduction

Fig. 1: System Model

As a new generation of energy network, the smart grid is considered a useful way to address severe environmental issues and the resource crisis. It is the product of combining the energy network with information technology. Unlike the unidirectional centralized grid, the control mode of the smart grid is more flexible and reliable, and it supports bidirectional power flow between users and the grid. A user in the smart grid is not only a consumer but also a generator. In the smart grid, large quantities of data are collected to support basic services [1]. For example, to create a power plan or dynamic pricing, the control center needs to collect and analyze real-time data from various applications through the smart meter installed at the user's house. Moreover, electric vehicle drivers need to upload their location to the control center for power dispatching.

Although the big data collected from users are necessary for basic services, they are usually sensitive with respect to users' privacy [2]. For instance, smart meters send users' real-time data to the control center, but these data may disclose a household's behavior. If a thief obtains the real-time data from a user's smart meter, he may break into the house when the data show that nobody is home. In addition, a user's location privacy may be disclosed during the interaction between electric vehicles and the smart grid, which may help an adversary track the user [3] [4]. If users' sensitive data are not well protected, the deployment of the smart grid will meet resistance. Therefore, privacy leakage in the smart grid is an extremely important problem.

There are various solutions for privacy preservation in smart grid. Traditional privacy-preserving strategies can be divided into two categories: one hides the user's identity; the other protects the user's sensitive data [5]. Because big data in smart grid is characterized by volume, velocity and variety [6], privacy-preserving approaches need to pay more attention to communication overhead and computational cost [7] [8].

The first strategy is to protect the user's identity through anonymity or pseudonyms. A user's attributes can be classified into identity information, quasi-identifiers, and sensitive information. Given an anonymized table, if the attributes in the table have not been properly treated, an adversary may deduce the relationship between a user's identity and sensitive information from quasi-identifiers such as age and gender. Although the k-anonymity algorithm [9] and the l-diversity algorithm [10] address the weaknesses of identity-protection schemes, it is very difficult to find a credible party to carry out the anonymization securely.

The second strategy is to use data aggregation to protect the user's real-time data, which includes homomorphic encryption and data obfuscation. In fact, these two methods are often used together. Given the huge volume, velocity and variety of big data, we aggregate users' data in groups and adopt the Paillier algorithm to encrypt each user's real-time data. In addition, secret shares are distributed to each user to further obfuscate the data. Our contributions are summarized as follows:

1) To guard against the case in which the data aggregation device (DA) and the control center (CC) launch a differential attack based on two data sets differing in at most one element, we set the threshold of the secret shares equal to the number of group members. If the number of smart meters (SMs) participating in the data aggregation is not equal to the number of group members, the CC cannot obtain the correct result.

2) We mask the user's identity through anonymity and use the group's hash table to locate a malfunctioning smart meter (SM) by comparison with other groups, without disclosing the user's identity.

3) We realize fault tolerance through a substitution strategy. Each member of a group holds the same secret share as its counterpart in every other group. When there is a malfunctioning SM in a group, we can substitute the data of the corresponding user in another group.

The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 shows the system model and design goals. In section 4, some preliminaries are given. In section 5, our scheme is stated. In section 6, security analysis is given. In Section 7, the performance of our scheme is evaluated. In Section 8, the paper is concluded.

2. Related Work

To protect the privacy of users in smart grid, many scholars have proposed various strategies. These strategies can be classified into two aspects: 1) protect user's privacy by masking the real identity; 2) protect user's privacy by masking their real-time data.

Some works focus on masking the user's identity. A simple solution that adopts a trusted party to manage the identity list is proposed in [11]. However, finding a trusted party is not easy. Privacy preservation and validity authentication are two related problems [12], and Cheung et al. propose a scheme based on blind signatures to solve both in [13]. It ensures that the verifier can authenticate the sender's signature while learning nothing about the sender's private information. Camenisch and Lysyanskaya propose the CL-signature scheme, which is similar to the blind signature, in [14]. Stegelmann and Kesdogan propose k-anonymity to protect users' privacy [15]; however, finding a credible party to carry out the anonymization securely is difficult. An effective scheme based on a virtual ring is presented in [16]. It groups the users by geographical position and assigns the same serial number to every member of a group. In this way, the control center can obtain all of the users' data without knowing the sender's real identity. However, validity authentication cannot be guaranteed because of the anonymity. Riesch and Du analyze the authentication problems that arise in identity preservation [17]. Privacy-preserving schemes based on pseudonyms are also very common, such as [18] [19] [20]; they are usually combined with ring signatures or blind signatures to mask the user's identity.

Some works focus on masking users' real-time data. Solutions that use a battery to hide the real-time data are proposed in [21] [22]. In this kind of scheme, the smart grid and the household battery supply the user with electricity at the same time. When the household consumption curve rises, the battery discharges; otherwise, it charges. In this way, the user's real-time data are hidden and the user's privacy is protected. The downside of this method is that frequent charging and discharging may shorten the battery's lifetime. Data aggregation is also a popular privacy-preserving technique in smart grid.

The Paillier encryption [23] [24] [25] and the Boneh-Goh-Nissim encryption [26] [27] are classical homomorphic encryption algorithms for data aggregation. Bilinear mapping is also a common solution for data aggregation [28] [29]; notably, it is often used to realize key exchange. Secret sharing has also been proposed to realize data aggregation, often adopting the Shamir technique to protect the electricity data [30] [31]. It divides a secret into pieces and distributes them to various entities in the smart grid. Only when the control center acquires a fixed number of shares can it recover the whole secret. Secret sharing can also be applied to other tasks, such as key management, but we do not consider that use here; the scheme itself is described in detail below. Beussink presents a scheme based on data obfuscation [32], which adds a random number to each electricity reading, but it causes large errors if the random numbers are not chosen reasonably.

Differential privacy [33] and fault tolerance are two important problems in data aggregation and data obfuscation. In [23], Shi and Sun discuss fault tolerance and differential privacy. However, the error rate and computational complexity are not ideal. For differential privacy, scholars usually add random noise that follows a Laplace distribution. However, adding random noise to users' data introduces unnecessary errors. Hong also analyzes the tradeoff between differential privacy and utility [34].

In our scheme, we use substitution to realize fault tolerance based on secret sharing during the data aggregation. Compared with other fault tolerance schemes such as [23], our scheme has a lower error rate and lower computational cost. In addition, we use the secret sharing scheme to defend against the differential attack. The process of our scheme is shown in figure 1.

3. System Model and Design Goals

3.1. System model

As shown in figure 1, the smart grid system consists of four parts: the control center (CC), the key initialization center (KIC), the data aggregation device (DA), and the residential users.

1) Residential users: We divide all the users into groups according to their geographical locations. Each residential user's house is equipped with a smart meter (SM) that collects the real-time data of the household appliances every 15 minutes.

2) Data aggregation device: The data aggregation device is responsible for collecting all the data sent by the SMs, calculating the aggregate of the ciphertexts by running the homomorphic algorithm, and uploading the aggregate to the control center. In addition, it has a fault-tolerance function. When an SM in a group is malfunctioning, its data are replaced by the data of an SM in another group that holds the same secret key.

3) Key initialization center: The key initialization center is responsible for initializing all of the keys for the SMs and the CC. Each group has the same number of SMs, and the encryption parameters assigned to each group are identical.

4) Control center: The control center obtains the aggregated real-time data of the smart grid from the DA by using the decryption key. With these data, the CC can follow the trend of power consumption and promptly make the power generation plan or set dynamic pricing.

3.2. Adversarial Model

We assume that the smart meter installed on the user side is vulnerable to external attacks. The communication channel is not secure, and an adversary may eavesdrop on it. The CC and the DA are honest-but-curious: they do not destroy or modify users' data, but they always attempt to infer users' private information using background knowledge. Moreover, the CC may collude with the DA to increase the probability of a successful attack.

3.3. Design goals

Considering the above scenarios, our design goals can be divided into three aspects.

1) Privacy-preserving:

A residential user's data are inaccessible to any other user. Neither an outside adversary, the DA, nor the CC should be able to acquire a user's real-time data, even with knowledge of the ciphertext and the encryption algorithm.

2) Resistance to differential attack:

Although the plaintext of a single user is masked by homomorphic encryption after the data aggregation, given two data sets differing in at most one element, an adversary can obtain a user's plaintext by calculating the difference between the two aggregated results. We call this attack method a differential attack. Resistance to this kind of attack is therefore one of our design goals.

3) Fault tolerance:

Because we set the threshold of the secret shares equal to the number of group members, the CC obtains the correct result only if the DA aggregates all the ciphertexts of a group before sending the aggregate to the CC. Therefore, if one SM is damaged, the data aggregation cannot proceed correctly. Fault tolerance means that the data aggregation must still run normally when there are several malfunctioning SMs in a group.

4. Preliminaries

4.1. Notations

Table 1 lists the notations used in the proposed scheme.

Tab. 1: Notations

Notation              Description
SM                    Smart meter
CC                    Control center
EK                    Encryption key
DK                    Decryption key
KIC                   Key initialization center
DA                    Data aggregation device
SNR                   Signal-to-noise ratio
gcd                   Greatest common divisor
lcm                   Least common multiple
$P_S$                 The power producing the normal signal
$P_N$                 The power producing the noise
$m_i$                 Plaintext of $SM_i$
$r$                   Random number
$C_i$                 Ciphertext of $SM_i$
$d$                   The number of group members
$x_i$                 The serial number of $SM_i$
$y_i$                 The serial number of group $i$
$t$                   Time stamp
$e\%$                 The error rate
$\bar{m}$             The average value of plaintext
$\bar{C}$             The average value of ciphertext
$N$                   The total number of all the SMs
$F$                   The number of the malfunctioning SMs
$M_{sum}$             The sum of users' plaintexts
$C_{sum}$             The sum of users' ciphertexts
$C_{sum3}$            The sum of lost ciphertexts
$\tilde{C}_{sum}$     The sum of processed ciphertexts

4.2. Paillier cryptosystem

The Paillier encryption algorithm is an asymmetric encryption algorithm with the additive homomorphic property. It consists of three procedures: key generation, encryption and decryption.

1) Key generation: Choose two prime numbers $p$, $q$ of the same length and calculate $n = pq$ and

$$\lambda = \mathrm{lcm}(p-1, q-1) \tag{1}$$

Choose $g \in \mathbb{Z}^*_{n^2}$ such that $\gcd(L(g^{\lambda} \bmod n^2), n) = 1$, where $L(u) = (u-1)/n$. The public key is $(n, g)$, and the private key is $\lambda$.

2) Encryption phase: For a plaintext $m \in \mathbb{Z}_n$, select a random number $r < n$. The ciphertext is calculated as follows:

$$C = g^{m} r^{n} \bmod n^2 \tag{2}$$

3) Decryption phase: After receiving the ciphertext $C$, the receiver recovers the plaintext $m$ with the private key $\lambda$ by the following formula:

$$m = \frac{L(C^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{3}$$
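The three procedures can be tried out with a minimal sketch. It assumes the common choice $g = n + 1$ and toy primes that are far too small for real use; the function names are illustrative and do not come from the paper.

```python
import math
import secrets


def L(u: int, n: int) -> int:
    """Paillier's L function: L(u) = (u - 1) / n (exact integer division)."""
    return (u - 1) // n


def keygen(p: int, q: int):
    """Return ((n, g), lam) for primes p, q; g = n + 1 is an assumed choice."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # Key generation requires gcd(L(g^lam mod n^2), n) = 1.
    assert math.gcd(L(pow(g, lam, n * n), n), n) == 1
    return (n, g), lam


def random_unit(n: int) -> int:
    """Draw r in [1, n) with gcd(r, n) = 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def encrypt(pub, m: int) -> int:
    """C = g^m * r^n mod n^2, as in equation (2)."""
    n, g = pub
    n2 = n * n
    return pow(g, m, n2) * pow(random_unit(n), n, n2) % n2


def decrypt(pub, lam: int, c: int) -> int:
    """m = L(C^lam mod n^2) / L(g^lam mod n^2) mod n, as in equation (3)."""
    n, g = pub
    n2 = n * n
    mu = pow(L(pow(g, lam, n2), n), -1, n)  # modular inverse, Python 3.8+
    return L(pow(c, lam, n2), n) * mu % n


pub, lam = keygen(1009, 1013)               # toy primes (assumption)
c1, c2 = encrypt(pub, 120), encrypt(pub, 75)
c_sum = c1 * c2 % (pub[0] ** 2)             # multiply ciphertexts ...
m_sum = decrypt(pub, lam, c_sum)            # ... to add plaintexts: 195
```

Multiplying two ciphertexts modulo $n^2$ yields an encryption of the sum of the plaintexts, which is the property the aggregation relies on.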

4.3. Secret sharing scheme

A secret sharing scheme splits a secret into pieces and distributes these pieces among different valid members. If an adversary compromises one member of the system, he obtains only one piece of the secret. Only if the adversary obtains at least $d$ pieces can he recover the whole secret. We call $d$ the threshold, and the Shamir technique is usually adopted to realize this property.

The trusted party chooses a polynomial to split a secret $\theta$:

$$G(x) = \theta + a_1 x + a_2 x^2 + \dots + a_d x^d \tag{4}$$

$(x_i, G(x_i))$ is the corresponding share. Notably, the Shamir secret sharing scheme is additively homomorphic and can therefore be used to realize data aggregation. According to the Lagrange interpolation polynomial, we have

$$G(x) = \sum_{j=1}^{d} G(x_j) \prod_{i=1, i \neq j}^{d} \frac{x - x_i}{x_j - x_i} \tag{5}$$

$$\beta_{x_j} = \prod_{i=1, i \neq j}^{d} \frac{x_i}{x_i - x_j} \tag{6}$$

Then, we can easily compute $\theta$ as follows:

$$\sum_{j=1}^{d} \beta_{x_j} G(x_j) = G(0) = \theta \tag{7}$$
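The reconstruction at $x = 0$ can be checked numerically. The sketch below makes two assumptions that are not stated in the text: arithmetic is carried out in a prime field (the modulus $2^{61} - 1$ is an arbitrary choice), and the polynomial has degree $d - 1$, the usual Shamir choice for threshold $d$, so that $d$ shares determine $G(0)$ exactly.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus (a Mersenne prime)


def make_poly(theta: int, d: int):
    """Coefficients [theta, a_1, ..., a_{d-1}] of a degree d-1 polynomial."""
    return [theta] + [secrets.randbelow(PRIME) for _ in range(d - 1)]


def G(coeffs, x: int) -> int:
    """Evaluate the polynomial at x with Horner's rule, modulo PRIME."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % PRIME
    return acc


def beta(xs, j: int) -> int:
    """Lagrange coefficient at 0: prod_{i != j} x_i / (x_i - x_j) mod PRIME."""
    num, den = 1, 1
    for i, x_i in enumerate(xs):
        if i != j:
            num = num * x_i % PRIME
            den = den * (x_i - xs[j]) % PRIME
    return num * pow(den, -1, PRIME) % PRIME


def reconstruct(xs, shares) -> int:
    """Sum of beta_j * G(x_j), which equals G(0) when d shares are given."""
    return sum(beta(xs, j) * s for j, s in enumerate(shares)) % PRIME


d = 5
xs = [3, 7, 11, 19, 23]                 # hypothetical serial numbers
coeffs = make_poly(theta=0, d=d)        # the scheme later sets theta = 0
shares = [G(coeffs, x) for x in xs]
recovered = reconstruct(xs, shares)     # all d shares: gives theta
```

With fewer than $d$ shares the same sum is, except with negligible probability, unrelated to $\theta$, which is what makes $d$ a threshold.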

4.4. Signal-to-noise ratio

The signal-to-noise ratio (SNR) is commonly used to measure the performance of electronic systems. It is calculated as follows:

$$SNR = 10 \lg \frac{P_S}{P_N} \tag{8}$$

$P_S$ represents the power producing the normal signal and $P_N$ represents the power producing the noise. The higher the SNR, the stronger the signal relative to the noise. Generally speaking, an image SNR greater than 30 dB does not affect the resolution of the picture. In this paper, we take the sum of the normal SMs' data as the signal and the sum of the malfunctioning SMs' data processed by substitution as the noise. We can then measure the error rate of our scheme through the SNR.
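As a small worked example of equation (8), the helper below converts a signal power and a noise power into decibels. The function name and the input check are illustrative assumptions.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """SNR in decibels: 10 * log10(P_S / P_N)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


# A signal 1000 times stronger than the noise sits at the 30 dB mark
# mentioned in the text.
example = snr_db(1000.0, 1.0)
```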

5. Our Scheme

5.1. System initialization

The KIC first chooses two prime numbers $p$, $q$ of the same length and calculates $n = pq$. It chooses $g \in \mathbb{Z}^*_{n^2}$ such that $\gcd(L(g^{\lambda} \bmod n^2), n) = 1$. The public key is $(n, g)$, and the private key is $\lambda = \mathrm{lcm}(p-1, q-1)$.

It then constructs a polynomial $G(x) = \theta + a_1 x + a_2 x^2 + \dots + a_d x^d$. Here, we set $\theta = 0$. All the SMs in the smart grid are divided into groups, and each group has $d$ members. For each SM in a group, the KIC assigns the private key $[x_i, y_i, G(x_i), \beta_{x_i}]$ to $SM_i$, where $x_i$ is a random number representing the serial number of $SM_i$, $y_i$ is the group serial number, and $\beta_{x_i} = \prod_{j=1, j \neq i}^{d} \frac{x_j}{x_j - x_i}$.

In particular, the set of SM serial numbers $[x_1, x_2, \dots, x_d]$ is the same in every group. That is to say, for any given SM, every other group contains a member whose private key is identical except for the group serial number.

Finally, the KIC publishes $(n, h, g, H_1, H_2)$ and sends $\lambda$ to the CC in a secure way. $h$, $H_1$ and $H_2$ are three hash functions. The process of system initialization is shown in Algorithm 1.

Tab. 2: System initialization

Algorithm 1: System initialization

KIC.Input: $d$
KIC.Output: $[x_i, y_i, G(x_i), \beta_{x_i}]$, $(n, h, g, H_1, H_2)$, $\lambda$

(1) Choose two primes $p$, $q$ and calculate $n = pq$
(2) Calculate $\lambda = \mathrm{lcm}(p-1, q-1)$ and send it to CC
(3) Choose $g \in \{\mathbb{Z}^*_{n^2} \mid \gcd(L(g^{\lambda} \bmod n^2), n) = 1\}$
(4) Construct $G(x) = \theta + a_1 x + a_2 x^2 + \dots + a_d x^d$
(5) Set $x_i$ for each SM and set $y_i$ for each group
(6) Calculate $G(x_i)$ and $\beta_{x_i}$
(7) Send $[x_i, y_i, G(x_i), \beta_{x_i}]$ to each SM
(8) Choose hash functions $h$, $H_1$, $H_2$
(9) Publish $(n, h, g, H_1, H_2)$

5.2. Encryption

The SM collects the electricity data from all the household appliances every 15 minutes. At time $t$, it computes

$$C_i = (g^{m_i} r_i^{n} \bmod n^2) \cdot h(t)^{\beta_{x_i} G(x_i)} \tag{9}$$

$$H_1(t \,\|\, G(x_i)\beta_{x_i}) \tag{10}$$

$$H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i})) \tag{11}$$

Then, the SM sends $y_i$, $C_i$, $H_1(t \,\|\, G(x_i)\beta_{x_i})$ and $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$ to the DA. Note that a user's identity is determined by the pair $(x_i, y_i)$. Publishing the group serial number $y_i$ while masking the serial number $x_i$ therefore still protects the user's identity. The process of encryption is shown in Algorithm 2.

Tab. 3: Encryption

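Algorithm 2: Encryption

SM.Input: $[x_i, y_i, G(x_i), \beta_{x_i}]$, $(n, h, g, H_1, H_2)$
SM.Output: $y_i$, $C_i$, $H_1(t \,\|\, G(x_i)\beta_{x_i})$, $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$

(1) Choose a random number $r_i = GenRan()$ with $r_i < n$
(2) Calculate $C_i = (g^{m_i} r_i^{n} \bmod n^2) \cdot h(t)^{\beta_{x_i} G(x_i)}$
(3) Calculate $H_1(t \,\|\, G(x_i)\beta_{x_i})$ and $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$
(4) Send $y_i$, $C_i$, $H_1(t \,\|\, G(x_i)\beta_{x_i})$, $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$ to DA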
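The role of the blinding factor $h(t)^{\beta_{x_i} G(x_i)}$ can be illustrated in isolation. The sketch below is not the paper's construction: it swaps the group modulo $n^2$ for a toy prime-order subgroup of squares modulo a safe prime, so that exponents can be reduced modulo the subgroup order, and it uses SHA-256 as a stand-in for $h$ together with a polynomial of degree $d - 1$. All concrete values are hypothetical.

```python
import hashlib

P = 2039   # toy safe prime, P = 2 * Q + 1 (assumption)
Q = 1019   # prime order of the subgroup of squares modulo P


def h(t: str) -> int:
    """Hash a time stamp to an element of order Q (a square modulo P)."""
    digest = int.from_bytes(hashlib.sha256(t.encode()).digest(), "big")
    base = digest % (P - 3) + 2      # base in [2, P - 2], so base^2 != 1
    return pow(base, 2, P)


def G(coeffs, x: int) -> int:
    """Evaluate the sharing polynomial at x modulo Q."""
    return sum(a * pow(x, k, Q) for k, a in enumerate(coeffs)) % Q


def beta(xs, j: int) -> int:
    """Lagrange coefficient at 0 modulo Q."""
    num = den = 1
    for i, x_i in enumerate(xs):
        if i != j:
            num = num * x_i % Q
            den = den * (x_i - xs[j]) % Q
    return num * pow(den, -1, Q) % Q


def mask(t: str, xs, coeffs, j: int) -> int:
    """Blinding factor h(t)^(beta_j * G(x_j)) of the j-th meter."""
    return pow(h(t), beta(xs, j) * G(coeffs, xs[j]) % Q, P)


def product(values) -> int:
    out = 1
    for v in values:
        out = out * v % P
    return out


xs = [1, 2, 3]         # serial numbers shared by every group
coeffs = [0, 5, 6]     # theta = 0, degree d - 1 = 2 (toy values)
masks = [mask("2017-08-10T12:00", xs, coeffs, j) for j in range(len(xs))]
all_d = product(masks)              # every meter reported: factors cancel
missing_one = product(masks[:-1])   # one meter absent: residue remains
```

When all $d$ factors are multiplied, the exponents sum to $\theta = 0$ and the product is 1, so the blinding disappears from the aggregate. When one factor is missing, a residue remains and the aggregate no longer decrypts to the sum of the readings.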

5.3. Data aggregation

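When the DA receives a message from an SM, it verifies $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$ to authenticate the message integrity. If the hash value is correct, the data aggregation is performed.

1) Normal aggregation: If the DA receives the data of all the SMs in a group, it runs the data aggregation as follows:

$$C_{sum} = \prod_{i=1}^{d} C_i = g^{\sum_{i=1}^{d} m_i} \Big(\prod_{i=1}^{d} r_i\Big)^{n} \cdot h(t)^{\sum_{i=1}^{d} \beta_{x_i} G(x_i)} \bmod n^2 \tag{12}$$

Because $\sum_{i=1}^{d} \beta_{x_i} G(x_i) = \theta = 0$ by (7), the factor $h(t)^{\sum_{i=1}^{d} \beta_{x_i} G(x_i)}$ equals 1, and $C_{sum}$ is a Paillier ciphertext of $\sum_{i=1}^{d} m_i$.

The DA calculates the hash value of the concatenation of $C_{sum}$ and $t$, and encrypts the value with its private key $sk_D$. The final result is as follows:

$$Enc[sk_D, H_1(C_{sum} \,\|\, t)] \tag{13}$$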

2) Fault tolerance: Because the DA knows only which group an SM belongs to, from the group serial number, it can use $H_1(t \,\|\, G(x_i)\beta_{x_i})$ to find the malfunctioning SM while the user's identity remains masked. If there is a malfunctioning SM in a group, the DA runs the following steps:

First, it compares the hash table of this group, formed by the values $H_1(t \,\|\, G(x_i)\beta_{x_i})$, with those of other complete groups to find the malfunctioning SM.

Then, it selects an $SM_j$ from another group with the same hash value $H_1(t \,\|\, G(x_i)\beta_{x_i})$ to replace the malfunctioning $SM_i$. In principle, if an SM is malfunctioning, this user's data should not be counted. To further reduce the error, the data of $SM_j$ are therefore processed before the data aggregation as follows:

$$\bar{m} = \frac{\sum_{i=1}^{d} m_i}{d} \tag{14}$$

$$\tilde{C}_j = \frac{C_j}{g^{\bar{m}}} = (g^{m_j - \bar{m}} r_j^{n} \bmod n^2) \cdot h(t)^{\beta_{x_j} G(x_j)} \tag{15}$$

$\sum_{i=1}^{d} m_i$ represents the sum of the electricity data of the previous period in group $j$. $\tilde{C}_j$ represents the processed data of $C_j$ and replaces the missing data $C_i$ in the data aggregation. The process of data aggregation is shown in Algorithm 3.

Tab. 4: Data aggregation

Algorithm 3: Data aggregation

DA.Input: $y_i$, $C_i$, $H_1(t \,\|\, G(x_i)\beta_{x_i})$, $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$
DA.Output: $C_{sum}$, $Enc[sk_D, H_1(C_{sum} \,\|\, t)]$

(1) Calculate $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$ based on input
(2) If $H_2(y_i \,\|\, C_i \,\|\, H_1(t \,\|\, G(x_i)\beta_{x_i}))$ is right, then
(3)   For $1 \le i \le v$
(4)     Calculate the number of SMs in group $i$
(5)     If (Count(SM) = $d$), then
(6)       $C_{sum} = \prod C_i$
(7)       $i{+}{+}$, return to (3)
(8)     else
(9)       Compare hash table composed of $H_1(t \,\|\, G(x_i)\beta_{x_i})$ with other groups
(10)      If there is a value lost in group $i$, then
(11)        Calculate $\bar{m}$ and $\tilde{C}_j$, return to (6)
(12)      else if there is an extra value in group $i$
(13)        Drop this message
(14)      end if
(15)    end if
(16) end if
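The hash-table comparison in steps (9) to (13) can be sketched as a multiset difference between the tags received from one group and the tags of a complete reference group. The sketch assumes SHA-256 as a stand-in for $H_1$ and treats the value $G(x_i)\beta_{x_i}$ as an integer; the helper names and the sample values are hypothetical.

```python
import hashlib
from collections import Counter


def h1_tag(t: str, share_value: int) -> str:
    """Stand-in for H1(t || G(x_i) * beta_{x_i}) built on SHA-256."""
    return hashlib.sha256(f"{t}|{share_value}".encode()).hexdigest()


def compare_with_reference(received_tags, reference_tags):
    """Compare one group's tags with those of a complete group.

    Returns (missing, extra): tags the reference group has but this group
    lacks, and tags this group sent that the reference group does not
    account for (including duplicates).
    """
    received, reference = Counter(received_tags), Counter(reference_tags)
    missing = list((reference - received).elements())
    extra = list((received - reference).elements())
    return missing, extra


t = "2017-08-10T12:00"
reference = [h1_tag(t, s) for s in (33, 917, 69)]   # complete group
received = reference[:2]                            # third meter is silent
missing, extra = compare_with_reference(received, reference)
```

Because meters with the same serial number hold the same share in every group, a missing tag identifies which position needs a substitute without revealing which household it belongs to.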

5.4. Power dispatching

After receiving the aggregated result from the DA, the CC first uses the public key of the DA to decrypt $Enc[sk_D, H_1(C_{sum} \,\|\, t)]$, which ensures that the message comes from the DA. Then, it calculates the hash value of the concatenation of $C_{sum}$ and $t$. If the result is the same as the attached one, the CC calculates the sum of the electricity data with the decryption key $\lambda$ as in formula (16).

$$M_{sum} = \frac{L(C_{sum}^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{16}$$

$M_{sum}$ denotes the sum of the users' plaintexts. Based on the sum of the real-time data, the CC can draw the real-time load curve and create dynamic pricing, the power generation plan and other scheduling strategies. The process of power dispatching is shown in Algorithm 4.

Tab. 5: Power dispatching

Algorithm 4: Power dispatching

CC.Input: $C_{sum}$, $Enc[sk_D, H_1(C_{sum} \,\|\, t)]$, $\lambda$
CC.Output: $M_{sum}$

(1) Decrypt $Enc[sk_D, H_1(C_{sum} \,\|\, t)]$ by $pk_D$
(2) Calculate $H_1(C_{sum} \,\|\, t)$ based on input
(3) If $H_1(C_{sum} \,\|\, t)$ is wrong, then
(4)   Drop this message
(5) else
(6)   Calculate $M_{sum}$ by formula (16)
(7) end if
(8) For $1 \le i \le v$
(9)   Calculate $\bar{m}$ for group $i$ and send to DA
(10)  $i{+}{+}$, return to (8)

6. Security Analysis

6.1. Privacy-preserving

For an SM in the smart grid, we analyze the security of its data against the following parties: an external attacker, the DA, the CC, and a conspiracy attack.

1) External attacker: When an external attacker compromises a user's SM, the ciphertext $C_i$ sent by the user can be obtained. However, because the attacker knows neither the private keys of the other $d-1$ users nor the decryption key $\lambda$, it cannot acquire the plaintext.

2) DA: After receiving all the data from the SMs, the DA can only perform the data aggregation and replace the data of a malfunctioning SM when necessary. It cannot obtain a single user's plaintext because it does not know $h(t)^{\beta_{x_i} G(x_i)}$; therefore, the security of the user's data is guaranteed.

3) CC: The CC obtains only the aggregated data of all the users, so it cannot learn a single user's real-time data. Thus, the user's privacy is guaranteed in our scheme.

4) Conspiracy attack: If the DA gets the decryption key from the CC and tries to acquire the plaintext of a single user, the user's privacy is still preserved because neither party knows the secret share factor $h(t)^{\beta_{x_i} G(x_i)}$. Moreover, if the DA colludes with some SMs from other groups that hold the same SM key, our scheme still protects the users' privacy because of anonymity.

6.2. Resistance to differential attack

Theorem 1: If the threshold of the secret sharing scheme is equal to the number of group members during the data aggregation, then the scheme can resist the differential attack.

Proof:

1) Given two data sets differing in at most one element, $C_{sum1}$ and $C_{sum2}$, we can describe them as follows:

$$C_{sum1} = C_1 C_2 \cdots C_d \tag{17}$$

$$C_{sum2} = C_1 C_2 \cdots C_d C_{d+1} \tag{18}$$

2) If we do not consider the obfuscation by secret shares, and an adversary gets the decryption key $\lambda$ from the CC and the two data sets from the DA, he can launch the differential attack as follows:

$$C_i = g^{m_i} r_i^{n} \bmod n^2 \tag{19}$$

$$M_{sum1} = \frac{L(C_{sum1}^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{20}$$

$$M_{sum2} = \frac{L(C_{sum2}^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{21}$$

$$m_{d+1} = M_{sum2} - M_{sum1} \tag{22}$$

3) When we add secret shares to further obfuscate the data and set the threshold of the secret sharing scheme equal to the number of group members during the data aggregation, we have

$$C_i = (g^{m_i} r_i^{n} \bmod n^2) \cdot h(t)^{\beta_{x_i} G(x_i)} \tag{23}$$

$$M_{sum1} = \frac{L(C_{sum1}^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{24}$$

However, because the product of the $d + 1$ factors $h(t)^{\beta_{x_i} G(x_i)}$ is not equal to one, the adversary cannot obtain the value of $M_{sum2}$:

$$M_{sum2} \neq \frac{L(C_{sum2}^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n \tag{25}$$

The user's plaintext $m_{d+1}$ is secure because the adversary does not know the value of $M_{sum2}$. This completes the proof.
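Step 2 of the proof, the attack on aggregation without secret shares, can be reproduced with plain Paillier. The sketch below assumes toy primes, the choice $g = n + 1$ and hypothetical meter readings. It shows only the unprotected case, which is what the threshold setting is meant to prevent.

```python
import math
import secrets

p, q = 1009, 1013                  # toy primes (assumption)
n = p * q
n2 = n * n
g = n + 1                          # assumed choice of g
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)


def L(u: int) -> int:
    return (u - 1) // n


def encrypt(m: int) -> int:
    """Plain Paillier ciphertext g^m * r^n mod n^2, with no secret share."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(c: int) -> int:
    return L(pow(c, lam, n2)) * pow(L(pow(g, lam, n2)), -1, n) % n


def aggregate(ciphertexts) -> int:
    out = 1
    for c in ciphertexts:
        out = out * c % n2
    return out


readings = [31, 47, 12, 58, 26]                 # hypothetical readings
ciphertexts = [encrypt(m) for m in readings]
m_sum1 = decrypt(aggregate(ciphertexts[:-1]))   # aggregate of d meters
m_sum2 = decrypt(aggregate(ciphertexts))        # aggregate of d + 1 meters
leaked = m_sum2 - m_sum1                        # the last meter's reading
```

An adversary holding the decryption key and both aggregates recovers the single differing reading by subtraction, exactly as in equation (22).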

6.3. Fault tolerance

Theorem 2: When we substitute the data of a malfunctioning SM with the data of a normal SM in another group that has the same secret share, the error caused by the substitution can be ignored.

Proof:

1) Suppose that there are $F$ malfunctioning SMs in a smart grid composed of $N$ SMs. Then, the error rate can be calculated as follows:

$$e\% = \frac{F \bar{C} / g^{\bar{m}}}{(N - F) \bar{C}} \tag{26}$$

2) Because the number of SMs is much larger than the number of malfunctioning SMs, the error rate can be simplified as follows:

$$e\% = \frac{1}{(N/F - 1)\, g^{\bar{m}}} \approx \frac{F}{N g^{\bar{m}}} \tag{27}$$

3) The larger the number of SMs, the smaller the error rate. Given the large number of SMs in the smart grid, the error rate caused by substitution can be ignored.

This completes the proof.
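The simplification in step 2 can be checked numerically. The sketch below follows our reading of equations (26) and (27), with `g_mean` standing for $g^{\bar{m}}$; the sample numbers are hypothetical and chosen only to show how close the approximation is when $F \ll N$.

```python
def error_rate_exact(n_total: int, n_faulty: int, g_mean: float) -> float:
    """e = F / ((N - F) * g_mean), i.e. 1 / ((N / F - 1) * g_mean)."""
    return n_faulty / ((n_total - n_faulty) * g_mean)


def error_rate_approx(n_total: int, n_faulty: int, g_mean: float) -> float:
    """e ~ F / (N * g_mean), valid when F is much smaller than N."""
    return n_faulty / (n_total * g_mean)


exact = error_rate_exact(1000, 10, 2.0)
approx = error_rate_approx(1000, 10, 2.0)
relative_gap = (exact - approx) / exact   # equals F / N up to rounding
```

The relative gap between the two expressions is $F/N$, so the approximation tightens as the grid grows.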

7. Performance Evaluation

The central idea of our paper is to use substitution to realize fault tolerance, so it is necessary to show that substituting one member with its counterpart in a different group is acceptable for power dispatching in smart grid.

Here, we take the sum of the normal SMs' data as the signal and the sum of the malfunctioning SMs' data processed by substitution as the noise. We can then use the SNR to measure the error rate of our scheme; the detailed formula is presented as (28).

$$SNR_{OS} = 10 \lg \frac{C_{sum} - C_{sum3}}{\tilde{C}_{sum}} = 10 \lg \frac{(N - F)\, g^{\bar{m}}}{F} \tag{28}$$

Here $C_{sum}$ denotes the sum of normal SMs' data, $C_{sum3}$ denotes the sum of malfunctioning SMs' data, and $\tilde{C}_{sum}$ denotes the sum of the data processed by the fault tolerance. By inputting the number of malfunctioning SMs and the total number of SMs in the smart grid into formula (28), we obtain the SNR curve shown in figure 2.

Fig. 2: Signal to noise ratio in our scheme

When the number of SMs ranges from 1000 to 5000, the SNR of our scheme is more than 35 dB in the worst case and close to 40 dB on average, which is acceptable for power scheduling in smart grid. As formula (28) shows, the SNR depends on the value of $g$. Therefore, we can increase the accuracy by adjusting the value of $g$.

We use $T_{exp}$ to denote the time of an exponentiation and $T_{mul}$ to denote the time of a multiplication. $T_{ran}$ denotes the time to generate a random number and $T_{pro}$ denotes the time of Pollard's lambda method. As far as we know, the DG-APED scheme proposed by Zhiguo Shi is also a group-based scheme with fault tolerance [23]. The computational costs of our scheme and DG-APED when there is no malfunctioning SM are as follows:

$$T_{OS} = (N + 4)\, T_{mul} + 5\, T_{exp} \tag{29}$$

$$T_{DG} = (N + 5)\, T_{mul} + 7\, T_{exp} + T_{ran} + T_{pro} \tag{30}$$

Fig. 3: Computational complexity in normal situation

As figure 3 shows, our scheme has a clear advantage over DG-APED when the number of SMs varies from 1000 to 5000.
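Equations (29) and (30) can be evaluated directly once unit timings are fixed. The timings in the sketch below are hypothetical placeholders, not measurements from the paper, and the function names are illustrative.

```python
def cost_ours(n_meters: int, t_mul: float, t_exp: float) -> float:
    """Equation (29): (N + 4) * T_mul + 5 * T_exp."""
    return (n_meters + 4) * t_mul + 5 * t_exp


def cost_dg_aped(n_meters: int, t_mul: float, t_exp: float,
                 t_ran: float, t_pro: float) -> float:
    """Equation (30): (N + 5) * T_mul + 7 * T_exp + T_ran + T_pro."""
    return (n_meters + 5) * t_mul + 7 * t_exp + t_ran + t_pro


# Hypothetical unit costs in arbitrary time units.
T_MUL, T_EXP, T_RAN, T_PRO = 1.0, 10.0, 1.0, 100.0
ours = cost_ours(1000, T_MUL, T_EXP)
dg = cost_dg_aped(1000, T_MUL, T_EXP, T_RAN, T_PRO)
gap = dg - ours   # T_mul + 2 * T_exp + T_ran + T_pro, independent of N
```

Under these formulas the difference between the two costs does not depend on $N$, so the size of the advantage is set by the unit costs, in particular by $T_{pro}$.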

Then, we calculate the computational cost of our scheme and DG-APED when there are malfunctioning SMs. To observe the relationship between the computational cost and the number of malfunctioning SMs, we suppose that the number of SMs in the smart grid is 1000. L denotes the number of SM's types and rn denotes the number of groups in each type. k is the number of SMs in a group in the DG-APED scheme.

$$T_{OS} = (F + N + 4)\, T_{mul} + 5\, T_{exp} \tag{31}$$

$$T_{DG} = (N + 5)\, T_{mul} + 7\, T_{exp} + T_{ran} + T_{pro} \tag{32}$$

As figure 4 shows, our scheme has a clear advantage over DG-APED when the number of malfunctioning SMs varies from 0 to 200.

Fig. 4: Computational complexity considering fault-tolerance

In addition, we calculate the SNRs of our scheme and DG-APED. The SNRs of the two schemes are as follows:

$$SNR_{OS} = 10 \lg \frac{C_{sum} - C_{sum3}}{\tilde{C}_{sum}} = 10 \lg \frac{(N - F)\, g^{\bar{m}}}{F} \tag{33}$$

$$SNR_{DG} = 10 \lg \frac{2 (N \bar{C} - F \bar{C})}{(k - 1) F \bar{C}} = 10 \lg \frac{2 (N - F)}{(k - 1) F} \tag{34}$$

Fig. 5: Signal to noise ratio of two schemes

Figure 5 shows that the SNR of our scheme is much higher than that of DG-APED, which means that our scheme has a lower error rate.

8. Conclusion

In this paper, we propose a privacy-preserving data aggregation scheme based on secret sharing. We set the threshold of the secret shares equal to the number of group members to resist the differential attack mounted by colluding DA and CC. In addition, we mask the user's identity by using the same group serial number, adopt a hash table to find the malfunctioning SM, and achieve fault tolerance for the normal aggregation by substitution. Therefore, even if there are some malfunctioning SMs in a group, our scheme still runs normally. In future work, we will focus on combining real-time data privacy with normal billing.

Acknowledgements

This work is partially supported by the Natural Science Foundation of China under grant 61402171 and the Fundamental Research Funds for the Central Universities under grant 2016MS29.

References

[1] H. Zhang, Q. Zhang, Z. Zhou, X. Du, Processing geo-dispersed big data in an advanced mapreduce framework, IEEE Network 29 (5) (2015) 24-30.

[2] S. Yu, Big privacy: Challenges and opportunities of privacy study in the age of big data, IEEE Access 4 (2016) 1-1.

[3] P. Kamat, Y. Zhang, W. Trappe, C. Ozturk, Enhancing source-location privacy in sensor network routing, in: IEEE International Conference on Distributed Computing Systems, 2005. ICDCS 2005. Proceedings, 2005, pp. 599-608.

[4] K. P. N. Puttaswamy, S. Wang, T. Steinbauer, D. Agrawal, A. E. Abbadi, C. Kruegel, B. Y. Zhao, Preserving location privacy in geosocial applications, IEEE Transactions on Mobile Computing 13 (1) (2014)159-173.

[5] H. Zhang, Z. Xu, Z. Zhou, J. Shi, Clpp: Context-aware location privacy protection for location-based social network, in: IEEE International Conference on Communications, 2015, pp. 1164-1169.

[6] A. Mehmood, I. Natgunanathan, Y. Xiang, G. Hua, S. Guo, Protection of big data privacy, IEEE Access 4 (2016) 1821-1834.

[7] G. Sand, L. Tsitouras, G. Dimitrakopoulos, V. Chatzigiannakis, A big data aggregation, analysis and exploitation integrated platform for increasing social management intelligence, in: IEEE International Conference on Big Data, 2015, pp. 40-47.

[8] P. Costa, A. Donnelly, A. Rowstron, G. O'Shea, Camdoop: Exploiting in-network aggregation for big data applications, NSDI.

[9] L. Sweeney, k-anonymity:, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (05) (2008) 557-570.

[10] A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, L-diversity: privacy beyond k-anonymity, in: International Conference on Data Engineering, 2006, pp. 24-24.

[11] C. Efthymiou, G. Kalogridis, Smart grid privacy via anonymization of smart metering data, in: First IEEE International Conference on Smart Grid Communications, 2010, pp. 238-243.

[12] R. Doss, W. Zhou, S. Sundaresan, S. Yu, L. Gao, A minimum disclosure approach to authentication and privacy in rfid systems, Computer Networks the International Journal of Computer & Telecommunications Networking 56 (15) (2012) 3401-3416.

[13] J. C. L. Cheung, T. W. Chim, S. M. Yiu, V. O. K. Li, Credential-based privacy-preserving power request scheme for smart grid network, in: Global Telecommunications Conference, 2011, pp. 1-5.

[14] J. Camenisch, A. Lysyanskaya, Signature schemes and anonymous credentials from bilinear maps, in: Advances in Cryptology - CRYPTO 2004, International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 2004, Proceedings, 2004, pp. 56-72.

[15] M. Stegelmann, D. Kesdogan, Gridpriv: A smart metering architecture offering k-anonymity, in: IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2012, pp. 419-426.

[16] M. Badra, S. Zeadally, Design and performance analysis of a virtual ring architecture for smart grid privacy, IEEE Transactions on Information Forensics & Security 9 (2) (2014) 321-329.

[17] P. J. Riesch, X. Du, Audit based privacy preservation for the openid authentication protocol, in: Homeland Security, 2012, pp. 348-352.

[18] X. Tan, J. Zheng, C. Zou, Y. Niu, Pseudonym-based privacy-preserving scheme for data collection in smart grid 22 (2) (2016) 120.

[19] Y. Sun, R. Lu, X. Lin, X. Shen, An efficient pseudonymous authentication scheme with strong privacy preservation for vehicular communications, IEEE Transactions on Vehicular Technology 59 (7) (2010) 3589-3603.

[20] R. Lu, X. Lin, T. H. Luan, X. Liang, Pseudonym changing at social spots: An effective strategy for location privacy in vanets, IEEE Transactions on Vehicular Technology 61 (1) (2012) 86-96.

[21] J. Yao, P. Venkitasubramaniam, The privacy analysis of battery control mechanisms in demand response: Revealing state approach and rate distortion bounds, in: Decision and Control, 2014, pp. 1377-1382.

[22] G. Kalogridis, Z. Fan, S. Basutkar, Affordable privacy for home smart meters, in: Ninth IEEE International Symposium on Parallel and Distributed Processing with Applications Workshops, 2011, pp. 77-84.

[23] Z. Shi, R. Sun, R. Lu, L. Chen, Diverse grouping-based aggregation protocol with error detection for smart grid communications, IEEE Transactions on Smart Grid 6 (6) (2015) 1-1.

[24] L. Chen, R. Lu, Z. Cao, Pdaft: A privacy-preserving data aggregation scheme with fault tolerance for smart grid communications, Peer-to-Peer Networking and Applications 8 (6) (2015) 1122-1132.

[25] F. Borges, M. Muhlhauser, Eppp4sms: Efficient privacy-preserving protocol for smart metering systems and its simulation using real-world data, IEEE Transactions on Smart Grid 5 (6) (2014) 2701-2708.

[26] D. Boneh, E.-J. Goh, K. Nissim, Evaluating 2-dnf formulas on ciphertexts, in: Theory of Cryptography, Second Theory of Cryptography Conference, TCC 2005, Cambridge, MA, USA, February 10-12, 2005, Proceedings, 2005, pp. 325-341.

[27] L. Chen, R. Lu, Z. Cao, K. Alharbi, X. Lin, Muda: Multifunctional data aggregation in privacy-preserving smart grid communications, Peer-to-Peer Networking and Applications 8 (5) (2015) 1-16.

[28] Y. Gong, Y. Cai, Y. Guo, Y. Fang, A privacy-preserving scheme for incentive-based demand response in the smart grid, IEEE Transactions on Smart Grid 7 (3) (2015) 1-1.

[29] H. Park, H. Kim, K. Chun, J. Lee, S. Lim, I. Yie, Untraceability of group signature schemes based on bilinear mapping and their improvement, in: International Conference on Information Technology, 2007, pp. 747-753.

[30] L. J. Pang, Y. M. Wang, A new (t, n) multi-secret sharing scheme based on shamir's secret sharing, Applied Mathematics & Computation 167 (2) (2005) 840-848.

[31] A. Barletta, C. Callegari, S. Giordano, M. Pagano, Privacy preserving smart grid communications by verifiable secret key sharing, in: International Conference on Computing and Network Communications, 2015, pp. 199-204.

[32] A. Beussink, K. Akkaya, I. F. Senturk, M. M. E. A. Mahmoud, Preserving consumer privacy on ieee 802.11s-based smart grid ami networks using data obfuscation, Opensiuc.

[33] C. Dwork, Differential Privacy, Springer Berlin Heidelberg, 2006.

[34] Y. Hong, J. Vaidya, H. Lu, P. Karras, S. Goel, Collaborative search log sanitization: Toward differential privacy and boosted utility, IEEE Transactions on Dependable & Secure Computing 12 (5) (2015) 504-518.