IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, ACCEPTED FOR PUBLICATION 1

Linear Precoding Designs for Amplify-and-Forward Multiuser Two-Way Relay Systems

Rui Wang, Meixia Tao, Senior Member, IEEE and Yongwei Huang, Member, IEEE

Abstract—Two-way relaying can improve spectral efficiency in two-user cooperative communications. It also has great potential in multiuser systems. A major problem of designing a multiuser two-way relay system (MU-TWRS) is transceiver or precoding design to suppress co-channel interference. This paper aims to study linear precoding designs for a cellular MU-TWRS where a multi-antenna base station (BS) conducts bi-directional communications with multiple mobile stations (MSs) via a multi-antenna relay station (RS) with amplify-and-forward relay strategy. The design goal is to optimize uplink performance, including total mean-square error (Total-MSE) and sum rate, while maintaining individual signal-to-interference-plus-noise ratio (SINR) requirement for downlink signals. We show that the BS precoding design with the RS precoder fixed can be converted to a standard second order cone programming (SOCP) and the optimal solution is obtained efficiently. The RS precoding design with the BS precoder fixed, on the other hand, is non-convex and we present an iterative algorithm to find a local optimal solution. Then, the joint BS-RS precoding is obtained by solving the BS precoding and the RS precoding alternately. Comprehensive simulation is conducted to demonstrate the effectiveness of the proposed precoding designs.

Index Terms—MIMO precoding, two-way relaying, nonregenerative relay, minimum mean-square-error (MMSE), convex optimization.

I. Introduction

DUE to complex wireless propagation environments, with impairments such as multi-path fading, shadowing and interference, the signals received by a remote destination receiver are not always strong enough to be decoded correctly. This problem has been considered a main obstacle in the development of modern wireless communication systems. Recently, relay-assisted cooperative communication has been proposed as an efficient way to deal with this problem and has received great attention from both academia and industry. One example of relay-assisted cooperative communication is the one-way relay system, which has been well studied in the past decade [1], [2]. Although it has shown great potential in, for example, transmission reliability, energy saving and coverage extension, one-way relaying reduces spectral efficiency due to the half-duplex constraint.

Manuscript received January 13, 2012; revised July 1 and September 9, 2012; accepted September 13, 2012. The associate editor coordinating the review of this paper and approving it for publication was D. Dardari.

R. Wang and M. Tao are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China (e-mail: {liouxingrui, mxtao}@sjtu.edu.cn).

Y. Huang is with the Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong (e-mail: huang@hkbu.edu.hk).

This work is supported by the Joint Research Fund for Overseas Chinese, Hong Kong and Macao Young Scholars under grant 61028001, the NSF of China under grant 60902019, and the NCET Program under grant NCET-11-0331.

Part of this work was presented at GLOBECOM 2011.

Digital Object Identifier 10.1109/TWC.2012.100112.120082

A promising technique to improve spectral efficiency of one-way relaying is to apply network coding [3], resulting in two-way relaying which has now attracted great attention [4]-[7]. Two-way relaying applies the principle of network coding at the relay node so as to mix the signals received from the two source nodes who wish to exchange information with each other and then employs at each destination self-interference (SI) cancelation to extract the desired information. Compared with traditional one-way relaying, spectral efficiency of two-way relaying can be significantly improved since only two time slots instead of four time slots are needed to complete one round of information exchange.

In this work, we consider two-way relaying in multiuser systems. As in traditional multiuser systems, it is crucial to mitigate co-channel interference (CCI) in a multiuser two-way relay system (MU-TWRS). An advanced method to suppress CCI is to apply the multiple-input multiple-output (MIMO) technique, in which the transceiver or precoder should be carefully designed at each multi-antenna station, especially at the relay station (RS) [8]-[15]. In [8], [9], the authors study linear relay precoding for MU-TWRS with the decode-and-forward (DF) relay strategy. Since the received signals are fully decoded in the first time slot, the relay precoding only affects the transmission in the second time slot. Then, by using zero-forcing (ZF) precoding, the relay precoding studied in [8], [9] reduces to a power allocation problem. Amplify-and-forward (AF) relay precoding, however, differs considerably from the DF case: the transmissions of the first and second time slots are tightly coupled, which makes the design more challenging. Using ZF and minimum mean-square-error (MMSE) criteria, the authors in [10]-[13] study precoding design for an AF-based MU-TWRS with multiple pairs of users. In particular, explicit analytical results are derived in [13] for system performance evaluation. Relay precoding design for the AF-based MU-TWRS with multiple pairs of users is also considered in our previous work [14]. Unlike [10]-[13], we do not impose any structural constraint on the relay precoder and thus the obtained results can approach the optimal performance [14]. In [15], the authors study an AF MU-TWRS model with one base station (BS) and multiple mobile stations (MSs). By using a ZF precoding scheme, explicit analytical results are also provided, as in [13]. It is worth noting that the aforementioned ZF-based precoding designs all impose certain constraints on the number of relay antennas, a requirement that may not be met in some scenarios.

In this paper, we consider linear precoding design for a cellular MU-TWRS where a multi-antenna BS intends to conduct bi-directional communications with multiple MSs via a multi-antenna RS. Our work differs from [9] in that we adopt the AF relay strategy rather than DF for its simplicity in practical implementation. However, as mentioned previously, the precoding design with the AF relay strategy is more challenging. Our work is also different from [15] since we do not impose any structure on the precoders. Our design goal is to enhance uplink performance subject to individual signal-to-interference-plus-noise ratio (SINR) requirements for downlink signals. Specifically, total mean-square error (Total-MSE) and sum rate are chosen to measure the uplink performance. Since linear precoding can be employed at the BS, the RS or both, three associated optimization problems are considered. When precoding is only conducted at the BS with the RS precoder fixed, we show that the optimization problem can be converted to a standard second-order cone program (SOCP), so the optimal solution can be obtained efficiently. The RS precoding with the BS precoder fixed, on the other hand, is non-convex and we present an iterative algorithm to find a local optimal solution. Finally, we obtain the joint BS-RS precoding design by solving the BS precoding and the RS precoding alternately, the convergence of which is guaranteed. Simulation results show that the RS precoding scheme outperforms the BS precoding scheme in most cases and that the joint precoding scheme outperforms the individual precoding schemes. Besides performance, practical implementation issues for the proposed precoding designs, including signaling overhead and design complexity, are also discussed and compared.

1536-1276/12$31.00 © 2012 IEEE

The rest of the paper is organized as follows. In Section II, we present the system model. Different precoding designs are presented in Section III. In Section IV, we discuss the overhead and design complexity. Extensive simulation results are illustrated in Section V. Finally, we conclude the paper in Section VI.

Notations: $\mathrm{E}(\cdot)$ denotes the expectation over the random variables within the brackets. $\otimes$ denotes the Kronecker operator. $\mathrm{Tr}(\mathbf{A})$, $\mathbf{A}^{-1}$, $\det(\mathbf{A})$ and $\mathrm{Rank}(\mathbf{A})$ stand for the trace, inverse, determinant and rank of a matrix $\mathbf{A}$, respectively, and $\mathrm{Diag}(\mathbf{a})$ denotes a diagonal matrix with $\mathbf{a}$ being its diagonal entries. Superscripts $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ denote the transpose, conjugate and conjugate transpose, respectively. $\mathbf{0}_{N\times M}$ denotes the $N\times M$ zero matrix. $\mathbf{I}_N$ denotes the $N\times N$ identity matrix and $\mathbf{I}_{N\times M} = [\mathbf{I}_M, \mathbf{0}_{M\times(N-M)}]^T$ if $N > M$. $\|\mathbf{x}\|_2^2$ denotes the squared Euclidean norm of a complex vector $\mathbf{x}$ and $\|\mathbf{X}\|_F$ denotes the Frobenius norm of a complex matrix $\mathbf{X}$. $|z|$ denotes the modulus of a complex number $z$, and $\Re(z)$ and $\Im(z)$ denote its real and imaginary parts, respectively. $\mathbb{C}^{x\times y}$ denotes the space of $x\times y$ matrices with complex entries. The distribution of a circularly symmetric complex Gaussian vector with mean vector $\mathbf{x}$ and covariance matrix $\mathbf{\Sigma}$ is denoted by $\mathcal{CN}(\mathbf{x}, \mathbf{\Sigma})$.

II. System Model

Consider a multiuser two-way relay system where an N-antenna BS conducts bi-directional communication with K single-antenna MSs under the assistance of an M-antenna RS.

Fig. 1. Illustration of a cellular MU-TWRS.

For effective multiuser transmission, we let $N \ge K$ and $M \ge K$. Moreover, we assume that all the MSs are cell-edge users. Thus, due to impairments such as multipath fading, shadowing and path loss of wireless channels, the direct-path link between the BS and each MS is ignored. It is also assumed that the RS operates in half-duplex mode, i.e., it cannot transmit and receive simultaneously.

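The bi-directional (i.e., uplink and downlink) communications take place in two time slots as shown in Fig. 1. In the first time slot, also referred to as the multiple-access (MAC) phase, both the BS and the MSs simultaneously transmit their signals to the RS. The received $M\times 1$ signal vector at the RS can be written as

$$\mathbf{y}_R = \mathbf{H}_1\mathbf{x}_B + \sum_{k=1}^{K}\mathbf{h}_{2k}s_k + \mathbf{n}_R,$$

where $\mathbf{x}_B \in \mathbb{C}^{N\times 1}$ represents the transmit signal vector from the BS and $s_k$ denotes the transmit signal from MS $k$. We assume that the transmission power at MS $k$ is $P_k$, i.e., $\mathrm{E}(s_k s_k^*) = P_k$. $\mathbf{H}_1 \in \mathbb{C}^{M\times N}$ is the MIMO channel matrix from the BS to the RS, $\mathbf{h}_{2k} \in \mathbb{C}^{M\times 1}$ is the channel vector from MS $k$ to the RS, and $\mathbf{n}_R$ denotes the additive noise vector at the RS following $\mathcal{CN}(\mathbf{0}, \sigma_R^2\mathbf{I}_M)$. Here $\mathbf{x}_B$ can be further expressed as

$$\mathbf{x}_B = \mathbf{B}\mathbf{s}_B,$$

where $\mathbf{s}_B \in \mathbb{C}^{K\times 1}$ with $\mathrm{E}(\mathbf{s}_B\mathbf{s}_B^H) = \mathbf{I}_K$ is the modulated signal vector from the BS and $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \cdots, \mathbf{b}_K] \in \mathbb{C}^{N\times K}$ denotes the transmit precoding matrix at the BS. Furthermore, the maximum transmission power at the BS is assumed to be $P_B$, i.e.,

$$\mathrm{Tr}(\mathbf{B}\mathbf{B}^H) \le P_B. \tag{1}$$

Upon receiving the superimposed signal $\mathbf{y}_R$, the RS performs linear processing by multiplying it with a precoding matrix $\mathbf{F} \in \mathbb{C}^{M\times M}$ and then forwards it in the second time slot, also referred to as the broadcast (BC) phase. Therefore, the $M\times 1$ transmit signal vector from the RS is given by

$$\mathbf{x}_R = \mathbf{F}\mathbf{y}_R = \mathbf{F}\mathbf{H}_1\mathbf{x}_B + \sum_{k=1}^{K}\mathbf{F}\mathbf{h}_{2k}s_k + \mathbf{F}\mathbf{n}_R.$$

The maximum transmission power at the RS is given by $P_R$, which yields

$$\mathrm{Tr}\left\{\mathbf{F}\left(\mathbf{H}_1\mathbf{B}\mathbf{B}^H\mathbf{H}_1^H + \mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\mathbf{F}^H\right\} \le P_R, \tag{2}$$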

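where we define $\mathbf{P} = \mathrm{Diag}(\sqrt{P_1}, \sqrt{P_2}, \cdots, \sqrt{P_K})$ and $\mathbf{H}_2 = [\mathbf{h}_{21}, \mathbf{h}_{22}, \cdots, \mathbf{h}_{2K}]$. Then the received signals at the BS and MS $k$ after the BC phase can be written as

$$\mathbf{y}_B = \sum_{k=1}^{K}\mathbf{G}_1\mathbf{F}\mathbf{h}_{2k}s_k + \mathbf{G}_1\mathbf{F}\mathbf{H}_1\mathbf{B}\mathbf{s}_B + \mathbf{G}_1\mathbf{F}\mathbf{n}_R + \mathbf{n}_B = \mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{s}_M + \mathbf{G}_1\mathbf{F}\mathbf{H}_1\mathbf{B}\mathbf{s}_B + \mathbf{G}_1\mathbf{F}\mathbf{n}_R + \mathbf{n}_B, \tag{3}$$

$$y_k = \sum_{i=1}^{K}\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_i s_{Bi} + \sum_{i=1}^{K}\mathbf{g}_{2k}^T\mathbf{F}\mathbf{h}_{2i}s_i + \mathbf{g}_{2k}^T\mathbf{F}\mathbf{n}_R + n_k. \tag{4}$$

Here, $\mathbf{s}_M = [s_1, s_2, \cdots, s_K]^T$, $s_{Bi}$ denotes the $i$-th entry of $\mathbf{s}_B$, $\mathbf{G}_1 \in \mathbb{C}^{N\times M}$ and $\mathbf{g}_{2k} \in \mathbb{C}^{M\times 1}$ are the channel matrix and vector from the RS to the BS and MS $k$, respectively, and $\mathbf{n}_B$ and $n_k$ denote the additive noise at the BS and MS $k$, respectively, with $\mathbf{n}_B \sim \mathcal{CN}(\mathbf{0}, \sigma_B^2\mathbf{I}_N)$ and $n_k \sim \mathcal{CN}(0, \sigma_k^2)$. Note that the BS and MS $k$ know their own transmit signals $\mathbf{s}_B$ and $s_k$, respectively. Therefore, the back-propagated self-interference terms involving $\mathbf{s}_B$ and $s_k$ can be subtracted from (3) and (4), respectively. The equivalent received signals at the BS and MS $k$ are then given, respectively, by

$$\tilde{\mathbf{y}}_B = \mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{s}_M + \mathbf{G}_1\mathbf{F}\mathbf{n}_R + \mathbf{n}_B, \tag{5}$$

$$\tilde{y}_k = \mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_k s_{Bk} + \sum_{i\ne k}\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_i s_{Bi} + \sum_{i\ne k}\mathbf{g}_{2k}^T\mathbf{F}\mathbf{h}_{2i}s_i + \mathbf{g}_{2k}^T\mathbf{F}\mathbf{n}_R + n_k. \tag{6}$$

From (6), we find that the received downlink signal at each MS contains not only the CCI from the downlink transmission (i.e., the second term), but also the CCI from the uplink transmission (i.e., the third term). The downlink performance of each MS can be measured by the SINR given by

$$\mathrm{SINR}_k = \frac{|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_k|^2}{\sum_{i\ne k}\left(|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_i|^2 + P_i|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{h}_{2i}|^2\right) + \sigma_R^2\|\mathbf{g}_{2k}^T\mathbf{F}\|_2^2 + \sigma_k^2}, \quad k = 1, 2, \cdots, K. \tag{7}$$

As for the uplink transmission in (5), it can be viewed as a MIMO multiple-access channel. Depending on different performance requirements, various metrics can be used to evaluate its performance. Our first objective is to minimize the Total-MSE of all the MSs, assuming a linear minimum mean-square error (MMSE) receiver at the BS. Using Total-MSE for precoding design has been widely studied in multiuser systems [10], [11], [16]-[18]. By minimizing the MSE

$$e = \mathrm{E}\left(\|\mathbf{W}\tilde{\mathbf{y}}_B - \mathbf{s}_M\|_2^2\right) \tag{8}$$

with respect to the decoding matrix $\mathbf{W}$, the minimum Total-MSE is given by [19]

$$e = \mathrm{Tr}\left(\mathbf{E}^{-1}\right), \tag{9}$$

where $\mathbf{E} = \mathbf{I}_K + \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\left(\sigma_R^2\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}$ and the optimal $\mathbf{W}$ in (8) is

$$\mathbf{W} = \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\left(\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H + \sigma_R^2\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}. \tag{10}$$

Our second objective is to maximize the sum rate of the uplink transmission. By applying successive interference cancelation (SIC) and a linear MMSE filter at the BS, the sum rate at the BS is given by [20]

$$r = 0.5\log_2\det\left(\mathbf{I}_K + \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\left(\sigma_R^2\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}\right), \tag{11}$$

where the factor 0.5 is due to the fact that the MSs use two time slots to complete the uplink transmission. Note that (11) can be re-expressed as $r = 0.5\log_2\det(\mathbf{E})$ with $\mathbf{E}$ defined in (9). We will see that the precoding designs proposed for Total-MSE minimization can be extended to sum-rate maximization.

III. Linear Precoding Designs

From Section II, it is seen that the downlink performance of each MS depends on both the BS precoder $\mathbf{B}$ and the RS precoder $\mathbf{F}$. The uplink performance, in contrast, depends only on the RS precoder $\mathbf{F}$, so less design freedom can be exploited than in the downlink. In theory, the BS precoder $\mathbf{B}$ and the relay precoder $\mathbf{F}$ should be jointly designed such that the downlink and uplink performance can be optimized simultaneously. However, there is no single figure of merit to measure the overall performance of the multiuser bidirectional transmission. In this paper, we choose to ensure the downlink quality-of-service (QoS) for each individual MS while minimizing the Total-MSE or maximizing the sum rate of all the users in the uplink. This is because in practice the downlink data traffic is usually more dominant than the uplink traffic. As such, the optimization problem is formulated as

$$\min_{\mathbf{B},\mathbf{F}}\ e\ \ \text{or}\ \ -r \tag{12}$$
$$\text{s.t.}\quad \mathrm{SINR}_k \ge \lambda_k,\quad k = 1, 2, \cdots, K,$$
$$\mathrm{Tr}(\mathbf{B}\mathbf{B}^H) \le P_B,$$
$$\mathrm{Tr}\left\{\mathbf{F}\left(\mathbf{H}_1\mathbf{B}\mathbf{B}^H\mathbf{H}_1^H + \mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\mathbf{F}^H\right\} \le P_R,$$

where $\lambda_k$ is a preset threshold for MS $k$.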

Since linear precoding can be conducted at the BS, RS or both, three associated precoding designs are considered respectively in the following three subsections. Note that for each design, the system needs different computational complexity and signaling overhead, such that they are suitable to different scenarios.
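As a numerical sanity check on the quantities of Section II (relay transmit power, downlink SINR, uplink Total-MSE and sum rate), the following sketch evaluates them for randomly drawn channels. The antenna numbers, unit powers and noise variances, the random precoders and all function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M, K = 3, 3, 2                  # BS antennas, RS antennas, number of MSs (assumed)
sigma_R2, sigma_B2 = 1.0, 1.0      # noise variances at the RS and the BS (assumed)
sigma_k2 = np.ones(K)              # noise variance at each MS (assumed)
H1 = crandn(M, N)                  # BS -> RS channel
H2 = crandn(M, K)                  # columns h_2k: MS k -> RS
G1 = crandn(N, M)                  # RS -> BS channel
G2 = crandn(M, K)                  # columns g_2k: RS -> MS k
P = np.diag(np.sqrt(np.ones(K)))   # P = Diag(sqrt(P_1), ..., sqrt(P_K))
B = crandn(N, K)                   # arbitrary BS precoder
F = crandn(M, M)                   # arbitrary RS precoder

def relay_power(F, B):
    """Transmit power of the RS for given precoders."""
    C = (H1 @ B @ B.conj().T @ H1.conj().T
         + H2 @ P @ P.conj().T @ H2.conj().T
         + sigma_R2 * np.eye(M))
    return np.trace(F @ C @ F.conj().T).real

def downlink_sinr(F, B):
    """Downlink SINR of every MS after self-interference cancelation."""
    sinr = np.empty(K)
    for k in range(K):
        gF = G2[:, k] @ F                              # row vector g_2k^T F
        dl = np.abs(gF @ H1 @ B) ** 2                  # |g_2k^T F H1 b_i|^2 for all i
        ul = np.diag(P) ** 2 * np.abs(gF @ H2) ** 2    # P_i |g_2k^T F h_2i|^2 for all i
        others = np.arange(K) != k
        noise = sigma_R2 * np.linalg.norm(gF) ** 2 + sigma_k2[k]
        sinr[k] = dl[k] / (dl[others].sum() + ul[others].sum() + noise)
    return sinr

def uplink_E(F):
    """Matrix E whose inverse trace is the Total-MSE and whose log-det gives the rate."""
    A = G1 @ F
    R = sigma_R2 * A @ A.conj().T + sigma_B2 * np.eye(N)
    T = A @ H2 @ P
    return np.eye(K) + T.conj().T @ np.linalg.solve(R, T)

E = uplink_E(F)
total_mse = np.trace(np.linalg.inv(E)).real
sum_rate = 0.5 * np.log2(np.linalg.det(E).real)
print(relay_power(F, B), downlink_sinr(F, B), total_mse, sum_rate)
```

Because $\mathbf{E} - \mathbf{I}_K$ is positive semidefinite, the Total-MSE computed this way always lies between 0 and $K$, and the sum rate is non-negative.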

A. BS precoding

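In this subsection, we assume that precoding is only employed at the BS, while the RS precoder is given as $\mathbf{F} = \alpha\bar{\mathbf{F}}$, where $\bar{\mathbf{F}}$ is an arbitrary fixed precoder applied at the RS and $\alpha$ is a non-negative scalar used to scale the received signals at the RS to satisfy the relay power constraint. Note that besides maintaining the downlink SINR, a properly designed $\mathbf{B}$ can reduce the RS power consumed by the signal $\mathbf{s}_B$ from the BS. The uplink transmission can then share more power at the RS, which helps improve its performance.

The optimization problem can be formulated as:

$$\min_{\mathbf{B},\alpha}\ f_1(\alpha)\ \ \text{or}\ \ -f_2(\alpha) \tag{13}$$
$$\text{s.t.}\quad \rho_k \ge \lambda_k,\ \forall k,$$
$$\mathrm{Tr}(\mathbf{B}\mathbf{B}^H) \le P_B,$$
$$\mathrm{Tr}\left\{\alpha^2\bar{\mathbf{F}}\left(\mathbf{H}_1\mathbf{B}\mathbf{B}^H\mathbf{H}_1^H + \mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\bar{\mathbf{F}}^H\right\} \le P_R,$$

where $f_1(\alpha) = \mathrm{Tr}\left(\mathbf{E}(\alpha)^{-1}\right)$ and $f_2(\alpha) = 0.5\log_2\det\left(\mathbf{E}(\alpha)\right)$ with

$$\mathbf{E}(\alpha) = \mathbf{I}_K + \alpha^2\mathbf{P}^H\mathbf{H}_2^H\bar{\mathbf{F}}^H\mathbf{G}_1^H\left(\sigma_R^2\alpha^2\mathbf{G}_1\bar{\mathbf{F}}\bar{\mathbf{F}}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}\mathbf{G}_1\bar{\mathbf{F}}\mathbf{H}_2\mathbf{P},$$

and

$$\rho_k = \frac{\alpha^2|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\mathbf{b}_k|^2}{\epsilon + \alpha^2\sigma_R^2\|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\|_2^2 + \sigma_k^2},$$

where $\epsilon = \sum_{i\ne k}\left(\alpha^2|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\mathbf{b}_i|^2 + \alpha^2 P_i|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{h}_{2i}|^2\right)$. To proceed to solve (13), we first give the following lemma, the proof of which is given in Appendix A.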

Lemma 1: $f_1(\alpha)$ and $-f_2(\alpha)$ are monotonically decreasing functions with respect to $\alpha$.
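Lemma 1 can be illustrated numerically. The sketch below evaluates $f_1(\alpha) = \mathrm{Tr}(\mathbf{E}(\alpha)^{-1})$ and $-f_2(\alpha)$, with $f_2$ taken as $0.5\log_2\det(\mathbf{E}(\alpha))$, on a grid of $\alpha$ values for one random channel draw. The dimensions, unit MS powers, unit noise variances and the randomly drawn fixed relay precoder `F_bar` are assumptions made only for this illustration; it is a spot check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M, K = 3, 3, 2                 # assumed antenna and user numbers
sigma_R2, sigma_B2 = 1.0, 1.0     # assumed noise variances
H2, G1, F_bar = crandn(M, K), crandn(N, M), crandn(M, M)
P = np.eye(K)                     # unit transmit power at every MS (assumed)

def E_of_alpha(alpha):
    """E(alpha) for the scaled relay precoder F = alpha * F_bar."""
    A = G1 @ F_bar
    R = sigma_R2 * alpha**2 * A @ A.conj().T + sigma_B2 * np.eye(N)
    T = A @ H2 @ P
    return np.eye(K) + alpha**2 * T.conj().T @ np.linalg.solve(R, T)

def f1(alpha):
    return np.trace(np.linalg.inv(E_of_alpha(alpha))).real

def f2(alpha):
    return 0.5 * np.log2(np.linalg.det(E_of_alpha(alpha)).real)

alphas = np.linspace(0.1, 5.0, 50)
f1_vals = np.array([f1(a) for a in alphas])
neg_f2_vals = np.array([-f2(a) for a in alphas])
print(f1_vals[:3], neg_f2_vals[:3])
```

Both sequences decrease along the grid, which is consistent with the lemma: a larger $\alpha$ amplifies the useful uplink signal relative to the BS noise.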

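Based on Lemma 1, it is easy to see that minimizing $f_1(\alpha)$ or $-f_2(\alpha)$ in (13) is equivalent to maximizing the scalar $\alpha$. By defining $\tilde{\mathbf{B}} = \alpha\mathbf{B}$ with columns $\tilde{\mathbf{b}}_k$, problem (13) can be re-expressed as:

$$\max_{\tilde{\mathbf{B}},\alpha}\ \alpha \tag{14}$$
$$\text{s.t.}\quad \mathrm{Tr}(\tilde{\mathbf{B}}\tilde{\mathbf{B}}^H) \le \alpha^2 P_B,$$
$$\mathrm{Tr}\left(\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{B}}\tilde{\mathbf{B}}^H\mathbf{H}_1^H\bar{\mathbf{F}}^H\right) + \alpha^2\mathrm{Tr}\left(\bar{\mathbf{F}}\left(\mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\bar{\mathbf{F}}^H\right) \le P_R,$$
$$\sum_{i=1}^{K}|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_i|^2 + \alpha^2\left(\sum_{i\ne k}P_i|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{h}_{2i}|^2 + \sigma_R^2\|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\|_2^2\right) + \sigma_k^2 \le \left(1 + \frac{1}{\lambda_k}\right)|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_k|^2,\ \forall k.$$

Although (14) is still a non-convex problem, we can use the observation made in [21] that any phase shift of $\tilde{\mathbf{b}}_k$, i.e., $e^{j\theta}\tilde{\mathbf{b}}_k$, does not affect the optimality of the primal problem. Therefore, for any optimal solution, there always exists a phase-shifted version of $\tilde{\mathbf{b}}_k$ that makes the term $\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_k$ real and non-negative while not affecting the value of the objective function and keeping the constraints satisfied. Thus, we can convert problem (14) into the following equivalent form:

$$\max_{\mathbf{x}_B}\ \alpha \tag{15}$$
$$\text{s.t.}\quad \|\tilde{\mathbf{B}}\|_F^2 \le \alpha^2 P_B,$$
$$\|\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{B}}\|_F^2 + \alpha^2\mathrm{Tr}\left(\bar{\mathbf{F}}\left(\mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\bar{\mathbf{F}}^H\right) \le P_R,$$
$$\sum_{i=1}^{K}|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_i|^2 + \alpha^2\left(\sum_{i\ne k}P_i|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{h}_{2i}|^2 + \sigma_R^2\|\mathbf{g}_{2k}^T\bar{\mathbf{F}}\|_2^2\right) + \sigma_k^2 \le \left(1 + \frac{1}{\lambda_k}\right)\left(\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_k\right)^2,\ \forall k,$$
$$\mathbf{g}_{2k}^T\bar{\mathbf{F}}\mathbf{H}_1\tilde{\mathbf{b}}_k\ \text{real and}\ \ge 0,\ \forall k,$$

where $\mathbf{x}_B = [\mathrm{vec}(\tilde{\mathbf{B}})^T, \alpha]$. It is not hard to verify that (15) is a standard second-order cone program [22], since after taking square roots each constraint bounds the Euclidean norm of an affine function of $\mathbf{x}_B$ by an affine function of $\mathbf{x}_B$. The optimal solution can therefore be obtained by using an available software package [23]. Then, dividing $\tilde{\mathbf{B}}$ by $\alpha$, we finally get the optimal $\mathbf{B}$.

B. RS precoding

In this subsection, we consider the precoding design at the RS with the BS precoder fixed. In the following, we first consider the precoding design for Total-MSE minimization, then extend it to sum-rate maximization.

1) Total-MSE minimization: The RS precoding to minimize Total-MSE can be formulated as:

$$\min_{\mathbf{F}}\ \mathrm{Tr}\left(\mathbf{E}^{-1}\right) \tag{16}$$
$$\text{s.t.}\quad \tau \le P_R,$$
$$\zeta_k \ge \lambda_k,\ \forall k,$$

where $\mathbf{E}$ is defined in (9), $\tau = \mathrm{Tr}\left\{\mathbf{F}\left(\mathbf{H}_1\mathbf{B}\mathbf{B}^H\mathbf{H}_1^H + \mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\mathbf{F}^H\right\}$ and

$$\zeta_k = \frac{|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_k|^2}{\sum_{i\ne k}\left(|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{H}_1\mathbf{b}_i|^2 + P_i|\mathbf{g}_{2k}^T\mathbf{F}\mathbf{h}_{2i}|^2\right) + \sigma_R^2\|\mathbf{g}_{2k}^T\mathbf{F}\|_2^2 + \sigma_k^2}.$$

Note that the power constraint at the BS is irrelevant here since $\mathbf{B}$ is fixed. It is not hard to verify that the objective function and the SINR constraints in (16) are both non-convex. To make (16) more tractable, we substitute the linear MMSE decoding matrix $\mathbf{W}$ back into (16) and rewrite it as:

$$\min_{\mathbf{F},\mathbf{W}}\ f(\mathbf{F},\mathbf{W}) \tag{17}$$
$$\text{s.t.}\quad \tau \le P_R,$$
$$\zeta_k \ge \lambda_k,\ \forall k,$$

where

$$f(\mathbf{F},\mathbf{W}) = \mathrm{Tr}\left\{\mathbf{W}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\mathbf{W}^H + \sigma_R^2\mathbf{W}\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H\mathbf{W}^H + \sigma_B^2\mathbf{W}\mathbf{W}^H + \mathbf{I}_K - \mathbf{W}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P} - \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\mathbf{W}^H\right\}. \tag{18}$$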

Note that (18) can also be computed from (8). Although the two design matrices W and F are coupled together in (17), the advantage of introducing W is that we can apply alternating optimization to solve two decoupled subproblems iteratively in what follows.

In the alternating optimization, the first step is to update the BS decoding matrix W for a given F. From (17), it is seen that the constraints are independent of W. Thus, the optimal W can be readily obtained as in (10) by equating the gradient of the objective function in (17) to zero.
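The W-update step can be checked numerically. The sketch below, for one random channel draw, evaluates the Total-MSE objective of (18), computes the linear MMSE decoder of (10), and confirms that the objective at this decoder equals $\mathrm{Tr}(\mathbf{E}^{-1})$ and increases when the decoder is perturbed. The dimensions, MS powers, noise variances and the random relay precoder are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(*shape):
    """Circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M, K = 3, 3, 2                     # assumed antenna and user numbers
sigma_R2, sigma_B2 = 0.5, 1.0         # assumed noise variances
H2, G1, F = crandn(M, K), crandn(N, M), crandn(M, M)
P = np.diag(np.sqrt([1.0, 2.0]))      # assumed MS powers P_1 = 1, P_2 = 2

A = G1 @ F                            # relay-to-BS effective noise channel
T = A @ H2 @ P                        # effective uplink channel G1 F H2 P

def f_obj(W):
    """Total-MSE for an arbitrary linear decoder W, as in (18)."""
    WT = W @ T
    val = (WT @ WT.conj().T
           + sigma_R2 * W @ A @ A.conj().T @ W.conj().T
           + sigma_B2 * W @ W.conj().T
           + np.eye(K) - WT - WT.conj().T)
    return np.trace(val).real

def mmse_decoder():
    """Linear MMSE decoder, as in (10)."""
    R = T @ T.conj().T + sigma_R2 * A @ A.conj().T + sigma_B2 * np.eye(N)
    return T.conj().T @ np.linalg.inv(R)

R_noise = sigma_R2 * A @ A.conj().T + sigma_B2 * np.eye(N)
E = np.eye(K) + T.conj().T @ np.linalg.solve(R_noise, T)

W_opt = mmse_decoder()
mse_opt = f_obj(W_opt)
mse_closed_form = np.trace(np.linalg.inv(E)).real
mse_perturbed = f_obj(W_opt + 0.05 * crandn(K, N))
print(mse_opt, mse_closed_form, mse_perturbed)
```

The agreement between `mse_opt` and `mse_closed_form` follows from the matrix inversion lemma, and the objective is strictly convex in the decoder, so any perturbation raises the Total-MSE.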

Secondly, we need to optimize F with W fixed. This problem is equivalently rewritten as:

$$\min_{\mathbf{F}}\ \mathrm{Tr}\left\{\mathbf{G}_1^H\mathbf{W}^H\mathbf{W}\mathbf{G}_1\mathbf{F}\left(\mathbf{H}_2\mathbf{P}\mathbf{P}^H\mathbf{H}_2^H + \sigma_R^2\mathbf{I}_M\right)\mathbf{F}^H - \mathbf{F}\mathbf{H}_2\mathbf{P}\mathbf{W}\mathbf{G}_1 - \mathbf{G}_1^H\mathbf{W}^H\mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H + \sigma_B^2\mathbf{W}\mathbf{W}^H + \mathbf{I}_K\right\} \tag{19}$$
$$\text{s.t.}\quad \tau \le P_R,$$
$$\zeta_k \ge \lambda_k,\ \forall k,$$

where we have used the fact that $\mathrm{Tr}(\mathbf{A}\mathbf{B}) = \mathrm{Tr}(\mathbf{B}\mathbf{A})$ for (18). Although the objective function in (19) can be verified to be convex based on [24], the optimal $\mathbf{F}$ is still not easy to obtain due to the non-convex SINR constraints. To proceed, we need to recast (19) into a suitable form such that efficient optimization tools can be applied. After certain transformations as detailed in Appendix B, problem (19) can be rewritten into the following inhomogeneous quadratically constrained quadratic program (QCQP) form [22]:

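$$\min_{\mathbf{f}}\ \mathbf{f}^H\mathbf{Q}_0\mathbf{f} - \mathbf{f}^H\mathbf{q}_0 - \mathbf{q}_0^H\mathbf{f} + q_0 \tag{20a}$$
$$\text{s.t.}\quad \mathbf{f}^H\mathbf{Q}_\tau\mathbf{f} \le P_R, \tag{20b}$$
$$\mathbf{f}^H\mathbf{Q}_k\mathbf{f} \ge \lambda_k\sigma_k^2,\ \forall k, \tag{20c}$$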

where $\mathbf{f}$, $\mathbf{Q}_0$, $\mathbf{Q}_\tau$ and $\mathbf{Q}_k$ are defined in (34), (36) and (38) in Appendix B, respectively. By checking the positive semidefiniteness of $\mathbf{Q}_0$ and the positive definiteness of $\mathbf{Q}_\tau$, we can verify that both the objective function (20a) and the RS power constraint (20b) are convex. However, the constraint (20c) is non-convex because $\mathbf{Q}_k$ defined in (38) is not necessarily negative semidefinite. Hence, optimization problem (20) is non-convex. To solve (20), we rewrite it into a standard QCQP form as follows:

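$$\min_{\mathbf{x}_F}\ \mathbf{x}_F^H\tilde{\mathbf{Q}}_0\mathbf{x}_F \tag{21}$$
$$\text{s.t.}\quad \mathbf{x}_F^H\tilde{\mathbf{Q}}_\tau\mathbf{x}_F \le 0,$$
$$\mathbf{x}_F^H\tilde{\mathbf{Q}}_k\mathbf{x}_F \le 0,\ \forall k,$$

where $\mathbf{x}_F = [t, \mathbf{f}^T]^T$,

$$\tilde{\mathbf{Q}}_0 = \begin{bmatrix} q_0 & -\mathbf{q}_0^H \\ -\mathbf{q}_0 & \mathbf{Q}_0 \end{bmatrix},\quad \tilde{\mathbf{Q}}_\tau = \begin{bmatrix} -P_R & \mathbf{0}_{1\times M^2} \\ \mathbf{0}_{M^2\times 1} & \mathbf{Q}_\tau \end{bmatrix}\quad \text{and}\quad \tilde{\mathbf{Q}}_k = \begin{bmatrix} \lambda_k\sigma_k^2 & \mathbf{0}_{1\times M^2} \\ \mathbf{0}_{M^2\times 1} & -\mathbf{Q}_k \end{bmatrix}.$$

Note that (19) and (21) are equivalent to each other. If we get an optimal solution of (21), we can always obtain an optimal solution of (20) by selecting appropriate entries from $\mathbf{x}_F/t$, regardless of whether $t$ is real or complex. By a close inspection of (21), we find that (21) can be transformed into the following semidefinite programming (SDP) form [22]: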

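$$\min_{\mathbf{X}_F}\ \mathrm{Tr}(\tilde{\mathbf{Q}}_0\mathbf{X}_F) \tag{22}$$
$$\text{s.t.}\quad \mathrm{Rank}(\mathbf{X}_F) = 1,\quad \mathrm{Tr}(\tilde{\mathbf{Q}}\mathbf{X}_F) = 1,$$
$$\mathrm{Tr}(\tilde{\mathbf{Q}}_\tau\mathbf{X}_F) \le 0,$$
$$\mathrm{Tr}(\tilde{\mathbf{Q}}_k\mathbf{X}_F) \le 0,\ \forall k,$$
$$\mathbf{X}_F \succeq 0,$$

where $\tilde{\mathbf{Q}} = \begin{bmatrix} 1 & \mathbf{0}_{1\times M^2} \\ \mathbf{0}_{M^2\times 1} & \mathbf{0}_{M^2\times M^2} \end{bmatrix}$. Due to the rank-one constraint, it is not easy to obtain an optimal solution of (22). We therefore resort to relaxing it by deleting the rank-one constraint, namely,

$$\min_{\mathbf{X}_F}\ \mathrm{Tr}(\tilde{\mathbf{Q}}_0\mathbf{X}_F) \tag{23}$$
$$\text{s.t.}\quad \mathrm{Tr}(\tilde{\mathbf{Q}}\mathbf{X}_F) = 1,$$
$$\mathrm{Tr}(\tilde{\mathbf{Q}}_\tau\mathbf{X}_F) \le 0,$$
$$\mathrm{Tr}(\tilde{\mathbf{Q}}_k\mathbf{X}_F) \le 0,\ \forall k,$$
$$\mathbf{X}_F \succeq 0.$$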

Note that (23) is a standard SDP problem, thus its optimal solution can be easily obtained by using the available software package [23]. If the optimal solution of (23) is rank-one, the optimal RS precoder can be obtained by using eigenvalue decomposition. Otherwise, certain techniques are required to find the optimal RS precoder.
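The rank-one recovery step can be sketched as follows. Given a (numerically) rank-one solution of the relaxed SDP, the principal eigenvector yields the homogenized vector up to an unknown phase, and dividing by its first entry removes both that phase and the auxiliary scalar $t$. The sketch assumes that $\mathbf{f}$ is the column-wise vectorization of $\mathbf{F}$ (the actual definition is in Appendix B, which is not reproduced here), and it builds a synthetic rank-one matrix instead of calling an SDP solver; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def rank_one_factor(X, tol=1e-8):
    """Return x with X = x x^H (up to a phase), or fail if X is not rank-one."""
    w, V = np.linalg.eigh(X)                      # eigenvalues in ascending order
    if w[-1] <= 0 or np.any(np.abs(w[:-1]) > tol * w[-1]):
        raise ValueError("X is not (numerically) rank-one")
    return np.sqrt(w[-1]) * V[:, -1]

def precoder_from_solution(X, M):
    """Recover the M x M relay precoder from a rank-one SDP solution X."""
    x = rank_one_factor(X)
    f = x[1:] / x[0]                              # removes t and the eigenvector phase
    return f.reshape(M, M, order="F")             # assumes f = vec(F), column-major

# Synthetic example: build X from a known precoder and a unit-modulus t.
M = 2
F_true = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
t = np.exp(1j * 0.7)
x_true = np.concatenate(([t], t * F_true.reshape(-1, order="F")))
X = np.outer(x_true, x_true.conj())

F_rec = precoder_from_solution(X, M)
print(np.allclose(F_rec, F_true))
```

Dividing by the first entry is what makes the recovery insensitive to the arbitrary phase returned by the eigendecomposition.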

In what follows, we first consider a system with no more than two MSs (i.e., $K \le 2$), for which an optimal solution of (20) can be obtained in most cases. Then, we extend the results to a more general system with $K > 2$, where the randomization technique is applied to find a quasi-optimal solution.

a) $K \le 2$: We first give the following theorem.

Theorem 1: Suppose that the considered cellular MU-TWRS has at most two MSs, i.e., $K \le 2$. Then an optimal rank-one solution of the non-convex optimization problem (22) can be derived in polynomial time from the relaxed SDP problem (23) in the following cases: 1) problem (23) has an optimal rank-one solution; 2) problem (23) has at least one inactive constraint at the optimal solution; 3) problem (23) has an optimal solution of rank higher than two if all the constraints are active.

Proof: Please refer to Appendix C. ■

From Theorem 1, we find that we cannot obtain an optimal rank-one solution if the SDP relaxation problem (23) happens to have an optimal solution of rank two with all the constraints being active. However, our simulations show that this case rarely occurs. For that special case, a procedure for producing a suboptimal rank-one solution is given in Appendix D.

Now, the iterative RS precoding algorithm to minimize Total-MSE for $K \le 2$ can be outlined as follows.

Algorithm 1 (RS precoding with $K \le 2$)

• Initialize F

• Repeat

- Update the BS decoding matrix W using (10) for a fixed F;

- Update the RS precoder F with W fixed as follows: if the obtained $\mathbf{X}_F$ in (23) is rank-one, use eigenvalue decomposition to get F; otherwise, use the procedures presented in Appendix C or D to get F;

• Until termination criterion is satisfied.

Lemma 2: Algorithm 1 is convergent and the limit point of iteration is a stationary point of (17).

Proof: Since for $K \le 2$ the optimal solution of (19) can be obtained in most cases as claimed in Theorem 1, the solution in each iteration of Algorithm 1 can be viewed as optimal. Thus the Total-MSE at the BS is strictly reduced after each iteration before convergence. On the other hand, the objective function is lower-bounded (by zero at least). Therefore, we conclude that Algorithm 1 is convergent. Let $\{\hat{\mathbf{W}}, \hat{\mathbf{F}}\}$ denote the limit point of Algorithm 1. At the limit point, the solution will not change if we continue the iteration; otherwise, the Total-MSE could be further decreased, which contradicts the assumption of convergence. The optimality of the solution in each iteration further means that $\hat{\mathbf{W}}$ and $\hat{\mathbf{F}}$ are local minimizers of the respective subproblems. Hence, we have

$$\mathrm{Tr}\left\{\nabla_{\mathbf{W}}f(\hat{\mathbf{W}}; \hat{\mathbf{F}})^T(\mathbf{W} - \hat{\mathbf{W}})\right\} \ge 0,$$
$$\mathrm{Tr}\left\{\nabla_{\mathbf{F}}f(\hat{\mathbf{F}}; \hat{\mathbf{W}})^T(\mathbf{F} - \hat{\mathbf{F}})\right\} \ge 0.$$

Summing up the two inequalities, we get

$$\mathrm{Tr}\left\{\nabla_{\mathbf{X}}f(\hat{\mathbf{X}})^T(\mathbf{X} - \hat{\mathbf{X}})\right\} \ge 0, \tag{24}$$

where $\mathbf{X} = [\mathbf{W}, \mathbf{F}]$. Condition (24) implies the stationarity of $\hat{\mathbf{X}}$ in (17) (e.g., see Theorem 3 of [25]). ■

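b) $K > 2$: Now we consider a more general case with $K > 2$. Since at least five constraints are contained in (23), it is difficult to find an optimal rank-one solution if the optimal solution of (23) has rank higher than one. We therefore propose to apply the randomization technique in [26] to find a quasi-optimal rank-one solution of (20). We first transform (20) into the following equivalent form:

$$\min_{\tilde{\mathbf{F}},\mathbf{f}}\ \mathrm{Tr}(\mathbf{Q}_0\tilde{\mathbf{F}}) - \mathbf{f}^H\mathbf{q}_0 - \mathbf{q}_0^H\mathbf{f} + q_0 \tag{25}$$
$$\text{s.t.}\quad \mathrm{Tr}(\mathbf{Q}_\tau\tilde{\mathbf{F}}) \le P_R,$$
$$\mathrm{Tr}(\mathbf{Q}_k\tilde{\mathbf{F}}) \ge \lambda_k\sigma_k^2,\ \forall k,$$
$$\tilde{\mathbf{F}} = \mathbf{f}\mathbf{f}^H.$$

Relaxing the constraint $\tilde{\mathbf{F}} = \mathbf{f}\mathbf{f}^H$ to $\tilde{\mathbf{F}} \succeq \mathbf{f}\mathbf{f}^H$ and applying the Schur complement theorem, we get the following optimization problem:

$$\min_{\tilde{\mathbf{F}},\mathbf{f}}\ \mathrm{Tr}(\mathbf{Q}_0\tilde{\mathbf{F}}) - \mathbf{f}^H\mathbf{q}_0 - \mathbf{q}_0^H\mathbf{f} + q_0 \tag{26}$$
$$\text{s.t.}\quad \mathrm{Tr}(\mathbf{Q}_\tau\tilde{\mathbf{F}}) \le P_R,$$
$$\mathrm{Tr}(\mathbf{Q}_k\tilde{\mathbf{F}}) \ge \lambda_k\sigma_k^2,\ \forall k,$$
$$\begin{bmatrix} \tilde{\mathbf{F}} & \mathbf{f} \\ \mathbf{f}^H & 1 \end{bmatrix} \succeq 0.$$

Note that (26) is convex, thus the obtained solution is optimal. If we generate enough samples of the Gaussian variable $\mathbf{x}$ following $\mathcal{CN}(\mathbf{f}, \tilde{\mathbf{F}} - \mathbf{f}\mathbf{f}^H)$ with $\tilde{\mathbf{F}}$ and $\mathbf{f}$ being an optimal solution of (26), and choose the best candidate $\mathbf{x}$ from the samples as a solution of (20), $\mathbf{x}$ will optimally solve (20) on average, i.e.,

$$\min\ \mathrm{E}\left(\mathbf{x}^H\mathbf{Q}_0\mathbf{x} - \mathbf{x}^H\mathbf{q}_0 - \mathbf{q}_0^H\mathbf{x} + q_0\right) \tag{27}$$
$$\text{s.t.}\quad \mathrm{E}\left(\mathbf{x}^H\mathbf{Q}_\tau\mathbf{x}\right) \le P_R,$$
$$\mathrm{E}\left(\mathbf{x}^H\mathbf{Q}_k\mathbf{x}\right) \ge \lambda_k\sigma_k^2,\ \forall k.$$

Finally, the proposed iterative algorithm for $K > 2$ is outlined as:

Algorithm 2 (RS precoding with $K > 2$)

• Initialize F

• Repeat

- Update the BS decoding matrix W using (10) for a fixed F;

- Update the RS precoding matrix F with W fixed using the following steps: first, solve the relaxed problem (23); if the obtained $\mathbf{X}_F$ is rank-one, the optimal RS precoder is obtained by applying eigenvalue decomposition; otherwise, apply the randomization procedures (25)-(26) to get a quasi-optimal solution;

• Until termination criterion is satisfied.

Note that although the F obtained from the second step in Algorithm 2 may not be optimal, our simulation results show that the F obtained by using randomization is always good enough to make the iteration convergent.

2) Sum-rate maximization: Motivated by the relationship between sum rate and weighted MMSE in the MIMO-BC system recently found in [27], we next extend the proposed RS precoding design for Total-MSE minimization to sum-rate maximization. The sum-rate maximization problem is re-stated as

$$\max_{\mathbf{F}}\ \log_2\det\left(\mathbf{I}_K + \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\left(\sigma_R^2\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}\right) \tag{28}$$
$$\text{s.t.}\quad \tau \le P_R,$$
$$\zeta_k \ge \lambda_k,\ \forall k,$$

where the constraints are the same as in (16). It is not hard to verify that (28) is non-convex. To solve (28), we introduce the following lemma.

Lemma 3: If F satisfies the Karush-Kuhn-Tucker (KKT) conditions of (28), it will also satisfy the KKT conditions of the following problem:

$$\min_{\mathbf{F}}\ \mathrm{Tr}\left(\mathbf{A}\mathbf{E}^{-1}\right) \tag{29}$$
$$\text{s.t.}\quad \tau \le P_R,$$
$$\zeta_k \ge \lambda_k,\ \forall k,$$

where $\mathbf{E}$ is defined in (9), and the weight matrix $\mathbf{A}$ is set to

$$\mathbf{A} = \frac{1}{\log 2}\left(\mathbf{I}_K + \mathbf{P}^H\mathbf{H}_2^H\mathbf{F}^H\mathbf{G}_1^H\left(\sigma_R^2\mathbf{G}_1\mathbf{F}\mathbf{F}^H\mathbf{G}_1^H + \sigma_B^2\mathbf{I}_N\right)^{-1}\mathbf{G}_1\mathbf{F}\mathbf{H}_2\mathbf{P}\right). \tag{30}$$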

Proof: The proof is similar to that for the MIMO BC precoding design problem in [27], and is thus omitted for brevity. ■

Lemma 3 implies that using the weight matrix A in (30), (28) shares the same stationary point with (29). Then alternating optimization can be used to get the final solution of (28) as in [27], which is presented as follows:

Algorithm 3 (RS precoding for maximizing sum rate)

• Initialize F

• Repeat

- Update the BS decoder matrix W using (10) for fixed F and A;

- Update the weight matrix A using (30) for fixed F and W;

- Update the RS precoder matrix F as in Algorithm 1 or 2;

• Until termination criterion is satisfied.

According to the convergence analysis provided in [27], the convergence of Algorithm 3 can be ensured.
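The link between the rate objective and the weighted Total-MSE behind Lemma 3 can be spot-checked numerically. With the weight matrix held fixed at $\mathbf{A} = \mathbf{E}(\mathbf{F}_0)/\ln 2$, the directional derivatives of $-\log_2\det\mathbf{E}(\mathbf{F})$ and of $\mathrm{Tr}(\mathbf{A}\mathbf{E}(\mathbf{F})^{-1})$ at $\mathbf{F}_0$ should coincide, so the two objectives share stationarity conditions under the same constraints. The sketch uses central finite differences on one random channel draw and reads $\log 2$ as the natural logarithm; the dimensions, noise variances, unit MS powers and function names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def crandn(*shape):
    """Circularly symmetric complex Gaussian entries with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M, K = 3, 3, 2                 # assumed antenna and user numbers
sigma_R2, sigma_B2 = 1.0, 1.0     # assumed noise variances
H2, G1 = crandn(M, K), crandn(N, M)
P = np.eye(K)                     # unit MS powers (assumed)

def E_mat(F):
    """Matrix E as a function of the relay precoder F."""
    A = G1 @ F
    R = sigma_R2 * A @ A.conj().T + sigma_B2 * np.eye(N)
    T = A @ H2 @ P
    return np.eye(K) + T.conj().T @ np.linalg.solve(R, T)

F0 = crandn(M, M)                 # point at which the derivatives are compared
D = crandn(M, M)                  # arbitrary perturbation direction
A_w = E_mat(F0) / np.log(2)       # weight matrix, held fixed while F varies

def neg_rate(F):
    return -np.log2(np.linalg.det(E_mat(F)).real)

def weighted_mse(F):
    return np.trace(A_w @ np.linalg.inv(E_mat(F))).real

def directional_derivative(fun, F, D, h=1e-6):
    """Central finite-difference derivative of fun at F along D."""
    return (fun(F + h * D) - fun(F - h * D)) / (2 * h)

d_rate = directional_derivative(neg_rate, F0, D)
d_wmse = directional_derivative(weighted_mse, F0, D)
print(d_rate, d_wmse)
```

The match holds only at the point where the weight matrix was computed, which is why Algorithm 3 refreshes $\mathbf{A}$ in every iteration.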

C. Joint precoding

Obviously, the two precoding designs presented above can be combined to realize a joint BS-RS precoding design with better performance. In this case, if the RS has enough capability to carry out the joint design, it can collect all the required CSI and optimize F and B jointly. Then, besides F, the RS should also broadcast B to the BS and the MSs. Alternatively, the joint optimization can be conducted at the BS, with the RS collecting the CSI and transmitting it to the BS. Then, the BS needs to transmit B and F to the RS, and the RS further broadcasts them to the MSs. Nevertheless, such a joint precoding design requires more feedback overhead, although it leads to better performance.

According to the algorithms proposed in Subsections A and B, the joint precoding design is outlined as:

Algorithm 4 (Joint precoding scheme)

• Initialize B

• Repeat


- Update the RS precoder F for a fixed BS precoder B by using Algorithm 1 or 2 for Total-MSE minimization and Algorithm 3 for sum-rate maximization;

- Update the BS precoder B for a fixed relay precoder F by using the SOCP optimization as in Subsection A;

• Until termination criterion is satisfied.

Lemma 4: The proposed joint precoding design algorithm is convergent.

Proof: For convenience of presentation, we take Total-MSE minimization as an example. The proof can be easily extended to the case of sum-rate maximization. Firstly, for a fixed F, updating B must decrease the Total-MSE at the BS by increasing $\alpha$ in (13); otherwise, the BS precoder B should not be changed. Thus, we have

$$e\left(\mathbf{B}(n+1), \mathbf{F}(n)\right) \le e\left(\mathbf{B}(n), \mathbf{F}(n)\right),$$

where $n$ denotes the iteration index. Then, we apply the proposed RS precoding design to update F by initializing $\mathbf{F}_0 = \alpha\mathbf{F}(n)$. Since the proposed iterative RS precoding design algorithm decreases the Total-MSE after each iteration, we have

$$e\left(\mathbf{B}(n+1), \mathbf{F}(n+1)\right) \le e\left(\mathbf{B}(n+1), \mathbf{F}(n)\right).$$

The Total-MSE is therefore non-increasing over the iterations and is lower-bounded by zero, so we conclude that the joint precoding design algorithm is convergent. ■

IV. Discussion on Signaling Overhead and Design Complexity

As mentioned previously, each precoding design has its own merit. Choosing which precoding scheme is not only dependent on the processing capability of the BS and the RS, but also the design complexity and signaling overhead. In this section, we provide a comprehensive comparison between these designs. It is assumed that the channel characteristics of each link change slowly enough so that they can be perfectly estimated by using pilot symbols or training sequences. Besides, the information of channel state and precoders can be exchanged accurately between the BS and the RS, the RS and the MSs through some lower rate auxiliary channels. For completeness, two transmission modes, i.e., time-division duplex (TDD) mode and frequency-division duplex (FDD) mode, are considered, respectively. The overall comparisons are presented in Table I, where "Overhead-I" denotes the overhead used to feed back the CSI and "Overhead-II" denotes the overhead used to feed back the precoding information. Moreover, we suppose that the BS and MSs can estimate their local CSI G1 and h2k, Vk, respectively.

¹In the case of solving (20) through randomization at $K \ge 3$, if we cannot find a solution decreasing the objective value in (20), we can simply set $\mathbf{F}(n+1) = \alpha\mathbf{F}(n)$.

Fig. 2. Checking the optimality of the RS precoding design at P = 5 dB and L = 5. Panels: (a) N = 2, M = 2, K = 2; (b) N = 3, M = 3, K = 3. Horizontal axis: iteration index.

Since the BS precoding design is an SOCP problem, according to [28], the design complexity can be approximated as

$$n_{BS} = (NK+1)^2(K+2)^{0.5}(2NK + K^2 + 2K + 4)\log(1/\varepsilon), \tag{31}$$

where $\varepsilon$ denotes the solution accuracy. For the RS precoding design, the design complexity mainly comes from solving the SDP problem and using the randomization technique. Thus, according to [29], it can be approximated as

$$n_{RS} = l_{BS}\left(\max(M^2, K+2)^4 M\log(1/\varepsilon) + n_{rd}\right), \tag{32}$$

where nrd denotes the complexity of randomization and lBS denotes the iteration number required in Algorithm 1, 2 or 3. Note that when K < 2, nrd is equal to 0 (noting that the complexity of getting rank-one solution from higher rank one is dominated by that of solving the SDP, and thus can be negligible). Combining (31) and (32) leads to the joint precoding design complexity given in Table I where lj denotes the iteration number needed in Algorithm 4.
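To get a feel for how the complexity estimates (31) and (32) scale with the antenna and user numbers, they can be evaluated directly. The sketch below is only illustrative: the function names are hypothetical, the logarithm is assumed to be natural (the text does not state the base), and the iteration counts and the randomization cost are free inputs rather than values taken from the paper.

```python
import math


def bs_complexity(N, K, eps):
    """Approximate BS precoding (SOCP) complexity, following (31)."""
    return ((N * K + 1) ** 2 * math.sqrt(K + 2)
            * (2 * N * K + K ** 2 + 2 * K + 4) * math.log(1.0 / eps))


def rs_complexity(M, K, eps, n_iter, n_rd=0.0):
    """Approximate RS precoding (SDP plus randomization) complexity, following (32).

    n_iter is the number of iterations of the RS algorithm and n_rd the cost
    of randomization (0 when no randomization is needed).
    """
    return n_iter * (max(M ** 2, K + 2) ** 4 * M * math.log(1.0 / eps) + n_rd)


def joint_complexity(N, M, K, eps, n_iter_rs, n_iter_joint, n_rd=0.0):
    """Joint design: outer iterations times (BS cost + RS cost), as in Table I."""
    return n_iter_joint * (bs_complexity(N, K, eps)
                           + rs_complexity(M, K, eps, n_iter_rs, n_rd))


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    print(bs_complexity(N=2, K=2, eps=1e-3))
    print(rs_complexity(M=2, K=2, eps=1e-3, n_iter=20))
    print(joint_complexity(N=2, M=2, K=2, eps=1e-3, n_iter_rs=20, n_iter_joint=10))
```

The fourth-power term in the RS estimate makes it grow much faster with M than the BS estimate grows with N, which is consistent with the observation that the BS design is generally the cheaper of the two.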

From Table I, we find that the difference in signaling overhead between the BS precoding and the RS precoding is not significant if they are designed at the same station, and that it depends on the antenna configuration of the system. In general, the BS precoding design has lower design complexity than the RS precoding design. For each precoding design, it is more practical to perform the design at the RS in order to reduce the signaling overhead.


TABLE I
Signaling overhead and design complexity comparison

| Scheme | TDD Overhead-I | TDD Overhead-II | FDD Overhead-I | FDD Overhead-II | Complexity |
|---|---|---|---|---|---|
| (1) BS precoding (design at BS) | RS → BS: $\mathbf{h}_{2k}, \forall k$ | BS → RS; RS → MSs: $\mathbf{B}, \alpha$ | MSs → RS; RS → BS; RS → MS k | BS → RS; RS → MSs: $\mathbf{B}, \alpha$ | $O(n_{\mathrm{BS}})$ |
| (2) BS precoding (design at RS) | same as (1) | RS → BS, MSs: $\mathbf{B}, \alpha$ | BS → RS; MSs → RS; RS → BS; RS → MS k | RS → BS, MSs: $\mathbf{B}, \alpha$ | $O(n_{\mathrm{BS}})$ |
| (3) RS precoding (design at BS) | same as (1) | BS → RS; RS → MSs | same as (1) | BS → RS; RS → MSs | $O(n_{\mathrm{RS}})$ |
| (4) RS precoding (design at RS) | same as (1) | RS → BS, MSs | same as (2) | RS → BS, MSs | $O(n_{\mathrm{RS}})$ |
| (5) Joint precoding (design at BS) | same as (1) | BS → RS; RS → MSs | same as (1) | BS → RS; RS → MSs | $O(l_J(n_{\mathrm{BS}} + n_{\mathrm{RS}}))$ |
| (6) Joint precoding (design at RS) | same as (1) | RS → BS, MSs: $\mathbf{B}, \mathbf{F}$ | same as (2) | RS → BS, MSs: $\mathbf{B}, \mathbf{F}$ | $O(l_J(n_{\mathrm{BS}} + n_{\mathrm{RS}}))$ |

V. Simulation Results

In this section, some numerical examples are presented to evaluate the proposed precoding designs. The channels are set to be Rayleigh fading, i.e., the elements of each channel matrix or vector are complex Gaussian random variables with zero mean and unit variance. We assume that the noise powers at all the destinations are the same, i.e., $\sigma_R^2 = \sigma_B^2 = \sigma_k^2 = 1, \forall k$. The transmission powers at all the MSs and the RS are the same, $P_R = P_k = P, \forall k$, and the transmission power at the BS is assumed to be $P_B = LP$, where L is a constant. For all the simulations, 1000 channel realizations have been simulated. Moreover, 10000 quadrature phase-shift keying (QPSK) symbols are transmitted from each source node for each channel realization when simulating the bit-error-rate (BER) performance. For all comparisons, if not specified otherwise, the fixed RS precoder F in the BS precoding design is chosen as $\mathbf{F} = \mathbf{I}_M$ and the fixed BS precoder B in the RS precoding design is chosen as $\mathbf{B} = \mathbf{I}_{N \times K}$.
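The simulation setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the helper names are hypothetical, the channel dimensions (BS-to-RS channel of size M x N, MS-to-RS channel of size M x K, RS-to-BS channel of size N x M, and one row per MS for the RS-to-MS channels) are assumed from the antenna numbers N, M and K, and the Gray-style QPSK bit mapping is an arbitrary choice.

```python
import numpy as np


def rayleigh(shape, rng):
    """Draw i.i.d. complex Gaussian entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def draw_channels(N, M, K, rng):
    """One channel realization; the dictionary keys and shapes are assumptions."""
    return {
        "H1": rayleigh((M, N), rng),  # BS -> RS
        "H2": rayleigh((M, K), rng),  # MSs -> RS, one column per MS
        "G1": rayleigh((N, M), rng),  # RS -> BS
        "G2": rayleigh((K, M), rng),  # RS -> MSs, one row per MS
    }


def qpsk_modulate(bits):
    """Map an (n, 2) array of bits to unit-energy QPSK symbols."""
    return ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)


def qpsk_demodulate(symbols):
    """Hard-decision demapping that inverts qpsk_modulate."""
    return np.stack([(symbols.real < 0), (symbols.imag < 0)], axis=1).astype(int)


def bit_error_rate(bits, bits_hat):
    """Fraction of bit positions that differ."""
    return float(np.mean(bits != bits_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = draw_channels(N=2, M=2, K=2, rng=rng)
    bits = rng.integers(0, 2, size=(10000, 2))
    symbols = qpsk_modulate(bits)
    noisy = symbols + 0.5 * rayleigh(symbols.shape, rng)  # illustrative noise level
    print("BER on a plain AWGN link:", bit_error_rate(bits, qpsk_demodulate(noisy)))
```

In a full simulation, the noisy link in the usage example would be replaced by the two-way relay signal model with the designed precoders, and the BER would be averaged over the 1000 channel realizations.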

In Fig. 2, we check the optimality of the proposed RS precoding design, Algorithm 1 and Algorithm 2, for Total-MSE minimization in Fig. 2(a) and Fig. 2(b), respectively, by trying different initialization points at three sets of given but arbitrary channel realizations. Specifically, for each channel realization, six different initialization points, including the identity matrix and five random matrices, are simulated. Moreover, for K = 3, we choose three channel realizations where the randomization technique is needed to find a quasi-optimal rank-one solution of (20). Fig. 2(a) shows that Algorithm 1 for K = 2 can converge to a unique solution with any initialization points. Fig. 2(b) shows that Algorithm 2 for K > 2 is also able to converge to the solutions which are close to each other with different initialization points. Thus we conclude that the proposed iterative RS precoding for Total-MSE minimization can indeed approach the optimal solution.

In Fig. 3(a), the convergence behavior of the proposed RS and joint precoding designs for Total-MSE minimization is shown as a function of the iteration index at P = 5 dB and L = 5. We observe that the proposed RS precoding converges in 20 iterations for K = 2 and in 30 iterations for K = 3. Moreover, the proposed joint precoding algorithm converges within 10 iterations for both two and three MSs.² Fig. 3(b) illustrates the number of random samples required in solving (20) by randomization to approach the lower bound obtained from (26). We observe that as the number of samples increases, a better solution can be obtained. However, when the number exceeds 2000, the obtained solution does not change much, which indicates that 2000 samples are in general enough to generate a near-optimal solution.

Fig. 3. Convergence behavior of the proposed iterative precoding design and complexity of randomization at P = 5 dB and L = 5: (a) convergence behavior (curves: RS precoding and joint precoding, for N = 3, M = 3, K = 3 and N = 2, M = 2, K = 2; horizontal axis: iteration index); (b) complexity of randomization (curves: lower bound and randomization; horizontal axis: number of randomization samples).

Fig. 4. Performance comparison for different precoding designs with N = 2, M = 2, K = 2: (a) BER comparison; (b) sum-rate comparison. Curves: Non-precoding, Proposed-BS, Proposed-RS-MSE/-Rate and Proposed-Joint-MSE/-Rate at L = 1 and L = 10; horizontal axis: P (dB).

Fig. 5. Performance comparison for the BS and RS precoding designs with different antenna configurations (N, M) at L = 5 with K = 2: (a) BER comparison; (b) sum-rate comparison. Curves: Proposed-BS and Proposed-RS for (N, M) = (4, 2), (3, 3) and (2, 4); horizontal axis: P (dB).

In Fig. 4, we show the uplink BER and sum-rate comparisons of all the proposed precoding designs as functions of P for N = 2, M = 2, K = 2 at L = 1 and L = 10 dB. Here the notation "-MSE" means that the precoding is designed based on the Total-MSE criterion, while "-Rate" means that the precoding is designed based on the sum-rate criterion. For fair comparison and to make our optimization problems feasible, we set the SINR requirements in (12) as $\chi_k = \epsilon_k, \forall k$, where $\epsilon_k$ is the SINR at MS k when no precoding is employed, i.e., both B and F are identity matrices. We observe that when the BS has the same power as the RS and the MSs, i.e., L = 1, the RS precoding design outperforms the BS precoding design in both BER and sum rate. When the BS has more power than the RS and the MSs, i.e., L = 10, the BS precoding can achieve better uplink performance than the RS precoding in a certain SNR regime. The reason is that with more power at the BS, the interference observed at each MS is introduced mainly by the downlink transmission. The precoding at the BS then becomes important in coordinating the interference, which makes the BS precoding more effective than the RS precoding in improving the uplink performance.

² Here, for the inner RS precoding design, we set the maximum number of iterations to 20 for K = 2 and 30 for K = 3.

Fig. 5 illustrates the BER and sum-rate comparison for different BS and RS antenna configurations (N, M) at K = 2, with the total number of BS and RS antennas fixed at N + M = 6. For fair comparison, the target SINR at each MS is set as $\chi_k = -5$ dB, $\forall k$, and the uplink performance is averaged over the cases where the BS and RS precoding designs are feasible. We see that when the BS has more antennas than the RS, i.e., at (4, 2), the BS precoding performs better than the RS precoding. The reason is that increasing the number of BS antennas helps not only the BS precoding but also the decoding of the uplink transmission. However, when the RS has more antennas than the BS, the system performance can be significantly enhanced and the RS precoding greatly outperforms the BS precoding. This indicates that antennas are more useful at the RS than at the BS. This is because the BS precoding only tries to let the downlink use less RS power to satisfy the SINR requirements at the MSs, so that more RS power can be allocated to the uplink to improve its performance. The RS precoding, in contrast, is directly relevant to the uplink transmission: a well-designed RS precoder can change the uplink channel matrix, not only the power.

Fig. 6. Performance comparison with [15] at N = 2, M = 2, K = 2 and L = 10: (a) BER comparison; (b) sum-rate comparison. Curves: Joint-precoding [15], Proposed-BS, Proposed-RS-MSE/-Rate and Proposed-Joint-MSE/-Rate; horizontal axis: P (dB).

In Fig. 6, we compare the proposed precoding designs with the joint precoding design in [15] for K = 2 at L = 10. For fairness, we set the SINR requirements in (12) as $\chi_k = \epsilon_k, \forall k$, where $\epsilon_k$ is the SINR obtained by using the precoders obtained in [15]. Specifically, the RS and BS precoders obtained from [15] are chosen as the fixed RS precoder in "Proposed-BS" and the fixed BS precoder in "Proposed-RS", respectively. Under this setup, we find that further optimizing the BS precoder or the RS precoder yields additional performance gain over [15]. Fig. 6 also shows that the RS precoding achieves most of the performance gain of the joint precoding, which implies that the ZF BS precoding obtained in [15] is indeed a good choice for improving the system performance.

VI. Conclusions

In this paper, we studied linear precoding designs for multiuser two-way relay systems in a cellular network for maximizing the uplink performance while maintaining the downlink QoS requirements. Three precoding schemes were considered, namely, the BS precoding, the RS precoding and the joint BS-RS precoding. By recasting the precoding designs into suitable forms, we obtained the optimal solution for the BS precoding and locally optimal solutions for both the RS precoding and the joint BS-RS precoding. The performance of these precoding designs was compared and some practical implementation issues were discussed. Simulation results showed that the RS precoding design is more efficient than the BS precoding design in most cases. The results also demonstrated the superiority of the proposed precoding designs over existing ones.

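Appendix A Proof of Lemma 1

To prove Lemma 1, we only need to verify that the functions $f_1(\beta) = \mathrm{Tr}\left(\mathbf{E}(\beta)^{-1}\right)$ and $f_2(\beta) = \log_2 \det(\mathbf{E}(\beta))$ with

$$\mathbf{E}(\beta) = \mathbf{I}_K + \mathbf{P}^H \mathbf{H}_2^H \mathbf{F}^H \mathbf{G}_1^H \left( \sigma_R^2 \mathbf{G}_1 \mathbf{F} \mathbf{F}^H \mathbf{G}_1^H + \beta \sigma_B^2 \mathbf{I}_N \right)^{-1} \mathbf{G}_1 \mathbf{F} \mathbf{H}_2 \mathbf{P},$$

where $\beta = 1/\alpha^2$, are monotonically increasing and decreasing with respect to $\beta$, respectively. To this end, we have

$$\frac{d f_1(\beta)}{d\beta} = \mathrm{Tr}\left( -\mathbf{E}(\beta)^{-1} \frac{d\left( \mathbf{P}^H \mathbf{H}_2^H \mathbf{F}^H \mathbf{G}_1^H \left( \sigma_R^2 \mathbf{G}_1 \mathbf{F} \mathbf{F}^H \mathbf{G}_1^H + \beta \sigma_B^2 \mathbf{I}_N \right)^{-1} \mathbf{G}_1 \mathbf{F} \mathbf{H}_2 \mathbf{P} \right)}{d\beta} \mathbf{E}(\beta)^{-1} \right) = \mathrm{Tr}\left[ \sigma_B^2 \mathbf{E}(\beta)^{-1} \mathbf{P}^H \mathbf{H}_2^H \mathbf{F}^H \mathbf{G}_1^H \mathbf{R}^{-2} \mathbf{G}_1 \mathbf{F} \mathbf{H}_2 \mathbf{P} \mathbf{E}(\beta)^{-1} \right] > 0,$$

$$\frac{d f_2(\beta)}{d\beta} = \frac{1}{\log 2} \mathrm{Tr}\left( \mathbf{E}(\beta)^{-1} \frac{d\left( \mathbf{P}^H \mathbf{H}_2^H \mathbf{F}^H \mathbf{G}_1^H \left( \sigma_R^2 \mathbf{G}_1 \mathbf{F} \mathbf{F}^H \mathbf{G}_1^H + \beta \sigma_B^2 \mathbf{I}_N \right)^{-1} \mathbf{G}_1 \mathbf{F} \mathbf{H}_2 \mathbf{P} \right)}{d\beta} \right) = -\frac{1}{\log 2} \mathrm{Tr}\left[ \sigma_B^2 \mathbf{E}(\beta)^{-1} \mathbf{P}^H \mathbf{H}_2^H \mathbf{F}^H \mathbf{G}_1^H \mathbf{R}^{-2} \mathbf{G}_1 \mathbf{F} \mathbf{H}_2 \mathbf{P} \right] < 0,$$

where $\mathbf{R} = \sigma_R^2 \mathbf{G}_1 \mathbf{F} \mathbf{F}^H \mathbf{G}_1^H + \beta \sigma_B^2 \mathbf{I}_N$. For both inequalities, we have used the fact that both $\mathbf{E}$ and $\mathbf{R}$ are positive definite. Thus, the proof is completed.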
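The monotonicity claimed in Lemma 1 can be checked numerically on random instances. The sketch below is a sanity check under assumptions, not part of the proof: the helper names are hypothetical, the matrix sizes (G1 of size N x M, F of size M x M, H2 of size M x K, P of size K x K) are assumed from the antenna numbers, and unit noise powers are used as in the simulation setup.

```python
import numpy as np


def crandn(rng, *shape):
    """Complex Gaussian matrix with zero-mean, unit-variance entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def e_matrix(beta, G1, F, H2, P, sigma_r2=1.0, sigma_b2=1.0):
    """E(beta) = I + A^H R^{-1} A with A = G1 F H2 P and R as in the appendix."""
    A = G1 @ F @ H2 @ P
    GF = G1 @ F
    R = sigma_r2 * GF @ GF.conj().T + beta * sigma_b2 * np.eye(G1.shape[0])
    return np.eye(A.shape[1]) + A.conj().T @ np.linalg.solve(R, A)


def f1(beta, *channels):
    """Tr(E(beta)^{-1}); expected to increase with beta."""
    return np.trace(np.linalg.inv(e_matrix(beta, *channels))).real


def f2(beta, *channels):
    """log2 det(E(beta)); expected to decrease with beta."""
    _, logdet = np.linalg.slogdet(e_matrix(beta, *channels))
    return logdet / np.log(2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M, K = 2, 2, 2
    channels = (crandn(rng, N, M), crandn(rng, M, M), crandn(rng, M, K), np.eye(K))
    betas = np.linspace(0.1, 5.0, 20)
    print("f1 increasing:", bool(np.all(np.diff([f1(b, *channels) for b in betas]) > 0)))
    print("f2 decreasing:", bool(np.all(np.diff([f2(b, *channels) for b in betas]) < 0)))
```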


Appendix B Transformations from (19) to (20)

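We first rewrite the objective function $f(\mathbf{F}, \mathbf{W})$ in (19) as

$$f(\mathbf{F}, \mathbf{W}) = \mathbf{f}^H \mathbf{Q}_0 \mathbf{f} - \mathbf{f}^H \mathbf{q}_0 - \mathbf{q}_0^H \mathbf{f} + q_0, \tag{33}$$

where $\mathbf{q}_0 = \mathrm{vec}(\mathbf{G}_1^H \mathbf{W}^H \mathbf{P}^H \mathbf{H}_2^H)$, $q_0 = \mathrm{Tr}(\sigma_B^2 \mathbf{W}\mathbf{W}^H + \mathbf{I}_K)$ and

$$\mathbf{f} = \mathrm{vec}(\mathbf{F}), \quad \mathbf{Q}_0 = \left( \mathbf{H}_2 \mathbf{P}\mathbf{P}^H \mathbf{H}_2^H + \sigma_R^2 \mathbf{I}_M \right)^T \otimes \left( \mathbf{G}_1^H \mathbf{W}^H \mathbf{W} \mathbf{G}_1 \right). \tag{34}$$

Here the second and third terms in (33) are obtained from the corresponding terms of the objective function in (19) by using the rule $\mathrm{Tr}(\mathbf{A}^T \mathbf{B}) = (\mathrm{vec}(\mathbf{A}))^T \mathrm{vec}(\mathbf{B})$ [30]. The first term of (33) is the reformulation of the first term of the objective function in (19) by using the rule [30]

$$\mathrm{Tr}(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}) = (\mathrm{vec}(\mathbf{D}^T))^T (\mathbf{C}^T \otimes \mathbf{A}) \mathrm{vec}(\mathbf{B}). \tag{35}$$

Again according to (35), the relay power constraint $t \le P_R$ in (19) can be re-expressed as

$$\mathbf{f}^H \mathbf{Q}_x \mathbf{f} \le P_R, \quad \mathbf{Q}_x = \left( \mathbf{H}_1 \mathbf{B}\mathbf{B}^H \mathbf{H}_1^H + \mathbf{H}_2 \mathbf{P}\mathbf{P}^H \mathbf{H}_2^H + \sigma_R^2 \mathbf{I}_M \right)^T \otimes \mathbf{I}_M. \tag{36}$$

The SINR constraint $\zeta_k \ge \chi_k$ in (19) is equivalent to, by simple manipulations,

$$\mathrm{Tr}\left( \mathbf{g}_{2k}^* \mathbf{g}_{2k}^T \mathbf{F} \left( \mathbf{H}_1 \mathbf{b}_k \mathbf{b}_k^H \mathbf{H}_1^H - \chi_k \left( \sum_{i \ne k} \left( \mathbf{H}_1 \mathbf{b}_i \mathbf{b}_i^H \mathbf{H}_1^H + P_i \mathbf{h}_{2i} \mathbf{h}_{2i}^H \right) + \sigma_R^2 \mathbf{I}_M \right) \right) \mathbf{F}^H \right) \ge \chi_k \sigma_k^2. \tag{37}$$

By using (35), inequality (37) can be rewritten as

$$\mathbf{f}^H \mathbf{Q}_k \mathbf{f} \ge \chi_k \sigma_k^2, \quad \mathbf{Q}_k = \left( \mathbf{H}_1 \mathbf{b}_k \mathbf{b}_k^H \mathbf{H}_1^H - \chi_k \left( \sum_{i \ne k} \left( \mathbf{H}_1 \mathbf{b}_i \mathbf{b}_i^H \mathbf{H}_1^H + P_i \mathbf{h}_{2i} \mathbf{h}_{2i}^H \right) + \sigma_R^2 \mathbf{I}_M \right) \right)^T \otimes \left( \mathbf{g}_{2k}^* \mathbf{g}_{2k}^T \right). \tag{38}$$

Finally, (19) can be readily written in the form of (20).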
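The vectorization rule (35), on which these transformations rest, is easy to verify numerically. The sketch below is a check on random matrices under stated assumptions: `vec` stacks columns (column-major order), the function names are hypothetical, and the matrix sizes in the quadratic-form check (W of size K x N, G1 of size N x M, F of size M x M, H2 of size M x K) are assumed from the antenna numbers.

```python
import numpy as np


def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")


def trace_via_kron(A, B, C, D):
    """Right-hand side of (35): vec(D^T)^T (C^T kron A) vec(B)."""
    return vec(D.T) @ np.kron(C.T, A) @ vec(B)


def q0_matrix(H2, P, G1, W, sigma_r2=1.0):
    """Q_0 of (34): (H2 P P^H H2^H + sigma_R^2 I)^T kron (G1^H W^H W G1)."""
    C = H2 @ P @ P.conj().T @ H2.conj().T + sigma_r2 * np.eye(H2.shape[0])
    A = G1.conj().T @ W.conj().T @ W @ G1
    return np.kron(C.T, A)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def crandn(*shape):
        return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    A, B, C, D = crandn(2, 3), crandn(3, 4), crandn(4, 5), crandn(5, 2)
    print(np.allclose(np.trace(A @ B @ C @ D), trace_via_kron(A, B, C, D)))
```

Note that the rule uses plain transposes rather than conjugate transposes; the conjugation in the quadratic form f^H Q_0 f enters through D = F^H, for which vec(D^T)^T equals f^H.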

Appendix C Proof of Theorem 1

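Note that if K = 1, the optimal rank-one solution can be obtained as claimed in Lemma 3.1 of [31]; we omit it here for brevity. In the case where (23) has an optimal rank-one solution, it is indeed the optimal solution of (22). Next, we focus on the case where K = 2 and the rank of the optimal solution of (23) is higher than one. Since the optimization problem (23) is convex, the sufficient and necessary optimality conditions (also termed complementary slackness conditions) are

$$\begin{aligned} & y_k \mathrm{Tr}(\hat{\mathbf{Q}}_k \mathbf{X}_F) = 0, \quad y_k \ge 0, \quad k = 1, 2, \\ & y_3 \mathrm{Tr}(\hat{\mathbf{Q}}_x \mathbf{X}_F) = 0, \quad y_3 \ge 0, \\ & y_4 \left( \mathrm{Tr}(\hat{\mathbf{Q}} \mathbf{X}_F) - 1 \right) = 0, \quad y_4 \in \mathbb{R}, \end{aligned} \tag{39}$$

where $y_i$, for i = 1, 2, 3, 4, are dual variables, and

$$\mathrm{Tr}(\mathbf{Z} \mathbf{X}_F) = 0 \tag{40}$$

with $\mathbf{Z} = \hat{\mathbf{Q}}_0 + y_1 \hat{\mathbf{Q}}_1 + y_2 \hat{\mathbf{Q}}_2 + y_3 \hat{\mathbf{Q}}_x + y_4 \hat{\mathbf{Q}} \succeq 0$. To proceed, we define $a_i = \mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F)$, i = 1, 2, x.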

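We first consider the case where at least one inequality constraint in (23) is inactive, i.e., at least one $a_i < 0$. Suppose that the rank of the obtained $\mathbf{X}_F$ in (23) is R and that it can be decomposed as $\mathbf{X}_F = \mathbf{V}\mathbf{V}^H$ with $\mathbf{V} \in \mathbb{C}^{(M^2+1) \times R}$. By applying the trick used in [31], we introduce a Hermitian matrix $\mathbf{M}$ that satisfies

$$\mathrm{Tr}(\mathbf{V}^H \hat{\mathbf{Q}}_k \mathbf{V} \mathbf{M}) = 0, \quad k = 1, 2, \qquad \mathrm{Tr}(\mathbf{V}^H \hat{\mathbf{Q}}_x \mathbf{V} \mathbf{M}) = 0, \tag{41}$$

where $\mathbf{M} \in \mathbb{C}^{R \times R}$ has $R^2$ real elements. If $R^2 > 3$, there always exists a nonzero solution $\mathbf{M}$ satisfying (41). Let $\delta_i$, for i = 1, 2, ..., R, be the eigenvalues of $\mathbf{M}$ and define $|\delta_0| = \max\{|\delta_i|, \forall i\}$. Then, we get $\mathbf{X}_F' = \mathbf{V}(\mathbf{I}_R - (1/\delta_0)\mathbf{M})\mathbf{V}^H$ and further set $\mathbf{X}_F'' = \mathbf{X}_F'/a$ with $a = \mathbf{X}_F'(1,1)$. Here we note that $a > 0$ due to the fact that $\mathbf{X}_F'$ is positive semidefinite and $\mathbf{Q}_k$ is positive definite. It is not hard to see that the rank of $\mathbf{X}_F''$ is reduced by at least one.

We next verify that $\mathbf{X}_F''$ is still an optimal solution of (23). First, we check the primal feasibility of $\mathbf{X}_F''$. With $\mathbf{X}_F''(1,1) = 1$, the condition $\mathrm{Tr}(\hat{\mathbf{Q}} \mathbf{X}_F'') = 1$ is satisfied. Moreover, since $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F'') = \frac{1}{a}\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F)$, i = 1, 2, x, and $a > 0$, the constraints $\mathrm{Tr}(\hat{\mathbf{Q}}_x \mathbf{X}_F'') \le 0$ and $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F'') \le 0$, i = 1, 2, are also satisfied. Second, we need to check the complementary conditions in (39) and (40). If $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F) = 0$, i = 1, 2, x, we must have $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F') = 0$, i = 1, 2, x. Then $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F'') = 0$, i = 1, 2, x, follows, which means that (39) is satisfied. On the other hand, if $\mathrm{Tr}(\hat{\mathbf{Q}}_i \mathbf{X}_F) \ne 0$, i = 1, 2, x, it means that $y_i = 0$. Then dividing $\mathbf{X}_F'$ by $a$ does not affect the satisfaction of (39). For (40), since $\mathrm{Tr}(\mathbf{Z}\mathbf{X}_F') = \mathrm{Tr}(\mathbf{Z}\mathbf{X}_F) = 0$, $\mathrm{Tr}(\mathbf{Z}\mathbf{X}_F'')$ must be equal to zero. Thus $\mathbf{X}_F''$ also satisfies the condition in (40). Therefore, $\mathbf{X}_F''$ is also an optimal solution of (23). Repeating the above procedure until $R^2 \le 3$, an optimal rank-one solution is obtained. For completeness, we present the detailed procedure as follows: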

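• Solve optimization problem (23) and get the optimal solution $\mathbf{X}_F$ with rank R;

• Repeat

- Decompose $\mathbf{X}_F$ as $\mathbf{X}_F = \mathbf{V}\mathbf{V}^H$;

- Find a nonzero $R \times R$ Hermitian solution $\mathbf{M}$ of the following linear equations: $\mathrm{Tr}(\mathbf{V}^H \hat{\mathbf{Q}}_i \mathbf{V} \mathbf{M}) = 0$, i = 1, 2, x;

- Evaluate the eigenvalues $\delta_1, \delta_2, \ldots, \delta_R$ of $\mathbf{M}$ and set $|\delta_0| = \max\{|\delta_i|, \forall i\}$;

- Compute $\mathbf{X}_F' = \mathbf{V}(\mathbf{I}_R - (1/\delta_0)\mathbf{M})\mathbf{V}^H$ and further get $\mathbf{X}_F'' = \mathbf{X}_F'/a$ with $a = \mathbf{X}_F'(1,1)$;

- Set $\mathbf{X}_F = \mathbf{X}_F''$.

• Until the rank $R = \mathrm{Rank}(\mathbf{X}_F)$ is equal to 1.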
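The rank-reduction procedure listed above can be sketched in NumPy as follows. This is an illustrative sketch under assumptions, not the authors' implementation: the function names are hypothetical, the constraint matrices are passed in as a generic list of Hermitian matrices, the null-space search uses a real parametrization of Hermitian matrices together with an SVD (one of several possible choices), and numerical rank is decided with a relative tolerance.

```python
import numpy as np


def hermitian_basis(R):
    """Return R^2 matrices spanning the real vector space of R x R Hermitian matrices."""
    basis = []
    for j in range(R):
        for k in range(j, R):
            E = np.zeros((R, R), dtype=complex)
            if j == k:
                E[j, j] = 1.0
                basis.append(E)
            else:
                E[j, k] = E[k, j] = 1.0
                basis.append(E)
                Ei = np.zeros((R, R), dtype=complex)
                Ei[j, k], Ei[k, j] = 1j, -1j
                basis.append(Ei)
    return basis


def rank_reduce_step(X, Qs, tol=1e-9):
    """One step: return X' = V (I - M/delta_0) V^H, which keeps every Tr(Q X)
    unchanged and has rank at most rank(X) - 1; X is returned unchanged when
    R^2 <= len(Qs), since a nonzero M is then not guaranteed to exist."""
    X = (X + X.conj().T) / 2                    # enforce Hermitian symmetry
    w, U = np.linalg.eigh(X)
    keep = w > tol * w.max()
    R = int(keep.sum())
    if R * R <= len(Qs):
        return X
    V = U[:, keep] * np.sqrt(w[keep])           # X = V V^H
    basis = hermitian_basis(R)
    As = [V.conj().T @ Q @ V for Q in Qs]
    # Real linear system: Tr(V^H Q_i V M) = 0 for M expanded in the basis
    L = np.array([[np.trace(A @ B).real for B in basis] for A in As])
    c = np.linalg.svd(L)[2][-1]                 # unit vector in the null space of L
    M = sum(ci * Bi for ci, Bi in zip(c, basis))
    delta = np.linalg.eigvalsh(M)
    delta0 = delta[np.argmax(np.abs(delta))]    # eigenvalue of largest magnitude
    return V @ (np.eye(R) - M / delta0) @ V.conj().T


def reduce_rank(X, Qs, tol=1e-9):
    """Repeat the step, normalizing X(1,1) to one, until R^2 <= len(Qs)."""
    while True:
        w = np.linalg.eigvalsh((X + X.conj().T) / 2)
        if int((w > tol * w.max()).sum()) ** 2 <= len(Qs):
            return X
        X = rank_reduce_step(X, Qs, tol)
        X = X / X[0, 0].real
```

With three constraint matrices, as in the procedure above, `reduce_rank` stops exactly when the rank equals one. Each step preserves the trace values before normalization, and the normalization only rescales them by a positive factor, so the signs of the constraint values are unchanged.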

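Then we consider the case where all the inequality constraints are active, i.e., $a_i = 0$, for i = 1, 2, x. Note that since $M^2 + 1 > 4$, the size of the matrix $\mathbf{X}_F$ in (23) is always larger than four. Suppose $R \ge 3$. Based on Theorem 2.1 given in [32], we obtain that there is a rank-one decomposition for $\mathbf{X}_F$ (synthetically denoted as $\mathcal{D}_3(\mathbf{X}_F, \hat{\mathbf{Q}}_1, \hat{\mathbf{Q}}_2, \hat{\mathbf{Q}}_x)$), i.e., $\mathbf{X}_F = \sum_{r=1}^{R} \mathbf{x}_r \mathbf{x}_r^H$, such that

$$\mathbf{x}_r^H \hat{\mathbf{Q}}_k \mathbf{x}_r = \frac{\mathrm{Tr}(\hat{\mathbf{Q}}_k \mathbf{X}_F)}{R}, \quad k = 1, 2, \quad r = 1, 2, \ldots, R,$$

$$\mathbf{x}_r^H \hat{\mathbf{Q}}_x \mathbf{x}_r = \frac{\mathrm{Tr}(\hat{\mathbf{Q}}_x \mathbf{X}_F)}{R}, \quad r = 1, 2, \ldots, R-2.$$

By generating $\mathbf{X}_F' = \mathbf{x}_1 \mathbf{x}_1^H$ and $\mathbf{X}_F'' = \mathbf{X}_F'/\mathbf{X}_F'(1,1)$ (we again note that $\mathbf{X}_F'(1,1) > 0$), it is easy to check that $\mathbf{X}_F''$ is feasible for (23) and satisfies the optimality conditions (39) and (40) together with the optimal dual solution $\{y_1, y_2, y_3, y_4\}$. Therefore, $\mathbf{X}_F''$ can be regarded as an optimal rank-one solution of (23).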

Appendix D Procedure to Get a Suboptimal Rank-One Solution

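If (23) has an optimal solution of rank two with all the constraints being active, we next give a method to obtain a good feasible solution. Let the optimal solution of (23) be in the form

$$\mathbf{X}_F = \begin{bmatrix} 1 & \mathbf{x}^H \\ \mathbf{x} & \mathbf{X} \end{bmatrix}.$$

We have

$$a_k = \mathrm{Tr}(\hat{\mathbf{Q}}_k \mathbf{X}_F) = \chi_k \sigma_k^2 - \mathrm{Tr}(\mathbf{Q}_k \mathbf{X}), \quad k = 1, 2, \qquad a_x = \mathrm{Tr}(\hat{\mathbf{Q}}_x \mathbf{X}_F) = -P_R + \mathrm{Tr}(\mathbf{Q}_x \mathbf{X}).$$

That is,

$$\beta_k = \mathrm{Tr}(\mathbf{Q}_k \mathbf{X}) = \chi_k \sigma_k^2 - a_k, \quad k = 1, 2, \qquad \beta_x = \mathrm{Tr}(\mathbf{Q}_x \mathbf{X}) = a_x + P_R.$$

Then we have $\mathrm{Tr}\left( \left( \mathbf{Q}_k - \frac{\beta_k}{\beta_x} \mathbf{Q}_x \right) \mathbf{X} \right) = 0$, k = 1, 2. Again according to Theorem 2.1 given in [33], we obtain that there is a rank-one matrix decomposition (synthetically denoted as $\mathcal{D}_2\left( \mathbf{X}, \mathbf{Q}_1 - \frac{\beta_1}{\beta_x}\mathbf{Q}_x, \mathbf{Q}_2 - \frac{\beta_2}{\beta_x}\mathbf{Q}_x \right)$) $\mathbf{X} = \sum_{r=1}^{R'} \mathbf{f}_r \mathbf{f}_r^H$ ($R' = \mathrm{Rank}(\mathbf{X}) \le R$) such that

$$\mathbf{f}_r^H \left( \mathbf{Q}_k - \frac{\beta_k}{\beta_x} \mathbf{Q}_x \right) \mathbf{f}_r = 0, \quad k = 1, 2, \quad r = 1, 2, \ldots, R'.$$

We take $\mathbf{f}_1$ and set $\gamma = \sqrt{\beta_x / (\mathbf{f}_1^H \mathbf{Q}_x \mathbf{f}_1)}$. It can be verified that $(\gamma \mathbf{f}_1)^H \mathbf{Q}_k (\gamma \mathbf{f}_1) = \beta_k$, k = 1, 2, x, and that $\mathbf{X}_F' = \mathbf{x}_1 \mathbf{x}_1^H$ with $\mathbf{x}_1 = [1, (\gamma \mathbf{f}_1)^H]^H$ is feasible for (23) and can be regarded as a suboptimal rank-one solution of (23).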

References

[1] X. Tang and Y. Hua, "Optimal design of non-regenerative MIMO wireless relays," IEEE Trans. Wireless Commun., vol. 6, no. 4, pp. 1398-1407, 2007.

[2] Y. Rong, X. Tang, and Y. Hua, "A unified framework for optimizing linear nonregenerative multicarrier MIMO relay communication systems," IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4837-4851, 2009.

[3] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204-1216, 2000.

[4] B. Rankov and A. Wittneben, "Spectral efficient protocols for half-duplex fading relay channels," IEEE J. Sel. Areas Commun., vol. 25, no. 2, pp. 379-389, 2007.

[5] R. Zhang, Y.-C. Liang, C. C. Chai, and S. Cui, "Optimal beamforming for two-way multi-antenna relay channel with analogue network coding," IEEE J. Sel. Areas Commun., vol. 27, no. 5, pp. 699-712, 2009.

[6] R. Wang and M. Tao, "Joint source and relay precoding designs for MIMO two-way relaying based on MSE criterion," IEEE Trans. Signal Process., vol. 60, no. 3, pp. 1352-1365, 2012.

[7] Y. Liu, M. Tao, B. Li, and H. Shen, "Optimization framework and graph-based approach for relay-assisted bidirectional OFDMA cellular networks," IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3490-3500, Nov. 2010.

[8] C. Esli and A. Wittneben, "One- and two-way decode-and-forward relaying for wireless multiuser MIMO networks," in Proc. 2008 IEEE Global Telecommunications Conf.

[9] -, "Multiuser MIMO two-way relaying for cellular communications," in Proc. 2008 IEEE Int. Symp. Personal, Indoor and Mobile Radio Communications.

[10] J. Joung and A. H. Sayed, "Multiuser two-way amplify-and-forward relay processing and power control methods for beamforming systems," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1833-1846, 2010.

[11] -, "User selection methods for multiuser two-way relay communications using space division multiple access," IEEE Trans. Wireless Commun., vol. 9, no. 7, pp. 2130-2136, 2010.

[12] E. Yilmaz, R. Zakhour, D. Gesbert, and R. Knopp, "Multi-pair two-way relay channel with multiple antenna relay station," in Proc. 2010 IEEE Int. Communications Conf.

[13] C. Y. Leow, Z. Ding, K. K. Leung, and D. L. Goeckel, "On the study of analogue network coding for multi-pair, bidirectional relay channels," IEEE Trans. Wireless Commun., vol. 10, no. 2, pp. 670-681, 2011.

[14] M. Tao and R. Wang, "Linear precoding for multi-pair two-way MIMO relay systems with max-min fairness," IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5361-5371, Oct. 2012.

[15] Z. Ding, I. Krikidis, J. Thompson, and K. K. Leung, "Physical layer network coding and precoding for the two-way relay channel in cellular systems," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 696-712, 2011.

[16] R. Hunger, M. Joham, and W. Utschick, "On the MSE-duality of the broadcast channel and the multiple access channel," IEEE Trans. Signal Process., vol. 57, no. 2, pp. 698-713, 2009.

[17] E. A. Jorswieck and H. Boche, "Transmission strategies for the MIMO MAC with MMSE receiver: average MSE optimization and achievable individual MSE region," IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2872-2881, 2003.

[18] Z.-Q. Luo, T. N. Davidson, G. B. Giannakis, and K. M. Wong, "Transceiver optimization for block-based multiple access through ISI channels," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1037-1052, 2004.

[19] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, "Joint Tx-Rx beam-forming design for multicarrier MIMO channels: a unified framework for convex optimization," IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381-2401, 2003.

[20] P. Viswanath and D. N. C. Tse, "Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality," IEEE Trans. Inf. Theory, vol. 49, no. 8, pp. 1-10, 2003.

[21] A. Wiesel, Y. C. Eldar, and S. Shamai, "Linear precoding via conic optimization for fixed MIMO receivers," IEEE Trans. Signal Process., vol. 54, no. 1, pp. 161-176, 2006.

[22] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[23] M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming, version 1.21, http://cvxr.com/cvx, July 2010.

[24] A. Hjorungnes and D. Gesbert, "Hessians of scalar functions of complex-valued matrices: a systematic computational approach," in Proc. 2007 Int. Symp. Signal Processing and Its Applications, pp. 1-4.

[25] M. Razaviyayn, M. Sanjabi, and Z.-Q. Luo, "Linear transceiver design for interference alignment: complexity and computation," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 2896-2910, 2012.

[26] S. Boyd, "Ee364a course notes stanford university," Stanford, CA, 2004. Available: http://www.stanford.edu/class/ee364b/lectures/relaxations.pdf.

[27] S. S. Christensen, R. Agarwal, E. Carvalho, and J. Cioffi, "Weighted sum-rate maximization using weighted MMSE for MIMO-BC beamforming design," IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4792-4799, 2008.

[28] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, "Applications of second-order cone programming," Linear Algebra and its Applications, vol. 284, pp. 193-228, 1998.

[29] Z.-Q. Luo, W. Kin Ma, A. M.-C. So, Y. Ye, and S. Zhang, "Semidefinite relaxation of quadratic optimization problems," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20-34, 2010.

[30] X. Zhang, Matrix Analysis and Applications. Tsinghua University Press, 2004.

[31] Y. Huang and D. P. Palomar, "Rank-constrained separable semidefinite programming with applications to optimal beamforming," IEEE Trans. Signal Process., vol. 58, no. 2, pp. 664-678, 2010.

[32] W. Ai, Y. Huang, and S. Zhang, "New results on hermitian matrix rank-one decomposition," Mathematical Programming: Series A, vol. 128, pp. 253-283, June 2011.

[33] Y. Huang and S. Zhang, "Complex matrix decomposition and quadratic programming," Mathematics of Operations Research, vol. 32, no. 3, pp. 758-768, Aug. 2007.

Rui Wang received the B.S. degree from Anhui Normal University, Wuhu, China, in 2006, and the M.S. degree from Shanghai University, Shanghai, China, in 2009, both in electronic engineering. Currently he is pursuing his Ph.D. degree at the Institute of Wireless Communication Technology (IWCT) in Shanghai Jiao Tong University. From August 2012 to February 2013, he is a visiting Ph.D. student at the Department of Electrical Engineering of University of California, Riverside. His research interests include digital image processing, cognitive radio and signal processing for wireless cooperative communication.

Meixia Tao (S'00-M'04-SM'10) received the B.S. degree in electronic engineering from Fudan University, Shanghai, China, in 1999, and the Ph.D. degree in electrical and electronic engineering from Hong Kong University of Science and Technology in 2003. She is currently an Associate Professor with the Department of Electronic Engineering, Shanghai Jiao Tong University, China. From August 2003 to August 2004, she was a Member of Professional Staff at Hong Kong Applied Science and Technology Research Institute Co. Ltd. From August 2004 to December 2007, she was with the Department of Electrical and Computer Engineering, National University of Singapore, as an Assistant Professor. Her current research interests include cooperative transmission, physical layer network coding, resource allocation of OFDM networks, and MIMO techniques.

Dr. Tao is an Editor for the IEEE Transactions on Communications and the IEEE Wireless Communications Letters. She was on the Editorial Board of the IEEE Transactions on Wireless Communications from 2007 to 2011 and the IEEE Communications Letters from 2009 to 2012. She also served as Guest Editor for IEEE Communications Magazine with feature topic on LTE-Advanced and 4G Wireless Communications in 2012, and Guest Editor for EURASIP J WCN with special issue on Physical Layer Network Coding for Wireless Cooperative Networks in 2010.

Dr. Tao is the recipient of the IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award in 2009.

Yongwei Huang (M'09) received the Bachelor of Science degree in information and computation science in 1998 and the Master of Science degree in operations research in 2000, both from Chongqing University. In 2005, he received the Ph.D. degree in operations research from Chinese University of Hong Kong.

He is a Research Assistant Professor in Department of Mathematics, Hong Kong Baptist University, Hong Kong, which he joined in 2011. Prior to the current position, he had held several research appointments in Department of Biomedical, Electronic, and Telecommunication Engineering, University of Naples "Federico II," Italy; Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology; and Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong.

His research interests are related to optimization theory and algorithms, including conic optimization, robust optimization, combinatorial optimization, and stochastic optimization, and their applications in signal processing for radar and wireless communications. He is a recipient of Best Poster Award in the 2007 Workshop on Optimization and Signal Processing.