EURASIP Journal on Applied Signal Processing 2004:10, 1568-1584 © 2004 Hindawi Publishing Corporation

Multicarrier Block-Spread CDMA for Broadband Cellular Downlink

Frederik Petre

Wireless Research, Interuniversity MicroElectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium Email: frederik.petre@imec.be

Geert Leus

Electrical Engineering, Mathematics and Computer Science, Delft University of Technology (TUDelft), Mekelweg 4, 2628 CD Delft, The Netherlands Email: leus@cas.et.tudelft.nl

Marc Moonen

Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven (KULeuven), Kasteelpark Arenberg 10, 3001 Leuven, Belgium Email: marc.moonen@esat.kuleuven.ac.be

Hugo De Man

Interuniversity MicroElectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium Email: hugo.deman@imec.be

Received 6 March 2003; Revised 7 November 2003

Effective suppression of multiuser interference (MUI) and mitigation of frequency-selective fading effects within the complexity constraints of the mobile constitute major challenges for broadband cellular downlink transceiver design. Existing wideband direct-sequence (DS) code division multiple access (CDMA) transceivers suppress MUI statistically by restoring the orthogonality among users at the receiver. However, they call for receive diversity and multichannel equalization to mitigate the fading effects caused by deep channel fades. Relying on redundant block spreading and linear precoding, we design a so-called multicarrier block-spread- (MCBS-)CDMA transceiver that preserves the orthogonality among users and guarantees symbol detection, regardless of the underlying frequency-selective fading channels. These properties allow for deterministic MUI elimination through low-complexity block despreading and enable full diversity gains, irrespective of the system load. Different options to perform equalization and decoding, either jointly or separately, strike the trade-off between performance and complexity. To improve the performance over multi-input multi-output (MIMO) multipath fading channels, our MCBS-CDMA transceiver combines well with space-time block-coding (STBC) techniques, to exploit both multiantenna and multipath diversity gains, irrespective of the system load. Simulation results demonstrate the superior performance of MCBS-CDMA compared to competing alternatives.

Keywords and phrases: multicarrier CDMA, broadband cellular system, frequency-selective fading channels, equalization, MIMO, space-time block coding.

1. INTRODUCTION

The main drivers toward future broadband cellular systems, like high-speed wireless internet access and mobile multimedia, require much higher data rates in the downlink (from base to mobile station) than in the uplink (from mobile to base station) direction. Given the asymmetric nature of most of these broadband services, the capacity and performance bottlenecks clearly reside in the downlink of these future systems. Broadband cellular downlink communications poses three main challenges to successful transceiver design. First, for increasing data rates, the underlying multipath channels become more time dispersive, causing intersymbol interference (ISI) and interchip interference (ICI), or, equivalently, frequency-selective fading. Second, due to the increasing success of future broadband services, more users will try to access the common network resources, causing multiuser interference (MUI). Both ISI/ICI and MUI are important performance limiting factors for future broadband cellular systems, because they determine their capabilities in dealing with high data rates and system loads, respectively. Third, cost, size, and power consumption issues put severe constraints on the receiver complexity at the mobile station (MS).

Direct-sequence (DS) code division multiple access (CDMA) has emerged as the predominant air interface technology for the 3G cellular standard [1], because it increases capacity and facilitates network planning in a cellular system, compared to conventional multiple access techniques like frequency-division multiple access (FDMA) and time-division multiple access (TDMA) [2]. In the downlink, DS-CDMA relies on the orthogonality of the spreading codes to separate the different user signals. However, ICI destroys the orthogonality among users, giving rise to MUI. Since the MUI is essentially caused by the multipath channel, linear chip-level equalization, followed by correlation with the desired user's spreading code, makes it possible to suppress the MUI [3, 4, 5, 6]. However, chip equalizer receivers suppress MUI only statistically, and require receive diversity to cope with the effects caused by deep channel fades [7, 8].

On the other hand, it is well known that orthogonal frequency-division multiplexing (OFDM), also called multicarrier (MC) modulation, with cyclic prefixing (CP) constitutes an elegant solution to combat the wireless channel impairments [9, 10, 11]. It converts a frequency-selective channel into a number of parallel flat fading channels by multiplexing blocks of information symbols on orthogonal subcarriers using implementation efficient fast Fourier transform (FFT) operations. Hence, the complex equalizer commonly encountered in single-carrier (SC) systems reduces to a set of parallel and independent single-tap equalizers. However, OFDM, in itself, does not extract frequency diversity, but calls for bandwidth overconsuming forward error correction (FEC) coding techniques to enable frequency diversity [12]. Furthermore, OFDM as such does not support multiple users but requires a multiple access technique on top of it.

In this paper, we propose a novel MC-CDMA transceiver that synergistically combines the advantages of DS-CDMA and OFDM to tackle the challenges of broadband cellular downlink communications. By capitalizing on the general concepts of redundant block spreading and linear precoding, our so-called multicarrier block-spread- (MCBS-)CDMA transceiver possesses three unique properties compared to competing alternatives (Section 2). First, by CP or zero padding (ZP) the block-spread symbol blocks, our MCBS-CDMA transceiver preserves the orthogonality among users, regardless of the underlying time-dispersive multipath channels. This property allows for deterministic (as opposed to statistical) MUI elimination through low-complexity and channel-independent block despreading. Second, redundant linear precoding guarantees symbol detectability and full frequency-diversity gains, thus robustifying the transmission against deep channel fades. Assuming perfect channel state information (CSI) at the receiver, different equalization and decoding options, ranging from linear over decision-directed to maximum likelihood (ML) detection, strike the trade-off between performance and complexity (Section 3). Finally, our transceiver exhibits a rewarding synergy with multiantenna techniques, to increase the spectral efficiency and to improve the link reliability of multiple users in a broadband cellular network (Section 4). Simulation results demonstrate the outstanding performance of the proposed transceiver compared to competing alternatives (Section 5).

Several other MC-CDMA techniques that also combine CDMA with OFDM have recently gained increased momentum as candidate air interface for future broadband cellular systems [13]. Three different flavours of MC-CDMA exist, depending on the exact position of the CDMA and the OFDM component in the transmission scheme. The first variant, called MC-CDMA, performs the spreading operation before the symbol blocking (or serial-to-parallel conversion), which results in a spreading of the information symbols across the different subcarriers [14, 15, 16]. However, like classical DS-CDMA, MC-CDMA does not enable full frequency-diversity gains. The second variant, called MC-DS-CDMA, executes the spreading operation after the symbol blocking, resulting in a spreading of the information symbols along the time axis of the different subcarriers [17, 18]. However, like classical OFDM, MC-DS-CDMA necessitates bandwidth overconsuming FEC coding plus frequency-domain (FD) interleaving to mitigate frequency-selective fading. The third variant, called multitone (MT) DS-CDMA, performs the spreading after the OFDM modulation such that the resulting spectrum of each subcarrier no longer satisfies the orthogonality condition [19]. Hence, MT-DS-CDMA suffers from ISI and intertone interference (ITI), as well as MUI, and requires expensive multiuser detection techniques to achieve a reasonable performance. Finally, alternative MUI-free MC transceivers, like AMOUR [20] and generalized multicarrier (GMC) CDMA [11], rely on an orthogonal frequency-division multiple access- (OFDMA-)like approach to retain the orthogonality among users, regardless of the underlying multipath channels. Unlike our MCBS-CDMA transceiver, these transceivers do not inherit the nice properties of CDMA related to universal frequency reuse (see footnote 1) in a cellular network, such as increased capacity and simplified network planning.

Notation

We use roman letters to represent scalars, lower boldface letters to denote column vectors (i.e., blocks), and upper boldface letters to denote matrices (i.e., a collection of blocks). $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent conjugate, transpose, and Hermitian, respectively. Further, $|\cdot|$ and $\|\cdot\|$ represent the absolute value and Frobenius norm, respectively. We reserve $E\{\cdot\}$ for expectation and $\lfloor\cdot\rfloor$ for integer flooring. Subscripts $n_t$ and $n_r$ point to the $n_t$th transmit and the $n_r$th receive antenna, respectively. Superscript $m$ points to the $m$th user. Argument $i$ denotes symbol index for symbol scalar sequences and symbol block index for symbol block sequences. Likewise, argument $n$ denotes chip index for chip scalar sequences and chip block index for chip block sequences. Tilded letters $\tilde{x}$ denote FD signals and overlined letters $\bar{x}$ denote space-time block-encoded signals at the transmitter and block-despread signals at the receiver. Acuted letters $\acute{x}$ denote space-time block-decoded signals at the receiver. Hatted letters $\hat{x}$ denote soft estimates, whereas hatted and underlined letters $\underline{\hat{x}}$ denote hard estimates.

1 Universal frequency reuse, also called frequency reuse of one-in-one, is a unique attribute of CDMA systems, which refers to the reuse of the same frequencies in neighbouring cells.

Figure 1: MCBS-CDMA downlink transmission scheme.

2. MCBS-CDMA TRANSCEIVER DESIGN

Effective suppression of MUI and mitigation of ISI and frequency-selective fading, within the complexity constraints of the MS, pose major challenges to transceiver design for the broadband cellular downlink application. To tackle these challenges, we propose a novel MC-CDMA transceiver that combines two specific CDMA and OFDM concepts, namely, block-spread CDMA and linearly precoded OFDM. The resulting so-called MCBS-CDMA transceiver exhibits two unique properties compared to competing alternatives. First, by relying on block-spread CDMA, MCBS-CDMA preserves the orthogonality among users, even after propagation through a time-dispersive multipath channel. This property allows for deterministic (as opposed to statistical) MUI elimination at the receiver through low-complexity block despreading. Second, by relying on linearly precoded OFDM, MCBS-CDMA mitigates ISI and guarantees symbol detection, regardless of the underlying frequency-selective multipath channel. This property enables full frequency-diversity gains and, hence, robustness against frequency-selective fading at the receiver, through ML single-user equalization. Furthermore, different single-user equalization options, ranging from linear over decision-directed to ML detection, strike the trade-off between performance and complexity.

This section is organized as follows. Section 2.1 introduces the MCBS-CDMA downlink transmission scheme, and motivates the different operations involved. Section 2.2 demonstrates how our MCBS-CDMA transceiver enables MUI-resilient reception over frequency-selective multipath channels. Finally, Section 2.3 argues the need for single-user equalization and guaranteed symbol detection.

2.1. MCBS-CDMA downlink transmission

We consider a single cell of a cellular system with a base station (BS) serving $M$ active MSs within its coverage area. For now, we limit ourselves to the single-antenna case and defer the multiantenna case to Section 4. The block diagram in Figure 1 describes the MCBS-CDMA downlink transmission scheme (where only the $m$th user is explicitly shown) that transforms the $M$ user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$, with a rate $1/T_s$, into the multiuser chip sequence $u[n]$, with a rate $1/T_c$. Apart from the user multiplexing and the IFFT, the MCBS-CDMA transmission scheme performs three major operations, namely, linear precoding, block spreading, and adding transmit redundancy. Since our scheme belongs to the general class of block transmission schemes, the $m$th user's data symbol sequence $s^m[i]$ is first serial-to-parallel converted into blocks of $B$ symbols, leading to the symbol block sequence $\mathbf{s}^m[i] := [s^m[iB], \ldots, s^m[(i+1)B-1]]^T$.

The first operation involves complex-field linear precoding, where the encoding is performed over the complex field rather than over the Galois field, as done traditionally [21, 22]. Unlike MC-CDMA that spreads the information symbols across the subcarriers employing a user-specific spreading code [14, 15, 16], MCBS-CDMA precodes the information symbols on the different subcarriers employing a linear precoding matrix. Specifically, the information blocks $\mathbf{s}^m[i]$ are linearly precoded by a $Q \times B$ matrix $\boldsymbol{\Theta}$ to yield the $Q \times 1$ precoded symbol blocks:

$$\tilde{\mathbf{s}}^m[i] := \boldsymbol{\Theta} \cdot \mathbf{s}^m[i], \qquad (1)$$

where $Q$ is the number of subcarriers, and $\boldsymbol{\Theta}$ is a para-unitary matrix, that is, $\boldsymbol{\Theta}^H \cdot \boldsymbol{\Theta} = \mathbf{I}_B$. The linear precoding can be either redundant ($Q > B$) or nonredundant ($Q = B$). For conciseness, we limit our discussion to redundant precoding, but the proposed concepts apply equally well to nonredundant precoding. As we will show later, linear precoding guarantees symbol detection and maximum frequency-diversity gains, and thus robustifies the transmission against frequency-selective fading.
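As an illustration of (1), the following minimal sketch precodes one QPSK symbol block with a truncated FFT matrix. The block sizes, the random symbols, the helper name `truncated_fft_precoder`, and the $1/\sqrt{Q}$ scaling (chosen here so that the para-unitary condition holds) are assumptions made for this example only.

```python
import numpy as np

def truncated_fft_precoder(Q, B):
    """First B columns of the unitary Q x Q FFT matrix (hypothetical helper)."""
    q = np.arange(Q).reshape(-1, 1)
    b = np.arange(B).reshape(1, -1)
    return np.exp(-2j * np.pi * q * b / Q) / np.sqrt(Q)

Q, B = 8, 5                      # assumed sizes: Q subcarriers, B symbols per block
theta = truncated_fft_precoder(Q, B)

rng = np.random.default_rng(0)
s = (rng.choice([-1.0, 1.0], size=B) + 1j * rng.choice([-1.0, 1.0], size=B)) / np.sqrt(2)

s_tilde = theta @ s              # equation (1): Q x 1 precoded block
gram = theta.conj().T @ theta    # para-unitary check: should equal I_B
s_back = theta.conj().T @ s_tilde
```

Because the columns of the precoder are orthonormal, applying its Hermitian transpose recovers the original block when no channel is present.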

The second operation entails a block-spreading operation, which is also depicted in Figure 1. Unlike DS-CDMA and MC-CDMA that rely on classical symbol spreading (operating on a scalar symbol), MCBS-CDMA relies on block spreading (operating on a block of symbols). Specifically, the block sequence $\tilde{\mathbf{s}}^m[i]$ is block spread by a factor $N$ with the user composite code sequence $c^m[n]$, which is the multiplication of a short (periodic) orthogonal Walsh-Hadamard spreading code that is MS specific and a long (aperiodic) overlay scrambling code that is BS specific. The chip block sequences of the different active users are added, resulting in the multiuser chip block sequence:

$$\tilde{\mathbf{x}}[n] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i]\, c^m[n], \qquad (2)$$

where the symbol block index $i$ relates to the chip block index $n$ through $i = \lfloor n/N \rfloor$. The block spreading operation is also illustrated in Figure 1, where the $N\times$ replicator repeats the symbol block at its input $N$ times. Collecting $N$ consecutive chip blocks, $\tilde{\mathbf{x}}[n]$, into $\tilde{\mathbf{X}}[i] := [\tilde{\mathbf{x}}[iN], \ldots, \tilde{\mathbf{x}}[(i+1)N-1]]$, we obtain the symbol block level equivalent of (2), that is:

$$\tilde{\mathbf{X}}[i] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i] \cdot \mathbf{c}^m[i]^T = \tilde{\mathbf{S}}[i] \cdot \mathbf{C}[i]^T, \qquad (3)$$

where $\mathbf{c}^m[i] := [c^m[iN], \ldots, c^m[(i+1)N-1]]^T$ is the $m$th user's composite code vector used to block-spread its data symbol block $\tilde{\mathbf{s}}^m[i]$, $\tilde{\mathbf{S}}[i] := [\tilde{\mathbf{s}}^1[i], \ldots, \tilde{\mathbf{s}}^M[i]]$ collects the symbol blocks of the different active users, and $\mathbf{C}[i] := [\mathbf{c}^1[i], \ldots, \mathbf{c}^M[i]]$ collects the composite code vectors of the different active users. The block spreading operation in (3) can be viewed as classical symbol spreading, where every user's information symbols on the different subcarriers are spread along the time axis, using the same spreading code. Furthermore, by choosing $Q$ sufficiently high, each subcarrier experiences frequency-flat fading, such that the orthogonality among users is preserved on every subcarrier, even after propagation through a frequency-selective channel. Consequently, as will become apparent in Section 2.2, block spreading enables MUI-resilient reception and thus effectively deals with the MUI. Subsequently, the $Q \times Q$ IFFT matrix $\mathbf{F}_Q^H$ transforms the FD chip block sequence $\tilde{\mathbf{x}}[n]$ into the time-domain (TD) chip block sequence: $\mathbf{x}[n] = \mathbf{F}_Q^H \cdot \tilde{\mathbf{x}}[n]$.

Figure 2: MUI-resilient MCBS-CDMA downlink reception scheme.
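The equivalence between the chip-block form (2) and the matrix form (3) can be checked numerically. The sketch below is only illustrative: the sizes, the random precoded blocks, the QPSK scrambling sequence, the $1/\sqrt{N}$ code normalization, and the helper `walsh_hadamard` are assumptions of this example.

```python
import numpy as np

def walsh_hadamard(N):
    """Sylvester-type Hadamard matrix; N must be a power of two (hypothetical helper)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

rng = np.random.default_rng(1)
Q, N, M = 6, 4, 3                 # assumed: subcarriers, spreading factor, active users

# Composite codes: user-specific Walsh-Hadamard row times a common scrambling code.
scramble = np.exp(1j * np.pi / 2 * rng.integers(0, 4, size=N))
C = (walsh_hadamard(N)[:M] * scramble).T / np.sqrt(N)   # N x M, column m is c^m[i]

# Precoded symbol blocks of all users for one block index i (Q x M).
S_tilde = rng.standard_normal((Q, M)) + 1j * rng.standard_normal((Q, M))

# Matrix form (3): Q x N matrix holding the N chip blocks of block index i.
X_tilde = S_tilde @ C.T

# Chip-block form (2): one Q x 1 chip block per chip index n, summed over users.
X_chipwise = np.stack(
    [sum(S_tilde[:, m] * C[n, m] for m in range(M)) for n in range(N)], axis=1
)
```

Both constructions give the same $Q \times N$ matrix, and the columns of `C` remain orthonormal after scrambling because the scrambling chips have unit modulus.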

The third operation involves the addition of transmit redundancy. Specifically, the $K \times Q$ transmit matrix $\mathbf{T}$, with $K$ the transmitted block length, $K > Q$, adds some redundancy to the chip blocks $\mathbf{x}[n]$, that is, $\mathbf{u}[n] := \mathbf{T} \cdot \mathbf{x}[n]$. As will be clarified later, this transmit redundancy copes with the time-dispersive effect of multipath propagation, and enables low-complexity equalization at the receiver. Finally, the resulting transmitted chip block sequence $\mathbf{u}[n]$ is parallel-to-serial converted into the corresponding scalar sequence $[u[nK], \ldots, u[(n+1)K-1]]^T := \mathbf{u}[n]$, and transmitted over the air at a rate $1/T_c$. By analyzing the rates of the different transmitter blocks in Figure 1, it is clear that the channel symbol rate, $R_s$, relates to the chip rate, $R_c$, through $R_s = (B/K)(1/N)R_c$.

From a bandwidth utilization point of view, the BS transmits $B$ information symbols to each of the $M$ users, using $NK = N(Q + L) = N(B + 2L)$ transmitted chips, where the overhead of $2L$ stems from the $(B + L) \times B$ redundant linear precoder, $\boldsymbol{\Theta}$, which guarantees symbol detection, and the length-$L$ CP, which is common to all users and removes interblock interference (IBI). Therefore, the bandwidth efficiency of our transceiver supporting $M$ users can be calculated as

$$\epsilon_{\text{MCBS-CDMA}} = \frac{MB}{N(B + 2L)}. \qquad (4)$$

Clearly, as the number of users approaches its maximum value, that is, $M = N$, the bandwidth efficiency also converges to its maximum value, $\epsilon_{\text{MCBS-CDMA}} = B/(B + 2L)$.
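To make the overhead accounting concrete, the short sketch below evaluates (4) and the rate relation $R_s = (B/K)(1/N)R_c$. The parameter values ($N = 16$, $B = 13$, $L = 3$) and the function names are hypothetical and are not taken from the simulations in this paper.

```python
from fractions import Fraction

def bandwidth_efficiency(M, N, B, L):
    """Equation (4): M*B information symbols carried by N*(B + 2L) chips."""
    if not 0 <= M <= N:
        raise ValueError("the number of users M must lie between 0 and N")
    return Fraction(M * B, N * (B + 2 * L))

def symbol_rate(chip_rate, B, K, N):
    """Per-user symbol rate R_s = (B / K) * (1 / N) * R_c."""
    return chip_rate * B / (K * N)

N, B, L = 16, 13, 3          # assumed example values
Q = B + L                    # redundant precoder adds L entries
K = Q + L                    # CP (or ZP) adds another L entries

full_load = bandwidth_efficiency(N, N, B, L)       # M = N
half_load = bandwidth_efficiency(N // 2, N, B, L)  # M = N / 2
rs = symbol_rate(1.0, B, K, N)                     # chip rate normalized to 1
```

With these assumed values the efficiency is 13/19 at full load and 13/38 at half load, which shows how the $2L$ overhead is amortized over longer blocks.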

2.2. MUI-resilient reception with MCBS-CDMA

Adopting a discrete-time baseband equivalent model, the synchronized and chip-sampled received signal is a channel-distorted version of the transmitted signal, and can be written as

$$v[n] = \sum_{l=0}^{L_c} h[l]\, u[n - l] + w[n], \qquad (5)$$

where $h[l]$ is the chip-sampled FIR channel that models the frequency-selective multipath propagation between the transmitter and the receiver including the effect of transmit and receive filters, $L_c$ is the order of $h[l]$, and $w[n]$ denotes the additive Gaussian noise, which we assume to be white with variance $\sigma_w^2$. Furthermore, we define $L$ as a known upper bound on the channel order, $L \ge L_c$, which can be well approximated by $L \approx \lfloor \tau_{\max}/T_c \rfloor + 1$, where $\tau_{\max}$ is the maximum excess delay within the given propagation environment.

The block diagram in Figure 2 describes the reception scheme for the MS of interest (which we assume to be the $m$th one), which transforms the received sequence $v[n]$ into an estimate of the desired user's data symbol sequence $s^m[i]$. Assuming perfect chip and block synchronization, the received sequence $v[n]$ is serial-to-parallel converted into its corresponding block sequence $\mathbf{v}[n] := [v[nK], \ldots, v[(n+1)K-1]]^T$. From the scalar input/output relationship in (5), we can derive the corresponding block input/output relationship:

$$\mathbf{v}[n] = \mathbf{H}[0] \cdot \mathbf{u}[n] + \mathbf{H}[1] \cdot \mathbf{u}[n-1] + \mathbf{w}[n], \qquad (6)$$

where $\mathbf{w}[n] := [w[nK], \ldots, w[(n+1)K-1]]^T$ is the noise block sequence, $\mathbf{H}[0]$ is a $K \times K$ lower triangular Toeplitz matrix with entries $[\mathbf{H}[0]]_{p,q} = h[p - q]$, and $\mathbf{H}[1]$ is a $K \times K$ upper triangular Toeplitz matrix with entries $[\mathbf{H}[1]]_{p,q} = h[K + p - q]$ (see, e.g., [11] for a detailed derivation of the single-user case). The time-dispersive nature of multipath propagation gives rise to so-called IBI between successive blocks, which is modelled by the second term in (6). The $Q \times K$ receive matrix $\mathbf{R}$ again removes the redundancy from the blocks $\mathbf{v}[n]$: $\mathbf{y}[n] := \mathbf{R} \cdot \mathbf{v}[n]$. The purpose of the transmit/receive pair $(\mathbf{T}, \mathbf{R})$ is twofold. First, it allows for simple block-by-block processing by removing the IBI. Second, it enables low-complexity FD equalization by making the linear channel convolution appear circulant to the received block.

To guarantee perfect IBI removal, the pair $(\mathbf{T}, \mathbf{R})$ should satisfy the following condition [11]:

$$\mathbf{R} \cdot \mathbf{H}[1] \cdot \mathbf{T} = \mathbf{0}. \qquad (7)$$

To enable circulant channel convolution, the resulting channel matrix $\mathbf{H} := \mathbf{R} \cdot \mathbf{H}[0] \cdot \mathbf{T}$ should be circulant. In this way, we obtain a simplified block input/output relationship in the TD:

$$\mathbf{y}[n] = \mathbf{H} \cdot \mathbf{x}[n] + \mathbf{z}[n], \qquad (8)$$

where $\mathbf{z}[n] := \mathbf{R} \cdot \mathbf{w}[n]$ is the corresponding noise block sequence. In general, two options for the pair $(\mathbf{T}, \mathbf{R})$ exist that satisfy the above conditions. The first option corresponds to CP in classical OFDM systems [23], and boils down to choosing $K = Q + L$, and selecting

$$\mathbf{T} = \mathbf{T}_{cp} := [\mathbf{I}_{cp}^T, \mathbf{I}_Q^T]^T, \qquad \mathbf{R} = \mathbf{R}_{cp} := [\mathbf{0}_{Q \times L}, \mathbf{I}_Q], \qquad (9)$$

where $\mathbf{I}_{cp}$ consists of the last $L$ rows of $\mathbf{I}_Q$. The circulant property is enforced at the transmitter by adding a cyclic prefix of length $L$ to each block. Indeed, premultiplying a vector with $\mathbf{T}_{cp}$ copies its last $L$ entries and pastes them to its top. The IBI is removed at the receiver by discarding the cyclic prefix of each received block. Indeed, premultiplying a vector with $\mathbf{R}_{cp}$ deletes its first $L$ entries and thus satisfies (7).

The second option corresponds to ZP, and boils down to setting $K = Q + L$, and selecting

$$\mathbf{T} = \mathbf{T}_{zp} := [\mathbf{I}_Q, \mathbf{0}_{Q \times L}]^T, \qquad \mathbf{R} = \mathbf{R}_{zp} := [\mathbf{I}_Q, \mathbf{I}_{zp}], \qquad (10)$$

where $\mathbf{I}_{zp}$ is formed by the first $L$ columns of $\mathbf{I}_Q$. Unlike classical OFDM systems, here the IBI is entirely dealt with at the transmitter. Indeed, premultiplying a vector with $\mathbf{T}_{zp}$ pads $L$ trailing zeros to its bottom and thus satisfies (7). The circulant property is enforced at the receiver by time-aliasing each received block. Indeed, premultiplying a vector with $\mathbf{R}_{zp}$ adds its last $L$ entries to its first $L$ entries.
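Both transmit/receive pairs can be verified numerically against condition (7) and the circulant requirement. The following is a minimal sketch under assumed sizes ($Q = 8$, $L = 3$) and a random order-$L$ channel; the helper names are hypothetical.

```python
import numpy as np

def channel_blocks(h, K):
    """K x K Toeplitz matrices H[0] (lower) and H[1] (upper) of the block model (6)."""
    H0 = np.zeros((K, K), dtype=complex)
    H1 = np.zeros((K, K), dtype=complex)
    for p in range(K):
        for q in range(K):
            if 0 <= p - q < len(h):
                H0[p, q] = h[p - q]
            if 0 <= K + p - q < len(h):
                H1[p, q] = h[K + p - q]
    return H0, H1

def cp_pair(Q, L):
    I = np.eye(Q)
    T = np.vstack([I[Q - L:], I])              # K x Q: copy the last L entries to the top
    R = np.hstack([np.zeros((Q, L)), I])       # Q x K: discard the first L entries
    return T, R

def zp_pair(Q, L):
    I = np.eye(Q)
    T = np.vstack([I, np.zeros((L, Q))])       # K x Q: pad L trailing zeros
    R = np.hstack([I, I[:, :L]])               # Q x K: add the last L entries to the first L
    return T, R

def is_circulant(A):
    return all(np.allclose(np.roll(A[:, 0], k), A[:, k]) for k in range(A.shape[0]))

rng = np.random.default_rng(2)
Q, L = 8, 3                                    # assumed sizes
K = Q + L
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
H0, H1 = channel_blocks(h, K)

results = {}
for name, (T, R) in {"CP": cp_pair(Q, L), "ZP": zp_pair(Q, L)}.items():
    ibi_free = np.allclose(R @ H1 @ T, 0)      # condition (7)
    results[name] = (ibi_free, is_circulant(R @ H0 @ T))
```

For both options the IBI term vanishes and the effective $Q \times Q$ channel matrix is circulant, which is what the FD processing of the next step relies on.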

Referring back to (8), circulant matrices possess a nice property that enables simple per-tone equalization in the FD.

Property 1. Circulant matrices can be diagonalized by FFT operations [24]

$$\mathbf{H} = \mathbf{F}_Q^H \cdot \tilde{\mathbf{H}} \cdot \mathbf{F}_Q, \qquad (11)$$

with $\tilde{\mathbf{H}} := \mathrm{diag}(\tilde{\mathbf{h}})$, $\tilde{\mathbf{h}} := [H(e^{j0}), H(e^{j(2\pi/Q)}), \ldots, H(e^{j(2\pi/Q)(Q-1)})]$ the FD channel response evaluated on the FFT grid, $H(z) := \sum_{l=0}^{L_c} h[l] z^{-l}$ the $z$-transform of $h[l]$, and $\mathbf{F}_Q$ the $Q \times Q$ FFT matrix.
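Property 1 can be illustrated with a few lines of NumPy. In this sketch the FFT matrix is taken to be unitary (scaled by $1/\sqrt{Q}$), and the sizes and the random channel are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
Q, L = 8, 3                                        # assumed sizes
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)

# Circulant channel matrix with first column [h[0], ..., h[L], 0, ..., 0]^T.
first_col = np.concatenate([h, np.zeros(Q - L - 1)])
H = np.stack([np.roll(first_col, k) for k in range(Q)], axis=1)

# Unitary Q x Q FFT matrix F_Q, so that its Hermitian transpose is the IFFT matrix.
F = np.fft.fft(np.eye(Q)) / np.sqrt(Q)

# Channel frequency response on the FFT grid: H(e^{j 2 pi q / Q}), q = 0..Q-1.
h_tilde = np.fft.fft(h, Q)
H_tilde = np.diag(h_tilde)

# Per-tone view: transforming y = H x to the FD scales each tone of x independently.
x = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)
y_tilde = F @ (H @ x)
x_tilde = F @ x
```

The last two lines show why a single complex gain per tone suffices once the channel has been made circulant.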

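Aiming at low-complexity FD processing, we transform $\mathbf{y}[n]$ into the FD by defining $\tilde{\mathbf{y}}[n] := \mathbf{F}_Q \cdot \mathbf{y}[n]$. Relying on Property 1, this leads to the following FD block input/output relationship:

$$\tilde{\mathbf{y}}[n] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}[n], \qquad (12)$$

where $\tilde{\mathbf{z}}[n] := \mathbf{F}_Q \cdot \mathbf{z}[n]$ is the corresponding FD noise block sequence. Collecting $N$ consecutive chip blocks $\tilde{\mathbf{y}}[n]$ into $\tilde{\mathbf{Y}}[i] := [\tilde{\mathbf{y}}[iN], \ldots, \tilde{\mathbf{y}}[(i+1)N-1]]$, defining $\tilde{\mathbf{X}}[i]$ and $\tilde{\mathbf{Z}}[i]$ in a similar manner as $\tilde{\mathbf{Y}}[i]$, and exploiting (3), we obtain the symbol block level equivalent of (12), that is,

$$\tilde{\mathbf{Y}}[i] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{S}}[i] \cdot \mathbf{C}[i]^T + \tilde{\mathbf{Z}}[i]. \qquad (13)$$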

By inspecting (13), we can conclude that our transceiver preserves the orthogonality among users, even after propagation through a (possibly unknown) frequency-selective multipath channel. This property allows for deterministic MUI elimination through low-complexity code-matched filtering. Indeed, by block despreading (13) with the desired user's composite code vector $\mathbf{c}^m[i]$ (we assume the $m$th user to be the desired one), we obtain

$$\bar{\tilde{\mathbf{y}}}^m[i] := \tilde{\mathbf{Y}}[i] \cdot \mathbf{c}^m[i]^* = \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] + \bar{\tilde{\mathbf{z}}}^m[i], \qquad (14)$$

where $\bar{\tilde{\mathbf{z}}}^m[i] := \tilde{\mathbf{Z}}[i] \cdot \mathbf{c}^m[i]^*$ is the corresponding noise block sequence. Our transceiver successfully converts (through block despreading) a multiuser detection problem into an equivalent but simpler single-user equalization problem. Moreover, the operation of block despreading preserves ML optimality, since it does not incur any information loss in the Shannon sense regarding the desired user's symbol block $\mathbf{s}^m[i]$.
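The deterministic MUI elimination in (14) can be reproduced with a small fully loaded example. This is a sketch under stated assumptions: random precoded blocks and tone gains stand in for actual data and channel, the composite codes are unit-norm Walsh-Hadamard rows times a common unit-modulus scrambling sequence, and `walsh_hadamard` is a hypothetical helper.

```python
import numpy as np

def walsh_hadamard(N):
    """Sylvester-type Hadamard matrix; N must be a power of two (hypothetical helper)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

rng = np.random.default_rng(4)
Q, N, M = 8, 4, 4                      # assumed sizes, full load (M = N)

scramble = np.exp(1j * np.pi / 2 * rng.integers(0, 4, size=N))
C = (walsh_hadamard(N)[:M] * scramble).T / np.sqrt(N)      # N x M composite codes

S_tilde = rng.standard_normal((Q, M)) + 1j * rng.standard_normal((Q, M))
H_tilde = np.diag(rng.standard_normal(Q) + 1j * rng.standard_normal(Q))
Z_tilde = 0.1 * (rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N)))

# Received FD block matrix, equation (13).
Y_tilde = H_tilde @ S_tilde @ C.T + Z_tilde

# Block despreading with the desired user's code, equation (14).
m = 2
y_bar = Y_tilde @ C[:, m].conj()
z_bar = Z_tilde @ C[:, m].conj()

# Only the desired user's precoded block and the despread noise remain.
single_user_model = H_tilde @ S_tilde[:, m] + z_bar
```

Note that the despreading step does not use any channel knowledge: the contributions of the other three users cancel exactly, whatever the tone gains are.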

In the above discussion, our main focus was on the downlink problem, which is simpler in nature than the uplink problem, since the different user signals experience the same multipath channel, time offset, and carrier frequency offset. In theory, the same signal design is also feasible in the uplink. Assuming perfect time and frequency synchronization between the different users and the BS, it can be shown that the orthogonality among users is still preserved, even if the user signals now propagate through different multipath channels. In practice, perfect time and frequency synchronization cannot be guaranteed, since the user signals experience different time offsets and carrier frequency offsets with respect to the BS. Furthermore, the BS receiver can only compensate for a certain user's synchronization mismatches after this user's signal has been separated from the received multiuser mixture. Otherwise, a compensation for that particular user would affect all other users too. However, since the proposed block spreading scheme relies on the orthogonality preservation property, which requires perfect time and frequency synchronization, the synchronization mismatches would have introduced irreducible distortion at that point already. Therefore, in contrast with the downlink, which can rely on existing single-user schemes, a new scheme is needed in the uplink, in which each user estimates its synchronization mismatches with respect to the BS and compensates these before transmission, which we refer to as presynchronization. Only the small residual mismatches that remain after presynchronization should be compensated after separation, which we refer to as postsynchronization.

2.3. Single-user equalization for MCBS-CDMA

After successful elimination of the MUI, we still need to detect the desired user's symbol block $\mathbf{s}^m[i]$ from (14). Ignoring, for the moment, the presence of $\boldsymbol{\Theta}$ (or, equivalently, setting $Q = B$ and selecting $\boldsymbol{\Theta} = \mathbf{I}_Q$), this requires $\tilde{\mathbf{H}}$ to have full column rank $Q$. Unfortunately, this condition only holds for channels that do not invoke any zero diagonal entries in $\tilde{\mathbf{H}}$. In other words, if the MS experiences a deep channel fade on a particular tone (corresponding to a zero diagonal entry in $\tilde{\mathbf{H}}$), the information symbol on that tone cannot be recovered. To guarantee symbol detectability of the $B$ symbols in $\mathbf{s}^m[i]$, regardless of the symbol constellation, we thus need to design the precoder $\boldsymbol{\Theta}$ such that

$$\mathrm{rank}(\tilde{\mathbf{H}} \cdot \boldsymbol{\Theta}) = B, \qquad (15)$$

irrespective of the underlying channel realization [11]. Since an FIR channel of order $L$ can invoke at most $L$ zero diagonal entries in $\tilde{\mathbf{H}}$, this requires any $Q - L = B$ rows of $\boldsymbol{\Theta}$ to be linearly independent.

Table 1: Complexity of ML.

In [21, 22], two classes of precoders have been constructed that satisfy this condition and thus guarantee symbol detectability or, equivalently, enable full frequency-diversity gain; namely, the Vandermonde precoders and the real cosine precoders. The $Q \times B$ complex Vandermonde precoder is defined by $[\boldsymbol{\Theta}(\boldsymbol{\rho})]_{q,b} = \rho_q^b$, where $\boldsymbol{\rho} := [\rho_0, \ldots, \rho_{Q-1}]^T$, and the $\rho_q$'s, with $q = 0, \ldots, Q-1$, are $Q$ complex points, such that $\rho_q \ne \rho_{q'}$ for all $q \ne q'$. A special case of the general Vandermonde precoder is a truncated FFT matrix, defined by choosing $\rho_q = \exp(-j2\pi q/Q)$. The $Q \times B$ real cosine precoder is defined by $[\boldsymbol{\Theta}(\boldsymbol{\phi})]_{q,b} = \cos((b + 1/2)\phi_q)$, where $\boldsymbol{\phi} := [\phi_0, \ldots, \phi_{Q-1}]^T$, and the $\phi_q$'s, with $q = 0, \ldots, Q-1$, are $Q$ real points, such that $\phi_q \ne (2k+1)\pi$ and $\phi_q \pm \phi_{q'} \ne 2k\pi$ for all $q \ne q'$ and $k$ integer. A special case of the general cosine precoder is a truncated discrete cosine transform (DCT) matrix, defined by choosing $\phi_q = q\pi/Q$.
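The rank condition can be illustrated for the truncated FFT precoder with a deliberately unfavourable channel. The sketch below assumes $Q = 8$ and $L = 3$, and constructs a hypothetical order-$L$ channel whose $L$ zeros all fall on the FFT grid (the bin indices are arbitrary choices made for this example).

```python
import numpy as np

Q, L = 8, 3                   # assumed sizes
B = Q - L

# Hypothetical worst case: an order-L channel whose L zeros all lie on the FFT grid.
null_bins = [1, 2, 5]
zeros = np.exp(2j * np.pi * np.array(null_bins) / Q)
h = np.poly(zeros)            # coefficients h[0..L] of H(z) = sum_l h[l] z^{-l}
H_tilde = np.diag(np.fft.fft(h, Q))

# Truncated FFT (Vandermonde) precoder with rho_q = exp(-j 2 pi q / Q).
q = np.arange(Q).reshape(-1, 1)
b = np.arange(B).reshape(1, -1)
theta = np.exp(-2j * np.pi * q * b / Q)

# Without precoding (Q symbols on Q tones) the L faded tones are lost.
rank_uncoded = np.linalg.matrix_rank(H_tilde, tol=1e-9)
# With redundant precoding all B symbols remain detectable.
rank_precoded = np.linalg.matrix_rank(H_tilde @ theta, tol=1e-9)
```

Here the uncoded system loses three tones, whereas the precoded system keeps full column rank $B$ because the five surviving rows of the Vandermonde matrix are linearly independent.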

3. EQUALIZATION OPTIONS

In this section, we discuss different options to perform equalization and decoding of the linear precoding, either jointly or separately, under the assumption of perfect CSI at the receiver. These options make it possible to trade off performance against complexity, ranging from optimal ML detection with exponential complexity to linear and decision-directed detection with linear complexity. To evaluate the complexity, we distinguish between the initialization phase, when the equalizers are calculated based on the channel knowledge, and the data processing phase, when the received data is actually processed. The former is executed at a rate related to the channel's fading rate, whereas the latter is executed continuously at the symbol block rate. By analyzing the rate of the different receiver blocks in Figure 2, it is clear that the equalizer operates at a rate which is $B$ times lower than the symbol rate, that is, $R_{eq} = R_s/B$.

This section is organized as follows. Section 3.1 investigates ML detection. Section 3.2 studies joint linear equalization and decoding, whereas Section 3.3 introduces joint decision feedback equalization and decoding. Finally, Section 3.4 proposes separate linear equalization and decoding.

                  Data processing
Multiplications   $QC^B$
Additions         Q c-1 1 -CB
Data transfers    2C B+1 C B 1 3Q CC 1 1+ 2QCB 3

3.1. ML detection

The ML algorithm is optimal in an ML sense but has a very high complexity. Amongst all possible transmitted blocks, it retains the one that maximizes the likelihood function or, equivalently, minimizes the Euclidean distance:

$$\underline{\hat{\mathbf{s}}}^m[i] = \arg\min_{\mathbf{s}^m[i] \in \mathcal{S}} \left\| \bar{\tilde{\mathbf{y}}}^m[i] - \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \cdot \mathbf{s}^m[i] \right\|. \qquad (16)$$

In other words, the ML metric is given by the Euclidean distance between the actual received block and the block that would have been received if a particular symbol block had been transmitted in a noiseless environment. The number of possible transmit vectors in $\mathcal{S}$ is the cardinality of $\mathcal{S}$, that is, $|\mathcal{S}| = C^B$, with $C$ the constellation size. Consequently, the number of points to inspect grows exponentially with the initial block length $B$.

The ML algorithm does not require an initialization phase. During the data processing phase, the ML algorithm calculates the Euclidean distance metric of (16), for all possible transmit vectors $\mathbf{s}^m[i]$. To lower the complexity, a tree-like implementation avoids frequent recalculation of common subexpressions. Table 1 summarizes the complexity of the ML algorithm in terms of complex multiplications, additions, and data transfers. The overall complexity is $O(QC^B)$ during data processing. Hence, this algorithm is only feasible for a small block length $B$ and a small constellation size $C$.
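A direct exhaustive-search version of (16) makes the $C^B$ growth tangible. The sketch below is not the tree-like implementation mentioned above; it simply enumerates all candidates. The QPSK constellation, the sizes, the fixed example channel, and the noiseless received block are assumptions of this example.

```python
import itertools
import numpy as np

def ml_detect(y, A, constellation):
    """Exhaustive search over all C^B candidate blocks for min ||y - A s||."""
    B = A.shape[1]
    best, best_metric = None, np.inf
    for cand in itertools.product(constellation, repeat=B):
        s = np.array(cand)
        metric = np.linalg.norm(y - A @ s)
        if metric < best_metric:
            best, best_metric = s, metric
    return best

rng = np.random.default_rng(5)
Q, B = 6, 4                                            # assumed small sizes
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# Truncated FFT precoder and an assumed fixed channel without spectral nulls.
theta = np.exp(-2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
h = np.array([1.0, 0.4, -0.2j, 0.1])
A = np.diag(np.fft.fft(h, Q)) @ theta                  # composite matrix H~ Theta

s = rng.choice(qpsk, size=B)
y = A @ s                                              # noiseless despread block
s_hat = ml_detect(y, A, qpsk)
num_candidates = len(qpsk) ** B                        # C^B = 256 here
```

Even for this toy setting 256 candidates are inspected per block; doubling $B$ to 8 would already require 65536, which is why the search is restricted to small $B$ and $C$.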

3.2. Joint linear equalization and decoding

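Linear equalizers that perform joint equalization and decoding combine a low complexity with medium performance. A first possibility is to apply a zero-forcing block linear equalizer (ZF-BLE) [25]

$$\mathbf{G}_{ZF} = \left( \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} \right)^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H, \qquad (17)$$

which completely eliminates the ISI, irrespective of the noise level. A second possibility is to apply a minimum mean-square-error block linear equalizer (MMSE-BLE) [25]

$$\mathbf{G}_{MMSE} = \left( \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} + \frac{\sigma_w^2}{\sigma_s^2} \mathbf{I}_B \right)^{-1} \cdot \boldsymbol{\Theta}^H \cdot \tilde{\mathbf{H}}^H, \qquad (18)$$

which minimizes the MSE between the actual transmitted symbol block and its estimate. Here, $\sigma_w^2$ and $\sigma_s^2$ are the noise variance and the information symbol variance, respectively.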
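The two block linear equalizers can be sketched as follows. For brevity the sketch calls NumPy's general linear solver rather than an explicit LU or $LDL^H$ routine; the sizes, the fixed example channel, and the function names `zf_ble` and `mmse_ble` are assumptions of this example.

```python
import numpy as np

def zf_ble(A):
    """ZF-BLE of (17), with A = H~ Theta; solves (A^H A) G = A^H."""
    return np.linalg.solve(A.conj().T @ A, A.conj().T)

def mmse_ble(A, noise_var, symbol_var):
    """MMSE-BLE of (18); the regularization term is (noise_var / symbol_var) I_B."""
    B = A.shape[1]
    reg = (noise_var / symbol_var) * np.eye(B)
    return np.linalg.solve(A.conj().T @ A + reg, A.conj().T)

Q, B = 8, 5                                            # assumed sizes
theta = np.exp(-2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
h = np.array([1.0, 0.4, -0.2j, 0.1])                   # assumed channel, no spectral nulls
A = np.diag(np.fft.fft(h, Q)) @ theta

G_zf = zf_ble(A)
G_mmse = mmse_ble(A, noise_var=0.1, symbol_var=1.0)

# As the noise variance tends to zero, the MMSE-BLE approaches the ZF-BLE.
G_mmse_limit = mmse_ble(A, noise_var=1e-12, symbol_var=1.0)
```

The ZF solution satisfies $\mathbf{G}_{ZF} \cdot \tilde{\mathbf{H}} \cdot \boldsymbol{\Theta} = \mathbf{I}_B$ exactly, while the MMSE solution trades a small residual ISI for reduced noise enhancement.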

Table 2: Complexity of ZF-BLE.

                  Initialization                       Data processing
Multiplications   ^ +3B2Q + f BQ                       $BQ$
Additions         + 3B2Q - 5BQ - B2 3 6                $BQ - B$
Data transfers    2B3Q + 21B2Q + 7BQ - 3B2             $6BQ - 3B$

Table 3: Complexity of MMSE-BLE.

                  Initialization                       Data processing
Multiplications   B6Q + 2 B2Q +7 BQ +,                 $BQ$
Additions         5B2Q - W - B2+ B 22                  $BQ - B$
Data transfers    — + 15B2Q + BQ - 3B2 + 3B + 3 22     $6BQ - 3B$

During the initialization phase, $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ can be computed from the set of multiple linear systems, implicitly shown in (17) and (18), respectively. For the ZF-BLE, the solution of each linear system can be found using the LU decomposition, which relies on Gauss elimination with partial pivoting [24]. For the MMSE-BLE, each linear system can be solved based on the $LDL^H$ decomposition (instead of the LU decomposition), which relies on Gauss elimination without pivoting [24]. During the data processing phase, the equalizers $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ are applied to the received block $\bar{\tilde{\mathbf{y}}}^m[i]$. Tables 2 and 3 summarize the complexity of the ZF- and the MMSE-BLE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(B^3Q)$ during initialization and $O(BQ)$ during data processing.

3.3. Joint decision feedback equalization and decoding

The class of nonlinear equalizers that perform joint decision feedback equalization and decoding lies in between the former categories, both in terms of performance and in complexity. The block decision feedback equalizers (BDFEs) consist of a feedforward section, represented by the matrix W, and a feedback section, represented by the matrix B [26, 27]:

$$\hat{s}^m[i] = \mathrm{slice}\left[W \cdot \tilde{y}^m[i] - B \cdot \hat{s}^m[i]\right]. \qquad (19)$$

The feedforward and feedback sections can be designed according to a ZF or an MMSE criterion. In either case, $B$ should be a strictly upper or lower triangular matrix with zero diagonal entries, in order to feed back decisions in a causal way. To design the decision feedback counterpart of the ZF-BLE, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta$ in (17), that is,

$$\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta = (\Sigma_1 \cdot U_1)^H \cdot \Sigma_1 \cdot U_1, \qquad (20)$$

where $U_1$ is an upper triangular matrix with ones along the diagonal and $\Sigma_1$ is a diagonal matrix with real entries. The ZF-BDFE then follows from

$$W_{ZF} = U_1 \cdot G_{ZF} = \Sigma_1^{-1} \cdot (U_1^H \cdot \Sigma_1)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \qquad B_{ZF} = U_1 - I_B. \qquad (21)$$

The linear feedforward section WZF suppresses the ISI originating from "future" symbols, the so-called precursor ISI, whereas the nonlinear feedback section BZF eliminates the ISI originating from "past" symbols, the so-called postcursor ISI.
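The feedforward expression can be checked by substituting the Cholesky factorization into the ZF-BLE. The short derivation below is a sketch that assumes the ZF-BLE of (17) has the form $G_{ZF} = (\Theta^H \tilde{H}^H \tilde{H} \Theta)^{-1} \Theta^H \tilde{H}^H$, that $\Sigma_1$ is real and diagonal, and that the noise-free received block is $\tilde{y}^m[i] = \tilde{H} \Theta s^m[i]$:

```latex
\begin{align*}
G_{ZF} &= \left(U_1^H \Sigma_1 \Sigma_1 U_1\right)^{-1} \Theta^H \tilde{H}^H
        = U_1^{-1} \Sigma_1^{-1} \left(U_1^H \Sigma_1\right)^{-1} \Theta^H \tilde{H}^H, \\
W_{ZF} &= U_1 G_{ZF}
        = \Sigma_1^{-1} \left(U_1^H \Sigma_1\right)^{-1} \Theta^H \tilde{H}^H, \\
W_{ZF}\, \tilde{H} \Theta\, s^m[i] &= U_1 s^m[i] = s^m[i] + B_{ZF}\, s^m[i].
\end{align*}
```

Under these assumptions, the feedforward output equals the desired block plus a strictly triangular interference term, which is exactly the term that the feedback section subtracts using previous decisions.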

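Likewise, to design the decision feedback counterpart of the MMSE-BLE, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta + (\sigma_w^2/\sigma_s^2) I_B$ in (18), that is,

$$\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2} I_B = (\Sigma_2 \cdot U_2)^H \cdot \Sigma_2 \cdot U_2, \qquad (22)$$

where $U_2$ is an upper triangular matrix with ones along the diagonal, and $\Sigma_2$ is a diagonal matrix with real entries. The MMSE-BDFE can then be calculated as

$$W_{MMSE} = U_2 \cdot G_{MMSE} = \Sigma_2^{-1} \cdot (U_2^H \cdot \Sigma_2)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \qquad B_{MMSE} = U_2 - I_B. \qquad (23)$$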

During the initialization phase, the feedforward and feedback filters of the ZF- and MMSE-BDFE are computed based on (21) and (23), respectively, relying on the Cholesky decomposition [24]. During the data processing phase, the received data is first filtered with the feedforward filter, $W$, and then fed back with the feedback filter, $B$, according to (19). Tables 4 and 5 summarize the complexity of the ZF- and MMSE-BDFE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(B^3 Q)$ during initialization and $O(BQ)$ during data processing. Hence, the nonlinear BDFEs involve the same order of complexity as their linear counterparts.
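The data processing recursion of (19) can be illustrated with a minimal sketch. It assumes QPSK symbols, a strictly upper triangular feedback matrix, and a detection order that starts from the last symbol of the block (which then sees no interference from undecided symbols). The function names `slice_qpsk` and `bdfe_feedback` are hypothetical and not taken from the paper.

```python
def slice_qpsk(x: complex) -> complex:
    """Hard decision: map a soft value to the nearest QPSK point (+-1 +-1j)."""
    return complex(1.0 if x.real >= 0 else -1.0, 1.0 if x.imag >= 0 else -1.0)


def bdfe_feedback(u, B):
    """Decision feedback step of (19).

    u : feedforward output W*y (list of complex values)
    B : strictly upper triangular feedback matrix (list of rows)
    """
    n = len(u)
    s_hat = [0j] * n
    # Symbol k only sees interference from symbols j > k, so decide backwards.
    for k in range(n - 1, -1, -1):
        isi = sum(B[k][j] * s_hat[j] for j in range(k + 1, n))
        s_hat[k] = slice_qpsk(u[k] - isi)
    return s_hat


# Noise-free check: the feedforward output equals (I + B) * s in the ZF case.
s = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
B = [
    [0, 0.5 + 0.2j, -0.3j, 0.1],
    [0, 0, 0.4, -0.2 + 0.1j],
    [0, 0, 0, 0.6j],
    [0, 0, 0, 0],
]
u = [s[k] + sum(B[k][j] * s[j] for j in range(len(s))) for k in range(len(s))]
s_hat = bdfe_feedback(u, B)
```

Each decision costs one inner product with already-decided symbols, so the feedback loop adds on the order of $B^2$ operations per block on top of the feedforward filtering.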

Table 4: Complexity of ZF-BDFE.

Initialization Data processing

Multiplications B3Q B3 13 B2 B —- + 4B2Q + — + — BQ + — + -3 v 6 6 v 2 3 BQ + B2

Additions B3Q +4B2Q + B3 - 6BQ - B2 + 5B 3 6 6 6 BQ + B2 - B

Data transfers 2B3Q + 27B2Q + B3 +4BQ - B2 +4B 6BQ + 6B2 - 3B

Table 5: Complexity of MMSE-BDFE.

Initialization Data processing

Multiplications B3Q 7d_ B3 7 B2 B , —- + - B2Q + — + - BQ + — + - + 1 6 2 v 6 3 Q 2 3 BQ + B2

Additions 7 B3 3 11 7B2Q + - - 3BQ - B2 + TB BQ + B2 - B

Data transfers — +21B2Q + B3 + 5BQ - B2 + 7B +3 2 2 6BQ + 6B2 - 3B

3.4. Separate linear equalization and decoding

Previously, we have only considered joint equalization and decoding of the linear precoding. However, in order to even further reduce the complexity with respect to the block linear equalizers of Section 3.2, equalization and decoding can be performed separately as well:

$$\hat{s}^m[i] = \Theta^H \cdot \tilde{G} \cdot \tilde{y}^m[i], \qquad (24)$$

for which we rely on the para-unitary property of $\Theta$. Here, $\tilde{G}$ performs per-tone linear equalization (PT-LE) only, and tries to restore the precoded block $\tilde{s}^m[i]$, whereas $\Theta^H$ subsequently performs linear decoding only, and tries to restore $s^m[i]$.

The ZF per-tone linear equalizer (ZF-PT-LE), which can be expressed as

$$\tilde{G}_{ZF} = (\tilde{H}^H \cdot \tilde{H})^{-1} \cdot \tilde{H}^H, \qquad (25)$$

perfectly removes the amplitude and phase distortion on every tone, irrespective of the noise level.

The MMSE-PT-LE, which balances amplitude and phase distortion with noise enhancement on every tone, can be expressed as

$$\tilde{G}_{MMSE} = (\tilde{H}^H \cdot \tilde{H} + \sigma_w^2 R_{\tilde{s}}^{-1})^{-1} \cdot \tilde{H}^H, \qquad (26)$$

where $R_{\tilde{s}} := E\{\tilde{s}^m[i] \cdot \tilde{s}^m[i]^H\} = \sigma_s^2 \Theta \cdot \Theta^H$ is the covariance matrix of $\tilde{s}^m[i]$. The MMSE equalizer only decouples into $Q$ parallel and independent single-tap equalizers if we neglect the color in the precoded symbols, that is, $R_{\tilde{s}} \approx \sigma_s^2 I_Q$.

During the initialization phase, $\tilde{G}_{ZF}$ and $\tilde{G}_{MMSE}$ are calculated from (25) and (26), respectively, where the matrix inversion reduces to $Q$ parallel scalar divisions. During the data processing phase, the received data is separately equalized and decoded, according to (24). Furthermore, the linear decoding step relies on implementation-efficient IDCT or IFFT operations. Tables 6 and 7 summarize the complexity of the ZF- and MMSE-PT-LE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(Q)$ during initialization and $O(Q \log_2(Q))$ during data processing.
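Because the FD channel matrix is diagonal, (25) and (26) reduce to one scalar coefficient per tone once the color of the precoded symbols is neglected. The following sketch illustrates this under that white-symbol assumption; the function names and the example channel values are hypothetical.

```python
def zf_pt_le(h):
    """ZF per-tone coefficients: (h^* h)^{-1} h^* = 1 / h for every tone."""
    return [1 / hq for hq in h]


def mmse_pt_le(h, sigma_w2, sigma_s2):
    """MMSE per-tone coefficients, assuming white precoded symbols (R ~ sigma_s^2 I)."""
    return [hq.conjugate() / (abs(hq) ** 2 + sigma_w2 / sigma_s2) for hq in h]


# Three tones: a strong tone, a moderate tone, and a tone close to a channel null.
h = [1 + 0j, 0.5j, 0.01 + 0j]
g_zf = zf_pt_le(h)
g_mmse = mmse_pt_le(h, sigma_w2=0.1, sigma_s2=1.0)

for hq, gz, gm in zip(h, g_zf, g_mmse):
    print(f"|h|={abs(hq):5.2f}  |g_zf|={abs(gz):7.2f}  |g_mmse|={abs(gm):5.2f}")
```

On the near-null tone the ZF coefficient becomes very large, which is the noise enhancement mechanism, whereas the MMSE coefficient stays bounded.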

4. EXTENSION TO MULTIPLE ANTENNAS

As shown in Sections 2 and 3, MCBS-CDMA successfully addresses the challenges of broadband cellular downlink communications. However, the spectral efficiency of single-antenna MCBS-CDMA is still limited by the received signal-to-noise ratio (SNR) and cannot be further improved by traditional communication techniques. As opposed to single-antenna systems, MIMO systems that deploy NT transmit and Nr receive antennas enable an Nmin-fold capacity increase in rich scattering environments, where Nmin = min{NT,Nr} is called the multiplexing gain [28, 29, 30]. Besides the time, frequency, and code dimensions, MIMO systems create an extra spatial dimension that allows to increase the spectral efficiency and/or to improve the performance. On the one hand, space-division multiplexing (SDM) techniques achieve high spectral efficiency by exploiting the spatial multiplexing gain [31] (see also [32]). On the other hand, space-time coding (STC) techniques achieve high quality-of-service (QoS) by exploiting diversity and coding gains [33, 34, 35]. Besides the leverages they offer, MIMO systems also sharpen the challenges of broadband cellular downlink communications. First, time dispersion and ISI are now caused by NTNR frequency-selective multipath fading channels instead of just 1. Second, MUI originates from NTM sources instead of just M. Third, the presence of multiple antennas seriously impairs a low-complexity implementation of the MS. To tackle these challenges, we will demonstrate the synergy between our MCBS-CDMA waveform and MIMO signal processing. In particular, we focus on a space-time block-coded (STBC) MCBS-CDMA transmission, but the general principles apply equally well to a space-time trellis coded or a space-division multiplexed MCBS-CDMA transmission.

Table 6: Complexity of ZF-PT-LE.

                   Initialization   Data processing
Multiplications    $2Q$             $Q(\frac{1}{2}\log_2(Q) + 1)$
Additions          -                $Q\log_2(Q)$
Data transfers     $6Q$             $3Q(\frac{3}{2}\log_2(Q) + 1)$

Table 7: Complexity of MMSE-PT-LE.

                   Initialization   Data processing
Multiplications    $2Q + 1$         $Q(\frac{1}{2}\log_2(Q) + 1)$
Additions          $Q$              $Q\log_2(Q)$
Data transfers     $9Q + 3$         $3Q(\frac{3}{2}\log_2(Q) + 1)$

This section is organized as follows. Section 4.1 details the STBC MCBS-CDMA transmission scheme for the case of NT = 2 transmit antennas. Section 4.2 demonstrates how the user orthogonality preservation property of MCBS-CDMA translates to the MIMO case, which allows to convert a difficult multiuser MIMO detection problem into an equivalent but simpler single-user MIMO equalization problem. Finally, Section 4.3 explains how space-time decoding and equalization can then be performed for each user separately.

4.1. Space-time block-coded MCBS-CDMA transmission

The block diagram in Figure 3 describes the STBC MCBS-CDMA downlink transmission scheme (where only the $m$th user is explicitly shown), which transforms the $M$ user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$ into $N_T$ ST coded multiuser chip sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ with a rate $1/T_c$. For conciseness, we limit ourselves to the case of $N_T = 2$ transmit antennas with rate $R = 1$ space-time block codes. Note, however, that the proposed technique can be easily extended to the case of $N_T > 2$ transmit antennas with $R = 1/2$ space-time block codes, by resorting to the generalized orthogonal designs of [35]. As for the single-antenna case, the information symbols are first grouped into blocks of $B$ symbols and linearly precoded. Unlike the traditional approach of performing ST encoding at the scalar symbol level, we perform ST encoding at the symbol block level; this was also done in, for example, [36]. Our ST encoder operates in the FD and takes two consecutive precoded symbol blocks $\{\tilde{s}^m[2i], \tilde{s}^m[2i+1]\}$ to output the following $2Q \times 2$ matrix of ST coded symbol blocks:

$$\begin{bmatrix} \tilde{s}_1^m[2i] & \tilde{s}_1^m[2i+1] \\ \tilde{s}_2^m[2i] & \tilde{s}_2^m[2i+1] \end{bmatrix} = \begin{bmatrix} \tilde{s}^m[2i] & -\tilde{s}^m[2i+1]^* \\ \tilde{s}^m[2i+1] & \tilde{s}^m[2i]^* \end{bmatrix}. \qquad (27)$$

At each time interval $i$, the ST coded symbol blocks $\tilde{s}_1^m[i]$ and $\tilde{s}_2^m[i]$ are forwarded to the first and the second transmit antenna, respectively. From (27), we can easily verify that the transmitted symbol block at time instant $2i + 1$ from one antenna is the conjugate of the transmitted symbol block at time instant $2i$ from the other antenna (with a possible sign change). This corresponds to a per-tone implementation of the classical Alamouti scheme for frequency-flat fading channels [34]. As we will show later, this property allows for deterministic transmit stream separation at the receiver.

After ST encoding, the resulting symbol block sequences $\{\tilde{s}_{n_t}^m[i]\}_{n_t=1}^{N_T}$ are block-spread and code-division multiplexed with those of the other users:

$$\tilde{x}_{n_t}[n] = \sum_{m=1}^{M} \tilde{s}_{n_t}^m[i] \, c^m[n], \qquad n = iN + n', \quad 0 \le n' \le N - 1. \qquad (28)$$

At this point, it is important to note that each of the $N_T$ parallel block sequences is block spread by the same composite code sequence $c^m[n]$, guaranteeing an efficient utilization of the available code space. As will become apparent later, this property allows for deterministic user separation at every receive antenna. After IFFT transformation and the addition of some form of transmit redundancy

$$u_{n_t}[n] = T \cdot F_Q^H \cdot \tilde{x}_{n_t}[n], \qquad (29)$$

the corresponding scalar sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ are transmitted over the air at a rate $1/T_c$.

4.2. MUI-resilient MIMO reception

The block diagram in Figure 4 describes the reception scheme for the MS of interest, which transforms the different received sequences $\{v_{n_r}[n]\}_{n_r=1}^{N_R}$ into an estimate of the desired user's data sequence $\hat{s}^m[i]$. After transmit redundancy removal and FFT transformation, we obtain the multiantenna counterpart of (13):

$$\tilde{Y}_{n_r}[i] = \sum_{n_t=1}^{N_T} \tilde{H}_{n_r,n_t} \cdot \tilde{X}_{n_t}[i] + \tilde{Z}_{n_r}[i], \qquad (30)$$

where $\tilde{Y}_{n_r}[i] := [\tilde{y}_{n_r}[iN], \ldots, \tilde{y}_{n_r}[(i+1)N - 1]]$ stacks $N$ consecutive received chip blocks $\tilde{y}_{n_r}[n]$ at the $n_r$th receive antenna, $\tilde{H}_{n_r,n_t}$ is the diagonal FD channel matrix from the $n_t$th transmit to the $n_r$th receive antenna, and $\tilde{X}_{n_t}[i]$ and $\tilde{Z}_{n_r}[i]$ are defined similarly to $\tilde{Y}_{n_r}[i]$. From (28) and (30), we can conclude that our transceiver retains the user orthogonality at each receive antenna, irrespective of the underlying frequency-selective multipath channels. As in the single-antenna case, a low-complexity block despreading operation with the desired user's composite code vector $c^m[i]$ deterministically removes the MUI at each receive antenna:

$$\tilde{y}_{n_r}^m[i] := \tilde{Y}_{n_r}[i] \cdot c^m[i]^* = \sum_{n_t=1}^{N_T} \tilde{H}_{n_r,n_t} \cdot \tilde{s}_{n_t}^m[i] + \tilde{z}_{n_r}^m[i]. \qquad (31)$$

Hence, our transceiver successfully converts (through block despreading) a multiuser MIMO detection problem into an equivalent single-user MIMO equalization problem.

Figure 3: STBC MCBS-CDMA downlink transmission scheme.

Figure 4: MUI-resilient STBC/MCBS-CDMA MIMO reception scheme.

4.3. Single-user space-time decoding and equalization

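After MUI elimination, the information blocks $s^m[i]$ still need to be decoded from the received block despread sequences $\{\tilde{y}_{n_r}^m[i]\}_{n_r=1}^{N_R}$. Our ST decoder decomposes into three steps: an initial ST decoding step, a transmit stream separation step for each receive antenna, and, finally, a receive antenna combining step.

The initial ST decoding step considers two consecutive symbol blocks $\tilde{y}_{n_r}^m[2i]$ and $\tilde{y}_{n_r}^m[2i+1]$, both satisfying the block input/output relationship of (31). By exploiting the ST code structure of (27) as in [36], we arrive at

$$\tilde{y}_{n_r}^m[2i] = \tilde{H}_{n_r,1} \cdot \tilde{s}_1^m[2i] + \tilde{H}_{n_r,2} \cdot \tilde{s}_2^m[2i] + \tilde{z}_{n_r}^m[2i], \qquad (32)$$

$$\tilde{y}_{n_r}^m[2i+1]^* = -\tilde{H}_{n_r,1}^* \cdot \tilde{s}_2^m[2i] + \tilde{H}_{n_r,2}^* \cdot \tilde{s}_1^m[2i] + \tilde{z}_{n_r}^m[2i+1]^*. \qquad (33)$$

Combining (32) and (33) into a single block matrix form, we obtain

$$\underbrace{\begin{bmatrix} \tilde{y}_{n_r}^m[2i] \\ \tilde{y}_{n_r}^m[2i+1]^* \end{bmatrix}}_{\bar{r}_{n_r}^m[i]} = \underbrace{\begin{bmatrix} \tilde{H}_{n_r,1} & \tilde{H}_{n_r,2} \\ \tilde{H}_{n_r,2}^* & -\tilde{H}_{n_r,1}^* \end{bmatrix}}_{\bar{H}_{n_r}} \cdot \begin{bmatrix} \tilde{s}^m[2i] \\ \tilde{s}^m[2i+1] \end{bmatrix} + \underbrace{\begin{bmatrix} \tilde{z}_{n_r}^m[2i] \\ \tilde{z}_{n_r}^m[2i+1]^* \end{bmatrix}}_{\bar{\eta}_{n_r}^m[i]}, \qquad (34)$$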

where $\tilde{s}_1^m[2i] = \tilde{s}^m[2i]$ and $\tilde{s}_2^m[2i] = \tilde{s}^m[2i+1]$ follow from (27). From the structure of $\bar{H}_{n_r}$ in (34), we can deduce that our transceiver retains the orthogonality among transmit streams at each receive antenna for each tone separately, regardless of the underlying frequency-selective multipath channels. A similar property was also encountered in the classical Alamouti scheme but only for single-user frequency-flat fading multipath channels [34].

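The transmit stream separation step relies on this property to deterministically remove the transmit stream interference through low-complexity linear processing. We define the $Q \times Q$ matrix $\bar{D}_{n_r}$ with nonnegative diagonal entries as $\bar{D}_{n_r} := [\tilde{H}_{n_r,1} \cdot \tilde{H}_{n_r,1}^* + \tilde{H}_{n_r,2} \cdot \tilde{H}_{n_r,2}^*]^{1/2}$. From (34), we can verify that the channel matrix $\bar{H}_{n_r}$ satisfies $\bar{H}_{n_r}^H \cdot \bar{H}_{n_r} = I_2 \otimes \bar{D}_{n_r}^2$, where $\otimes$ stands for the Kronecker product. Based on $\bar{H}_{n_r}$ and $\bar{D}_{n_r}$, we can construct a unitary matrix $\bar{U}_{n_r} := \bar{H}_{n_r} \cdot (I_2 \otimes \bar{D}_{n_r}^{-1})$, which satisfies $\bar{U}_{n_r}^H \cdot \bar{U}_{n_r} = I_{2Q}$ and $\bar{U}_{n_r}^H \cdot \bar{H}_{n_r} = I_2 \otimes \bar{D}_{n_r}$. Performing unitary combining on (34) (through $\bar{U}_{n_r}^H$) collects the transmit antenna diversity at the $n_r$th receive antenna:

$$\begin{bmatrix} \acute{y}_{n_r}^m[2i] \\ \acute{y}_{n_r}^m[2i+1] \end{bmatrix} := \bar{U}_{n_r}^H \cdot \bar{r}_{n_r}^m[i] = \begin{bmatrix} \bar{D}_{n_r} \cdot \tilde{s}^m[2i] \\ \bar{D}_{n_r} \cdot \tilde{s}^m[2i+1] \end{bmatrix} + \underbrace{\begin{bmatrix} \acute{z}_{n_r}^m[2i] \\ \acute{z}_{n_r}^m[2i+1] \end{bmatrix}}_{\acute{\eta}_{n_r}^m[i]}, \qquad (35)$$

where the resulting noise $\acute{\eta}_{n_r}^m[i] := \bar{U}_{n_r}^H \cdot \bar{\eta}_{n_r}^m[i]$ is still white with variance $\sigma_w^2$. Since multiplying with a unitary matrix preserves ML optimality, we can deduce from (35) that the symbol blocks $\tilde{s}^m[2i]$ and $\tilde{s}^m[2i+1]$ can be decoded separately in an optimal way. As a result, the different symbol blocks $\tilde{s}^m[i]$ can be detected independently from

$$\acute{y}_{n_r}^m[i] = \bar{D}_{n_r} \cdot \tilde{s}^m[i] + \acute{z}_{n_r}^m[i]. \qquad (36)$$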
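The transmit stream separation can be checked numerically on a per-tone basis. The sketch below assumes the block-level Alamouti mapping of (27) (antenna 1 sends $\tilde{s}[2i]$ then $-\tilde{s}[2i+1]^*$, antenna 2 sends $\tilde{s}[2i+1]$ then $\tilde{s}[2i]^*$), a single receive antenna, diagonal FD channels that stay constant over two block intervals, and no noise. The function names and numerical values are hypothetical.

```python
import math


def st_encode(s_a, s_b):
    """Per-tone Alamouti mapping: each antenna gets (block at 2i, block at 2i+1)."""
    ant1 = (list(s_a), [-x.conjugate() for x in s_b])
    ant2 = (list(s_b), [x.conjugate() for x in s_a])
    return ant1, ant2


def st_decode(y0, y1, h1, h2):
    """Unitary combining per tone: returns (D*s_a, D*s_b, D) for noise-free input."""
    out_a, out_b, d = [], [], []
    for q in range(len(y0)):
        dq = math.sqrt(abs(h1[q]) ** 2 + abs(h2[q]) ** 2)
        r0, r1 = y0[q], y1[q].conjugate()  # second received block is conjugated
        out_a.append((h1[q].conjugate() * r0 + h2[q] * r1) / dq)
        out_b.append((h2[q].conjugate() * r0 - h1[q] * r1) / dq)
        d.append(dq)
    return out_a, out_b, d


# Two tones, diagonal FD channels from each transmit antenna.
h1 = [1 + 1j, 0.5 + 0j]
h2 = [0.3j, -2 + 1j]
s_a = [1 + 1j, -1 - 1j]   # precoded block at time 2i
s_b = [1 - 1j, -1 + 1j]   # precoded block at time 2i + 1

ant1, ant2 = st_encode(s_a, s_b)
y0 = [h1[q] * ant1[0][q] + h2[q] * ant2[0][q] for q in range(2)]
y1 = [h1[q] * ant1[1][q] + h2[q] * ant2[1][q] for q in range(2)]
out_a, out_b, d = st_decode(y0, y1, h1, h2)
```

After combining, each output tone equals the transmitted value scaled by the real gain $\sqrt{|h_1|^2 + |h_2|^2}$, with no contribution from the other stream.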

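Stacking the blocks from the different receive antennas $\{\acute{y}_{n_r}^m[i]\}_{n_r=1}^{N_R}$ for the final receive antenna combining step, we obtain

$$\underbrace{\begin{bmatrix} \acute{y}_1^m[i] \\ \vdots \\ \acute{y}_{N_R}^m[i] \end{bmatrix}}_{\check{y}^m[i]} = \underbrace{\begin{bmatrix} \bar{D}_1 \\ \vdots \\ \bar{D}_{N_R} \end{bmatrix}}_{\check{H}} \cdot \tilde{s}^m[i] + \underbrace{\begin{bmatrix} \acute{z}_1^m[i] \\ \vdots \\ \acute{z}_{N_R}^m[i] \end{bmatrix}}_{\check{z}^m[i]}. \qquad (37)$$

Table 8: Parameters of the ITU pedestrian B channel.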

At this point, we have only collected the transmit antenna diversity at each receive antenna, but still need to collect the receive antenna diversity. We define the $Q \times Q$ matrix $\hat{D}$ with nonnegative diagonal entries as $\hat{D} := [\sum_{n_t=1}^{N_T} \sum_{n_r=1}^{N_R} \tilde{H}_{n_r,n_t} \cdot \tilde{H}_{n_r,n_t}^*]^{1/2}$. From (37), we can verify that $\check{H}^H \cdot \check{H} = \hat{D}^2$. Based on $\check{H}$ and $\hat{D}$, we can construct a tall unitary matrix $\check{U} := \check{H} \cdot \hat{D}^{-1}$, which satisfies $\check{U}^H \cdot \check{U} = I_Q$ and $\check{U}^H \cdot \check{H} = \hat{D}$. Gathering the receive antenna diversity through multiplying (37) with $\check{U}^H$, we finally obtain

$$\hat{y}^m[i] := \check{U}^H \cdot \check{y}^m[i] = \hat{D} \cdot \Theta \cdot s^m[i] + \hat{z}^m[i], \qquad (38)$$

where the resulting noise $\hat{z}^m[i] := \check{U}^H \cdot \check{z}^m[i]$ is still white with variance $\sigma_w^2$. Since the multiplication with a tall unitary matrix, which does not remove information, preserves ML decoding optimality, the blocks $s^m[i]$ can be optimally decoded from (38). Furthermore, since (38) has the same structure as its single-antenna counterpart in (14), the design of the linear precoder $\Theta$ in Section 2.3 and the different equalization options that we have discussed in Section 3 can be applied here as well. Specifically, with $L_t$ the number of taps of the underlying multipath channels, the ML detector achieves the full diversity order of $N_T N_R L_t$, hence, both multiantenna and multipath diversity. The transmit antenna diversity is enabled at the transmitter by the space-time encoder and collected at each receive antenna by the transmit stream separation step. The receive antenna diversity is collected by the final receive antenna combining step. The multipath diversity is enabled at the transmitter by the linear precoder, and extracted at the receiver by the ML joint equalization and decoding step.

5. SIMULATION RESULTS

We consider the downlink of an MCBS-CDMA system, operating at a carrier frequency of $F_c = 2$ GHz and transmitting with a chip rate of $R_c = 1/T_c = 4.096$ MHz. Each user's bit sequence is QPSK modulated with $n_b = 2$ bits per symbol. To assess the performance of the MCBS-CDMA system, we have selected ITU's outdoor-to-indoor and pedestrian B channel model, which models typical urban propagation environments. The main parameters of this tapped delay line model are summarized in Table 8. Hence, the multipath channel has $L_t = 6$ Rayleigh fading taps with a maximum excess delay of $T_{max} = 3700$ ns, resulting in a minimum channel order of $L_{min} = \lceil T_{max}/T_c \rceil = 16$. To satisfy the IBI removal condition $L \ge L_{min}$, we choose the CP length $L = 32$. This specific design can even handle a maximum excess delay of $T_g = L T_c = 7812.5$ ns, with $T_g$ the guard time. However, a larger transmit redundancy can be used to handle more ICI.
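The channel order and guard time quoted above follow directly from the chip rate and the maximum excess delay. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
import math

R_c = 4.096e6        # chip rate in Hz
T_max = 3700e-9      # maximum excess delay of the pedestrian B channel in s
L = 32               # chosen cyclic prefix length in chips

# Minimum channel order: excess delay expressed in chip periods, rounded up.
L_min = math.ceil(T_max * R_c)

# Guard time covered by the cyclic prefix, in nanoseconds.
T_g_ns = L / R_c * 1e9

print(L_min, T_g_ns)
```

The excess delay spans about 15.2 chip periods, so the chosen prefix of 32 chips leaves roughly a factor of two of margin.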

Tap   Excess delay (ns)   Average relative power (dB)
1     0                   0.0
2     200                 -0.9
3     800                 -4.9
4     1200                -8.0
5     2300                -7.8
6     3700                -23.9

Table 9: Main MCBS-CDMA system parameters.

Carrier frequency Fc = 2 GHz

Chip rate Rc = 4.096 MHz

Modulation format nb = 2 (QPSK)

Initial block length B = 224

Cyclic prefix length L = 32

Number of subcarriers Q = 256

Transmitted block length K = 288

Symbol rate Rs = 199 kHz

Conversely, a smaller transmit redundancy is allowed if less ICI has to be handled. To limit the overhead, we choose the number of subcarriers $Q = 8L = 256$, leading to a transmitted block length $K = Q + L = 288$. Hence, the information symbols are parsed into blocks of $B = Q - L = 224$ symbols and linearly precoded into blocks of size $Q = 256$. The $Q \times B$ precoding matrix, $\Theta$, constitutes the first $B$ columns of the $Q \times Q$ DCT matrix [22]. The precoded symbol blocks are subsequently block spread by a real orthogonal Walsh-Hadamard spreading code of length $N = 16$, along with a complex random scrambling code. For the above parameters, this results in a channel symbol rate of $R_s = (B/K)(1/N)R_c = 199$ kHz. For convenience, the main MCBS-CDMA system parameters are summarized in Table 9.
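The block sizes and the symbol rate can be reproduced from the cyclic prefix length, the spreading factor, and the chip rate. A short sketch of the calculation, with variable names chosen for illustration:

```python
L = 32               # cyclic prefix length
N = 16               # spreading code length
R_c = 4.096e6        # chip rate in Hz

Q = 8 * L            # number of subcarriers
K = Q + L            # transmitted block length (subcarriers plus cyclic prefix)
B = Q - L            # information symbols per block before linear precoding

# Per-user symbol rate: B useful symbols per K chips, spread by a factor N.
R_s = (B / K) * (1 / N) * R_c

print(Q, K, B, round(R_s / 1e3, 1))
```

The ratio $B/K = 7/9$ captures the combined overhead of the redundant precoding and the cyclic prefix.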

In the following, we show the average bit error rate (BER) versus the average received SNR for three different test cases. Here, the SNR is defined as the average received energy per bit of the desired user versus the noise power spectral density. Section 5.1 compares the different single-user equalization options, from a BER performance as well as a complexity point of view. Section 5.2 compares the BER performance of the proposed MCBS-CDMA transceiver with two competing CDMA transceivers. Finally, Section 5.3 discusses the BER performance of the STBC-MCBS-CDMA transceiver in different propagation environments.

5.1. Comparison of different equalization options

We test the different equalization options, discussed in Section 3, for a fully-loaded MCBS-CDMA system with M = 16 active users.

Figure 5 compares the performance of the different block linear equalizers (BLEs) and BDFEs that perform joint equalization and decoding. As a reference, the performance of a system without linear precoding (uncoded) as well as the optimal ML performance are also shown. Clearly, the system without linear precoding only achieves diversity 1, whereas ML detection achieves the full frequency-diversity gain $L_t = 6$. The ZF-BLE performs worse than the uncoded system at low SNR but better at high SNR (SNR > 9 dB). The MMSE-BLE always outperforms the uncoded system and achieves a diversity gain between 1 and $L_t = 6$. At a BER of $10^{-3}$, it realizes a 3 dB gain compared to its ZF counterpart. The nonlinear ZF- and MMSE-BDFEs outperform their respective linear counterparts, although this effect is more pronounced for the ZF than for the MMSE criterion. For a target BER of $10^{-3}$, the MMSE-BDFE exhibits a 1.9 dB gain relative to the MMSE-BLE, whereas the ZF-BDFE exhibits a 4.2 dB gain relative to the ZF-BLE. Furthermore, the MMSE-BDFE marginally outperforms the ZF-BDFE by 0.7 dB, and comes within 1.4 dB of the optimal ML detector.

Figure 5: Performance comparison of joint block linear equalization (BLE) and decoding versus joint block decision feedback equalization (BDFE) and decoding for a fully-loaded MCBS-CDMA system with M = 16 users. Both the ZF and the MMSE criteria are considered. Uncoded and ML performances are shown as a reference.

Figure 6: Performance comparison of separate PT-LE and decoding versus joint block linear equalization (BLE) and decoding for a fully-loaded MCBS-CDMA system with M = 16 users. Both the ZF and the MMSE criteria are considered. Uncoded and ML performances are shown as a reference.

Figure 6 compares the performance of separate PT-LE and decoding versus joint block linear equalization (BLE) and decoding, both of which perform linear equalization. On the one hand, the ZF-PT-LE always performs worse than the uncoded system, due to the excessive noise enhancement caused by the presence of channel nulls. For a target BER of $10^{-2}$, the ZF-BLE outperforms its corresponding ZF-PT-LE by 7.4 dB. On the other hand, the MMSE-PT-LE performs within 0.3 dB of its corresponding MMSE-BLE, and, thus, achieves a diversity gain between 1 and $L_t = 6$. The MMSE-BLE, in turn, outperforms the uncoded system by 4.8 dB and comes within 2.7 dB of the optimal ML detector.

Tables 10 and 11 summarize the complexity results for the different MCBS-CDMA equalization options. Table 10 compares the initialization complexity of the different equalization options. The initialization complexity of the ZF-BLE, which is similar to that of the ZF-BDFE, involves an operation count of 998 Mmpys and 998 Madds, and a data transfer count of 6.0 Gdts. The initialization complexity of the MMSE-BLE, which is similar to that of the MMSE-BDFE, involves 2 times fewer multiplications, 30 times fewer additions, and 3.7 times fewer data transfers. Specifically, it amounts to an operation count of 512 Mmpys and 32 Madds, and a data transfer count of 1.6 Gdts. On the other hand, the MMSE-PT-LE involves an initialization complexity that is between 5 and 6 orders of magnitude smaller than that of its corresponding MMSE-BLE. Specifically, its initialization complexity amounts to an operation count of 0.5 kmpys and 0.3 kadds, and a data transfer count of 2.3 kdts.

Table 10: Comparison of the initialization complexity of the different MCBS-CDMA equalization options.

Equalizer      mpys      adds      dts
ML             -         -         -
ZF-BLE         998 M     998 M     6.0 G
MMSE-BLE       512 M     32 M      1.6 G
ZF-BDFE        1.0 G     1.0 G     6.1 G
MMSE-BDFE      527 M     47 M      1.7 G
ZF-PT-LE       0.5 k     -         1.5 k
MMSE-PT-LE     0.5 k     0.3 k     2.3 k

Table 11: Comparison of the data processing complexity of the different MCBS-CDMA equalization options.

Equalizer      mpys/s                    adds/s                    dts/s
ML             $1.7 \cdot 10^{131}$ G    $3.9 \cdot 10^{131}$ G    $1.5 \cdot 10^{132}$ G
ZF-BLE         51 M                      51 M                      305 M
MMSE-BLE       51 M                      51 M                      305 M
ZF-BDFE        96 M                      95 M                      573 M
MMSE-BDFE      96 M                      95 M                      573 M
ZF-PT-LE       1 M                       2 M                       9 M
MMSE-PT-LE     1 M                       2 M                       9 M

Table 11 compares the data processing complexity of the different equalization options. Note that the equalizer block operates at a rate which is B times lower than the symbol rate Rs, that is, Req = Rs/B = 889 Hz. The data processing complexity of the optimal ML algorithm is astronomically high, which certainly prohibits implementation, even on the most advanced quantum computers. The BLEs have a data processing complexity, which amounts to an operation count of 51 Mmpys/s and 51 Madds/s, and a data transfer bandwidth of 305 Mdts/s. On the one hand, the data processing complexity of the BDFEs is approximately twice that of the BLEs. On the other hand, the data processing complexity of the PT-LEs is roughly between 1 and 2 orders of magnitude lower than that of the corresponding BLEs. Specifically, it amounts to an operation count of 1 Mmpys/s and 2 Madds/s, and a data transfer bandwidth of 9 Mdts/s.
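The data processing figures of Table 11 can be approximately reproduced by multiplying per-block operation counts by the equalizer block rate. The sketch below assumes that the per-block counts are those of the data processing columns of Tables 2 and 6, with coefficients 1/2 and 3/2 on the $\log_2(Q)$ terms of the PT-LE; these coefficients are an interpretation of the tables rather than values stated in the text.

```python
import math

B, Q, K, N = 224, 256, 288, 16
R_c = 4.096e6                      # chip rate in Hz

R_s = (B / K) / N * R_c            # symbol rate, about 199 kHz
R_eq = R_s / B                     # equalizer block rate, about 889 Hz

lg = math.log2(Q)
per_block = {
    # Block linear equalizer: one B x Q matrix-vector product per block.
    "BLE": {"mpys": B * Q, "adds": B * Q - B, "dts": 6 * B * Q - 3 * B},
    # Per-tone equalizer followed by a fast inverse transform (assumed counts).
    "PT-LE": {"mpys": Q * (lg / 2 + 1), "adds": Q * lg, "dts": 3 * Q * (1.5 * lg + 1)},
}

# Operations per second, expressed in millions.
rates_M = {
    name: {op: count * R_eq / 1e6 for op, count in ops.items()}
    for name, ops in per_block.items()
}

for name, ops in rates_M.items():
    print(name, {op: round(v, 1) for op, v in ops.items()})
```

Under these assumptions the BLE rates come out near 51, 51, and 305 million per second, and the PT-LE rates near 1, 2, and 9 million per second.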

5.2. Comparison of different CDMA transceivers

In the following, we compare three different CDMA transceivers.

(1) The first transceiver applies the downlink DS-CDMA transmission scheme used in 3G cellular standards, performing classical symbol spreading. The receiver employs either a classical RAKE combiner or an MMSE time-domain chip equalizer (TD-CE) [3, 4, 5, 6, 7, 8] based on perfect CSI. The number of fingers in the RAKE combiner equals $L_t = 6$, while the order of the chip equalizer equals $Q_c = 23$. The bandwidth efficiency of this first transceiver, supporting $M_1$ users, can be calculated as $\varepsilon_1 = M_1/N$, where $N$ is the length of the Walsh-Hadamard spreading codes.

(2) The second transceiver applies the downlink MC-CDMA transmission scheme, performing classical symbol spreading followed by OFDM modulation [14, 15, 16]. The receiver employs an MMSE frequency-domain chip equalizer (FD-CE) based on perfect CSI. The bandwidth efficiency of this second transceiver, supporting $M_2$ users, can be calculated as $\varepsilon_2 = \varepsilon_{MC-CDMA} = M_2 B_2/(B_2 N + L)$, where $B_2$ is the initial block length and $Q_2 = B_2 N$ is the number of subcarriers. The overhead of $L$ stems from the CP for IBI removal.

(3) The third transceiver is our MCBS-CDMA transceiver that we have derived in Section 2, combining block-spread CDMA and linearly-precoded OFDM. The receiver employs an MMSE PT-LE or ML detection. As discussed in Section 2.1, the bandwidth efficiency of this third transceiver, supporting $M_3$ users, can be calculated as $\varepsilon_3 = \varepsilon_{MCBS-CDMA} = M_3 B_3/(N(B_3 + 2L))$, where $B_3$ is the initial block length and $Q_3 = B_3 + L$ is the number of tones. The overhead of $2L$ stems from the redundant linear precoding, on the one hand, and the CP, on the other.

Figure 7: Comparison of DS-CDMA, MC-CDMA, and MCBS-CDMA for small system load with M1 = 3, M2 = 3, and M3 = 4 users, respectively: RAKE and MMSE-TD-CE for DS-CDMA; MMSE-FD-CE for MC-CDMA; MMSE-PT-LE and ML for MCBS-CDMA.

In order to make a fair comparison between the three transceivers, we should force their respective bandwidth efficiencies to be the same, that is, $\varepsilon_1 = \varepsilon_2 = \varepsilon_3$. This leads to the following relationship between the number of users to be supported by the different transceivers: $M_2 = ((B_2 N + L)/(B_2 N)) M_1$, and $M_3 = ((B_3 + 2L)/B_3) M_1$. With $N = 16$, $L = 32$, $Q_2 = Q_3 = 8L = 256$, $B_2 = 16$, and $B_3 = 224$, we can derive that $M_2 = (9/8) M_1$ and $M_3 = (9/7) M_1$. Furthermore, we ensure that the total transmit power is the same for the different transceivers.
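The user-count ratios and the resulting bandwidth efficiencies can be verified with exact rational arithmetic. The sketch below uses the efficiency expressions of the three transceivers; the helper name `efficiencies` is hypothetical, and the integer user counts passed to it are the ones used in the simulations.

```python
from fractions import Fraction

N, L, B2, B3 = 16, 32, 16, 224

# Ratios that equalize the bandwidth efficiencies of the three transceivers.
ratio2 = Fraction(B2 * N + L, B2 * N)   # M2 / M1
ratio3 = Fraction(B3 + 2 * L, B3)       # M3 / M1


def efficiencies(M1, M2, M3):
    """Bandwidth efficiency of DS-CDMA, MC-CDMA, and MCBS-CDMA for given user counts."""
    e1 = Fraction(M1, N)
    e2 = Fraction(M2 * B2, B2 * N + L)
    e3 = Fraction(M3 * B3, N * (B3 + 2 * L))
    return e1, e2, e3


print(ratio2, ratio3)
print([float(e) for e in efficiencies(3, 3, 4)])     # small system load
print([float(e) for e in efficiencies(12, 14, 16)])  # large system load
```

Since the exact ratios yield non-integer user counts (for example, 13.5 and about 15.4 users for $M_1 = 12$), the simulated user counts can only match the efficiencies approximately.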

Figure 7 compares the performance of the different transceivers for a small system load with $M_1 = 3$, $M_2 = 3$, and $M_3 = 4$ users, respectively ($\varepsilon_1 \approx \varepsilon_2 \approx \varepsilon_3$). The DS-CDMA RAKE receiver starts flooring off at $10^{-3}$, due to ISI/ICI and associated MUI. The DS-CDMA MMSE-TD-CE actively suppresses these interferences and achieves a significant performance improvement compared to the RAKE. On the other hand, the MC-CDMA MMSE-FD-CE has the same performance as the DS-CDMA MMSE-TD-CE at low SNR (SNR < 8 dB), but clearly outperforms it at high SNR. Furthermore, the MCBS-CDMA MMSE-PT-LE, which deterministically removes the MUI but still suffers from ISI, performs worse than both the DS-CDMA MMSE-TD-CE and the MC-CDMA MMSE-FD-CE. Specifically, for a target BER of $10^{-4}$, the DS-CDMA MMSE-TD-CE realizes a 0.5 dB gain compared to the MCBS-CDMA MMSE-PT-LE, whereas the MC-CDMA MMSE-FD-CE realizes a 2.8 dB gain. Finally, the optimal MCBS-CDMA ML achieves the full diversity gain of $L_t = 6$.

Figure 8: Comparison of DS-CDMA, MC-CDMA, and MCBS-CDMA for large system load with M1 = 12, M2 = 14, and M3 = 16 users, respectively: RAKE and MMSE-TD-CE for DS-CDMA; MMSE-FD-CE for MC-CDMA; MMSE-PT-LE and ML for MCBS-CDMA.

Figure 8 depicts the performance of the different transceivers for a large system load with $M_1 = 12$, $M_2 = 14$, and $M_3 = 16$ users, respectively ($\varepsilon_1 \approx \varepsilon_2 \approx \varepsilon_3$). The DS-CDMA RAKE receiver clearly suffers from a BER floor at $8 \cdot 10^{-2}$, since it does not cope at all with the increased MUI. Although the DS-CDMA MMSE-TD-CE still outperforms the RAKE, its performance also starts flooring off, because it does not completely suppress these interferences at high SNR. Indeed, the existence of a ZF solution for DS-CDMA TD chip equalization requires multichannel reception at the MS [7, 8]. Hence, both DS-CDMA receivers suffer from a BER saturation level that increases with the system load $M_1$. Likewise, since the MC-CDMA MMSE-FD-CE does not deterministically suppress the MUI either, its performance is also affected by the increased MUI. However, unlike the DS-CDMA MMSE-TD-CE, it does not suffer from a BER floor, since it copes more effectively with the ICI through the CP. In contrast with DS-CDMA and MC-CDMA, MCBS-CDMA is an MUI-free CDMA transceiver, such that its performance remains unaffected by the increased MUI. Consequently, even at large system load, the MCBS-CDMA MMSE-PT-LE achieves a diversity order between 1 and $L_t = 6$. Furthermore, the MCBS-CDMA MMSE-PT-LE now performs better than both the DS-CDMA MMSE-TD-CE and the MC-CDMA MMSE-FD-CE. Specifically, for a target BER of $3 \cdot 10^{-3}$, the MCBS-CDMA MMSE-PT-LE outperforms the DS-CDMA MMSE-TD-CE by 6.8 dB. Additionally, for a target BER of $10^{-4}$, the MCBS-CDMA MMSE-PT-LE performs 1 dB better than the MC-CDMA MMSE-FD-CE. Finally, the optimal MCBS-CDMA ML still achieves the full diversity gain of $L_t = 6$.

Figure 9: Performance of STBC-MCBS-CDMA for channels with small delay spread. Different MIMO system setups, ranging from (1,1) over (2,1) to (2,2). MMSE-PT-LE and ML detection.

5.3. Performance of space-time block-coded MCBS-CDMA

We test our STBC-MCBS-CDMA transceiver of Section 4, employing a cascade of STBC and MCBS-CDMA, for three different MIMO system setups (NT,NR): the (1,1) setup, the (2,1) setup with TX diversity only, and the (2,2) setup with both TX and RX diversity. The system is fully loaded supporting M = 16 active users. For each setup, the receiver employs an MMSE-PT-LE or an ML detector based on perfect CSI.

Figure 9 depicts the performance over a propagation environment with a small delay spread. The underlying multipath channel has $L_t = 2$ chip-spaced Rayleigh fading taps of equal average power. For a target BER of $10^{-3}$, and focusing on the MMSE-PT-LE, the (2,1) setup outperforms the (1,1) setup by 6 dB. The (2,2) setup achieves, in turn, a 3.7 dB gain compared to the (2,1) setup. Comparing the MMSE-PT-LE with its corresponding ML detector, it incurs a 4.2 dB loss for the (1,1) setup, but only a 0.3 dB loss for the (2,2) setup. So, the larger the number of transmit and/or receive antennas, the better the linear MMSE-PT-LE succeeds in extracting the full diversity of order $N_T N_R L_t$.

Figure 10 shows the performance over a propagation environment with a large delay spread. The underlying multipath channel, which is the ITU pedestrian B channel that we have introduced before, has $L_t = 6$ Rayleigh fading taps. For a target BER of $10^{-3}$, and focusing on the MMSE-PT-LE, the (2,1) setup outperforms the (1,1) setup by 4.4 dB, whereas the (2,2) setup achieves, in turn, a 1.1 dB gain compared to the (2,1) setup. So, compared to Figure 9, the corresponding gains due to multiantenna diversity are now smaller because of the inherently larger underlying multipath diversity. Comparing the MMSE-PT-LE with its corresponding ML detector, it incurs a 0.9 dB loss for the (2,2) setup.

Figure 10: Performance of STBC-MCBS-CDMA for channels with large delay spread. Different MIMO system setups, ranging from (1,1) over (2,1) to (2,2). MMSE-PT-LE and ML detection.

6. CONCLUSION

To cope with the challenges of broadband cellular downlink communications, we have designed a novel multicarrier CDMA transceiver that enables significant performance improvements compared to 3G cellular systems, yielding gains of up to 6.8 dB in full load situations. To this end, our MCBS-CDMA transmission technique capitalizes on redundant block spreading and linear precoding to preserve the orthogonality among users and to enable full multipath diversity gains, regardless of the underlying multipath channels. The corresponding receiver relies on low-complexity block despreading to convert a difficult multiuser detection problem into an equivalent but simpler single-user equalization problem, for which any single-user equalizer can be used to trade off performance against complexity. In this perspective, we have evaluated the performance and complexity of four different single-user equalization options for a realistic MCBS-CDMA cellular system that fits the UMTS channel bandwidth. On the one hand, the performance results show that, for a target BER of $10^{-3}$, the MMSE-BDFE exhibits a 1.9 dB gain relative to the MMSE-BLE, and comes within 1.4 dB of the optimal ML detector. Furthermore, the MMSE-PT-LE performs within 0.3 dB of the MMSE-BLE, while it is 3.6 dB away from the ML detector. On the other hand, the complexity estimates show that the initialization complexity of the MMSE-BDFE is similar to that of the MMSE-BLE, while its data processing complexity is approximately two times higher. Furthermore, the MMSE-PT-LE involves an initialization complexity that is between 5 and 6 orders of magnitude smaller than that of the MMSE-BLE, while its data processing complexity is roughly between 1 and 2 orders of magnitude smaller. Based on this study, we can conclude that the MMSE-PT-LE offers a good trade-off between performance and complexity. Finally, to increase the spectral efficiency and to improve the link reliability of multiple users in a broadband cellular network, we have demonstrated the rewarding synergy between MCBS-CDMA and existing and evolving MIMO communication techniques. Specifically, our STBC-MCBS-CDMA transmission technique retains the orthogonality not only among users but also among the different transmit streams of each user. At the receiver, these properties, respectively, allow for deterministic ML user separation through low-complexity block despreading as well as deterministic transmit stream separation through simple linear processing. Consequently, ML equalization per transmit stream and per user achieves maximum multiantenna and multipath diversity gains for every user in the system, irrespective of the system load. Furthermore, the low-complexity MMSE-PT-LE approaches the optimal ML performance (within 0.9 dB for a (2,2) system), and comes close to extracting the full diversity in reduced as well as full load settings.
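The way block despreading turns multiuser detection into single-user equalization can be illustrated with a minimal, noise-free sketch. It is a simplification under stated assumptions and not the full MCBS-CDMA transceiver: the linear precoding and the multicarrier modulation are omitted, the common downlink channel is modeled as a circular convolution on each block (as if a cyclic prefix had been added and removed), Walsh-Hadamard codes are assumed as the orthogonal user codes, and all function names, symbol values, and channel taps are hypothetical.

```python
# Illustrative sketch only: names, symbol values and channel taps are
# assumptions chosen for this example, not the paper's notation.

def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows

def circ_conv(h, x):
    """Circular convolution: a frequency-selective channel acting on one block."""
    size = len(x)
    return [sum(h[l] * x[(k - l) % size] for l in range(len(h)))
            for k in range(size)]

def block_spread(codes, blocks):
    """Transmit N blocks; block n is the sum over users m of codes[m][n] * blocks[m]."""
    n_chips, size = len(codes[0]), len(blocks[0])
    return [[sum(codes[m][n] * blocks[m][k] for m in range(len(blocks)))
             for k in range(size)]
            for n in range(n_chips)]

def block_despread(code, received):
    """Correlate the N received blocks with one user's code (block level)."""
    n_chips, size = len(code), len(received[0])
    return [sum(code[n] * received[n][k] for n in range(n_chips)) / n_chips
            for k in range(size)]

N = 4                                    # code length = number of users (full load)
codes = hadamard(N)
blocks = [                               # one QPSK symbol block per user
    [1 + 1j, -1 + 1j, 1 - 1j],
    [-1 - 1j, 1 + 1j, 1 + 1j],
    [1 - 1j, 1 - 1j, -1 + 1j],
    [-1 + 1j, -1 - 1j, -1 - 1j],
]
h = [0.8, 0.5 - 0.3j, 0.2j]              # common downlink channel taps (assumed)

tx = block_spread(codes, blocks)
rx = [circ_conv(h, x) for x in tx]       # the same channel acts on every block

# After despreading, user 0 sees only its own block passed through the channel.
z0 = block_despread(codes[0], rx)
single_user = circ_conv(h, blocks[0])
max_err = max(abs(a - b) for a, b in zip(z0, single_user))
print("residual multiuser interference:", max_err)
```

Because every transmitted block experiences the same channel, correlating at the block level with an orthogonal code cancels the other users exactly (up to floating-point rounding), whatever the channel taps are. What remains is a single-user equalization problem on `z0`, to which any of the equalizers compared above could then be applied.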

REFERENCES

[1] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, John Wiley & Sons, New York, NY, USA, 2001.

[2] L. B. Milstein, "Wideband code division multiple access," IEEE Journal on Selected Areas in Communications, vol. 18, no. 8, pp. 1344-1354, 2000.

[3] A. Klein, "Data detection algorithms specially designed for the downlink of CDMA mobile radio systems," in IEEE 47th Vehicular Technology Conference (VTC '97), vol. 1, pp. 203-207, Phoenix, Ariz, USA, May 1997.

[4] I. Ghauri and D. T. M. Slock, "Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences," in Proc. IEEE 32nd Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 650-654, Pacific Grove, Calif, USA, November 1998.

[5] C. D. Frank, E. Visotsky, and U. Madhow, "Adaptive interference suppression for the downlink of a direct sequence CDMA system with long spreading sequences," The Journal of VLSI Signal Processing, vol. 30, no. 1-3, pp. 273-291, 2002.

[6] K. Hooli, M. Juntti, M. J. Heikkila, P. Komulainen, M. Latva-aho, and J. Lilleberg, "Chip-level channel equalization in WCDMA downlink," EURASIP Journal on Applied Signal Processing, vol. 2002, no. 8, pp. 757-770, 2002.

[7] T. P. Krauss, W. J. Hillery, and M. D. Zoltowski, "Downlink specific linear equalization for frequency selective CDMA cellular systems," The Journal of VLSI Signal Processing, vol. 30, no. 1-3, pp. 143-161, 2002.

[8] F. Petré, G. Leus, L. Deneire, M. Engels, M. Moonen, and H. De Man, "Adaptive chip equalization for DS-CDMA downlink with receive diversity," IEEE Transactions on Wireless Communications, May 2003, accepted subject to major revisions.

[9] L. J. Cimini Jr., "Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing," IEEE Trans. Communications, vol. 33, no. 7, pp. 665-675, 1985.

[10] J. A. C. Bingham, "Multicarrier modulation for data transmission: an idea whose time has come," IEEE Communications Magazine, vol. 28, no. 5, pp. 5-14, 1990.

[11] Z. Wang and G. B. Giannakis, "Wireless multicarrier communications," IEEE Signal Processing Magazine, vol. 17, no. 3, pp. 29-48, 2000.

[12] W. Y. Zou and Y. Wu, "COFDM: an overview," IEEE Transactions on Broadcasting, vol. 41, no. 1, pp. 1-8, 1995.

[13] S. Hara and R. Prasad, "Overview of multicarrier CDMA," IEEE Communications Magazine, vol. 35, no. 12, pp. 126-133, 1997.

[14] N. Yee, J.-P. Linnartz, and G. Fettweis, "Multi-carrier CDMA in indoor wireless radio networks," in Proc. IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC '93), vol. 1, pp. 109-113, Yokohama, Japan, September 1993.

[15] K. Fazel, S. Kaiser, and M. Schnell, "A flexible and high performance cellular mobile communications system based on orthogonal multi-carrier SSMA," Wireless Personal Communications, vol. 2, no. 1/2, pp. 121-144, 1995.

[16] S. Kaiser, "OFDM code-division multiplexing in fading channels," IEEE Trans. Communications, vol. 50, no. 8, pp. 1266-1273, 2002.

[17] V. M. DaSilva and E. S. Sousa, "Multicarrier orthogonal CDMA signals for quasi-synchronous communication systems," IEEE Journal on Selected Areas in Communications, vol. 12, no. 5, pp. 842-852, 1994.

[18] S. Kondo and L. B. Milstein, "Performance of multicarrier DS CDMA systems," IEEE Trans. Communications, vol. 44, no. 2, pp. 238-246, 1996.

[19] L. Vandendorpe, "Multitone spread spectrum multiple access communications system in a multipath Rician fading channel," IEEE Trans. Vehicular Technology, vol. 44, no. 2, pp. 327-337, 1995.

[20] G. B. Giannakis, Z. Wang, A. Scaglione, and S. Barbarossa, "AMOUR-generalized multicarrier transceivers for blind CDMA regardless of multipath," IEEE Trans. Communications, vol. 48, no. 12, pp. 2064-2076, 2000.

[21] Z. Wang and G. B. Giannakis, "Linearly precoded or coded OFDM against wireless channel fades?," in Proc. IEEE 3rd Workshop on Signal Processing Advances in Wireless Communications (SPAWC '01), pp. 267-270, Taiwan, China, March 2001.

[22] Z. Wang and G. B. Giannakis, "Complex-field coding for OFDM over fading wireless channels," IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 707-720, 2003.

[23] A. Peled and A. Ruiz, "Frequency domain data transmission using reduced computational complexity algorithms," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '80), vol. 5, pp. 964-967, Denver, Colo, USA, April 1980.

[24] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 1996.

[25] A. Klein and P. W. Baier, "Linear unbiased data estimation in mobile radio systems applying CDMA," IEEE Journal on Selected Areas in Communications, vol. 11, no. 7, pp. 1058-1066, 1993.

[26] A. Klein, G. K. Kaleh, and P. W. Baier, "Zero forcing and minimum mean-square-error equalization for multiuser detection in code-division multiple-access channels," IEEE Trans. Vehicular Technology, vol. 45, no. 2, pp. 276-287, 1996.

[27] A. Stamoulis, G. B. Giannakis, and A. Scaglione, "Block FIR decision-feedback equalizers for filterbank precoded transmissions with blind channel estimation capabilities," IEEE Trans. Communications, vol. 49, no. 1, pp. 69-83, 2001.

[28] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, no. 3, pp. 311-335, 1998.

[29] G. G. Raleigh and J. M. Cioffi, "Spatio-temporal coding for wireless communication," IEEE Trans. Communications, vol. 46, no. 3, pp. 357-366, 1998.

[30] D. Gesbert, H. Bolcskei, D. Gore, and A. Paulraj, "MIMO wireless channels: capacity and performance prediction," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '00), vol. 2, pp. 1083-1088, San Francisco, Calif, USA, November-December 2000.

[31] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multiple antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, 1996.

[32] A. Paulraj and T. Kailath, "Increasing capacity in wireless broadcast systems using distributed transmission/directional reception (DTDR)," US Patent 5345599, Stanford University, Stanford, Calif, USA, September 1994.

[33] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744-765, 1998.

[34] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

[35] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456-1467, 1999.

[36] Z. Liu, G. B. Giannakis, B. Muquet, and S. Zhou, "Space-time coding for broadband wireless communications," Wireless Communications and Mobile Computing, vol. 1, no. 1, pp. 35-53, 2001.

Frederik Petré was born in Tienen, Belgium, on December 12, 1974. He received the Electrical Engineering degree and the Ph.D. in applied sciences from the Katholieke Universiteit Leuven (KULeuven), Leuven, Belgium, in July 1997 and December 2003, respectively. In September 1997, he joined the Design Technology for Integrated Information and Communication Systems (DESICS) Division at the Interuniversity MicroElectronics Center (IMEC) in Leuven, Belgium. Within the Digital Broadband Terminals (DBATE) Group of DESICS, he first performed predoctoral research on wireline transceiver design for twisted pair, coaxial cable, and powerline communications. During the fall of 1998, he visited the Information Systems Laboratory (ISL) at Stanford University, California, USA, working on OFDM-based powerline communications. In January 1999, he joined the Wireless Systems (WISE) group of DESICS as a Ph.D. researcher, funded by the Institute for Scientific and Technological Research in Flanders (IWT). Since January 2004, he has been a Senior Scientist within the Wireless Research group of DESICS. He is investigating the baseband signal processing algorithms and architectures for future wireless communication systems, like third generation (3G) and fourth generation (4G) cellular networks, and wireless local area networks (WLANs). His main research interests are modulation theory, multiple access schemes, channel estimation and equalization, and smart antenna and MIMO techniques. He is a Member of the ProRISC technical program committee and the IEEE Benelux Section on Communications and Vehicular Technology (CVT). He is a Member of the Executive Board and Project Leader of the Reconfigurable Radio Project of the Network of Excellence in Wireless Communications (NEWCOM), established under the sixth framework of the European Commission.

Geert Leus was born in Leuven, Belgium, in 1973. He received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Belgium, in June 1996 and May 2000, respectively. He has been a Research Assistant and a Postdoctoral Fellow of the Fund for Scientific Research—Flanders, Belgium, from October 1996 till September 2003. During that period, Geert Leus was affiliated with the Electrical Engineering Department of the Katholieke Universiteit Leuven, Belgium. Currently, Geert Leus is an Assistant Professor at the Faculty of Electrical Engineering, Mathematics and Computer Science of the Delft University of Technology, The Netherlands. During the summer of 1998, he visited Stanford University, and from March 2001 till May 2002, he was a Visiting Researcher and Lecturer at the University of Minnesota. His research interests are in the area of signal processing for communications. Geert Leus received a 2002 IEEE Signal Processing Society Young Author Best Paper Award. He is a Member of the IEEE Signal Processing for Communications Technical Committee, and an Associate Editor for the IEEE Transactions on Wireless Communications and the IEEE Signal Processing Letters.

Marc Moonen received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 2004, he has been a Full Professor at the Electrical Engineering Department of Katholieke Universiteit Leuven, where he is currently heading a research team of 16 Ph.D. candidates and postdocs, working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing. He received the 1994 KU Leuven Research Council Award, the 1997 Alcatel Bell (Belgium) Award (with Piet Vandaele), and was a 1997 "Laureate of the Belgium Royal Academy of Science." He was the Chairman of the IEEE Benelux Signal Processing Chapter (1998-2002), and is currently a EURASIP AdCom Member (European Association for Signal, Speech and Image Processing, from 2000 till now). He is Editor-in-Chief for the "EURASIP Journal on Applied Signal Processing" (from 2003 till now), and a Member of the Editorial Board of "Integration, the VLSI Journal," "IEEE Transactions on Circuits and Systems II" (2002-2003), "EURASIP Journal on Wireless Communications and Networking," and "IEEE Signal Processing Magazine."

Hugo De Man has been Professor of electrical engineering at the Katholieke Universiteit Leuven, Belgium, since 1976. He was Visiting Associate Professor at UC Berkeley in 1975, teaching semiconductor physics and VLSI design. His early research was devoted to the development of mixed-signal, switched capacitor, and DSP simulation tools as well as new topologies for high-speed CMOS circuits, which led to the invention of NORA CMOS. In 1984, he was one of the cofounders of IMEC (Interuniversity Microelectronics Center), which, today, is the largest independent semiconductor research institute in Europe with over 1100 employees. From 1984 to 1995, he was Vice President of IMEC, responsible for research in design technology for DSP and telecom applications. In 1995, he became a Senior Research Fellow of IMEC, working on strategies for education and research on design of future post-PC systems. His research at IMEC has led to many novel tools and methods in the area of high-level synthesis, hardware-software codesign, and C++ based design. Many of these tools are now commercialized by spin-off companies like Coware, Adelante Techn, and Target Compilers. His work and teaching also resulted in a cluster of DSP-oriented companies in Leuven, now known as DSP Valley, where more than 1500 DSP engineers work on design tools and on telecom, networking, and multimedia integrated system products. In 1999, he received the Technical Achievement Award of the IEEE Signal Processing Society, the Phil Kaufman Award of the EDA Consortium, and the Golden Jubilee Medal of the IEEE Circuits and Systems Society. Hugo De Man is an IEEE Fellow and a Member of the Royal Academy of Sciences in Belgium.