http://jwcn.eurasipjournals.com/content/2012/1/62 Wireless Communications and Networking

a SpringerOpen Journal

RESEARCH Open Access

Hardware oriented, quasi-optimal detectors for iterative and non-iterative MIMO receivers

Alessandro Tomasoni1, Massimiliano Siti2, Marco Ferrari1* and Sandro Bellini3

Abstract

In this article we study hardware-oriented versions of the recently proposed Layered ORthogonal lattice Detector (LORD) and Turbo LORD (T-LORD). LORD and T-LORD are attractive Multiple-Input Multiple-Output (MIMO) detection algorithms that aim to approach the optimal Maximum-Likelihood (ML) and Maximum-A-Posteriori (MAP) performance, respectively, with a complexity that is quadratic, rather than exponential, in the number of transmitting antennas. LORD and T-LORD are also well suited to hardware (e.g., ASIC or FPGA) implementation because of their regularity, parallelism, and deterministic latency and complexity. Nevertheless, their complexity is still high for high-cardinality constellations, such as the 64-QAM foreseen by the 802.11n standard. We show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced while still approaching the ML and MAP performance, respectively. Notwithstanding the suboptimal, low-complexity, hardware-oriented implementation, LORD and T-LORD approach the EXtrinsic Information Transfer (EXIT) behavior of the ML and MAP detectors, respectively. To focus on a specific setting, we consider the indoor MIMO wireless LAN 802.11n standard, taking into account channel estimation errors and a frequency-selective, spatially correlated channel model.

Keywords: MIMO systems, space-frequency BICM, OFDM, ML and MAP detection, Turbo detection, EXIT charts

1 Introduction

Because of the increasing demand for data rate and link robustness in wireless transmissions, Multiple-Input Multiple-Output (MIMO) technologies are nowadays indispensable in the wireless communication standards recently released or under definition, such as IEEE 802.11n [1], WiMax [2], and mobile long term evolution (LTE) [3]. In fact, when spatial diversity is available, the capacity of the wireless link grows linearly with the smaller of the numbers of transmitting and receiving antennas [4,5]. In practice, MIMO is often combined with space-frequency bit interleaved coded modulation (BICM) and orthogonal frequency-division multiplexing (OFDM) [1,2], which ensure that almost uncorrelated channels are experienced by different tones within an OFDM symbol.

To increase the spectral efficiency of the link, the transmitting antennas can be used in layered mode, i.e., each antenna transmits a different symbol in the same bandwidth at the same time. On the one hand, a sophisticated receiver is needed to resolve the spatial inter-symbol interference and effectively exploit the theoretical advantages. On the other hand, mobile devices must have low power consumption and moderate cost.

* Correspondence: marco.ferrari@ieiit.cnr.it

1CNR-IEIIT Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy

Full list of author information is available at the end of the article

The ideal receiver should evaluate the likelihood of the received vectors for each possible codeword, jointly performing detection and decoding. This has prohibitive complexity, except for simple space-time codes. In practice, detection and decoding are decoupled, and Soft-Input Soft-Output (SISO) detectors are used in conjunction with SISO decoders in iterative schemes [6], to approximate the ideal receiver through disjoint stages, according to the turbo principle [7]. Turbo detectors exploit the information fed back by the channel decoder as a priori information about the transmitted vectors of symbols. If needed, detector and decoder can simply be applied in cascade, as a special case of turbo detection and decoding without iterations. In the former case, the optimal detector is the Maximum-A-Posteriori (MAP) detector, while in the latter case the MAP detector degenerates into the symbol-by-symbol Maximum-Likelihood (ML) detector, since no a priori information is available.

©2012 Tomasoni et al; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Despite these simplifications, the complexity of the optimal MAP and ML detectors still increases exponentially with NtNb, where Nt is the number of transmitting antennas and Nb the modulation order [bits/dimension]. This is why many researchers have sought suboptimal detection strategies, trying to approach the ideal detector with limited complexity, e.g., via (Turbo) MMSE detection [8,9] or sphere detection [10-14]. The former strategy combines "soft decision" subtraction (soft-DFE) and linear minimum mean square error (MMSE) spatial equalization [8,9,15]. This method, although computationally affordable, can lead to largely suboptimal results, especially when very high spectral efficiencies are sought with large Nb and Nt and high-rate codes. Conversely, sphere decoding (SD) detectors [16] reduce the MAP and ML complexity by restricting the search to a subset of the whole hyper-symbol constellation. The SD family can be roughly divided into two groups: the depth-first [10] and the breadth-first [11-14,17-19] SDs. The second group has several advantages, such as fixed latency and lower complexity. However, for a small number of candidates, breadth-first SD suffers a performance degradation. Recently, to improve its behavior in iterative receivers, SD has been combined with MMSE [20]: the linear detector assists the SD by finding a good center for the search sphere, thus improving the performance of the iterative receiver.

One of the most promising proposals is the Layered ORthogonal lattice Detector (LORD) [21,22], together with its iterative version, Turbo LORD (T-LORD) [23]. A proposal similar to LORD was published a few years later [24]. LORD detects the ML hyper-symbol exactly or approximately, depending on the number of antennas involved. Similarly, T-LORD approaches the MAP performance, combining the low-complexity spatial DFE principles of LORD with a simple yet accurate method to handle a priori log-likelihood ratios (LLRs). LORD and T-LORD are particularly suited to parallel hardware implementation and soft-output bit detection, and perform very well in combination with soft decoders like SOVA [25] or BCJR [26] for convolutional codes, or with an LDPC decoder [27]. Nevertheless, their complexity is still high for high-cardinality constellations, such as the 64-QAM foreseen, e.g., by the 802.11n standard.

In this article, we propose simplified LORD and T-LORD versions that retain their suitability for hardware implementation, maintaining deterministic complexity (quadratic in the number of transmitting antennas) and flexibility in setting the desired performance-complexity trade-off. Besides, we show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced while still approaching the ML and MAP performance, respectively.

In all cases the performance is very good. In particular, we show that LORD and T-LORD can perform very close to the ideal MAP detector for up to at least four antennas and for modulation orders of 3 bits per dimension, even in a realistic setting with imperfect channel state information (CSI) and correlated channels. Furthermore, we show that our detectors exhibit the same EXIT chart behavior as the MMSE-assisted SD [20], yet with a lower complexity, reduced even further w.r.t. [23]. A strategy that recalls in principle the OFDM tone processing proposed in this article has recently been proposed in [28], with a different detector.

The article is organized as follows. Section 2 describes the system model. Section 3 recalls the full-complexity LORD and T-LORD algorithms. Section 4 motivates and describes our low-complexity hardware-oriented LORD and T-LORD proposals. Section 5 shows their performance. Section 6 concludes the article.

2 System model

Consider a MIMO communication system with Nt antennas at the transmitter side and Nr ≥ Nt at the receiver side. To focus on a practical application, we adopt many of the parameters of the 802.11n standard [1]. At the transmitter, each Wireless LAN packet carrying 390 data bytes is encoded by a 64-state binary convolutional encoder, space-frequency interleaved, and Gray-mapped onto an M-QAM constellation. For each spatial stream, an OFDM modulator splits the overall frequency band (20 MHz) into Nf = 64 sub-bands (tones), of which Ndc = 52 are data carriers. The block diagram in Figure 1 summarizes the main operations applied to a packet when Nt = 2 and Nr = 2. The extension to more than two antennas is straightforward.

The OFDM format allows each tone to be considered separately. Therefore, the dependency on the carrier index is omitted in the following, and all the equations refer to a single OFDM tone. The received signal $\mathbf{y} \in \mathbb{C}^{N_r}$ reads

y = Hx + n (1)

where $\mathbf{x} \in \mathbb{C}^{N_t}$ contains the transmitted symbols and $\mathbf{n} \in \mathbb{C}^{N_r}$ is an i.i.d. complex Gaussian white noise vector with covariance matrix $\mathbf{R}_n = E[\mathbf{n}\mathbf{n}^H] = N_0\mathbf{I}$. The transmitted M-QAM symbols are uncorrelated, with zero mean and unit variance, for each transmitting antenna. Therefore, the transmitter Signal-to-Noise Ratio is $\mathrm{SNR}_{TX} = N_t/N_0$.

Figure 1 Transmitter block diagram. [Blocks: bit generator, convolutional encoder, space-frequency bit interleaver, QAM mappers and OFDM modulators, channel, OFDM demodulators, detector, space-frequency bit deinterleaver, convolutional decoder with extrinsic information fed back to the detector, decoded bits compared with the generated ones for the BER.]

The MIMO (frequency selective) channel is represented by $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, whose elements $h_{r,t}$ are the complex (flat) gains of the paths between transmitter $t$ and receiver $r$ at a given tone. These elements are normalized, i.e., $E[|h_{r,t}|^2] = 1$, with $t = 1, 2, \dots, N_t$ and $r = 1, 2, \dots, N_r$. More details on the channel assumptions are given in Section 5.
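For readers who want to experiment, the per-tone model (1) and its normalizations can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the simulator used for the results: it assumes i.i.d. Rayleigh channel entries (the correlated channel and channel-estimation errors of Section 5 are not modeled) and a Gray labeling with the in-phase bits listed first; all function names are hypothetical.

```python
import math
import random

def gray_pam(nb):
    """Gray-labelled 2^nb-PAM as (level, bits) pairs; levels are 2k - (2^nb - 1)."""
    n = 2 ** nb
    return [(2 * k - (n - 1), [((k ^ (k >> 1)) >> (nb - 1 - i)) & 1 for i in range(nb)])
            for k in range(n)]

def unit_energy_qam(nb):
    """Square M-QAM (M = 4**nb) scaled to E|x|^2 = 1, with 2*nb bits per point:
    the nb in-phase bits followed by the nb quadrature bits (assumed ordering)."""
    pam = gray_pam(nb)
    pts = [(complex(a, b), ba + bb) for a, ba in pam for b, bb in pam]
    scale = math.sqrt(sum(abs(p) ** 2 for p, _ in pts) / len(pts))
    return [(p / scale, bits) for p, bits in pts]

def cn01(rng):
    """Circularly symmetric complex Gaussian sample with unit power."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def one_tone(nt, nr, snr_tx, nb, rng):
    """Draw one tone of y = Hx + n with E|h_rt|^2 = 1, E|x_t|^2 = 1, N0 = Nt / SNR_TX."""
    const = unit_energy_qam(nb)
    n0 = nt / snr_tx
    H = [[cn01(rng) for _ in range(nt)] for _ in range(nr)]   # i.i.d. Rayleigh (assumed)
    x = [rng.choice(const)[0] for _ in range(nt)]
    y = [sum(H[r][t] * x[t] for t in range(nt)) + math.sqrt(n0) * cn01(rng)
         for r in range(nr)]
    return y, H, x, n0
```

With these normalizations each receive antenna collects an average signal power of $N_t$, which is why the transmitter SNR is $N_t/N_0$.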

At the receiver (see Figure 1), after OFDM demodulation the symbol vector $\mathbf{y}$ is passed to the detector, which computes the LLRs $\lambda_t(n)$ of the coded bits, with $t = 1, 2, \dots, N_t$ and $n = 1, 2, \dots, 2N_b$, where $N_b = \log_2(M)/2$ is the number of bits per dimension. The soft values are de-interleaved and passed to a decoder. In the case of serially concatenated detection and decoding, the decoder (e.g., a Viterbi decoder) outputs hard bits. Conversely, in the case of turbo detection and decoding, the SISO decoder, such as the BCJR [26] or the SOVA [25], also outputs the extrinsic information, which is used (after interleaving) as the a priori LLR $\tilde{\lambda}_t(n)$ at the detector:

$$\tilde{\lambda}_t(n) = \ln \frac{P\big(b_n(x_t) = 0\big)}{P\big(b_n(x_t) = 1\big)} \quad (2)$$

where $b_n(x_t) \in \{0, 1\}$ is the $n$-th bit of symbol $x_t$. Taking advantage of the soft information fed back by the decoder, at each iteration the detector produces more reliable extrinsic information to be passed back to the decoder:

$$\lambda^{e}_t(n) = \lambda_t(n) - \tilde{\lambda}_t(n) \quad (3)$$

Here $\lambda_t(n)$ is the a posteriori LLR of $b_n(x_t)$, computed during the detection process. In the next section we focus on this detection stage.
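The a priori LLR definition and the extrinsic update above can be illustrated with the following helpers, a minimal Python sketch with hypothetical names (`prior_llr`, `prob_zero`, `extrinsic`), using the same sign convention: positive LLRs favor bit 0.

```python
import math

def prior_llr(p0):
    """A priori LLR ln P(b=0)/P(b=1), given P(b=0) = p0 in (0, 1)."""
    return math.log(p0 / (1.0 - p0))

def prob_zero(llr):
    """Inverse mapping: P(b=0) = e^llr / (1 + e^llr), written in a form that does not
    overflow for large positive llr."""
    return 1.0 / (1.0 + math.exp(-llr))

def extrinsic(posterior, prior):
    """Extrinsic LLRs passed to the decoder: a posteriori minus a priori, bit by bit."""
    return [lp - la for lp, la in zip(posterior, prior)]
```

Subtracting the a priori term prevents the decoder from receiving back, as if it were new evidence, the information it has just produced.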

3 Detectors outline

In this section, we give a concise description of LORD and T-LORD, needed to understand the implementation proposed in the following sections. For more details about LORD and T-LORD and their various versions, we refer the reader to [23]. We also recall the optimal MAP and ML detectors, which serve as a performance reference for any other detector rather than as practical solutions. The reader interested in other techniques, such as MMSE and sphere detection, can again refer to [23], where a detailed comparison with the "standard" T-LORD is reported, both in terms of complexity and performance.

3.1 MAP and ML detectors

A MAP detector accepts the received vector y and the a priori information, coming from the decoder, and evaluates the probability of all possible transmitted vectors x. These two contributions can be easily identified in the following metric:

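$$\varphi(\mathbf{x}) = -\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{N_0} + \sum_{t=1}^{N_t} \sum_{i=1}^{2N_b} \big(1 - b_i(x_t)\big)\,\tilde{\lambda}_t(i) \quad (4)$$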

With no a priori information, i.e., $\tilde{\lambda}_t(i) = 0$, (4) reduces to the ML metric.

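Equation (4) is the basis for the computation of the a posteriori LLRs

$$\lambda_t(n) = \ln \frac{\sum_{\mathbf{x}:\, b_n(x_t)=0} \exp \varphi(\mathbf{x})}{\sum_{\mathbf{x}:\, b_n(x_t)=1} \exp \varphi(\mathbf{x})} \approx \max_{\mathbf{x}:\, b_n(x_t)=0} \varphi(\mathbf{x}) - \max_{\mathbf{x}:\, b_n(x_t)=1} \varphi(\mathbf{x}) \quad (5)$$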

The last term in (5) is the (typically very accurate) Max-Log-MAP approximation. As mentioned in the introduction, the number of complex Euclidean distances (EDs) and a priori probabilities to compute, both for MAP and ML and for their Max-Log approximations, increases exponentially with $N_bN_t$.
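As a performance reference, the exhaustive Max-Log detector of (4)-(5) can be sketched directly; the loop over all $M^{N_t}$ hyper-symbols makes the exponential cost explicit. This is an illustrative Python sketch, assuming a Gray-mapped unit-energy 4-QAM and the a priori term written with $(1 - b_i)$ weights so that positive LLRs favor bit 0; names are hypothetical.

```python
import itertools
import math

# Gray-mapped, unit-energy 4-QAM (Nb = 1): bit 0 sets the I sign, bit 1 the Q sign.
QPSK = [(complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2), (b0, b1))
        for b0 in (0, 1) for b1 in (0, 1)]

def metric(y, H, x, bits, prior, n0):
    """phi(x): -||y - Hx||^2 / N0 + sum_t sum_i (1 - b_i(x_t)) * prior[t][i]."""
    dist = sum(abs(y[r] - sum(H[r][t] * x[t] for t in range(len(x)))) ** 2
               for r in range(len(y)))
    apri = sum((1 - b) * prior[t][i]
               for t, bt in enumerate(bits) for i, b in enumerate(bt))
    return -dist / n0 + apri

def maxlog_map(y, H, prior, n0, const=QPSK):
    """Max-Log LLRs by exhaustive search over all M^Nt transmitted vectors."""
    nt, nbits = len(H[0]), len(const[0][1])
    best = [[[-math.inf, -math.inf] for _ in range(nbits)] for _ in range(nt)]
    for combo in itertools.product(const, repeat=nt):
        x = [p for p, _ in combo]
        bits = [b for _, b in combo]
        phi = metric(y, H, x, bits, prior, n0)
        for t in range(nt):
            for i, b in enumerate(bits[t]):
                best[t][i][b] = max(best[t][i][b], phi)
    # lambda_t(i) ~= max over b_i = 0 of phi  -  max over b_i = 1 of phi
    return [[best[t][i][0] - best[t][i][1] for i in range(nbits)] for t in range(nt)]
```

With zero a priori LLRs the function returns Max-Log ML LLRs; when the observation is made uninformative (very large $N_0$) the outputs collapse onto the a priori LLRs, as expected from (4).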

3.2 LORD detector

The LORD algorithm is composed of two stages.a The former is a pre-processing, common to several received vectors $\mathbf{y}$ as long as the channel can be considered constant over several OFDM symbols. It consists of $N_t$ QR factorizations of the channel matrix $\mathbf{H}$, with permuted column orders:

$$\mathbf{Q}(t)\mathbf{R}(t) = \mathbf{H}\boldsymbol{\Pi}(t), \quad t = 1, \dots, N_t \quad (6)$$

where $\boldsymbol{\Pi}(t) = [\mathbf{u}_1 \dots \mathbf{u}_{t-1}\ \mathbf{u}_{t+1} \dots \mathbf{u}_{N_t}\ \mathbf{u}_t]$ is the permutation matrix that moves the $t$-th column of $\mathbf{H}$ to the last position. Thus, the received vector and the system model can be rewritten as

$$\mathbf{y}(t) = \mathbf{Q}^H(t)\mathbf{y} = \mathbf{R}(t)\boldsymbol{\Pi}^H(t)\mathbf{x} + \mathbf{Q}^H(t)\mathbf{n} = \mathbf{R}(t)\mathbf{x}(t) + \mathbf{n}(t) \quad (7)$$

without impairing the receiver performance as long as the AWGN is spatially i.i.d., since $\mathbf{n}(t)$ is still Gaussian with covariance $\mathbf{Q}^H(t)\mathbf{R}_n\mathbf{Q}(t) = N_0\mathbf{I}$. The computational burden of this phase is negligible if the channel can be considered constant over several consecutive OFDM symbols, so we focus on the second stage of the algorithm.

For each permutation $\boldsymbol{\Pi}(t)$, the LORD algorithm explores every possible transmitted QAM symbol $x_t$, moved to the lowest position of $\mathbf{x}(t)$, called the "root layer" from now on. For each hypothesis $x_t = x_{N_t}(t) = \bar{x}$, the algorithm subtracts its interference from the upper layers, chooses the closest symbol on the new interference-free layer (through simple slicing), and iterates this process up to $y_1(t)$. In formulas:

$$\hat{x}_j(t,\bar{x}) = \begin{cases} \bar{x}, & j = N_t \\ \arg\min_{x} \Big| y_j(t) - r_{j,j}(t)\,x - \sum_{i=j+1}^{N_t} r_{j,i}(t)\,\hat{x}_i(t,\bar{x}) \Big|, & j < N_t \end{cases} \quad (8)$$

Finally, the algorithm builds $N_t$ different sets, back in the "original" order:

$$\mathcal{S}_t = \big\{ \boldsymbol{\Pi}(t)\,\hat{\mathbf{x}}(t,\bar{x}),\ \forall \bar{x} \big\} \quad (9)$$

and restricts the log-likelihood search to the elements of (9):

$$\lambda_t(m) \approx \frac{1}{N_0}\Big[ \min_{\mathbf{x} \in \mathcal{S}_t:\, b_m(x_t)=1} \|\mathbf{y}(t) - \mathbf{R}(t)\boldsymbol{\Pi}^H(t)\mathbf{x}\|^2 - \min_{\mathbf{x} \in \mathcal{S}_t:\, b_m(x_t)=0} \|\mathbf{y}(t) - \mathbf{R}(t)\boldsymbol{\Pi}^H(t)\mathbf{x}\|^2 \Big] \quad (10)$$

If $N_t = 2$, it can be shown that the set $\mathcal{S}_t$ contains, for each possible bit of $x_t$, the closest hyper-symbol $\mathbf{x}$ having $b_m(x_t)$ equal to one or zero. Thus, the algorithm has the same performance as the Max-Log ML detector. On the contrary, if $N_t > 2$, this is not guaranteed, due to possible error propagation in the decision feedback equalization (DFE) (8). This suboptimal behavior can be mitigated by crossing the sets $\mathcal{S}_j$, i.e., letting a hyper-symbol $\mathbf{x} \in \mathcal{S}_t$ be replaced by a candidate $\mathbf{x}' \in \mathcal{S}_j$ if the latter has a smaller ED and $x'_t = x_t = \bar{x}$:

$$\mathcal{S}'_t = \Big\{ \arg\min_{\mathbf{x}' \in \mathcal{S}_j\, \forall j:\ x'_t = \bar{x}} \|\mathbf{y}(t) - \mathbf{R}(t)\boldsymbol{\Pi}^H(t)\mathbf{x}'\|^2,\ \forall \bar{x} \Big\} \quad (11)$$

3.3 T-LORD detector

Turbo-LORD is a generalization of LORD that is able to handle a priori information. Basically, (9) is replaced by

$$\mathcal{S}_t = \Big\{ \arg\max_{\mathbf{x} \in \mathcal{U}_t(\bar{x})} \varphi(\mathbf{x}),\ \forall \bar{x} \Big\} \quad (12)$$

where the metric $\varphi(\mathbf{x})$ is the same as in (4) and all the elements of the same subset $\mathcal{U}_t(\bar{x})$ share the same root symbol $\bar{x}$:

$$\mathbf{x} \in \mathcal{U}_t(\bar{x}) \Rightarrow x_t = \bar{x} \quad (13)$$

Thus, the a posteriori LLR (5) is eventually approximated as

$$\lambda_t(n) \approx \max_{\mathbf{x} \in \mathcal{S}_t:\, b_n(x_t)=0} \varphi(\mathbf{x}) - \max_{\mathbf{x} \in \mathcal{S}_t:\, b_n(x_t)=1} \varphi(\mathbf{x}) \quad (14)$$

The cardinality of $\mathcal{U}_t(\bar{x})$ is selected according to the desired complexity-performance trade-off. For example, one could consider just two hyper-symbols: the one with the highest a priori probability and the one with the smallest ED:

$$\mathbf{x}_A(t,\bar{x}) = \Big\{ \mathbf{x} : x_t = \bar{x} \text{ and } b_i(x_j) = \tfrac{1 - \mathrm{sign}(\tilde{\lambda}_j(i))}{2}\ \forall i, \forall j \ne t \Big\} \quad (15)$$

$$\mathbf{x}_D(t,\bar{x}) = \arg\min_{\mathbf{x}':\, x'_t = \bar{x}} \|\mathbf{y}(t) - \mathbf{R}(t)\boldsymbol{\Pi}^H(t)\mathbf{x}'\|^2 \quad (16)$$

$$\mathcal{U}_t(\bar{x}) = \{\mathbf{x}_A(t,\bar{x}),\ \mathbf{x}_D(t,\bar{x})\} \quad (17)$$

Using the terminology introduced in [15] for the scalar case, (15) and (16) are said to obey the a priori and the distance criteria, respectively: T-LORD assumes that the hyper-symbol with the best metric (4) is either $\mathbf{x}_A$ or $\mathbf{x}_D$.

When $N_t > 2$, the distance to be minimized is a function of two or more variables, hence the solution cannot be found through simple slicing. As in the previous section, one can rely on a DFE process, but there is no guarantee that the chosen symbol is actually the closest, because wrong decisions can occur in intermediate layers.

We denote this sequentially chosen hyper-symbol as $\mathbf{x}_{D \dots D}(t,\bar{x})$, to underline the layer-by-layer application of the distance criterion over the upper layers. In the same way, the a priori criterion can also be applied layer by layer, processing blocks of $2N_b$ LLRs per layer, letting their signs drive the choice of QAM candidates and subtracting their interference from the upper layers, this time without introducing errors.

In [23], other criteria have been proposed to choose the hyper-symbols in $\mathcal{U}_t(\bar{x})$, e.g., to take into account not only the most probable a priori symbol, but also the second most probable one, $\mathbf{x}_F(t,\bar{x})$, which can be easily computed by flipping the weakest a priori LLR. Furthermore, there is no need to apply the same criterion at each layer: the criteria can be mixed, retaining only the $K$ candidates with the best Partial A-Posteriori Probability (PAPP)

$$\varphi\big(\mathbf{x}_{k \leftrightarrow N_t}(t)\big) = -\frac{\|\mathbf{y}_{k \leftrightarrow N_t}(t) - \mathbf{R}_{k \leftrightarrow N_t,\, k \leftrightarrow N_t}(t)\,\mathbf{x}_{k \leftrightarrow N_t}(t)\|^2}{N_0} + \sum_{j:\, f(j,t) \ge k}\ \sum_{i=1}^{2N_b} \big(1 - b_i(x_{f(j,t)}(t))\big)\,\tilde{\lambda}_j(i) \quad (18)$$

where $k \leftrightarrow N_t$ stands for $k, k+1, \dots, N_t$ and

$$f(j,t) = \begin{cases} j, & j < t \\ N_t, & j = t \\ j - 1, & j > t \end{cases} \quad (19)$$

is the LLR permutation, coherent with the matrix $\boldsymbol{\Pi}(t)$. The K-best algorithm builds $\mathcal{U}_t(\bar{x})$ as follows. At the first step, the partial set $\mathcal{U}_{N_t}(\bar{x})$ contains only $\bar{x}$. At the $k$-th step, the K-best algorithm expands each candidate in $\mathcal{U}_{k+1}(\bar{x})$ according to each possible criterion (a priori, distance, and possibly flipping); note that the index $k$ decreases as the algorithm proceeds. Only the $K$ best results of this expanded set are retained in the partial set $\mathcal{U}_k(\bar{x})$, while the others are discarded. At the last step, when $k = 1$, we set $\mathcal{U}_t(\bar{x}) = \mathcal{U}_1(\bar{x})$, and the T-LORD search goes on as explained in Section 3.2.

We remark that (18) can be recursively updated, layer by layer, by adding the new partial ED and a priori terms to the previous partial metric, saving computing time and power. A very interesting choice, as shown in the next section, is $K = 1$. This corresponds to retaining at each layer only the best new candidate. This algorithm can be interpreted as a decision feedback equalizer (DFE) driven by the aforementioned criteria.

Finally, although the above enhancements proved to be effective, the generalized DFE process can still fail, missing the correct detection at some intermediate layer. However, when $\mathcal{S}_j$ is computed for another transmitted symbol, the distance and a priori criteria may select some hyper-symbols $\mathbf{x}$ with $x_t = \bar{x}$ in the upper layers that have a better metric (4). So, in analogy with (11), one can augment $\mathcal{S}_t$ as follows:

$$\mathcal{S}'_t = \Big\{ \arg\max_{\mathbf{x} \in \mathcal{U}_t(\bar{x})\ \vee\ \mathbf{x} \in \mathcal{S}_{j \ne t}:\, x_t = \bar{x}} \varphi(\mathbf{x}),\ \forall \bar{x} \Big\} \quad (20)$$

One can first compute (12) for all $t$, then cross the data to obtain the improved $\mathcal{S}'_t$. This enhancement does not increase the number of checked hyper-symbols; it only adds latency and complexity due to the additional metric comparisons and selections.
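The claim that, for $N_t = 2$, LORD reproduces the Max-Log ML LLRs can be checked numerically with the sketch below. It is a hedged illustration, not a hardware description: it assumes an unnormalized Gray square QAM (odd-integer levels, in-phase bits first), performs the QR factorization (6) by Gram-Schmidt, slices the upper layer as in (8), and evaluates (10); `exhaustive_llrs` is a brute-force reference added only for the comparison, and all names are hypothetical.

```python
import itertools
import math

def gray_pam(nb):
    """(level, bits) for Gray-labelled 2^nb-PAM; levels are the odd integers 2k-(2^nb-1)."""
    n = 2 ** nb
    return [(2 * k - (n - 1), [((k ^ (k >> 1)) >> (nb - 1 - i)) & 1 for i in range(nb)])
            for k in range(n)]

def qam(nb):
    """Unnormalized square QAM points and a point -> bit-label function (I bits first)."""
    pam = gray_pam(nb)
    bits_of = dict(pam)
    points = [complex(a, b) for (a, _), (b, _) in itertools.product(pam, pam)]
    def label(x):
        return bits_of[int(x.real)] + bits_of[int(x.imag)]
    return points, label

def slice_pam(v, nb):
    """Nearest PAM level to the real value v (rounding and clipping)."""
    n = 2 ** nb
    k = min(max(int(round((v + n - 1) / 2.0)), 0), n - 1)
    return 2 * k - (n - 1)

def hdot(a, b):
    """Hermitian inner product a^H b."""
    return sum(u.conjugate() * v for u, v in zip(a, b))

def lord_llrs_2x2(y, H, n0, nb):
    """LORD Max-Log LLRs, eq. (10), for Nt = 2 and any Nr >= 2."""
    points, label = qam(nb)
    llrs = []
    for t in (0, 1):
        c1 = [row[1 - t] for row in H]          # upper layer: the other antenna
        c2 = [row[t] for row in H]              # root layer: antenna t, moved last by Pi(t)
        r11 = math.sqrt(hdot(c1, c1).real)      # Gram-Schmidt QR of H Pi(t), eq. (6)
        q1 = [v / r11 for v in c1]
        r12 = hdot(q1, c2)
        w = [b - r12 * a for a, b in zip(q1, c2)]
        r22 = math.sqrt(hdot(w, w).real)
        q2 = [v / r22 for v in w]
        y1, y2 = hdot(q1, y), hdot(q2, y)      # y(t) = Q(t)^H y, eq. (7)
        best = [[math.inf, math.inf] for _ in range(2 * nb)]
        for xr in points:                       # every root hypothesis x_bar
            z = (y1 - r12 * xr) / r11           # interference-free upper layer
            xu = complex(slice_pam(z.real, nb), slice_pam(z.imag, nb))   # eq. (8)
            d = abs(y1 - r11 * xu - r12 * xr) ** 2 + abs(y2 - r22 * xr) ** 2
            for i, b in enumerate(label(xr)):
                best[i][b] = min(best[i][b], d)
        llrs.append([(m1 - m0) / n0 for m0, m1 in best])                 # eq. (10)
    return llrs

def exhaustive_llrs(y, H, n0, nb):
    """Reference Max-Log ML LLRs over all M^2 hyper-symbols."""
    points, label = qam(nb)
    best = [[[math.inf, math.inf] for _ in range(2 * nb)] for _ in range(2)]
    for x in itertools.product(points, repeat=2):
        d = sum(abs(yr - hr[0] * x[0] - hr[1] * x[1]) ** 2 for yr, hr in zip(y, H))
        for t in (0, 1):
            for i, b in enumerate(label(x[t])):
                best[t][i][b] = min(best[t][i][b], d)
    return [[(m1 - m0) / n0 for m0, m1 in best[t]] for t in (0, 1)]
```

The rotated distance in `lord_llrs_2x2` differs from $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$ only by the energy of $\mathbf{y}$ outside the column space of $\mathbf{H}$, which is the same for every candidate and therefore cancels in the LLR differences. Scaling the constellation does not affect the comparison, since both functions use the same points.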

4 Hardware-oriented, low-complexity LORD

As we are going to show, the number of QAM symbols that LORD must check is largely affected by $M$, the cardinality of the constellation, which can even reach hundreds of points. Therefore, we aim to reduce the number of candidates in $\mathcal{S}_t$ by exploring only a subset of the QAM constellation at the root layer. Thus, the DFE search described in (8) is performed only for root symbols belonging to this reduced set. To preserve the regularity and parallelism typical of the full-complexity LORD, we restrict our attention to square subsets of the QAM constellation, centered on the received (equalized) signal $y_{N_t}(t)/r_{N_t,N_t}(t)$ at the root layer (red cross in Figure 2a).

The performance of the detector depends on the probability that the transmitted symbol does not belong to the square QAM subset, since when this happens the LORD algorithm fails with high probability. To describe how such border violations behave, we keep the QAM constellation size fixed and properly rescale the noise power. After the pre-processing operations in (7), the channel gain $r_{N_t,N_t}(t)$ multiplies the signal $x_{N_t}(t)$, and the noise at the root layer reads

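$$n_{N_t}(t) = \mathbf{q}_{N_t}^H(t)\,\mathbf{n}$$

where $\mathbf{q}_{N_t}(t)$ is the last column of $\mathbf{Q}(t)$. This is a Gaussian variable with zero mean and unchanged power $E[|n_{N_t}(t)|^2] = N_0\,\mathbf{q}_{N_t}^H(t)\mathbf{q}_{N_t}(t) = N_0$.

Thus, the actual noise affecting the fixed-size QAM constellation of Figure 2a reads

$$\frac{n_{N_t}(t)}{r_{N_t,N_t}(t)}$$

with power $N_0/|r_{N_t,N_t}(t)|^2$.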

This provides better insight into our problem. Indeed, let us assume that the channel is composed of i.i.d. Gaussian variables. Due to the Gram-Schmidt orthogonalization in the QR process, $r_{N_t,N_t}(t)$ is (for $N_r = N_t$) a Rayleigh random variable with unit power and probability density function

$$p_R(r) = 2r\,e^{-r^2}, \quad r \ge 0$$

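We can easily compute the pdf of the actual noise standard deviation, normalized by $\sqrt{N_0}$, which is the output of the function $d = f(r_{N_t,N_t}) = 1/r_{N_t,N_t}$:

$$p_D(d) = \frac{p_R\big(f^{-1}(d)\big)}{\big|f'\big(f^{-1}(d)\big)\big|} = \frac{2}{d^3}\,e^{-1/d^2}, \quad d > 0$$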

This pdf is plotted in Figure 2b (solid blue line), along with other simulated pdfs (dashed red lines), referring to a MIMO system with Nt = Nr = 2 and spatially correlated channel gains. It can be observed that the correlated case is even worse, because the matrix H can easily be ill-conditioned, i.e., with the last diagonal elements of R(t) close to zero.

Even more importantly, the pdfs always exhibit long tails at high standard deviations. This suggests the following interpretation of the SNR behavior at the root layer: quite often the noise level is moderate; occasionally it is very high. The square side reduction is expected to be applicable only in the former case, since any approximation in the latter case would likely be harmful. This idea is put into practice in the following subsection.
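The heavy tail of the root-layer noise can be reproduced with a short Monte Carlo experiment. The sketch assumes $N_t = N_r = 2$ and an i.i.d. Rayleigh channel (no spatial correlation), draws the two columns of $\mathbf{H}\boldsymbol{\Pi}(t)$, computes $|r_{2,2}|$ by Gram-Schmidt, and compares the empirical distribution of $d = 1/|r_{2,2}|$ with the closed form $P(D \le d) = e^{-1/d^2}$, obtained by integrating $p_D(d) = (2/d^3)e^{-1/d^2}$. Function names are hypothetical.

```python
import math
import random

def cn01(rng):
    """Circularly symmetric complex Gaussian sample with unit power."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def root_gain(c_upper, c_root):
    """|r_{2,2}|: norm of the root column after Gram-Schmidt against the other column."""
    n1 = math.sqrt(sum(abs(a) ** 2 for a in c_upper))
    q1 = [a / n1 for a in c_upper]
    r12 = sum(q.conjugate() * b for q, b in zip(q1, c_root))
    return math.sqrt(sum(abs(b - r12 * q) ** 2 for q, b in zip(q1, c_root)))

def empirical_cdf(d, trials=20000, nr=2, seed=1):
    """Monte Carlo estimate of P(D <= d), with D = 1/|r_{2,2}|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        c_upper = [cn01(rng) for _ in range(nr)]
        c_root = [cn01(rng) for _ in range(nr)]
        if 1.0 / root_gain(c_upper, c_root) <= d:
            hits += 1
    return hits / trials

def analytic_cdf(d):
    """P(D <= d) = exp(-1/d^2), the integral of p_D for Nt = Nr = 2."""
    return math.exp(-1.0 / d ** 2)
```

Under these assumptions, $1 - e^{-1/9} \approx 10.5\%$ of the tones see a root-layer noise standard deviation more than three times $\sqrt{N_0}$, which illustrates the long tail discussed above.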

4.1 Algorithm description

Aiming at square subsets (see Figure 2a) that include the transmitted symbol with high probability, square side lengths larger than $\alpha\sqrt{N_0/2}$ (with $\alpha > 1$) should be considered. One could probably approach the performance of the full-complexity LORD with an ad hoc tuning of the parameter $\alpha$, which is fundamental to find a good trade-off between complexity reduction and the capability to detect the ML point. Nevertheless, this solution would be sensitive to the parameter $\alpha$, and thus not well suited to device implementation.

On the contrary, we propose an attractive alternative that does not require any parameter to be tuned once the architecture has been chosen. Indeed, we suggest performing the full-complexity search over the whole QAM constellation for the carriers affected by the worst-case noise ("worst carriers" in the following, for brevity), limiting the search for all the other ones to a square subset with a fixed number of points. This way, we always reserve the more robust (full-complexity) algorithm for the carriers affected by higher noise powers at the root layer. This significantly reduces the probability that a transmitted point falls outside the reduced square subset of Figure 2a. Clearly, the higher the hardware capability of the device, the larger the fraction of carriers that can be fully spanned. The fundamental hypothesis underlying this solution is that detection-time constraints (measured in clock cycles) exist only at the level of the entire OFDM symbol, and not carrier by carrier. This hypothesis seems quite reasonable, since devices for typical applications have to complete the detection of an entire OFDM symbol within a fixed time, say $T_{\max}$.
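The carrier classification can be sketched as a simple ranking. The helpers below are hypothetical and assume that the "worst carriers" are identified by the smallest root-layer gain $|r_{N_t,N_t}(t)|$ of the current permutation, i.e., the largest equivalent noise power $N_0/|r_{N_t,N_t}(t)|^2$; the second function reports how many root hypotheses each tone then explores.

```python
def split_carriers(root_gains, n_full):
    """Return (full, reduced): indices of the n_full tones with the weakest root-layer
    gain |r_{Nt,Nt}| (largest root-layer noise N0/|r|^2), and of the remaining tones."""
    order = sorted(range(len(root_gains)), key=lambda k: abs(root_gains[k]))
    return sorted(order[:n_full]), sorted(order[n_full:])

def root_candidates(root_gains, n_full, m, s):
    """Root hypotheses explored on each tone: M on the worst tones, S^2 elsewhere."""
    full, _ = split_carriers(root_gains, n_full)
    full = set(full)
    return [m if k in full else s * s for k in range(len(root_gains))]
```

Whatever the channel realization, the total number of root hypotheses per OFDM symbol is fixed, which is what keeps the latency deterministic.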

Let us focus on the QAM symbols transmitted by the same antenna within an OFDM symbol. Define the parallelism $P_{lc}$ as the minimum number of DFE processors (8) able to exhaustively analyze the worst $N_{\text{full}}$ carriers only, while limiting the search on the remaining tones to a square subset of cardinality $S^2$, as in Figure 2.

Assume that each DFE processing (8) takes $T_{\text{elem}}$ clock cycles. Thus, the low-complexity LORD requires a number $P_{lc}$ of elementary processing units per antenna such that

$$T_{\text{elem}} \left\lceil \frac{N_{\text{full}}\,M + (N_{dc} - N_{\text{full}})\,S^2}{P_{lc}} \right\rceil \le T_{\max} \quad (25)$$

As a special case, when $N_{\text{full}} = N_{dc}$ we obtain the full-complexity LORD parallelism $P_{\text{full}}$, satisfying $T_{\text{elem}} \lceil N_{dc}M/P_{\text{full}} \rceil \le T_{\max}$.

For example, assume that $T_{\max} = 6N_{dc}T_{\text{elem}}$, i.e., on average 6 clock cycles are allowed to complete the detection process for each antenna permutation at each data carrier.b With $N_t = 2$, $N_{dc} = 52$, and a 64-QAM constellation, Figure 3 plots the maximum number of fully-spanned tones (solid lines) as a function of the parallelism $P_{lc}$ and of the square subset side $S$. Clearly, the higher the number of processing units, the higher the number of exhaustive searches we can afford. On the contrary, the larger the square side $S$, the lower the number of full-complexity detections.
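Inequality (25) can be turned into a small search for the largest admissible $N_{\text{full}}$. The sketch below assumes the budget $T_{\max} = 6N_{dc}T_{\text{elem}}$ of the example, $M = 64$ and $N_{dc} = 52$, and ignores the row/column constraint discussed next, so it illustrates (25) only; function names are hypothetical.

```python
def max_full_carriers(p_lc, s, m=64, n_dc=52, cycles_per_tone=6):
    """Largest N_full satisfying (25), i.e.
    ceil((N_full*M + (N_dc - N_full)*S^2) / P_lc) <= T_max / T_elem,
    with T_max = cycles_per_tone * N_dc * T_elem. Returns None if N_full = 0 already fails."""
    budget = cycles_per_tone * n_dc                    # T_max / T_elem
    best = None
    for n_full in range(n_dc + 1):
        work = n_full * m + (n_dc - n_full) * s * s    # DFE runs per antenna permutation
        if -(-work // p_lc) <= budget:                  # integer ceiling division
            best = n_full
    return best

def min_full_parallelism(m=64, n_dc=52, cycles_per_tone=6):
    """Smallest P_full with ceil(N_dc * M / P_full) <= T_max / T_elem."""
    budget = cycles_per_tone * n_dc
    p = 1
    while -(-(n_dc * m) // p) > budget:
        p += 1
    return p
```

With $P_{lc} = 4$ and $S = 4$ the search returns $N_{\text{full}} = 8$, the configuration adopted in Section 4.4, whereas under the same budget the special case $N_{\text{full}} = N_{dc}$ requires $P_{\text{full}} = 11$.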

To efficiently compute not only (8) but also (10), we impose further regularity on the hardware, assuming that at each clock cycle the device can work only on one row or column of the square.

When $P_{lc} \ge S$, as in Figure 4a, the square is always processed in the same direction, e.g., by columns. On the contrary, when $P_{lc} < S$, the square is processed through a tessellation, as in Figure 4b. It can be shown that the clock cycles required by this kind of processing, with a reasoning analogous to (25), lead to a smaller number of available fully-spanned carriers, as reported in Figure 3 (dashed lines). Curves for the tessellation case have been plotted only up to $S = 6$, which is the limit case: with 6 clock cycles (on average), the detection time is exhausted by performing the low-complexity algorithm on every carrier (i.e., $N_{\text{full}} = 0$). If $S > 6$, the overall OFDM time constraints of our example cannot be met. Nevertheless, it will be shown in the next section that $S = 5$ is enough for all the cases of practical interest we have tested. This means that the ML performance can be approached with $S^2/M = 25/64 \approx 40\%$ of the original LORD complexity.

4.2 Extension to T-LORD

From the description of LORD and T-LORD outlined in Section 3, it is clear that the LORD algorithm is actually a special case of T-LORD, when all the a priori LLRs $\tilde{\lambda}_t(n)$ are zero. Indeed, in this case there is no point in applying the a priori criterion, since every symbol has the same a priori probability, equal to $1/M$. Only the distance criterion makes sense, and its DFE process is actually the same as (8). Finally, having just one meaningful candidate per set $\mathcal{U}_t(\bar{x})$, the K-best approach also becomes superfluous. To summarize, the distance criterion in T-LORD works as the LORD algorithm. For this reason we can extend the hardware-oriented LORD simplification, presented in the previous section, to T-LORD.

Figure 3 Maximum number of fully-spanned subcarriers, depending on the desired parallelism (in abscissa) and on square side length (different lines). [Legend: S = 3 to 7; dashed lines refer to tessellation.]

Figure 4 Processing of a 5 × 5 QAM constellation by a regular hardware. (a) with parallelism 6; (b) with parallelism 3.

Basically, the full T-LORD algorithm is performed only for the most attenuated carriers. For the remaining ones, the DFE process is run only for a subset of root-layer symbols: the candidate sets $\mathcal{U}_t(\bar{x})$ are not determined for every hypothesis $\bar{x}$, but only for those belonging to a properly chosen square subset of the QAM constellation. The only difference with respect to LORD lies in how the square subset of cardinality $S^2$ is chosen. In principle, one could find it at the first iteration, as shown in Section 4, and keep it unchanged over the iterations. Though quite attractive, this approach is potentially harmful, since it could inhibit the influence of the a priori LLRs on the detector outputs if the search gets stuck in a bad subset of root-layer symbols that does not contain the transmitted one. An effective solution is to let the a priori information drive the choice of the square subset, in conjunction with the observed signal $y_{N_t}(t)$. We compute the linear MMSE (L-MMSE) estimate of the transmitted symbol on the root layer, performing a weighted maximal ratio combining (MRC) of the equalized received signal $x_D(t)$ and of the a priori expected symbol $x_A(t)$:

$$x_D(t) = \frac{y_{N_t}(t)}{r_{N_t,N_t}(t)} \quad (27)$$

$$\sigma_D^2(t) = \frac{N_0}{|r_{N_t,N_t}(t)|^2} \quad (28)$$

$$x_A(t) = \sum_{j=1}^{M} \chi_j \prod_{i=1}^{2N_b} \frac{\exp\big((1 - b_i(\chi_j))\,\tilde{\lambda}_t(i)\big)}{1 + \exp\big(\tilde{\lambda}_t(i)\big)} \quad (29)$$

$$\sigma_A^2(t) = \sum_{j=1}^{M} |\chi_j - x_A(t)|^2 \prod_{i=1}^{2N_b} \frac{\exp\big((1 - b_i(\chi_j))\,\tilde{\lambda}_t(i)\big)}{1 + \exp\big(\tilde{\lambda}_t(i)\big)} \quad (30)$$

$$\hat{x}(t) = \frac{\sigma_A^2(t)\,x_D(t) + \sigma_D^2(t)\,x_A(t)}{\sigma_D^2(t) + \sigma_A^2(t)} \quad (31)$$

where $\chi_1, \dots, \chi_M$ denote the points of the QAM constellation. In the case of null a priori information, $\sigma_A^2(t) = \sigma_X^2$ and the square subset choice is practically the same as for LORD, since $\sigma_X^2 = 1$ is typically much greater than $\sigma_D^2(t)$. Conversely, when the a priori information is high, the received symbol is practically ignored in the calculation of $\hat{x}(t)$, since $\sigma_A^2(t)$ is small. Finally, we remark that (29) and (30) can be efficiently computed with techniques similar to [15].
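The choice of the square-subset center through (27)-(31) can be sketched as follows. It assumes independent a priori bits, a unit-energy Gray square QAM with in-phase bits first, and the LLR sign convention used above (positive values favor bit 0); it is a plain enumeration over the $M$ points rather than the efficient techniques of [15], and all names are hypothetical.

```python
import math

def gray_pam(nb):
    """(level, bits) pairs of Gray-labelled 2^nb-PAM, levels 2k - (2^nb - 1)."""
    n = 2 ** nb
    return [(2 * k - (n - 1), [((k ^ (k >> 1)) >> (nb - 1 - i)) & 1 for i in range(nb)])
            for k in range(n)]

def unit_qam(nb):
    """Unit-energy square QAM as (point, bits) pairs, in-phase bits first (assumed)."""
    pam = gray_pam(nb)
    pts = [(complex(a, b), ba + bb) for a, ba in pam for b, bb in pam]
    scale = math.sqrt(sum(abs(p) ** 2 for p, _ in pts) / len(pts))
    return [(p / scale, bits) for p, bits in pts]

def apriori_moments(const, llr):
    """(29)-(30): mean x_A and variance sigma_A^2 of the root symbol, assuming
    independent bits with llr[i] = ln P(b_i=0)/P(b_i=1)."""
    mean, power = 0j, 0.0
    for point, bits in const:
        prob = 1.0
        for b, l in zip(bits, llr):
            prob *= math.exp((1 - b) * l) / (1.0 + math.exp(l))
        mean += prob * point
        power += prob * abs(point) ** 2
    return mean, power - abs(mean) ** 2      # E|x|^2 - |E x|^2, equal to (30)

def square_center(y_root, r_root, n0, const, llr):
    """(27), (28), (31): center x_hat(t) of the square subset at the root layer."""
    x_d = y_root / r_root                    # (27) equalized root-layer sample
    var_d = n0 / abs(r_root) ** 2            # (28) its noise variance
    x_a, var_a = apriori_moments(const, llr)
    return (var_a * x_d + var_d * x_a) / (var_d + var_a)      # (31)
```

With all-zero LLRs the center reduces to $x_D(t)/(1 + \sigma_D^2(t))$, close to the LORD choice; with very reliable LLRs it moves onto the a priori symbol.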

4.3 Related issues

As the constellation search is restricted at the root layer, there is no guarantee that $\mathcal{S}_t$ contains at least one candidate symbol for each value of every bit of $x_t$. In this case, if the crossing processes (11) or (20) do not recover one in $\mathcal{S}'_t$, one of the two terms in (10) and (14) is missing for that particular bit. Clipping approximations, like assigning a fixed (finite or infinite) value to its LLR based on the hard-decided ML or MAP symbol, are not completely satisfactory. Nevertheless, for a Gray 64-QAM constellation this approximation is required only when $S < 5$. In fact, as shown in Figure 5, if we consider five or more adjacent symbols of an 8-PAM Gray constellation, we are guaranteed to span at least one symbol for each possible bit value.
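The property illustrated in Figure 5 can be verified exhaustively. The sketch below builds the binary-reflected Gray labels of a $2^{N_b}$-PAM (for $N_b = 3$ they match the labels of Figure 5) and finds the smallest window size $S$ such that every window of $S$ adjacent levels contains both values of every bit; names are hypothetical.

```python
def gray_labels(nb):
    """Bit labels of Gray-mapped 2^nb-PAM, from the leftmost to the rightmost level."""
    return [[((k ^ (k >> 1)) >> (nb - 1 - i)) & 1 for i in range(nb)]
            for k in range(2 ** nb)]

def covers_all_bits(labels):
    """True if, for every bit position, both 0 and 1 occur among the given labels."""
    return all({lab[i] for lab in labels} == {0, 1} for i in range(len(labels[0])))

def min_safe_side(nb):
    """Smallest S such that EVERY window of S adjacent levels covers both bit values."""
    labels = gray_labels(nb)
    n = len(labels)
    for s in range(1, n + 1):
        if all(covers_all_bits(labels[k:k + s]) for k in range(n - s + 1)):
            return s
    return None
```

For $N_b = 3$ it returns 5, consistent with Figure 5: any four adjacent 8-PAM levels on one half of the constellation share the same most significant bit. For $N_b = 2$ (16-QAM), three adjacent levels suffice.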

Another problem is how to efficiently find the square subset of Figure 2a. An efficient solution is to apply, for each dimension, the Euchner-Schnorr "zig-zag" algorithm [29], which determines the symbol closest to the received one (or to the estimate $\hat{x}(t)$) and then alternately adds points on its left and right, until the square subset side is reached, without exceeding the boundaries of the constellation.
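A per-dimension zig-zag selection of the square subset can be sketched as below. It assumes the center (the equalized received symbol or $\hat{x}(t)$) has already been scaled to unnormalized PAM units, where the levels are $\pm1, \pm3, \dots$; it starts from the sliced level, steps first toward the side where the center lies, then alternates, continuing on one side once the other reaches the constellation boundary. Names are hypothetical.

```python
def zigzag_window(z, n_levels, s):
    """Indices of s adjacent PAM levels around the real value z.

    Level k sits at 2k - (n_levels - 1). The search starts from the sliced level,
    moves first toward the side where z lies, then alternates left/right.
    """
    k0 = min(max(int(round((z + n_levels - 1) / 2.0)), 0), n_levels - 1)
    chosen = [k0]
    left, right = k0 - 1, k0 + 1
    go_right = z >= 2 * k0 - (n_levels - 1)
    while len(chosen) < s and (left >= 0 or right < n_levels):
        if (go_right and right < n_levels) or left < 0:
            chosen.append(right)
            right += 1
        else:
            chosen.append(left)
            left -= 1
        go_right = not go_right
    return sorted(chosen)

def square_subset(center, n_levels, s):
    """S x S root-layer candidates as (in-phase, quadrature) level-index pairs."""
    rows = zigzag_window(center.real, n_levels, s)
    cols = zigzag_window(center.imag, n_levels, s)
    return [(i, q) for i in rows for q in cols]
```

On a uniform grid this alternation visits the levels in order of increasing distance from the center until a boundary is hit, so the selected window always contains the S levels nearest to it.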

4.4 LORD and T-LORD complexity

In this section we discuss the complexity of the proposed hardware-oriented LORD and T-LORD. A simple measure of the complexity of any detector is the number of spanned modulated symbols, i.e., the number of EDs to compute. Indeed, this is approximately proportional to the number of multiplications (usually more expensive than additions in hardware). For example, one could compare the ML receiver with LORD and MMSE. With the above definition, the ML complexity is larger than$^{\mathrm{c}}$ $M^{N_t}$. Conversely, LORD evaluates

$$C_L = M N_t (N_t - 1) \quad (32)$$

EDs, while MMSE essentially requires 2 points per coded bit (the Gray mapping regularity can be exploited as in [30], by "folding" the constellation), i.e., $2N_t\log_2 M$ in total. Though the number of EDs is only a preliminary tool to evaluate complexity, it reveals that the ML cost is exponential in the number of antennas, practically unaffordable even for small arrays, while the LORD complexity is only quadratic. Analogous considerations hold for iterative detectors.
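The ED counts quoted above can be tabulated with a few lines; note that the ML figure is only the $M^{N_t}$ lower bound mentioned in the text. A minimal sketch with hypothetical names:

```python
import math

def eds_ml(m, nt):
    """Exhaustive ML: at least one ED per hyper-symbol, M^Nt."""
    return m ** nt

def eds_lord(m, nt):
    """Eq. (32): C_L = M * Nt * (Nt - 1)."""
    return m * nt * (nt - 1)

def eds_mmse(m, nt):
    """MMSE with Gray folding: 2 points per coded bit, 2 * Nt * log2(M)."""
    return 2 * nt * int(math.log2(m))
```

For $M = 64$ and $N_t = 4$ this gives 16,777,216 EDs for ML, 768 for LORD and 48 for MMSE.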

Here, we focus on the complexity reduction of the simplified LORD and K-best T-LORD w.r.t. the "original" ones. For an exhaustive complexity analysis of all the T-LORD versions, as well as of other detector families, we refer the reader to [23]. As reported therein, the K-best T-LORD (with $K = 1$ and all the enhancements enabled) computes

$C_{\mathrm{KTL}} = 3 M N_t (N_t - 1) \qquad (33)$

EDs. From (32) and (33), it is clear that, when the constellation cardinality M is large, it is the dominant contribution to the LORD and T-LORD complexity.

Figure 5 Five consecutive 8-PAM symbols with Gray mapping provide at least one candidate for each possible bit value. (Labels, left to right: 000 001 011 010 110 111 101 100.)

The simplification proposed in the previous subsections reduces that factor, and the complexity (averaged over the frequency tones) becomes

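$C_{\mathrm{LC-L}} = \frac{N^{\mathrm{L}} M + (N_{dc} - N^{\mathrm{L}})\, S^2}{N_{dc}}\, N_t (N_t - 1) \qquad (34)$

$C_{\mathrm{LC-KTL}} = 3\, \frac{N^{\mathrm{TL}} M + (N_{dc} - N^{\mathrm{TL}})\, S^2}{N_{dc}}\, N_t (N_t - 1) \qquad (35)$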
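As a numerical check, the sketch below evaluates (33)-(35) for the configuration used in Table 1 (M = 64, Nt = 4, S = 4, 8 fully searched carriers out of 52 data carriers, the latter from endnote b). The leading factor 3 in the low-complexity K-best T-LORD count is inferred from (33) and reproduces the 842 spanned symbols listed in Table 1; the function names are hypothetical.

```python
def avg_root_symbols(m, s, n_full, n_dc):
    """Root-layer symbols averaged over tones: full search (M) on n_full worst
    carriers, S*S square subset on the remaining n_dc - n_full carriers."""
    return (n_full * m + (n_dc - n_full) * s ** 2) / n_dc


def lc_lord_eds(m, nt, s, n_full, n_dc=52):
    """Eq. (34): low-complexity LORD."""
    return avg_root_symbols(m, s, n_full, n_dc) * nt * (nt - 1)


def kbest_tlord_eds(m, nt):
    """Eq. (33): K-best T-LORD (K = 1, all enhancements enabled)."""
    return 3 * m * nt * (nt - 1)


def lc_kbest_tlord_eds(m, nt, s, n_full, n_dc=52):
    """Eq. (35): low-complexity K-best T-LORD (same factor 3 as Eq. (33))."""
    return 3 * avg_root_symbols(m, s, n_full, n_dc) * nt * (nt - 1)


if __name__ == "__main__":
    print(kbest_tlord_eds(64, 4))                            # 2304
    print(round(lc_kbest_tlord_eds(64, 4, s=4, n_full=8)))   # 842
```

With no fully searched carrier the average collapses to S², and with all 52 carriers fully searched (34) falls back to (32).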

To strengthen the above analysis, we study the number of multiplications, additions and comparisons, also distinguishing those performed just once (such as the QR decomposition) from those to be repeated for every detection process, i.e., for a single tone, OFDM symbol and iteration. Results for these fixed and variable operations are reported in Table 1 for the most complex case that we have investigated, i.e., M = 64 and Nt = 4. The table also includes the complexity of the soft-output generation stage. For completeness, the SIC-MMSE [15], the full-complexity T-LORD [23] and a sphere detector (SD) are also reported. Among the different SD families, a breadth-first list detector has been considered, since it guarantees deterministic complexity and latency, as T-LORD does. The list size is K = 36, chosen to achieve a performance close to that of T-LORD.

Focusing on the low-complexity K-best T-LORD, we have chosen a reduced square subset of side S = 4, the smallest available parallelism Plc = 4, and a full search over $N^{\mathrm{TL}} = 8$ carriers. The square subset center is driven both by the received symbols and by the a-priori LLRs. This solution is very attractive, since the loss w.r.t. the full-complexity T-LORD vanishes after a few iterations, as shown in the next section. Furthermore, since the square subset is 4x4, the same hardware could also be used to detect a 16-QAM constellation at negligible incremental cost. As we can see in Table 1, the low-complexity K-best T-LORD saves fewer multiplications and additions than the reduction in computed EDs would suggest. Indeed, there are additional operations to perform, e.g., (29)-(31) and the identification of the $N^{\mathrm{TL}}$ worst carriers. The crossing between candidate sets S_t also limits the complexity reduction, since it must be performed over the entire constellation, and not only among points of the square subset. Nevertheless, the simplified T-LORD greatly benefits from the reduction of the spanned symbols, and almost halves the number of required variable multiplications. The number of variable additions is also slightly smaller (the a-priori criterion remains almost unchanged).

Table 1 Number of real operations per tone and iteration (Nt = 4, M = 64)

                      Spanned    Multiplications     Additions & Comparisons
Detector              symbols    Fixed   Variable    Fixed   Variable
SIC-MMSE                   56      672        372      536       2077
Sphere detector          6976      744       4414      583      19456
LC K-best T-LORD          842      848        616      971      22278
K-best T-LORD            2304      832       1082      571      28919
T-LORD Full              9984      832       7994      571      83495

To conclude, we remark that not only the device area (i.e., the number of required logic gates) but also the power consumption benefits from the simplification proposed in this paper, as reported in [31]. E.g., in the case of LORD with Nt = Nr = 2 and M = 64, assuming a 65 nm CMOS technology with an 80 MHz clock frequency, the area is reduced from 0.64 mm² to 0.21 mm² and the power from 38 to 14 mW. A comparison with prior designs can also be found in [31].d

5 Simulation results

In this section, we provide performance results, both in terms of extrinsic information delivered by several detectors and Monte Carlo simulation of the receivers embedding them.

We assume two different environments, referred to in the following as the "ideal" and the "real" one. First, we use a rich scattering channel, whose coefficients h_{r,t} are i.i.d. complex Gaussian values with unit power; perfect CSI at the receiver side is also assumed. Then we consider a more realistic channel, with an exponentially decaying power delay profile (PDP) and a short delay spread τ_rms = 50 ns (equal to the sampling time). Spatial correlation is assumed equal to $r(t_1, t_2) = 0.5^{|t_1 - t_2|}$, where t1 and t2 are the antenna indexes, sorted in ascending order from one end of the linear array. The perfect CSI hypothesis is abandoned and replaced by pilot-aided, tone-by-tone channel estimation (CE): owing to the averaging of subsequent orthogonal long training sequences (LTS), as in [32], each channel tap estimate h_{r,t} is affected by i.i.d. Gaussian noise with power

$\sigma_{CE}^2 = 2^{-\lceil \log_2 N_t \rceil}\, N_0$

No frequency smoothing, such as [32,33], is adopted, since this would have reduced the difference between the ideal and real settings.
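A minimal sketch of the "real" channel ingredients follows. The exponential correlation rule and the 50 ns delay spread with 50 ns sampling come from the text; the number of taps (8 in the usage below) and the use of a Kronecker model, applying the same correlation rule at both the transmit and receive arrays, are assumptions made only for illustration, as are the function names.

```python
import numpy as np


def exp_correlation(n, rho=0.5):
    """Exponential correlation matrix R[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def exp_pdp(n_taps, tau_rms=50e-9, t_s=50e-9):
    """Exponentially decaying PDP sampled every t_s, normalized to unit total power."""
    p = np.exp(-np.arange(n_taps) * t_s / tau_rms)
    return p / p.sum()


def correlated_channel_taps(nr, nt, n_taps, rng, rho=0.5):
    """Tap matrices H_k = sqrt(p_k) * L_r G_k L_t^T, where L L^T = R and G_k has
    i.i.d. unit-power complex Gaussian entries (Kronecker model, assumed)."""
    l_r = np.linalg.cholesky(exp_correlation(nr, rho))
    l_t = np.linalg.cholesky(exp_correlation(nt, rho))
    taps = []
    for p_k in exp_pdp(n_taps):
        g = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
        taps.append(np.sqrt(p_k) * (l_r @ g @ l_t.T))
    return np.stack(taps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(exp_correlation(3))
    print(correlated_channel_taps(3, 3, 8, rng).shape)  # (taps, Nr, Nt)
```

Since τ_rms equals the sampling time, consecutive tap powers decay by a factor e per sample, so a handful of taps already captures almost all of the energy.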

5.1 EXIT charts

In an iterative receiver, a detector is rated on its capability to transfer extrinsic information to the decoder. EXIT charts are an effective tool to predict the convergence behavior of iterative systems [34] and to design component codes [27,35], even in the case of (possibly MIMO) selective channels. The EXIT analysis assumes independent a-priori LLRs, drawn at random from some probability density function (pdf), often modeled as the output of an AWGN channel with variance equal to twice the mean. The output pdf p(l) of the extrinsic LLRs is generally sampled experimentally. The mutual information for a consistent pdf is [36]

$I = 1 - \int_{-\infty}^{+\infty} p(l)\, \log_2\left(1 + e^{-l}\right) dl \qquad (36)$
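In practice, (36) is usually evaluated as a sample average over simulated LLRs. The sketch below assumes LLRs conditioned on (or sign-corrected to) the bit value +1 and draws the a-priori LLRs from the consistent Gaussian model mentioned above (mean σ²/2, variance σ²); the function names are hypothetical.

```python
import numpy as np


def gaussian_apriori_llrs(sigma, n, rng):
    """Consistent Gaussian LLRs for a transmitted bit mapped to +1:
    mean sigma**2 / 2 and variance sigma**2 (variance equal to twice the mean)."""
    return rng.normal(loc=sigma ** 2 / 2.0, scale=sigma, size=n)


def mutual_information(llrs):
    """Sample estimate of Eq. (36): I = 1 - E[log2(1 + exp(-l))],
    with the LLRs conditioned on (or sign-corrected to) the bit value +1."""
    # np.logaddexp(0, -l) = ln(1 + e^{-l}), computed without overflow
    return 1.0 - np.mean(np.logaddexp(0.0, -llrs)) / np.log(2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for sigma in (0.5, 1.0, 2.0, 4.0):
        print(sigma, mutual_information(gaussian_apriori_llrs(sigma, 100_000, rng)))
```

Sweeping σ gives the a-priori information axis of an EXIT chart; applying `mutual_information` to the detector's extrinsic output LLRs gives the corresponding ordinate.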

In the case of serial concatenation between the detector and the decoder, the quality of the detector output can be evaluated by looking at the leftmost value in the graph, corresponding to the absence of a-priori information (see, e.g., Figure 6). In turbo receivers, on the contrary, one can track the system convergence by overlaying the charts of the two iterative modules (with exchanged axes), since the output of one becomes the input of the other, and so on.

In Figure 6, we plot (both in ideal and realistic channel conditions) the EXIT curve of the hardware oriented T-LORD developed so far and, as references, the SIC-MMSE [9] and the MAP.

We choose SNR = 23 dB, corresponding to a target packet error rate (PER) close to 10^-2. The T-LORD algorithm produces at its output almost the same information as the MAP detector, for any input information I_A. On the contrary, SIC-MMSE is largely suboptimal and is expected to introduce severe losses in an iterative receiver. Monte Carlo simulations will confirm these predictions.

In Figure 7 we further compare the EXIT curves of different T-LORD detectors. As we can see, the gap between the hardware oriented T-LORD with S = Plc = 5 and the full-complexity T-LORD is small. Conversely, the case S = Plc = 4 without the update of the square subset center leads to some information loss.

An interesting choice is S = Plc = 4 with the update enabled (denoted in the figures as "moving square"), whose delivered information changes abruptly as the a-priori information grows. When no a-priori information is available, the reduced square is positioned based only on the received, noisy symbol; therefore, when the transmitted symbol lies outside the reduced square, some extrinsic information is lost. Conversely, when the square positioning can benefit from a-priori information, a relevant fraction of the extrinsic information is recovered. This closes the gap iteration by iteration, as confirmed by the simulations below. Similar results hold for the realistic channel.

Figure 6 EXIT charts for different detectors (SNR = 23 dB). Curves: MAP, T-LORD (moving square, S = Plc = 5) and SIC-MMSE, for ideal and real channels.

Figure 7 EXIT charts for 3 x 3 T-LORD detectors (ideal and real channels, SNR = 23 dB). Curves: T-LORD full complexity, moving square S = Plc = 5, moving square S = Plc = 4 and fixed square S = Plc = 4 (ideal channel); T-LORD full complexity, moving square S = Plc = 5 and fixed square S = Plc = 4 (real channel).

Figure 8 Detectors performance for a MIMO system with Nt = Nr = 2, Rc = 5/6 and 64-QAM (10 bit/carrier), with ideal channel conditions. Curves: T-LORD full complexity, T-LORD moving square S = Plc = 4, T-LORD fixed square S = Plc = 4, SIC-MMSE; x-axis: SNR [dB].

5.2 Monte Carlo simulations

In this section we provide simulation results for the low-complexity LORD detectors described above, with different square sides and parallelisms. For comparison, we also plot the MAP, MMSE [9] and full-complexity T-LORD [23] curves. Simulations are floating-point. Iterations range from 1 to 4. Iteration 1 means that no extrinsic information is available to the detector, i.e., LORD and T-LORD coincide, as do MAP and ML. Aiming to achieve very high spectral efficiencies, up to 15 bit/carrier, and to test the simplified T-LORD in challenging conditions, we always consider a channel code rate Rc = 5/6, i.e., the most sensitive 64-QAM mode in the 802.11n standard [1].
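The bit/carrier figures in the captions follow from Nt · log2(M) · Rc, and the 260 Mb/s quoted in the conclusions follows from the 52 data carriers and 4 µs OFDM symbol of endnote b. The short sketch below only reproduces that arithmetic; the function names are illustrative.

```python
import math


def bits_per_carrier(nt, m, rc):
    """Information bits per subcarrier and channel use: Nt * log2(M) * Rc."""
    return nt * math.log2(m) * rc


def phy_throughput(bits_pc, n_data=52, t_symbol=4e-6):
    """Throughput in bit/s, assuming 52 data carriers and a 4 us OFDM symbol (endnote b)."""
    return bits_pc * n_data / t_symbol


if __name__ == "__main__":
    for nt in (2, 3, 4):
        b = bits_per_carrier(nt, 64, 5 / 6)
        print(nt, round(b), phy_throughput(b) / 1e6, "Mb/s")
```

With 64-QAM and Rc = 5/6, each transmit antenna contributes 5 bit/carrier, giving 10, 15 and 20 bit/carrier for Nt = 2, 3 and 4.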

Figures 8 and 9 plot PER vs SNR for the cases Nt = Nr = 2 and Nt = Nr = 3, respectively. The target PER has been set to 10^-2, a common assumption in wireless LAN communications when retransmission is allowed.e Here, we assume ideal channel conditions (i.e., Gaussian uncorrelated channels with perfect CSI at the receiver).

As we can see, the T-LORD performance is close to that of the MAP detector, and largely outperforms the MMSE receiver, which is plagued by ill-conditioned channel matrices. Only the T-LORD with S = Plc = 4 and fixed subset choice at the root layer shows a modest loss, about 0.2 dB with respect to the full-complexity T-LORD.

Figure 10 Detectors performance for a MIMO system with Nt = Nr = 3, Rc = 5/6 and 64-QAM (15 bit/carrier), with realistic channel conditions.

The T-LORD robustness w.r.t. MMSE becomes even more pronounced in Figure 10, assuming the realistic channel conditions described at the beginning of this section. Part of the SNR gap between ideal and realistic conditions can be ascribed to the noisy (tone-by-tone, ZF) channel estimates, computed by exploiting the orthogonal preambles in [1]. The estimation error can be interpreted as additional noise over the link, and one can expect an overall performance degradation

$\frac{N_0 + N_t\, \sigma_{CE}^2}{N_0} = 1 + \frac{N_t}{2^{\lceil \log_2 N_t \rceil}}$

equal to 3 dB when Nt = 2 or Nt = 4, and slightly smaller (2.4 dB) when Nt = 3, since in that case the 802.11n preamble contains one more LTS than the number of transmitting antennas. The remaining 3 dB loss, which one would experience even with ideal CE, is due to the severe channel described in Section 2, with an exponentially decaying PDP, short delay spread and spatial correlation. In this challenging case with less spatial diversity, the MMSE receiver completely fails to improve with iterations, while the full-complexity K-best T-LORD misses the MAP performance by only 0.5 dB. This gap is probably due to error propagation in the DFE process.
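The 3 dB and 2.4 dB figures can be reproduced directly from the estimation-noise power σ_CE² = N0 / 2^⌈log2 Nt⌉. The sketch below assumes, as in the text, that the number of averaged LTS equals 2^⌈log2 Nt⌉ (1, 2, 4 and 4 for Nt = 1 to 4); the helper names are hypothetical.

```python
import math


def n_lts(nt):
    """Number of averaged long training sequences, 2**ceil(log2 Nt)."""
    return 2 ** math.ceil(math.log2(nt))


def ce_noise_power(n0, nt):
    """Per-tap channel estimation noise power sigma_CE^2 = N0 / 2**ceil(log2 Nt)."""
    return n0 / n_lts(nt)


def ce_degradation_db(nt, n0=1.0):
    """SNR loss 10*log10((N0 + Nt * sigma_CE^2) / N0) caused by noisy channel estimates."""
    return 10 * math.log10((n0 + nt * ce_noise_power(n0, nt)) / n0)


if __name__ == "__main__":
    for nt in (2, 3, 4):
        print(nt, n_lts(nt), round(ce_degradation_db(nt), 2))
```

For Nt = 3 the extra LTS lowers the estimation noise to N0/4, so the degradation factor is 1.75 rather than 2.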

For completeness, in Figures 11 and 12 we also report simulations for the case Nt = Nr = 4, for both ideal and realistic channels.f In this case, the extremely time-consuming MAP detector has been replaced by a lower bound, assuming that the MAP receiver can fail only when the full-complexity T-LORD also fails to recover the message.

Focusing now on LORD detectors, S = Plc = 5 is enough to approach the optimal ML detector performance, as the full-complexity LORD does, while with S = Plc = 4 LORD suffers some performance degradation. This can be explained by a higher probability that noise pushes the transmitted symbol outside the square subset borders, or that the soft-output generation misses some EDs in (10) or (14). Thus, the former parameters have been chosen for the HDL implementation of LORD [31]. Conversely, at the fourth iteration, the above gap is almost closed by the S = Plc = 4 T-LORD with the square subset center update (as predicted by the EXIT charts), which performs even better than the case S = Plc = 5 with fixed subset.

To conclude, we refer readers interested in a performance comparison with sphere detectors to [23]. The results therein show that T-LORD achieves better performance while computing fewer EDs in every simulated case.

Figure 11 Detectors performance for a MIMO system with Nt = Nr = 4, Rc = 5/6 and 64-QAM (20 bit/carrier), with ideal channel conditions.

6 Conclusions

In this article, we have proposed innovative hardware-oriented, soft-output LORD and T-LORD algorithms, which greatly reduce the number of parallel elementary processing units required to meet the latency constraints in MIMO-OFDM systems when high-cardinality QAM constellations are deployed. The simplified versions preserve the features of the original algorithms, i.e., fixed complexity, deterministic latency and a remarkably parallel structure. The proposed solution is regular, scalable and does not require any ad hoc parameter tuning, e.g., depending on the experienced average SNR or on the actual channel realization.

Besides, the loss in 802.11n systems w.r.t. the ML and MAP detectors is very small (a few tenths of a dB). We tried several configurations up to 20 bit/carrier (Nt = 4), corresponding to a system throughput of 260 Mb/s in the 802.11n standard. We also tested the system with very noisy channel estimates, as well as with a more realistic channel offering less spatial and frequency diversity due to correlation. In each case, the simplified LORD and T-LORD showed good robustness, outperforming the non-iterative ML and the iterative SIC-MMSE receivers, and always approaching the receiver with the ideal detector. These features make LORD and T-LORD good candidates for VLSI MIMO receivers.

Figure 12 Detectors performance for a MIMO system with Nt = Nr = 4, Rc = 5/6 and 64-QAM (20 bit/carrier), with realistic channel conditions. Curves: MAP (lower bound), T-LORD full complexity, T-LORD fixed square S = Plc = 5, T-LORD moving square S = Plc = 4, T-LORD fixed square S = Plc = 4, SIC-MMSE; x-axis: SNR [dB].

Endnotes

a For brevity, we give a simplified description of the LORD algorithm. For more details, refer to [21] and [22], where a real-domain modified QR decomposition avoids normalizing the columns of Q. Nevertheless, the low-complexity, hardware-oriented LORD and T-LORD presented in this article can also be applied to that framework, as shown in [31].

b Obviously this value depends on the hardware, but it is reasonable for an FPGA device (with a clock frequency of tens of MHz) aiming to process in real time an OFDM symbol lasting 4 µs and carrying 52 data carriers, as in [1].

c Note that the measure does not refer to the number of Nt-dimensional hyper-symbols, but to the number of QAM symbols spanned throughout the algorithm.

d A fair comparison of area and power consumption is hard to achieve, since many parameters change from one design to another (e.g., clock frequency, CMOS technology, modulations, antennas, soft-output generation). Nevertheless, [31] shows that LORD provides a very good trade-off in every case.

e E.g., the standardization group for 802.11n chose PER = 10^-2 for performance comparison purposes.

f A direct performance comparison between systems with a different number of antennas is hard to achieve. The SNR at which the system meets the target PER changes as the number of antennas grows, and its trend is hard to foresee, for at least two reasons. On the one hand, we exploit the capacity growth to increase the throughput, not to strengthen the communication. On the other hand, a larger number of antennas makes the data packet shorter, since more information is conveyed at each channel use: this reduces the PER for a given SNR.

Acknowledgements

We warmly thank Eng. Teo Cupaiuolo for the VLSI design of LORD.

Author details

1CNR-IEIIT Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy 2STMicroelectronics, Agrate Brianza (MB), Italy 3Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy

Competing interests

This work has been partially supported by the Advanced System Technology Group of STMicroelectronics, via Olivetti 2, 20864 Agrate Brianza (MB), Italy. Some solutions are protected by US and EU patents.

Received: 14 May 2011 Accepted: 24 February 2012 Published: 24 February 2012

References

1. Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specification-Amendment 5: Enhancements for Higher Throughput, IEEE 802.11n-2009, Std (Oct 2009)

2. WiMAX Forum Mobile System Profile Release 1.0 Approved Specification, Revision 1.5.0: 2007-11-17, Std (Nov 2005)

3. Evolved universal terrestrial radio access (EUTRA) and evolved universal terrestrial radio access network (EU-TRAN); overall description, 3GPP TS36.300 V9.0.0, Tech. Rep (June 2009)

4. E Telatar, Capacity of multi-antenna gaussian channels. Eur Trans Telecommun. 10(6), 585-595 (1999). doi:10.1002/ett.4460100604

5. GJ Foschini, M Gans, On limits of wireless communications in a fading environment when using multiple antennas. Wirel Pers Commun. 6(3), 311-335 (1998). doi:10.1023/A:1008889222784

6. S ten Brink, Convergence of iterative decoding. Electron Lett. 35(10), 806-808 (1999). doi:10.1049/el:19990555

7. C Berrou, A Glavieux, Near optimum error correcting coding and decoding: turbo-codes. IEEE Trans Commun. 44(10), 1261-1271 (1996). doi:10.1109/26.539767

8. M Tüchler, AC Singer, R Koetter, Minimum mean squared error equalization using a priori information. IEEE Trans Signal Process. 50(3), 673-683 (2002)

9. D Zuyderhoff, X Wautelet, A Dejonghe, L Vandendorpe, MMSE turbo receiver for space-frequency bit-interleaved coded OFDM, in Proc IEEE Vehicular Tech Conf, Orlando, FL, pp. 567-571 (Oct 2003)

10. E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Trans Inf Theory. 45(5), 1639-1642 (1999). doi:10.1109/18.771234

11. K Wong, C Tsui, RSK Cheng, M Mow, A VLSI architecture of a k-best lattice decoding algorithm for MIMO channels, in Proc IEEE Int Symp on Circuits and Systems, Scottsdale, AZ, pp. 273-276 (May 2002)

12. S Baro, J Hagenauer, M Witzke, Iterative detection of MIMO transmission using a list-sequential (LISS) detector, in Proc IEEE Int Conf Communications, Anchorage, AK, pp. 2653-2657 (May 2003)

13. Z Guo, P Nilsson, Algorithm and implementation of the k-best sphere decoding for MIMO detection. IEEE J Sel Areas Commun. 24(3), 491-503 (2006)

14. P Radosavljevic, JR Cavallaro, Soft sphere detection with bounded search for high-throughput MIMO receivers, in Proc Asilomar Conf Signals, Systems, and Computers, Pacific Grove, CA, pp. 1175-1179 (Oct 2006)

15. A Tomasoni, M Ferrari, D Gatti, F Osnato, S Bellini, A low complexity turbo mmse receiver for w-lan mimo systems, in Proc IEEE Int Conf Communications, Istanbul, Turkey, pp. 4119-4124 (June 2006)

16. BM Hochwald, S ten Brink, Achieving near-capacity on a multiple-antenna channel. IEEE Trans Commun. 51(3), 389-399 (2003). doi:10.1109/TCOMM.2003.809789

17. Q Li, Z Wang, Improved K-best sphere decoding algorithms for MIMO systems, in Proc IEEE Int Symp on Circuits and Systems, Kos, Greece, pp. 1159-1162 (May 2006)

18. LG Barbero, T Ratnarajah, C Cowan, A low-complexity soft-MIMO detector based on the fixed-complexity sphere decoder, in Proc IEEE Int Conf On Acoustics, Speech and Signal Processing, Las Vegas, NV, pp. 2669-2672 (Mar 2008)

19. Y Wu, YT Liu, Y Liao, H Chang, Early-pruned k-best sphere decoding algorithm based on radius constraints, in Proc IEEE Int Conf Communications, Beijing, China, pp. 4496-4500 (May 2008)

20. L Wang, L Xu, S Chen, L Hanzo, Generic iterative search-centre-shifting k-best sphere detection for rank-deficient SDM-OFDM systems. Electron Lett. 44(8), 552-553 (2008). doi:10.1049/el:20083279

21. M Siti, MP Fitz, Layered orthogonal lattice detector for two transmit antenna communications, in Proc Allerton Conference On Communication, Control, And Computing, Monticello, IL, pp. 287-296 (Sept 2005)

22. M Siti, MP Fitz, A novel soft-output layered orthogonal lattice detector for multiple antenna communications, in Proc IEEE Int Conf Communications, Istanbul, Turkey, pp. 1686-1691 (June 2006)

23. A Tomasoni, M Siti, M Ferrari, S Bellini, Low complexity, quasi-optimal MIMO detectors for iterative receivers. IEEE Trans Wirel Commun. 9(10), 3166-3177 (2010)

24. CJ Ahn, Parallel detection algorithm using multiple QR decompositions with permuted channel matrix for SDM/OFDM. IEEE Trans Veh Technol. 57(4), 2578-2582 (2008)

25. J Hagenauer, P Hoeher, A Viterbi algorithm with soft-decision outputs and its applications, in Proc IEEE Global Telecommunications Conf, Dallas, TX, pp. 1680-1686 (Nov 1989)

26. L Bahl, J Cocke, F Jelinek, J Raviv, Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory. 20(2), 284-287 (1974)

27. S ten Brink, G Kramer, A Ashikhmin, Design of low-density parity-check codes for modulation and detection. IEEE Trans Commun. 52(4), 670-678 (2004). doi:10.1109/TCOMM.2004.826370

28. MS Baek, Y You, HK Song, Combined QRD-M and DFE detection technique for simple and efficient signal detection in MIMO-OFDM systems. IEEE Trans Wirel Commun. 8(4), 1632-1638 (2009)

29. E Agrell, T Eriksson, A Vardy, K Zeger, Closest point search in lattices. IEEE Trans Inf Theory. 48(8), 2201-2214 (2002). doi:10.1109/TIT.2002.800499

30. F Tosato, P Bisaglia, Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2, in Proc IEEE Int Conf Communications, New York City, NY, pp. 664-668 (Apr 2002)

31. T Cupaiuolo, M Siti, A Tomasoni, Low-complexity and high throughput VLSI architecture of soft-output ML MIMO detector, in Proc Design, Automation & Test in Europe, Dresden, Germany, pp. 1396-1401 (Mar 2010)

32. A Tomasoni, E Gallizio, S Bellini, Low complexity and low latency training assisted channel estimation for MIMO-OFDM systems, in Proc IEEE Personal Indoor and Mobile Radio Conf, Athens, Greece, pp. 1-5 (2007)

33. YG Li, N Seshadri, S Ariyavisitakul, Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE J Sel Areas Commun. 17(3), 461-471 (1999). doi:10.1109/49.753730

34. M Tüchler, Convergence prediction for iterative decoding of threefold concatenated systems, in Proc IEEE Global Telecommunications Conf, Taipei, Taiwan, pp. 1358-1362 (Nov 2002)

35. G Lechner, J Sayir, I Land, Optimization of LDPC codes for receiver frontends, in Proc IEEE Int Symp Inform Theory, Seattle, WA, pp. 2388-2392 (July 2006)

36. S ten Brink, Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Trans Commun. 49(10), 1727-1737 (2001). doi:10.1109/26.957394

doi:10.1186/1687-1499-2012-62

Cite this article as: Tomasoni et al.: Hardware oriented, quasi-optimal detectors for iterative and non-iterative MIMO receivers. EURASIP Journal on Wireless Communications and Networking 2012 2012:62.
