Hindawi Publishing Corporation

EURASIP Journal on Wireless Communications and Networking Volume 2008, Article ID 824673, 14 pages doi:10.1155/2008/824673

Research Article

Differentially Encoded LDPC Codes—Part I: Special Case of Product Accumulate Codes

Jing Li (Tiffany)

Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA

Correspondence should be addressed to Jing Li (Tiffany), jingli@ece.lehigh.edu

Received 19 November 2007; Accepted 6 March 2008

Recommended by Yonghui Li

Part I of a two-part series investigates product accumulate codes, a special class of differentially encoded low density parity check (DE-LDPC) codes with high performance and low complexity, on flat Rayleigh fading channels. In the coherent detection case, Divsalar's simple bounds and iterative thresholds using density evolution are computed to quantify the code performance at finite and infinite lengths, respectively. In the noncoherent detection case, a simple iterative differential detection and decoding (IDDD) receiver is proposed and shown to be robust for different Doppler shifts. Extrinsic information transfer (EXIT) charts reveal that, with pilot symbol assisted differential detection, the widespread practice of inserting pilot symbols to terminate the trellis actually incurs a loss in capacity, and a more efficient way is to separate pilots from the trellis. Through analysis and simulations, it is shown that PA codes perform very well with both coherent and noncoherent detection. The more general case of DE-LDPC codes, where the LDPC part may take arbitrary degree profiles, is studied in Part II (Li, 2008).

Copyright © 2008 Jing Li (Tiffany). This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The discovery of turbo codes and the rediscovery of low-density parity-check (LDPC) codes have renewed the research frontier of capacity-achieving codes [1, 2]. They also revolutionized coding theory by establishing a new soft-iterative paradigm, where long powerful codes are constructed from short simple codes and decoded through iterative message exchange and successive refinement between component decoders. Compared to turbo codes, LDPC codes boast lower decoding complexity, a richer variety of code constructions, and freedom from patents.

One important application of LDPC codes is wireless communications, where sender and receiver communicate through, for example, a non-line-of-sight land-mobile channel characterized by the Rayleigh fading model. It is well recognized that LDPC codes perform remarkably well on Rayleigh fading channels, provided that the carrier phase is perfectly synchronized and coherent detection is performed. But what if this is not the case?

It should be noted that, due to practical issues such as complexity, acquisition time, sensitivity to tracking errors, and phase ambiguity, coherent detection may become expensive or infeasible in some cases. In the context of noncoherent detection, the technique of differential encoding becomes immediately relevant. Differential encoding admits simple noncoherent differential detection, which resolves phase ambiguity and requires only frequency synchronization (often more readily available than phase synchronization). Viewed from the coding perspective, differential encoding amounts to concatenating the original code with an accumulator, that is, a recursive convolutional code of the form 1/(1 + D).
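The accumulator view can be checked in a few lines. The Python/NumPy sketch below assumes an initial state y_{-1} = +1, and its function names are hypothetical. It shows that BPSK-domain differential encoding, y_k = x_k y_{k-1} (the recurrence used in Section 4), is the binary accumulator 1/(1 + D) seen through the mapping 0 → +1, 1 → -1. It also shows that the products y_k y_{k-1} are unaffected by a common sign flip, which is why differential detection sidesteps phase ambiguity.

```python
import numpy as np

def accumulate_bits(b):
    """Binary accumulator 1/(1 + D): c_k = b_k XOR c_{k-1}, with c_{-1} = 0."""
    c = np.zeros_like(b)
    state = 0
    for k, bit in enumerate(b):
        state ^= int(bit)
        c[k] = state
    return c

def differential_encode_bpsk(x):
    """BPSK-domain differential encoder y_k = x_k * y_{k-1}, with y_{-1} = +1."""
    y = np.empty_like(x)
    prev = 1
    for k, xk in enumerate(x):
        prev = int(xk) * prev
        y[k] = prev
    return y

rng = np.random.default_rng(0)
b = rng.integers(0, 2, size=20)          # information bits
x = 1 - 2 * b                            # BPSK mapping 0 -> +1, 1 -> -1
y = differential_encode_bpsk(x)
c = accumulate_bits(b)

# The two encoders agree under the BPSK mapping.
print(np.array_equal(y, 1 - 2 * c))      # True

# Neighboring-symbol products recover x even if every y_k is sign-flipped
# (a 180-degree phase ambiguity); only the first symbol needs the reference y_{-1}.
flipped = -y
print(np.array_equal(flipped[1:] * flipped[:-1], x[1:]))   # True
```

The explicit loops mirror the recursive (IIR) nature of 1/(1 + D). A vectorized form would use a cumulative XOR or cumulative product, which computes the same result.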

In this two-part series of papers, we investigate the theory and practice of LDPC codes with differential encoding. We start with a special class of differentially encoded LDPC (DE-LDPC) codes, namely, product accumulate (PA) codes (Part I), and then move to the general case where an arbitrary (random) LDPC code is concatenated with an accumulator (Part II) [3].

Product accumulate codes, proposed in [4] and depicted in Figure 1, are a class of serially concatenated codes. The inner code is a differential encoder. The outer code is a parallel concatenation of two branches of single-parity check (SPC) codes, or a structured LDPC code comprising degree-1 and degree-2 variable nodes. Since the accumulator can also be described using a sparse bipartite graph, a PA code is, overall, an LDPC code. Alternatively, it may be regarded as a differentially encoded LDPC code, to emphasize the impact of the inner differential encoder.

The reasons to study PA codes are manifold. First, PA codes exhibit an interesting threshold property and remarkable performance. They are well established as a class of "good" codes with rates > 1/2 and performance within a few tenths of a dB of the Shannon limit [4]. Here, "good" is in the sense defined by MacKay [2]. Second, PA codes are desirable for their simplicity. They are simple to describe, simple to encode and decode, and simple enough to allow rigorous theoretical analysis [4]. In comparison, a random LDPC code can be expensive to describe and expensive to implement in VLSI (due to the difficulty of routing and wiring). Finally, PA codes are intrinsically differentially encoded, which naturally permits noncoherent differential detection without additional components.

Figure 1: PA codes. (a) Code structure (outer code: TPC/SPC; inner code: 1/(1 + D)). (b) Graph representation (squares: check nodes; dots: bit nodes; y: observation from the channel; x: input bit to 1/(1 + D), i.e., output bit from TPC/SPC).

The primary interest is the noncoherent detection case, but for completeness and for comparison, we also include the case of coherent detection. Under the assumption that phase information is known, we compute Divsalar's simple bounds to benchmark the performance of PA codes at finite code lengths [5]. We also evaluate iterative thresholds using density evolution (DE) to benchmark the performance of PA codes at infinite code lengths. The asymptotic thresholds reveal that, on Rayleigh fading channels with coherent detection, PA codes are about 0.6 to 0.7 dB better than regular LDPC codes. They are, however, about 0.5 dB worse than optimized irregular LDPC codes (whose maximal left degree is 50). Simulations at fairly long block lengths show good agreement with the analytical results.

When phase information is unavailable, the decoder/detector either proceeds without phase information (completely blind) or performs some (coarse) estimation and compensation in the decoding process. We regard either case as noncoherent detection. The differential encoder in the code structure makes PA codes readily amenable to noncoherent differential detection. Conventional differential detection (CDD) operates on two symbol intervals and recovers the information by subtracting the phase of the previous signal sample from that of the current signal sample. It is cheap to implement, but suffers a loss of as much as 4 to 5 dB in bit error rate (BER) performance [6]. Closing the gap between CDD and coherent detection of differentially encoded signals generally requires extending the observation window beyond two symbol intervals.

The result is multisymbol differential detection (MSDD), exemplified by maximum-likelihood (ML) multisymbol detection, trellis-based multisymbol detection with per-survivor processing, and their variations [7, 8]. MSDD performs significantly better than CDD, at the cost of a considerably higher complexity, which increases exponentially with the window size. To preserve the simplicity of PA codes, we propose an efficient iterative differential detection and decoding (IDDD) receiver. It is robust against various Doppler spreads and can perform, for example, within 1 dB of coherent detection on fast fading channels.

We investigate the impact of pilot spacing and filter lengths. We show that the proposed PA IDDD receiver requires a very moderate number of pilot symbols compared to, for example, turbo codes [6]. It is expected that the percentage of pilots directly affects the performance, especially on very fast fading channels. Much less expected is that the way these pilot symbols are inserted also makes a huge difference. Through extrinsic information transfer (EXIT) analysis [9], we show that the widespread practice of inserting pilot symbols to periodically terminate the trellis of the differential encoder [6, 7] inevitably incurs a loss in code capacity. We attribute this to what we call the "trellis segmentation" effect: error events are made much shorter in the periodically terminated trellis than otherwise. We propose that pilot symbols be separated from the trellis structure, and simulations confirm the efficiency of the new method.

From analysis and simulation, it is fair to say that PA codes perform well both with coherent and noncoherent detection. In Part II of this series of papers, we will show that conventional LDPC codes, such as regular LDPC codes with uniform column weight of 3 and optimized irregular ones reported in literature, actually perform poorly with noncoherent differential detection. We will discuss why, how, and how much we can change the situation.

The rest of the paper is organized as follows. Section 2 introduces PA codes and the channel model. Section 3 analyzes the coherently detected PA codes on fading channels using Divsalar's simple bounds and iterative thresholds. Section 4 discusses noncoherent detection and decoding of PA codes and performs EXIT analysis. Finally, Section 5 summarizes the paper.

2. PA CODES AND CHANNEL MODEL

2.1. Channel model

We consider binary phase shift keying (BPSK) signaling ($0 \to +1$, $1 \to -1$) over flat Rayleigh fading channels. Assuming proper sampling of the outputs from the matched filter, the received discrete-time baseband signal can be modeled as $r_k = a_k e^{j\theta_k} s_k + n_k$. Here $s_k$ is the BPSK-modulated signal, and $n_k$ is i.i.d. complex AWGN with zero mean and variance $\sigma^2 = N_0/2$ in each dimension. The fading amplitude $a_k$ is modeled as a normalized Rayleigh random variable with $E[a_k^2] = 1$ and pdf $p_A(a_k) = 2a_k \exp(-a_k^2)$ for $a_k > 0$. The fading phase $\theta_k$ is uniformly distributed over $[0, 2\pi]$.

For fully interleaved channels, the $a_k$'s and $\theta_k$'s are independent for different time indexes $k$. For insufficiently interleaved channels, they are correlated. We use Jakes' isotropic scattering land-mobile Rayleigh channel model to describe the correlated Rayleigh process. Its autocorrelation is $R_k = \frac{1}{2} J_0(2\pi k f_d T_s)$, where $f_d T_s$ is the normalized Doppler spread and $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind.
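The paper specifies Jakes' model only through its autocorrelation. One common way to generate such correlated fading is a sum of sinusoids with random angles of arrival and random phases; this choice is an assumption here and is not stated in the paper. The Python sketch below uses hypothetical function names. It checks empirically that the complex autocorrelation of such a process approaches $J_0(2\pi m f_d T_s)$, which corresponds to the per-dimension $R_m = \frac{1}{2} J_0(\cdot)$ above.

```python
import numpy as np
from scipy.special import j0

def sos_rayleigh(n, fd_ts, num_paths=32, rng=None):
    """Sum-of-sinusoids approximation of a unit-power Jakes (Clarke) fading process.

    Returns complex gains g_k = a_k * exp(j * theta_k) for k = 0, ..., n-1.
    Random angles of arrival and initial phases give, on average,
    E[g_k * conj(g_{k+m})] = J0(2 * pi * m * fd_ts).
    """
    rng = np.random.default_rng(rng)
    alpha = rng.uniform(0.0, 2.0 * np.pi, num_paths)       # angles of arrival
    phi = rng.uniform(0.0, 2.0 * np.pi, num_paths)         # initial phases
    k = np.arange(n)[:, None]
    omega = 2.0 * np.pi * fd_ts * np.cos(alpha)[None, :]   # per-path Doppler (rad/sample)
    return np.exp(1j * (omega * k + phi[None, :])).sum(axis=1) / np.sqrt(num_paths)

def empirical_autocorr(fd_ts, lag, n=1000, trials=200, seed=1):
    """Estimate E[g_k conj(g_{k+lag})] by averaging over time and realizations."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(trials):
        g = sos_rayleigh(n, fd_ts, rng=rng)
        if lag == 0:
            acc += np.mean(np.abs(g) ** 2)
        else:
            acc += np.mean(g[:-lag] * np.conj(g[lag:])).real
    return acc / trials

fd_ts = 0.01
for lag in (0, 10, 20):
    print(lag, round(empirical_autocorr(fd_ts, lag), 3), round(j0(2 * np.pi * lag * fd_ts), 3))
```

A finite number of sinusoids only approximates the Rayleigh statistics. The agreement with $J_0$ improves as `num_paths` and the number of averaged realizations grow.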

Throughout the paper, $\theta_k$ is assumed to be perfectly known to the receiver/decoder in the coherent detection case, and unknown (and needs to be worked around) in the noncoherent detection case. Further, the receiver is said to have channel state information (CSI) if $a_k$ is known (irrespective of $\theta_k$), and no CSI otherwise.

2.2. PA codes and decoding analysis

A product accumulate code, as illustrated in Figure 1(a), consists of an accumulator (or a differential encoder) as the inner code, and a parallel concatenation of two branches of single-parity check codes as the outer code. PA codes are decoded through a soft-iterative process in which soft extrinsic information is exchanged between component decoders according to the turbo principle. The outer code, modeled as a structured LDPC code, is decoded using the message-passing algorithm. The inner code, taking the convolutional form 1/(1 + D), may be decoded either with the trellis-based BCJR algorithm or with a graph-based message-passing algorithm. Thanks to the cycle-free code graph of 1/(1 + D), the latter performs as well as the (optimal) BCJR algorithm while requiring several times less computation [4, 10]. Thus, the entire code can be efficiently decoded through a unified message-passing algorithm, driven by the initial log-likelihood ratio (LLR) values extracted from the channel [4]. For Rayleigh fading channels with perfect CSI, that is, when $a_k$ is known for all $k$, the initial channel LLRs are computed using

$$L_{ch}^{\mathrm{CSI}}(s_k) = \frac{4 a_k}{N_0}\, r_k, \qquad (1)$$

and for Rayleigh fading channels without CSI,

$$L_{ch}^{\mathrm{noCSI}}(s_k) = \frac{4 E[a_k]}{N_0}\, r_k, \qquad (2)$$

where $E[a] = \sqrt{\pi}/2 \approx 0.8862$ is the mean of $a$. Due to space limitations, we omit the details of the overall message-passing algorithm and refer readers to [4].

3. COHERENT DETECTION

This section investigates the coherent detection case on Rayleigh fading channels. We employ Divsalar's simple bounds and the iterative threshold to analyze the ensemble-average performance of PA codes, and we simulate individual PA codes at short and long lengths.

3.1. Simple bounds

Union bounds are simple to compute, but are rather loose at low SNRs. Divsalar's simple bound is possibly one of the best closed-form bounds [5]. Like many other tight bounds, the simple bound is based on Gallager's second bounding technique [1]. Divsalar tightened the bound enough to overcome the cutoff rate limitation. He did so by using numerical integration instead of a Chernoff bound and by reducing the number of codewords included in the bound. However, the simple bound requires knowledge of the distance spectrum, which is hard to obtain, especially for concatenated codes. As a result, it has not seen wide application. Here, the simplicity of PA codes permits an accurate computation of the ensemble-average distance spectrum (whose details can be found in [4]), and thus enables the use of the simple bound.

The simple bound technique allows the computation of either a maximum-likelihood (ML) threshold in the asymptotic sense [4, 5] or a performance upper bound for a given finite length. Divsalar derived the general form of the simple bound on independent Rayleigh fading channels with perfect CSI. Following a similar line of reasoning, below we extend it to the case without CSI.

3.1.1. Gallager's second bounding technique

Gallager's second bounding technique forms the basis of many tight bounds, including the simple bounds [1]. It states that

$$\Pr(\text{error}) \le \Pr(\text{error}, \mathbf{r} \in \mathcal{R}) + \Pr(\mathbf{r} \notin \mathcal{R}), \qquad (3)$$

where the symbols are defined as follows:
- $\mathbf{r} = \gamma \mathbf{a}\mathbf{s} + \mathbf{n}$ is the received codeword (an N-dimensional noise-corrupted vector);
- $\mathbf{s}$ is the transmitted codeword vector;
- $\mathbf{n}$ is the noise vector, whose components are i.i.d. Gaussian random variables with zero mean and unit variance;
- $\gamma$ is a known constant (the signal amplitude);
- $\mathbf{a}$ is the $N \times N$ diagonal matrix containing the fading coefficients ($\mathbf{a}$ is the identity matrix for AWGN channels);
- $\mathcal{R}$ denotes a region in the observation space around the transmitted codeword.

To get a tight bound, optimization and integration are usually needed to determine a meaningful $\mathcal{R}$.

3.1.2. Divsalar's simple bound for independent Rayleigh fading channels with CSI

For Rayleigh fading channels, the decision metric is based on minimizing the norm $\|\mathbf{r} - \gamma\mathbf{a}\mathbf{s}\|$. Here $\mathbf{s}$, $\mathbf{r}$, and $\mathbf{a}$ are the transmitted signal, received signal, and fading amplitudes in vector form, respectively, and $\gamma$ is the amplitude of the transmitted signal such that $\gamma^2/2 = E_s/N_0$.

For a good approximation of the error probability using (3), and for computational simplicity, the decision region $\mathcal{R}$ was chosen as an N-dimensional hypersphere centered at $\eta\gamma\mathbf{a}\mathbf{s}$ with radius $\sqrt{N}R$, where $\eta$ and $R$ are parameters to be optimized [5].

When perfect CSI is available, the effect of fading can be compensated through a linear transformation of $\gamma\mathbf{a}\mathbf{s}$. In particular, a rotation $e^{j\phi}$ and a rescaling $\zeta$ have been shown to yield a good and analytically tractable solution [5]:

$$\mathcal{R} = \{\mathbf{r} : \|\mathbf{r} - \zeta e^{j\phi}\gamma\mathbf{a}\mathbf{s}\|^2 \le N R^2\},$$

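which leads to the following upper bound on the error probability of an (N, K, R) code [5]:

$$P(e) \le \sum_{h=1}^{N-K+1} \min\left\{ e^{-N E(\cdot)},\; e^{N\gamma_N(\delta)}\,\frac{1}{\pi}\int_0^{\pi/2}\left(\frac{\sin^2\theta}{\sin^2\theta + c}\right)^{h} d\theta \right\}.$$

Here $E(\cdot) = E(c, \delta, \rho, \beta, \kappa, \ldots)$ is the error exponent derived in [5] for this decision region, to be optimized over its auxiliary parameters ($\rho$, $\beta$, $\kappa$, ...). The remaining quantities are

$$c = \frac{\gamma^2}{2} = \frac{E_s}{N_0} = R\,\frac{E_b}{N_0},$$

$$\delta = \frac{h}{N}, \qquad (8)$$

$$\gamma_N(\delta) = \begin{cases} \dfrac{1}{N}\ln A_h, & \text{for word error rate,}\\[1.5ex] \dfrac{1}{N}\ln\Big(\sum_{w}\dfrac{w}{K}A_{w,h}\Big), & \text{for bit error rate,}\end{cases} \qquad (9)$$

where $A_h$ is the (ensemble-average) number of codewords of Hamming weight $h$, and $A_{w,h}$ is the number of weight-$h$ codewords generated by weight-$w$ input sequences.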
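The second argument of the minimum is the union-bound contribution of weight-$h$ codewords. It uses the Craig-form pairwise error probability for Rayleigh fading with CSI. As a sanity check (our addition, not taken from [5]), the Python sketch below compares numerical quadrature of that integral with the textbook closed form for BPSK with $h$-fold independent Rayleigh diversity, $\left(\frac{1-\mu}{2}\right)^h \sum_{k=0}^{h-1}\binom{h-1+k}{k}\left(\frac{1+\mu}{2}\right)^k$, with $\mu = \sqrt{c/(1+c)}$. The function names are hypothetical.

```python
import numpy as np
from math import comb
from scipy.integrate import quad

def pep_craig(h, c):
    """Union-bound term (1/pi) * int_0^{pi/2} (sin^2 t / (sin^2 t + c))^h dt."""
    def integrand(t):
        s2 = np.sin(t) ** 2
        return (s2 / (s2 + c)) ** h
    val, _ = quad(integrand, 0.0, np.pi / 2, epsabs=0.0, epsrel=1e-10)
    return val / np.pi

def pep_closed_form(h, c):
    """Textbook closed form for BPSK with h-fold independent Rayleigh diversity."""
    mu = np.sqrt(c / (1.0 + c))
    return ((1.0 - mu) / 2.0) ** h * sum(
        comb(h - 1 + k, k) * ((1.0 + mu) / 2.0) ** k for k in range(h)
    )

c = 10 ** (3.0 / 10)  # c = Es/N0 = 3 dB
for h in (1, 2, 5, 10):
    print(h, pep_craig(h, c), pep_closed_form(h, c))
```

In a full bound evaluation, this term would be multiplied by $e^{N\gamma_N(\delta)}$ for each weight $h$. Doing so requires the ensemble distance spectrum $A_h$ (or $A_{w,h}$) of [4], which is not reproduced here.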

3.1.3. Extension of the simple bound to the case of no CSI

Another simple and reasonable choice of the decision region is an ellipsoid centered at $\eta\gamma\mathbf{s}$. It can be obtained by rescaling each coordinate of $\mathbf{r}$ so as to compensate for the effect of fading:

$$\mathcal{R} = \{\mathbf{r} : \|\mathbf{a}^{-1}\mathbf{r} - \eta\gamma\mathbf{s}\|^2 \le N R^2\}, \qquad (10)$$

where $\eta$ and $R$ are optimized. For independent Rayleigh channels without CSI, accurate information on $\mathbf{a}$ is unavailable. We therefore resort to the mean fading amplitude and approximate $\mathbf{a}^{-1} \approx (E[a])^{-1}\mathbf{I} = (1/0.8862)\,\mathbf{I}$ in (10), where $\mathbf{I}$ is the identity matrix. By replicating the computations described in [5], we obtain an upper bound on the bit error rate for independent Rayleigh channels without CSI. As in the CSI case, for each Hamming weight $h = 1, \ldots, N-K+1$, it takes the minimum of an exponential term $e^{-N E(c,\delta,\rho)}$ and a union-bound term. The exponent $E(c, \delta, \rho)$ and the parameters $\rho$ and $\beta$ that optimize it follow from [5], with the effective SNR

$$c = E^2[a]\,\frac{\gamma^2}{2} = 0.8862^2\, R\,\frac{E_b}{N_0},$$

and $\delta$ and $\gamma_N(\delta)$ are the same as in (8) and (9).

Please note that the above extension to the fading case with no CSI slightly loosens the simple bound, but it preserves the computational simplicity. A more sophisticated transformation may yield tighter bounds, but not necessarily a tractable analytical expression.

Figure 2 plots the simulated BER performance and the simple bound of a (1024, 512) PA code on independent Rayleigh fading channels with and without CSI. The computation assumes an optimal ML decoder and uses the ensemble-average distance spectrum. The simple bound therefore represents the best ensemble-average performance and may not accurately reflect the individual PA code being simulated. Nevertheless, we see that the bound is fairly tight. It provides a useful indication of the code performance at SNRs below the cutoff rate, and, at high SNRs, it joins the union bound to predict the error floor.

3.2. Threshold computation via the iterative analysis

The ML performance bound evaluated in the previous subsection accounts for the finite length of a PA code ensemble, but the assumption of an ML decoder may be optimistic. Below we account for the iterative nature of the practical decoder and compute an asymptotic iterative threshold using the well-known method of density evolution [11].

Figure 2: Divsalar simple bounds for R = 0.5 PA codes (K = 512, independent fading): BER versus E_b/N_0 (dB) for simulation and Divsalar bound, each with and without CSI.

Density evolution is a useful tool for analyzing the iterative decoding of sparse-graph codes. It examines the probability density functions (pdfs) of the exchanged messages in each step and can, literally speaking, track the entire decoding process. In general, we are more interested in the asymptotic SNR threshold $\eta$. It is defined as the critical channel condition required for the decoding process to converge unanimously to the correct decision:

$$\eta = \inf\Big\{ \frac{E_b}{N_0} : \lim_{l\to\infty}\int_{-\infty}^{0} f_y^{(l)}(\zeta)\, d\zeta = 0 \Big\}, \qquad (13)$$

where $y = \pm 1$ is the BPSK-modulated signal and $f_y^{(l)}(\cdot)$ denotes the pdf of the LLR information on $y$ after the $l$th decoding iteration.

Tracking the density of the messages requires computing the initial pdf of the LLR messages from the channel, as well as the transformation of the message pdfs in each step of the decoding process. The Gaussian approximation is reported to incur only a small inaccuracy on AWGN channels [12, 13]. The deviation is larger on fading channels, however, since the pdf of the initial LLRs from a fading channel differs from a Gaussian distribution. Hence, exact density evolution is used to preserve accuracy.

3.2.1. Initial LLR pdf from the channel

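Hou et al. showed in [14] that the pdf of the LLRs from independent Rayleigh channels with perfect CSI is given by (assuming BPSK signaling and that the all-zero sequence is transmitted)

$$f_{ch,y}^{\mathrm{CSI}}(\zeta) = \int_0^{\infty} \frac{1}{\sqrt{2\pi \cdot 8a^2/N_0}} \exp\!\left(-\frac{(\zeta - 4a^2/N_0)^2}{2\cdot 8a^2/N_0}\right) 2a\, e^{-a^2}\, da, \qquad (14)$$

that is, a Gaussian pdf with mean $4a^2/N_0$ and variance $8a^2/N_0$, averaged over the Rayleigh amplitude. Using integrals from [15], we further simplify (14) to

$$f_{ch,y}^{\mathrm{CSI}}(\zeta) = \frac{N_0}{4\sqrt{1+N_0}} \exp\!\left(\frac{\zeta - |\zeta|\sqrt{1+N_0}}{2}\right). \qquad (15)$$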
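The closed form (15) lends itself to a quick numerical check. The Python sketch below makes the same assumptions as (14) and (1): "+1" transmitted, unit symbol energy, $E[a^2] = 1$, and in-phase noise of variance $N_0/2$. It verifies three things: that (15) integrates to one; that its mass on $\zeta < 0$ equals the well-known Rayleigh-fading BPSK error probability $\frac{1}{2}\big(1 - \sqrt{1/(1+N_0)}\big)$; and that it agrees with a Monte Carlo estimate from LLRs generated by (1). The function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def llr_pdf_csi(z, n0):
    """Closed-form pdf (15) of the channel LLR (1), Rayleigh fading with CSI.

    Assumes '+1' transmitted, unit symbol energy, E[a^2] = 1, and in-phase
    noise of variance N0/2 after phase removal.
    """
    s = np.sqrt(1.0 + n0)
    return n0 / (4.0 * s) * np.exp((z - np.abs(z) * s) / 2.0)

def sample_llrs(n0, size, seed=0):
    """Monte Carlo samples of L = 4 a r / N0 with r = a + n (coherent, CSI)."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(rng.exponential(1.0, size))            # Rayleigh amplitude, E[a^2] = 1
    r = a + rng.normal(0.0, np.sqrt(n0 / 2.0), size)   # in-phase received sample
    return 4.0 * a * r / n0

n0 = 10 ** (-4.0 / 10)                                  # Es/N0 = 4 dB
p_neg, _ = quad(llr_pdf_csi, -np.inf, 0.0, args=(n0,))
p_pos, _ = quad(llr_pdf_csi, 0.0, np.inf, args=(n0,))
ber_known = 0.5 * (1.0 - np.sqrt(1.0 / (1.0 + n0)))    # Rayleigh BPSK BER, Es/N0 = 1/N0
p_neg_mc = np.mean(sample_llrs(n0, 200_000) < 0)
print(p_neg + p_pos, p_neg, ber_known, p_neg_mc)
```

The same pdf is the natural initial density for a discretized density evolution. Sampling it on a grid (instead of running Monte Carlo) avoids the statistical noise visible in `p_neg_mc`.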

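For the case when CSI is not available to the receiver, we assume that the Rayleigh-faded and AWGN-corrupted signals follow a Gaussian distribution in the most probable region. The pdf of the initial messages, $f_{ch,y}^{\mathrm{noCSI}}(\zeta)$, is then derived in closed form as (16). It is expressed in terms of $\lambda = \sqrt{N_0/(2(N_0+1))}$, $\kappa = \exp(-\lambda^2\zeta^2/(2\pi))$, and the Gaussian tail function $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-z^2/2}\,dz$.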

3.2.2. Evolution of LLR pdf in the decoder

The evolution of the pdfs along the iterative process can be tracked either by Monte Carlo simulation or, more accurately and more efficiently, analytically through discretized density evolution. The latter is possible owing to the simplicity of the code structure and of the decoding algorithm of PA codes. To keep the discussion self-contained, we summarize the major steps of the discretized density evolution of PA codes in the Appendix; for details, please refer to [4].

Using (15) for the perfect-CSI case or (16) for the no-CSI case (i.e., substituting them into (A.4) and (A.5) in the Appendix), the thresholds of PA codes on Rayleigh channels can be computed through (A.3) to (A.12) in the Appendix. The computed thresholds are a good indication of the performance limit as the code length and the number of iterations increase without bound.

Figure 3 plots the thresholds as well as the simulation results of PA codes on independent Rayleigh channels with and without CSI. We see that the analytical results are consistent with the simulation results for fairly large block sizes. Here, simulations are evaluated after the 50th iteration. As the block size and the number of iterations continue to increase, we expect the actual performance to converge to the thresholds.

Table 1 compares the thresholds of PA codes with those of LDPC codes for several code rates. The ergodic capacity of the independent Rayleigh fading channel is also listed as a reference. We see that the thresholds of PA codes are about 0.6 dB from the channel capacity, and simulations at fairly large block sizes are about 0.3 to 0.4 dB from the thresholds. Compared to the LDPC thresholds reported in [14], rate-1/2 PA codes are about 0.6 to 0.7 dB better (asymptotically) than (3, 6)-regular LDPC codes. They are about 0.5 dB worse (asymptotically) than irregular LDPC codes. It should be noted that these irregular LDPC codes are specifically optimized for Rayleigh fading channels and have a maximum variable node degree of 50. It is fair to say that PA codes perform on par with LDPC codes (with coherent detection).

Table 1: Thresholds (E_b/N_0 in dB) of PA codes on Rayleigh channels ((3, ρ)-regular LDPC data courtesy of Hou et al. [14]; n/a: not available).

Rate | CSI: Capacity | CSI: PA | CSI: LDPC | No CSI: Capacity | No CSI: PA | No CSI: LDPC
0.5 | 1.8 | 2.42 | 3.06 | 2.6 | 3.33 | 4.06
0.6 | 3.0 | 3.56 | n/a | 3.8 | 4.48 | n/a
2/3 | 3.7 | 4.34 | 4.72 | 4.4 | 5.15 | 5.74

Figure 3: Thresholds computed using density evolution and simulations (data block size K = 64K); BER versus E_b/N_0 (dB) for R = 1/2 and R = 2/3 PA codes.
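The gaps quoted above can be checked directly against Table 1. The short Python sketch below transcribes the table, using None for entries that are not available, and computes the PA-minus-capacity and LDPC-minus-PA differences in dB. It is plain arithmetic on the published numbers, not a new result; the names `TABLE1` and `gaps` are hypothetical.

```python
# Thresholds transcribed from Table 1 (Eb/N0 in dB); None marks an unavailable entry.
TABLE1 = {
    # rate: (capacity_csi, pa_csi, ldpc_csi, capacity_nocsi, pa_nocsi, ldpc_nocsi)
    "0.5": (1.8, 2.42, 3.06, 2.6, 3.33, 4.06),
    "0.6": (3.0, 3.56, None, 3.8, 4.48, None),
    "2/3": (3.7, 4.34, 4.72, 4.4, 5.15, 5.74),
}

def gaps(row):
    """Return (PA - capacity, LDPC - PA) in dB for the CSI and no-CSI columns."""
    cap_c, pa_c, ldpc_c, cap_n, pa_n, ldpc_n = row

    def diff(a, b):
        # Rounded to the two-decimal precision of the table.
        return None if a is None or b is None else round(a - b, 2)

    return {
        "csi": (diff(pa_c, cap_c), diff(ldpc_c, pa_c)),
        "no_csi": (diff(pa_n, cap_n), diff(ldpc_n, pa_n)),
    }

for rate, row in TABLE1.items():
    print(rate, gaps(row))
```

With CSI, the PA-to-capacity gaps come out at 0.62, 0.56, and 0.64 dB. Without CSI they are somewhat larger, at 0.73, 0.68, and 0.75 dB.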

3.3. Simulation with coherent detection

To benchmark the performance of coherently detected PA codes, several PA configurations are simulated on correlated and independent Rayleigh fading channels. In each global iteration (i.e., iteration between the inner decoder and the outer decoder), two local iterations of the outer decoding are performed. This scheduling is found to strike the best tradeoff between complexity and performance (with coherent detection).

3.3.1. Coherent BPSK on independent Rayleigh channels

Figure 4 shows the performance of rate-1/2 PA codes on independent Rayleigh fading channels with and without channel state information. Bit error rates after 20, 30, and 50 (global) iterations are plotted. Data block sizes from short to large (512, 1K, 4K, and 64K) are evaluated to demonstrate the interleaving gain. For comparison, the corresponding channel capacities are also shown. The simulated performance degradation due to the lack of CSI is about 0.9 dB, which is consistent with the gap between the respective channel capacities.

Compared to the (3, 6)-regular LDPC codes reported in [14], this rate-1/2 PA code has codeword length $N = 128 \times 1024 \approx 1.3 \times 10^5$. On independent Rayleigh channels, it performs about 0.4 and 0.25 dB better than regular LDPC codes of length $N = 10^5$ and $10^6$, respectively. Optimized irregular LDPC codes may well outperform PA codes (as indicated by their thresholds), but among regular codes, PA codes appear to be among the best.

3.3.2. Coherent BPSK on correlated Rayleigh channels

Figure 5 shows the performance of PA codes on correlated fading channels. Perfect CSI is assumed to be available to the receiver, and an interleaver is placed between the PA code and the channel (to partially break up the correlation between neighboring bits). Short PA codes with rates 1/2 and 3/4 are simulated on two common fading scenarios, with normalized Doppler spreads $f_dT_s$ = 0.01 and 0.001. As expected, the performance deteriorates rapidly as $f_dT_s$ decreases, since a slower Doppler rate brings a smaller diversity order. Owing to the interleaver between the PA code and the channel, the impact of a slow Doppler rate is less severe for larger block sizes than for smaller ones. Whereas the K = 1K PA code loses about 7 dB at BER = $10^{-4}$ as $f_dT_s$ changes from 0.01 to 0.001, the loss of the K = 4K PA code is less than 5 dB.

To illustrate how well short PA codes perform on correlated channels, we compare them in Figure 5 with turbo codes (which are the best-known codes at short code lengths). The turbo code used for comparison has 16-state component convolutional codes with generator polynomial $(1, 35/23)_{\mathrm{oct}}$, decoded using the log-domain BCJR algorithm. The code rate is 0.75, the data block size is 4K, and S-random interleavers are used in both codes to lower the possible error floors. The curves are plotted for PA codes at the 10th iteration and for turbo codes at the 6th iteration. We observe that turbo codes perform about 0.6 and 0.7 dB better than PA codes for $f_dT_s$ = 0.001 and 0.01, respectively.

However, this performance gain comes at the price of considerably higher complexity. The message-passing decoding of a rate-0.75 PA code at the 10th iteration requires about 267 operations per data bit [4]. By contrast, the log-domain BCJR decoding of a rate-0.75 turbo code at the 6th iteration requires as many as 9720 operations per data bit, more than 35 times as many. Hence, PA codes remain attractive for providing good performance at low cost.

Figure 4: Performance of PA codes on independent Rayleigh fading channels (Shannon limit shown for reference). Code rate 0.5, data block size 512, 1K, 4K, 64K. (a) With CSI; (b) without CSI.

Figure 5: Performance of PA codes on correlated Rayleigh fading channels with CSI. Data block length 4K, normalized Doppler rate $f_dT_s$ = 0.01, 0.001, rate of PA codes 0.5 and 0.75, rate of turbo codes 0.75, component codes of the turbo code $(1, 35/23)_{\mathrm{oct}}$, 10 iterations for PA codes, and 6 iterations for turbo codes.

4. NONCOHERENT DETECTION OF PA CODES

This section considers noncoherent detection. The channel model of interest is a Rayleigh fading channel with correlated fading coefficients.

4.1. Iterative differential detection and decoding

PA codes are inherently differentially encoded, which makes them well suited to noncoherent differential detection. Although multiple-symbol differential detection is possible, for complexity reasons we consider a simple iterative differential detection and decoding receiver, whose structure is shown in Figure 6. The IDDD receiver consists of three parts:
- a conventional differential detector with a 2-symbol observation window (the current and the previous symbol);
- a phase tracking filter;
- the original PA decoder (the one used in coherent detection [4]).

A trellis structure is employed to assist the detection and decoding of the inner differential code 1/(1 + D). Unlike in multiple-symbol detection, however, the trellis is not expanded and has only 2 states. Soft information is passed back and forth among the different parts of the receiver according to the turbo principle. Let $x$ denote the input to the inner differential encoder (the output of the outer code), and let $y$ denote the output of the differential encoder (the symbol to be put on the channel; see Figure 6). The differential encoder implements $y_k = x_k y_{k-1}$ for $x_k, y_k \in \{\pm 1\}$ (BPSK signal mapping $0 \to +1$, $1 \to -1$). The channel reception is given by $r_k = a_k e^{j\theta_k} y_k + n_k$. The channel amplitudes ($a_k$'s) and phases ($\theta_k$'s) are correlated, while the complex white Gaussian noise samples ($n_k$'s) are independent.
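The reason differential detection needs no phase reference can be seen in a noiseless toy example. The Python sketch below uses hypothetical function names and assumes a prepended reference symbol $y_0 = +1$, a constant amplitude, and a slowly drifting random-walk carrier phase. It differentially encodes $x$ and forms the conventional detector statistic $u_k = \mathrm{Re}(r_k r_{k-1}^*)$. It then shows that $\mathrm{sign}(u_k)$ recovers $x_k$ because $u_k = a^2 x_k \cos(\theta_k - \theta_{k-1})$.

```python
import numpy as np

def diff_encode(x):
    """y_k = x_k * y_{k-1}; a reference symbol y_0 = +1 is prepended (assumed convention)."""
    y = np.empty(len(x) + 1)
    y[0] = 1.0
    for k, xk in enumerate(x):
        y[k + 1] = xk * y[k]
    return y

def cdd(r):
    """Conventional differential detection: u_k = Re(r_k * conj(r_{k-1})), x_hat = sign(u_k)."""
    u = np.real(r[1:] * np.conj(r[:-1]))
    return u, np.where(u >= 0, 1, -1)

rng = np.random.default_rng(7)
x = rng.choice([-1, 1], size=1000)
y = diff_encode(x)
# Unknown carrier phase with a slow random-walk drift (hypothetical model); no noise.
theta = rng.uniform(0.0, 2.0 * np.pi) + np.cumsum(rng.normal(0.0, 0.05, len(y)))
a = 0.8                                   # constant fading amplitude (assumed)
r = a * np.exp(1j * theta) * y            # r_k = a e^{j theta_k} y_k
u, x_hat = cdd(r)
print(np.array_equal(x_hat, x))           # True: the absolute phase cancels in r_k r_{k-1}^*
```

With noise and fast fading, $\cos(\theta_k - \theta_{k-1})$ shrinks and noise-noise products enter $u_k$. This is the degradation of CDD that the iterative receiver is designed to recover.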

In theory, differential decoding does not require pilot symbols. In practice, however, pilot symbols are inserted periodically even with multiple-symbol detection, to avoid catastrophic error propagation in differential decoding. This is particularly so in the fast fading case, where the phases $\theta_k$ change rapidly (as will be shown later). Hence, some of the $r_k$'s (and $y_k$'s) in the received sequence are pilot symbols.

We use $L$ to denote LLR information and superscript $(q)$ to denote the $q$th (global) iteration. Subscripts i, o, ch, and e denote quantities associated with the inner code, the outer code, the fading channel, and "the extrinsic," respectively.

Figure 6: Structure of the iterative differential detection and decoding receiver (channel followed by the iterative differential detector and decoder).

4.1.1. IDDD receiver

Here is a sketch of how the proposed IDDD receiver operates. In the first iteration, the switch in Figure 6 is flipped up. The samples of the received symbols, $r_k$, are fed into the conventional differential detector. The detector computes $u_k = \mathrm{Re}(r_k r_{k-1}^*)$ and subsequently the soft LLR $L_{ch}(x_k)$ from $u_k$, where $*$ denotes the complex conjugate. $L_{ch}(x_k)$ is then treated as $L_{i,e}^{(1)}(x_k)$ and fed into the outer decoder. In return, the outer decoder generates $L_{o,e}^{(1)}(x_k)$ and passes it to the inner decoder for use in the next detection/decoding iteration. Starting from the second iteration, the switch in Figure 6 is flipped down. Channel estimation for $a_k$ and $\theta_k$ is then performed before the "coherent" detection and decoding of the inner and outer codes. After $Q$ iterations, a decision is made by combining the extrinsic information from both the inner and the outer decoders: $\hat{x}_k = \mathrm{sign}\big(L_{i,e}^{(Q)}(x_k) + L_{o,e}^{(Q)}(x_k)\big)$. In the above discussion, we have ignored the random interleaver, but it is understood that proper interleaving and de-interleaving are performed whenever needed.

4.1.2. Conventional differential detector for the first decoding iteration

With the assumption that the carrier phase is nearly constant between two neighboring symbols, the conventional differential detector (in the first iteration) computes $u_k = \mathrm{Re}(r_k r_{k-1}^*)$. A hard decision on $x_k$ is obtained by simply checking the sign of $u_k$. Computing the soft information $L_{ch}(x_k)$ from $u_k$ requires knowledge of the pdf of $u_k$. The conditional pdf of $u_k$ given $a_k$ and $x_k$ is [16]

$$f_{U|a,X}(u \mid a, x) = \begin{cases} \dfrac{1}{2N_0}\exp\!\left(\dfrac{xu - a^2/2}{N_0}\right), & -\infty < xu < 0,\\[2ex] \dfrac{1}{2N_0}\exp\!\left(\dfrac{xu - a^2/2}{N_0}\right) Q\!\left(\sqrt{\dfrac{a^2}{N_0}}, \sqrt{\dfrac{4xu}{N_0}}\right), & 0 < xu < \infty, \end{cases} \qquad (17)$$

where $Q(a, b)$ is the Marcum Q-function. It is then possible to get the true pdf of $u_k$ using

$$f_{U|X}(u \mid x) = \int_0^{\infty} f_{U|a,X}(u \mid a, x)\, f_A(a)\, da = 2\int_0^{\infty} f_{U|a,X}(u \mid a, x)\, a\, e^{-a^2}\, da. \qquad (18)$$

The computation of the Marcum Q-function is slow and does not always converge at large values. An exact evaluation of (18), and hence of $L_{ch}(x_k)$, can therefore be difficult. We propose a simple approximation that evaluates (17) with $a$ replaced by its mean $E[a]$. This leads to

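$$
f_{U|X}(u \mid x) \approx
\begin{cases}
\dfrac{1}{2N_0}\exp\!\left(\dfrac{xu - \pi/8}{N_0}\right), & -\infty < xu < 0,\\[2ex]
\dfrac{1}{2N_0}\exp\!\left(\dfrac{xu - \pi/8}{N_0}\right) Q\!\left(\sqrt{\dfrac{\pi}{4N_0}},\, \sqrt{\dfrac{4xu}{N_0}}\right), & 0 < xu < \infty.
\end{cases}
\tag{19}
$$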

The corresponding LLR from the channel can then be computed by

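$$
L_{ch}(x_k) = \log\frac{\Pr(u_k \mid x_k = +1)}{\Pr(u_k \mid x_k = -1)} = \mathrm{sign}(u_k)\left(\frac{2|u_k|}{N_0} + \log Q\!\left(\sqrt{\frac{\pi}{4N_0}},\, \sqrt{\frac{4|u_k|}{N_0}}\right)\right). \tag{20}
$$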

An even more convenient compromise is to assume that $u_k$ is Gaussian distributed, as is done in [17] and a few other papers. Under this Gaussian assumption, we get

$$
f_{U|X}(u \mid x) \approx \mathcal{N}\left(x,\; 2N_0 + N_0^2\right), \tag{21}
$$

$$
L_{ch}(x_k) \approx \frac{2u_k}{2N_0 + N_0^2}. \tag{22}
$$
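The two channel-LLR approximations can be compared directly. The Python sketch below implements the mean-α LLR (20), the approximate pdf (19) it is derived from, and the Gaussian LLR (22). It carries its own copy of a series-based Marcum Q-function so that it runs on its own, and it clips Q from below at an arbitrarily chosen 1e-300 so that the logarithm stays finite when Q underflows. As a consistency check, (20) should equal the log-ratio of the two branches of (19).

```python
import math


def marcum_q1(a: float, b: float, terms: int = 300) -> float:
    """First-order Marcum Q-function via its Poisson-mixture series (moderate arguments only)."""
    lam, mu = a * a / 2.0, b * b / 2.0
    p_n, q_m = math.exp(-lam), math.exp(-mu)
    cdf, total = q_m, 0.0
    for n in range(terms):
        total += p_n * cdf
        p_n *= lam / (n + 1)
        q_m *= mu / (n + 1)
        cdf += q_m
    return min(total, 1.0)


def pdf_mean_alpha(u: float, x: int, n0: float) -> float:
    """Equation (19): (17) with alpha replaced by E[alpha] = sqrt(pi)/2, so alpha^2/2 = pi/8."""
    z = x * u
    base = math.exp((z - math.pi / 8.0) / n0) / (2.0 * n0)
    if z < 0:
        return base
    return base * marcum_q1(math.sqrt(math.pi / (4.0 * n0)), math.sqrt(4.0 * z / n0))


def llr_mean_alpha(u: float, n0: float) -> float:
    """Equation (20): sign(u) * (2|u|/N0 + log Q(sqrt(pi/(4 N0)), sqrt(4|u|/N0)))."""
    q = marcum_q1(math.sqrt(math.pi / (4.0 * n0)), math.sqrt(4.0 * abs(u) / n0))
    sgn = 1.0 if u >= 0 else -1.0
    return sgn * (2.0 * abs(u) / n0 + math.log(max(q, 1e-300)))


def llr_gauss(u: float, n0: float) -> float:
    """Equation (22): Gaussian approximation with mean x and variance 2 N0 + N0^2."""
    return 2.0 * u / (2.0 * n0 + n0 * n0)
```

Evaluating `llr_mean_alpha(u, n0)` and `llr_gauss(u, n0)` over a grid of $u$ shows that (22) is linear in $u_k$, whereas (20) bends because of the $\log Q$ term.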

Alternatively, instead of using conventional differential detection in the first iteration, channel estimation followed by the decoding of the inner 1/(1 + D) code can be used, which makes the first iteration exactly the same as subsequent iterations. This third option leads to pilot symbol assisted modulation (PSAM), which has slightly higher complexity than using differential detection in the first iteration.

Figure 7: Distribution of $u_k = \mathrm{Re}\{r_k r_{k-1}^*\}$ in conventional differential detection (assuming "+1" transmitted), $E_s/N_0$ = 6 dB. Curves: true pdf $f(u)$ from Monte Carlo, mean-$\alpha$ approximation $f(u \mid \alpha)$, and Gaussian approximation $\mathcal{N}(1, 2N_0 + N_0^2)$.

To see how accurate the above treatments are, we plot in Figure 7 several curves approximating the pdf of $u_k$. From the sharpest and most asymmetric to the least sharp and most symmetric, these curves denote the exact pdf $f_{U|X}(u \mid x = +1)$ from Monte Carlo simulations (a histogram, which can be regarded as a numerical evaluation of (18)), the "mean-$\alpha$ approximated" pdf from (19), and the Gaussian approximated pdf from (21). The figure shows that the Gaussian approximation does not reflect the true pdf well, but this inaccuracy turns out not to severely affect the overall IDDD performance. As shown later in Figure 12, all three treatments (Gaussian approximation, mean-$\alpha$ approximation, and PSAM) result in very similar decoding performance. We attribute this to the fact that the inaccuracy mostly affects the first iteration, and subsequent iterations help mitigate the loss. Thus, the Gaussian approximation remains a simple and viable approach for noncoherent differential decoding.

4.1.3. Channel estimator

The channel estimator in the IDDD receiver (Figure 6) may be implemented in several ways. Here we use a linear filter with $(2L + 1)$ taps to estimate the $\alpha_k$'s and $\theta_k$'s in the $q$th iteration:

$$
\hat{\alpha}_k^{(q)} e^{j\hat{\theta}_k^{(q)}} = \sum_{l=-L}^{L} p_l\, \hat{y}_{k-l}^{(q-1)}\, r_{k-l}, \tag{23}
$$

where $p_l$ denotes the coefficient of the $l$th filter tap, and $\hat{y}_k^{(q-1)}$ denotes the estimate of $y_k$ from the feedback of the previous iteration. For soft feedback, $\hat{y}_k^{(q-1)} = \tanh(L^{(q-1)}(y_k)/2)$, and for hard feedback, $\hat{y}_k^{(q-1)} = \mathrm{sign}(L^{(q-1)}(y_k))$. The LLR message $L^{(q-1)}(y_k)$ is generated together with $L_{e,i}^{(q-1)}(x_k)$ by the inner decoder in the $(q-1)$th decoding iteration (see [4] for the step-by-step message-passing decoding algorithm of the 1/(1 + D) code). In the first iteration, the $L^{(0)}(y_k)$'s are initialized to zero for coded bits and to a large positive number (i.e., $+\infty$) for pilot symbols.
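The $(2L + 1)$-tap estimator and the soft/hard feedback rules above can be sketched in Python as follows. Taps are stored as a list indexed $0, \ldots, 2L$ for $l = -L, \ldots, L$; skipping terms whose sample index falls outside the block is an assumption, since the text does not specify edge handling. Pilot positions simply carry an LLR of $+\infty$, so $\tanh(\infty) = 1$ recovers the known pilot value.

```python
import math
from typing import List, Sequence


def feedback(llr_y: Sequence[float], soft: bool = True) -> List[float]:
    """Feedback estimate of y_k: tanh(L/2) for soft feedback, sign(L) for hard feedback."""
    if soft:
        return [math.tanh(llr / 2.0) for llr in llr_y]
    return [1.0 if llr >= 0 else -1.0 for llr in llr_y]


def estimate_channel(r: Sequence[complex], y_hat: Sequence[float], p: Sequence[float]) -> List[complex]:
    """Estimate alpha_k * exp(j theta_k) = sum_{l=-L}^{L} p_l * y_hat_{k-l} * r_{k-l}.

    p[i] holds the tap p_l with l = i - L. Terms whose index k - l falls outside
    the block are skipped (an assumed edge treatment).
    """
    L = (len(p) - 1) // 2
    n = len(r)
    est: List[complex] = []
    for k in range(n):
        acc = 0j
        for i, pl in enumerate(p):
            m = k - (i - L)
            if 0 <= m < n:
                acc += pl * y_hat[m] * r[m]  # y_hat removes the BPSK modulation
        est.append(acc)
    return est
```

With equal taps $p_l = 1/(2L+1)$ this reduces to the "moving average" fallback; the Wiener taps discussed next replace them when the fading correlation is known.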

Regarding the choice of the filter, we take a Wiener filter, since it is known to be optimal for estimating the channel gain in the minimum mean-square-error (MMSE) sense when the autocorrelation values $R_k$ of the fading process are known [18]. The filter coefficients $p_{-L}, p_{-L+1}, \ldots, p_L$ are obtained from the Wiener-Hopf equation

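$$
\begin{pmatrix}
R_0 + N_0 & R_{-1} & \cdots & R_{-2L}\\
R_1 & R_0 + N_0 & \cdots & R_{-2L+1}\\
\vdots & \vdots & \ddots & \vdots\\
R_{2L} & R_{2L-1} & \cdots & R_0 + N_0
\end{pmatrix}
\begin{pmatrix} p_{-L}\\ p_{-L+1}\\ \vdots\\ p_{L} \end{pmatrix}
=
\begin{pmatrix} R_{-L}\\ R_{-L+1}\\ \vdots\\ R_{L} \end{pmatrix},
\tag{24}
$$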

where $R_k = \frac{1}{2}J_0(2\pi k f_d T_s)$ and $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind. Since the computation of the $p_l$'s from (24) involves inverting a matrix (a one-time job), it may not be computable when the matrix becomes (near) singular, which occurs when the channel is very slowly fading. In such cases, a low-pass filter or a simple "moving average" can be used [6].
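The Wiener-Hopf system (24) is small enough to solve directly. The Python sketch below builds the matrix with $R_k = \frac{1}{2}J_0(2\pi k f_d T_s)$ and solves it by Gaussian elimination with partial pivoting. $J_0$ is evaluated from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\tau)\,d\tau$, an implementation choice to avoid external dependencies, and the pivot threshold that flags a (near) singular matrix and triggers the moving-average fallback is an arbitrary assumption.

```python
import math
from typing import List, Tuple


def bessel_j0(x: float, n: int = 400) -> float:
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin t) dt (midpoint rule; the integrand is periodic)."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(n)) / n


def wiener_system(L: int, fd_ts: float, n0: float) -> Tuple[List[List[float]], List[float]]:
    """Matrix and right-hand side of (24) for the taps p_{-L}, ..., p_L."""
    R = {k: 0.5 * bessel_j0(2.0 * math.pi * k * fd_ts) for k in range(-2 * L, 2 * L + 1)}
    idx = range(-L, L + 1)
    A = [[R[m - l] + (n0 if m == l else 0.0) for l in idx] for m in idx]
    b = [R[m] for m in idx]
    return A, b


def solve(A: List[List[float]], b: List[float], tol: float = 1e-12) -> List[float]:
    """Gaussian elimination with partial pivoting; raises if the matrix is (near) singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < tol:
            raise ValueError("matrix is (near) singular")
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    p = [0.0] * n
    for i in range(n - 1, -1, -1):
        p[i] = (M[i][n] - sum(M[i][j] * p[j] for j in range(i + 1, n))) / M[i][i]
    return p


def wiener_taps(L: int, fd_ts: float, n0: float) -> List[float]:
    """Taps p_{-L..L}; falls back to a moving average if (24) cannot be solved reliably."""
    A, b = wiener_system(L, fd_ts, n0)
    try:
        return solve(A, b)
    except ValueError:
        return [1.0 / (2 * L + 1)] * (2 * L + 1)
```

Because $R_k$ is even, the matrix is symmetric Toeplitz and the resulting taps are symmetric, $p_l = p_{-l}$; the taps only need to be recomputed when $f_dT_s$ or $N_0$ changes.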

4.2. Analysis of pilot insertion through EXIT charts

4.2.1. EXIT charts

We perform EXIT analysis [9] to gain further insight into PA codes and the proposed noncoherent IDDD receiver. In EXIT charts, the exchange of extrinsic information is visualized as a decoding/detection trajectory, allowing the prediction of decoding convergence and thresholds [9]. Several quantities, such as the bit error rate, the mean of the extrinsic LLR, and the equivalent SNR, have previously been used to depict the characteristics of, and relations between, the component decoders, but mutual information has been shown to be the most robust among them [9]. The mutual information between the binary bit $y_k$ and its corresponding LLR value is defined as

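$$
\begin{aligned}
I(Y; L(Y)) &= \frac{1}{2}\sum_{y=\pm 1}\int_{-\infty}^{\infty} f_{L(Y)}(\eta \mid Y = y)\,\log_2\frac{2 f_{L(Y)}(\eta \mid Y = y)}{f_{L(Y)}(\eta \mid Y = +1) + f_{L(Y)}(\eta \mid Y = -1)}\,d\eta\\
&= \int_{-\infty}^{\infty} f_{L(Y)}(\eta \mid Y = +1)\,\log_2\frac{2 f_{L(Y)}(\eta \mid Y = +1)}{f_{L(Y)}(\eta \mid Y = +1) + f_{L(Y)}(-\eta \mid Y = +1)}\,d\eta\\
&= 1 - \int_{-\infty}^{\infty} f_{L(Y)}(\eta \mid Y = +1)\,\log_2\!\left(1 + e^{-\eta}\right)d\eta,
\end{aligned}
\tag{25}
$$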

Figure 8: Trellis diagram of binary differential PSK with pilot insertion. (a) Pilot symbols periodically terminate the trellis. (b) Pilot symbols are separated from the trellis structure.

where $L(Y)$ is either the a priori information $L_a(Y)$ or the extrinsic information $L_e(Y)$, and $f_{L(Y)}(\eta \mid Y = y)$ is the conditional pdf. The second equality holds when the channel is output symmetric, such that $f_{L(Y)}(\eta \mid Y = -y) = f_{L(Y)}(-\eta \mid Y = y)$, and the third equality holds when the received messages satisfy the consistency condition (also known as the symmetry condition) $f_{L(Y)}(\eta \mid Y = y) = f_{L(Y)}(-\eta \mid Y = y)e^{y\eta}$ [11]. Note that the consistency condition is invariant in the message-passing process on a number of channels, including the AWGN channel and the independent Rayleigh fading channel with perfect CSI; but it is not preserved on fading channels without CSI or with estimated (thus imperfect) CSI, since the initial density function evaluated in the latter cases is only an approximation of the actual pdf of the LLR messages. Thus, the first form in (25), rather than the last, should be used to compute the mutual information in those cases. We use the X-axis to represent the mutual information to the inner code (a priori) or from the outer code (extrinsic), denoted as $I_{a,i}$/$I_{e,o}$, and the Y-axis to represent the mutual information from the inner code or to the outer code, denoted as $I_{e,i}$/$I_{a,o}$.
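The difference between the general and the consistency-based forms of (25) can be checked numerically. The Python sketch below estimates the first form from histograms of LLR samples conditioned on $Y = \pm 1$, and the last form as a sample average; the histogram binning is an assumption of this sketch. For LLRs that do satisfy the consistency condition (e.g., Gaussian with mean $\sigma^2/2$ and variance $\sigma^2$), the two estimates should agree; for LLRs computed from estimated CSI, only the histogram form remains valid.

```python
import math
from typing import List, Sequence


def softplus_log2(t: float) -> float:
    """log2(1 + e^t), computed without overflow."""
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / math.log(2.0)


def mi_consistent(llr_pos: Sequence[float]) -> float:
    """Last form of (25): 1 - E[log2(1 + e^{-L}) | Y = +1]; requires the consistency condition."""
    return 1.0 - sum(softplus_log2(-llr) for llr in llr_pos) / len(llr_pos)


def mi_histogram(llr_pos: Sequence[float], llr_neg: Sequence[float], bins: int = 200) -> float:
    """First form of (25), with both conditional pdfs estimated by histograms on a common grid."""
    lo = min(min(llr_pos), min(llr_neg))
    hi = max(max(llr_pos), max(llr_neg))
    width = (hi - lo) / bins or 1.0

    def hist(v: Sequence[float]) -> List[float]:
        h = [0.0] * bins
        for t in v:
            h[min(int((t - lo) / width), bins - 1)] += 1.0 / len(v)
        return h

    mi = 0.0
    for pp, pn in zip(hist(llr_pos), hist(llr_neg)):
        for p in (pp, pn):
            if p > 0.0:
                # bin probabilities share a width, so their ratio equals the pdf ratio in (25)
                mi += 0.5 * p * math.log2(2.0 * p / (pp + pn))
    return mi
```

Plotting these estimates for the inner and outer decoders against each other gives the EXIT curves used below.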

Figure 9: The effect of pilot symbols segmenting the trellis on the performance of the differential decoder. EXIT curves of the differential decoder with 0%, 4%, 10%, and 20% pilots are plotted together with the EXIT curves of the outer code of PA codes with R = 0.75 and R = 0.5. Normalized Doppler rate $f_dT_s$ = 0.01, $E_s/N_0$ = 4.75 dB and 0 dB, perfect CSI.

4.2.2. Pilot symbol insertion

A practical issue in noncoherent detection is pilot insertion. The number of pilot symbols inserted should be sufficient to track the channel reasonably well, but not excessive. Many researchers have reported that excessive pilot symbols not only cause wasteful bandwidth expansion but actually degrade the overall performance, since the extra energy spent on the excess pilots more than outweighs the gain obtained from finer channel tracking. This trade-off has long been noted in the literature, but little attention has been given to another issue of no less importance, namely, how pilots should be inserted when differential encoding or another trellis-based coding/modulation front end is used.

There exist at least two ways to insert pilot symbols in a differential encoder. The widespread approach is to periodically terminate the trellis [6, 7], as shown in Figure 8(a), such that pilot symbols are used to estimate the channel and at the same time participate in the trellis decoding. Although seemingly plausible, this turns out to be a poor strategy, since segmenting the trellis into small chunks significantly increases the number of short error events and consequently incurs a loss in performance.

The negative effect of trellis segmentation is best illustrated by the EXIT chart in Figure 9. EXIT curves corresponding to the differential decoder with 0%, 4%, 10%, and 20% pilot insertion are plotted for two different SNR values. To eliminate the impact of other factors, the four curves in each SNR set are given the same energy per transmitted symbol, and perfect knowledge of the fading phase and amplitude is provided to all the decoders (irrespective of the number of pilot symbols). Thus, the difference between the curves in each family is due only to the difference in pilot spacing. At the left end of the curves (when the input mutual information is small), a larger number of pilot symbols corresponds to better performance (a higher output mutual information). This is because, when little information is provided by the outer code, pilot symbols become the primary contributor of a priori information. However, the situation is completely reversed toward the right end of the EXIT curves, where more pilot symbols actually degrade the performance. The reason is that, given sufficient information from the outer code, pilot symbols no longer constitute the key source of a priori information; on the other hand, they segment the trellis and shorten error events, producing an effect opposite to spectral thinning and thus deteriorating the performance. The performance loss is more severe when more pilot symbols are inserted and when the code operates at a relatively low SNR. It is worth noting, for example, that with 20% pilot insertion (pilot spacing of 5), even when provided with perfect mutual information from the outer code ($I_{a,i} = 1$, while the channel remains noisy), the trellis decoder still fails to produce sufficient output mutual information $I_{e,i}$. As such, the inner EXIT curve is bound to intersect the outer EXIT curve at a rather early stage of the iterative process, causing the iterative decoder to fail at a high BER (not to mention that this EXIT curve consumes 20% more energy than the no-pilot case).

The implication of this EXIT analysis is that the widespread approach of inserting pilot symbols as part of the trellis can be detrimental for differential encoding (and for other serially concatenated schemes with inner trellis codes). Specifically, unless the outer code is itself capacity-achieving at the operating SNR, the inner and outer EXIT curves will intersect, resulting in convergence failure and error floors. We observe that the more pilot symbols, the higher the error floor; and the lower the code rate (lower SNR), the more severe the impact. It is therefore particularly important to keep the number of pilot symbols in such schemes minimal, so that error floors do not occur too early. This analysis also suggests an alternative, and potentially better-performing, way of pilot insertion, namely, separating pilots from the trellis so that they do not affect error events; see Figure 8(b).
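The difference between the two insertion strategies of Figure 8 can be made concrete with a toy frame builder in Python. This is a sketch under an assumed model: one +1 pilot every `P` transmitted symbols; in (a) the pilot is also a trellis state, so differential encoding restarts from the known state +1 after each pilot, whereas in (b) the pilot is transmitted for channel estimation only and the differential encoder continues across it. The function names `build_frame` and `decode_frame` are hypothetical.

```python
from typing import List, Sequence


def build_frame(x: Sequence[int], P: int, terminate: bool) -> List[int]:
    """Bipolar DPSK frame with a +1 pilot at every index that is a multiple of P."""
    frame: List[int] = []
    prev = 1  # known reference state
    for xk in x:
        if len(frame) % P == 0:
            frame.append(1)   # pilot symbol
            if terminate:
                prev = 1      # Fig. 8(a): the pilot is a trellis state, segmenting the trellis
        prev = xk * prev      # y_k = x_k * y_{k-1}
        frame.append(prev)
    return frame


def decode_frame(frame: Sequence[int], P: int, terminate: bool) -> List[int]:
    """Noiseless differential decoding matching build_frame."""
    if terminate:
        # each data symbol is referenced to the symbol right before it, pilot or not
        return [frame[i] * frame[i - 1] for i in range(len(frame)) if i % P]
    data = [s for i, s in enumerate(frame) if i % P]  # strip pilots; the trellis skips them
    return [d * (data[k - 1] if k else 1) for k, d in enumerate(data)]
```

With noiseless samples both frames decode correctly and have the same pilot overhead; the difference appears only in a soft trellis decoder, where each pilot in (a) pins the state and cuts error events short, as discussed above.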

It should be pointed out that the level of the impact caused by trellis segmentation may differ greatly among outer codes. Many (outer) codes, including single parity check codes, block turbo codes (i.e., turbo product codes), and convolutional codes, will see a large impact, since these codes require sufficient input information in order to produce perfect output information; put another way, these codes alone are not "good" codes (good in the sense defined by MacKay [2]). However, "good" codes like LDPC codes will likely see a much smaller impact. This is because an ideal LDPC code has a box-shaped EXIT curve (e.g., see [3, Figure 3]), which can produce perfect output information as long as the input information is above some threshold (without requiring perfect input information). Alternatively, one may interpret this as follows: ideal LDPC codes have large minimum distances and are capable of correcting short error events, including those caused by the segmentation effect.

To verify the analytical results, we simulate the performance of a rate-1/2 PA code with data block size K = 32 K under different strategies of pilot insertion; see Figure 10. The normalized Doppler spread is $f_dT_s$ = 0.01, and error rates are evaluated after 10 decoding iterations. Solid lines represent the cases where perfect channel knowledge is available to the receiver, and dashed lines represent the case where noncoherent detection is used. Comparing the solid curves, we see that different strategies of pilot insertion result in a drastic performance gap. In this specific case, by segmenting the trellis every 10 symbols, trellis-segmented pilot insertion loses more than 3 dB at a BER of $10^{-4}$ relative to separated pilot insertion. The dashed curve corresponds to the same PA code noncoherently detected via the IDDD receiver discussed before, where 10% pilot symbols are inserted using the strategy in Figure 8(b) and an 81-tap Wiener filter is used to estimate the channel. It is interesting to note that if one overlooks the impact of pilot insertion strategies, one might arrive at the paradoxical result that noncoherent detection (dashed line) performs (noticeably) better than coherent detection (rightmost solid line)!

Figure 10: Performance of PA codes with different pilot insertion strategies (curves: 0%, ideal; 10%, ideal, pilots separated; 10%, ideal, pilots terminate trellis; 10%, IDDD, pilots separated). Normalized Doppler rate $f_dT_s$ = 0.01, code rate 0.5, data block size 32 K, 0% or 10% pilot insertion, 10 iterations.

4.3. Impact of the pilot symbol spacing and filter length

We now investigate how the number of pilot symbols and the length of the estimation filter affect the performance of noncoherent detection. Figure 11 illustrates the impact of different pilot spacings on the BER performance over fast fading channels with normalized Doppler spread $f_dT_s$ = 0.05, 0.02, or 0.01. We observe the following. (1) The IDDD receiver is rather robust for different Doppler rates. (2) Small pilot spacing, such as fewer than 6 symbols, is undesirable, since the additional energy it consumes more than outweighs any gain it may bring. (3) The code performance at high Doppler rates is more sensitive to pilot spacing than that at lower Doppler rates. At the normalized Doppler rate of 0.01 (already fast fading), noncoherently detected PA codes tolerate pilot spacing as small as 6 symbols and as large as 45 to 50 symbols (setting aside the bandwidth issue); but at the very fast Doppler rate of 0.05, pilot spacing beyond 7-9 symbols soon causes drastic performance degradation. For comparison, we also plot the case where pilot symbols periodically terminate the trellis (dashed line), which, due to trellis segmentation, experiences inferior performance when the pilot spacing is small. Compared to differentially encoded turbo codes [6], PA codes appear to require fewer pilot symbols (we note that in the study of differentially encoded turbo codes in [6], the authors terminated the trellis periodically with pilot symbols, which may have made the tolerant range of pilot spacing (at the small-spacing end) smaller than otherwise).

Figure 11: Effect of the number of pilot symbols on the performance of noncoherently detected PA codes on correlated Rayleigh channels with $f_dT_s$ = 0.01 (curves: $f_dT_s$ = 0.05, 0.02, and 0.01, and $f_dT_s$ = 0.01 with trellis segmentation). Code rate 0.75, data block size 1 K, filter length 65, 10 (global) iterations, 4 (local) iterations within the outer code of PA codes.

Figure 12: Comparison of BER performance versus $E_b/N_0$ (dB) for several noncoherent receiver strategies on correlated Rayleigh channels with $f_dT_s$ = 0.01 (curves: IDDD-1, soft feedback, Gaussian approximation; IDDD-2, soft feedback, mean-$\alpha$ approximation; IDDD-3, soft feedback, PSAM; IDDD-4, hard feedback, PSAM). Code rate 0.75, data block size 1 K, 4% bandwidth expansion, filter length 65, 10 (global) iterations each with 4 (local) iterations for the outer decoding.
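The energy side of the pilot-spacing trade-off is simple arithmetic. Assuming one pilot per `spacing` transmitted symbols, with pilots sent at the same energy as data symbols and $E_b$ accounting for the rate loss due to pilot insertion, the pilot fraction is $1/\text{spacing}$ and the $E_b/N_0$ penalty is $10\log_{10}(\text{spacing}/(\text{spacing}-1))$ dB. The hypothetical helper below illustrates only that bookkeeping; it says nothing about the channel-tracking gain on the other side of the trade-off.

```python
import math
from typing import Tuple


def pilot_overhead(spacing: int) -> Tuple[float, float]:
    """Pilot fraction and E_b/N_0 penalty (dB) for one pilot per `spacing` transmitted symbols."""
    if spacing < 2:
        raise ValueError("spacing must be at least 2")
    fraction = 1.0 / spacing
    penalty_db = 10.0 * math.log10(spacing / (spacing - 1.0))
    return fraction, penalty_db
```

For example, a spacing of 25 (4% pilots) costs about 0.18 dB, whereas a spacing of 5 (20% pilots) costs about 0.97 dB before any estimation gain is counted.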

The impact of the length of the channel tracking filter is also studied. We observe that while the filter length affects the overall performance, its impact is limited compared to that of pilot spacing. This is consistent with what has been reported in other studies [6] and is not a new discovery; hence, we omit the plot.

4.4. Simulation results of noncoherent detection

The performance of noncoherently detected PA codes on fast Rayleigh fading channels is presented below. Unless otherwise indicated, the BER curves shown are after 10 global iterations, and in each global iteration, 4 to 6 local iterations of the outer code are performed. We chose these parameters based on a set of simulations, as a trade-off between performance and complexity.

4.4.1. Noncoherent detection of PA codes with different receiver strategies

We compare the BER performance of four IDDD strategies for a K = 1 K, R = 3/4 PA code on a Rayleigh fading channel with $f_dT_s$ = 0.01 in Figure 12. "IDDD-1" uses conventional differential detection with the Gaussian approximation (22) to compute $L_{ch}(x_k)$ in the first iteration, and soft feedback of $y_k$ in all iterations to assist channel estimation; "IDDD-2" uses conventional differential detection with the "mean-$\alpha$" approximation (20) in the first iteration and soft feedback in all iterations; "IDDD-3" is PSAM with soft feedback; and "IDDD-4" is PSAM with hard feedback. In all cases, 4% pilot symbols are inserted, and the curves shown are after 10 iterations. Different decoding strategies in the first iteration do not affect the performance much, and the performance is not very sensitive to hard or soft feedback either. Although not shown, simulations of a long PA code (K = 48 K) of the same (high) rate (R = 3/4) reveal a similar phenomenon. It is possible, however, that other codes may be more sensitive to the difference in decoding strategies, especially the difference in the feedback information [6].

4.4.2. Comparison of noncoherent detection with coherent detection

Figure 13 shows the performance of rate-3/4 PA codes after 10 iterations on fast Rayleigh fading channels with Doppler rate $f_dT_s$ = 0.01. A short block size of 1 K and a large block size of 48 K are evaluated. In each case, a family of five BER-versus-$E_b/N_0$ curves, accounting for the rate loss due to pilot insertion, is plotted. The three leftmost curves are the ideal coherent cases, with knowledge of the fading amplitudes and phases provided to the receiver, and the two rightmost curves are the noncoherent cases, where IDDD is used to track amplitudes and phases. In both the coherent and the noncoherent cases, trellis segmentation incurs a small performance loss, but since the pilot spacing is not very small (every 25 symbols), the effect is not as drastic as in Figure 10. The noncoherent cases are about 1 dB and 0.55 dB away from the ideal coherent case at a BER of $10^{-4}$ for block sizes of 48 K and 1 K, respectively. This satisfying performance is achieved with only 4% pilot insertion and a very low-complexity IDDD receiver.

Figure 13: Comparison of BER performance for several transmission/reception strategies for PA codes of large and small block sizes on correlated Rayleigh channels with $f_dT_s$ = 0.01 (BER versus $E_b/N_0$ (dB); curves: 0%, ideal; 4%, ideal, pilots separated from trellis; 4%, ideal, pilots terminate trellis; 4%, IDDD, pilots separated from trellis; 4%, IDDD, pilots terminate trellis). Code rate 0.75, data block size 48 K and 1 K, 4% bandwidth expansion, filter length 65, 10 (global) iterations each with 4 (local) iterations for the outer decoding.

5. CONCLUSION

Previous work has established product accumulate codes as a class of provably "good" codes on AWGN channels, with low linear-time complexity and performance close to the Shannon limit. This paper performs a comprehensive study of product accumulate codes on Rayleigh fading channels with both coherent and noncoherent detection. Useful analytical tools, including Divsalar's simple bounds, density evolution, and EXIT charts, are employed, and extensive simulations are conducted. It is shown that PA codes not only perform remarkably well with coherent detection, but the embedded differential encoder also makes them naturally suitable for noncoherent detection. A simple iterative differential detection and decoding (IDDD) strategy allows PA codes to perform only about 1 dB away from the coherent case. Another useful finding is that the widespread practice of inserting pilot symbols to terminate the trellis actually incurs a performance loss compared to inserting pilot symbols separately from the trellis.

We conclude by proposing product accumulate codes as a promising low-cost candidate for wireless applications. The advantages of PA codes include the following: (i) they perform very well with coherent and noncoherent detection (especially at high rates); (ii) their performance is comparable to that of turbo and LDPC codes, yet PA codes require much less decoding complexity than turbo codes and much less encoding complexity and memory than random LDPC codes; and (iii) their regular structure makes low-cost hardware implementation possible.

APPENDIX

DISCRETIZED DENSITY EVOLUTION FOR PA CODES

Using message-passing decoding, the relevant operations on the messages (in LLR form) include the sum in the real domain and the tanh operation (also known as the check operation or $\boxplus$ operation). When independent messages are added together, the pdf of the sum is the discrete convolution (denoted by $*$) of the component pdfs, which can be efficiently implemented using a fast Fourier transform (FFT). For the tanh operation on messages, define

$$
\gamma = \alpha \boxplus \beta = \mathcal{Q}\left(2\tanh^{-1}\left(\tanh(\alpha/2)\tanh(\beta/2)\right)\right),
$$

where $\alpha$, $\beta$, and $\gamma$ are quantized messages, and $\mathcal{Q}(\cdot)$ denotes the quantization operation. The pdf of $\gamma$, $f_\gamma$, can be computed using

$$
f_\gamma[k] = \sum_{(i,j):\; k\Delta = i\Delta \boxplus j\Delta} f_\alpha[i]\cdot f_\beta[j], \tag{A.1}
$$

where $\Delta$ is the quantization interval. To simplify the notation, we denote the operation (A.1) as $f_\gamma = \mathcal{R}(f_\alpha, f_\beta)$, and, by induction on the above equation, we further denote

$$
\mathcal{R}^k(f_\alpha) = \mathcal{R}\big(f_\alpha, \mathcal{R}(f_\alpha, \ldots, \mathcal{R}(f_\alpha, f_\alpha)\cdots)\big). \tag{A.2}
$$
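The quantized check operation (A.1), its repeated form (A.2), and the density of a sum of messages can be sketched in Python as below. Densities are stored as probability lists on the grid $k\Delta$, $k = -K, \ldots, K$. The grid size, the nearest-point rounding used for quantization, and clipping of out-of-range mass to the edge bins are assumptions made here, and the direct double loops stand in for the FFT-based convolution (and any table lookups) that an efficient implementation would use.

```python
import math
from typing import List

Density = List[float]  # probabilities on the grid k*delta, k = -K..K (list position k + K)


def box_plus(a: float, b: float) -> float:
    """a [+] b = 2 atanh(tanh(a/2) tanh(b/2)), clipped so atanh stays finite."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(t)


def quantize(v: float, delta: float, K: int) -> int:
    """Nearest grid index, clipped to [-K, K]."""
    return max(-K, min(K, int(round(v / delta))))


def check_density(fa: Density, fb: Density, delta: float) -> Density:
    """(A.1): f_gamma = R(f_alpha, f_beta) for independent quantized messages."""
    K = (len(fa) - 1) // 2
    out = [0.0] * len(fa)
    for i, pa in enumerate(fa):
        if pa == 0.0:
            continue
        for j, pb in enumerate(fb):
            if pb != 0.0:
                k = quantize(box_plus((i - K) * delta, (j - K) * delta), delta, K)
                out[k + K] += pa * pb
    return out


def check_density_power(f: Density, k: int, delta: float) -> Density:
    """(A.2): R^k(f) = R(f, R(f, ..., R(f, f))) with k copies of f."""
    out = f
    for _ in range(k - 1):
        out = check_density(f, out, delta)
    return out


def sum_density(fa: Density, fb: Density) -> Density:
    """Density of the sum of two independent quantized LLRs (direct convolution, edges clipped)."""
    K = (len(fa) - 1) // 2
    out = [0.0] * len(fa)
    for i, pa in enumerate(fa):
        for j, pb in enumerate(fb):
            k = max(-K, min(K, (i - K) + (j - K)))
            out[k + K] += pa * pb
    return out
```

Since $0 \boxplus \beta = 0$, a delta density at zero (no information) passes through any check unchanged, which is why the density evolution below is initialized with $\delta(0)$.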

The following notations are also used:

(i) $f_{L_{ch,y}}$: pdf of the messages of the received signals $y$ obtained from the channel (see Figure 1(b)),

(ii) $f_{L_{a,i}}^{(k)}$: pdf of the (a priori) messages of the input $x$ to the inner 1/(1 + D) code in the $k$th iteration (obtained from the outer code in the $(k-1)$th iteration) (see Figure 1(b)),

(iii) $f_{L_{e,i}}^{(k)}$: pdf of the (extrinsic) messages passed from the inner code to the outer code in the $k$th iteration,

(iv) $f_{L_{e1}}^{(k)}$ and $f_{L_{e2}}^{(k)}$: pdfs of the extrinsic information computed from the upper and lower branches of the outer code in the $k$th iteration, respectively. Subscripts $d$ and $p$ denote data and parity bits, respectively. Obviously, $f_{L_{e1}}^{(0)} = f_{L_{e2}}^{(0)} = \delta(0)$, the Kronecker delta function.

The discretized density evolution of a rate t/(t + 2) PA code can then be summarized as follows [4]:

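initialization:
$$
f_{L_{a,i}}^{(1)} = f_{L_f}^{(0)} = f_{L_{e1,d}}^{(0)} = f_{L_{e2,d}}^{(0)} = \delta(0), \tag{A.3}
$$

inner code:
$$
f_{L_f}^{(k)} = \mathcal{R}\big(f_{L_{a,i}}^{(k)},\; f_{L_{ch,y}} * f_{L_f}^{(k)}\big), \tag{A.4}
$$
$$
f_{L_{e,i}}^{(k)} = \mathcal{R}^2\big(f_{L_{ch,y}} * f_{L_f}^{(k)}\big), \tag{A.5}
$$

inner-to-outer:
$$
f_{L_{a,o,d}}^{(k)} = f_{L_{e,i}}^{(k)}, \tag{A.6}
$$
$$
f_{L_{a,o,p}}^{(k)} = f_{L_{e,i}}^{(k)}, \tag{A.7}
$$

outer code:
$$
f_{L_{e1,d}}^{(k)} = \mathcal{R}\Big(f_{L_{a,o,p}}^{(k)},\; \mathcal{R}^{t-1}\big(f_{L_{a,o,d}}^{(k)} * f_{L_{e2,d}}^{(k-1)}\big)\Big), \tag{A.8}
$$
$$
f_{L_{e1,p}}^{(k)} = \mathcal{R}^{t}\big(f_{L_{a,o,d}}^{(k)} * f_{L_{e2,d}}^{(k-1)}\big), \tag{A.9}
$$
$$
f_{L_{e2,d}}^{(k)} = \mathcal{R}\Big(f_{L_{a,o,p}}^{(k)},\; \mathcal{R}^{t-1}\big(f_{L_{a,o,d}}^{(k)} * f_{L_{e1,d}}^{(k)}\big)\Big), \tag{A.10}
$$
$$
f_{L_{e2,p}}^{(k)} = \mathcal{R}^{t}\big(f_{L_{a,o,d}}^{(k)} * f_{L_{e1,d}}^{(k)}\big), \tag{A.11}
$$

outer-to-inner:
$$
f_{L_{a,i}}^{(k+1)} = \frac{t}{t+2}\left(f_{L_{e1,d}}^{(k)} * f_{L_{e2,d}}^{(k)}\right) + \frac{2}{t+2}\cdot\frac{f_{L_{e1,p}}^{(k)} + f_{L_{e2,p}}^{(k)}}{2}, \tag{A.12}
$$

where $f_{L_f}^{(k)}$ denotes the pdf of the forward (and, by symmetry, backward) messages in the inner 1/(1 + D) decoder, and $f_{L_{a,o,d}}^{(k)}$ and $f_{L_{a,o,p}}^{(k)}$ denote the pdfs of the messages that the outer code receives for its data and parity bits.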

Although the outer code of PA codes can be viewed as an LDPC code, it is desirable to adopt a serial update procedure, as described above, rather than a parallel one as in a conventional LDPC code, since this allows the checks corresponding to the two SPC branches to take turns updating, which leads to faster convergence [4].

ACKNOWLEDGMENTS

This research work is supported in part by the National Science Foundation under Grants no. CCF-0430634 and CCF-0635199, and by the Commonwealth of Pennsylvania through the Pennsylvania Infrastructure Technology Alliance (PITA).

REFERENCES

[1] R. G. Gallager, Low Density Parity Check Codes, MIT Press, Cambridge, Mass, USA, 1963.

[2] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399-431, 1999.

[3] J. Li, "Differentially encoded LDPC codes—part II: general case and code optimization," to appear in EURASIP Journal on Wireless Communications and Networking.

[4] J. Li, K. R. Narayanan, and C. N. Georghiades, "Product accumulate codes: a class of codes with near-capacity performance and low decoding complexity," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 31-46, 2004.

[5] D. Divsalar and E. Biglieri, "Upper bounds to error probabilities of coded systems beyond the cutoff rate," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '00), p. 288, Sorrento, Italy, June 2000.

[6] M. C. Valenti and B. D. Woerner, "Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1697-1705, 2001.

[7] P. Hoeher and J. Lodge, "'Turbo DPSK': iterative differential PSK demodulation and channel decoding," IEEE Transactions on Communications, vol. 47, no. 6, pp. 837-843, 1999.

[8] M. Peleg and S. Shamai, "Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK," Electronics Letters, vol. 33, no. 12, pp. 1018-1020, 1997.

[9] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727-1737, 2001.

[10] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proceedings of the IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009-1013, Seattle, Wash, USA, June 1995.

[11] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619-637, 2001.

[12] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657-670, 2001.

[13] K. Xie and J. Li, "On accuracy of Gaussian assumption in iterative analysis for LDPC codes," in Proceedings of the IEEE International Symposium on Information Theory (ISIT '06), pp. 2398-2402, Seattle, Wash, USA, July 2006.

[14] J. Hou, P. H. Siegel, and L. B. Milstein, "Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924-934, 2001.

[15] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series and Products, Academic Press, New York, NY, USA, 1980.

[16] G. L. Stuber, Principles of Mobile Communications, Kluwer Academic Publishers, Norwell, Mass, USA, 1996.

[17] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels, John Wiley & Sons, New York, NY, USA, 2000.

[18] P. Hoeher, S. Kaiser, and P. Robertson, "Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), pp. 1845-1848, Munich, Germany, April 1997.