BIDIRECTIONAL STACK DECODING OF POLAR CODES

Introduction/purpose: The paper introduces a reduced latency stack decoding algorithm for polar codes, inspired by the bidirectional stack decoding of convolutional codes and based on the folding technique. Methods: The methods used are the stack decoding algorithm (also known as stack search), which is useful for decoding tree codes, the list decoding technique introduced by Peter Elias, and the folding technique for polar codes, which is used to reduce the latency of the decoding algorithm. The simulation was done using the Monte Carlo procedure. Results: A new polar code decoding algorithm, suitable for parallel implementation, is developed and the simulation results are presented. Conclusions: Polar codes are a class of capacity achieving codes that have been adopted as the main coding scheme for control channels in 5G New Radio. The main decoding algorithm for polar codes is the successive cancellation decoder. This algorithm performs well at large blocklengths with a low complexity, but has very low reliability at short and medium blocklengths. Several decoding algorithms have been proposed in order to improve the error correcting performance of polar codes. The successive cancellation list decoder, in conjunction with a cyclic redundancy check, provides very good error-correction performance, but at the cost of a high implementation complexity. The successive cancellation stack decoder provides similar error-correction performance at a lower complexity.
Future machine-type and ultra reliable low latency communication applications require high-speed, low latency decoding algorithms with good error correcting performance. In this paper, we propose a novel decoding algorithm with reduced latency, inspired by the bidirectional stack decoding of classical convolutional codes, that achieves performance similar to that of the classical successive cancellation list and successive cancellation stack decoding algorithms. The results are presented analytically and verified by simulation.
ACKNOWLEDGMENT: The research described in this paper has been supported by the Serbian Ministry of Education, Science and Technological Development through Project no. 451-03-68/2020-14/200156 titled "Innovative scientific and artistic research from the Faculty of Technical Sciences activity domain".
Minja, A. et al, Bidirectional stack decoding of polar codes, pp. 405–425


Introduction
Context and motivation. Polar codes, introduced in (Arikan, 2009), are the first class of codes with an explicit construction and low complexity encoding and decoding that provably achieve the symmetric capacity of a binary input discrete memoryless channel (bi-DMC). The standard decoding algorithm for polar codes is the successive cancellation decoding (SCD). Although the SCD has good error correcting performance for large blocklengths and is an important tool for proving the capacity achieving property of polar codes, it underperforms for short and medium blocklengths. Several decoding algorithms have been developed to address this. The most important algorithms in this category are the successive cancellation list decoding (SCLD), introduced in (Tal & Vardy, 2015) and improved in (Balatsoukas-Stimming et al., 2015) and (Hashemi et al., 2016), the successive cancellation stack decoding (SCSD), introduced in (Niu & Chen, 2012a) and improved in (Xiang et al., 2019), (Xiang et al., 2020), and the belief propagation decoding (BPD) (Arikan, 2008), (Arikan, 2010). Cyclic redundancy check (CRC) aided polar codes, coupled with the SCLD or the SCSD (Niu & Chen, 2012b), (Li et al., 2012), have been proposed to further improve the error correcting performance of polar codes for practical applications. Another research direction deals with reducing the latency of the decoding algorithm. The simplified successive cancellation decoding (SSCD) was proposed in (Alamdar-Yazdi & Kschischang, 2011) and further improved in (Sarkis et al., 2014), (Hanif & Ardakani, 2017). The main idea of the SSCD is to identify small constituent codes that can be decoded efficiently. The four constituent codes usually considered are the rate-0 code, the repetition code, the rate-1 code, and the single parity check code. The reduced latency SCLD was presented in (Li et al., 2013) and an efficient implementation of the SCSD was proposed in .
One way of reducing latency is to exploit the recursive nature of polar codes. This approach, known as folding, was introduced in (Kahraman et al., 2013). Several decoding algorithms based on the folding technique were proposed in (Kahraman et al., 2014a), (Kahraman et al., 2014b), (Vangala et al., 2014a), (Vangala et al., 2014b), (Huang et al., 2018). A reduced latency implementation of the SCSD was given in (Xiang et al., 2020).
Polar codes have been adopted as the main coding scheme for control and physical broadcast channels in the enhanced mobile broadband (eMBB) and the ultra reliable low latency communications (URLLC) service categories, defined in the fifth generation (5G) wireless communications standard (3rd Generation Partnership Project, 2016), (Won & Ahn, 2020), (Hashemi et al., 2020). It is well known that the CRC aided (2048, 1024) polar codes with list decoding can outperform LDPC and Turbo codes of the same length (Li et al., 2012). Although polar codes have good error correcting performance, the decoding latency still remains an important research topic (Hashemi et al., 2020). In this work, we focus on improving the decoding latency of the SCSD algorithm.

Contribution. In this paper, we present a novel low latency decoding algorithm that is constructed by applying the folding technique to the SCSD algorithm. The new decoding algorithm is in part inspired by the bidirectional decoding of convolutional codes (Senk & Radivojac, 1997), where the channel output is divided into two parts and each part is processed in parallel, while an inner component code is used to combine the results. We name this new decoding algorithm the bidirectional stack decoding (BSD). To the best of our knowledge, no such algorithm has been presented before. We also show that the reduced latency decoding algorithm presented in (Xiang et al., 2020) can be constructed by applying the folding technique to the classical SCSD algorithm. This reduced latency SCSD can be used as a component decoder of the algorithm presented here.
Paper organization. The remainder of the paper is composed of four sections. The section System model introduces the polar codes and gives a brief overview of the SCD, the SCLD and the SCSD algorithms. The section Bidirectional stack decoding algorithm describes the new decoding algorithm introduced in this paper. The next section presents the Simulation results. The section Conclusion gives an overview of the future work and concludes the paper.
Notation. Throughout the paper, uppercase letters represent random variables, lowercase letters represent realizations of the corresponding random variables, uppercase bold letters represent random vectors, and lowercase bold letters represent their realizations. The $i$-th component of a vector $\mathbf{x}$ is denoted $x_i$. $P[\cdot]$ represents the probability of an event, $E[\cdot]$ represents the mean of a random variable and $\mathrm{Var}[\cdot]$ represents the variance of a random variable. Calligraphic uppercase letters represent sets, and $\mathcal{A}^N$ represents the set of all $N$-tuples of a set $\mathcal{A}$. The cardinality of $\mathcal{A}$ is denoted $|\mathcal{A}|$. Sometimes a set will be defined only by its elements, i.e., $\{a_1, a_2, \ldots, a_N\}$. Given a set $\mathcal{A} = \{a_1, a_2, \ldots, a_K\}$ and an $N$-dimensional vector $\mathbf{x}$, such that $N \geq K$, we define a new vector $\mathbf{x}_{\mathcal{A}} = [x_{a_1}, x_{a_2}, \ldots, x_{a_K}]$. Other notation is introduced as it is used.

System model
In this paper, we consider only binary polar codes constructed by the Arikan kernel (Arikan, 2009), as they are most often used in practical applications. The Arikan kernel is the matrix
$$F_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{1}$$
Given a vector $\mathbf{u} = [u_0, u_1]$, we define $\mathbf{u} \cdot F_2 = \mathbf{x} = [x_0, x_1]$. A useful graphical representation of this multiplication is given in Fig. 1. Note that $\mathbf{u} = \mathbf{x} F_2^{-1} = \mathbf{x} F_2$ can be represented by the same logical circuit in Fig. 1, by treating the values $x_0$ and $x_1$ as the input, and $u_0$ and $u_1$ as the output. The Kronecker product of the matrix $F_2$ with itself is defined as
$$F_2 \otimes F_2 = \begin{bmatrix} F_2 & 0 \\ F_2 & F_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}. \tag{2}$$
Given $N = 2^m$, we define an $N$-dimensional polar transform as $F_N = F_2^{\otimes m} = F_2^{\otimes (m-1)} \otimes F_2$. A logical circuit representation of a higher-order matrix can be recursively constructed from smaller order circuits, as shown in Fig. 2 for the case of the matrix in eq. (2).
The polar code $C : (N, K, \mathcal{A})$ is a linear block code of length $N$ and dimension $K$ defined by a set of indices $\mathcal{A} \subset \{0, 1, \ldots, N-1\}$, such that $|\mathcal{A}| = K$. The generator matrix of the code $C$ is given by $G = (F_N)_{\mathcal{A}}$, which consists only of the rows of $F_N$ specified by the values in $\mathcal{A}$. Encoding can be defined as
$$\mathbf{x} = \mathbf{i} \cdot G, \tag{3}$$
where $\mathbf{i}$ is the $K$-dimensional information vector. Alternatively, we can define an $N$-dimensional vector $\mathbf{u}$, such that $\mathbf{u}_{\mathcal{A}} = \mathbf{i}$, while the bits $\mathbf{u}_{\mathcal{A}^c}$ are said to be frozen and are set to some predefined value (usually all zeros) which is known both by the encoder and the decoder. This is useful as we can now use a logical circuit (such as the one given in Fig. 1 and Fig. 2) to define the encoding and decoding. There are many methods for constructing the set $\mathcal{A}$ (Vangala et al., 2015). In this paper, we used the method based on the Bhattacharyya bound approximation proposed in (Arikan, 2009).
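As a rough illustration of the encoding just described, the following Python sketch applies the polar transform $F_N$ with the usual butterfly structure and encodes by placing the information bits on the set $\mathcal{A}$ and zeros on the frozen positions. The function names (`polar_transform`, `encode`) and the information set {3, 5, 6, 7} for $N = 8$ are illustrative assumptions, not taken from the paper.

```python
def polar_transform(u):
    """Return u * F_N over GF(2), with F_N = F_2 Kronecker power (no bit reversal)."""
    x = list(u)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        # One butterfly stage: the upper wire receives the XOR of both wires.
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def encode(info_bits, info_set, n):
    """Place info_bits on the sorted indices of info_set, freeze the rest to 0."""
    u = [0] * n
    for bit, idx in zip(info_bits, sorted(info_set)):
        u[idx] = bit
    return polar_transform(u)


# Hypothetical (8, 4) code: the information set below is only an example.
codeword = encode([1, 0, 0, 0], {3, 5, 6, 7}, 8)
```

Because $F_N$ is its own inverse over GF(2), applying `polar_transform` twice returns the original vector, which mirrors the remark that the same circuit can be read in both directions.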

Successive cancellation decoding
Let $\mathbf{X}$ be a random codeword, and $\mathbf{C} = \varphi(\mathbf{X})$, where $\varphi(\cdot)$ represents the BPSK mapping, defined as $\varphi(x) = 1 - 2x$. $\mathbf{C}$ is transmitted over an additive white Gaussian noise (AWGN) channel, and the channel output $\mathbf{Y} = \mathbf{C} + \mathbf{W}$ is received. $\mathbf{W}$ represents the AWGN noise vector with mean $0$ and variance $\sigma^2$. Let $\mathbf{y}$ be a specific channel output, then $\boldsymbol{\lambda} = \frac{2\mathbf{y}}{\sigma^2}$ represents the log-likelihood ratio (LLR) vector.
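The channel model above can be sketched in a few lines of Python. The helper names are illustrative, and the conversion from $E_b/N_0$ to $\sigma$ assumes unit-energy BPSK symbols and a code of rate $R$, so that $\sigma^2 = 1 / (2 R \, E_b/N_0)$; the paper does not state this conversion explicitly.

```python
import math
import random


def bpsk(bits):
    """BPSK mapping phi(x) = 1 - 2x."""
    return [1.0 - 2.0 * b for b in bits]


def awgn(symbols, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in symbols]


def channel_llr(y, sigma):
    """Channel LLRs lambda = 2 * y / sigma^2."""
    return [2.0 * v / sigma ** 2 for v in y]


def sigma_from_ebn0_db(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK (assumed convention)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))


rng = random.Random(0)          # fixed seed only to make the sketch repeatable
sigma = sigma_from_ebn0_db(2.0, 0.5)
llr = channel_llr(awgn(bpsk([0, 1, 1, 0]), sigma, rng), sigma)
```

A positive LLR favours the bit value 0 and a negative LLR favours 1, which follows from the sign of the mapping $\varphi(x) = 1 - 2x$.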
The decoding factor graph can be constructed from the encoding circuit by replacing all summation blocks with check nodes and all junctions with variable nodes (Fig. 1) (Forney, 2001).
The SCD algorithm works by passing the LLR values along this factor graph from right to left, and hard decisions ($\beta$ values) from left to right. When an updated LLR value reaches the end of the decoding circuit, a hard decision is made and propagated in the other direction. The SCD decoding procedure for the case of $N = 2$ is shown in Fig. 1. The update rules in Fig. 1 are expressed through the functions $f(\cdot)$, $g(\cdot)$ and $h(\cdot)$. The high-level description of the SCD algorithm for $N = 2^m$, $m \geq 1$, is given in Alg. 1 (Vangala et al., 2015). Let $\Lambda \in \mathbb{R}^{N \times (m+1)}$ and $B \in \mathbb{F}_2^{N \times (m+1)}$ be two matrices for storing the calculated LLR and hard estimate values, respectively. Each column of the matrix $\Lambda$ ($B$) corresponds to a given level (numbered from left to right) in the decoding factor graph. At the beginning of the algorithm, the first column of the matrix $\Lambda$ is set to the channel LLRs ($\boldsymbol{\lambda}$). Furthermore, let $\zeta(\cdot)$ represent the standard bit reversal function. For an efficient implementation of the SCD algorithm, see .
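To make the message passing concrete, here is a minimal recursive sketch of successive cancellation decoding in Python. It is not Alg. 1: it uses recursion instead of the matrices $\Lambda$ and $B$ and therefore needs no bit reversal. The forms of `f`, `g` and `h` below are the commonly used ones (with the min-sum approximation for `f`) and are assumptions, since the exact definitions used in the paper are not reproduced in this text.

```python
import math


def f(a, b):
    """Check-node update, min-sum approximation of 2*atanh(tanh(a/2)*tanh(b/2))."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def g(a, b, beta):
    """Variable-node update given the already decided partial-sum bit beta."""
    return b + (1 - 2 * beta) * a


def h(llr):
    """Hard decision: non-negative LLR maps to bit 0."""
    return 0 if llr >= 0 else 1


def sc_decode(llr, frozen):
    """Return (u_hat, x_hat) for x = u * F_N; frozen[i] marks bits fixed to 0."""
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] else h(llr[0])
        return [bit], [bit]
    half = n // 2
    # The codeword has the form x = [a xor b, b], with a and b half-length codewords.
    llr_a = [f(llr[j], llr[j + half]) for j in range(half)]
    u_lo, a = sc_decode(llr_a, frozen[:half])
    llr_b = [g(llr[j], llr[j + half], a[j]) for j in range(half)]
    u_hi, b = sc_decode(llr_b, frozen[half:])
    return u_lo + u_hi, [a[j] ^ b[j] for j in range(half)] + b


# Noiseless LLRs of the codeword [1,0,1,0,0,1,0,1] of a hypothetical (8, 4) code.
frozen = [True, True, True, False, True, False, False, False]
u_hat, x_hat = sc_decode([-4.0, 4.0, -4.0, 4.0, 4.0, -4.0, 4.0, -4.0], frozen)
```

The recursion decides the first half of $\mathbf{u}$ with the `f` outputs, re-encodes it, and only then computes the `g` outputs for the second half, which is the source of the sequential latency that folding tries to reduce.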

Successive cancellation list and stack decoding
The main drawback of the SCD algorithm is that once the algorithm makes a hard decision, that decision is never revisited, and an error can easily propagate. The list and stack algorithms were proposed to fix this problem. Let $s_l = (\Lambda, B, PM_i, i)$ represent the current state of the decoder. The matrices $\Lambda$ and $B$ are the same as before and $i$ represents the index of the last decoded bit. $PM_i$ represents the path metric at time $i$ and is calculated as in (Balatsoukas-Stimming et al., 2015).
The SCLD algorithm works by keeping a list of states. At each time step, all states in the list are expanded. Every time a decision is made, the other decision is also considered and a new element is added to the list. If the list size is greater than some predefined size $L$, the list is sorted according to the path metric and only the best $L$ states are kept. The SCSD algorithm works in a similar manner, but instead of a list of states, a stack of states is kept and at each step only the best (top) state is considered. As the SCSD is an important component of the BSD, the high level description of the SCSD algorithm is given in Alg. 2. For a description of the SCLD, see (Balatsoukas-Stimming et al., 2015). In Alg. 2, the function top(·) returns the top of the stack. The modified copy of the matrix $B$ is calculated by flipping a single bit in $B$, while the modified path metric is the path metric of this changed state. We assume the stack is sorted in the descending order of the path metric, so that the best path is at the top of the stack. If the stack size is greater than some predefined size $L$, then the worst path is discarded. For an efficient implementation of the SCSD, see .

Bidirectional stack decoding algorithm
The basic folding operation splits the original code into two polar codes of length $N/2$. A folded decoding factor graph corresponding to a polar code of length $N = 8$ is shown in Fig. 3.
This decoder consists of two polar codes of length $N = 4$ that can work in parallel, and a combination phase that is used to reconcile the upper and lower decoders. The upper and lower decoders are usually implemented as an SCD, while the choice of the combination decoder may vary. If the combination decoder is also an SCD, then the error performance is equal to that of a classical SCD algorithm. Usually, the combination phase is implemented as a maximum likelihood (ML) decoder (Vangala et al., 2014b), (Li et al., 2014), a list decoder (Li et al., 2014) or a stack decoder (Xiang et al., 2020). The upper and lower decoders can be folded again in order to further reduce latency (Xiang et al., 2020). The multiple folding technique consists of applying the same operation several times (Kahraman et al., 2013), (Kahraman et al., 2014b). Note that a similar folding operation can be obtained by splitting the even and odd code bits into two parts and applying the polar transform in the opposite direction.
This operation also splits the code into two polar codes of half the length. An example of this operation is shown in Fig. 4 for the case of a polar code of length $N = 8$.
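Both folding operations can be checked numerically. The sketch below relies on two standard Kronecker identities, $F_N = F_2 \otimes F_{N/2}$ for the basic (upper/lower) folding and $F_N = F_{N/2} \otimes F_2$ for the odd-even folding. These identities are assumed to correspond to the folding equations of the paper, whose exact notation is not reproduced in this text; the function names are illustrative.

```python
from itertools import product


def polar_transform(u):
    """Return u * F_N over GF(2) (no bit reversal)."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def xor(a, b):
    return [p ^ q for p, q in zip(a, b)]


def basic_fold(u):
    """Encode via two half-length codes acting on the two halves of u."""
    half = len(u) // 2
    lower, upper = u[:half], u[half:]
    # First half of x: (lower xor upper) * F_{N/2}; second half: upper * F_{N/2}.
    return polar_transform(xor(lower, upper)) + polar_transform(upper)


def odd_even_fold(u):
    """Encode via two half-length codes acting on even and odd positions."""
    even, odd = u[0::2], u[1::2]
    x = [0] * len(u)
    x[0::2] = polar_transform(xor(even, odd))  # even code bits
    x[1::2] = polar_transform(odd)             # odd code bits
    return x


# Exhaustive check for N = 8: both folds reproduce the direct transform.
all_match = all(
    basic_fold(list(u)) == polar_transform(list(u)) == odd_even_fold(list(u))
    for u in product([0, 1], repeat=8)
)
```

In both cases the two half-length transforms are independent of each other once their inputs are fixed, which is what allows the two component decoders to run in parallel.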
Equations (9) and (10) can be combined, and the same procedure can be applied to $\mathbf{w}_E$ and $\mathbf{w}_O$ to construct multiple folds. The new SCSD algorithm based on the odd-even folding is presented in Alg. 3. Let $\mathrm{SCSD}_E$ and $\mathrm{SCSD}_O$ represent the SCSD decoders of the even polar code and the odd polar code, respectively. Note that the reduced latency SCSD decoder, presented in (Xiang et al., 2020), can be used as a component decoder in our algorithm. Let $\mathrm{ENCO}_E$ represent the encoder of the even polar code. The described decoding algorithm performs slightly worse than the classical SCSD algorithm, because only the best surviving state of the $\mathrm{SCSD}_E$ decoder is kept and all the rest are discarded. In order to improve performance, we modify the component SCSD algorithm to return a list of the $D$ best candidate states. This can easily be implemented by popping the best state and rerunning the SCSD with a modified stack. The decoder output is selected amongst all $D^2$ paths based on the combined path metric or by checking the CRC. As $D$ does not need to be very large, an ML decision can also be applied.
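A component stack decoder that returns the $D$ best candidates can be sketched as follows. This is a deliberately simple illustration, not Alg. 2 or Alg. 3: it recomputes the decision LLR of each bit from scratch instead of storing $\Lambda$ and $B$, it uses the min-sum `f`, and it uses the LLR-based penalty form of the path metric from (Balatsoukas-Stimming et al., 2015), for which a lower value is better. The sign convention may therefore differ from the one assumed for the stack in the paper, and all names are hypothetical.

```python
import heapq
import math


def polar_transform(u):
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def f(a, b):
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def g(a, b, beta):
    return b + (1 - 2 * beta) * a


def bit_llr(llr, prefix):
    """Decision LLR of bit len(prefix), given the earlier decisions in prefix."""
    n = len(llr)
    if n == 1:
        return llr[0]
    half = n // 2
    if len(prefix) < half:
        return bit_llr([f(llr[j], llr[j + half]) for j in range(half)], prefix)
    a = polar_transform(prefix[:half])  # re-encode the finished first half
    return bit_llr([g(llr[j], llr[j + half], a[j]) for j in range(half)],
                   prefix[half:])


def penalty(lam, bit):
    """ln(1 + exp(-(1 - 2*bit) * lam)), evaluated in a numerically stable way."""
    t = (1 - 2 * bit) * lam
    return math.log1p(math.exp(-t)) if t >= 0 else -t + math.log1p(math.exp(t))


def stack_decode(llr, frozen, stack_size=32, num_candidates=1):
    """Return up to num_candidates (path_metric, u_hat) pairs, best first."""
    stack = [(0.0, [])]  # min-heap of (path metric, decided prefix)
    results = []
    while stack and len(results) < num_candidates:
        pm, prefix = heapq.heappop(stack)       # best (top) state
        if len(prefix) == len(llr):
            results.append((pm, prefix))        # finished path; keep searching
            continue
        lam = bit_llr(llr, prefix)
        for bit in ((0,) if frozen[len(prefix)] else (0, 1)):
            heapq.heappush(stack, (pm + penalty(lam, bit), prefix + [bit]))
        while len(stack) > stack_size:          # discard the worst path
            stack.remove(max(stack))
            heapq.heapify(stack)
    return results


frozen = [True, True, True, False, True, False, False, False]
candidates = stack_decode([-4.0, 4.0, -4.0, 4.0, 4.0, -4.0, 4.0, -4.0],
                          frozen, stack_size=16, num_candidates=2)
```

Continuing the search after the first complete path is popped plays the role of "popping the best state and rerunning the SCSD with a modified stack". Running two such decoders with $D$ candidates each gives the $D^2$ pairs among which the final output is selected.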

Simulation results
As we are primarily interested in URLLC applications, we consider polar codes of rate 0.5 and block lengths 128 and 256. Although we present results only for short blocklengths, the proposed algorithm can be applied to polar codes of any length. The codes were constructed using the Bhattacharyya bound approximation method, and the simulation was done using the Monte Carlo method for different values of $E_b/N_0$ ranging from −1.6 dB to 3.9 dB with a step of 0.5 dB. All simulations were run until a relative precision of $\delta = 0.05$ was reached. The SCLD algorithm was run with a list size of $L = 32$, while the SCSD algorithm was run with a stack of size $L = 100$. In the case of the bidirectional stack algorithm, the component SCSD algorithms were run with a stack of size $L = 32$, and they both returned a list of $D = 4$ candidate states. Out of $D^2 = 16$ possible candidates, we select the ML one. Fig. 5 shows the bit error rate (BER) as a function of $E_b/N_0$ for different decoders in the case of the (128, 64) polar code. Fig. 6 shows the frame error rate (FER) as a function of $E_b/N_0$ for different decoders in the case of the (128, 64) polar code. The FER of the (256, 128) polar code is shown in Fig. 7.
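The stopping rule of such a Monte Carlo run can be sketched as below. The paper does not define the relative precision $\delta$; this sketch assumes it is the ratio of the standard deviation of the binomial error-rate estimate to the estimate itself, and the names `estimate_error_rate`, `min_errors` and `max_trials` are hypothetical.

```python
import math
import random


def estimate_error_rate(trial, delta=0.05, min_errors=10, max_trials=10 ** 7):
    """Run trial() until the assumed relative precision drops to delta.

    trial() must return True when the simulated frame (or bit) is in error.
    Returns the estimated error rate and the number of trials used.
    """
    errors = 0
    trials = 0
    while trials < max_trials:
        trials += 1
        if trial():
            errors += 1
        if errors >= min_errors:
            p_hat = errors / trials
            # Standard deviation of the estimate divided by the estimate.
            relative_precision = math.sqrt((1.0 - p_hat) / (p_hat * trials))
            if relative_precision <= delta:
                break
    return errors / trials, trials


# Toy usage: a stand-in "decoder" that fails with probability 0.1.
rng = random.Random(1)
p_hat, n_trials = estimate_error_rate(lambda: rng.random() < 0.1)
```

Under this assumed definition, $\delta = 0.05$ corresponds to roughly 400 observed errors at low error rates, so the number of required trials grows as the error rate falls with increasing $E_b/N_0$.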
Based on these results, we see that the new BSD algorithm has the same error rate as the original SCSD algorithm. The proposed algorithm is faster than the original SCSD algorithm because of the smaller stack size and the fact that the algorithm is split into two parts, each of which is an SCSD of half the length of the original code.

Conclusion
In this paper, we presented a novel bidirectional stack decoding algorithm based on the folding technique. It was shown that the proposed algorithm has the same error performance as the existing algorithms. Future research will deal with further improving the bidirectional algorithm. The folding procedure is a powerful technique that can be used to construct a wide range of hybrid decoders. Different combinations of decoders could give better results, which is something that needs to be investigated. It is possible to further improve the proposed algorithm by applying the folding operation multiple times. The use of multiple CRC codes can also improve the performance of the decoder. An efficient hardware implementation of the proposed algorithm will also be developed.