# Introduction to Coding Theory
## Error correction in Compact Discs
How is it that a CD with so many scratches on it can still be played continuously with few audible errors? Why is it that a QR-code read incorrectly can still reveal the right information?
In other words, how can a CD-player or QR-code reader take a data stream with so many errors and correct most of them? This is where error correcting codes (a topic within Coding Theory) come into play.
## Overview of error detection
In communications over noisy channels, like radio communications or the reading of a CD[^1], the data is often not received as intended.
Suppose that I intend to send the following message to a friend through a noisy channel: `Hi!`[^2] This string is equivalent to the bits `010010000110100100100001`.
However, my friend may receive a corrupted version of this message, with several of the bits flipped in transit.
Converting this corrupted transmission back to a string doesn't even resemble a word! Over noisy mediums, error rates like the one-in-six ratio of this example are very common. Using check bits, however, we can devise a simple way to detect an error (but not correct it).
Suppose that I send a message composed of three bits $b_1 b_2 b_3$ and a parity bit $p$, which is calculated by adding all the previous bits in the binary field $GF(2)$: $p = b_1 + b_2 + b_3 \pmod 2$.
Such a scheme can detect only one error: when the message is received, the receiver can perform a simple calculation to check that the number of $1$s in the first three bits is congruent to the parity bit modulo $2$. However, this particular scheme cannot detect two errors, because two flipped bits leave the overall parity unchanged[^3]. The recipient of the data can only ask for a re-transmission in this scheme, an expensive and sometimes infeasible task.
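To make this concrete, here is a minimal Python sketch of the parity-bit scheme (the function names are my own, for illustration):

```python
# A single parity bit over GF(2): the parity is the sum of the data bits mod 2,
# so any single flipped bit (including the parity bit itself) is detectable.
def add_parity(bits):
    return bits + [sum(bits) % 2]

def parity_ok(received):
    return sum(received) % 2 == 0

word = add_parity([1, 0, 1])                  # -> [1, 0, 1, 0]
corrupted = [1, 1, 1, 0]                      # one bit flipped in transit
print(parity_ok(word), parity_ok(corrupted))  # True False
```

Note that flipping any *two* bits of `word` would leave the parity check satisfied, which is exactly the limitation described above.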
With certain modifications, augmentations, and more complex calculations, schemes based solely on parity bits can detect a great deal of errors. Generally, the information rate for parity-bit schemes is $\frac{n}{n+1}$, where $n$ is the length of the message[^4]. However, this high information rate comes at the cost of not detecting a great many possible errors. These types of error-detecting schemes are widely implemented in ISBN numbers, bar-code scanners, and other noise-prone applications where re-scanning is a viable option.
## Overview of error correction
In some situations, like reading a CD or scanning a QR code, merely detecting an error is not a viable choice: a corrupted bit cannot simply be labeled as incorrect and discarded, because the information may be vital. Here, error correction is important.
A simple error correction scheme is to append multiple copies of the original message and transmit them all. For example, I can encode a message $m$ as $mmm$ and transmit it. Then, if a friend receives a copy in which some bits have been corrupted,
he or she can take the majority-occurring value in each position as the actual message and use it. Here, the error correction and detection scheme corrects two errors, but at the expense of more transmitted data. The information rate for such a scheme is $\frac{1}{3}$, a very inefficient mechanism.
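A minimal sketch of this repetition scheme, assuming three copies and a per-position majority vote:

```python
# Majority-vote decoding for a three-fold repetition code: the message is sent
# three times, and each bit position is decided by a 2-out-of-3 vote.
def repeat_encode(bits, copies=3):
    return bits * copies

def majority_decode(received, copies=3):
    n = len(received) // copies
    decoded = []
    for i in range(n):
        votes = sum(received[i + j * n] for j in range(copies))
        decoded.append(1 if 2 * votes > copies else 0)
    return decoded

sent = repeat_encode([1, 0, 1, 1])   # 12 bits on the wire for 4 data bits
noisy = sent.copy()
noisy[1] ^= 1                        # error in the first copy
noisy[6] ^= 1                        # error in the second copy
print(majority_decode(noisy))        # [1, 0, 1, 1] -- both errors corrected
```

Two errors are corrected here only because they land in different positions; the poor information rate of $\frac{1}{3}$ buys relatively weak protection.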
## Defining Coding Theory
Coding Theory deals with the creation and analysis of efficient error-correction schemes to ensure that messages transmitted over noisy channels are properly received.
Further, Coding Theory aims to find the optimal balance of effectiveness and efficiency of error-correcting codes in various applications. In other words, Coding Theory aims to balance a high information rate with a high error correction rate.
Finally, I include a useful diagram which outlines the procedure of error correction across noisy channels. A message is encoded into a codeword, transmitted across a noisy channel, and then decoded and corrected by the recipient.
```mermaid
flowchart LR
subgraph encoding["Encoding and Transmission"]
direction LR
M1["Message word x"] -->|"Input"| ENC["Encoder"]
ENC -->|"Code word u"| ADD["+"]
ERR["Random error generator"] -->|"Errors"| ADD
ADD -->|"Received word v"| OUT1["Distorted signal"]
end
subgraph decoding["Reception and Decoding"]
direction LR
IN2["Received word v"] -->|"Input"| EP["Error processor"]
EP -->|"Code word u'"| DEC["Decoder"]
EP -.->|"v + error signal"| MON["Monitoring"]
DEC -->|"Message word x'"| OUT2["Recovered message"]
end
OUT1 -.->|"Transmission"| IN2
style encoding fill:#e1f5ff
style decoding fill:#fff4e1
style ERR fill:#ffcccc
style ADD fill:#ffe1e1
```
In this paper, I hope to effectively outline the use of Bose–Chaudhuri–Hocquenghem (BCH) codes, which form a class of cyclic error-correcting codes constructed over finite fields.
## Preliminaries to and definitions for discussing BCH codes
In the examples used in this paper, I will discuss the Galois field containing 16 elements, notated as $GF(16)$.
### Code words and Hamming distances
Let’s first define some key words and concepts:
If $A$ is an alphabet (containing $q$ symbols), then an $A$-word of length $n$ is an element of $A^n$, meaning that it contains $n$ elements from $A$. For example, over the set of alphabetic characters $A = \{a, \ldots, z\}$, the $A$-word "hello" is an element of $A^5$.
An $(n, k)$-block code over the alphabet $A$ of size $q$ contains exactly $q^k$ code words in $A^n$. The set $C \subseteq A^n$ consists of all the code words for a message of length $k$ in the alphabet $A$.
We will see that sets of code words defined over finite fields have special properties that allow us to recover a corrupted message. In order to encode some message $x$, we use a bijective[^5] encoder to map each $x \in A^k$ to the set of code words $C$, which has special properties as we will see.
We indulge in one more definition:
- The **Hamming distance** $d(u, v)$ between two code words $u, v \in A^n$ is a measure of how much $u$ and $v$ differ[^6]. This distance is defined as the number of positions $i$ where $u_i \neq v_i$. In other words, the Hamming distance is a measure of how different two words are. For example, the distance between the words "hello" and "hi___" is 4 because the two words differ in value in four analogous locations.
Similarly, the minimum distance of the set $C$ is the smallest Hamming distance between any distinct pair of vectors in $C$. The minimum distance is thus a measure of how "close" any two words are in a set of code words; the lower the minimum distance, the closer two words in the set are. A code with a minimum Hamming distance of $d$ can correct up to $\lfloor (d-1)/2 \rfloor$ errors in a code word. Usually, the set of code words is designed to have a minimum Hamming distance $d \geq 2t + 1$, where $t$ is the number of errors the code must correct.
In error-correction schemes, when receiving a message vector over a noisy channel, say $v$, we look in the set $C$ of possible code words to see which vector in $C$ has the smallest Hamming distance to $v$. This vector is then likely to be our intended message vector. In other words, we are trying to see what message $v$ is most likely trying to represent from $C$.
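A small sketch of this nearest-code-word search; the toy codebook below is illustrative and not a code used elsewhere in this paper:

```python
# Nearest-codeword decoding: pick the code word with the smallest
# Hamming distance to the received word.
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

codebook = ["00000", "01011", "10101", "11110"]   # a toy code, min distance 3
received = "10001"
best = min(codebook, key=lambda c: hamming(c, received))
print(best, hamming(best, received))              # 10101 1
```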
### Finite/Galois Fields
Alphabets used in error-correction schemes are usually constrained to bits because bits can represent all the common characters used in electronic communications. This leads to using a finite field of $2^m$ elements. Here, we give a more formal definition of finite fields.
The finite field $GF(q)$ contains $q$ elements, and the standard arithmetic[^7] is constrained to this set of elements. $GF(q)$ can also be considered a ring, so $GF(q)[x]$ is the set of polynomials with coefficients in $GF(q)$. Arithmetic in $GF(q)[x]$ can be performed modulo a polynomial and is denoted as $GF(q)[x]/p(x)$ for some irreducible[^8] polynomial $p(x)$.
For the purposes of this paper, we will consider four bits as a single unit to deal with burst errors, which tend to corrupt short, continuous segments of a code. As such, we will want to consider fields of size $2^4 = 16$, as there are two elements in the bit alphabet and each unit has length $4$. As such, our finite field can be denoted as $GF(2^4) = GF(16)$.
We cannot use the integers in our construction of our finite field, as they do not satisfy the cancellation law[^9] under a non-prime modulus. So, we must use polynomials, because they satisfy the required arithmetic properties.
### Polynomials in finite fields
Because our field is $GF(2^4)$, we are constrained to using binary coefficients to describe our polynomials. From binary strings, we can easily represent polynomials in $GF(2)[x]$: for example, the string $1011$ represents the polynomial $x^3 + x + 1$. Polynomial addition is the same as addition over $GF(2)$. For example, adding $x^3 + x + 1$ and $x^2 + x$ results in $x^3 + x^2 + 1$. You may correctly note that this resembles an XOR operation.
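In code, this correspondence makes binary-polynomial addition a one-liner. A tiny sketch, storing each polynomial as an integer bit mask:

```python
# Store a binary polynomial as an integer whose bit i is the coefficient of x^i.
a = 0b1011         # x^3 + x + 1
b = 0b0110         # x^2 + x
print(bin(a ^ b))  # 0b1101, i.e. x^3 + x^2 + 1: addition over GF(2) is XOR
```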
Polynomial multiplication is a little more involved and is covered in more detail soon.
### Constructing a finite field with polynomials
We have seen that using composite moduli does not allow the set of integers to satisfy the cancellation law. Using numbers like 16 as the modulus for a finite field leads to results like $4 \cdot 4 \equiv 0 \pmod{16}$, which are troubling to finite fields. This is why we chose polynomials as our arithmetic "basis" for math in the finite field $GF(16)$.
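This failure is easy to demonstrate numerically:

```python
# Cancellation fails modulo 16: 4*4 and 4*12 are both 0 (mod 16),
# so from 4*b = 4*c we cannot conclude b = c.
print((4 * 4) % 16, (4 * 12) % 16)   # 0 0, yet 4 != 12
```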
Thus, for the construction of a finite field other than $GF(p)$ for a prime $p$, we must choose polynomials in $GF(2)[x]$ that are irreducible[^10] factors of $x^{15} + 1$. One such factor is $p(x) = x^4 + x^3 + 1$, which will be used to construct a finite field below.
To construct the finite field using our generator polynomial $p(x) = x^4 + x^3 + 1$, we must first observe that $x^4 = x^3 + 1$, because the coefficients of polynomials in $GF(2)[x]$ are binary. The finite field itself is formed from every possible binary polynomial of degree less than $4$. The set of possible polynomials, each representing an integer from $0$ to $15$, is listed below in abbreviated form[^11]:

$$0 \leftrightarrow 0, \quad 1 \leftrightarrow 1, \quad 2 \leftrightarrow x, \quad 3 \leftrightarrow x + 1, \quad 4 \leftrightarrow x^2, \quad \ldots, \quad 15 \leftrightarrow x^3 + x^2 + x + 1$$
Now, to actually construct the field, we compute a table by multiplying the polynomial representations of each binary number from the rows and columns together to form a cell:
For example, to compute $2 \cdot 3$, we compute the product of the polynomials: $x(x + 1) = x^2 + x$, which is the element $6$. For another example, we compute $5 \cdot 7 = (x^2 + 1)(x^2 + x + 1) = x^4 + x^3 + x + 1$. Using the fact that $x^4 = x^3 + 1$ in $GF(16)$, we must further reduce this polynomial so it exists in the field, because $x^4$ is not defined in the field: $x^4 + x^3 + x + 1 = (x^3 + 1) + x^3 + x + 1 = x$, so $5 \cdot 7 = 2$.
A completed multiplication table (in base 10) can be built from the primitive generator polynomial $p(x) = x^4 + x^3 + 1$; the sketch below generates it.
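A minimal sketch of GF(16) multiplication, assuming the reducing polynomial $x^4 + x^3 + 1$ chosen above (the helper name `gf16_mul` is mine):

```python
# GF(16) multiplication: carry-less multiply in GF(2)[x], then fold any terms of
# degree >= 4 back into the field using x^4 = x^3 + 1, i.e. p(x) = 0b11001.
P = 0b11001

def gf16_mul(a, b):
    prod = 0
    for i in range(4):                   # carry-less (XOR) multiplication
        if (b >> i) & 1:
            prod ^= a << i
    for deg in range(6, 3, -1):          # reduce degrees 6..4 modulo p(x)
        if (prod >> deg) & 1:
            prod ^= P << (deg - 4)
    return prod

print(gf16_mul(2, 3), gf16_mul(5, 7))    # 6 2, matching the worked examples
for a in range(1, 16):                   # print the full base-10 table
    print(*(f"{gf16_mul(a, b):2d}" for b in range(1, 16)))
```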
### Primitive elements in GF(16)
Similar to a primitive root modulo a prime number, a primitive element $\alpha$ in a finite field can generate all nonzero elements of the field in some permutation. In other words, $\alpha$ is a primitive element if the set $\{\alpha^0, \alpha^1, \ldots, \alpha^{14}\}$ is a permutation of the nonzero elements $\{1, 2, \ldots, 15\}$.
For the field GF(16), $\alpha = x$ (the element $2$) is a primitive element because all powers of $x$ up to $x^{14}$ serve as a permutation of the set $\{1, \ldots, 15\}$. Using the multiplication table above, we can easily verify this fact.
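A quick sketch of this verification, reusing `gf16_mul` from the sketch above:

```python
# Repeatedly multiplying by alpha = x (the element 2) should visit every
# nonzero element of GF(16) exactly once before returning to 1.
powers, val = [], 1
for _ in range(15):
    powers.append(val)
    val = gf16_mul(val, 2)                               # val becomes alpha^(i+1)
print(sorted(powers) == list(range(1, 16)), val == 1)    # True True
```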
## Overview of BCH
To develop a BCH code, we use all of the knowledge we have outlined above. Given a binary message expressed as a polynomial called $m(x)$, we can map it to an encoded code word $c(x)$ that lies in the set of valid BCH code words[^12]. We then convert $c(x)$ to binary and transmit it across a noisy channel.
The receiver will then perform various polynomial operations, calculate syndromes (yet to be discussed), and linear algebra row-reduction to eventually calculate where the corrupted bits are.
## Encoding in BCH
To illustrate the BCH encoding and decoding procedures, we will use the example of a triple-error-correcting BCH$(15, 5)$ code in GF$(16)$, based on the primitive[^13] polynomial $p(x) = x^4 + x + 1$[^14].
First, let us clarify what a BCH$(n, k)$ code is:
- The $n$ represents the size of the transmitted binary message. The size of a message is limited by the size of the finite field we are using. In this case, we are using GF$(16)$, so the size of our message is limited to $n = 2^4 - 1 = 15$ bits. This value for $n$ is also directly related to the maximal power in a polynomial code word, as we will soon see.
- The $k$ represents the number of information bits that can be transmitted; here $k = 5$.
- The $n - k$ represents the number of check bits needed to correct the errors we expect to be introduced into our $n$-bit overall message; here $n - k = 10$. The designed minimum distance of this code is $d = 7$, which is a requirement to correct $t = 3$ errors[^15].
### Primitive elements and minimal polynomials in GF(16)
Now, given a primitive element $\alpha$, we can find the minimal polynomial[^16] $m_i(x)$ for each power $\alpha^i$, given below.

| Powers of $\alpha$ | Minimal polynomial |
| --- | --- |
| $\alpha, \alpha^2, \alpha^4, \alpha^8$ | $m_1(x) = x^4 + x + 1$ |
| $\alpha^3, \alpha^6, \alpha^9, \alpha^{12}$ | $m_3(x) = x^4 + x^3 + x^2 + x + 1$ |
| $\alpha^5, \alpha^{10}$ | $m_5(x) = x^2 + x + 1$ |
| $\alpha^7, \alpha^{11}, \alpha^{13}, \alpha^{14}$ | $m_7(x) = x^4 + x^3 + 1$ |

Note that each power of $\alpha$ (i.e. each $\alpha^i$) is a root of its corresponding minimal polynomial $m_i(x)$.
### Finding a generator polynomial g(x)
Now, we must construct the generator polynomial $g(x)$ that will aid in creating our BCH code word. We find the LCM of three minimal polynomials because we are trying to correct three errors: $g(x)$ must have $\alpha, \alpha^2, \ldots, \alpha^6$ among its roots, and the even powers are conjugate roots of the same minimal polynomials as the odd powers. Using our table above:

$$g(x) = \operatorname{lcm}\big(m_1(x), m_3(x), m_5(x)\big) = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1$$

Now, we have finally established that the polynomial $g(x)$ can correct three errors in a BCH code.
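We can double-check this product with a few lines of Python, multiplying the three minimal polynomials over GF(2) with the same bit-mask representation as before:

```python
# Multiply polynomials over GF(2): shift-and-XOR, no reduction needed here.
def poly_mul(a, b):
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    return prod

m1, m3, m5 = 0b10011, 0b11111, 0b111    # x^4+x+1, x^4+x^3+x^2+x+1, x^2+x+1
g = poly_mul(poly_mul(m1, m3), m5)
print(bin(g))   # 0b10100110111 -> x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
```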
### Encoding our message into a codeword of BCH
Next, suppose that we would like to send the message $10110$. We can express our message in equivalent forms: as the bits $10110$, or as the polynomial $m(x) = x^4 + x^2 + x$ (taking the leftmost bit as the coefficient of $x^4$). Now, we must find our transmitted word $c(x)$:
- First, we find the parity-bits polynomial $b(x)$. To do so, we must find the residue of $x^{10} m(x)$ modulo the generator polynomial $g(x)$, operating in $GF(2)[x]$. Our final expression is evaluated as such[^17]: $b(x) = x^{10} m(x) \bmod g(x) = x^8 + x^4 + x^3 + x^2 + x$.
- Second, we find $c(x)$ by adding $x^{10} m(x)$ and $b(x)$: $c(x) = x^{14} + x^{12} + x^{11} + x^8 + x^4 + x^3 + x^2 + x$, i.e. the bits $101100100011110$.
Now, suppose we have transmitted the code word $c(x)$. In transmission over a noisy channel, the error vector $e(x) = x^{12} + x^6 + x$ is introduced to the transmission. Adding the polynomials $c(x)$ and $e(x)$, we find that the vector/polynomial received is[^18] $r(x) = c(x) + e(x) = x^{14} + x^{11} + x^8 + x^6 + x^4 + x^3 + x^2$, as the sketch below reproduces.
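The short sketch below reproduces the encoding and the corruption step, under my convention that bit $i$ of an integer holds the coefficient of $x^i$ (so the message $10110$ is the integer `0b10110`); the printed values are my recomputation:

```python
# Systematic BCH encoding: shift the message up by n-k = 10 positions, append
# the remainder mod g(x) as parity bits, then flip bits 12, 6, and 1 as errors.
def poly_mod(a, m):
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:      # cancel the leading term of a(x)
        a ^= m << (a.bit_length() - 1 - dm)
    return a

g = 0b10100110111                        # x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
msg = 0b10110                            # m(x) = x^4 + x^2 + x
codeword = (msg << 10) ^ poly_mod(msg << 10, g)
received = codeword ^ (1 << 12) ^ (1 << 6) ^ (1 << 1)
print(f"{codeword:015b}")                # 101100100011110
print(f"{received:015b}")                # 100100101011100
print(poly_mod(codeword, g))             # 0 -> valid code words are divisible by g(x)
```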
### Summarizing our results thus far
The transmitted, error, and received vectors are included below for convenience (the flipped bits are where $e$ has a $1$):

| Vector | Bits |
| --- | --- |
| Transmitted $c$ | $101100100011110$ |
| Error $e$ | $001000001000010$ |
| Received $r$ | $100100101011100$ |
## Decoding and error correction in BCH
### Structure of BCH cyclic codes
Before delving into the decoding and error correction of our sample BCH code, we should discuss the structure of these codes.
BCH codes are cyclic codes that are constructed by specifying the zeros of their generator polynomials ($g(x)$ in our case). BCH generator polynomials have the special property that their roots are consecutive powers of a primitive element. Namely, the generator polynomial has the consecutive roots $\alpha, \alpha^2, \ldots, \alpha^{2t}$, where $\alpha$ is a primitive element and $t$ is the number of errors that the BCH code can correct. This property arises from the fact that BCH codes have a designed minimum Hamming distance of $d \geq 2t + 1$.
### Decoding BCH codes
To decode binary codes in BCH, we must use the elements of GF(16) to number the positions of a codeword. For a vector of length $15$, position $i$ (the coefficient of $x^i$) is numbered by the field element $\alpha^i$, for $i = 0, 1, \ldots, 14$.
In GF(16) arithmetic, we can find the positions of the errors by solving a set of equations. These equations can be found from the error polynomial $e(x)$ and the zeros of the code, $\alpha^j$ for $j = 1, \ldots, 6$.
Let $r(x) = c(x) + e(x)$ represent the polynomial associated with a received codeword, where the error polynomial is defined as $e(x) = \sum_{i=1}^{t} Y_i x^{j_i}$, with $t$ the number of errors. The sets $\{Y_i\}$ and $\{X_i\}$ are known as the error values and error positions respectively, where $Y_i = 1$ for binary BCH codes and $X_i = \alpha^{j_i}$[^19].
We can then calculate the syndrome values by evaluating $r(x)$ at each of the zeros of the code (which we have established to be powers of $\alpha$); note that $r(\alpha^j) = e(\alpha^j)$, because the code word $c(x)$ is a multiple of $g(x)$ and so vanishes at every zero of the code:

$$S_j = r(\alpha^j) = e(\alpha^j) = \sum_{i=1}^{t} X_i^{\,j} \qquad \text{for } j = 1, \ldots, 2t$$
Then we can define our error locator polynomial as:

$$\sigma(x) = \prod_{i=1}^{t} \left(1 + X_i x\right) = 1 + \sigma_1 x + \sigma_2 x^2 + \cdots + \sigma_t x^t$$
The roots of this polynomial are exactly the inverses $X_i^{-1}$ of the error locations in GF(16). Finally, we can make the following relation between the coefficients of $\sigma(x)$ and the syndromes:

$$\begin{pmatrix} S_1 & S_2 & S_3 \\ S_2 & S_3 & S_4 \\ S_3 & S_4 & S_5 \end{pmatrix} \begin{pmatrix} \sigma_3 \\ \sigma_2 \\ \sigma_1 \end{pmatrix} = \begin{pmatrix} S_4 \\ S_5 \\ S_6 \end{pmatrix}$$

Solving this system for the vector $(\sigma_3, \sigma_2, \sigma_1)^{\mathsf{T}}$ will give us the coefficients of the error locator polynomial $\sigma(x)$. This matrix equation is also referred to as the key equation. There are multiple algorithms to solve for the vector:
- Euclidean algorithm
- Berlekamp–Massey algorithm (BMA)
- The Peterson–Gorenstein–Zierler (PGZ) decoder
In this example, we will use the PGZ decoder.
Now, the binary message $r$ is received, and the recipient proceeds to determine and correct the errors.
The recipient must first calculate the values known as syndromes by evaluating the received message polynomial $r(x)$ at the corresponding powers of the primitive element $\alpha$. The syndromes $S_j = r(\alpha^j)$ for $j = 1, \ldots, 6$ are calculated using the Power/Vector table provided in the Appendix.[^20]
Restating our findings:

$$S_1 = 1, \quad S_2 = 1, \quad S_3 = \alpha^6, \quad S_4 = 1, \quad S_5 = \alpha^5, \quad S_6 = \alpha^{12}$$
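The following sketch recomputes these syndromes from scratch, assuming the standard antilog table for $p(x) = x^4 + x + 1$ and the received word derived above; outputs are in base 10, so $12 = \alpha^6$, $6 = \alpha^5$, and $15 = \alpha^{12}$:

```python
# Build the antilog table EXP[i] = alpha^i for GF(16) with p(x) = x^4 + x + 1,
# then evaluate the received polynomial at alpha^1 .. alpha^6.
EXP, e = [], 1
for _ in range(15):
    EXP.append(e)
    e <<= 1
    if e & 0b10000:
        e ^= 0b10011                     # reduce by x^4 + x + 1

def syndrome(r, j):
    s = 0
    for i in range(15):                  # r(alpha^j): XOR alpha^(j*i) over set bits
        if (r >> i) & 1:
            s ^= EXP[(j * i) % 15]
    return s

received = 0b100100101011100
print([syndrome(received, j) for j in range(1, 7)])   # [1, 1, 12, 1, 6, 15]
```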
Now, knowing the six syndrome values for the received message $r$, we can proceed to use the Peterson–Gorenstein–Zierler algorithm to find the error locations of $e(x)$.
### The Peterson–Gorenstein–Zierler (PGZ) decoder
The decoding problem with BCH is that the number of errors in a message is unknown to the recipient.
The PGZ process begins by filling the matrix from before with the syndrome values that we know, with $t = 3$ being the number of errors we can correct:

$$\begin{pmatrix} 1 & 1 & \alpha^6 \\ 1 & \alpha^6 & 1 \\ \alpha^6 & 1 & \alpha^5 \end{pmatrix} \begin{pmatrix} \sigma_3 \\ \sigma_2 \\ \sigma_1 \end{pmatrix} = \begin{pmatrix} 1 \\ \alpha^5 \\ \alpha^{12} \end{pmatrix}$$

Because the roots of $\sigma(x)$ are the inverses of the error positions, we must find the solution to this matrix equation.
Finding a solution for the vector is equivalent to seeing if the syndrome matrix
is invertible, which in turn is equivalent to the matrix having a non-zero determinant[^21].
Let us calculate this determinant: expanding along the first row and carrying out the arithmetic in GF(16) with the Power/Vector table gives a determinant of $1 \neq 0$.
Because the determinant of the matrix is non-zero, the matrix equation
must have a solution for the vector. Through a multitude of methods, we can solve the matrix equation above for the vector. Any such valid calculation (involving inverses and matrix multiplication[^22]) reveals the following vector:

$$(\sigma_3, \sigma_2, \sigma_1) = (\alpha^4, \alpha^{11}, \alpha^0)$$
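As a check on this step, here is a sketch that solves the key equation by Gauss–Jordan elimination over GF(16); `gf_mul` implements multiplication modulo $x^4 + x + 1$, `gf_inv` finds inverses by brute force, and all specific values are my recomputation:

```python
# Solve the 3x3 key equation over GF(16). Field elements are in base 10;
# 12 = alpha^6, 6 = alpha^5, 15 = alpha^12 in the Power/Vector table.
def gf_mul(a, b):
    prod = 0
    for i in range(4):
        if (b >> i) & 1:
            prod ^= a << i
    for deg in range(6, 3, -1):          # reduce modulo x^4 + x + 1
        if (prod >> deg) & 1:
            prod ^= 0b10011 << (deg - 4)
    return prod

def gf_inv(a):
    return next(b for b in range(1, 16) if gf_mul(a, b) == 1)

S = [1, 1, 12, 1, 6, 15]                 # S1..S6 from the syndrome computation
M = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
rhs = [S[3], S[4], S[5]]

for col in range(3):                     # Gauss-Jordan; pivots happen to be nonzero
    inv = gf_inv(M[col][col])
    M[col] = [gf_mul(inv, v) for v in M[col]]
    rhs[col] = gf_mul(inv, rhs[col])
    for row in range(3):
        if row != col and M[row][col]:
            f = M[row][col]
            M[row] = [a ^ gf_mul(f, b) for a, b in zip(M[row], M[col])]
            rhs[row] ^= gf_mul(f, rhs[col])

print(rhs)   # [3, 14, 1] = [sigma3, sigma2, sigma1] = [alpha^4, alpha^11, alpha^0]
```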
It follows by our definition of the error locator polynomial that:

$$\sigma(x) = 1 + x + \alpha^{11} x^2 + \alpha^4 x^3$$
This polynomial also factors in terms of our primitive root $\alpha$[^23] as such:

$$\sigma(x) = (1 + \alpha x)(1 + \alpha^6 x)(1 + \alpha^{12} x)$$

This factorization allows us to see that the roots of $\sigma(x)$ are $\alpha^{-1}$, $\alpha^{-6}$, and $\alpha^{-12}$.
The inverses of these roots ($\alpha^1$, $\alpha^6$, and $\alpha^{12}$) in fact correspond to the error positions $1$, $6$, and $12$, i.e. the terms $x$, $x^6$, and $x^{12}$.
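A brute-force root search (a miniature Chien search) confirms these positions, reusing `EXP` and `gf_mul` from the two sketches above:

```python
# Evaluate sigma(X) at every nonzero element X = alpha^i; the roots are the
# inverses of the error locators, so a root at alpha^i marks position -i mod 15.
sigma = [1, 1, 14, 3]                    # 1 + x + alpha^11 x^2 + alpha^4 x^3
positions = []
for i in range(15):
    X, val = EXP[i], 0
    for c in reversed(sigma):            # Horner evaluation of sigma at X
        val = gf_mul(val, X) ^ c
    if val == 0:
        positions.append((-i) % 15)
print(sorted(positions))                 # [1, 6, 12]
```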
These terms are identical to the terms of the error polynomial $e(x) = x^{12} + x^6 + x$. To the receiver, who has just calculated this error polynomial for him/herself, the corresponding bits (12th, 6th, and 1st) are flipped[^24] to recover the original message. The final step is completed by adding the two vectors/polynomials[^25]: $r(x) + e(x) = c(x)$.
### Summarizing our results
Now, the original message is the first five bits, $10110$. This message has been recovered, without error, due to this implementation of a BCH code. The information rate for this BCH code is $\frac{k}{n} = \frac{5}{15} = \frac{1}{3}$.
The decoding and error correction segment of this algorithm is essentially a way to find the BCH code word that has minimum distance to the received vector $r$. This code word (in the set of BCH code words) must then be the intended code word sent before an error acted upon it.
## Conclusion
Through our example of a BCH code, we required that calculations take place in GF(16) and that codewords be expressed with binary coefficients in GF(2). Reed–Solomon codes drop this last condition and allow the terms of our message to be expressed in GF(16). As such, Reed–Solomon codes are considered a special subset of BCH codes. Such Reed–Solomon codes are used in most CD players, QR-code readers, and even in communication between NASA and its Voyager deep-space probes.
For example, our example of a 5-bit message ($10110$) is no longer constrained to 0s and 1s: now, each of its symbols can be any element of GF(16). Conveniently, data in computers is commonly represented in hex code (base 16), so we can send a string of data carrying four times more information per symbol!
In this paper, we have discussed Coding Theory, finite/Galois fields (and their various properties), polynomial arithmetic in these fields, and a complete example of BCH codes.
## Appendix
### Power/Vector table

For the primitive polynomial $p(x) = x^4 + x + 1$ used in the BCH sections:

| Power | Polynomial | Vector | Base 10 |
| --- | --- | --- | --- |
| $\alpha^0$ | $1$ | $0001$ | 1 |
| $\alpha^1$ | $x$ | $0010$ | 2 |
| $\alpha^2$ | $x^2$ | $0100$ | 4 |
| $\alpha^3$ | $x^3$ | $1000$ | 8 |
| $\alpha^4$ | $x + 1$ | $0011$ | 3 |
| $\alpha^5$ | $x^2 + x$ | $0110$ | 6 |
| $\alpha^6$ | $x^3 + x^2$ | $1100$ | 12 |
| $\alpha^7$ | $x^3 + x + 1$ | $1011$ | 11 |
| $\alpha^8$ | $x^2 + 1$ | $0101$ | 5 |
| $\alpha^9$ | $x^3 + x$ | $1010$ | 10 |
| $\alpha^{10}$ | $x^2 + x + 1$ | $0111$ | 7 |
| $\alpha^{11}$ | $x^3 + x^2 + x$ | $1110$ | 14 |
| $\alpha^{12}$ | $x^3 + x^2 + x + 1$ | $1111$ | 15 |
| $\alpha^{13}$ | $x^3 + x^2 + 1$ | $1101$ | 13 |
| $\alpha^{14}$ | $x^3 + 1$ | $1001$ | 9 |
## References
[^1]: Other examples of noisy channels include reading data from a hard drive or reading a magnetic tape.
[^2]: Here, text is converted to Unicode numbers and finally into binary.
[^3]: In this scheme, we do not assume that the parity bit was immune from error. If only the parity bit is flipped, this scheme would still reveal that an error occurred in transmission.
[^4]: The greater $R$ is, the more information is transmitted and the more efficient the detection or correction scheme is.
[^5]: The encoder is both surjective and injective, i.e. onto and one-to-one.
[^6]: i.e. the distance between them.
[^7]: i.e. addition, subtraction, multiplication, and division (inverses).
[^8]: The polynomial cannot be further factored.
[^9]: The Cancellation Law states that if $ab = ac$ and $a \neq 0$, then $b = c$. Integer math fails in mod 16, for example, because $4 \cdot 4 \equiv 0 \pmod{16}$ and, even worse, $4 \cdot 4 \equiv 4 \cdot 12 \pmod{16}$ even though $4 \neq 12$.
[^10]: i.e. they do not factor or "split".
[^11]: Note that these integer labels are not the same as the integers of ordinary arithmetic that we are used to.
[^12]: This set adheres to a designed minimum Hamming distance to ensure a required measure of "separated-ness" between the code words in the set. This designed minimum distance will allow a decoder to determine what a transmitted message is trying to represent.
[^13]: i.e. an irreducible factor of $x^{15} + 1$.
[^14]: Note that this table is based on a different polynomial than the table shown earlier. A similar table can easily be constructed, however, to verify the math in the following sections.
[^15]: Let the designed minimum distance be $d$ and let the number of errors we wish to correct with BCH be $t$; then $d \geq 2t + 1$. In our example, we wish to correct three errors, so our $d = 7$.
[^16]: A minimal polynomial here is a monic (leading coefficient of 1) polynomial of smallest degree which has coefficients in GF(2) and the given power of $\alpha$ as a root.
[^17]: Note that adding polynomials with binary coefficients is identical to the XOR operation in computers!
[^18]: This could be extended to any field GF($2^m$) for a natural number $m$.
[^19]: Note that $\alpha$ is a primitive element of GF(16), so each error position $X_i = \alpha^{j_i}$ is itself a polynomial in $x$ under the Power/Vector correspondence.
[^20]: We use the Power/Vector table found in the Appendix.
[^21]: By the Invertible Matrix Theorem from Linear Algebra.
[^22]: If you wish to try row-reduction yourself, note that since $\alpha$ is a primitive element in GF(16), we know that $\alpha^{15} = 1$. For example, to find the inverse of $\alpha^4$: it follows that $\alpha^4 \cdot \alpha^{11} = \alpha^{15} = 1$, so $(\alpha^4)^{-1} = \alpha^{11}$.
[^23]: Again, we use the Power/Vector table found in the Appendix.
[^24]: This process involves simple binary addition.
[^25]: Adding two terms with the same degree and binary coefficients cancels them out.