k-error linear complexity

An Algorithm for the k-Error Linear Complexity of Binary Sequences with Period 2ⁿ

Mark Stamp^* Clyde F. Martin^†

Department of Mathematics
Texas Tech University
Lubbock, TX 79409

Abstract

Certain applications require pseudo-random sequences which are unpredictable in the sense that recovering more of the sequence from a short segment must be computationally infeasible. It is shown that linear complexity is useful in determining such sequences. A generalized linear complexity which has application to the security of stream ciphers is proposed and an efficient algorithm is given for the case where the sequence is binary with period 2ⁿ. This new algorithm generalizes an algorithm due to Games and Chan.

Keywords: linear complexity, k-error linear complexity, algorithm

*Supported in part by NSA Grant MDA904-90-H-4009
Current Address: Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609

†Supported in part by NASA grant NAG2-89, NSF Grant DMS 8905334 and NSA Grant MDA904-90-H-4009

1 Introduction

In a stream cipher, a binary message is encrypted by adding (element-wise modulo 2) a pseudo-random sequence of bits. The resulting bit string is presumably difficult to decipher and hence may be transmitted over insecure lines. By simply reversing the process, the receiving party recovers the original message. Stream ciphers have received considerable attention in the literature---an excellent source is the book by Rueppel [9].

It is assumed that a persistent eavesdropper will obtain some segment of the text and thereby obtain a segment of the pseudo-random sequence. For a stream cipher to be secure against this type of attack, the pseudo-random sequence must be unpredictable, i.e., it must be computationally infeasible to recover more of the sequence from a short, captured segment.

There are at least two approaches to this concept of unpredictability. Authors such as Shamir [11] and Blum and Micali [1] insist that determining the next element of the sequence based on the previous elements be provably as difficult as inverting a one-way function, while Groth [5], Key [6], and Rueppel [9] emphasize linear complexity as the primary measure of unpredictability. Our approach follows this second line of attack. Below we use the term cryptographically strong as a synonym for unpredictable.

2 Linear feedback shift registers

The pseudo-random sequence generator employed in a stream cipher must be capable of rapidly producing bits and it must be analyzable. Linear feedback shift registers (LFSR's) satisfy these requirements.

An LFSR consists of n storage units, or stages, and a linear feedback function. An LFSR is controlled by a clock and at each clock pulse the element in stage i is shifted into stage i - 1, with the element in stage 0 taken as the output. The element inserted into stage n - 1 is calculated from the linear feedback function.

It is easy to construct a 4-stage LFSR which realizes the linear recurrence s_i-1 + s_i-4 (mod 2), for all i ≥ 4. Taking s₀ = 0, s₁ = 1, s₂ = 0, and s₃ = 1 as initial values, this recurrence generates the periodic sequence

(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1) (2.1)
and any other choice of initial values---with the obvious exception of the zero vector---produces a cyclic shift of (2.1). The characteristic polynomial [8] of the sequence (2.1) is c(x) = x⁴ + x + 1.

The output from any LFSR is eventually periodic in the sense that there exists a k such that (s_k, s_k+1, s_k+2, ...) is periodic. The maximum possible period for an n-stage LFSR is 2ⁿ - 1 and any sequence which attains this upper bound is referred to as an m-sequence (or pseudo-noise sequence). It is well-known that the characteristic polynomial of an m-sequence is necessarily primitive [8].

Golomb [4] proposes three "randomness postulates" and suggests that sequences which satisfy these postulates are statistically random. Rueppel [9] has shown that Colomb's randomness postulates almost exclusively describe m-sequences. These properties make m-sequences useful in many applications where pseudo-random bits are required, but in the next section we show that m-sequences are cryptographically weak. It follows that intuitive concepts of randomness do not insure cryptographic strength. (However, certain nonlinear combinations of m-sequences do have desirable cryptographic properties; see [2, 5, 6, 9].)

3 Linear complexity

Linear complexity is useful in the search for cryptographically strong pseudo-random sequences.

Definition 3.1 The linear complexity of a sequence is the minimum number of stages of an LFSR that generates the sequence.

We employ the notation s for a finite sequence of bits and (s) for the periodic sequence obtained by appending copies of s. The linear complexity of a periodic sequence (s) is just the "length" of the shortest linear recurrence which generates (s), i.e., the degree of the corresponding characteristic polynomial. Also, the linear complexity of a sequence (s) with period n is bounded above by n since the recurrence s_i = s_i-n (which has characteristic polynomial x_n + 1 ) will generate (s).

For example, consider the sequence in (2.1). This sequence has linear complexity four, since (s) is periodic and the degree of its characteristic polynomial is four.

There is an elegant and efficient method for determining the linear complexity of any finite or periodic sequence. This procedure is the well-known Berlekamp-Massey algorithm [7]. The Berlekamp-Massey algorithm requires only 2n consecutive bits to completely determine a sequence with linear complexity n. In the special case where (s) is binary with period 2ⁿ, the lesser-known Chan-Games algorithm [3] is more efficient. (The paper [12] contains an alternative derivation of the Chan-Games algorithm which yields a slightly more general result.)

It follows that anyone who obtains 2n consecutive bits of (s) (where n is the linear complexity) can efficiently recover the entire pseudo-random sequence, and hence---in the case of a stream cipher---the entire "secret" message can be recovered. Therefore, a cryptographically strong sequence must have a high linear complexity.

However, a high linear complexity does not insure that a sequence is cryptographically strong. For example, the sequence

(3.2)
has linear complexity n (the maximum possible), but it is cryptographically weak---the sequence (3.2) would leave the original message virtually unaltered. A more subtle example is given by Safavi-Naini and Seberry [10].

As we mentioned in the previous section, m-sequences are generally regarded as statistically random. However, an m-sequence would be a poor choice for use in a stream cipher. An m-sequence with a characteristic polynomial of degree n produces an output sequence with period 2ⁿ - 1, but only 2n out of the 2ⁿ - 1 bits are required in order to completely determine the sequence.

4 The k-error linear complexity

In the previous section it was shown that a high linear complexity is a necessary---but not sufficient---condition for a sequence to be cryptographically strong. Additional tests are therefore needed in order to determine "stronger" cryptographic sequences. Rueppel [9] has shown that the linear complexity profile is useful in this regard.

The linear complexity profile of (s) is obtained by plotting the linear complexity of s₀,s₁,...,s_n-1 against n, for n=1,2,.... For example, in Figure 1 the linear complexity profile of

(s) = (00011010100110101000101010011010) (4.3)
is given.

Rueppel [9] states that a cryptographically strong sequence should have a linear complexity near the maximum possible, and its linear complexity profile should follow the n/2 line "closely but irregularly." The sequence (4.3) appears to satisfy these requirements.

We now propose a new measure of the complexity of periodic sequences which has application to the problem of identifying cryptographically strong pseudo-random sequences.

Definition 4.1 The k-error linear complexity of the periodic sequence

(s) = (s₀,s₁,...,s_n-1),
denoted c_k(s), is the smallest linear complexity that can be obtained when any k or fewer of the s_i's are altered.

The k-error linear complexity can be interpreted as a worst-case measure of the linear complexity when k or fewer errors occur---and hence the terminology k-error linear complexity.

Observe that the 0-error linear complexity of any sequence is just the usual linear complexity. Also, the 1-error linear complexity of the sequence (3.2) is zero, since changing the final bit from 1 to 0 will produce the all-zero sequence, which has linear complexity zero (by definition).

The k-error linear complexity profile---i.e., the k-error linear complexity for k=0,1,2.... plotted against k---of the sequence (4.3) is shown in Figure 2. Above, we observed that (4.3) apparently passes the two tests given by Rueppel [9]. However, its k-error linear complexity profile is not reassuring, since an LFSR with just 7 stages correctly produces 30 out of every 32 bits of (s). Obviously, if we can recover such a high proportion of the sequence, a stream cipher employing (4.3) would be compromised. This example suggests that the k-error linear complexity may provide more information about the cryptographic strength of a pseudo-random sequence than other previously suggested methods.

Figure 1: A linear complexity profile

5 A k-error linear complexity algorithm

The k-error linear complexity of any sequence could be found by repeated application of the Berlekamp-Massey algorithm [7] or, if appropriate, the Chan-Games algorithm [3]. But, to find the k-error linear complexity of (s) = (s₀,s₁,...,s_n-1) would require

applications of the basic algorithm which is prohibitively high, even for moderate n and k.

Figure 2: A k-error linear complexity profile

We have found an efficient new algorithm for computing the k-error linear complexity in the case where (s) is a binary sequence with period 2ⁿ. This new algorithm---which appears in Figure 3---reduces to the Chan-Games algorithm [3] in the case k = 0. Next, we give a brief discussion of the Chan-Games algorithm before proving the validity of our new algorithm.

Given a binary sequence (s) with period 2ⁿ, let

L = (s₀,s₁,...,s_2^n-1-1) and R = (s_2^n-1,s_2^n-1+1,...,s_2ⁿ-1)
and define b = L + R, where the addition is elementwise, modulo 2. If b = 0, then L = R and any recurrence that generates (L) also generates (s), and hence c₀(s) = c₀(L). On the other hand, if b ≠ 0 it is shown in [12] that c₀(s) = 2^n-1 + c₀(b). The Chan-Games algorithm [3] is obtained by applying these results recursively, i.e., if b ≠ 0, we increase the complexity by 2^n-1 and apply the procedure to b, but if b = 0, the complexity does not increase and we apply the procedure to L. Note that the complexity only increases at a step where b ≠ 0.

a = s;  c = 0;  l = 2ⁿ;
cost[i] = 1, for i=0,1,...,l-1;
while l > 1 do
     l = l/2;  L = a₀a₁...a_l-1;  R = a_la_l+1...a_2l-1;
     b = L + R;  T = sum(b_i ⋅ min(cost[i],cost[i+l]), for i=0 to l-1);
     if T ≤ k then
          k = k - T;
          for i = 0,1,...,l-1 do
               if b_i = 1 then
                    if cost[i] ≤ cost[i+l] then
                         L_i = R_i;  cost[i] = cost[i+l] - cost[i];
                    else
                         cost[i] = cost[i] - cost[i+l];
                    end if
               else
                    cost[i] = cost[i] + cost[i+l];
               end if
          next i 
          a = L;
     else
          c = c + l;
          a = b;
          cost[i] = min(cost[i],cost[i+l]), for i=0,1,...,l-1; 
     end if 
end while
if a₀ = 1 and cost[0] > k then
     c = c + 1;
end if

Figure 3: A k-error linear complexity algorithm

We can now verify the k-error linear complexity algorithm in Figure 3.

Theorem 5.1 Let (s) be a binary sequence with period 2ⁿ and let 0 ≤ k ≤ 2ⁿ. Then the algorithm in Figure 3 computes c, the k-error linear complexity of (s), in n steps.

Proof. As previously mentioned, when k = 0 the algorithm reduces to the Chan-Games algorithm; see [3] or [12] for a proof. If k > 0 then we are allowed to make k (or fewer) bit changes in s in order to reduce the complexity c as much as possible. But, as with the Chan-Games algorithm, c only increases when b ≠ 0. Notice that if we can force b = 0 at the jth step we should do so, since this prevents 2^n-j from being added to c and the total of all remaining possible additions is only 2^n-j. This is the basic logic of the algorithm-apply the Chan-Games method, but if b ≠ 0 and we can force b = 0 we do so.

The vector of cost[i]'s is intended to measure the "cost"---in terms of the number of bit changes required in the original sequence s---of changing the current element a_i without disturbing the results of any previous steps. Now suppose that at some step j, the value of cost[i] correctly gives the cost of changing a_i. If b_i = 1, changing either L_i or R_i will force b_i = 0 and hence the cost of changing b_i is the minimum of the costs of changing L_i or R_i. Since the cost of changing L_i is cost[i] and the cost associated with R_i is cost[i + l] it follows that the variable T in Figure 3 correctly gives the total cost of forcing b = 0.

It remains to show that the cost[i]'s are correct at the end of step j. Consider the case where T ≤ k, b_i = 1, and cost[i] ≤ cost[i + l]. In this case we change L_i since it has the lower cost, thus forcing b_i = 0. Notice that at the end of this step we will let a = L (see the algorithm). Now if we change a_i in step j + 1 we have effectively restored L_i (in step j) to its previous value. In order to maintain b_i = 0 in step j, we must then change R_i (in step j) which has a net cost of cost[i + l] - cost[i], and hence cost[i] is computed correctly in this case. Other cases must be considered but they are similar; see [131 for details.

(s) = (00011010100110101000101010011010), k = 2

Step 1)	l = 16, T = 2, k = 2
	L = 0₁0₁0₁1₁1₁0₁1₁0₁1₁0₁0₁1₁1₁0₁1₁0₁
	R = 1₁0₁0₁0₁1₁0₁1₁0₁1₁0₁0₁1₁1₁0₁1₁0₁

	b = 1₁0₁0₁1₁0₁0₁0₁0₁0₁0₁0₁0₁0₁0₁0₁0₁	c = 0
		a = L
Step 2)	l = 8, T = 0, k = 0
	L = 1₀0₂0₂0₀1₂0₂1₂0₂
	R = 1₂0₂0₂1₂1₂0₂1₂0₂

	b = 0₀0₂0₂1₀0₂0₂0₂0₂	c = 0
		a= L
Step 3)	l = 4, T = 6, k = 0
	L = 1₂0₄0₄1₂
	R = 1₄0₄1₄0₄

	b = 0₂0₄1₄1₂	c = 4
		a = b
Step 4)	l = 2, T = 6, k = 0
	L = 0₂0₄
	R = 1₄1₂

	b = 1₂1₄	c = 6
		a = b
Step 5)	l = 1, T = 0, k = 0
	L = 1₂
	R = 1₂

	b = 0₂	c = 6
		a = L
	a₀ = 1, cost[0] = 4, k = 0	c = 7

Figure 4: The 2-complexity

In Figure 4 our algorithm is applied to compute the 2-error linear complexity of

(s) = (00011010100110101000101010011010).
The subscript on L_i is cost[i], the subscript on R_i is cost[i + l] , and the subscript on b_i is min(cost[i], cost[i+l]). According to the proof of Theorem 5.1, these subscripts give the minimum number of bit changes in the original sequence s required in order to change the subscripted element, without adversely affecting any intermediate results.

6 Conclusion

In this paper we discussed a new measure of the complexity of periodic sequencesthe k-error linear complexity. An efficient algorithm for computing the k-error linear complexity of binary sequences with period 2ⁿ was given. The example in Figures 1 and 2 illustrates the potential usefulness of the k-error linear complexity in the analysis of stream ciphers. The problem of efficiently computing the k-error linear complexity of sequences with period other than 2ⁿ remains an open problem.

Acknowledgment

The authors thank Dr. A. H. Chan and the anonymous referees for several helpful comments and suggestions.

References

[1] M. Blum and S. Micali, "How to generate cryptographically strong pseudorandom sequences," SIAM Journal on Computing, Vol. 13, No. 4, pp. 850--864, November 1984.

[2] E. Dawson, "Linear feedback shift registers and stream ciphers," in Number Theory and Cryptography, London Mathematical Society Lecture Note Series 154, J. H. Loxton, ed., Cambridge University Press, 1990, pp. 106--119.

[3] R. A. Games and A. H. Chan, "A fast algorithm for determining the complexity of a pseudo-random sequence with period 2ⁿ," IEEE Transactions on Information Theory, Vol. IT-29, No. 1, pp. 144--146, January 1983.

[4] S. W. Golomb, Shift Register Sequences, Aegean Park Press, Laguna Hills, CA, 1987.

[5] E. J. Groth, "Generation of binary sequences with controllable complexity," IEEE Transactions on Information Theory, Vol. IT-17, pp. 288--296, May 1971.

[6] E. L. Key, "An analysis of the structure and complexity of nonlinear binary sequence generators," IEEE Transactions on Information Theory, Vol. IT-22, pp. 732--736, November 1976.

[7] J. L. Massey, "Shift-register synthesis and BCH decoding," IEEE Transactions on Information Theory, Vol. IT-15, No. 1, pp. 122--127, January 1969.

[8] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Kluwer Academic Publishers, Norwell, MA 1987.

[9] R. A. Rueppel, Analysis and Design of Stream Ciphers, Springer-Verlag, Berlin, 1986.

[10] R. S. Safavi-Naini and J. R. Seberry, "Pseudo-random sequence generators using structured noise," in Number Theory and Cryptography, London Mathematical Society Lecture Note Series 154, J. H. Loxton, ed., Cambridge University Press, 1990, pp. 129--136.

[11] A. Shamir, "On the generation of cryptographically strong pseudo-random sequences," in 8th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science 62, Springer-Verlag, New York, 1981, pp. 544--550.

[12] M. Stamp and C. F. Martin, "A new derivation of the Chan-Games algorithm," submitted.

[13] M. Stamp, "A generalized linear complexity," Ph.D. Dissertation, Department of Mathematics, Texas Tech University, May 1992.