Mark Stamp^{*} Clyde F. Martin^{†}
Department of Mathematics
Texas Tech University
Lubbock, TX 79409
Certain applications require pseudo-random sequences which are unpredictable in the sense that recovering more of the sequence from a short segment must be computationally infeasible. It is shown that linear complexity is useful in determining such sequences. A generalized linear complexity which has application to the security of stream ciphers is proposed and an efficient algorithm is given for the case where the sequence is binary with period 2^{n}. This new algorithm generalizes an algorithm due to Games and Chan.
Keywords: linear complexity, k-error linear complexity, algorithm
*Supported in part by NSA Grant MDA904-90-H-4009
Current Address: Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609
†Supported in part by NASA grant NAG2-89, NSF Grant DMS 8905334 and NSA Grant MDA904-90-H-4009
In a stream cipher, a binary message is encrypted by adding (element-wise modulo 2) a pseudo-random sequence of bits. The resulting bit string is presumably difficult to decipher and hence may be transmitted over insecure lines. By simply reversing the process, the receiving party recovers the original message. Stream ciphers have received considerable attention in the literature---an excellent source is the book by Rueppel [9].
It is assumed that a persistent eavesdropper will obtain some segment of the text and thereby obtain a segment of the pseudo-random sequence. For a stream cipher to be secure against this type of attack, the pseudo-random sequence must be unpredictable, i.e., it must be computationally infeasible to recover more of the sequence from a short, captured segment.
There are at least two approaches to this concept of unpredictability. Authors such as Shamir [11] and Blum and Micali [1] insist that determining the next element of the sequence based on the previous elements be provably as difficult as inverting a one-way function, while Groth [5], Key [6], and Rueppel [9] emphasize linear complexity as the primary measure of unpredictability. Our approach follows this second line of attack. Below we use the term cryptographically strong as a synonym for unpredictable.
2 Linear feedback shift registersThe pseudo-random sequence generator employed in a stream cipher must be capable of rapidly producing bits and it must be analyzable. Linear feedback shift registers (LFSR's) satisfy these requirements.
An LFSR consists of n storage units, or stages, and a linear feedback function. An LFSR is controlled by a clock and at each clock pulse the element in stage i is shifted into stage i - 1, with the element in stage 0 taken as the output. The element inserted into stage n - 1 is calculated from the linear feedback function.
It is easy to construct a 4-stage LFSR which realizes the linear recurrence
s_{i-1} + s_{i-4} (mod 2),
for all i ≥ 4.
Taking s_{0} = 0, s_{1} = 1,
s_{2} = 0, and s_{3} = 1
as initial values, this recurrence generates the periodic sequence
The output from any LFSR is eventually periodic in the sense that there exists a k such that (s_{k}, s_{k+1}, s_{k+2}, ...) is periodic. The maximum possible period for an n-stage LFSR is 2^{n} - 1 and any sequence which attains this upper bound is referred to as an m-sequence (or pseudo-noise sequence). It is well-known that the characteristic polynomial of an m-sequence is necessarily primitive [8].
Golomb [4] proposes three "randomness postulates" and suggests that sequences which satisfy these postulates are statistically random. Rueppel [9] has shown that Colomb's randomness postulates almost exclusively describe m-sequences. These properties make m-sequences useful in many applications where pseudo-random bits are required, but in the next section we show that m-sequences are cryptographically weak. It follows that intuitive concepts of randomness do not insure cryptographic strength. (However, certain nonlinear combinations of m-sequences do have desirable cryptographic properties; see [2, 5, 6, 9].)
3 Linear complexityLinear complexity is useful in the search for cryptographically strong pseudo-random sequences.
Definition 3.1 The linear complexity of a sequence is the minimum number of stages of an LFSR that generates the sequence.
We employ the notation s for a finite sequence of bits and (s) for the periodic sequence obtained by appending copies of s. The linear complexity of a periodic sequence (s) is just the "length" of the shortest linear recurrence which generates (s), i.e., the degree of the corresponding characteristic polynomial. Also, the linear complexity of a sequence (s) with period n is bounded above by n since the recurrence s_{i} = s_{i-n} (which has characteristic polynomial x_{n} + 1 ) will generate (s).
For example, consider the sequence in (2.1). This sequence has linear complexity four, since (s) is periodic and the degree of its characteristic polynomial is four.
There is an elegant and efficient method for determining the linear complexity of any finite or periodic sequence. This procedure is the well-known Berlekamp-Massey algorithm [7]. The Berlekamp-Massey algorithm requires only 2n consecutive bits to completely determine a sequence with linear complexity n. In the special case where (s) is binary with period 2^{n}, the lesser-known Chan-Games algorithm [3] is more efficient. (The paper [12] contains an alternative derivation of the Chan-Games algorithm which yields a slightly more general result.)
It follows that anyone who obtains 2n consecutive bits of (s) (where n is the linear complexity) can efficiently recover the entire pseudo-random sequence, and hence---in the case of a stream cipher---the entire "secret" message can be recovered. Therefore, a cryptographically strong sequence must have a high linear complexity.
However, a high linear complexity does not insure that a sequence is cryptographically strong. For example, the sequence
As we mentioned in the previous section, m-sequences are generally regarded as statistically random. However, an m-sequence would be a poor choice for use in a stream cipher. An m-sequence with a characteristic polynomial of degree n produces an output sequence with period 2^{n} - 1, but only 2n out of the 2^{n} - 1 bits are required in order to completely determine the sequence.
4 The k-error linear complexityIn the previous section it was shown that a high linear complexity is a necessary---but not sufficient---condition for a sequence to be cryptographically strong. Additional tests are therefore needed in order to determine "stronger" cryptographic sequences. Rueppel [9] has shown that the linear complexity profile is useful in this regard.
The linear complexity profile of (s) is obtained by plotting the linear complexity of
s_{0},s_{1},...,s_{n-1}
against n, for n=1,2,.... For example, in Figure 1 the linear
complexity profile of
Rueppel [9] states that a cryptographically strong sequence should have a linear complexity near the maximum possible, and its linear complexity profile should follow the n/2 line "closely but irregularly." The sequence (4.3) appears to satisfy these requirements.
We now propose a new measure of the complexity of periodic sequences which has application to the problem of identifying cryptographically strong pseudo-random sequences.
Definition 4.1 The k-error linear complexity of the periodic sequence
The k-error linear complexity can be interpreted as a worst-case measure of the linear complexity when k or fewer errors occur---and hence the terminology k-error linear complexity.
Observe that the 0-error linear complexity of any sequence is just the usual linear complexity. Also, the 1-error linear complexity of the sequence (3.2) is zero, since changing the final bit from 1 to 0 will produce the all-zero sequence, which has linear complexity zero (by definition).
The k-error linear complexity profile---i.e., the k-error linear complexity for k=0,1,2.... plotted against k---of the sequence (4.3) is shown in Figure 2. Above, we observed that (4.3) apparently passes the two tests given by Rueppel [9]. However, its k-error linear complexity profile is not reassuring, since an LFSR with just 7 stages correctly produces 30 out of every 32 bits of (s). Obviously, if we can recover such a high proportion of the sequence, a stream cipher employing (4.3) would be compromised. This example suggests that the k-error linear complexity may provide more information about the cryptographic strength of a pseudo-random sequence than other previously suggested methods.
The k-error linear complexity of any sequence could be found by repeated application of the Berlekamp-Massey algorithm [7] or, if appropriate, the Chan-Games
algorithm [3]. But, to find the k-error linear complexity of (s) = (s_{0},s_{1},...,s_{n-1}) would require
We have found an efficient new algorithm for computing the k-error linear complexity in the case where (s) is a binary sequence with period 2^{n}. This new algorithm---which appears in Figure 3---reduces to the Chan-Games algorithm [3] in the case k = 0. Next, we give a brief discussion of the Chan-Games algorithm before proving the validity of our new algorithm.
Given a binary sequence (s) with period 2^{n}, let
a = s; c = 0; l = 2^{n}; cost[i] = 1, for i=0,1,...,l-1; while l > 1 do l = l/2; L = a_{0}a_{1}...a_{l-1}; R = a_{l}a_{l+1}...a_{2l-1}; b = L + R; T = sum(b_{i} ⋅ min(cost[i],cost[i+l]), for i=0 to l-1); if T ≤ k then k = k - T; for i = 0,1,...,l-1 do if b_{i} = 1 then if cost[i] ≤ cost[i+l] then L_{i} = R_{i}; cost[i] = cost[i+l] - cost[i]; else cost[i] = cost[i] - cost[i+l]; end if else cost[i] = cost[i] + cost[i+l]; end if next i a = L; else c = c + l; a = b; cost[i] = min(cost[i],cost[i+l]), for i=0,1,...,l-1; end if end while if a_{0} = 1 and cost[0] > k then c = c + 1; end if
We can now verify the k-error linear complexity algorithm in Figure 3.
Theorem 5.1 Let (s) be a binary sequence with period 2^{n} and let 0 ≤ k ≤ 2^{n}. Then the algorithm in Figure 3 computes c, the k-error linear complexity of (s), in n steps.
Proof. As previously mentioned, when k = 0 the algorithm reduces to the Chan-Games algorithm; see [3] or [12] for a proof. If k > 0 then we are allowed to make k (or fewer) bit changes in s in order to reduce the complexity c as much as possible. But, as with the Chan-Games algorithm, c only increases when b ≠ 0. Notice that if we can force b = 0 at the jth step we should do so, since this prevents 2^{n-j} from being added to c and the total of all remaining possible additions is only 2^{n-j}. This is the basic logic of the algorithm-apply the Chan-Games method, but if b ≠ 0 and we can force b = 0 we do so.
The vector of cost[i]'s is intended to measure the "cost"---in terms of the number of bit changes required in the original sequence s---of changing the current element a_{i} without disturbing the results of any previous steps. Now suppose that at some step j, the value of cost[i] correctly gives the cost of changing a_{i}. If b_{i} = 1, changing either L_{i} or R_{i} will force b_{i} = 0 and hence the cost of changing b_{i} is the minimum of the costs of changing L_{i} or R_{i}. Since the cost of changing L_{i} is cost[i] and the cost associated with R_{i} is cost[i + l] it follows that the variable T in Figure 3 correctly gives the total cost of forcing b = 0.
It remains to show that the cost[i]'s are correct at the end of step j. Consider the case where T ≤ k, b_{i} = 1, and cost[i] ≤ cost[i + l]. In this case we change L_{i} since it has the lower cost, thus forcing b_{i} = 0. Notice that at the end of this step we will let a = L (see the algorithm). Now if we change a_{i} in step j + 1 we have effectively restored L_{i} (in step j) to its previous value. In order to maintain b_{i} = 0 in step j, we must then change R_{i} (in step j) which has a net cost of cost[i + l] - cost[i], and hence cost[i] is computed correctly in this case. Other cases must be considered but they are similar; see [131 for details.
Step 1) | l = 16, T = 2, k = 2 | |
L = 0_{1}0_{1}0_{1}1_{1}1_{1}0_{1}1_{1}0_{1}1_{1}0_{1}0_{1}1_{1}1_{1}0_{1}1_{1}0_{1} | ||
R = 1_{1}0_{1}0_{1}0_{1}1_{1}0_{1}1_{1}0_{1}1_{1}0_{1}0_{1}1_{1}1_{1}0_{1}1_{1}0_{1} | ||
b = 1_{1}0_{1}0_{1}1_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1}0_{1} | c = 0 | |
a = L | ||
Step 2) | l = 8, T = 0, k = 0 | |
L = 1_{0}0_{2}0_{2}0_{0}1_{2}0_{2}1_{2}0_{2} | ||
R = 1_{2}0_{2}0_{2}1_{2}1_{2}0_{2}1_{2}0_{2} | ||
b = 0_{0}0_{2}0_{2}1_{0}0_{2}0_{2}0_{2}0_{2} | c = 0 | |
a= L | ||
Step 3) | l = 4, T = 6, k = 0 | |
L = 1_{2}0_{4}0_{4}1_{2} | ||
R = 1_{4}0_{4}1_{4}0_{4} | ||
b = 0_{2}0_{4}1_{4}1_{2} | c = 4 | |
a = b | ||
Step 4) | l = 2, T = 6, k = 0 | |
L = 0_{2}0_{4} | ||
R = 1_{4}1_{2} | ||
b = 1_{2}1_{4} | c = 6 | |
a = b | ||
Step 5) | l = 1, T = 0, k = 0 | |
L = 1_{2} | ||
R = 1_{2} | ||
b = 0_{2} | c = 6 | |
a = L | ||
a_{0} = 1, cost[0] = 4, k = 0 | c = 7 |
In Figure 4 our algorithm is applied to compute the 2-error linear complexity of
In this paper we discussed a new measure of the complexity of periodic sequencesthe k-error linear complexity. An efficient algorithm for computing the k-error linear complexity of binary sequences with period 2^{n} was given. The example in Figures 1 and 2 illustrates the potential usefulness of the k-error linear complexity in the analysis of stream ciphers. The problem of efficiently computing the k-error linear complexity of sequences with period other than 2^{n} remains an open problem.
AcknowledgmentThe authors thank Dr. A. H. Chan and the anonymous referees for several helpful comments and suggestions.
References
[1] M. Blum and S. Micali, "How to generate cryptographically strong pseudorandom sequences," SIAM Journal on Computing, Vol. 13, No. 4, pp. 850--864, November 1984.
[2] E. Dawson, "Linear feedback shift registers and stream ciphers," in Number Theory and Cryptography, London Mathematical Society Lecture Note Series 154, J. H. Loxton, ed., Cambridge University Press, 1990, pp. 106--119.
[3] R. A. Games and A. H. Chan, "A fast algorithm for determining the complexity of a pseudo-random sequence with period 2^{n}," IEEE Transactions on Information Theory, Vol. IT-29, No. 1, pp. 144--146, January 1983.
[4] S. W. Golomb, Shift Register Sequences, Aegean Park Press, Laguna Hills, CA, 1987.
[5] E. J. Groth, "Generation of binary sequences with controllable complexity," IEEE Transactions on Information Theory, Vol. IT-17, pp. 288--296, May 1971.
[6] E. L. Key, "An analysis of the structure and complexity of nonlinear binary sequence generators," IEEE Transactions on Information Theory, Vol. IT-22, pp. 732--736, November 1976.
[7] J. L. Massey, "Shift-register synthesis and BCH decoding," IEEE Transactions on Information Theory, Vol. IT-15, No. 1, pp. 122--127, January 1969.
[8] R. J. McEliece, Finite Fields for Computer Scientists and Engineers, Kluwer Academic Publishers, Norwell, MA 1987.
[9] R. A. Rueppel, Analysis and Design of Stream Ciphers, Springer-Verlag, Berlin, 1986.
[10] R. S. Safavi-Naini and J. R. Seberry, "Pseudo-random sequence generators using structured noise," in Number Theory and Cryptography, London Mathematical Society Lecture Note Series 154, J. H. Loxton, ed., Cambridge University Press, 1990, pp. 129--136.
[11] A. Shamir, "On the generation of cryptographically strong pseudo-random sequences," in 8th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science 62, Springer-Verlag, New York, 1981, pp. 544--550.
[12] M. Stamp and C. F. Martin, "A new derivation of the Chan-Games algorithm," submitted.
[13] M. Stamp, "A generalized linear complexity," Ph.D. Dissertation, Department of Mathematics, Texas Tech University, May 1992.