Revealing Your Secrets through the Fourth Dimension

by Wing H. Wong

Introduction

Suppose Alice wants to make sure no one breaks into her house. So she buys the most secure locks and installs several of them on her door. However, a clever burglar might simply unscrew the hinges, remove the door and walk away with all of Alice’s valuables with minimal effort.

Do you think your computer system is secure because you use RSA? Do you think your system is impenetrable because you use a 1024-bit or 2048-bit key? Do you know attackers may be able to attack your system in a completely unexpected direction, doing sneaky things on the side instead of directly breaking RSA?

RSA is the most widely used public key cryptographic algorithm today. Cliff Cocks invented the algorithm in the early 1970’s [5] and it was independently rediscovered by the team of Rivest, Shamir, and Adleman at MIT in 1978. Today it is widely used to secure electronic data transfers. The strength of RSA comes from the fact that factoring large numbers is difficult. The best-known factoring methods are still very slow. For example, in a recent RSA challenge (August 1999), a 512-bit RSA challenge number was factored using 292 workstations and high-speed computers. The factoring took 35.7 CPU-years to accomplish, equivalent to approximately 80,000 MIPS years. The feat required 3.7 months of calendar time [7]. Because so many people have been trying to find efficient ways to factor large numbers, so far without great success, we can probably assume that RSA is safe from a factoring attack for a typical key length of 1024 bits. RSA can be made more secure against factoring by increasing the key length to 2048 bits or more.

Despite this formidable mathematical strength, research has shown that it is feasible to recover RSA private keys without directly breaking RSA. This type of attack is known as a “timing attack”, in which an attacker observes the running time of a cryptographic algorithm and thereby deduces the secret parameter involved in the operations. While it is generally agreed that RSA is secure from a direct attack, RSA’s vulnerability to timing attacks is not so well known and often overlooked. This paper explains the principles of timing attacks on RSA, summarizes the results of timing attacks implemented by various researchers, and discusses defenses against such attacks.

Timing Attacks on RSA

Kocher [4] was the first to discuss timing attacks. At the RSA Data Security and CRYPTO conferences in 1996, Kocher presented his preliminary result, warned vendors about his attack, and caught the attention of cryptographers including the inventors of the RSA cryptosystem. Timing attacks are a form of “side channel attack”, where an attacker gains information from the implementation of a cryptosystem rather than from any inherent weakness in the mathematical properties of the system. Unintended channels of information arise due to the way an operation is performed or the media used. Side channel attacks can exploit information about timing, power consumption, electromagnetic emanations or even sound to recover secret information about a cryptosystem [8].

Timing attacks exploit the timing variations in cryptographic operations. Because of performance optimizations, computations performed by a cryptographic algorithm often take different amounts of time depending on the input and the value of the secret parameter. If RSA private key operations can be timed reasonably accurately, in some cases statistical analysis can be applied to recover the secret key involved in the computations.

Before discussing timing attacks on RSA, we must first consider the mathematics of the cryptosystem. RSA is a public key cryptosystem that uses a public exponent e for encryption and a private exponent d for decryption. It uses a modulus N, which is a product of two large prime numbers p and q, that is, N = pq. The exponents e and d must be chosen to satisfy the condition ed = 1 mod (p − 1)(q − 1). Then the RSA key pair consists of the public key (N, e) and the private key d. Factoring breaks RSA because if an attacker can factor N into p and q, she can use the public value e to easily find the private value d.

To encrypt a plaintext message M, we compute C =M^e mod N, where C is the encoded message, or ciphertext. To decrypt the ciphertext C, we compute M = C^d mod N, which yields the original message M. The decryption works thanks to a result from number theory known as Euler’s Theorem [2]. Another use of RSA is to generate digital signatures, which serve to verify the source of the message. Signing uses the private key d and is the same mathematical operation as decrypting. The receiver uses the public key e and performs an encryption operation to verify the signature.

The private key operation in RSA is thus the modular exponentiation M = C^d mod N, where N is the RSA modulus, C is the text to decrypt or sign, and d is the private key. The attacker’s goal is to find d. For a timing attack, the attacker needs to have the target system compute C^d mod N for several carefully selected values of C. By precisely measuring the amount of time required and analyzing the timing variations, the attacker can recover the private key d one bit at a time until the entire exponent is known. As Kocher [4] explained, the attack is in essence a signal detection problem. The “signal” consists of the timing variation caused by the target exponent bit, while the “noise” consists of the inaccuracies in the timing measurements and the random fluctuations in timing, particularly over a network. The number of timing samples required is thus determined by the properties of the signal and noise. In any case, the required sample size is proportional to the number of bits in the exponent d [4]. As there are only a limited number of bits in the private exponent d, the attack is computationally practical.

A simple and efficient algorithm for computing C^d mod N is the square and multiply algorithm as shown in Figure 1, where d = d₀d₁…d_n in binary, with d₀ = 1. Typical RSA implementations use the Montgomery algorithm to perform the multiplication and the square operations. With the Montgomery algorithm, multiplications take a constant amount of time, independent of the size of the factors. But if the intermediate result of the multiplication exceeds the modulus N, an additional subtraction, called an extra reduction, is performed [3]. It is this extra reduction step that causes the timing difference for different inputs, and exposes information about the secret key.

 
    x = C
    for j = 1 to n
        x = mod(x², N)
        if d_j == 1 then
           x = mod(xC, N) 
        end if
    next j
    return x

Figure 1: Square and multiply algorithm.

Another typical way to optimize the RSA implementation is to use the Chinese Remainder Theorem (CRT) to perform the exponentiation. With CRT, the function M = C^d mod N is computed by first evaluating M₁ = C^d1 mod N and M₂ = C^d2 mod N, where d₁ and d₂ are precomputed from d. M₁ and M₂ are then combined to yield M [1]. RSA with CRT makes the original attack by Kocher inoperative. Nevertheless, a timing attack can expose one of the factors of N, as illustrated by Brumley and Boneh [1].

A Simple Illustration

Here we give a simple illustration to show how a timing attack works. It is a generalization of the basic idea presented by Dhem et al. in [3].

Let C be a message to decrypt, d be the private key to be recovered, denoted by d₀d₁…d_n where d₀ = 1. The computation to be performed is C^d mod N. Suppose the square and multiply algorithm (Figure 1) is used to compute this modular exponentiation, and the mod(x, N) operation is implemented as in Figure 2.

    mod(x, N)
    if x >= N
       x = x % N
    end if
    return x

Figure 2: Pseudocode for the mod function

Consider the square and multiply algorithm in Figure 1. If d_j = 0 then x = mod(x², N), but if d_j = 1 then two operations, x = mod(x², N) and x = mod(xC, N), occur. In other words, the computation time should be longer when d_j = 1. Also notice in Figure 2 that the modular reduction, “%”, is only executed if the intermediate result of the multiplication is greater than the modulus N. Our attack exploits these two facts and uses two sets of messages, one for which the computation of mod(xC, N) would require a reduction and another for which it would not, to recover any bit d_j.

Assume we can get the system to decrypt the messages of our choice, which is often possible in real systems. We start attacking d₁by choosing two messages Y and Z, where Y³ < N and Z² < N < Z³.

Compare the operations of the square and multiply algorithm (Figure 1) on the two messages Y and Z when j = 1. If d₁ is really 1, the operation x = mod(x · x², N) will be performed. Since Y³ < N, the “%” operation does not occur for message Y. But since Z² < N < Z³, the “%” operation occurs for message Z. On the other hand, if the actual value of d₁ is 0, the operation x = mod(x · x², N) will not be performed and the computations on Y and Z should take about the same time. In other words, the run time of the algorithm will be longer for Z only if d₁ = 1. We can recover the bit d₁ using this fact.

But how much “longer” is a significant difference in run time? We’ll need to rely on statistics here. Instead of using a single message Y, we choose a set of messages Y_i, for i = 0, 1, …, m-1, with Y_i³ < N and let y_i be the time required to compute Y_i^d mod N. Similarly, we choose a set of messages Z_i, for i = 0, 1, …, m-1, with Z_i² < N < Z_i³ and let z_i be the time required to compute Z_i^d mod N.

Compute the average timing y = (y₀ + y₁ + … + y_m_-1) / m, and z = (z₀ + z₁ + … + z_m_-1) / m. If z > y + ε, where ε can be determined empirically, we conclude that d₁ = 1; otherwise d₁ = 0. Once d₁ is known, we can attack d₂ in an analogous process by choosing Y and Z values that satisfy different criteria. The private key d can be recovered bit by bit in this manner.

The beauty of this attack lies in its simplicity. For the attack to work, we do not need to know the details of the implementation of the algorithm, and we do not need to know how to factor large numbers — all we need to know is that the exponentiation is done using square-and-multiply.

Implemented Timing Attacks

Dhem et al. mounted a similar attack against an earlier version of the CASCADE smart card [3]. A smart card is a kind of hardware security token. The fact that it is in the possession of an attacker makes it relatively easy to measure the running time of cryptographic operations and so smart cards are fairly vulnerable to timing attacks (as well as many other side channel attacks). Dhem’s implementation attacked the square rather than the multiply by observing the extra reduction at the square operation instead of the multiply operation. They reported breaking a 512-bit key in a few minutes with 350,000 timing measurements. A 128-bit key could be broken at a rate of 4 bits/sec using 10,000 samples. In cases where there are not enough samples to recover the complete key, the attack can still reveal a part of it. Dhem et al. observed that with only half of the required number of samples, the attack was able to recover ľ of the key [3].

Systems using the Chinese Remainder Theorem (CRT) are not vulnerable to Kocher’s original attack or Dhem’s attack. However, Brumley and Boneh devised a timing attack against OpenSSL, an open source cryptographic library used in web servers and other SSL applications [6], and successfully extracted a factor of the RSA modulus N and therefore, the private key d [1]. Before this attack was publicized, it was generally believed that timing attacks would not work against general purpose servers because timing variations would mask the decryption times of the cryptographic algorithm. Nevertheless, Brumley and Boneh’s attack worked over a local network between machines separated by multiple routers. It was also effective between two processes running on the same machine and between two virtual machines on the same computer. They attacked three different OpenSSL-based RSA decryption applications: a simple RSA decryption oracle, Apache/mod_ssl, and Stunnel. They reported using about a million queries to remotely extract a 1024-bit key from an OpenSSL 0.9.7 server in about two hours [1]. On networks with low variance in latency such as a LAN or corporate/campus network, their attack is feasible [9]. In the case of a remote attack, the attacker must also account for variables other than network latency such as the load on the server. Thus, during periods of low activity, a server may be more vulnerable [9].

These attacks illustrate that cryptosystems that rely on modular exponentiation may be vulnerable to timing attacks. One such example is the Diffie-Hellman key exchange algorithm which establishes a shared secret to be used as a symmetric key. Brumley and Boneh’s attack on the Chinese remainder operation shows that RSA with CRT is not secure against timing attacks. Any applications that perform RSA private key operations may be vulnerable. These applications include SSL/TLS-enabled network services, IPsec, Secure Shell (SSH1, ssh-agent), and TCPA/Palladium [9]. Their attack is also relevant to trusted computing efforts such as Microsoft's Next-Generation Secure Computing Base (NGSCB), which provides sealed storage for secret keys. By asking NGSCB to decrypt data in sealed storage, a user may learn the secret application key [1].

Possible Defenses

There are defenses against these timing attacks. The most widely accepted method is RSA blinding. With RSA blinding, randomness is introduced to the RSA computations to make timing information unusable. Before decrypting the ciphertext C, we first compute X = r^eC mod N, where r is a random value and e is the public exponent. We decrypt X as usual and multiply the output by r⁻¹ mod N to obtain the plaintext. Since a different r is used for each message, blinding prevents an attacker from knowing the input to the exponentiation function and thus timing information should not reveal the key. Blinding incurs a small performance penalty in the range of 2% to 10% [1].

Another possible defense is to make all private key operations not depend on the input. One possible way to accomplish this is to always carry out the extra reduction in the Montgomery algorithm, even though the result may not be used. This modification should be relatively easy to implement, as long as the extra reduction is not optimized away by the compiler, and should not affect the performance by much [3].

One other alternative is to quantize all RSA computations, i.e. make them always a multiple of some predefined time quantum [1]. The major drawback of this approach is that all computations must then take the longest of all computation times, eliminating the possibility of performance optimizations.

Mozilla’s NSS crypto library uses blinding to defend against a timing attack. Most crypto acceleration cards also have implemented defenses against the timing attack [1]. Before the publication of Boneh and Brumley’s paper, OpenSSL 0.9.7 included RSA blinding as an option, though it was generally not turned on. In March 2003, the United States Computer Emergency Readiness Team (CERT) issued a vulnerability note (VU#997481) [9] discussing the attack by Brumley and Boneh, its impact, and prevention. At the same time, the OpenSSL Project issued a security advisory and included a patch to switch blinding on by default.

Lessons Learned

Side channel attacks tell us that even if a cryptographic scheme is mathematically strong, it may not be secure in practice depending on the way it is implemented and on the design of the system as a whole. Cryptography should not be examined in isolation. The design of a secure system should encompass every aspect of the system, including the cryptographic and non-cryptographic aspects [8]. Attackers are like cheaters in a game; they don’t play by the presumed rules and they attack in ways that are sneaky and "unfair". Security engineers should constantly be on the lookout for unusual or unexpected attacks and be ready to re-secure a system when such an attack appears.

Acknowledgements

The author wants to thank Dr. Mark Stamp and Dr. Oliver Shih for their valuable comments and editing efforts.

Biography

Wing H. Wong is a graduate student at San José State University. Her research interests include network security and bioinformatics. She enjoys playing tennis in her spare time.

References

1. D.Brumley and D. Boneh, Remote timing attacks are practical, at crypto.stanford.edu/~dabo/papers/ssl-timing.pdf

2. D.M. Burton, Elementary Number Theory, second edition, Wm. C. Brown Publishers,1989

3. J.F. Dhem, F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater, and J.-L. Willems, A practical implementation of the timing attack, at www.cs.jhu.edu/~fabian/courses/CS600.624/Timing-full.pdf

4. P.Kocher, Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, at www.cryptography.com/resources/whitepapers/TimingAttacks.pdf

5. S.Levy, The open secret, Wired, issue 7.04, April 1999, at http://www.wired.com/wired/archive/7.04/crypto.html

6. OpenSSL Project. http://www.openssl.org

7. RSA Laboratories. http://www.rsasecurity.com/rsalabs/node.asp?id=2098

8. B.Schneier, Risks of relying on cryptography, Inside Risks 112, Communications of the ACM, vol. 42, no. 10, October 1999, at www.schneier.com/essay-021.html

9. US-CERT Vulnerability Note VU#997481, at http://www.kb.cert.org/vuls/id/997481