The birthday problem

How many people must be in a room with me before the probability is greater than 1/2 that at least one person in the room has the same birthday as me?

Suppose there are N people in the room. It is easier to compute the probability that they all have birthdays different from mine and subtract the result from 1. Assuming 365 equally likely birthdays, each person avoids my birthday with probability 364/365, so we are looking for the value of N for which

      1/2 = 1 - (364/365)^N.

Solving for N gives N = ln(1/2) / ln(364/365) ≈ 252.7, so if 253 or more people are in the room, then the probability is greater than 1/2 that at least one of them has the same birthday as me.
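As a quick numerical check, the following Python sketch (the function name `people_to_match_me` is purely illustrative) adds people one at a time, multiplying in a factor of 364/365 for each, and stops at the first N for which the probability of a match exceeds 1/2. It also evaluates the closed form ln(1/2) / ln(364/365).

```python
import math

def people_to_match_me(days: int = 365) -> int:
    """Smallest N for which 1 - ((days - 1) / days)**N exceeds 1/2."""
    n = 0
    p_no_match = 1.0  # probability that none of the n people shares my birthday
    while 1 - p_no_match <= 0.5:
        n += 1
        p_no_match *= (days - 1) / days
    return n

# Closed-form check: N = ln(1/2) / ln(364/365), rounded up to the next integer.
closed_form = math.log(0.5) / math.log(364 / 365)
print(people_to_match_me(), round(closed_form, 1))  # prints: 253 252.7
```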

A related question is the following: How many people must be in a room before the probability is greater than 1/2 that at least two have the same birthday? The exact answer is given by the value of N for which

      1/2 = 1 - 365/365 ⋅ 364/365 ⋅⋅⋅ (365-N+1)/365

and the answer is that as long as there are at least 23 people in the room, the probability is greater than 1/2 that at least two of them share a birthday. While this might seem counterintuitive, a few minutes' reflection indicates that the answer should be on the order of √365 ≈ 19: with N people there are N(N - 1)/2 pairs, and each pair is a possible match, so the number of chances for a match grows like N² rather than N.
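The product above can be evaluated directly. This Python sketch (the function name `people_for_shared_birthday` is illustrative) multiplies in one factor 365/365, 364/365, ..., (365-N+1)/365 per person and stops as soon as the probability of a shared birthday exceeds 1/2. It also prints √365 and the number of pairs among the resulting N people.

```python
import math

def people_for_shared_birthday(days: int = 365) -> int:
    """Smallest N for which 1 - prod_{k=0}^{N-1} (days - k) / days exceeds 1/2."""
    n = 0
    p_all_distinct = 1.0  # probability that n people all have different birthdays
    while 1 - p_all_distinct <= 0.5:
        # The (n+1)-th person must avoid the n birthdays already taken.
        p_all_distinct *= (days - n) / days
        n += 1
    return n

n_people = people_for_shared_birthday()
pairs = n_people * (n_people - 1) // 2
print(n_people, round(math.sqrt(365), 1), pairs)  # prints: 23 19.1 253
```

With 23 people there are 23 · 22 / 2 = 253 pairs, the same count as in the first problem. The pairs are not independent, so this is a heuristic rather than a proof, but it shows why 23 is enough.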

The second problem above is generally known as the birthday problem.

What does any of this have to do with cryptography? One requirement of a hash function is the following: Given a particular message and its hash value, it should be difficult to find another message that gives the same hash value. This corresponds to the first problem, above.
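To see the first problem in hash terms, the following Python sketch builds a toy n-bit hash by keeping only the top n bits of SHA-256 (the helper `truncated_hash` and the `candidate-i` message format are hypothetical, chosen only for illustration) and searches by brute force for a second message with the same hash as a given one. Each trial matches with probability 1/2^n, so this is the first problem with 2^n "days" in place of 365: roughly ln 2 · 2^n ≈ 0.69 · 2^n trials give a better-than-even chance, just as ln 2 · 365 ≈ 253.

```python
import hashlib

def truncated_hash(message: bytes, n_bits: int) -> int:
    """Toy n-bit hash (hypothetical helper): the top n_bits of SHA-256."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)

def find_second_preimage(original: bytes, n_bits: int, max_trials: int = 1 << 24):
    """Brute-force another message whose toy hash equals that of `original`."""
    target = truncated_hash(original, n_bits)
    for i in range(max_trials):
        candidate = b"candidate-%d" % i
        if candidate != original and truncated_hash(candidate, n_bits) == target:
            return candidate, i + 1  # the message found and the number of trials used
    return None, max_trials

original = b"pay Alice $100"
other, trials = find_second_preimage(original, 16)
print(other, trials)
```

Each extra bit of hash output doubles the expected work, so the same search against a full-length hash is hopeless.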

Another, more stringent, requirement of a hash algorithm is that it should be difficult to find any "collision", i.e., any two messages that produce the same hash. Of course, such collisions must exist, since there are more possible messages than hash values, but the birthday problem gives us an upper bound on the difficulty of finding one. More precisely, if a hash algorithm produces an n-bit hash value, there are 2^n possible hash values, and the birthday problem assures us that if we generate on the order of 2^(n/2) hashes (about 1.18 · 2^(n/2) suffice), then the probability is greater than 1/2 that at least two hash values will be the same.
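A minimal Python sketch of this birthday attack, assuming a toy n-bit hash made by keeping only the top n bits of SHA-256 (the helper `truncated_hash` and the `message-i` inputs are hypothetical), hashes distinct messages and records each value in a dictionary until one repeats. Because every new message can collide with any earlier one, a collision typically turns up after on the order of 2^(n/2) hashes.

```python
import hashlib

def truncated_hash(message: bytes, n_bits: int) -> int:
    """Toy n-bit hash (hypothetical helper): the top n_bits of SHA-256."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)

def find_collision(n_bits: int, max_trials: int = 1 << 24):
    """Hash distinct messages until two share the same n-bit value."""
    seen = {}  # hash value -> first message that produced it
    for i in range(max_trials):
        message = b"message-%d" % i
        h = truncated_hash(message, n_bits)
        if h in seen:
            return seen[h], message, i + 1  # colliding pair and hashes computed
        seen[h] = message
    return None

m1, m2, count = find_collision(32)
print(m1, m2, count)
```

For n = 32 the loop usually stops after tens of thousands of hashes, on the order of 2^16, whereas finding a second message matching one given 32-bit hash would take on the order of 2^32 trials. The dictionary trades memory for speed: each new hash is checked against all earlier ones in constant expected time.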

So, while an 80-bit symmetric key (in conjunction with a strong crypto-algorithm) is secure against exhaustive key search, which requires on the order of 2^80 work, finding a collision in an 80-bit hash requires only about 2^40 work.