LZ Help Menu

LZ Dictionary Algorithms


LZ77

The encoder of LZ77 examines the input sequence through a sliding window. The window consists of two parts, a search buffer that contains a portion of the recently encoded sequence, and a look-ahead buffer that contains the next portion of the sequence to be encoded.

To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the offset. The encoder then examines the symbols following the symbol at the pointer location to see if they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match. The encoder searches the search buffer for the longest match. Once the longest match has been found, the encoder encodes it with a triple , where o is the offset, l is the length of the match, and c is the codeword corresponding to the symbol in the look-ahead buffer that follows the match.

LZ78

The LZ77 approach implicitly assumes that like patterns will occur close together. It makes use of this structure by using the recent past of the sequence as the dictionary for encoding. The LZ78 algorithm drops the reliance on the search buffer and keeping an explicit dictionary. The dictionary has to be built at both the encoder and decoder, and care must be taken that the dictionaries are built in an identical manner. The inputs are coded as a double (i, c), with i being an index corresponding to the dictionary entry that was the longest match to the input, and c being the code for the character in the input following the matched portion of the input. As in the case of LZ77, the index value of 0 is used in the case of no match. This double then becomes the newest entry in the dictionary. Thus, each new entry into the dictionary is one new symbol concatenated with an existing dictionary entry.

LZW

LZW removes the necessity of encoding the second element of the pair (i, c). That is, the encoder would only send the index to the dictionary. In order to do this; the dictionary has to be primed with all the letters of the source alphabet. The input to the encoder is accumulated in a pattern p as long as p is contained in the dictionary. If the addition of another letter a results in a patter p*a (* denotes concatenation) that is not in the dictionary, then the index of p is transmitted to the receiver, the pattern p*a is added to the dictionary, and we start another pattern with the letter a.


Send comments and suggestions about LZ to khuri@mathcs.sjsu.edu or hsu0832@sundance.sjsu.edu