LZ Dictionary Algorithms
The encoder of LZ77 examines the input sequence through a sliding window. The window consists of two parts, a search buffer that contains a portion of the recently encoded sequence, and a look-ahead buffer that contains the next portion of the sequence to be encoded.
To encode the sequence in the look-ahead buffer, the encoder moves
a search pointer back through the search buffer until it encounters
a match to the first symbol in the look-ahead buffer. The distance
of the pointer from the look-ahead buffer is called the offset.
The encoder then examines the symbols following the symbol at
the pointer location to see if they match consecutive symbols in the
look-ahead buffer. The number of consecutive symbols in the search
buffer that match consecutive symbols in the look-ahead buffer, starting
with the first symbol, is called the length of the match. The encoder
searches the search buffer for the longest match. Once the longest
match has been found, the encoder encodes it with a triple
The LZ77 approach implicitly assumes that like patterns will occur
close together. It makes use of this structure by using the recent
past of the sequence as the dictionary for encoding. The LZ78 algorithm
drops the reliance on the search buffer and keeping an explicit
dictionary. The dictionary has to be built at both the encoder
and decoder, and care must be taken that the dictionaries are built
in an identical manner. The inputs are coded as a double (i, c),
with i being an index corresponding to the dictionary entry that
was the longest match to the input, and c being the code for the
character in the input following the matched portion of the input.
As in the case of LZ77, the index value of 0 is used in the case
of no match. This double then becomes the newest entry in the
dictionary. Thus, each new entry into the dictionary is one new
symbol concatenated with an existing dictionary entry.
LZW removes the necessity of encoding the second element of
the pair (i, c). That is, the encoder would only send the index
to the dictionary. In order to do this; the dictionary has to
be primed with all the letters of the source alphabet. The input
to the encoder is accumulated in a pattern p as long as p is
contained in the dictionary. If the addition of another letter a results
in a patter p*a (* denotes concatenation) that is not in the
dictionary, then the index of p is transmitted to the receiver,
the pattern p*a is added to the dictionary, and we start another
pattern with the letter a.
Send comments and suggestions about LZ
to khuri@mathcs.sjsu.edu or hsu0832@sundance.sjsu.edu
LZ78
LZW