Our eyes are more sensitive to small changes in luminance (brightness) than to small changes in chrominance (color). If we could decompose the image into a set of waveforms, each with a particular spatial frequency, we might be able to separate the image structure the eye can see from the structure that is imperceptible. The DCT can provide a good approximation to this decomposition.
The output of the DCT is a set of 64 basis-signal amplitudes, or "DCT coefficients", whose values are uniquely determined by the particular 64-point input signal. The DCT coefficient values can be regarded as the relative amounts of the 2D spatial frequencies contained in the 64-point input signal. The coefficient with zero frequency in both dimensions is called the "DC coefficient" and the remaining 63 coefficients are called the "AC coefficients". Because sample values typically vary slowly from point to point across an image, the DCT step lays the foundation for data compression by concentrating most of the signal in the lower spatial frequencies. For a typical 8x8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded.
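As a rough illustration of this concentration of signal in the low frequencies, the following minimal sketch computes the 64 DCT coefficients of one 8x8 block using the direct (slow) form of the forward DCT. The function name `dct_2d` and the sample block (a smooth linear gradient) are assumptions chosen for illustration; a real JPEG encoder also level-shifts the samples and uses a fast DCT algorithm.

```python
import math

N = 8  # JPEG works on 8x8 blocks, i.e. 64-point input signals


def dct_2d(block):
    """Forward 2D DCT of an N x N block, computed directly from the definition."""
    def c(k):
        # Scale factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical spatial frequency
        for v in range(N):      # horizontal spatial frequency
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * total
    return out


# Hypothetical sample block: values vary slowly from point to point.
block = [[100 + 2 * x + 3 * y for y in range(N)] for x in range(N)]

coeffs = dct_2d(block)

dc = coeffs[0][0]  # the "DC coefficient" (zero frequency in both dimensions)
ac = [coeffs[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]
near_zero = sum(1 for a in ac if abs(a) < 1e-6)

print(f"DC coefficient: {dc:.2f}")
print(f"AC coefficients that are (near) zero: {near_zero} of {len(ac)}")
```

For a smooth block like this one, the DC coefficient dominates and most of the 63 AC coefficients vanish, which is what makes the later encoding steps effective. Blocks with sharp edges or fine texture would leave more non-zero high-frequency coefficients.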
E-mail comments or suggestions about JPEG to khuri@cs.sjsu.edu or joy@mathcs.sjsu.edu