The most common system for the compression of video is MPEG. It works like this. The single data stream off the CD-ROM is split into video and audio components, which are then decompressed using separate algorithms.
The video is processed to produce individual frames as follows. Imagine a sequence of frames depicting a bouncing ball on a plain background. The very first frame is called an Intra Frame (I-frame). I-frames are compressed using only information in the picture itself, much like conventional still-image compression techniques such as JPEG.
An I-frame is followed by one or more predicted frames (P-frames).
The only data stored for a P-frame is the difference between it and the I-frame it is based on. For example, in the case of a bouncing ball, the P-frame is stored simply as a description of how the position of the ball has changed from the previous I-frame. This takes up a fraction of the space that would be used if you stored the P-frame as a picture in its own right. Shape or color changes are also stored in the P-frame. The next P-frame may in turn be based on this P-frame, and so on. Storing the differences between frames gives a massive reduction in the amount of information needed to reproduce the sequence. Only a few P-frames are allowed before a new I-frame is introduced into the sequence as a new reference point, since a small margin of error creeps in with each P-frame.
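The idea of storing only the differences can be sketched in a few lines of Python. This is a deliberately simplified toy, not the actual MPEG algorithm: it assumes a frame is a plain list of rows of integer pixel values and compares single pixels, whereas real MPEG describes the movement and residual change of blocks of pixels. The function names encode_p_frame and decode_p_frame are made up for this illustration.

```python
# Toy model (an assumption for illustration): a frame is a list of rows of
# integer pixel values. Real MPEG works on blocks of pixels, not single pixels.

def encode_p_frame(reference, current):
    """Return only the pixels that changed, as (row, column, delta) tuples."""
    diffs = []
    for y, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for x, (ref_px, cur_px) in enumerate(zip(ref_row, cur_row)):
            if ref_px != cur_px:
                diffs.append((y, x, cur_px - ref_px))
    return diffs


def decode_p_frame(reference, diffs):
    """Rebuild the frame by applying the stored deltas to a copy of the reference."""
    frame = [row[:] for row in reference]
    for y, x, delta in diffs:
        frame[y][x] += delta
    return frame


# A 4x4 plain background (0) with a one-pixel "ball" (9) that moves diagonally.
i_frame = [[0] * 4 for _ in range(4)]
i_frame[1][1] = 9
next_frame = [[0] * 4 for _ in range(4)]
next_frame[2][2] = 9

p_data = encode_p_frame(i_frame, next_frame)
print(p_data)  # [(1, 1, -9), (2, 2, 9)]
print(decode_p_frame(i_frame, p_data) == next_frame)  # True
```

Here the second frame is described by two entries instead of sixteen pixel values. Because decoding always starts from the reference frame, any inaccuracy in that reference is carried into every frame predicted from it.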
Between I- and P-frames are bi-directional frames (B-frames), based on the nearest I- or P-frames both before and after them. In our bouncing ball example, a B-frame stores the picture as the difference between the previous I- or P-frame and the B-frame, and as the difference between the B-frame and the following I- or P-frame. To recreate the B-frame when playing back the sequence, the MPEG algorithm uses a combination of the two references. There may be a number of B-frames between I- or P-frames. No other frame is ever based on a B-frame, so B-frames do not propagate errors the way P-frames do.
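One way to picture "a combination of two references" is to predict the B-frame as the average of the reference frames on either side and store only what the prediction gets wrong. The sketch below assumes exactly that, using the same toy model of a frame as a list of rows of integer pixel values. Real MPEG encoders choose per block between forward, backward, and averaged prediction, so this is only one simplified case, and the function names are invented for the example.

```python
def predict_b_frame(prev_ref, next_ref):
    """Predict a B-frame as the per-pixel average of its two reference frames."""
    return [
        [(p + n) // 2 for p, n in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_ref, next_ref)
    ]


def encode_b_frame(prev_ref, next_ref, current):
    """Store only the residual: the actual frame minus the prediction."""
    prediction = predict_b_frame(prev_ref, next_ref)
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


def decode_b_frame(prev_ref, next_ref, residual):
    """Rebuild the B-frame from both references plus the stored residual."""
    prediction = predict_b_frame(prev_ref, next_ref)
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# A scene that brightens steadily: the in-between frame is almost exactly
# halfway between the two references, so the residual is nearly all zeros.
prev_ref = [[10, 10], [10, 10]]
next_ref = [[30, 30], [30, 30]]
b_frame = [[20, 20], [20, 22]]

residual = encode_b_frame(prev_ref, next_ref, b_frame)
print(residual)  # [[0, 0], [0, 2]]
print(decode_b_frame(prev_ref, next_ref, residual) == b_frame)  # True
```

Nothing in this sketch ever uses a decoded B-frame as a reference, which is why an error in a B-frame stays confined to that one frame.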
Typically, you will have two or three B-frames between I- or P-frames, and perhaps three to five P-frames between I-frames.
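The resulting frame pattern is easy to generate. The helper below is a hypothetical illustration (gop_pattern and its parameter names are not part of any MPEG tool) that builds one run of frames, in the order they are displayed, from the two counts just mentioned.

```python
def gop_pattern(b_between=2, p_count=3):
    """Return one I-frame and the frames that follow it, in display order.

    b_between: number of B-frames between neighbouring I- or P-frames.
    p_count:   number of P-frames before the next I-frame.
    """
    if b_between < 0 or p_count < 0:
        raise ValueError("frame counts must not be negative")
    b_run = "B" * b_between
    # The I-frame, then each P-frame preceded by a run of B-frames, then a
    # final run of B-frames leading up to the next I-frame.
    return "I" + (b_run + "P") * p_count + b_run


print(gop_pattern(2, 3))  # IBBPBBPBBPBB
print(gop_pattern(3, 5))  # IBBBPBBBPBBBPBBBPBBBPBBB
```

With two B-frames per gap and three P-frames, only one frame in twelve is stored as a complete picture.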