In VAS mode, the MCU uses the audio level at each site to decide which endpoint the other endpoints see. If there are four sites in a conference, only the site that is currently talking is shown: the location with the loudest voice is the one the other participants see.
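The switching decision can be pictured as a small selection function. The sketch below is illustrative only and assumes things the text does not specify: per-endpoint audio levels are already measured in dBFS, endpoints are identified by hypothetical names such as `site_a`, and `min_margin_db` is a hypothetical hysteresis setting (real MCUs typically add some such guard so brief noises do not cause rapid switching). With the default margin of 0 it simply picks the loudest site.

```python
def select_visible_endpoint(levels, current=None, min_margin_db=0.0):
    """Pick the endpoint whose video is switched to all other sites.

    levels: dict of endpoint name -> audio level in dBFS (higher is louder).
    current: endpoint currently being shown, if any.
    min_margin_db: hypothetical hysteresis; a new speaker must be this many dB
    louder than the current one before the view switches.
    """
    if not levels:
        # No audio measurements: keep showing whatever is already on screen.
        return current
    loudest = max(levels, key=levels.get)
    if current is None or current not in levels:
        return loudest
    # Switch only if the loudest site clearly exceeds the current speaker.
    if levels[loudest] >= levels[current] + min_margin_db:
        return loudest
    return current


levels = {"site_a": -40.0, "site_b": -18.0, "site_c": -45.0, "site_d": -50.0}
print(select_visible_endpoint(levels))  # site_b is the loudest, so it is shown
```

Setting a nonzero `min_margin_db` trades responsiveness for stability: a site that is only slightly louder than the current speaker will not take over the view.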
Continuous Presence mode displays multiple participants at the same time. In this mode, the MP takes the video streams from the different endpoints and combines them into a single video image, and the MCU normally sends that same composite image to all participants. These arrangements are typically called "layouts", and the layout used can vary with the number of participants in the conference.
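To make the idea of a layout concrete, the sketch below computes an equal-size grid of tiles for a given participant count, the kind of arithmetic an MP performs when compositing. It is a hypothetical illustration, not any vendor's algorithm: the 1280x720 output size and the "smallest near-square grid" rule are assumptions, and real MCUs usually offer a fixed menu of layouts (for example, one large speaker tile plus several small ones).

```python
import math


def grid_layout(n, width=1280, height=720):
    """Return one (x, y, w, h) tile per participant in an equal-size grid.

    Assumed rule: use the smallest near-square grid that fits n participants.
    """
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))   # e.g. 4 -> 2 columns, 5 -> 3 columns
    rows = math.ceil(n / cols)       # enough rows to hold every participant
    tile_w = width // cols
    tile_h = height // rows
    tiles = []
    for i in range(n):
        row, col = divmod(i, cols)   # fill the grid left to right, top to bottom
        tiles.append((col * tile_w, row * tile_h, tile_w, tile_h))
    return tiles


for n in (1, 4, 5):
    print(n, grid_layout(n))
```

Because the same composite is sent to everyone, the layout only needs to be computed once per change in participant count, not once per receiving endpoint.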