One family of visual processes with relevance for various applications of computer vision is that of what could loosely be described as visual processes composed of ensembles of particles subject to stochastic motion. The particles can be microscopic, e.g. plumes of smoke, macroscopic, e.g. leaves and vegetation blowing in the wind, or even objects, e.g. a human crowd, a flock of birds, a traffic jam, or a beehive. The applications range from remote monitoring for the prevention of natural disasters, e.g. forest fires, to background subtraction in challenging environments, e.g. outdoor scenes with vegetation, and various types of surveillance, e.g. traffic monitoring, homeland security applications, or scientific studies of animal behaviour.
Despite their practical significance, and the ease with which they are perceived by biological vision systems, the visual processes in this family still pose tremendous challenges for computer vision. In particular, the stochastic nature of the associated motion fields tends to defeat traditional motion representations: optical flow, which requires some degree of motion smoothness; parametric motion models, which assume a piecewise planar world; and object tracking, which becomes impractical when the number of objects to track is large and they interact in complex ways.
The main limitation of all these representations is that they are inherently local, aiming to achieve understanding of the whole by modeling the motion of individual particles. This is contrary to how these visual processes are perceived by biological vision: smoke is usually perceived as a whole, a tree is normally perceived as a single object, and the detection of traffic jams rarely requires tracking individual vehicles. Recently, there has been an effort to move towards this type of holistic modeling, by viewing video sequences derived from these processes as dynamic textures or, more precisely, as samples from stochastic processes defined over space and time.
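Concretely, a dynamic texture is typically formulated as a linear dynamical system; the equations below are a sketch of that standard formulation, with conventional symbols (hidden state $x_t$, observed frame $y_t$, transition matrix $A$, observation matrix $C$, noise covariances $Q$ and $R$) that are assumed here rather than drawn from this text:

\[
x_t = A x_{t-1} + v_t, \qquad v_t \sim \mathcal{N}(0, Q),
\]
\[
y_t = C x_t + w_t, \qquad w_t \sim \mathcal{N}(0, R),
\]

where $x_t \in \mathbb{R}^n$ is a low-dimensional hidden state summarizing the evolution of the scene and $y_t \in \mathbb{R}^m$ is the vectorized video frame observed at time $t$.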
In this project, we extend the simple dynamic texture model to a mixture of dynamic textures, in which each observed video is an instance of one of several possible dynamic texture models. With this framework we are able to perform motion segmentation of a video sequence, as well as clustering of a set of video sequences. Experimental results show that the model achieves segmentations and clusterings that are perceptually plausible.
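As a rough sketch of what this extension looks like, under the conventions above, a mixture of $K$ dynamic textures models a sequence $y_{1:\tau}$ as drawn from one of $K$ components with parameters $\Theta_j = \{A_j, C_j, Q_j, R_j\}$ and mixing weights $\alpha_j$ (symbols assumed here for illustration):

\[
p(y_{1:\tau}) = \sum_{j=1}^{K} \alpha_j \, p(y_{1:\tau} \mid \Theta_j), \qquad \sum_{j=1}^{K} \alpha_j = 1.
\]

Segmentation and clustering then amount to assigning each spatiotemporal patch or each sequence, respectively, to the component most likely to have generated it, for instance with an expectation-maximization style procedure.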