OIn direct motion segmentation, the spatial-temporal derivatives of the image intensities help us solve the segmentation problem. In the case of pixels that obey a single 2-D translational motion, these derivatives constitute a linear subspace. In the case of pixels that obey a sing 2-D affine motion, these derivatives along with the pixel co-ordinates constitute a bilinear subspace. When the scene contains only translational motions or only affine motions, the segmentation problem reduces to one of separating linear subspaces or bilinear subspaces respectively. However, when the scene contains both kind of motions, the problem becomes one of separating linear and bilinear subspaces.
Most of the existing algorithms perform 2-D motion segmentation by assuming that the motions in the scene can be modeled by 2-D affine motion models. In theory, this is valid because the simpler 2-D translational motion model can be represented by the more general affine motion model. However, such approximations in modeling can have negative consequences. Firstly, the translational model has 2 parameters and the affine model has 6 parameters. Therefore, for each translational model in the scene, the approximation in modeling leads to the estimation of 4 extra parameters. Moreover, since the data might not be rich enough to estimate the affine motion model, the parameter estimation might be erroneous. In our research, we address the problem of segmenting scenes that contain translational motion models as well as affine motion models without making any approximations in modeling the motions of the scene.
Our method is algebraic in nature and proceeds by fitting a polynomial to the image data such that it is satisfied by all the measurements irrespective of their true grouping. The polynomial helps characterize the vanishing set of the union of the individual linear and bilinear subspaces. Since this polynomial is representative of the individual motion models, it helps us define a Multibody Brightness Constancy Constraint (MBCC). The MBCC is essentially a homogeneous polynomial in the image derivatives and pixel co-ordinates, and forms the backbone of our proposed solution. A 3x3 sub-matrix of the MBCC's hessian when evaluated at the image measurements corresponding to a pixel, helps us identify the type of motion model at the pixel. This identification essentially involves a rank check of 1 vs 3 for checking if the type of motion model is 2-D translational or 2-D affine. Once the type of motion models are identified, the algorithm uses the first order derivatives of the MBCC and their cross products to estimate the individual motion models. After estimating the motion models, we perform segmentation by assigning to each pixel the motion model that explains its motion the best. While our method has the novelty of being the first known algebraic approach to account for the different kinds of models, it also has an important theoretical contribution shows that algebraic methods of similar nature as ours would fail if an translational model is approximated by an affine motion model.
The following is a typical example of demonstrating the power of our algorithm. The scene contains one 2-D translational motion(background patch in brown) and one 2-D affine motion model(foreground patch in blue). Fitting two 2-D translational motion models obviously gives poor segmentation results since the scene is not explained comprehensively. Similarly, fitting two affine motion models gives rise to a degeneracy and the algebraic segmentation algorithm breaks down. However, when we fit one 2-D translational model and one 2-D affine model, the correct segmentation results are obtained. The results show attempts to segment the background and the pixels that do not belong to the group are marked in black.