We have presented a system for automatic facade detection,
segmentation, and parameter estimation in the domain of
stereo-equipped mobile platforms. We have introduced a discriminative
model that leverages both appearance and disparity
features for improved classification accuracy. From the disparity
map, we generate a set of candidate planes using RANSAC
with a planar model that also incorporates local PCA estimates
of plane normals. We combine these in a two-layer Markov
Random Field model that allows for inference on the binary
(building/background) labeling at the mid-level, and for segmentation
of the identified building pixels into individual planar
surfaces corresponding to the candidate plane models determined
by RANSAC.
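As an illustration only, the candidate-plane step summarized above can be sketched as a RANSAC loop whose inlier test checks both the point-to-plane distance and agreement with a per-point normal. This is a minimal sketch and not the implementation used in this work. The names `plane_from_points` and `ransac_plane`, the tolerances, and the assumption that unit-length local normals (e.g. from a local PCA) are already available as input are all hypothetical.

```python
import math
import random


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x = d, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # The cross product of two in-plane vectors gives the plane normal.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def ransac_plane(points, normals, iters=200, dist_tol=0.05,
                 angle_tol_deg=20.0, seed=0):
    """Fit one plane; `normals` are assumed precomputed unit vectors."""
    rng = random.Random(seed)
    cos_tol = math.cos(math.radians(angle_tol_deg))
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate (collinear) sample
        n, d = model
        inliers = [
            i for i, (x, m) in enumerate(zip(points, normals))
            # Close to the plane ...
            if abs(sum(n[k] * x[k] for k in range(3)) - d) <= dist_tol
            # ... and the local normal agrees with the plane normal (sign-agnostic).
            and abs(sum(n[k] * m[k] for k in range(3))) >= cos_tol
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

To obtain several candidate planes, one option is to call `ransac_plane` repeatedly and remove the inliers of each accepted plane before the next call.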
Our BMA+D discriminative model provides superior performance
to other classifiers using only appearance features, and
our mid-level MRF labeling has proven to further improve the
accuracy of the classification to approximately 80%. We were
able to identify 84% of the building facades in our dataset, with
an average angular error of 24° from the ground truth. However,
the distribution of errors peaks in frequency below 10°, indicating
that a large percentage of the labels provide very accurate
estimates for the ground truth, although some of the labels produced
by our method have very high error. Further analysis
shows that these high-error labelings most often occur on small
segmented regions. Thus our method produces accurate plane
estimates for at least the major facades in the image.
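To make the error measure concrete, the angular error between an estimated plane and its ground truth can be computed from the two plane normals. The sketch below is one plausible formulation and not necessarily the exact metric used in our evaluation. In particular, treating a normal and its negation as the same plane is an assumption, and the function names are hypothetical. The helper `fraction_below` shows how the share of labels under a threshold such as 10° can be reported alongside the mean, which matters when a few high-error labels dominate the average.

```python
import math


def angular_error_deg(n_est, n_gt):
    """Angle in degrees between two plane normals, ignoring their sign."""
    dot = sum(a * b for a, b in zip(n_est, n_gt))
    norm = (math.sqrt(sum(a * a for a in n_est))
            * math.sqrt(sum(b * b for b in n_gt)))
    # Assumption: n and -n describe the same plane, hence abs().
    # Clamp to guard against rounding slightly outside [0, 1].
    cos_angle = max(0.0, min(1.0, abs(dot) / norm))
    return math.degrees(math.acos(cos_angle))


def mean_error(errors_deg):
    """Arithmetic mean of a non-empty list of angular errors."""
    return sum(errors_deg) / len(errors_deg)


def fraction_below(errors_deg, threshold_deg=10.0):
    """Share of errors strictly below the threshold."""
    return sum(1 for e in errors_deg if e < threshold_deg) / len(errors_deg)
```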
A further approach that may enhance these results is strict
enforcement of a verticality constraint on the candidate plane
models. Extraction of the ground plane would enable us to
leverage the assumption that building facades, in general, are
perpendicular to the ground plane. Using only locally vertical
candidate plane models is an avenue of future work in this area.
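A minimal sketch of such a verticality filter follows, assuming a ground-plane normal has already been estimated. A candidate is kept when its normal is perpendicular to the ground normal within a tolerance. The function names, the `(normal, offset)` candidate format, and the 10° default tolerance are hypothetical choices for illustration, not part of the presented system.

```python
import math


def is_vertical(normal, ground_normal, tol_deg=10.0):
    """True if the plane with this normal is within tol_deg of vertical."""
    dot = sum(a * b for a, b in zip(normal, ground_normal))
    norm = (math.sqrt(sum(a * a for a in normal))
            * math.sqrt(sum(b * b for b in ground_normal)))
    # A vertical facade has its normal perpendicular to the ground normal,
    # so |cos(angle)| must not exceed sin(tolerance).
    return abs(dot) / norm <= math.sin(math.radians(tol_deg))


def filter_vertical(candidates, ground_normal, tol_deg=10.0):
    """Keep only (normal, offset) candidates that pass the verticality test."""
    return [c for c in candidates if is_vertical(c[0], ground_normal, tol_deg)]
```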
Another avenue for future investigation is the integration of the
distance-based uncertainty of each point in disparity space into
the RANSAC models in order to encourage plane fitting to the
more accurate points close to the camera. We also intend to
pursue other methods for either improving the quality of the input
data (e.g. multiview stereo) or improving the methods of
compensating for difficult disparity maps.
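One possible way to express the distance-based uncertainty mentioned above is the standard first-order stereo error model: with depth Z = f B / d for focal length f (pixels), baseline B, and disparity d, a disparity error sigma_d maps to a depth error of roughly Z^2 / (f B) * sigma_d. The sketch below turns this into inverse-variance weights for scoring RANSAC support. It is an assumption about how the idea could be realized, not a description of completed work; the function names and the default `sigma_d` are hypothetical.

```python
def depth_from_disparity(d, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (d > 0)."""
    return focal_px * baseline_m / d


def depth_sigma(d, focal_px, baseline_m, sigma_d=0.5):
    """First-order depth uncertainty: sigma_Z ~ Z^2 / (f * B) * sigma_d."""
    z = depth_from_disparity(d, focal_px, baseline_m)
    return z * z / (focal_px * baseline_m) * sigma_d


def point_weight(d, focal_px, baseline_m, sigma_d=0.5):
    """Inverse-variance weight; points near the camera receive larger weights."""
    s = depth_sigma(d, focal_px, baseline_m, sigma_d)
    return 1.0 / (s * s)


def weighted_support(disparities, inlier_idx, focal_px, baseline_m, sigma_d=0.5):
    """Score a plane hypothesis by summed inlier weights instead of a raw count."""
    return sum(point_weight(disparities[i], focal_px, baseline_m, sigma_d)
               for i in inlier_idx)
```

Replacing the inlier count with `weighted_support` when comparing hypotheses would favor planes supported by nearby, more reliable points.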