Given a captured frame, every pixel is classified as either a skin-color pixel
or a non-skin-color pixel. An adaptive skin-color-based method [17] is used to
segment the hand region. According to the generalized statistical skin color
model [14], a pixel is assigned to the hand region if its skin-color likelihood
exceeds a constant threshold.
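As a rough illustration of this per-pixel test, the sketch below thresholds a
per-pixel likelihood map. The skin_likelihood callable and the threshold value
are placeholders standing in for the generalized model of [14], not the paper's
actual implementation.

    import numpy as np

    def segment_skin(frame_bgr, skin_likelihood, threshold=0.4):
        """Classify each pixel as skin / non-skin by thresholding its likelihood.

        frame_bgr       -- H x W x 3 uint8 image from the camera
        skin_likelihood -- callable mapping an image to an H x W array of
                           per-pixel skin-color likelihoods in [0, 1]
                           (e.g. a lookup into a precomputed color histogram)
        threshold       -- constant likelihood threshold (illustrative value)
        """
        likelihood = skin_likelihood(frame_bgr)           # H x W float array
        mask = (likelihood > threshold).astype(np.uint8)  # 1 = skin, 0 = non-skin
        return mask * 255                                 # binary mask for OpenCV
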
To adapt the skin color model to illumination changes, a color histogram of the
hand region is learned for each frame and accumulated with those from the
previous n frames (n = 5 works well in practice). The skin-color probability is
then computed by combining the general skin color model with the adaptively
learned histogram.
The histogram is learned only when the hand is in view and its fingertips are
detected. When the hand moves out of sight, for example, the previously learned
skin color model is retained, so that the system can segment correctly when the
hand comes back into the scene.
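The following sketch illustrates one way such an adaptive update could look in
OpenCV: a hue-saturation histogram of the hand region is accumulated over the
last n = 5 frames, updated only when fingertips are detected, and mixed with
the general model by a simple weighted average. The histogram space, bin
counts, and the mixing rule are assumptions for illustration; the paper does
not specify how the two models are combined.

    import collections
    import cv2
    import numpy as np

    N_FRAMES = 5   # number of previous frames to accumulate (n = 5)
    ALPHA = 0.5    # mixing weight between general and adaptive models
                   # (the combination rule here is an assumption)
    recent_hists = collections.deque(maxlen=N_FRAMES)

    def update_adaptive_model(frame_bgr, hand_mask, fingertips_detected):
        """Learn a hue-saturation histogram of the hand region for this frame
        and accumulate it with the histograms of the previous N_FRAMES frames.
        The histogram is only updated when the hand is in view and fingertips
        were detected; otherwise the previously learned model is kept."""
        if not fingertips_detected:
            return                                    # keep the old histograms
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], hand_mask, [32, 32],
                            [0, 180, 0, 256])         # hue-saturation histogram
        recent_hists.append(hist)

    def skin_probability(frame_bgr, general_likelihood):
        """Combine the general skin-color model with the adaptively learned
        histogram (here: a simple weighted average, as an illustration)."""
        if not recent_hists:
            return general_likelihood(frame_bgr)
        accumulated = cv2.normalize(sum(recent_hists), None, 0, 255,
                                    cv2.NORM_MINMAX)
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        adaptive = cv2.calcBackProject([hsv], [0, 1], accumulated,
                                       [0, 180, 0, 256], 1) / 255.0
        return ALPHA * general_likelihood(frame_bgr) + (1 - ALPHA) * adaptive
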
The segmentation result, shown in Figure 3, is used to track the main hand
region. Since we target wearable computing scenarios, in which a body-mounted
camera views the hand, which is necessarily within arm's reach, we can assume
that the majority of the skin-colored pixels in the segmented image belong to
the hand region. To find the largest blob, we retrieve the point with the
maximum value in the Distance Transform [5] of the segmentation image.
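A minimal sketch of this step with OpenCV might look as follows; selecting the
connected component that contains the maximum-distance point is an assumption
consistent with the description, and the distance-type and mask-size parameters
are illustrative.

    import cv2

    def locate_hand_blob(skin_mask):
        """Find the point deepest inside the skin-colored regions via the
        Distance Transform, then keep the connected component containing it
        (a plausible reading of the selection step; details are assumptions)."""
        # Distance of every skin pixel to the nearest non-skin pixel.
        dist = cv2.distanceTransform(skin_mask, cv2.DIST_L2, 5)
        # The pixel with the maximum distance value lies inside the largest blob.
        _, max_val, _, max_loc = cv2.minMaxLoc(dist)
        # Extract the contour that contains this point (OpenCV 4.x signature).
        contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        pt = (float(max_loc[0]), float(max_loc[1]))
        for contour in contours:
            if cv2.pointPolygonTest(contour, pt, False) >= 0:
                return contour, max_loc, max_val
        return None, max_loc, max_val
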
Among the skin-colored regions shown in Figure 3b, a single connected component
of the hand contour is extracted using OpenCV's implementation [13] by checking
which re-