Here, e denotes the edge points and d denotes the depth points. The measurements are calculated separately for each attribute and are then used to determine the weights of the samples. Edges are extracted using a gradient-based mask, and depth is obtained from a stereo camera. The edge measurement for sample X is the difference between each edge model point transformed by X and its nearest edge pixel. Similarly, the depth measurement is the difference between the observed depth value and the depth of the corresponding model point transformed by X.

To hand an object to the designated person correctly, the robot must know that person's position. The face, which is detected and recognized using the AdaBoost [20] and PCA [21] methods, is tracked with a mean-shift algorithm using bilateral filtering [22], chosen for its robustness to illumination changes.
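Returning to the edge and depth measurements described above, the following is a minimal sketch of how the two residuals might be computed for a single sample X. It is an illustration under stated assumptions, not the implementation used here: the function names, the representation of X as a rotation matrix and translation vector, the assumption that model points are already projected to pixel coordinates, and the Gaussian form of the sample weight with parameters `sigma_e` and `sigma_d` are all hypothetical.

```python
import numpy as np


def transform_points(points, rotation, translation):
    """Apply the rigid transform of a sample X (assumed form: R, t) to N x D points."""
    return points @ rotation.T + translation


def edge_measurement(model_edge_px, edge_pixels):
    """Distance from each transformed edge model point to its nearest edge pixel.

    model_edge_px: (N, 2) model edge points in pixel coordinates.
    edge_pixels:   (M, 2) edge pixel coordinates from the gradient-based mask.
    """
    # Pairwise differences via broadcasting: shape (N, M, 2).
    diffs = model_edge_px[:, None, :] - edge_pixels[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return dists.min(axis=1)


def depth_measurement(model_uvz, depth_map):
    """Absolute difference between observed depth and transformed model depth.

    model_uvz: (N, 3) rows of (u, v, z): pixel column, pixel row, model depth.
    depth_map: (H, W) depth image, e.g. from a stereo camera.
    """
    u = model_uvz[:, 0].astype(int)
    v = model_uvz[:, 1].astype(int)
    return np.abs(depth_map[v, u] - model_uvz[:, 2])


def sample_weight(edge_res, depth_res, sigma_e=2.0, sigma_d=0.5):
    """Combine both residuals into one weight (assumed Gaussian likelihoods)."""
    w_edge = np.exp(-np.mean(edge_res ** 2) / (2.0 * sigma_e ** 2))
    w_depth = np.exp(-np.mean(depth_res ** 2) / (2.0 * sigma_d ** 2))
    return float(w_edge * w_depth)


if __name__ == "__main__":
    # Toy data: two edge pixels and two model edge points on the same row.
    edge_pixels = np.array([[0.0, 0.0], [10.0, 0.0]])
    model_edges = transform_points(
        np.array([[0.0, 0.0], [7.0, 0.0]]), np.eye(2), np.array([1.0, 0.0])
    )
    e = edge_measurement(model_edges, edge_pixels)

    # Toy depth image with constant depth 2.0.
    depth_map = np.full((5, 5), 2.0)
    d = depth_measurement(np.array([[1.0, 1.0, 2.5], [2.0, 3.0, 1.0]]), depth_map)

    print("edge residuals:", e)
    print("depth residuals:", d)
    print("sample weight:", sample_weight(e, d))
```

The brute-force nearest-pixel search is quadratic in the number of points; a precomputed distance transform of the edge image would be a common alternative when many samples are evaluated per frame.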