(Step 1) First, we randomly selected about 1000 concrete
object words from a Japanese dictionary, and collected about
100 images for each object word from the Internet using the
image search API provided by Microsoft Bing Search.
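A minimal sketch of the sampling and collection in Step 1 might look as follows. The function names, the fixed random seed, and the `search_images` callable are hypothetical stand-ins introduced only for illustration. In particular, `search_images` is a caller-supplied placeholder and does not reproduce the actual Bing Search API.

```python
import random
from typing import Callable, Dict, List, Sequence


def sample_object_words(words: Sequence[str],
                        n_words: int = 1000,
                        seed: int = 0) -> List[str]:
    """Draw up to n_words distinct words uniformly at random.

    The seed is an assumption, added only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(words), min(n_words, len(words)))


def collect_image_urls(words: Sequence[str],
                       search_images: Callable[[str], List[str]],
                       per_word: int = 100) -> Dict[str, List[str]]:
    """Keep at most per_word results for each word.

    search_images is a hypothetical callable that maps a query word to a
    list of image URLs; any real search client would be wrapped to fit it.
    """
    return {word: search_images(word)[:per_word] for word in words}
```

Passing the search function as an argument keeps the sketch independent of any particular search service, so the sampling logic can be checked with a stub.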
(Step 2) We extracted edges from each image, and found
keypoints using the SIFT algorithm implemented in OpenCV.
Then, we selected 100 keypoints from each image, and applied
the bags-of-keypoints technique [6] to the set of images. In
this process, we employed the k-means clustering algorithm to
define 100 visual words. As a result, each image was
represented as a 100-dimensional visual-word vector. (Step 3)
Note that our goal was not to recognize the shape of a given
image but to categorize a set of images according to their