2. The record-breaking ensemble model has been trained with almost 10 times fewer images than the previous state-of-the-art model [11]. Moreover, a single CNN I performs as well as the model by Jia and Cristianini [11] with almost 20 times fewer training images. This result is of particular importance, given the cost and complexity of collecting large image datasets.