The images used for training the machine learner were
collected using Google Image Search. The image collection
consisted of two separate phases: collecting
images that represented CDs, and collecting non-CD
images, including images of similar diagram types.
To search for CDs, the phrase “UML Class diagram” was
used. Other diagram types, such as blueprints, sequence
diagrams, charts, flow charts, E/R models, and architectural
diagrams, were found using their corresponding phrases.
It was verified that the set contains no duplicates. The end result
was a collection of 650 UML CDs and 650 non-UML
diagrams (1300 images in total). The non-UML images include
60 sequence diagrams, 34 use case diagrams, 61 E/R diagrams,
80 architectural diagrams, and 155 charts, among others. Our dataset, together
with the results presented later in the paper, can be
found online at: http://bitly.com/dtsUMLClassifier
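
The text does not state how the absence of duplicates was verified. As an illustration only, one simple way to check a downloaded image collection for exact duplicates is to compare content hashes. The sketch below assumes the images are stored as files under a single directory; the function name find_exact_duplicates is hypothetical and is not taken from the paper. It detects only byte-identical files, so resized or re-encoded copies of the same diagram would need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Return lists of files under image_dir whose bytes are identical.

    Hypothetical helper for illustration; not the procedure used in the paper.
    """
    groups = defaultdict(list)
    # Walk the directory tree in a stable order and hash each file's content.
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

An empty result means that no two files in the directory have identical content.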