1. Partial use of a suite is the norm. Of 16 papers that
used benchmarks from CINT2000, only 4 used all 12
of them. CFP2000 fares even worse: only 2 out of 14
papers simulated all 14 benchmarks.
2. Average speedups are reported as harmonic or arithmetic
means. Rarely is the geometric mean used, and
never is it used for benchmarks that aren’t simulated.
3. The use of CPU95 is still widespread, two years after
its retirement. Out of the 10 papers that used CINT95,
6 used CINT95 exclusively. The use of CFP95 was limited
to 4 papers.
4. Not one micro-architectural paper ran a SPEC benchmark
to completion using the reference dataset (several
compiler papers have). The options chosen vary
nearly as much as the number of papers published.
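
To illustrate why the choice of mean in item 2 matters, the following minimal Python sketch computes the arithmetic, harmonic, and geometric means of a set of per-benchmark speedups, and then repeats the calculation on a subset, as happens when only part of a suite is simulated (item 1). The speedup values and function names are hypothetical, chosen for illustration only; they are not data from the surveyed papers.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Count divided by the sum of reciprocals (values must be positive)."""
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean(values):
    """n-th root of the product, computed via logs (values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark speedups for a full suite (illustration only).
full_suite = [1.10, 1.25, 0.95, 2.00]

# Hypothetical partial use of the suite: only the first two benchmarks.
subset = full_suite[:2]

for label, data in (("full suite", full_suite), ("subset", subset)):
    print(label)
    print("  arithmetic:", round(arithmetic_mean(data), 3))
    print("  geometric: ", round(geometric_mean(data), 3))
    print("  harmonic:  ", round(harmonic_mean(data), 3))
```

For positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the same set of results yields a different "average speedup" depending on which mean is reported and on which benchmarks are included.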