Because neither a hardware implementation of the architecture nor a
cycle-accurate simulator is available, no real performance evaluation is
possible. However, we can obtain very rough estimates by collecting
instruction traces and weighting each executed instruction with its
known or estimated latency. We
use the estimated latencies shown in Table 1, which are appropriate for
an ARM Cortex-A9 core and a typical memory system.
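
The trace-weighting estimate described above can be sketched as follows. This is a minimal illustration rather than the tooling actually used: the function name `estimate_cycles`, the instruction classes, and the latency values are all hypothetical placeholders and are not the values from Table 1.

```python
from collections import Counter

# Hypothetical per-class latencies in cycles.
# Placeholders for illustration only, NOT the values from Table 1.
LATENCY = {"alu": 1, "load": 4, "store": 2, "branch": 3}


def estimate_cycles(trace, latency):
    """Return the sum of the latencies of all instructions in the trace.

    `trace` is a sequence of instruction-class names, one per executed
    instruction; `latency` maps each class to its cost in cycles.
    """
    counts = Counter(trace)
    unknown = set(counts) - set(latency)
    if unknown:
        # Fail loudly instead of silently treating unknown classes as free.
        raise KeyError(f"no latency for instruction classes: {sorted(unknown)}")
    return sum(n * latency[cls] for cls, n in counts.items())


# A made-up six-instruction trace.
trace = ["alu", "load", "alu", "store", "branch", "load"]
total = estimate_cycles(trace, LATENCY)
print(total)  # 2*1 + 2*4 + 1*2 + 1*3 = 15
```

A plain sum of this kind treats every instruction as if it executed in isolation, so it ignores overlap between instructions and any dependence of memory latency on access pattern. This is one reason the resulting numbers should be read only as very rough estimates.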