exhibit poor memory locality, only a modest throughput
speedup is possible by reducing compute time. Consequently,
conventional single-threaded processors, which are optimized for
instruction-level parallelism (ILP), have low utilization and
waste power. With many threads, by contrast, it is easier to find
useful work to execute every cycle, so processor utilization
is higher and significant throughput speedups are achievable.
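
As a rough illustration of this argument, the sketch below uses a simple
model that is an assumption, not something taken from the text: each
thread alternates between a fixed number of compute cycles and a fixed
number of memory-stall cycles, and the pipeline can switch to another
ready thread at no cost. The function names and the 25/75 cycle split are
hypothetical values chosen only to make the arithmetic visible.

```python
def single_thread_speedup(compute, stall, compute_factor):
    """Throughput speedup from making compute `compute_factor` times faster
    while memory-stall time stays the same (assumed model)."""
    return (compute + stall) / (compute / compute_factor + stall)


def utilization(compute, stall, threads):
    """Fraction of cycles spent on useful work when `threads` independent
    threads are interleaved with zero switch cost (assumed model).
    Utilization cannot exceed 1.0, the point where the pipeline is saturated."""
    return min(1.0, threads * compute / (compute + stall))


if __name__ == "__main__":
    compute, stall = 25, 75  # hypothetical cycles per unit of work

    # Halving compute time barely helps when stalls dominate.
    print(f"2x faster compute: {single_thread_speedup(compute, stall, 2):.2f}x")

    # Adding threads fills the stall cycles with useful work instead.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): utilization {utilization(compute, stall, n):.2f}")
```

Under these assumed numbers, doubling compute speed raises throughput by
only about 14%, while four interleaved threads are enough to keep the
pipeline busy every cycle. Real workloads have variable stall times and
contend for shared caches and memory bandwidth, so actual gains would be
smaller than this idealized model suggests.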