In the realm of highly parallel code, and even to some extent for moderately parallel code,
advanced compilers have created more instruction-level parallelism. But for slightly parallel
code, most advances in optimizing compilers [1] have actually reduced the amount of
instruction-level parallelism. For example, common subexpression elimination, code motion of
loop invariants, induction variable elimination, and elimination of redundant loads and stores all
reduce redundant computation. Yet computing something redundantly (e.g., twice in a basic
block) clearly increases instruction-level parallelism, because the duplicate computations are
independent of each other! In our experience with slightly parallel code, only tree height
reduction and strength reduction provide added instruction-level parallelism.
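
As a rough illustration of this effect, the following Python sketch measures instruction-level parallelism as the number of instructions in a basic block divided by the length of its critical path. The unit-latency assumption, the `ilp` helper, and the four example blocks are hypothetical and chosen only for illustration; they are not taken from the measurements discussed here.

```python
def ilp(block):
    """Instructions divided by critical-path length, assuming unit latency.

    `block` is a list of (dest, srcs) pairs in program order. This is an
    illustrative metric, not the measurement method of the text.
    """
    depth = {}  # result name -> cycle at which the result becomes available
    for dest, srcs in block:
        # Operands not defined inside the block (inputs) are ready at cycle 0.
        depth[dest] = 1 + max((depth.get(s, 0) for s in srcs), default=0)
    return len(block) / max(depth.values())


# Redundant computation: a+b is evaluated twice.
redundant = [
    ("t1", ("a", "b")),
    ("t2", ("a", "b")),
    ("t3", ("t1", "c")),
    ("t4", ("t2", "d")),
]

# Same block after common subexpression elimination.
after_cse = [
    ("t1", ("a", "b")),
    ("t3", ("t1", "c")),
    ("t4", ("t1", "d")),
]

# a+b+c+d evaluated as a serial chain: ((a+b)+c)+d.
chain = [
    ("t1", ("a", "b")),
    ("t2", ("t1", "c")),
    ("t3", ("t2", "d")),
]

# Same sum after tree height reduction: (a+b)+(c+d).
balanced = [
    ("t1", ("a", "b")),
    ("t2", ("c", "d")),
    ("t3", ("t1", "t2")),
]

for name, block in [("redundant", redundant), ("after_cse", after_cse),
                    ("chain", chain), ("balanced", balanced)]:
    print(name, ilp(block))
```

Under these assumptions, eliminating the common subexpression lowers the measured parallelism from 2.0 to 1.5 while leaving the critical path at two cycles, whereas tree height reduction raises it from 1.0 to 1.5 by shortening the critical path from three cycles to two.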