generally utilized small (8- or 16-bit), simple yet highly optimized architectures and achieved energy efficiencies of a few picojoules per cycle by operating at or below the transistor threshold voltage (VT). Successive designs improved processing power by adding hardware accelerators, either crafted for specific applications or targeting basic algorithms used by a wide range of applications [4], [5]. Evolving applications, however, are motivating the use of larger architectures, which offer larger address spaces, greater data precision, and better operating system support [6], [7].