In the performance-oriented design of Ipek et al. [17], the immediate reward function is chosen solely on the basis of expert intuition. Since the memory throughput (and ultimately the execution time) of a memory-bound application tends to correlate strongly with effective data bus utilization, the authors simply assign an immediate reward of 1 to a read or write DRAM command, and an immediate reward of 0 to any other DRAM command. Unfortunately, this approach does not generalize easily: in a design that seeks to optimize a more sophisticated function (e.g., Et² or weighted speedup), an appropriate immediate reward function is not at all evident.
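
As an illustration, the binary reward described above can be sketched in a few lines of Python. The `DramCommand` enumeration, its member names, and the helper `total_reward` are hypothetical and introduced here only for the example; they are not taken from [17], and the set of commands in a real controller may differ.

```python
from enum import Enum, auto


class DramCommand(Enum):
    """Hypothetical set of DRAM commands (an assumption for this sketch)."""
    ACTIVATE = auto()
    PRECHARGE = auto()
    READ = auto()
    WRITE = auto()
    REFRESH = auto()
    NOP = auto()


# Commands that move data over the data bus; only these earn a reward.
DATA_BUS_COMMANDS = frozenset({DramCommand.READ, DramCommand.WRITE})


def immediate_reward(command: DramCommand) -> int:
    """Return 1 for a read or write command and 0 for any other command."""
    return 1 if command in DATA_BUS_COMMANDS else 0


def total_reward(commands) -> int:
    """Sum the immediate rewards over a sequence of issued commands."""
    return sum(immediate_reward(c) for c in commands)


if __name__ == "__main__":
    # A made-up command sequence, used only to exercise the functions.
    trace = [
        DramCommand.ACTIVATE,
        DramCommand.READ,
        DramCommand.READ,
        DramCommand.PRECHARGE,
        DramCommand.ACTIVATE,
        DramCommand.WRITE,
    ]
    print(total_reward(trace))
```

Under this reward, the accumulated total simply counts the commands that use the data bus, which is why it serves as a proxy for bus utilization. No comparably direct mapping from individual commands to a metric such as weighted speedup is apparent, which is the generalization problem noted above.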