Efficiently utilizing off-chip DRAM bandwidth is a critical issue
in designing cost-effective, high-performance chip multiprocessors
(CMPs). Conventional memory controllers deliver relatively
low performance in part because they often employ fixed,
rigid access scheduling policies designed for average-case application
behavior. As a result, they cannot learn and optimize
the long-term performance impact of their scheduling decisions,
and cannot adapt their scheduling policies to dynamic workload
behavior.

We propose a new, self-optimizing memory controller design
that operates using the principles of reinforcement learning (RL)
to overcome these limitations. Our RL-based memory controller
observes the system state and estimates the long-term performance
impact of each action it can take. In this way, the controller
learns to optimize its scheduling policy on the fly to maximize
long-term performance. Our results show that an RL-based
memory controller improves the performance of a set of parallel
applications run on a 4-core CMP by 19% on average (up
to 33%), and it improves DRAM bandwidth utilization by 22%
compared to a state-of-the-art controller.
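
The observe, estimate, and act loop described above can be illustrated with a minimal sketch. It assumes a tabular value estimate, a SARSA-style update, and epsilon-greedy exploration; the abstract does not specify the learning algorithm, state encoding, or reward, so the class name, the command names, the parameters (`alpha`, `gamma`, `epsilon`), and the reward of 1 for a command that transfers data are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class RLSchedulerSketch:
    """Toy tabular SARSA agent. Every name and parameter here is an
    illustrative assumption, not the controller described in the abstract."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.05, seed=0):
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount applied to future reward (assumed value)
        self.epsilon = epsilon  # probability of trying a random command
        # (state, action) -> estimated long-term reward, initially 0.0
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def choose(self, state, legal_actions):
        """Pick the legal command with the highest estimated long-term value,
        exploring at random with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal_actions)
        return max(legal_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_action):
        """SARSA update: move the estimate for (state, action) toward the
        immediate reward plus the discounted estimate of the next pair."""
        target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical usage: states are opaque labels, commands are DRAM-like names.
agent = RLSchedulerSketch(epsilon=0.0)
commands = ["precharge", "activate", "read"]
agent.update("s0", "read", reward=1.0, next_state="s1", next_action="activate")
best = agent.choose("s0", commands)  # the command with the highest estimate
```

Because the update folds the discounted value of the next state into the current estimate, a command that earns no immediate reward (for example, opening a row) can still acquire a high value if it tends to lead to rewarded commands later. This is the sense in which such a controller optimizes long-term rather than immediate performance.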