3 A GENERAL FRAMEWORK FOR SELF-OPTIMIZING MEMORY SCHEDULERS
In this section, we describe how to generalize Ipek et al.’s original RL-based memory scheduler design [17] to obtain high-quality schedulers that can target arbitrary objective functions, not just performance. We first present our design approach and then describe a practical implementation.
3.1 Design
We now determine the three main characteristics of our RL-based design: actions, state attributes, and reward structure.
---------------------------
1 Not all pending requests will have actions available at any point in time: for example, if a row has not yet been activated, a read to that row is not an available action. Even among available actions, only a subset may be evaluated in order to reach a decision every DRAM cycle.
---------------------------
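The availability rule in the footnote can be illustrated with a minimal sketch. It is not the scheduler described in this paper: the names `Request`, `available_actions`, `open_rows`, `busy_banks`, and `max_evaluated` are hypothetical, DRAM timing constraints are reduced to a simple set of busy banks, and the subset of actions to evaluate is assumed to be the first few in queue order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Request:
    """A pending memory request (hypothetical, simplified representation)."""
    bank: int
    row: int
    is_write: bool


def available_actions(
    pending: List[Request],
    open_rows: Dict[int, int],
    busy_banks: Set[int],
    max_evaluated: Optional[int] = None,
) -> List[Tuple[str, Request]]:
    """Return the (command, request) pairs that could be issued this cycle.

    open_rows maps a bank to its currently activated row; a bank that is
    absent from the map is assumed to be precharged (no open row).
    busy_banks is an assumed stand-in for timing constraints: a request to
    a busy bank has no available action this cycle.
    """
    actions: List[Tuple[str, Request]] = []
    for req in pending:
        if req.bank in busy_banks:
            continue  # no action available for this request right now
        open_row = open_rows.get(req.bank)
        if open_row is None:
            cmd = "ACTIVATE"   # row not yet activated: a read is not available
        elif open_row == req.row:
            cmd = "WRITE" if req.is_write else "READ"
        else:
            cmd = "PRECHARGE"  # a different row is open and must be closed first
        actions.append((cmd, req))
    # Only a subset may be evaluated in time to decide within one DRAM cycle.
    if max_evaluated is not None:
        actions = actions[:max_evaluated]
    return actions


pending = [
    Request(bank=0, row=5, is_write=False),
    Request(bank=1, row=7, is_write=True),
    Request(bank=2, row=3, is_write=False),
    Request(bank=3, row=1, is_write=False),
]
candidates = available_actions(pending, open_rows={0: 5, 1: 2}, busy_banks={3})
```

In this example the request to bank 2 can only be served by an activate, since its row is not open, and the request to bank 3 contributes no action at all. Passing `max_evaluated` caps how many of the remaining candidates are considered.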