Ipek et al.’s scheduler uses a simple ad hoc reward structure that assigns a reward of 1 to reads and writes (immediately ‘productive’ actions), and 0 otherwise. To allow the controller to use the set of state attributes most appropriate for our experimental setup (which differs from theirs), we re-run their proposed linear feature selection [17] over their six winning attributes plus 44 additional relevant attributes that we identify. By including their original attributes in the candidate set, we ensure that the resulting scheduler performs at least as well as the original. In our experiments, we refer to this configuration as Ipek.
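For concreteness, the reward structure described above can be sketched as follows; the command names are illustrative placeholders, not identifiers from the original implementation:

```python
# Sketch of the ad hoc reward structure: the agent earns a reward
# of 1 for issuing a read or write (immediately 'productive'
# actions) and 0 for any other command, e.g., activate or
# precharge. Command names here are hypothetical.

PRODUCTIVE_COMMANDS = {"read", "write"}

def reward(command: str) -> int:
    """Return 1 for reads and writes, 0 for all other commands."""
    return 1 if command in PRODUCTIVE_COMMANDS else 0
```

Under this scheme, maximizing cumulative reward pushes the learned policy toward maximizing sustained data-bus utilization, since only commands that move data are rewarded.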