Finally, the astute reader will notice that there is a “circular dependence” between automatic feature selection and automatic reward structure derivation: both search a space of completely specified memory scheduler designs. In our paper, we break this dependence by imposing a basic ad hoc reward structure during feature selection (Read = Write = 1, everything else = 0), while still using the appropriate objective function when evaluating candidate state attributes; we then use the resulting state attributes when computing the true reward structure. One could conceive of iterating over these two steps to potentially refine the outcome; for simplicity, however, we do not explore this in this work.
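
The two-phase procedure can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the command set, the candidate features `f0`..`f3`, the toy `evaluate` function (which stands in for simulating a fully specified scheduler and measuring the objective), the greedy forward selection, and the exhaustive search over a small grid of reward values. None of these are the search procedures or values used in the paper; only the ad hoc reward structure (Read = Write = 1, everything else = 0) and the order of the two phases come from the text.

```python
from itertools import product

# Hypothetical command set and candidate state attributes (illustrative only).
COMMANDS = ("Read", "Write", "Activate", "Precharge")
CANDIDATE_FEATURES = ("f0", "f1", "f2", "f3")

# Ad hoc reward structure imposed during feature selection:
# Read = Write = 1, everything else = 0.
AD_HOC_REWARDS = {"Read": 1, "Write": 1, "Activate": 0, "Precharge": 0}

# Made-up numbers that define the toy objective below.
_TOY_FEATURE_VALUE = {"f0": 3.0, "f1": 2.0, "f2": 0.5, "f3": -1.0}
_TOY_BEST_REWARDS = {"Read": 2, "Write": 1, "Activate": 0, "Precharge": 1}


def evaluate(features, rewards):
    """Toy stand-in for evaluating one fully specified scheduler design.

    In practice this would be a simulation that returns the objective
    function of interest; here it is a closed-form placeholder.
    """
    feature_score = sum(_TOY_FEATURE_VALUE[f] for f in features)
    reward_penalty = sum(
        (rewards[c] - _TOY_BEST_REWARDS[c]) ** 2 for c in COMMANDS
    )
    return feature_score - reward_penalty


def select_features(rewards, max_features=2):
    """Phase 1: greedy forward selection with the reward structure held fixed."""
    selected = []
    while len(selected) < max_features:
        remaining = [f for f in CANDIDATE_FEATURES if f not in selected]
        best = max(remaining, key=lambda f: evaluate(selected + [f], rewards))
        if evaluate(selected + [best], rewards) <= evaluate(selected, rewards):
            break  # no remaining candidate improves the objective
        selected.append(best)
    return selected


def derive_rewards(features, levels=(0, 1, 2)):
    """Phase 2: search reward values with the selected features held fixed."""
    best_score, best_rewards = None, None
    for values in product(levels, repeat=len(COMMANDS)):
        rewards = dict(zip(COMMANDS, values))
        score = evaluate(features, rewards)
        if best_score is None or score > best_score:
            best_score, best_rewards = score, rewards
    return best_rewards


def two_phase_search():
    """Run feature selection under ad hoc rewards, then derive the rewards."""
    features = select_features(AD_HOC_REWARDS)
    rewards = derive_rewards(features)
    return features, rewards
```

Iterating, as mentioned above, would amount to calling `select_features(rewards)` again with the derived reward structure and repeating until neither phase changes its output. In this toy objective the two terms do not interact, so a second pass would change nothing; with a real simulator that is not guaranteed.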