This work builds upon Ipek et al.’s RL-based framework. We propose MORSE, a systematic and general mechanism for designing self-optimizing DRAM schedulers that can target arbitrary figures of merit. We employ genetic algorithms to automatically calibrate the relative importance that the scheduler places on the different DRAM actions for a given environment and objective function (Section 3.1.2). We also employ a multi-factor variant of feature selection that takes into account first-order interactions among the system attributes that the scheduler uses to sense the system’s state at each point in time (Section 3.1.3). Importantly, the resulting hardware need not directly observe the objective function in the field: our framework requires the objective function to be observable only during training at design time (using simulation models). This allows our framework to target relatively sophisticated figures of merit that would generally be hard to measure in the field (e.g., weighted speedup). Still, the hardware can be made to allow for in-the-field reconfiguration (Section 4).
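
To make the calibration step concrete, the following is a minimal sketch of how a genetic algorithm might search for per-action importance values offline. It is illustrative only and is not the MORSE implementation: the action list, the `toy_objective` function (a stand-in for a simulator run that would report the chosen figure of merit), the target vector inside it, and all hyperparameters are assumptions introduced for this example.

```python
import random

# Hypothetical action set; the real scheduler's actions may differ.
ACTIONS = ["precharge", "activate", "read", "write", "noop"]


def toy_objective(weights):
    """Stand-in for a design-time simulation that scores a weight vector.

    Assumption: the score peaks at an arbitrary target vector. In practice
    this would be the figure of merit measured by a simulation model.
    """
    target = [0.2, 0.4, 1.0, 0.8, 0.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve(objective, n_genes, pop_size=30, generations=60,
           mutation_rate=0.2, mutation_sigma=0.1, seed=0):
    """Simple elitist genetic algorithm over weight vectors in [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the top fifth unchanged (elitism).
        ranked = sorted(population, key=objective, reverse=True)
        elite = ranked[: pop_size // 5]
        children = list(elite)
        while len(children) < pop_size:
            # One-point crossover between two distinct elite parents.
            mom, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)
            child = mom[:cut] + dad[cut:]
            # Gaussian mutation on some genes, clamped to [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, mutation_sigma)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = children
    return max(population, key=objective)


best = evolve(toy_objective, len(ACTIONS))
for action, weight in zip(ACTIONS, best):
    print(f"{action:>10}: {weight:.3f}")
```

Note that the objective is evaluated only inside `evolve`, which mirrors the point made above: the figure of merit needs to be observable during design-time training, while the deployed scheduler would only consume the resulting weights.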