In principle, the reward values could be re-calibrated periodically as the application moves through different execution phases; the hardware would then have to re-learn these rewards on the fly. We are currently investigating this direction, but in this paper we confine our solution to static rewards learned offline, which still yields good results and simplifies the design of the scheduler (as we will see, the reward structure is just a small table).
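To make the notion of a static reward table concrete, the following is a minimal sketch of one possible layout, assuming rewards indexed by core type and task class; the dimensions, names, and values are illustrative placeholders chosen for this example, not the actual rewards learned offline in our evaluation.

```c
/* Sketch only: a static reward table learned offline and compiled into
 * the scheduler. Re-calibrating rewards at phase changes would amount
 * to rewriting this table at run time. All indices and values below
 * are hypothetical placeholders. */
#include <stdio.h>

#define NUM_CORE_TYPES   2   /* e.g., big vs. little core (assumed) */
#define NUM_TASK_CLASSES 4   /* e.g., compute/memory/I-O/idle (assumed) */

static const float reward[NUM_CORE_TYPES][NUM_TASK_CLASSES] = {
    { 0.9f, 0.4f, 0.2f, 0.0f },   /* big core    */
    { 0.5f, 0.6f, 0.3f, 0.1f },   /* little core */
};

/* Scheduling decisions reduce to a table lookup. */
static float lookup_reward(int core_type, int task_class)
{
    return reward[core_type][task_class];
}

int main(void)
{
    /* Example: reward for a memory-bound task (class 1) on a little core (type 1). */
    printf("reward = %.2f\n", lookup_reward(1, 1));
    return 0;
}
```

Because the table is small and read-only, the lookup adds negligible latency and area to the scheduler, which is the main design benefit of keeping the rewards static.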