On the other hand, other aspects of this reward assignment are less obvious. For example, the specific ratios among the reward values for the different actions are counterintuitive. It is intriguing that, even though the objective function is purely performance, a PwDn-PwUp action sequence yields a slightly positive aggregate reward (-0.27 + 0.3 = +0.03), even though powering up a dormant rank incurs a penalty of 4-13 cycles. Note also that NoOp, which competes with PwUp when a rank is powered down, is assigned a clearly positive reward, even though keeping a rank powered down does not directly benefit performance. We will revisit this later.