In timed, zero-sum games, the goal is to maximize the probability
of winning, which is not necessarily the same as maximizing
our expected reward. We consider cumulative intermediate
reward to be the difference between our score and
our opponent’s score; the “true” reward of a win, loss, or
tie is determined at the end of a game by applying a threshold
function to the cumulative intermediate reward.
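For instance, writing rew for the cumulative intermediate reward (our score minus the opponent's score), one natural threshold function, given here only as an illustration, awards +1 for a win, 0 for a tie, and -1 for a loss:

\[
f(\mathit{rew}) =
\begin{cases}
+1 & \text{if } \mathit{rew} > 0 \quad \text{(win)} \\
\phantom{+}0 & \text{if } \mathit{rew} = 0 \quad \text{(tie)} \\
-1 & \text{if } \mathit{rew} < 0 \quad \text{(loss)}
\end{cases}
\]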
We introduce thresholded-rewards problems to capture this dependency of the final reward outcome on the cumulative intermediate reward. Thresholded-rewards problems arise in many real-world stochastic planning domains, especially zero-sum games, in which both time and score must be considered. We investigate the application of thresholded rewards to finite-horizon Markov Decision Processes (MDPs). In general, the
optimal policy for a thresholded-rewards MDP will be nonstationary,
depending on the number of time steps remaining
and the cumulative intermediate reward. We introduce
an efficient value iteration algorithm that solves thresholded-rewards MDPs exactly, but with running time quadratic in the number of states in the MDP and the length of the time
horizon. We investigate a number of heuristic-based techniques
that efficiently find approximate solutions for MDPs
with large state spaces or long time horizons.
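The following is a minimal sketch of the exact approach, with hypothetical names (solve_thresholded_mdp, transitions, reward, threshold) and an assumed interface for the MDP, not the authors' implementation: value iteration is carried out over states augmented with the number of steps remaining and the cumulative intermediate reward, and the threshold function is applied only when time runs out.

# A minimal sketch (hypothetical names) of exact value iteration for a
# thresholded-rewards MDP.  We assume the MDP is given by callables
# transitions(s, a) -> [(next_state, prob), ...] and reward(s, a, s2) -> the
# intermediate reward, with threshold(r) mapping the final cumulative
# intermediate reward to the true reward (e.g. +1 / 0 / -1).

from functools import lru_cache

def solve_thresholded_mdp(actions, transitions, reward, threshold, start, horizon):
    @lru_cache(maxsize=None)
    def value(state, steps_left, cum_reward):
        # With no time remaining, the true reward is the thresholded score difference.
        if steps_left == 0:
            return threshold(cum_reward)
        # Otherwise, choose the action maximizing expected final (thresholded) reward.
        return max(
            sum(p * value(s2, steps_left - 1, cum_reward + reward(state, a, s2))
                for s2, p in transitions(state, a))
            for a in actions(state)
        )

    # The optimal policy is nonstationary: the maximizing action depends on both
    # steps_left and cum_reward, not only on the underlying MDP state.
    return value(start, horizon, 0)

Enumerating every reachable combination of state, steps remaining, and cumulative intermediate reward is what makes the exact solution expensive, which motivates the heuristic approximations mentioned above for MDPs with large state spaces or long time horizons.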