In this paper we propose a robust formulation for discrete-time dynamic programming (DP). The
objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy
to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a
set of conditional measures with each state-action pair. Consequently, in the robust formulation each
policy has a set of measures associated with it. We prove that when this set of measures has a certain
“rectangularity” property, all the main results for finite- and infinite-horizon DP extend to natural robust
counterparts. We identify families of sets of conditional measures for which the computational complexity
of solving the robust DP is only modestly larger than that of solving the DP, with the increase typically logarithmic in the size
of the state space. These families of sets are constructed from the confidence regions associated with
density estimation and can therefore be chosen to guarantee any desired level of confidence in the robust
optimal policy. Moreover, the sets can be easily parameterized from historical data. We contrast the
performance of robust and non-robust DP on small numerical examples.
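
To make the contrast between robust and non-robust DP concrete, the following is a minimal sketch of infinite-horizon robust value iteration. It rests on assumptions not stated in the text above: rewards are maximized, the set of conditional measures attached to each state-action pair is a finite list of candidate distributions, and the worst case is taken independently for each pair, which is one simple instance of the rectangularity property. The function names, the two-state toy problem, and all numbers are hypothetical illustrations, not the families of sets or the examples studied in the paper.

```python
def robust_backup(values, rewards, candidates, discount):
    """One robust Bellman backup.

    rewards[s][a]    : immediate reward for action a in state s
    candidates[s][a] : list of candidate next-state distributions for (s, a);
                       a single-element list recovers the ordinary DP backup
    Returns q[s][a], the reward plus the discounted worst-case expected value.
    """
    q = []
    for s in range(len(rewards)):
        row = []
        for a in range(len(rewards[s])):
            # Adversary picks the candidate distribution with the lowest
            # expected continuation value for this state-action pair.
            worst = min(
                sum(p * v for p, v in zip(dist, values))
                for dist in candidates[s][a]
            )
            row.append(rewards[s][a] + discount * worst)
        q.append(row)
    return q


def robust_value_iteration(rewards, candidates, discount=0.9,
                           tol=1e-10, max_iter=100_000):
    """Iterate the robust backup until successive value vectors differ by < tol."""
    values = [0.0] * len(rewards)
    for _ in range(max_iter):
        q = robust_backup(values, rewards, candidates, discount)
        new_values = [max(row) for row in q]
        gap = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    # Greedy policy with respect to the converged values.
    q = robust_backup(values, rewards, candidates, discount)
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return values, policy


# Hypothetical toy problem: state 0 is "good", state 1 is absorbing with no reward.
# Action 0 ("safe") earns 1 and stays in state 0; action 1 ("risky") earns 2
# but may fall into state 1.
rewards = [[1.0, 2.0], [0.0, 0.0]]
nominal = [
    [[[1.0, 0.0]], [[0.95, 0.05]]],   # state 0: safe, risky (point estimates)
    [[[0.0, 1.0]], [[0.0, 1.0]]],     # state 1: absorbing under both actions
]
ambiguous = [
    [[[1.0, 0.0]], [[0.95, 0.05], [0.6, 0.4]]],  # risky transition is uncertain
    [[[0.0, 1.0]], [[0.0, 1.0]]],
]

nominal_values, nominal_policy = robust_value_iteration(rewards, nominal)
robust_values, robust_policy = robust_value_iteration(rewards, ambiguous)
print("nominal:", nominal_values, nominal_policy)
print("robust: ", robust_values, robust_policy)
```

In this toy problem the nominal policy takes the risky action in state 0, while the robust policy switches to the safe action once the risky transition is allowed to be as unfavorable as the second candidate distribution. Because the nominal model is one of the candidates, the robust value can never exceed the nominal value.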