In this paper we propose a robust formulation for discrete-time dynamic programming (DP). The
objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each state-action pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a certain "Rectangularity" property all the main results for finite and infinite horizon DP extend to natural robust counterparts. We identify families of sets of conditional measures for which the computational complexity of solving the robust DP is only modestly larger than solving the DP, typically logarithmic in the size of the state space. These families of sets are constructed from the confidence regions associated with density estimation, and therefore, can be chosen to guarantee any desired level of confidence in the robust optimal policy. Moreover, the sets can be easily parameterized from historical data. We contrast the
performance of robust and non-robust DP on small numerical examples.
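
To make the idea of a set of conditional measures per state-action pair concrete, the following is a minimal illustrative sketch of a robust value iteration. It is not the algorithm developed in this paper. It rests on several assumptions made purely for illustration: a reward-maximizing, discounted infinite-horizon convention; rewards that depend only on the current state and action; and ambiguity sets that are finite lists of candidate transition vectors, rather than the confidence regions described above. The names robust_value_iteration, P_sets, R, and gamma are hypothetical. Because the worst case is taken independently for each state-action pair, the sketch is rectangular by construction.

```python
import numpy as np


def robust_value_iteration(P_sets, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Illustrative robust value iteration over finite ambiguity sets.

    P_sets[s][a] : array of shape (k, n_states); each row is one candidate
                   transition distribution for the pair (s, a).
                   (Assumption: finite sets, chosen here for simplicity.)
    R            : array of shape (n_states, n_actions) of one-step rewards.
    gamma        : discount factor in [0, 1).
    Returns the robust value function and a greedy robust policy.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        for s in range(n_states):
            for a in range(n_actions):
                # Worst-case expected continuation value over the candidate
                # measures, taken separately for each state-action pair.
                worst = np.min(P_sets[s][a] @ V)
                Q[s, a] = R[s, a] + gamma * worst
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)
```

Passing ambiguity sets that each contain a single transition vector recovers ordinary (non-robust) value iteration, which gives a simple way to compare the two on a small example.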