This paper is concerned with sequential decision making in uncertain environments. Decisions are made in stages, and each decision, in addition to providing an immediate reward, changes the context of future decisions and thereby affects future rewards. Due to the uncertain nature of the environment, there is limited information about both the immediate reward from each decision and the resulting future state. To achieve good performance over all stages, the decision maker has to trade off the immediate payoff against future payoffs. Dynamic programming (DP) is the mathematical framework that allows the decision maker to efficiently compute a good overall strategy by succinctly encoding the evolving information state. In the DP formalism, the uncertainty in the environment is modeled by a Markov process whose transition probability depends both on the information state and on the action taken by the decision maker. It is assumed that the transition probability corresponding to each state-action pair is known to the decision maker, and the goal is to choose a policy, i.e., a rule that maps states to actions, that maximizes some performance measure. Puterman (1994) provides an excellent introduction to the DP formalism and its various applications. In this paper, we assume that the reader has some prior knowledge of DP.
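As a concrete illustration (using standard notation not yet introduced above), consider the discounted infinite-horizon setting with state space $S$, action space $A$, reward $r(s,a)$, transition probabilities $P(s' \mid s, a)$, and discount factor $\gamma \in [0,1)$. The optimal value function $V^*$ then satisfies the Bellman optimality equation
$$
V^*(s) \;=\; \max_{a \in A} \Big\{ r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big\}, \qquad s \in S,
$$
and an optimal policy maps each state $s$ to an action attaining this maximum, making explicit the trade-off between the immediate payoff and future payoffs.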