II BACKGROUND THEORY
2.1 Markov Processes
2.1.1 Discrete-Time Markov Chain
2.1.2 Markov Decision Process
2.2 Reinforcement Learning
2.2.1 Monte Carlo Method
2.2.2 Monte Carlo Estimation of Action Values
2.2.3 Monte Carlo Control
2.3 On-Policy Monte Carlo Method