1. Introduction
Controlling a nonlinear dynamic system is complex because stochastic effects are present, the initial condition may differ from its expected value, the system model may be imperfect, and there may be external disturbances acting on the dynamic process. In addition, there may be measurement errors, and full state measurement may not be possible. Real-time exact optimal control of an actual control system is therefore not possible; only approximate or suboptimal solutions are attainable.
RL has the potential to address approximate optimal control [1] of high-dimensional control problems. However, to ensure that the optimization problem is well founded, most RL algorithms place a strong constraint on the structure of the environment by assuming that it operates as an MDP [2]. In our view, modeling the environment as an MDP severely limits the scope of application of RL methods to control. Typically, an MDP assumes a single agent operating in a stationary environment, making the framework grossly inadequate for control problems where the assumption of a stationary environment may not be valid.
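For concreteness, the single-agent setting assumed by most of these algorithms can be summarized as follows (the notation $s$, $a$, $r$, $\gamma$, $\alpha$ is ours, introduced only for illustration): the optimal action-value function of an MDP satisfies the Bellman optimality equation
$$Q^*(s,a) \;=\; \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1},a') \;\middle|\; s_t = s,\ a_t = a \right],$$
and can be estimated from experience with the familiar Q-learning update
$$Q(s_t,a_t) \;\leftarrow\; Q(s_t,a_t) + \alpha \left[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t) \right].$$
Standard RL in this sense is reviewed more fully in Section 2.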
A game-based RL approach generalizes the MDP to a multi-agent setting by allowing competing agents. While in certain applications an MDP setup may be appropriate, game-based RL provides an alternative framework for adaptive optimal control of complex nonlinear systems affected by noise and external disturbances. An important advantage of viewing the controller optimization problem as a game is the realization of “safe” controllers, i.e., controllers whose performance is independent of the disturber. We regard the game-based RL formulation as an additional tool in the control system designer’s toolkit for improving controller performance.
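One standard way to make this view concrete (again under illustrative notation of our own) is as a two-player zero-sum Markov game in which the controller chooses action $a$ and the disturber chooses action $o$. The controller's guaranteed, disturber-independent performance at a state is then the minimax (security) value
$$V(s) \;=\; \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi(a)\, Q(s,a,o),$$
where the controller is allowed a mixed strategy $\pi$ over its actions. This is the sense in which a game-based controller can be called “safe”: its value holds no matter how the disturber plays. The precise formulation used by the approaches surveyed later may differ in detail.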
This paper focuses on a class of algorithms that infuse game-theoretic aspects into RL-based controller design for improved performance against disturbances. The key motivation behind these game-theory-inspired RL approaches is controller optimization in the face of worst-case disturbances, i.e., an attempt at designing what may be called “risk-averse RL controllers”. Section 2 presents a brief overview of standard reinforcement learning to help the reader follow the techniques introduced in later sections. For a detailed and exhaustive treatment of RL, the reader is referred to the excellent survey by Kaelbling et al. [3] or the book by Sutton [4].
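As an indicative example of this class (not necessarily one of the specific algorithms reviewed in Section 3), Littman-style minimax-Q learning replaces the max in the Q-learning update with the minimax value of the stage game: with the controller's action $a$, the disturber's action $o$, and $V(s')$ the security value sketched above,
$$Q(s,a,o) \;\leftarrow\; Q(s,a,o) + \alpha \left[\, r + \gamma\, V(s') - Q(s,a,o) \right].$$
The learned policy thus hedges against the worst-case disturbance at every step, which is what is meant here by a risk-averse RL controller.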
Section 3 describes how a two-player zero-sum Markov game framework fits the controller optimization problem in the presence of noise and external disturbances, and the advantages the Markov game formulation offers. Thereafter, we discuss the pros and cons of using function approximation in RL for dealing with large or continuous state-space problems, and its ramifications for game-theory-based RL. We also give a short discussion of multiagent RL in partially observable domains. We conclude the section by describing some Markov-game-based applications. Section 4 concludes the paper with a discussion of open problems confronting game-based RL and outlines future research directions. The paper assumes reader familiarity with reinforcement learning concepts and terminology, and zeroes in on Markov games as an interesting avenue for research.