1. Introduction
Controlling a nonlinear dynamic system is complex because stochastic effects are present, the initial condition may differ from its expected value, the system model may be imperfect, and there may be external disturbances acting on the dynamic process. In addition, there may be measurement errors, and full state measurement may not be possible. Real-time exact optimal control of an actual control system is therefore not possible; only approximate or suboptimal solutions are attainable.
RL has the potential to address approximate optimal control [1] of high-dimensional control problems. However, to ensure that the optimization problem is well founded, most RL algorithms place a strong constraint on the structure of the environment by assuming that it operates as an MDP [2]. In our view, modeling the environment as an MDP severely limits the scope of application of RL methods to control. Typically, an MDP assumes a single agent operating in a stationary environment, making the framework grossly inadequate for control problems where the assumption of a stationary environment may not be valid.
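For concreteness, the single-agent setting assumed by most of these algorithms can be summarized as follows (the notation $s$, $a$, $r$, $\gamma$, $\alpha$ is ours, introduced only for illustration): the optimal action-value function of an MDP satisfies the Bellman optimality equation
$$Q^*(s,a) \;=\; \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1},a') \;\middle|\; s_t = s,\ a_t = a \right],$$
and can be estimated from experience with the familiar Q-learning update
$$Q(s_t,a_t) \;\leftarrow\; Q(s_t,a_t) + \alpha \left[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t) \right].$$
Standard RL in this sense is reviewed more fully in Section 2.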
A game-based RL approach generalizes the MDP to a multi-agent setting by allowing competing agents. While in certain applications an MDP setup may be appropriate, game-based RL provides an alternative framework for adaptive optimal control of complex nonlinear systems affected by noise and external disturbances. An important advantage of viewing the controller optimization problem as a game is the realization of “safe” controllers, i.e., controllers whose performance is independent of the disturber. We regard the game-based RL formulation as an additional tool in the control system designer’s toolkit for improving controller performance.
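One standard way to make this view concrete (again under illustrative notation of our own) is as a two-player zero-sum Markov game in which the controller chooses action $a$ and the disturber chooses action $o$. The controller's guaranteed, disturber-independent performance at a state is then the minimax (security) value
$$V(s) \;=\; \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi(a)\, Q(s,a,o),$$
where the controller is allowed a mixed strategy $\pi$ over its actions. This is the sense in which a game-based controller can be called “safe”: its value holds no matter how the disturber plays. The precise formulation used by the approaches surveyed later may differ in detail.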
This paper focuses on a class of algorithms that infuse game-theoretic aspects into RL-based controller design for improved performance against disturbances. The key motivation behind these game-theory-inspired RL approaches is controller optimization in the face of worst-case disturbances, i.e., an attempt at designing what may be called “risk-averse RL controllers”. Section 2 presents a brief overview of standard reinforcement learning to help the reader follow the techniques introduced in later sections. For a detailed and exhaustive treatment of RL, the reader is referred to the excellent survey by Kaelbling et al. [3] or the book by Sutton [4].
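As an indicative example of this class (not necessarily one of the specific algorithms reviewed in Section 3), Littman-style minimax-Q learning replaces the max in the Q-learning update with the minimax value of the stage game: with the controller's action $a$, the disturber's action $o$, and $V(s')$ the security value sketched above,
$$Q(s,a,o) \;\leftarrow\; Q(s,a,o) + \alpha \left[\, r + \gamma\, V(s') - Q(s,a,o) \right].$$
The learned policy thus hedges against the worst-case disturbance at every step, which is what is meant here by a risk-averse RL controller.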
Section 3 describes how a two-player zero-sum Markov game framework fits the controller optimization problem in the presence of noise and external disturbances, and the advantages the Markov game formulation offers. Thereafter, we discuss the pros and cons of using function approximation in RL for dealing with large or continuous state-space problems, and its ramifications for game-theory-based RL. We also give a short discussion of multiagent RL in partially observable domains. We conclude the section by describing some Markov-game-based applications. Section 4 concludes the paper with a discussion of open problems confronting game-based RL and outlines future research directions. The paper assumes reader familiarity with reinforcement learning concepts and terminology, and zeroes in on Markov games as an interesting avenue for research.