Discussion
We tested several mechanisms from the current literature on
modelling individual variation in the form of Pavlovian conditioned
responses (CRs) that emerge under a classical autoshaping procedure,
namely sign-tracking (ST) vs goal-tracking (GT), and the role of
dopamine in both the acquisition and the expression of these CRs.
Drawing on a rich data set, we identified key mechanisms that are
sufficient to account for specific properties of the observed
behaviours. The resulting model relies on two major concepts: dual
learning systems and factored representations. Figure 4 summarizes
the role of each mechanism in the model.

Conclusion
Here we have presented a model that accounts for variations in
the form of Pavlovian conditioned approach behaviour seen
during autoshaping in rats, that is, the development of a sign-tracking
vs goal-tracking CR. This work adds to an emerging set
of studies suggesting the presence and collaboration of multiple RL
systems in the brain. It questions the classical paradigm of state
representation and suggests that further investigation of factored
representations in RL models of Pavlovian and instrumental
conditioning experiments may be useful.

Methods
Modelling the autoshaping experiment
In classical reinforcement learning theory [1], tasks are
usually described as Markov Decision Processes (MDPs). As the
proposed model is based on RL algorithms, we use the MDP
formalism to describe computationally the Pavlovian autoshaping
procedure used in all simulations.
An MDP describes the interactions of an agent with its
environment and the rewards it might receive. An agent in a
state s can execute an action a, which results in a new state s’
and possibly a reward r. More precisely, an agent can be in a
finite set of states S, in which it can perform a finite set of
discrete actions A, the consequences of which are defined by a
transition function $T: S \times A \to \Pi(S)$, where $\Pi(S)$ denotes
the probability distributions over S, so that $T(s,a)$ gives the
probability $P(s' \mid s, a)$ of reaching state s’ by doing action a
in state s. Additionally, the reward function $R: S \times A \to \mathbb{R}$
gives the reward $R(s,a)$ for doing action a in state s. Importantly,
MDPs should theoretically comply with the Markov property: the
probability of reaching state s’ should depend only on the last state
s and the last action a. An MDP is said to be episodic if it includes
at least one state that terminates the current episode.
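The formal definition above can be mirrored by a small Python container. This is a minimal sketch, assuming a dictionary encoding of T (each (s, a) pair maps to a distribution over successor states) and of R; the class name `MDP`, its methods `check` and `step`, and the toy two-state example are our own illustrative choices, not code from the study.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MDP:
    """Finite MDP (S, A, T, R); it is episodic if some states end the episode."""
    states: set
    actions: set
    T: dict  # T[(s, a)] = {s_next: P(s_next | s, a), ...}
    R: dict  # R[(s, a)] = reward for doing a in s; missing pairs give 0
    terminal: set = field(default_factory=set)

    def is_episodic(self) -> bool:
        return bool(self.terminal)

    def check(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless every T(s, a) is a probability distribution over S."""
        for (s, a), dist in self.T.items():
            if s not in self.states or a not in self.actions:
                raise ValueError(f"unknown pair {(s, a)}")
            if not set(dist) <= self.states:
                raise ValueError(f"T{(s, a)} leads outside S")
            if any(p < 0 for p in dist.values()):
                raise ValueError(f"T{(s, a)} has a negative probability")
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"T{(s, a)} does not sum to 1")

    def step(self, s, a, rng=random):
        """Sample s' from P(. | s, a); only (s, a) matter (Markov property)."""
        dist = self.T[(s, a)]
        s_next = rng.choices(list(dist), weights=list(dist.values()))[0]
        return s_next, self.R.get((s, a), 0.0), s_next in self.terminal


# Hypothetical two-state example, for illustration only.
toy = MDP(states={"x", "y"}, actions={"go"},
          T={("x", "go"): {"y": 1.0}, ("y", "go"): {"x": 0.5, "y": 0.5}},
          R={("x", "go"): 1.0}, terminal={"y"})
toy.check()
print(toy.step("x", "go"))
```

Here `step` returns the sampled successor, the reward and whether the episode has ended; because `toy` has a terminal state, it is episodic in the sense defined above.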
Figure 1 shows the deterministic MDP used to simulate the
autoshaping procedure. Given the variable time schedule (30–150 s)
and the clear difference in behaviour observed during inter-trial
intervals, we can reasonably assume that each experimental trial
can be simulated as a finite-horizon episode.

The agent starts from an empty state (s0) where there is nothing
to do but explore. At some point the lever appears (s1) and the
agent must make a critical choice: it can either go to the lever (s2)
and engage with it (s5), go to the magazine (s4) and engage with it
(s7), or just keep exploring (s3, s6). At some point, the lever is
retracted and food is delivered. If the agent is far from the
magazine (s5, s6), it first needs to get closer. Once close (s7), it
consumes the food. The trial ends in the empty state (s0), which
symbolizes the start of the inter-trial interval (ITI): no food, no
lever, and an empty but still present magazine.
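The trial structure just described can be written down as a deterministic transition table and rolled out under each possible choice at s1. This is a minimal sketch of our reading of Figure 1: the state labels follow the text, but the action names (`go_lever`, `go_magazine`, `engage`, `explore`, `eat`), the reward value of 1.0 and the helper `run_episode` are hypothetical. We read the states far from the magazine after lever retraction as s5 (engaged with the lever) and s6 (still exploring).

```python
# Hypothetical encoding of the deterministic autoshaping MDP of Figure 1.
# State labels follow the text; action names and the reward of 1.0 are
# illustrative assumptions, not taken from the authors' code.
TRANSITIONS = {
    ("s0", "explore"): "s1",      # ITI ends, the lever appears
    ("s1", "go_lever"): "s2",     # critical choice at lever appearance
    ("s1", "go_magazine"): "s4",
    ("s1", "explore"): "s3",
    ("s2", "engage"): "s5",       # engaged with the lever
    ("s4", "engage"): "s7",       # engaged with the magazine
    ("s3", "explore"): "s6",
    ("s5", "go_magazine"): "s7",  # lever retracted, food delivered:
    ("s6", "go_magazine"): "s7",  # agents far from the magazine approach it
    ("s7", "eat"): "s0",          # food consumed, back to the empty ITI state
}
REWARD = {("s7", "eat"): 1.0}

# Only s1 offers a real choice; every other state has a single action.
BASE_POLICY = {"s0": "explore", "s2": "engage", "s3": "explore",
               "s4": "engage", "s5": "go_magazine", "s6": "go_magazine",
               "s7": "eat"}


def run_episode(policy, start="s0", horizon=20):
    """Roll out one trial as a finite-horizon episode under a deterministic
    policy; the episode ends as soon as the agent is back in s0."""
    s, path, total = start, [start], 0.0
    for _ in range(horizon):
        a = policy[s]
        total += REWARD.get((s, a), 0.0)
        s = TRANSITIONS[(s, a)]
        path.append(s)
        if s == "s0":
            break
    return path, total


for choice in ("go_lever", "go_magazine", "explore"):
    path, ret = run_episode({**BASE_POLICY, "s1": choice})
    print(f"{choice:12s} {' -> '.join(path)}  reward={ret}")
```

Every choice at s1 eventually leads to the food, but the lever path passes through s2 and s5 (five transitions per trial), whereas the magazine path reaches s7 directly from s4 (four transitions).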
The MDP in Figure 1 is common to all simulations and
independent of the reinforcement learning systems we use. STs
should favour the red path, while GTs should favour the shorter
blue path. All of the results depend mainly on the action taken at
lever appearance (s1): going to the lever, going to the magazine,
or exploring. Exploring can be understood as going neither to the
lever nor to the magazine.
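To see what "shorter" means for a reward-maximizing learner, one can run standard value iteration on the same transition structure and compare the values of the three actions at s1. This is an illustrative sketch only, not the dual-system model of the paper: the discount factor `GAMMA = 0.9`, the reward of 1.0, the action names and the helpers `q_value` and `value_iteration` are assumptions, and returning to s0 is treated as the end of the trial.

```python
# Illustrative only: value iteration on our reading of the Figure 1 MDP.
# GAMMA, the reward of 1.0 and the action names are assumptions; this is
# not the dual-system model of the paper.
NEXT = {
    ("s0", "explore"): "s1",
    ("s1", "go_lever"): "s2", ("s1", "go_magazine"): "s4", ("s1", "explore"): "s3",
    ("s2", "engage"): "s5", ("s4", "engage"): "s7", ("s3", "explore"): "s6",
    ("s5", "go_magazine"): "s7", ("s6", "go_magazine"): "s7",
    ("s7", "eat"): "s0",
}
REWARD = {("s7", "eat"): 1.0}
GAMMA = 0.9


def q_value(V, s, a):
    """One-step lookahead; returning to s0 ends the trial, so no bootstrap."""
    s_next = NEXT[(s, a)]
    r = REWARD.get((s, a), 0.0)
    return r if s_next == "s0" else r + GAMMA * V[s_next]


def value_iteration(n_iter=50):
    V = {s: 0.0 for s, _ in NEXT}
    for _ in range(n_iter):
        # Synchronous Bellman optimality backup over the actions available in s.
        V = {s: max(q_value(V, s, a) for (src, a) in NEXT if src == s) for s in V}
    return V


V = value_iteration()
for a in ("go_lever", "go_magazine", "explore"):
    print(a, round(q_value(V, "s1", a), 3))
```

Under these assumptions, going to the magazine at s1 is worth 0.81, while going to the lever or exploring is worth 0.729, because food arrives one step later on those paths. A purely discounted-value agent of this kind therefore prefers the blue path; the sketch by itself does not produce sign-tracking.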
To fit the requirements of the MDP framework, we
introduce two limitations in our description, which also simplify
our analyses. We assume that the agent engages with at most one
stimulus at a time, and we make no use of the precise timing of
the procedure (neither the ITI duration nor the CS duration) in our
simulations.
