Results
We model the task as a simple Markov Decision Process (MDP) whose different paths parallel the range of observed behaviours, from sign-tracking (engaging with the lever as soon as it appears) to goal-tracking (engaging with the food magazine as soon as the lever-CS appears) (see Figure 1).
The computational model (see Figure 2) consists of two learning
systems that employ distinct mechanisms to learn the same task: (1)
a Model-Based system, which learns the structure of the task and
infers values from it; and (2) a Feature-Model-Free system, in which values
for the relevant stimuli (the lever-CS and the food magazine) are learned
directly by trial and error using reward prediction errors (RPEs). The
values produced by each system are then weighted by a parameter v before
being fed into a classical softmax action-selection mechanism (see Methods).
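The integration step above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the action set, the numeric values, and the direction of the v-weighting (v multiplying the Feature-Model-Free values) are all assumptions made for the example.

```python
import math

def softmax(values, beta=3.0):
    """Classical softmax action selection: returns choice probabilities."""
    m = max(values)  # shift by the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action values, for illustration only (actions: [lever, magazine]).
q_mb  = [0.2, 0.8]   # Model-Based system's inferred values
q_fmf = [0.8, 0.2]   # Feature-Model-Free system's learned stimulus values

def combined_policy(v, beta=3.0):
    # Weighted integration of the two systems' values by the parameter v,
    # then softmax over the combined values.
    q = [v * f + (1.0 - v) * m for f, m in zip(q_fmf, q_mb)]
    return softmax(q, beta)
```

With the inverse temperature beta controlling how deterministically the combined values translate into choices, the policy remains a proper probability distribution for any v in [0, 1].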
An important feature of the model is that varying the system-weighting
parameter v alone (while sharing all other parameter values of the
model across subgroups) is sufficient to qualitatively
reproduce the characteristic behaviours of the different subgroups of rats
observed experimentally in these studies.
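The role of v can be illustrated with a toy simulation. The numbers below are invented for the example, under the assumption that the Feature-Model-Free system comes to favour the lever-CS while the Model-Based system favours the magazine; the point is only that changing v, with everything else fixed, flips the preferred response.

```python
import math

def softmax(values, beta=5.0):
    """Softmax choice probabilities with inverse temperature beta."""
    m = max(values)
    exps = [math.exp(beta * (q - m)) for q in values]
    s = sum(exps)
    return [e / s for e in exps]

# Toy values (actions: [lever, magazine]), illustrative only: the
# Feature-Model-Free system is assumed to value the lever-CS highly,
# the Model-Based system the food magazine.
q_fmf = [0.8, 0.2]
q_mb  = [0.2, 0.8]

def preferred_response(v):
    """Most probable response when only the weighting parameter v changes."""
    q = [v * f + (1.0 - v) * m for f, m in zip(q_fmf, q_mb)]
    p = softmax(q)
    labels = ("lever (sign-tracking)", "magazine (goal-tracking)")
    return labels[p.index(max(p))]

# Sliding v from 0 to 1 moves the simulated animal from goal-tracking
# to sign-tracking, with all other parameters held constant.
```

Under these assumed values, v = 0 yields a goal-tracker and v = 1 a sign-tracker, mirroring the qualitative subgroup differences described above.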