Introduction
Standard Reinforcement Learning (RL) [1] is a widely used
normative framework for modelling conditioning experiments
Different RL systems, most notably Model-Based and Model-Free
systems, have often been combined to better account for a variety
of observations suggesting that multiple valuation processes coexist
in the brain [4–6]. Model-Based systems employ an explicit model
of the consequences of actions, making it possible to evaluate
situations by forward inference. Such systems best explain goal-directed
behaviours and rapid adaptation to novel or changing
environments [7–9]. In contrast, Model-Free systems do not rely
on internal models; they directly associate values with actions or states
through experience, such that higher-valued situations are favoured.
Such systems best explain habits and persistent behaviours [9–11].
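To make the distinction concrete, the sketch below contrasts the two evaluation schemes on a toy one-step task. It is a minimal illustration under assumed names and values (the task, the learning rate and the variable names are illustrative, not taken from the models discussed in this paper).

```python
import random

# Toy one-step task (illustrative only): in state 'cue', action 'approach'
# yields a food reward of 1.0, while action 'wait' yields nothing.
TRANSITIONS = {
    ('cue', 'approach'): ('food', 1.0),
    ('cue', 'wait'): ('no_food', 0.0),
}

# --- Model-Free: cache values for (state, action) pairs and update them
# --- from direct experience, without any model of the task structure.
ALPHA = 0.1  # learning rate (assumed value)
q_values = {('cue', 'approach'): 0.0, ('cue', 'wait'): 0.0}

def model_free_update(state, action):
    """Nudge the cached value towards the reward actually obtained."""
    _, reward = TRANSITIONS[(state, action)]
    q_values[(state, action)] += ALPHA * (reward - q_values[(state, action)])

# --- Model-Based: keep no cached values; evaluate an action on demand by
# --- inferring its consequence from an explicit model of the task.
def model_based_value(state, action):
    next_state, reward = TRANSITIONS[(state, action)]
    return reward  # one-step task, so forward inference stops here

for _ in range(200):  # repeated trials of random behaviour
    model_free_update('cue', random.choice(['approach', 'wait']))

print(q_values)                              # learned slowly, from experience
print(model_based_value('cue', 'approach'))  # computed immediately from the model
```

The cached Model-Free values improve only with repeated experience, whereas the Model-Based evaluation is available on demand and would adapt immediately if the model changed, mirroring the contrast between habitual and goal-directed behaviour described above.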
Of particular interest, learning in Model-Free systems relies on a
computed reinforcement signal, the reward prediction error
(RPE), which measures how much better or worse an outcome is than
expected. This signal parallels the observed shift of the response of
dopamine neurons from the time of an initially unexpected reward to
the time of the conditioned stimulus that precedes it and that, in
Pavlovian conditioning experiments, is fully predictive of the reward [12,13].
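For reference, the standard temporal-difference form of this signal, written here in generic textbook notation (the discount factor and learning rate symbols are not parameters taken from this paper), is:

```latex
\[
  \delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t),
  \qquad
  V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t ,
\]
```

where V is the estimated state value, r the obtained reward, \gamma a discount factor and \alpha a learning rate. The error is positive when the outcome is better than predicted, negative when it is worse, and vanishes once the reward is fully predicted by the preceding stimulus, which is the computational counterpart of the shift in the dopamine response described above.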
However, recent work by Flagel et al. [14] raises questions
about the exclusive use of classical Model-Free RL methods to
account for data in Pavlovian conditioning experiments. In their
autoshaping procedure, a lever-CS was presented for 8 seconds,
followed immediately by the delivery of a food pellet into an adjacent
food magazine. With training, some rats (sign-trackers; STs)
learned to rapidly approach and engage the lever-CS. However,
others (goal-trackers; GTs) learned to approach the food magazine