Abstract
Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired
behaviour and underlying physiological observations. However, recent autoshaping experiments in rats have revealed
variations in the form of Pavlovian conditioned responses (CRs) and in the associated dopamine activity, calling into
question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal,
arising from a classical Model-Free system and necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned
stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself – a lever – more
and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation.
Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers
does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor
the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for
such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can
account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to
process stimuli individually, using factored representations, can explain why classical dopaminergic patterns are
observed in some rats but not in others, depending on the CR they develop. In addition, the model can account for other
behavioural and pharmacological results obtained with the same, or similar, autoshaping procedures. Finally, the model
yields a set of experimental predictions that could be tested with a modified experimental protocol. We suggest that
factored representations deserve further investigation in computational neuroscience studies of learning.
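The core mechanism invoked above, a Model-Free temporal-difference learner that attaches values to individual stimuli rather than to whole states, can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the paper's actual model: the feature names, learning rate, discount factor, and two-step trial structure are assumptions made purely for clarity, and the Model-Based component is omitted.

```python
# Minimal sketch (assumed, illustrative) of a Model-Free TD learner over a
# factored representation: one value per stimulus (lever, food magazine)
# instead of one value per state.

ALPHA, GAMMA = 0.2, 0.9   # learning rate and discount factor (assumed)

# One value per stimulus feature instead of one value per state.
V = {"lever": 0.0, "magazine": 0.0}

def trial(engaged, reward=1.0):
    """One conditioning trial in which the rat engages a single stimulus.

    Returns the TD errors at CS onset and at food (US) delivery; these
    stand in for phasic dopamine-like signals.
    """
    # RPE at cue onset: the engaged cue's learned value arrives unexpectedly.
    rpe_cs = GAMMA * V[engaged] - 0.0
    # RPE at food delivery: reward minus what the engaged cue predicted.
    rpe_us = reward - V[engaged]
    V[engaged] += ALPHA * rpe_us
    return rpe_cs, rpe_us

# A "sign-tracker" engages the lever on every trial: with training, the
# prediction error transfers from the US to the CS, while the magazine,
# never engaged, keeps a value of zero.
for _ in range(100):
    rpe_cs, rpe_us = trial("lever")
```

Because only the engaged stimulus is updated, the two stimuli can end conditioning with very different learned values; this is the property the abstract invokes to explain why classical dopaminergic patterns appear in some rats and not others.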