If the actual computational process in a decision-maker is similar to that employed in the standard Q-learning model, i.e., the value of the unchosen option remains unchanged, better predictions could be achieved by constructing the regressors with a different clock for each option. Specifically, such a model should include variables that represent the reward or choice $n$ trials back, where $n$ is counted only over the trials in which that option was chosen, rather than over all trials (as in the method discussed in this paper).
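As an illustration of such option-specific clocks, the following Python sketch builds lagged reward regressors in which the lag counter for each option advances only on trials where that option was chosen. The function name, the integer coding of choices, and the n_lags parameter are illustrative assumptions, not part of the model described above; analogous choice regressors could be built the same way.

import numpy as np

def option_clock_regressors(choices, rewards, n_lags=3):
    # Hypothetical sketch: lagged reward regressors under an
    # option-specific clock. X[t, n-1] holds the reward obtained on
    # the n-th most recent past trial on which the option chosen on
    # trial t was chosen; NaN where no such trial exists yet.
    n_trials = len(choices)
    X = np.full((n_trials, n_lags), np.nan)
    history = {}  # per-option list of past rewards, most recent last
    for t in range(n_trials):
        a = choices[t]
        past = history.setdefault(a, [])
        for n in range(1, min(n_lags, len(past)) + 1):
            X[t, n - 1] = past[-n]  # n choices of option a back
        past.append(rewards[t])    # advance this option's clock
    return X

# Example: on trial 3 (option 0 chosen), lag 1 refers to trial 2 and
# lag 2 to trial 0, skipping trial 1 on which option 1 was chosen.
# choices = np.array([0, 1, 0, 0, 1]); rewards = np.array([1, 0, 0, 1, 1])
# option_clock_regressors(choices, rewards)[3]  ->  [0., 1., nan]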
However, for more general cases ($\alpha_F \neq \alpha_L$, $\alpha_F \neq 0$), mapping the RL model to the regression model is not straightforward.