3.2 Learning
Although the above models extend the scope of traditional economic models of managerial decision-making, they are agnostic about the process by which markets might reach equilibrium. Likewise, empirical applications of these models assume either that equilibrium has already been reached or, if there are multiple equilibria, that equilibrium selection has already taken place. Data from laboratory experiments, however, show a more nuanced picture: in most experiments, at least for the first few rounds, observed behavior deviates from equilibrium predictions.
3.2.2 Biological basis of learning
In numerous laboratory experiments, the general finding is that, for nontrivial games, players gradually reach equilibrium over time through some process of adaptation, typically referred to as learning (Camerer 2003, Chapter 6). A number of models of learning in games have been proposed, particularly reinforcement and belief-based models, as well as hybrid models such as experience-weighted attraction (EWA) (see Camerer and Ho 1999 and Ho et al. 2007). Hsu et al. (2010) build upon this literature by studying the neural mechanisms underlying strategic learning using functional magnetic resonance imaging (fMRI). This is a potentially fruitful endeavor, as our understanding of the neural mechanisms of learning has been revolutionized by the discovery of a class of neurons, the dopamine neurons, that appear to implement the temporal difference (TD) form of reinforcement learning. Derived from the behavioral psychology and machine learning literatures, TD learning centers on the computation and updating of a reward prediction error (RPE), whereby organisms (in this case players) learn from the discrepancy between what is expected to happen and what actually happens (Sutton and Barto 1981). This literature includes a number of recent papers implicating such dopaminergic regions in decisions under risk and uncertainty (e.g., Fiorillo et al. 2003). More importantly from the perspective of strategic learning, it offers a set of biologically plausible formal models of behavior with the potential to directly connect behavioral observations of learning dynamics in games (Roth and Erev 1995; Camerer 2003) with neural observations of brain dynamics. Specifically, Hsu et al. (2010) used an asymmetric version of the patent race game, first studied experimentally in Rapoport and Amaldoss (2000), to search for regions of the brain involved in the computation of expected payoffs and prediction errors that can be used to guide behavior.
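The core TD computation described above can be sketched in a few lines. This is a minimal illustration, not the specific model fitted in Hsu et al. (2010); the learning rate and reward values are hypothetical.

```python
# Minimal sketch of a temporal-difference (TD) update with a reward
# prediction error (RPE). The learning rate alpha and the reward stream
# are illustrative assumptions, not parameters from any cited study.

def td_update(value_estimate, reward, alpha=0.1):
    """One TD step: shift the value estimate toward the observed reward."""
    rpe = reward - value_estimate          # reward prediction error
    return value_estimate + alpha * rpe    # updated value estimate

# Repeated exposure to a reward of 1.0 drives the estimate toward 1.0,
# with the RPE (the teaching signal) shrinking each round.
v = 0.0
for _ in range(50):
    v = td_update(v, reward=1.0)
```

The RPE term is the quantity that dopamine neurons appear to encode: positive when outcomes are better than expected, negative when worse, and zero once predictions are accurate.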
The large strategy space of this game improves the recovery of key parameters in learning models relative to the smaller games typically used in such studies (Wilcox 2006). Reinforcement and belief-based models were implemented in a manner consistent with temporal difference models (Sutton and Barto 1981) and fitted to the neural data. The results show evidence of reinforcement and belief-based learning signals in the manner predicted by EWA learning. Somewhat surprisingly, these distinct signals are represented in both overlapping and distinct brain regions. There are a number of potential extensions to this study. First, despite the empirical success of the aforementioned models, the model fits are far from perfect. Belief learning, in particular, relies on strong functional form assumptions regarding the construction of beliefs, which have long been assumed in standard models to be unobservable. Recent studies using proper scoring rules to elicit beliefs, however, found substantial improvements in fit (Nyarko and Schotter 2002). As shown by Rutstrom and Wilcox (2009), however, the act of elicitation itself may bias the learning dynamics. In contrast, direct extraction of beliefs from neural activity presents the possibility of an unbiased measurement of beliefs. More generally, neural data can discipline behavioral models of learning by providing direct evidence regarding the causal mechanisms behind decision-making and learning.
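The way EWA nests reinforcement and belief learning can be illustrated with a sketch of its attraction update (in the spirit of Camerer and Ho 1999). The parameter names and values below follow common notation but are hypothetical; the payoff vector is an invented example, not data from the patent race game.

```python
import math

# Sketch of an experience-weighted attraction (EWA) update, assuming the
# standard parameterization: phi (decay of past attractions), delta
# (weight on forgone payoffs), rho (decay of experience), lam (logit
# sensitivity). All values here are illustrative.

def ewa_update(attractions, N, chosen, payoffs, phi=0.9, delta=0.5, rho=0.9):
    """Update all strategies' attractions after one round of play.

    payoffs[j] is the payoff strategy j would have earned against the
    opponent's realized choice; 'chosen' indexes the strategy played.
    """
    N_new = rho * N + 1.0
    updated = []
    for j, A in enumerate(attractions):
        # Chosen strategies get full weight; forgone ones get weight delta.
        weight = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        updated.append((phi * N * A + weight * payoffs[j]) / N_new)
    return updated, N_new

def logit_choice_probs(attractions, lam=1.0):
    """Map attractions to choice probabilities with a logistic rule."""
    exps = [math.exp(lam * A) for A in attractions]
    total = sum(exps)
    return [e / total for e in exps]

# One round with two strategies: strategy 0 was played and paid 2.0;
# strategy 1 would have paid 1.0.
attr, N = ewa_update([0.0, 0.0], N=1.0, chosen=0, payoffs=[2.0, 1.0])
probs = logit_choice_probs(attr)
```

Setting delta = 0 reinforces only the chosen strategy, recovering cumulative reinforcement learning; setting delta = 1 updates every strategy by its forgone payoff, which is the hallmark of belief-based learning. This nesting is what allows the fitted EWA signals to be decomposed into reinforcement and belief components in the neural data.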