Fig. 9. The score difference between the learning Bots and Hunter Bots.
Table 7
The score difference between our Bots and the enemy Bot during learning. Values are mean ± standard deviation.

Learning Bot        After 5 runs    After 10 runs    After 20 runs
FALCON-IL Bot       0.10 ± 3.68     0.49 ± 4.24      1.19 ± 3.86
FALCON-OIL Bot      0.22 ± 4.18     0.45 ± 5.12      1.01 ± 2.30
FALCON-RL Bot       1.14 ± 3.52     7.04 ± 3.11      8.30 ± 4.16
QL Bot              2.50 ± 5.34     2.90 ± 3.34      6.60 ± 5.41
FALCON-DSL Bot      7.28 ± 1.96     7.72 ± 4.23      8.72 ± 4.66
FALCON-MML Bot      7.25 ± 3.64     7.10 ± 4.36      7.68 ± 4.62
This set of experiments evaluates three learning Bots, namely the FALCON-OIL Bot, FALCON-MML Bot, and FALCON-RL Bot. Compared with the Bots evaluated in the first set of experiments, these Bots make use of online real-time learning, doing away with the need for offline imitative learning beforehand. Specifically, imitative learning is performed completely in an online fashion for the FALCON-OIL Bot, and interleaved with reinforcement learning for the FALCON-MML Bot. Fig. 9 summarizes the performance of the three Bots in terms of score difference when playing against the Hunter Bot. As before, the game score differences are calculated by averaging across ten sets of 20 continuous runs; a short sketch of this computation is given at the end of this section.

From Fig. 9, it can be seen that the FALCON-OIL Bot learns the behavior patterns very quickly and attains a fighting competency similar to that of the Hunter Bot. Comparing with Fig. 8, we see that the FALCON-OIL Bot's performance is as good as that of the FALCON-IL Bot. This result further shows that online imitative learning is capable of learning behavior patterns quickly and accurately.

More importantly, Fig. 9 also shows that the FALCON-MML Bot produces a significantly higher level of fighting competency than its opponent. As the FALCON-MML Bot achieves fast learning and quick convergence in real time, this result also shows that MML is a powerful strategy for integrating online imitative learning with reinforcement learning.

The number of codes in DSL remains in a reasonable region although prior knowledge is encoded beforehand.
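The following is a minimal sketch of how the reported score differences could be summarized. It assumes that per-run score differences (Bot score minus opponent score) have been logged into ten sets of 20 runs, and that the "after N runs" entries correspond to the mean and standard deviation of the N-th run across the ten sets; these assumptions and the function name summarize_score_difference are ours for illustration, not taken from the original implementation.

```python
from statistics import mean, stdev

def summarize_score_difference(score_diffs, run_index):
    """Mean and standard deviation of the score difference at a given run.

    score_diffs: list of 10 sets, each a list of 20 per-run score
    differences (Bot score minus opponent score).
    run_index: 1-based run number (e.g. 5, 10 or 20).
    """
    values = [run_set[run_index - 1] for run_set in score_diffs]
    return mean(values), stdev(values)

# Example with placeholder data: ten sets of 20 simulated runs.
score_diffs = [[0.5 * (i + j) - 5.0 for j in range(20)] for i in range(10)]
for n in (5, 10, 20):
    m, s = summarize_score_difference(score_diffs, n)
    print(f"Score difference after {n} runs: {m:.2f} +/- {s:.2f}")
```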
8. Conclusion
This paper has presented a computational model unifying two popular learning paradigms, namely imitative learning and reinforcement learning, based on a class of self-organizing neural networks called Fusion Architecture for Learning and COgnition (FALCON). Addressing the knowledge integration issue, the computational model is capable of unifying state and action spaces and transferring knowledge seamlessly across different learning paradigms. This enables the learning agent to perform continuous knowledge exploitation, while enhancing reinforcement learning with complementary knowledge. Specifically, two hybrid learning strategies, known as Dual-Stage Learning (DSL) and Mixed Model Learning (MML), are proposed to realize the integration of the two different learning paradigms within a single framework. DSL and MML have been used to create non-player characters (NPCs) in a first-person shooter game named Unreal Tournament. A series of experiments shows that both DSL and MML are effective in enhancing the learning ability of NPCs in terms of faster learning and quicker convergence. Most notably, the NPCs built with DSL and MML achieve better combat performance compared with NPCs using pure reinforcement learning or pure imitative learning. The proposed hybrid learning strategies thus provide an efficient approach to building intelligent NPC agents in games and pave the way towards building autonomous expert and intelligent systems for other applications.
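As a rough illustration of how the two strategies differ in structure, the sketch below contrasts a dual-stage pipeline (imitative learning first, reinforcement learning after) with a mixed loop that interleaves the two kinds of updates. The helpers learn_from_demonstration, choose_action, reinforcement_update, and the environment interface are hypothetical placeholders standing in for the FALCON operations, not the actual implementation used in the paper.

```python
def dual_stage_learning(agent, demonstrations, env, episodes):
    """DSL-style sketch: an offline imitative stage followed by reinforcement learning."""
    # Stage 1: absorb recorded state-action pairs from a teacher (e.g. the Hunter Bot).
    for state, action in demonstrations:
        agent.learn_from_demonstration(state, action)
    # Stage 2: refine the imitated behavior through trial-and-error interaction.
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.choose_action(state)
            next_state, reward, done = env.step(action)
            agent.reinforcement_update(state, action, reward, next_state)
            state = next_state

def mixed_model_learning(agent, teacher, env, episodes):
    """MML-style sketch: imitative and reinforcement updates interleaved online."""
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Imitative update: learn from the teacher's choice in the current state.
            agent.learn_from_demonstration(state, teacher.choose_action(state))
            # Reinforcement update: act, observe the reward, and adjust value estimates.
            action = agent.choose_action(state)
            next_state, reward, done = env.step(action)
            agent.reinforcement_update(state, action, reward, next_state)
            state = next_state
```

In both sketches the agent object stands in for the FALCON network; the point of the contrast is only the ordering of the two kinds of learning updates.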
Although the integration of different learning paradigms appears straightforward in our work, we note that our integration strategies rely heavily on the specific self-organizing neural network model employed, namely FALCON. As such, our work does not provide a general solution for integrating different learning paradigms using arbitrary learning algorithms or models.
In terms of algorithm design and experimentation, our main performance metric so far is the combat performance of the NPC. Moreover, for addressing the exploitation–exploration dilemma, we only consider simple direct rewards, such as those given for damaging opponents and collecting weapons. Other more sophisticated aspects of NPCs in first-person shooter scenarios, such as goals, memories, and human factors, have not yet been explored.
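To make this limitation concrete, a "simple direct reward" scheme of the kind referred to above could look like the sketch below; the event names and reward values are illustrative assumptions, not the exact settings used in our experiments.

```python
def direct_reward(event):
    """Illustrative immediate rewards tied to directly observable game events."""
    rewards = {
        "damaged_opponent": 1.0,   # reward for damaging the enemy Bot
        "collected_weapon": 0.5,   # reward for picking up a weapon
    }
    return rewards.get(event, 0.0)  # all other events yield no immediate reward
```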
Moving forward, for the purpose of creating intelligent, believable, and attractive NPC agents, we still have to enhance the capabilities of the agents by integrating other high-level cognitive factors and human factors. For example, we shall investigate the use of a goal maintenance module, which may help to manage the exploitation–exploration dilemma and predict the outcomes of actions. In addition, we shall extend our model to exhibit human-like behavior by incorporating personalities and motivations into the agents.
Last but not least, it is important to augment the cognitive functions of the agents with affective capabilities, so that the NPCs can display emotions and interact with players in a more believable manner.
