The methodology presented in this paper allocates time and location information to sequences that consist of activities and transport modes. To the best of our knowledge, activity and location allocations have not previously been integrated and jointly optimized to achieve maximal rewards for a given activity–travel pattern. The methodology is based on a reinforcement learning algorithm, which helps the agent search for the optimal path among the very large number of states of a given environment.
During learning, the Q-learning agent tries actions (i.e., output values) on its environment and is then reinforced by receiving a scalar evaluation (the reward) of those actions. In a first implementation, it was assumed that time allocation depends on the type of activity, the starting time of the activity and the time already spent at that activity. The sequence of activities also determines the time allocation. Indeed, two sequences that contain the same activity with the same starting time and the same time already spent at it do not have to (and often will not) receive the same time allocation for that particular activity, because the other activities occur in a different order in the two diaries. Technically, the agent arrives at a different optimal path and a different policy, and as a result also at a different time allocation for the two sequences. The location allocation problem was initially solved under the assumption that the allocation depends on the travel time between two locations and on the transport mode chosen to reach them. In this case too, the sequence information of activities and transport modes largely determines the allocation.
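To make the Q-learning mechanism described above concrete, the sketch below shows a minimal tabular formulation of the time allocation step: the state encodes the position in the fixed sequence, the current time slot and the time already spent at the current activity, and the agent either continues the activity or switches to the next one. The function and variable names (q_learn_time_allocation, reward_table, n_slots) and the parameter values are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
ACTIONS = ("continue", "switch")         # keep doing the current activity, or move to the next one

def q_learn_time_allocation(sequence, reward_table, n_slots, episodes=5000):
    """sequence: fixed list of activity types; reward_table maps
    (activity, time_slot, time_already_spent) to a scalar reward."""
    Q = defaultdict(float)
    for _ in range(episodes):
        idx, slot, spent = 0, 0, 0                      # start of the diary
        while idx < len(sequence) and slot < n_slots:
            state = (idx, slot, spent)
            # epsilon-greedy action selection
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            # reward of spending this slot on the current activity
            reward = reward_table.get((sequence[idx], slot, spent), 0.0)
            # transition: stay at the current activity or switch to the next one
            if action == "continue":
                next_state = (idx, slot + 1, spent + 1)
            else:
                next_state = (idx + 1, slot + 1, 0)
            # standard Q-learning update
            if next_state[0] < len(sequence) and next_state[1] < n_slots:
                best_next = max(Q[(next_state, a)] for a in ACTIONS)
            else:
                best_next = 0.0
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            idx, slot, spent = next_state
    return Q
```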
In a final implementation, time and location allocation were integrated and handled simultaneously. Dealing with both allocations at once leads to two important advantages. The first advantage is that the reward is no longer maximized in either the time or the location facet alone: the total reward over a day (i.e., the reward that arises from determining optimal start and end times together with the cost that arises from travelling between locations) is maximized by means of an integrated approach, which is clearly more realistic. The second major advantage is that flexible travel times between two locations can be incorporated. In the first time allocation implementation this was impossible, due to the lack of location information.
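As a rough illustration of the integrated formulation, the sketch below combines the timing reward of an activity with a travel cost that depends on the origin, destination and transport mode of the preceding trip. The weight beta, the data layout and the example values are hypothetical placeholders for the conversion between the two components discussed in the next paragraph.

```python
from typing import Dict, Tuple

def integrated_step_reward(activity_reward: float,
                           travel_times: Dict[Tuple[str, str, str], float],
                           origin: str, destination: str, mode: str,
                           beta: float = 1.0) -> float:
    """Reward of one step in an integrated formulation: the reward for executing
    the activity in its allocated slot minus a weighted cost for the trip that
    precedes it (beta converts travel time into reward units)."""
    travel_cost = travel_times.get((origin, destination, mode), 0.0)
    return activity_reward - beta * travel_cost

# Hypothetical example: shopping at the mall after work, travelling by car.
travel_times = {("work_site", "mall", "car"): 15.0}
print(integrated_step_reward(8.0, travel_times, "work_site", "mall", "car", beta=0.2))  # 5.0
```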
The most important drawback of this integrated implementation is that the relative importance of the time and location components cannot be observed directly from the data. To this end, a simple conversion function has been proposed and tested in the empirical section. Further research could, for instance, use alternative techniques (such as stated preference) to better specify and understand this relationship. It was also mentioned above that the reward tables used in the experiments can be derived from frequency information present in the data. Alternatively, one may use reward or utility functions that include more parameters when determining the utility of an action. In that case, apart from the starting time and the duration of the activity, the activity location, the position of the activity within the activity schedule and the activity history are also incorporated in the utility functions. An initial approach is shown in [22].
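The sketch below illustrates, under assumed data structures, how such a reward table could be derived from frequency information in observed diaries: each observed (activity, start slot, duration) combination receives a reward that grows with its relative frequency. The log scaling and the tuple layout are illustrative choices, not the paper's exact procedure.

```python
import math
from collections import Counter

def reward_table_from_frequencies(observations):
    """observations: iterable of (activity, start_slot, duration) tuples taken
    from observed activity diaries. Returns a dict mapping each observed
    combination to a reward that increases with its relative frequency."""
    counts = Counter(observations)
    total = sum(counts.values())
    # log-scaled relative frequency, so that rare but observed combinations
    # still receive a small positive reward
    return {combo: math.log1p(100.0 * count / total) for combo, count in counts.items()}

# Hypothetical diary fragments: (activity, start slot, duration in slots)
diaries = [("work", 8, 8), ("work", 9, 8), ("work", 8, 8), ("shopping", 17, 1)]
rewards = reward_table_from_frequencies(diaries)
assert rewards[("work", 8, 8)] > rewards[("shopping", 17, 1)]   # more frequent, higher reward
```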
As mentioned before, the approach presented in this paper largely relies on a fixed sequence of activities and transport modes. Alternatively, one may let the reinforcement learning algorithm determine this activity–travel sequence autonomously. An initial framework for this has been proposed by Vanhulsel et al. [23] in an application where a key event (obtaining a driver’s license) is simulated. However, that approach presented only initial results and needs further investigation. In addition, one may want to investigate currently unexplored relational reinforcement learning approaches [7], [8] and [9] in this domain, which employ a relational regression technique in cooperation with a Q-learning algorithm to build a relational, generalized Q-function. As such, they combine techniques from reinforcement learning with generalization techniques from inductive logic programming.
