Learning in Real-Time Search: A Unifying Framework
Intelligent agents that learn in real time and are capable of making informed decisions have long been a dream of artificial intelligence.
In this paper, we consider a simultaneous planning and learning problem. One motivating application lies with navigation on an initially unknown map under real-time constraints.
As an example, consider a robot driving to work every morning. Suppose the robot is a newcomer to the town. The first route the robot finds may not be optimal because traffic jams, road conditions, and other factors are initially unknown. With the passage of time, the robot continues to learn and eventually converges to a nearly optimal commute.
Note that planning and learning happen while the robot is driving and therefore are subject to time constraints.
Present-day mobile robots are often plagued by localization problems and power limitations, but their simulation counterparts already allow researchers to focus on the planning and learning problem. For instance, the RoboCup Rescue simulation league (Kitano, Tadokoro, Noda, Matsubara, Takahashi, Shinjou, & Shimada, 1999) requires real-time planning and learning with multiple agents mapping out an unknown terrain.
Pathfinding is done in real time as various crises, involving fire spread and human victims trapped in rubble, progress while the agents plan.
Similarly, many current-generation real-time strategy games employ a priori known maps. Full knowledge of the maps enables complete search methods such as A* (Hart, Nilsson, & Raphael, 1968) and Dijkstra's algorithm (Dijkstra, 1959).
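On a fully known map, a complete search method can compute an optimal route before the agent takes its first step. The following is a minimal sketch of A* on a 4-connected grid with unit move costs and a Manhattan-distance heuristic; the function and variable names are our own illustration and do not come from any of the engines or papers cited above:

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 4-connected grid; grid[r][c] == 1 marks an obstacle.

    Returns the list of cells on a shortest path from start to goal,
    or None if no path exists. Assumes the entire map is known up front,
    as in the complete-search setting discussed in the text.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    parent = {start: None}
    best_g = {start: 0}

    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Reconstruct the path by walking parent pointers back to start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry; a cheaper route was found already
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

Note that the entire plan is produced before any move is executed; it is exactly this up-front cost, and the requirement of a fully known map, that the real-time and incremental methods discussed next aim to avoid.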
Prior availability of the maps allows pathfinding engines to pre-compute various data to speed up on-line navigation.
Examples of such data include visibility graphs (Woodcock, 2000), influence maps (Pottinger, 2000), space triangulation (Kallmann, Bieri, & Thalmann, 2003), state abstraction hierarchies (Holte, Drummond, Perez, Zimmer, & MacDonald, 1994; Holte, 1996; Botea, Müller, & Schaeffer, 2004), and route waypoints (Reece, Krauss, & Dumanoir, 2000).
However, the forthcoming generations of commercial and academic games (Buro, 2002) will require the agent to cope with initially unknown maps via exploration and learning during the game, and therefore will greatly limit the applicability of complete search algorithms and pre-computation techniques.
Incremental search methods such as dynamic A* (D*) (Stentz, 1995) and D* Lite (Koenig & Likhachev, 2002) can deal with initially unknown maps and are widely used in robotics, including DARPA's Unmanned Ground Vehicle program, Mars rovers, and other mobile robot prototypes (Hebert, McLachlan, & Chang, 1999; Thayer, Digney, Diaz, Stentz, Nabbe, & Hebert, 2000). They work well when the robot's movements are slow with respect to its planning speed (Koenig, 2004).
In real-time strategy games, however, the AI engine can be responsible for hundreds to thousands of agents traversing a map simultaneously, and the planning cost becomes a major factor. To illustrate: even at the smaller scale of the six-year-old "Age of Empires 2" (Ensemble-Studios, 1999), 60-70% of simulation time is spent in pathfinding (Pottinger, 2000). This gives rise to the following questions:

1. How can planning time per move, and particularly the first-move delay, be minimized so that each agent moves smoothly and responds to user requests nearly instantly?

2. Given real-time execution, local sensory information, and initially unknown terrain, how can the agent learn a near-optimal path and, at the same time, minimize the learning time and memory required?