
Available online at www.sciencedirect.com

Procedia Computer Science 16 (2013) 29 - 38

Conference on Systems Engineering Research (CSER'13) Eds.: C.J.J. Paredis, C. Bishop, D. Bodner, Georgia Institute of Technology, Atlanta, GA, March 19-22, 2013.

GAIA: A CAD Environment for Model-Based Adaptation of Game-Playing Software Agents

Spencer Rugaber (a), Ashok K. Goel (a) and Lee Martie (b)

(a) Design & Intelligence Laboratory, College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
(b) Department of Informatics, University of California, Irvine, Irvine, CA, USA

Abstract

We view interactive games and game-playing software agents as complex systems. This allows us to adopt the stance of computer-aided design and model-based systems engineering to designing game-playing agents. In this paper, we describe a model-based technique for self-adaptation in game-playing agents. Our game-playing agent contains a self-model that describes its internal state. Our approach to self-adaptation takes the form of an interactive game-agent development environment called GAIA, an agent modeling language called TMKL2, and an agent self-adaptation engine called REM. We evaluate the approach by applying it to an agent that plays parts of the interactive turn-based strategy game called Freeciv.

© 2013 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2013.01.004

Selection and/or peer-review under responsibility of Georgia Institute of Technology

Keywords: Intelligent Agents; Interactive Games; Meta-Reasoning; Model-Based Systems Engineering; Self-Adaptation; Task Models.

1. Introduction

We view interactive games and game-playing intelligent agents as complex systems. Interactive games typically have a large number of heterogeneous interacting components. The games are dynamic, and have large state spaces. Many games, such as multiplayer turn-based strategy games, are non-deterministic and only partially observable. The dynamic behavior of interactive games emerges out of the interactions of their numerous components.

Human players typically play interactive games against one or more game-playing intelligent software agents. We view a game-playing intelligent agent itself as a complex adaptive system. The intelligent agent too is composed of a large number of interacting components. The agent adapts its behavior to the changing situation in the game. Thus, the behavior of the agent is dynamic and emerges out of the interactions between the agent and the game.

One advantage of viewing interactive games and game-playing agents as complex systems is that it enables us to adopt the stance of intelligent computer-aided design and model-based systems engineering in designing game-playing agents. Our approach to designing game-playing agents comprises an interactive CAD environment called GAIA for designing and executing game-playing agents, an agent modeling language called TMKL2, and an agent adaptation engine called REM.

GAIA is an interactive CAD environment for developing game-playing agents. It provides an architecture for modeling and adapting game-playing agents, including an interpreter for the TMKL2 agent modeling language and an application programming interface (API) to the game world. It comprises a visual editor for modeling game-playing agents; controls for interacting with a game server, simulating an agent design in the game world, and monitoring the results; and a persistence mechanism for managing agent models. TIELT [1] provides a similar interactive environment for simulating and testing game-playing agents.

TMKL2 is based on and extends the agent modeling language TMKL [2]. TMKL2 models of intelligent agents are expressed in terms of tasks, methods, and knowledge. A task describes the computational goal of producing a specific result. A method is a unit of computation that produces a result in a specified manner. A method decomposes a task into subtasks, specifies the ordering among the subtasks and is represented as a finite state machine (FSM). This decomposition of tasks into methods and methods into subtasks may continue until a primitive level is reached, at which point the task corresponds either to an action in the game or to domain knowledge that directly satisfies the task. The knowledge portion of TMKL2 agent models is expressed in terms of the concepts and relations about the world in which the agents reside.
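To make this task-method-knowledge decomposition concrete, the following Java sketch (ours, for illustration only; it is not the GAIA source, and all type names are hypothetical) shows one way a task can either be primitive, mapping directly to a game action, or be decomposed by a method into ordered subtasks.

// Illustrative sketch only, not the GAIA implementation. A Task is either
// primitive (a game action) or accomplished by a Method over subtasks.
import java.util.List;

interface Task {
    String goal();                 // the result this task is meant to produce
    boolean isPrimitive();         // primitive tasks map to game actions or knowledge
}

final class PrimitiveTask implements Task {
    private final String goal;
    private final Runnable gameAction;   // e.g., a call through the game API
    PrimitiveTask(String goal, Runnable gameAction) {
        this.goal = goal;
        this.gameAction = gameAction;
    }
    public String goal() { return goal; }
    public boolean isPrimitive() { return true; }
    void execute() { gameAction.run(); }
}

final class CompositeTask implements Task {
    private final String goal;
    private final Method method;         // the method that decomposes this task
    CompositeTask(String goal, Method method) {
        this.goal = goal;
        this.method = method;
    }
    public String goal() { return goal; }
    public boolean isPrimitive() { return false; }
    Method method() { return method; }
}

// A Method decomposes a task into ordered subtasks; in TMKL2 the ordering is
// actually specified by a finite state machine, which this list abstracts.
final class Method {
    private final List<Task> orderedSubtasks;
    Method(List<Task> orderedSubtasks) { this.orderedSubtasks = orderedSubtasks; }
    List<Task> subtasks() { return orderedSubtasks; }
}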

Like TMKL2, REM derives from [2]. REM reasons over TMKL2 agent models. REM uses the TMKL2 model of the agent to monitor the agent's internal state with respect to its accomplishment of goals and the methods it used to accomplish them. Further, it can adapt the agent to better accomplish those goals. In addition to its own reasoning mechanism, REM supports interaction with external reasoners such as planners.

In this paper, we first briefly describe TMKL2, GAIA and REM. Then we describe several experiments in self-adaptation in game-playing agents that play parts of an interactive strategy game called Freeciv.

2. Background

An intelligent agent is an autonomous entity that maps a history of percepts in an external environment into an action on the environment to achieve goals and maximize utility. In general, for complex game environments, it is not possible to design an agent such that it always selects the right action in any state and thus always achieves its goal. Instead, game-playing agents can, and often do, fail to achieve their goals. When an agent fails to achieve its goal, the agent may attempt to adapt itself [3][4][5][6][7][8]. An agent may also seek to proactively adapt itself when its external environment changes [2]. For example, a game-playing agent may want to proactively adapt itself if the game designer introduces a new percept, action, game rule or goal for the agent.
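As a minimal illustration of this agent abstraction (our sketch; the interfaces are hypothetical and not part of GAIA), an agent can be viewed as a function from its percept history to its next action:

// Sketch only: an agent maps its history of percepts to the next action.
import java.util.ArrayList;
import java.util.List;

interface GameAgent<P, A> {
    A decide(List<P> perceptHistory);
}

final class HistoryKeepingAgent<P, A> {
    private final List<P> history = new ArrayList<>();
    private final GameAgent<P, A> policy;
    HistoryKeepingAgent(GameAgent<P, A> policy) { this.policy = policy; }

    // Record the latest percept and choose the next action.
    A step(P percept) {
        history.add(percept);
        return policy.decide(history);
    }
}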

2.1. Freeciv

Our work on self-adaptation in game-playing agents takes place in the context of a complex, interactive, multiplayer, turn-based strategy game called Freeciv (a). Freeciv is an open-source variant of a class of Civilization games. The goal in the game is to control and grow a civilization while competing for limited resources against other players' civilizations. The major activities in this endeavor are exploration of the randomly initialized game world, resource acquisition and development, and warfare. Winning the game is achieved most directly by destroying the civilizations of all opponents, but can also be accomplished through more peaceful means by scientific advancement resulting in the building of an interstellar spaceship. Figure 1 depicts a screenshot of Freeciv. We note that our agents play only small portions of the Freeciv game.

3. TMKL2

TMKL2 is an agent modeling language that captures the teleology of the agent's design. TMKL2 includes vocabulary for specifying agent goals, the mechanisms to accomplish the goals and the agent's knowledge of itself and its external environment. GAIA realizes TMKL2 models using an interpreter. When used to model a Freeciv agent, the interpreter is capable of executing a model in conjunction with Freeciv's server to play the game. GAIA assumes that the division between the agent's model and external parts is clearly defined, and this division takes the form of an API to the external code. GAIA also makes an explicit distinction between adaptation-time modelling of

(a) http://freeciv.wikia.com/

Figure 1: A Screenshot of the Freeciv Client User Interface

the agent and run-time execution of the adapted agent: Adaptation takes place on a model of the agent, and then the model is interpreted to effect agent behavior in the game world.

3.1. Goals

TMKL2 has three submodels: Goals, Mechanisms and Environment, corresponding respectively to the Tasks, Methods and Knowledge portions of the earlier TMKL language [2]. The first submodel describes the agent's goals. A Goal expresses a reason that the agent does what it does, in terms of its intended externally visible effects on the agent's world. Goals may be parameterized, enabling the agent to target specific elements of its environment, such as a specific city. A Goal is expressed via a pair of logical expressions describing the precondition for Goal accomplishment (called its Given condition) and the expected effect of Goal accomplishment on the agent's Environment (its Makes condition). The final element of a Goal specification is an indication of the means by which the Goal is to be accomplished. This takes the form of a Mechanism invocation. Hence, Goals are directly tied to the Mechanism by which they are to be achieved.
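The following Java sketch (ours; names and types are hypothetical, not the TMKL2 implementation) illustrates the shape of a Goal: parameters, a Given precondition, a Makes postcondition, and the name of the Mechanism invoked to achieve it.

// Sketch of a TMKL2-style Goal; Environment is a placeholder for the
// agent's knowledge of the world (see Section 3.3).
import java.util.Map;
import java.util.function.Predicate;

interface Environment {
    int gold();
}

final class Goal {
    final String name;
    final Map<String, Object> parameters;     // e.g., a specific city to target
    final Predicate<Environment> given;       // precondition for attempting the Goal
    final Predicate<Environment> makes;       // expected effect on the Environment
    final String mechanismName;               // the Mechanism that achieves this Goal

    Goal(String name, Map<String, Object> parameters,
         Predicate<Environment> given, Predicate<Environment> makes,
         String mechanismName) {
        this.name = name;
        this.parameters = parameters;
        this.given = given;
        this.makes = makes;
        this.mechanismName = mechanismName;
    }

    boolean applicable(Environment env)   { return given.test(env); }
    boolean accomplished(Environment env) { return makes.test(env); }
}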

3.2. Mechanisms

The Mechanism portion of a TMKL2 model describes how the agent accomplishes its Goals. There are two kinds of Mechanisms, Organizers and Operations, each defined in terms of two logical expressions describing its precondition for execution (its Requires condition) and its effect (its Provides condition).

An Organizer Mechanism is defined as a finite state machine comprising States and Transitions. Start, failure and success States are all explicitly indicated. States, in turn, define subGoals, enabling hierarchical refinement of an agent's specification. Transitions may be conditional (dependent on a DataCondition) with respect to the agent's current perception of the world, as expressed in its Environment.

The other kind of Mechanism is an Operation. Operations are parameterized invocations of computational resources provided to the software agent via its API to external software, such as the Freeciv server. That is, each Operation models one of the agent's computational capabilities.
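A rough Java sketch of the two Mechanism kinds (again ours, with hypothetical names): an Organizer as a finite state machine over States and Transitions, and an Operation as a parameterized call through the agent's API to external software.

// Illustrative only; guards over the Environment are reduced to predicates.
import java.util.List;
import java.util.function.Predicate;

interface Mechanism {}

final class Operation implements Mechanism {
    final String name;
    final Runnable apiCall;              // parameterized call into the external game API
    Operation(String name, Runnable apiCall) { this.name = name; this.apiCall = apiCall; }
}

final class State {
    final String name;
    final String subGoal;                // each non-terminal State defines a subGoal
    final boolean success, failure;      // start, success and failure States are explicit
    State(String name, String subGoal, boolean success, boolean failure) {
        this.name = name; this.subGoal = subGoal;
        this.success = success; this.failure = failure;
    }
}

final class Transition {
    final State from, to;
    final Predicate<Object> dataCondition;   // guard over the agent's Environment
    Transition(State from, State to, Predicate<Object> dataCondition) {
        this.from = from; this.to = to; this.dataCondition = dataCondition;
    }
}

final class Organizer implements Mechanism {
    final State start;
    final List<Transition> transitions;
    Organizer(State start, List<Transition> transitions) {
        this.start = start; this.transitions = transitions;
    }
}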

3.3. Environment

A TMKL2 program includes a description of the agent's understanding of itself and the world in which it exists.

Figure 2: A portion of the TMKL2 model of Alice, an agent that plays a part of Freeciv.

In particular, the agent's Environment comprises a set of typed Instances and Triples (3-tuples) relating the Instances to each other. In order to describe Instances and Triples, TMKL2 provides two modeling constructs, Concepts and Relations.

A Concept is a description of a set of similar Instances. It is defined in terms of a set of typed Properties. Moreover, Concepts are organized in a specialization hierarchy promoting compositionality and reuse. There is a built-in Concept called Concept. When a TMKL2 model is constructed, Instances of Concept are automatically added to it for each defined Concept, enabling reflection by the agent over its own definition. A Relation describes a set of Triples allowing the modeling of associations among Instances. In particular, an Instance of one Concept can be related to an Instance of another via a Triple.
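The Environment constructs can be sketched as follows (illustrative Java only; the names are hypothetical): Concepts describe sets of Instances through typed Properties and a specialization hierarchy, while Relations describe sets of Triples relating Instances.

// Sketch of the Environment submodel: Concepts, Instances, Relations, Triples.
import java.util.HashMap;
import java.util.Map;

final class Concept {
    final String name;
    final Concept parent;                        // specialization hierarchy
    final Map<String, Class<?>> properties;      // typed Properties
    Concept(String name, Concept parent, Map<String, Class<?>> properties) {
        this.name = name; this.parent = parent; this.properties = properties;
    }
}

final class Instance {
    final Concept concept;
    final Map<String, Object> values = new HashMap<>();
    Instance(Concept concept) { this.concept = concept; }
}

final class Relation {
    final String name;                           // e.g., a hypothetical "locatedOn"
    Relation(String name) { this.name = name; }
}

final class Triple {
    final Instance subject;
    final Relation relation;
    final Instance object;                       // relates one Instance to another
    Triple(Instance subject, Relation relation, Instance object) {
        this.subject = subject; this.relation = relation; this.object = object;
    }
}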

3.4. TMKL2 Semantics

A TMKL2 model of an agent connects the Goals of the agent to the Mechanisms by which the Goals are accomplished. The program is declarative in the sense that all behavior is defined in terms of logical expressions (Given, Makes, Requires, Provides). Consequently, one semantic interpretation of a TMKL2 program is that it describes the behavior that a software agent must exhibit in order for it to accomplish a set of top-level Goals.

TMKL2 programs are not just descriptive, however: They can be used to actually control the modeled agent. This requires an operational semantics of TMKL2: a TMKL2 Model prescribes the detailed behavior of the agent in the world.

Operationally, a TMKL2 Model can be interpreted as a hierarchy of finite state machines controlling communication with the external software with which the agent interacts. Superior state machines in the hierarchy correspond to superior Goals. FSMs corresponding to Goals without any subGoals are called leaf FSMs.

All state machines execute synchronously; that is, at any given time, each machine is in a specific State. At the next virtual clock tick, all pending DataConditions for active leaf machines are evaluated, and the outgoing Transitions evaluating to true are traversed, resulting in entry into new States. Upon entry into a State, the corresponding Mechanism is interpreted. Mechanism interpretation ultimately resolves into Operation invocations and updates to the Environment. After all invocations have been processed, the Environment is updated to reflect any changes to the agent's run-time data structures made by the invocations. Interpretation terminates if the Organizer for the top-level Goal enters either a success or failure State.
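The following sketch (ours, simplified to a single leaf machine; not the GAIA interpreter) illustrates one virtual clock tick of this operational semantics: evaluate the pending DataConditions, traverse a Transition whose guard is true, and interpret the Mechanism of the newly entered State.

// Sketch of one clock tick for one leaf machine; hypothetical names.
import java.util.List;

final class InterpreterSketch {
    interface DataCondition { boolean holds(); }
    interface Mechanism { void interpret(); }      // ultimately resolves into Operation calls

    static final class Transition {
        final DataCondition guard;
        final MachineState target;
        Transition(DataCondition guard, MachineState target) { this.guard = guard; this.target = target; }
    }

    static final class MachineState {
        final Mechanism mechanism;
        final List<Transition> outgoing;
        final boolean terminal;                    // success or failure State
        MachineState(Mechanism m, List<Transition> out, boolean terminal) {
            this.mechanism = m; this.outgoing = out; this.terminal = terminal;
        }
    }

    // Advance one leaf machine by a single tick; returns the (possibly new) State.
    static MachineState tick(MachineState current) {
        for (Transition t : current.outgoing) {
            if (t.guard.holds()) {                 // first true guard wins in this simplification
                t.target.mechanism.interpret();    // entering a State interprets its Mechanism
                return t.target;
            }
        }
        return current;                            // no guard true: stay in place
    }
}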

3.5. An Example of a TMKL2 Model of a Freeciv Agent

Figure 2 presents part of the model of an agent, called Alice, capable of playing a simplified version of Freeciv. The figure illustrates a visual syntax for TMKL2. The partial model includes Alice's top-level Goals and Organizers. In particular, the top rectangle of the diagram denotes Alice's top-level Goal, that she should collect gold pieces. Contained within this rectangle is another, depicting an Organizer comprising three States: an initial State, a State specifying a subGoal, and a final State. The subGoal is shown as the rightmost of the two rectangles on the second level. Its Organizer, in turn, has two subGoals: one that continually mints more gold until enough has been produced and another that determines when to end the game. The bottom two rectangles represent Operations, responsible for interacting with the Freeciv server. Complementing the Goals and Mechanisms shown in the figure is Alice's Environment (not shown). Example Concepts represented by Instances in the Environment include City, Tile, Player, and Unit.

4. GAIA

TMKL2 models can be constructed, modified and executed using the GAIA interactive development environment. In this section, we give a high-level description of GAIA's architecture and major subsystems. We also describe its run-time interface to the world in which the agent executes. GAIA is written in the Java programming language and is built using the Eclipse (b) software development environment.

4.1. The GAIA Architecture

The conceptual architecture of GAIA is presented in Figure 3. In the center left of the figure is SAGi, the GAIA user interface. Through the interface, a designer can enter and edit TMKL2 agent models and, when desired, submit them for execution to the TMKL2 interpreter. Models can also be saved for later access. REM is the reasoning module responsible for adapting models. Also part of GAIA is the model manager, responsible for encapsulating access to agent models and persisting them to permanent storage. Note that because we are interested in adapting agent models, there may be many model versions existing at any point in time. In-memory representations of TMKL2 models take the form of Java objects that are interpreted by the TMKL2 interpreter, which interacts with the world via the Runtime Communications Manager and associated queues.

4.2. SAGi

TMKL2 agents for playing Freeciv can be specified in one of two ways. The first is via textual import from external files. The preferred approach, however, is to use a visual syntax we have developed and the supporting editor incorporated into SAGi. TMKL2 models are expressed in SAGi using the visual syntax, accessed from a palette of model elements of different types. A property panel can be used to set the values of element attributes. SAGi also provides the means by which model interpretation is initiated, paused, and stopped.

4.3. TMKL2 Interpreter and Interface to Game World

SAGi invokes the TMKL2 interpreter to execute a model and thereby interact with the Freeciv server. The interpreter walks the TMKL2 tree of state machines iteratively until the agent either succeeds or fails to achieve its top-level Goals. When the interpreter attempts to accomplish a subGoal whose Mechanism is an Operation, it must place into the Operation Request Queue a request to the Freeciv server to execute a game action, encoding

(b) http://www.eclipse.org/

Figure 3: The Conceptual Architecture of GAIA.

parameters as necessary. The Operation must have been previously mapped by the game designer to a Freeciv action.
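A minimal sketch of this queue-based coupling (ours; the class names and the stubbed server call are hypothetical, not the actual Runtime Communications Manager): the interpreter enqueues an Operation request, and a separate communications loop dequeues it and forwards it to the game server.

// Sketch of an Operation Request Queue between interpreter and game server.
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class OperationRequest {
    final String freecivAction;            // the game action the Operation was mapped to
    final Map<String, Object> parameters;  // encoded parameters
    OperationRequest(String freecivAction, Map<String, Object> parameters) {
        this.freecivAction = freecivAction;
        this.parameters = parameters;
    }
}

final class RuntimeCommunications {
    private final BlockingQueue<OperationRequest> requestQueue = new LinkedBlockingQueue<>();

    // Called by the interpreter when a leaf subGoal's Mechanism is an Operation.
    void submit(OperationRequest request) {
        requestQueue.add(request);
    }

    // Run by the communications loop: block until a request arrives, then
    // forward it to the game server (stubbed here).
    void pump() throws InterruptedException {
        OperationRequest next = requestQueue.take();
        sendToServer(next);
    }

    private void sendToServer(OperationRequest request) {
        System.out.println("sending " + request.freecivAction + " " + request.parameters);
    }
}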

5. REM

REM is the other major component of GAIA. REM, when given an agent model and a situation—either a failed Goal or an altered Environment—produces an updated agent model engineered either to successfully accomplish the Goal or to take advantage of the new knowledge in the Environment.

To achieve retrospective adaptation, REM performs three steps: localization (determining which of an agent's subGoals and associated Mechanisms were inadequate to accomplish the agent's overall Goal), transformation (devising an alternative Goal), and realization (providing/altering a Mechanism to accomplish this Goal).

Localization is accomplished in REM using a heuristic to find a low-level State in an Organizer such that the State's Provides condition suffices to accomplish the failing Goal. Further, the detected State must have a failing precondition (Requires condition). The presumption is that the State had not been reached and that, had it been reached, the agent would have succeeded. Realization and transformation are accomplished by matching the failing situation against a library of adaptation plans, choosing a candidate transformation from the library and applying the result to the agent's Model to produce a revised Model.
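The localization heuristic can be sketched as follows (our illustration, with conditions simplified to sets of ground literals rather than Powerloom expressions; all names are hypothetical): select a State whose Provides condition covers the failed Goal's Makes condition but whose Requires condition did not hold in the observed Environment.

// Sketch of the localization heuristic over simplified literal-set conditions.
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class Localizer {
    static final class ModelState {
        final String name;
        final Set<String> requires;   // literals that must hold before the State runs
        final Set<String> provides;   // literals the State establishes
        ModelState(String name, Set<String> requires, Set<String> provides) {
            this.name = name; this.requires = requires; this.provides = provides;
        }
    }

    static Optional<ModelState> localize(List<ModelState> states,
                                         Set<String> failedGoalMakes,
                                         Set<String> observedFacts) {
        return states.stream()
                .filter(s -> s.provides.containsAll(failedGoalMakes))  // would satisfy the Goal
                .filter(s -> !observedFacts.containsAll(s.requires))   // but its precondition failed
                .findFirst();
    }
}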

REM sits atop the Powerloom (c) knowledge representation and reasoning system. Powerloom supports automatic classification (truth maintenance) as well as natural deduction. TMKL2 Logical Expressions are easily mapped to and from Powerloom, and REM algorithms are easily expressed in Powerloom's variant of first-order logic.

6. Adaptation Scenarios and Results

To validate our approach to model-based adaptation, we conducted several experiments, each involving variants of the Alice agent depicted in Figure 2. In the experiments, Alice plays a simplified variant of Freeciv against other agents. In particular, the simplified game consists of two agents. Each agent controls a civilization and is responsible for its government, economy, citizen morale, and military. Each civilization has one city, citizens in that city, and a

(c) http://www.isi.edu/isd/LOOM/PowerLoom/

number of warriors. Each game tile yields a quantity of food, production, and trade points during each turn of the game. Food points feed a city's civilians; production points are used to support existing warriors or generate new warriors; and trade points are distributed among luxury, tax, and science resources. Initially both players start out in year 4000BC, with fifty gold pieces, zero warriors, and one worker that collects resources from a nearby tile. A city is either producing a warrior or collecting gold pieces on any given turn. Alice can win by collecting a specified number of pieces of gold, and the other agent can win by capturing Alice's city. An experiment consists of running Alice against its opponent and noting the results. Then, REM adapts Alice, and the game is rerun. In this way, we can assess the strengths and weaknesses of the adaptation technique used in the experiment.

6.1. Experiment #1

The purpose of the first experiment we conducted was to test whether REM could make a trivial adaptation to improve Alice's performance versus Freeciv's built-in robot player. In general, a player of this reduced game has to make a decision about allocating resources between collecting gold and creating warriors to defend its city. At the beginning of the first experiment, Alice's strategy was to devote all of her resources to the former pursuit. An obvious adaptation is to adjust Alice to balance her resource allocation, and the first experiment tested whether REM could make this adaptation.

In the experiment Alice played against Freeciv's robot player, which we call Frank, configured at its highest skill level. Although Alice had knowledge that Frank could win by capturing her city, she was unaware that Frank had more powerful weaponry and more production capacity than she had.

When played against Frank, unadapted Alice directly succumbed to his attacking chariots, legions, and horsemen. Before losing, Alice was able to acquire 175 units of gold and lived for 3075 years. However, Alice failed to acquire sufficient gold to accomplish her Goal, thereby requiring retrospective adaptation.

In this experiment no transformation was needed. That is, the failure was in a flawed Organizer rather than a Goal. Realizing a replacement Organizer meant interjecting a new State, whose success would satisfy the preconditions of a problem State. Such States are called patch States. A patch State was created by first searching a small library of generic Goal patterns to see if any satisfy the preconditions of the problem State. After an instantiated Goal pattern was found it was assigned as the Goal of the patch State. This patch State was then inserted into the localized Organizer just prior to the problem State. This guarantees the problem State's precondition is satisfied upon its visitation. In Experiment #1, the patch State was added with a Goal to build additional warriors. This Goal increases the defense of Alice's city if she is visibly outgunned on the game map.
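A sketch of this patch-State repair (ours, using the same simplified literal-set representation as the localization sketch above; all names are hypothetical): search the pattern library for a Goal that establishes the problem State's precondition and splice a new State carrying that Goal in just before it.

// Sketch of inserting a patch State before a problem State.
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class PatchRepair {
    static final class GoalPattern {
        final String name;
        final Set<String> establishes;     // literals this Goal, once achieved, makes true
        GoalPattern(String name, Set<String> establishes) {
            this.name = name; this.establishes = establishes;
        }
    }

    static final class OrganizerState {
        final String name;
        final Set<String> requires;        // precondition of the State
        String goal;                       // subGoal assigned to the State
        OrganizerState(String name, Set<String> requires, String goal) {
            this.name = name; this.requires = requires; this.goal = goal;
        }
    }

    // Insert a patch State before the problem State if a matching pattern exists.
    static boolean patch(List<OrganizerState> organizer, int problemIndex,
                         List<GoalPattern> library) {
        OrganizerState problem = organizer.get(problemIndex);
        Optional<GoalPattern> match = library.stream()
                .filter(p -> p.establishes.containsAll(problem.requires))
                .findFirst();
        if (match.isEmpty()) return false;
        OrganizerState patchState =
                new OrganizerState("patch-" + problem.name, Set.of(), match.get().name);
        organizer.add(problemIndex, patchState);   // just prior to the problem State
        return true;
    }
}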

After performing this adaptation, the new agent, Alice', was tested against Frank. While still outgunned, Alice' fared better in longevity and defense. She lasted 3125 years and killed one of Frank's powerful attacking units. Because some of her resources had been allocated to defense, she fared worse in gold acquisition, acquiring only 147 units. The lesson learned was that compensating for a well-understood limitation could be accomplished by making use of a simple heuristic alteration of a TMKL2 Model and a small library of patterns.

6.2. Experiment #2

Because of Frank's superior fire power, Experiment #1 was somewhat of an unfair contest for Alice. To explore how Alice would fare versus a similarly equipped opponent, a second experiment was tried. This experiment involved two naive agents named Alice and Barbra. Both played the simplified version of Freeciv described in Experiment #1. Barbra's strategy was to focus on producing warriors to attack Alice's city. By so doing, Barbra wins by overwhelming Alice's defenses. Before succumbing, Alice was able to acquire 93 units of gold while living through 1450 years. The same adaptation process as in Experiment #1 was used to adapt Alice and resulted in the same Alice' being produced as in Experiment #1. Running Alice' versus Barbra resulted in Alice' winning. Alice' was able to collect 185 units of gold while living through 4700 years. The experiment increased our confidence in the approach used in Experiment #1.

6.3. Experiment #3

The previous two experiments were examples of retrospective adaptation in which a failure was mitigated. In Experiment #3, proactive adaptation was attempted to take advantage of a slightly altered game rule. In particular, it now takes more gold units for Alice to win. Tests were run on Alice to see if Alice's model was still valid after the rule change. REM tested whether each Mechanism's Provides condition satisfies its parent Goal's Makes condition; that is, whether the Mechanism was capable of accomplishing the new Goal. When this test failed, REM located the responsible Mechanism. In this experiment, REM localized Alice's GainGold Organizer. Next, a replacement Organizer was created to achieve the new win condition. To do this, REM used an external planning tool called Graphplan (d). A planner is artificial intelligence software intended to propose a sequence of operations for accomplishing a goal. Graphplan is a mature, publicly available planner.

REM translated the initial game Environment into a Graphplan facts file, amounting to over 10,400 facts. Then all Organizers, Operations, and game rules were translated into a Graphplan operators file. After pruning out operators with no effects, the resulting Graphplan file contained 10 operators. Next, REM ran Graphplan on the facts and operators files. Graphplan generated a three-stage plan capable of accomplishing Alice's top-level Goal. This plan was then translated back into an Organizer to replace GainGold. The lesson learned from this experiment was that for a simple numeric change, an invalid TMKL2 Organizer can be located and adapted using an external planner.

6.4. Experiment #4

The first two experiments described above were off-line in the sense that the adaptations were made after a game was completed. Experiment #4 is an on-line adaptation in that Alice is changed while she is running. Moreover, her opponent, Barbra, is also adapted during the game. In this experiment, both Alice and Barbra were reconfigured into two parts, one allopoietic and the other autopoietic. These terms are borrowed from the literature of self-organizing systems and denote, respectively, the part of a system that is changed and the part that does the changing.

Alice's allopoietic part used a parameter, alpha, to determine how Alice should allocate her resources between obtaining gold and producing warriors. The autopoietic part of Alice adapted the allopoietic part by adjusting alpha so that Alice produced gold only if she had sufficient defensive capability to fend off Barbra's visible attackers. Similarly, Barbra's allopoietic part used a parameter, beta, to determine the number of warriors with which to attack Alice's city. Initially, beta was set to 0. The autopoietic part of Barbra adapted the allopoietic part by adjusting the number of warriors with which Barbra attacked Alice after every failed attack.

For both agents, the autopoietic part was itself a (meta-) agent. In particular, the meta-agent's Environment consisted of a description of the allopoietic part, including Goals, Mechanisms and (allopoietic) Environment. By monitoring game status, the meta-agent could make appropriate adjustments to the base agent's parameter by executing (meta) Operations.
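As a concrete, if simplified, illustration of the autopoietic adjustment (our sketch; the thresholding rule and names are hypothetical rather than the exact policy used in the experiment), Alice's meta-agent might set alpha each turn from what the base agent can observe:

// Sketch of the meta-agent's per-turn adjustment of alpha.
final class AliceMetaAgent {
    private double alpha = 1.0;   // fraction of production devoted to gold

    // Called once per turn with what the base agent can observe.
    double adjust(int visibleEnemyWarriors, int ownWarriors) {
        if (ownWarriors < visibleEnemyWarriors) {
            alpha = 0.0;          // outgunned: build warriors, collect no gold this turn
        } else {
            alpha = 1.0;          // sufficiently defended: collect gold
        }
        return alpha;
    }
}

Barbra's meta-agent is analogous, adjusting beta after each failed attack.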

Running Alice versus Barbra resulted in the agents engaging in an arms race. Eventually Alice was able to defeat Barbra. In winning, Alice collected 186 gold units; Barbra lost 6 warriors; and Alice ended with 3 live warriors, never having lost a battle. Barbra adapted herself 4 times, and Alice adapted herself 6 times. The lesson learned was that TMKL2 models allow for simple real-time adaptations by using meta Operations to control the agent's strategy.

7. Related Work

Here we describe several other projects related to ours. Kephart and Chess [9] present a vision of self-managing autonomic computing systems. They describe self-managing systems as having four responsibilities. The first is self-configuration, which is about the self-installation and self-integration of a system in accordance with higher-level policies put in place by the system designer. Second, self-optimization is a proactive process in which a system adjusts parameters in order to increase its performance. Third, self-healing is when a system tries to diagnose and fix failures within itself. Fourth is self-protection. Self-protection includes defending against any attacks on the system

(d) http://www.cs.cmu.edu/~avrim/graphplan.html

and may involve anticipation of vulnerabilities. The experiments described in Section 6 exhibit elements of self-management and self-healing. To this extent, at least, they qualify as autonomic systems.

Zhang and Cheng [10] use a genetic algorithm to evolve simple "software organisms." A genetic algorithm supports evolution, which is a form of adaptation at the level of a class of organisms. In Zhang and Cheng's work, the organisms take the form of UML state machine models representing part of an obstacle avoidance system. The changes are applied to a population, and a fitness function is used to select candidates to be further improved. Although Zhang and Cheng's approach was similar to ours in that it adapted models, it differed in the way in which such adaptation was accomplished. In particular, it used random variation and selection instead of GAIA's knowledge-intensive model-based approach.

Penix and Alexander [11] describe a formal approach to adaptation at the architectural level. In particular, they investigate two kinds of architectural adaptations. Augmentation adaptation is where a problem specification and a component specification are plugged into an architecture. The missing components are then derived using the constraints of the architecture specification and the added problem specification. Sub-component replacement adaptation replaces a component in an architecture that does not meet the problem specification. This is done by replacing the original system-level specification with the target-level specification and retaining all the original components except for the one being replaced. The Penix and Alexander approach is similar to ours in that it makes use of models and analyzes problems in terms of formal specifications. However, there is no description of applying their approach to agent adaptation. Also, there is no equivalent to our distinction between Goals and Mechanisms.

Georgeff and Lansky [12] describe the Procedural Reasoning System (PRS) that has some of the same capabilities as GAIA, though their project's goals and approaches are quite different from ours. PRS addresses the problem of adding reactivity to planning in autonomous robots operating in dynamic worlds. If the world is static, then autonomous robots can use a variety of methods, such as means-ends analysis or planners, to form plans for accomplishing goals. But when the world is dynamic, there needs to be the ability to react to changes in the world. To accomplish this, PRS represents the goals and knowledge states of the robot in the language of beliefs, desires and intentions. It then reasons about changes in the world and the goals and knowledge states of the robot to modify the robot's plans at run-time. PRS can achieve some of the same retrospective adaptations that we have described in this paper. If a planned action taken by an agent results in a failure due to changes in the world, then PRS can modify the action, thus making the planning reactive to changes in the world. The goals of our work are different: While PRS adapts a plan in response to changes in an agent's world, we instead seek to adapt the agent itself. From the behavioral perspective of observing the output behavior of the agent, this may make for only a modest difference. But from the cognitive perspective of the information processing inside the agent, the difference is significant: While PRS reasons about how to modify the agent's plan, our architecture reasons about how to diagnose and repair the agent itself.

Our work perhaps is most directly related to research on meta-reasoning, or reasoning about reasoning [13]. Much of the research on meta-reasoning for self-adaptation has used self-models of agents that help localize modifications to the agent design, e.g., [2][3][4][5][6][7][8]. We can trace several themes in model-based self-adaptation in intelligent agents. Firstly, self-adaptations can be retrospective [3][4][5][6][7][8], i.e., after the agent has executed an action in the world and received some feedback on the result, or proactive [2], i.e., when the agent is given a new goal related to but different from its original goal. The architecture described in this paper can handle both proactive and retrospective adaptations. Secondly, the architecture of self-adaptation, the language of the self-model and the taxonomy of failures can be domain-independent, e.g., [7], or they can be specific to classes of domains [6]. The architecture described here is specific to a class of agents, though we still need to formally characterize that class. Thirdly, the architecture for self-adaptation may [2][7][8] or may not [6] sanction the invocation of special-purpose reasoners such as planners. As we described in Experiment #3, our architecture supports the use of special-purpose planners.

8. Conclusions and Future Work

Earlier research on model-based reflection and self-adaptation in intelligent agents suggested [2][3][4][8] that (1) goal-based models of intelligent agents that capture the teleology of the agent design could help localize the changes to the agent design needed for classes of adaptations, and (2) hierarchical organization of the goal-based models of the agent designs helped make this localization efficient. The work reported here expands and extends earlier work on self-adaptation in two ways. Firstly, the agent specification language TMKL2 has a better-defined syntax and semantics than its predecessor. This adds clarity, precision and rigor. While many agent specification languages specify the goals, mechanisms, structure and domain knowledge of agents, TMKL2 explicitly organizes the agent's mechanisms and domain knowledge around its goals. Together, the goals and the mechanisms that achieve them specify the teleology of the agent's design. Secondly, while earlier work on model-based reflection and self-adaptation pertained to agents that operated in small, fully observable, deterministic, and largely static worlds, GAIA operates in large, complex, dynamic, partially observable and non-deterministic worlds. For example, earlier agents performed tasks such as route planning [3] and assembly planning [2]. In contrast, Freeciv is large, complex, dynamic, partially observable, and non-deterministic. It is important to investigate if the technique for model-based reflection and self-adaptation works in worlds such as Freeciv.

The four experiments in self-adaptation described here cover a small range of retrospective and proactive agent adaptations. They demonstrate that (i) it is possible in principle to design game-playing agents so that their teleology can be captured, specified and inspected, (ii) the specification of the teleology of the agent's design enables localization of the modifications needed for the four experiments in self-adaptation, and (iii) this self-adaptation in turn enables the agent to play an interactive game, monitor its behavior, adapt itself, play the game again, and so on.

The next steps in our work are to empirically investigate more adaptation scenarios, generalize to other classes of adaptations, and relate model-based self-adaptation to theories of self-adaptation. Another important step is to explore whether the meta-agent can adapt itself, thus completing the cycle of self-adaptation.

Acknowledgements

We thank Andrew Crowe, Joshua Jones, Chris Parnin, and Deepak Zambre for their contributions to this work. We are grateful to the US National Science Foundation for its support for this research through a Science of Design Grant (#0613744) entitled "Teleological Reasoning in Adaptive Software Design."

References

1. M. Molineaux and D. Aha. "TIELT: A Testbed for Gaming Environments". In Procs. 20th National Conference on Artificial Intelligence, pp. 1690-1691, 2005.

2. J. Murdock and A. Goel. "Meta-Case-Based Reasoning: Self-Improvement Through Self-Understanding". Journal of Experimental and Theoretical Artificial Intelligence, 20(1):1-36, 2008.

3. L. Birnbaum, G. Collins, M. Freed and B. Krulwich. "Model-Based Diagnosis Of Planning Failures". In Procs. 8th National Conference on Artificial Intelligence, pp. 318-23, 1990.

4. E. Stroulia and A. Goel. "Functional Representation and Reasoning in Reflective Systems". Journal of Applied Artificial Intelligence, 9(1):101-24, 1995.

5. E. Stroulia and A. Goel. "Evaluating PSMs In Evolutionary Design: The Autognostic Experiments". International Journal of Human-Computer Studies, 51(4):825-47, 1999.

6. S. Fox and D. Leake. "Introspective Reasoning For Index Refinement In Case-Based Reasoning". Journal of Experimental and Theoretical Artificial Intelligence, 13:63-88, 2001.

7. M. Anderson, T. Oates, W. Chong and D. Perlis. "The Metacognitive Loop I: Enhancing Reinforcement Learning with Metacognitive Monitoring and Control for Improved Perturbation Tolerance". Journal of Experimental and Theoretical Artificial Intelligence, 18(3):387-411, 2006.

8. J. Jones and A. Goel. "Perceptually Grounded Self-Diagnosis And Self-Repair Of Domain Knowledge". Knowledge-Based Systems, 27:281-301, 2012.

9. J. Kephart and D. Chess. "The Vision of Autonomic Computing". Computer, 36(1):41-50, 2003.

10. J. Zhang and B. Cheng. "Model-Based Development of Dynamically Adaptive Software". Procs. 28th International Conference on Software Engineering, pp. 371-80, 2006.

11. J. Penix and P. Alexander. "Toward Automated Component Adaptation". Procs. 9th International Conference on Software Engineering and Knowledge Engineering, pp. 535-42., 1997.

12. M. Georgeff and A. Lansky. "Reactive Reasoning and Planning". Procs. 6th National Conference on AI, pp. 677-82, 1987.

13. A. Goel and J. Jones. "Meta-Reasoning for Self-Adaptation in Intelligent Agents". In M. Cox and A. Raja (eds.), Meta-Reasoning: Thinking About Thinking, MIT Press, 2011.