Introduction to Artificial Intelligence

Assignment 2


Adversarial games for agents in the Israeli traveller problem

In the second exercise you will be using the environment simulator from the first assignment, the Israeli traveller problem, as a platform for implementing intelligent adversarial agents. The environment is the same as before, except that now we will assume exactly one opposing agent: an intelligent agent, rather than a simple automaton such as the firefighter or pyromaniac agents.

Game Environment

As before, the environment consists of an undirected weighted graph. Each edge can be either blocked by a fire or clear. Some vertices contain water reservoirs. The environment can contain one or more agents, each with a known starting location and a required goal location. For each agent, there are state variables designating its current location and whether it is carrying water.
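
For concreteness, here is a minimal sketch of one possible state encoding in Python; the names (WorldState, AgentState) and field choices are our own illustrative assumptions, not part of the assignment:

    from dataclasses import dataclass

    # Illustrative state encoding; names and fields are our own assumptions.
    @dataclass
    class AgentState:
        location: int         # current vertex
        carrying_water: bool  # whether the agent holds water
        goal: int             # required goal vertex
        cost: float = 0.0     # accumulated cost of this agent's moves

    @dataclass
    class WorldState:
        edges: dict      # (u, v) -> weight, with u < v, for each undirected edge
        blocked: set     # edges currently blocked by fire
        reservoirs: set  # vertices whose water reservoir is still full
        agents: list     # one AgentState per agent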

An agent can apply four types of action: drive (like move in the standard CTP), pickup water, start a fire, and no-op. The result of a drive action (from a current vertex V to a specific adjacent vertex U) is as follows. If the edge E to be traversed is clear, the agent reaches U, at a cost equal to the weight of the edge, or twice that weight if the agent is carrying water. If E is blocked by a fire and the agent is carrying water, the agent traverses E and reaches U, with the side effect that the fire is extinguished and the water disappears. However, if E is blocked and the agent is not carrying water, the action fails and the agent remains at V. In either of these cases the action incurs a cost equal to the weight of the edge. The pickup action leaves the agent at the same vertex, causes it to be carrying water, and causes the reservoir at the node to become empty. This action has a cost Cpickup specified by the user. The start action allows an agent situated at a node to start a fire on any adjacent edge; as a result, the agent moves along that edge. This action has a cost Cstart. The no-op action does nothing and costs a small positive constant epsilon.
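
As a sketch only, the drive rules above might be implemented like this, using the illustrative WorldState/AgentState encoding from before (we read "in either of these cases" as: a blocked edge costs its plain weight whether or not the traversal succeeds):

    def apply_drive(world, agent, u):
        # Drive the agent from its current vertex toward adjacent vertex u.
        # Returns the cost incurred and mutates world/agent in place.
        v = agent.location
        e = (min(v, u), max(v, u))
        w = world.edges[e]                   # weight of the edge V--U
        if e not in world.blocked:           # clear edge: traversal succeeds
            agent.location = u
            return 2 * w if agent.carrying_water else w
        if agent.carrying_water:             # blocked, water puts out the fire
            world.blocked.discard(e)
            agent.carrying_water = False     # the water disappears
            agent.location = u
            return w
        return w                             # blocked, no water: stay at v, pay w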

Note that in this assignment the agents can be completely adversarial or semi-cooperating, as discussed below. We will also assume that a user-defined horizon T exists: the game stops after T moves by each agent. An agent that has not achieved its goal after this number of steps incurs a penalty of F units (use 100 as the default penalty). An agent that has reached its goal can only execute no-op actions.
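
A hedged sketch of how the horizon and penalty might enter an agent's final score (the helper name and constants are our own assumptions):

    PENALTY = 100.0   # default penalty F
    EPSILON = 0.01    # illustrative value for the no-op cost

    def final_score(agent, moves_made, horizon):
        # Accumulated cost, plus the penalty F if the goal was not reached
        # within the horizon T.
        if moves_made >= horizon and agent.location != agent.goal:
            return agent.cost + PENALTY
        return agent.cost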

Implementation Steps

The simulator should query the user about the parameters: the type of game (see below), as well as the other initialization parameters for each agent (Cpickup, Cstart, position, and goal).

After the above initialization, the simulator should run each agent in turn, performing the actions returned by the agents and updating the world accordingly. Additionally, the simulator should be capable of displaying the world status after each step, with the appropriate state of the agents and their score. There are two types of agent programs: human (i.e. read input from the keyboard) and game tree search agent.
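
A minimal turn-taking loop the simulator might use, assuming each agent program is a callable mapping the (fully observable) world state to an action; apply_action and display are hypothetical helpers you would supply:

    def run_game(world, agent_programs, horizon):
        # Alternate moves between the agents until the horizon T is reached.
        for step in range(horizon):
            for i, program in enumerate(agent_programs):
                action = program(world, i)                    # full state is observable
                apply_action(world, world.agents[i], action)  # hypothetical dispatcher
                display(world, step)                          # hypothetical status printout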

Each agent program (a function) works as follows. The agent is called by the simulator with a set of observations. The agent returns a move to be carried out in the current world state. The agent is allowed to keep an internal state if needed. In this assignment, the agents can observe the entire state of the world. You should support the following types of games:
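
For instance, a human agent program can be as simple as the following sketch (the textual action encoding is our own assumption); an agent that needs internal state can be a class with a __call__ method instead of a plain function:

    def human_agent(world, me):
        # Read a move from the keyboard, e.g. 'drive 3', 'pickup', 'start 3', 'noop'.
        return input(f"agent {me} move> ").split()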

  1. A zero-sum game. The score of an agent is the cost of its moves (including the penalty, if applicable) minus the cost of the moves of the opposing agent ("optimal" agent using minimax, with alpha-beta pruning; see the sketch after this list).
  2. A non-zero-sum game: same as above, but the agent score is just the cost of its own moves, including the penalty if applicable ("optimal" agent using maximax). Ties are broken cooperatively.
  3. A fully cooperative game: the agents aim to minimize the total cost of reaching their goals.
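
For the zero-sum game, a standard alpha-beta minimax sketch follows; is_terminal, successors, and evaluate are hypothetical helpers you would supply, with evaluate being the heuristic static evaluation applied at the cutoff depth:

    import math

    def alphabeta(state, depth, alpha, beta, maximizing):
        # Game value of state for the searching agent in the zero-sum game.
        if depth == 0 or is_terminal(state):
            return evaluate(state)           # heuristic static evaluation at cutoff
        if maximizing:
            value = -math.inf
            for child in successors(state):
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:            # beta cutoff: opponent avoids this branch
                    break
            return value
        value = math.inf
        for child in successors(state):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:                # alpha cutoff
                break
        return value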

Since the game tree will usually be too big to reach terminal positions in the search, you should also implement a cutoff and a heuristic static evaluation function for each game. You may use the same heuristic for all games, if you think this is justified.
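
One plausible evaluation ingredient, sketched under the illustrative WorldState encoding above: the cheapest path weight from an agent to its goal, ignoring fires and water. Since every successful traversal costs at least the edge weight, this is a lower bound on the agent's real remaining driving cost:

    import heapq

    def remaining_cost_estimate(world, agent):
        # Dijkstra from the agent's location to its goal over the raw edge weights.
        dist = {agent.location: 0.0}
        heap = [(0.0, agent.location)]
        while heap:
            d, v = heapq.heappop(heap)
            if v == agent.goal:
                return d
            if d > dist.get(v, float('inf')):
                continue                     # stale heap entry
            for (a, b), w in world.edges.items():
                if v in (a, b):
                    u = b if v == a else a
                    if d + w < dist.get(u, float('inf')):
                        dist[u] = d + w
                        heapq.heappush(heap, (d + w, u))
        return 100.0                         # goal unreachable: expect the penalty F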

Deliverables

The program and code should be sent to the grader, by e-mail or otherwise as specified by the grader, together with a printout of the code and results. You need to show example scenarios where the optimal behavior differs between the kinds of games (you will need to make the example scenarios very small in order to be able to reach terminal states in the search). Include a description of your heuristic evaluation functions and their rationale. Finally, set up a time for frontal grading of the delivered assignment, in which both members of each team must demonstrate at least some familiarity with their program.

Due date: Monday, December 10.