Introduction to Artificial Intelligence

Assignment 2


Adversarial games for agents in the Syrian traveller problem

In the second exercise you will use the environment simulator from the first assignment, the Syrian traveller problem, as a platform for implementing intelligent adversarial agents. The environment is the same as before, except that we now assume exactly one opposing agent - an intelligent agent, rather than a dumb automaton such as the terrorist buster.

Game Environment

As before, the environment consists of an undirected weighted graph. Each edge can be either blocked by terrorists or clear. Some vertices contain chems. The environment can contain one or more agents, each with a known starting location, and there is a set of known goal locations where chems can be dumped. For each agent, there is a state variable designating its current location.
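For concreteness, a minimal sketch of one possible state representation follows (in Python; all names here are illustrative choices, not part of the assignment specification):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Edge:
        u: int              # one endpoint vertex
        v: int              # other endpoint vertex
        weight: float       # traversal cost (the edge weight)
        blocked: bool       # True if blocked by terrorists

    @dataclass
    class State:
        agent_pos: dict     # agent id -> current vertex
        chem_at: set        # vertices currently holding chems
        carrying: dict      # agent id -> True if carrying chems
        escorted: dict      # agent id -> True if moving with a military escort
        goals: set          # goal vertices where chems can be dumped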

An agent can apply two types of action: drive (like move in the standard CTP) and no-op. The semantics of the actions are as in assignment 1, repeated below. The result of a drive action (from a current vertex V to a specific adjacent vertex U) is as follows. If the edge E to be traversed is clear, the agent reaches U. Side effects depend on the parameters: driving with chems moves the chems to U, and driving with escort also moves the military unit to U. The escort also makes a blocked edge clear, but only if the agent is not also carrying chems. If E is blocked by terrorists and the agent is escorted, the agent successfully traverses E and reaches U, with the side effect that the chems and the military escort forces are also at U. If the agent traverses a blocked edge carrying chems but unescorted, the terrorists obtain the chems and hell breaks loose! Finally, the agent can traverse a blocked edge if not carrying chems (these terrorists only want the chems).

The cost of actions is as follows: no-op has a small cost epsilon determined by the user; drive has a cost equal to the edge weight, doubled for each true parameter (and thus quadrupled for driving escorted with chems). However, driving unescorted with chems across a blocked edge has a cost HUGE determined by the user (the cost of hell breaking loose...).
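As a sanity check on these cost rules, here is a hedged sketch of the per-action cost (the function name is ours; epsilon and HUGE are the user-determined parameters from the text):

    def action_cost(action, edge_weight, carrying_chems, escorted,
                    edge_blocked, epsilon, HUGE):
        """Cost of a single action under the rules above (illustrative)."""
        if action == "no-op":
            return epsilon
        # drive: unescorted chems across a blocked edge -> hell breaks loose
        if edge_blocked and carrying_chems and not escorted:
            return HUGE
        cost = edge_weight
        if carrying_chems:
            cost *= 2       # doubled when carrying chems
        if escorted:
            cost *= 2       # doubled again when escorted (x4 with both)
        return cost

For example, driving escorted with chems over a clear edge of weight 3 would cost 3 * 2 * 2 = 12.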

Note that in this assignment the agents can be completely adversarial or semi-cooperating, as discussed below. We also assume that a user-defined horizon T exists: the game always stops after T moves by each agent, or when there are no more chems in the game, whichever comes first. Any agent reaching a goal with a chem gets a reward R, and the rewards accumulate additively if there is more than one chem in the game.
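A termination check consistent with this rule might look like the following sketch (it assumes a per-agent move counter and the chem bookkeeping from the state sketch above):

    def is_terminal(state, moves_made, T):
        """True when the horizon is reached or no chems remain in play."""
        no_chems = not state.chem_at and not any(state.carrying.values())
        horizon = all(m >= T for m in moves_made.values())
        return no_chems or horizon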

Implementation Steps

The simulator should query the user about the parameters: the type of game (see below), as well as the other initialization parameters for each agent (epsilon, T, position, goals).
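One way the initialization dialogue might look (a sketch only; the prompts and function name are illustrative):

    def read_parameters():
        """Query the user for the global game parameters (illustrative)."""
        game_type = input("Game type (zero-sum / non-zero-sum / cooperative): ")
        epsilon = float(input("No-op cost epsilon: "))
        HUGE = float(input("Penalty HUGE for losing chems to terrorists: "))
        T = int(input("Horizon T (moves per agent): "))
        R = float(input("Reward R for dumping a chem at a goal: "))
        # per-agent start positions and goal vertices would be read similarly
        return game_type, epsilon, HUGE, T, R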

After the above initialization, the simulator should run each agent in turn, performing the actions returned by the agents and updating the world accordingly. Additionally, the simulator should be capable of displaying the world status after each step, with the appropriate state of the agents and their score. There are two types of agent programs: human (i.e., reads input from the keyboard) and game-tree search agent.
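Putting these pieces together, the main loop could be sketched as below, reusing the is_terminal sketch above; apply_action and display are hypothetical helpers, and each agent program is simply a callable from the observed state to a move:

    def run_game(state, agents, T):
        """Run agents in turn until the horizon or no chems remain (sketch)."""
        moves_made = {agent_id: 0 for agent_id in agents}
        while not is_terminal(state, moves_made, T):
            for agent_id, agent_program in agents.items():
                action = agent_program(state)                  # ask agent for a move
                state = apply_action(state, agent_id, action)  # hypothetical helper
                moves_made[agent_id] += 1
                display(state)                                 # hypothetical helper
        return state

A human agent program under this interface could be as simple as:

    def human_agent(state):
        """Human agent: reads the next move from the keyboard (sketch)."""
        print("Current state:", state)
        choice = input("Action ('no-op', or a vertex number to drive to): ")
        return "no-op" if choice == "no-op" else ("drive", int(choice))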

Each agent program (a function) works as follows. The agent is called by the simulator, together with a set of observations. The agent returns a move to be carried out in the current world state. The agent is allowed to keep an internal state if needed. In this assignment, the agents can observe the entire state of the world. You should support the following types of games:

  1. A zero-sum game. The self-score of an agent is the sum of earned rewards minus the cost of its moves (including the penalty if applicable). The total score of an agent is its self-score minus the self-score of the opposing agent. Here you should implement an "optimal" agent using minimax with alpha-beta pruning (a sketch appears after this list).
  2. A non-zero-sum game: the same as above, but an agent's score consists only of its self-score. Ties are broken cooperatively.
  3. A fully cooperative game: the agents aim to maximize the sum of the self-scores of both agents.
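For the zero-sum game, a depth-limited minimax with alpha-beta pruning along the following lines is one possible starting point; successors, evaluate, and is_terminal_state are placeholders you must supply:

    def alpha_beta(state, depth, alpha, beta, maximizing):
        """Depth-limited minimax with alpha-beta pruning (sketch)."""
        if depth == 0 or is_terminal_state(state):   # cutoff or true terminal
            return evaluate(state), None             # heuristic static evaluation
        best_move = None
        if maximizing:
            value = float("-inf")
            for move, child in successors(state):
                child_value, _ = alpha_beta(child, depth - 1, alpha, beta, False)
                if child_value > value:
                    value, best_move = child_value, move
                alpha = max(alpha, value)
                if alpha >= beta:
                    break                            # prune: min player avoids this
        else:
            value = float("inf")
            for move, child in successors(state):
                child_value, _ = alpha_beta(child, depth - 1, alpha, beta, True)
                if child_value < value:
                    value, best_move = child_value, move
                beta = min(beta, value)
                if beta <= alpha:
                    break                            # prune: max player avoids this
        return value, best_move

The top-level call would be alpha_beta(state, cutoff_depth, float("-inf"), float("inf"), True). The other two game types need a different backed-up value (a pair of scores rather than a single zero-sum value), so plain alpha-beta does not apply to them unchanged.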

Since the game tree will usually be too big to reach terminal positions in the search, you should also implement a cutoff and a heuristic static evaluation function for each game. You may use the same heuristic for all games if you think this is justified.
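As an illustration only, one untuned shape such an evaluation function might take is below; the self_score bookkeeping and shortest_path are hypothetical helpers, and R is the goal reward:

    def evaluate_state(state, agent_id, graph, R):
        """Heuristic static evaluation: self-score so far plus an optimistic
        estimate of the reward still reachable (illustrative sketch)."""
        score = state.self_score[agent_id]    # rewards earned minus costs so far
        if state.carrying.get(agent_id):
            # optimistic: assume the nearest goal is reached at shortest-path cost
            d = min(shortest_path(graph, state.agent_pos[agent_id], g)
                    for g in state.goals)     # hypothetical helper
            score += R - d
        return score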

Deliverables

Send the program and code to the grader, by e-mail or otherwise as specified by the grader, along with a printout of the code and results. You need to show example scenarios where the optimal behavior differs across the different kinds of games (you will need to make the example scenarios very small in order to be able to reach terminal states in the search). Include a description of your heuristic evaluation functions and their rationale. Set up a time for frontal grading of the delivered assignment, in which both members of each team must demonstrate at least some familiarity with their program.

Due date: Monday, November 25.