In the second exercise you will use the environment simulator from the first assignment, the EU refugee problem, as a platform for implementing intelligent adversarial agents. The environment is the same as before, except that we now assume exactly one opposing agent: an intelligent agent, rather than a dumb automaton such as the greedy Hungarian police agent.
As before, the environment consists of an undirected weighted graph. This time, a refugee agent has a start location and a goal location. For a refugee agent, there are state variables designating its current location and the amount of food it is carrying.
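For concreteness, here is one possible Python representation of this state. This is a minimal sketch only; the names (Graph, RefugeeState, WorldState, and the individual fields) are illustrative choices, not part of the assignment.

```python
from dataclasses import dataclass, field

# Adjacency representation: vertex -> {adjacent vertex: edge weight}
Graph = dict[int, dict[int, int]]

@dataclass
class RefugeeState:
    location: int          # current vertex
    goal: int              # goal vertex
    food: int              # units of food carried
    alive: bool = True
    score: int = 0         # accumulated cost (edge weights, death penalty)

@dataclass
class WorldState:
    graph: Graph
    refugee: RefugeeState
    police_location: int
    food_packages: dict[int, int] = field(default_factory=dict)  # vertex -> units
    moves_made: int = 0    # completed rounds, counted against the horizon T
```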
An agent can apply 2 types of action: traverse (like move in the standard CTP) and no-op. The semantics of the actions are as in assignment 1, repeated below. The no-op action does nothing, except that it consumes 1 unit of food if the agent is a refugee (a refugee dies if its turn arrives while it is carrying no food and there is no aid package at its node). The result of a traverse action (from the current vertex V to a specific adjacent vertex U) is as follows. A traverse operation by the Hungarian police always succeeds. A traverse operation by a refugee is successful if there are no Hungarian police at U and a sufficient amount of food is carried: at least equal to the weight of the edge. The amount of food being carried is reduced by the weight of the edge, and any food packages at U are automatically picked up. If the traversal is unsuccessful, the refugee dies. The scoring in this game is slightly different from assignment 1: the cost of moving is always equal to the weight of the edge, for both police and refugees, and the cost of dying (for a refugee) is 1000, rather than infinite.
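A sketch of the refugee transition under these rules, reusing the hypothetical WorldState above. Note that consuming a package at the refugee's own node to avoid death is an assumption; the text only specifies automatic pick-up at U.

```python
DEATH_COST = 1000  # cost for dying, per the assignment's scoring

def apply_refugee_action(world, action, target=None):
    """Refugee transition sketch; `target` must be adjacent for a traverse."""
    r = world.refugee
    if r.food == 0:
        if world.food_packages.get(r.location, 0) > 0:
            # Assumption: a package at the node is consumed to avoid death.
            r.food += world.food_packages.pop(r.location)
        else:
            r.alive = False                  # no food, no package: refugee dies
            r.score += DEATH_COST
            return
    if action == "no-op":
        r.food -= 1                          # no-op consumes one unit of food
        return
    w = world.graph[r.location][target]      # traverse along edge (V, U)
    if target == world.police_location or r.food < w:
        r.alive = False                      # unsuccessful traversal is fatal
        r.score += DEATH_COST
        return
    r.food -= w                              # carried food drops by the edge weight
    r.score += w                             # moving costs the edge weight
    r.location = target
    r.food += world.food_packages.pop(target, 0)  # automatic pick-up at U
```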
Note that in this assignment the agents can be completely adversarial or semi-cooperating, as discussed below. We also assume a user-defined horizon T: the game stops after T moves by each agent, or when all refugees reach their goal or die, whichever comes first.
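The termination condition might then look like the following sketch (a single-refugee version, using the hypothetical WorldState fields above):

```python
def is_terminal(world, T):
    """Game over after T moves by each agent, or when the refugee
    reaches its goal or dies, whichever comes first (sketch)."""
    r = world.refugee
    return world.moves_made >= T or not r.alive or r.location == r.goal
```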
The simulator should query the user for the type of game (see below) as well as the other initialization parameters: the horizon T, and the position and goals of each agent.
After the above initialization, the simulator should run each agent in turn, perform the actions returned by the agents, and update the world accordingly. Additionally, the simulator should be capable of displaying the world status after each step, including the state of each agent and its score. There are two types of agent programs: human (i.e. read input from the keyboard) and game-tree search agent.
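A minimal sketch of such a loop, assuming the transition and termination helpers above; apply_police_action, the display callable, and the (action, target) return convention are all illustrative interfaces, not prescribed by the assignment.

```python
def apply_police_action(world, action, target=None):
    """Police transition (sketch): a traverse always succeeds."""
    if action == "traverse":
        world.police_location = target

def run_simulation(world, refugee_agent, police_agent, T, display=print):
    """Main loop sketch: run each agent in turn, apply its action, and
    display the world after every step. Agents return (action, target)."""
    while not is_terminal(world, T):
        for agent, apply_action in ((refugee_agent, apply_refugee_action),
                                    (police_agent, apply_police_action)):
            action, target = agent(world)   # agents observe the full world state
            apply_action(world, action, target)
            display(world)                  # world status, agent states, scores
            if is_terminal(world, T):
                break
        world.moves_made += 1               # one full round toward the horizon
```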
Each agent program (a function) works as follows. The agent is called by the simulator with a set of observations, and returns a move to be carried out in the current world state. The agent is allowed to keep an internal state if needed. In this assignment, the agents can observe the entire state of the world. You should support the following types of games:

1. Adversarial (zero-sum game): each agent aims to maximize its own score minus the opposing agent's score. This game can be implemented using minimax search with alpha-beta pruning.
2. Semi-cooperative: each agent tries to maximize its own score and disregards the score of the other agent, except that ties are broken cooperatively.
3. Fully cooperative: both agents aim to maximize the sum of their individual scores.
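The two agent programs might then look like the following sketch; the (action, target) convention matches the simulator loop above, and search_best_move is a hypothetical entry point defined in the alpha-beta sketch below.

```python
def human_agent(world):
    """Keyboard agent (sketch): expects e.g. 'traverse 3' or 'no-op'."""
    tokens = input("enter action: ").split()
    if tokens[0] == "no-op":
        return "no-op", None
    return "traverse", int(tokens[1])

def game_tree_agent(world):
    """Game-tree search agent (sketch): delegates move selection to a
    depth-limited search such as the alpha-beta sketch below."""
    return search_best_move(world)
```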
Since the game tree will usually be too big to reach terminal positions in the search, you should also implement a cutoff and a heuristic static evaluation function for each game. You may use the same heuristic for all games if you think this is justified.
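For the adversarial game, the cutoff and static evaluation could be wired into a standard depth-limited alpha-beta search, sketched below. Here successors, moves_and_children, and shortest_path_cost are hypothetical helpers, the depth and T defaults are illustrative, and the heuristic itself is only one possible choice.

```python
import math

def evaluate(world):
    """One possible static evaluation, from the refugee's point of view
    (lower is better): cost accrued so far plus an optimistic estimate
    of the cost still to come. shortest_path_cost is a hypothetical
    helper (e.g. Dijkstra, ignoring the police)."""
    r = world.refugee
    if not r.alive:
        return r.score                        # already includes the 1000 penalty
    return r.score + shortest_path_cost(world.graph, r.location, r.goal)

def alpha_beta(world, depth, alpha, beta, police_turn, T):
    """Depth-limited minimax with alpha-beta pruning for the adversarial
    game: the police maximize the refugee's cost, the refugee minimizes
    it. successors(world, police_turn) is a hypothetical generator of
    child world states."""
    if depth == 0 or is_terminal(world, T):
        return evaluate(world)                # cutoff: fall back to the heuristic
    if police_turn:                           # maximizing player
        value = -math.inf
        for child in successors(world, police_turn):
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False, T))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                         # beta cutoff
        return value
    value = math.inf                          # refugee: minimizing player
    for child in successors(world, police_turn):
        value = min(value, alpha_beta(child, depth - 1, alpha, beta, True, T))
        beta = min(beta, value)
        if beta <= alpha:
            break                             # alpha cutoff
    return value

def search_best_move(world, depth=6, T=100):
    """Pick the refugee move whose resulting state minimizes the search
    value; moves_and_children(world) is a hypothetical generator of
    (move, child state) pairs."""
    move, _ = min(moves_and_children(world),
                  key=lambda mc: alpha_beta(mc[1], depth - 1, -math.inf,
                                            math.inf, True, T))
    return move
```

For the semi-cooperative and fully cooperative games a single scalar value is not enough; one common approach is to propagate a pair of individual scores up the tree and let each node's owner apply its own selection criterion (own score with cooperative tie-breaking, or the sum of the two scores).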
Deliverables: the program and code, sent to the grader by e-mail or otherwise as specified by the grader; a printout of the code and results; example scenarios where the optimal behavior differs for the 3 kinds of games (you will need to make the example scenarios very small in order to be able to reach terminal states in the search); and a description of your heuristic evaluation functions and their rationale. Set up a time for frontal grading of the delivered assignment, in which both members of each team must demonstrate at least some familiarity with their program.
Due date: Thursday, December 10.