Sequential decision making under uncertainty using a belief-state MDP: the Canadian traveller problem.

The domain description is as in previous assignments, except that now the agent has to traverse the path without knowing whether edges are flooded unless they have been observed from an adjacent vertex.

Thus, in the current problem to be solved, we are given a weighted undirected graph, where each edge has a known probability of being flooded (blocked). The flood probabilities are mutually independent. Whether an edge is flooded becomes known with certainty when the agent reaches one of the two vertices incident on that edge. The agent's only actions are to traverse an edge, at a cost equal to the weight of the edge divided by the speed of the car, or to transfer to another car at the same vertex, at cost 1. Given a start vertex s and a goal vertex t, the problem is to find a policy that navigates from s to t with minimal expected cost.
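Concretely, a belief state can pair the agent's location (and current car) with a ternary status for each unknown edge. The representation below is one illustrative choice, not part of the assignment specification:

```python
from itertools import product

# A belief state combines the agent's vertex and current car with the
# status of each uncertain edge: 'U' (unknown), 'F' (flooded), 'C' (clear).
# With k unknown edges there are 3**k status combinations, so the full
# belief space for n vertices and m cars has n * m * 3**k states.

def enumerate_belief_states(num_vertices, num_cars, num_unknown_edges):
    """Yield every (vertex, car, edge_statuses) triple."""
    for v in range(1, num_vertices + 1):
        for car in range(num_cars):
            for statuses in product('UFC', repeat=num_unknown_edges):
                yield (v, car, statuses)
```

With the assignment's limit of at most 8 unknown edges and 2 cars, the belief space stays small enough to store explicitly (at most n * 2 * 3**8 states).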

The graph can be provided as in previous assignments, for example:

#V 4 ; number of vertices n in graph (from 1 to n)
#C 1 100 50 ; Car at vertex 1, speed 100, speed on flooded road 50
#C 2 200 0 ; Car at vertex 2, speed 200, speed on flooded road 0 (cannot traverse flooded roads)
#E 1 2 W300 F0.1 ; Edge from vertex 1 to vertex 2, weight 300, flood probability 0.1
#E 2 3 W200 F0.8 ; Edge from vertex 2 to vertex 3, weight 200, flood probability 0.8
#E 3 4 W300 F0.3 ; Edge from vertex 3 to vertex 4, weight 300, flood probability 0.3
#E 2 4 W400 F0.3 ; Edge from vertex 2 to vertex 4, weight 400, flood probability 0.3
#E 1 4 W1000 F0 ; Edge from vertex 1 to vertex 4, weight 1000, flood prob. 0
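A sketch of how this input format might be parsed (function and field names are illustrative, not required):

```python
def parse_graph(text):
    """Parse the assignment's input format; ';' starts a comment."""
    graph = {'n': 0, 'cars': [], 'edges': []}
    for raw in text.splitlines():
        line = raw.split(';')[0].strip()  # drop comments and blank lines
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == '#V':
            graph['n'] = int(tokens[1])
        elif tokens[0] == '#C':
            # vertex, normal speed, speed on flooded roads (0 = cannot cross)
            graph['cars'].append((int(tokens[1]), int(tokens[2]), int(tokens[3])))
        elif tokens[0] == '#E':
            u, v = int(tokens[1]), int(tokens[2])
            w = int(tokens[3][1:])    # strip leading 'W'
            p = float(tokens[4][1:])  # strip leading 'F'
            graph['edges'].append((u, v, w, p))
    return graph
```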

The start and goal vertices should be determined via some form of user input (a file or querying the user). For example, in the above graph the start vertex could be 1, and the goal vertex could be 4.

The Canadian traveller problem is known to be PSPACE-complete, so you will be required to solve only very small instances. We will require that the entire belief space be stored in memory explicitly, and thus impose a limit of at most 8 unknown edges in the graph and at most 2 cars. Your program should initialize the belief-space value function and use a form of value iteration (discussed in class) to compute the value of each belief state. Maintain the optimal action for each belief state during value iteration, so that at convergence you have the optimal policy.
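A generic sketch of the required value-iteration loop. The interface here (`states`, `goals`, `actions`) is illustrative; your belief-MDP construction supplies the actual states and transition model:

```python
def value_iteration(states, goals, actions, eps=1e-9):
    """Asynchronous value iteration for an undiscounted expected-cost
    (stochastic shortest path) MDP over an explicit state set.

    actions(s) returns a list of (label, cost, outcomes) triples, where
    outcomes is a list of (probability, successor) pairs. States from
    which the goal is unreachable keep value infinity.
    """
    INF = float('inf')
    V = {s: (0.0 if s in goals else INF) for s in states}
    policy = {}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in goals:
                continue
            for label, cost, outcomes in actions(s):
                # Skip backups through states not yet assigned a finite value.
                if any(V[t] == INF for _, t in outcomes):
                    continue
                q = cost + sum(p * V[t] for p, t in outcomes)
                if q < V[s] - eps:
                    V[s], policy[s] = q, label
                    changed = True
    return V, policy
```

The loop keeps sweeping until no Bellman backup improves any state, and records the minimizing action as it goes, which yields the optimal policy at convergence.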

Your program should read the data, including the parameters (start and goal vertices). You should construct the policy, and present it in some way. Provide at least the following types of output:

- A full printout of the value of each belief state and the optimal action in that belief state, if one exists. (If the state is irregular, e.g. unreachable, print an indication of that.)
- Run a sequence of simulations. That is, generate a graph instance (flood locations) according to the flood distributions, and run the agent through this graph based on the (optimal) policy computed by your algorithm. Display the graph instance and sequence of actions. Allow the user to run additional simulations, for a newly generated instance in each case.
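Each simulation starts by sampling a concrete graph instance from the flood probabilities; a minimal sketch, assuming edges are stored as (u, v, weight, flood probability) tuples:

```python
import random

def sample_instance(edges, rng=random):
    """Sample one concrete flooded/clear assignment for the graph.
    Floods are drawn independently, per the problem statement."""
    return {(u, v): rng.random() < p for u, v, _, p in edges}
```

The agent is then walked through the sampled instance by repeatedly looking up the optimal action for its current belief state and updating the belief as edges incident to its vertex are revealed.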

In addition, submit the following:

- Source code and executable files of programs.
- Explanation of the method employed in your algorithm.
- Non-trivial example runs on at least 2 scenarios, including the input and output.
- A makefile and a short description of how to run your program.

To get the bonus, you must extend the CTP assignment, as follows:

- Bonus 1: as in the exam, cars not at the initial vertex are at an unknown location, uniformly distributed among a given list of vertices. A car is detected with certainty once the agent arrives at its vertex. By implication, if the agent arrives at a vertex where a car MAY be but the car is NOT seen there, the agent can reach the appropriate conclusions about where the car is likely to be. For the input, use the following format; for example, to indicate that there is a car at vertex 1, 2, or 4 with equal probability, speed 100 (speed 50 on flooded roads):

#C (1, 2, 4) 100 50
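The "appropriate conclusions" amount to a Bayesian update of the uniform prior over the listed vertices; a sketch (the helper name is illustrative):

```python
from fractions import Fraction

def update_car_belief(possible_vertices, visited_vertex, seen):
    """Update a uniform car-location belief after observing one vertex.
    Returns a dict mapping vertex -> posterior probability."""
    if seen:
        # The car was observed, so its location is now certain.
        return {visited_vertex: Fraction(1)}
    # The car was not seen: renormalize uniformly over the other candidates.
    remaining = [v for v in possible_vertices if v != visited_vertex]
    return {v: Fraction(1, len(remaining)) for v in remaining}
```

For the example above, visiting vertex 1 without seeing the car leaves it at vertex 2 or 4 with probability 1/2 each.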

- Bonus 2: add remote sensing actions for some EDGES. This is a new type of action, available for some edges, with an indicated cost (time, in our variant of CTP); as a result of the action, the agent knows with certainty whether the edge is flooded. To indicate these costs, use input like:

#SE 1 2 4 ; Sensing the edge from vertex 1 to vertex 2 takes 4 hours
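In value iteration, sensing shows up as one more Bellman backup per sensed edge: pay the sensing cost, then branch on the revealed status. A sketch (hypothetical helper, where the two values come from the refined belief states):

```python
def sensing_q_value(cost, p_flood, value_if_flooded, value_if_clear):
    """Q-value of remotely sensing an edge: incur the sensing cost, then
    land in the belief state where the edge's status is known."""
    return cost + p_flood * value_if_flooded + (1 - p_flood) * value_if_clear
```

Sensing is worthwhile exactly when this Q-value beats the best Q-value of acting without the information.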