Sequential decision making under uncertainty using a belief-state MDP: the Canadian traveller problem.
The domain is as described in previous assignments, except that the agent must now traverse the path without knowing whether an edge is icy until that edge has been observed from an adjacent vertex. In this version there is no salt in vertices, so an icy edge cannot be traversed at all.
Thus, in the current problem to be solved, we are given a weighted undirected graph in which each edge has a known probability of being icy (blocked). The probabilities of ice are mutually independent. Whether an edge is icy becomes known with certainty when the agent reaches one of the two vertices incident on that edge. The agent's only action is to traverse a non-icy edge, at a cost equal to the weight of the edge. Given a start vertex s and a goal vertex t, the problem is to find a policy that navigates from s to t with minimal expected cost. To make the problem well defined, we assume that there is one edge between s and t, typically with some large weight, that is known not to be icy.
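The observation rule above determines the transition model of the belief-state MDP: arriving at a vertex reveals all of its still-unknown incident edges, and each combination of icy and non-icy outcomes occurs with the product of the individual probabilities. The following is a minimal sketch of that step, not a prescribed implementation. The function name observe, the status labels "U", "O", "B", and the representation of an edge as a tuple (u, v, weight, p_ice) are assumptions made for illustration.

```python
from itertools import product

# Assumed labels for what the agent knows about each edge.
UNKNOWN, OPEN, BLOCKED = "U", "O", "B"


def observe(vertex, status, edges):
    """Enumerate (probability, new_status) pairs after arriving at `vertex`.

    edges  : list of (u, v, weight, p_ice) tuples (hypothetical layout)
    status : tuple with one label per edge, in the same order as `edges`
    """
    # Indices of edges incident on `vertex` whose state is still unknown.
    hidden = [i for i, (u, v, _, _) in enumerate(edges)
              if status[i] == UNKNOWN and vertex in (u, v)]
    outcomes = []
    # Ice probabilities are independent, so joint outcomes multiply.
    for icy in product([False, True], repeat=len(hidden)):
        prob = 1.0
        new_status = list(status)
        for i, is_icy in zip(hidden, icy):
            p_ice = edges[i][3]
            prob *= p_ice if is_icy else 1.0 - p_ice
            new_status[i] = BLOCKED if is_icy else OPEN
        if prob > 0.0:
            outcomes.append((prob, tuple(new_status)))
    return outcomes
```

A belief state can then be taken to be the pair (current vertex, status tuple). With k unknown edges there are at most 3^k status tuples per vertex, which is why only small instances are tractable.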
The graph can be provided as in previous assignments, for example:
#V 4 ; number of vertices n in graph (from 1 to n)
#E 1 2 W3 I0.1 ; Edge from vertex 1 to vertex 2, weight 3, ice probability 0.1
#E 2 3 W2 I0.8 ; Edge from vertex 2 to vertex 3, weight 2, ice probability 0.8
#E 3 4 W3 I0.3 ; Edge from vertex 3 to vertex 4, weight 3, ice probability 0.3
#E 2 4 W4 I0.3 ; Edge from vertex 2 to vertex 4, weight 4, ice probability 0.3
#E 1 4 W100 I 0 ; Edge from vertex 1 to vertex 4, weight 100, ice prob. 0
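One possible way to read this format is sketched below. It assumes one directive per line, treats everything after a semicolon as a comment, and tolerates optional whitespace after W and I (as in the last sample line). The names parse_graph and EDGE_RE are hypothetical, and the exact grammar of the format from previous assignments may differ.

```python
import re

# Assumed edge syntax: "#E <u> <v> W<weight> I<ice probability>".
EDGE_RE = re.compile(r"#E\s+(\d+)\s+(\d+)\s+W\s*([0-9.]+)\s+I\s*([0-9.]+)")


def parse_graph(text):
    """Return (n, edges), with edges as (u, v, weight, p_ice) tuples."""
    n = 0
    edges = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        if line.startswith("#V"):
            n = int(line.split()[1])
            continue
        match = EDGE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        u, v = int(match.group(1)), int(match.group(2))
        edges.append((u, v, float(match.group(3)), float(match.group(4))))
    return n, edges
```

Returning plain tuples keeps the result directly usable as dictionary keys or as input to a belief-state enumeration.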
The start and goal vertices should be determined via some form of user input (a file or querying the user). For example, in the above graph the start vertex could be 1, and the goal vertex could be 4.
The Canadian traveller problem is known to be PSPACE-complete, so you will be required to solve only very small instances. We will require that the entire belief space be stored in memory explicitly, and thus impose a limit of at most 10 unknown edges in the graph. Your program should initialize belief space value functions and use a form of value iteration (discussed in class) to compute the value function for the belief states. Maintain the optimal action during the value iteration for each belief state, so that you have the optimal policy at convergence.
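As an illustration of the computation described above, here is a hedged sketch of value iteration over belief states. It is one possible arrangement, not the required one: it enumerates only the belief states reachable from the start rather than the full belief space, it updates values in place, and it assumes positive edge weights and the guaranteed non-icy edge between s and t. All names (solve, outcomes, the "U"/"O"/"B" labels, the edge tuple layout) are assumptions made for this example.

```python
from itertools import product

U, O, B = "U", "O", "B"  # assumed labels: unknown, open, blocked


def outcomes(vertex, status, edges):
    """(probability, new_status) pairs after arriving at `vertex`."""
    hidden = [i for i, (a, b, _, _) in enumerate(edges)
              if status[i] == U and vertex in (a, b)]
    result = []
    for icy in product([False, True], repeat=len(hidden)):
        prob, new = 1.0, list(status)
        for i, is_icy in zip(hidden, icy):
            prob *= edges[i][3] if is_icy else 1.0 - edges[i][3]
            new[i] = B if is_icy else O
        if prob > 0.0:
            result.append((prob, tuple(new)))
    return result


def solve(edges, start, goal, tol=1e-9, max_sweeps=10_000):
    """Return (expected cost from start, value table, policy table)."""
    # Edges with ice probability 0 or 1 are known from the outset.
    init = tuple(O if p == 0 else B if p == 1 else U for *_, p in edges)
    first = outcomes(start, init, edges)
    # succ[state] lists (edge cost, next vertex, observation outcomes).
    succ = {}
    stack = [(start, s) for _, s in first]
    while stack:
        state = stack.pop()
        if state in succ:
            continue
        v, status = state
        succ[state] = []
        if v == goal:
            continue  # terminal: no moves, value stays 0
        for i, (a, b, w, _) in enumerate(edges):
            if status[i] == O and v in (a, b) and a != b:
                nxt = b if v == a else a
                outs = outcomes(nxt, status, edges)
                succ[state].append((w, nxt, outs))
                stack.extend((nxt, s) for _, s in outs)
    value = {state: 0.0 for state in succ}
    policy = {}
    for _ in range(max_sweeps):
        delta = 0.0
        for state, moves in succ.items():
            if not moves:
                continue
            # Bellman backup: edge cost plus expected value after observing.
            best, arg = min(
                (w + sum(p * value[(nxt, s)] for p, s in outs), nxt)
                for w, nxt, outs in moves)
            delta = max(delta, abs(best - value[state]))
            value[state], policy[state] = best, arg
        if delta < tol:
            break
    expected = sum(p * value[(start, s)] for p, s in first)
    return expected, value, policy


sample = [(1, 2, 3, 0.1), (2, 3, 2, 0.8), (3, 4, 3, 0.3),
          (2, 4, 4, 0.3), (1, 4, 100, 0.0)]
expected, value, policy = solve(sample, start=1, goal=4)
print(round(expected, 4))  # 39.3904 for the sample graph with s=1, t=4
```

The policy table maps each belief state (vertex, status tuple) to the neighbouring vertex to move to next. On the sample graph, for instance, when the edge from 1 to 2 is observed to be open at the start, the stored action is to move to vertex 2 rather than take the weight-100 edge.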
Your program should read the data, including the parameters (start and goal vertices). You should construct the policy, and present it in some way. Provide at least the following types of output:
Deadline: January 10, 2010.