Sequential decision making under uncertainty using belief-state MDP for decision-making: the Hurrucane Evacuation problem. (This is an obscure variant of the Canadian traveler problem.)
The domain description is similar to that described in assignment 1, except that again we do not know the locations of the blocakges. For simplicity, however, we will assume that the blockages occur independently, with a known given probability. They are revealed with certainty when the agent reaches a neigbouring vertex. We will also assumes that the number of evacuees at each vertex is known, and is always less than 5.
Thus, in the current problem to be solved, we are given a weighted undirected graph, where each edge has a known probability of being blocked. These distributions are jointly independent. The agent's only actions are traveling between vertices. Traversal times are the weight of the edge (that is, K=0). Also for simplicity, we will assume only one agent, starting at s, and only one shelter at vertex t, The problem is to find a policy that saves (in expectation) as many people as possible before the deadline.
The graph can be provided in a manner similar to previous assignments, for example:
#V 5 ; number of vertices n in graph (from 1 to n) #V 3 Ev2 ; Vertex 3, has 3 evacuees #V 4 Ev1 ; Vertex 4, has 1 evacuee etc. #E1 1 2 W3 ; Edge from vertex 1 to vertex 2, weight 3 #E2 2 3 W2 ; Edge from vertex 2 to vertex 3, weight 2 #E3 3 4 W3 B0.3 ; Edge from vertex 3 to vertex 4, weight 3, probability of blockage 0.3 #E4 4 5 W1 ; Edge from vertex 4 to vertex 5, weight 1 #E5 2 4 W4 ; Edge from vertex 2 to vertex 4, weight 4 #Deadline 10 #Start 1 #Shelter 5
The start and goal (shelter) vertices, should be determined via some form of user input (as in the file, or querying the user). For example, in the above graph the start vertex is 1, and the goal (shelter) vertex is 5.
The Canadian traveller problem is known to be PSPACE-complete (and it is likely that the uncertain Hurricane Evacuation problem is also PSPACE-complete) so you will be required to solve only very small instances. We will require that the entire belief space be stored in memory explicitly, and thus impose a limit of at most 10 vertices with and at most 10 possible blockages. Your program should initialize belief space value functions and use a form of value iteration (discussed in class) to compute the value function for the belief states. Maintain the optimal action during the value iteration for each belief state, so that you have the optimal policy at convergence.
Your program should read the data, including the parameters (start and shelter vertices). You should construct the policy, and present it in some way. Provide at least the following types of output:
Official deadline: January 6, 2019