Introduction to Artificial Inteligence

Assignments 4-5

Programming assignment - Decision-making under uncertainty

Hierarchical Rock Sample

Goals

Decision making under uncertainty using Bayes networks for reasoning and belief-state MDP for decision-making, on scenarios similar to the simplified Unreal Tournament. Part 1 is on just the probabilistic reasoning part, and part 2 is on sequential decision-making.

Hierarchical rock sample - Domain Description

Out for a refueling mission on planet Prytkon, our robot is trying to retrieve dilithium crystals from potential deposit sites mapped from space. However, observation from space is greatly imprecise. The information available from space is probabilistic: it can detect potential dilithium sites, but cannot determine with certainty whether they contain dilithium. In addition, the information from space can be used to partition the area into four uniform substrate sectors. Alionite, Barionite, Craptonite, and Dratdonite. Sites occuring in each substrate have a different probability of of actually containing dilithium. Unfortunately, although one can separate the sectors from space due to different colour, it is unknown with certainty which sector contains which substrate.

Fortunately, the robot has sensors, rather costly to operate, that can provide additional information about whether a site contains dilithium crystals. The sensors are rather precise, at close range.

Additional statistical information is available, on how likely it is that a site in a certain substrate contain dilithium, and the prior probability that each sector is a certain substate.

We will be using the same simplified Unreal Tournament domain, modified as follows. There is only one moving agent. Initial location of sites is known and provided on the map. Also, the set of sites is partitioned into sectors. The agent can use a sensor in order to scan a site. The sensor output is binary: 1 means dilithium crystals, 0 means no crystals.

Sensor model

We will assume that a sensor can target a single site. However, the sensor is noisy, and returns a correct reading with probability p, and an incorrect reading with probability p. Sensor readings are independent given the contents of the site. When the robot is within 5 (Euclidean distance) from a site, it has probability of correctness p=Pnear, and when it is farther away it has a probability of being correct p=Pfar.

The survey map

In your program, a file specifies the geometry (map), including the potential deposit sites (typically 2-20). Additionally, you should read from the user 4 numbers, one for each probability that a site contains dilithium, given the substrate. That is: P(dilithium at site i=true| substrate = Alionite) etc. There will be several scenario types, requiring successively more complicated skills, defined below.

Part 1: Scenarios requiring pure probabilistic reasoning

In this type of environment, sensing actions are free, and all the measurements are made before any decision is made. Movement is also free, but once the agent commits to a set of sites, no change is possible. The agent should make the following computations:

What is the distribution of substrates for each sector, given sensor readings?
What is the SINGLE location where dilithium is most likely to be, given the sensor readings (and with what posterior probability)?
What are the TWO locations that are most likely to BOTH have dilithium, given the sensor readings?

For simplicity, we will assume that the locations where the sensing actions are executed are given in the input. Input can be as an ASCII file, similar to maps in previous assignments, for example:

##################
#   1         ####
###   A####      #
# ### ##    ######
#2     1    3    #
#         #   3  #
# 2  2    #      #
##################

In the map, numbers represent potential dilithium sites, where the value indicates a sector number. A is the location of the agent. For example, the three sites marked "2" at the bottom left of the map are on the same (unknown) substrate. The distribution over substrate type for this region should be given as input. Likewise for the other regions (sites marked as "1" and sites markes as "3" respectively).

Requirements

Your program should read the data, including the distribution parameters. An example of the type of distribution provided for the above map (Program output questions/prompts in capitals, numbers are user input, comments begin with ;):

DISTRIBUTION FOR REGION 1?
0.1  0.3  0.4  0.2
; probabilities for substrate:
;  Alionite, Barionite, Craptonite, and Dratdonite, resp.
DISTRIBUTION FOR REGION 2?
0.9  0.1  0  0
DISTRIBUTION FOR REGION 3?
0.1  0.8  0  0.1

PROBABILITY OF DILITHIUM FOR REGION 1?
0.9 0.8 0 0.3
; Probabilities given each type of substrate.
; etc.

You should add nodes for potential sensor readings, up to 2 readings per site. Then, the program should construct a Bayes network according to the scenario (and print it out). Example:

Node for region 1, priors:
P([A,B,C,D]) = 0.1  0.3  0.4  0.2

Node for site at (2, 4) in region 1:
P(Dilithium = true | [A, B, C, D]) = 0.9 0.8 0 0.3

Reading node 1 for site at (2, 4):
P(true| dilithium = true) = 0.9
P(true| dilithium = false) = 0.1

;etc.

Note that your BN for each region is tree-shaped. You should then query the user for a set of sensor readings. An example of the type of sensor readings provided for the above map:

Site at (2, 4) reading: TRUE
Site at (2, 4) reading: FALSE
Site at (1, 8) reading: TRUE

Observe that readings should be independent given the value of the site. Up to 2 reading per site are allowed, and not all sites must have any readings at all. Now, perform probabilistic reasoning in order to answer the questions on the locations most likely to have dilithium, and report on the answers, including the posterior probabilities. You may use any algorithm in the literature that supports solution of BNs, including simple enumerarion, variable elimination, polytree propagation, or sampling.

The program should also allow for an output of the Bayes network constructed for the scenario.

Part 2: Scenarios requiring sequential decision making

(Now bonus version - can do simplified version below for non-bonus)

In this type of scenarios, the sensor result are not given a-priori. Movement has a cost Cm and sensing has a cost Cs, and the agent should decide how to move and where and when to make sensing actions. Reward for collecting dilithium is Rd and reward for each action is -1. The actions available to the robot are to move one square in each of 4 directions, and to measure a site. At each step, the robot can eithr move or perform a measurement, but not both. Motion to an unblocked location succeeds with probability Pm (an input parameter). A failed motion results with equal probability to either: stay in the same place, or to a move perpendicular to the intended direction of movement. The environment in this case should be treated as a belief-state MDP. In order to make this part managanble, we will introduce additional assumptions as follows.

The robot is limited to at most 2 sensing actions.
Any sensing actions must be made before the robot starts moving.

Requirements

As before, your program should read the data, including the parameters You should then construct and optimize the finite horizon belief-state MDP (number of steps n also given as input to the program). Note that for probabilistic reasoning you will use the functions you constructed for part 1 of the assignment. You need to output the optimal policy, either directly or interactively - your choice.

Part 2: Simplified scenarios requiring sequential decision making

(Simplified, non-bonus version)

This one is the same as above except that the information in the sensing data is given in the input. That is, the agent cannot make any more measurements. Therefore, the problem is now a simple MDP, with the following state space: robot location, and status of sites (visited/not visited, and what was found there). That is, for each site the state is one of: Unvisited (U), Dilithium (D), Nothing (0). After visiting the site, the state of that site always changes to "nothing". Use the simple finite-horizon MDP equation to solve this problem, and print out the optimal policy for each state and time slice (an arrow head indicating direction of motion for each location).

Deliverables

Source code and executable files of programs.
Explanation of the method for constructing the BN and your reasoning algorithm.
Non-trivial example runs on at least 2 scenarios of each type, including the input and output.
Submit makefile and short description on how to run your program. i.e. how /what parameters are passed to the program and how other inputs (e.g. positions of the agents are known to your program) including at least one full example on how to run your program with actual input.

Deadline: April 13, 2008.