Introduction to Artificial Inteligence

Assignment 4

Programming assignment - Decision-making under uncertainty

Potentially-hostile rock sample

Goals

Decision making under uncertainty using Bayes networks for reasoning and belief-state MDP for decision-making, on scenarios similar to the simplified Unreal Tournament. Part 1 is on just the probabilistic reasoning part, and part 2 is on sequential decision-making.

Potentially-hostile rock sample - Domain Description

Out for a refueling mission on planet Prytkon, our robot is trying to retrieve dilithium crystals from potential deposit sites mapped from space. However, observation from space is greatly imprecise, so that for some sites the survey information may be incorrect. To make matters worse, Dr. Sarg Calan, the ship's chief scientist, has theorised that a colony of living rocks consisting of anti-matter may have landed on the planet. Approaching a site containing such anti-matter deposits would cause an explosion, destroying the robot. Anti-matter has the same detected signature (for space observations) as dilithium crystal deposits.

Fortunately, the robot has sensors, rather costly to operate, that can provide additional information about whether a site contains dilithium crystals, anti-matter, or nothing (assuming that a site cannot contain both). The sensors are rather precise, at close range, fortunately beyond the danger point of proximity to the anti-matter, though...

Additional statistical information is available, on how likely it is that the Calan theory is correct (probability Pc), and the probability that a site contains dilithium or anti-matter, for both cases w.r.t. the correctness of the theory. For example, the probability of the deadly anti-matter deposits is very small, if Calan's theory is false.

We will be using the same simplified Unreal Tournament domain, modified as follows. There is only one moving agent. Initial location of sites is known and provided on the map. The agent can use a sensor in order to scan a site. The sensor output is ternary: D means dilithium crystals, A means anti-matter deposit, and 0 means neither.

Sensor model

We will assume that a sensor can target a single site precisely. However, the sensor is noisy, and returns a correct reading with probability p/(d+1), and a random incorrect reading (chosen uniformly from the other possibilities) with probability (1-p/(d+1)). Sensor readings are independent given the contents of the site.

The survey map

In your program, a file specifies the geometry (map), including the potential deposit sites (typically 2-20), and the probability Ps that a specific site s contains dilithium, for the case where Calan's theory is false (we will also assume that in this case, probability of antimatter is zero). If calan's theory is true, however, there is probability b that the site contains anti-matter, and probability only (1-b)Ps that the site contains dilithium. There will be several scenario types, requiring successively more complicated skills, defined below.

Part 1: Scenarios requiring pure probabilistic reasoning

In this type of environment, sensing actions are free, and all the measurements are made before any decision is made. Movement is also free, but once the agent commits to a set of sites, no change is possible. The agent should make the following computations:

What is the probability that Calan's theory is true, given the sensor readings?
What is the location where dilithium is most likely to be, given the observations (and with what posterior probability)?
What are the TWO locations that are most likely to BOTH have dilithium?

For simplicity, we will assume that the locations where the sensing actions are executed are given in the input. Input can be as an ASCII file, similar to maps in previous assignments, for example:

##################
#   3         ####
###   A####      #
# ### ##    ######
#s     2    D    #
#         #   9  #
# 1  s    #      #
##################

In the map, numbers represent potential dilithium sites, where the value indicates a prior probability of dilithium in that location, assuming Calan's theory is incorrect (e.g. 2 means probability 20% that the site contains dilithium). s is a location where a sensing action occured but returned 0, D is a location where a sensing action returned D. and A is a location where a sensing action returned A. Assume for simplicity that the sensor readings pertain to the site that is physically the closest one.

Requirements

Your program should read the data, including the parameters p, Ps, b, Pc. Then, the program should construct a Bayes network according to the scenario (and print it out), and then perform probabilistic reasoning in order to answer the questions on the locations most likely to have dilithium, and report on the answers, including the posterior probabilities. You may use any algorithm in the literature that supports solution of polytree BNs, including simple enumerarion, variable elimination, and polytree propagation.

The program should also allow for an output of the Bayes network constructed for the scenario, e.g.:

Location 1 (given Calan's theory being false): dilithium probability 0.9
Location 2 (given Calan's theory being false): dilithium probability 0.3

... etc.

Sensor reading 1 depends on location: 1
Probability table:
    P(D| antimatter) = 0.15
    P(A| antimatter) = 0.7
    P(0| antimatter) = 0.15

    P(D| nothing) = 0.15
    P(A| nothing) = 0.15
    P(0| nothing) = 0.7

    P(D| dilithium) = 0.7
    P(A| dilithium) = 0.15
    P(0| dilithium) = 0.15

Sensor reading 2 depends on location: 2
Probability table:
    P(D| antimatter) = 0.05
    P(A| antimatter) = 0.9
    P(0| antimatter) = 0.05

    P(D| nothing) = 0.05
    P(A| nothing) = 0.05
    P(0| nothing) = 0.9

    P(D| dilithium) = 0.9
    P(A| dilithium) = 0.05
    P(0| dilithium) = 0.05

Part 2: Scenarios requiring sequential decision making

(Now bonus version - can do simplified version below for non-bonus)

In this type of scenarios, the sensor result are not given a-priori. Movement has a cost Cm and sensing has a cost Cs, and the agent should decide how to move and where and when to make sensing actions. Reward for collecting dilithium is Rd and reward for "collecting" anti-matter and dying is Ra (usually negative). The actions available to the robot are to move one square in each of 4 directions, and to measure a site. At each step, the robot can eithr move or perform a measurement, but not both. Motion to an unblocked location succeeds with probability Pm (an input parameter). A failed motion results with equal probability to either: stay in the same place, or to a move perpendicular to the intended direction of movement. The environment in this case should be treated as a belief-state MDP. In order to make this part managanble, we will introduce additional assumptions as follows.

The robot is limited to at most 2 sensing actions.
There is a small number of locations (at most 2 for each site) where sensing operations can be made, provided in the input. This reduces the number of potential observations which can be made, and allows for a compact belief-state to be reresented in terms of observation histories.
Assume that Calan's theory is false.

Requirements

As before, your program should read the data, including the parameters p, Ps, b. You should then construct and optimize the finite horizon belief-state MDP (number of steps n also given as input to the program). Note that for probabilistic reasoning you will use the functions you constructed for part 1 of the assignment. You need to output the optimal policy, either directly or interactively - your choice.

Part 2: Simplified scenarios requiring sequential decision making

(Simplified, non-bonus version)

This one is the same as above except that the information in the sensing data is given in the input. That is, the agent cannot make any more measurements. Therefore, the problem is now a simple MDP, with the following state space: robot location, and status of sites (visited/not visited, and what was found there). That is, for each site the state is one of: Unvisited (U), Dilithium (D), Antimatter (A), Nothing (0). After visiting the site, the state of that site always changes to "nothing". Use the simple finite-horizon MDP equation to solve this problem, and print out the optimal policy for each state and time slice (an arrow head indicating direction of motion for each location).

Deliverables

Source code and executable files of programs.
Explanation of the method for constructing the BN and your reasoning algorithm.
Non-trivial example runs on at least 2 scenarios of each type, including the input and output.
Submit makefile and short description on how to run your program. i.e. how /what parameters are passed to the program and how other inputs (e.g. positions of the agents are known to your program) including at least one full example on how to run your program with actual input.

Deadline: January 8, 2007.

Postponed (conditionally) to January 21, as follows: part I to be submitted by January 11. Part II (simplified) by January 21. Part II (original) is a bonus).