Introduction to Artificial Inteligence

Assignments 4-5

Programming assignment - Decision-making under uncertainty

Russian Traveler Problem - DRAFT


Decision making under uncertainty using Bayes networks for reasoning and belief-state MDP for decision-making, on scenarios similar to the simplified Unreal Tournament environment of assignment 2. Part 1 (assignment 4) is on just the probabilistic reasoning part, and part 2 (assignment 5) is on sequential decision-making.

Russian Traveller Problem - Domain Description

(This is an obscure variant of the well-known Canadian Traveller problem.) Out for a supply detail in the russian steppes, our robot is trying to deposit supplies in a set of known sites (flag locations). A map of the area is supplied, but some of the area may be icy, causing the robot to skid, so that it cannot change its motion direction, as in assignments 1 and 2. The map contains the location where icy patches may reside, but whether these locations are actually icy is unknown until the robot actually steps on them.

We will be using the same simplified Unreal Tournament domain, modified as follows. There is only one moving agent. Initial location of flags and potential icy patches is known and provided on the map. Also, the set of potential icy patches is partitioned into different climate regions, within which the climate is similar (same probability of ice), but which may be different between them. For each climactic region statistical information about climate conditions is available, we will assume for simlicity that each region can be either Warm (W), Cold (C), or Frigid (D). The climate conditions for the regions in turn depend on the season, which could be either winter (W) or summer (S), but which are the same for all regions. Neither the season nor the climate conditions are directly observable. In addition, the agent can use a sensor in order to scan a site. The sensor output is binary: 1 means ice, 0 means no ice.

Sensor model

We will assume that a sensor can target a single location. However, the sensor is noisy, and returns a correct reading with probability p, and an incorrect reading with probability 1-p. Sensor readings are independent given the contents of the site. When the robot is within 5 (Euclidean distance) from a site, it has probability of correctness p=Pnear, and when it is farther away it has a probability of being correct p=Pfar.

The survey map

In your program, a file specifies the geometry (map), including the icy patches (typically 2-20). Additionally, you should read from the user numbers, one for each probability a site is actually icy for each of the possible climate conditions (W, C, F), a number designating the prior probablity over the season, and the distribution of climate given the season for each region. There will be several scenario types, requiring successively more complicated skills, defined below.

Part 1: Scenarios requiring pure probabilistic reasoning

In this type of environment, sensing actions are free, and all the measurements are made before any decision is made. Movement is also free, but once the agent commits to a set of sites, no change is possible. The agent should make the following computations:

  1. What is the distribution of ice for each region, given sensor readings?
  2. What is the SINGLE location where ice is most likely to be, given the sensor readings (and with what posterior probability)?
  3. What are the TWO locations that are most likely to BOTH have ice, given the sensor readings?

For simplicity, we will assume that the locations where the sensing actions are executed are given in the input. Input can be as an ASCII file, similar to maps in previous assignments, for example:

#   1         ####
###   A####      #
# ### ##    ######
#2     1    3    #
#         #   3  #
# 2  2    #      #

In the map, numbers represent potential icy patches, where the value indicates a region number. A is the location of the agent. For example, the three sites marked "2" at the bottom left of the map are in the same region, region 2. The distribution over climate conditions for this region should be given as input. Likewise for the other regions (sites marked as "1" and sites markes as "3" respectively).


Your program should read the data, including the distribution parameters. An example of the type of distribution provided for the above map (Program output questions/prompts in capitals, numbers are user input, comments begin with ;):


0.1  0.8 0.1
; probabilities climate conditions Warm, Cold, Frigid
0.1  0.2 0.7
; probabilities climate conditions Warm, Cold, Frigid

0.3  0.3 0.4
; probabilities climate conditions Warm, Cold, Frigid
0.05 0.05 0.9
; probabilities climate conditions Warm, Cold, Frigid

0.9 0.1 0
; probabilities climate conditions Warm, Cold, Frigid
0.7  0.2 0.1
; probabilities climate conditions Warm, Cold, Frigid

PROBABILITY OF ICE for Warm climate?
PROBABILITY OF ICE for Cold climate?
PROBABILITY OF ICE for Frigid climate?
You should add nodes for potential sensor readings, up to 2 readings per site. Then, the program should construct a Bayes network according to the scenario. The program should also allow for an output of the Bayes network constructed for the scenario. For example (a partial printout):
P(Season) = (0.1, 0.9)    ; Winter, summer

P(Region1|Winter) = (0.1, 0.2, 0.7)   ; (W, C, F)
P(Region1|Summer) = (0.1, 0.8, 0.1)   ; (W, C, F)

P(Region2|Winter) = (0.05, 0.05, 0.9)   ; (W, C, F)
P(Region2|Summer) = (0.3, 0.3. 0.4)   ; (W, C, F)

; Etc. for all regions

P(Site11 | Region1 = W) = (0.1, 0.9)     ; Ice, no ice
P(Site11 | Region1 = C) = (0.2, 0.8)     ; Ice, no ice
P(Site11 | Region1 = F) = (0.9, 0.1)     ; Ice, no ice

P(Site12 | Region1 = W) = (0.1, 0.9)     ; Ice, no ice
P(Site12 | Region1 = C) = (0.2, 0.8)     ; Ice, no ice
P(Site12 | Region1 = F) = (0.9, 0.1)     ; Ice, no ice

; etc. note that conditional probabilities for all sites are the same,
; but conditioned on values at different regions

P(Meas1 | Site12 = Ice) = (0.9, 0.1)    ;  Reading (1,0) resp. 0.9 = Pnear
P(Meas1 | Site12 = no Ice) = (0.1, 0.9)    ;  Reading (1,0) resp.

P(Meas2 | Site12 = Ice) = (0.7, 0.3)    ;  Reading (1,0) resp. 0.7 = Pfar
P(Meas2 | Site12 = no Ice) = (0.3, 0.7)    ;  Reading (1,0) resp.

; etc.

Note that your BN for each region is tree-shaped. You should then query the user for a set of sensor readings. An example of the type of sensor readings provided for the above map:

Site at (2, 4) reading: 1
Site at (2, 4) reading: 0
Site at (1, 8) reading: 1

Observe that readings should be independent given the value of the site. Up to 2 reading per site are allowed, and not all sites must have any readings at all. Now, perform probabilistic reasoning in order to answer the questions on the locations most likely to have ice, and report on the answers, including the posterior probabilities. You may use any algorithm in the literature that supports solution of BNs, including simple enumerarion, variable elimination, polytree propagation, or sampling.

Part 2: Scenarios requiring sequential decision making

(Bonus version - can do simplified version below for a SMALL bonus)

In this type of scenarios, the sensor result are not given a-priori. Movement has a cost Cm and sensing has a cost Cs, and the agent should decide how to move and where and when to make sensing actions. Reward for reaching a supply site (flag) is Rd and reward for each motion action is -1. The actions available to the robot are to move one square in each of 4 directions, and to measure a site. At each step, the robot can either move or perform a sensor reading, but not both. The environment in this case should be treated as a belief-state MDP. In order to make this part managanble, we will introduce additional assumptions as follows.


As before, your program should read the data, including the parameters You should then construct and optimize the finite horizon belief-state MDP (number of steps n also given as input to the program). Note that for probabilistic reasoning you will use the functions you constructed for part 1 of the assignment. You need to output the optimal policy, either directly or interactively - your choice.

Part 2: Simplified scenarios requiring sequential decision making

(Simplified, small-bonus version)

This one is the same as above except that the information in the sensing data is given in the input. That is, the agent cannot make any more measurements. Therefore, the problem is now a simple MDP, with the following state space: robot location, and status of sites (visited/not visited, and what was found there). That is, for each site the state is one of: Unvisited (U), Ice (I), No ice (0). Use the simple finite-horizon MDP equation to solve this problem, and print out the optimal policy for each state and time slice (an arrow head indicating direction of motion for each location).


  1. Source code and executable files of programs.
  2. Explanation of the method for constructing the BN and your reasoning algorithm.
  3. Non-trivial example runs on at least 2 scenarios of each type, including the input and output.
  4. Submit makefile and short description on how to run your program. i.e. how /what parameters are passed to the program and how other inputs (e.g. positions of the agents are known to your program) including at least one full example on how to run your program with actual input.

Deadline: February 19, 2009.