Introduction to Artificial Intelligence

Assignment 6


Written assignment: probabilistic inference and learning

Justify all answers!

1) We have the following distribution over the binary variables A, B, C, given as a full table:

   A     B     C     probability
  -------------------------------
   F     F     F       0.81
   F     F     T       0
   F     T     F       0
   F     T     T       0.09
   T     F     F       0
   T     F     T       0.09
   T     T     F       0.01
   T     T     T       0

a)   Construct the Bayes network that most compactly describes the distribution (i.e. one that has
     the smallest number of arcs). 

b) Is the answer unique?
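
   A quick way to check your construction is to test candidate independences
   numerically, straight from the table. A minimal Python sketch (the variable
   order A, B, C and the probabilities are copied from the table above):

       from itertools import product

       # Joint distribution P(A, B, C); keys are (A, B, C) with F=0, T=1.
       P = {(0,0,0): 0.81, (0,0,1): 0.00, (0,1,0): 0.00, (0,1,1): 0.09,
            (1,0,0): 0.00, (1,0,1): 0.09, (1,1,0): 0.01, (1,1,1): 0.00}

       def marginal(keep):
           # Marginalize P onto the variable positions in `keep` (0=A, 1=B, 2=C).
           m = {}
           for assignment, p in P.items():
               key = tuple(assignment[i] for i in keep)
               m[key] = m.get(key, 0.0) + p
           return m

       def independent(i, j, tol=1e-12):
           # Marginal independence test: P(i, j) = P(i) * P(j) everywhere?
           pij, pi, pj = marginal((i, j)), marginal((i,)), marginal((j,))
           return all(abs(pij[(x, y)] - pi[(x,)] * pj[(y,)]) < tol
                      for x, y in product((0, 1), repeat=2))

       for i, j, name in ((0, 1, "A,B"), (0, 2, "A,C"), (1, 2, "B,C")):
           print(name, "independent:", independent(i, j))

   A pair that comes out independent needs no arc between those variables; the
   dependences that remain constrain where arcs are still required.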

2) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        B               A
        C               B E
        D               A
        E               B D
        F               D

    a) Is this network a poly-tree?
    b) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({B}, {D} | {})
       2)  I({B}, {D} | {A})
       3)  I({B}, {D} | {A, C})
       4)  I({B}, {F} | {A})
       5)  I({B}, {F} | {A, C})
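
       The five statements above can be checked mechanically with the standard
       ancestral-moral-graph procedure: keep only the ancestors of the queried
       sets, moralize (connect parents sharing a child, drop arc directions),
       delete the conditioning nodes, and test connectivity. A Python sketch,
       with the parent lists copied from the table above:

           parents = {'A': [], 'B': ['A'], 'C': ['B', 'E'],
                      'D': ['A'], 'E': ['B', 'D'], 'F': ['D']}

           def ancestors(nodes):
               seen, stack = set(), list(nodes)
               while stack:
                   n = stack.pop()
                   if n not in seen:
                       seen.add(n)
                       stack.extend(parents[n])
               return seen

           def d_separated(xs, ys, zs):
               keep = ancestors(set(xs) | set(ys) | set(zs))
               # Moralize: undirected parent-child edges, plus "married" parents.
               adj = {n: set() for n in keep}
               for n in keep:
                   for p in parents[n]:
                       adj[n].add(p); adj[p].add(n)
                   for p in parents[n]:
                       for q in parents[n]:
                           if p != q:
                               adj[p].add(q)
               # Delete conditioning nodes, then test reachability from X to Y.
               seen, stack = set(), [x for x in xs if x not in set(zs)]
               while stack:
                   n = stack.pop()
                   if n not in seen and n not in set(zs):
                       seen.add(n)
                       stack.extend(adj[n])
               return not (seen & set(ys))

           for zs in ([], ['A'], ['A', 'C']):
               print('I({B},{D} |', zs, ') =', d_separated(['B'], ['D'], zs))
           for zs in (['A'], ['A', 'C']):
               print('I({B},{F} |', zs, ') =', d_separated(['B'], ['F'], zs))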
    c) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.4, P(D=true|A=true) = 0.5, P(D=true|A=false) = 0.1,
       P(E=true|D=true) = 0.5, P(E=true|D=false) = 0.2

       Find P(A=true | D=true, F=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.
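
       The hint amounts to noting that F's only parent is D, so once D is
       observed F adds no information about A, and E together with its
       descendants is barren for this query; what remains is one application
       of Bayes' rule. A sketch, using only the numbers given above:

           p_a = 0.4                              # P(A=true)
           p_d_given_a = {True: 0.5, False: 0.1}  # P(D=true | A)

           # F depends only on D, hence P(A | D=true, F=true) = P(A | D=true).
           num = p_d_given_a[True] * p_a                # P(D=true, A=true)
           den = num + p_d_given_a[False] * (1 - p_a)   # P(D=true)
           print(num / den)                             # 0.2 / 0.26, about 0.769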

3) You need to construct a 5-input neural network whose output is 1 exactly
   when the number of its inputs that are on is either 0
   or 3 (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?
   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)
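
      To experiment with part b), it helps to have a brute-force checker that
      compares a candidate network with the target on all 2^5 inputs. The
      sketch below does that for one possible two-layer construction (hidden
      units that count how many inputs are on; these particular weights and
      thresholds are an illustrative choice, not the only valid answer):

          from itertools import product

          def unit(weights, threshold, inputs):
              # Threshold element: fires iff the weighted sum reaches threshold.
              return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

          def target(x):
              return 1 if sum(x) in (0, 3) else 0

          def network(x):
              h1 = unit([1]*5, 1, x)   # fires iff at least 1 input is on
              h2 = unit([1]*5, 3, x)   # fires iff at least 3 inputs are on
              h3 = unit([1]*5, 4, x)   # fires iff at least 4 inputs are on
              # Output: all-off (h = 0,0,0) or exactly three on (h = 1,1,0).
              return unit([-1, 2, -3], 0, [h1, h2, h3])

          assert all(network(x) == target(x) for x in product((0, 1), repeat=5))
          print("network matches target on all 32 inputs")

      For part a), the same checker makes it easy to test any single-unit
      candidate you suspect might work.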

4) a) Using the perceptron learning rule (the "delta" rule), show the steps
      of learning the 2-input NAND function with a single perceptron
      and a step activation function with initial threshold 1 (note
      that the threshold can also be learned as a weight). Initially all
      other weights are -0.9, and the learning rate is 0.5.
      Note: in this case use the constant 1 instead of the derivative
      of the g() function.

   b) Show (by example) that the order of presenting the examples can
      change the number of training episodes.
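
      A short simulation can generate the update steps for part a) and exhibit
      the order effect of part b). A sketch (convention assumed here: the unit
      fires when the weighted sum is >= 0, and the threshold is learned as a
      weight on a constant -1 input, per the note above):

          def train(examples, lr=0.5):
              w = [1.0, -0.9, -0.9]        # [threshold, w1, w2], as specified
              epoch = 0
              while True:
                  epoch += 1
                  mistakes = 0
                  for x1, x2, t in examples:
                      x = [-1, x1, x2]     # the -1 input carries the threshold
                      y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
                      if y != t:           # delta rule, with g' replaced by 1
                          w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                          mistakes += 1
                  if mistakes == 0:
                      return w, epoch

          NAND = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
          print(train(NAND))         # one presentation order
          print(train(NAND[::-1]))   # reversed order: a different epoch count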


5)  Consider the following taxation scenario, occurring in a certain middle-eastern country.*
    We will assume that you are risk-neutral, i.e. that
    your utility function is linear in the amount of money that your assets are worth.
    Assume also that moral considerations are not part of your utility function.
 
    Your accountant has notified you that your taxes for FY 2006-2007 amount to 1,000,000 NIS.
    You have the following options:
       1) Declare the correct taxes verbatim, and pay the full amount.
       2) Claim that you have used much of the money to fund
          scientific experiments on monkeys, in which case you will only need to
           pay 200,000 NIS. However, a senior tax collection evaluator will then assess
           your forms; some of these evaluators have militant friends in the ASPCM*, who
           will trash your house, costing you an additional 1,500,000 NIS.

    There are 3 possible evaluators, equally likely to be on your case:

    Evaluator A has no friends in the ASPCM.

    Evaluator B has a militant friend in the ASPCM, but likes the
       colour green: if he sees a sufficient amount of it (say
       $30,000 = 100,000 NIS), then with probability 0.5 he will
       forget that monkeys have anything to do with animals (and
       otherwise he will take the money and still tell his friend).

    Evaluator C will always report to his militant ASPCM friends.

     a) Which of the above taxation options should you prefer as a rational agent?

     b) Suppose that a "friend" of the accountant working at the treasury department can give you a "hint"
        about which evaluator will be on your case, if you grease his palm.
        How much is this information worth, given that your "friend"
        is perfectly reliable in this respect?

     Draw the decision diagram for each case, and solve it.
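
     The arithmetic behind both diagrams can be checked with a few lines of
     expected-cost bookkeeping (amounts in NIS; utility is minus cost). One
     reading of the scenario is assumed here: under option 2 you learn which
     evaluator was assigned before deciding whether to show evaluator B the
     100,000 NIS.

         TAX, REDUCED, TRASH, BRIBE = 1_000_000, 200_000, 1_500_000, 100_000

         def option2_cost(evaluator, bribe):
             cost = REDUCED + (BRIBE if bribe else 0)
             if evaluator == 'A':
                 return cost                      # no ASPCM friends
             if evaluator == 'B':
                 p_trash = 0.5 if bribe else 1.0  # green works half the time
                 return cost + p_trash * TRASH
             return cost + TRASH                  # C always reports

         # a) no hint: best bribe policy per evaluator, averaged over A, B, C
         eu2 = sum(min(option2_cost(e, True), option2_cost(e, False))
                   for e in 'ABC') / 3
         print("option 1 cost:", TAX, " option 2 expected cost:", eu2)

         # b) perfect hint: pick the best option separately per evaluator
         informed = sum(min(TAX, option2_cost(e, True), option2_cost(e, False))
                        for e in 'ABC') / 3
         print("value of the hint:", min(TAX, eu2) - informed)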



6) Consider the following belief-state MDP, where the score is the total value of rewards obtained in 3 steps.
   The world has 4 locations in a linear sequence, as follows:  F1 I F2 P
   The actions are: R, which attempts to move right; L, which attempts to
   move left; and S, which senses location F2. Location F1 contains a flag
   with a value of 1. Location F2 contains a flag, with value either 10 or
   100, with probability 0.5 each.

   A robot visiting a location with a flag gets a reward equal to the
   flag value, and then the flag disappears. P is an absorbing state,
   with a reward of -100.
   Transition probabilities: a motion action succeeds with probability 0.8;
   with probability 0.1 the robot stays in place, and with probability 0.1 it
   moves in the opposite direction. If the opposite move is impossible (at
   either end of the sequence), the robot stays in place with probability 0.2.
   Sensing, if performed, tells the robot the actual value of the flag
   at F2. 

   a) Formalize the above problem as a (belief-state) MDP, i.e. what are the states, the transition probability
      function, and the reward function?

   b) Find the optimal policy. You may assume that the robot starts
      at location I in order to save some computation.
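
      Once the formalization from the first part is written down explicitly,
      the optimal-policy question is a three-step finite-horizon backup. A
      generic value-iteration sketch is given below; the two-state MDP at the
      bottom is only a hypothetical placeholder to make it runnable, and
      should be replaced by your own belief states (robot location, flags
      remaining, knowledge about F2's value), transitions, and rewards:

          def finite_horizon_vi(states, actions, P, R, horizon):
              # V[s]: best expected total reward from s with t steps to go.
              V = {s: 0.0 for s in states}
              policy = {}
              for t in range(1, horizon + 1):
                  newV = {}
                  for s in states:
                      qs = {a: sum(p * (R(s, a, s2) + V[s2]) for s2, p in P(s, a))
                            for a in actions(s)}
                      best = max(qs, key=qs.get)
                      newV[s], policy[t, s] = qs[best], best
                  V = newV
              return V, policy

          # Hypothetical placeholder MDP (replace with the real formalization):
          states = ['s0', 's1']
          actions = lambda s: ['stay', 'go']
          P = lambda s, a: [('s1', 0.8), (s, 0.2)] if a == 'go' else [(s, 1.0)]
          R = lambda s, a, s2: 10.0 if (s == 's0' and s2 == 's1') else 0.0

          V, policy = finite_horizon_vi(states, actions, P, R, horizon=3)
          print(V, policy)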

Deadline: Wednesday, April 9, 2008, at 12 noon.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, or people, is unintentional.