Introduction to Artificial Intelligence

Assignment 6 (Solutions, *under construction*)


Written assignment: planning, probabilistic inference, and learning

Justify all answers!

1) We are given the following independence statements about random variables
   A, B, C, D, E, F:

      I({A,B,C,D},{E,F}|{})
      I({A},{C,D}|{B})
      I({D},{A,B}|{C})
      
      not I({A},{B}|{})
      not I({B},{C}|{})
      not I({C},{D}|{})
      not I({E}, {F}|{})


a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

ANSWER: The first independence statement means there are no edges between any of {A,B,C,D} and any of {E,F}
so we have two separate components. 

In the {E, F} component, due to the last dependence statement,
there is a path between E and F, and as there are only 2 nodes there, this means an edge between
E and F (say the directed edge from E to F).

In the {A,B,C,D} component there is a path from A to B, from B to C, and from C to D that does not
contain colliders ("converging" nodes), due to the dependence statements. However, due to the 2nd and 3rd
independence statements, there is no edge from A to C or D, and respectively no edge from A or B to D.
Since this part of the graph is connected, A must have an edge with B, D must have an edge with C,
and B must have an edge with C. This means that this part of the graph is a chain A,B,C,D with no
colliders: B is not a collider because B blocks the path A-B-C (2nd statement), and likewise C is not a
collider (3rd statement). So one possible way to orient this part is the directed chain A to B to C to D.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with all the statements?

ANSWER: Not unique. The edge between E and F can be oriented in either direction.
  Also, the path A,B,C,D can be oriented in any way that creates no colliders, so essentially any of the
  nodes A,B,C,D can be the "root", with edges directed away from it, and the rest of this component
  is then determined uniquely.
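
A small sketch (in Python, ours and not part of the original answer) that enumerates the orientations of the
chain A-B-C-D and keeps only the collider-free ones; it confirms there are exactly 4 such orientations, one per
choice of "root". Together with the 2 orientations of the E-F edge this gives 8 possible BNs.

# Sketch: count collider-free orientations of the chain A-B-C-D.
from itertools import product

chain = [('A', 'B'), ('B', 'C'), ('C', 'D')]
valid = []
for flips in product((False, True), repeat=3):
    edges = [(v, u) if f else (u, v) for (u, v), f in zip(chain, flips)]
    indeg = {n: 0 for n in 'ABCD'}
    for _, v in edges:
        indeg[v] += 1
    if max(indeg.values()) <= 1:        # indegree 2 on the chain = a collider
        valid.append(edges)
print(len(valid), "collider-free orientations")   # prints 4
for e in valid:
    print(e)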

2) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        B               none
        C               A
        D               A
        E               C B
        G               D B
        H               E

    a) Is this network a poly-tree?
ANSWER: No, due to undirected cycle ACEBGDA

    b) Is this network (directed-path) singly connected?
ANSWER: Yes, the only simple undirected cycle contains four direction changes, and thus there is no
        pair of nodes with more than one directed path between them.

    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER: No, path DACE is unblocked (A is a non-evidence diverging node, and C is a non-evidence pass-through node, so neither blocks the path).
       2)  I({D}, {E} | {A})
ANSWER: Yes, path DACE is now blocked by A (diverging evidence node), and path DGBE is blocked by G (non-evidence, childless, converging "barren" node)
       3)  I({D}, {E} | {A, G})
ANSWER: No, now path DGBE has converging evidence node G and non-evidence diverging node B, so neither blocks the path.
       4)  I({B}, {E} | {D})
ANSWER: No, there is a direct edge from B to E.
       5)  I({B}, {C} | {})
ANSWER: Yes, path BGDAC is blocked by barren node G, and path BEC is blocked by barren node E.
       6)  I({B}, {C} | {G})
ANSWER: No, path BGDAC is no longer blocked by G; D and A are non-evidence pass-through and diverging nodes, respectively, so the path is unblocked.
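
The six d-separation answers above can also be checked mechanically. Below is a minimal Python sketch
(ours, not from the course material) using the equivalent moralized ancestral graph criterion; the graph
is the parent table given in this question.

# Sketch: d-separation via the moralized ancestral graph criterion.
from itertools import combinations

parents = {
    'A': [], 'B': [],
    'C': ['A'], 'D': ['A'],
    'E': ['C', 'B'], 'G': ['D', 'B'],
    'H': ['E'],
}

def ancestors(nodes):
    """All nodes in `nodes` plus their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents[n])
    return result

def d_separated(xs, ys, zs):
    keep = ancestors(set(xs) | set(ys) | set(zs))
    # Build the moralized, undirected ancestral graph.
    edges = set()
    for child in keep:
        ps = [p for p in parents[child] if p in keep]
        for p in ps:
            edges.add(frozenset((p, child)))        # original edges
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))            # "marry" co-parents
    # Remove evidence nodes, then test connectivity from xs to ys.
    live = keep - set(zs)
    adj = {n: set() for n in live}
    for e in edges:
        u, v = tuple(e)
        if u in live and v in live:
            adj[u].add(v); adj[v].add(u)
    frontier, seen = [x for x in xs if x in live], set()
    while frontier:
        n = frontier.pop()
        if n in seen:
            continue
        seen.add(n)
        frontier.extend(adj[n])
    return not (set(ys) & seen)

# The six statements of part (c); True means independent (d-separated).
print(d_separated({'D'}, {'E'}, set()))          # False  (1)
print(d_separated({'D'}, {'E'}, {'A'}))          # True   (2)
print(d_separated({'D'}, {'E'}, {'A', 'G'}))     # False  (3)
print(d_separated({'B'}, {'E'}, {'D'}))          # False  (4)
print(d_separated({'B'}, {'C'}, set()))          # True   (5)
print(d_separated({'B'}, {'C'}, {'G'}))          # False  (6)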

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.2, P(B=true)= 0.1, P(E=true|C=false) = 0.5,
       P(E=true|C=true) = 0.5, 
       P(H=true|E=true) = 0.2, P(H=true|E=false) = 0.4,
       P(C=true|A=true) = 0.9, P(C=true|A=false) = 0.2
       P(D=true|A=true) = 0.8, P(D=true|A=false) = 0.1

       Find P(C=true | D=true, G=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.

ANSWER: H is barren, so it is dropped. Now E is barren, so it is dropped. Now the only path from C to G is CADG, so
       d-sep({C},{G} | {D}) holds in the stripped graph. So we have: P(C=true | D=true, G=true) = P(C=true | D=true)
       In this new query, G is also barren and dropped, and then B is barren and dropped. So we are left with just A, C, D, and can use
       simple enumeration.

        P(C| D=true) = alpha (P(C, A=false, D=true) + P(C, A=true, D=true))
                     = alpha (P(C|A=false)P(D=true|A=false)P(A=false) +  P(C|A=true)P(D=true|A=true)P(A=true))
                     = alpha [ (0.2 * 0.1 * (1-0.2)  + 0.9 * 0.8 * 0.2),                          /* for C=true */
                               ((1-0.2) * 0.1 * (1-0.2) + (1-0.9) * 0.8 * 0.2)]                   /* for C=false */
                     = alpha [ 0.16, 0.08 ] = [ 2/3, 1/3 ]

        That is,  P(C=true | D=true, G=true) = P(C=true | D=true) = 2/3
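
As a sanity check, the enumeration above can be mirrored in a few lines of Python (a sketch of ours; the
CPT values are those given in the question):

# Sketch: verify P(C | D=true) by enumerating over A in the stripped
# network {A, C, D}, using the CPTs from the question.
P_A = {True: 0.2, False: 0.8}
P_C_given_A = {True: 0.9, False: 0.2}   # P(C=true | A)
P_D_given_A = {True: 0.8, False: 0.1}   # P(D=true | A)

unnormalized = {}
for c in (True, False):
    total = 0.0
    for a in (True, False):
        p_c = P_C_given_A[a] if c else 1 - P_C_given_A[a]
        total += p_c * P_D_given_A[a] * P_A[a]
    unnormalized[c] = total

alpha = 1 / sum(unnormalized.values())
print({c: alpha * v for c, v in unnormalized.items()})
# -> {True: 0.666..., False: 0.333...}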


3) You need to construct a 3-input neural network that has 2 outputs: the first
   has a value of 1 in the output just when the number of its inputs that are "on" is even,
   and the other has a value of 1 in the output just when the number of its inputs that are "on" is greater than
   or equal to 2 (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?

ANSWER: Majority can be done without hidden units (see below),
       but parity is not linearly separable, so overall the answer is: no.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

ANSWER: The function computing "greater than or equal to 2" is simple: use a unit U1 connected to
     all inputs, with input weights of 1 and a threshold of 1.5

     Computing "even" requires hidden units, but we will "borrow" the above majority
     implementation as one of them (it is thus a "visible hidden unit", that is it is an
     internal layer but its output also appears as system output.

     The output unit U2 will be connected to all inputs with a weight of -1, and to
     the output of U1 with a weight of 2. Its threshold will be -0.5, so that it
     will provide an output of 1 when none of the inputs is active (zero is even).
     But if one of the inputs is on, the weighted sum will be -1 and the output will be 0.
      Now if 2 inputs are on, the weighted sum will be -1-1+2=0 and so the output will be 1.
      Finally, when 3 inputs are on the weighted sum will be -3+2=-1 and the output will again be 0, as required.
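
A quick sketch (ours) that simulates the two threshold units over all 8 input vectors, assuming a unit
outputs 1 when its weighted sum is at least its threshold:

# Sketch: check the threshold network above on all 8 binary input vectors.
from itertools import product

def threshold_unit(inputs, weights, theta):
    """Output 1 iff the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

for x in product((0, 1), repeat=3):
    u1 = threshold_unit(x, (1, 1, 1), 1.5)                       # ">= 2 inputs on"
    u2 = threshold_unit(list(x) + [u1], (-1, -1, -1, 2), -0.5)   # "even number on"
    assert u1 == (1 if sum(x) >= 2 else 0)
    assert u2 == (1 if sum(x) % 2 == 0 else 0)
    print(x, "majority>=2:", u1, "even:", u2)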


4) a) For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:
 
       D
       A
    C  B
------------

      and the goal state containing: On(A,B) and On(B,D)

   b) Suppose that it is not known whether the above is the initial state;
      the initial state can be either the above initial state or the
      following state, and these are the only possibilities.

       D
       B
    A  C
------------

      and that there is an operator "Check(clear(x))" which results in knowing whether
      this is true or not. Show how a conditional plan is constructed for the above
      case.


5)  Consider the following scenario, occurring in a certain middle-eastern country.*
    We will assume that you are risk-neutral, i.e. that
    your utility function is linear in the value unit discussed below (morts).
    Assume also that moral considerations are not part of your utility function.
 
    A well-known band of criminals has stolen your mojo(**), is
    threatening to destroy it immediately (a damage of 1 mort),
    and will only return it to you if you release their band member, the great Humongous(***).
    Surrendering Humongous causes no immediate
    material damage, but may cause you a loss in the future, as Humongous is a notorious criminal.
    With probability 0.5 this will cause a loss
    of 1 mort next fortnight, and additionally, with
    probability 0.4, independent of the above, it will cause a damage of 10 morts
    one fortnight after the next. Your options are as follows:

    a) Do nothing, and then the criminals destroy your mojo.
    b) Release Humongous, and possibly incur the respective future losses.
    c) Request the police to try a raid to recover your mojo in location A.
    d) Request the police to try a raid to recover your mojo in location B.

      Since you do not know where your mojo is hidden, the results of a raid are:
      With probability 0.4 they will succeed at location A, and with probability
      0.9 in location B, assuming your mojo is at that location. The probability
      of the mojo being at A is 0.6, and it is certain that your Mojo is either at A or at B.
      A failed raid causes immediate and permanent loss of your mojo.

    5.1) What is your optimal course of action? Consider two reward accumulation functions:
       a) Simple sum of rewards.
ANSWER:
     Policy a costs 1.
     Policy b costs 0.5*1 + 0.4*10 = 4.5.
     Policy c succeeds with probability 0.4*0.6 = 0.24 and in all other cases loses 1 mort, so expected cost 1-0.24 = 0.76.
     Policy d succeeds with probability 0.9*0.4 = 0.36, so expected cost 1-0.36 = 0.64.

     So clearly policy d (raid at B) is the best.

       b) Discounted sum of rewards with a discounting factor of 0.5 (per fortnight).

ANSWER:
     Same cost for all policies except for b, which now has expected discounted cost:
        0.5*(0.5+0.5*0.4*10) = 1.25
     Not quite so bad, but still the worst one.
     (Not part of the answer:) Perhaps with a discount factor of 0.3, if we are really
     myopic, it will look better. Indeed:
        0.3*(0.5+0.3*0.4*10) = 0.51
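
The arithmetic for both accumulation functions can be mirrored in a short Python sketch (the names and
structure are ours; costs are in morts):

# Sketch: expected costs of the four policies, undiscounted and discounted
# (discount factor gamma per fortnight).
def cost_b(gamma=1.0):
    # release Humongous: 1 mort next fortnight w.p. 0.5, and 10 morts the
    # fortnight after next w.p. 0.4 (independent of the first loss)
    return gamma * 0.5 * 1 + gamma**2 * 0.4 * 10

def cost_raid(p_location, p_success):
    # lose the mojo (1 mort) unless the raid succeeds
    return 1 - p_location * p_success

print("a:", 1)
print("b (sum):       ", cost_b(1.0))          # 4.5
print("b (gamma=0.5): ", cost_b(0.5))          # 1.25
print("b (gamma=0.3): ", cost_b(0.3))          # 0.51
print("c (raid A):    ", cost_raid(0.6, 0.4))  # 0.76
print("d (raid B):    ", cost_raid(0.4, 0.9))  # 0.64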

    5.2) Would the answer be different, if in releasing Humongous, there is
         in addition a probability of 0.5 that the band of criminals will steal
         your mojo again after 2 fortnights (and then leave you with the same options?)

ANSWER: No change, because policy b will then be even worse!

    5.3) In 5.1, suppose that an informer can provide information that
         will tell with certainty whether your mojo is at A or B. How much
         should you be willing to pay for such information (in both cases a and b)?

ANSWER:  Since raiding is optimal, this will help make raiding even better.
         If the answer is A (probability 0.6) we will change to raid A and succeed with probability 0.4,
         a gain of 0.4 in this case (because ordinarily we would raid at B and lose 1).
         If the answer is B (probability 0.4), we still raid at B and succeed with the same
          probability, so no gain in this case. The total expected value of information is thus 0.6*0.4 = 0.24.
          Note that the total cost of this policy is 0.64-0.24 = 0.4 if the information is free.
          (This is also equal to 1 minus the new overall probability of success: 1-(0.6*0.4+0.4*0.9) = 0.4.)
          Again, discounting makes no difference here.
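
A small sketch of the same value-of-information computation:

# Sketch: value of perfect information about the mojo's location.
p_A, p_B = 0.6, 0.4
succ_A, succ_B = 0.4, 0.9

cost_without_info = 1 - p_B * succ_B                      # always raid B: 0.64
cost_with_info = p_A * (1 - succ_A) + p_B * (1 - succ_B)  # raid where it is: 0.4
print("VOI =", cost_without_info - cost_with_info)        # 0.24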

6) Consider the following Syrian traveller problem instance.

   The (undirected) graph has 3 vertices, as follows:
     (I, V1, G)
   The edges are as follows:
     (I, V1), weight 10, no terrorists.
     (V1, G), weight 20, probability of terrorists 0.2
     (I, G), weight 100, no terrorists

   Vertex I contains chems and an army, and G is a goal vertex.
      
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function?

ANSWER: We use the following state variables: location of agent A, location of chems C, location of army M, and belief state T of edge (V1,G).
       The domain of each of the first 3 variables is [I, V1, G], and the belief state of the edge is [true, false, U] (U meaning unknown).

       Actions are defined by triples: [edge, chems, army], with edge being either (I,V1), (V1,G), or (I,G),
       and chems, army being binary valued (true, false). We will define transitions only for
       actions that are legal (illegal ones can just return to the same state with probability 1, for example).
       The number of possible transitions is huge, so we will just provide an outline:

       The following transitions are with probability 1:

       [I, x, y, true] with action [(I,G), false, false] leads to [G, x, y, true]      /* move to G */
       [I, I, y, true] with action [(I,G), true, false] leads to [G, G, y, true]       /* move to G with chems */
       [I, y, I, true] with action [(I,G), false, true] leads to [G, y, G, true]       /* move to G with army */
       [I, I, I, true] with action [(I,G), true, true] leads to [G, G, G, true]        /* move to G with army and chems */

       (Note that variables like x or y mean "for any value of x, y, etc.")

       The same transitions hold with the belief state of (V1,G) being false.
       Also, repeat all of the above for actions moving across edge (I,V1) and across
       edge (V1,G), whenever the action is legal.

       The only transitions that are stochastic are for moves starting from I when the belief state of edge (V1,G)
       is U, for example:

       [I, x, y, U] with action [(I,G), false, false] leads to [G, x, y, true]   with probability 0.2
                                                      leads to [G, x, y, false]  with probability 0.8
 
       Likewise for all moves to V1 or G when the state of the edge is unknown.

       The reward function R(S,A) will depend on the action, and be the negative of the weight of
       the edge mentioned in the action, multiplied by 2 for each argument of the action that has the value "true".

       This will be a terminal-state belief MDP, and all states of the form [x, G, y, z] are terminal
       (that is, any state with chems at G).
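
A sketch of how the reward function and terminal-state test described above might be encoded (the tuple
encodings of states and actions are ours):

# Sketch: reward (negative cost) of an action in the belief-state MDP above.
# An action is (edge, carry_chems, carry_army); edges carry their weights.
WEIGHTS = {('I', 'V1'): 10, ('V1', 'G'): 20, ('I', 'G'): 100}

def reward(action):
    edge, chems, army = action
    cost = WEIGHTS[edge]
    if chems:
        cost *= 2
    if army:
        cost *= 2
    return -cost

def is_terminal(state):
    agent, chems_loc, army_loc, edge_belief = state
    return chems_loc == 'G'      # done once the chems reach G

print(reward((('I', 'V1'), True, True)))   # -40
print(is_terminal(('G', 'G', 'G', True)))  # True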

   b2) Find the optimal policy. You may assume that the robot starts
       at vertex I in order to save some computation.

ANSWER:  The assumption is that the initial state is [I, I, I, U].
       We will apply a mix of expectimax search and value iteration in order to solve the problem.
       Note that here all rewards are negative, so we want the solution to be the least negative possible.
       Terminal states have a value of 0, and all other states can have an initial value of -infinity.

       Now, from the initial state we can always traverse (I,G) with chems (cost 200) to get to a terminal state, so we can update
          U([I, I, I, U]) = -200 
       We can use this to prune the search: any policy which is known to have expected utility below -200 can be discarded,
       even if not fully defined or developed!

       We will continue processing top-down for now. From the initial state, moving with the army along (I,G) already
       costs 200, so it cannot be optimal, as it does not reach a goal state. Moving along (I,G) without army or
       chems will need to be checked, but later we will show that it is also suboptimal. (By which we mean
       NO POLICY which begins with this action is optimal.)

       There are 4 more possible actions: moving along (I,V1), with or without chems, with or without army. Checking the best
       of these requires knowing the value of the states reached as a result, so we will try to compute these next.

       First, consider the state [V1,V1,x,false]. From here we can reach [G,G,x,false] for a cost of 40, by moving
       to G with chems, so we update:
          U([V1,V1,x,false]) = -40
      Next, consider the state [V1,V1,V1,true]. From there, we can reach the state [G,G,G,true] for a cost of 80, by moving
      to G with chems and army, so we update:
          U([V1,V1,V1,true]) = -80

      So, from the initial state we can move to V1 with chems and army, cost 40, and land in [V1,V1,V1,true] with probability 0.2
      and in [V1,V1,V1,false] with probability 0.8, so we can update 
          U([I, I, I, U]) = -40 - 0.2*80 - 0.8 * 40 = -88

      Now consider the value of the state [I, V1, I, true], i.e. chems moved, army and agent at I. We can get to [V1,V1,V1,true]
      with certainty for a cost of 20, so can update:
          U([I,V1,I,true]) = -20 - 80 = -100

      So now at state [V1,V1,I,true] we can get to [I, V1, I, true] by agent moving to I for a cost of 10, and can update:
          U([V1,V1,I,true]) = -10 - 100 = -110

      The action of moving with chems from the initial state to V1 costs 20 and can land us either in [V1, V1, I, true] (with probability 0.2)
      or in [V1, V1, I, false] (with probability 0.8), so taking this action from the initial state has value:
          -20 - 0.2*110 - 0.8*40 = -74

      So we can update 
          U([I, I, I, U]) = -74

      Finally, from the initial state we can move to V1 without chems or army, for a cost of 10, reaching [V1,I,I,true] with probability
      0.2 and [V1,I,I,false] with probability 0.8.

      From [V1,I,I,true], returning to I and then travelling with army and chems to [V1,V1,V1,true] costs 50 in total, so update:
        U([V1,I,I,true]) = -50 - 80 = -130

      From [V1,I,I,false], returning to I and then travelling with chems to [V1,V1,I,false] costs 30 in total, so update:
        U([V1,I,I,false]) = -30 - 40 = -70

      So moving without chems or army from the initial state has value:
        -10 - 0.2*130 - 0.8*70 = -92
      which is worse than moving with chems.

      All other policies can be shown to be worse, so the best policy from the initial state is to move to V1 with chems.
      If (V1,G) has no terrorists, continue to G with chems, total cost 60. Otherwise, return to I and bring the army to V1,
      and then proceed with army and chems to G - total cost 20+10+20+80 = 130. Expected cost of this policy, as indicated earlier
      by the value of the initial state, is 74.
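
The backed-up values above can be reproduced with a short Python sketch that hard-codes the intermediate
state values derived in the text (this is not a general MDP solver, just a check of the arithmetic):

# Sketch: expected values of the candidate first actions from [I, I, I, U],
# reusing the state values derived above.
U = {
    ('V1', 'V1', 'V1', True):  -80,   # continue to G with chems and army
    ('V1', 'V1', 'V1', False): -40,   # continue to G with chems
    ('V1', 'V1', 'I',  True): -110,   # go back for the army
    ('V1', 'V1', 'I',  False): -40,   # continue to G with chems
    ('V1', 'I',  'I',  True): -130,   # go back, then bring chems and army
    ('V1', 'I',  'I',  False): -70,   # go back, then bring the chems
}

p_terrorists = 0.2

def first_action_value(cost, state_if_true, state_if_false):
    return (-cost + p_terrorists * U[state_if_true]
            + (1 - p_terrorists) * U[state_if_false])

print("I->G with chems:          ", -200)
print("I->V1 with chems and army:",
      first_action_value(40, ('V1','V1','V1',True), ('V1','V1','V1',False)))  # -88
print("I->V1 with chems only:    ",
      first_action_value(20, ('V1','V1','I',True), ('V1','V1','I',False)))    # -74
print("I->V1 empty-handed:       ",
      first_action_value(10, ('V1','I','I',True), ('V1','I','I',False)))      # -92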

   b3)In addition, from node I only, and before it starts moving,
      the agent can perform a SENSING action
      for a cost of 10, which results in the agent knowing the state of
      the edge (V1, G), i.e. whether it contains terrorists. Re-compute the optimal policy for this case.

ANSWER: Doing the sensing action lands us in [I, I, I, true] with probability 0.2, and in [I, I, I, false] with probability 0.8.
       U([I, I, I, false]) = -60, and U([I, I, I, true]) = -120.
     So the policy starting with sensing has value:  -10 - 0.2*120 - 0.8*60 = -82, which is worse.
     Alternately, compute the value of information. With probability 0.8 we will know that there are no terrorists,
     so we will start by travelling with chems as before (zero gain). With probability 0.2 we will know that there are terrorists,
     and thus also travel with the army, paying a total of 120 (a gain of 10 compared to the 130 in this case). So the total value
     of information here is 0.8*0 + 0.2*10 = 2, meaning it is not a good idea to pay 10 for it!
     (Note how this policy overall costs (sensing cost - VOI) = 10 - 2 = 8 more than not doing the sensing.)
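
And a sketch of the sensing comparison:

# Sketch: expected value of the policy that starts with the SENSE action.
sense_cost = 10
p_terrorists = 0.2
U_known_clear = -60    # U([I, I, I, false]): chems via V1, no army needed
U_known_bad = -120     # U([I, I, I, true]):  chems and army via V1

value_with_sensing = (-sense_cost + p_terrorists * U_known_bad
                      + (1 - p_terrorists) * U_known_clear)
print(value_with_sensing)                         # -82, worse than -74
print("VOI =", p_terrorists * (130 - 120))        # 2 < sensing cost of 10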

7)  A principal researcher in the Israeli Smart Grid (ISG) consortium
    would like to construct a decision procedure based on results from hiring
    employees in the DARPA Robotics Challenge project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A), C programming ability (B), self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using greedy search with the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
--------------------------------------------
   2      F      L      0             N
   2      F      M      1             Y
   2      T      L      0             N
   3      F      H      0             Y
   3      T      M      2             Y
   3      T      M      0             N
   3      T      H      1             Y

ANSWER: Try each of the attributes at the top level.

With A:
 branch 2: distribution 1Y, 2N
 branch 3: distribution 3Y, 1N

With B:
 branch F: distribution 2Y, 1N
 branch T: distribution 2Y, 2N

With C:
 branch L: distribution 0Y, 2N
 branch M: distribution 2Y, 1N
 branch H: distribution 2Y, 0N

Note that the amount of information needed to classify A:2 is the same as for B:F and the same as for C:M.
But the other branches for A and B need more information, whereas the other branches for C need none, so C is
the best of these. So now consider D:
 branch 0: 1Y, 3N
 branch 1: 2Y, 0N
 branch 2: 1Y, 0N

So branches D:1,2 need no further information, and we need to compare the information needed for C:M versus D:0,
i.e. compare 3*H(1/3, 2/3) to 4*H(1/4, 3/4), so here we actually need to compute the information; we get
 for C: approx 2.755 bits
 for D: approx 3.245 bits
So C is better, and we choose C. A recursive call occurs on branch C:M, and we now consider
attributes A,B,D. Neither A nor B allows full classification, but with D we get:

C:M  D branch 0: N
       branch 1: Y
       branch 2: Y

D is preferred, completing the tree, which is:
C: branch L: N
   branch M:  check D branch 0: N
                      branch 1: Y
                      branch 2: Y
   branch H: Y
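
The remaining-information comparison used above (splitting on C versus on D at the root) can be checked
with a short sketch:

# Sketch: remaining information (in bits) after splitting on C vs. on D,
# i.e. 3*H(1/3,2/3) vs. 4*H(1/4,3/4), with branch counts taken from the table.
from math import log2

def H(*probs):
    return -sum(p * log2(p) for p in probs if p > 0)

remaining_C = 2 * H(0/2, 2/2) + 3 * H(2/3, 1/3) + 2 * H(2/2, 0/2)  # branches L, M, H
remaining_D = 4 * H(1/4, 3/4) + 2 * H(2/2, 0/2) + 1 * H(1/1, 0/1)  # branches 0, 1, 2

print(round(remaining_C, 3))   # ~2.755
print(round(remaining_D, 3))   # ~3.245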

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

ANSWER: This tree has 2 internal nodes. Since above we tried all possible trees with 1 internal node
and failed, this tree has the minimum number of internal nodes (among trees consistent with the table).

Deadline: Wednesday, January 15, 2014, at 12 noon.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.

** Mojo: copyright some film studio we do not recall.

*** The Great Humongous: copyright some other film studio we do not recall.