Introduction to Artificial Inteligence

Assignment 6 (solution)

Written assignment: planning, probabilistic inference, and learning

1) We are given the following independence statements about random variables
   A, B, C, D, E, F:

      not I({A},{B}|{})
      not I({B},{C}|{})
      not I({E},{F}|{})
      not I({D},{F}|{})

a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

  Assuming independence implies d-separation, this means, using the first statement that
  there is no edge between {A,B,C} and {D,E,F} and since there are no other nodes in the graph these
  disconnected parts of the graph.

  The third statement means no edge between D and E, and the last two lines means there is a path between E and F
  and a path between D and F, so they are all connected, and it therefore must be D-F-E. But since D and E are
  independent without evidence, it the (and every) path between them must have a collider (converging node), so
  F must be converging.

  The other part, according to the 2nd statement there cannot be an edge between A and C,
  but the 4th and 5th statements imply that A-B-C must be connected with B in the middle.
  B cannot be a converging node, so this leaves 3 possibilities (see below)

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with all the statements?

  Not unique.
  The part D-F-E must have F as a child of D and E, where D and E have no parents.

  The part A-B-C can have any of A, B, C, as the parent-less node and all arcs
  emanating from it. That is, B can be diverging, or B can be pass-through in either

2) Consider the following variables and their immediate causes (parents):

     variable         parents
        A               none
        B               none
        C               A
        D               A
        E               C B
        G               D B
        H               E
        I               A B H

    a) Is this network a poly-tree?
     Not a poly-tree, as there are undirected cycles, such as AIBECA

    b) Is this network (directed-path) singly connected?
     Not DP singly connected, as we have directed paths A-I and A-C-E-H-I,
     i.e. more than one directed path from A to I.

    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
       2)  I({D}, {E} | {A})
       3)  I({D}, {E} | {A, G})
       4)  I({B}, {E} | {D})
       5)  I({B}, {C} | {})
       6)  I({B}, {C} | {I})
      1) No, path E-C-A-D is not blocked at A (diverging) and not blocked at C (passthrough).
      2) Now this E-C-A-D is blocked at A (diverging evidence node), and all other paths
         must pass through I or G, which are barren nodes (non-evidence nodes with no
         children, so d-separation holds.
      3) No, now the path D-G-B-E is not blocked: G is a converging evidence node,
         and B is a diverging non-evidence node, so neither of them blocks this path.
      4) No, the path B-E can never be blocked.
      5) Yes, all paths B-E-C, B-G-D-etc., B-I-etc. pass through a converging node,
         and the evidence set is empty. To see this, recursively remove all barren nodes and
         B, C will be in different connected components.
      6) No, the path B-I-A-C is not blocked: C is non-evidence passthrough, and I is
         evidence converging, so neither blocks this path.

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.2, P(B=true)= 0.1, P(E=true|C=false) = 0.5,
       P(E=true|C=true) = 0.5, 
       P(H=true|E=true) = 0.2, P(H=true|E=false) = 0.4,
       P(C=true|A=true) = 0.9, P(C=true|A=false) = 0.2
       P(D=true|A=true) = 0.8, P(D=true|A=false) = 0.1

       Find P(C=true | D=true, G=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.

    First, drop barren node I. Now H is baren and can be dropped, and now E is barren. The remaining
    graph now contains only one path: B-G-D-A-C, and it is easy to see that
    I(C, G | D) as D is a passthough evidence node. So we have:
      P(C|D,G) = P(C|D). 
    To evaluate the latter, now G is barren and is dropped, and now B is barren.
    We remain with just A with children C and D, and have (by definition, or Bayes nework semantics):
      P(C=true | D=true) = P(C=true, D=true) / P(D=true)
    We will compute each of these by conditioning on A:
      P(D=true) = P(D=true|A=true)P(A=true)+P(D=true|A=false)P(A=false) = 0.8*0.2 + 0.1*0.8 = 0.16 + 0.08 = 0.24

      P(C=true, D=true) = P(D=true|A=true)P(C=true|A=true)P(A=true)+P(D=true|A=false)P(C=true|A=false)P(A=false)= 0.8*0.9*0.2 + 0.1*0.2*0.8 = 0.16
    So we have:
      P(C=true|D=true,G=true) = P(C=true | D=true) = P(C=true, D=true) / P(D=true) = 0.16/0.24 = 2/3

3) You need to construct a 3-input neural network that has 2 outputs, corresponding to
   the binary value of the sum of the inputs (e.g. if all 3 inputs are on, then both outputs are on,
   because 3 in binary is "11".)
   a) Can this be done without hidden units?
     No, because the LSB is the parity of the three inputs, and that is not linearly seperable
     (to see this project on the case where the first input is 0, and now you need the output to by
     XOR of the other 2 digits, and we already know that XOR is not linearly seperable).

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

     The MSB is 1 just if 2 or more of the inputs is 1, so we can do this with a single unit
     that have weights of 1 on all inputs, and a threshold between 1 and 2 (make it 1.5).
     Call it the "MSB" unit.

     For the LSB, as argued above, we must have at least one hidden unit. We can mimic the
     gate implementation of a dull adder, but more efficient is to have just one unit representing
     at least 2 (which we already have above, it is both a "hidden" unit and an output unit), and one unit
     and one more unit (call it the "LSB" unit). It has as input all the original inputs with a weight of 1,
     (in order to activate the MSB output when some input is 1),
     and also has as input the output of the MSB unit, but with a weight of -2 (this inhibits the output when
     2 or more input units are 1). The threshold is between
     0 and 1, say 0.5. This way, when all 3 inputs are 1, the inhibition (-2) is overcome and the ouput becomes 1 again.

4)    For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:

    C  B

      and the goal state containing: On(A,B) and On(B,D)

    Initial state has 2 dummy steps:
       START:  preconditions: null
               effects:       On(C,Tab), On(B,Tab), On(A,B), On(D,A), On(E,D), Clear(C), Clear(E)

       END:    precondition:  On(A,B), On(B,D)
               effects:       null

       and constraint "START before END".

    This state is not a solution so we pick an unsatisfied precondition, say the On((B,D) precondition of END.
    This precondition cannot be satisfied by existing step, so add:
       STEP1:  Puton(B,D)
               preconditions: Clear(B), Clear(D), On(B,Tab)
               effects:       On(B,D), not Clear(D)
    and constraints "START before STEP1 before END", and link effect On(B,D) to the same preconditin of END.
    Now precondition On(B,Tab) of STEP1 can be satisfied  by effect On(B,Tab) of START so this link is added.
    Now say we look at the Clear(D) precondition of STEP1. This cannot be satisfied by existing steps, so we add:
       STEP2:  Puton(E,Tab)
               preconditions: Clear(E), On(E,x)
               effects:       On(E,Tab), Clear(x), not on(E,x)
    Setting x=D we can add the link from effect Clear(D) of STEP2 to the same procondition of STEP1
    (and obviously the constraint "STEP2 before STEP1".
    Now we can process the preconditions Clear(E) and On(E,x) by adding the links from the same effects of START,
    and the constraint "START before STEP2".
    Say we now select the Clear(B) precondition of STEP1. This requires adding a step:
       STEP3:  Puton(A,Tab)
               preconditions: Clear(A), On(A,x)
               effects:       On(A,Tab), Clear(x), not on(A,x)
    and add the link from effect Clear(x) (with X=B) of STEP3 to the same precondition of STEP1.
    The precondition On(A,B) can be handled by adding a link from START. Add the obvious constraints.
    Now to handle Clear(A) we need an additional step:
       STEP4:  Puton(D,Tab)
               preconditions: Clear(D), On(D,x)
               effects:       On(D,Tab), Clear(x), not on(D,x)
    with x=A. Add the link to Clear(A) precondition of STEP3, and the obvious constraints.
    Now we can treat On(D,A) by linking an effect of START, but node that now STEP1 potentially clobbers this link
    and must be placed after STEP4 (cannot be before START). The precondition Clear(D) can be met by linking
    the effect of STEP2. Note that STEP1 also potenially clobbers this link, so place STEP1 before STEP4.
    Finally, we must handle the On(A,B) precondition of END. 
    We COULD link it to the On(A,B) of START, but then this link is clobbered by STEP3. But STEP3
    cannot be promoted or demoted, as it must be after START and before END, so we would get stuck.
    So need to add another step:
       STEP5:  Puton(A,B)
               preconditions: Clear(A), Clear(B), On(A,Tab)
               effects:       On(A,B), not Clear(B)
    And link the effect On(A,B) to the same precondition of END. Now the clobbering by STEP3 can be handled
    by adding the constraint "STEP3 before STEP5". 
    Also, STEP5 clobbers the link Clear(B) between and STEP3 and STEP1 so need to add the constraint
   "STEP1 before STEP5" (cannot place STEP5 before STEP3 as this would be inconsistent with the constraint
    added 3 lines above).  
    Now Clear(B) can be addressed by linking from STEP4 and Clear(A) by linking from STEP3. 
    Handling all the other potential clobberings we have
    the final plan with the above steps, and the constraints:
    "START before STEP2 before STEP4 before STEP3 before STEP1 before STEP5 before END"
    and this plan is complete and correct.

5)  Consider the following voting scenario, occuring in a certain middle-eastern country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the amount of money that your assets are worth.
    Assume also that moral considerations are not part of your utility function.

    You are to vote for determining the government for the next 4 years.
    There are 3 candidates for whom you could vote: Yahoo, Future, and BozoAndWhite.

    Yahoo hates the poor, but always wishes to appease the "party of the eternally poor" (POEP)
    which will cost you 100 MEMU (mid-eastern monetary units) this year in subsidies paid to
    PEOP, and the amount would double every following year due to success of POEP, which will generate more
    poor people receiving your money.

    Future, as its name implies, is future-oriented. It will give nothing to POEP, but will invest
    200 MEMU of your money this year to educate the supporters of PEOP, after which they
    will require no further subides.

    BozoAndWhite are more of an uncertain proposition: they claim that, like Future, they will
    educate the supporters of PEOP, and that they can do it for only 180 MEMU. However, based on
    past record, there is a probability of 0.5 that after winning the elections they will choose
    to appease PEOP anyway, with the same results as if Yahoo won.
    Assume that all are equally likely to win, but that your vote will swing the election if you vote.

    You have the following options:
       1) Vote for Yahoo
       2) Vote for Future
       3) Vote for BozoAndWhite
       4) Not vote at all, since you can save 1 MEMU by staying at home on election day.

     Draw the decision diagram for each case below, and solve it.

    a) What is your rational choice in voting, for the following utility combination functions:
       1) Undiscounted rewards.
       2) Discounted rewards with a discounting factor of 0.1 (i.e you really do not
          care very much about next year).

   For undiscounted rewards, we have:
     Voting Yahoo has only one possible outcome, where you lose 100, then 200, then 400, then 800, total
     expected utility EU[Y] = -1500.

     Voting Future also has only one possible outcome, where you lose 200 and then nothing, total
     expected utility EU[F] = -200

     Voting BandW has 2 possible outcomes, in the first (probability 0.5) you lose 180, and then
     nothing, in the outcome (probability 0.5) you lose 1500 as in voting for Yahoo. So we have:
          EU[BandW] = 0.5 * (-180) + 0.5 *(-1500) = -840

     Finally, not voting at all you gain 1 and then get each of the above with probability 1/3, so we have:
          EU[no vote] = 1 + 1/3 * (EU[Y]+ EU[F]+ EU[BandW]) = 1+ (-1500-200-840)/3 = approx -846

     So it is much better to vote for Future.

  For a discounting factor of 0.1 per year, we have:
     Voting Yahoo costs are multiplies by a factor of 0.1 per year, so we get:
         EU[Y] = -100 - 0.1*200 -0.01*400 - 0.001*800 = -124.8

     Voting for future still has expected utility -200.

     Voting for Band W we have:
         EU[BandW] = 0.5*(-124.8) + 0.5*(-180) = -152.4

     Not voting, you gen 1 and then average of the above 3:
         EU[no note] = 1 + (-124.8-200-152.4)/3 = -158

     So if you really do not care about the future you should vote for Yahoo.

    b) As above, but you have the choice to pay 10 MEMU to Guri Heller
       who can use his powers to divine what BozoAndWhite will actually do
       if elected. Should you pay for Guri Heller's opinion, and what should
       you vote after hearing his response?

   For undiscounted rewards, if you do not pay Guri then you should vote future
   and get EU[F] = -200, which is optimal.
   If you do pay, then since he is an exact future predictor (sensor) and the probability of 
   each outcome is 0.5, then the probability that he will say that BandW will do as they say is 0.5
   (in which case you should vote for them as you only lose 180) and with probability 0.5 you now know
   they will not comply, so you would lose 1500 here, so should vote for Future, and in this case gain nothing.
   So the value of information here is VOI[BandW] = 0.5 * (-180) + 0.5*(-200) - (-200) = 10
   This means the VOI here is exactly equal to the cost of buying the information, so you are INDIFFERENT
   on paying for this opinion.

   For the discounted case, you would vote for Yahoo anyway, so the VOI is 0, and you never want
   to pay for this information.

   Decision diagram: there is more than 1 possibility, but the minimal is:
     Binary-valued decision node GURI? (whether or not to consult Guri)
     Ternary-valued random variable GURI (his response: yes/no/no comment)
     Quad-valued decision node VOTE (Yahoo/Future/BandW/nothing)
     Binary-valued random variable BandW (whether BandW do as promised or not)
     Ternary-valued variable WINNER (who won: Yahoo/Future/BandW)
   The edges are: 
         Information edges: from GURI? and GURI to VOTE (you know the value of these variables
             at the time you vote)
         Dependency edges: from GURI? to GURI and BandW to GURI.
                           from VOTE to WINNER
                           from WINNER to BandW
                           from GURI?, WINNER, and BandW to the final utility node.
         (Can also have additional variables for Yahoo and Future, but not needed. Can
         also have additional edges, say from VOTE to the utility node, but not needed).
   For part 1 of the question, simply drop the nodes GURI? and GURI and their incident edges.

6) Consider the following Yazidi refugee problem instance.

   The (undirected) graph has 4 vertices, as follows:
     (I, V1, V2, G)
   The edges are as follows:
     (I, V1), weight 10.
     (I, G), weight 60.
     (V1, V2), weight 10.
     (V2, G), weight 10.

   I is the start vertex, V2 may contain terrorists with a probability of 0.5, and G is a goal vertex.
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function?

   The belief-state MDP can be represented by the variables LOC (ranging over all possible
   agent locations), and one ternary variable (T,F,Unknown) for each location where it is unknown whether
   a terrorist is there. In this case, there is only one such unknown location, V2, So the
   belief state can be represented by a pair [LOC, Ter], with LOC in {I, V1, V2, G} and Ter in {T,F,U}.
   The value Ter=U means that the probability of there being a terrorist in V2 is 0.5.

   There is one action for each possible edge traversal, but we will disallow traversals that
   place the agent at V2 when a terrorist is known to be there, thereby also disallowing the state [V2,T].
   Terminal (goal) states are [G, X] where X is "don't care", any of T, F, U. The utility of these states is zero,
   and this will also be used to start the value iteration: U([G,X]) = 0

   In this version of MDP there is no reward for states, but instead we have rewards given as R(s,a),
   i.e. depend on previous state and action, and this reward is just minus the weight of the edge
   which the action traverses (note how it does not even depend on the previous state s).
  The transition distribution is: for an action traversing an edge e, with probability 1 the agent is moved from the 
  current vertex to the vertex on the other side of e.  The Ter value stays the same with probability 1,
  except if Ter=U and the action is traversing to V1 or G (nodes incident on V2) in which case we have:
    P([V1,T]| [I,U], move to V1) = 0.5
    P([V1,F]| [I,U], move to V1) = 0.5
    P([G,T]| [I,U], move to G) = 0.5
    P([G,F]| [I,U], move to G) = 0.5
  But of course the last 2 lines do not matter as these are both goal states.

   b2) Find the optimal policy. You may assume that the Yazidi starts
       at vertex I in order to save some computation.

  We will only list and work on the meaningful states, and assume the following initial utilities:
    U([G,X]) = 0     ; three possible goal states, this is their true utility.
    U([I,X]) =-60    ; three such states, can get to goal from any of them by an action that costs 60
    U([V1,T]) =-inf  ; say we don't know what to do there, so max possible penalty
    U([V1,F]) =-inf  ; say we don't know what to do there, so max possible penalty
    U([V1,U])=?      ; don't care, illegal state (can observe value of Ter from V1)
    U([V2,F] = -10   ; Can get to goal by an action that costs 10
  This covers all possible states except for [V2,T] (illegal) and [V2,U] (unreachable)
  Let us now update the value of [V1,F]. Can traverese to either [I,F] at the cost of 10
  to a state with current utility -60 (total cost 70), or to [V2,F] at the cost of 10 to
  get to a state with current value -10 (total cost -20). Since the last one is better,
  we update:
    U([V1,F]) = -20
  Likewise we can update the value for [V1,T], here there is only one legal
  action, traverse to G costing 10, to get to [G,T], so we have:
    U([V1,T]) = -70  ; action at cost 10 to get to state with utility -60
  Now consider the state [I,U], which is the initial state. We have two possible actions:
  traverse to G for cost 60, or traverse to V1 for cost 10. There are 2 outcomes with non-zero
  probability, resulting on average in utility better than -60, so we update:
    U([I,U])  = max(-60, -10 + (0.5*U([V1,T]) + 0.5*U(V1,F])) = -10 *(0.5*(-70)+0.5*(-20)) = -55
  We can also update the value for [G,F] in a similar manner:
    U([I,F]) = -30
  And after that the values have converged and will no longer change in value iteration.
  The optimal policy from [I,U] is to traverse to V1. If we land in [V1,F] traverse
  to V2 and then to G. But in [V1,T] traverse back to I (state [I,T]) and then to G.
  The above is an incompletely state policy, what to do only in states reachable
  from [I,U] when executing the optimal policy. A full policy would also
  say what to do in e.g. state [I,F], which would not be reachable from [I,U] when using
  the optimal policy.

   b3)In addition, from node I only, and before it starts moving,
      the agent can perform a SENSING action
      for a cost of 10, which results in the agent knowing whether V2 contains
      Re-compute the optimal policy for this case.

      Since we computed final utility values for [I,X] this problem is easy.
      The sensing action can land us in either [I,F] or [I,T], each with probability 0.5
      So the cost of doind sensing and then acting optimally is:
       EU(sense) = -10 +(0.5*U([I,F]) + 0.5*U(I,T]) = -10 + (0.5*(-30)+0.5*(-60)) = -55
      which is the same as without sensing. So we are indifferent on whether to sense or not.

      Another way to do this is to compute VOI(sense). If the result is Ter=T then
      we need to pay 60 (direct to G) instead of 80 (try V1 and return), so gain 20.
      If the result is Ter=F then we need to pay 30, same as before for this case, so we gain 0.
      The average gain, the expected VOI(sense) is thus 0.5*20+0.5*0 = 10, same as the cost of sensing
      so again we are indifferent on whether to do the sensing action.

7)  A principal researcher in the Israeli Smart Grid (ISG) consortium
    would like to construct a decision procedure based on results from hiring
    employees in the DARPA Robotics Challenge project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  C programming ability (B) Self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
   2      F      L      0             N
   2      F      M      1             Y
   2      T      L      0             N
   3      F      H      0             Y
   3      T      M      2             Y
   3      T      L      0             Y
   3      T      H      1             Y

   We need to examine the amount of infomation needed to classify the examples after each
possible attribute value is revealed, by examining the decision (target) attribute
ditribution in each branch.

For attirbute A we have:
  A=2:  1Y, 2N  (3 examples, binary distribution not uniform, so less than 3 bits total)
  A=3:  4Y      (here no additional information is needed)

For attribute B we have:
  B=F: 2Y, 1N   (3 examples, binary distribution not uniform, so less than 3 bits total, same as for A=2)
  B=T: 3Y, 1N

For attribute C we have:
  C=L: 1Y, 2N   (3 examples, binary distribution not uniform, so less than 3 bits total, same as for A=2)
  C=M: 2Y
  C=H: 2Y

For attribute D we have:
  D=0: 2Y, 2N   (2 examples, uniform distribution, so 4 bits needed now)
  D=1: 2Y
  D=2: 1Y

Clearly C and A are equivalent, and both better than B and D, so we select either C or A.
Say here I  prefer A as it has only 2 children.

Now need to classify the examples for which A=2, and this can be
done fully by C, or D. Say we select C. The complete tree is thus:

  A=2:    C=L:  N
          C=M:  Y
          C=H:  N (default value for A=2)
  A=3: Y

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

  In general, this algorithm does NOT guarantee the smallest tree. But in this case,
we have a tree with 2 internal nodes, and we already tried all trees with 1 internal node
and they dod not workm so the above tree happens to be minimal.