Introduction to Artificial Intelligence

Assignment 6 - Solutions


Written assignment: planning, probabilistic inference, and learning

Justify all answers!

1) We are given the following independence statements about random variables
   A, B, C, D, E, F, G, H:

      I({A,B,C},{D,E}|{})
      I({A,B,C},{F,G,H}|{})
      I({D,E},{F,G,H}|{})
      I({A},{C}|{B})
      I({F},{H}|{})
      
      not I({A},{C}|{})
      not I({D},{E}|{})
      not I({F},{H}|{G})

a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

ANSWER:

  B        D       F   H
 / \       |        \ /
A   C      E         G

 with edges directed downwards.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with all the statements?

ANSWER:
 Not unique. In the first fragment, B could also be a pass-through node in either direction
 (A -> B -> C or C -> B -> A), and in the second fragment E could also be a parent of D.
 The last fragment (F, G, H) admits no other orientation.


2) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        B               none
        C               A B
        D               A
        E               B
        F               D
        G               D E

    a) Is this network a poly-tree?

ANSWER:
   No, there is an undirected cycle: A-D-G-E-B-C-A

    b) Is this network (directed-path) singly connected?
ANSWER:
   Yes, the cycle above is the only simple (undirected) cycle, and it has 4 direction changes,
   so there is no more than one directed path from any one node to another.

    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER: Yes, the path D-A-C-B-E is blocked by non-evidence converging node C,
        and the path D-G-E is blocked by non-evidence converging node G
       2)  I({D}, {E} | {A})
ANSWER: Yes, the path D-A-C-B-E is blocked by non-evidence converging node C, and by evidence diverging node A
        and the path D-G-E is blocked by non-evidence converging node G
       3)  I({D}, {E} | {A, G})
ANSWER: No, the path D-G-E is not blocked (G is a converging evidence node)
       4)  I({B}, {F} | {C})
ANSWER: No, the path B-C-A-D-F consists only of diverging and pass-through non-evidence nodes,
        except for C, which is a converging evidence node, so none of them blocks the path
       5)  I({B}, {F} | {D})
ANSWER: Yes, can remove barren nodes G and C and now there is no (undirected) path from B to F
       6)  I({B}, {F} | {A, G})
ANSWER: No, the path B-E-G-D-F is not blocked.
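
       (Aside, not part of the original answer: the d-separation claims above can be checked
       mechanically. The Python sketch below uses the standard construction: restrict the network
       to the ancestors of the query and evidence nodes, moralize, delete the evidence nodes, and
       test connectivity. All names here are local to the sketch.)

           from itertools import combinations

           parents = {'A': [], 'B': [], 'C': ['A', 'B'], 'D': ['A'],
                      'E': ['B'], 'F': ['D'], 'G': ['D', 'E']}

           def ancestors(nodes):
               seen, stack = set(), list(nodes)
               while stack:
                   n = stack.pop()
                   if n not in seen:
                       seen.add(n)
                       stack.extend(parents[n])
               return seen

           def d_separated(xs, ys, zs):
               keep = ancestors(set(xs) | set(ys) | set(zs))
               adj = {n: set() for n in keep}
               for n in keep:                        # moralize: link each node to its parents,
                   ps = [p for p in parents[n] if p in keep]
                   for p in ps:
                       adj[n].add(p); adj[p].add(n)
                   for p, q in combinations(ps, 2):  # ...and marry co-parents
                       adj[p].add(q); adj[q].add(p)
               reached = [x for x in xs if x not in set(zs)]
               seen = set(reached)
               while reached:                        # search, avoiding the evidence nodes
                   n = reached.pop()
                   for m in adj[n]:
                       if m not in set(zs) and m not in seen:
                           seen.add(m); reached.append(m)
               return not (seen & set(ys))

           queries = [(['D'], ['E'], []), (['D'], ['E'], ['A']), (['D'], ['E'], ['A', 'G']),
                      (['B'], ['F'], ['C']), (['B'], ['F'], ['D']), (['B'], ['F'], ['A', 'G'])]
           for xs, ys, zs in queries:
               print(xs, ys, zs, d_separated(xs, ys, zs))   # True, True, False, False, True, False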

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.2, P(B=true)= 0.3
       P(C=true|A=true,B=true) = 0.9, P(C=true|A=false,B=true) = 0.2
       P(C=true|A=true,B=false) = 0.5, P(C=true|A=false,B=false) = 0.1
       P(E=true|B=true) = 0.8, P(E=true|B=false) = 0.1
       P(G=true|D=true, E=true) = 0

       Find P(E=true, A=True | B=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.
ANSWER: Remove barren nodes C, G, F. Now A is completely disconnected from B and E, so we have
        D-sep({E},{A}|{B}). We also have D-sep({A},{B}|{}). Therefore we have:
           P(E=true, A=True | B=true) =  P(E=true | B=true) P(A=True | B=true) 
                                      =  P(E=true | B=true) P(A=True)
                                      = 0.8 * 0.2 = 0.16
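
        (Aside, not part of the original answer: a quick numeric check in Python, enumerating the
        joint over A, B, C, E with the CPTs given above; D, F, G are barren here and sum out to 1,
        so they can be ignored.)

            P_A = {True: 0.2, False: 0.8}
            P_B = {True: 0.3, False: 0.7}
            P_C = {(True, True): 0.9, (False, True): 0.2,   # P(C=true | A, B)
                   (True, False): 0.5, (False, False): 0.1}
            P_E = {True: 0.8, False: 0.1}                   # P(E=true | B)

            def joint(a, b, c, e):
                pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
                pe = P_E[b] if e else 1 - P_E[b]
                return P_A[a] * P_B[b] * pc * pe

            num = sum(joint(True, True, c, True) for c in (True, False))      # P(E,A,B)
            den = sum(joint(a, True, c, e) for a in (True, False)
                      for c in (True, False) for e in (True, False))          # P(B)
            print(num / den)   # 0.16, as above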
        
3)  For the blocks world as defined in class, show how POP constructs the plan
    (that is, show all partial plans from the initial plan until a correct
    plan is reached) for initial state:

       E
       D
       A
    C  B
------------

      and the goal state containing: On(D,A) and On(A,C)

ANSWER:Initial state has 2 dummy steps:
       START:  preconditions: null
               effects:       On(C,Tab), On(B,Tab), On(A,B), On(D,A), On(E,D), Clear(C), Clear(E)

       END:    precondition:  On(A,C), On(D,A)
               effects:       null

       and constraint "START before END".

    (Note that the algorithm as stated is not deterministic, so below is one answer that is not the
    only correct answer.)
    This state is not a solution so we pick an unsatisfied precondition, say the On(A,C) precondition of END.
    This precondition cannot be satisfied by existing step, so add:
       STEP1:  Puton(A,C)
               preconditions: Clear(A), Clear(C), On(A,x)
                effects:       On(A,C), not Clear(C), not On(A,x)
     and constraints "START before STEP1 before END", and link effect On(A,C) to the same precondition of END.
    Now precondition On(A,x) of STEP1 can be satisfied  by effect On(A,B) of START so this link is added.
    Now precondition Clear(C) of STEP1 can be satisfied  by effect Clear(C) of START so this link is added.
    Say we now select the Clear(A) precondition of STEP1. This requires adding a step:
       STEP2:  Puton(D,Tab)
               preconditions: Clear(D), On(D,y)
                effects:       On(D,Tab), Clear(y), not On(D,y)
    and add the link from effect Clear(y) (with y=A) of STEP2 to the same precondition of STEP1.
    The precondition On(D,y) can be handled by adding a link from START. Add the obvious constraints.
    Now to handle Clear(D) we need an additional step:
       STEP3:  Puton(E,Tab)
               preconditions: Clear(E), On(E,z)
                effects:       On(E,Tab), Clear(z), not On(E,z)
    with z=D. Add the link to Clear(D) precondition of STEP2, and the obvious constraints.
    Now we can treat On(E,z) by linking an effect On(E,D) of START.
    Now we can treat the Clear(E) precondition of STEP3 by linking this effect of START.
    This does not introduce threats, so we continue.
    
    Finally, we must handle the On(D,A) precondition of END. 
    We COULD link it to the On(D,A) of START, but then this link is clobbered by STEP2. But STEP2
    cannot be promoted or demoted, as it must be after START and before END, so we would get stuck.
    So need to add another step:
       STEP4:  Puton(D,A)
               preconditions: Clear(A), Clear(D), On(D,Tab)
               effects:       On(D,A), not Clear(A), not On(D,Tab)
    And link the effect On(D,A) to the same precondition of END. Now the clobbering by STEP2 can be handled
    by adding the constraint "STEP2 before STEP4". 
     Also, STEP4 clobbers the link Clear(A) between STEP2 and STEP1, so we need to add the constraint
   "STEP1 before STEP4" (cannot place STEP4 before STEP2 as this would be inconsistent with the constraint
    added 3 lines above).  
    Handling all the other potential clobberings we have
    the final plan with the above steps, and the constraints:
    "START before STEP3 before STEP2 before STEP1 before STEP4 before END"
    and this plan is complete and correct.
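
    (Aside, not part of the original answer: a tiny Python check that the final ordering
    constraints are consistent, i.e. admit a total order. The step names are the ones used above.)

        before = [("START", "STEP3"), ("STEP3", "STEP2"), ("STEP2", "STEP1"),
                  ("STEP1", "STEP4"), ("STEP4", "END"),
                  ("STEP2", "STEP4"),                 # resolves the clobbering of the On(D,A) link
                  ("START", "END")]
        remaining = {s for pair in before for s in pair}
        order = []
        while remaining:
            # pick a step all of whose predecessors have already been placed
            ready = [s for s in remaining
                     if not any(a in remaining for a, b in before if b == s)]
            if not ready:
                raise ValueError("ordering constraints are inconsistent (cyclic)")
            s = ready[0]
            order.append(s)
            remaining.discard(s)
        print(order)   # START, STEP3, STEP2, STEP1, STEP4, END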

4)  Consider the following business opportunities scenario, occurring in a certain country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the amount of money that your assets are worth.
    Assume also that moral considerations are not part of your utility function.

    You have just purchased a new tobacco products company, and wish to do business in said country.
    A preliminary market survey suggests that you expect to make 1000 MMU (Million monetary units) 
    every year. However, you wish to improve your gains, by vigorous advertizing, for a cost of 100 MMU per
    year. The result of advertizing will gain you nothing if the advertizing campaign fails, but with a
    probability of 0.5 it will succeed and add 500 MMU per year to your profits. Unfortunately, under directives from
    the health ministry it is currently illegal to advertise such products (and it is openly illegal, so
    you cannot take that chance). Nevertheless, if you meet Mr. Lim, the
    minister of health, then Lim surely will be convinced to legalize such advertisements for a share of 300 MMU
    for the first year, and 100 MMU for every following year, delivered in unmarked envelopes.
    Since convincing Lim of your position by greasing his palm is also illegal, 
    assume a probability of 0.001 of getting caught
    and losing ALL your gains, plus landing in jail, which is an additional loss of 10000 MMU per year to you.
    If this happens, you will also have to do the time in a cell shared with Lim, which you consider equivalent to
    an additional loss of 10000 MMU per year. But in both cases, jail time will only start next year.
    In this case, you will not participate in any business in the following years, so you do not have to pay for
    advertising beyond the first year.

    Clarification: the decisions are to be made initially before the beginning of the current year,
    and the decisions are valid for the next years as well (unless aborted by your landing in jail), you do not
    make a new decision each year.

    a) What is your optimal policy, in order to maximize your sum of gains for the first 3 years
       (i.e. this year, next year, and the year after the next year)?

ANSWER: We have 2 decisions: grease (G) or not, and if we grease, advertise (A) or not. But greasing costs money
        and gains something only if we also advertise, so greasing and not advertising is dominated by greasing and
        advertising. So only the first decision needs to be checked further.

        U(not G) = 3*1000 = 3000 MMU
        U(G) = 1000 - 300 - 100 + 0.5*500 - 0.001*2*(10000+10000) + 0.999*2*(1000 - 100 + 0.5*500 -100) 
             = 2907.9 MMU

        So it pays to be honest here.
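
        (Aside, not part of the original answer: the two expected utilities above, recomputed in
        Python with the quantities stated in the problem.)

            base, ad_cost, ad_gain, p_success = 1000, 100, 500, 0.5
            bribe_y1, bribe_later, p_caught = 300, 100, 0.001
            jail_loss = 10000 + 10000        # jail plus sharing the cell with Lim, per year

            u_not_grease = 3 * base
            u_grease = (base - bribe_y1 - ad_cost + p_success * ad_gain          # year 1
                        - p_caught * 2 * jail_loss                               # caught: years 2-3 in jail
                        + (1 - p_caught) * 2 *
                          (base - bribe_later - ad_cost + p_success * ad_gain))  # not caught: years 2-3
            print(u_not_grease, u_grease)    # 3000 and 2907.9 -> do not grease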

    b) You can now also purchase an additional market survey that will give you perfect information on
       whether the advertising campaign will work, for 50 MMU. Does your optimal policy differ?

ANSWER:Now with probability of 0.5 you get the same 3000, and probability 0.5 you get:
       U(G|success) = 1000 - 300 - 100 + 1*500 - 0.001*2*(10000+10000+1000) + 0.999*2*(1000 - 100 + 1*500 -100)
                    = 3659.4 MMU

       On the average you get: 3000*0.5 + 3659.4*0.5 = 3328.7 MMU
       and the difference from the previous optimum is more than 50 MMU, so the information is worth more than
       the cost. So the optimal policy is: buy the information. If it says "no success" do not grease. If it
       says "success" then grease and advertise.

    c) Repeat a and b above for sum of gains for 3 years under a discounted reward factor of 0.1, that is,
       next year is only worth 10% of what this year is worth, etc.

ANSWER:Note that here the damage of going to jail becomes negligible, but in the first year you still pay
       400 to get 0.5*500, so greasing and advertising loses 150 in the first year, with the gain coming only in Y2 and Y3.
       That gain was not enough to overcome the first-year loss even without discounting, so with discounting
       it is even clearer that you should not grease and advertise.

       Your utility for the optimal action, not greasing and not advertising, is thus:
       U(not G) = 1000 + 0.1*1000 + 0.01*1000 = 1110 MMU

       If you buy the information and it says "campaign fails" then not G is still optimal, but
       if success is predicted then you get:
       U(G|success) = 1000 - 300 - 100 + 500 + 0.1*(0.001*(-20000)+0.999*(1000-100-100+500)) + 0.001*(0.001*(-20000)+0.999*(1000-100-100+500))
                    = 1229.1487

       On the average you get 0.5*1000 + 0.5*1229.1487 = 1114.57435 so you still gain more than 50. So in this case the
       cost of the information is still less than the gain.

Note: there are some calculation errors in the above answers, I decided to leave them in deliberately (they
do not affect the optimal policy).

5) Consider the following Harassed Citizen problem instance.

   The (undirected) graph has 7 vertices, as follows:
     (S, I1, I2, V1, V2, V3, G)
   The edges are as follows:
     (S, V1), weight 10.
     (V1, V2), weight 1.
     (V2, V3), weight 1.
     (V3, G), weight 1.
     (S, G), weight 100.
     (S, I1), weight 1.
     (S, I2), weight 2.
     (I1, V2), weight 1000.
     (I2, V3), weight 1000.

   S is the start vertex, V2 may contain a key with a probability of 0.5, 
   V3 may contain a lock of the same type with probability 0.8, and G is a goal vertex.
   No other locks or keys are possible in this graph.
 
     
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
        function, and the reward function? (Here the cost of traversing an edge is equal to its weight).

ANSWER: This problem is similar to the example given in class. We have state variables:
          (Location (L), Key at V2 (K), Blocked at V3 (B), Carrying key (C))
        The location ranges over {S,V1,V2,V3,I1,I2,G},
        the "Carrying Key" over {T,F}
        and the other variables over {T,F,U}, where U stands for "unknown":
        probability 0.5 that K=true and probability 0.8 that B=true.

        The initial belief state is (S,U,U,F) and terminal states are (G,X,X,X)
        where "X" stands for "don't care" (and not necessarily the same value).
        Rewards on all legal transitions are minus the edge cost, with the reward
        function being defined as R(s,a), i.e. depends on last state and on the action.

        Transition probabilities are all in {0,1} except when a U value is revealed,
        as follows:
        Traversing from S to I1 ("Y" and "Z" can be {T,F,U}, and are not changed by the action):
        P((I1,T,Y,Z)|(S,U,Y,Z), goto I1) =  P((I1,F,Y,Z)|(S,U,Y,Z), goto I1) = 0.5

        Traversing from S to I2:
        P((I2,Z,T,Y)|(S,Z,U,Y), goto I2) =  0.8
        P((I2,Z,F,Y)|(S,Z,U,Y), goto I2) =  0.2

        Traversing from S to V1:
        P((V1,T,Y,Z)|(S,U,Y,Z), goto V1) =  P((V1,F,Y,Z)|(S,U,Y,Z), goto V1) = 0.5

        Traversing from V1 to V2, when the value of K is Y in {F,T}
        (must be, since the state (V1, U, X, X) is unreachable):
        P((V2,Y,T,Y)|(V1,Y,U,F), goto V2) =  0.8
        P((V2,Y,F,Y)|(V1,Y,U,F), goto V2) =  0.2

        Other distributions are uninteresting, e.g. taking the action "go to G"
        from the initial state can land in 2 different belief states, but we don't care which
        since they are both terminal and we have the same cost, so we will skip this.
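
        (Aside, not part of the original answer: the uncertainty-revealing transitions above,
        written as a small Python function over (Location, K, B, C) tuples. Only these cases are
        sketched; all other legal moves are deterministic.)

            def transitions(state, action):
                loc, key, blocked, carry = state
                if loc == 'S' and action == 'goto I1' and key == 'U':
                    return {('I1', 'T', blocked, carry): 0.5, ('I1', 'F', blocked, carry): 0.5}
                if loc == 'S' and action == 'goto I2' and blocked == 'U':
                    return {('I2', key, 'T', carry): 0.8, ('I2', key, 'F', carry): 0.2}
                if loc == 'S' and action == 'goto V1' and key == 'U':
                    return {('V1', 'T', blocked, carry): 0.5, ('V1', 'F', blocked, carry): 0.5}
                if loc == 'V1' and action == 'goto V2' and blocked == 'U':
                    # the key (if present) is picked up on arrival at V2
                    return {('V2', key, 'T', key): 0.8, ('V2', key, 'F', key): 0.2}
                raise NotImplementedError("deterministic cases omitted in this sketch")

            print(transitions(('S', 'U', 'U', 'F'), 'goto V1'))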

   b2) Find the optimal policy. You may assume that the agent starts
       at vertex S in order to save some computation.

ANSWER:There are several ways to do this. We will take the "value of information" approach.
       Suppose the I1, I2 nodes did not exist. Then at the initial state we can either go to G
       costing 100, or to V1 costing 10 to reach either (V1,T,U,F) or (V1,F,U,F)
       with equal probability. Now if we are at  (V1,T,U,F) then we can go back to S (ignore this for now,
       as this action is dominated),
       or go to V2 (cost 1), which lands us in one of two states (V2,T,F,T) or (V2,T,T,T).
       But we actually don't care which, so we will call them (V2,T,X,T). From this we can reach
       (V3,X,F,X) for a cost of 1. Now by one step of a value update (a partial "iteration") the value
       of (V3,X,F,X) can be updated to -1, since we can reach a goal from there. So the value of (V2,T,X,T)
       can be updated to -2, and the value of (V1,T,U,F) can be updated to -3.

       Now we need to find a value for (V1,F,U,F). Again we can go back to S or go to V2.
       Going to V2, again we have two cases, but now there is NO key so the difference
       is important. We land in (V2,F,T,F) with probability 0.8 and in (V2,F,F,F) with probability 0.2
       The value of (V2,F,F,F) can be updated to -2 since there is no lock. 
       The value of (V2,F,T,F) can be updated to -111 as we then have no choice but to go back to S and
       take the traversal costing 100. Thus we assign:
       U((V1,F,U,F)) = -1 + 0.2 *(-2) + 0.8 * (-111) = -90.2
       Note that the optimal action at (V1,F,U,F) is to go to V2 as going to S will be worse.
       From this we get the optimal action at S: go to V1, which gives us the total cost:
       U((S,U,U,F)) = -10 + 0.5* U((V1,F,U,F)) + 0.5*U((V1,T,U,F)) = -10+0.5*(-90.2)+0.5*(-3) = -56.6

       Now add back the nodes I1 and I2. Note that from I1 or I2 we have to return to S, or pay 1000
       which is clearly suboptimal. So going to I1 or I2 is only to gain information, and the cost
       of obtaining information is twice the respective edge weight (as we need to go and return).

        What is the VOI of learning both items of information?
       With probability 0.5*0.8=0.4 we will learn that the way is blocked, and go directly to
       G for a cost of 100. With probability 0.6 we will learn that the way is unblocked
       or there is a key, and in both cases we will go through V1,V2,V3 for a cost of 13.
       The total expected cost is 0.4*100+0.6*13 = 47.8 so it is worth paying 6 to get the information.

       However, we are not done, since we have the option of obtaining only one item of information,
       or a conditional. For example, clearly if we go to I2 and find no blockage, we do not need to go to I1
        as we already know we should go through V1. But if we find the way blocked, we need to check if we should
       also try I1 (or try V1, but this can be shown to be suboptimal). Alternately, we can try to go to I1 first.
       Let us compare the two conditional strategies.

       Try I1 first: this costs 2, and then with probability 0.5 we detect a key then can reach the goal
       with cost 13. If no key, we try I2, for a cost of 4, and then with probability of 0.8 detect blockage
       and need to pay 100, and with probability of 0.2 can still pay only 13. Total expected cost is:
         cost(try I1 first) = 2 + 0.5*13 + 0.5*(4+0.8*100+0.2*13) = 51.8

       Try I2 first: this costs 4, and then with probability 0.2 we detect no blockage, and can reach the goal
       for a cost of 13. If blocked, we try I1 for a cost of 2, then with probability 0.5 no key so
       pay 100, otherwise pay only 13. So we get:
         cost(try I2 first) = 4 + 0.2*13 + 0.8*(2+0.5*13+0.5*100) = 53.4

        All in all, the optimal policy is to go to I1 first. (We can also rule out trying I1 and then, on finding
        no key, skipping I2: in those cases we would either always pay 100, or pay an extra 20 (to V1 and back)
        too often.)
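
        (Aside, not part of the original answer: the expected costs compared above, recomputed in
        Python. Costs are positive numbers here; the corresponding utilities are their negations.)

            p_key, p_lock = 0.5, 0.8
            via_v1, via_sg = 10 + 1 + 1 + 1, 100          # S-V1-V2-V3-G = 13, direct S-G = 100

            # Baseline, ignoring I1 and I2:
            cost_v1_key    = 1 + 1 + 1                                        # key in hand: 3 more to G
            cost_v1_no_key = 1 + p_lock * (1 + 10 + 100) + (1 - p_lock) * 2   # 90.2
            cost_baseline  = 10 + p_key * cost_v1_key + (1 - p_key) * cost_v1_no_key   # 56.6

            # Buy both items of information first (detours cost 2 + 4 = 6):
            p_blocked = p_lock * (1 - p_key)              # lock present and no key: 0.4
            cost_both = 6 + p_blocked * via_sg + (1 - p_blocked) * via_v1     # 53.8

            # The two conditional strategies:
            cost_i1_first = 2 + p_key * via_v1 + (1 - p_key) * (4 + p_lock * via_sg + (1 - p_lock) * via_v1)
            cost_i2_first = 4 + (1 - p_lock) * via_v1 + p_lock * (2 + p_key * via_v1 + (1 - p_key) * via_sg)
            print(cost_baseline, cost_both, cost_i1_first, cost_i2_first)     # 56.6, 53.8, 51.8, 53.4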

       Advanced note on the VOI ordering: this can be seen as a case where we are trying a "cascade" of tests, T1 has cost 2
       and success probability 0.5, whereas T2 has cost 4 and success probability 0.2
       The optimal ordering of tests: try tests in non-increasing order of "quality", defined as
       success probability divided by cost.
       T1 here has quality 0.5/2 = 0.25, and T2 here has quality 0.2/4 = 0.05 so T1 should be tried first.
       (See Daniel Berend, Ronen I. Brafman, Shimon Cohen, Solomon Eyal Shimony, Shira Zucker:
        Optimal ordering of independent tests with precedence constraints. Discrete Applied Mathematics 162: 115-127 (2014)
        Also US patent 8,175,999, B2)

6)  A principal researcher (PR) in the Paypal project must hire a new programmer for
    the GA optimization, since the last employees have quit suddenly.
    The PR would like to construct a decision procedure based on results from hiring
    employees in the DARPA Robotics Challenge project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A), Java programming ability (B), Self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using greedy search with the best-information-gain
     heuristic) a decision tree as close to consistent as possible for the following table:

     input variables               decision
   A      B      C      D
--------------------------------------------
   2      F      L      0             N
   2      F      H      1             Y
   2      T      L      0             N
   3      F      H      0             Y
   3      T      H      2             Y
   3      T      H      2             Y
   3      T      H      2             N
   3      T      L      0             Y
   3      T      H      1             Y

ANSWER: We try all the attributes at the root.
  Attribute A:  For A=2 we get 2N, 1Y
                For A=3 we get 1N, 5Y

  Attribute B:  For B=F we get 1N, 2Y
                For B=T we get 2N, 4Y

  Attribute C:  For C=L we get 2N, 1Y
                For C=H we get 1N, 5Y

  Attribute D:  For D=0 we get 2N, 2Y
                For D=1 we get 0N, 2Y
                For D=2 we get 1N, 2Y

Clearly A and C have equivalent distributions, and B is worse (more balanced than A in
2nd branch, with same number of examples). So need to compare A to D. The first branch of
A has the same distribution (ignoring "polarity") as the last branch of D, so they are the same
in this respect. So need to compare just the remainder, the entropy of "1N, 5Y" to "2N, 2Y".
The latter requires 4 bits to classify all examples (even distribution with 4 examples).
The former needs -log2(1/6) - 5*log2(5/6) = approx 3.9 bits. So attributes A or C are better.
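
(Aside, not part of the original answer: the comparison above, recomputed in Python as the total
number of bits still needed after each candidate root split; smaller is better, and the ordering
is the same as comparing information gains.)

    from math import log2
    from collections import Counter

    rows = [(2,'F','L',0,'N'), (2,'F','H',1,'Y'), (2,'T','L',0,'N'),
            (3,'F','H',0,'Y'), (3,'T','H',2,'Y'), (3,'T','H',2,'Y'),
            (3,'T','H',2,'N'), (3,'T','L',0,'Y'), (3,'T','H',1,'Y')]

    def bits(labels):                     # total bits needed to encode this branch's labels
        n = len(labels)
        return -sum(c * log2(c / n) for c in Counter(labels).values())

    for i, name in enumerate("ABCD"):
        remainder = sum(bits([r[4] for r in rows if r[i] == v])
                        for v in {r[i] for r in rows})
        print(name, round(remainder, 2))  # A 6.66, B 8.26, C 6.66, D 6.75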

Suppose we choose A. Then for A=2, we can choose C, which will classify these examples perfectly.
For A=3, again look at all remaining possibilities.

  Attribute B: For B=F we have 0N, 1Y
               For B=T we have 1N, 4Y


  Attribute C: For C=L we have 0N, 1Y
               For C=H we have 1N, 4Y


  Attribute D:  For D=0 we get 0N, 2Y
                For D=1 we get 0N, 1Y
                For D=2 we get 1N, 2Y

So (computing the information of the last branch in each of these cases), D is better. Now we need to handle the
case where D=2. Here none of the remaining attributes provides any information.
Strictly speaking the algorithm will use them both to get the tree:

A: 2  --- C:  L --- N
              H --- Y
   3  --- D:  0 --- Y
              1 --- Y
              2 --- B: F --- Y (default)
                       T --- C: L --- Y (default)
                                H --- Y (mode of 1N, 2Y)
Note that this tree does not always provide the correct answer on the examples,
and there is no tree that does that, as the examples are inconsistent.

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

ANSWER: Yes. Under A=3 the remaining attributes do not improve classification (every leaf of that
        subtree predicts Y), so the whole subtree can be pruned to a single Y leaf. But this algorithm
        does not do it. The smaller tree is:

A: 2  --- C:  L --- N
              H --- Y
   3  --- Y   (consensus of all leaves in the subtree of the previous tree)

7) You need to construct a 7-input neural network that has 1 output: the output should be 1 if
   the number of input units that are 1 is odd, and 0 otherwise.
   a) Can this be done without hidden units?

ANSWER: No. Fix all inputs except the two last ones to 0. Then the output is
        the XOR of the last two inputs. So if we could implement the above network
        without hidden units, we have just shown how to do XOR with no hidden units,
        which we know cannot be done. Contradiction.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

ANSWER: But it is easy to "count", so add some hidden units that implement "X inputs or more are on",
        with X being 2, 4, and 6. Each such unit has all input weights 1 and a threshold of X-0.5
         Call these units "X>=2", "X>=4", "X>=6", respectively. These will be "inhibitors" of the output unit.
        Final construction: we have the above units fed by all inputs, and all these units connect to one output
        unit, with a weight -2. The output unit is also fed by all inputs, with weight 1. 
        The threshold of the output unit is 0.5
        This means that its output is 0 if no input unit is on, and becomes 1 when exactly one input is on.
        But having two input units being 1 turns on the "X>=2" unit, which inhibits the output unit back to 0.
        Making 3 inputs be 1 again turns on the output unit, which is inhibited to 0 again due to "X>=4" with
        one more input turning to 1, etc.
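
        (Aside, not part of the original answer: a brute-force Python check of the construction
        above over all 2^7 input vectors.)

            from itertools import product

            def threshold(weighted_sum, t):
                return 1 if weighted_sum > t else 0

            def parity_net(x):                                    # x is a tuple of 7 bits
                s = sum(x)                                        # all input weights are 1
                hidden = [threshold(s, k - 0.5) for k in (2, 4, 6)]   # "at least k inputs on"
                return threshold(s - 2 * sum(hidden), 0.5)        # output unit, weight -2 per hidden unit

            assert all(parity_net(x) == sum(x) % 2 for x in product((0, 1), repeat=7))
            print("parity verified on all 128 input vectors")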
   

Deadline (strict): Wednesday, January 25, 2017, at 12 noon.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.