Introduction to Artificial Inteligence

Assignment 6

Written assignment: planning, probabilistic inference, and learning

Justify all answers!

1) We are given the following independence statements about random variables
   A, B, C, D, E, F:

      not I({A},{B}|{})
      not I({B},{C}|{})
      not I({D},{F}|{})

a) Construct a Bayes network that is consistent with the 
   above set of independence statements.Mayer

   First independence statement means no edges between ABC and DF, so they are
   in separate components. (other than E, see below)  E is not mentioned so it can also be separate (or not).
   The dependence statements mean that there is a path between A and B, and between B and C.
   But the second independence statement means there is no edge between A and C, so the edges
   are A-B-C, with B either passthrough (in either direction) or diverging.
   Last statemement means an edge between D and F in either direction.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with all the statements?

   Answer to a makes it obvious that the BN is not unique. There are 3 possibilities for ABC, and
   two for DF. Also, E can be alone or in any of the components, or can even connect them if it is a
   converging node!

2) Consider the following variables and their immediate causes (parents):

     variable         parents
        A               none
        B               none
        C               A
        D               A
        E               B
        G               C D B
        H               E
        I               A B H

    a) Is this network a poly-tree?
ANSWER:   No, say undirected cycle B-E-H-I-B.
    b) Is this network (directed-path) singly connected?
ANSWER:   No, above cycles also contains two directed paths from B to I.
    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER:  Yes, all paths through I and G have them as non-evidence converging nodes,
         and they have no children, so they block these paths. There are no other paths
         between AD and E. (That is, I and G are barren nodes).
       2)  I({D}, {E} | {A})
ANSWER:   Yes, same as above.
       3)  I({D}, {E} | {A, G})
ANSWER:   No, now the path D-G-B-E is unblocked, since B is a non-evidence diverging node,
          and G is an evdence convergiing node, so neither B nor G block this path.
       4)  I({B}, {E} | {D})
ANSWER:  No, B-E is a 1-edge path and can never be blocked.
       5)  I({B}, {C} | {})
ANSWER: Yes as in 1, I and G are barren nodes, and removing them leaves no paths
        between B and C. 
       6)  I({B}, {C} | {I})
ANSWER: No, the path B-I-A-C is unblocked: A is a non-evidence diverging node,
        and I is an evidence converging node on the path. So neither blocks the path.
    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.2, P(B=true)= 0.3
       P(H=true|E=true) = 0.2, P(H=true|E=false) = 0.4,
       P(G=true|B=C=D=true) = 0.9, P(G=true| B,C,D anything else) = 0.3, 
       P(C=true|A=true) = 0.9, P(C=true|A=false) = 0.2
       P(E=true|B=true) = 0.8, P(E=true|B=false) = 0.1

       Find P(E=true | B=true, G=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.
ANSWER: drop barren node I. Now the only path from E to G is through B, so
        E is d-separated from G by B, implying independence, so
           P(E=true | B=true, G=true)=P(E=true | B=true)=0.8
        (The latter is actually given above).

3) You need to construct a 3-input neural network that has 2 outputs: one is 1 if the number
   of inputs with value 1 is even, and the other is 1 if there are at least 2 inputs with value 1.
   (e.g. if all 3 inputs are 1, then the first ouput should be 0, and the second should be 1).
   a) Can this be done without hidden units?
ANSWER: Not the first function. Suppose you set one of the inputs to 0. Then the output now
        needs to be a XOR of the other 2 inputs, and XOR cannot be done without hidden units.
        The second function is a simple majority, and can be implemented with a 3-input unit
        with all weights 1 and a threshld of 1.5

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)
ANSWER: We have already implemented the majority part above. Now to implement the parity. The output needs
        to be 1 when no input is on, and when 2 inputs are on. We can use the "majority" above as the hidden unit.
        Once we have that, we can do the rest with only 1 more unit, as follows. It has 4 inputs: 3 being the circuit
        inputs, with weights being -1. The threshold is -0.5, so that if all inputs are 0 we have 0>-0.5 and
        so the output is 1. The forth input of this unit is the output from the "majority" unit.
        We set the conection weight to 2 so that the 1 from the majority overcomes the -2 from the inputs to
        the circuit, when exactly 2 of them are on, and in this the output also becomes 1. When all 3 circuit
        inputs are 1, we have a total of -1-1-1+2=-1 < -0.5, so the output is 0 again, as required.

4) a) For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:

    C  B

      and the goal state containing: On(A,B) and On(B,C)
    Initial state has 2 dummy steps:
       START:  preconditions: null
               effects:       On(C,Tab), On(B,Tab), On(A,B), On(D,A), On(E,D), Clear(C), Clear(E)

       END:    precondition:  On(A,B), On(B,C)
               effects:       null

       and constraint "START before END".

    This state is not a solution so we pick an unsatisfied precondition, say the On((B,C) precondition of END.
    This precondition cannot be satisfied by existing step, so add:
       STEP1:  Puton(B,C)
               preconditions: Clear(B), Clear(B), On(B,Tab)
               effects:       On(B,C), not Clear(C)
    and constraints "START before STEP1 before END", and link effect On(B,C) to the same preconditin of END.
    Now precondition On(B,Tab) of STEP1 can be satisfied  by effect On(B,Tab) of START so this link is added.
    Say we now select the Clear(B) precondition of STEP1. This requires adding a step:
       STEP2:  Puton(A,Tab)
               preconditions: Clear(A), On(A,x)
               effects:       On(A,Tab), Clear(x), not on(A,x)
    and add the link from effect Clear(x) (with X=B) of STEP2 to the same precondition of STEP1.
    The precondition On(A,B) can be handled by adding a link from START. Add the obvious constraints.
    Now to handle Clear(A) we need an additional step:
       STEP3:  Puton(D,Tab)
               preconditions: Clear(D), On(D,x)
               effects:       On(D,Tab), Clear(x), not on(D,x)
    with x=A. Add the link to Clear(A) precondition of STEP2, and the obvious constraints.
    Now we can treat On(D,x) by linking an effect On(D,A) of START.
    Now we need to handle the Clear(D) precondition of STEP3. We can do this by adding a step:
       STEP4:  Puton(E,Tab)
               preconditions: Clear(E), On(E,x)
               effects:       On(E,Tab), Clear(x), not on(E,x)
    With x=D, and add the link from STEP4 to Clear(D) of STEP3. Also add the obvious order constraints
    START before STEP4 before STEP3.
    This does not introduce threats, so we continue. Selecting Clear(E), this can be solved by
    adding a link from START, Same for On(E,D). There are no threats here.
    Finally, we must handle the On(A,B) precondition of END. 
    We COULD link it to the On(A,B) of START, but then this link is clobbered by STEP2. But STEP2
    cannot be promoted or demoted, as it must be after START and before END, so we would get stuck.
    So need to add another step:
       STEP5:  Puton(A,B)
               preconditions: Clear(A), Clear(B), On(A,Tab)
               effects:       On(A,B), not Clear(B)
    And link the effect On(A,B) to the same precondition of END. Now the clobbering by STEP2 can be handled
    by adding the constraint "STEP2 before STEP5". 
    Also, STEP5 clobbers the link Clear(B) between and STEP2 and STEP1 so need to add the constraint
   "STEP1 before STEP5" (cannot place STEP5 before STEP2 as this would be inconsistent with the constraint
    added 3 lines above).  
    Now Clear(B) can be addressed by linking from STEP2 and Clear(A) by linking from STEP3. 
    Handling all the other potential clobberings we have
    the final plan with the above steps, and the constraints:
    "START before STEP4 before STEP3 before STEP2 before STEP1 before STEP5 before END"
    and this plan is complete and correct.

5)  Consider the following business opportunities scenario, occuring in a certain middle-eastern country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the amount of money that your assets are worth.
    Assume also that moral considerations are not part of your utility function.

    You have just bought rights for construction in the SacredTurf project, and stand to 
    gain 1000 MEMU (Million eastern monetary units) for it. However, you wish to improve your gains, by
    requesting that you be allowed to build 3 times as many floors as originally approved, thereby
    increasing your gains to 2500 MEMU. You need to pay 100 MEMU for the request, and because
    this is against zoning laws the probability that it will be approved is 0.01, and your
    100 MEMU for the request is lost whether or not the request is granted.

    You have the option of approaching a facilitator ("MA'KHER" in language of said country),
    who, for the paltry sum of 400 MEMU will make sure that Mayor Somegraft (mayor of city where SacredTurf resides), and the government
    officials in charge will "see reason" (and also cash, but you don't want to know), 
    thereby increasing the probability that the petition is granted to 1.

    However, you also know that such use of a facilitator may be illegal, and that  there is a probability of 0.8
    that 10 years from now charges will be brought, and if so, probability of 0.5 that you will be convicted,
    go to jail and also made to pay a fine.
    Suppose also that you estimate the total loss in such a case (fine, plus jail time in monetary value) is
    10000 MEMU.

    a) What is your rational choice from:
        A) Not filing a request, B) Filing a request with no facilitator, C) Filing a request with facilitator,
       for the following utility combination functions:
       1) Undiscounted rewards.
       2) Discounted rewards with a discounting factor of 0.2 per 10 years (i.e you really do not
          care very much about 10 years from now).

ANSWER: For undiscounted rewards, we have:
         U(A) = 1000
         U(B) = 1000 - 100 + 0.01 * 1500 = 915
         U(C) = 2500 - 100 - 400 - 0.8*0.5*10000 = -2000
       So clearly the best action is A: not filing a request and settling for the sure 1000 MEMU.
       For discounted rewards, it is all the same except for action C, where the 10000 penalty is
       multiplied by the discount factor 0.2, so we get:
         U'(C) = 2500 - 100 - 400 - 0.8*0.5*10000*0.2 = 1600
       So option C is now the best.

    b) As above, but you have the choice to pay 100 MEMU for legal advice 
       on whether charges against you will stick. For simplicity, we will assume that this legal advice
       is perfectly correct. What is the rational policy in this case?

       Assuming that the probability of being charged is independent of that of being convicted
       (NOT a good assumption in practice), the probability that the legal advice will say that the
       charges will stick is the same as that of being convicted, which is 0.5
       Note that even if conviction is assured, there is still a 0.2 probability chance
       of not being charged, but still in this case it is best to apply A.
       Obviously, if the legal advice is "will not be convicted", C is the best option. So we get:
         E(U(info)) =  0.5 * U(A| will be convicted) + 0.5 * U(C|will not be convicted) =
             = 500 + 0.5 * (2500 -100 -400) = 1500
       So the expected value of information here is 500 MEMU (we can only get 1000 MEMU without it)
       and it is worthwhile to pay 100 MEMU for the legal advice.

6) Consider the following EU refugee problem instance.

   The (undirected) graph has 4 vertices, as follows:
     (I, V1, V2, G)
   The edges are as follows:
     (I, V1), weight 10.
     (I, G), weight 60.
     (V1, V2), weight 10.
     (V2, G), weight 10.

   I is the start vertex, V2 may contain Police with a probability of 0.5, and food with probablity 0.5, and G is a goal vertex.
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function? (He the cost of traversing an edge is equal to it weight).
       We need a state variable for location, and one for each unknown (police at V2, food at V2). So the state
       is represented by a triple (L, P, F) with L in {I,V1,V2,G}, and P,F in {T,F,U}
       Overall we have 36 belief states, many of them unreachable or impossible.
       The terminal states are any states of the form (G, X, Y) where X and Y are "don't care", i.e.
       any value in {T,F,U}.

       Transition probabilities are 1, for all allowed transitions, except in a transitions to (V1, X, Y) and (G, X, Y) from a state
       where F, P are unknown. The only such state is (I, U, U), and the transition probabilities are
       multiplication of the probabilities of police (resp. no police), and food (resp. no food). Due to
       symmetry (P(Food)=P(Not Food) =P(Police)=P(No olice)=0,5), each such transition
       has probability 0.5*0.5=0.25

       The actions are the edge traversals. The Rewards are for transitions (actions at states), R(s,a).
       The reward is minus the wieght of the edge, except when starting from (V2, F, T), i.e. food and
       no police at V2, where the reward is minus half the weight of the edge.
   b2) Find the optimal policy. You may assume that the refugee starts
       at vertex I in order to save some computation.

ANSWER:  We will use a mix of expectimax and value iteration. Set the utility of goal belief states
           U( (G, X, Y) ) = 0
         This is the final correct value for these states.
         We will also set as an upper bound, temporarily:
            U( (I,X,Y)) = -60
         But also know that U( (I,T,X)) = -60 as a final value, as with police at V2 must traverse (I,G).
         Now for some Bellman updates, for belief states close to the goal:
            U( (V2, F, T)) = -5   (reach goal from V2 with food, and no police)
            U( (V2, F, F)) = -10  (likewise with no food)
         We can show that these are final values.
         Now for some cases of V1:
            U( (V1, F, T) ) = -10 - U( (V2, F, T)) = -15
            U( (V1, F, F) ) = -10 - U( (V2, F, F)) = -20
            U( (V1, T, X) ) = -10 - U( (I, T, X)) = -70
         Finally we can try to update:
            U( (I, U, U) ) = -10 + 0.25*(U( (V1, F, T) )+U( (V1, F, F) )+U( (V1, T, T))+U( (V1, T, F)))
                           = -10 + 0.25*(-15-20-70-70) = -53.75
         The update succeeds, as the value is greater than -60. No more updates are needed.
         The optimal policy is thus: 
               Traversse (I, V1). 
               If police at V2, traverse (V1,G), and then (V1,G)
               Otherwise, traverse (V1, V2) and then (V2, G)

   b3)In addition, from node I only, and before it starts moving,
      the agent can perform a SENSING action
      for a cost of 10, which results in the agent knowing whether V2 contains food
      and whether V2 contains police.
      Re-compute the optimal policy for this case.

ANSWER: After receiving the information, we land at one of the belief states:
          (I, F, F), (I, F, T), (I, T, F), (I, T, T) 
        each with probability 0.5
        We have: 
           U((I,F,F)) = -30
           U((I,F,T)) = -25
           U((I,T,X)) = -60
        So now, either do a bellman update on (I,U,U) considering SENSING as an action,
        or compute the value of information of SENSING and compare to its cost. 
        The first method is:
           U*((I,U,U)) =  -10 + 0.25*( U((I,F,F))+U((I,F,T))+2* U((I,T,X)))
                       =  -10 + 0.25*(-30-25-120) = -53.75
        Since it is better than the current value of U((I,U,U)) the update succeedes and
        SENSING is AS GOOD as the best (after which go through V2 or directly to G depending on the value
        of Police).
        The second method is: after receiving the information we land at one of the 4 states mentioned
        above, and get an average worth of: 
          U =  0.25*( U((I,F,F))+U((I,F,T))+2* U((I,T,X))) =  0.25*(-30-25-120) = -43.75
        So the expected VOI for SENSING is:
          U-U((I,U,U)) = -43.75 - (-52.5) = 10
        Since the VOI of SENSING is equal to its cost, it is indefferent whether we pay for the information.

7)  A principal researcher in the Israeli Smart Grid (ISG) consortium
    would like to construct a decision procedure based on results from hiring
    employees in the DARPA Robotics Challenge project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  Python programming ability (B) Self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
   2      F      L      0             N
   2      F      H      1             Y
   2      T      L      0             N
   3      F      H      0             Y
   3      T      H      2             Y
   3      T      L      0             Y
   3      T      H      1             Y

ANSWER:   Try all possible attributes for the top level.
A: If A=3 we can answer Y. If A=2 we have 1Y, 2N.

B: If B=F we have 2Y, 1N, and if B=T we have 3Y, 1N. The distribution
   and number of examples in the first brance of B is already as bad as the 2nd
   branch of A, and there is also required further information in the 2nd branch of
   B, so A is clearly better than B.

C: If C=L we have 2N, 1Y (already as bad as the 2nd branch of A). For C=H we can answer Y,
   so C is just as good as A.

D: If D=1 we can answer Y. If D=2 we can also answer Y. If D=0 we have 2Y, 2N,
   which is 4 examples with uniform distribution (the worst) which is clearly worse than the 2nd branch
   of A, which is only 3 examples. So A is also better than D.

So, the top attribue is A or C, and we choose one of them arbitrarily (say A).
Now in the case A=3 we are done, and now recursively
apply this method to the 2nd branch of A, with the examples where A=2.

If we now try B at this branch, for B=F we have 1Y, 1N, and for B=T we can answer N.

If we try C here, for C=L we can answer N, and for C=H we can answer Y, for
complete consistent classification. We cannot do better than that, so can decide on C and
are done (although D would also work here instead of C, with the downside that it has more
branches than C).

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

ANSWER: In this case, a more compact tree is not possible. Our tree had 2 internal nodes,
and the only smaller tree has 1 internal node. As we have tried all possible attributes
at the top level, and none of them resulted in a consistent classifier, a
tree with 1 internal node is not possible, QED.

Deadline: Tuesday, January 20, 2016, at 12 noon. (Postponed to 14:00)

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional (yeah, sure it ain't).