Introduction to Artificial Inteligence

Assignment 6 - Solution

Written assignment: planning, probabilistic inference, and learning

Justify all answers!

1) We are given the following independence statements about random variables
   A, B, C, D:

      not I({B},{C}|{D})

a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

   I({A},{B,C,D}|{}) entails that A is not a neighbor of any of B, C, D so
   is a single node not connected to the rest of the graph.

   I({B},{C}|{}) means that B is not a neigbor of C. However,
   "not I({B},{C}|{D})" means that there is a path from B to C, which must be
   through D, the only remaining node. Since it is not blocked by D, then
   D must be a converging node on this path.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with the statements?

   In this case, the answer above is unique.

2) Consider the following variables and their immediate causes (parents):

     variable         parents
        A               none
        B               none
        C               none
        D               A C
        E               A F
        F               B
        G               D B
        H               C E

    a) Is this network a poly-tree?
ANSWER: No, the undirected path AEFBGDA is a cycle in the underlying undirected graph.

    b) Is this network (directed-path) singly connected?
ANSWER: Yes. No 2 or more directed paths from any node to any other node.

    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER: No, path DAE is not blocked, since A is a diverging non-evidence node.

       2)  I({D}, {E} | {A})
ANSWER: Yes, now path DAE is blocked by A, a diverging evidence node.
        All othe paths are either through H or through G, both blocking
        nodes on these paths as they are child-less non-evidence ("barren")  nodes.

       3)  I({D}, {E} | {A, G})
ANSWER: No, making G evidence causes the path EFBGD to be unblocked: F and B
        are respectively passthrough and diverging non-evidence, and G a converging
        evidence node, so none of them block the path.

       4)  I({B}, {E} | {D, F})
ANSWER: Yes, now the above path is blocked by F (passthrough evidence node), and
        the other paths are still blocked (A a diverging evidence node, H
        a barren node).

       5)  I({B}, {C} | {})
ANSWER: Yes. Removing all barren nodes leave only B and C, which are not

       6)  I({B}, {C} | {H})
ANSWER: No. Path BFEHC is unblocked: F and E are both passthrough non-evidence,
        and H a converging evidence node, so none of them block the path.

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.2, P(B=true)= 0.1, P(E=true|A=F=True) = 0.5,
       P(E=true|A=false,F=true) = 0.1, P(E=true|A=true,F=false) = 0.5, 
       P(E=true|A=F=false) = 0.5
       P(H=true|E=true) = 0.1, P(H=true|E=false) = 0.4, 
       P(F=true|B=true) = 0, P(F=true|B=false) = 0.5 

       Find P(A=true | B=true, F=true, H=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.

ANSWER: Can remove barren node G, and then D becomes barren. Also, A is d-separated
        from B by F and H, so it is sufficient to compute: Find P(A=true | F=true, H=true).
        So what is the answer? Well, this is a trick question: this probability is
        undefined, because: 
        P(B=true, F=true, H=true) = P(H=true|B=true, F=true)*P(F=true|B=true)*P(B=true)=0

3) You need to construct a 5-input neural network that has value 1 in the 
   output just when the number of its inputs that are "on" is either 1
   or 3 (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?
ANSWER: No. Any hyperplane that separates the cases of 1 unit on (case "A") 
        from 2 units on (case "B") must have the cases with 3 units on placed
        on the same side as "B", so would give a wrong answer. Actually, need
        several (4) hyperplanes to separate all the cases.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

ANSWER: Can be done in several ways, the simplest way is using 3 hidden units,
        each generating one of 3 hyperplanes, and the output unit generating
        the 4th and tallying the scores. All units get all inputs, with a weight of 1.
        The output unit also gets as input the output of the other 3 units. Thresholds
        are as follows:

        Output unit: threshold 0.5, so comes on if any input is 1. (but see below).

        First hidden unit ("2+" detector), has a threshold of 1.5, and feeds
        into the output unit with a weight of -5 (so as to override the inputs if
        "2+" is detected.

        Second hidden unit ("3+" detector), has a threshold of 2.5, and feeds into
        the output unit with a weight of 5, so as to cancel the result of the "2+"

        Third hidden unit ("4+" detector) has a threshold of 3.5, and feeds into
        the ouytput unit with a weight of -5 (or more negative), so as
        to cancel the effect of the "3+" detector.

4) a) For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:
    C  B

      and the goal state containing: On(A,B) and On(B,C)

      Initial plan has the dummy steps Start and End, with:

      Start having "effects": On(C,Table), On(B,Table), On(A,B), Clear(A), Clear(C), Clear(Table)
      (i.e. describes initial state).

      End having preconditions: On(A,B), On(B,C)  
      (i.e. same as goal state requirements)

      With ordering constraint: Start before End.

      1) Now POP detects 2 unsatisfied preconditions for step End: On(A,B), On(B,C)
      Suppose we pick On(B,C)    (Picking the other one may cause problems!).
      There is no step with these effects in the plan, so we need to
      add a step:

      Step1:  Op: PutOn(B, C)
              Pre: Clear(B), Clear(C), On(B,z)
              Effects: On(B,C), not Clear(C), Clear(z), not On(B,z)

      We add a link from Step1 to End, and constraints: Start before Step1, and Step1 before End.
      There are no cloberers, so continue.

      2) Suppose we now work on the Clear(C) precondition of Step1. This can be satisfied
         by the Clear(C) effect of Start, and add the link. No cloberers.

      3) Suppose we now work on the On(B,z) precondition of Step1. This can be satisfied
         with z=Table, by the On(B,Table) effect of Start, and add the link. No cloberers.

      4) Suppose we now work on the Clear(B) precondition of Step1. No way to satisfy it
         with existing steps, so add a step:

      Step2: Op: PutOn(A, Table)
             Pre: Clear(A), On(A,z)
             Effects: On(A,Table), not On(A,z), Clear(z)

      With z=B. We add a link from Step2 to the Clear(B) precondition of Step1.
      Constraints added are Start before Step2 and Setp2 before Step1.

      5) Clear(A) precondition of Step2 can be lined to Clear(A) effect in Start.

      6) On(A,z) is actually On(A,B) due to step 4, and this precondition is
         linked to effect On(A,B) of Start.

      7) The only unsatisfied precondition is now On(A,B) of End. Although we COULD link
         it to the On(A,B) effect of Start, we have Step2 as a cloberer that cannot
         be eithe demoted or promoted. The only solution is another new step:

      Step3:  Op: PutOn(A, B)
              Pre: Clear(A), Clear(B), On(A,z)
              Effects: On(A,B), not Clear(B), Clear(z), not On(A,z)

      With z=Table. We add the link from Step3 to the On(A,B) precondition of End.
      Constraints are Start before Step3 before End. Also note that Step3 clobbers
      precondition Clear(B) of Step1, and must be promoted (Step1 before Step3)
      or demoted (Step3 before Step2). However, Step2 also clobbers the On(A,B)
      precondition of End (achived by Step3) and since Step2 cannot be promoted
      (cannot do End before Step2), must demote (Step2 before Step3), which is not
      consistent with demoting Step3. Thus we get add the constraints:
      Step1 before Step3 and Step2 before Step3.

      8) Now to resolve the preconditions of Step3, as follows:
         Clear(A) by Start,
         Clear(B) by Step2,
         On(A,Table) by Step2.

      The constaints resolve to a total ordering: 
        Start before Step2 before Step1 before Step3 before End.
      The plan is now complete and correct.

      Note that our choices were such that no backtracking is required. Different
      choices might have required considerable backtracking to find the solution.

   b) Suppose that is not known whether On(A,B) or On(B,A) in the initial state,
      and that there is an operator "Check(On(x,y))" which results in knowing whether
      this is true or not. Show how a conditional plan is constructed for the above

ANSWER: In this case the initial plan would have:
      Start having "effects": On(C,Table), Clear(C), Clear(Table),
      (On(B,Table), On(A,B) Clear(A)) or  (On(A,Table), On(B,A) Clear(B))

      End having preconditions: On(A,B), On(B,C)  
      (i.e. same as goal state requirements)
      With ordering constraint: Start before End.

      We could begin the same way:

      1) POP detects 2 unsatisfied preconditions for step End: On(A,B), On(B,C)
      Suppose we pick On(B,C)    (Picking the other one may cause problems!).
      There is no step with these effects in the plan, so we need to
      add a step:

      Step1:  Op: PutOn(B, C)
              Pre: Clear(B), Clear(C), On(B,z)
              Effects: On(B,C), not Clear(C), Clear(z), not On(B,z)

      We add a link from Step1 to End, and constraints: Start before Step1, and Step1 before End.
      There are no cloberers, so continue.

      2) Suppose we now work on the Clear(C) precondition of Step1. This can be satisfied
         by the Clear(C) effect of Start, and add the link. No cloberers.

      3) Suppose we now work on the On(B,z) precondition of Step1. This can be satisfied
         with z=Table, by the On(B,Table) effect of Start, or with z=A by the On(B,A) effect
         of Start. But we do not know WHICH yet. So need to check. Add a step:

      StepC: Op: Check(On(B,z))
             Pre: none
             Effects: Know(On(B,z))
      For simplicity, assume that we add the
      constraint that StepC follows immediately after Start.
      The effect could be Know(On(B,Table)) and thus, assuming we have the reasoning mechanism,
      we would also get that B is not on A. 
      For this case we continue as in the case where we were certain that On(A,B), and
      add steps Step2 and Step3 with the same ordering constraints, to get:
        Start before StepC before Step2 before Step1 before Step3 before End 

      The effect of StepC could also be: Know(On(B,A)), in which case we continue as follows:

      4') Now we know that as a result of StepC, we have On(A,Table), On(B,A) Clear(B).
          So now we can link this side of the effects of StepC to satisfy preconditions
          On(B,A) Clear(B) of Step1.

      5') Now the only unsatisfied precondition is On(A,B) of End. The only way to satisfy it
          is by adding a step (cannot use Step3, because it is for the "other" case only): 

      Step4:  Op: PutOn(A, B)
              Pre: Clear(A), Clear(B), On(A,z)
              Effects: On(A,B), not Clear(B), Clear(z), not On(A,z)

      And link it to the On(A,B) precondition of End. Since Step4 clobbers
      Clear(B) of Step1, and cannot promote, we have Step1 before Step4.

      6') Now we need to resolve the preconditions of Step4. They can all be resolved
          as follows:
           Clear(A) as an effect of Step1,
           Clear(B) as an effect of StepC (and Start),
           On(A,Table) as an effect of StepC.
      So for this case we have the constraints: Step1 Before Step4 Before End.

      The conditional plan is now complete and correct.
      Note that the solution requires some details such as reasoning about
      the relationship between On and Clear, that have not been considered.
      Alternately, these could be effects of the Check operator.

5)  Consider the following scenario, occuring in a certain middle-eastern country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the value unit "Million 
    Unspecified-Middle-Eastern-Country Monetary Unit" (MUMECMU for short).
    Assume also that moral considerations are not part of your utility function.

    Forest fires are a disaster that occur with intervals approximately according to an
    exponential distribution. The damage caused by a fire can also be modeled by
    an appropriate distribution. For simplicity, we will use an annual time granularity,
    and assume that the probability of forest fire count per year is as follows:
      a) no forest fire: 0.1
      b) 1 forest fire:  0.6
      c) 2 forest fires: 0.3
    The distribution of forest fire sizes is:
      a) Large: 0.9
      b) Huge: 0.1
    Assume that fire size distributions are independent, and so are fire occurences
    in different years.

    The damage caused by a forest fires are as follows (depending on whether you have
    fire-extinguishing air-craft):
      a) Large: 10 MUMECMU if you have no aircraft, 2 MUMECMU if you have at least 1 aircraft
      b) Huge:  300 MUMECMU if you have no aircraft, 50 if you have at least one aircraft,
                but only 10 MUMECMU if you have 2 or more aircraft.

    You need to decide whether to purchase fire-extinguishing aircraft, and if so,
    how many. Each aircraft costs 60 MUMECMU, and maintainance costs of each
    aircraft is 1 MUMECMU per year. 

    A) Which is the best policy (resp. worst) from the candidates below,
       assuming that the lifetime of each aircarft is 5 years, and assuming simple 
       5-year cumulative rewards, for the following cases:
     a) Do not purchase any aircraft (since they are expensive and huge fires
        have not happened yet).
     b) Do not purchase any aircraft, only do so after the first huge fire.
     c) Immediately purchase 1 aircraft.
     d) Immediately purchase 2 aircraft.

ANSWER: Compute expected utility for each policy: 
     a) Not purchasing any aircraft, in any year we get damages as follows:
        with probability 0.1 no fire and 0 damage,
        with probability 0.6 we get 1 fire, large with probability 0.9 (damage 10),
        huge with probability 0.1 (damage 300).
        with probability 0.3 we get 2 fires, which are both large with probability 0.81,
        both huge with probability 0.01, and one of each with probability 0.18.

        Total expected damages per year are thus:
        D = 0.1*0 + 0.6*(0.9*10+0.1*300) + 0.3 *(0.81*20+0.01*600+0.18*310) = 46.8

        Total expected damage for 5 years is 5*D, so expected utility is E(U(a)) = (-234).

     c) With 1 aircraft, same probabilities, but different damages. Damages per year are:
        D' = 0.1*0 + 0.6*(0.9*2+0.1*50) + 0.3 *(0.81*4+0.01*100+0.18*52) = 8.160
        Total expected damages are 5*D', and figuring cost of aircraft and maintenance
        we get E(U(c)) = -5*D' - 60 - 5 = (-105.8)

     b) Purchasing an aircraft only after the 1st huge fire, damages are like D'
        the year after the first huge fire. In order not to do the complete cpmputation,
        note that the probability of at least 1 huge fire in 5 years is:
        1-(1-0.6*0.1-0.3*0.19)^5 = approx 0.46
        So with probability 0.46 damage will be 300+60 (huge fire damages plus cost
        of aircraft), and we have E(U(c)) < (-165) even ignoring damages from years
        after purchasing the aircraft, so b is clearly worse than c.

     d) Same distributions again as in c, less damages, as follows:
        D'' = 0.1*0 + 0.6*(0.9*2+0.1*10) + 0.3 *(0.81*4+0.01*20+0.18*12) = 3.36
        Total expected damages are 5*D'', and figuring costs of 2 aircaft and
        maintenence costs we get: E(U(d)) = -5*D'' - 120 - 10 = (-146.8)
        So it does not pay to purchase 2 aircraft in this case, despite the fact
        that damages are further reduced.

     Optimal policy is c, purchase 1 aircraft.

    B) Repeat for discounted rewards, with a discounting factor of 0.2.

ANSWER: Again, compute expected utility. We have same probabilities and costs.
     a) For policy a (no purcahse), we have the same expected damages per year,
        but need to factor in the discounting. We get:
          E(U(a)) = - (46.8 + 0.2*(46.8+0.2*(46.8+0.2*(46.8+0.2*46.8)))) = -58.48128

     c) Policy c costs 60 immediately, and may still encur future damages,
        so is worse than a.

     b) This one is a bit hard to compute. We will show that it is worse than a.
        Consider any year in which a huge fire occurs. In such a year the total cost
        and damages will be at least 360. In all years prior to that costs will be
        the same as for policy a. Reasoning by cases, we can show that b is worse than a.
        Alternately, we can do the exact computation, which is a lot of work.

     d) Policy d costs 120 immediately, and may still encur future damages,
        so is worse than a.

    So not doing anyting is a good policy here for someone with a discounting factor
of 0.2, i.e. someone who really does not care much about the future.

    C) In a year that is considered a "global heat wave", the distribution
       of forest fire sizes changes so that a huge fire now has a probability
       of 0.3, and a large fire 0.7. Repeat A assuming
       that the probability of a "global heat wave" year occurence is independent
       and with a probability of 0.1

We need to repeat A for probabilities consistent with a "global heat wave", and
then do a weighted average with A. We get:

     a) Policy a, expected damages for a heat wave year:
        H = 0.1*0 + 0.6*(0.7*10+0.3*300) + 0.3 *(0.49*20+0.09*600+0.42*310) = 116.4

     Total expected damages for 5 years:
     E(U(a)) = -5*(0.1*H+0.9*D) = -(268.8)

     c) Policy c,expected damages for a heat wave year:
        H' = 0.1*0 + 0.6*(0.7*2+0.3*50) + 0.3 *(0.49*4+0.09*100+0.42*52) = 19.68

     Total expected damage for 5 years, factoring in 65 for aircraft costs, we get:
     E(U(c)) = -65 - 5*(0.1*H'+0.9*D') = (-111.56)
     So again c is better than a, and the margin is higher than in A.

     d) Policy d was inferior to c before, but with huge fires more likely we need to
        reconsider. We get, for heatwaves years:
        H'' =  0.1*0 + 0.6*(0.7*2+0.3*10) + 0.3 *(0.49*4+0.09*20+0.42*12) = 5.28

     Total expected damage for 5 years, factorting in purchase and maintenence costs for
     2 aircraft gives us:
     E(U(d)) = -130 - 5*(0.1*H''+0.9*D'') = (-147.76)
     Still not worth purchasing 2 aircraft.

     As to case b, a similar analysis to case A shows that it is inferior to c.
     So optimal is again c, purchase 1 aircraft.

    D) Giru Heller can predict "global heat wave" years with absolute accuracy,
       but charges 1 MUMECMU for each year predicted. Should we pay him, and if yes,
       for which years out of the next 5 years?

    Even for non heat-wave years, it is a purchasing 1 aircraft is better than none. So
    the question is, would knowing about a heatwave in advance cause us to purchase 2?
    If not, then automatically the information is worthless, otherwise we need additional
    computation. First, suppose we got info for all 5 years. For each heat-wave year,
    we save H'-H'' =  19.68-5.28 = 14.4 in damages by having the additional aircraft.
    For each non-heatwave year we save D'-D'' = 8.16-3.36 = 4.8 in damages by having the
    additional aircraft. We need to check in which cases out decisions will be to prefer
    action d to action c, depending on the number of known heat-wave years (deducting 65
    in the gain for costs of extra aircraft):

    5 heat-wave years: 14.4*5 - 65 = 7   (gain 7 in d vs. c)
    4 heat-wave years:  14.4*4+4.8 - 65 = (-2.6)  better to stick with c,
      and the same for any SMALLER number of heat-wave years.

    This means that the VOI for the information is non-zero, so need to look deeper.
    We only gain from knowing about 5 heat waves (if we know only about 4, there is
    only a probability of 0.1 for the 5th heat wave, in which case we should still
    stick with c, CHECK IT!).

    Therefore, the only possible set of information items to purchase (that has non-zero VOI)
    is for ALL 5 years, which costs 5. But what is it worth? Well, 0 in all cases except
    if it turns out that we have 5 heat-wave years, which will occur with probability
    0.1^5 = 0.00001, in which case we gain 7. So the VOI for this set of info
    is only worth VOI = 7*0.00001 = 0.00007 < 5   and is not a good value for money, as the VOI
    is smaller than even the cost of the 1st year. Perhaps
    if the clairvoyant offered his services much more cheaply... 

    Actually, for a cheaper prediction, this may get rather tricky. Suppose that the cost
    per year were only 0.00001401, i.e, the cost for 5 years is a BIT more than the VOI.
    Still, in this case it may still be worth it to pay, but not all at once! In this case
    the optimal policy (after purchasing the first aircraft) is to ask about one year. 
    If the answer is NO HEAT WAVE, stop. If the answer is yes, ask about a 2nd year, and
    keep doing this conditional scheme until the answer is NO, or until you get a YES
    answer for all 5 years, in which case purchase the 2nd aircraft. Why is this optimal?
    Because here the EXPECTED payment for the predictions is LESS than the VOI! (And it is
    the cheapest way of obtaining the information.)

6) Consider the following Canadian traveller problem instance.

   The (undirected) graph has 4 vertices, as follows:
     (I, V1, V2, G)
   All vertices are directly connected by edges.
   The edges are as follows:
     (I, V1), weight 1, no ice.
     (I, V2), weight 2, no ice.
     (V1, V2), weight 3, probability of ice 0.5
     (V1, G), weight 1, probability of ice 0.8
     (V2, G), weight 2, probability of ice 0.5
   The cost factor K for crossing an icy edge is 10.
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function?

ANSWER: We have 4 locations, and 3 unknown edges. The belief-state space
      is thus spanned by (v, e1, e2, e3) where
      the domain of v is {I, V1, V2, G} and the domain of the ei are {I, N, U}
      standing for ice, no ice, and unknown, respectively. We will assign:
         e1 to represent the state of (V1, V2)
         e2 to represent the state of (V1, G)
         e3 to represent the state of (V2, G)
      Actions are: traversing any edge incident to the current location.

      Transition distribution: defined based on components. For any action,
      the location will be the other side of the edge stated in the action.
      The belief state edge components will be unchanged for all known edges,
      and for unknown edges incident on the new location, they will transition
      to I or N depending on the probability of ice. For example, for the initial
      state with the action: traverse edge (I, V1) we get:

       T((I,U,U,U), (I,V1), (V1,I,I,U)) = 0.5*0.8 = 0.4
       T((I,U,U,U), (I,V1), (V1,I,N,U)) = 0.5*(1-0.8) = 0.1
       T((I,U,U,U), (I,V1), (V1,N,I,U)) = (1-0.5)*0.8 = 0.4
       T((I,U,U,U), (I,V1), (V1,N,N,U)) = (1-0.5)*(1-0.8) = 0.1
      Note that these numbers must sum to 1.
      Likewise, from the initial state, with the action: traveres edge (I, V2) we get:
       T((I,U,U,U), (I,V1), (V2,I,U,I)) = 0.5*0.5 = 0.25
       T((I,U,U,U), (I,V1), (V2,I,U,N)) = 0.5*(1-0.5) = 0.25
       T((I,U,U,U), (I,V1), (V2,N,U,I)) = (1-0.5)*0.5 = 0.25
       T((I,U,U,U), (I,V1), (V2,N,U,N)) = (1-0.5)*(1-0.5) = 0.25
      We need to define these distributions for all 4*3^3 = 108 belief states and
      for each possible action in each state, but in fact doing so only for the
      belief states REACHABLE from the initial belief-stae suffices. For example
      the belief state (G,U,U,U) is not reachable from the initial state, as we cannot
      reach G without observing some of the unknown edges.
      The reachable states are, in addition to the 8 above indicated states:
       (x, e1, e2, e3) where x could be I, V1, V2 ,G and ei could be either I or N (total 32).
       (I, e1, e2, e3) where e1, e2, e3 are as in the above 8 belief states (total 8)

      The reward function is defined by the belief state before the action
      and the action. It is minus the weight of the edge if the edge
      has no ice in the belief state, and K times the weight if the
      edge is icy. An edge cannot be unknown when it is crossed, due to the definition
      of the transition distribution. For example:
       Reward for action traversing edge (I,V1) in the initial state
       (denoted R((G,U,U,U), (I,V1))) is -1. Likewise:
       R((V1,N,N,U), (V1,V2)) = -3
       R((V1,I,N,U), (V1,V2)) = -30

      Since this is an indefinite-horizon POMDP, we need to define final belief states,
      and these are defined as all states of the form: (G,e1,e2,e3), i.e.
      any state where the agent reaches G, with any edge state whatsoever.

   b2) Find the optimal policy. You may assume that the robot starts
       at vertex I in order to save some computation.

   We use a truncated version of value iteration, starting from goal states, and updating
   only belief states reachacle from the initial state.

     V((G,e1,e2,e3)) = 0    ; obvious for any goal state

   However, we will now apply a "shortcut". Note that edge (V1, V2) has a weight of 3,
   and therefore V1 can be reached from V2, or vice versa, for a cost of 3 regardless
   of whether this edge has ice, through I. Therefore the state of this edge is
   irrelevant, and we can lump together all states that disagree only on the
   state of this edge. This reduces by a factor of 2 or 3 the number of belief states we
   need to evaluate. Next we examine the states where all edges are known (disregarding (V1, V2)):
    V((V1,x,N,y)) = -1   ; here state of (V2,G) does not matter, another shortcut
    V((V1,x,I,I)) = -10
    V((V2,x,y,N)) = -2   ; here the state of (V1,G) does not matter.
   Based on these we get:
    V((I,x,N,y)) = -2    ; due to action (I,V1), independent of state of edge (V2,G)
    V((I,x,I,N)) = -4    ; due to action (I,V2)
    V((I,x,I,I)) = -11   ; due to action (I,V1)

   Now we can update some additional states:
    V((V1,x,I,N)) = -5   ; due to action (V1,I)
    V((V2,x,N,I)) = -4   ; due to action (V2,I)
    V((V2,x,I,I)) = -13  ; due to action (V2,I)
   This concludes all the cases with known edges, as well as some with unknown edges.
   We need to know the values of all the 8 states (V1,x,y,U) and (V2,x,U,y). Of these,
   4 are already covered: (V1,x,N,U), (V2,x,U,N), and we still need to evaluate
   (V1,x,I,U), (V2,x,U,I). However, in order to get their values we need the values of
   (I,x,I,U) and (I,x,U,I), respectively. 

   Starting in (I,x,I,U) and doing action (I, V2) we get to (V2,x,I,I) or (V2,x,I,N)
   with equal probability, paying 2 to do so. We get an expected value for this action of:
     -2 - 0.5*(13+2) = - 9.5
   and since this is less than going throgh V1, we get:
    V((I,x,I,U)) = -9.5   ; due to action (I,V2)

   Starting in (I,x,U,I) and doing action (I, V1) we get to (V1,x,I,I) or (V2,x,N,I)
   with probabilities (0.2, 0.8) respectively, and thus we would get an expected value:
     -1 - 0.8*10 - 0.2*1 = -1-8-0.2 = -9.2
   and since going through V2 is worse, we get:
    V((I,x,U,I)) = -9.2   ; due to action (I,V1)

   Now to evaluate the rest of the states needed:
    V((V1,x,I,U)) = -10   ; due to action (V1,G), because doing (V1,I) brings us to (I,x,I,U)
                          ; costing 1 and getting to a state with value -9.5, total -10.5

    V((V2,x,U,I)) = -11.2 ; due to action (V2,I), because doing (V2,G) costs 20

   We are now in position to compute best action from initial state (G,U,U,U).
   Action (I,V2) results in  V((V2,x,U,I)) or V((V2,x,U,N)) with equal probability,
   and the expected value is:
    Q((G,U,U,U),(I,V2)) = -2 - 0.5*(11.2+2) = -8.6

   Action (I,V1) results in  V((V1,x,I,U)) or V((V1,x,N,U)) with probability (0.8, 0.2)
   and the expected value is:
    Q((G,U,U,U),(I,V1)) = -1 - 0.8*10 - 0.2*1 = -9.2

   So the better action from the initial state is (I,V2), with expected cost of optimal
   policy being -9.2, despite the fact that the path through V1 may be shorter. The rest of the
   policy is, if (V2,G) is discovered to be icy, go back to I to V1 to G, regardless
   of the state of the edge (V1,G). If (V2,G) is not icy, traverse it to G.

   b3) In addition, from node I only, and before it starts moving,
      the agent can perform a SENSE action
      for a cost of 1, which results in the agent knowing the state of
      the edge (V2, G). Re-compute the optimal policy for this case.

   If we apply the sensing action, there are 2 scenarios:
    a) (V2,G) is not icy (probability 0.5). 
       In this case the optimal policy is to go through V2, same
       as before, and no gain from the information, for a certain cost of 4.
    b) (V2,G) is icy (probability 0.5).
       In this case it would be better to go from V1, regardless, for an
       expected value  V((I,x,U,I)) = -9.2
       In this scenario the old optimal policy would have value -2-V((V2,x,U,I)) = -13.2
       The gain from the old policy in this case is VOI(icy (V2,G)) = -9.2 - (-13.2) = 4

    Thus the expected value of information from the sensing action is:
      VOI = 0.5*0 + 0.5*4 = 2
    which is more than the cost of 1 for the sensing action.
    The optimal policy thus is first to sense (V2,G) and then go through V2
    if the edge is not icy, otherwise to go through V1, whether or not (V1,G) is icy.

7)  A principal researcher in the Canadian-traveler problem research project
    would like to construct a decision procedure based on results from hiring
    employees in the IMG4 consortium. 
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  C programming ability (B) Self motivation (C), grade in AI (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
   1      F      L      70             N
   1      F      M      95             Y
   1      T      L      70             N
   2      F      H      95             Y
   2      T      M      95             N
   2      T      M      80             N
   2      T      H      95             Y
   2      T      H      80             Y

ANSWER: Check amount of information gain by each attribute.

For A=1 we have 1Y, 2N,
    A=2 we have 3Y, 2N

For B=F we have 2Y, 1N,
    B=T we have 2Y, 3N

So distributions after A or B are the same, and neither is better than
the other, no need to compute anything for now.

For C=L we have 0Y, 2N  (perfect classification)
    C=M we have 1Y, 2N
    C=H we have 3Y, 0Y

The case C=M is the same as for A=1, and the rest perfectly classified,
so C is better than A or B.

For D=70 we have 0Y, 2N
    D=80 we have 1Y, 1N
    D=95 we have 3Y, 1N

It looks as if C is better than D, but not all that obvious so we need to
compute remaining information to make sure.

For C, we have only the case C=M with 3 examples that cannot be classified
perfectly, number of bits needed to complete the classification (distribution 1/3, 2/3)
is: I(C=M) = - 3 ((1/3)*log(1/3) + (2/3)*log (2/3))= - log(1/3) - 2*log(2/3)
which is less than 3 bits.

For D, we need I(D=80) = - 2((1/2)*log(1/2)+(1/2)log(1/2)) = 2 bits
and I(D=95) = -4 (3/4*log(3/4)+1/4*(log(1/4)) = - 3*log(3/4) - log(1/4)
Total number of bits needed more than 4.

Since after getting ansewr to C less information is needed
to classify, C is picked.
Now we look at the case: C=M, and the remaining 3 examples.
These can now be prefectly classified after using A or B, but D
does not help. So arbitrarily pick A, to get the decision tree:

C? --- C=L  --- N
    -- C=H  --- Y
     - C=M  --- A?  --- A=1 ---- Y
                      - A=2 ---- N

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

ANSWER: Not possible. We only have 2 nodes (1 internal node), and have already shown that
    a there is no decision tree with only one node (0 internal nodes)
    that perfectly classifies the examples.

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.