Introduction to Artificial Inteligence

Assignment 6 - Solution


Written assignment: planning, probabilistic inference, and learning

Justify all answers!

1) We are given the following independence statements about random variables
   A, B, C, D, E, F:

      I({A,E,F},{B,C,D}|{})
      I({B},{C}|{})
      I({A},{E}|{F})
      
      not I({B},{C}|{D})
      not I({A},{E}|{})


a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

ANSWER:
   I({A, E, F},{B,C,D}|{}) entails that none of A, E, F are neighbors of any of B, C, D so
   the graph can be split into 2 separate components: {A,E,F} and {B,C,D}

   I({B},{C}|{}) means that B is not a neigbor of C. However,
   "not I({B},{C}|{D})" means that there is a path from B to C, which must be
   through D, the only remaining node in this part of the graph.
   Since the path is not blocked by D, then D must be a converging node on this path.

   I({A},{E}|{F}) means that A is not a neigbor of E.
   "not I({A},{E}|{})" means that there is a path from A to E, which must be
   through F (cannot be direct, see above), the only remaining node in this part of the graph. 
   Since the path is blocked by D, then D must be either a diverging or a passthrough node on this path.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with the statements?
ANSWER:
   Obviously, not unique. Although te first part of the graph {B,C,D} only has
   one possible configuration, the second part {A,E,F} has three possiblilities,
   one diverging, and 2 passthrough.

2) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        B               none
        C               none
        D               A C
        E               A
        G               D B
        H               C E

    a) Is this network a poly-tree?
ANSWER: No, the undirected path ADCHEA is a cycle in the underlying undirected graph. 

    b) Is this network (directed-path) singly connected?
    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER: No, the undirected path EAD is not blocked (A is a diverging, non-evidence node)

       2)  I({D}, {E} | {A})
ANSWER: Yes. The undirected path EAD is blocked by A (a diverging evidence node), and
        path EHCD is blocked by H (converging, childless non-evidence node)

       3)  I({D}, {E} | {A, G})
ANSWER: Yes, same reason as in 2.

       4)  I({B}, {E} | {D})
ANSWER: Yes. All paths from B to E are blocked by D as it is a passthrough evidence node in all of them.

       5)  I({B}, {C} | {})
ANSWER: Yes, all paths from B to C are blocked by G, a converging, childless non-evidence node.

       6)  I({B}, {C} | {H})
    d) Assume that the nodes are binary-valued.
ANSWER: Yes, for the same reason as in 5.

       Suppose that P(A=true) = 0.2, P(B=true)= 0.1, P(E=true|A=false) = 0.5,
       P(E=true|A=true) = 0.5, 
       P(H=true|E=true) = 0.2, P(H=true|E=false) = 0.4, 

       Find P(A=true | B=true, H=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.
ANSWER: G is a childless non-evidence non-query node, it is thus barren and is can be removed.
        Now the same holds for D, which can be removed.
        Now B is disconencted from the rest of the graph, and is d-separated from A for any
        evidence set so:  P(A|B,H) = P(A|H) and it is thus sufficent to compute  P(A=true | H=true)
        Now we have a remaining BN consisting of A, E, H, C and lacking the conditionals of H
        given E and C. But we can avoid this need, since we know that In({A},{H}|{E})
        due to d-separation, and therefore we have:

        P(A|H) = P(A,H)/P(H) = ((sum over E) P(A,H,E)) / ((sum over E) P(H,E)) =
                 ((sum over E) P(A,H,E)) / ((sum over E and A) P(H,E,A)) =
                ((sum over E) P(A)P(E|A)P(H|E)) / ((sum over E and A) P(A)P(E|A)P(H|E))

        And for the latter we have all the numbers needed, and we can compute 
        P(A|H) for any value of A and H.
        However, in fact, given the numbers in P(E|A) in fact
        it happens that E is also not dependent on A (accidentally), and therefore H also
        does no depend on A. So actually P(A=true|H) = P(A) = 0.2, saving a lot of computation...

3) You need to construct a 6-input neural network that has value 1 in the 
   output just when the number of its inputs that are "on" is either 1
   or 4 (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?
ANSWER: No. Required 4 hyperplanes to separate all the 0's from the 1's in the output,
     and without hidden units only a single hyperplane is possible.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)
ANSWER: Numerous possibilities. One of the simplest uses 2 hidden units standing as
      detectors for: U1="more than 1", U4="more than 4".
      Each of the Ui is fed by all inputs, with an input weight of 1 and a threshold of i+0.5
      The output unit O has as inputs all the network inputs (weight of 1), and has a threshold of 0.5
      It is also fed by the outputs of U1 and U4 with a weight of -3 each.

      Then when all inputs are 0, none of the units are on and the output is 0.
      When 1 input is 1, the hidden units are off so the threshold of O is passed and the output is 1.
      When 2 or 3 inputs are on, unit U1 turns on, adding -3 to the weighted input of O, so the output
      of O is 0.
      When 4 inputs are on, these 4 inputs overcome the -3 from U1, and output O becomes 1 again.
      Finally, when 5 or 6 inputs are 1, unit U4 comes on, and together with U1 causes a -6 weight
      in the input of O, causing its output to be 0 again.

4) a) For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:
 
       
    D  A
    C  B
------------

      and the goal state containing: On(A,B) and On(B,C)

ANSWER:
      Initial plan has the dummy steps Start and End, with:

      Start having "effects": On(C,Table), On(B,Table), On(A,B), Clear(A), On(D,C), Clear(D), Clear(Table)
      (i.e. describes initial state).

      End having preconditions: On(A,B), On(B,C)  
      (i.e. same as goal state requirements)

      With ordering constraint: Start before End.

      1) Now POP detects 2 unsatisfied preconditions for step End: On(A,B), On(B,C)
      Suppose we pick On(B,C)    (Picking the other one may cause problems!).
      There is no step with these effects in the plan, so we need to
      add a step:

      Step1:  Op: PutOn(B, C)
              Pre: Clear(B), Clear(C), On(B,z)
              Effects: On(B,C), not Clear(C), Clear(z), not On(B,z)

      We add a link from Step1 to End, and constraints: Start before Step1, and Step1 before End.
      There are no cloberers, so continue.

      2) Suppose we now work on the Clear(C) precondition of Step1.
      There is no step with these effects in the plan, so we need to
      add a step:

      Step2:  Op: PutOn(D, Table)
              Pre: Clear(D), Clear(Table), On(D,z)
              Effects: On(D,Table), Clear(z), not On(D,z)
      With z=C. We add a link from Step2 to the Clear(C) precondition of Step1.
      There are no cloberrers, so we can continue.

      3) Suppose we now work on the On(B,z) precondition of Step1. This can be satisfied
         with z=A, by the On(B,A) effect of Start, and add the link. No cloberers.

      4) Suppose we now work on the Clear(B) precondition of Step1. No way to satisfy it
         with existing steps, so add a step:

      Step3: Op: PutOn(A, Table)
             Pre: Clear(A), On(A,z)
             Effects: On(A,Table), not On(A,z), Clear(z)
      With z=B. We add a link from Step3 to the Clear(B) precondition of Step1.
      Constraints added are Start before Step3 and Step3 before Step1.

      5) Clear(A) precondition of Step3 can be lined to Clear(A) effect in Start.

      6) On(A,z) is actually On(A,B) and this precondition is
         linked to effect On(A,B) of Start.

      7) The only unsatisfied precondition is now On(A,B) of End. Although we COULD link
         it to the On(A,B) effect of Start, we have Step3 as a cloberer that cannot
         be either demoted or promoted. The only solution is another new step:

      Step4:  Op: PutOn(A, B)
              Pre: Clear(A), Clear(B), On(A,z)
              Effects: On(A,B), not Clear(B), Clear(z), not On(A,z)

      With z=Table. We add the link from Step4 to the On(A,B) precondition of End.
      Constraints are Start before Step4 before End. Also note that Step4 clobbers
      precondition Clear(B) of Step1, and must be promoted (Step1 before Step4)
      or demoted (Step4 before Step1). However, Step1 also clobbers the On(A,B)
      precondition of End (achived by Step4) and since Step4 cannot be promoted
      (cannot do End before Step4), must demote (Step1 before Step4), which is not
      consistent with demoting Step4. Thus we get add the constraints:
      Step1 before Step4 and Step3 before Step4.

      8) Now to resolve the preconditions of Step4, as follows:
         Clear(A) by Start,
         Clear(B) by Step3,
         On(A,Table) by Step3.

      The constaints resolve to the following:
         Setp2 before Step1
         Step3 before Step1
         Step1 before Step4
      Which resolves to the ordering:
        Start before {Step2, Step3} before Step1 before Step4 before End.
      The plan is now complete and correct (and Step2 and Step3 can be executed in any order)

      Note that our choices were such that no backtracking is required. Different
      choices might have required considerable backtracking to find the solution.

   b) Suppose that is not known whether On(A,B) or On(B,A) in the initial state,
      and that there is an operator "Check(On(x,y))" which results in knowing whether
      this is true or not. Show how a conditional plan is constructed for the above
      case.

ANSWER: We will assume that the issue of  "On(A,B) or On(B,A)": is the ONLY unknown.
      In this case the initial plan would have:
      Start having "effects": On(C,Table),  On(D,C), Clear(D), Clear(Table),
      (On(B,Table), On(A,B) Clear(A)) or  (On(A,Table), On(B,A) Clear(B))

      End having preconditions: On(A,B), On(B,C)  
      (i.e. same as goal state requirements)
      With ordering constraint: Start before End.

      We could begin the same way:

      1) POP detects 2 unsatisfied preconditions for step End: On(A,B), On(B,C)
      Suppose we pick On(B,C)    (Picking the other one may cause problems!).
      There is no step with these effects in the plan, so we need to
      add a step:

      Step1:  Op: PutOn(B, C)
              Pre: Clear(B), Clear(C), On(B,z)
              Effects: On(B,C), not Clear(C), Clear(z), not On(B,z)

      We add a link from Step1 to End, and constraints: Start before Step1, and Step1 before End.
      There are no cloberers, so continue.

       2) Suppose we now work on the Clear(C) precondition of Step1.
      There is no step with these effects in the plan, so we need to
      add a step:

      Step2:  Op: PutOn(D, Table)
              Pre: Clear(D), Clear(Table), On(D,z)
              Effects: On(D,Table), Clear(z), not On(D,z)
      With z=C. We add a link from Step2 to the Clear(C) precondition of Step1.
      There are no cloberrers, so we can continue.

      3) Suppose we now work on the On(B,z) precondition of Step1. This can be satisfied
         with z=Table, by the On(B,Table) effect of Start, or with z=A by the On(B,A) effect
         of Start. But we do not know WHICH yet. So need to check. Add a step:

      StepC: Op: Check(On(B,z))
             Pre: none
             Effects: Know(On(B,z))
      
      For simplicity, assume that we add the
      constraint that StepC follows immediately after Start.
      The effect could be Know(On(B,Table)) and thus, assuming we have the reasoning mechanism,
      we would also get that B is not on A. 
      For this case we continue as in the case where we were certain that On(A,B), and
      add steps Step3 and Step4 with the same ordering constraints, to get:
         Setp2 before Step1
         Step3 before Step1
         Step1 before Step4

      The effect of StepC could also be: Know(On(B,A)), in which case we continue as follows:

      4') Now we know that as a result of StepC, we have On(A,Table), On(B,A) Clear(B).
          So now we can link this side of the effects of StepC to satisfy preconditions
          On(B,A) Clear(B) of Step1.

      5') Now the only unsatisfied precondition is On(A,B) of End. The only way to satisfy it
          is by adding a step (cannot use Step3, because it is for the "other" case only): 

      Step5:  Op: PutOn(A, B)
              Pre: Clear(A), Clear(B), On(A,z)
              Effects: On(A,B), not Clear(B), Clear(z), not On(A,z)

      And link it to the On(A,B) precondition of End. Since Step5 clobbers
      Clear(B) of Step1, and cannot promote, we have Step1 before Step5.

      6') Now we need to resolve the preconditions of Step5. They can all be resolved
          as follows:
           Clear(A) as an effect of Step1,
           Clear(B) as an effect of StepC (and Start),
           On(A,Table) as an effect of StepC.
      So for this case we have the constraints: Step1 Before Step4 Before End.

      The conditional plan is now complete and correct.
      Note that the solution requires some details such as reasoning about
      the relationship between On and Clear, that have not been considered.
      Alternately, these could be effects of the Check operator.


5)  Consider the following scenario, occuring in a certain middle-eastern country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the value unit "Million 
    Unspecified-Middle-Eastern-Country Monetary Unit" (MUMECMU for short).
    Assume also that moral considerations are not part of your utility function.

    Forest fires are a disaster that occur with intervals approximately according to an
    exponential distribution. The damage caused by a fire can also be modeled by
    an appropriate distribution. For simplicity, we will use an annual time granularity,
    and assume that the probability of forest fire count per year is as follows:
      a) no forest fire: 0.4
      b) 1 forest fire:  0.6
    The distribution of forest fire sizes is:
      a) Large: 0.9
      b) Huge: 0.1
    Assume that fire size distributions are independent, and so are fire occurences
    in different years.

    The damage caused by a forest fires are as follows (depending on whether you have
    fire-extinguishing air-craft):
      a) Large: 10 MUMECMU if you have no aircraft, 2 MUMECMU if you have at least 1 aircraft
      b) Huge:  300 MUMECMU if you have no aircraft, 50 if you have at least one aircraft,
                but only 10 MUMECMU if you have 2 or more aircraft.

    You need to decide whether to purchase fire-extinguishing aircraft, and if so,
    how many. Each aircraft costs 40 MUMECMU, and maintainance costs of each
    aircraft is 1 MUMECMU per year. 

    A) Which is the best policy (resp. worst) from the candidates below,
       assuming that the lifetime of each aircraft is 10 years, and assuming simple 
       3-year cumulative rewards, for the following cases:
     a) Do not purchase any aircraft (since they are expensive and huge fires
        have not happened yet).
     b) Do not purchase any aircraft, only do so after the first huge fire.
     c) Immediately purchase 1 aircraft.
     d) Immediately purchase 2 aircraft.

ANSWER: Assuming that at ONE forest fire per year can occur, and that occurences between years
       are independent, and that the types are independent. We get:
     Policy a: no aircraft. Expected utility for year 1 is:
           EU1(a) = - 0.6*(0.9*10+0.1*300) = -0.6*39 = -23.4  per year
           EU(a) = 3*EU1(a) = -70.2  for 3 years

     Policy c: purchase 1 aircraft immediately (we do before b, as it is easier and we
               may be able to prune b and d withiout having to compute them completely).
           EU1(c) = -41 - 0.6*(0.9*2+0.1*50) = -45.08  for the first year
           EU2(c) = -1 - 0.6*(0.9*2+0.1*50) = -5.08  for the years thereafter
           EU(c) = EU1(c)+2*EU2(c) = -55.16  for 3 years. So policy b is better than a.

     Policy d: purchase 2 aircraft. But this one costs at least 80 just for the aircraft, so
           its expected utility is less than -80, so prune the computation as d is worse than a and c.

     Policy b: purcahse an aircraft only after the first huge fire. This is rather hard to compute,
           so let us see if we can bound the costs. The first year is the same as policy a,
           since we did not yet have a fire, so:
               EU1(b) = EU1(a) = -23.4
             Now, for year2, either we get the same as before (with probability 0.4+0.6*0.9 = 0.94
             there is no huge fire in year 1, so we have the same in year2. But with probability 0.06
             we have a huge fire in year 1, so purchase an aircraft and utility in year 2 is the
             same as year 1 in policy c). So we have:
               EU2(b) = 0.94*EU1(a) + 0.06*EU1(c) =   0.94*(-23.4) + 0.06*(-45.08) = -24.7008 for year 2.
             The total up to now is EU1(b)+EU2(b) = -23.4 - 24.7008 = -48.1008
             which is still beter than loss by policy c, so we cannot prune yet. But in the 3rd year,
             there is a probability of 0.94*0.94=0.8836 of no huge fire in euther of the previous years,
             in which case we will have to pay the expectation as in policy a, so we have:
               EU3(b) < 0.8836*EU1(a) =  0.8836*(-23.4) = -20.67624
             From this we already can see that EU(b) =  EU1(b)+EU2(b)+EU3(b) < -48.1008-20.67624 < -68
             and this is already worse than policy c. 

             So policy c (purchase one aircraft immediately) is optimal.

    B) Repeat for discounted rewards, with a discounting factor of 0.2.
ANSWER:   Here EU=EU1+0.2*EU2+0.04EU3    (where EUi is the expected utility in year i for 
                                          the appropriate policy) so we have:

      EU(a) = -23.4-0.2*23.4-0.04*23.4 = -29.016

      EU(c) < EU1(c) = -45.08    which is worse than a so no need to compute fully.

      EU(d) < -80 just for aircraft purchase, so again no need to compute exact utiliy.

      EU(b) < -23.4-0.2*24.7008-0.04*20.67624 = -29.1672096 which is already worse than a.

      So policy a is optimal in this case (because the large cost of the aircraft is no longer
      fully amortized across the 3 years).
 
    C) In a year that is considered a "global heat wave", the distribution
       of forest fire sizes changes so that a huge fire now has a probability
       of 0.3, and a large fire 0.7. Repeat A assuming
       that the probability of a "global heat wave" year occurence is independent
       and with a probability of 0.1

ANSWER: In a heat wave year, aircraft become a BETTER proposition than before. Suppose we KNOW
        that this is a heat wave, then we have:

       Policy a: EU1(a|HW) = - 0.6*(0.7*10+0.3*300) = -58.20 which is worse than a normal year by far.
       So even factoring with cases of a normal year, we get:
            EU1(a) = 0.9 * EU1(a|~HW) + 0.1* EU1(a|HW) = -26.88
       In any case a will be worse than c.

       Policy c: EU1(c|HW) = -41 - 0.6*(0.7*2+0.3*50) = -50.84
                 EU2(c|HW) = -1 - 0.6*(0.7*2+0.3*50) =  -10.84
                 EU3(c|HW) = -1 - 0.6*(0.7*2+0.3*50) =  -10.84
       Therefore EU(c|HW) = -50.84-10.84-10.84 = -72.52
       But EU(c) is a convex sum of EU3(c|HW) and EU3(c|~HW) so cost is between them, thus less than 80.
       This implies that policy d is STILL sub-optimal.

       It is also easy to show that policy b is suboptimal, so the best policy is again c,
       purchase 1 aircraft in any case.

    D) Giru Heller can predict "global heat wave" years with absolute accuracy,
       but charges 1 MUMECMU for each year predicted. Should we pay him, and if yes,
       for which years out of the next 3 years?

ANSWER:
       The best policy is (a), purchase 1 aircraft, and this remains the case even if we KNOW
       that the year is a heatwave (see computation above). 
       So getting the information will not change our decision, therefore
       the value of information about "global heat wave" in this question is ZERO.
       Therefore, send Giru Heller packing...

6) Consider the following multi-car Canadian traveller problem instance.

   The (undirected) graph has 4 vertices, as follows:
     (I, V1, V2, G)
   All vertices are directly connected by edges.
   The edges are as follows:
     (I, V1), weight 10, no flood.
     (I, V2), weight 20, no flood.
     (V1, V2), weight 30, probability of flood 0.5
     (V1, G), weight 10, probability of flood 0.8
     (V2, G), weight 20, probability of flood 0.5
    There is a car C1 with speed 10 at V1 that can travel at speed 5 on a flooded road,
    And a car C2 with speed 20 that cannot travel a flooded road ad all at V2
    Switching cars cost 1
      
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function?
ANSWER: 
There are 3 edge state variables. Denote (V1,G) by e1, (V2,G) by e2, and (V1,V2) by e3.
Each edge can be flooded (F), not flooded (N), or unknown (U). The locations of cars
will be variables C1, C2 each with domains {I, V1, V2, G}. The agent can be either in C1 or C2,
we will call these possible values of the variable A. The belief state spece is the cross product
of all these variables:  D(e1)*D(e2)*D(e3)*D(C1)*D(C2)*D(A) which has size 3*3*3*4*4*2 = 864

Actions are: traversing any edge incident to the current location.
    
Now we cheat a little. Note that (V1,V2) has weight 30, which is equal to the sum
of (I,V1) and (I,V2) which are known to be flooded. 
Since there is no car in this domain that travels a flooded road faster than
an unflooded one, every policy that traverses (V1,V2) can be modified to one that traverses
(V1,I) and then (I,V2) at the same or lower cost. So we can DELETE the edge e3 = (V1,V2) from the
problem statement, and will do this to simplify the state space and the problem.
So a state is denoted by a tuple: (C1, C2, A, E1, E2)

Transition distribution: defined based on components. For any action,
the location will be the other side of the edge stated in the action.
The belief state edge components will be unchanged for all known edges,
and for unknown edges incident on the new location, they will transition
to F or N depending on the probability of flood. For example, for the initial
state with the action: traverse edge (I, V1) we get:

       T((I,V2,C1,U,U), (I,V1), (V1,V2,C1,F,U)) = 0.8    (the probability that e1 is flooded)
       T((I,V2,C1,U,U), (I,V1), (V1,V2,C1,N,U)) = 0.2    (the probability that e1 is not flooded)
      Note that these numbers must sum to 1.
      Likewise, from the initial state, with the action: traveres edge (I, V2) we get:
       T((I,V2,C1,U,U), (I,V2), (V1,V2,C1,U,FU)) = 0.5    (the probability that e2 is flooded)
       T((I,V2,C1,U,U), (I,V2), (V1,V2,C1,U,FU)) = 0.5    (the probability that e2 is not flooded)

We need to define these distributions for all 4*4*2*3*3 = 288 belief states and
for each possible action in each state, but in fact doing so only for the
belief states REACHABLE from the initial belief-stae suffices. For example
the belief state (G,V2,C1,U,U) is not reachable from the initial state, as we cannot
reach G without observing some of the unknown edges.

Reward function: The reward function is R(s,a) defined by the belief state before the action
and the action. It is minus the weight of the edge divided by the car speed for the state of the edge.
An edge cannot be unknown when it is crossed, due to the definition
of the transition distribution. For example:
Reward for action traversing edge (I,V1) in the initial state
       (denoted R((I,V2,C1,U,U), (I,V1))) is -1. Likewise:
        R((I,V2,C1,U,U), (I,V2))) = -20/10 = -2  (edge weight is 20, car speed is 1)
        R((I,I,C2,U,U), (I,V2)))  = -20/20 = -1  (since now the agent is using car C2, and it has speed 20)
        R((V1,V2,C1,F,N),(V2,G)) = -10/5    (since the flooded edge costs 10, and agent is in car C1 with speed 5) 
      Etc.

Since this is an indefinite-horizon POMDP, we need to define final belief states,
and these are defined as all states of the form: (G,X,C1,X,X)  or (X,G,C2,X,X) i.e.
any state where the agent is in a car that is at G with any edge state whatsoever.


   b2) Find the optimal policy. You may assume that the robot starts
       at vertex I in a car C1 in order to save some computation.

ANSWER:
   We use a truncated version of value iteration, starting from goal states, and updating
   only belief states reachacle from the initial state. We will use arbirary variables X,Y,Z
   to denote ANY possible state of a variable (the equations below hold for every possible such value). 

     V((G,X,C1,Y,Z)) = 0    ; obvious for any goal state
     V((X,G,C2,Y,Z)) = 0    ; obvious for any goal state

Next we examine the states where all "relevant" edges are known, that are 1 action away from the goal:
     V((V1,X,C1,N,Y)) = -1     ; traversing e1
     V((X,V2,C2,N,Y)) = -0.5   ; traversing e1 with the faster car
     V((X,V2,C2,Y,N)) = -1     ; traversing e2 with the faster car
     V((V1,X,C1,F,Y)) = -2     ; traversing e1, flooded edge
     V((V2,V2,C1,F,F)) = -4    ; traversing e2, flooded edge
Note that in all this cases, the state of the "other" edge is irrelevant, and can also be unknown,
so caters for all states from which the goal can be reached in one action.

Now for s     V((V1,X,C1,N,Y)) = -1     ; traversing e1
     V((V1,V1,C2,F,F)) = -3    ; e1 flooded, so switch to car C1, then in state from V((V1,X,C1,F,Y))
     V((V1,X,C1,N,Y)) = -2     ; traversing (I,V1) then in state from V((V1,X,C1,N,Y)) 
     V((V2,V2,C1,Y,N)) = -2    ; e2 not flooded, so switch to car C2, then in state from V((X,V2,C2,Y,N)) = -1

We are now ready to find the optimal policy from the initial state (I, V2, C1, U, U).
We only have 2 possible actions: (I,V1) and (I,V2). 
 Taking action (I,V1) which costs 1, we reach:
      V((V1,V2,C1,N,U)), which has value -1, with probability 0.2
      V((V1,V2,C1,F,U)), which has value -2,  with probability 0.8
    So expected value of taking this option is -2 - (0.2*1+0.8*2) = -3.8

 Taking action (I,V2) which costs 2, we reach:
     V((V2,V2,C1,U,N)), which has value -2, with probability 0.5
     V((V2,V2,C1,U,F)), which has an unknown value < -2, with probability 0.5
    So expected value off taking this option is less than -4, so this is suboptimal!

The best policy is therefore to traverse (I, V1) followed by (V1, G) using car 1 regardess
of what is observed along the way! Observe that we saved a LOT of computation by pruning...
 

   b3)In addition, from node I only, and before it starts moving,
      the agent can perform a SENSING action
      for a cost of 1, which results in the agent knowing the state of
      the edge (V2, G). Re-compute the optimal policy for this case.

ANSWER:
If we knew the state of (V2, G), then in any event the cost of a policy that begins with
travelling to V2 would be at least 4. So the policy given the information would not change at all,
and the value of information for sensing is ZERO. Certainly not worth paying 1 for it.

7)  A principal researcher in the Intelligent Smart Grid (ISG) consortium
    would like to construct a decision procedure based on results from hiring
    employees in the Canadian Traveler Problem research project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  C programming ability (B) Self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
--------------------------------------------
   2      F      L      0             N
   2      F      M      1             Y
   2      T      L      0             N
   3      F      H      0             Y
   3      T      M      2             N
   3      T      M      0             N
   3      T      H      1             Y
   3      T      H      1             Y

ANSWER: None of the attributes allow exact classification immediately, so we need to
look at all of them in detail.

A: For A=2 we have 1Y, 2N
   For A=3 we habe 3Y, 2N

B: For B=F we have 2Y, 1N
   For B=T we have 2Y, 3N
These are exactly the same distributions as for A (ignoring the labels), so A and B are equally
good (or bad, actually).

C: For C=L we have 0Y, 2N  (exact classification)
   For C=M we have 1Y, 2N
   For C=H we have 3Y, 0N  (exact classification)
So the remaining information required for the case C=M is the same as for A=2, but we
have exact classification in all other cases. So C is better than A, and also better than B.

D: For D=0, we have 1Y, 2N
   For D=1, we have 3Y, 0N  (exact classification)
   For D=2, we have 0Y, 1N  (exact classification)
So here we have again only one branch not fully classified, and with the exact
same distribution as for C and same number of examples. So C and D are a tie for best.
Suppose we choose C, getting the temporary tree:

C?  C=L : N
    C=M :  (grow a tree here)
    C=H : Y

Examining the branch C=M, any of A or B or D will allow full clasification. But D will generete more
leaves, so pick either A or B, to get the final tree:

C?  C=L : N
    C=M : A?  2: Y
              3: N
    C=H : Y

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

ANSWER: No, since we only have 1 internal node, and we already showed above that a tree with 0
internal nodes is not possible.

Deadline: Wednesday, January 25, 2012, at 12 noon.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.