Introduction to Artificial Inteligence

Theoretical Assignment 2 (solution) UNDER CONSTRUCTION


Written assignment: planning, reasoning and decision-making under uncertainty, learning

Justify all answers!

1) (15%)
   a) For the blocks world as defined in class, where there is an additional
       action: Paint(Block, Color), with precondition Clear(Block) and with
       effect Color(Block, Color).
       Show how POP constructs the plan (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial as follows, where it is known that Color(A,Blue), Color(B,Green), Color(C,Green).
 
       
       A
    C  B
------------

      and the goal state containing: On(x,y) and Color(x,Blue) and Color(y,Blue), meaning "some blue block on some blue block".


ANSWER:
    Initial state has 2 dummy steps:
       START:  preconditions: null
               effects:       On(C,Tab), On(B,Tab), On(A,B), Clear(C), Clear(A), Color(A,Blue), Color(B,Green), Color(C,Green)

       END:    precondition:  On(x,y) and Color(x,Blue) and Color(y,Blue)
               effects:       null

       and constraint "START before END".

   This state is not a solution so we pick an unsatisfied precondition, say, Color(x,Blue) precondition of END.
   This precondition can be satisfied e.g. by effect Color(A,Blue) of START with x=A, so add the causal link between them.
   The ordering constraint is already there between START and END.
   Next pick, say, precondition Color(y,Blue) of END. This can be achieved as an effect of Paint(C,Blue) if y=C.
   So add the step STEPC:
      STEPC:  Paint(C,Blue)
              preconditions: Clear(C)
              effects: Color(C,Blue)
   With causal link and ordering STEPC before END (and after START). No clobberers.
   Say we now pick precondition Clear(C) of STEPC. This can be achieved as an effect of START, so add this causal link. 
   No cloberring possible.
   Now we only have On(x,y), that is, On(A,C) precondition of END to achieve. This can be done by PutOn(A,C) so we add the step:
     STEPP: Puton(A,C)
            preconditions: Clear(A), Clear(C), On(A,z)
            effects:   On(A,C), not Clear(C), not On(A, z), Clear(z)
   And add causal link and ordering STEPP before END.
   STEPP clobbers Clear(C) arc between START and STEPC, and the only way to resolve that is by
   adding the constraint STEPC before STEPP.
   Now picking the Clear(A) of STEPP, this can be achieved by START, so add causal link.
   Now picking the On(A,z) of STEPP, this can be achieved by START, with z=B, so add causal link.
   Now picking the Clear(C) of STEPP, this can be achieved by START, so add causal link.
   The plan is now complete, with steps START before STEPC before STEPP before END.
   (This happens to be a totally ordered plan.)

   b) Suppose that in the above problem, the initial color of B could be either Blue or Green, and the rest is the same as above.
      Suppose that there is an operator "Check(Color(x,Blue))" which results in knowing whether
      this is true or not. Show how a conditional plan is constructed for the above case.

ANSWER:
   As it happens, the above does not depend on the color of B. However, if we are looking for a CHEAPEST plan
   and moving is more expensive, we might want to try not to move any blocks. In this case we might have:

    Initial state has 2 dummy steps:
       START:  preconditions: null
               effects:       On(C,Tab), On(B,Tab), On(A,B), Clear(C), Clear(A), Color(A,Blue), Color(C,Green)

       END:    precondition:  On(x,y) and Color(x,Blue) and Color(y,Blue)
               effects:       null

       and constraint "START before END".

   This state is not a solution so we pick an unsatisfied precondition, say, Color(x,Blue) precondition of END.
   This precondition can be satisfied e.g. by effect Color(A,Blue) of START with x=A, so add the causal link between them.
   The ordering constraint is already there between START and END.
   Next pick, say, precondition Color(y,Blue) of END. This can possibly achieved as an effect of Check(Color(y,Blue)) if y=B.
   So add the step:
      CHECK1: Check(Color(B,Blue))
              preconitions: null
              effects: either  Color(B,Blue)) or not Color(B,Blue)
   We can add the causal link between the POSITIVE case and the precondition Color(B,Blue),
   but must "duplicate" the problem to handle the negative case as well (add an END' as a copy of END, that CANNOT use the positive
   case).
   Remaining in the positive case, we have only the precondition On(A,B) of END.
   This can be handled by using On(A,B) effect of START so add that causal link, and we are finished with the positive side.
   Now for the negative side, we need to satisfy Color(y,Blue) of END some other way.
   This can be done by adding a coloring step, and in general proceeding as in the answer to part A for this branch.
  
   All in all, we will have the (conditional) plan:
   START before CHECK1 (BLUE) before END
                CHECK1 (NOT BLUE) before STEPC' before STEPP' before END'

2) (10 %) We are given the following independence statements about random variables
   A, B, C, D, E, F, G, H:

      I({A,B,C},{D,E,F,G,H}|{})
      I({D},{E,F,G,H}|{})
      I({A},{B}|{})
      I({F},{G}|{E})
      
      not I({A},{B}|{C})
      not I({F},{G}|{})
      not I({F},{G}|{E,H})

a) Construct a Bayes network that is consistent with the 
   above set of independence statements.

b) Is the answer above unique? If not, what is the set of possible BNs 
   that is consistent with all the statements?

ANSWER:
   The first independence statement rules out all edges between ABC and DEFGH so this is a graph partition.
   The second statement means D is also separate. So we have 3 separated subgraphs,  ABC, D, EFGH.
   As D is a singleton, no further possiblilities there. 

   Now examine ABC.
   I({A},{B}|{}) means no edge between A and B, but
   not I({A},{B}|{C}) means that there is a PATH between A and B, and as C is the only other node here
   the path must be A-B-C. Now the fact that A, B are dependent given C means that C is a collider,
   so the only option here is the A and B have no parents and C has A, B as parents.

   Now examine EFGH.
   I({F},{G}|{E}) means no edge between F and G.
   But  not I({F},{G}|{}) means at least 1 PATH between them with no collider.
   All path with no collider must include E as a non-collider due to  I({F},{G}|{E})
   But  not I({F},{G}|{E,H}) means that there is also one path WITH a collider.
   But the only nodes that are not F,G in the component are E and H so H must be a collider, and E cannot
   be a non-collider on the through H.
   Therefore, H is a collider with parents E,F, and there is another path D-E-F with E a non-collider.
   However, we CANNOT tell if E is passthrough (in either direction) or diverging, leaving 3 possibilities here.
   Note also that we cannot rule out the possibility that H also has E as a parent, as this arc (or its absence)
   are both consistent with the given independence statements. (Total 6 possible directed possibilities for this component).
   
   
3) (15 %) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        B               none
        C               D E
        D               A
        E               B
        F               D H
        G               A B
        H               none

    a) Is this network a poly-tree?
ANSWER: No, due to undirected cycle A-D-C-E-B-G-A

    b) Is this network (directed-path) singly connected?
ANSWER: Yes, the above is the only undirected cycle, and it has 2 colliders
      (Note that if we have 2 disjoint directed paths from s to t then connecting these 2 paths
       into a cycle, it will only have t as a collider)
 
    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
       2)  I({D}, {E} | {A})
       3)  I({D}, {E} | {A, C})
       4)  I({B}, {F} | {G})
       5)  I({B}, {F} | {D})
       6)  I({B}, {F} | {A, C})
ANSWER:
       1)  I({D}, {E} | {}): yes, D-C-E is blocked by barren (and non-evidence) node C, and D-A-G-B-E is blocked by barren node G,
                             and no other simple paths.
       2)  I({D}, {E} | {A}): yes, same as above, second path now also blocked by diverging evidence node A.
       3)  I({D}, {E} | {A, C}): no, D-C-E now unblocked as C is a collider evidence node.
       4)  I({B}, {F} | {G}):  no, path B-G-A-D-F is not blocked at any of A (diverging non-evidence), 
                               D (passthrough non-evidence), C (collider evidence node)
       5)  I({B}, {F} | {D}): yes, path B-G-A-D-F blocked at passthrough evidence node D, and path B-E-C-D-F
                              blocked at (now diverging) evidence node D. Note how "type" of node depends on path!
       6)  I({B}, {F} | {A, C}): no, path B-E-C-D-F has no blocking node, C is evidence collider, E is non-evidence passthrough,
                               and D is non-evidence diverging.

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.3, P(B=true)= 0.2
       P(G=true|A=true,B=true) = 0.9, P(G=true|A=false,B=true) = 0.2
       P(G=true|A=true,B=false) = 0.5, P(G=true|A=false,B=false) = 0.1
       P(E=true|B=true) = 0.8, P(E=true|B=false) = 0.1
       P(C=true|D=true, E=true) = 0
       P(H=true) = 0.1

       Find P(E=true, A=true | B=true, H=false)

       Hint: you may want to preprocess to get rid of nodes and/or simplify the query before doing any computation.
ANSWER: taking the hint, first remove barren nodes C, F, G. Now D is barren and removed. Now H is in a separate subgraph,
      so we have I({E,A}, {H} | {B}) so have:
         P(E=true, A=true | B=true, H=false) = P(E=true, A=true | B=true). Note that A is also separate so now we have:
         P(E=true, A=true | B=true)=P(E=true| B=true)P(A=true | B=true)=P(E=true| B=true)P(A=true)
      Now P(A=true) is given as 0.3, and   P(E=true|B=true) is given as 0.8, so we only need to multiply them to get
        P(E=true, A=true | B=true, H=false)=P(E=true| B=true)P(A=true)=0.8*0.3=0.24


4) (15%)  The Democratic Republic of Maluku Selatan has just undergone elections, typically a coalition govenment is then formed.
    As a leader of the Party of Advanced Eternal Students in Applied Theology, your interests depend on the following factors.

    The leading TuchesLekkers Party (TLP for short) does not care about anything one way or the other except staying
    in power, and for that they will reach a deal with your party to continue funding your party's institutions 
    (utility U(Funds)=10 to you per year). This will be the outcome if you deal with the TLP, unless some disaster strikes (see below).
    Although this is great, eventually the eternal students realize on which side their bread is buttered and they may decide 
    (with probability 0.2) to
    move to the TLP next time (utility U(TLPmove) = -20, but only 2 years in the future (count this in the third year).

    The Riteous Secular Party strongly objects to funding institutes of applied theology, but wishes for a unity government
    with TLP and the main opposition Yellow and Green Party (YGP). In such a coalition government funding for your institutions
    will drop but will not be canceled (U=5 per year). But this depends on whether the head of TLP will agree to a coalition,
    which has probability only 0.05 if you support TLP. 
    However, if you support the TLP and they do not enter a unity government, there is a chance for disaster, the 
    may TLP lose the coalition govenment formation process despite your support, which can happen either due to a revolt in the TLP 
    (probability 0.2), or the RSP deciding to support the YGP and form a narrow coalition against the TLP (probability 0.5). 
    If this happens, you lose it all. Assume that the all the above (TLP joining unity, revolt in the TLP, and the actions by RSP) are independent.

    You can also deal with the YPG, in which case you will get slightly reduced utility (U=8 per year) with certainty, regardless of
    what TLP and RSP do.

 A) Assuming a utility being a (simple) sum over the next 3 years: 
    a) Which of the two parties (TLP, YPG) should you deal with?
ANSWER:
   EU[TLP] = (10+10+10-20*0.2)*(1-0.2)*(1-0.5)*(1-0.05) + (5+5+5-20*0.2)*0.05  = 26*0.8*0.5*0.95 + 0.55  = 10.43
          (10 per year, -20 with probability 0.2) all multiplied by probability of no revolt in TLP an by probability of no narrow coalition,
          and no wide coalition  as you need all the above in order to gain the big bonus. To this we add what you gain 
          in case of wide agreed coalition with no revolt (RSP or revolt now irrelevant).
   EU[YGP] = (5+5+5)*1 = 15
       Support of YSP now forces a unity govrnment, so you get the smaller bonus for sure.

    b) You figure out that you can go out even further with the TLP and vote to exempt the allegedly corrupt TLP head from criminal procedures.
       This can earn you additional funding (U=2 per year) but may backfire if the TLP loses the coalition govenment formation process,
       which can happen either due to a revolt in the TLP (probability 0.2 as before), or the RSP deciding to support the YGP and form a narrow
       coalition against the TLP (probability now is up to 0.7, as they take offense at your vote). If this happens, again you lose it all.
ANSWER: this increases your potential bonus, but also the risk. This new action achieves:
    EU[TLP+exempt] = (12+12+12-20*0.2)*(1-0.2)*(1-0.7)*(1-0.05) + (5+5+5-20*0.2)*0.05  = 7.297+0.55 = 7.846
    So this is even worse than just the TLP action.

    c) (optional)
        Now consider the additional possibility of (as a first action) asking the supposedly clairvoyant cleric (SCC) 
        for a prediction on what the RSP will do.
       SCC will require some units to provide an indication on the future actions of RSP, but you believe is only
       correct 90% of the time. What is the value of information of SCC's answer?
ANSWER: It is simpler to compute the value of PERFECT information, the value of IMPERFECT information is never higher.
     Now if you know that RSP will join a narrow coalition, then nothing changes here, your best action will be YLP to gt 15.
     If you know that RSP will NOT join a narrow coalition, you get:
     EU[TLP|RSP not narrow) = (10+10+10-20*0.2)*(1-0.2)*(1)*(1-0.05) + (5+5+5-20*0.2)*0.05  = 18.2 + 0.55  = 18.75
     So in this case you would select TLP.
     There seems to be a non-zero value of perfect information here, so need to work to compute the value of
     imperfect information (latter is not in course material, so not doing this here, especially since this question is optional).


 B) Repeat items a, b, c, assuming a discounting factor of 0.1 per year.
ANSWER: a) Same as above but numbers change to have the next year values multiplied by 0.1 and the final year ny 0.1*0.1=0.01
  (the last one can probably be ignored, as it is small numbers!). We get:
   EU[TLP] = (10+1+0.1-0.2*0.2)*(1-0.2)*(1-0.5)*(1-0.05) + (5+0.5+0.05-20*0.2)*0.05  = 2.7+0.2755 =approx 3
   EU[YSP] = (5+0.5+0.05)*1 = 5.55
  So YSP is still better (though a lesser margin as you care less about the -20 loss at the end).
       b) Similar method, will still not channge the best action.

5) (20%) Consider the following uncertain Hurricane Evacuation decision  problem instance.

   The (undirected) graph has 5 vertices, as follows:
     (S, V1, P1, P2, G)
   The edges are as follows:
     (S, V1), weight 1.
     (V1, P2), weight 1, blocked with probability 0.5
     (V1, G), weight 5.
     (S, P1), weight 1.
     (P2, G), weight 1.

   S is the start vertex. G contains a shelter, and has a deadline of 7.
   You may assume a deadline of 7 for all other nodes as well for convenience.
   Vertices P1 and P2 each contain one person initially.
   No edges are blocked in this graph, except possibly the ones indicated above. 
     
   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function? (The time to traversing an edge is equal to it weight).

   b2) Find the optimal policy. You may assume that the agent starts
       at vertex S in order to save some computation.

ANSWER: no time now, you supposedly did this in prgramming assignment 4... Will try to do this tomorrow.


6) (15%)  A principal researcher (PR) in the MDM consortium needs to hire a  programmer to
    develop and test algorithms for priviledged learning with Bayes Networks.
    The PR would like to construct a decision procedure based on results from hiring
    employees in the ISG consortium project.
    The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  Java programming ability (B) Self motivation (C), number of
    papers published in AI workshops (D).
    The target attribute is the "decision", whether to hire or not.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a decision tree as close to consistent as possible for the following table:

     input variables               decision
   A      B      C      D
--------------------------------------------
   2      F      L      0             N
   2      F      H      0             Y
   2      T      L      1             N
   3      F      L      0             N
   3      F      H      0             N
   3      F      H      2             Y
   3      T      L      1             Y
   3      T      H      1             Y

ANSWER:
   Check all possible attributes at the root, comparing "output entropy".
   Using A, branch 2 has 2N, 1Y, branch 3 has 2N, 3Y
   Using B, branch F has 3N, 2Y, branch T has 1N, 2Y
   Using C, branch L has 3N, 1Y, branch H has 1N, 3Y
   Using D, branch 0 has 3N, 1Y, branch 1 has 1N, 2Y, branch 2 has 1Y

   We can immediately see that D is better than C, because they are the same in the first branch,
   in the second branch they are the same except for 1Y additional for C, and the last D branch has 0 entropy.
   Likewise D is better than B because they are the same in the 2nd branch, same in first branch except B has
   1 more Y thus higher total entropy,
   Now comparing D and A, the first branch of A is the same w.r.t. total entropy as the second branch of D,
   and the second branch of B is worse than the first branch of D, so (again because 3rd branch of D has zero entropy)
   D is better than A. So we pick D.

   Now need to work on the first 2 branches of A, as the last one (A=2) will always give Y
   In branch D=1, using A will give us: for A=2 we have 1N (decide N), and for A=3 we have 3Y (decide Y) 
   so no need to look at other attributes

   In branch D=0 no attribute immediately suffices, so further work needed.
   Using A, in branch 2 we have 1Y, 1N and in branch 3 we have 2N
   Using B, in branch F we have 1Y, 3N and nothing in branch T, so in fact this attribute is useless here.
   Using C, in branch L we have 2N, and in branch H we have 1Y, 1N
   So A and C have exactly the same total entropy. Say we pick A.
   Now for the branch A=3 we are done (decide N),
   For the branch A=2, we have 1Y,1N which can be fully discriminated by C, so we pick C.
   We end up with the decision tree:
   D: D=2   Y
      D=1   A:  A=2   N
                A=3   Y
      D=0   A:  A=3   Y
                A=2   C:  C=L  N
                          C=H  Y

    This tree has 4 internal nodes.

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?

   The above answer is NOT minimal, as the following tree with 3 internal nodes does the job:
   A:  A=2   C:   C=L  N
                  C=H  Y
       A=3   D:   D=0  N
                  D=1  Y
                  D=2  Y
   The reason this was not found is because the "greedy information" criterion does not
   realize that after using A, despite the higher total entropy, just  more internal node
   in each branch leads to a full classification of the examples.

7) (10%) You need to construct a binary 6-input neural network that has 1 output: the output should be 1 if
   the number of inputs that have a value of 1 is divisible by 4, and 0 otherwise.
   a) Can this be done without hidden units?
ANSWER: No. Consider the case where the first 3 inputs are 1, the 4th is 0, and we need to set only
        the value of the I5 and I6. Note that in this case we have Out = I5 XOR I6,
        which means that if we could compute our function without hidden units, we could also
        compute XOR without hidden units, which contradicts the theorem that the latter cannot be done
        with one layer.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)
ANSWER: In more simple terms, we need the output to be 1 if EXACTLY 4 inputs are 1.
        The MINIMAL way to do this is by having a ">4" detector, which will override
        anything else with a negative weight to turn off the output when more than 4 inputs are 1.
        To implement this, simply have a perceptron with all input weights being 1 and a threshold of 4.5
        Now the output perceptron also has all inputs with weight 1, but threshold 3.5 in order to detect
        AT LEAST 4 1's. It also has the output of the ">4" detector as input, say with weight -10.
        So 2 perceptrons suffice, but the ">4" detector is a "hidden" unit.

Deadline (strict): Tuesday, January 21, 2020, at 4 PM.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.