Introduction to Artificial Inteligence

Assignment 6

Written assignment: planning, probabilistic inference, and learning


(*** Sorry, still under construction, due to unexpected administrative emergency, updates continually ***)

1) We have the following distribution over the binary variables A, B, C, D, given as a full table:

   A     B     C     D   probability
   F     F     F     F   0.0
   F     F     F     T   0.1
   F     F     T     F   0.05
   F     F     T     T   0.1
   F     T     F     F   0.0
   F     T     F     T   0.1
   F     T     T     F   0.05
   F     T     T     T   0.1
   T     F     F     F   0.0
   T     F     F     T   0.1
   T     F     T     F   0.05
   T     F     T     T   0.1
   T     T     F     F   0.0
   T     T     F     T   0.1
   T     T     T     F   0.05
   T     T     T     T   0.1

a)   Construct the Bayes network that most compactly describes the distribution (i.e. one that has
     the smallest number of arcs). 

Note that the above table is 4 copies of the first 4 lines, those being 
for the state (A,B) = (F,F). Thus the probabilities for (A,B,C,D) do not depend on A and
B at all, and therefore Independent({A, B}, {C,D} |{}). For the same reasons, we have
Independent({A},{B}| anything). So we know that A and B are not connencted to each other and to
the other nodes. Now, looking at the first 4 lines, for the case A=B=F we have
P(C=D=F) = 0 which is NOT equal to the product of:
 P(C=F) = 4*(P(C=D=F) + P(C=F,D=T)) = 4*(0.0+0.1) = 0.4
 P(D=F) = 4*(P(C=D=F) + P(C=T,D=F)) = 4(0.05+0.0) = 0.2

So there is an arc, say, from C to D. In this case, we have P(C=T) = 1-P(C=F) = 0.6
 P(D=T|C=T) = P(D=C=T)/P(C=T) = 4*P(A=B=C=D=T)/P(C=T) = 4*0.1/0.6 = 2/3
 (obviously P(D=F|C=T) = 1- P(D=T|C=T) = 1/3

 P(D=T|C=F) = P(D=T,C=F)/P(C=F) = 4*P(A=B=C=F,D=T)/P(C=F) = 4*0.1/0.4 = 1

b) Is the answer above unique? If not,
  what is the set of possible BNs that describe the distribution

No, it is also possible for the arc to be from D to C, with P(D=T)=0.8,
and defining the table P(C|D) accordingly. 

2) Consider the following variables and their immediate causes (parents):

     variable         parents
        A               none
        B               none
        C               none
        D               A
        E               A B
        F               B C
        G               D F

    a) Is this network a poly-tree?

ANSWER: No, because of the undirected cycle A-D-G-F-B-E-A

    b) Is this network (directed-path) singly connected?

ANSWER: Yes, because the above is the only undirected cycle,
        and in it both A and B are parent-less nodes.

    c) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({D}, {E} | {})
ANSWER: No, because D-A-E is not blocked (A is a diverging non-evidence node)

       2)  I({D}, {E} | {A})
ANSWER: Yes, because D-A-E is blocked (A is a diverging evidence node)
             and D-G-F-B-E is blocked by G (a childless converging non-evidence node)

       3)  I({D}, {E} | {A, G})
ANSWER: No, because D-G-F-B-E is no longer blocked by G (a converging evidence node)
            and F,B are passthrough, diverging, resp. non-evidence nodes.

       4)  I({B}, {G} | {D, F})
ANSWER: Yes, because B-F-G is blocked by (passthrough evidence node) F, and
            B-E-A-D-G are blocked by both E and D.

       5)  I({A}, {C} | {B, F})
ANSWER: Yes, beacuse A-D-G-F-C is blocked by F, a passthrough evidence node, and
                     A-E-B-F-C is blocked by B, a diverging evidence node.
     Note that in the 2nd path, F does NOT block this path, as F is a converging
     evidence node in this path!

    d) Assume that the nodes are binary-valued.
       Suppose that P(A=true) = 0.4, P(B=true)= 0.1, P(E=true|A=B=True) = 0.5,
       P(E=true|A=false,B=true) = 0.1, P(E=true|A=true,B=false) = 0.5, P(E=true|A=B=false) = 0.5
       P(C=true) = 0.9

       Find P(A=true | B=true, E=true, C=true)

       Hint: you may want to preprocess to get rid of nodes before doing any computation.

ANSWER: First, drop barren nodes G, then D and F. Now C is disconnected so
        we have Independent({A},{C}|{B,E}) and therefore we have:
          P(A=true | B=true, E=true, C=true) = P(A=true | B=true, E=true)
        and need only compute the latter.
        Using Bayes rule (or the definition of conditional probabilities, whichever
        you prefer) we have:

         P(A=true | B=true, E=true) = P(A=true, B=true, E=true)/ P(B=true, E=true) =
             P(E=true|A=true, B=true)P(A=true)P(B=true) / 
            ( P(B=true, E=true, A=true) +P(B=true, E=true, A=false)) =
            0.5*0.4*0.1 / (0.5*0.4*0.1 + P(E=true|B=true,A=false)*P(B=true)*P(A=false)) =
            0.002 / (0.002 + 0.1*0.1*0.6) = 0.002 / 0.0026 = 10/13

3) You need to construct a 5-input neural network that has value 1 in the 
   output just when the number of its inputs that are on is either 1
   or 2. (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?

 Consider the subspace where the last 2 inputs are 0. We have now a 3D space
with the 2 extreme points (0,0,0) and (1,1,1) requiring an output of 1,
and the rest requiring an output of 1. Now, consider any plane separating (0,0,0)
from (0,0,1),(0,1,0),(1,0,0). It is easy to show that any such plane does not separate,
e.g., (1,1,1) from (1,1,0), by writing the general case for this plane, and showing that
both (1,1,1) and (1,1,0) must be on one side of it.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

ANSWER: Many possible ways to do this.
The smallest solution has only ONE hidden unit, that detects the case ">2",
by having input weights 1 and a threshold of 2.5.
The output unit gets all inputs with a weight of 1, the output of the hidden unit
with a large negative weight (say -10), and a threshold of 0.5, and it is
easy to verify that this setup works.

4) a) For the blocks world as defined in class, show how POP constructs the plan
      (that is, show all partial plans from the initial plan until a correct
      plan is reached) for initial state:
    A  B
    C  D

      and the goal state cointaining: On(A,C) and On(C,B)

ANSWER: First we must define the operators, such as in class or as define in the
book. We will assume a very well informed search that makes no mstakes in selecting
ways to satisfy a precondition.

Initial state (partial plan):

         (Dummy step: START)
On(A,C), On(C,Table), On(B,D), On(D, Table), Clear(A), Clear(B)

     On(A,C), On(C,B)
    (Dummy step: END)

Now, we select unresolved precondition On(A,C) of END, and add the
operator PutOn(A,C) to satisfy it.
(* Note: trying to satisfy it using the effect of START will FAIL later on! *)
So now we have:

         (Dummy step: START)
On(A,C), On(C,Table), On(B,D), On(D, Table), Clear(A), Clear(B)

    Clear(C), Clear(A), On(A, x)
          STEP1: PutOn(A,C)
    On(A,C), not Clear(C), not On(A,x), Clear(x)
     On(A,C), On(C,B)
    (Dummy step: END)

With the constraints: START before STEP1 before END.  
Henceforth I will continue this solution without drawing...
Now, select (say) precondition On(C,B) of END, and add a new step to satisfy it:

STEP2: PutOn(C,B)
Precond: Clear(C), Clear(B), On(C,y)
Effects: On(C,B), not Clear(B), not On(C,y), Clear(y)

With the constraints: START before STEP2 before END.
Now, resolve Clear(B) in STEP 2 by using this effect of START
and On(C,y) by using On(C,Table) of START. At present there are no threats.

Resolve Clear(A) in STEP1 by using this effect of START.
Now, resolve Clear(C) in STEP1 by adding a new step:

STEP3: PutOn(A,Table)
Precond: Clear(A), On(A,C)
Effects: On(A,Table), not On(A,C), Clear(C)

With the constraints: START before STEP3 before STEP1.
Resolve the preconditions Clear(A) and On(A,C) by these effects of START.
Resolve precondition Clear(C) in STEP2 also by using STEP3,
requiring the additional constraint STEP3 before STEP2.

The final plan has the steps in total order 
START before STEP3 before STEP2 before STEP1 before END

   b) Suppose that is not known whether On(B,D) or On(D,B) in the initial state,
      and that there is an operator "Check(On(x,y))" which results in knowing whether
      this is true or not. Show how a conditional plan is constructed for the above

Similar to above, we will show only the final result. We will also need to
assume that the system knows to compute the "Clear" predicate, etc. so will drop
it from the effects of the actions, and also that On(x,y) entails
that not On(x,z) for z not equal y.
Steps are:

START: Dummy
Precond: None
Effect: On(A,C), On(C,Table), On(B,D) or On(D,B), On(D, Table)

STEP1: PutOn(A,Table)
Precond: Clear(A), On(A,C)
Effects: On(A,Table)

STEP2: Check(On(D,B))
Precond: None
Effects: Know(On(D,B))
(Branch true: On(D,B), branch false: not On(D,B),
with the true branch satisfying the precondition of
STEP3, and the false branch satisfying the Clear(B)
precondition of STEP4).

STEP3: PutOn(D,Table)
Precond: Clear(D)
Effects: On(D,Table)
(Satisfying the Clear(B) precondition of STEP4)

STEP4: PutOn(C,B)
Precond: Clear(B), Clear(C), On(C,x)
Effects: On(C,B)

Precond: Clear(A), Clear(C), On(A,x)
Effects: On(A,C)

Constraints are: (in addition to the obvious: START before everything, END after)
STEP1 before STEP4
STEP2 before STEP3
STEP3 before STEP4 (if STEP3 is executed at all!)
STEP2 before STEP4
STEP4 before STEP5

Note that STEP1 can be before STEP2, or after STEP2 and before STEP3,
or after both STEP2 and STEP3 but befor STEP4, and any of these
is a valid linearization of this plan.

5)  Consider the following voting scenario, occuring in a certain middle-eastern country.*
    We will assume that you are a risk-neutral, i.e. that
    your utility function is linear in the amount of money that your assets are worth.
    Assume also that moral considerations are not part of your utility function.
    There are 4 candidates for whom you could vote: Yahoo, The Man, Lightning, and White,
    all equally likely to win, unless your vote happens to swing the election.
    Assume that there is a 1% chance that your vote will actually swing the election,
    and a 0.5% chance that "The Man" will somehow find out what your vote was.
    If that happens, and you voted for someone OTHER than The Man, your business
    will be trashed for a loss of 100,000 NIS.

    Your accountant has notified you that your taxes for FY 2008-2009 amount to 1,000,000 NIS
    on your clam-encapsulation business.
    Also, you happen to be a member of "The Man"'s party, and if this party wins
    you have a 10% chance of being in the legislature, in which case you will be able to
    pass a law that declares all clam-related businesses tax-exempt. But there is also a
    10% chance that in this case a corruption charge against you will stick and you will
    go to jail and lose your business, a total additional cost of 5,000,000 NIS.

    Additional information is as follows: Yahoo claims he will reduce your taxes by 50%,
    and you think he will actually deliver with probability 0.5
    If White is elected there will be mediocre-size bonuses which will gain you 100,000 NIS.
    Lightning getting elected will improve police forces, which will decrease your security
    costs by 150,000 NIS.

    Assume that event combinations not mentioned above have a probability of 0, e.g.
    the probability that you will be a member of the legislature given that "The Man"
    does not win is 0.

    You have the following options:
       1) Vote for Yahoo     (Y)
       2) Vote for The Man   (M)
       3) Vote for Lightning (L)
       4) Vote for White     (W)
       5) Not vote at all    (0)

     Draw the decision diagram for each case below, and solve it.

    a) What is your rational choice in voting?

Actually modeling the results of voting is complicated, so we will try a simplistic
model, as follows. We have only one decision node, "Vote", with the above 5 possibilities.
We have a chance node W for the winner, which depends on the decision.
Another chance node K is whether "The Man" will know your vote, and one node L for
wherher you will be in the legislature, which depends only on the winner.
There is also a chance node J denoting whether you will be jailed for corruption,
deopending only on node W. Finally, the chance node T, depending only on W,
determines the state of taxation (in case Yahoo wins).
We have a value node for collecting the value, defined by the above, and apparently
has ALL the above nodes as parents.

The decision node "Vote" has no parents, thus does not know the state of
any of the other variables. So we need to simply check all the actions.
But to save computations, we can pre-compute some of the terms, as they will
be re-used in several possibilities.

First, consider the case where Yahoo wins. The sub-utility 
(meaning: ignoring the possibility of retaliation by "The Man") there is independent
of your vote, and is:

EU'[W=Y]= 0.5 * (-1000000) + 0.5 * (-500000) = -750000

Likewise, if Lightning wins, there is only one possible outcome:

EU'[W=L] = -850000

And if White wins, also one outcome:

EU'[W=W] = -900000

Finally, if "The Man" wins:

EU'[W=M] = 0.9 * (-1000000) + 0.1 * (0 + 0.1 *(-5000000)) = -900000 - 50000 = -950000

So looks like you should vote for Yahoo, except for possible retaliation by "The Man".
Now, retaliation by "The Man" is independnt of which party you vote for, unless you vote
for "The Man" or for no one. So we need only compare the 3 possibly dominant
actions: Y, M, 0.

If you vote either for no one or for "The Man", there will be no
retaliation. Voting for no one gives equal probabilities of wins, so:

EU[D=0] = 0.25 * (EU'[W=Y]+EU'[W=L]+EU'[W=W]+EU'[W=M])
        = 0.25 * ( -750000-850000-900000-950000)
        = -862500

Voting for "The Man", in 0.99 of the cases makes no difference (as if you did not vote),
and in the remaining
case, probability 0.01, "The Man" wins. So in this case we have:

EU[D=M] = 0.99 * EU[D=0] + 0.01 EU'[W=M] = 0.99 * (-862500) + 0.01 * (-950000) = -863375

So it looks like not voting is better than voting for "the man". What about voting for
Yahoo? Well, in this case the expected cost or retaliation is:

EUr'[D=Y] = 0.005 * (-100000) = -500

Again, voting for Yahoo only affects 0.01 of the cases, so all in all we have:

EU[D=Y] =  0.99 * EU[D=0] + 0.01 EU'[W=Y] + EUr'[D=Y] 
        = 0.99 * (-862500) + 0.01 * (-750000) - 500 = -861875

So looks like the expected gain from Yahoo overweights the chance of retaliation,
so you'd better vote for Yahoo...

    b) As above, but you have the choice to pay 10,000 NIS to "future-tech polling inc."
       for a poll which tells you with certainty which 2 of the above 4 will surely NOT win
       (leaving the other 2 equally likely). If you do make the payment, you get this
       information before you need to decide about your vote. 
       Do you pay for the poll?
       What do you vote after getting the poll information, assuming you do?

In this case, we have another decision node P, whether to do the poll, and
another chance node R (with 6 possible outcomes) on the poll results.
Assuming as above equal probabilities for parties winning, symmetry dictates that all
6 outcomes are equally likely, and have a probability of 1/6 each.
The decision node D gets information about P and the state of R.

Now, we must be careful about the conditional probability of W given D and R.
That is because if some parties are ruled out, the chance of our vote
swinging the elections if we vote for one of the favorites should increase.
We will assume for now that you have a 2% chance of swinging the elections
after the poll. Suppose you take the poll. Let us examine the case after the poll.
We can still use the partial utility EU', as well as EUr', as they only
depend on the winner and on who you vote for, respectively.
We will designate the results of the poll in the positive: i.e. who has 
a CHANCE to win, according to the poll. We will omit the cost of
the poll until the end. Thus, we have:

1) For the case where Lightning and White can win, we have:

If we do not vote, or vote for the man, we have
EU[R=[LW],D=0] = 0.5*(-850000-900000) = -875000

If we vote for White, we do stand to decrease Lightning's chances, thereby
reducing our payoff, and in addition risk retaliation by "The Man", so
this option is clearly inferior.
Finally, voting for Lightning, we have:

EU[R=[LW],D=L] = 0.98 * EU[R=[LW],D=0] + 0.02 *EU'(W=L) + EUr'[D=Y]
               = 0.98 * (-875000) + 0.02 * (-850000) - 500 = -874500

So looks like in this case the risk of retaliation is less than the
gain, so it is better to vote, and get -874500

2) Now for the case where Yahoo and White can win. We have:

If we do not vote, or vote for the man, we have
EU[R=[YW],D=0] = 0.5*(-750000-900000) = -825000

If we vote for White, we do stand to decrease Yahoo's chances, thereby
reducing our payoff, and in addition risk retaliation by "The Man", so
this option is clearly inferior.
Finally, voting for Yahoo, we have:

EU[R=[YW],D=L] = 0.98 * EU[R=[YW],D=0] + 0.02 *EU'(W=Y) + EUr'[D=Y]
               = 0.98 * (-825000) + 0.02 * (-750000) - 500 = -824000

In this case, the gain from Yahoo outweights the possible loss from
retaliation, so we prefer to vote for Yahoo and get -824000

3) Now for the case where Yahoo and Lightning can win. We have:

If we do not vote, or vote for the man, we have
EU[R=[YL],D=0] = 0.5*(-750000-850000) = -800000

If we vote for Lightning, we do stand to decrease Yahoo's chances, thereby
reducing our payoff, and in addition risk retaliation by "The Man", so
this option is clearly inferior.
Finally, voting for Yahoo, we have:

EU[R=[YW],D=L] = 0.98 * EU[R=[YL],D=0] + 0.02 *EU'(W=Y) + EUr'[D=Y]
               = 0.98 * (-800000) + 0.02 * (-750000) - 500 = -799500

In this case, the gain from Yahoo is more than the possible loss from
retaliation, we should vote Yahoo, to get  -799500

Now we have 3 cases involving possible win by "The Man".

4) Now for the case where Yahoo and  "The Man" can win:

If we do not vote, we have: 
EU[R=[YM],D=0] = 0.5*(-750000-950000) = -850000

If we vote for "The Man", we have:
EU[R=[YM],D=M] = 0.98 * 0.5*(-750000-950000) + 0.02 * (-950000) = -852000

And if we vote for Yahoo, we have:
EU[R=[YM],D=Y] = 0.98 * 0.5*(-750000-950000) + 0.02 * (-750000) - 500 = -848500

Voting for the other candidates makes no sense, as it has the same
effect as not voting and in addition takes the risk of retaliation. But voting
for Yahoo here is optimal, getting us -848500.

5)  Now for the case where Lightning and  "The Man" can win:

If we do not vote, we have: 
EU[R=[LM],D=0] = 0.5*(-850000-950000) = -900000

If we vote for "The Man", we have:
EU[R=[LM],D=M] = 0.98 * 0.5*(-850000-950000) + 0.02 * (-950000) = -901000

And if we vote for Lightning, we have:
EU[R=[LM],D=L] = 0.98 * 0.5*(-850000-950000) + 0.02 * (-850000) - 500 = -899500

Voting for the other candidates makes no sense, as it has the same
effect as not voting and in addition takes the risk of retaliation.
So looks  voting for Lightning is optimal -899500.

6) Finally, the case where White and  "The Man" can win:
If we do not vote, we have: 
EU[R=[WM],D=0] = 0.5*(-900000-950000) = -925000

If we vote for "The Man", we have:
EU[R=[WM],D=M] = 0.98 * 0.5*(-900000-950000) + 0.02 * (-950000) = -925500

And if we vote for Lightning, we have:
EU[R=[WM],D=W] = 0.98 * 0.5*(-900000-950000) + 0.02 * (-900000) - 500 = -925000

Voting for the other candidates makes no sense, as it has the same
effect as not voting and in addition takes the risk of retaliation.
So in this case it is best not to vote, and get -925000.

After all the 6 cases are done, simply compute an expectation, and then
compare the result against the expectation of the optimal decision
before the pole. If the difference in favor of the former is less than 10000,
we should not do the poll.

So we have, after the poll:

EU[P=yes] = 1/6 * (-874500-824000-799500-848500-899500-925000) = -861833

Now, this is only very slightly better than the case of not knowin the results
of the poll (only about 42 in gain), so clearly the poll is not worth it!

6) Consider the following belief-state MDP, where scoring is total value of rewards
   obtained in 3 steps.
   The world has 4 locations in linear sequence, as follows:  F1 I F2 P
   Actions are: R attempts to move right, and L attempts to move left,
   and S senses location F2. Location F1 contains a  flag, which has a
   value of 1. Location F2 contains a flag, with value either 10 or
   100, with probability 0.5 each, and may be icy (with a probability of 0.5).

   A robot visiting a location with a flag gets a reward equal to the
   flag value, and then the flag disappears. P is an absorbing state,
   with a reward of -100.
   Transition probabilities are based on: motion succeeds with probability 0.8, 
   but with probability
   0.2 the robot either stays in the same location, or moves in the opposite direction
  (if possible),
   with probability 0.1 each (0.2 to stay in place if at end of sequence).
   Sensing, if performed, tells the robot the actual value of the flag
   at F2. 

   b1) Formalize the above problem as a (belief-state) MDP, i.e. 
       what are the states, the transition probability
       function, and the reward function?

The world state variables are the location of the agents (we will number them
(1,2,3,4) statring from the left), and the value of the flags at
locations 1 and 3. We will call this variable X.
The flag at location 1 (variable F1) has value 1 if not yet collected, and 0 otherwise
The flag at location 3 (variable F2) has value 100, 10, or 0. 
In addition, location 3 may or may not have ice (variable I).
Finally, we need the last direction of travel, in case the agent is skidding
(variable D, with domain L, R, 0).

The world state is thus a 5-tuple: (X, D, F1, F2, I), each variable with the
domain dafined above. Of course, since this is a finite-horizon problem,
time can also be seen as part of the state, or alternately we will leave
it as NOT part of the state and compute a time-dependent policy. We will
use this last version, as discussed in class.

Now, the BELIEF STATE is a bit different, since the agent does NOT KNOW
whether F2 is 10 or 100, and also whether location 3 is icy. So we can either
add additional variables indicating that, or extend the DOMAIN of these
variables to indicate this lack of knowledge. 
So we will have the domain of
I be: {0, 1, U}, where U stands for "unknown". 
Likewise, the domain of F2 will be: {0, 10, 100, U}.

Now we must define transition probabilities and rewards.
A flag is picked up when the agent gets to that location, and
then the flag disappears. So it looks like the reward depends
on both the flag value before the transition, and the one after
the transition, but not the action. It is a bit annoying that it
depends on both! One "trick" that we can do is to make the flag NOT
disappear immediately, but only later, and also collect the reward only
in the NEXT time slot. This will allow us to define a reward based ONLY
on the current state! This creates a slightly different, but equivalent
and simpler, MDP.

Using this trick, we define the reward function r as follows
(where "x" stands for "don't care". This is a compact representation,
in order to avoid specifying an array of 4*3*4*2*3=288 entries
one by one.

Reward for flag at location 1:
r(1, x, 1, x, x) = 1
r(1, x, 0, x, x) = 0
r(1, x, U, x, x) = 0  actually, the states in this line are unreachable

Reward for being in the initial location:
r(2, x, x, x, x) = 0

Reward for flag at location 3:
r(3, x, x, 0, x) = 0
r(3, x, x, 10, x) = 10
r(3, x, x, 100, x) = 100
r(3, x, x, U, x) = 0  actually, the states in this line are unreachable

Reward for being in the pit:
r(4, x, x, x, x) = -100

This covers all possible states, and completes the definition of r.

Now, to define the transition probabilities. Again, since there are 3
possible action, a full distribution requires 288*3*288=248832 numbers,
which is rather a lot. So we use a factored representation - using
the state variables. Essentially, the location transition does not
depend on the flag variables in the previous state, and each
flag variable depends only on the previous value of the flag variable
and the location after the transition. We will use "prime" notation
to refer to the value of a variable after a transition.
The "ice" variable depends only on its previous variable and
the current location. Also, we will mention only transitions with
probability greater than zero, and ones beginning with states
that are not obviously unreachable.
Thus, we have, beginning with the location

P(X'=i+1|X=i,R) = 0.8 ; for i=1,2
P(X'=1|X=1,R) = 0.2

P(X'=2|X=2,R) = 0.1
P(X'=1|X=2,R) = 0.1

P(X'=1|X=1,L) = 0.9
P(X'=2|X=1,L) = 0.1

P(X'=1|X=2,L) = 0.8
P(X'=2|X=2,L) = 0.1
P(X'=3|X=2,L) = 0.1

P(X'=4|X=3,R,no I) = 0.8
P(X'=3|X=3,R,no I) = 0.1
P(X'=2|X=3,R,no I) = 0.1

P(X'=4|X=3,L,no I) = 0.1
P(X'=3|X=3,L,no I) = 0.1
P(X'=2|X=3,L,no I) = 0.8

P(X'=4|X=3,D=R,I) = 1  ; Assuming ALWAYS slipping on the ice, for every action
P(X'=2|X=3,D=L,I) = 1  ; unreachable

P(X'=4|X=4) = 1    ; Regardless of action or other state variables

P(X'=i|X=i,S) = 1  ; for i=1,2
P(X'=3|X=3,S, no I) = 1
P(X'=2|X=3,D=L, S, I) = 1 ; unreachable...
P(X'=4|X=3,D=R, S, I) = 1

As to the D variable, it only matters when X=3 and WITH ice, and the only
reachable case is when we reach this location moving right.
So for all intents and purposes we can assume that always D=R,
even if it ISN'T, which saves s LOT!

Now, the ice and flags are a bit tricky, since we need to map in the uncertainty,
as follows.

First, an unknown flag stays unknown unless we step on it or sense it,
and kwwps its value if known:
P(F2'=U|F2=U,X'=i,R) = 1  ; for i=1,2,4
P(F2'=U|F2=U,X'=i,L) = 1  ; for i=1,2,4

P(F1'=1|F1=1,X'=i) = 1       ; for i=2,3,4
P(F2'=10|F2=10,X'=i) = 1     ; for i=1,2,3
P(F2'=100|F2=100,X'=i) = 1   ; for i=1,2,3

Now, the sensing action:
P(F2'=10|F2=U,S) = 0.5
P(F2'=100|F2=U,S) = 0.5

Stepping on flag F2 also discovers it:
P(F2'=10|F2=U,X'=3) = 0.5
P(F2'=100|F2=U,X'=3) = 0.5

After stepping on a flag it disappears:
P(F1'=0|X=1) = 1
P(F2'=0|X=3) = 1

Finally, the ice. If known it stays the same:
P(I'|I) = 1
P(no I'|no I) = 1
P(I'=U|I=U, X=i) = 1  ; for i=1,2,4

Stepping in X=3, we discover the ice:
P(I'| I=U, X=3) = 0.5 
P(no I'| I=U, X=3) = 0.5

This completes the definition of the transition probability.

   b2) Find the optimal policy. You may assume that the robot starts
       at location I in order to save some computation.

ANSWER: Now we need only solve the problem. We use value determination,
starting from the end, that is T=3. Again we use a factored representation.

V(3, [X=4]) = -100         ; reward for being in pit
V(3, [X=1,F1=1]) = 1       ; reward for picking up F1 
V(3, [X=1,F1=0]) = 0       ; no reward for picking up F1 if already picked
V(3, [X=2]) = 0            ; no reward for location 2
V(3, [X=3,F2=100]) = 100   ; reward for F2
V(3, [X=3,F2=10]) = 10     ; reward for F2
V(3, [X=3,F2=0]) = 0       ; no reward for F2 if already picked

All other cases are already covered or unreachable. So now we go to T=2:

The easiest is:
V(2, [X=4]) = -200   ; we are in the pit, and sure to stay there!
V(2, [X=1,F1=1]) = 1 ; get reward, and no action gets anything non-zero
V(2, [X=1,F1=0]) = 0 ; get no reward, and no action gets anything non-zero

For X=2, optimal action is R, which lands one of several states: 
V(2, [X=2,F2=U,F1=1]) = 0 + 0.4*V(3, [X=3,F2=100]) + 0.4*V(3, [X=3,F2=10])
                     + 0.1*V(3, [X=2]) + 0.1*V(3, [X=1,F1=1])
                 = 44.1

V(2, [X=2,F2=100,F1=1]) = 0 + 0.8*V(3, [X=3,F2=100])
                       + 0.1*V(3, [X=2]) + 0.1*V(3, [X=1,F1=1])
                   = 80.1

V(2, [X=2,F2=10],F1=0) = 0 + 0.8*V(3, [X=3,F2=10])
                      + 0.1*V(3, [X=2]) + 0.1*V(3, [X=1,F1=1])
                   = 8.1

V(2, [X=2,F2=U,F1=0]) = 44    ; Like the case for F1=1, but without the reward for F1
V(2, [X=2,F2=100,F1=0]) = 80  ; Like the case for F1=1, but without the reward for F1
V(2, [X=2,F2=10,F1=0]) = 8    ; Like the case for F1=1, but without the reward for F1

Now, the risky location X=3:
V(2, [X=3,D=R,I,F2=0]) = -100   ; get no reward, and sure to get to X=4 in T=3
V(2, [X=3,D=R,I,F2=10]) = -90   ; get reward 10, and sure to get to X=4 in T=3
V(2, [X=3,D=R,I,F2=0]) =    0   ; get reward 100, and sure to get to X=4 in T=3

V(2, [X=3,no I,F2=0]) =     0 ; get no reward, and do sensing to avoid risk of landing in pit
V(2, [X=3,no I,F2=10]) =   10 ; get reward 10, and do sensing to avoid risk of landing in pit
V(2, [X=3,no I,F2=100]) = 100 ; get reward 100, and do sensing to avoid risk of landing in pit

This is sufficient as the value function is again defined for
all reachable states. Now move on to compute V(1, .), and V(0, .) likewise.

V(1, [X=4]) = -300   ; we are in the pit, and sure to stay there!
V(1, [X=1,F1=1,F2=U]) = 1 + 0.8 * V(2, [X=2,F2=U,F1=0]) = 36.28
              ; get reward, and doing R may get os to [X=2,F2=U,F1=0]
V(1, [X=1,F1=1,F2=100])  ; unreachable (requires 2 action since T=0)
V(1, [X=1,F1=1,F2=10])  ; unreachable (requires 2 action since T=0)


7)  A project leader in the IMG4 consortium would like to construct a
    decision procedure based on results from hiring employees in the KITE
    consortium. The decision procedure will be based on a decision tree, using information
    in the case database. Attributes for each employee are:
    degree level (A),  Java programming ability (B), self-motivation (C), grade in AI (D).
    The target attribute is the "decision", describing the amount of money to
    pay the employee.

  a) Construct by hand (using the greedy search using the best information
     gain heuristic) a consistent decision tree for the following table:

     input variables               decision
   A      B      C      D
   1      F      M      95             8
   1      T      L      70             4
   1      F      L      70             4
   1      T      M      95             6
   1      T      M      80             6
   2      T      H      95             8
   2      T      H      80             8
   2      F      H      95             8

ANSWER: try each attribute to see what happens.
We will do "lazy computation" in that we will not bother with exact computations
unless really needed.
Using A we get:
 Case A=1: {8,4,4,6,6}, i.e. distribution {0.2, 0.4, 0.4}.
 Case A=2: {8,8,8}  i.e. distribution {1,0,0}, i.e. 0 bits required.

Using B we get:
 Case B=F: {8,8,4} i.e. distribution {2/3, 1/3, 0}. 
 Case B=T: {4,6,6,8,8} i.e. distribution {0.2, 0.4, 0.4} as in A=1
(So B is clearly worse than A as number of bits required after getting answer to
B is greater than for A). 

Using C we get:
 Case C=M: {6,6,8} i.e. distribution {2/3, 1/3, 0} as in case B=F.
 Case C=L: {4,4} i.e. 0 bits required.
 Case C=H: {8,8,8} i.e. 0 bits required.
(So C is clearly better than B but we still need to 
figure out (maybe) if C is better than A)

Using D we get:
 Case D=95: {8,8,8,6} i.e. distribution {0.75, 0.25,0} which is only slightly better than case C=M
 Case D=70: {4,4}  i.e. 0 bits required.
 Case D=80: {8,6} i.e. distribution {0.5,0.5} with 2 examples (2 bits required).
Therefore C is better than D.

Now we must compare A and C more exactly.
Case A=1: Distribution {0.2, 0.4, 0.4} is more chaotic than {0, 0.5, 0.5} and thus requires MORE
than 1 bit per example, so total is more than 5 bits.
(of course one can compute exact number of bits, 
i.e. -5*(0.2 log 0.2 + 2* 0.4 log 0.4) but we will avoid this, being lazy). 

Case C=M: Distribution {2/3, 1/3, 0} is LESS chaotic than {0.5, 0.5, 0} so requires
LESS than one bit per example, so total is less than 3 bits.
So C is clearly better than A, and thus C is selected as the root of the tree.

Now, in the recursive calls the only interesting part is for case C=M.
In this case, using B we get:
 Case B=F: {8} i.e. no additional bits required.
 Case B=T: {6,6} i.e. no additional bits required.

(note that neither A nor D will help here).
So the decision tree has 2 internal nodes only, and is:

 C? --- H:  8
     ---L:  4
      --M:  B? --- F: 8
                -- T: 6

  b) Is a more compact (fewer internal nodes) decision tree possible for the above table?
     If no - prove it. If yes, which tree?


In general, the above procedure may not generate the optimal tree. However, in this case
it did happen to do so. 
Proof: clearly 0 internal nodes don't work here, as the target attiribute differs
       between the examples.
      Also, we tried all possible attributes for the top level, and none of them
      could fully classify the examples, so there is NO tree with ONE node that will
      work. The result above has 2 nodes, and since 2 is the smallest integer
      greater than 1, we are done.

Deadline: Tuesday, February 24, 2009, at 12 noon.

Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)

* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.