1) We have the following distribution over the binary variables A, B, C, D, given as a full table:

 A B C D   probability
-----------------------
 F F F F   0.0
 F F F T   0.1
 F F T F   0.05
 F F T T   0.1
 F T F F   0.0
 F T F T   0.1
 F T T F   0.05
 F T T T   0.1
 T F F F   0.0
 T F F T   0.1
 T F T F   0.05
 T F T T   0.1
 T T F F   0.0
 T T F T   0.1
 T T T F   0.05
 T T T T   0.1

a) Construct the Bayes network that most compactly describes the distribution (i.e. one that has the smallest number of arcs).

ANSWER: Note that the table above is 4 copies of its first 4 lines, those being for the state (A,B) = (F,F). Thus the probabilities for (C,D) do not depend on A and B at all, and therefore Independent({A,B}, {C,D} | {}). For the same reason, we have Independent({A}, {B} | anything). So we know that A and B are not connected to each other or to the other nodes.

Now, looking at the first 4 lines, for the case A=B=F we have P(C=F,D=F) = 0, which is NOT equal to the product of:
 P(C=F) = 4*(P(C=F,D=F) + P(C=F,D=T)) = 4*(0.0+0.1) = 0.4
 P(D=F) = 4*(P(C=F,D=F) + P(C=T,D=F)) = 4*(0.0+0.05) = 0.2
So there is an arc, say, from C to D. In this case, we have P(C=T) = 1-P(C=F) = 0.6. Also:
 P(D=T|C=T) = P(D=T,C=T)/P(C=T) = 4*P(A=B=C=D=T)/P(C=T) = 4*0.1/0.6 = 2/3
 (obviously P(D=F|C=T) = 1-P(D=T|C=T) = 1/3)
 P(D=T|C=F) = P(D=T,C=F)/P(C=F) = 4*P(A=B=C=F,D=T)/P(C=F) = 4*0.1/0.4 = 1

b) Is the answer above unique? If not, what is the set of possible BNs that describe the distribution compactly?

ANSWER: No, it is also possible for the arc to be from D to C, with P(D=T) = 0.8, and the table P(C|D) defined accordingly.

2) Consider the following variables and their immediate causes (parents):

 variable   parents
-------------------
 A          none
 B          none
 C          none
 D          A
 E          A B
 F          B C
 G          D F

a) Is this network a poly-tree?

ANSWER: No, because of the undirected cycle A-D-G-F-B-E-A.

b) Is this network (directed-path) singly connected?

ANSWER: Yes, because the above is the only undirected cycle, and in it both A and B are parent-less nodes.
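The poly-tree check above can be automated: build the undirected version of the graph from the parent lists and look for an edge that closes a cycle, e.g. with union-find. A small sketch:

```python
# Build the graph of question 2 from its parent lists and test the
# poly-tree property (undirected acyclicity) with union-find.
parents = {"A": [], "B": [], "C": [], "D": ["A"], "E": ["A", "B"],
           "F": ["B", "C"], "G": ["D", "F"]}

root = {v: v for v in parents}

def find(v):
    while root[v] != v:
        v = root[v]
    return v

cycle_found = False
for child, ps in parents.items():
    for p in ps:
        a, b = find(child), find(p)
        if a == b:          # this edge closes an undirected cycle
            cycle_found = True
        root[a] = b

print(cycle_found)  # True: the cycle A-D-G-F-B-E-A exists, so not a poly-tree
```

With 7 nodes and 7 undirected edges a cycle is guaranteed; the union-find merely confirms it.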
c) Determine the truth of the following independence statements (using d-separation):

1) I({D}, {E} | {})
ANSWER: No, because D-A-E is not blocked (A is a diverging non-evidence node).

2) I({D}, {E} | {A})
ANSWER: Yes, because D-A-E is blocked (A is a diverging evidence node) and D-G-F-B-E is blocked by G (a childless converging non-evidence node).

3) I({D}, {E} | {A, G})
ANSWER: No, because D-G-F-B-E is no longer blocked by G (a converging evidence node), and F and B are pass-through, resp. diverging, non-evidence nodes.

4) I({B}, {G} | {D, F})
ANSWER: Yes, because B-F-G is blocked by F (a pass-through evidence node), and B-E-A-D-G is blocked by both E and D.

5) I({A}, {C} | {B, F})
ANSWER: Yes, because A-D-G-F-C is blocked by F, a pass-through evidence node, and A-E-B-F-C is blocked by B, a diverging evidence node. Note that in the 2nd path, F does NOT block the path, as F is a converging evidence node in this path!

d) Assume that the nodes are binary-valued. Suppose that:
 P(A=true) = 0.4, P(B=true) = 0.1, P(C=true) = 0.9
 P(E=true|A=B=true) = 0.5, P(E=true|A=false,B=true) = 0.1,
 P(E=true|A=true,B=false) = 0.5, P(E=true|A=B=false) = 0.5
Find P(A=true | B=true, E=true, C=true)
Hint: you may want to preprocess to get rid of nodes before doing any computation.

ANSWER: First, drop the barren node G, then D and F. Now C is disconnected, so we have Independent({A}, {C} | {B, E}), and therefore:
 P(A=true | B=true, E=true, C=true) = P(A=true | B=true, E=true)
and we need only compute the latter.
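Before carrying out the computation by hand, the reduced query can be checked by brute-force enumeration over A, using only the CPT values given above:

```python
# Brute-force check of P(A=true | B=true, E=true) using the CPTs that
# survive after removing the barren/disconnected nodes G, D, F, C.
p_a = 0.4
p_b = 0.1
p_e = {(True, True): 0.5, (False, True): 0.1,
       (True, False): 0.5, (False, False): 0.5}   # P(E=true | A, B)

num = 0.0   # P(A=true, B=true, E=true)
den = 0.0   # P(B=true, E=true)
for a in (True, False):
    pa = p_a if a else 1 - p_a
    joint = pa * p_b * p_e[(a, True)]
    den += joint
    if a:
        num = joint

print(num / den)   # 0.7692... = 10/13
```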
Using Bayes rule (or the definition of conditional probabilities, whichever you prefer) we have:
 P(A=true | B=true, E=true)
 = P(A=true, B=true, E=true) / P(B=true, E=true)
 = P(E=true|A=true,B=true)P(A=true)P(B=true) / (P(B=true, E=true, A=true) + P(B=true, E=true, A=false))
 = 0.5*0.4*0.1 / (0.5*0.4*0.1 + P(E=true|B=true,A=false)*P(B=true)*P(A=false))
 = 0.02 / (0.02 + 0.1*0.1*0.6)
 = 0.02 / 0.026 = 10/13

3) You need to construct a 5-input neural network that has value 1 in the output just when the number of its inputs that are on is either 1 or 2 (assume inputs are in {0, 1}).

a) Can this be done without hidden units?

ANSWER: No. Consider the subspace where the last 2 inputs are 0. We now have a 3D space with the 2 extreme points (0,0,0) and (1,1,1) requiring an output of 0, and the rest requiring an output of 1. Now, consider any plane separating (0,0,0) from (0,0,1), (0,1,0), (1,0,0). It is easy to show that no such plane also separates, e.g., (1,1,1) from (1,1,0), by writing the general equation of such a plane and showing that both (1,1,1) and (1,1,0) must be on the same side of it.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!)

ANSWER: There are many possible ways to do this. The smallest solution has only ONE hidden unit, which detects the case "more than 2 inputs on" by having input weights of 1 and a threshold of 2.5. The output unit gets all inputs with a weight of 1, the output of the hidden unit with a large negative weight (say -10), and a threshold of 0.5, and it is easy to verify that this setup works.

4) a) For the blocks world as defined in class, show how POP constructs the plan (that is, show all partial plans from the initial plan until a correct plan is reached) for the initial state:

  A   B
  C   D
------------

and the goal state containing: On(A,C) and On(C,B)

ANSWER: First we must define the operators, as in class or as defined in the book.
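(An aside on question 3b before moving on: the one-hidden-unit threshold network can be verified exhaustively over all 32 input vectors.)

```python
# Exhaustive check of the question-3b threshold network: one hidden unit
# fires when more than 2 inputs are on, and pulls the output unit down.
from itertools import product

def output(bits):
    hidden = sum(bits) > 2.5            # hidden unit: weights 1, threshold 2.5
    net = sum(bits) - 10 * hidden       # output unit: inputs weight 1, hidden weight -10
    return int(net > 0.5)               # output threshold 0.5

for bits in product((0, 1), repeat=5):
    assert output(bits) == (1 if sum(bits) in (1, 2) else 0)
print("all 32 cases correct")
```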
We will assume a very well-informed search that makes no mistakes in selecting ways to satisfy a precondition.

Initial state (partial plan):

 ---------------------- (Dummy step: START)
 On(A,C), On(C,Table), On(B,D), On(D,Table), Clear(A), Clear(B)

 On(A,C), On(C,B)
 ------------------------------ (Dummy step: END)

Now, we select the unresolved precondition On(A,C) of END, and add the operator PutOn(A,C) to satisfy it. (* Note: trying to satisfy it using the effect of START will FAIL later on! *) So now we have:

 ---------------------- (Dummy step: START)
 On(A,C), On(C,Table), On(B,D), On(D,Table), Clear(A), Clear(B)

 Clear(C), Clear(A), On(A,x)
 ---------------------------- STEP1: PutOn(A,C)
 On(A,C), not Clear(C), not On(A,x), Clear(x)
    |
    |
 On(A,C), On(C,B)
 ------------------------------ (Dummy step: END)

with the constraints: START before STEP1 before END. Henceforth we continue this solution without drawing.

Now, select (say) precondition On(C,B) of END, and add a new step to satisfy it:
 STEP2: PutOn(C,B)
   Precond: Clear(C), Clear(B), On(C,y)
   Effects: On(C,B), not Clear(B), not On(C,y), Clear(y)
with the constraints: START before STEP2 before END.

Now, resolve Clear(B) in STEP2 by using this effect of START, and On(C,y) by using On(C,Table) of START. At present there are no threats. Resolve Clear(A) in STEP1 by using this effect of START. Now, resolve Clear(C) in STEP1 by adding a new step:
 STEP3: PutOn(A,Table)
   Precond: Clear(A), On(A,C)
   Effects: On(A,Table), not On(A,C), Clear(C)
with the constraints: START before STEP3 before STEP1. Resolve the preconditions Clear(A) and On(A,C) by these effects of START. Resolve precondition Clear(C) in STEP2 also by using STEP3, requiring the additional constraint STEP3 before STEP2.
The final plan has the steps in total order: START before STEP3 before STEP2 before STEP1 before END.

b) Suppose that it is not known whether On(B,D) or On(D,B) in the initial state, and that there is an operator "Check(On(x,y))" which results in knowing whether this is true or not. Show how a conditional plan is constructed for the above case.

ANSWER: Similar to the above, we will show only the final result. We will also need to assume that the system knows how to compute the "Clear" predicate, etc., so we drop it from the effects of the actions, and also that On(x,y) entails not On(x,z) for z not equal to y. Steps are:

 START: Dummy
   Precond: None
   Effect: On(A,C), On(C,Table), On(B,D) or On(D,B), On(D,Table)
 STEP1: PutOn(A,Table)
   Precond: Clear(A), On(A,C)
   Effects: On(A,Table)
 STEP2: Check(On(D,B))
   Precond: None
   Effects: Know(On(D,B))
   (Branch true: On(D,B), branch false: not On(D,B), with the true branch satisfying the precondition of STEP3, and the false branch satisfying the Clear(B) precondition of STEP4.)
 STEP3: PutOn(D,Table)
   Precond: Clear(D)
   Effects: On(D,Table)
   (Satisfying the Clear(B) precondition of STEP4.)
 STEP4: PutOn(C,B)
   Precond: Clear(B), Clear(C), On(C,x)
   Effects: On(C,B)
 STEP5: PutOn(A,C)
   Precond: Clear(A), Clear(C), On(A,x)
   Effects: On(A,C)

Constraints are (in addition to the obvious: START before everything, END after everything):
 STEP1 before STEP4
 STEP2 before STEP3
 STEP3 before STEP4 (if STEP3 is executed at all!)
 STEP2 before STEP4
 STEP4 before STEP5

Note that STEP1 can be before STEP2, or after STEP2 and before STEP3, or after both STEP2 and STEP3 but before STEP4, and any of these is a valid linearization of this plan.

5) Consider the following voting scenario, occurring in a certain middle-eastern country.* We will assume that you are risk-neutral, i.e. that your utility function is linear in the amount of money that your assets are worth. Assume also that moral considerations are not part of your utility function.
There are 4 candidates for whom you could vote: Yahoo, The Man, Lightning, and White, all equally likely to win, unless your vote happens to swing the election. Assume that there is a 1% chance that your vote will actually swing the election, and a 0.5% chance that "The Man" will somehow find out what your vote was. If that happens, and you voted for someone OTHER than The Man, your business will be trashed for a loss of 100,000 NIS.

Your accountant has notified you that your taxes for FY 2008-2009 amount to 1,000,000 NIS on your clam-encapsulation business. Also, you happen to be a member of "The Man"'s party, and if this party wins you have a 10% chance of being in the legislature, in which case you will be able to pass a law that declares all clam-related businesses tax-exempt. But there is also a 10% chance that in this case a corruption charge against you will stick and you will go to jail and lose your business, a total additional cost of 5,000,000 NIS.

Additional information is as follows:
 - Yahoo claims he will reduce your taxes by 50%, and you think he will actually deliver with probability 0.5.
 - If White is elected there will be mediocre-size bonuses, which will gain you 100,000 NIS.
 - Lightning getting elected will improve the police forces, which will decrease your security costs by 150,000 NIS.

Assume that event combinations not mentioned above have a probability of 0, e.g. the probability that you will be a member of the legislature given that "The Man" does not win is 0.

You have the following options:
 1) Vote for Yahoo (Y)
 2) Vote for The Man (M)
 3) Vote for Lightning (L)
 4) Vote for White (W)
 5) Not vote at all (0)

Draw the decision diagram for each case below, and solve it.

a) What is your rational choice in voting?

ANSWER: Actually modeling the results of voting is complicated, so we will try a simplistic model, as follows. We have only one decision node, "Vote", with the above 5 possibilities.
We have a chance node W for the winner, which depends on the decision. Another chance node K is whether "The Man" will know your vote, and one node L for whether you will be in the legislature, which depends only on the winner. There is also a chance node J denoting whether you will be jailed for corruption, depending only on node W. Finally, the chance node T, depending only on W, determines the state of taxation (in case Yahoo wins). We have a value node for collecting the value, defined by the above, and apparently it has ALL the above nodes as parents.

The decision node "Vote" has no parents, thus does not know the state of any of the other variables. So we need to simply check all the actions. But to save computation, we can pre-compute some of the terms, as they will be re-used in several possibilities.

First, consider the case where Yahoo wins. The sub-utility there (meaning: ignoring the possibility of retaliation by "The Man") is independent of your vote, and is:
 EU'[W=Y] = 0.5 * (-1000000) + 0.5 * (-500000) = -750000
Likewise, if Lightning wins, there is only one possible outcome:
 EU'[W=L] = -850000
And if White wins, also one outcome:
 EU'[W=W] = -900000
Finally, if "The Man" wins:
 EU'[W=M] = 0.9 * (-1000000) + 0.1 * (0 + 0.1 * (-5000000)) = -900000 - 50000 = -950000

So it looks like you should vote for Yahoo, except for possible retaliation by "The Man". Now, retaliation by "The Man" is independent of which party you vote for, unless you vote for "The Man" or for no one. So we need only compare the 3 possibly dominant actions: Y, M, 0. If you vote either for no one or for "The Man", there will be no retaliation. Voting for no one gives equal probabilities of wins, so:
 EU[D=0] = 0.25 * (EU'[W=Y] + EU'[W=L] + EU'[W=W] + EU'[W=M])
         = 0.25 * (-750000 - 850000 - 900000 - 950000) = -862500
Voting for "The Man" makes no difference in 0.99 of the cases (as if you did not vote), and in the remaining case, probability 0.01, "The Man" wins.
So in this case we have:
 EU[D=M] = 0.99 * EU[D=0] + 0.01 * EU'[W=M]
         = 0.99 * (-862500) + 0.01 * (-950000) = -863375
So it looks like not voting is better than voting for "The Man". What about voting for Yahoo? Well, in this case the expected cost of retaliation is:
 EUr'[D=Y] = 0.005 * (-100000) = -500
Again, voting for Yahoo only affects 0.01 of the cases, so all in all we have:
 EU[D=Y] = 0.99 * EU[D=0] + 0.01 * EU'[W=Y] + EUr'[D=Y]
         = 0.99 * (-862500) + 0.01 * (-750000) - 500 = -861875
So it looks like the expected gain from Yahoo outweighs the chance of retaliation, so you'd better vote for Yahoo...

b) As above, but you have the choice to pay 10,000 NIS to "future-tech polling inc." for a poll which tells you with certainty which 2 of the above 4 will surely NOT win (leaving the other 2 equally likely). If you do make the payment, you get this information before you need to decide about your vote. Do you pay for the poll? What do you vote after getting the poll information, assuming you do?

ANSWER: In this case, we have another decision node P, whether to do the poll, and another chance node R (with 6 possible outcomes) on the poll results. Assuming as above equal probabilities for the parties winning, symmetry dictates that all 6 outcomes are equally likely, with a probability of 1/6 each. The decision node D gets information about P and the state of R.

Now, we must be careful about the conditional probability of W given D and R. That is because if some parties are ruled out, the chance of our vote swinging the election if we vote for one of the favorites should increase. We will assume from now on that you have a 2% chance of swinging the election after the poll.

Suppose you take the poll. Let us examine the cases after the poll. We can still use the partial utilities EU', as well as EUr', as they only depend on the winner and on who you vote for, respectively. We will designate the results of the poll in the positive: i.e. who has a CHANCE to win, according to the poll.
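As a numerical cross-check of the part (a) quantities, which we will keep reusing below, a short script (all values as given in the problem):

```python
# Cross-check of the part (a) expected utilities.
eu_win = {                    # sub-utility EU'[W=w], ignoring retaliation
    "Y": 0.5 * -1_000_000 + 0.5 * -500_000,            # -750000
    "L": -1_000_000 + 150_000,                         # -850000
    "W": -1_000_000 + 100_000,                         # -900000
    "M": 0.9 * -1_000_000 + 0.1 * (0.1 * -5_000_000),  # -950000
}
retaliation = 0.005 * -100_000   # -500 expected, when voting against "The Man"

eu_novote = sum(eu_win.values()) / 4
eu_man = 0.99 * eu_novote + 0.01 * eu_win["M"]
eu_yahoo = 0.99 * eu_novote + 0.01 * eu_win["Y"] + retaliation

print(eu_novote, eu_man, eu_yahoo)   # -862500, -863375, -861875 (approx.)
```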
We will omit the cost of the poll until the end. Thus, we have:

1) For the case where Lightning and White can win: if we do not vote, or vote for "The Man", we have
 EU[R=[LW],D=0] = 0.5*(-850000-900000) = -875000
If we vote for White, we stand to decrease Lightning's chances, thereby reducing our payoff, and in addition risk retaliation by "The Man", so this option is clearly inferior. Finally, voting for Lightning, we have:
 EU[R=[LW],D=L] = 0.98 * EU[R=[LW],D=0] + 0.02 * EU'[W=L] + EUr'[D=L]
               = 0.98 * (-875000) + 0.02 * (-850000) - 500 = -875000
So in this case the gain from possibly swinging the election exactly cancels the risk of retaliation: we are indifferent between voting for Lightning and not voting, and get -875000 either way.

2) Now for the case where Yahoo and White can win: if we do not vote, or vote for "The Man", we have
 EU[R=[YW],D=0] = 0.5*(-750000-900000) = -825000
If we vote for White, we stand to decrease Yahoo's chances, thereby reducing our payoff, and in addition risk retaliation by "The Man", so this option is clearly inferior. Finally, voting for Yahoo, we have:
 EU[R=[YW],D=Y] = 0.98 * EU[R=[YW],D=0] + 0.02 * EU'[W=Y] + EUr'[D=Y]
               = 0.98 * (-825000) + 0.02 * (-750000) - 500 = -824000
In this case, the gain from Yahoo outweighs the possible loss from retaliation, so we prefer to vote for Yahoo and get -824000.

3) Now for the case where Yahoo and Lightning can win: if we do not vote, or vote for "The Man", we have
 EU[R=[YL],D=0] = 0.5*(-750000-850000) = -800000
If we vote for Lightning, we stand to decrease Yahoo's chances, thereby reducing our payoff, and in addition risk retaliation by "The Man", so this option is clearly inferior. Finally, voting for Yahoo, we have:
 EU[R=[YL],D=Y] = 0.98 * EU[R=[YL],D=0] + 0.02 * EU'[W=Y] + EUr'[D=Y]
               = 0.98 * (-800000) + 0.02 * (-750000) - 500 = -799500
In this case, the gain from Yahoo is more than the possible loss from retaliation, so we should vote for Yahoo, to get -799500.

Now we have 3 cases involving a possible win by "The Man".
4) Now for the case where Yahoo and "The Man" can win. If we do not vote:
 EU[R=[YM],D=0] = 0.5*(-750000-950000) = -850000
If we vote for "The Man":
 EU[R=[YM],D=M] = 0.98 * 0.5*(-750000-950000) + 0.02 * (-950000) = -852000
And if we vote for Yahoo:
 EU[R=[YM],D=Y] = 0.98 * 0.5*(-750000-950000) + 0.02 * (-750000) - 500 = -848500
Voting for the other candidates makes no sense, as it has the same effect as not voting and in addition takes the risk of retaliation. So voting for Yahoo here is optimal, getting us -848500.

5) Now for the case where Lightning and "The Man" can win. If we do not vote:
 EU[R=[LM],D=0] = 0.5*(-850000-950000) = -900000
If we vote for "The Man":
 EU[R=[LM],D=M] = 0.98 * 0.5*(-850000-950000) + 0.02 * (-950000) = -901000
And if we vote for Lightning:
 EU[R=[LM],D=L] = 0.98 * 0.5*(-850000-950000) + 0.02 * (-850000) - 500 = -899500
Voting for the other candidates makes no sense, as it has the same effect as not voting and in addition takes the risk of retaliation. So it looks like voting for Lightning is optimal, getting -899500.

6) Finally, the case where White and "The Man" can win. If we do not vote:
 EU[R=[WM],D=0] = 0.5*(-900000-950000) = -925000
If we vote for "The Man":
 EU[R=[WM],D=M] = 0.98 * 0.5*(-900000-950000) + 0.02 * (-950000) = -925500
And if we vote for White:
 EU[R=[WM],D=W] = 0.98 * 0.5*(-900000-950000) + 0.02 * (-900000) - 500 = -925000
Voting for the other candidates makes no sense, as it has the same effect as not voting and in addition takes the risk of retaliation. So in this case we are indifferent between voting for White and not voting, and get -925000.

After all 6 cases are done, simply compute the expectation, and then compare the result against the expectation of the optimal decision before the poll. If the difference in favor of the former is less than 10000, we should not do the poll.
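The six case-by-case comparisons can be collected mechanically. A sketch (EU' values from part (a), the assumed 2% post-poll swing chance, and the 500 NIS expected retaliation cost); note that it derives the exact tie in the [L,W] and [W,M] cases once the retaliation term is included:

```python
# For each pair of possible winners, compare not voting with voting
# for each non-Man candidate in the pair; keep the best value.
eu_win = {"Y": -750_000, "L": -850_000, "W": -900_000, "M": -950_000}
retaliation = -500                     # expected cost of voting against "The Man"
pairs = [("L", "W"), ("Y", "W"), ("Y", "L"),
         ("Y", "M"), ("L", "M"), ("W", "M")]

case_values = []
for a, b in pairs:
    no_vote = 0.5 * (eu_win[a] + eu_win[b])
    options = [no_vote]                # voting for "The Man" is never strictly
    for c in (a, b):                   # better than not voting, so it is omitted
        if c != "M":                   # vote to swing toward candidate c
            options.append(0.98 * no_vote + 0.02 * eu_win[c] + retaliation)
    case_values.append(max(options))

print(case_values)
print(sum(case_values) / 6)            # expected utility given the poll
```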
So we have, after the poll:
 EU[P=yes] = 1/6 * (-875000 - 824000 - 799500 - 848500 - 899500 - 925000) = -861916.67
Now, this is actually very slightly WORSE (by about 42 NIS) than simply voting for Yahoo without the poll (-861875), so even before subtracting its 10000 NIS cost the poll is clearly not worth it!

6) Consider the following belief-state MDP, where the score is the total value of rewards obtained in 3 steps. The world has 4 locations in a linear sequence, as follows:

 F1  I  F2  P

Actions are: R attempts to move right, L attempts to move left, and S senses location F2. Location F1 contains a flag, which has a value of 1. Location F2 contains a flag, with value either 10 or 100, with probability 0.5 each, and may be icy (with a probability of 0.5). A robot visiting a location with a flag gets a reward equal to the flag value, and then the flag disappears. P is an absorbing state, with a reward of -100. Transition probabilities are based on: motion succeeds with probability 0.8, but with probability 0.2 the robot either stays in the same location or moves in the opposite direction (if possible), with probability 0.1 each (0.2 to stay in place if at the end of the sequence). Sensing, if performed, tells the robot the actual value of the flag at F2.

b1) Formalize the above problem as a (belief-state) MDP, i.e. what are the states, the transition probability function, and the reward function?

ANSWER: The world state variables are the location of the agent (we will number the locations 1,2,3,4 starting from the left), and the values of the flags at locations 1 and 3. We will call the location variable X. The flag at location 1 (variable F1) has value 1 if not yet collected, and 0 otherwise. The flag at location 3 (variable F2) has value 100, 10, or 0. In addition, location 3 may or may not have ice (variable I). Finally, we need the last direction of travel, in case the agent is skidding (variable D, with domain L, R, 0). The world state is thus a 5-tuple (X, D, F1, F2, I), each variable with the domain defined above.
Of course, since this is a finite-horizon problem, time can also be seen as part of the state; alternately, we can leave it NOT part of the state and compute a time-dependent policy. We will use this last version, as discussed in class.

Now, the BELIEF STATE is a bit different, since the agent does NOT KNOW whether F2 is 10 or 100, or whether location 3 is icy. So we can either add additional variables indicating that, or extend the DOMAIN of these variables to indicate this lack of knowledge. So we will let the domain of I be {0, 1, U}, where U stands for "unknown". Likewise, the domain of F2 will be {0, 10, 100, U}.

Now we must define transition probabilities and rewards. A flag is picked up when the agent gets to its location, and then the flag disappears. So it looks like the reward depends on both the flag value before the transition and the one after the transition, but not on the action. It is a bit annoying that it depends on both! One "trick" we can use is to make the flag NOT disappear immediately, but only later, and also to collect the reward only in the NEXT time slot. This allows us to define a reward based ONLY on the current state! This creates a slightly different, but equivalent and simpler, MDP.

Using this trick, we define the reward function r as follows (where "x" stands for "don't care"). This is a compact representation, in order to avoid specifying an array of 4*3*2*4*3 = 288 entries one by one.

Reward for the flag at location 1:
 r(1, x, 1, x, x) = 1
 r(1, x, 0, x, x) = 0
 r(1, x, U, x, x) = 0      ; actually, the states in this line are unreachable
Reward for being in the initial location:
 r(2, x, x, x, x) = 0
Reward for the flag at location 3:
 r(3, x, x, 0, x) = 0
 r(3, x, x, 10, x) = 10
 r(3, x, x, 100, x) = 100
 r(3, x, x, U, x) = 0      ; actually, the states in this line are unreachable
Reward for being in the pit:
 r(4, x, x, x, x) = -100
This covers all possible states, and completes the definition of r.
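The compact reward table above can be written directly as a function of the (X, D, F1, F2, I) world state; D and I are the "don't care" arguments, and the unreachable U cases get 0, as in the table:

```python
# The reward function r from b1, as a function of the world state.
def r(x, d, f1, f2, i):
    if x == 1:
        return 1 if f1 == 1 else 0               # flag at location 1 (U -> 0, unreachable)
    if x == 3:
        return f2 if f2 in (10, 100) else 0      # flag at location 3 (0 or U -> 0)
    if x == 4:
        return -100                              # the pit
    return 0                                     # location 2

print(r(1, "R", 1, "U", "U"), r(3, "R", 0, 100, 0), r(4, "R", 0, 0, 0))
# 1 100 -100
```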
Now, to define the transition probabilities. Again, since there are 3 possible actions, a full distribution requires 288*3*288 = 248832 numbers, which is rather a lot. So we use a factored representation, using the state variables. Essentially, the location transition does not depend on the flag variables in the previous state, and each flag variable depends only on the previous value of the flag variable and the location after the transition. We will use "prime" notation to refer to the value of a variable after a transition. The "ice" variable depends only on its previous value and the current location. Also, we will mention only transitions with probability greater than zero, and ones beginning with states that are not obviously unreachable.

Thus we have, beginning with the location variable:
 P(X'=i+1|X=i,R) = 0.8    ; for i=1,2
 P(X'=1|X=1,R) = 0.2
 P(X'=2|X=2,R) = 0.1
 P(X'=1|X=2,R) = 0.1
 P(X'=1|X=1,L) = 0.9
 P(X'=2|X=1,L) = 0.1
 P(X'=1|X=2,L) = 0.8
 P(X'=2|X=2,L) = 0.1
 P(X'=3|X=2,L) = 0.1
 P(X'=4|X=3,R,no I) = 0.8
 P(X'=3|X=3,R,no I) = 0.1
 P(X'=2|X=3,R,no I) = 0.1
 P(X'=4|X=3,L,no I) = 0.1
 P(X'=3|X=3,L,no I) = 0.1
 P(X'=2|X=3,L,no I) = 0.8
 P(X'=4|X=3,D=R,I) = 1    ; assuming ALWAYS slipping on the ice, for every action
 P(X'=2|X=3,D=L,I) = 1    ; unreachable
 P(X'=4|X=4) = 1          ; regardless of action or other state variables
 P(X'=i|X=i,S) = 1        ; for i=1,2
 P(X'=3|X=3,S,no I) = 1
 P(X'=2|X=3,D=L,S,I) = 1  ; unreachable...
 P(X'=4|X=3,D=R,S,I) = 1

As for the D variable, it only matters when X=3 WITH ice, and the only reachable such case is when we reach this location moving right. So for all intents and purposes we can assume that always D=R, even if it ISN'T, which saves a LOT!

Now, the ice and flags are a bit tricky, since we need to map in the uncertainty, as follows.
First, an unknown flag stays unknown unless we step on it or sense it, and keeps its value if known:
 P(F2'=U|F2=U,X'=i,R) = 1    ; for i=1,2,4
 P(F2'=U|F2=U,X'=i,L) = 1    ; for i=1,2,4
 P(F1'=1|F1=1,X'=i) = 1      ; for i=2,3,4
 P(F2'=10|F2=10,X'=i) = 1    ; for i=1,2,3
 P(F2'=100|F2=100,X'=i) = 1  ; for i=1,2,3
Now, the sensing action:
 P(F2'=10|F2=U,S) = 0.5
 P(F2'=100|F2=U,S) = 0.5
Stepping on flag F2 also discovers it:
 P(F2'=10|F2=U,X'=3) = 0.5
 P(F2'=100|F2=U,X'=3) = 0.5
After stepping on a flag, it disappears:
 P(F1'=0|X=1) = 1
 P(F2'=0|X=3) = 1
Finally, the ice. If known, it stays the same:
 P(I'|I) = 1
 P(no I'|no I) = 1
 P(I'=U|I=U,X=i) = 1   ; for i=1,2,4
Stepping into X=3, we discover the ice:
 P(I'|I=U,X=3) = 0.5
 P(no I'|I=U,X=3) = 0.5
This completes the definition of the transition probabilities.

b2) Find the optimal policy. You may assume that the robot starts at location I in order to save some computation.

ANSWER: Now we need only solve the problem. We use value determination, starting from the end, that is T=3. Again we use a factored representation.
 V(3, [X=4]) = -100        ; reward for being in the pit
 V(3, [X=1,F1=1]) = 1      ; reward for picking up F1
 V(3, [X=1,F1=0]) = 0      ; no reward for F1 if already picked up
 V(3, [X=2]) = 0           ; no reward for location 2
 V(3, [X=3,F2=100]) = 100  ; reward for F2
 V(3, [X=3,F2=10]) = 10    ; reward for F2
 V(3, [X=3,F2=0]) = 0      ; no reward for F2 if already picked up
All other cases are already covered or unreachable. So now we go to T=2. The easiest is:
 V(2, [X=4]) = -200   ; we are in the pit, and sure to stay there!
 V(2, [X=1,F1=1]) = 1   ; get the reward, and no action gets anything non-zero
 V(2, [X=1,F1=0]) = 0   ; get no reward, and no action gets anything non-zero
For X=2, the optimal action is R, which lands us in one of several states:
 V(2, [X=2,F2=U,F1=1]) = 0 + 0.4*V(3,[X=3,F2=100]) + 0.4*V(3,[X=3,F2=10]) + 0.1*V(3,[X=2]) + 0.1*V(3,[X=1,F1=1]) = 44.1
 V(2, [X=2,F2=100,F1=1]) = 0 + 0.8*V(3,[X=3,F2=100]) + 0.1*V(3,[X=2]) + 0.1*V(3,[X=1,F1=1]) = 80.1
 V(2, [X=2,F2=10,F1=1]) = 0 + 0.8*V(3,[X=3,F2=10]) + 0.1*V(3,[X=2]) + 0.1*V(3,[X=1,F1=1]) = 8.1
 V(2, [X=2,F2=U,F1=0]) = 44    ; like the case for F1=1, but without the reward for F1
 V(2, [X=2,F2=100,F1=0]) = 80  ; like the case for F1=1, but without the reward for F1
 V(2, [X=2,F2=10,F1=0]) = 8    ; like the case for F1=1, but without the reward for F1
Now, the risky location X=3:
 V(2, [X=3,D=R,I,F2=0]) = -100   ; get no reward, and sure to reach X=4 at T=3
 V(2, [X=3,D=R,I,F2=10]) = -90   ; get reward 10, and sure to reach X=4 at T=3
 V(2, [X=3,D=R,I,F2=100]) = 0    ; get reward 100, and sure to reach X=4 at T=3
 V(2, [X=3,no I,F2=0]) = 0       ; get no reward, and do sensing to avoid the risk of landing in the pit
 V(2, [X=3,no I,F2=10]) = 10     ; get reward 10, and do sensing to avoid the risk of landing in the pit
 V(2, [X=3,no I,F2=100]) = 100   ; get reward 100, and do sensing to avoid the risk of landing in the pit
This is sufficient, as the value function is again defined for all reachable states. Now move on to compute V(1, .) and V(0, .) likewise:
 V(1, [X=4]) = -300   ; we are in the pit, and sure to stay there!
 V(1, [X=1,F1=1,F2=U]) = 1 + 0.8*V(2,[X=2,F2=U,F1=0]) = 36.2
   ; get the reward, and doing R may get us to [X=2,F2=U,F1=0]
 V(1, [X=1,F1=1,F2=100])   ; unreachable (requires 2 actions since T=0)
 V(1, [X=1,F1=1,F2=10])    ; unreachable (requires 2 actions since T=0)
etc.

7) A project leader in the IMG4 consortium would like to construct a decision procedure based on results from hiring employees in the KITE consortium.
The decision procedure will be based on a decision tree, using information in the case database. Attributes for each employee are: degree level (A), Java programming ability (B), self-motivation (C), grade in AI (D). The target attribute is the "decision", describing the amount of money to pay the employee.

a) Construct by hand (using greedy search with the best-information-gain heuristic) a consistent decision tree for the following table:

   input variables    decision
  A   B   C   D
 ------------------------------
  1   F   M   95         8
  1   T   L   70         4
  1   F   L   70         4
  1   T   M   95         6
  1   T   M   80         6
  2   T   H   95         8
  2   T   H   80         8
  2   F   H   95         8

ANSWER: Try each attribute to see what happens. We will do "lazy computation", in that we will not bother with exact computations unless really needed.

Using A we get:
 Case A=1: {8,4,4,6,6}, i.e. distribution {0.2, 0.4, 0.4}.
 Case A=2: {8,8,8}, i.e. distribution {1, 0, 0}, i.e. 0 bits required.
Using B we get:
 Case B=F: {8,8,4}, i.e. distribution {2/3, 1/3, 0}.
 Case B=T: {4,6,6,8,8}, i.e. distribution {0.2, 0.4, 0.4}, as in A=1.
(So B is clearly worse than A, as the number of bits required after getting the answer to B is greater than for A.)
Using C we get:
 Case C=M: {6,6,8}, i.e. distribution {2/3, 1/3, 0}, as in case B=F.
 Case C=L: {4,4}, i.e. 0 bits required.
 Case C=H: {8,8,8}, i.e. 0 bits required.
(So C is clearly better than B, but we still need to figure out (maybe) whether C is better than A.)
Using D we get:
 Case D=95: {8,8,8,6}, i.e. distribution {0.75, 0.25, 0}, which is only slightly better than case C=M.
 Case D=70: {4,4}, i.e. 0 bits required.
 Case D=80: {8,6}, i.e. distribution {0.5, 0.5} with 2 examples (2 bits required).
Therefore C is better than D.

Now we must compare A and C more exactly.
 Case A=1: distribution {0.2, 0.4, 0.4} is more chaotic than {0, 0.5, 0.5}, and thus requires MORE than 1 bit per example, so the total is more than 5 bits. (Of course one can compute the exact number of bits, i.e. -5*(0.2 log 0.2 + 2*0.4 log 0.4), but we will avoid this, being lazy.)
 Case C=M: distribution {2/3, 1/3, 0} is LESS chaotic than {0.5, 0.5, 0}, so it requires LESS than one bit per example, so the total is less than 3 bits.
So C is clearly better than A, and thus C is selected as the root of the tree.

Now, in the recursive calls the only interesting part is the case C=M. In this case, using B we get:
 Case B=F: {8}, i.e. no additional bits required.
 Case B=T: {6,6}, i.e. no additional bits required.
(Note that neither A nor D would help here.)
So the decision tree has only 2 internal nodes, and is:

 C? --- H: 8
     |- L: 4
     |- M: B? --- F: 8
               |- T: 6

b) Is a more compact (fewer internal nodes) decision tree possible for the above table? If not - prove it. If yes, which tree?

ANSWER: In general, the above procedure may not generate the optimal tree. However, in this case it did happen to do so. Proof: clearly 0 internal nodes do not work here, as the target attribute differs between the examples. Also, we tried all possible attributes at the top level, and none of them could fully classify the examples, so there is NO tree with ONE node that works. The result above has 2 nodes, and since 2 is the smallest integer greater than 1, we are done.
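For completeness, the lazy entropy comparisons above can be made exact with a few lines: compute, for each attribute, the expected remaining entropy (in bits) of the decision after the split (minimizing this is equivalent to maximizing information gain):

```python
# Exact version of the "lazy" comparison from question 7a.
from math import log2
from collections import Counter

rows = [  # (A, B, C, D, decision)
    (1, "F", "M", 95, 8), (1, "T", "L", 70, 4), (1, "F", "L", 70, 4),
    (1, "T", "M", 95, 6), (1, "T", "M", 80, 6), (2, "T", "H", 95, 8),
    (2, "T", "H", 80, 8), (2, "F", "H", 95, 8),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def remaining(attr):
    # expected entropy of the decision after splitting on attribute index attr
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[4])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for name, idx in [("A", 0), ("B", 1), ("C", 2), ("D", 3)]:
    print(name, round(remaining(idx), 4))
# C has the smallest remaining entropy, so it is indeed chosen as the root
```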
Deadline: Tuesday, February 24, 2009, at 12 noon.
Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)
* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.