Justify all answers!

1) Consider the following variables and their immediate causes (parents):

   variable   parents
   -------------------------------
   A          none
   E          F
   B          A
   C          A D
   D          none
   F          B D

a) Is this network a poly-tree?

   No, because of the following cycle in the underlying undirected graph:
   A-B-F-D-C-A.

b) Is this network directed-path singly connected?

   Yes, because the only nodes with in-degree greater than 1 are C and F. C is
   reachable only from root nodes A and D, and likewise F is reachable only from
   A, B, and D, each through exactly one directed path.

c) Determine the truth of the following independence statements (using
   d-separation):

   1) I({A}, {D} | {})
      True. There are 2 possible simple paths between A and D. In A-C-D, node C
      is a blocking node, because it is a non-evidence (converging, obviously)
      sink node. In A-B-F-D, node F is a blocking node because it is a
      converging non-evidence node, and none of its children (i.e. E) is an
      evidence node.

   2) I({A}, {D} | {F})
      False. This time path A-B-F-D is not blocked, since F is now a converging
      evidence node.

   3) I({A}, {D} | {F, B})
      True. The path A-B-F-D is now blocked by the pass-through evidence node B.
      Path A-C-D is still blocked by C.

   4) I({A}, {D} | {F, B, C})
      False. Path A-C-D is now unblocked, since C is a converging evidence node.

   5) I({A}, {E} | {F})
      True. All paths from A to E go through F as a pass-through evidence node,
      and are thus blocked by F.

   6) I({A}, {C, F} | {B})
      False. C is an immediate successor of A, and the path A-C cannot be
      blocked.

   7) I({A}, {F} | {B, C})
      False. Path A-B-F is blocked by B, but the path A-C-D-F is unblocked due
      to converging evidence node C and non-evidence diverging node D.

d) Determine at least 2 different ways to convert the network into a tree using
   clustering. Answers:

   1) Create macro-nodes AB and BDF. We get a root node AB, node C with a single
      parent AB, node BDF with a single parent AB, and node E with a single
      parent BDF.

   2) Create a macro-node ABCDEF (not very efficient, but valid).
   3) Create a macro-node ABD. We get ABD as a root node, C with parent ABD, F
      with parent ABD, and E with parent F.

e) Suppose that P(A) = 0.9, P(D) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8, P(C|A) = 1,
   P(C|~A) = 0.5. Find P(A | B, C, E).

   Answer: Actually, there is insufficient data above to compute this value. We
   can compute, say, P(A | B, C), because B is independent of C given A
   (d-separation), but this will not help, because B becomes dependent on C
   given E.

   As an exercise, we compute P(A | B, C) from first principles:

   P(A|B,C) = P(A, B, C) / P(B, C)
            = P(A, B, C) / (P(A, B, C) + P(~A, B, C))
            = P(A) * P(B, C | A) / (P(A) * P(B, C | A) + P(~A) * P(B, C | ~A))

   Note that now we can use I(B, C | A) to get:

   P(A|B,C) = P(A) * P(B|A) * P(C|A) /
              (P(A) * P(B|A) * P(C|A) + P(~A) * P(B|~A) * P(C|~A))
            = 0.9 * 0.1 * 1 / (0.9 * 0.1 * 1 + 0.1 * 0.8 * 0.5)
            = 0.09 / (0.09 + 0.04) = 9/13

2) You need to construct a 4-input neural network that has value 1 in the output
   just when exactly 3 of its inputs are 1, and 0 otherwise (assume inputs are
   in {0, 1}).

a) Can this be done without hidden units?

   No. 2 hyperplanes are needed to separate out the points (0111, 1011, 1101,
   1110) from all the rest: a single threshold unit defines only one hyperplane,
   but the positive points must be separated both from the inputs with fewer
   than 3 ones and from 1111.

b) Show a network using threshold elements that performs the required
   computation (specify the weights, too!)

   There are several such networks; perhaps the simplest is one that uses a
   hidden unit that detects when 1111 is the input. This is done, say, with a
   step function with threshold 3.5 and all inputs with weight 1. The complete
   network has just 2 units: the above hidden unit, and an output unit with 5
   inputs and threshold 2.5. Its input weights are 1, except for the one
   connected to the hidden unit, which has a negative weight (say -10). Clearly,
   if 3 or more inputs have value 1, the output unit is triggered, unless the
   hidden unit is also on, as needed!
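The d-separation answers of question 1(c) can be verified mechanically. Below is a small brute-force checker written for this note (all names are ours, not a library API): it enumerates the simple trails of the undirected skeleton and applies the standard blocking rules (an evidence chain/fork node blocks; a collider blocks unless it or one of its descendants is evidence).

```python
# Directed edges of the network from part (a): parent -> child.
edges = [("A", "B"), ("A", "C"), ("D", "C"), ("D", "F"), ("B", "F"), ("F", "E")]
nodes = {"A", "B", "C", "D", "E", "F"}

children = {n: set() for n in nodes}
parents = {n: set() for n in nodes}
for p, c in edges:
    children[p].add(c)
    parents[c].add(p)

def descendants(v):
    """All nodes reachable from v via directed edges (excluding v itself)."""
    seen, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def simple_paths(x, y):
    """All simple paths from x to y in the undirected skeleton."""
    adj = {n: children[n] | parents[n] for n in nodes}
    def walk(path):
        if path[-1] == y:
            yield path
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                yield from walk(path + [nxt])
    yield from walk([x])

def blocked(path, z):
    """True if the trail is blocked given evidence set z."""
    for i in range(1, len(path) - 1):
        prev, v, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[v] and nxt in parents[v]:       # collider at v
            if v not in z and not (descendants(v) & z):
                return True
        elif v in z:                                       # evidence chain/fork
            return True
    return False

def d_separated(x, y, z):
    return all(blocked(p, set(z)) for p in simple_paths(x, y))

print(d_separated("A", "D", set()))            # 1) True
print(d_separated("A", "D", {"F"}))            # 2) False
print(d_separated("A", "D", {"F", "B"}))       # 3) True
print(d_separated("A", "D", {"F", "B", "C"}))  # 4) False
print(d_separated("A", "E", {"F"}))            # 5) True
print(d_separated("A", "F", {"B", "C"}))       # 7) False
```

The results agree with the seven answers given above (statement 6 follows from the direct edge A-C, which no evidence can block).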
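The two-unit threshold network of question 2(b) can be checked exhaustively over all 16 inputs. The sketch below (the names step and exactly_three are ours) encodes exactly the weights and thresholds given above.

```python
from itertools import product

def step(x, threshold):
    """Threshold unit: outputs 1 iff the weighted input sum exceeds threshold."""
    return 1 if x > threshold else 0

def exactly_three(inputs):
    """Hidden unit (weights 1, threshold 3.5) detects input 1111; the output
    unit (weights 1 on the inputs, -10 on the hidden unit, threshold 2.5)
    then fires iff exactly 3 inputs are 1."""
    hidden = step(sum(inputs), 3.5)
    return step(sum(inputs) - 10 * hidden, 2.5)

for bits in product([0, 1], repeat=4):
    print(bits, exactly_three(bits))
```

Running it shows output 1 precisely on the four inputs with three ones, and 0 everywhere else, including 1111.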
3) A project leader in the IMG4 consortium would like to construct a decision
   procedure based on results from hiring employees in the earlier KITE
   consortium. The decision procedure will be based on a decision tree, using
   information in the case database. Attributes for each employee are: degree
   level (A), industrial work experience (B), self-motivation (C), grade in AI
   (D). The target attribute is the "decision", describing how good the
   employee is for the project.

a) Construct by hand (using the greedy search using the best information gain
   heuristic) a consistent decision tree for the following table:

       input variables       decision
    A     B     C     D
   --------------------------------------------
    2     T     H     95     good
    2     T     H     95     good
    2     F     H     95     good
    1     F     H     95     good
    1     T     L     70     bad
    1     F     L     70     bad
    1     T     M     95     average
    1     T     H     80     average

   We need to check every attribute at each node of the tree. At the top, we
   consider:

   A: for value 1 we get 5 examples: 1 good, 2 bad, and 2 average; total info
      missing: 5*H(0.2, 0.4, 0.4).
      For value 2 we get 3 examples classified good (missing info: 0).

   B: for value T we get 5 examples: 2 good, 1 bad, 2 average (total missing
      info 5*H(0.4, 0.2, 0.4), same as for value 1 in A).
      For value F we get 3 examples: 2 good, 1 bad.
      This means that B is worse than A, without needing to compute an exact
      value!

   C: for value H we get 4 good, 1 average; missing info 5*H(0.8, 0, 0.2).
      For value L we get 2 bad (zero missing info).
      For value M we get 1 average (zero missing info).
      Clearly H(0.8, 0, 0.2) < H(0.2, 0.4, 0.4), so C is better than A.

   D: for value 95 we get 4 good, 1 average; missing info: 5*H(0.8, 0, 0.2).
      For value 70 we get 2 bad (zero missing info).
      For value 80 we get 1 average (zero missing info).
      So C and D are equally good! The greedy algorithm chooses arbitrarily;
      say we pick C.

   Now, in the recursive subtrees, for branch L we return the leaf value bad,
   for branch M we return the leaf value average, and for branch H we need
   further recursion on these 5 examples.
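The attribute comparison at the root can be checked numerically. The sketch below (helper names are ours) computes, for each attribute, the "missing info" used above: the sum over attribute values of (number of examples) times the entropy of the decision labels in that branch.

```python
from math import log2
from collections import Counter

# The 8 training examples from the table: (A, B, C, D, decision).
examples = [
    (2, "T", "H", 95, "good"), (2, "T", "H", 95, "good"),
    (2, "F", "H", 95, "good"), (1, "F", "H", 95, "good"),
    (1, "T", "L", 70, "bad"), (1, "F", "L", 70, "bad"),
    (1, "T", "M", 95, "average"), (1, "T", "H", 80, "average"),
]

def entropy(labels):
    """Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def remaining_info(attr_index):
    """Total missing info after splitting on the given attribute, weighted
    by the number of examples in each branch (as in the hand computation)."""
    by_value = {}
    for ex in examples:
        by_value.setdefault(ex[attr_index], []).append(ex[-1])
    return sum(len(ls) * entropy(ls) for ls in by_value.values())

for name, i in [("A", 0), ("B", 1), ("C", 2), ("D", 3)]:
    print(name, round(remaining_info(i), 3))
```

The output confirms the hand analysis: C and D tie (both 5*H(0.8, 0, 0.2) ≈ 3.61 bits), A is worse (≈ 7.61), and B is worst (≈ 10.36).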
   Note that neither A nor B can fully distinguish between the average and the
   good examples, but using D, if we get the value 80 we can answer average,
   and if we get 95 we can answer good. If we get 70 there are no examples to
   use, so the default "good" should be at that branch. The overall tree is
   thus:

   C: L ---------------------- bad
      M ---------------------- average
      H --------- D: 70 ------ good
                     80 ------ average
                     95 ------ good

b) Is a more compact (fewer internal nodes) decision tree possible for the
   above table? If no - prove it. If yes, which tree?

   No. Proof: a more compact tree would have only ONE internal node. Since
   above we tried all possible one-node trees and none of them were fully
   consistent with the examples, the above tree is optimal in this measure.

4) Consider the following investment scenario, occurring in a certain
   middle-eastern country. We will assume that you are risk-neutral, i.e. that
   your utility function is linear in the amount of money that your assets are
   worth. You have 1,000,000 NIS you wish to invest, with the following
   options:

   1) Buy a house and rent it out for a net gain of 40,000 NIS per year (after
      taxes, etc.)

   2) Buy government bonds for a 5% yearly interest, out of which you need to
      pay 15% capital gains tax. However, there is talk that the capital gains
      tax will be raised to 30% very soon; suppose that you credit this rumor
      with a 0.5 probability of being accurate (and that if false, the capital
      gains tax will remain unchanged).

a) Which of the above investments should you make as a rational agent?

   Answer: Action 1 results in a net gain of 40000 NIS, so assuming utility is
   linear in money (risk neutrality) we will just measure utility in NIS. For
   action 2, there are two outcomes:

   1) With probability 0.5, you gain 50000 NIS but pay 15% of that in taxes
      (that is, 7500 NIS), for a net gain of 42500 NIS.

   2) With probability 0.5, you gain 50000 NIS but pay 30% of that in taxes
      (that is, 15000 NIS), for a net gain of 35000 NIS.
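A quick arithmetic check of the two bond outcomes and the resulting expectation (integer arithmetic; variable names are ours):

```python
house = 40_000                        # net yearly gain from renting the house
gross = 1_000_000 * 5 // 100          # 50,000 NIS bond interest
bonds_low = gross * 85 // 100         # 42,500 NIS if tax stays at 15%
bonds_high = gross * 70 // 100        # 35,000 NIS if tax rises to 30%
expected_bonds = (bonds_low + bonds_high) / 2   # each outcome has prob. 0.5
print(expected_bonds)                 # 38750.0
```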
   Your expected utility is thus: 0.5 * 42500 + 0.5 * 35000 = 38750, which is
   STRICTLY LESS than buying the house, so you should buy the house...

b) Suppose that a "friend" working at the treasury department can give you a
   "hint" on whether the capital gains tax will actually be raised. Given that
   your intended investment is for exactly one year, how much (maximum) should
   you as an intelligent agent be willing to pay this employee, given that 1)
   he is pessimistic, i.e. he will always say that taxes will be raised if they
   are, but if they are NOT he will say with probability 0.1 (erroneously) that
   there will be taxes, and 2) moral considerations are not part of your
   utility function.

   Answer: Suppose you got the hint for free; how much would you gain on
   average? Since you believe that the probability of increased taxation is
   0.5, and that your "friend" is always accurate IF taxes are to be increased,
   but will be wrong with probability 0.1 if they are not, then you believe:
   with probability 0.5 * 0.9 = 0.45 he will say that taxes will not increase,
   and with probability 0.55 he will say that taxes will increase.

   After receiving the hint, if the hint is "taxes will not increase" then you
   can be sure that they will not, so you should then go for the bonds, for a
   gain of 42500 - a decision you would make only after receiving the hint,
   gaining you 2500 more than the action you would have taken otherwise.

   If the hint is "taxes will increase", then with probability 0.5/0.55 they
   will indeed increase, and with probability 0.05/0.55 they will not. In this
   case, buying the house always gets you 40000, but purchasing the bonds gets
   you:

   E(bonds) = 0.5/0.55 * 35000 + 0.05/0.55 * 42500 < 40000

   so we obviously prefer the real-estate in this case.
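The hint analysis can be verified numerically (a sketch; all variable names are ours):

```python
p_raise = 0.5                                   # prior belief the tax rises
p_say_raise = p_raise * 1.0 + (1 - p_raise) * 0.1   # 0.55: hint says "raise"
p_say_no = (1 - p_raise) * 0.9                      # 0.45: hint says "no raise"

# Hint "no raise" is always correct: switch to bonds, gaining 2500 over house.
gain_if_no = 42_500 - 40_000

# Hint "raise": posterior P(raise | hint) = 0.5/0.55, so the bonds yield
e_bonds_given_raise = (p_raise * 35_000 + (1 - p_raise) * 0.1 * 42_500) / p_say_raise
assert e_bonds_given_raise < 40_000            # still better to buy the house

value_of_info = p_say_no * gain_if_no + p_say_raise * 0
print(value_of_info)                           # 1125.0
```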
   In the "taxes will increase" case the net gain due to the information is
   zero, since the preferred action is the same as before receiving the
   information. Thus, the expected value of the information is:

   VI = 0.55 * 0 + 0.45 * 2500 = 1125 NIS

   and you should be willing to pay anything below that amount for the hint,
   if we ignore other considerations... (Note that the above is not "value of
   perfect information" (VPI), since the information we receive is noisy -
   thus not perfect.)

5) a) Consider a world with the following operators:

   Op(Action: pickup(x)
      Precond: Location(Robot, y) and Location(x, y) and not Holding(x)
               /* Exists y such that... */
      Effect: Holding(x))

   Op(Action: go(from, to)
      Precond: Location(Robot, from)
      Effect: Location(Robot, to))

   The semantics of the Location predicate is such that an object can only be
   at one location.

a1) The initial state is:
       Location(Robot, Room1) and not Holding(Box1) and Location(Box1, Room2)
    The final state should have: Holding(Box1)

    Show the steps for solving this planning problem using partial-order
    planning (POP).

    Answer: The initial plan contains dummy operators for start and end, as
    follows:

    StartOp: (a dummy operator)
       Action: null
       Precond: null
       Effect: Location(Robot, Room1) and not Holding(Box1) and
               Location(Box1, Room2)

    EndOp: (a dummy operator)
       Action: null
       Precond: Holding(Box1)
       Effect: null

    Ordering: StartOp < EndOp

    There is only one unmet precondition in the plan, Holding(Box1). Since
    there is no operator in the plan with this effect, POP adds a step:

    OpPickup1:
       Action: pickup(Box1)
       Precond: Location(Robot, y) and Location(Box1, y) and not Holding(Box1)
       Effect: Holding(Box1)

    and the ordering constraints OpPickup1 < EndOp and StartOp < OpPickup1, and
    marks OpPickup1 as the operator enabling the precondition of EndOp. (Note
    the instantiation of x to Box1.)

    Now the plan has 3 unmet preconditions. Suppose we start with
    Location(Box1, y).
    This can be met by StartOp if we instantiate y to Room2, marking StartOp as
    the operator enabling this precondition. No additional ordering is needed.

    Another unmet precondition is not Holding(Box1), and this can also be fixed
    by marking StartOp as the operator enabling this precondition.

    Finally, we need to handle Location(Robot, Room2). Here there is no step
    that can enable this precondition, so we need to add an operator:

    OpGo1:
       Action: go(from, Room2)
       Precond: Location(Robot, from)
       Effect: Location(Robot, Room2)

    We add the constraints StartOp < OpGo1 and OpGo1 < OpPickup1, and mark
    OpGo1 as the step enabling the precondition Location(Robot, Room2) of
    OpPickup1. Note that no ordering or other constraints have been violated so
    far.

    The only remaining unmet precondition is Location(Robot, from), and this
    can be met by the effect of StartOp with from instantiated to Room1. Since
    this causes no constraint violations, and all preconditions are met, the
    algorithm returns success. The final plan is:

    StartOp < OpGo1 < OpPickup1 < EndOp

    (This partially ordered plan happens to be a fully ordered, i.e.
    sequential, plan.)

a2) Add an operator that supports the robot carrying the box, and show how,
    from the initial state, a plan is developed for achieving
    Location(Box1, Room1) using POP.

    One way to do this is to make the go operator have special cases for when
    the robot is carrying something.
    Less general, but simpler, is to add a carry operator, as follows:

    Op(Action: carry(object, from, to)
       Precond: Location(Robot, from) and Holding(object)
       Effect: Location(Robot, to) and Location(object, to))

    Now the new problem works as follows. Initially we have the dummy
    operators:

    StartOp: (a dummy operator)
       Action: null
       Precond: null
       Effect: Location(Robot, Room1) and not Holding(Box1) and
               Location(Box1, Room2)

    EndOp: (a dummy operator)
       Action: null
       Precond: Location(Box1, Room1)
       Effect: null

    Ordering: StartOp < EndOp

    To satisfy the precondition of EndOp we must add a step using the carry
    operator:

    OpCarry1:
       Action: carry(Box1, from, Room1)
       Precond: Location(Robot, from) and Holding(Box1)
       Effect: Location(Robot, Room1) and Location(Box1, Room1)

    with ordering constraints StartOp < OpCarry1 and OpCarry1 < EndOp. We also
    mark OpCarry1 as enabling the precondition Location(Box1, Room1).

    Now we need to meet the precondition Holding(Box1). This part is handled
    exactly as before, which adds the steps OpPickup1 and OpGo1, and the
    ordering:

    StartOp < OpGo1 < OpPickup1 < OpCarry1 < EndOp

    We are left with the unmet precondition Location(Robot, from) of OpCarry1.
    That one can be met by the effect Location(Robot, Room2) of OpGo1,
    instantiating from to Room2, so no further changes are needed.

    Note that we assumed in these answers that the POP algorithm makes the
    right choices when faced with several options - otherwise the process could
    be longer and require backtracking!
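As a sanity check of the final plan in a2, here is a hypothetical forward simulation of the linearized plan go, pickup, carry (this is ordinary state-space execution, not POP itself; the state representation and helper functions are ours):

```python
# State = set of ground literals, each a tuple (predicate, args...).
state = {("Location", "Robot", "Room1"), ("Location", "Box1", "Room2")}

def go(state, frm, to):
    assert ("Location", "Robot", frm) in state
    return state - {("Location", "Robot", frm)} | {("Location", "Robot", to)}

def pickup(state, obj, loc):
    assert ("Location", "Robot", loc) in state
    assert ("Location", obj, loc) in state
    assert ("Holding", obj) not in state      # the "not Holding(x)" precond
    return state | {("Holding", obj)}

def carry(state, obj, frm, to):
    assert ("Location", "Robot", frm) in state
    assert ("Holding", obj) in state
    return (state - {("Location", "Robot", frm), ("Location", obj, frm)}
                  | {("Location", "Robot", to), ("Location", obj, to)})

s = go(state, "Room1", "Room2")       # OpGo1
s = pickup(s, "Box1", "Room2")        # OpPickup1
s = carry(s, "Box1", "Room2", "Room1")  # OpCarry1
print(("Location", "Box1", "Room1") in s)   # True: the goal is achieved
```

Every operator's preconditions are asserted before applying its effects, so the run also confirms that the ordering StartOp < OpGo1 < OpPickup1 < OpCarry1 < EndOp is executable.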