Justify all answers!

1) We are given the following independence statements about random variables A, B, C, D, E, F:

   I({A,B,C,D},{E,F}|{})
   I({A},{C,D}|{B})
   I({D},{A,B}|{C})
   not I({A},{B}|{})
   not I({B},{C}|{})
   not I({C},{D}|{})
   not I({E},{F}|{})

a) Construct a Bayes network that is consistent with the above set of independence statements.

ANSWER: The first independence statement means there are no edges between any of {A,B,C,D} and any of {E,F}, so we have two separate components. In the {E,F} component, due to the last dependence statement, there is a path between E and F; as there are only 2 nodes there, this means an edge between E and F (say the directed edge from E to F). In the {A,B,C,D} component there is a path from A to B, from B to C, and from C to D that contains no colliders ("converging" nodes), due to the dependence statements. However, due to the 2nd and 3rd independence statements, there is no edge from A to C or D, and likewise no edge from A or B to D. Since this part of the graph is connected, A must have an edge with B, D must have an edge with C, and B must have an edge with C. This means that this part of the graph is a chain A,B,C,D with no colliders: B is not a collider, because B must block the path A-B-C, and likewise C is not a collider. So one possible way to make a directed graph here is A to B to C to D.

b) Is the answer above unique? If not, what is the set of possible BNs that is consistent with all the statements?

ANSWER: Not unique. The edge between E and F can be in either direction. Also, the chain A-B-C-D can be oriented in any way that creates no colliders, so essentially any of the nodes A, B, C, D can be the "root", with edges directed away from it, and the rest of this component is then determined uniquely.

2) Consider the following variables and their immediate causes (parents):

   variable   parents
   -------------------
   A          none
   B          none
   C          A
   D          A
   E          C, B
   G          D, B
   H          E

a) Is this network a poly-tree?
ANSWER: No, due to the undirected cycle A-C-E-B-G-D-A.

b) Is this network (directed-path) singly connected?

ANSWER: Yes: the only simple undirected cycle contains four direction changes, and thus there is no pair of nodes with more than one directed path between them.

c) Determine the truth of the following independence statements (using d-separation):

1) I({D}, {E} | {})
ANSWER: No, the path D-A-C-E is unblocked (A is a non-evidence diverging node, and C is a non-evidence pass-through node, so neither blocks the path).

2) I({D}, {E} | {A})
ANSWER: Yes, the path D-A-C-E is now blocked by A (a diverging evidence node), and the path D-G-B-E is blocked by G (a non-evidence, childless, converging "barren" node).

3) I({D}, {E} | {A, G})
ANSWER: No, the path D-G-B-E now has a converging evidence node G and a non-evidence diverging node B, neither of which blocks it.

4) I({B}, {E} | {D})
ANSWER: No, there is a direct edge from B to E.

5) I({B}, {C} | {})
ANSWER: Yes, the path B-G-D-A-C is blocked by the barren node G, and the path B-E-C is blocked by the barren node E.

6) I({B}, {C} | {G})
ANSWER: No, the path B-G-D-A-C is no longer blocked by G, and D, A are non-evidence pass-through and diverging nodes, respectively.

d) Assume that the nodes are binary-valued. Suppose that:

   P(A=true) = 0.2,            P(B=true) = 0.1,
   P(E=true|C=false) = 0.5,    P(E=true|C=true) = 0.5,
   P(H=true|E=true) = 0.2,     P(H=true|E=false) = 0.4,
   P(C=true|A=true) = 0.9,     P(C=true|A=false) = 0.2,
   P(D=true|A=true) = 0.8,     P(D=true|A=false) = 0.1

Find P(C=true | D=true, G=true)

Hint: you may want to preprocess to get rid of nodes before doing any computation.

ANSWER: H is barren, so it is dropped. Now E is barren, so it is dropped as well. Now the only path from C to G is C-A-D-G, so d-sep(C, G | D) holds in the stripped graph. So we have:

   P(C=true | D=true, G=true) = P(C=true | D=true)

In this new query, G is also barren and dropped, and then B is barren and dropped. So we are left with just A, C, D, and can use simple enumeration.
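The d-separation claims in part (c), as well as the d-sep(C, G | D) preprocessing step above, can be checked mechanically. The following sketch (names and structure are ours) implements the standard test: X and Y are d-separated given Z iff they are disconnected in the moralized ancestral graph of X, Y, Z after deleting Z.

```python
from itertools import combinations

# Directed edges of the network in question 2 (parent -> child).
EDGES = [('A', 'C'), ('A', 'D'), ('C', 'E'), ('B', 'E'),
         ('D', 'G'), ('B', 'G'), ('E', 'H')]

def ancestors(edges, nodes):
    """Nodes with a directed path into `nodes`, including `nodes` themselves."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if v in result and u not in result:
                result.add(u)
                changed = True
    return result

def d_separated(edges, xs, ys, zs):
    """Moralized-ancestral-graph d-separation test."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    und = {frozenset(e) for e in sub}              # undirected skeleton
    for node in keep:                              # moralize: marry co-parents
        parents = [u for u, v in sub if v == node]
        for p, q in combinations(parents, 2):
            und.add(frozenset((p, q)))
    live = keep - set(zs)                          # delete evidence nodes
    adj = {n: set() for n in live}
    for e in und:
        u, v = tuple(e)
        if u in live and v in live:
            adj[u].add(v)
            adj[v].add(u)
    frontier, seen = list(xs), set(xs)             # flood-fill from X
    while frontier:
        for m in adj[frontier.pop()]:
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return not (seen & set(ys))
```

Running this on the six queries of part (c) reproduces the answers No, Yes, No, No, Yes, No, and it also confirms C is d-separated from G given D.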
P(C | D=true) = alpha (P(C, A=false, D=true) + P(C, A=true, D=true))
              = alpha (P(C|A=false)P(D=true|A=false)P(A=false) + P(C|A=true)P(D=true|A=true)P(A=true))
              = alpha [ 0.2 * 0.1 * (1-0.2) + 0.9 * 0.8 * 0.2,           /* for C=true */
                        (1-0.2) * 0.1 * (1-0.2) + (1-0.9) * 0.8 * 0.2 ]  /* for C=false */
              = alpha [ 0.16, 0.08 ]
              = [ 2/3, 1/3 ]

That is, P(C=true | D=true, G=true) = P(C=true | D=true) = 2/3

3) You need to construct a 3-input neural network that has 2 outputs: the first output has value 1 just when the number of its inputs that are "on" is even, and the other has value 1 just when the number of its inputs that are "on" is greater than or equal to 2 (assume inputs are in {0, 1}).

a) Can this be done without hidden units?

ANSWER: Majority can be done without hidden units (see below), but parity is not linearly separable, so overall the answer is: no.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!)

ANSWER: The function computing "greater than or equal to 2" is simple: use a unit U1 connected to all inputs, with input weights of 1 and a threshold of 1.5.

Computing "even" requires hidden units, but we will "borrow" the above majority unit as one of them (it is thus a "visible hidden unit": it sits in an internal layer, but its output also appears as a system output). The output unit U2 will be connected to all inputs with a weight of -1, and to the output of U1 with a weight of 2. Its threshold will be -0.5, so that it provides an output of 1 when none of the inputs is active (zero is even). If one of the inputs is on, the weighted sum is -1 and the output is 0. If 2 inputs are on, the weighted sum is -1-1+2 = 0 and the output is 1. Finally, when 3 inputs are on, the weighted sum is -3+2 = -1 and the output is again 0, as required.
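The weights above can be validated by simulating the two threshold units over all eight input combinations. A minimal sketch (the function names are ours; each unit fires iff its weighted input sum exceeds its threshold):

```python
def unit(inputs, weights, threshold):
    """Linear threshold element: outputs 1 iff the weighted sum exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def network(x1, x2, x3):
    """Returns (even-parity output, at-least-2 output) for inputs in {0, 1}."""
    u1 = unit((x1, x2, x3), (1, 1, 1), 1.5)             # U1: majority (>= 2 inputs on)
    u2 = unit((x1, x2, x3, u1), (-1, -1, -1, 2), -0.5)  # U2: parity, using U1 as a hidden input
    return u2, u1
```

Enumerating all inputs confirms that the first output is 1 exactly when an even number of inputs is on, and the second exactly when at least 2 are on.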
4) a) For the blocks world as defined in class, show how POP constructs the plan (that is, show all partial plans from the initial plan until a correct plan is reached) for the initial state:

   D
   A  C  B
   ------------

and the goal state containing: On(A,B) and On(B,D)

b) Suppose it is not known whether the above is the initial state: the initial state can be either the above state or the following state, and these are the only possibilities.

   D  B
   A  C
   ------------

Suppose also that there is an operator "Check(clear(x))" which results in knowing whether clear(x) is true or not. Show how a conditional plan is constructed for the above case.

5) Consider the following scenario, occurring in a certain middle-eastern country.* We will assume that you are risk-neutral, i.e. that your utility function is linear in the value unit discussed below (morts). Assume also that moral considerations are not part of your utility function.

A well-known band of criminals has stolen your mojo(**), is threatening to destroy it immediately (a damage of 1 mort), and will only return it to you if you release their band member, the great Humongous(***). Surrendering Humongous causes no immediate material damage, but may cause you a loss in the future, as Humongous is a notorious criminal: with probability 0.5 this will cause a loss of 1 mort next fortnight, and additionally, with probability 0.4, independent of the above, it will cause a damage of 10 morts one fortnight after the next.

Your options are as follows:
a) Do nothing, and then the criminals destroy your mojo.
b) Release Humongous, and possibly incur the respective future losses.
c) Request the police to try a raid to recover your mojo at location A.
d) Request the police to try a raid to recover your mojo at location B.

Since you do not know where your mojo is hidden, the results of a raid are: with probability 0.4 they will succeed at location A, and with probability 0.9 at location B, assuming your mojo is at that location.
The probability of the mojo being at A is 0.6, and it is certain that your mojo is either at A or at B. A failed raid causes immediate and permanent loss of your mojo.

5.1) What is your optimal course of action? Consider two reward accumulation functions:

a) Simple sum of rewards.

ANSWER:
Policy a costs 1.
Policy b costs 0.5*1 + 0.4*10 = 4.5.
Policy c succeeds with probability 0.4*0.6 = 0.24, and in all other cases we lose 1 mort, so its expected cost is 1-0.24 = 0.76.
Policy d succeeds with probability 0.9*0.4 = 0.36, so its expected cost is 1-0.36 = 0.64.
So clearly policy d (raid at B) is the best.

b) Discounted sum of rewards with a discounting factor of 0.5 (per fortnight).

ANSWER: Same cost for all policies except for b, which now has expected discounted cost:
   0.5*(0.5 + 0.5*0.4*10) = 1.25
Not quite as bad, but still the worst one.
(Not part of the answer:) Perhaps with a discount factor of 0.3, if we are really myopic, it will look better. Indeed: 0.3*(0.5 + 0.3*0.4*10) = 0.51

5.2) Would the answer be different if, in releasing Humongous, there is in addition a probability of 0.5 that the band of criminals will steal your mojo again after 2 fortnights (and then leave you with the same options)?

ANSWER: No change, because policy b will then be even worse!

5.3) In 5.1, suppose that an informer can provide information that will tell with certainty whether your mojo is at A or at B. How much should you be willing to pay for such information (in both cases a and b)?

ANSWER: Since raiding is optimal, this will make raiding even better. If the answer is A (probability 0.6), we change to raiding A and succeed with probability 0.4, a gain of 0.4 in this case (because ordinarily we would raid at B and lose 1). If the answer is B (probability 0.4), we still raid at B and succeed with the same probability, so there is no gain in this case. The total expected value of information is thus 0.6*0.4 = 0.24.
Note that the total cost of this policy is 0.64-0.24 = 0.4 if the information is free.
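The expected-cost bookkeeping in 5.1 and 5.3 is small enough to check mechanically. A sketch reproducing the numbers above (variable names are ours):

```python
p_a_loc, p_b_loc = 0.6, 0.4   # prior on the mojo's location (A vs. B)
succ_a, succ_b = 0.4, 0.9     # raid success probability at A / at B, given the mojo is there

cost_do_nothing = 1.0
cost_release = 0.5 * 1 + 0.4 * 10        # expected future losses from freeing Humongous
cost_raid_a = 1 - succ_a * p_a_loc       # a failed raid loses the mojo (1 mort)
cost_raid_b = 1 - succ_b * p_b_loc

# 5.1b: discounting at 0.5/fortnight only changes the release option,
# whose losses arrive one and two fortnights out:
cost_release_disc = 0.5 * (0.5 * 1) + 0.5**2 * (0.4 * 10)

# 5.3: expected value of perfect information about the location:
cost_informed = 1 - (p_a_loc * succ_a + p_b_loc * succ_b)
voi = cost_raid_b - cost_informed
```

This confirms that raiding B is the cheapest option (0.64), that the discounted release cost is 1.25, and that the value of the informer's report is 0.24.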
(This is also equal to 1 minus the new overall probability of success: 1-(0.6*0.4+0.4*0.9) = 0.4.)
Again, discounting makes no difference here.

6) Consider the following Syrian traveller problem instance. The (undirected) graph has 3 vertices, as follows: (I, V1, G). The edges are as follows:

   (I, V1), weight 10, no terrorists.
   (V1, G), weight 20, probability of terrorists 0.2
   (I, G), weight 100, no terrorists

Vertex I contains chems and an army, and G is a goal vertex.

b1) Formalize the above problem as a (belief-state) MDP, i.e. what are the states, the transition probability function, and the reward function?

ANSWER: We use the following state variables: location of agent A, location of chems C, location of army M, and belief state T of edge (V1,G). The domain of each of the first 3 variables is [I, V1, G], and the belief state of the edge is one of [true, false, U] (terrorists present, absent, or unknown).

Actions are defined by triples [edge, chems, army], with edge being one of (I,V1), (V1,G), or (I,G), and chems, army being binary valued (true, false). We will define transitions only for actions that are legal (illegal ones can just return to the same state with probability 1, for example). The number of possible transitions is huge, so we will just provide an outline. The following transitions are with probability 1:

   [I, x, y, true] with action [(I,G), false, false] leads to [G, x, y, true]  /* move to G */
   [I, I, y, true] with action [(I,G), true, false] leads to [G, G, y, true]   /* move to G with chems */
   [I, y, I, true] with action [(I,G), false, true] leads to [G, y, G, true]   /* move to G with army */
   [I, I, I, true] with action [(I,G), true, true] leads to [G, G, G, true]    /* move to G with army and chems */

(Note that variables like x or y mean "for any value of x, y, etc.") The same holds with the belief state of (V1,G) being false. Also, repeat all of the above for the actions of moving across edge (I,V1) and across edge (V1,G), whenever the action is legal.
The only transitions that are stochastic are moves starting from I when the belief state of edge (V1,G) is U, for example:

   [I, x, y, U] with action [(I,G), false, false]
      leads to [G, x, y, true] with probability 0.2
      leads to [G, x, y, false] with probability 0.8

Likewise for all moves to V1 or G when the state of the edge is unknown.

The reward function R(S,A) depends on the action, and is the negative of the weight of the edge mentioned in the action, multiplied by 2 for each argument of the action that has the value "true". This is a terminal-state belief MDP, and all states of the form [x, G, y, z] are terminal (that is, any state with the chems at G).

b2) Find the optimal policy. You may assume that the robot starts at vertex I in order to save some computation.

ANSWER: The assumption is that the initial state is [I, I, I, U].

We will apply a mix of expectimax search and value iteration in order to solve the problem. Note that here all rewards are negative, so we want the solution to be as least negative as possible. Terminal states have a value of 0, and all other states can start with an initial value of -infinity.

Now, from the initial state we can always traverse (I,G) with the chems (cost 200) to reach a terminal state, so we can update U([I, I, I, U]) = -200. We can use this to prune the search: any policy which is known to have expected utility below -200 can be discarded, even if not fully defined or developed!

We will continue processing top-down for now. From the initial state, moving with the army along (I,G) already costs 200 and does not reach a goal state, so it cannot be optimal. Moving along (I,G) without army or chems needs to be checked, but later we will show that it is also suboptimal. (By which we mean that NO policy which begins with this action is optimal.) There are 4 more possible actions: moving along (I,V1), with or without chems, with or without army.
Checking the best of these requires knowing the values of the states reached as a result, so we will try to compute these next.

First, consider the state [V1,V1,x,false]. From here we can reach [G,G,x,false] at a cost of 40, by moving to G with the chems, so we update:
   U([V1,V1,x,false]) = -40
Next, consider the state [V1,V1,V1,true]. From there we can reach the state [G,G,G,true] at a cost of 80, by moving to G with chems and army, so we update:
   U([V1,V1,V1,true]) = -80

So, from the initial state we can move to V1 with chems and army, cost 40, and land in [V1,V1,V1,true] with probability 0.2 and in [V1,V1,V1,false] with probability 0.8, so we can update:
   U([I, I, I, U]) = -40 - 0.2*80 - 0.8*40 = -88

Now consider the value of the state [I, V1, I, true], i.e. chems moved, army and agent at I. We can get to [V1,V1,V1,true] with certainty at a cost of 20 (moving to V1 with the army), so we can update:
   U([I,V1,I,true]) = -20 - 80 = -100
So now at state [V1,V1,I,true] we can get to [I, V1, I, true] by the agent moving to I at a cost of 10, and can update:
   U([V1,V1,I,true]) = -10 - 100 = -110

The action of moving with the chems from the initial state to V1 costs 20 and lands us either in [V1, V1, I, true] (with probability 0.2) or in [V1, V1, I, false] (with probability 0.8), so taking this action from the initial state has value:
   -20 - 0.2*110 - 0.8*40 = -74
So we can update U([I, I, I, U]) = -74.

Finally, from the initial state we can move to V1 without chems or army, at a cost of 10, reaching [V1,I,I,true] with probability 0.2 and [V1,I,I,false] with probability 0.8.
From [V1,I,I,true], returning to I and then travelling with army and chems to [V1,V1,V1,true] costs 50, so we update:
   U([V1,I,I,true]) = -50 - 80 = -130
From [V1,I,I,false], returning to I and then travelling with chems to [V1,V1,I,false] costs 30, so we update:
   U([V1,I,I,false]) = -30 - 40 = -70
So moving without chems or army from the initial state has value:
   -10 - 0.2*130 - 0.8*70 = -92
which is worse than moving with the chems.
All other policies can be shown to be worse, so the best policy from the initial state is to move to V1 with the chems. If (V1,G) has no terrorists, continue to G with the chems, total cost 60. Otherwise, return to I and bring the army to V1, and then proceed with army and chems to G: total cost 20+10+20+80 = 130. The expected cost of this policy, as indicated earlier by the value of the initial state, is 74.

b3) In addition, from node I only, and before it starts moving, the agent can perform a SENSING action at a cost of 10, which results in the agent knowing the state of the edge (V1,G), i.e. whether it contains terrorists. Re-compute the optimal policy for this case.

ANSWER: Doing the sensing action lands us in [I, I, I, true] with probability 0.2, and in [I, I, I, false] with probability 0.8.
   U([I, I, I, false]) = -60, and U([I, I, I, true]) = -120
So the policy starting with sensing has value:
   -10 - 0.2*120 - 0.8*60 = -82
which is worse.

Alternately, compute the value of information. With probability 0.8 we will learn that there are no terrorists, and will start by travelling with the chems as before (zero gain). With probability 0.2 we will learn that there are terrorists, and will therefore also travel with the army, paying a total of 120 (a gain of 10 compared to the 130 in this case). So the total value of information here is 0 + 0.2*10 = 2, meaning it is not a good idea to pay 10 for it! (Note how this policy overall costs (sensing cost - VOI) = 10-2 = 8 more than not sensing.)

7) A principal researcher in the Israeli Smart Grid (ISG) consortium would like to construct a decision procedure based on results from hiring employees in the DARPA Robotics Challenge project. The decision procedure will be based on a decision tree, using information in the case database. Attributes for each employee are: degree level (A), C programming ability (B), self motivation (C), and number of papers published in AI workshops (D). The target attribute is the "decision": whether to hire or not.
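The value backups for question 6 can be verified with a short script. This is a sketch under the reward convention above (edge weight, doubled once for carrying the chems and once again for the army); the variable names are ours, and the state values follow the hand computation:

```python
def cost(weight, chems=False, army=False):
    """Traversal cost: edge weight, doubled per carried item (chems, army)."""
    return weight * (2 if chems else 1) * (2 if army else 1)

P = 0.2  # probability that edge (V1,G) has terrorists

# Backed-up state values (state = [agent, chems, army, belief of (V1,G)]):
U_v1v1x_false = -cost(20, chems=True)                 # [V1,V1,x,false]: go to G with chems
U_v1v1v1_true = -cost(20, chems=True, army=True)      # [V1,V1,V1,true]: go to G with both
U_iv1i_true = -cost(10, army=True) + U_v1v1v1_true    # [I,V1,I,true]:   bring army to V1
U_v1v1i_true = -cost(10) + U_iv1i_true                # [V1,V1,I,true]:  agent returns to I
U_v1ii_true = -cost(10) - cost(10, chems=True, army=True) + U_v1v1v1_true
U_v1ii_false = -cost(10) - cost(10, chems=True) + U_v1v1x_false

# Candidate first moves from the initial belief state [I, I, I, U]:
v_chems_army = -cost(10, chems=True, army=True) + P * U_v1v1v1_true + (1 - P) * U_v1v1x_false
v_chems = -cost(10, chems=True) + P * U_v1v1i_true + (1 - P) * U_v1v1x_false
v_alone = -cost(10) + P * U_v1ii_true + (1 - P) * U_v1ii_false
v_direct = -cost(100, chems=True)                     # take (I,G) with chems immediately
v_sense = -10 + P * (-cost(10, chems=True, army=True) - cost(20, chems=True, army=True)) \
              + (1 - P) * (-cost(10, chems=True) - cost(20, chems=True))   # b3
```

The script backs up -88 for moving with chems and army, -74 for moving with chems only (the optimum), -92 for moving empty-handed, -200 for the direct route, and -82 for sensing first.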
a) Construct by hand (using greedy search with the best information gain heuristic) a consistent decision tree for the following table:

   input variables      decision
   A  B  C  D
   -----------------------------
   2  F  L  0           N
   2  F  M  1           Y
   2  T  L  0           N
   3  F  H  0           Y
   3  T  M  2           Y
   3  T  M  0           N
   3  T  H  1           Y

ANSWER: Try each of the attributes at the top level.
With A:
   branch 2: distribution 1Y, 2N
   branch 3: distribution 3Y, 1N
With B:
   branch F: distribution 2Y, 1N
   branch T: distribution 2Y, 2N
With C:
   branch L: distribution 0Y, 2N
   branch M: distribution 2Y, 1N
   branch H: distribution 2Y, 0N

Note that the amount of information needed to classify branch A:2 is the same as for B:F and the same as for C:M. But the other branches of A and B need more information, whereas the other branches of C need none, so C is the best of these three. Now consider D:
   branch 0: 1Y, 3N
   branch 1: 2Y, 0N
   branch 2: 1Y, 0N
Branches D:1 and D:2 need no further information, so we need to compare the information needed for C:M versus D:0, i.e. compare 3*H(1/3, 2/3) to 4*H(1/4, 3/4). Here we actually need to compute the information; we get:
   for C: approx 2.755 bits
   for D: approx 3.245 bits
So C is better, and we choose C.

A recursive call occurs on branch C:M, where we now consider attributes A, B, D. Neither A nor B allows full classification, but with D we get:
   C:M, D branch 0: N
   C:M, D branch 1: Y
   C:M, D branch 2: Y
D is preferred, completing the tree, which is:
   C:
      branch L: N
      branch M: check D
         branch 0: N
         branch 1: Y
         branch 2: Y
      branch H: Y

b) Is a more compact (fewer internal nodes) decision tree possible for the above table? If no, prove it. If yes, which tree?

ANSWER: This tree has 2 internal nodes. Since above we tried all possible trees with 1 internal node and all of them failed to classify the table, this tree has the minimum number of internal nodes (among trees consistent with the table).
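The entropy comparison between splitting on C and splitting on D can be computed directly. A minimal sketch (H and the `info_*` names are ours; each term is branch-size times the entropy of that branch's Y/N counts):

```python
from math import log2

def H(*counts):
    """Entropy in bits of the class distribution given by branch counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Expected remaining information (bits) after the candidate top-level splits,
# weighted by the number of examples in each branch:
info_C = 2 * H(0, 2) + 3 * H(2, 1) + 2 * H(2, 0)   # branches L, M, H
info_D = 4 * H(1, 3) + 2 * H(2, 0) + 1 * H(1, 0)   # branches 0, 1, 2
```

Only the impure branches contribute: info_C reduces to 3*H(1/3, 2/3) and info_D to 4*H(1/4, 3/4), confirming that C needs less follow-up information than D.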
Deadline: Wednesday, January 15, 2014, at 12 noon.
Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)
* Disclaimer: any seeming identification with existing countries, institutions, political parties, or people, is unintentional.
** Mojo: copyright some film studio we do not recall.
*** The Great Humongous: copyright some other film studio we do not recall.