Justify all answers!

1) We have the following distribution over the binary variables A, B, C, given as a full table:

   A   B   C   probability
   -------------------------------
   F   F   F   0.81
   F   F   T   0
   F   T   F   0
   F   T   T   0.09
   T   F   F   0
   T   F   T   0.09
   T   T   F   0.01
   T   T   T   0

 a) Construct the Bayes network that most compactly describes the distribution (i.e. one that has the smallest number of arcs).
 b) Is the answer unique?

2) Consider the following variables and their immediate causes (parents):

   variable   parents
   -------------------------------
   A          none
   B          A
   C          B, E
   D          A
   E          B, D
   F          D

 a) Is this network a poly-tree?
 b) Determine the truth of the following independence statements (using d-separation):
    1) I({B}, {D} | {})
    2) I({B}, {D} | {A})
    3) I({B}, {D} | {A, C})
    4) I({B}, {F} | {A})
    5) I({B}, {F} | {A, C})
 c) Assume that the nodes are binary-valued. Suppose that:
       P(A=true) = 0.4
       P(D=true|A=true) = 0.5,  P(D=true|A=false) = 0.1
       P(E=true|D=true) = 0.5,  P(E=true|D=false) = 0.2
    Find P(A=true | D=true, F=true)
    Hint: you may want to preprocess to get rid of nodes before doing any computation.

3) You need to construct a 5-input neural network that has value 1 in the output just when the number of its inputs that are on is either 0 or 3 (assume inputs are in {0, 1}).
 a) Can this be done without hidden units?
 b) Show a network using threshold elements that performs the required computation (specify the weights, too!).

4) a) Using the perceptron learning rule (the "delta" rule), show the steps of learning the 2-input NAND function with a single perceptron and a step activation function, with initial threshold 1 (note that the threshold can also be learned as a weight). Initially all other weights are -0.9, and the learning rate is 0.5. Note: in this case use the constant 1 instead of the derivative of the g() function.
 b) Show (by example) that the order of presenting the examples can change the number of training episodes.
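As a sanity check for 4a), the delta-rule update can be sketched in code. This is one possible reading of the exercise: the threshold is learned as a weight w0 on a constant input -1 (as the exercise allows), examples are presented in the fixed order (0,0), (0,1), (1,0), (1,1), and the variable names are mine — your hand-worked trace may use a different presentation order, which is exactly the point of 4b).

```python
# Delta-rule (perceptron) learning of 2-input NAND with a step activation.
# The threshold is learned as a weight w0 on a constant input x0 = -1,
# so net = w1*x1 + w2*x2 - w0, and the unit outputs 1 when net >= 0.
# Initial values follow the exercise: threshold 1, other weights -0.9, eta = 0.5.

examples = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # NAND truth table
w = [1.0, -0.9, -0.9]   # [w0 (threshold), w1, w2]
eta = 0.5

def output(w, x1, x2):
    net = w[1] * x1 + w[2] * x2 + w[0] * (-1)
    return 1 if net >= 0 else 0

epochs = 0
while True:
    epochs += 1
    errors = 0
    for (x1, x2), t in examples:
        o = output(w, x1, x2)
        if o != t:
            errors += 1
            # Delta rule with g'() replaced by the constant 1:
            #   w_i <- w_i + eta * (t - o) * x_i   (with x0 = -1 for the threshold)
            delta = eta * (t - o)
            w[0] += delta * (-1)
            w[1] += delta * x1
            w[2] += delta * x2
    if errors == 0:      # one full clean pass => converged
        break

print("weights:", w, "epochs:", epochs)
```

With this particular presentation order the perceptron converges after one epoch of corrections plus one clean verification pass; a different order can change that count, which is what 4b) asks you to demonstrate.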
5) Consider the following taxation scenario, occurring in a certain middle-eastern country.* We will assume that you are risk-neutral, i.e. that your utility function is linear in the amount of money that your assets are worth. Assume also that moral considerations are not part of your utility function. Your accountant has notified you that your taxes for FY 2006-2007 amount to 1,000,000 NIS. You have the following options:
 1) Declare the correct taxes verbatim, and pay the full amount.
 2) Claim that you have used much of the money to fund scientific experiments on monkeys, in which case you will only need to pay 200,000 NIS. However, a senior tax collection evaluator will then assess your forms, and some of the evaluators have militant friends in the ASPCM* who will trash your house, costing you an additional 1,500,000 NIS. There are 3 possible evaluators, equally likely to be on your case:
      Evaluator A has no friends in the ASPCM.
      Evaluator B has a militant friend in the ASPCM, but likes the colour green, and if he sees a sufficient amount of it (say $30,000 = 100,000 NIS), then with probability 0.5 he will forget that monkeys have anything to do with animals (and otherwise will take the money and still tell his friend).
      Evaluator C will always report to his militant ASPCM friends.
 a) Which of the above taxation options should you prefer as a rational agent?
 b) Suppose that a "friend" of the accountant working at the treasury department can give you a "hint" about which evaluator will be on your case, if you grease his palm. How much is this information worth, given that your "friend" is perfectly reliable in this respect?
 Draw the decision diagram for each case, and solve it.

6) Consider the following belief-state MDP, where the score is the total value of rewards obtained in 3 steps. The world has 4 locations in a linear sequence, as follows:

      F1   I   F2   P

Actions are: R attempts to move right, L attempts to move left, and S senses location F2.
Location F1 contains a flag, which has a value of 1. Location F2 contains a flag, with value either 10 or 100, with probability 0.5 each. A robot visiting a location with a flag gets a reward equal to the flag value, and then the flag disappears. P is an absorbing state, with a reward of -100.

Transition probabilities are based on: motion succeeds with probability 0.8, but with probability 0.2 the robot either stays in the same location or moves in the opposite direction (if possible), with probability 0.1 each (0.2 to stay in place if at an end of the sequence). Sensing, if performed, tells the robot the actual value of the flag at F2.

 b1) Formalize the above problem as a (belief-state) MDP, i.e. what are the states, the transition probability function, and the reward function?
 b2) Find the optimal policy. You may assume that the robot starts at location I in order to save some computation.
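Before building belief states for 6), it can help to write down the motion part of the transition model explicitly. The sketch below is one possible reading of the problem statement: it assumes that any blocked move (intended or opposite direction falling off the end of the sequence) turns into "stay in place", which reproduces the "0.2 to stay in place if at end" remark; it also treats P as absorbing. The function name and the 0..3 location encoding are my own choices, and this covers only the motion actions R and L, not sensing or flags.

```python
# One possible encoding of the motion model for the 4-location world
#     F1   I   F2   P      (indices 0..3)
# A move succeeds with prob 0.8; with prob 0.1 the robot stays put, and
# with prob 0.1 it moves in the opposite direction.  A blocked move
# (off either end) is ASSUMED to become "stay in place", so at an end
# location the stay probability rises to 0.2, matching the exercise text.
# P (index 3) is absorbing: every action leaves the robot there.

LOCS = ["F1", "I", "F2", "P"]

def motion_dist(loc, action):
    """Distribution over next locations for action 'R' or 'L' (not 'S')."""
    if loc == 3:                                  # P is absorbing
        return {3: 1.0}
    intended = loc + 1 if action == "R" else loc - 1
    opposite = loc - 1 if action == "R" else loc + 1
    dist = {}
    for target, p in ((intended, 0.8), (loc, 0.1), (opposite, 0.1)):
        if 0 <= target <= 3:
            dist[target] = dist.get(target, 0.0) + p
        else:                                     # blocked: assumed to stay
            dist[loc] = dist.get(loc, 0.0) + p
    return dist

for loc in range(4):
    for a in "RL":
        print(LOCS[loc], a, {LOCS[s]: p for s, p in motion_dist(loc, a).items()})
```

For example, from I the action R gives F2 with probability 0.8, I with 0.1, and F1 with 0.1, while from F1 the action R gives I with 0.8 and F1 with 0.2. The belief-state MDP of b1) then layers the flag values, the remaining step count, and the robot's uncertainty about F2's flag on top of this physical model.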
Deadline: Wednesday, April 9, 2008, at 12 noon.
Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)
* Disclaimer: any seeming identification with existing countries, institutions, or people, is unintentional.