Justify all answers!
1) We have the following distribution over the binary variables A, B, C, given as a full table:
A B C probability
-------------------------------
F F F 0.81
F F T 0
F T F 0
F T T 0.09
T F F 0
T F T 0.09
T T F 0.01
T T T 0
a) Construct the Bayes network that most compactly describes the distribution (i.e. one that has
the smallest number of arcs).
b) Is the answer unique?
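For reference, the following is a minimal enumeration sketch (Python; the
dictionary P and the helper marginal() are illustrative names, not part of
the exercise). It stores the joint table above and lets you test candidate
(conditional) independencies numerically, which can help in deciding which
arcs a compact network may omit.

    from itertools import product

    P = {('F', 'F', 'F'): 0.81, ('F', 'F', 'T'): 0.00,
         ('F', 'T', 'F'): 0.00, ('F', 'T', 'T'): 0.09,
         ('T', 'F', 'F'): 0.00, ('T', 'F', 'T'): 0.09,
         ('T', 'T', 'F'): 0.01, ('T', 'T', 'T'): 0.00}
    assert abs(sum(P.values()) - 1.0) < 1e-12   # sanity check: sums to 1

    def marginal(**fixed):
        """Probability that the named variables (A, B, C) take the given values."""
        idx = {'A': 0, 'B': 1, 'C': 2}
        return sum(p for key, p in P.items()
                   if all(key[idx[v]] == val for v, val in fixed.items()))

    # Example: compare P(A, B) with P(A) * P(B) over all value pairs.
    for a, b in product('FT', repeat=2):
        print(a, b, marginal(A=a, B=b), marginal(A=a) * marginal(B=b))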
2) Consider the following variables and their immediate causes (parents):
variable   parents
-------------------------------
A          none
B          A
C          B, E
D          A
E          B, D
F          D
a) Is this network a poly-tree?
b) Determine the truth of the following independence
statements (using d-separation; a small checker sketch
appears after part c):
1) I({B}, {D} | {})
2) I({B}, {D} | {A})
3) I({B}, {D} | {A, C})
4) I({B}, {F} | {A})
5) I({B}, {F} | {A, C})
c) Assume that the nodes are binary-valued.
Suppose that P(A=true) = 0.4, P(D=true|A=true) = 0.5, P(D=true|A=false) = 0.1,
P(E=true|D=true) = 0.5, P(E=true|D=false) = 0.2
Find P(A=true | D=true, F=true)
Hint: you may want to preprocess to get rid of nodes before doing any computation.
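For part b), one standard way to verify d-separation answers is the
ancestral moral graph criterion: X and Y are d-separated by Z iff Z
separates X from Y in the moralized graph of the subgraph induced by the
ancestors of X, Y and Z. A minimal checker sketch follows (Python; the
function and variable names are illustrative, and the parents dictionary
mirrors the table above):

    from collections import deque

    parents = {'A': [], 'B': ['A'], 'C': ['B', 'E'],
               'D': ['A'], 'E': ['B', 'D'], 'F': ['D']}

    def ancestors(nodes):
        seen, stack = set(nodes), list(nodes)
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def d_separated(X, Y, Z):
        X, Y, Z = set(X), set(Y), set(Z)
        keep = ancestors(X | Y | Z)
        # moralize the ancestral subgraph: undirect edges, marry co-parents
        adj = {v: set() for v in keep}
        for v in keep:
            ps = [p for p in parents[v] if p in keep]
            for p in ps:
                adj[v].add(p); adj[p].add(v)
            for i in range(len(ps)):
                for j in range(i + 1, len(ps)):
                    adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
        # drop the conditioning set, then look for any X-Y path (BFS)
        frontier = deque(X - Z)
        seen = set(frontier)
        while frontier:
            v = frontier.popleft()
            if v in Y:
                return False      # a connecting path exists
            for u in adj[v]:
                if u not in Z and u not in seen:
                    seen.add(u)
                    frontier.append(u)
        return True

    print(d_separated({'B'}, {'D'}, {'A'}))   # statement 2, for example

Remember that the exercise still requires a justification in terms of
blocked or active paths, not just the checker's output.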
3) You need to construct a 5-input neural network that outputs 1
just when the number of its inputs that are on is either 0
or 3 (assume the inputs are in {0, 1}).
a) Can this be done without hidden units?
b) Show a network using threshold elements that performs
the required computation (specify the weights, too!)
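The units intended in part b) are linear threshold elements: a unit outputs
1 iff the weighted sum of its inputs reaches its threshold. As a way to
verify a proposed design over all 32 input patterns, here is a minimal
evaluation sketch (Python; all weights and thresholds below are placeholders,
not a solution):

    from itertools import product

    def threshold_unit(weights, threshold, inputs):
        # fires iff sum_i w_i * x_i >= threshold
        return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

    def network(x):                             # x is a tuple of five 0/1 inputs
        hidden_specs = [([1, 1, 1, 1, 1], 1),   # placeholder hidden units
                        ([1, 1, 1, 1, 1], 4)]
        h = [threshold_unit(w, t, x) for w, t in hidden_specs]
        return threshold_unit([1, -1], 1, h)    # placeholder output unit

    for x in product((0, 1), repeat=5):
        target = 1 if sum(x) in (0, 3) else 0
        print(x, network(x), 'target:', target)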
4) a) Using the perceptron learning rule (the "delta" rule), show the steps
of learning the 2-input NAND function with a single perceptron
and a step activation function with initial threshold 1 (note
that the threshold can also be learned as a weight). Initially all
other weights are -0.9, and the learning rate is 0.5.
Note: in this case use the constant 1 instead of the derivative
of the g() function.
b) Show (by example) that the order of presenting the examples can
change the number of training episodes.
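The following sketch shows the mechanics of the update asked for in part a)
(Python; the initial weights and the example order are placeholders -- use the
values given in the question, with the threshold represented as a bias weight
on a constant input):

    def step(s):
        return 1 if s >= 0 else 0

    def train_epoch(examples, w, alpha):
        # One pass of the perceptron ("delta") rule with a step activation.
        # w[0] is the bias weight on a constant input 1 (i.e. minus the
        # threshold); per the note above, g' is replaced by the constant 1.
        for x, target in examples:
            xs = (1,) + x                          # prepend the bias input
            out = step(sum(wi * xi for wi, xi in zip(w, xs)))
            w = [wi + alpha * (target - out) * xi for wi, xi in zip(w, xs)]
            print(x, 'out:', out, 'target:', target, 'new w:', w)
        return w

    # Placeholder usage; repeat epochs until no weight changes. For part b),
    # permute the order of the examples and compare the number of epochs.
    nand = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w = train_epoch(nand, [0.0, 0.0, 0.0], 0.5)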
5) Consider the following taxation scenario, occurring in a certain middle-eastern country.*
We will assume that you are risk-neutral, i.e. that
your utility function is linear in the amount of money that your assets are worth.
Assume also that moral considerations are not part of your utility function.
Your accountant has notified you that your taxes for FY 2006-2007 amount to 1,000,000 NIS.
You have the following options:
1) Declare the correct taxes verbatim, and pay the full amount.
2) Claim that you have used much of the money to fund
scientific experiments on monkeys, in which case you will only need to
pay 200,000 NIS. However, a senior tax collection evaluator will then assess
your forms, and some of these evaluators have militant friends in the ASPCM*, who
will trash your house, costing you an additional 1,500,000 NIS.
There are 3 possible evaluators, equally likely to be on your case:
Evaluator A has no friends in the ASPCM.
Evaluator B has a militant friend in the ASPCM, but likes the
colour green, and if he sees a sufficient amount of it (say
$30,000 = 100,000 NIS), he will, with probability 0.5,
forget that monkeys have anything to do with animals (and
otherwise will take the money and still tell his friend).
Evaluator C will always report to his militant ASPCM friends.
a) Which of the above taxation options should you prefer as a rational agent?
b) Suppose that a "friend" of the accountant working at the treasury department can give you a "hint"
about which evaluator will be on your case, if you grease his palm.
How much is this information worth, given that your "friend"
is perfectly reliable in this respect?
Draw the decision diagram for each case, and solve it.
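Both parts reduce to comparing expected utilities, and part b) is a
value-of-perfect-information computation. The shape of that computation is
sketched below (Python; the probabilities and utilities are deliberately
made-up placeholders, not the numbers of the question):

    def expected_utility(outcomes):
        # outcomes: list of (probability, utility) pairs for one action
        return sum(p * u for p, u in outcomes)

    # Placeholder lotteries for two hypothetical actions.
    actions = {'act1': [(1.0, -10)],
               'act2': [(0.5, -2), (0.5, -20)]}
    best_without_info = max(expected_utility(o) for o in actions.values())

    # Perfect information about which scenario holds: choose the best action
    # separately in each scenario, then average over the scenario probabilities.
    scenarios = [(0.5, {'act1': [(1.0, -10)], 'act2': [(1.0, -2)]}),
                 (0.5, {'act1': [(1.0, -10)], 'act2': [(1.0, -20)]})]
    best_with_info = sum(p * max(expected_utility(o) for o in acts.values())
                         for p, acts in scenarios)
    print('VPI =', best_with_info - best_without_info)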
6) Consider the following belief-state MDP, where the score is the total reward obtained in 3 steps.
The world has 4 locations in linear sequence, as follows: F1 I F2 P
Actions are: R (attempt to move right), L (attempt to move left),
and S (sense location F2). Location F1 contains a flag, which has a
value of 1. Location F2 contains a flag, with value either 10 or
100, with probability 0.5 each.
A robot visiting a location with a flag gets a reward equal to the
flag value, and then the flag disappears. P is an absorbing state,
with a reward of -100.
Transition probabilities are based on: motion succeeds with probability 0.8, but with probability
0.2 the robot either stays in the same location, or moves in the opposite direction (if possible),
with probability 0.1 each (0.2 to stay in place if at end of sequence).
Sensing, if performed, tells the robot the actual value of the flag
at F2.
b1) Formalize the above problem as a (belief-state) MDP, i.e. what are the states, the transition probability
function, and the reward function?
b2) Find the optimal policy. You may assume that the robot starts
at location I in order to save some computation.
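Since the horizon is only 3 steps, the optimal policy can be obtained with
three Bellman backups over the belief states. A generic finite-horizon
value-iteration sketch follows (Python; the tiny MDP below is a placeholder --
plug in the belief states, transitions and rewards from your formalization
in b1):

    def backup(V, T, R):
        # One Bellman backup; returns the new value function and greedy policy.
        newV, policy = {}, {}
        for s in T:
            best_a, best_v = None, float('-inf')
            for a, outcomes in T[s].items():
                v = sum(p * (R.get((s, a, s2), 0) + V[s2])
                        for s2, p in outcomes.items())
                if v > best_v:
                    best_a, best_v = a, v
            newV[s], policy[s] = best_v, best_a
        return newV, policy

    # Placeholder 2-state MDP: T[s][a] maps next states to probabilities.
    T = {'s0': {'go': {'s1': 0.8, 's0': 0.2}, 'stay': {'s0': 1.0}},
         's1': {'go': {'s1': 1.0}, 'stay': {'s1': 1.0}}}
    R = {('s0', 'go', 's1'): 10}
    V = {s: 0 for s in T}
    for k in range(3):                     # horizon of 3, as in the question
        V, policy = backup(V, T, R)
    print(V, policy)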
Deadline: Wednesday, April 9, 2008, at 12 noon.
Note: submission (as well as WORK) on this exercise is SOLO (that is, NOT in pairs or other groups of size > 1)
* Disclaimer: any seeming identification with existing countries, institutions, or people, is unintentional.