1) Use situation calculus (i.e. the result function and fluents) to write down the axioms for the monkey-and-banana example. The monkey needs to get the bananas, which can be done by climbing on a box after moving it into position below the bananas, which are too high to reach otherwise. In the initial state, the box is in the room but not under the bananas, and the monkey does not have the bananas.

Actions are:
a) grab - gets the bananas if the monkey is on the box and the box is under the bananas
b) climb - puts the monkey on the box if the monkey is near the box
c) move(item, position) - moves item into position if the item is a box and the monkey is not on it

Fluents (time-dependent predicates) are:
box_at(location) - location can be under-bananas or elsewhere
on-box - monkey is on box
has-bananas - monkey has the bananas

Now, you need to prove that, for the initial situation S0 where the monkey is not on the box and does not have the bananas, the action sequence (move, climb, grab) works, i.e. that:

has-bananas(result(grab, result(climb, result(move(box, under-bananas), S0))))

(Note - the form in the exercise, using the "holds" predicate, is an alternative not used in the book, so we provide the answer in the terms used in the book. The book adds one argument to each fluent denoting the situation, and we do likewise.)

Can you do it with your original formulation (show the proof!)? If not, add the requisite axioms and complete the proof.

Answer:

Grab action axiom:
forall (S) (on-box(S) and box_at(under-bananas, S)) => has-bananas(result(grab, S))

Climb action axiom:
forall (S) (not on-box(S)) => on-box(result(climb, S))

Note - we did not have a predicate for monkey location, unless on the box! Let us assume that if the monkey is not on the box it is near the box - by no means an obvious (or correct!) assumption.

Move action axiom:
forall (S, I, P) ((not on-box(S)) and box(I)) => box_at(P, result(move(I, P), S))

Note that the representation commitment is not so good, but this is what we had...
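Before converting to clause form, the three action axioms can be sanity-checked as a progression-style simulation rather than as logical deduction. The sketch below is illustrative only (state dictionaries and names are ours, not situation-calculus proper); it just confirms that the preconditions, as written, let the plan (move, climb, grab) achieve has-bananas:

```python
# Minimal sanity check of the action axioms above, simulated as state
# updates. The fluents and preconditions mirror the axioms; the dict
# representation is an assumption for illustration.

def move(state, item, position):
    # move(item, position): applies if item is a box and the monkey is not on it
    if item == "box" and not state["on-box"]:
        state = dict(state, **{"box-at": position})
    return state

def climb(state):
    # climb: puts the monkey on the box (assuming "not on box" means "near box")
    if not state["on-box"]:
        state = dict(state, **{"on-box": True})
    return state

def grab(state):
    # grab: gets the bananas if monkey is on the box and box is under bananas
    if state["on-box"] and state["box-at"] == "under-bananas":
        state = dict(state, **{"has-bananas": True})
    return state

S0 = {"box-at": "elsewhere", "on-box": False, "has-bananas": False}
S3 = grab(climb(move(S0, "box", "under-bananas")))
print(S3["has-bananas"])  # True
```

Note that the simulation gets the frame problem "for free" (unmentioned fluents are simply copied over), which is exactly what the logical encoding below has to state explicitly as frame axioms.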
What the move axiom means is: if we move anything that is a box to P, then it will be at P. It allows for only one box, and also does not allow for moving anything but a box.

Initial state:
1. not on-box(S0)
2. not has-bananas(S0)
3. not box_at(under-bananas, S0)
box_at(in-room, S0) ; (irrelevant fact)
0. box(box)

To prove the theorem with resolution we need to first convert it to normal form. However, before we do, note that we will have a problem. We have not axiomatized:
a) that the box will stay under the bananas in a situation resulting from the "climb" action (also for "grab", but we do not need that for the theorem!)
b) that the monkey will not be on the box as a result of applying move (note that if the monkey "magically" appears on the box as a result of move, the climb action axiom will not work...)

We need to add these 2 FRAME AXIOMS:
forall (S) box_at(under-bananas, S) => box_at(under-bananas, result(climb, S))
forall (S, I, P) (not on-box(S)) => not on-box(result(move(I, P), S))

Now, add the CNF versions of the action axioms (variables have ? in front):
4. (not on-box(?S)) or (not box_at(under-bananas, ?S)) or has-bananas(result(grab, ?S))
5. on-box(?S1) or on-box(result(climb, ?S1))
6. on-box(?S2) or (not box(?I)) or box_at(?P, result(move(?I, ?P), ?S2))

Now, convert the frame axioms:
7. (not box_at(under-bananas, ?S3)) or box_at(under-bananas, result(climb, ?S3))
8. on-box(?S4) or (not on-box(result(move(?I1, ?P2), ?S4)))

Finally (optionally) add the negation of the theorem to prove:
9.
not has-bananas(result(grab, result(climb, result(move(box, under-bananas), S0))))

Proof is as follows:

Resolve 0 with 6, ?I=box, to get:
10: on-box(?S2) or box_at(?P, result(move(box, ?P), ?S2))
Resolve 1 with 10, ?S2=S0, to get:
11: box_at(?P, result(move(box, ?P), S0))
Resolve 8 with 1, ?S4=S0 (instantiating ?I1=box), to get:
12: not on-box(result(move(box, ?P2), S0))
Resolve 5 with 12, ?S1=result(move(box, ?P2), S0), to get:
13: on-box(result(climb, result(move(box, ?P2), S0)))
Resolve 4 with 13, ?S=result(climb, result(move(box, ?P2), S0)), to get:
14: (not box_at(under-bananas, result(climb, result(move(box, ?P2), S0)))) or has-bananas(result(grab, result(climb, result(move(box, ?P2), S0))))
Resolve 7 with 11, ?S3=result(move(box, ?P), S0), ?P=under-bananas, to get:
15: box_at(under-bananas, result(climb, result(move(box, under-bananas), S0)))
Resolve 14 with 15, ?P2=under-bananas, to get:
16: has-bananas(result(grab, result(climb, result(move(box, under-bananas), S0))))

This constitutes a proof. Now for a proof by contradiction, assume 9 was added, and resolve 16 with 9 to get a contradiction. Note how the frame axioms 7 and 8 were needed in the proof.

2) Write down the axioms and facts for the example below in FOPC: The taxonomic hierarchy for animals contains dogs, which in turn contains German shepherds and chihuahuas (which are disjoint). All dogs are carnivorous. German shepherds are large. All large dogs can be guard dogs, unless they are old. Fido is an old German shepherd.

Answers: we will introduce "class" predicates with one argument, denoting the obvious.

Taxonomic hierarchy axioms:
forall (X) dog(X) => animal(X)
forall (X) german-shepherd(X) => dog(X)
forall (X) chihuahua(X) => dog(X)
forall (X) (not german-shepherd(X)) or (not chihuahua(X)) ; disjointness

Property axioms:
forall (X) dog(X) => carnivorous(X)
forall (X) german-shepherd(X) => large(X)
forall (X) (large(X) and dog(X) and (not old(X))) => can-be-guard-dog(X)

Specific facts about Fido:
a.
old(Fido)
b. german-shepherd(Fido)

a) Convert the knowledge base into normal form. Conversion, axiom by axiom:
1. (not dog(?X)) or animal(?X)
2. (not german-shepherd(?X1)) or dog(?X1)
3. (not chihuahua(?X2)) or dog(?X2)
4. (not german-shepherd(?X3)) or (not chihuahua(?X3))
5. (not dog(?X4)) or carnivorous(?X4)
6. (not german-shepherd(?X5)) or large(?X5)
7. (not large(?X6)) or (not dog(?X6)) or old(?X6) or can-be-guard-dog(?X6)

b) Show how this knowledge base should be indexed - using a table. (Different from the book - we used CNF, they used INF, but either is OK.) We show POINTERS (e.g. 2.1 means sentence 2, literal 1). Table is:

Name             | positives | negatives     | argument
-----------------+-----------+---------------+---------
Fido             |           |               | a.1, b.1
animal           | 1.2       |               |
large            | 6.2       | 7.1           |
dog              | 2.2, 3.2  | 1.1, 5.1, 7.2 |
chihuahua        |           | 3.1, 4.2      |
german-shepherd  | b.1       | 2.1, 4.1, 6.1 |
can-be-guard-dog | 7.4       |               |
old              | a.1       | 7.3           |
VARIABLE         | 1.1, 1.2, ... more...

c) Prove or disprove by using resolution (or argue that neither can be done):

1) Fido is an animal. Resolve b with 2, ?X1=Fido, to get 8: dog(Fido); then resolve 8 with 1, ?X=Fido, to get animal(Fido).

2) Fido is large. Resolve b with 6, ?X5=Fido, to get 9: large(Fido).

3) Fido can be a guard dog. Cannot be done, because resolving things with 7 can get us at most:
(not old(Fido)) or can-be-guard-dog(Fido)
and nothing more specific is available. Since we do not have (not old(Fido)) (in fact we have the opposite), we cannot prove or disprove the desired claim.

4) If Fido is a chihuahua, then Fido is young. Can be proved as follows. Add the negation of the claim:
100: chihuahua(Fido)
101: (not young(Fido))
Resolve 100 with 4, ?X3=Fido, to get:
102: not german-shepherd(Fido)
Resolve 102 with b to get the empty clause (contradiction).

5) Fido is carnivorous. Resolve 8 with 5, ?X4=Fido, to get 10: carnivorous(Fido).

d) Would using INPUT RESOLUTION only change the answers to c? No, because at every step above we had at least 1 clause from the KB or the query, so this was input resolution.
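Claims 1, 2 and 5 need only the Horn (definite-clause) part of the KB, so a simple ground forward-chaining loop suffices for them. A minimal sketch (the rule encoding is ours, applied to the single constant Fido; note that the disjointness axiom and the negated old(X) condition do not fit this Horn form, which is why claims 3 and 4 were handled with full resolution):

```python
# Ground forward chaining over the Horn subset of the dog KB.
# Each rule is (set of premise predicates, conclusion predicate),
# all implicitly about Fido.

rules = [
    ({"german-shepherd"}, "dog"),
    ({"chihuahua"}, "dog"),
    ({"dog"}, "animal"),
    ({"dog"}, "carnivorous"),
    ({"german-shepherd"}, "large"),
]

facts = {"old", "german-shepherd"}  # specific facts a and b about Fido

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
# derives dog, animal, large, carnivorous -- i.e. claims 1, 2 and 5
```

The guard-dog rule is deliberately absent: its (not old(X)) premise would require negation, and indeed no forward-chaining run can settle claim 3.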
e) Can you prove all the above using only either forward or backward chaining? This depends on how the rules are encoded. If the direction of the => is as in the original, we can get all the above results except 4, for which we would need german-shepherd(?X) => not chihuahua(?X). If the direction is not consistent, we may not be able to prove some of the claims above. Obviously, for claim 3 we still cannot prove anything!

3) Consider the following variables and their immediate causes (parents):

variable | parents
---------+--------
A        | none
E        | none
B        | A
C        | A, E
D        | A
F        | B, C

a) Is this network a poly-tree? No. The underlying undirected graph has cycles, e.g. A-B-F-C-A.

b) Determine the truth of the following independence statements (using d-separation):
1) I({A}, {E} | {}) - true. The path A-C-E is blocked, as C is a converging node and the evidence is null; the path A-B-F-C-E is likewise blocked at the converging node F.
2) I({A}, {E} | {B}) - true. Same reasons; B is not a descendant of C (or of F), and B itself is a blocked passthrough node on the path A-B-F-C-E.
3) I({A}, {E} | {C}) - false. The path through C becomes unblocked by the evidence.
4) I({A}, {E} | {F}) - false. The same path is unblocked, as F is an evidence node and a child of C.
5) I({A}, {C, F} | {B}) - false, because C is a child of A (a path with 1 arc is always unblocked).
6) I({A}, {F} | {B, C}) - true. Both paths are "passthrough" at an evidence node.

c) Determine at least 2 different ways to convert the network into a poly-tree using cutset conditioning. {A}, and all supersets of {A}. {B} and {C} are also cutsets (as is {F}: removing F leaves only the edges A-B, A-C, A-D, E-C, which form a tree).

d) Suppose that P(A) = 0.9, P(E) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8. Find P(A|E). We have I({A}, {E} | {}), so P(A|E) = P(A) = 0.9.

5) You need to construct a 3-input neural network that has value 1 in the output just when exactly 1 of its inputs is 1, and 0 otherwise (assume inputs are in {0, 1}).

a) Can this be done without hidden units? Cannot be done.
That is because the plane in 3-space incident on the points (0,0,1), (0,1,0), (1,0,0) - which are the cases where the output has to be 1 - has cases requiring an answer of 0 both below it (e.g. (0,0,0)) and above it (e.g. (1,1,0)); thus at least 2 separating planes are needed.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!). In order to generate the additional plane, we need at least 1 hidden unit, as in the XOR problem. In summary, we need 2 units. The hidden unit takes all inputs, each with weight 1, and has threshold 1.5; it generates a "hidden" value that is 1 when MORE than 1 input is 1. The output unit takes all inputs (each with a weight of 1) and has a threshold of 0.5 - which by itself would give an output of 1 if AT LEAST one of the inputs is 1. The output unit also takes the hidden result as an input, with a large negative weight (e.g. -5), which results in outputting 1 when at least 1 of the circuit inputs is 1, but NOT MORE than 1, as desired.

6) a) Using the perceptron learning rule (the "delta" rule), show the steps of learning the 3-input function AND with a single perceptron and a step activation function with threshold 1. Initially all weights are 1, and the learning rate is 0.4.

b) Is it possible to combine back-propagation with genetic algorithms, and if so, how? Yes: use back-propagation as a "local improvement operator" in each iteration, before selection. Of course, this can only work if we have a gradient measure, as well as a fitness measure for the networks.

7) a) Construct by hand a decision tree for the following table, a set of examples for sorting client loan risk, where the first decision is based on bad credit (A), then if necessary on pending number of loan requests (B), granted loan by this bank in past (C), and monthly income (in tens of thousands of dollars a year) (D).
Input variables and decision:

A | B | C | D | decision
--+---+---+---+---------
F | 1 | F | 5 | small
F | 1 | F | 4 | small
F | 1 | F | 3 | small
F | 1 | T | 3 | small
F | 2 | F | 3 | medium
F | 2 | F | 4 | medium
T | 1 | F | 8 | small
T | 1 | T | 4 | large
T | 1 | T | 5 | large

Note that we do not need to apply the algorithm, as the order of choices was already (not necessarily correctly) decided for us. The resulting tree is:

A=T: 3 examples, cannot decide yet, so branch on B:
  B=1: cannot decide (still 3 instances), so branch on C:
    C=F: decide "small"
    C=T: decide "large"
  B=2: decide "large" (default - there are no examples!)
A=F: 6 examples, cannot decide, so branch on B:
  B=1: 4 examples, decide "small"
  B=2: 2 examples, decide "medium"

All in all, we have 5 leaf nodes and 4 non-leaf nodes.

b) Which node will be pruned first, based on least information gain? As we can only prune nodes that are immediate parents of leaf nodes, the only candidates are the nodes denoted by the paths (A=F) and (A=T, B=1). In either of these nodes, the amount of information gain PER EXAMPLE is the same, i.e. -(1/3 log(1/3) + 2/3 log(2/3)), but the former covers more examples (6 vs. 3), so the average amount of information gain for a random pattern is higher there. So, if forced to prune, we prune the branching at (A=T, B=1), and in that case just answer by the majority of its 3 examples (1 "small", 2 "large") and decide "large".

c) Is a more compact decision tree possible (possibly using a different ordering of the variables)? Which tree? The optimal tree has only 3 internal nodes and 4 leaf nodes, branching on C initially:

C=T: (3 examples) branch on A:
  A=F: (1 example) decide "small"
  A=T: (2 examples) decide "large"
C=F: (6 examples) branch on B:
  B=1: (4 examples) decide "small"
  B=2: (2 examples) decide "medium"
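The pruning argument in 7(b) can be checked numerically. A short sketch (the entropy helper and counts are taken directly from the table above): both candidate nodes have per-example class entropy -(1/3 log2(1/3) + 2/3 log2(2/3)), and since each node's split yields pure leaves, this entropy equals the gain per example; the node covering fewer examples therefore contributes less total gain and is pruned first.

```python
# Verify that the two prune candidates, (A=F) and (A=T, B=1), have equal
# per-example entropy, and that total gain favors keeping (A=F).

from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

h_AF = entropy([4, 2])     # (A=F): 6 examples, 4 "small" and 2 "medium"
h_ATB1 = entropy([1, 2])   # (A=T, B=1): 3 examples, 1 "small" and 2 "large"

print(h_AF, h_ATB1)        # equal: ~0.918 bits per example
print(6 * h_AF, 3 * h_ATB1)  # total gain: (A=F) covers twice as many examples
```

Since both splits produce pure leaves, gain per example is the node entropy itself, so the comparison reduces to 6 x 0.918 vs. 3 x 0.918 bits, confirming that (A=T, B=1) goes first.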