1) Write down the axioms and facts for the example below in FOPC: The taxonomic hierarchy for furniture contains chairs, which in turn contain stuffed chairs and plastic chairs (which are disjoint). All chairs have 4 legs. Stuffed chairs have padding. All chairs with padding are comfortable, unless they are broken. The item of furniture in my office, which I call "the contraption", is a stuffed chair.

1. (forall x) Chair(x) => Furniture(x)
2. (forall x) Stuffed_chair(x) => Chair(x)
3. (forall x) Plastic_chair(x) => Chair(x)
4. (forall x) ~Plastic_chair(x) v ~Stuffed_chair(x)
5. (forall x) Chair(x) => (exists g) Set(g) ^ Cardinality(g) = 4 ^ ((forall l) In(l, g) => (Leg(l) ^ Has_part(x, l)))
6. Greater(4, 1)
6.1 (forall x, y, z) Greater(x, z) ^ (y=x) => Greater(y, z)   ; ad-hoc axiom for equality
7. (forall x) Stuffed_chair(x) => Has_padding(x)
8. (forall x) Chair(x) ^ Has_padding(x) ^ ~Broken(x) => Comfortable(x)
9. Stuffed_chair(the contraption)

a) Convert the knowledge base into normal form

Let us convert each item separately (so as not to have too many variable names). Items prefixed with ? are variables.

1. ~Chair(?x) v Furniture(?x)
2. ~Stuffed_chair(?x) v Chair(?x)
3. ~Plastic_chair(?x) v Chair(?x)
4. ~Plastic_chair(?x) v ~Stuffed_chair(?x)
5.1. ~Chair(?x) v Set(legs-of(?x))
5.2. ~Chair(?x) v Cardinality(legs-of(?x))=4
5.3. ~Chair(?x) v ~In(?l, legs-of(?x)) v Leg(?l)
5.4. ~Chair(?x) v ~In(?l, legs-of(?x)) v Has_part(?x, ?l)
     where legs-of(?x) is a Skolem function.
6. Greater(4, 1)
6.1 ~Greater(?x, ?z) v ~(?y=?x) v Greater(?y, ?z)
7. ~Stuffed_chair(?x) v Has_padding(?x)
8. ~Chair(?x) v ~Has_padding(?x) v Broken(?x) v Comfortable(?x)
9. Stuffed_chair(the contraption)

b) Show how this knowledge base should be indexed - using a table.

Not very interesting, so this is skipped...

c) Prove or disprove by using resolution (or argue that neither can be done):

1) The contraption is a chair.

Can be proved. Add:
10.
~Chair(the contraption)
Now resolve 10 with 2, with ?x = the contraption, to get:
11. ~Stuffed_chair(the contraption)
Resolve 11 with 9 to get the empty clause (contradiction).

2) The contraption is comfortable.

Cannot be proved or disproved, since the only clause that mentions "Comfortable" also contains "Broken", and there is no other clause that mentions "~Broken".

3) The contraption has padding.

Can be proved. Add:
10. ~Has_padding(the contraption)
Resolve 10 with 7, with ?x = the contraption, to get:
11. ~Stuffed_chair(the contraption)
Resolve 11 with 9 to get the empty clause.

4) If the contraption is a plastic chair, then it is a dog.

Can be proved. We need to prove:
Plastic_chair(the contraption) => Dog(the contraption)
and its negation is:
Plastic_chair(the contraption) ^ ~Dog(the contraption)
which is added to the KB as:
10. Plastic_chair(the contraption)
11. ~Dog(the contraption)
(Actually, 11 is irrelevant and will never be used.)
Now, resolve 10 with 4, with ?x = the contraption, to get:
12. ~Stuffed_chair(the contraption)
Resolve 12 with 9 to get the empty clause.

5) The contraption has more than 1 leg.

Can be proved. We need to prove:
(exists g) Greater(Cardinality(g), 1) ^ Set(g) ^ ((forall l) In(l, g) => Leg(l) ^ Has_part(the contraption, l))
Its negation is:
(forall g) ~Greater(Cardinality(g), 1) v ~Set(g) v ((exists l) In(l, g) ^ (~Leg(l) v ~Has_part(the contraption, l)))
Converting to CNF (with Skolem function foo-of) and adding to the KB, we get:
10. ~Greater(Cardinality(?g), 1) v ~Set(?g) v ~Leg(foo-of(?g)) v ~Has_part(the contraption, foo-of(?g))
11. ~Greater(Cardinality(?g), 1) v ~Set(?g) v In(foo-of(?g), ?g)
Resolve 9 with 2, with ?x = the contraption, to get:
12. Chair(the contraption)
Resolve 12 with 5.1, 5.2, 5.3, and 5.4, each with ?x = the contraption, to get:
13. Set(legs-of(the contraption))
14. Cardinality(legs-of(the contraption)) = 4
15. ~In(?l, legs-of(the contraption)) v Leg(?l)
16.
~In(?l, legs-of(the contraption)) v Has_part(the contraption, ?l)
Resolve 14 with 6.1, with ?y = Cardinality(legs-of(the contraption)) and ?x = 4, to get:
17. Greater(Cardinality(legs-of(the contraption)), ?z) v ~Greater(4, ?z)
Resolve 17 with 6, with ?z = 1, to get:
18. Greater(Cardinality(legs-of(the contraption)), 1)
Resolve 18 with 10 and 11 respectively, with ?g = legs-of(the contraption), to get:
19. ~Set(legs-of(the contraption)) v ~Leg(foo-of(legs-of(the contraption))) v ~Has_part(the contraption, foo-of(legs-of(the contraption)))
20. ~Set(legs-of(the contraption)) v In(foo-of(legs-of(the contraption)), legs-of(the contraption))
Resolve 13 with 19 and 20 respectively, to get:
21. ~Leg(foo-of(legs-of(the contraption))) v ~Has_part(the contraption, foo-of(legs-of(the contraption)))
22. In(foo-of(legs-of(the contraption)), legs-of(the contraption))
Resolve 15 with 22, with ?l = foo-of(legs-of(the contraption)), to get:
23. Leg(foo-of(legs-of(the contraption)))
Resolve 16 with 22, with ?l = foo-of(legs-of(the contraption)), to get:
24. Has_part(the contraption, foo-of(legs-of(the contraption)))
Resolve 23 with 21 to get:
25. ~Has_part(the contraption, foo-of(legs-of(the contraption)))
Finally, resolve 24 with 25 to get the empty clause.
SHOW ALL STEPS, including the relevant unifiers, etc.

d) Would using INPUT RESOLUTION only change the answers to c?

No changes, except that the last proof would not go through with just input resolution.

e) Can you prove all the above using only either forward or backward chaining?

First, we can represent only the Horn clauses. Everything here is in Horn form except axiom 8, so all but axiom 8 can be used. Since none of the proofs used axiom 8, and all the proofs involved only Horn clauses, these proofs could have been done with EITHER forward OR backward chaining.

2) Consider the following variables and their immediate causes (parents):

variable   parents
-------------------
A          none
E          F
B          A
C          A
D          none
F          B, C, D

a) Is this network a poly-tree?
No, its underlying undirected graph has a cycle: A-B-F-C-A.

b) Determine the truth of the following independence statements (using d-separation):

1) I({A}, {D} | {}): Yes. All paths go through F and are blocked at F, because F is a converging node with no evidence at F or below it.
2) I({A}, {D} | {F}): No, because A-C-F-D is unblocked (at F due to the evidence at F, and at C because C is a non-converging node with no evidence).
3) I({A}, {D} | {F, B, C}): Yes, because all paths must pass through either B or C in a non-converging manner, and are blocked at B or C since these are evidence nodes.
4) I({A}, {E} | {F}): Yes. F blocks all paths from A to E, since F is a non-converging evidence node on all these paths.
5) I({A}, {C, F} | {B}): No. The path A-C can never be blocked.
6) I({A}, {F} | {B, C}): Yes. Either B or C blocks every path.

c) Determine at least 2 different ways to convert the network into a tree using clustering.

1) Make BC into a single node, or
2) Make BCF into a single node, or
3) Make ABC into a single node, or ...

d) Suppose that P(A) = 0.9, P(D) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8, P(C|A) = 1, P(C|~A) = 0.5. Find P(A | B, C, E).

A is independent of E given B, C, so we can just compute P(A | B, C). We have:

P(A|B,C) = P(A,B,C) / P(B,C)
         = P(A)P(C|A)P(B|A,C) / (P(A,B,C) + P(~A,B,C))
         = P(A)P(C|A)P(B|A) / (P(A)P(C|A)P(B|A) + P(~A)P(C|~A)P(B|~A))
         = 0.9*1*0.1 / (0.9*1*0.1 + 0.1*0.5*0.8)
         = 0.09 / (0.09 + 0.04) ≈ 0.69

3) You need to construct a 4-input neural network that has value 1 in the output just when exactly 2 or 3 of its inputs are 1, and 0 otherwise (assume inputs are in {0, 1}).

a) Can this be done without hidden units?

No; this decision function is not linearly separable.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!)

With one hidden unit this can be done. We also have 1 output unit. The hidden unit receives all inputs, each with a weight of 1, and has a threshold of 3.5. The output unit also receives all inputs, again each with a weight of 1.
It also receives the output of the hidden unit, with a weight of -10. Its threshold is 1.5. Clearly the output unit is on when 2 or more inputs are 1, as long as the hidden unit is off - and the hidden unit, which fires only when all 4 inputs are 1, disables the output in exactly that case.

4) a) Using the perceptron learning rule (the "delta" rule), show the steps of learning the 3-input function OR with a single perceptron and a step activation function with threshold 1. Initially all weights are 1, and the learning rate is 0.4.

If a value EXACTLY at the threshold is considered "more" than the threshold, the algorithm will make one pass over all 8 examples, make no changes (no error), and stop. If we need a value strictly greater than the threshold, the examples 001, 010, and 100 will each generate an error, and in each case the respective weight will be increased to 1.4. After a second pass over the examples, no errors remain and the algorithm stops.

b) Is it possible to combine back-propagation with genetic algorithms, and if so, how?

Yes. Use the GA to learn the weights of a neural network, but for each individual weight vector use back-propagation as a local improvement operator (to find a local minimum) before computing the fitness.

5) a) Construct by hand (using greedy search with the best-information-gain heuristic) a decision tree for the following table, a set of examples for evaluating the desirability of a potential employee in a high-tech company. The data is degree level (A), existence of work experience (B), self-motivation (C), and last grade average (D).

   input variables     decision
  A    B    C    D
 -------------------------------
  1    T    F    85    good
  1    T    F    80    good
  1    T    F    70    good
  1    T    T    70    good
  1    F    F    90    good
  1    F    T    80    bad
  1    F    T    85    bad
  2    T    F    70    average
  2    T    F    80    average

In the initial situation, the remaining information required is:

A) If we choose A: case A=2 needs 0 further information; case A=1 has probabilities 2/7 and 5/7 over 7 cases.
B) For B we have, in one branch, 3 cases with probabilities 1/3 and 2/3; in the other, 2/6 and 4/6.
C) For C we have the same remaining information as for B.
D) For D we have, for the 70 branch, probabilities 1/3 and 2/3; for the 80 branch, three different classes with probabilities 1/3, 1/3, 1/3; for the 85 branch, 1/2 and 1/2; and for the 90 branch there is no remaining information.

Doing the exact numbers, we get that A leaves the least remaining information and is (locally) optimal.

Now we are left with branch 1 of A. Applying B, the remaining required information is given by probabilities 1/3 and 2/3 for the F branch, and 0 for the T branch. For C we have a 0 remainder for the F branch, and 1/3, 2/3 for the T branch. For D we have 0 remainder for the 70 and 90 branches, but 1/2, 1/2 for the 85 branch and the same for the 80 branch, a total of 2 bits, which is a bit less than for B or C, so we choose attribute D.

Now, at the 85 branch B and C have the same information gain, and the same holds for the 80 branch, so at either branch we can choose indifferently. So, the tree is:

A --- 2 --- average
 \--- 1 --- D --- 70 --- good
              \--- 90 --- good
              \--- 80 --- B --- F --- bad
              |            \--- T --- good
              \--- 85 --- B --- F --- bad
                           \--- T --- good

This tree has 7 leaves and 4 internal nodes.

b) After constructing the tree, one can try to prune nodes from the bottom up, based on least information gain. Which node would be pruned first in this case?

The only candidates are nodes whose only children are leaves, so there are only 2 candidates (the two B nodes). Each candidate has an information gain of 2 bits, so both are equally good for pruning.

c) Is a more compact decision tree possible? Which tree?

In general, the above method is not optimal. In this case, the following tree is a correct classifier:

B --- F --- C --- F --- good
 |           \--- T --- bad
 \--- T --- C --- T --- good
              \--- F --- A --- 1 --- good
                          \--- 2 --- average

This tree has 4 internal nodes but only 5 leaves, and is thus better. (I believe it is optimal, but am not sure.)
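The "remaining information" comparison at the root in part 5a can be checked numerically. Below is a small sketch (not part of the original solution; the representation and helper names `examples`, `entropy`, `remainder` are mine) that computes the expected remaining entropy after splitting on each attribute; A should come out smallest, matching the choice of A at the root.

```python
from math import log2
from collections import Counter

# The 9 examples as (A, B, C, D, decision) tuples, copied from the table.
examples = [
    (1, "T", "F", 85, "good"),
    (1, "T", "F", 80, "good"),
    (1, "T", "F", 70, "good"),
    (1, "T", "T", 70, "good"),
    (1, "F", "F", 90, "good"),
    (1, "F", "T", 80, "bad"),
    (1, "F", "T", 85, "bad"),
    (2, "T", "F", 70, "average"),
    (2, "T", "F", 80, "average"),
]

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def remainder(attr_index):
    # Expected remaining entropy after splitting on one attribute:
    # sum over branches of (branch weight) * (branch entropy).
    total = len(examples)
    rem = 0.0
    for v in {row[attr_index] for row in examples}:
        branch = [row[4] for row in examples if row[attr_index] == v]
        rem += len(branch) / total * entropy(branch)
    return rem

for name, i in [("A", 0), ("B", 1), ("C", 2), ("D", 3)]:
    print(name, round(remainder(i), 3))
```

Note that the 80 branch of D mixes all three classes (good, bad, average), which is why D's remainder is the largest of the four.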
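As a quick sanity check (again a sketch, not from the original solution), the compact tree proposed in part c can be encoded directly and run against all 9 training examples; it classifies every one correctly.

```python
# Each example: (A, B, C, D, decision), copied from the table in part 5a.
examples = [
    (1, "T", "F", 85, "good"),
    (1, "T", "F", 80, "good"),
    (1, "T", "F", 70, "good"),
    (1, "T", "T", 70, "good"),
    (1, "F", "F", 90, "good"),
    (1, "F", "T", 80, "bad"),
    (1, "F", "T", 85, "bad"),
    (2, "T", "F", 70, "average"),
    (2, "T", "F", 80, "average"),
]

def compact_tree(a, b, c, d):
    # B at the root, then C on both branches, with A deciding one leaf,
    # exactly as drawn in part c. Attribute D is never consulted.
    if b == "F":
        return "good" if c == "F" else "bad"
    if c == "T":
        return "good"
    return "good" if a == 1 else "average"

assert all(compact_tree(a, b, c, d) == label
           for a, b, c, d, label in examples)
print("compact tree classifies all 9 training examples correctly")
```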