Justify all answers!

1) Write down the axioms and facts for the example below in FOPC:

Predicates:
  OS(x)            ; x is an operating system
  Windows(x)       ; x is a Windows type object
  Linux(x)         ; x is a Linux type object
  XP(x)            ; x is Windows XP
  NT(x)            ; x is Windows NT
  Security(x, y)   ; object x has security y
  OnmyPC(x)        ; x is on my computer
  Hates(x, y)      ; x hates y
  Likes(x, y)      ; x likes y
  UserFriendly(x)  ; x is user friendly

Constants: Good, Lousy, I

The taxonomic hierarchy for operating systems contains WINDOWS and LINUX type systems, which are disjoint and exhaustive.
1. forall x Linux(x) -> OS(x)
2. forall x Windows(x) -> OS(x)
3. forall x OS(x) -> Windows(x) v Linux(x)
4. forall x not Linux(x) v not Windows(x)

WINDOWS type systems contain Windows XP and Windows NT systems.
6. forall x NT(x) -> Windows(x)
7. forall x XP(x) -> Windows(x)

All WINDOWS operating systems are both user-friendly and have lousy security.
8. forall x Windows(x) -> (Security(x, Lousy) ^ UserFriendly(x))

The system on my computer is LINUX. ("all operating systems on my computer are Linux")
9. forall x (OnmyPC(x) ^ OS(x)) -> Linux(x)

I hate all systems with lousy security.
10. forall x (OS(x) ^ Security(x, Lousy)) -> Hates(I, x)

LINUX has good security.
11. forall x Linux(x) -> Security(x, Good)

I always either hate an operating system or like it, but never both.
12. forall x OS(x) -> ((Hates(I, x) ^ not Likes(I, x)) v (not Hates(I, x) ^ Likes(I, x)))

Security of an OS cannot be both good and lousy.
13. forall x OS(x) -> not (Security(x, Good) ^ Security(x, Lousy))

a) Convert the knowledge base into CNF. In the answers, lower-case letters are universally quantified variables. Assume "automatic standardization apart" between clauses.

1.   not Linux(x) v OS(x)
2.   not Windows(x) v OS(x)
3.   not OS(x) v Windows(x) v Linux(x)
4.   not Linux(x) v not Windows(x)
6.   not NT(x) v Windows(x)
7.   not XP(x) v Windows(x)
8.   not Windows(x) v Security(x, Lousy)
8a.  not Windows(x) v UserFriendly(x)
9.   not OnmyPC(x) v not OS(x) v Linux(x)
10.  not OS(x) v not Security(x, Lousy) v Hates(I, x)
11.  not Linux(x) v Security(x, Good)
12.  not OS(x) v Hates(I, x) v Likes(I, x)
12a. not OS(x) v not Hates(I, x) v not Likes(I, x)
13.  not OS(x) v not Security(x, Good) v not Security(x, Lousy)

b) Show how this knowledge base should be indexed, using a table. We will denote the index using clause number, dot, position in clause. Predicates are indexed by polarity; constants are indexed by the argument position in which they occur.

Name          Positive      Negative                      Arg 1                       Arg 2
------------------------------------------------------------------------------------------
OS            1.2 2.2       3.1 9.2 10.1 12.1 12a.1 13.1
------------------------------------------------------------------------------------------
Linux         3.3 9.3       1.1 4.1 11.1
------------------------------------------------------------------------------------------
Windows       3.2 6.2 7.2   2.1 4.2 8.1 8a.1
------------------------------------------------------------------------------------------
NT                          6.1
------------------------------------------------------------------------------------------
XP                          7.1
------------------------------------------------------------------------------------------
Security      8.2 11.2      10.2 13.2 13.3
------------------------------------------------------------------------------------------
UserFriendly  8a.2
------------------------------------------------------------------------------------------
OnmyPC                      9.1
------------------------------------------------------------------------------------------
Hates         10.3 12.2     12a.2
------------------------------------------------------------------------------------------
Likes         12.3          12a.3
------------------------------------------------------------------------------------------
I                                                         10.3 12.2 12a.2 12.3 12a.3
------------------------------------------------------------------------------------------
Good                                                                                  11.2 13.2
------------------------------------------------------------------------------------------
Lousy                                                                                 8.2 10.2 13.3
------------------------------------------------------------------------------------------

c) Prove or disprove by using resolution (or argue that neither can be done):

1) I hate all WINDOWS systems.
Need to show:
Q1: forall x Windows(x) -> Hates(I, x)
(in CNF, this is: not Windows(x) v Hates(I, x))
Its negation is: exists x Windows(x) ^ not Hates(I, x)
For CNF we need a Skolem constant, say W123, giving:
Q1':  Windows(W123)
Q1a': not Hates(I, W123)
From 8 and 10 (resolving on Security(x, Lousy)), we get:
A: not Windows(x) v not OS(x) v Hates(I, x)
From A and 2 (resolving on OS(x)), we get:
B: not Windows(x) v Hates(I, x)
which is what we need to prove. As a refutation proof: resolve B with Q1' (unifier {x/W123}) to get Hates(I, W123), and then resolve that with Q1a' to get the empty clause.

2) If the system on my machine is WINDOWS NT, then I like it.

Need to prove:
Q2: forall x (OnmyPC(x) ^ NT(x)) -> Likes(I, x)
Negation gives: exists x OnmyPC(x) ^ NT(x) ^ not Likes(I, x)
In CNF, after Skolemizing (Skolem constant N8), we get:
Q2a': OnmyPC(N8)
Q2b': NT(N8)
Q2c': not Likes(I, N8)   (actually, this last one is redundant)
From 6 and Q2b', unifier {x/N8}, we get:
B: Windows(N8)
From B and 2, unifier {x/N8}, we get:
C: OS(N8)
From 9, using C and then Q2a' on the result, we get:
D: Linux(N8)
Then from 4, using D and then B on the result, we get the empty clause.
(Intuitively this is odd - the only reason I like all the NT systems on my machine is that there is NO NT on my machine; the false antecedent makes the sentence true!)

3) I do not hate the operating system on my computer.

This is a bit ambiguous. Does it mean: I do not hate ANY of the operating systems on my computer, or: there is ONE OS on my computer, and I don't hate it? We will assume the former:
Q3: forall x (OS(x) ^ OnmyPC(x)) -> not Hates(I, x)
Negating, we get: exists x OS(x) ^ OnmyPC(x) ^ Hates(I, x)
In CNF, with Skolem constant O2, we get:
Q3':  OS(O2)
Q3a': OnmyPC(O2)
Q3b': Hates(I, O2)
Using 9 with Q3' (unifier {x/O2}), and then the result with Q3a', we get:
G: Linux(O2)
We can continue to get, from G and 11, that:
H: Security(O2, Good)
but H cannot be resolved with 10, since Good does not unify with Lousy; all we can add is not Security(O2, Lousy) (from 13, with Q3' and H) and not Likes(I, O2) (from 12a, with Q3' and Q3b') - and then we are stuck!
So we can neither prove nor disprove.

4) If I like all systems with good security, then I like the operating system on my computer.

(Again a bit ambiguous, so we will be making the same assumption as above.)
Q4: (forall x (OS(x) ^ Security(x, Good)) -> Likes(I, x)) -> (forall x (OS(x) ^ OnmyPC(x)) -> Likes(I, x))
Negated, this becomes:
(forall x (OS(x) ^ Security(x, Good)) -> Likes(I, x)) ^ (exists x OS(x) ^ OnmyPC(x) ^ not Likes(I, x))
The last part is just like in the previous case, and can be treated similarly (in CNF, with Skolem constant O3, we get):
Q4':  OS(O3)
Q4a': OnmyPC(O3)
Q4b': not Likes(I, O3)
to which we add a clause for the first part:
Q4c': not OS(x) v not Security(x, Good) v Likes(I, x)
We can show:
K: Security(O3, Good)
in the same way we did above for O2. This time, however, we are NOT stuck: resolve Q4c' with K (unifier {x/O3}), then the result with Q4', and then the result with Q4b' to get the empty clause.

5) If a system is user-friendly then it has lousy security.

This can neither be proved nor disproved. (To see this: a model in which all predicates have empty extensions satisfies the KB and also makes the query vacuously true, so the query cannot be disproved; a model whose only object is user-friendly but has no other properties also satisfies the KB, yet falsifies the query, so it cannot be proved either.)

SHOW ALL STEPS, including the relevant unifiers, etc.

d) Would using INPUT RESOLUTION only change the answers to c?

Item 1 was proved with every resolution step using at least one clause from the KB or the negated query, so there is NO CHANGE for INPUT RESOLUTION.
Item 2 contained a step, "using B on the result", that was NOT an input resolution: both parents are derived clauses. This step seems unavoidable, since neither "Linux(N8)" nor "Windows(N8)" is in the input, and there seems to be no alternative proof.
Items 3-4 also used only input resolution, so no change.
In 5, resolution is complete - but we could not prove or disprove, so obviously we can do no better with just input resolution.

e) Convert the KB into implication normal form.

1.   Linux(x) -> OS(x)
2.   Windows(x) -> OS(x)
3.   OS(x) -> Windows(x) v Linux(x)
4.   Linux(x) ^ Windows(x) -> FALSE
6.   NT(x) -> Windows(x)
7.   XP(x) -> Windows(x)
8.   Windows(x) -> Security(x, Lousy)
8a.  Windows(x) -> UserFriendly(x)
9.   OnmyPC(x) ^ OS(x) -> Linux(x)
10.  OS(x) ^ Security(x, Lousy) -> Hates(I, x)
11.  Linux(x) -> Security(x, Good)
12.  OS(x) -> Hates(I, x) v Likes(I, x)
12a. OS(x) ^ Hates(I, x) ^ Likes(I, x) -> FALSE
13.  OS(x) ^ Security(x, Good) ^ Security(x, Lousy) -> FALSE

f) Can you prove all the above using only either forward or backward chaining?

Cannot do either, because axioms 3 and 12 are not Horn, and the queries also contain disjunctions.

2) Consider the following variables and their immediate causes (parents):

variable   parents
-------------------
A          none
E          none
B          A
C          E
D          B C
F          B C

a) Is this network a poly-tree?

No, because there is a cycle in the underlying undirected graph: B-D-C-F-B.

b) Is this network directed-path singly connected?

Yes: there is at most 1 directed path between any pair of vertices.

c) Determine the truth of the following independence statements (using d-separation):

1) I({A}, {E} | {})
Yes. All paths go through D or F, such that D (resp. F) is a converging node on the path, with no evidence at D (resp. F); thus every path is blocked there.
2) I({A}, {E} | {D})
No. Now the path A-B-D-C-E is unblocked, because D is a converging node that is part of the evidence.
3) I({A}, {E} | {D, B})
Yes, because the path through D is now blocked by B, a pass-through node on this path that is in the evidence set.
4) I({A}, {E} | {D, B, F})
Yes, because B also blocks the path A-B-F-C-E, for the same reason as above.
5) I({A}, {D, F} | {B})
Yes: B is a blocking node on every path from A to either D or F, same as above.
6) I({A}, {F} | {B, C})
Yes, again for the same reason as above.

d) Determine at least 2 different ways to convert the network into a poly-tree using cutset conditioning.

We can break the only cycle by conditioning on EITHER B or C.

e) Suppose that P(A) = 0.8, P(E) = 0.2, P(B|A) = 0.1, P(B|~A) = 0.8. Find P(A|E).
Very easy: since A is independent of E given no evidence (statement c.1 above), P(A|E) = P(A) = 0.8 (the rest of the data is a red herring).

3) You need to construct a 4-input neural network that has value 1 in the output just when an even number of its inputs are 1, and 0 otherwise (assume inputs are in {0, 1}).

a) Can this be done without hidden units?

No; this function is not linearly separable.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!)

We can do this by having hidden units that essentially COUNT the number of inputs which are 1. The simplest (but NOT most compact) method is to have 4 hidden units, call them A-D, all of which are connected to all 4 inputs. The output unit is connected to all the hidden units, and also has a "constant" input, i.e. an input which is always 1. The units A-D will represent "at least 1 one", "at least 2 ones", "at least 3 ones", and "4 ones", respectively. This is easily done: assume that the threshold of all units is 1. The input weights for the hidden units are as follows:
Unit A: all weights are 2
Unit B: all weights are 0.7
Unit C: all weights are 0.4
Unit D: all weights are 0.3
The weights for the output unit O are as follows:
constant input -> O:  2
A -> O: -2
B -> O:  2
C -> O: -2
D -> O:  2
It is easy to see that the above network computes the required function: with k inputs equal to 1, the net input of O is 2 - 2[k>=1] + 2[k>=2] - 2[k>=3] + 2[k=4], which is 2 (at least the threshold) for even k and 0 (below the threshold) for odd k.

4) a) Using the perceptron learning rule (the "delta" rule), show the steps of learning the 3-input function AND with a single perceptron and a step activation function with threshold 1. Initially all weights are 0.7, and the learning rate is 0.5.

Suppose that the order of presentation is the same as counting in binary. Then, in the first pass:
000: no error, no change
001: no error, no change
010: no error, no change
011: should be 0, gives 1 (0.7 + 0.7 = 1.4 >= 1); fix 2nd and 3rd weights to 0.7 - 0.5 = 0.2
100: no error, no change
101: no error, no change
110: no error, no change
111: no error, no change
A second pass through all the examples gives no errors; in fact the single update in the first pass already produced correct weights.

b) Show (by example) that the order of presenting the examples can change the number of training episodes.

In the above example it does not matter. However, suppose that the original weights were 1.1, 0.2, 0.3. Using the example 100 first, we get:
100: should be 0, we get 1; fix first weight to 0.6
and only one pass is needed, because the result is then correct for all examples. However, if the order were:
000: no error
001: no error
010: no error
011: no error
111: no error
110: should be 0, we get 1; fix first weight to 0.6 and 2nd to -0.3
101: no error
100: no error
then the result is incorrect, because 111 will now give us 0 instead of 1, and at least 1 more pass is necessary to fix it!

c) Is it possible to combine back-propagation with genetic algorithms, and if so, how?

Yes. For example, perform backpropagation on each member of the population in a GA. This is called "local improvement". As a result, every member of the population will be at its own local optimum, and convergence of the GA may be faster.

5) a) Construct by hand (using greedy search with the best-information-gain heuristic) a decision tree for the following table, a set of examples for considering desirability of a potential employee in a high-tech company. Data is degree level (A), existence of work experience (B), self-motivation (C), and last grade average (D).

   input variables       decision
   A    B    C    D
--------------------------------------------
   1    T    F    95     good
   1    T    F    80     good
   1    T    F    70     good
   1    T    T    70     good
   1    F    F    90     good
   2    F    T    80     bad
   2    F    T    95     bad
   2    T    F    70     average
   2    T    F    80     average

The amount of information needed to classify the original data is determined by the class counts (5, 2, 2), i.e. probabilities (5/9, 2/9, 2/9).
Total information required is:
I = 9 (-5/9 log(5/9) - 2/9 log(2/9) - 2/9 log(2/9)) = approx. 13 bits

Now we try question A. If the answer is 1 (5 examples, all "good"), the remaining needed information is 0. If the answer is 2 (4 examples), they are divided 2-2, so 1 bit per example is needed; the total remainder is 4 bits. (Thus the answer to A gives approx. 9 bits of information over the whole training set.)

Trying question B: if the answer is T, we have 6 examples, divided 4-2. The needed remaining info in this branch is:
6 (-2/3 log(2/3) - 1/3 log(1/3)) = approx. 5.5 bits
which is already greater than 4, so B cannot be better than A.

For question C: if the answer is F, we again get 6 examples, divided 4-2 (remaining information needed in this branch approx. 5.5 bits), so C is also worse than A.

For question D:
Answer 70: 3 examples divided 2-1, need approx. 2.75 bits;
Answer 80: 3 examples divided 1-1-1, need approx. 4.75 bits;
so already more than 4 bits altogether.

So, clearly A is best from the greedy perspective. Now we need to look at the 2 subtrees separately: A=1 and A=2.
For the case A=2, using either question B or question C allows exact classification between "bad" and "average", while question D does not, so we can pick either of B, C arbitrarily.
For the case A=1, we already have perfect classification, so nothing more need be done.
We are thus done, with a tree that has only 2 internal nodes.

b) Is a more compact decision tree possible? If yes, which tree?

No, because there is no 1-internal-node tree that results in complete classification. But in general, the greedy algorithm is NOT guaranteed to find the optimal tree.

6) A) Use situation calculus (i.e. the Result function, and fluents) to write down the axioms for the following simplified domain. The robot can grab an item, and as a result it will hold it. The robot can put item A inside item B, but only if it does not hold an item. In the initial situation, S0, the robot holds only item A.
Actions are:
a) Insert: will cause item A to be inside item B, but only if the robot holds no items.
b) Grab(item1): if the robot is not holding any other item, the result will be that the robot holds item1, and any other item inside item1.
c) Release: as a result of this action, the robot will hold nothing.

Fluents (situation-dependent predicates) are:
Inside(item1, item2, s)  (item1 is inside item2)
Holds(item, s)           (the robot holds item)
Success(s)               (the robot holds both A and B)

Axioms for actions:
a) forall x, y, s (not exists z Holds(z, s)) -> Inside(x, y, Result(Insert, s))
(Note the "bug" that if x=y, this may result in an object being inside itself. But then, we didn't say it should not be so in the specifications...)
b) forall x, s (not exists y Holds(y, s)) -> Holds(x, Result(Grab(x), s))
b1) forall x, y, s (Holds(x, s) ^ Inside(y, x, s)) -> Holds(y, s)
(Note an ambiguity in the spec: will the robot hold things that were inside in the pre-action situation, or things that are inside in the post-action situation? The above is the 2nd option. A fine distinction, but potentially significant!)
c) forall x, s not Holds(x, Result(Release, s))
d) forall s (Holds(A, s) ^ Holds(B, s)) -> Success(s)

This looks like it should suffice, but it does NOT, as we can see when trying to prove what we need.

B) Now, you need to prove that, for the initial situation S0, the action sequence Release, Insert, Grab(B) results in success; that is, prove that:
Success(Result(Grab(B), Result(Insert, Result(Release, S0))))

First, from c and a we get:
R1: Inside(A, B, Result(Insert, Result(Release, S0)))
Now, we want to use b to get:
Q: Holds(B, Result(Grab(B), Result(Insert, Result(Release, S0))))
but we CANNOT, because we cannot prove the precondition:
Pre: not Holds(x, Result(Insert, Result(Release, S0)))
Although the above is intuitive, it does not follow from our axioms. We need frame axioms for things that do NOT change - e.g. Insert should not cause things to be held!
The following frame axiom is incomplete, but will do for our proof:
F: forall x, s (not Holds(x, s)) -> (not Holds(x, Result(Insert, s)))
Now we can use F and c to prove Pre, and can then use Pre and b to get Q. But what about object A? We also need to prove:
Q1: Holds(A, Result(Grab(B), Result(Insert, Result(Release, S0))))
which looks like it should follow from b1 and Q, except that we cannot prove:
Pre1: Inside(A, B, Result(Grab(B), Result(Insert, Result(Release, S0))))
We need another frame axiom, for "Inside" (also intuitive, but it needs to be stated!):
forall x, y, z, s Inside(x, y, s) -> Inside(x, y, Result(Grab(z), s))
and THEN we can prove Pre1 (from R1), and then Q1 (from b1, using Q and Pre1)!
Now from Q and Q1 and axiom d for "success" we finally get what we need.
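The completed axiom set can also be cross-checked by direct simulation. The following minimal Python sketch (an illustration, not part of the original answer; the function names simply mirror the actions and fluents) applies the sequence Release, Insert, Grab(B) to the initial situation and tests Success:

```python
# Sketch: simulate the robot domain of question 6.
# State = (holds, inside): the set of items held, and a set of
# (a, b) pairs meaning "a is inside b". Frame behavior is implicit:
# facts not touched by an action simply carry over.

def release(holds, inside):
    # Release: afterwards the robot holds nothing; Inside facts persist.
    return set(), set(inside)

def insert(holds, inside):
    # Insert: puts A inside B, but only if the robot holds no item
    # (axiom a's precondition); otherwise no effect.
    if holds:
        return holds, inside
    return set(holds), inside | {("A", "B")}

def grab(item, holds, inside):
    # Grab(item): only if the robot holds nothing; it then holds the item
    # and, per axiom b1, everything inside it (post-action Inside facts).
    if holds:
        return holds, inside
    new_holds = {item} | {x for (x, y) in inside if y == item}
    return new_holds, set(inside)

def success(holds):
    # Axiom d: Success holds when the robot holds both A and B.
    return "A" in holds and "B" in holds

# Initial situation S0: the robot holds only item A.
holds, inside = {"A"}, set()
holds, inside = release(holds, inside)       # holds nothing
holds, inside = insert(holds, inside)        # A is now inside B
holds, inside = grab("B", holds, inside)     # holds B, and A inside it
print(success(holds))  # -> True
```

The simulation reaches Success exactly because each action's precondition is satisfied at the point it is applied, which is what the frame axioms F and the Inside/Grab axiom let us establish in the formal proof.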