Question 1:

a)
1. (forall x) Pirate(x) => (exists y) Has_part(x, y) ^ Leg(y) ^ (Material(y) = Wood)
2. (forall x) (Material(x) = Wood) => Floats(x)
3. ~Floats(Joe) => Pirate(Joe)
4. (forall x) ((exists y) Has_part(x, y) ^ Floats(y)) => Floats(x)

b)
1.1 ~Pirate(?x) v Has_part(?x, Leg-of(?x))
1.2 ~Pirate(?x) v Leg(Leg-of(?x))
1.3 ~Pirate(?x) v (Material(Leg-of(?x)) = Wood)
where Leg-of(.) is a Skolem function with 1 argument.
2. ~(Material(?x) = Wood) v Floats(?x)
3. Floats(Joe) v Pirate(Joe)
4. ~Has_part(?x, ?y) v ~Floats(?y) v Floats(?x)

c) We need to prove Floats(Joe), so we add its negation ~Floats(Joe) to the KB. It is already a (unit) CNF clause.

d) Resolve the above unit clause with 3 to get:
5. Pirate(Joe)
Now resolve 5 with 1.1 and with 1.3, unifier ?x = Joe, to get, respectively:
6.1 Has_part(Joe, Leg-of(Joe))
6.3 Material(Leg-of(Joe)) = Wood
From 6.3 and 2, unifier ?x = Leg-of(Joe), we get:
7. Floats(Leg-of(Joe))
From 7 and 4, unifier ?y = Leg-of(Joe), we get:
8. ~Has_part(?x, Leg-of(Joe)) v Floats(?x)
which means: "anything that has Joe's leg as a part floats".
From 6.1 and 8, unifier ?x = Joe, we get:
9. Floats(Joe)
Resolving 9 with the goal unit clause ~Floats(Joe) results in the empty clause, completing the proof.

Question 2:

a) The network is NOT a polytree, as A-B-D-C-A is a cycle in the underlying undirected graph. The smallest clustering resulting in a polytree merges B and C into a single macro-node BC, giving (with arrows directed down and to the right):

      F --- G
      |
A --- BC --- D --- E

The rest of the answers pertain to the original network.

b) B is independent of G, as it is d-separated from G given no evidence. There are only 2 simple undirected paths between them, both blocked, as follows:
1) B - A - C - F - G
2) B - D - C - F - G
In the 1st path, C is a blocking node: it is a converging node with no evidence at C or its descendants (because there is no evidence at all). In the 2nd path, C is NOT a blocking node, but D is, since D is a converging node and there is no evidence.
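As a check of Question 1 (d), the derivation can be verified mechanically at the propositional level. The sketch below grounds the clauses from part (b) with the only relevant terms (?x = Joe, ?y = Leg-of(Joe)), abbreviates each ground atom by a letter, and saturates the set under binary resolution; the abbreviations and helper names are illustrative, not from the exam.

```python
from itertools import combinations

# Ground instances of the CNF clauses from Question 1 (b), plus the
# negated goal.  Abbreviations (illustrative): P = Pirate(Joe),
# H = Has_part(Joe, Leg-of(Joe)), W = (Material(Leg-of(Joe)) = Wood),
# FL = Floats(Leg-of(Joe)), FJ = Floats(Joe).  "~" marks negation.
clauses = {
    frozenset({"~P", "H"}),          # 1.1
    frozenset({"~P", "W"}),          # 1.3
    frozenset({"~W", "FL"}),         # 2
    frozenset({"FJ", "P"}),          # 3
    frozenset({"~H", "~FL", "FJ"}),  # 4
    frozenset({"~FJ"}),              # negated goal
}

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def derives_empty_clause(clauses):
    """Saturate the clause set under binary resolution; return True
    iff the empty clause is derived (i.e., the KB entails Floats(Joe))."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if negate(lit) in c2:
                    resolvent = (c1 - {lit}) | (c2 - {negate(lit)})
                    if not resolvent:
                        return True  # empty clause: contradiction found
                    new.add(frozenset(resolvent))
        if new <= clauses:
            return False  # saturated without contradiction
        clauses |= new

print(derives_empty_clause(clauses))  # True
```

The saturation follows the same steps as the hand proof (5 through 9), just without imposing an order on the resolutions.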
c) With evidence at D, both paths above become unblocked, and thus B becomes dependent on G given D.

d) To compute P(E | B, C, G), first observe that E is independent of G given {B, C}, since every undirected path from G to E contains either B or C as a non-converging node with evidence, thus blocking the path. So it suffices to compute:
P(E | B, C) = P(E, D | B, C) + P(E, ~D | B, C)    (marginalize over D)
            = P(E | D, B, C) P(D | B, C) + P(E | ~D, B, C) P(~D | B, C)    (chain rule)
            = P(E | D) * 0.9 + P(E | ~D) * 0.1    (E is d-separated from {B, C} given D; plug in the given probabilities)
            = 0.9 * 0.9 + 0.5 * 0.1
            = 0.86

e) A is d-separated from F given no evidence, as every path between them contains either C or D as a converging node, and there is no evidence. Thus:
P(A, ~F) = P(A) P(~F) = 0.8 * (1 - 0.1) = 0.72

Question 3:

a) False - finding the optimal exact decision tree is NP-hard, independent of which algorithm we use!
b) False - backpropagation is a form of gradient descent, which may get stuck in a local minimum and thus fail to achieve 0 errors.
c) True - all that is needed is a fitness measure, or even a live/die outcome or any other feedback from the world. No teacher is necessary.
d) True - the forward-chaining algorithm adds consequences of new facts, as well as the facts asserted by the "user".
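As a sanity check, the arithmetic in Question 2 (d) and (e) can be replayed directly. The numbers below are the ones quoted in the solution text; the CPT values themselves come from the exam's network, which is not reproduced here.

```python
# (d) P(E | B, C) = P(E | D) P(D | B, C) + P(E | ~D) P(~D | B, C)
p_E_given_D, p_E_given_notD = 0.9, 0.5
p_D_given_BC = 0.9
p_E_given_BC = (p_E_given_D * p_D_given_BC
                + p_E_given_notD * (1 - p_D_given_BC))
print(round(p_E_given_BC, 2))  # 0.86

# (e) P(A, ~F) = P(A) P(~F), by d-separation of A and F
p_A, p_F = 0.8, 0.1
p_A_and_notF = p_A * (1 - p_F)
print(round(p_A_and_notF, 2))  # 0.72
```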