Introduction to Artificial Intelligence
Assignment 6 - solutions
Written assignment: inference and learning
1) Write down the axioms and facts for example below in FOPC:
The taxonomic hierarchy for furniture contains chairs, which
in turn contain stuffed chairs and plastic chairs (which are
disjoint). All chairs have 4 legs. Stuffed chairs have padding.
All chairs with padding are comfortable, unless they are broken.
The item of furniture in my office, which I call
"the contraption", is a stuffed chair.
1. (forall x) Chair(x) => Furniture(x)
2. (forall x) Stuffed_chair(x) => Chair(x)
3. (forall x) Plastic_chair(x) => Chair(x)
4. (forall x) ~Plastic_chair(x) v ~Stuffed_chair(x)
5. (forall x) Chair(x) =>
(exists g) Set(g) ^ Cardinality(g) = 4 ^
(forall l) In(l, g) => (Leg(l) ^ Has_part(x, l))
6. Greater(4, 1)
6.1 (forall x, y, z) Greater(x, z) ^ (y=x) => Greater(y, z)
; Ad-hoc axiom for equality
7. (forall x) Stuffed_chair(x) => Has_padding(x)
8. (forall x) Chair(x) ^ Has_padding(x) ^ ~Broken(x) => Comfortable(x)
9. Stuffed_chair(the contraption)
a) Convert the knowledge base into normal form
Let us convert each item separately (so as not to have
too many variable names). Symbols prefixed with ? are variables.
1. ~Chair(?x) v Furniture(?x)
2. ~Stuffed_chair(?x) v Chair(?x)
3. ~Plastic_chair(?x) v Chair(?x)
4. ~Plastic_chair(?x) v ~Stuffed_chair(?x)
5.1. ~Chair(?x) v Set(legs-of(?x))
5.2. ~Chair(?x) v Cardinality(legs-of(?x))=4
5.3. ~Chair(?x) v ~In(?l, legs-of(?x)) v Leg(?l)
5.4. ~Chair(?x) v ~In(?l, legs-of(?x)) v Has_part(?x, ?l)
where legs-of(?x) is a Skolem function.
6. Greater(4, 1)
6.1 ~Greater(?x, ?z) v ~(?y=?x) v Greater(?y, ?z)
7. ~Stuffed_chair(?x) v Has_padding(?x)
8. ~Chair(?x) v ~Has_padding(?x) v Broken(?x) v Comfortable(?x)
9. Stuffed_chair(the contraption)
b) Show how this knowledge-base should be indexed - using
a table.
Not very interesting, so this is skipped...
c) Prove or disprove by using resolution (or argue that neither
can be done):
1) The contraption is a chair.
Can be proved. Add:
10. ~Chair(the contraption)
and now: resolve 10 with 2, ?x=the contraption to get:
11. ~Stuffed_chair(the contraption)
Now resolve 11 with 9 to get an empty clause (contradiction)
2) The contraption is comfortable.
Cannot be proved or disproved, since the only clause which
mentions "Comfortable" also has "Broken", and there is
no other clause which mentions "~Broken".
3) The contraption has padding.
Can be proved. Add:
10. ~Has_padding(the contraption)
Resolve with 7. ?x=the contraption, to get:
11. ~Stuffed_chair(the contraption)
Resolve 11 with 9 to get the empty clause.
4) If the contraption is a plastic chair, then it is a dog.
Can be proved. We need to prove:
Plastic_chair(the contraption) => Dog(the contraption)
and its negation is:
Plastic_chair(the contraption) ^ ~Dog(the contraption)
which are added to the KB:
10. Plastic_chair(the contraption)
11. ~Dog(the contraption)
(actually, 11 is irrelevant and will never be used)
Now, resolve 10 with 4, ?x=the contraption to get:
12. ~Stuffed_chair(the contraption)
Resolve 12 again with 9 to get the empty clause.
5) The contraption has more than 1 leg.
Can be proved. We need to prove:
(exists g) Greater(Cardinality(g),1) ^ Set(g)
^ (forall l) In(l, g) =>
Leg(l) ^ Has_part(the contraption, l)
Its negation is:
(forall g) ~Greater(Cardinality(g),1) v ~Set(g)
v (exists l) In(l, g)
^ (~Leg(l) v ~Has_part(the contraption, l))
Converting to CNF (Skolemizing l as foo-of(?g)), and adding to the KB, we get:
10. ~Greater(Cardinality(?g),1) v ~Set(?g) v ~Leg(foo-of(?g))
v ~Has_part(the contraption, foo-of(?g))
11. ~Greater(Cardinality(?g),1) v ~Set(?g)
v In(foo-of(?g), ?g)
Resolve 9 with 2, ?x=the contraption, to get:
12. Chair(the contraption)
Resolve 12 with 5.1, 5.2., 5.3, and 5.4 each
with ?x=the contraption to get:
13. Set(legs-of(the contraption))
14. Cardinality(legs-of(the contraption)) = 4
15. ~In(?l, legs-of(the contraption)) v Leg(?l)
16. ~In(?l, legs-of(the contraption)) v Has_part(the contraption, ?l)
Resolve 14 with 6.1, where
?y=Cardinality(legs-of(the contraption)) and ?x=4 to get:
17. Greater(Cardinality(legs-of(the contraption)), ?z) v
~Greater(4, ?z)
Resolve 17 with 6, ?z=1, to get:
18. Greater(Cardinality(legs-of(the contraption)), 1)
Resolve 18 with 10, and 11 resp. with
?g=legs-of(the contraption) to get:
19. ~Set(legs-of(the contraption))
v ~Leg(foo-of(legs-of(the contraption)))
v ~Has_part(the contraption,
foo-of(legs-of(the contraption)))
20. ~Set(legs-of(the contraption))
v In(foo-of(legs-of(the contraption)),
legs-of(the contraption))
Resolve 13 with 19 and 20 resp. to get:
21. ~Leg(foo-of(legs-of(the contraption))) v
~Has_part(the contraption, foo-of(legs-of(the contraption)))
22. In(foo-of(legs-of(the contraption)), legs-of(the contraption))
Resolve 15 with 22, ?l=foo-of(legs-of(the contraption)), to get:
23. Leg(foo-of(legs-of(the contraption)))
Resolve 16 with 22, ?l=foo-of(legs-of(the contraption)), to get:
24. Has_part(the contraption, foo-of(legs-of(the contraption)))
Resolve 23 with 21, to get:
25. ~Has_part(the contraption, foo-of(legs-of(the contraption)))
Finally, resolve 24 with 25 to get the empty clause.
SHOW ALL STEPS, including the relevant unifiers, etc.
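Since every unifier in proof (5) grounds out to the terms "the contraption",
legs-of(the contraption) and foo-of(legs-of(the contraption)), the derivation
can also be checked mechanically at the ground level. Below is a minimal
Python sketch (an ad-hoc propositional encoding of clauses 2, 5.1-5.4, 6, 6.1,
9 and the negated goal 10-11; not part of the required answer) that
instantiates the clauses at those terms and runs binary resolution to
saturation:

from itertools import combinations

# Ground terms used by the unifiers in the proof (ad-hoc spellings):
C, L, F = "the-contraption", "legs-of(C)", "foo-of(L)"

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

clauses = {
    frozenset({f"~Stuffed_chair({C})", f"Chair({C})"}),                  # 2
    frozenset({f"~Chair({C})", f"Set({L})"}),                            # 5.1
    frozenset({f"~Chair({C})", f"Eq(Card({L}),4)"}),                     # 5.2
    frozenset({f"~Chair({C})", f"~In({F},{L})", f"Leg({F})"}),           # 5.3
    frozenset({f"~Chair({C})", f"~In({F},{L})", f"Has_part({C},{F})"}),  # 5.4
    frozenset({"Greater(4,1)"}),                                         # 6
    frozenset({"~Greater(4,1)", f"~Eq(Card({L}),4)",
               f"Greater(Card({L}),1)"}),                                # 6.1
    frozenset({f"Stuffed_chair({C})"}),                                  # 9
    frozenset({f"~Greater(Card({L}),1)", f"~Set({L})", f"~Leg({F})",
               f"~Has_part({C},{F})"}),                                  # 10
    frozenset({f"~Greater(Card({L}),1)", f"~Set({L})", f"In({F},{L})"}), # 11
}

def refutable(clauses):
    """Binary resolution to saturation; True iff the empty clause appears."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if neg(lit) in c2:
                    resolvent = frozenset((c1 - {lit}) | (c2 - {neg(lit)}))
                    if not resolvent:
                        return True
                    new.add(resolvent)
        if new <= clauses:
            return False
        clauses |= new

print(refutable(clauses))   # prints True: the empty clause is derivable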
d) Would using INPUT RESOLUTION only change the answers to c?
No change for proofs 1-4, but it seems that the last proof (5) would
not work with just input resolution.
e) Can you prove all the above using only either forward or
backward chaining?
First, we can represent only the Horn clauses. Everything here is in
Horn form, except for axiom 8, so all but axiom 8 can be used. Since
none of the proofs used axiom 8, and all of the clauses they used are
Horn, it appears that these proofs could have been done with
EITHER forward OR backward chaining.
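To illustrate, here is a minimal forward-chaining sketch in Python over the
ground instances of axioms 1, 2 and 7 together with fact 9 (an ad-hoc
encoding; the remaining axioms are omitted for brevity). It derives the facts
used in proofs (1) and (3):

# Ground Horn rules for the contraption, as (body, head) pairs.
rules = [
    (["Stuffed_chair"], "Chair"),        # axiom 2
    (["Stuffed_chair"], "Has_padding"),  # axiom 7
    (["Chair"], "Furniture"),            # axiom 1
]
facts = {"Stuffed_chair"}                # axiom 9

# Naive forward chaining: fire every rule whose body is satisfied, to fixpoint.
changed = True
while changed:
    changed = False
    for body, head in rules:
        if head not in facts and all(b in facts for b in body):
            facts.add(head)
            changed = True

print(facts)   # {'Stuffed_chair', 'Chair', 'Has_padding', 'Furniture'}

Backward chaining would run the same rules in the other direction, from goal
to subgoals.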
2) Consider the following variables and their immediate causes (parents):
variable   parents
-------------------
A          none
E          F
B          A
C          A
D          none
F          B, C, D
a) Is this network a poly-tree?
No, its underlying undirected graph has a cycle: A-B-F-C-A.
b) Determine the truth of the following independence
statements (using d-separation):
1) I({A}, {D} | {})
Yes; all paths from A to D go through F and are blocked at F, because
F is a converging node with no evidence at F or below it.
2) I({A}, {D} | {F})
No, because the path A-C-F-D is unblocked (at F due to the evidence
at F, and at C because it is a non-converging node with no evidence).
3) I({A}, {D} | {F, B, C})
Yes, because all paths must pass through either B or C in a
non-converging manner, and are blocked at B or C since both are
evidence nodes.
4) I({A}, {E} | {F})
Yes, F blocks all paths from A to E since F is a non-converging
evidence node on all these paths.
5) I({A}, {C, F} | {B})
No, the direct edge A-C can never be blocked.
6) I({A}, {F} | {B, C})
Yes, every path is blocked at either B or C.
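These answers can also be checked mechanically with the standard
"ancestral graph + moralization" test for d-separation. A minimal Python
sketch (ad-hoc code, using the parent lists from the table above):

from collections import deque

parents = {"A": [], "E": ["F"], "B": ["A"], "C": ["A"],
           "D": [], "F": ["B", "C", "D"]}

def ancestral(nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(X, Y, Z):
    keep = ancestral(X | Y | Z)
    # Moralize the ancestral graph: connect each node to its parents and
    # "marry" every pair of parents of a common child; drop edge directions.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = parents[n]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # Delete the evidence nodes Z and test whether X can still reach Y.
    frontier, seen = deque(X - Z), set(X - Z)
    while frontier:
        n = frontier.popleft()
        if n in Y:
            return False
        for m in adj[n] - Z:
            if m not in seen:
                seen.add(m); frontier.append(m)
    return True

queries = [({"A"}, {"D"}, set()), ({"A"}, {"D"}, {"F"}),
           ({"A"}, {"D"}, {"F", "B", "C"}), ({"A"}, {"E"}, {"F"}),
           ({"A"}, {"C", "F"}, {"B"}), ({"A"}, {"F"}, {"B", "C"})]
for X, Y, Z in queries:
    print(X, Y, Z, d_separated(X, Y, Z))

It prints True for 1), 3), 4) and 6), and False for 2) and 5), agreeing with
the answers above.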
c) Determine at least 2 different ways to convert the network into
a tree using clustering.
1) Make BC into a single node, or
2) Make BCF into a single node, or
3) Make ABC into a single node, or ...
d) Suppose that P(A) = 0.9, P(D) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8,
P(C|A) = 1, P(C|~A) = 0.5.
Find P(A| B, C, E)
A is independent of E given BC, so we can just compute P(A |B, C).
We have P(A|B,C) = P(ABC)/P(BC) = P(A)P(C|A)P(B|AC)/(P(ABC)+P(~ABC))
= P(A)P(C|A)P(B|A)/( P(A)P(C|A)P(B|A) + P(~A)P(C|~A)P(B|~A)) =
0.9*1*0.1/(0.9*1*0.1 + 0.1*0.5*0.8) = 0.09/(0.09 + 0.04) = 0.09/0.13 ~ 0.69
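The arithmetic can be verified directly; a minimal Python check of the
numbers above:

# CPT entries given in the problem statement.
P_A = 0.9
P_B_given_A, P_B_given_notA = 0.1, 0.8
P_C_given_A, P_C_given_notA = 1.0, 0.5

num = P_A * P_C_given_A * P_B_given_A                     # P(A, B, C)
den = num + (1 - P_A) * P_C_given_notA * P_B_given_notA   # P(B, C)
print(num / den)   # 0.09 / 0.13 = 0.6923..., i.e. about 0.7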
3) You need to construct a 4-input neural network that has value 1 in the
output just when exactly 2 or 3 of its inputs are 1, and 0 otherwise
(assume inputs are in {0, 1}).
a) Can this be done without hidden units?
No, this decision function is not linearly separable.
b) Show a network using threshold elements that performs
the required computation (specify the weights, too!)
With one hidden unit, this can be done. We also have 1 output unit.
The hidden unit receives all inputs, each with a weight of 1, and has a
threshold of 3.5. The output unit also receives all inputs, again
each with a weight of 1. It also receives the output of the hidden unit,
with a weight of -10. Its threshold is 1.5. Clearly it is on when 2 or
more inputs are 1, as long as the hidden unit is off - and the hidden
unit disables the output when all 4 inputs are 1.
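A quick exhaustive check of this construction in Python (a sketch assuming
threshold units that fire when the weighted input strictly exceeds the
threshold; with thresholds of 3.5 and 1.5 the comparison direction does not
matter):

from itertools import product

def step(x, threshold):
    """Threshold unit: fires iff its weighted input exceeds the threshold."""
    return 1 if x > threshold else 0

def network(inputs):
    hidden = step(sum(inputs), 3.5)              # input weights 1, threshold 3.5
    return step(sum(inputs) - 10 * hidden, 1.5)  # input weights 1, hidden weight -10

for inputs in product([0, 1], repeat=4):
    assert network(inputs) == (1 if sum(inputs) in (2, 3) else 0)
print("output is 1 exactly when 2 or 3 of the 4 inputs are 1")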
4) a) Using the perceptron learning rule (the "delta" rule), show the steps
of learning 3-input function OR with a single perceptron
and a step activation function with threshold 1. Initially all
weights are 1, and the learning rate is 0.4.
In this case, if a value EXACTLY at the threshold is considered
"more" than the threshold, the algorithm will make one pass over all
8 examples and make no changes (no error), and stop.
If we need a value strictly greater than the threshold, the examples
001, 010, and 100 will generate an error, and in each case the respective
weight will be increased to 1.4. After a second pass over the examples,
no errors remain and the algorithm stops.
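A short Python simulation of the strictly-greater-than case (assuming the
examples are presented in the order 000, 001, ..., 111; the problem does not
fix an order):

# Perceptron with a step activation: output 1 iff the weighted sum exceeds
# the threshold strictly.
examples = [((a, b, c), int(a or b or c))
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
weights, threshold, rate = [1.0, 1.0, 1.0], 1.0, 0.4

for epoch in range(10):
    errors = 0
    for x, target in examples:
        out = 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0
        if out != target:
            errors += 1
            # Delta rule: w_i <- w_i + rate * (target - out) * x_i
            weights = [w + rate * (target - out) * xi
                       for w, xi in zip(weights, x)]
    print(epoch, weights, errors)
    if errors == 0:
        break

The first pass reports 3 errors (on 001, 010 and 100) and ends with weights
(1.4, 1.4, 1.4); the second pass reports no errors and the algorithm stops.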
b) Is it possible to combine back-propagation with genetic algorithms, and
if so, how?
Yes. Use GA to learn the weights in a neural network, but for each
individual weight vector use back-propagation as a local improvement
operator (to find a local minimum) before computing the fitness.
5) a) Construct by hand (using greedy search with the best information
gain heuristic) a decision tree for the following table, a set of
examples for considering desirability of a potential employee in
a high-tech company. Data is degree level (A),
existence of work experience (B),
self-motivation (C), and last grade average (D).
input variables         decision
A    B    C    D
--------------------------------
1    T    F    85       good
1    T    F    80       good
1    T    F    70       good
1    T    T    70       good
1    F    F    90       good
1    F    T    80       bad
1    F    T    85       bad
2    T    F    70       average
2    T    F    80       average
In the initial situation, the remaining information required is:
A) If we choose A - the A=2 branch needs 0 information; the A=1 branch
has probabilities of 2/7 and 5/7 over 7 cases.
B) For B we have in one branch 3 cases with probabilities 1/3 and 2/3,
in the other 2/6 and 4/6.
C) For C we have the same remaining information as for B.
D) For D we have, for the 70 branch, 1/3 and 2/3; for the
80 branch one case of each class (1/3, 1/3, 1/3); for the 85 branch 1/2
and 1/2; and for the 90 branch we have no remaining information.
Doing the exact numbers, we get that A leaves the least remaining
information and is (locally) optimal.
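The "exact numbers" can be computed with the usual weighted-entropy measure
of remaining information; a minimal Python sketch over the table above:

from collections import Counter
from math import log2

# (A, B, C, D, decision) rows from the table above.
rows = [
    (1, 'T', 'F', 85, 'good'), (1, 'T', 'F', 80, 'good'),
    (1, 'T', 'F', 70, 'good'), (1, 'T', 'T', 70, 'good'),
    (1, 'F', 'F', 90, 'good'), (1, 'F', 'T', 80, 'bad'),
    (1, 'F', 'T', 85, 'bad'), (2, 'T', 'F', 70, 'average'),
    (2, 'T', 'F', 80, 'average'),
]

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

def remaining(attr):
    """Expected entropy of the decision after splitting on attribute attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for name, idx in [('A', 0), ('B', 1), ('C', 2), ('D', 3)]:
    print(name, round(remaining(idx), 3))

This prints roughly A: 0.67, B: 0.92, C: 0.92, D: 1.06 bits, so A indeed
leaves the least remaining information at the root.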
Now we remain with branch 1 of A. Applying B, the remaining
required information is 1/3 and 2/3 for the F branch, and 0 for
the T branch. For C we have 0 remainder for the F branch, and
1/3, 2/3 for the T branch. For D we have 0 remainder for the 70
and 90 branches, but 1/2, 1/2 for the 85 branch and the same for
the 80 branch, a total of 2 bits, which is a bit less than for
B or C, so we choose attribute D.
At the 85 branch, B and C have the same information gain, and the
same holds for the 80 branch, so at either branch we can choose
indifferently. So, the tree is:
A --- 2 --- average
 \--- 1 --- D --- 70 --- good
             |--- 90 --- good
             |--- 80 --- B --- F --- bad
             |            \--- T --- good
              \-- 85 --- B --- F --- bad
                          \--- T --- good
This tree has 7 leaves and 4 internal nodes.
b) After constructing the tree, one can try to prune nodes from the
bottom up, based on least information gain. Which node would be pruned
first in this case?
The only candidates are nodes whose only children are leaves, so there
are only 2 candidates (the two B nodes). Each has the same information
gain (1 bit for each of its 2 examples), so either is equally good for
pruning.
c) Is a more compact decision tree possible? Which tree?
In general, the above method is not optimal. In this case,
the following tree is a correct classifier:
B --- F --- C --- F --- good
|            \--- T --- bad
 \--- T --- C --- T --- good
             \--- F --- A --- 1 --- good
                         \--- 2 --- average
This tree has 4 internal nodes, but only 5 leaves, and is thus better.
(I believe it is optimal, but not sure).
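A quick Python sanity check that this 5-leaf tree classifies every row of the
table correctly (rows encoded as (A, B, C, D, decision), as above):

rows = [
    (1, 'T', 'F', 85, 'good'), (1, 'T', 'F', 80, 'good'),
    (1, 'T', 'F', 70, 'good'), (1, 'T', 'T', 70, 'good'),
    (1, 'F', 'F', 90, 'good'), (1, 'F', 'T', 80, 'bad'),
    (1, 'F', 'T', 85, 'bad'), (2, 'T', 'F', 70, 'average'),
    (2, 'T', 'F', 80, 'average'),
]

def compact_tree(a, b, c, d):
    """The 5-leaf tree: split on B, then C, then A; D is never needed."""
    if b == 'F':
        return 'good' if c == 'F' else 'bad'
    if c == 'T':
        return 'good'
    return 'good' if a == 1 else 'average'

assert all(compact_tree(*row[:4]) == row[-1] for row in rows)
print("the 5-leaf tree classifies all 9 examples correctly")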