Introduction to Artificial Intelligence
Assignment 6
Written assignment: inference and learning - SOLUTIONS
1) Find the most general unifier for the following pairs, if it exists
(variables are identifiers with a ? prefix).
Note that we assume the need to standardize apart, first (we will
"prime" the variable name in the second term, if necessary).
a) P(A, ?z, C), P(?x, ?x, ?z)
{?z|?x, ?x|A, ?z'|C} (alternately ?z|A, which is equivalent)
b) Foo(?x, Foo(?x)), Foo(Foo(?y), ?y)
Fail (occurs check)
c) Q(?y, G(A, B)), Q(G(?x, ?x), ?y)
{?y|G(?x, ?x), ?y'|G(A, B)}
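The unification procedure used above (binding variables one pair at a time, with the occurs check) can be sketched in a few lines. This is an illustrative sketch, not course-supplied code; terms are encoded as strings for constants, '?'-prefixed strings for variables, and tuples (functor, arg1, ...) for compound terms.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, s):
    # Follow variable bindings in substitution s to the representative term.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    # True if variable v occurs inside term t under substitution s.
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, s) for a in t[1:])
    return False

def unify(a, b, s=None):
    # Return the most general unifier of a and b (as a dict), or None.
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and \
       a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None
```

For pair (a), `unify(('P','A','?z','C'), ('P','?x','?x',"?z'"))` produces bindings equivalent to {?x|A, ?z|A, ?z'|C}; pair (b) fails on the occurs check.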
2) Write down the axioms and facts for example below in FOPC:
The taxonomic hierarchy for animals contains birds, which in turn
contains penguins and sparrows (which are disjoint).
All birds can fly, unless they are penguins. No penguins can fly.
Anything that can fly has wings. Poopoo is a penguin.
We can use either set notation or predicate notation for sets in the
hierarchy. In predicate notation we will have the one-argument
predicates Animal, Bird, Sparrow, and Penguin. The predicate Can-fly(x)
means x can fly, Wing(x) means x is a wing, and Has-part(x, y) means
x has part y.
The hierarchy:
(1) (forall x) Bird(x) => Animal(x)
(2) (forall x) Penguin(x) => Bird(x)
(3) (forall x) Sparrow(x) => Bird(x)
(4) (forall x) ~(Sparrow(x) ^ Penguin(x))
Other facts:
(5) (forall x) Bird(x) ^ ~Penguin(x) => Can-fly(x)
(6) (forall x) Penguin(x) => ~Can-fly(x)
(7) (forall x) Can-fly(x) => (exists y) Has-part(x, y) ^ Wing(y)
(8) Penguin(Poopoo)
a) Convert the knowledge base into normal form
Using ? as a prefix to denote variables, we get (in CNF):
(1) ~Bird(?x) v Animal(?x)
(2) ~Penguin(?x) v Bird(?x)
(3) ~Sparrow(?x) v Bird(?x)
(4) ~Sparrow(?x) v ~Penguin(?x)
Other facts:
(5) ~Bird(?x) v Penguin(?x) v Can-fly(?x)
(6) ~Penguin(?x) v ~Can-fly(?x)
(7) ~Can-fly(?x) v Has-part(?x, wing-of(?x)) ; wing-of: Skolem function
(7') ~Can-fly(?x) v Wing(wing-of(?x))
(8) Penguin(Poopoo)
b) Show how this knowledge-base should be indexed - using a table.
Using a simple predicate-based table, with clause.literal numbers
(note that since this is CNF rather than INF, we do not have
"conclusion" and "premise" columns):

Predicate    Positive       Negative
------------------------------------------
Animal       1.2
Bird         2.2, 3.2       1.1, 5.1
Penguin      8.1, 5.2       2.1, 4.2, 6.1
Sparrow                     3.1, 4.1
Can-fly      5.3            6.2, 7.1, 7'.1
Has-part     7.2
Wing         7'.2
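A sketch of how such a predicate index might be built in code (an illustration; the clause encoding is chosen here, and argument terms are omitted since the index keys only on predicate and sign):

```python
# Each clause is a list of (positive?, predicate-name) pairs,
# keyed by its clause number from the normal form above.
clauses = {
    '1':  [(False, 'Bird'), (True, 'Animal')],
    '2':  [(False, 'Penguin'), (True, 'Bird')],
    '3':  [(False, 'Sparrow'), (True, 'Bird')],
    '4':  [(False, 'Sparrow'), (False, 'Penguin')],
    '5':  [(False, 'Bird'), (True, 'Penguin'), (True, 'Can-fly')],
    '6':  [(False, 'Penguin'), (False, 'Can-fly')],
    '7':  [(False, 'Can-fly'), (True, 'Has-part')],
    "7'": [(False, 'Can-fly'), (True, 'Wing')],
    '8':  [(True, 'Penguin')],
}

def build_index(clauses):
    # predicate -> {'pos': [clause.literal, ...], 'neg': [...]}
    index = {}
    for cid, lits in clauses.items():
        for i, (pos, pred) in enumerate(lits, start=1):
            entry = index.setdefault(pred, {'pos': [], 'neg': []})
            entry['pos' if pos else 'neg'].append(f'{cid}.{i}')
    return index

index = build_index(clauses)
```

Looking up `index['Can-fly']['neg']` then directly yields the resolution candidates 6.2, 7.1, and 7'.1 from the table.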
c) Prove or disprove by using resolution (or argue that neither
can be done):
1) Poopoo has wings,
Cannot be proved or disproved. The only clauses mentioning "Wing"
or "Has-part" are 7 and 7', and using them requires Can-fly(Poopoo),
whose negation (9) is entailed by the KB; and since no clause
contains a negative Wing or Has-part literal, the statement cannot
be disproved either.
2) Poopoo can fly.
From (6) and (8) with unifier {?x|Poopoo} we get
(9) ~Can-fly(Poopoo)
so "Poopoo can fly" is disproved.
3) Poopoo is an animal.
From (1) and (2) with unifier {?x|?x'}, resolving on Bird(?x) we get:
(10) ~Penguin(?x') v Animal(?x')
Then from (8) and (10) with unifier {?x'|Poopoo} we get:
Animal(Poopoo)
4) If poopoo is a sparrow, then Poopoo cannot fly.
We need to prove: ~Sparrow(Poopoo) v ~Can-fly(Poopoo)
We will add the negation of this as:
(11) Sparrow(Poopoo)
(11') Can-fly(Poopoo)
Now resolving (9) and (11') results in the empty clause
(contradiction - proving the sentence).
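The resolution steps above can be replayed mechanically on the ground instances (variables already bound to Poopoo). A minimal sketch, with clauses encoded as frozensets of signed literals:

```python
def resolve(c1, c2):
    # Return all resolvents of two clauses (sets of (positive?, atom)).
    out = []
    for (pos, atom) in c1:
        if (not pos, atom) in c2:
            out.append(frozenset((c1 - {(pos, atom)}) |
                                 (c2 - {(not pos, atom)})))
    return out

penguin_poopoo = frozenset({(True, 'Penguin(Poopoo)')})          # (8)
clause6 = frozenset({(False, 'Penguin(Poopoo)'),
                     (False, 'Can-fly(Poopoo)')})                # (6), instantiated

# (6) x (8)  =>  (9) ~Can-fly(Poopoo)
(nine,) = resolve(clause6, penguin_poopoo)

# Negated goal for c.4: (11') Can-fly(Poopoo)
eleven_prime = frozenset({(True, 'Can-fly(Poopoo)')})

# (9) x (11')  =>  empty clause: contradiction, goal proved
(empty,) = resolve(nine, eleven_prime)
```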
d) Would using INPUT RESOLUTION only change the answers to c?
No - every resolution step used above is already an input resolution
step (each resolvent has at least one input-clause parent).
3) It is known that there is a correlation between smoking and cancer.
Medical associations claim that smoking causes cancer. Tobacco companies
claim that there is a hidden element that causes both cancer and smoking.
Draw a Bayes network for each of the above claims.
Medical associations claim network is:

    Smoking ---=> Cancer

Tobacco companies claim network is:

    Hidden element ---=> Smoking
                   \
                    ---=> Cancer
4) Consider the following variables and their immediate causes (parents):
variable parents
-------------------------------
A none
E none
B A
C A E
D A
F B C
Draw the network (assume arrows pointing downwards)
    A   E
   /|\ /
  D B C
     \|
      F
a) Is this network a poly-tree?
No. There are two distinct undirected paths between A and F
(A - B - F and A - C - F), so the underlying undirected graph
contains a cycle.
b) Determine the truth of the following independence
statements (using d-separation):
1) I({A}, {E} | {}) True (paths blocked by C, F, resp.)
2) I({A}, {E} | {B}) True (paths blocked by C, F resp.)
3) I({A}, {E} | {C}) False (path A C E unblocked)
4) I({A}, {E} | {F}) False (path A C E unblocked)
5) I({A}, {C, F} | {B}) False (path A C always unblocked)
6) I({A}, {F} | {B, C}) True (paths blocked by B, C, resp.)
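These d-separation answers can be checked by brute force: enumerate every simple undirected path and test the blocking rules (a non-collider on the path is in Z, or a collider has neither itself nor any descendant in Z). A sketch that is fine for this tiny graph, though exponential in general:

```python
parents = {'A': [], 'E': [], 'B': ['A'], 'C': ['A', 'E'],
           'D': ['A'], 'F': ['B', 'C']}
children = {n: [m for m in parents if n in parents[m]] for n in parents}

def descendants(n):
    out, stack = set(), list(children[n])
    while stack:
        m = stack.pop()
        if m not in out:
            out.add(m)
            stack.extend(children[m])
    return out

def undirected_paths(x, y):
    # All simple paths from x to y in the underlying undirected graph.
    paths, stack = [], [[x]]
    while stack:
        p = stack.pop()
        if p[-1] == y:
            paths.append(p)
            continue
        for m in set(parents[p[-1]]) | set(children[p[-1]]):
            if m not in p:
                stack.append(p + [m])
    return paths

def blocked(path, z):
    # A path is blocked if some interior node blocks it.
    for i in range(1, len(path) - 1):
        a, b, c = path[i - 1], path[i], path[i + 1]
        if b in children[a] and b in children[c]:       # collider at b
            if b not in z and not (descendants(b) & z):
                return True
        elif b in z:                                    # chain or fork
            return True
    return False

def d_separated(x, y, z):
    return all(blocked(p, set(z)) for p in undirected_paths(x, y))
```

For example, `d_separated('A', 'E', {'F'})` is False because F is a descendant of the collider C, which opens the path A -> C <- E, matching statement 4.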
c) Determine at least 2 different ways to convert the network into
a poly-tree using clustering.
1) Make B-C a cluster.
2) Make A-B-C-F a cluster.
d) Suppose that P(A) = 0.9, P(E) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8.
Find P(A|E)
Since I({A}, {E} | {}) holds, P(A|E) = P(A) = 0.9 (the values given
for B are not needed).
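This can also be confirmed numerically: only A, E, and B have given CPTs, and the remaining nodes marginalize out of P(A, E), so summing the joint over B suffices. A small sanity check:

```python
pA, pE = 0.9, 0.3
pB = {True: 0.1, False: 0.8}        # P(B=true | A=a)

def pb(a, b):
    # P(B=b | A=a)
    return pB[a] if b else 1 - pB[a]

# P(A=true, E=true), summing over B
p_AE = sum(pA * pE * pb(True, b) for b in (True, False))
# P(E=true), summing over A and B
p_E = sum((pA if a else 1 - pA) * pE * pb(a, b)
          for a in (True, False) for b in (True, False))
p_A_given_E = p_AE / p_E            # equals P(A) = 0.9
```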
5) You need to construct a 3-input neural network that has value 1 in the
output just when exactly 2 of its inputs are 1, and 0 otherwise
(assume inputs are in {0, 1}).
a) Can this be done without hidden units?
No - the function is not linearly separable: summing the constraints
w.x >= theta for the three "exactly 2" inputs gives
2(w1+w2+w3) >= 3*theta, while input 111 requires w1+w2+w3 < theta
and input 000 requires theta > 0 - a contradiction.
b) Show a network using threshold elements that performs
the required computation (specify the weights, too!)
Use a hidden unit that is on if ALL of its inputs are 1
(for example, input weights 1 and threshold 2.5).
Use one output unit with input weight 1 from each of the three
input-layer neurons, weight -10 on the arc from the hidden unit,
and threshold 1.5.
Output will be 1 if AT LEAST 2 of its inputs are 1, but NOT when
all 3 inputs are 1, which is exactly what we need.
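The construction can be verified exhaustively over all 8 input combinations. A sketch of the network as described (hidden AND unit with threshold 2.5; output unit with weights 1, 1, 1, -10 and threshold 1.5):

```python
from itertools import product

def step(x, theta):
    # Threshold element: fires iff weighted sum exceeds the threshold.
    return 1 if x > theta else 0

def exactly_two(x1, x2, x3):
    hidden = step(x1 + x2 + x3, 2.5)               # 1 iff all three inputs are 1
    return step(x1 + x2 + x3 - 10 * hidden, 1.5)   # "at least 2, but not all 3"

for bits in product((0, 1), repeat=3):
    assert exactly_two(*bits) == (1 if sum(bits) == 2 else 0)
```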
6) Using the perceptron learning rule (the "delta" rule), show the steps
of learning the function OR with a single perceptron
and a step activation function with threshold 1. Initially all
weights are 1, and the learning rate is 0.4.
If the threshold is such that a weighted sum EQUAL to the threshold
causes an output of 1, a single epoch finds no errors and the
perceptron is already trained. If the output is 1 only when the sum
is STRICTLY greater than 1, then in the first epoch the examples 01
and 10 each cause an error, increasing the respective weights to 1.4,
after which no further changes occur.
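The strict-threshold trace can be reproduced in code. A sketch of the perceptron rule with the stated parameters (initial weights 1, 1; learning rate 0.4; threshold 1, strict comparison):

```python
def train_or(epochs=10, eta=0.4, theta=1.0):
    w = [1.0, 1.0]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(epochs):
        changed = False
        for (x1, x2), target in data:
            out = 1 if w[0] * x1 + w[1] * x2 > theta else 0   # strict >
            err = target - out
            if err:
                w[0] += eta * err * x1   # delta rule update
                w[1] += eta * err * x2
                changed = True
        if not changed:                  # a clean epoch: training done
            break
    return w

# After training, both weights have become 1.4.
weights = train_or()
```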
7) Is it possible to combine back-propagation with genetic algorithms, and
if so, how?
Yes: use back-propagation as a local improvement operator on elements
of the population, for example just before computing fitness in each
iteration.
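One possible shape of such a combination, sketched on a toy one-parameter problem (the gradient step stands in for back-propagation; all names and parameters here are illustrative, not from the course):

```python
import random

def loss(w):
    # Stand-in for network error; minimum at w = 3.
    return (w - 3.0) ** 2

def local_improve(w, steps=3, lr=0.1):
    # "Back-propagation" step: a few gradient-descent updates.
    for _ in range(steps):
        w -= lr * 2 * (w - 3.0)
    return w

def memetic_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        pop = [local_improve(w) for w in pop]   # improve before fitness
        pop.sort(key=loss)                      # fitness-based selection
        survivors = pop[:pop_size // 2]
        # Refill with mutated copies of the survivors.
        pop = survivors + [w + rng.gauss(0, 0.5) for w in survivors]
    return min(pop, key=loss)

best = memetic_ga()
```

Because the improved weights are written back into the individuals before selection, this is the Lamarckian variant of the combination.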
8) a) Construct by hand a decision tree for the following table,
where the first decision is based on A, then if necessary B, etc.
input variables decision
A B C D
--------------------------------------------
T 1 F 5 fly
T 1 F 4 fly
T 1 F 3 fly
T 1 T 3 fly
T 2 F 3 walk
T 2 F 4 walk
F 1 F 2 fly
F 1 T 4 crawl
F 1 T 5 crawl
A ---(true)--- B ---(1)--- fly
 \              \
  \              --(2)--- walk
   \
    --(false)-- C --(false)-- fly
                 \
                  --(true)--- crawl
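The tree can be written directly as a classifier and checked against all nine rows of the table (note that D is never consulted):

```python
def classify(a, b, c, d):
    if a:                                    # root test: A
        return 'fly' if b == 1 else 'walk'   # A=true branch tests B
    return 'crawl' if c else 'fly'           # A=false branch tests C

table = [
    (True,  1, False, 5, 'fly'),
    (True,  1, False, 4, 'fly'),
    (True,  1, False, 3, 'fly'),
    (True,  1, True,  3, 'fly'),
    (True,  2, False, 3, 'walk'),
    (True,  2, False, 4, 'walk'),
    (False, 1, False, 2, 'fly'),
    (False, 1, True,  4, 'crawl'),
    (False, 1, True,  5, 'crawl'),
]
for a, b, c, d, label in table:
    assert classify(a, b, c, d) == label
```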
b) Which node will be pruned first, based on least information gain?
Nodes B and C have the same information gain PER EXAMPLE ENCOUNTERED
AT THE NODE. However, C is reached by fewer examples (3 vs. 6), so it
contributes less total gain and should be pruned first.
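The equal-gain claim is easy to check: both splits are pure, so each node's gain equals its class entropy, and the ratios 4:2 and 1:2 give the same entropy. A quick computation:

```python
from math import log2

def entropy(counts):
    # Entropy (in bits) of a class distribution given as counts.
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

gain_B = entropy([4, 2])   # node B: 4 fly / 2 walk, split on B is pure
gain_C = entropy([1, 2])   # node C: 1 fly / 2 crawl, split on C is pure
```

Both gains come out to about 0.918 bits per example, but node B is applied to 6 of the 9 training examples and node C to only 3.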
c) Is a more compact decision tree possible (possibly using a different
ordering of the variables)? Which tree?
Since there is no subset of 2 attributes which uniquely determine
the decision variable, every decision tree must have 3 or more
non-leaf nodes, thus the above tree is optimal.