Introduction to Artificial Intelligence

Assignment 6 - Solutions



1) Use situation calculus (i.e. the result function, and fluents) to
   write down the axioms for the monkey-and-bananas example. The monkey
   needs to get the bananas, which can be done by climbing onto a box after
   moving it into position below the bananas, which are too high to
   reach otherwise. In the initial state, the box is in the room but
   not under the bananas, and the monkey does not have the bananas. Actions
   are:
    a) grab - gets the bananas if the monkey is on the box and the box is
              under the bananas
    b) climb - puts the monkey on the box if the monkey is near the box
    c) move(item, position) - moves item into position if the item is
               a box and the monkey is not on it.

    Fluents (time-dependent predicates) are: 
        box_at(location)  (location can be under-bananas or elsewhere)
        on-box            (monkey is on box)
        has-bananas       (monkey has the bananas)

    Now, you need to prove that, for the initial situation S where the monkey
    is not on the box and does not have the bananas, the action sequence
    (move, climb, grab) works, i.e. that: (note - the exercise states this
    using the "holds" predicate, an alternative form not used in the book,
    so the answer is given in the terms used in the book). The book adds one
    argument to each fluent denoting the situation, and we will do likewise.

        has-bananas(result(grab, result(climb, 
                                 result(move(box, under-bananas), S))))

    Can you do it with your original formulation (show the proof!).
    If not, add the requisite axioms and complete the proof.

Answer:

Grab action axiom:
 forall (S)  (on-box(S) and box_at(under-bananas, S)) =>
       has-bananas(result(grab, S))

Climb action axiom:
 forall (S) (not on-box(S)) => on-box(result(climb, S))

Note - we did not have a predicate for the monkey's location - unless it is
on the box! Let us assume that if the monkey is not on the box it is near
the box - by no means an obvious (or correct!) assumption.

Move action axiom:
 forall (S, I, P) ((not on-box(S)) and box(I)) =>
    box_at(P, result(move(I, P), S))

Note that this representational commitment is not so good, but it is what
we had... what the above means is that if we move anything that is a box to P,
then it will be at P... this allows for only one box, and also does not allow
for moving anything but a box...

Initial state (we will call the initial situation S0):
1. not on-box(S0) 
2. not has-bananas(S0)
3. not box_at(under-bananas, S0)
 box_at(in-room, S0)     ; (Irrelevant fact)

0. box(box)


To prove the theorem with resolution we need to first convert it to normal
form. However, before we do, note that we will have a problem. We have not
axiomatized:
a) that the box will stay under the bananas in a situation resulting
   from the "climb" action! (also for "grab", but we do not need that
   for the theorem)
b) that the monkey will not be on the box as a result of applying move
   (note that if the monkey "magically" appears on the box as a result of
   move, the climb action axiom will not work...)
We need to add these 2 FRAME AXIOMS:

 forall (S) box_at(under-bananas, S) =>
            box_at(under-bananas, result(climb, S))

 forall (S, I, P) (not on-box(S)) => not on-box(result(move(I, P), S))

Now, add the CNF versions of the above axioms (variables have ? in front):

4. (not on-box(?S)) or (not box_at(under-bananas,?S)) or 
      has-bananas(result(grab,?S))

5. on-box(?S1) or on-box(result(climb,?S1))

6. on-box(?S2) or (not box(?I)) or box_at(?P, result(move(?I, ?P), ?S2))

Now, convert the frame axioms:

7. (not box_at(under-bananas,?S3)) or box_at(under-bananas, result(climb,?S3))

8. on-box(?S4) or (not on-box(result(move(?I1, ?P2), ?S4)))

Finally (optionally) add the negation of the theorem to prove (the initial
situation S is our S0):

9.  not has-bananas(result(grab, result(climb, 
                                 result(move(box, under-bananas), S0))))

Proof is as follows:

Resolve 0 with 6, ?I=box, to get:
10: on-box(?S2) or box_at(?P, result(move(box, ?P), ?S2))

Resolve 1 with 10, ?S2=S0 to get:
11:  box_at(?P, result(move(box, ?P), S0))

Resolve 8 with 1, ?I1=box, ?S4=S0, to get:
12: not on-box(result(move(box, ?P2), S0))

Resolve 5 with 12, ?S1=result(move(box, ?P2), S0) to get:
13: on-box(result(climb,result(move(box, ?P2), S0)))

Resolve 4 with 13, ?S=result(climb,result(move(box, ?P2), S0)), get:
14: (not box_at(under-bananas,result(climb,result(move(box, ?P2), S0)))) or 
      has-bananas(result(grab,result(climb,result(move(box, ?P2), S0))))

Resolve 7 with 11, ?S3=result(move(box, ?P), S0), ?P=under-bananas, to get:
15: box_at(under-bananas, result(climb,result(move(box, under-bananas), S0)))

Resolve 14 with 15, ?P2=under-bananas, to get:
16: has-bananas(result(grab,result(climb,result(move(box, under-bananas),S0))))

This constitutes a proof. Now for a proof by contradiction, assume 9 was
added, and now resolve 16 with 9 to get a contradiction.
Note how the frame axioms 7 and 8 were needed in the proof.
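
As an extra check (not part of the required resolution proof), the axioms can
also be exercised by a forward simulation. This is a minimal Python sketch
under the assumption that a situation is just the set of fluents holding in
it; fluents persisting across actions play the role of the frame axioms:

    # Situations are frozen sets of fluents; S0 has none of the three fluents.
    S0 = frozenset()

    def result(action, s):
        s = set(s)
        if action == ('move', 'box', 'under-bananas'):
            if 'on-box' not in s:                      # move action axiom
                s.add('box-under-bananas')
        elif action == 'climb':
            if 'on-box' not in s:                      # climb action axiom
                s.add('on-box')
        elif action == 'grab':                         # grab action axiom
            if 'on-box' in s and 'box-under-bananas' in s:
                s.add('has-bananas')
        return frozenset(s)                            # untouched fluents persist

    s = result('grab', result('climb',
               result(('move', 'box', 'under-bananas'), S0)))
    assert 'has-bananas' in s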

2) Write down the axioms and facts for the example below in FOPC:
   The taxonomic hierarchy for animals contains dogs, which
   in turn contain German shepherds and chihuahuas (which are
   disjoint). All dogs are carnivorous.
   German shepherds are large. All large dogs can be guard dogs,
   unless they are old. Fido is an old German shepherd.


Answers: we will introduce "class" predicates with one argument, denoting
the obvious. Taxonomic hierarchy axioms:
  forall (X) dog(X) => animal(X)
  forall (X) german-shepherd(X) => dog(X)
  forall (X) chihuahua(X) => dog(X)
  forall (X) (not german-shepherd(X)) or (not chihuahua(X))  ; disjointness

Property axioms:
  forall (X) dog(X) => carnivorous(X)
  forall (X) german-shepherd(X) => large(X)
  forall (X) (large(X) and dog(X) and (not old(X))) => can-be-guard-dog(X)

Specific facts about Fido:
a.  old(Fido)
b.  german-shepherd(Fido)
  
   a) Convert the knowledge base into normal form

Conversion, axiom by axiom:
1. (not dog(?X)) or  animal(?X)
2. (not german-shepherd(?X1)) or dog(?X1)
3. (not chihuahua(?X2)) or dog(?X2)
4. (not german-shepherd(?X3)) or (not chihuahua(?X3))
5. (not dog(?X4)) or carnivorous(?X4)
6. (not german-shepherd(?X5)) or large(?X5)
7. (not large(?X6)) or (not dog(?X6)) or old(?X6) or can-be-guard-dog(?X6)


   b) Show how this knowledge-base should be indexed - using
      a table. (different from the book - we used CNF, they used INF, but
      either is OK). We show POINTERS (e.g. 2.1 means sentence 2, literal 1)

Table is:

Name             |  positives  |  negatives      |  argument
----------------------------------------------------------------
Fido             |             |                 |  a.1, b.1
animal           |  1.2        |                 |
large            |  6.2        |  7.1            |
dog              |  2.2, 3.2   |  1.1, 5.1, 7.2  |
chihuahua        |             |  3.1, 4.1       |
german-shepherd  |  b.1        |  2.1, 4.1, 6.1  |
can-be-guard-dog |  7.4        |                 |
old              |  a.1, 7.3   |                 |
VARIABLE         |             |                 |  1.1, 1.2, ... more...
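
In a program, the positive/negative part of this table might be kept as a
dictionary from predicate name to occurrence lists. A minimal Python sketch
using the pointer convention above (the argument index would be a similar map
from constants to occurrences):

    index = {
        'animal':           (['1.2'], []),
        'large':            (['6.2'], ['7.1']),
        'dog':              (['2.2', '3.2'], ['1.1', '5.1', '7.2']),
        'chihuahua':        ([], ['3.1', '4.1']),
        'german-shepherd':  (['b.1'], ['2.1', '4.1', '6.1']),
        'can-be-guard-dog': (['7.4'], []),
        'old':              (['a.1', '7.3'], []),
    }
    # To resolve on a literal, look up occurrences of the opposite sign:
    positives, negatives = index['dog']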



   c) Prove or disprove by using resolution  (or argue that neither 
      can be done):
      1) Fido is an animal.
Resolve b with 2, ?X1=Fido to get: 
8: dog(Fido), 
then 8 with 1, ?X=Fido to get animal(Fido)

      2) Fido is large.
Resolve b with 6, ?X5=Fido to get
9: large(Fido)

      3) Fido can be a guard dog.
Cannot be done, because resolving things with 7 can get us at most:
 old(Fido) or can-be-guard-dog(Fido)
and nothing more specific is available. Since we do not have (not old(Fido))
(in fact we have the opposite) we cannot prove or disprove the desired
claim.

      4) If Fido is a chihuahua, then Fido is young.
Can be proved as follows: Add the negation of the claim:
100: chihuahua(Fido) 
101: (not young(Fido))
Resolve 100 with 4, ?X3=Fido to get:
102: not german-shepherd(Fido)
Resolve 102 with b to get the empty clause (contradiction).
Note that 101 was never used: the KB already entails that Fido is not a
chihuahua, so the implication holds vacuously.

      5) Fido is carnivorous.
Resolve 8 with 5, ?X4=Fido to get:
10: carnivorous(Fido)
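
Since every query here is about the ground term Fido, the resolution steps in
c) can be replayed mechanically over ground clauses. A minimal sketch for
query 1, with clauses as frozensets of (sign, atom) literals and the KB
instantiated at Fido:

    def resolve(c1, c2):
        # All binary resolvents of two ground clauses.
        return [(c1 - {(s, a)}) | (c2 - {(not s, a)})
                for s, a in c1 if (not s, a) in c2]

    b       = frozenset({(True, 'german-shepherd(Fido)')})
    clause2 = frozenset({(False, 'german-shepherd(Fido)'), (True, 'dog(Fido)')})
    clause1 = frozenset({(False, 'dog(Fido)'), (True, 'animal(Fido)')})

    (dog_fido,) = resolve(b, clause2)            # "resolve b with 2"
    (animal_fido,) = resolve(dog_fido, clause1)  # "resolve 8 with 1"
    assert animal_fido == frozenset({(True, 'animal(Fido)')})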

   d) Would using INPUT RESOLUTION only change the answers to c?
No, because at every step above we had at least 1 clause from the KB or
the query, so this was input resolution.

   e) Can you prove all the above using only either forward or
      backward chaining?
This depends on how the rules are encoded. If the direction of the => is
as in the original, we can get all the above results, except 4, for which we
need to have german-shepherd(?X) => not chihuahua(?X).
If the direction is not consistent, we may not be able to prove some
of the claims above. Obviously, for claim 3 we still cannot prove anything!

3) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        E               none
        B               A
        C               A E
        D               A
        F               B C

    a) Is this network a poly-tree?

        No. The underlying undirected graph has cycles, e.g.  A-B-F-C-A

    b) Determine the truth of the following independence
       statements (using d-separation):

       1)  I({A}, {E} | {})         true - the path A->C<-E is blocked, as C
                                    is converging and there is null evidence;
                                    the longer path A-B-F-C-E is blocked at
                                    the converging node F.

       2)  I({A}, {E} | {B})        true - same reasons; B is not a
                                    descendant of C (or of F), and evidence
                                    at B also blocks the path through F.

       3)  I({A}, {E} | {C})        false - the path through C becomes
                                    unblocked by the evidence.

       4)  I({A}, {E} | {F})        false - the same path is unblocked, as F
                                    is an evidence node and a child of the
                                    converging node C.

       5)  I({A}, {C, F} | {B})     false - because C is a child of A
                                    (a path with 1 arc is always unblocked)

       6)  I({A}, {F} | {B, C})     true - both paths (through B and through
                                    C) "pass through" an evidence node at a
                                    non-converging position.
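
       These judgments can be double-checked mechanically, e.g. with networkx
       (a sketch assuming networkx 2.8+, where nx.d_separated is available;
       in the newest versions the call is nx.is_d_separator):

           import networkx as nx

           G = nx.DiGraph([('A', 'B'), ('A', 'C'), ('E', 'C'),
                           ('A', 'D'), ('B', 'F'), ('C', 'F')])

           for x, y, z in [({'A'}, {'E'}, set()),     # statements 1-6 above
                           ({'A'}, {'E'}, {'B'}),
                           ({'A'}, {'E'}, {'C'}),
                           ({'A'}, {'E'}, {'F'}),
                           ({'A'}, {'C', 'F'}, {'B'}),
                           ({'A'}, {'F'}, {'B', 'C'})]:
               print(x, y, z, nx.d_separated(G, x, y, z))
           # prints: True, True, False, False, False, True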

    c) Determine at least 2 different ways to convert the network into
       a poly-tree using cutset conditioning.

       {A}, and all supersets of {A}. Also {B} and {C} are cutsets
       (each cuts the loop A-B-F-C-A).


    d) Suppose that P(A) = 0.9, P(E) = 0.3, P(B|A) = 0.1, P(B|~A)  = 0.8.
       Find P(A|E)

       We have I({A}, {E} | {}), so P(A|E) = P(A) = 0.9
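
       Only P(A), P(E), P(B|A) and P(B|~A) are given, and by the independence
       the marginal P(A) is all we need. A minimal enumeration sketch
       confirming this - the CPT for C is NOT given in the exercise, so
       arbitrary placeholder numbers are used; by I({A}, {E} | {}) the result
       is 0.9 no matter what they are (D and F are leaves and sum out):

           PA, PE = 0.9, 0.3
           PB = {1: 0.1, 0: 0.8}               # P(B=1 | A=a), as given
           PC = {(1, 1): 0.25, (1, 0): 0.65,   # P(C=1 | A=a, E=e): placeholders!
                 (0, 1): 0.40, (0, 0): 0.80}

           def joint(a, e, b, c):
               p = (PA if a else 1 - PA) * (PE if e else 1 - PE)
               p *= PB[a] if b else 1 - PB[a]
               p *= PC[a, e] if c else 1 - PC[a, e]
               return p

           p_e = sum(joint(a, 1, b, c)
                     for a in (0, 1) for b in (0, 1) for c in (0, 1))
           p_ae = sum(joint(1, 1, b, c) for b in (0, 1) for c in (0, 1))
           print(p_ae / p_e)                   # -> 0.9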


5) You need to construct a 3-input neural network that has value 1 in the 
   output just when exactly 1 of its inputs is 1, and 0 otherwise
   (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?

      Cannot be done. That is because the plane in 3-space through the
      points (0,0,1), (0,1,0), (1,0,0), which are the cases where the
      output has to be 1, has cases requiring an answer of 0 both above
      it (e.g. (1,1,0)) and below it ((0,0,0)), thus at least 2 separating
      planes are needed.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

      In order to generate the additional plane, we need at least 1 hidden
      unit, as in the XOR problem. In summary, we need 2 units. Unit 1
      takes all inputs, with weight 1 and threshold 1.5, and generates
      a "hidden" value that is 1 when MORE than 1 input is 1. The output
      unit takes all inputs (each with a weight of 1) and has a threshold of
      0.5 - which by itself would result in an output of 1 if AT LEAST one
      of the inputs is 1. The output unit also takes the hidden result as
      an input, with a large negative weight (e.g. -5), which results
      in outputting 1 when at least 1 of the circuit inputs is 1, but
      NOT MORE than 1, as desired.
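
      A quick check of this construction over all 8 input patterns (a minimal
      sketch; step is a hard threshold unit and -5 is the "large negative
      weight" above):

          from itertools import product

          def step(net, threshold):
              return 1 if net >= threshold else 0

          def exactly_one(x1, x2, x3):
              hidden = step(x1 + x2 + x3, 1.5)             # 1 iff 2 or more inputs are 1
              return step(x1 + x2 + x3 - 5 * hidden, 0.5)  # at least 1, but not 2 or more

          for bits in product([0, 1], repeat=3):
              assert exactly_one(*bits) == (1 if sum(bits) == 1 else 0)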

6) a) Using the perceptron learning rule (the "delta" rule), show the steps
      of learning the 3-input function AND with a single perceptron
      and a step activation function with threshold 1. Initially all
      weights are 1, and the learning rate is 0.4.
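
      The trace depends on the order in which the examples are presented, so
      there is no single "the" trace. A minimal sketch that produces one such
      trace, assuming the 8 patterns are cycled in binary order and the
      threshold is fixed at 1 (no bias weight is learned):

          from fractions import Fraction
          from itertools import product

          # Exact arithmetic: 0.4 is not representable in binary floating
          # point, and this run hits the threshold exactly.
          rate, threshold = Fraction(2, 5), Fraction(1)
          weights = [Fraction(1)] * 3
          examples = [(x, int(all(x))) for x in product([0, 1], repeat=3)]

          converged = False
          while not converged:
              converged = True
              for x, target in examples:
                  out = 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0
                  if out != target:
                      converged = False
                      # delta rule: w_i <- w_i + rate * (target - out) * x_i
                      weights = [w + rate * (target - out) * xi
                                 for w, xi in zip(weights, x)]
                      print(x, target, out, [float(w) for w in weights])
          print('final weights:', [float(w) for w in weights])

      With this presentation order the rule converges after one correcting
      pass, with final weights (0.6, 0.2, 0.2): only the input (1,1,1) then
      reaches the threshold of 1.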

   b) Is it possible to combine back-propagation with genetic algorithms, and
      if so, how?

      Yes, use back-propagation as a "local improvement operator" in each
      iteration before selection. Of course, this can only work if we have
      a gradient measure, as well as a fitness measure for the networks.
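
      Schematically (a sketch only; train_by_backprop, fitness, select and
      crossover stand for whatever gradient trainer and GA operators are
      actually chosen):

          def evolve(population, generations, train_by_backprop, fitness,
                     select, crossover):
              # Hybrid GA: back-propagation acts as a local improvement
              # operator on every individual before selection, each generation.
              for _ in range(generations):
                  population = [train_by_backprop(net) for net in population]
                  parent_pairs = select(population, fitness)
                  population = [crossover(a, b) for a, b in parent_pairs]
              return max(population, key=fitness)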

7) a) Construct by hand a decision tree for the following table, a set of
      examples for classifying client loan risk,
      where the first decision is based on bad credit (A),
      then, if necessary, on the number of pending loan requests (B),
      on whether the client was granted a loan by this bank in the past (C),
      and on income (in tens of thousands of dollars a year) (D).

     input variables               decision
   A      B      C      D
--------------------------------------------
   F      1      F      5             small
   F      1      F      4             small
   F      1      F      3             small
   F      1      T      3             small
   F      2      F      3             medium
   F      2      F      4             medium
   T      1      F      8             small
   T      1      T      4             large
   T      1      T      5             large

Note that we do not need to apply the algorithm, as the order of choices
was already (not necessarily correctly) decided for us. Resulting tree is:

A=T: 3 examples, cannot decide yet. Then, branching on B:
  B = 1: cannot decide (still 3 instances), so branch on C:
     C = F: decide "small"
     C = T: decide "large"
  B = 2: decide "large" (default - there are no examples!)
A=F: 6 examples, cannot decide,
  B = 1: 4 examples, decide "small"
  B = 2: 2 examples, decide "medium"

All in all, we have 5 leaf nodes, 4 non-leaf nodes.

   b) Which node will be pruned first, based on least information gain?

As we can only prune nodes that are immediate parents of leaf nodes, the
only candidates are the nodes denoted by the paths (A=F) and (A=T, B=1).
In both of these nodes, the amount of information gain PER EXAMPLE is
the same, i.e. - (1/3 log (1/3) + 2/3 log (2/3)), but the former is reached
more frequently, so its average contribution to information gain over a
random pattern is higher. So, if forced to prune,
we need to prune the branching at (A=T, B=1), and in this case
just answer by majority and decide "large" (2 of its 3 examples are large).
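
The two candidate gains can be checked numerically (a small sketch; entropy in
bits, and since the split at each candidate node yields pure leaves, the gain
equals the node entropy):

    from math import log2

    def entropy(counts):
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c)

    h = entropy([1, 2])          # == entropy([4, 2]) == 0.918 bits per example
    print('per-example gain at each candidate:', h)
    print('weighted: (A=F):', 6 / 9 * h, ' (A=T,B=1):', 3 / 9 * h)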

   c) Is a more compact decision tree possible (possibly using a different
      ordering of the variables)? Which tree?

The optimal tree has only 3 internal nodes and 4 leaf nodes, branching
on C initially:

C = T: (3 examples)
   A = F: (1 example) decide "small"
   A = T: (2 examples) decide "large"
C = F: (6 examples)
   B = 1: (4 examples) decide "small"
   B = 2: (2 examples) decide "medium"
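
A quick check that this 3-test tree classifies all 9 training examples
correctly (a sketch; the rows of the table transcribed as tuples):

    examples = [('F', 1, 'F', 5, 'small'),  ('F', 1, 'F', 4, 'small'),
                ('F', 1, 'F', 3, 'small'),  ('F', 1, 'T', 3, 'small'),
                ('F', 2, 'F', 3, 'medium'), ('F', 2, 'F', 4, 'medium'),
                ('T', 1, 'F', 8, 'small'),  ('T', 1, 'T', 4, 'large'),
                ('T', 1, 'T', 5, 'large')]

    def classify(a, b, c, d):
        if c == 'T':
            return 'large' if a == 'T' else 'small'
        return 'small' if b == 1 else 'medium'

    assert all(classify(a, b, c, d) == label for a, b, c, d, label in examples)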