Introduction to Artificial Intelligence

Assignment 6 - Solutions


Written assignment: inference and learning


1) A) Use situation calculus (i.e. the result function, and fluents) to
   write down the axioms for the following modified wumpus world example.
   The robot needs to get the gold and climb back out.
   The stairs are always at (1, 1).
   Actions are: 
   a) Move(dx, dy): the robot moves by dx in the x direction, and by dy
      in the y direction (this robot can move omnidirectionally).

       B. (forall x, y, s, dx, dy) RobotAt(x, y, s) =>
                           RobotAt(x+dx, y+dy, result(Move(dx, dy), s))

        Note that we also need to define "+" appropriately!

   b) Grab: if the gold is in the same room as the robot, the predicate
      HasGold becomes true as a result.

       A. (forall x, y, s) RobotAt(x, y, s) ^ GoldAt(x, y, s) =>
                           HasGold(result(Grab, s))

   c) Climb: if the robot is in the room with the stairs, the result is
      that the robot and everything it carries get out (success).

       C. (forall s) RobotAt(1, 1, s) ^ HasGold(s) => Success(result(Climb, s))

    Fluents (situation-dependent predicates) are: 
        GoldAt(x, y, s)  
        HasGold(s)           (Robot has the gold)
        RobotAt(x, y, s)     (Robot at x, y in situation s)
        Success(s)           (In situation s, robot and gold outside maze)



    B) Now, you need to prove that, for the initial situation S where the
       robot is at (1,1), and the gold at (1,2), the action sequence:
            Move(0, 1), Grab, Move(0, -1), Climb
       results in success, that is, prove that:
         Success(result(Climb,
                        result(Move(0, -1),
                               result(Grab,
                                      result(Move(0, 1), S)))))

       From B and from RobotAt(1, 1, S) we get with generalized MP
       (unifier {x/1, y/1, dx/0, dy/1, s/S}, plus the axiom for addition):
          1. RobotAt(1, 2, result(Move(0, 1), S))

       The rest of the proof cannot be completed unless we add frame
       axioms for things that do NOT change as a result of Move and Grab:

       F1. (forall s, dx, dy) HasGold(s) => 
                           HasGold(result(Move(dx, dy), s))

       F2. (forall x, y, s) RobotAt(x, y, s) => 
                           RobotAt(x, y, result(Grab, s))

       Also, the gold should not move as a result of Move (assuming there
       is a LOT of gold).

       F3. (forall x, y, s, dx, dy) GoldAt(x, y, s) => 
                           GoldAt(x, y, result(Move(dx, dy), s))

       (If we assume that this holds only when the robot does NOT have the
        gold, we will have, instead:
       F3'. (forall x, y, s, dx, dy) GoldAt(x, y, s) ^ ~HasGold(s) => 
                           GoldAt(x, y, result(Move(dx, dy), s))

       but we will ALSO need:
       F4. (forall s, dx, dy) ~HasGold(s) =>
                             ~HasGold(result(Move(dx, dy), s))

       As well as: ~HasGold(S)

       as well as a few more steps in the proof - but we will use the
       shorter version...)

       Now from F2 and 1 we get:
          2. RobotAt(1, 2, result(Grab, result(Move(0, 1), S)))

       From F3 and from GoldAt(1, 2, S) we get:
          2a. GoldAt(1, 2, result(Move(0, 1), S))

       Then from 1 and 2a and A, unifier: {x/1, y/2, s/result(Move(0, 1), S)}
          3. HasGold(result(Grab, result(Move(0, 1), S)))
        
       From 2 and B (unifier {x/1, y/2, dx/0, dy/-1,
       s/result(Grab, result(Move(0, 1), S))}) and the axiom for addition:
             4. RobotAt(1, 1, result(Move(0, -1),
                       result(Grab, result(Move(0, 1), S))))
       From 3 and F1 we get:
             5. HasGold(result(Move(0, -1),
                       result(Grab, result(Move(0, 1), S))))

       Finally, from C and 4 and 5, unifier:
        {s/result(Move(0, -1), result(Grab, result(Move(0, 1), S)))}
       we get the desired result.
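
       As a sanity check (not part of the required proof), here is a small
       Python sketch that applies the effect and frame axioms directly to
       concrete states and confirms that the action sequence ends in
       success. The state encoding and function names are our own.

           # Minimal simulation of the modified wumpus world (illustrative).
           # A situation is represented as a concrete state dict rather
           # than as a nested result(...) term.

           def initial_state():
               return {"robot": (1, 1), "gold": (1, 2),
                       "has_gold": False, "success": False}

           def result(action, s):
               s = dict(s)                    # situations are immutable
               name, *args = action
               if name == "Move":             # axiom B (+ frames F1, F3)
                   dx, dy = args
                   x, y = s["robot"]
                   s["robot"] = (x + dx, y + dy)
               elif name == "Grab":           # axiom A (+ frame F2)
                   if s["robot"] == s["gold"]:
                       s["has_gold"] = True
               elif name == "Climb":          # axiom C
                   if s["robot"] == (1, 1) and s["has_gold"]:
                       s["success"] = True
               return s

           s = initial_state()
           for a in [("Move", 0, 1), ("Grab",), ("Move", 0, -1), ("Climb",)]:
               s = result(a, s)
           assert s["success"]                # Success(result(Climb, ...))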

2) Write down the axioms and facts for the example below in FOPC:

   The taxonomic hierarchy for animals contains dogs and cats,
   which are disjoint but not exhaustive.

     A. (forall x) Dog(x) => Animal(x)
     B. (forall x) Cat(x) => Animal(x)
     C. ~(exists x) Cat(x) ^ Dog(x)
     D. (exists x) Animal(x) ^ ~Cat(x) ^ ~Dog(x)

   The class of dogs in turn contains German shepherds and chihuahuas
   (which are disjoint).

     E. (forall x) GermanShepherd(x) => Dog(x) 
     F. (forall x) Chihuahua(x) => Dog(x) 
     G. ~(exists x) GermanShepherd(x) ^ Chihuahua(x)

   All dogs are carnivorous.

     H. (forall x) Dog(x) => Carnivorous(x)

   German shepherds are large. 

     I. (forall x) GermanShepherd(x) => Large(x) 

   All large dogs can be guard dogs, unless they are deaf. 


     J. (forall x) Large(x) ^ Dog(x) => GuardDog(x) v Deaf(x)

   Fido is a deaf German shepherd.

     K. GermanShepherd(Fido)
     L. Deaf(Fido)

   a) Convert the knowledge base into IMPLICATION normal form

      (note - we will use Pred() instead of T => Pred()
                      and ~Pred() instead of Pred() => F, for conciseness.
       All identifiers that start with x denote variables)

     A. Dog(x1) => Animal(x1)
     B. Cat(x2) => Animal(x2)
     C. Dog(x3) ^ Cat(x3) => F

   For D, we skolemize and separate out the clauses:
     D1.  Animal(Animal1)
     D2.  ~Cat(Animal1)
     D3.  ~Dog(Animal1)

     E. GermanShepherd(x4) => Dog(x4) 
     F. Chihuahua(x5) => Dog(x5) 
     G. GermanShepherd(x6) ^ Chihuahua(x6)  => F

     H. Dog(x7) => Carnivorous(x7)

     I. GermanShepherd(x8) => Large(x8) 


     J. Large(x9) ^ Dog(x9) => GuardDog(x9) v Deaf(x9)

     K. and L. need no changes.

   c) Prove or disprove by using resolution  (or argue that neither 
      can be done):
      1) Fido is an animal.
          From K and E, unifier {x4/Fido} we get:
            1. Dog(Fido)
          From 1 and A, unifier {x1/Fido} we get:
            2. Animal(Fido)

      2) Fido is large.
          From K and I, unifier {x8/Fido}, we get:
            3. Large(Fido)

      3) Fido can be a guard dog.
          From J and 3, unifier {x9/Fido} we get:
            4. Dog(Fido) => GuardDog(Fido) v Deaf(Fido)
          From 4. and 1., empty unifier, we get:
            5. T => GuardDog(Fido) v Deaf(Fido)
          No further progress is possible, either to prove or to disprove
            GuardDog(Fido): since Deaf(Fido) holds, the KB remains
            consistent whether GuardDog(Fido) is true or false.

      4) If Fido is a chihuahua, then Fido is young.
          This can be proved by contradiction. The above can be written:
            Chihuahua(Fido) => Young(Fido)
          Its negation is ~Young(Fido) ^ Chihuahua(Fido), so we add:
            6.   Chihuahua(Fido)
            6a.  ~Young(Fido)         (Actually, this one is immaterial!)
          From K and G, unifier {x6/Fido} we get:
            7.   Chihuahua(Fido) => F
          And resolving 6 with 7 gives T=>F, contradiction.

      5) Fido is carnivorous.
          From H and 1, unifier {x7/Fido}, we get:
            8. Carnivorous(Fido)
            
      6) If Poopoo is a cat, then it is not a German shepherd.
          Again, we need to show:  Cat(Poopoo) => ~GermanShepherd(Poopoo),
          whose negation contributes:
            9.  Cat(Poopoo)
            9a. GermanShepherd(Poopoo)
          From 9a and E, unifier {x4/Poopoo}, we get:
            10. Dog(Poopoo)
          From 10 and C, unifier {x3/Poopoo}, we get:
            11. Cat(Poopoo) => F
          and resolving 11 with 9 gives T => F, a contradiction.

   d) Would using INPUT RESOLUTION only change the answers to c?
        As seen above, every resolution step (except in item 3, which
        we could not prove or disprove anyway) used at least one
        clause from either the KB or the negated goal, so the proofs
        above already WERE input resolutions - so no change.

   e) Can you prove all the above using only either forward or
      backward chaining?
         Yes, with forward chaining - though cases 4 and 6 go through
         only as proofs by contradiction. See the sketch below.
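
       Since every axiom except J is a Horn clause, the positive answers
       above can indeed be reproduced by naive forward chaining. Below is
       an illustrative Python sketch, propositionalized for Fido (the
       encoding is our own); J is omitted because its disjunctive
       conclusion is not Horn, which is exactly why item 3 gets stuck.

           # Naive forward chaining over the Horn part of the KB,
           # propositionalized for Fido; not a general FOPC prover.
           rules = [                           # (premises, conclusion)
               ({"GermanShepherd"}, "Dog"),    # E
               ({"Chihuahua"}, "Dog"),         # F
               ({"Dog"}, "Animal"),            # A
               ({"Dog"}, "Carnivorous"),       # H
               ({"GermanShepherd"}, "Large"),  # I
           ]
           facts = {"GermanShepherd", "Deaf"}  # K and L

           changed = True
           while changed:                      # iterate to a fixed point
               changed = False
               for premises, conclusion in rules:
                   if premises <= facts and conclusion not in facts:
                       facts.add(conclusion)
                       changed = True

           assert {"Dog", "Animal", "Large", "Carnivorous"} <= facts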

3) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        E               none
        B               A
        C               A E
        D               A
        F               B C A

    a) Is this network a poly-tree?

     Not a polytree, e.g. the following is a cycle in the underlying
     undirected graph:  A-B-F-A
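
      To confirm this mechanically, one can drop the edge directions and
      look for a cycle, e.g. with union-find; a short Python sketch (the
      encoding is our own):

          # The underlying undirected graph of a polytree is acyclic;
          # union-find detects an edge that closes a cycle.
          parents = {"A": [], "E": [], "B": ["A"], "C": ["A", "E"],
                     "D": ["A"], "F": ["B", "C", "A"]}

          uf = {v: v for v in parents}
          def find(v):
              while uf[v] != v:
                  uf[v] = uf[uf[v]]            # path halving
                  v = uf[v]
              return v

          acyclic = True
          for child, ps in parents.items():
              for p in ps:
                  a, b = find(child), find(p)
                  if a == b:                   # edge closes a cycle
                      acyclic = False
                  uf[a] = b

          assert not acyclic                   # e.g. the cycle A-B-F-A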

    b) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({A}, {E} | {})
              True. Every path from A to E except A-C-E passes through F,
              where F is a converging node that blocks: neither F nor any
              descendant of F is in the evidence. On the path A-C-E, C is
              likewise an unobserved converging node, so it blocks too.
              (All six answers are verified mechanically by the sketch
              after item 6.)

       2)  I({A}, {E} | {B})
              True, as above, because B is not a descendant of either C
              or F, and the chain paths through B are blocked by the
              evidence at B.
       3)  I({A}, {E} | {C})
             False, the path A-C-E is now unblocked, because C is
             a converging evidence node on this path.
       4)  I({A}, {E} | {F})
              False, the path A-C-E is now unblocked, because C no longer
              blocks it: its descendant F is in the evidence.
       5)  I({A}, {C, F} | {B})
             False, path A-F can never be blocked (no intermediate nodes!)
       6)  I({A}, {F} | {B, C})
             False, path A-F can never be blocked (no intermediate nodes!)
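
       The six answers above can be checked mechanically with the
       reachability ("Bayes ball") formulation of d-separation; the
       Python sketch below is one standard way to implement it (the
       encoding and function name are our own):

           # X and Y are d-separated given Z iff no node of Y is
           # reachable from X along an active trail.
           parents = {"A": [], "E": [], "B": ["A"], "C": ["A", "E"],
                      "D": ["A"], "F": ["B", "C", "A"]}
           children = {v: [] for v in parents}
           for v, ps in parents.items():
               for p in ps:
                   children[p].append(v)

           def d_separated(X, Y, Z):
               anc = set(Z)                       # Z plus all its ancestors
               stack = list(Z)
               while stack:
                   for p in parents[stack.pop()]:
                       if p not in anc:
                           anc.add(p)
                           stack.append(p)
               visited, reachable = set(), set()
               agenda = [(x, "up") for x in X]    # leave X going upward
               while agenda:
                   n, d = agenda.pop()
                   if (n, d) in visited:
                       continue
                   visited.add((n, d))
                   if n not in Z:
                       reachable.add(n)
                   if d == "up" and n not in Z:   # arrived from a child
                       agenda += [(p, "up") for p in parents[n]]
                       agenda += [(c, "down") for c in children[n]]
                   elif d == "down":              # arrived from a parent
                       if n not in Z:
                           agenda += [(c, "down") for c in children[n]]
                       if n in anc:               # v-structure opened by Z
                           agenda += [(p, "up") for p in parents[n]]
               return not (reachable & set(Y))

           assert d_separated({"A"}, {"E"}, set())           # 1) True
           assert d_separated({"A"}, {"E"}, {"B"})           # 2) True
           assert not d_separated({"A"}, {"E"}, {"C"})       # 3) False
           assert not d_separated({"A"}, {"E"}, {"F"})       # 4) False
           assert not d_separated({"A"}, {"C", "F"}, {"B"})  # 5) False
           assert not d_separated({"A"}, {"F"}, {"B", "C"})  # 6) False
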
    c) Determine at least 2 different ways to convert the network into
       a poly-tree using cutset conditioning.
          Conditioning on A makes the network a polytree, since every
          undirected cycle passes through A. So does conditioning on
          A and E together; conditioning on F alone works as well.
    d) Suppose that P(A) = 0.9, P(E) = 0.3, P(B|A) = 0.1, P(B|~A)  = 0.8.
       Find P(A|E)
          Since A is d-separated from E given no evidence, they are
          independent, and thus P(A|E) = P(A) = 0.9

4) You need to construct a 4-input neural network that has value 1 in the 
   output just when exactly 2 of its inputs are 1, and 0 otherwise
   (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?

     This function is not linearly separable, thus CANNOT be done
     without hidden units.

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

      Assume that all thresholds are 1. We will need just 1 hidden unit,
      with input from ALL input signals.
      All its input weights will be 0.4, and thus it will output 1 just
      when 3 or more of the inputs are 1.
      There is one output unit, with 5 inputs: 4 connected to the
      network inputs, each with a weight of 0.6, and one connected to
      the output of the hidden unit, with weight -10.
       Clearly, the output will be 1 if 2 or more inputs are 1, UNLESS
       the hidden unit outputs 1, i.e. unless 3 or more of the network
       inputs are 1. This is exactly the required behavior.
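
       A brute-force check of this construction over all 16 inputs, using
       the weights and thresholds given above (illustrative Python):

           from itertools import product

           def step(x, threshold=1.0):
               return 1 if x >= threshold else 0

           def net(inputs):
               hidden = step(sum(0.4 * x for x in inputs))  # 1 iff >= 3 ones
               return step(sum(0.6 * x for x in inputs) - 10 * hidden)

           for inputs in product((0, 1), repeat=4):
               assert net(inputs) == (1 if sum(inputs) == 2 else 0)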


5) a) Using the perceptron learning rule (the "delta" rule), show the steps
      of learning 3-input function OR with a single perceptron
      and a step activation function with threshold 1. Initially all
      weights are 0.7, and the learning rate is 0.4.

       If we are lucky, the first 3 examples fed will be:

      input         O   T
---------------------------
      1  0  0       0   1
      0  1  0       0   1
      0  0  1       0   1

      And after each of these examples the corresponding weight is
      increased to 1.1, after which all outputs are correct. If unlucky,
      this could take longer...
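
     For concreteness, here is a Python sketch of the training loop (the
     example ordering and epoch cap are our own choices; as in the
     problem, the threshold is fixed at 1 and only the weights learn):

         from itertools import product

         weights = [0.7, 0.7, 0.7]
         rate = 0.4
         examples = [(x, 1 if any(x) else 0)
                     for x in product((0, 1), repeat=3)]   # 3-input OR

         for _ in range(10):             # epochs; converges quickly here
             errors = 0
             for x, t in examples:
                 o = 1 if sum(w * xi for w, xi in zip(weights, x)) >= 1 else 0
                 for i in range(3):      # delta rule: w_i += rate*(t-o)*x_i
                     weights[i] += rate * (t - o) * x[i]
                 errors += int(o != t)
             if errors == 0:
                 break

         print(weights)  # each weight ends at 1.1 (up to float rounding)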

   b) Is it possible to combine back-propagation with simulated annealing, and
      if so, how?

         There are several ways to do this; the easiest is to run
         backpropagation as a procedure WITHIN each simulated-annealing
         cycle. This is not necessarily very efficient, but it helps
         prevent getting stuck in local minima.
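
         One possible shape of such a combination is sketched below; the
         one-dimensional "loss", cooling schedule, and step sizes are
         stand-ins chosen for illustration, with plain gradient descent
         standing in for backpropagation:

             import math, random

             def loss(w):                 # stand-in with local minima
                 return math.sin(5 * w) + 0.1 * w * w

             def grad(w, eps=1e-5):       # numeric gradient
                 return (loss(w + eps) - loss(w - eps)) / (2 * eps)

             w, T = 3.0, 1.0
             best = w
             while T > 1e-3:
                 candidate = w + random.gauss(0, T)   # annealing move
                 for _ in range(20):                  # inner descent cycle
                     candidate -= 0.01 * grad(candidate)
                 delta = loss(candidate) - loss(w)
                 if delta < 0 or random.random() < math.exp(-delta / T):
                     w = candidate                    # Metropolis acceptance
                 if loss(w) < loss(best):
                     best = w
                 T *= 0.95                            # cooling schedule

             print(best, loss(best))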

6) a) Construct by hand a decision tree for the following table, a set of
      examples for classifying oranges, where the attributes are:
      color (A), shape (B), surface (C), and smell (D).
      Use the greedy algorithm studied in class, with information gain
      as the heuristic for choosing the attribute at each sub-tree.

     input variables               quality
   A      B      C      D
--------------------------------------------
light  sphere  rough   sharp          1
light  sphere  rough   mild           1
light  sphere  hairy   sharp          0
light  oval    rough   sharp          1
light  oval    rough   mild           0
light  oval    smooth  sharp          1
dark   sphere  hairy   sharp          0
dark   sphere  rough   mild           1
dark   sphere  smooth  mild           1
dark   oval    hairy   sharp          0
dark   oval    hairy   mild           0
dark   oval    smooth  mild           0

At the root, the split over the target attribute is 6:6 (all ratios below
are negative:positive counts), and the same holds at every branch of D,
so D is ruled out.
Trying A: the "light" branch gives likelihoods of 2:4 and "dark" gives 4:2.
For B, "sphere" gives 2:4 and "oval" 4:2, i.e. just as good as A.
For C we get: for "rough" 1:4, for "hairy" 4:0, for "smooth" 1:2, and
 a quick check shows that after getting the answer to C we need fewer
 bits of information for a full classification

  (for A or B this is -2*6*(1/3 log 1/3 + 2/3 log 2/3) ≈ 11.02 bits, while
   for C it is only -5*(1/5 log 1/5 + 4/5 log 4/5)
                    - 3*(1/3 log 1/3 + 2/3 log 2/3) ≈ 6.37 bits;
   the sketch after the tree below recomputes these numbers)

So the root of the (greedy) tree is C. 
Recursive calls needed for "rough",
where the likelihoods are 1:4. We now try A, B, D.
For A - "light" we get 1:3, for "dark" 0:1   (need 4*(1/4 log 1/4 + 3/4 log 3/4))
For B - "sphere" we get 0:3, for "oval" 1:1  (need 2 bits after this)
For D - "sharp" we get 0:2, for "mild" 1:2   (need 3*(1/3 log 1/3 + 2/3 log 2/3))
The smallest remaining information required is after applying B, so
B it is for this case. We require a recursive call for B="oval",
and here only D discriminates, so we need D.

At the other branch, C="smooth", either of A, B, or C will result
in likelihoods of 0:1 on one branch and 1:1 on the other. Say we pick
(arbitrarily) A. Now, on the "light" branch we have 0:1, and on
the "dark" branch only B will work. To sum up, the tree we get is:

C ==== hairy =========================> 0
  ==== rough ===== B === sphere ======> 1
                     === oval ==== D === sharp ===> 1
                                     === mild ====> 0
  ==== smooth ==== A ==== light ==================> 1
                     ==== dark === B === oval ====> 0
                                     === sphere ==> 1

This tree has 5 internal nodes!
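
The entropy bookkeeping above can be verified mechanically; the short
Python sketch below (our own encoding of the table) recomputes the
remaining information, in bits, for each candidate root:

    from math import log2
    from collections import Counter

    rows = [  # (A, B, C, D, quality)
        ("light", "sphere", "rough",  "sharp", 1),
        ("light", "sphere", "rough",  "mild",  1),
        ("light", "sphere", "hairy",  "sharp", 0),
        ("light", "oval",   "rough",  "sharp", 1),
        ("light", "oval",   "rough",  "mild",  0),
        ("light", "oval",   "smooth", "sharp", 1),
        ("dark",  "sphere", "hairy",  "sharp", 0),
        ("dark",  "sphere", "rough",  "mild",  1),
        ("dark",  "sphere", "smooth", "mild",  1),
        ("dark",  "oval",   "hairy",  "sharp", 0),
        ("dark",  "oval",   "hairy",  "mild",  0),
        ("dark",  "oval",   "smooth", "mild",  0),
    ]

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def remaining_bits(attr):            # attr = column index 0..3
        total = 0.0
        for v in {r[attr] for r in rows}:
            labels = [r[4] for r in rows if r[attr] == v]
            total += len(labels) * entropy(labels)
        return total

    for name, attr in zip("ABCD", range(4)):
        print(name, round(remaining_bits(attr), 3))
    # prints: A 11.02, B 11.02, C 6.365, D 12.0 -> C is the greedy root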

   b) Is a decision tree with fewer internal nodes possible? Which?

     Yes - in fact D, the attribute the greedy heuristic ruled out at the
     root, is the best root node: the following tree has only 3 internal
     nodes (but the greedy algorithm cannot find it!)

D ==== sharp ==== C ===== hairy =====> 0
                    ===== rough =====> 1
                    ===== smooth ====> 1
  ==== mild ===== B ===== oval ======> 0
                    ===== sphere ====> 1


   c) Which node will be pruned first, based on least information gain?

From the tree in a), there are only 2 candidates for removal (nodes whose
children are all leaves): the D node and the 2nd-level B node. Each has an
information gain of 2 bits, so either may be pruned first.

From the tree in b), B and C each have an information gain of 6 bits (each
splits a 3:3 set into pure leaves), so again either will be pruned first...