Introduction to Artificial Intelligence

Assignment 6


Written assignment: inference and learning - SOLUTIONS


1) Find the most general unifier for the following pairs, if it exists
   (variables are identifiers with a ? prefix).

   Note that we first standardize the terms apart where needed (we
   "prime" colliding variable names in the second term).

   a) P(A, ?z, C), P(?x, ?x, ?z)
         {?z|?x, ?x|A, ?z'|C}   (alternately ?z|A, which is equivalent)

   b) Foo(?x, Foo(?x)), Foo(Foo(?y), ?y)
         Fail (occurs check)
   c) Q(?y, G(A, B)), Q(G(?x, ?x), ?y)
         {?y|G(?x, ?x), ?y'|G(A, B)}
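
   These MGU computations can be reproduced mechanically. Below is a
   minimal sketch of unification with an occurs check (the encoding is
   an assumption of this sketch: variables are '?'-prefixed strings,
   compound terms are (functor, arg, ...) tuples):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    # Occurs check: does variable v appear inside term t under s?
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s=None):
    # Return the MGU of a and b (as a dict) or None on failure.
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and \
       a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

# (a) P(A, ?z, C) vs P(?x, ?x, ?z')  -- second term standardized apart
print(unify(('P', 'A', '?z', 'C'), ('P', '?x', '?x', "?z'")))
# (b) Foo(?x, Foo(?x)) vs Foo(Foo(?y), ?y)  -- fails the occurs check
print(unify(('Foo', '?x', ('Foo', '?x')), ('Foo', ('Foo', '?y'), '?y')))
```

   Case (a) yields {?x|A, ?z|A, ?z'|C}, as above, and case (b) returns
   None because the occurs check fails.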

2) Write down the axioms and facts for example below in FOPC:
   The taxonomic hierarchy for animals contains birds, which
   in turn contain penguins and sparrows (which are
   disjoint).
   All birds can fly, unless they are penguins. No penguins can fly.
   Anything that can fly has wings. Poopoo is a penguin.

        We can use either set notation or predicate notation for sets
        in the hierarchy. In predicate notation we have the one-argument
        predicates Animal, Bird, Sparrow, and Penguin. The predicates
        Can-fly(x) and Wing(x) mean x can fly and x is a wing,
        respectively. We also need the predicate Has-part(x, y), meaning
        x has part y.

       The hierarchy:
       (1)  (forall x) Bird(x) => Animal(x)
       (2)  (forall x) Penguin(x) => Bird(x)
       (3)  (forall x) Sparrow(x) => Bird(x)
       (4)  (forall x) ~(Sparrow(x) ^ Penguin(x))

       Other facts:
        (5)  (forall x) Bird(x) ^ ~Penguin(x) => Can-fly(x)
        (6)  (forall x) Penguin(x) => ~Can-fly(x)
        (7)  (forall x) Can-fly(x) => (exists y) Has-part(x, y) ^ Wing(y)
        (8)   Penguin(Poopoo)

   a) Convert the knowledge base into normal form

       Using ? as a prefix to denote variables, we get (in CNF):
       (1)  ~Bird(?x)    v Animal(?x)
       (2)  ~Penguin(?x) v Bird(?x)
       (3)  ~Sparrow(?x) v Bird(?x)
       (4)  ~Sparrow(?x) v ~Penguin(?x)

       Other facts:
        (5)  ~Bird(?x) v Penguin(?x) v Can-fly(?x)
        (6)  ~Penguin(?x) v ~Can-fly(?x)
         (7)  ~Can-fly(?x) v Has-part(?x, wing-of(?x))  ; wing-of: Skolem function
        (7') ~Can-fly(?x) v Wing(wing-of(?x))
        (8)  Penguin(Poopoo)

   b) Show how this knowledge-base should be indexed - using a table.

      Using a simple predicate-based table, with clause.literal numbers
      (note that since this is CNF rather than INF, we do not have 
      "conclusion" and "premise" columns):

      Predicate            Positive            Negative

       Animal               1.2
       Bird                 2.2, 3.2           1.1, 5.1
       Penguin              8.1, 5.2           2.1, 4.2, 6.1
       Sparrow                                 3.1, 4.1
       Can-fly              5.3                6.2, 7.1, 7'.1
       Has-part             7.2
       Wing                 7'.2
 
  c) Prove or disprove by using resolution  (or argue that neither 
      can be done):
      1) Poopoo has wings,
           Cannot be proved or disproved: the only clause containing
           the predicate Wing is (7'), whose premise is Can-fly(Poopoo);
           since ~Can-fly(Poopoo) is derivable (see 2 below), neither
           "Poopoo has wings" nor its negation follows from the KB.

      2) Poopoo can fly.
          Disproved: from (6) and (8) with unifier {?x|Poopoo} we get
          (9) ~Can-fly(Poopoo), so Poopoo cannot fly.
      3) Poopoo is an animal.
         From (1) and (2) with unifier {?x|?x'}, resolving on Bird(?x) we get:
         (10) ~Penguin(?x') v Animal(?x')
         Then from (8) and (10) with unifier {?x'|Poopoo} we get:
           Animal(Poopoo)
       4) If Poopoo is a sparrow, then Poopoo cannot fly.
         We need to prove: ~Sparrow(Poopoo) v ~Can-fly(Poopoo)
         We will add the negation of this as:
         (11)  Sparrow(Poopoo)
         (11') Can-fly(Poopoo)
          Now resolving (9) with (11') results in the empty clause
          (contradiction), proving the sentence; note that (11) was
          not even needed.
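
      The refutation proofs in this part can be checked mechanically on
      the ground instances of the clauses. A minimal propositional
      resolution sketch (P abbreviates Poopoo; clause numbers follow
      part a):

```python
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    # All resolvents of two clauses (frozensets of ground literals).
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

def refutes(clauses):
    # Saturate under resolution; True iff the empty clause is derived.
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:
                        return True          # empty clause: contradiction
                    new.add(frozenset(r))
        if new <= clauses:
            return False                     # saturated, no refutation
        clauses |= new

kb = [
    {'~Bird(P)', 'Animal(P)'},                 # (1)
    {'~Penguin(P)', 'Bird(P)'},                # (2)
    {'~Sparrow(P)', 'Bird(P)'},                # (3)
    {'~Sparrow(P)', '~Penguin(P)'},            # (4)
    {'~Bird(P)', 'Penguin(P)', 'Can-fly(P)'},  # (5)
    {'~Penguin(P)', '~Can-fly(P)'},            # (6)
    {'Penguin(P)'},                            # (8)
]

print(refutes(kb + [{'Can-fly(P)'}]))                  # query 2: disproved
print(refutes(kb + [{'~Animal(P)'}]))                  # query 3: proved
print(refutes(kb + [{'Sparrow(P)'}, {'Can-fly(P)'}]))  # query 4: proved
```

      All three calls print True; the KB alone is satisfiable, so
      saturating it without a negated query finds no contradiction.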
         
   d) Would using INPUT RESOLUTION only change the answers to c?
          No - every resolution step above has at least one input clause
          (an original KB clause or the negated query) as a parent, so
          the proofs already are input resolutions.

3) It is known that there is a correlation between smoking and cancer.
   Medical associations claim that smoking causes cancer. Tobacco companies
   claim that there is a hidden element that causes both cancer and smoking.
   Draw a Bayes network for each of the above claims.

   Medical associations claim network is:    smoking ---=> cancer

   Tobacco companies claim network is:    Hidden element ----=> smoking
                                                  \
                                                   --------=> cancer


4) Consider the following variables and their immediate causes (parents):

     variable         parents
-------------------------------
        A               none
        E               none
        B               A
        C               A E
        D               A
        F               B C

        Draw the network (assume arrows pointing downwards)

                     A   E
                    /|\ /    
                   D B C
                      \|
                       F

    a) Is this network a poly-tree?
           No. The paths A - B - F and A - C - F together form a cycle
           in the underlying undirected graph, so it is not a poly-tree.

    b) Determine the truth of the following independence
       statements (using d-separation):
       1)  I({A}, {E} | {})     True  (paths A-C-E and A-B-F-C-E are
                                       blocked at colliders C and F, resp.)
       2)  I({A}, {E} | {B})    True  (observing B unblocks neither collider)
       3)  I({A}, {E} | {C})    False (collider C is observed, so path
                                       A-C-E becomes unblocked)
       4)  I({A}, {E} | {F})    False (F is a descendant of collider C,
                                       so path A-C-E becomes unblocked)
       5)  I({A}, {C, F} | {B}) False (the edge A-C is always unblocked)
       6)  I({A}, {F} | {B, C}) True  (paths A-B-F and A-C-F blocked at
                                       observed B and C, resp.)
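
       These answers can be double-checked with the standard reduction of
       d-separation to graph separation in the moralized ancestral graph
       (a sketch; the function names are my own):

```python
parents = {'A': [], 'E': [], 'B': ['A'], 'C': ['A', 'E'],
           'D': ['A'], 'F': ['B', 'C']}

def ancestors(nodes):
    # The given nodes plus all of their ancestors.
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(xs, ys, zs):
    # I(xs, ys | zs) holds iff xs and ys are disconnected in the
    # moralized ancestral graph of xs|ys|zs after removing observed zs.
    keep = ancestors(xs | ys | zs)
    adj = {n: set() for n in keep}
    for n in keep:
        ps = parents[n]
        for p in ps:                      # undirected child-parent edges
            adj[n].add(p); adj[p].add(n)
        for i, p in enumerate(ps):        # "marry" co-parents
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    reach, stack = set(xs - zs), list(xs - zs)
    while stack:                          # BFS avoiding observed nodes
        for m in adj[stack.pop()] - zs:
            if m not in reach:
                reach.add(m)
                stack.append(m)
    return not (reach & ys)

for zs in (set(), {'B'}, {'C'}, {'F'}):   # statements 1-4
    print('I(A, E |', sorted(zs), ') =', d_separated({'A'}, {'E'}, zs))
print(d_separated({'A'}, {'C', 'F'}, {'B'}))   # statement 5
print(d_separated({'A'}, {'F'}, {'B', 'C'}))   # statement 6
```

       The printed values match the six answers above.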

    c) Determine at least 2 different ways to convert the network into
       a poly-tree using clustering.

        1) Make B-C a cluster.
        2) Make A-B-C-F a cluster.

    d) Suppose that P(A) = 0.9, P(E) = 0.3, P(B|A) = 0.1, P(B|~A)  = 0.8.
       Find P(A|E)

          Since I({A}, {E} | {}) holds (b.1 above), P(A|E) = P(A) = 0.9;
          the conditional probabilities given are not needed.

5) You need to construct a 3-input neural network that has value 1 in the 
   output just when exactly 2 of its inputs are 1, and 0 otherwise
   (assume inputs are in {0, 1}).
   a) Can this be done without hidden units?

      No - the function is not linearly separable. For weights w1, w2, w3
      and threshold t we would need w1+w2 >= t and w1+w3 >= t (inputs 110
      and 101 are on), so 2*w1 + w2 + w3 >= 2*t; but 111 is off, giving
      w1+w2+w3 < t, which forces w1 > t - contradicting that 100 is off
      (w1 < t).

   b) Show a network using threshold elements that performs
      the required computation (specify the weights, too!)

      Use a hidden unit that is on if ALL of its inputs are 1
      (for example, input weights 1 and threshold 2.5).
       Use one output unit with weight 1 on the arc from each of the
       input-layer neurons, weight -10 on the arc from the hidden unit,
       and threshold 1.5.

      Output will be 1 if AT LEAST 2 of its inputs are 1, but NOT when
      all 3 inputs are 1, which is exactly what we need.
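
       The proposed weights can be verified by simulating the network
       over all 8 inputs (a sketch with step-threshold units; the name
       exactly_two is illustrative):

```python
from itertools import product

def step(x, threshold):
    # Step activation: fire iff the weighted sum exceeds the threshold.
    return 1 if x > threshold else 0

def exactly_two(i1, i2, i3):
    hidden = step(i1 + i2 + i3, 2.5)              # AND unit: on iff all 3 on
    return step(i1 + i2 + i3 - 10 * hidden, 1.5)  # "at least 2", vetoed by hidden

for bits in product([0, 1], repeat=3):
    print(bits, exactly_two(*bits))
```

       The output is 1 exactly for the three inputs with two 1s.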

6) Using the perceptron learning rule (the "delta" rule), show the steps
   of learning the function OR with a single perceptron
   and a step activation function with threshold 1. Initially all
   weights are 1, and the learning rate is 0.4.

       If the threshold is such that a weighted sum EQUAL to the threshold
       causes an output of 1, a single epoch finds no errors and the
       perceptron is already trained. If the output is 1 only when the sum
       is STRICTLY greater than the threshold, then in the first epoch the
       examples 01 and 10 each cause an error, increasing the respective
       weights to 1.4, after which no further changes occur.
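
       The strict-threshold run can be reproduced as follows (a sketch;
       the names and the epoch cap are illustrative):

```python
def train_or(rate=0.4, threshold=1.0, epochs=10):
    # Perceptron/delta rule on OR; the unit fires only when the
    # weighted sum is STRICTLY greater than the threshold.
    examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = [1.0, 1.0]                       # initial weights
    for _ in range(epochs):
        changed = False
        for (x1, x2), target in examples:
            out = 1 if w[0] * x1 + w[1] * x2 > threshold else 0
            err = target - out
            if err:
                w[0] += rate * err * x1
                w[1] += rate * err * x2
                changed = True
        if not changed:                  # a full epoch with no errors
            break
    return w

print(train_or())
```

       Only the 01 and 10 examples trigger an update, raising each weight
       to 1.4 during the first epoch, as stated above.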

7) Is it possible to combine back-propagation with genetic algorithms, and
   if so, how?

      Yes: use back-propagation as a local improvement operator on members
      of the population, for example just before computing fitness in each
      generation.

8) a) Construct by hand a decision tree for the following table,
      where the first decision is based on A, then if necessary B, etc.
     input variables               decision
   A      B      C      D
--------------------------------------------
   T      1      F      5             fly
   T      1      F      4             fly
   T      1      F      3             fly
   T      1      T      3             fly
   T      2      F      3             walk
   T      2      F      4             walk
   F      1      F      2             fly
   F      1      T      4             crawl
   F      1      T      5             crawl

   A  ---(true)--- B ---(1)---  fly
    \                \
     \                 --(2)---  walk
      \
       --(false)-- C -(false)--  fly
                     \
                      -(true)--  crawl

   b) Which node will be pruned first, based on least information gain?

          Nodes B and C have the same information gain PER EXAMPLE
          ENCOUNTERED AT THE NODE (about 0.918 bits, since both splits
          are perfect). However, C is reached by fewer examples (3 vs. 6
          for B), so its total gain is smaller and it is pruned first.
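
          The equal per-example gains can be verified numerically. Since
          both splits classify their examples perfectly, the gain at each
          node equals the entropy of the decisions reaching it (a sketch):

```python
from math import log2

def entropy(labels):
    # Shannon entropy (in bits) of the label distribution.
    n = len(labels)
    return -sum(labels.count(v) / n * log2(labels.count(v) / n)
                for v in set(labels))

b_cases = ['fly'] * 4 + ['walk'] * 2   # decisions reaching node B (A = T)
c_cases = ['fly'] + ['crawl'] * 2      # decisions reaching node C (A = F)

print(entropy(b_cases), entropy(c_cases))
```

          Both print ~0.918 bits per example; B's total gain is weighted
          by 6 examples against C's 3, so C goes first.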

   c) Is a more compact decision tree possible (possibly using a different
      ordering of the variables)? Which tree?

       Since no subset of 2 attributes uniquely determines the decision
       variable, every decision tree for this table must have at least 3
       non-leaf nodes; thus the tree above is already optimal.
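
       The claim that no pair of attributes determines the decision can be
       checked by brute force over the table (a sketch; columns ordered
       A, B, C, D, decision):

```python
from itertools import combinations

# The table from part (a).
rows = [
    ('T', 1, 'F', 5, 'fly'),  ('T', 1, 'F', 4, 'fly'),
    ('T', 1, 'F', 3, 'fly'),  ('T', 1, 'T', 3, 'fly'),
    ('T', 2, 'F', 3, 'walk'), ('T', 2, 'F', 4, 'walk'),
    ('F', 1, 'F', 2, 'fly'),  ('F', 1, 'T', 4, 'crawl'),
    ('F', 1, 'T', 5, 'crawl'),
]

def determines(attrs):
    # True iff rows agreeing on the given attribute columns always
    # agree on the decision.
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in attrs)
        if seen.setdefault(key, row[4]) != row[4]:
            return False
    return True

for pair in combinations(range(4), 2):
    print(pair, determines(pair))          # every pair prints False
```

       All six pairs fail, while the triple A, B, C (as used by the tree
       above, with A tested once and B or C tested below it) does
       determine the decision.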