1) Write down the axioms and facts for the example below in FOPC: The taxonomic hierarchy for furniture contains chairs, which in turn contain stuffed chairs and plastic chairs (which are disjoint). All chairs have 4 legs. Stuffed chairs have padding. All chairs with padding are comfortable, unless they are broken. The item of furniture in my office, which I call "the contraption", is a stuffed chair.

1. (forall x) Chair(x) => Furniture(x)
2. (forall x) Stuffed_chair(x) => Chair(x)
3. (forall x) Plastic_chair(x) => Chair(x)
4. (forall x) ~Plastic_chair(x) v ~Stuffed_chair(x)
5. (forall x) Chair(x) => (exists g) Set(g) ^ Cardinality(g) = 4 ^ ((forall l) In(l, g) => (Leg(l) ^ Has_part(x, l)))
6. Greater(4, 1)
6.1 (forall x, y, z) Greater(x, z) ^ (y=x) => Greater(y, z)   ; ad-hoc axiom for equality
7. (forall x) Stuffed_chair(x) => Has_padding(x)
8. (forall x) Chair(x) ^ Has_padding(x) ^ ~Broken(x) => Comfortable(x)
9. Stuffed_chair(the contraption)

a) Convert the knowledge base into normal form

Let us convert each item separately (so as not to have too many variable names). Items prefixed with ? are variables.

1. ~Chair(?x) v Furniture(?x)
2. ~Stuffed_chair(?x) v Chair(?x)
3. ~Plastic_chair(?x) v Chair(?x)
4. ~Plastic_chair(?x) v ~Stuffed_chair(?x)
5.1. ~Chair(?x) v Set(legs-of(?x))
5.2. ~Chair(?x) v Cardinality(legs-of(?x))=4
5.3. ~Chair(?x) v ~In(?l, legs-of(?x)) v Leg(?l)
5.4. ~Chair(?x) v ~In(?l, legs-of(?x)) v Has_part(?x, ?l)
     where legs-of(?x) is a Skolem function.
6. Greater(4, 1)
6.1 ~Greater(?x, ?z) v ~(?y=?x) v Greater(?y, ?z)
7. ~Stuffed_chair(?x) v Has_padding(?x)
8. ~Chair(?x) v ~Has_padding(?x) v Broken(?x) v Comfortable(?x)
9. Stuffed_chair(the contraption)

b) Show how this knowledge base should be indexed - using a table.

Not very interesting, so this is skipped...

c) Prove or disprove by using resolution (or argue that neither can be done):

1) The contraption is a chair.

Can be proved. Add:
10.
~Chair(the contraption)
Now resolve 10 with 2, with ?x = the contraption, to get:
11. ~Stuffed_chair(the contraption)
Resolve 11 with 9 to get the empty clause (contradiction).

2) The contraption is comfortable.

Cannot be proved or disproved, since the only clause that mentions "Comfortable" also contains "Broken", and there is no other clause that mentions "~Broken".

3) The contraption has padding.

Can be proved. Add:
10. ~Has_padding(the contraption)
Resolve 10 with 7, with ?x = the contraption, to get:
11. ~Stuffed_chair(the contraption)
Resolve 11 with 9 to get the empty clause.

4) If the contraption is a plastic chair, then it is a dog.

Can be proved. We need to prove:
Plastic_chair(the contraption) => Dog(the contraption)
and its negation is:
Plastic_chair(the contraption) ^ ~Dog(the contraption)
which is added to the KB as:
10. Plastic_chair(the contraption)
11. ~Dog(the contraption)
(Actually, 11 is irrelevant and will never be used.)
Now, resolve 10 with 4, with ?x = the contraption, to get:
12. ~Stuffed_chair(the contraption)
Resolve 12 with 9 to get the empty clause.

5) The contraption has more than 1 leg.

Can be proved. We need to prove:
(exists g) Greater(Cardinality(g), 1) ^ Set(g) ^ ((forall l) In(l, g) => Leg(l) ^ Has_part(the contraption, l))
Its negation is:
(forall g) ~Greater(Cardinality(g), 1) v ~Set(g) v ((exists l) In(l, g) ^ (~Leg(l) v ~Has_part(the contraption, l)))
Converting to CNF (with Skolem function foo-of) and adding to the KB, we get:
10. ~Greater(Cardinality(?g), 1) v ~Set(?g) v ~Leg(foo-of(?g)) v ~Has_part(the contraption, foo-of(?g))
11. ~Greater(Cardinality(?g), 1) v ~Set(?g) v In(foo-of(?g), ?g)
Resolve 9 with 2, with ?x = the contraption, to get:
12. Chair(the contraption)
Resolve 12 with 5.1, 5.2, 5.3, and 5.4, each with ?x = the contraption, to get:
13. Set(legs-of(the contraption))
14. Cardinality(legs-of(the contraption)) = 4
15. ~In(?l, legs-of(the contraption)) v Leg(?l)
16.
~In(?l, legs-of(the contraption)) v Has_part(the contraption, ?l)
Resolve 14 with 6.1, with ?y = Cardinality(legs-of(the contraption)) and ?x = 4, to get:
17. Greater(Cardinality(legs-of(the contraption)), ?z) v ~Greater(4, ?z)
Resolve 17 with 6, with ?z = 1, to get:
18. Greater(Cardinality(legs-of(the contraption)), 1)
Resolve 18 with 10 and 11 respectively, with ?g = legs-of(the contraption), to get:
19. ~Set(legs-of(the contraption)) v ~Leg(foo-of(legs-of(the contraption))) v ~Has_part(the contraption, foo-of(legs-of(the contraption)))
20. ~Set(legs-of(the contraption)) v In(foo-of(legs-of(the contraption)), legs-of(the contraption))
Resolve 13 with 19 and 20 respectively, to get:
21. ~Leg(foo-of(legs-of(the contraption))) v ~Has_part(the contraption, foo-of(legs-of(the contraption)))
22. In(foo-of(legs-of(the contraption)), legs-of(the contraption))
Resolve 15 with 22, with ?l = foo-of(legs-of(the contraption)), to get:
23. Leg(foo-of(legs-of(the contraption)))
Resolve 16 with 22, with ?l = foo-of(legs-of(the contraption)), to get:
24. Has_part(the contraption, foo-of(legs-of(the contraption)))
Resolve 23 with 21 to get:
25. ~Has_part(the contraption, foo-of(legs-of(the contraption)))
Finally, resolve 24 with 25 to get the empty clause.
SHOW ALL STEPS, including the relevant unifiers, etc.

d) Would using INPUT RESOLUTION only change the answers to c?

No changes, except that the last proof would not go through with just input resolution.

e) Can you prove all the above using only either forward or backward chaining?

First, we can represent only the Horn clauses. Everything here is in Horn form except axiom 8, so all but axiom 8 can be used. Since none of the proofs used axiom 8, and all the proofs involved only Horn clauses, these proofs could have been done with EITHER forward OR backward chaining.

2) Consider the following variables and their immediate causes (parents):

variable   parents
-------------------
A          none
E          F
B          A
C          A
D          none
F          B, C, D

a) Is this network a poly-tree?
No, its underlying undirected graph has a cycle: A-B-F-C-A.

b) Determine the truth of the following independence statements (using d-separation):

1) I({A}, {D} | {}): Yes. All paths go through F and are blocked at F, because F is a converging node with no evidence at F or below it.
2) I({A}, {D} | {F}): No, because A-C-F-D is unblocked (at F due to the evidence at F, and at C because C is a non-converging node with no evidence).
3) I({A}, {D} | {F, B, C}): Yes, because all paths must pass through either B or C in a non-converging manner, and are blocked at B or C since these are evidence nodes.
4) I({A}, {E} | {F}): Yes. F blocks all paths from A to E, since F is a non-converging evidence node on all these paths.
5) I({A}, {C, F} | {B}): No. The path A-C can never be blocked.
6) I({A}, {F} | {B, C}): Yes. Either B or C blocks every path.

c) Determine at least 2 different ways to convert the network into a tree using clustering.

1) Make BC into a single node, or
2) Make BCF into a single node, or
3) Make ABC into a single node, or ...

d) Suppose that P(A) = 0.9, P(D) = 0.3, P(B|A) = 0.1, P(B|~A) = 0.8, P(C|A) = 1, P(C|~A) = 0.5. Find P(A | B, C, E).

A is independent of E given B, C, so we can just compute P(A | B, C). We have:

P(A|B,C) = P(A,B,C) / P(B,C)
         = P(A)P(C|A)P(B|A,C) / (P(A,B,C) + P(~A,B,C))
         = P(A)P(C|A)P(B|A) / (P(A)P(C|A)P(B|A) + P(~A)P(C|~A)P(B|~A))
         = 0.9*1*0.1 / (0.9*1*0.1 + 0.1*0.5*0.8)
         = 0.09 / (0.09 + 0.04) ≈ 0.69

3) You need to construct a 4-input neural network that has value 1 in the output just when exactly 2 or 3 of its inputs are 1, and 0 otherwise (assume inputs are in {0, 1}).

a) Can this be done without hidden units?

No; this decision function is not linearly separable.

b) Show a network using threshold elements that performs the required computation (specify the weights, too!)

With one hidden unit this can be done. We also have 1 output unit. The hidden unit receives all inputs, each with a weight of 1, and has a threshold of 3.5. The output unit also receives all inputs, again each with a weight of 1.
It also receives the output of the hidden unit, with a weight of -10. Its threshold is 1.5. Clearly the output unit is on when 2 or more inputs are 1, as long as the hidden unit is off - and the hidden unit, which fires only when all 4 inputs are 1, disables the output in exactly that case.

4) a) Using the perceptron learning rule (the "delta" rule), show the steps of learning the 3-input function OR with a single perceptron and a step activation function with threshold 1. Initially all weights are 1, and the learning rate is 0.4.

If a value EXACTLY at the threshold is considered "more" than the threshold, the algorithm will make one pass over all 8 examples, make no changes (no error), and stop. If we need a value strictly greater than the threshold, the examples 001, 010, and 100 will each generate an error, and in each case the respective weight will be increased to 1.4. After a second pass over the examples, no errors remain and the algorithm stops.

b) Is it possible to combine back-propagation with genetic algorithms, and if so, how?

Yes. Use the GA to learn the weights of a neural network, but for each individual weight vector use back-propagation as a local improvement operator (to find a local minimum) before computing the fitness.

5) a) Construct by hand (using greedy search with the best-information-gain heuristic) a decision tree for the following table, a set of examples for evaluating the desirability of a potential employee in a high-tech company. The data is degree level (A), existence of work experience (B), self-motivation (C), and last grade average (D).

   input variables     decision
  A    B    C    D
 -------------------------------
  1    T    F    85    good
  1    T    F    80    good
  1    T    F    70    good
  1    T    T    70    good
  1    F    F    90    good
  1    F    T    80    bad
  1    F    T    85    bad
  2    T    F    70    average
  2    T    F    80    average

In the initial situation, the remaining information required is:

A) If we choose A: case A=2 needs 0 further information; case A=1 has probabilities 2/7 and 5/7 over 7 cases.
B) For B we have, in one branch, 3 cases with probabilities 1/3 and 2/3; in the other, 2/6 and 4/6.
C) For C we have the same remaining information as for B.
D) For D we have, for the 70 branch, probabilities 1/3 and 2/3; for the 80 branch, three different classes with probabilities 1/3, 1/3, 1/3; for the 85 branch, 1/2 and 1/2; and for the 90 branch there is no remaining information.

Doing the exact numbers, we get that A leaves the least remaining information and is (locally) optimal.

Now we are left with branch 1 of A. Applying B, the remaining required information is given by probabilities 1/3 and 2/3 for the F branch, and 0 for the T branch. For C we have a 0 remainder for the F branch, and 1/3, 2/3 for the T branch. For D we have 0 remainder for the 70 and 90 branches, but 1/2, 1/2 for the 85 branch and the same for the 80 branch, a total of 2 bits, which is a bit less than for B or C, so we choose attribute D.

Now, at the 85 branch B and C have the same information gain, and the same holds for the 80 branch, so at either branch we can choose indifferently. So, the tree is:

A --- 2 --- average
 \--- 1 --- D --- 70 --- good
              \--- 90 --- good
              \--- 80 --- B --- F --- bad
              |            \--- T --- good
              \--- 85 --- B --- F --- bad
                           \--- T --- good

This tree has 7 leaves and 4 internal nodes.

b) After constructing the tree, one can try to prune nodes from the bottom up, based on least information gain. Which node would be pruned first in this case?

The only candidates are nodes whose only children are leaves, so there are only 2 candidates (the two B nodes). Each candidate has an information gain of 2 bits, so both are equally good for pruning.

c) Is a more compact decision tree possible? Which tree?

In general, the above method is not optimal. In this case, the following tree is a correct classifier:

B --- F --- C --- F --- good
 |           \--- T --- bad
 \--- T --- C --- T --- good
              \--- F --- A --- 1 --- good
                          \--- 2 --- average

This tree has 4 internal nodes but only 5 leaves, and is thus better. (I believe it is optimal, but am not sure.)
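The "remaining information" comparison at the root in part 5a can be checked numerically. Below is a small sketch (not part of the original solution; the representation and helper names `examples`, `entropy`, `remainder` are mine) that computes the expected remaining entropy after splitting on each attribute; A should come out smallest, matching the choice of A at the root.

```python
from math import log2
from collections import Counter

# The 9 examples as (A, B, C, D, decision) tuples, copied from the table.
examples = [
    (1, "T", "F", 85, "good"),
    (1, "T", "F", 80, "good"),
    (1, "T", "F", 70, "good"),
    (1, "T", "T", 70, "good"),
    (1, "F", "F", 90, "good"),
    (1, "F", "T", 80, "bad"),
    (1, "F", "T", 85, "bad"),
    (2, "T", "F", 70, "average"),
    (2, "T", "F", 80, "average"),
]

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def remainder(attr_index):
    # Expected remaining entropy after splitting on one attribute:
    # sum over branches of (branch weight) * (branch entropy).
    total = len(examples)
    rem = 0.0
    for v in {row[attr_index] for row in examples}:
        branch = [row[4] for row in examples if row[attr_index] == v]
        rem += len(branch) / total * entropy(branch)
    return rem

for name, i in [("A", 0), ("B", 1), ("C", 2), ("D", 3)]:
    print(name, round(remainder(i), 3))
```

Note that the 80 branch of D mixes all three classes (good, bad, average), which is why D's remainder is the largest of the four.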
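As a quick sanity check (again a sketch, not from the original solution), the compact tree proposed in part c can be encoded directly and run against all 9 training examples; it classifies every one correctly.

```python
# Each example: (A, B, C, D, decision), copied from the table in part 5a.
examples = [
    (1, "T", "F", 85, "good"),
    (1, "T", "F", 80, "good"),
    (1, "T", "F", 70, "good"),
    (1, "T", "T", 70, "good"),
    (1, "F", "F", 90, "good"),
    (1, "F", "T", 80, "bad"),
    (1, "F", "T", 85, "bad"),
    (2, "T", "F", 70, "average"),
    (2, "T", "F", 80, "average"),
]

def compact_tree(a, b, c, d):
    # B at the root, then C on both branches, with A deciding one leaf,
    # exactly as drawn in part c. Attribute D is never consulted.
    if b == "F":
        return "good" if c == "F" else "bad"
    if c == "T":
        return "good"
    return "good" if a == 1 else "average"

assert all(compact_tree(a, b, c, d) == label
           for a, b, c, d, label in examples)
print("compact tree classifies all 9 training examples correctly")
```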