Introduction to Artificial Intelligence

Assignment 3


Homework assignment - simulators, agents, search, games, logic, inference

1) (search and logic)
   Finding an explanation for observed evidence is known as abductive
   reasoning.  The input is a KB of definite Horn clauses (i.e. "rules",
   see below), together with a set of literals E (the observed
   evidence) to be explained.  An explanation is a subset S of KB that
   proves E. Formally, for each literal L in E or in the antecedent of
   any rule in S, there is a rule R in S where L is the consequent of R.

   In this exercise, we wish to solve the weighted abduction version:
   each rule R has a weight w(R), and we need to find an explanation S where
   the sum of the weights of the rules in S is minimal. For example, consider
   the following KB (with weights in parentheses):

   R1:        a -> b    (1)
   R2:          -> a   (10)
   R3:  b and c -> d    (1)
   R4:        a -> c    (1)
   R5:        x -> d    (2)
   R6:          -> b    (7)
   R7:          -> c    (6)
   R8:          -> d   (20)

   For the case where we wish to find an explanation for E = {d},
   one possible explanation is S = {R8} (with a weight of 20)
   but it is not an optimal explanation, because S = {R3, R6, R7} is
   also an explanation, with a weight of w(R3)+w(R6)+w(R7) = 15.

   a) Formalize the weighted abduction problem as a search problem,
      where the state is a partial solution S', and where an operator adds
      any rule not in the partial solution.

   Answer:
      Initial state: an empty set of rules, and the set E.

      Goal_test(S):
          Let A = E union { s | s is an antecedent of some rule R in S }
          Let C = { s | s is the consequent of some rule in S }
          If A is a (non-strict) subset of C, return true.
          Otherwise return false.
           
      Operators:
          Add any rule R from KB that is not in S.

      Path cost:
          Sum of operator costs on the path, where the
          cost of an operator is the weight of the rule
          added by the operator.

     Note: Assuming henceforth that rule weights are positive.
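      As a sanity check, the goal test above can be written out directly.
      A minimal Python sketch (the Rule type and the name goal_test are
      ours, not part of the assignment):

      ```python
      from typing import NamedTuple

      class Rule(NamedTuple):
          antecedents: frozenset   # literals on the left of "->" (empty for facts)
          consequent: str          # the single literal on the right of "->"
          weight: int

      def goal_test(S, E):
          """True iff every literal in E, and every antecedent of a rule in
          the partial solution S, is the consequent of some rule in S."""
          A = set(E).union(*(r.antecedents for r in S))
          C = {r.consequent for r in S}
          return A <= C
      ```

      For the example KB, goal_test({R8}, {'d'}) holds, as does
      goal_test({R3, R6, R7}, {'d'}), but goal_test({R3}, {'d'}) does not,
      since b and c remain unexplained.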

   b) Same as in (a), but the only allowed rules are those whose
      consequent already appears in E or in S' (but not in the
      consequent of a rule in S' - no need to explain a literal more
      than once).

  Answer: same as in (a), except limit the operators
      so that if a rule R is Ant -> C, and C is already
      the consequent of some rule in S, then the operator
      adding R is disallowed.
      
   c) Same as (b), but this time a lexicographical ordering of literals
      is given (say a, then b, etc.), and only rules that have
      a consequent that is first in lexicographic ordering (among
      unexplained literals) are allowed.

   Answer: 
    Same as (b), with the following addition. Let < denote the given
    lexicographic ordering over literals. A rule R with consequent C
    is allowed only if C is the smallest, under <, of the literals
    that are still unexplained.

   d) Assume that we use uniform-cost search, are we assured of finding
      the minimal explanation using version c? If not, are we assured of such
      using option b? You need to sketch the proof for your answer.

   Answer: yes, version c is sufficient; we sketch a proof by contradiction.
      First we show that no duplicates are needed.

      Let S be an optimal solution, and suppose that some literal in S
      is the consequent of more than one rule. For each such literal L,
      arbitrarily select one rule that has L as a consequent, and delete
      all the rest from S, to form S'. Since rule weights are positive,
      S' clearly has a lower total weight than S (and is still an
      explanation), so S is not optimal - a contradiction.

      Thus, we can assume that each literal in S is a consequent
      of exactly one rule in S. Sort the rules lexicographically according
      to their consequent literal, to get a sequence Seq. Applying operators
      adding rules in order according to Seq results in S.
      Thus S is reachable using operators of option c alone.
      The properties of uniform-cost search assure us that no
      goal state S' can be found before S, unless it has equal path-cost
      (in which case S' is also optimal).


   e) We wish to use A* to perform weighted abduction. Find an
      admissible heuristic for versions a, b, c, (they may all be the
      same heuristic, and it does NOT have to be a GOOD heuristic).
      Now, show an execution of A* using your heuristic to solve the
      weighted abduction problem, given as input the above KB and
      the evidence to be explained E = {d}.
      Use only one of the search problem versions a, b, c - your choice
      (hint: which would result in the shortest trace?)

  Answer: an obvious admissible heuristic is h(S) = 0.

      Finding a better admissible heuristic is not very easy (see f).
      We will trace option c, which has the smallest branching factor
      and thus will be less work.

      Initial state (S0): 
          literals to prove: {d}
          set of rules: {}    (empty).

      States in agenda (with f-value):  (S0, 0)

      Retrieve: S0

      Expanding: 
        use rule R3, to get S1: to prove: {b, c}, rules {R3}, f(S1)=1
        use rule R5, to get S2: to prove: {x}, rules {R5}, f(S2)=2
        use rule R8, to get S3: to prove: {}, rules {R8}, f(S3)=20

      (note that in "to prove" we drop literals that are in a 
       consequent of S, for conciseness).

      Agenda is now: (S1, 1), (S2, 2), (S3, 20)

      Retrieve: S1 (to prove: {b, c}, rules {R3}, f(S1)=1)
      Expanding: 
        use rule R1, to get S4: to prove: {a, c}, rules {R1, R3}, f(S4)=2
        use rule R6, to get S5: to prove: {c}, rules {R3, R6}, f(S5)=8

      (Note that using this option, we only add rules that prove the
       lexicographically smallest literal, thus do NOT try to add
        R4 and R7 at this point!)

      Agenda is now:  (S2, 2), (S4, 2), (S5, 8), (S3, 20)

      Retrieve: S2 (to prove: {x}, rules {R5}, f(S2)=2)
      Result: fail (not a goal state, and cannot expand)

      Agenda is now:  (S4, 2) (S5, 8) (S3, 20)

      Retrieve: S4 (to prove: {a, c}, rules {R1, R3}, f(S4)=2)
      Expanding: 
        use rule R2, to get S6: to prove: {c}, rules {R1, R2, R3}, f(S6)=12

      Agenda is now:  (S5, 8), (S6, 12), (S3, 20)

      Retrieve: S5 (to prove: {c}, rules {R3, R6}, f(S5)=8)
      Expanding: 
        use rule R4, to get S7: to prove: {a}, rules {R3, R4, R6}, f(S7)=9
        use rule R7, to get S8: to prove: {}, rules {R3, R6, R7}, f(S8)=14

      Agenda is now:  (S7, 9), (S6, 12), (S8, 14), (S3, 20)

      Retrieve: S7 (to prove: {a}, rules {R3, R4, R6}, f(S7)=9)
      Expanding: 
        use rule R2, to get S9: to prove: {}, rules {R2, R3, R4, R6}, f(S9)=19

      Agenda is now:  (S6, 12), (S8, 14), (S9, 19), (S3, 20)

      Retrieve: S6 (to prove: {c}, rules {R1, R2, R3}, f(S6)=12)
      Expanding: 
        use rule R4, to get S10: to prove: {}, rules {R1, R2, R3, R4}, f(S10)=13
        use rule R7, to get S11: to prove: {}, rules {R1, R2, R3, R7}, f(S11)=18

      Agenda is now:  (S10, 13), (S8, 14), (S11, 18), (S9, 19), (S3, 20)

      Retrieve: S10 (to prove: {}, rules {R1, R2, R3, R4}, f(S10)=13)

      This is a GOAL STATE and the optimal solution {R1, R2, R3, R4}
      is returned. That is, assume "a" (weight 10), then use "a"
      to prove b and c (weight 1 each), and then use b and c to prove d
      (weight 1).
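      With h = 0, A* degenerates to uniform-cost search, so the trace above
      can be checked mechanically. A minimal sketch under version c (the
      encoding of rules as (antecedents, consequent, weight) triples and
      the function name ucs_abduction are ours):

      ```python
      import heapq
      from itertools import count

      # R1..R8 from the example KB, as (antecedents, consequent, weight)
      KB = [({'a'}, 'b', 1), (set(), 'a', 10), ({'b', 'c'}, 'd', 1),
            ({'a'}, 'c', 1), ({'x'}, 'd', 2), (set(), 'b', 7),
            (set(), 'c', 6), (set(), 'd', 20)]

      def ucs_abduction(KB, E):
          """Uniform-cost search for a minimum-weight explanation, expanding
          only rules whose consequent is the lexicographically smallest
          unexplained literal (version c).
          Returns (total weight, set of rule indices), or None on failure."""
          tie = count()        # tie-breaker so the heap never compares sets
          agenda = [(0, next(tie), frozenset(E), frozenset(), frozenset())]
          seen = set()
          while agenda:
              g, _, todo, explained, rules = heapq.heappop(agenda)
              if not todo:
                  return g, set(rules)   # goal: nothing left to explain
              if (todo, explained) in seen:
                  continue
              seen.add((todo, explained))
              lit = min(todo)            # lexicographically first unexplained
              for i, (ants, cons, w) in enumerate(KB):
                  if cons == lit:
                      new_explained = explained | {lit}
                      new_todo = (todo | ants) - new_explained
                      heapq.heappush(agenda, (g + w, next(tie),
                                              frozenset(new_todo),
                                              new_explained, rules | {i}))
          return None
      ```

      On the example, this returns weight 13 with rules {R1, R2, R3, R4}
      (indices 0-3), matching the trace.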


   f) A student in the AI course suggested the following heuristic,
      called "total weight", computed as follows.

      For each literal L, initialize its "total weight" T(L) to infinity.
      If there is any rule R that has L as a consequent, and such
      that w(R) plus the sum of the total weights of all its antecedents is
      less than T(L), update T(L). Keep doing so until no further
      updates are possible.

      Now, for a partial solution S', let h(S') be the sum of the
      total weights of all unexplained literals in S' and E.

      For example, starting with infinite T(.) for all literals,
      we can set T(a) = 10 due to rule R2, and 
      T(b) = T(a) + 1 due to rule R1. Now, we can set T(b) = 7 due to
      rule R6, etc. Now, suppose we have E = {b} and S' is empty.
      Since there is one unexplained literal (namely b) in E and
      S', we have h(S') = T(b) = 7.

      Is this heuristic admissible? If yes, provide a proof outline.
      If not, provide a counterexample.  

  Answer: not admissible. In fact, the above KB was set up to provide
      such a counterexample. The literal weights are:
           T(a) = 10
           T(b) =  7
           T(c) =  6
           T(d) = 14 (= T(b)+T(c) + 1)
           T(x) = infinity

      In fact, the initial state with {d} to be proved already
      has h(S) = 14, which is an overestimate, as we have seen a
      solution with weight 13. 
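      The T(.) values can be computed by a simple fixed-point relaxation.
      A sketch (the name total_weights is ours):

      ```python
      import math

      # R1..R8 from the example KB, as (antecedents, consequent, weight)
      KB = [({'a'}, 'b', 1), (set(), 'a', 10), ({'b', 'c'}, 'd', 1),
            ({'a'}, 'c', 1), ({'x'}, 'd', 2), (set(), 'b', 7),
            (set(), 'c', 6), (set(), 'd', 20)]

      def total_weights(KB):
          """Relax T(L) with w(R) plus the antecedents' total weights until
          no rule can lower any T(L), as the heuristic prescribes."""
          lits = ({c for _, c, _ in KB} |
                  {a for ants, _, _ in KB for a in ants})
          T = {lit: math.inf for lit in lits}
          changed = True
          while changed:
              changed = False
              for ants, cons, w in KB:
                  cand = w + sum(T[a] for a in ants)
                  if cand < T[cons]:
                      T[cons], changed = cand, True
          return T
      ```

      This reproduces T(d) = 14, an overestimate of the optimal weight 13,
      confirming that the heuristic is inadmissible.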

2) (FOL and situation calculus)
   We need to represent and reason within the following scenario, called
   the BGU shooting problem (based on the infamous Yale Shooting problem):
   A student in the AI course is so upset with his instructor, Fred, that
   he decides to shoot him. He therefore intends to wait outside the
   classroom with a loaded gun, and shoot Fred when he arrives.

   a) We need to represent the following facts using FOL and situation
      calculus, using successor-state axioms:

      A) The action of loading a gun makes it loaded. (Action name: Load)
      B) The action of waiting makes Fred arrive at the intended
         assasination site. (Action name: Wait)
      C) If Fred is at the intended assasination site and the gun
         is loaded, the action of shooting kills Fred.
         Shoot also makes the gun unloaded. (Action name: Shoot)
      D) In situation S0, Fred is alive, the gun is not loaded, and
         Fred is not at the assasination site.

      Use the following predicates to represent the above information:
         Loaded(s)    :  the gun is loaded in situation s
         Dead(s)      :  Fred is dead in situation s
         Loc(s)       :  Fred is at the assasination site in situation s
      Obviously, you also need to use the Result(a, s) function...

      Answer:
        A + last line of C: (one successor-state axiom for "Loaded")
            forall a, s Loaded(Result(a, s)) iff  
                         ((Loaded(s) and a not = Shoot) or
                           (a = Load))

        B: (very simple successor state axiom for "Loc")
            forall s Loc(Result(Wait, s))

        C: (successor state axiom for "Dead")
            forall s Loc(s) and Loaded(s) -> Dead(Result(Shoot, s))

        D: (facts about S0)
            not Dead(S0)
            not Loaded(S0)
            not Loc(S0)

        Note that we will also need to say that the actions are different,
        and in particular: 
          DIFF1: not =(Wait, Shoot)
        And also about equality of actions, in particular:
          EQ1:  =(Load, Load)

   b) Now, we need to prove that the following sequence of actions (starting
      from situation S0): Load, Wait, Shoot - results in Fred's death. 
      Proceed as follows:
      A) Represent the fact to be proved, and its negation.
         Answer: need to prove
             Q: Dead(Result(Shoot, Result(Wait, Result(Load, S0))))

          So we add the negation of this fact to KB:
             Q': not Dead(Result(Shoot, Result(Wait, Result(Load, S0))))

      B) Convert all the knowledge you represented in A into CNF
          Start with A, need to get rid of "iff" first:
            forall a, s ( not Loaded(Result(a, s)) or  
                            ((Loaded(s) and a not = Shoot) or
                              (a = Load))) and
                       ( Loaded(Result(a, s)) or  
                            not ((Loaded(s) and a not = Shoot) or
                                  (a = Load)))

          Treating "=" as a predicate, and moving negation inward:
            forall a, s ( not Loaded(Result(a, s)) or  
                            ((Loaded(s) and  not =(a, Shoot)) or
                              (=(a, Load)))) and
                       ( Loaded(Result(a, s)) or  
                            ((not Loaded(s) or =(a, Shoot)) and
                                not =(a, Load)))

          We only have UNIVERSAL quantifiers, so we can skip some steps. Now,
          dropping "forall a, s", distributing or over and, and separating clauses, we get:

           A1:   not Loaded(Result(a, s)) or Loaded(s) or =(a, Load)
           A2:   not Loaded(Result(a, s)) or not =(a, Shoot) or =(a, Load)
           A3:   Loaded(Result(a, s)) or not Loaded(s) or =(a, Shoot)
           A4:   Loaded(Result(a, s)) or not =(a, Load)

           Now, sentence B becomes (only need to remove quantifier):
            B: Loc(Result(Wait, s))

           Now for sentence C we get rid of implication:
             forall s not (Loc(s) and Loaded(s)) or Dead(Result(Shoot, s))

           Moving negation inwards:
             forall s not Loc(s) or not Loaded(s) or Dead(Result(Shoot, s))

           Getting rid of the only universal quantifier, we get:
            C: not Loc(s) or not Loaded(s) or Dead(Result(Shoot, s))

          Finally, the facts about S0 are already in CNF:
            D1: not Dead(S0)
            D2: not Loaded(S0)
            D3: not Loc(S0)

      C) Use a refutation proof with resolution to show Fred's death.

 Answer:
   Resolve Q' with C, unifier {s | Result(Wait, Result(Load, S0))} to get:
    1: not Loc(Result(Wait, Result(Load, S0))) or
        not Loaded(Result(Wait, Result(Load, S0)))

   Now, resolve 1 with B, unifier {s | Result(Load, S0)} to get:
     2: not Loaded(Result(Wait, Result(Load, S0)))

   Resolve 2 with A3, unifier {s | Result(Load, S0), a | Wait} to get:
    3:  not Loaded(Result(Load, S0)) or =(Wait, Shoot)

  Resolve 3 with DIFF1, empty unifier, to get:
     4: not Loaded(Result(Load, S0))

  Resolve 4 with A4, unifier {s | S0, a | Load} to get:
    5: not =(Load, Load)

  Now, resolve 5 with EQ1, empty unifier, to get the empty clause, QED.

  (Note that in the proof the facts about S0 are actually redundant, as
   are axioms A1 and A2).
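  The proof can also be cross-checked by simply simulating the
  successor-state axioms on a concrete state (a sketch, not a theorem
  prover; the dict encoding is ours, and the persistence of Loc is an
  extra assumption the resolution proof itself does not need):

  ```python
  def result(action, state):
      """One step of the successor-state axioms from (a), on a state
      given as a dict of the fluents Loaded, Loc, and Dead."""
      loaded, loc, dead = state['loaded'], state['loc'], state['dead']
      return {
          # Loaded: Load makes it loaded; Shoot unloads; otherwise persists
          'loaded': action == 'Load' or (loaded and action != 'Shoot'),
          # Loc: axiom B only asserts Loc after Wait; we also let it persist
          'loc': action == 'Wait' or loc,
          # Dead: Shoot kills iff Fred is present and the gun is loaded
          'dead': dead or (action == 'Shoot' and loc and loaded),
      }

  s = {'loaded': False, 'loc': False, 'dead': False}   # situation S0
  for a in ['Load', 'Wait', 'Shoot']:
      s = result(a, s)
  ```

  After Load, Wait, Shoot the simulation reaches a state with Dead true
  and Loaded false, matching the query Q.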


   c) What would be a potential problem in the proof if we did not
      have "frame axioms" in our representation (i.e. stating what
      did not change)?

  Answer:
   In this case, we will be missing A3, which states that the gun remains
   loaded (if it was loaded before the action), unless the action is "Shoot".
    We would then be unable to prove that the gun is loaded just before
    shooting, and hence unable to prove Fred's death.

   d) Would we be able to do the proof using only forward chaining?

  Answer:
   No, because A3 is not in Horn form (it has 2 positive literals), and
   is essential for the proof.

3) (Propositional logic)
   For each of the following sentences, determine whether they are
   satisfiable, unsatisfiable, and/or valid. For each sentence that
   is satisfiable, determine the number of models (over the propositional
   variables in the sentence).
   a) ((A -> B) and (not B or C)) -> (not A or C)
 
       Valid (and thus also satisfiable).
       Since there are 3 variables, and all assignments are models,
       it has 2^3 = 8 models.

   b) A -> not A

      Satisfiable, with ONE model (A=false).

   c) X or Y or Z

       Satisfiable, with 7 models (only one assignment, X=Y=Z=False, is not a model).

   d) X and Y and Z

       Satisfiable, with 1 model (only one assignment, X=Y=Z=True, is a model).

   e) (A -> B) and (C -> D) and (E -> F) 

      Satisfiable, with 3*3*3=27 models.
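    These counts are small enough to verify by brute-force enumeration.
    A sketch (count_models and implies are our helpers):

    ```python
    from itertools import product

    def count_models(variables, sentence):
        """Count satisfying assignments by trying all 2^n of them."""
        return sum(sentence(dict(zip(variables, vals)))
                   for vals in product([False, True], repeat=len(variables)))

    def implies(p, q):
        return (not p) or q

    # (a): a valid sentence over 3 variables has all 8 assignments as models
    n = count_models('ABC',
                     lambda m: implies(implies(m['A'], m['B']) and
                                       (not m['B'] or m['C']),
                                       not m['A'] or m['C']))
    ```

    Here n == 8; the other sentences check out the same way (1, 7, 1, and
    27 models respectively).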

4) (Game trees and search)
   Consider the stochastic game to be implemented in assignment 2:
   grid based environment, agents can move up, down, left, or right, or shoot,
   but where walking deliberately into a wall or waiting
   is not possible.
   You may assume a time limit as desired for the length of the game in
   each scenario.

   a) Show a scenario (with no bullets) where for the first agent to move,
      under zero-sum scoring, moving in a certain direction (say, left) loses,
      but under total flags collected scoring, the optimal action is to
      move left.

     Consider the following scenario, with 2 moves (4 ply) until the end of the
     game, where # is a wall, A is the first agent, B is the second agent,
     F is a flag worth 10, and G is a flag worth 100.

     ######
     #  G##
     #F AB#
     ######

     Under zero-sum scoring, clearly A should go UP to get G for a score of 100
      (since B does not get to F before the end of the game). A move of LEFT by
      A lets it get to flag F, but then B can get to G, and A will get 10-100 = -90
      and lose. Proof: see tree.

        Moves              Score (for A)
----------------------------------------------------------
     A: U B: L A: L B: U    (100)
                       L    (100)
        L B: L A: U B: U   (-100)
                       L      (0)
                  L B: U    (-90)
                       L     (10)

      But under total-flags scoring, each agent gets as its score the value of all
      flags collected. So here the optimal action by A is LEFT to get flag F,
      and B will get flag G, for a score of 110 for both agents.
     Proof: see tree (same as before, except different score).

        Moves              Score (for both A and B)
----------------------------------------------------------
     A: U B: L A: L B: U    (100)
                       L    (100)
        L B: L A: U B: U    (100)
                       L      (0)
                  L B: U    (110)
                       L     (10)


   b) In the game with bullets, show a scenario where the first agent to move loses.

      #####
      ## F#
      #A  #
      ##B##
      #####

      This game is completely symmetric, so we will just show that if agent
      A starts the game, it loses. Assume each agent has only 1 bullet. Note
      that initially, each agent can only move out in front of the other agent
      or shoot (in which case it can no longer shoot). We will assume that
      an agent can wait only if no other action is possible. Assume the
      probability of hitting the enemy is 1. A partial game tree is:

      A: S B: U A: W B: S -----  (A dies, eventually B gets the flag)
           B: S          ------  (not a good option for B, as then A can win,
                                  so B will not do it).
         R B: S           -----  (A dies, eventually B gets the flag)
           

   In both cases, you must PROVE the scores by generating the respective game tree, either
   to a terminal position or prove that you can prune some branches.

   c) Suggest heuristic evaluation functions for the total-flags scoring version of the game without bullets:
     1) One that is a lower bound, but non-trivial.

        Number of flags collected until the current state, plus the
        following quantity Q:
         Compute the graph of distances along the grid, taking obstacles
         into account but ignoring the other agent. Find a minimal spanning tree
         (arc weights being the distances). Find a subtree whose traversal
         distance (twice the weight of its tree edges) is less
         than the number of turns left, with maximal total flag value, and
         let Q be that value. (This is a lower bound, because one agent can just
         keep out of the way, while the other agent traverses the low-distance
         part of the spanning tree in the available number of moves.)

     2) One that is an optimistic upper bound, but non-trivial

        Number of flags collected until the current state, plus the following
        quantity Q':
         Find the minimal distance D between flags (considering the agent
         positions as flag locations). Now, compute a bound on the maximum
         number of flags collectible in the available number of ply, P, as
         Fcount = P / D. Let Q' be the total value of the Fcount best flags.
         (It is an upper bound because there is no way the agents can collect
         more than Fcount flags together.)
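         A sketch of this bound (the function name and arguments are ours;
         we assume D and the flag values have already been extracted from
         the grid):

         ```python
         def flags_upper_bound(ply_left, min_flag_dist, flag_values):
             """Optimistic bound: at most ply_left // min_flag_dist flags
             can still be collected, so sum the values of the best ones."""
             fcount = ply_left // max(min_flag_dist, 1)  # guard against D == 0
             return sum(sorted(flag_values, reverse=True)[:fcount])
         ```

         For example, with 4 ply left, a minimal inter-flag distance of 3,
         and flags worth [10, 100, 5], at most one flag is reachable, so the
         bound is 100.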
    

5)  (Game-tree search - alpha-beta pruning)
   a) Give an example of a 4-ply game tree (branching factor 2)
     where alpha-beta pruning saves NOTHING, and another where the savings
     is maximal. How many nodes are pruned in each case?

For exposition, we will use L, R for MAX's moves, and l, r for MIN's moves.

 MAX val | MIN val | MAX val | MIN  val
------------------------------------------
   L  5     l  10    L   0     l    0
                               r   10
                     R   10    l   20
                               r   10
            r   5    L  -10    l  -10
                               r    0
                     R    5    l   10
                               r    5
   R 50     l  50    L   10    l   10
                               r   20
                     R   50    l   60
                               r   50
            r 100    L    0    l   10
                               r    0
                     R  100    l  110
                               r  100

In this case, alpha-beta will not be able to prune anything.
MAX will pick move R to get a value of 50.


The example with maximal pruning:

 MAX val | MIN val | MAX val | MIN  val
------------------------------------------
   L  100   l  100   L  100    l  100
                               r  110
                     R <=10    l   10
                               r   ---  pruned  since R will not be chosen
            r >=200  L  200    l  200
                               r  210
                     R         l  ----  Pruned from R onwards, since r will
                               r  ----  not be chosen.
   R <=10   l <=10   L  <=10   l   10
                                r  ---- Pruned, since L will not be chosen
                     R   <=0   l   0
                               r  ---- Pruned, since top-level R will not be chosen
            r        L         l  ---
                               r  ---  Prune whole subtree from r onwards,
                     R         l  ---  because the top-level R will not
                               r  ---  be chosen.

Value of top node (not shown) is 100, where MAX picks move L
Note that the sizes of the pruned subtrees keep increasing, in the best case.
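Both claims can be checked mechanically. A minimal alpha-beta over nested
lists (our own encoding; in the second tree, pruned leaves are filled with
placeholder 0s that the algorithm never reads):

```python
def alphabeta(node, maximizing, alpha, beta, visited):
    """Alpha-beta over a nested-list game tree; leaves are numbers.
    Every leaf actually evaluated is appended to `visited`."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:
                break                 # beta cutoff: MIN above rejects this node
            alpha = max(alpha, value)
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:
            break                     # alpha cutoff: MAX above rejects this node
        beta = min(beta, value)
    return value

# The two 4-ply trees above, leaves listed left to right
no_prune = [[[[0, 10], [20, 10]], [[-10, 0], [10, 5]]],
            [[[10, 20], [60, 50]], [[10, 0], [110, 100]]]]
max_prune = [[[[100, 110], [10, 0]], [[200, 210], [0, 0]]],
             [[[10, 0], [0, 0]], [[0, 0], [0, 0]]]]
```

On the first tree all 16 leaves are visited and the value is 50; on the
second, only 7 of the 16 leaves are visited (9 pruned) and the value is 100.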


   b) Provide an example of a 2-ply + 1 chance node level game tree where
      alpha-beta saves search, and a similar example where changing the
      leaf values monotonically results in no savings for pruning. 

We will use the same notation as above, but with a chance node consisting of a
coin toss: H, T. Assume that terminal values are known to be bounded,
with a maximum absolute value of 10.

 MAX val | CHANCE  val |  MIN val 
------------------------------------------
 L    5     H      8       l     10
                           r      8
            T      2       l      2
                           r     10
 R   <=0    H    -10       l    -10
                           r    (-8)   ---- pruned
            T              l    (10)   ---- pruned at T (because MAX will not pick R)
                           r    (0)    ---- pruned

Observe that the values in parentheses will not be seen by the algorithm, as
these nodes or their parents are pruned.

Note that a different set of values, even if changed monotonically, may
remove the possibility of pruning, e.g.:

 MAX val | CHANCE  val |  MIN val 
------------------------------------------
 L    1.5   H      2       l      5
                           r      2
            T      1       l      1
                           r      5
 R   -0.5   H     -1       l     -1
                           r   -0.5
            T      0       l      5
                           r      0

Now nothing can be pruned.

6)  (agents and environment simulators)
   For each of the following domains, which type of agent is appropriate
   (table lookup, reflex, goal based, or utility based). Also, indicate the
   type of environment:
   a) An agent that solves the 8 puzzle.

The environment here is the easiest: deterministic, accessible, episodic,
discrete, and static. 

A goal based agent suffices. However, since the problem is rather small,
one can pre-solve for all problem instances, and use a simple reflex agent.
In fact, even a table lookup based agent can easily be implemented here
(only 9!/2 possible states).


   b) An agent that plays checkers.

The environment here is also easy: deterministic, accessible, (possibly) episodic,
discrete, but not static (due to the opposing agent).

Game is considered too big to implement as reflex-based, so goal-driven
or utility-based agent would be needed.

   c) An agent that can pass the Turing test.

The environment here is one of the hardest possible: stochastic,
non-accessible,  episodic (if only a single test instance), continuous, and dynamic.

A utility-based agent would be needed.

   d) An autonomous robot for exploring Mars.

The environment here is one of the hardest possible: stochastic,
non-accessible, nonepisodic, continuous, and dynamic.

Requires, to do a good job, a utility-based agent. Current implementation
is actually reflex (with internal state), with a goal-based component elsewhere.

   e) An internet shopping agent (flight tickets domain).

The environment here is also rather hard: stochastic (to some extent)
non-accessible, non-episodic, continuous (but possibly only in a few parameters,
such as prices), and dynamic.

Usually requires a utility-based agent.

Justify all answers briefly!

One comment about the non-observable tic-tac-toe question from the old exam, mentioned in class Dec 6: you cannot solve it using standard decision trees, even with simple chance nodes.

A simplified version, where the opponent is stochastic, is still problematic, as you need a "game tree" over the belief state (something you have not yet studied). The full version where it is not observable and the opponent is not stochastic is truly a hard problem, and even the notion of optimality there is not easy to define, never mind finding an optimal policy!

Deadline: Noon (12:00), Wednesday, December 6, 2006 ( strict deadline).

Submissions are to be solo.