# Introduction to Artificial Intelligence

## Homework assignment - simulators, agents, search, games, logic, inference

1) (search and logic)
Finding an explanation for observed evidence is known as abductive
reasoning.  The input is a KB of definite Horn clauses (i.e. "rules",
see below), and a function that assigns each literal a
non-negative cost.
Also part of the input is a set of literals E (the observed
evidence) to be explained.
An explanation is a set of assumed literals A that, together with KB,
proves E.

In this exercise, we wish to solve the weighted abduction
version: given E, find an explanation  A where
the sum of the costs of the literals in A is minimal. For example, consider
the following KB:

R1:        a -> b
R2:  b and c -> d
R3:        a -> c
R4:        x -> d

The cost of assumption for literals is as follows:
cost(b) = 7
cost(c) = 6
cost(d) = 21
cost(a) = 9
cost(x) = 100

For the case where we wish to find an explanation for E = {d},
one possible explanation is A = {d} for a cost of 21.
But it is not an optimal explanation, because A = {a} is
also an explanation, with a weight of 9.

a) Formalize the weighted abduction problem as a search problem,
where the state is a partial solution A', and where an operator adds
any literal not in the partial solution.

ANS:
Initial state: A = {}.
Goal test: E follows from A union KB
Operators (one of several possibilities): add any literal not already in A.
Path cost: for each operator, the cost of the added literal.

b) Same as (a), but this time a lexicographical ordering of literals
is given (say a, then b, etc.), and one can add a literal x only
if no literal y greater than x is in A'.

ANS:
Initial state: A = {}.
Goal test: E follows from A union KB
Operators (one of several possibilities):
Add any literal x to A that is not already in A, provided that
for all y in A, y is "lexicographically before" x. Note that you need to
define this predicate for implementation, and that numerous
definitions are possible.
Path cost: for each operator, the cost of the added literal.

c) Assume that we use uniform-cost search, are we assured of finding
the minimal explanation using version a? If not, are we assured of such
using option b? You need to sketch the proof for your answer.

ANS:
Yes in both cases, ASSUMING COSTS ARE POSITIVE.
This follows from the monotonicity of path costs and the
monotonicity of the logic (i.e. that adding literals can never
cause something that was provable to stop being provable).

For case b you also need the fact that every A is reachable from
the empty set despite the lexicographic restriction.

(Observe that you need both monotonicities, or this would not work!)

d) We wish to use A* to perform weighted abduction. Find an
admissible heuristic for versions a, b, (they may all be the
same heuristic, and it does NOT have to be a GOOD heuristic).
Now, show an execution of A* using your heuristic to solve the
weighted abduction problem, given as input the above KB and
the evidence to be explained E = {d}.
Use only one of the search problem versions a, b - your choice
(hint: which would result in the shortest trace?)

ANS:
A trivial admissible heuristic would be simply 0. However, let
us do something a bit better:
h(A) = 0 if A is a goal;
otherwise, h(A) = the cost of the cheapest eligible literal not in A.

We will use option b, so that we get a shorter trace... Assume
the standard lexicographic ordering (a is smallest).

Initialize:
{} h = 6, f = 6
Expand {} to get:
{a} h =   0, f =   9
{b} h =   6, f =  13
{d} h =   0, f =  21
{c} h =  21, f =  27
{x} h =   0, f = 100
Put them in the queue in this order.

Now, get {a}, which is a goal state, and we are done...

Note effectiveness of heuristic, because had we used 0 we would
have gotten:
{c} h =   0, f =   6
{b} h =   0, f =   7
{a} h =   0, f =   9
{d} h =   0, f =  21
{x} h =   0, f = 100

And now we would try to expand {c}, and then try to expand {b}
and only then get to {a}.
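As a sanity check, the version (b) formulation can be run as a small uniform-cost search (the h = 0 case discussed above) over the example KB. This is only a sketch; all names (`RULES`, `closure`, `abduce`) are mine, not part of the assignment:

```python
# Sketch of weighted abduction as uniform-cost search over the example KB
# (version b: literals are added in lexicographic order).
import heapq

RULES = [({"a"}, "b"), ({"b", "c"}, "d"), ({"a"}, "c"), ({"x"}, "d")]
COST = {"a": 9, "b": 7, "c": 6, "d": 21, "x": 100}
ORDER = "abcdx"

def closure(assumed):
    """Forward-chain the Horn rules: everything provable from the assumptions."""
    derived = set(assumed)
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def abduce(evidence):
    """Uniform-cost search; returns a minimum-cost explanation and its cost."""
    frontier = [(0, ())]  # (path cost g, assumed literals as a sorted tuple)
    while frontier:
        g, assumed = heapq.heappop(frontier)
        if evidence <= closure(assumed):
            return set(assumed), g
        last = assumed[-1] if assumed else ""
        for lit in ORDER:  # only literals after the last one may be added
            if lit > last:
                heapq.heappush(frontier, (g + COST[lit], assumed + (lit,)))

print(abduce({"d"}))  # ({'a'}, 9), matching the optimal explanation above
```

By part (c), positive costs and the monotonicity of the logic guarantee that the first goal popped is optimal.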

2) (FOL and situation calculus)
We need to represent and reason within the scenario defined by assignment 1.

a) Write down the axioms for the behavior of the world using FOL and
situation calculus. Assume only one agent, no sentry, and no shooting.
Use the propositions:
Wall(x,y)   ; Location x, y is a wall
Ice(x,y)    ; Location x, y is ice
Free(x,y)   ; Location x, y is neither a wall nor ice
Also use the fluents (situation-dependent propositions)
Loc(x,y,s)  ; Agent is at x,y in situation s
Dir(d, s)   ; Agent's last motion was d at s (one of: left, right, up, down, none,
where the latter can happen only after hitting a wall).
Flag(x,y,s) ; There is a flag that has not been collected at x, y, in situation s.
CollectReward(s)  ; Agent receives reward in situation s (only when stepping
into a location that contained a flag in a previous situation).
Functions:
+, -  (in order to compute adjacent locations).
And the actions: left, right, up, down.

ANS:

There are several ways to do this. The smallest number of axioms would be
by using successor-state axioms, as done below, but note that some
of the axioms will be huge. Note that this is orders of magnitude
larger than what you can expect to get in the midterm exam.

1.  Forall (s, a)  CollectReward(Result(a,s))  iff
exists(x,y) (Flag(x,y,s) and Loc(x,y,Result(a,s)))

Now the axiom for the flag (note that the above description is a bit ambiguous,
so we will assume that the flag is collected in the SAME situation where
the agent arrives at the flag's location).

2.  Forall (x, y, s, a)  Flag(x,y,Result(a,s)) iff
(Flag(x,y,s) and not Loc(x, y, Result(a,s)))

Now, the motion direction, this is somewhat tougher.

3.  Forall (s, a)  Dir(left,Result(a,s))  iff
(Exists (x,y) Loc(x,y,s) and not Wall(x-1,y) and
((Dir(left,s) and Ice(x,y)) or
(a=left and (not Ice(x,y) or Dir(none,s)))))

4.  Forall (s, a)  Dir(right,Result(a,s))  iff
(Exists (x,y) Loc(x,y,s) and not Wall(x+1,y) and
((Dir(right,s) and Ice(x,y)) or
(a=right and (not Ice(x,y) or Dir(none,s)))))

5.  Forall (s, a)  Dir(up,Result(a,s))  iff
(Exists (x,y) Loc(x,y,s) and not Wall(x,y+1) and
((Dir(up,s) and Ice(x,y)) or
(a=up and (not Ice(x,y) or Dir(none,s)))))

6.  Forall (s, a)  Dir(down,Result(a,s))  iff
(Exists (x,y) Loc(x,y,s) and not Wall(x,y-1) and
((Dir(down,s) and Ice(x,y)) or
(a=down and (not Ice(x,y) or Dir(none,s)))))

7.  Forall (s, a)  Dir(none,Result(a,s))  iff (Exists (x,y)
(Loc(x,y,s) and
((Ice(x,y) and
((Dir(right,s) and Wall(x+1,y)) or
(Dir(left,s)  and Wall(x-1,y)) or
(Dir(up,s) and  Wall(x,y+1)) or
(Dir(down,s) and Wall(x,y-1)))) or
(Free(x,y) and
((a=right and Wall(x+1,y)) or
(a=left  and Wall(x-1,y)) or
(a=up and  Wall(x,y+1)) or
(a=down and Wall(x,y-1)))))))

Axioms 3-7 can be reduced by using auxiliary predicates and functions,
but if we are not allowed to do so, all the above is needed.
Note that in the proof below we will not be using axioms 3, 6, 7.
Finally, the axioms on the Loc predicate. We will rely heavily on the
motion direction specified by the Dir predicate.

8. Forall (s, a, x1, y1)  Loc(x1,y1,Result(a,s)) iff
(Exists (x,y) Loc(x,y,s) and
((Dir(none, Result(a,s)) and x1=x and y1=y) or
(Dir(right, Result(a,s)) and x1=x+1 and y1=y) or
(Dir(left, Result(a,s)) and x1=x-1 and y1=y) or
(Dir(up, Result(a,s)) and x1=x and y1=y+1) or
(Dir(down, Result(a,s)) and x1=x and y1=y-1)))

We also need axioms stating that locations are all different. That is,
we need:

9.   Forall (x, y, s, x1, y1) Loc(x,y,s) and Loc(x1,y1,s) implies x=x1 and y=y1

Also, equality (and inequality) of numbers and action, such as:
10. "1=1", "not 1=2", "right=right" etc.
(We will not number these individually).
and also semantics of + and -, such as "2=1+1", etc.

b) Write down the additional facts representing the following scenario
(with x starting at 0 on the left, and y starting at 0 on the bottom,
F is a flag, i is ice, A is the agent, initially on a free square.):

####
##F#
#Ai#
####

ANS:
The fixed facts are (we will not give them numbers, as there are too many of them):
Free(1,1)
Free(2,2)
not Ice(2,2)
Ice(2,1)
not Ice(1,1)
not Wall(2,1)
not Wall(2,2)
not Wall(1,1)
Wall(0,0)
Wall(0,1)
Wall(0,2)
Wall(0,3)
Wall(1,0)
Wall(1,2)
Wall(1,3)
Wall(2,0)
Wall(2,3)
Wall(3,0)
Wall(3,1)
Wall(3,2)
Wall(3,3)

The variable facts - let us call the initial situation S. We get:

Loc(1,1,S)
Dir(none,S)
Flag(2,2,S)
not CollectReward(S)

c)  Now, we need to find a sequence of actions that will result in collecting
a reward, and prove a theorem that it will do so. Do the following steps:

A) Convert the knowledge base to conjunctive normal form.

ANS: (dropping essentially identical axioms). Assuming all lower-case single letters
(possibly with numbers) are variables. Also not doing "standardize apart" between
axioms, even though in principle it SHOULD be done.

1.  Becomes several clauses:
1A. not CollectReward(Result(a,s)) or Flag(Flag-x(a,s),Flag-y(a,s),s)
1B. not CollectReward(Result(a,s)) or Loc(Flag-x(a,s),Flag-y(a,s),Result(a,s))
(Note that Flag-x and Flag-y are Skolem functions.)
1C. not Flag(x,y,s) or not Loc(x,y,Result(a,s)) or CollectReward(Result(a,s))

2. Also becomes several clauses:
2A. not Flag(x,y,Result(a,s)) or Flag(x,y,s)
2B. not Flag(x,y,Result(a,s)) or not Loc(x, y, Result(a,s))
2C. not Flag(x,y,s) or Loc(x,y,Result(a,s)) or Flag(x,y,Result(a,s))

3. Becomes quite a number of clauses:
3A. not Dir(left,Result(a,s)) or Loc(LL-x(a,s),LL-y(a,s),s)
3B. not Dir(left,Result(a,s)) or not Wall(LL-x(a,s)-1,LL-y(a,s))
3C. not Dir(left,Result(a,s)) or Dir(left, s) or a=left
3D. not Dir(left,Result(a,s)) or Dir(left, s) or not Ice(LL-x(a,s),LL-y(a,s))
3E. not Dir(left,Result(a,s)) or Ice(LL-x(a,s),LL-y(a,s)) or a=left
3F. not Dir(left,Result(a,s)) or Ice(LL-x(a,s),LL-y(a,s)) or not  Ice(LL-x(a,s),LL-y(a,s))
(Note that LL-x and LL-y are Skolem functions. Also, clause 3F is a tautology
and is thus redundant. There are actually several more, but we use none of them, these
are all "diagnostic" axioms and will never be used for prediction)
3G. Dir(left,Result(a,s)) or not Loc(x,y,s) or Wall(x-1,y) or not Dir(left,s) or not Ice(x,y)
3H. Dir(left,Result(a,s)) or not Loc(x,y,s) or Wall(x-1,y) or not a=left or Ice(x,y)
3I. Dir(left,Result(a,s)) or not Loc(x,y,s) or Wall(x-1,y) or not a=left or not Dir(none,s)

Axioms 4,5,6 are similar. We will only do the parts we need:

4H. Dir(right,Result(a,s)) or not Loc(x,y,s) or Wall(x+1,y) or not a=right or Ice(x,y)

5I. Dir(up,Result(a,s)) or not Loc(x,y,s) or Wall(x,y+1) or not a=up or not Dir(none,s)

7. Another one with many clauses.

We will do only one of them:

7X. Dir(none,Result(a,s)) or not Loc(x,y,s) or not Ice(x,y) or not Dir(right,s) or not Wall(x+1,y)

8. Yet another one consisting of many clauses, we will write only the necessary ones:

8X. Loc(x1,y1,Result(a,s)) or not Loc(x,y,s) or not Dir(none,Result(a,s)) or not x1=x or not y1=y
8Y. Loc(x1,y1,Result(a,s)) or not Loc(x,y,s) or not Dir(right,Result(a,s)) or not x1=x+1 or not y1=y
8Z. Loc(x1,y1,Result(a,s)) or not Loc(x,y,s) or not Dir(up,Result(a,s)) or not x1=x or not y1=y+1

B) Formally list what we need to prove, and state its negation.

One possible plan for collecting the flags is the sequence of actions: right, right, up
(where actually the middle action can be any action as we are skidding and
hitting the wall there in any case). This means we need to prove:

Q:  CollectReward(Result(up, Result(up, Result(right, S))))

C) Use resolution to reach a contradiction, thus proving the theorem.

We add the negation of the query, Q', and then start doing resolutions, where:

Q': not CollectReward(Result(up, Result(up, Result(right, S))))

Proof:
Resolve Q' with 1C, substitution {a|up, s|Result(up, Result(right, S))} to get:
(note that if we want a proof by forward chaining we would prove 1C and then
use it to get Q).

P1: not Flag(x,y, Result(up, Result(right, S))) or
not Loc(x,y, Result(up, Result(up, Result(right, S))))

(That is, we will need to prove that we get there, and that the flag is still there).

Resolve Loc(1,1,S) with 4H, substitution {x|1, y|1, s|S} to get:

P2: Dir(right,Result(a,S)) or  Wall(1+1,1) or not a=right or Ice(1,1)

Also need to use the axioms on substitution of equals and the "+" function to rewrite Wall(1+1,1) as Wall(2,1).
Resolving P2 and "not Wall(2,1)", empty substitution, we get:

P3: Dir(right,Result(a,S)) or not a=right or Ice(1,1)

Resolving P3 with "not Ice(1,1)", empty substitution, we get:

P4: Dir(right,Result(a,S)) or not a=right

Resolve P4 with 8Y, substitution {s|S}, to get:

P5:  Loc(x1,y1,Result(a,S)) or not Loc(x,y,S) or not x1=x+1 or not y1=y or not a=right

Resolve P5 with Loc(1,1,S), substitution {x|1, y|1} to get:

P6: Loc(x1,y1,Result(a,S)) or not x1=1+1 or not y1=1 or not a=right

Resolving P6 with "2=1+1" {x1|2}, and then with "1=1" {y1|1} (2 separate resolution steps!) we get:

P7: Loc(2,1,Result(a,S)) or not a=right

Resolving P7 with "right=right", substitution {a|right} we get:

P8: Loc(2,1,Result(right,S))

(*** Posting this for now, the rest is similar, will continue to the end as
time passes. Outline: prove the location for the next 2 actions, and that the
flag does not "disappear" in the first 2 actions, and then use P1 ***)
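Independently of the resolution proof, the plan right, right, up can be sanity-checked by simulating axioms 3-8 directly on the part (b) scenario. A minimal sketch, assuming the wall/ice layout from part (b); the function and variable names are mine:

```python
# Minimal simulator for the part (b) scenario: agent at (1,1), ice at (2,1),
# flag at (2,2). The transition rule follows axioms 3-8 above.
WALLS = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3),
         (2, 0), (2, 3), (3, 0), (3, 1), (3, 2), (3, 3)}
ICE = {(2, 1)}
DELTA = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def step(loc, dirn, action):
    """One Result(action, s) transition: skid on ice, stop at walls."""
    if loc in ICE and dirn != "none":
        move = dirn            # skidding: the chosen action is ignored on ice
    else:
        move = action          # free square, or standing still (Dir = none)
    x, y = loc
    dx, dy = DELTA[move]
    if (x + dx, y + dy) in WALLS:
        return loc, "none"     # blocked: stay put, direction becomes none
    return (x + dx, y + dy), move

loc, dirn, collected = (1, 1), "none", False
for a in ["right", "right", "up"]:            # the plan proved above
    loc, dirn = step(loc, dirn, a)
    collected = collected or loc == (2, 2)    # flag location
print(loc, collected)  # (2, 2) True: the reward is collected
```

The middle action is indeed irrelevant: the agent is skidding on the ice at (2,1), hits Wall(3,1), and its direction becomes none, exactly as in the proof outline.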

d) What would be a potential problem in the proof if we did not
have "frame axioms" in our representation (i.e. stating what
did not change)?

We would not be able, e.g., to conclude that the flag is still there
in the situation where the agent reaches (2,2).

e) Would we be able to do the proof using only forward chaining?

If we used only Horn clauses in the proof, then we can do this proof using only
forward chaining. It looks like in this case all axioms used in the proof were Horn
clauses, with the definite clause being the desired one in any case. So in this case
the answer is yes.

3) (Propositional logic)
For each of the following sentences, determine whether they are
satisfiable, unsatisfiable, and/or valid. For each sentence that
is satisfiable, determine the number of models (over the propositional
variables in the sentence).
a) (A and (A -> B) and (B-> C)) -> C
b) A and (not (A or B))
c) A or (not (B or C))
d) (A and B) or (C and D)
e) A -> not A

ANS:

a) Valid. (And thus, of course, also satisfiable, and has 2^3=8 models).
b) Unsatisfiable, since it is a conjunction containing: A and not A
c) Satisfiable, containing 5 models: 4 with A=T, and also one with A=F, B=F, C=F.
d) Satisfiable, containing 7 models: 4 with A=T, B=T, also 4 with C=T, D=T, minus 1
with A=T, B=T, C=T, D=T which was counted twice above.
e) Satisfiable, 1 model: A=F
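These counts are easy to verify by brute force over all truth assignments. A small sketch (the helper names are mine):

```python
# Brute-force model counting over the propositional variables in each sentence.
from itertools import product

def count_models(formula, nvars):
    """Number of truth assignments (models) satisfying the formula."""
    return sum(bool(formula(*vals))
               for vals in product([False, True], repeat=nvars))

implies = lambda p, q: (not p) or q

sent_a = lambda A, B, C: implies(A and implies(A, B) and implies(B, C), C)
sent_b = lambda A, B: A and not (A or B)
sent_c = lambda A, B, C: A or not (B or C)
sent_d = lambda A, B, C, D: (A and B) or (C and D)
sent_e = lambda A: implies(A, not A)

print(count_models(sent_a, 3))  # 8 (valid: all 2^3 assignments)
print(count_models(sent_b, 2))  # 0 (unsatisfiable)
print(count_models(sent_c, 3))  # 5
print(count_models(sent_d, 4))  # 7
print(count_models(sent_e, 1))  # 1 (only A=F)
```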

4) (Game trees and search. Note, this is the same question as last year, but the domain
is a bit different!)
Consider the stochastic game to be implemented in assignment 2:
grid based environment, agents can move up, down, left, or right, or shoot,
but where walking deliberately into a wall or waiting
is not possible.
You may assume a time limit as desired for the length of the game in
each scenario.

a) Show a scenario (with no bullets) where for the first agent to
move (Clarification: not "whoever is the first to move", i.e. this need be
shown only for one of the agents) under zero-sum scoring,
moving in a certain direction (say, left) loses,
but under total flags collected scoring (i.e. each player
scores its own flags, no penalty for flags collected by an opponent), the optimal action is to
move left.

ANS:

Same as last year (almost).
Assume a short game (only 3 ply for each player),
and the scenario below, with A to move (numbers are flag values). Not visible is
ICE under the A and under the flags 2 (the top one) and 6. (Assuming probability 1
for slipping on the ice.)

3A62
.B.
.42

Note that if A moves left (collecting 3), B moves up to block A from
collecting 6 (as otherwise A would collect 9 and win, since B can only
collect 8). A would win if it collected 6 immediately, and would then
still collect 2 to win.

But if each player collected its own score, since B can never get more
than 8, it would go for 4 first, so A moving left would be optimal
for a score of 9.

b) In the game with bullets and zero-sum scoring, show a scenario
where whoever is the first agent to move, loses.

ANS:
Assume exactly 1 bullet each,
and the following initial position:

A.......9
.B#######

Either A or B can move (and get shot and lose)
or shoot immediately, after which the other player can safely move to
the flag and win.

In both cases, you must PROVE the scores by generating the respective game tree, either
to a terminal position or prove that you can prune some branches.

c) Suggest heuristic evaluation functions for the total-flags scoring version of the game without bullets:
1) One that is a lower bound, but non-trivial.

ANS:
Total flags collected by the player until now.

Anything beyond that is not easy, e.g. can add to the above:
value of highest-valued flag that is closer to this agent than to
the other agent.

2) One that is an optimistic upper bound, but non-trivial

ANS:

Total flags collected by the player until now plus total flags not
yet collected by anybody.

Anything beyond that is not easy, e.g. can use: total flags already
collected plus value of all flags within reach within the number
of turns remaining in the game (using either true distance or
Manhattan distance).

5)  (Game-tree search - alpha-beta pruning)
a) Consider the tic-tac-toe position (with O to move):

X |   | O
-----------------
O |   |
-----------------
X |   |

Give an example of an ordering of the complete search tree starting from this position
where alpha-beta pruning saves as little as possible, and another where the savings
are maximal. How many nodes are pruned in each case?
You may denote moves by Xxy or Oxy in order to save space.
(Clarification: in the case with no pruning, no need to specify
the tree explicitly, only estimate the number of nodes).

ANS: assuming y axis 1 is at the bottom...

Tree with best pruning is (note also initial bounds of plus-minus 1 on
alpha and beta!), value of each node in parentheses, with "-"
indicating -1 and "+" indicating +1.

MAX     MIN     MAX     MIN     MAX

(+) O32 (+) X22 (+) O31  score: 1 (alpha=1)
(O21, O23 pruned)
(+) X31 (+) O22  score: 1 (alpha=1)
(O21, O23 pruned)
(+) X21 (+) O22  score: 1 (alpha=1)
(O21, O23 pruned)
(+) X23 (+) O22  score: 1 (alpha=1)
(O21, O23 pruned)
(O31, O21, O22, O23 pruned)

Final result: O can win by playing O32 initially, otherwise
can be forced to draw or even lose.
With bad ordering, no pruning, tree has a bit less than 5! leaves.
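The claim that O32 wins can be checked with a plain minimax over the full tree (small enough that no pruning is needed). A sketch; the coordinate convention (x = 1..3 left to right, y = 1..3 bottom to top) and all names are mine:

```python
# Full minimax from the quoted position; scores are from O's point of view.
CELLS = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]
LINES = ([[(x, y) for x in (1, 2, 3)] for y in (1, 2, 3)] +      # rows
         [[(x, y) for y in (1, 2, 3)] for x in (1, 2, 3)] +      # columns
         [[(1, 1), (2, 2), (3, 3)], [(1, 3), (2, 2), (3, 1)]])   # diagonals

def winner(board):
    for line in LINES:
        marks = {board.get(c) for c in line}
        if len(marks) == 1 and None not in marks:
            return marks.pop()
    return None

def minimax(board, to_move):
    w = winner(board)
    if w:
        return 1 if w == "O" else -1
    empties = [c for c in CELLS if c not in board]
    if not empties:
        return 0
    values = [minimax({**board, c: to_move}, "X" if to_move == "O" else "O")
              for c in empties]
    return max(values) if to_move == "O" else min(values)

start = {(1, 3): "X", (3, 3): "O", (1, 2): "O", (1, 1): "X"}  # O to move
results = {c: minimax({**start, c: "O"}, "X")
           for c in CELLS if c not in start}
print(results[(3, 2)])        # 1: O32 wins
print(max(results.values()))  # 1: and that is the best O can achieve
```

O32 wins because it creates a double threat: column 3 (needing O31) and row 2 (needing O22); X can block only one of them.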

b) Consider the same scenario as above, except that after every move by
X, a random mark (equally probable to be X or O, uniformly distributed
over the free locations) is generated. Draw and evaluate the
search tree, and state where pruning is possible.

(Since the above is too large, you may start from the following
position for b, with O to move, instead).

X | X | O
-----------------
O |   |
-----------------
X | O |

ANS:
In this case "uniformly distributed over the free locations" is trivial,
as at that stage there is only one location left in this scenario.
Still, this is 50-50 whether we get X or O.
Some pruning is still possible, as the score in the games is bounded
in absolute value by 1.

MAX      MIN     CHANCE

(0) O32 (0) X22 (+) O31  Score: 1
(-) X31  Score:-1
(0) X31 (+) O22  Score: 1
(-) X22  Score:-1
(0) O22 (0) X32 (0) O31  Score: 0
(0) X31  Score: 0
(>=0) X31 (+) O32  Score: 1
(can be pruned, X31 will not be chosen by X)
(0) O31 (0) X32 (0) O22  Score: 0
(0) X22  Score: 0
(>=0) X22 (+) O32  Score: 1
(can be pruned, X22 will not be chosen by X)

So it turns out that no matter what O does, X can force an
expected score of 0 (sometimes forcing a draw, and sometimes
forcing an equal probability of a win by X and a win by O).
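The expected value of 0 can be confirmed with a small expectiminimax over the three remaining squares (after every X move, a chance node fills a uniformly random free square with X or O at probability 1/2 each). A sketch under the same coordinate convention as before; all names are mine:

```python
# Expectiminimax for the reduced position; scores are from O's point of view.
CELLS = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]
LINES = ([[(x, y) for x in (1, 2, 3)] for y in (1, 2, 3)] +
         [[(x, y) for y in (1, 2, 3)] for x in (1, 2, 3)] +
         [[(1, 1), (2, 2), (3, 3)], [(1, 3), (2, 2), (3, 1)]])

def winner(board):
    for line in LINES:
        marks = {board.get(c) for c in line}
        if len(marks) == 1 and None not in marks:
            return marks.pop()
    return None

def value(board, stage):
    """Stages cycle O (max) -> X (min) -> chance -> O -> ..."""
    w = winner(board)
    if w:
        return 1 if w == "O" else -1
    empties = [c for c in CELLS if c not in board]
    if not empties:
        return 0
    if stage == "O":
        return max(value({**board, c: "O"}, "X") for c in empties)
    if stage == "X":
        return min(value({**board, c: "X"}, "chance") for c in empties)
    # chance node: uniform over free squares, X or O with probability 1/2
    outcomes = [value({**board, c: m}, "O") for c in empties for m in "XO"]
    return sum(outcomes) / len(outcomes)

start = {(1, 3): "X", (2, 3): "X", (3, 3): "O",
         (1, 2): "O", (2, 1): "O", (1, 1): "X"}  # O to move
print(value(start, "O"))  # 0.0: X holds O to an expected score of 0
```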

6)  (agents and environment simulators)
For each of the following domains, which type of agent is appropriate
(table lookup, reflex, goal based, or utility based). Also, indicate the
type of environment:
a) An agent that plays billiards on a real table.
b) An agent that plays go.
c) An agent that can play peg-and-hole solitaire.
d) An autonomous robot that can win the DARPA challenge (driving in
a city).
e) An internet shopping agent (car rental domain).
f) An agent that plays tic-tac-toe (3X3).

ANS:

a) Utility based, if we need a player that not only plans
for getting specific balls into holes, but also plans what remains after,
and the risks when missing, etc.

b) Utility based, or possibly goal based (but then we don't know exactly which
goals at present).

c) Goal-based should work, but reflex would also work if we pre-computed
everything, and even table-driven is close to being possible today
(using a pretty huge memory).

d) Utility based agent, at least (also probably with learning capabilities).

e) Utility-based agent.

f) Goal-driven would work, but in this case a simple table-driven reflex agent would also
work, provided pre-computation of the "reflex" table (size 3^9) is done.