Introduction to Artificial Intelligence

Homework assignment - simulators, agents, search, games, logic, inference

1) (FOL and situation calculus)
We need to represent and reason within the scenario defined by assignment 1.
For simplicity, we will ignore the issue of action costs, assume only one agent, and allow
only the drive and pickup actions.

a) Write down the axioms for the behavior of the world using FOL and
situation calculus.

Use the predicates:
Edge(e, v1, v2) ; e is an edge in the graph between v1 and v2.
Goal(ag, v)     ; Agent ag's goal is vertex v.
Also use the fluents (situation-dependent predicates)
Loc(v,s)        ; Agent is at vertex v in situation s
Fire(e,s)       ; Edge e is on fire in situation s
Carrying(s)     ; Agent is carrying water in situation s
Water(v,s)      ; Vertex v has water in situation s

Constants: as needed, for the edges and vertices, and of course the pickup action
pickup          ; Denotes attempting to pick up water

Functions: not needed here, except to denote actions:
drive(e)        ; Denotes a move action across edge e (may be ineffective).
and of course the "Result" function:
Result(a,s)    ; Denotes the situation resulting from doing action a in
; situation s.
You may wish to add auxiliary predicates or fluents in order to simplify the axioms. You
are allowed to do so provided you explain why they are needed.

Add the facts representing the following scenario in situation S0:

Edge(E1, V0, V1)
Edge(E2, V1, V2)
Edge(E3, V3, V1)
Edge(E4, V0, V3)
Edge(E5, V3, V2)

; Situation S0 fluents
Loc(V0, S0)
Water(V0, S0)
Fire(E1, S0)
Fire(E5, S0)

ANSWER: Need to encode the successor-state axioms for the actions in the domain.

Defining an auxiliary predicate "AtEdge(e,s)" for conciseness:
0.    forall s,e  AtEdge(e,s) <=> exists v,u Loc(v,s) and (Edge(e,u,v) or Edge(e,v,u))

In order to avoid using equalities (which will be hard to handle later on), we will split them by
action type (since there are only 2 action types). Starting with "Carrying": first,
water is carried after a drive iff it was carried before and there was no successful drive across an
edge with fire:
1.    forall s,e  Carrying(Result(drive(e),s)) <=> Carrying(s) and (AtEdge(e,s) => not Fire(e,s))

Carrying after pickup iff carrying before or at vertex with water:
2.    forall s,v  Carrying(Result(pickup,s))  <=> Carrying(s) or (Loc(v,s) and Water(v,s))

Now the "Water", drive does not change this predicate:
3.    forall s,e,v  Water(v,Result(drive(e),s)) <=> Water(v,s)

Water at vertex after pickup iff it was there before and agent is not at this vertex:
4.    forall s,v  Water(v,Result(pickup,s)) <=> Water(v,s) and not Loc(v,s)

Now fire is always the same as before after pickup:
5.    forall s,e  Fire(e,Result(pickup,s)) <=> Fire(e,s)

And if a successful drive occurs on an edge, the fire on it is extinguished; otherwise it is unchanged:
6.    forall s,e  not Fire(e,Result(drive(e),s)) <=> Successful(drive(e),s) or not Fire(e,s)

Driving across some other edge never changes the Fire predicate (must use equality here, unfortunately):
7.    forall s,e,e1   not e=e1 =>  (Fire(e1,s) <=> Fire(e1,Result(drive(e),s)))

Definition of the "Successful" predicate (also needed for Loc predicate soon):
8.    forall s,e  Successful(drive(e),s)  <=> (AtEdge(e,s) and (Fire(e,s) => Carrying(s)))

Loc predicate: location does not change on pickup:
9.   forall s,v  Loc(v, Result(pickup,s)) <=>  Loc(v,s)

Also no change if drive not successful. Note that this is not an iff, as one can reach v from
elsewhere on a successful drive.
10.   forall s,v,e  Loc(v,s) and not Successful(drive(e),s) => Loc(v,Result(drive(e),s))

What happens on a successful drive:
11.   forall s,v,u,e  Loc(u,s) and Successful(drive(e),s) and (Edge(e,u,v) or Edge(e,v,u)) => Loc(v,Result(drive(e),s))

Finally, at what locations we are NOT. Simplest implementation by saying "cannot
be in 2 locations in same situation":
12.   forall s,u,v   Loc(u,s) and Loc(v,s)  =>  u=v

Also need to axiomatize equality, since we are using "simple" resolution.
Equality is reflexive:
13.   forall x  x=x

State all things NOT equal, such as:
not V1=V2
not E1=E3
etc. We will not number them, and just call them the "inequality axioms".

Also need to encode S0 and the graph, which are listed above PLUS the negation
of all things not specified, such as "not Fire(E2, S0)" and "not Edge(E1, V3, V4)",
but we will not use these negative facts in our proof.
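As a sanity check on the axioms, the world dynamics they encode can be simulated directly on concrete states. Below is a minimal Python sketch (not part of the required FOL answer); the state representation and helper names are our own choices:

```python
# Minimal simulation of the successor-state axioms (1-12) on concrete states.
# A state is a dict: agent location, carrying flag, set of burning edges,
# and set of vertices that still have water.

EDGES = {'E1': ('V0', 'V1'), 'E2': ('V1', 'V2'), 'E3': ('V3', 'V1'),
         'E4': ('V0', 'V3'), 'E5': ('V3', 'V2')}

S0 = {'loc': 'V0', 'carrying': False,
      'fires': {'E1', 'E5'}, 'water': {'V0'}}

def result(action, state):
    """Result(a, s): apply the successor-state axioms to a concrete state."""
    s = {'loc': state['loc'], 'carrying': state['carrying'],
         'fires': set(state['fires']), 'water': set(state['water'])}
    if action == 'pickup':
        # Axioms 2, 4: carrying iff carrying before or at a vertex with water,
        # which then loses its water; axioms 5, 9: fires and location unchanged.
        if s['loc'] in s['water']:
            s['water'].discard(s['loc'])
            s['carrying'] = True
        return s
    e = action[1]                         # action = ('drive', e)
    v1, v2 = EDGES[e]
    at_edge = s['loc'] in (v1, v2)
    # Axiom 8: a drive succeeds iff at the edge and (fire => carrying).
    successful = at_edge and (e not in s['fires'] or s['carrying'])
    if successful:
        if e in s['fires']:               # Axioms 1, 6: crossing a burning edge
            s['fires'].discard(e)         # extinguishes it and uses up the water.
            s['carrying'] = False
        s['loc'] = v2 if s['loc'] == v1 else v1   # Axiom 11
    return s                              # Axiom 10: otherwise nothing changes

s = result(('drive', 'E1'), result('pickup', S0))
print(s['loc'])   # -> V1: the plan [pickup, drive(E1)] reaches V1
```

Running it confirms the plan proved in part b: after pickup and drive(E1) the agent is at V1, the fire on E1 is out, and the water has been used up.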

b)  Now, we need to find a sequence of actions that will result in reaching the goal
of agent A1 being at V1,
and prove a theorem that it will do so. Do the following steps:

A) Convert the knowledge base to conjunctive normal form.

ANSWER: Some of the axioms will separate into a LOT of parts! We first separate the <=>
as in some cases this will cause different quantifiers. Starting with axiom 0:
0a.   forall s,e  AtEdge(e,s) => exists v,u Loc(v,s) and (Edge(e,u,v) or Edge(e,v,u))
0b.   forall s,e  (exists v,u Loc(v,s) and (Edge(e,u,v) or Edge(e,v,u))) => AtEdge(e,s)

Now 0a has (after removing => and moving the quantifiers) two existentials inside
the scope of 2 universals, so we need 2 Skolem functions ("u-of" and "v-of"), resulting in:
0a.   forall s,e  not AtEdge(e,s) or (Loc(v-of(s,e),s) and (Edge(e,u-of(s,e),v-of(s,e)) or Edge(e,v-of(s,e),u-of(s,e))))

Removing universal quantifiers and applying distribution, we get:
0a1.   not AtEdge(e,s) or Loc(v-of(s,e),s)
0a2.   not AtEdge(e,s) or Edge(e,u-of(s,e),v-of(s,e)) or Edge(e,v-of(s,e),u-of(s,e))

Now 0b has (after removing => and moving the quantifiers) only universal quantifiers:
0b.   forall s,e,v,u  (not Loc(v,s) or (not Edge(e,u,v) and not Edge(e,v,u))) or AtEdge(e,s)

Removing universals and applying distribution, we get:
0b1.     not Loc(v,s) or not Edge(e,u,v) or AtEdge(e,s)
0b2.     not Loc(v,s) or not Edge(e,v,u) or AtEdge(e,s)

Many of the above will not be used in the proof, for example we will probably not
need the "not AtEdge()" axioms 0a1 and 0a2. Now that we understand the principle,
these unneeded axioms will not appear in the solution.

Axiom 1 also separates into:
1a.    forall s,e  Carrying(Result(drive(e),s)) => Carrying(s) and (AtEdge(e,s) => not Fire(e,s))
1b.    forall s,e  (Carrying(s) and (AtEdge(e,s) => not Fire(e,s))) => Carrying(Result(drive(e),s))
Neither will be used in the proof, so omitted from this example solution.

Axiom 2 separates into:
2a.    forall s,v  Carrying(Result(pickup,s))  => Carrying(s) or (Loc(v,s) and Water(v,s))
2b.    forall s,v  (Carrying(s) or (Loc(v,s) and Water(v,s))) => Carrying(Result(pickup,s))
As we will not need 2a, only doing 2b below, which becomes:
forall s,v  (not Carrying(s) and (not Loc(v,s) or not Water(v,s))) or Carrying(Result(pickup,s))
Removing quantifiers and distributing, we get:
2b1.   not Carrying(s) or Carrying(Result(pickup,s))
2b2.   not Loc(v,s) or not Water(v,s) or Carrying(Result(pickup,s))

Likewise, Axiom 3 separates into (simpler, only 2 CNF clauses):
3a.    not Water(v,Result(drive(e),s)) or Water(v,s)
3b.    Water(v,Result(drive(e),s)) or not Water(v,s)

Axiom 4 separates into:
4a.    forall s,v  Water(v,Result(pickup,s)) => Water(v,s) and not Loc(v,s)
4b.    forall s,v  (Water(v,s) and not Loc(v,s)) => Water(v,Result(pickup,s))

In which 4a after processing distributes into:
4a1.   not Water(v,Result(pickup,s)) or Water(v,s)
4a2.   not Water(v,Result(pickup,s)) or not Loc(v,s)

And 4b becomes:
4b.    not Water(v,s) or Loc(v,s) or Water(v,Result(pickup,s))

Axiom 5 is also simple:
5a.    not Fire(e,Result(pickup,s)) or Fire(e,s)
5b.    Fire(e,Result(pickup,s)) or not Fire(e,s)

Axiom 6 becomes:
6a.    Fire(e,Result(drive(e),s)) or Successful(drive(e),s) or not Fire(e,s)
6b1.   not Fire(e,Result(drive(e),s)) or not Successful(drive(e),s)
6b2.   not Fire(e,Result(drive(e),s)) or Fire(e,s)

Axiom 7 separates into:
7a.   e=e1 or  not Fire(e1,s) or Fire(e1,Result(drive(e),s))
7b.   e=e1 or  Fire(e1,s) or not Fire(e1,Result(drive(e),s))

Axiom 8 also initially becomes:
8a.   forall s,e  Successful(drive(e),s) => (AtEdge(e,s) and (Fire(e,s) => Carrying(s)))
8b.   (AtEdge(e,s) and (Fire(e,s) => Carrying(s))) => Successful(drive(e),s)

With 8a separating into:
8a1.  not Successful(drive(e),s) or AtEdge(e,s)
8a2.  not Successful(drive(e),s) or not Fire(e,s) or Carrying(s)

And 8b becomes:
8b1.  not AtEdge(e,s) or Fire(e,s) or Successful(drive(e),s)
8b2.  not AtEdge(e,s) or not Carrying(s) or Successful(drive(e),s)

Axiom 9 is also simple:
9a.   not Loc(v, Result(pickup,s)) or  Loc(v,s)
9b.   Loc(v, Result(pickup,s)) or not  Loc(v,s)

As is axiom 10:
10.   not Loc(v,s) or Successful(drive(e),s) or Loc(v,Result(drive(e),s))

Axiom 11 is simple, but becomes 2 axioms due to distribution:
11a.  not Loc(u,s) or not Successful(drive(e),s) or not Edge(e,u,v) or Loc(v,Result(drive(e),s))
11b.  not Loc(u,s) or not Successful(drive(e),s) or not Edge(e,v,u) or Loc(v,Result(drive(e),s))

Axiom 12, simply eliminate =>, move negation, and remove quantifiers, to get:
12.   not Loc(u,s) or not Loc(v,s) or u=v

Axiom 13, simply remove quantifier:
13.    x=x

All the information describing S0, as well as the inequality axioms are already in
CNF, so no change. Note that the variables appearing in the different axioms should
all be treated as different variables, and we will "standardize them apart" whenever needed
during the proof.
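The mechanical part of the conversion (eliminating =>, pushing negations inward, and distributing or over and) can be automated for the propositional skeleton of an axiom, once the <=> has been split. A small Python sketch, using a nested-tuple representation of formulas that is our own choice:

```python
from itertools import product

# CNF conversion for propositional skeletons: a formula is an atom (string)
# or a tuple ('not', f), ('imp', f, g), ('and', f, ...), ('or', f, ...).

def nnf(f, neg=False):
    """Eliminate => and push negations down to the atoms."""
    if isinstance(f, str):
        return ('not', f) if neg else f
    op = f[0]
    if op == 'not':
        return nnf(f[1], not neg)
    if op == 'imp':                      # f1 => f2  is  (not f1) or f2
        return nnf(('or', ('not', f[1]), f[2]), neg)
    dual = {'and': 'or', 'or': 'and'}
    return ((dual[op] if neg else op),) + tuple(nnf(g, neg) for g in f[1:])

def clauses(f):
    """Distribute or over and; a clause is a frozenset of literal strings."""
    if isinstance(f, str):
        return [frozenset([f])]
    if f[0] == 'not':
        return [frozenset(['~' + f[1]])]
    if f[0] == 'and':
        return [c for g in f[1:] for c in clauses(g)]
    # 'or': take one clause from each disjunct and union them
    return [frozenset().union(*combo)
            for combo in product(*(clauses(g) for g in f[1:]))]

# Skeleton of axiom 2b: (Carrying or (Loc and Water)) => Carrying'
axiom_2b = ('imp', ('or', 'Carrying', ('and', 'Loc', 'Water')), "Carrying'")
for c in clauses(nnf(axiom_2b)):
    print(sorted(c))   # reproduces the shapes of clauses 2b1 and 2b2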

B) Formally list what we need to prove, and state its negation.

G: Goal(A1, V1).

One way to reach V1 is to drive(E1). But since it is on fire,
and there is water at V0, we need to pickup first. Another way is drive(E4) followed by drive(E3).
We show the solution for the first plan. So we need to prove:

Q. exists v  Goal(A1, v) and Loc(v, Result(drive(E1), Result(pickup, S0)))

Negating it, we have:
not exists v  Goal(A1, v) and Loc(v, Result(drive(E1), Result(pickup, S0)))
and converting the result into CNF we get:
Q'.   not Goal(A1, v) or not Loc(v, Result(drive(E1), Result(pickup, S0)))

C) Use resolution to reach a contradiction, thus proving the theorem.

ANSWER: We begin by simplifying Q', getting rid of the first literal.

Resolving Q' with G, unifier {v|V1} we get:

Q'':   not Loc(V1, Result(drive(E1), Result(pickup, S0)))

Resolving Q'' with 11a.4, unifier {v|V1, e|E1, s|Result(pickup, S0)} we get:
20.    not Loc(u,Result(pickup, S0)) or not Successful(drive(E1),Result(pickup, S0)) or not Edge(E1,u,V1)

Resolving 20.3 with "Edge(E1, V0, V1)" (graph description fact), unifier {u|V0} we get:
21.    not Loc(V0,Result(pickup, S0)) or not Successful(drive(E1),Result(pickup, S0))

Resolving 21.1 with 9b.1, unifier {v|V0, s|S0} we get:
22.    not Loc(V0,S0) or not Successful(drive(E1),Result(pickup, S0))

Resolving 22.1 with situation S0 fact "Loc(V0,S0)" we get:
23.    not Successful(drive(E1),Result(pickup, S0))

Resolving 23 with 8b2.3, unifier {e|E1, s|Result(pickup, S0)} we get:
24.    not AtEdge(E1,Result(pickup, S0)) or not Carrying(Result(pickup, S0))

Resolving 24.2 with 2b2.3, unifier {s|S0} we get:
25.    not AtEdge(E1,Result(pickup, S0)) or not Loc(v,S0) or not Water(v,S0)

Resolving 25.2 with S0 situation description fact "Loc(V0,S0)", unifier {v|V0} we get:
26.    not AtEdge(E1,Result(pickup, S0)) or not Water(V0,S0)

Resolving 26.2 with S0 situation description fact "Water(V0,S0)", empty unifier, we get:
27.    not AtEdge(E1,Result(pickup, S0))

Resolving 27 with 0b2.3, unifier {e|E1,s|Result(pickup, S0)} we get:
28.    not Loc(v,Result(pickup, S0)) or not Edge(E1,v,u)

Resolving 28.2 with graph description fact "Edge(E1,V0,V1)", unifier {v|V0, u|V1} we get:
29.    not Loc(V0,Result(pickup, S0))

Resolving 29 with 9b.1, unifier {v|V0, s|S0} we get:
30.    not Loc(V0, S0)

Finally, resolving 30 with S0 situation description fact "Loc(V0,S0)"
we get the empty clause, a contradiction, thus completing the proof.
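The unifiers quoted in the steps above can be computed mechanically. Here is a minimal sketch of most-general unification, using our own convention that terms are nested tuples and variables are strings marked with a leading '?':

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def occurs(v, t, theta):
    """Occurs check: does variable v appear inside term t (under theta)?"""
    if v == t:
        return True
    if is_var(t) and t in theta:
        return occurs(v, theta[t], theta)
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t)

def unify(x, y, theta=None):
    """Return a most general unifier of terms x and y, or None if none exists."""
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if is_var(x):
        if x in theta:
            return unify(theta[x], y, theta)
        if is_var(y) and y in theta:
            return unify(x, theta[y], theta)
        if occurs(x, y, theta):
            return None
        return {**theta, x: y}
    if is_var(y):
        return unify(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# The first resolution step: unify the Loc literal of 11a with Q''.
lit_11a = ('Loc', '?v', ('Result', ('drive', '?e'), '?s'))
lit_q = ('Loc', 'V1', ('Result', ('drive', 'E1'), ('Result', 'pickup', 'S0')))
print(unify(lit_11a, lit_q))   # the unifier {v|V1, e|E1, s|Result(pickup, S0)}
```

This reproduces the substitution used when resolving Q'' with 11a.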

c) What would be a potential problem in the proof if we did not
have "frame axioms" in our representation (i.e. stating what
did not change)?

ANSWER: We need to state explicitly that the location does not change as the result of pickup (axiom 9).
Otherwise, there would be no way to prove that the drive(E1) would work in the resulting situation.

d) Would we be able to do the proof using only forward chaining?

It can work if our proof (or some other proof) used only Horn clauses, so this needs to be checked.
In essence, the proof would work in a reverse order from what we have above. Let us check all the steps:

We can use 9b (in Horn form) with "Loc(V0,S0)" to get
F1.     Loc(V0, Result(pickup,S0))

Then use F1 and "Edge(E1,V0,V1)" with 0b2 (in Horn form) to get:
F2.      AtEdge(E1,Result(pickup, S0))

Also from "Loc(V0,S0)" and "Water(V0,S0)" and 2b2 (in Horn form), we get:
F3.      Carrying(Result(pickup,S0))

Now from F2, F3 with 8b2 (in Horn form) we get:
F4.      Successful(drive(E1),Result(pickup, S0))

And now from F1, F4, and "Edge(E1,V0,V1)" with 11a (in Horn form) we get:
F5.      Loc(V1,Result(drive(E1),Result(pickup, S0)))
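This forward-chaining derivation can be checked with a simple fixpoint loop over the ground instances of the Horn rules used above (instantiated with the constants of the proof). A Python sketch, with our own tuple encoding of facts and situation terms:

```python
# Ground forward chaining over the Horn-clause instances used above.
S0 = 'S0'
S1 = ('Result', 'pickup', S0)             # situation after pickup
S2 = ('Result', ('drive', 'E1'), S1)      # situation after drive(E1)

facts = {('Loc', 'V0', S0), ('Water', 'V0', S0), ('Edge', 'E1', 'V0', 'V1')}

# (premises, conclusion) pairs: ground instances of 9b, 0b2, 2b2, 8b2, 11a.
rules = [
    ({('Loc', 'V0', S0)}, ('Loc', 'V0', S1)),                        # 9b
    ({('Loc', 'V0', S1), ('Edge', 'E1', 'V0', 'V1')},
     ('AtEdge', 'E1', S1)),                                          # 0b2
    ({('Loc', 'V0', S0), ('Water', 'V0', S0)}, ('Carrying', S1)),    # 2b2
    ({('AtEdge', 'E1', S1), ('Carrying', S1)},
     ('Successful', ('drive', 'E1'), S1)),                           # 8b2
    ({('Loc', 'V0', S1), ('Successful', ('drive', 'E1'), S1),
      ('Edge', 'E1', 'V0', 'V1')}, ('Loc', 'V1', S2)),               # 11a
]

changed = True
while changed:                    # repeat until no rule adds a new fact
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(('Loc', 'V1', S2) in facts)   # True: the goal fact F5 is derived
```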

2)  (agents and environment simulators)
For each of the following domains, which type of agent is appropriate
(table lookup, reflex, goal based, or utility based). Also, indicate the
type of environment:
a) An agent that plays Poker.
b) An agent that can play Spider solitaire.
c) An autonomous humanoid robot that can win the DARPA robotics challenge (driving a car,
walking across debris, connecting two pipes). Scoring depends on success in the tasks,
on execution speed, and on using as little communication with the operator as possible.
d) An internet coordination agent (sets meetings between humans with conflicting goals).
e) An agent that can solve a Sudoku puzzle.
f) An agent that plays Go.

a) Poker: environment is stochastic (dealing cards), inaccessible, discrete (but money is "almost" continuous),
static, non-episodic (can learn between deals), and multi-agent. Needs a utility based agent (with
learning). The "stochastic" part can be cast into "inaccessibility" (cards have been dealt, but unknown)
making it "deterministic", but this does not help.

b) Spider solitaire: environment is deterministic (cards already dealt), inaccessible, discrete, static,
episodic, and single-agent. Goal-based agent may suffice (maximize probability of reaching goal).

c) DRC: environment is stochastic, inaccessible, continuous, dynamic, non-episodic, and multi agent (operator and
robot). Needs a utility based agent (with learning).

d) Internet coordination: environment is stochastic, inaccessible, discrete (almost), dynamic, non-episodic, and multi agent.
Needs a utility-based agent due to possible conflicts between agents.

e) Sudoku: environment is deterministic, accessible, static, episodic, and single agent. A goal-driven agent suffices.
For "newspaper Sudoku puzzles", supposedly no backtracking is needed, in which case rule-based suffices.

f) Go: environment is deterministic, accessible, static, episodic, and multi agent. (Possibly non-episodic for
repeated games if we allow learning of opponent strategy). Needs a goal-based or utility-based agent,
although it is only recently that search based agents do significantly better than rule-based.

3) (Propositional logic)
For each of the following sentences, determine whether they are
satisfiable, unsatisfiable, and/or valid. For each sentence that
is satisfiable, determine the number of models (over the propositional
variables in the sentence). In case b, also trace the run of the DPLL algorithm for satisfiability with this
formula as input (i.e. explain any recursive calls and cases encountered).
a) (A and (A -> B) and (B -> C)) -> C
b) (A -> not A) and (not B -> B)
c) (A or B or C or D or E or F or G)
d) (A and B) or (C or not D or E)
e) (A and (A -> B) and (B -> C)) -> not C
f) not ((A and (A -> B) and (B -> C)) -> C)

a) Valid, so 8 models, since there are 3 variables.

b) Satisfiable, one model (A=false, B=true). DPLL trace: in CNF the formula is the two unit
clauses {not A} and {B}. Unit propagation assigns A=false and then B=true, after which no
clauses remain, so DPLL returns "satisfiable" without any branching (no recursive calls needed).

c) Satisfiable 127 models (seven variables, and only one assignment is not satisfying, so 2^7-1)

d) Satisfiable. The first part (A and B) has 8 models (over the 5 variables), and the second
(C or not D or E) has 7*4 = 28 (it fails only for C=false, D=true, E=false), but these have an
overlap of 7, so we have in total 8+28-7 = 29 models.

e) Satisfiable. It holds whenever not C (4 models), and also whenever the antecedent is false,
which is the case except when A, B, C are all true, giving 7 models. But all the models of not C
are among these, so the total is 7 models.

f) Unsatisfiable, since it is a negation of a valid formula.
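The model counts above can be verified by brute-force enumeration over all truth assignments. A Python sketch (implication encoded via not/or):

```python
from itertools import product

def count_models(variables, formula):
    """Count the assignments (as dicts) that satisfy the formula."""
    return sum(formula(dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))

imp = lambda p, q: (not p) or q   # material implication

# a) (A and (A -> B) and (B -> C)) -> C
a = count_models('ABC', lambda m: imp(m['A'] and imp(m['A'], m['B'])
                                      and imp(m['B'], m['C']), m['C']))
# b) (A -> not A) and (not B -> B)
b = count_models('AB', lambda m: imp(m['A'], not m['A'])
                                 and imp(not m['B'], m['B']))
# c) (A or B or C or D or E or F or G)
c = count_models('ABCDEFG', lambda m: any(m.values()))
# d) (A and B) or (C or not D or E)
d = count_models('ABCDE', lambda m: (m['A'] and m['B'])
                                    or m['C'] or (not m['D']) or m['E'])
# e) (A and (A -> B) and (B -> C)) -> not C
e = count_models('ABC', lambda m: imp(m['A'] and imp(m['A'], m['B'])
                                      and imp(m['B'], m['C']), not m['C']))
print(a, b, c, d, e)   # 8 1 127 29 7
```

Case f is the negation of the valid formula a, so it has 8 - 8 = 0 models.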

4) (Search):
Show the run of the following algorithms on the setting of question 1
above, where the agent starting at V0 needs to reach V2 with minimal cost.
The costs are: Cpickup=2, and edge weights are: E5 costs 1,
E3 costs 5, E1 and E2 and E4 cost 2.
Assume that h(n) is the cost to reach V2 assuming no fires.

a) Greedy search (f(n) = h(n)), breaking ties in favor of states where the
agent is not carrying water, and then in favor of lower-numbered edges.

We will represent a state by the following tuple of predicate values: (Loc, Carrying, Fire(E1), Fire(E5), Water(V0)),
since these are the only things that can change.
We will assume that the algorithm checks and discards duplicate states as they are generated.
Let us also assume we prefer lower numbered VERTICES.

Initialize with initial state S0: (V0, F, T, T, T)

Pass 1, get S0, possible actions:  drive(E4) (cost 2), pickup (cost 2), so agenda is now:
S1 = (V3, F, T, T, T), g=2, h=1
S2 = (V0, T, T, T, F), g=2, h=3

Pass 2, pick S1, possible actions: drive(E4) (cost 2), drive(E3) (cost 5), but the first one
results in an already visited state and is discarded so agenda is now
(because we have a tie, and are in favor of NOT carrying water):
S3 = (V1, F, T, T, T), g=7, h=2
S2 = (V0, T, T, T, F), g=2, h=3

Pass 3, pick S3,  now possible actions drive(E2) (cost 2)
and drive(E3) (duplicate, so dropped) so we get:
S4 = (V2, F, T, T, T), g=9, h=0
S2 = (V0, T, T, T, F), g=2, h=3

Pass 4, pick S4, which is a goal, so we are done, having discovered a SUBOPTIMAL solution.
Note how this is FAST, but suboptimal.

b) A* search, (f(n) = g(n)+h(n)), breaking ties as above.

ANSWER: Same representation as above. Initialize with initial state S0: (V0, F, T, T, T)

Pass 1, get S0, possible actions:  drive(E4) (cost 2), pickup (cost 2), so agenda is now:
S1 = (V3, F, T, T, T), g=2, h=1
S2 = (V0, T, T, T, F), g=2, h=3

Pass 2, pick S1, possible actions: drive(E4) (cost 2), drive(E3) (cost 5), but the first one
results in an already visited state and is discarded so agenda is now
(because we have a tie, and are in favor of NOT carrying water):
S2 = (V0, T, T, T, F), g=2, h=3
S3 = (V1, F, T, T, T), g=7, h=2

Pass 3, pick S2,  now possible actions are drive(E1) and drive(E4), both cost 4, so we get:
S4 = (V3, T, T, T, F), g=6, h=1
S5 = (V1, F, F, T, F), g=6, h=2   (fire in E1 is extinguished)
S3 = (V1, F, T, T, T), g=7, h=2

Pass 4, pick S4, possible actions are drive(E4) dropped (duplicate state results), drive(E3) costs 10,
drive(E5) costs 2, and we get:
S5 = (V1, F, F, T, F), g=6, h=2   (tie breaking in favor of lower-numbered vertex)
S6 = (V2, F, T, F, F), g=8, h=0   (fire in E5 is extinguished)
S3 = (V1, F, T, T, T), g=7, h=2
S7 = (V1, T, T, T, F), g=16, h=2

Pass 5, pick S5, possible actions drive on E1 (cost 2), E2 (cost 2), E3 (cost 5), so we get:
S6 = (V2, F, T, F, F), g=8, h=0
S8 = (V2, F, F, T, F), g=8, h=0
S3 = (V1, F, T, T, T), g=7, h=2
S9 = (V0, F, F, T, F), g=8, h=3   (not a duplicate state, since here no fire at E1)
S10= (V3, F, F, T, F), g=11, h=1
S7 = (V1, T, T, T, F), g=16, h=2

Pass 6, pick S6 or S8, each of which is a goal state, so we are done.
(There are 2 optimal solutions here).
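The A* result can be reproduced in code. The following is a sketch under the cost model implied by the traces above (our reading, not stated explicitly in the question): carrying water doubles an edge's traversal cost, a drive across a burning edge is only possible while carrying (and extinguishes the fire, using up the water), and h is the fire-free shortest distance to V2. All names are ours:

```python
import heapq

# edge -> (endpoint, endpoint, weight)
EDGES = {'E1': ('V0', 'V1', 2), 'E2': ('V1', 'V2', 2),
         'E3': ('V3', 'V1', 5), 'E4': ('V0', 'V3', 2), 'E5': ('V3', 'V2', 1)}
H = {'V0': 3, 'V1': 2, 'V2': 0, 'V3': 1}   # shortest distance to V2, no fires
PICKUP_COST = 2

def successors(state):
    loc, carrying, fires, water = state
    if loc in water and not carrying:      # pickup action
        yield PICKUP_COST, (loc, True, fires, water - {loc})
    for e, (v1, v2, w) in EDGES.items():
        if loc not in (v1, v2):
            continue
        if e in fires and not carrying:    # drive would be ineffective; skip
            continue
        cost = 2 * w if carrying else w    # carrying doubles the edge cost
        burning = e in fires
        yield cost, (v2 if loc == v1 else v1,
                     carrying and not burning,          # water used on fire
                     fires - {e} if burning else fires, # fire extinguished
                     water)

def astar(start):
    frontier = [(H[start[0]], 0, 0, start)]   # (f, g, tiebreak, state)
    seen, tick = set(), 0
    while frontier:
        f, g, _, state = heapq.heappop(frontier)
        if state[0] == 'V2':
            return g                          # goal popped: optimal cost
        if state in seen:
            continue
        seen.add(state)
        for cost, nxt in successors(state):
            if nxt not in seen:
                tick += 1
                heapq.heappush(frontier,
                               (g + cost + H[nxt[0]], g + cost, tick, nxt))

start = ('V0', False, frozenset({'E1', 'E5'}), frozenset({'V0'}))
print(astar(start))   # 8: matches the optimal solutions found above
```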

c) Simplified RTA* with a limit of 2 expansions per real action, breaking ties as above.

ANSWER: Same representation as above. Initialize with initial state S0: (V0, F, T, T, T)

Pass 1, get S0, possible actions:  drive(E4) (cost 2), pickup (cost 2), so agenda is now:
S1 = (V3, F, T, T, T), g=2, h=1
S2 = (V0, T, T, T, F), g=2, h=3

Pass 2, pick S1, possible actions: drive(E4) (cost 2), drive(E3) (cost 5), but the first one
results in an already visited state and is discarded so agenda is now
(because we have a tie, and are in favor of NOT carrying water):
S2 = (V0, T, T, T, F), g=2, h=3
S3 = (V1, F, T, T, T), g=7, h=2

Pass 3, pick S2. Having done 2 expansions we now do NOT do any more in RTA*,
but take the first action leading to S2, which is "pickup".

So now we re-initialize with state S0' = (V0, T, T, T, F), resulting from "pickup", and
restart the search from there.

Picking S0', possible actions are drive(E1) and drive(E4), both cost 4, so we get:
S1' = (V3, T, T, T, F), g=4, h=1
S2' = (V1, F, F, T, F), g=4, h=2   (fire in E1 is extinguished)

Pass 1', pick S1', possible actions are drive(E4) (duplicate state, dropped), drive(E3) costs 10,
drive(E5) costs 2, and we get:
S2' = (V1, F, F, T, F), g=4, h=2
S3' = (V2, F, T, F, F), g=6, h=0    (fire in E5 is extinguished)
S4' = (V1, T, T, T, F), g=14, h=2

Now due to (unfortunate) tie breaking we pick S2', which is not a goal, and again must
return, after 2 expansions, the first action leading to S2', which is "drive(E1)".

We re-initialize and re-start the search from S0'' = (V1, F, F, T, F).
We have possible drive actions on E1 and E2 (cost 2), and E3 (cost 5), to get:
S1'' = (V2, F, F, T, F), g=2, h=0
S2'' = (V0, F, F, T, F), g=2, h=3
S3'' = (V3, F, F, T, F), g=5, h=1

Pass 1, we pick S1'' which is a goal, so we are done (action drive(E2)). The resulting path
happens to be one of the optimal solutions.

d) Repeat a-c but using h'(n) = 2*h(n) as the heuristic. Is h'(n) admissible?

a) Multiplying heuristic by 2 makes no difference for greedy search - same ordering of nodes (though h values are doubled).

b) For A*, we again initialize with initial state S0: (V0, F, T, T, T)

Pass 1, get S0, possible actions:  drive(E4) (cost 2), pickup (cost 2), so agenda is now:
S1 = (V3, F, T, T, T), g=2, h'=2
S2 = (V0, T, T, T, F), g=2, h'=6

Pass 2, pick S1, possible actions: drive(E4) (cost 2), drive(E3) (cost 5), but the first one
results in an already visited state and is discarded so agenda is now
(because we have a tie, and are in favor of NOT carrying water):
S2 = (V0, T, T, T, F), g=2, h'=6
S3 = (V1, F, T, T, T), g=7, h'=4

Pass 3, pick S2,  now possible actions are drive(E1) and drive(E4), both cost 4, so we get:
S4 = (V3, T, T, T, F), g=6, h'=2
S5 = (V1, F, F, T, F), g=6, h'=4   (fire in E1 is extinguished)
S3 = (V1, F, T, T, T), g=7, h'=4

Pass 4, pick S4, possible actions are drive(E4) dropped (duplicate state results), drive(E3) costs 10,
drive(E5) costs 2, and we get:
S6 = (V2, F, T, F, F), g=8, h'=0   (fire in E5 is extinguished)
S5 = (V1, F, F, T, F), g=6, h'=4
S3 = (V1, F, T, T, T), g=7, h'=4
S7 = (V1, T, T, T, F), g=16, h'=4

Pass 5, pick S6, which is a goal state, and we are done. This generates one of the optimal solutions,
even though h' is NOT admissible and therefore finding an optimal solution using h' is not guaranteed.

c) For RTA*: Initialize with initial state S0: (V0, F, T, T, T)

Pass 1, get S0, possible actions:  drive(E4) (cost 2), pickup (cost 2), so agenda is now:
S1 = (V3, F, T, T, T), g=2, h'=2
S2 = (V0, T, T, T, F), g=2, h'=6

Pass 2, pick S1, possible actions: drive(E4) (cost 2), drive(E3) (cost 5), but the first one
results in an already visited state and is discarded so agenda is now
(because we have a tie, and are in favor of NOT carrying water):
S2 = (V0, T, T, T, F), g=2, h'=6
S3 = (V1, F, T, T, T), g=7, h'=4

Pass 3, pick S2. Having done 2 expansions we now do NOT do any more in RTA*,
but take the first action leading to S2, which is "pickup".

So now we re-initialize with state S0' = (V0, T, T, T, F), resulting from "pickup", and
restart the search from there.

Picking S0', possible actions are drive(E1) and drive(E4), both cost 4, so we get:
S1' = (V3, T, T, T, F), g=4, h'=2
S2' = (V1, F, F, T, F), g=4, h'=4   (fire in E1 is extinguished)

Pass 1', pick S1', possible actions are drive(E4) (duplicate state, dropped), drive(E3) costs 10,
drive(E5) costs 2, and we get:
S3' = (V2, F, T, F, F), g=6, h'=0    (fire in E5 is extinguished)
S2' = (V1, F, F, T, F), g=4, h'=4
S4' = (V1, T, T, T, F), g=14, h'=4

Now we pick S3', which is a goal, so we are done resulting in one of the optimal solutions (not guaranteed in general).

In all these cases, the search was FASTER due to using h' instead of h, which
is frequently the case when using an inadmissible heuristic. This usually sacrifices optimality
of the result; here we were lucky to get optimal solutions despite that.

5) (Game trees and Search):
Consider a 3-person game (A, B, C) with complete information, where A and C are fully
cooperating: the score for A and C together is the
NEGATIVE of the score for B.
For example, if A scores 1, C also scores 1, and B scores -1.
However, C is a player that plays completely randomly with
a uniform distribution. Assume that players take turns in the order A, B, C,
repeatedly, until the game ends.

a) Write an algorithm for optimal play by player A.

Since C plays randomly, we can simply run EXPECTIMINIMAX, replacing the moves of C
in the tree by the appropriate expectation, so no new algorithm needs to be written.

b) The game now has partial information: A cannot communicate with C,
and C cannot see the moves made by A. Is the above algorithm
still optimal? In what cases, if any, can you run into problems?

C plays randomly, so any information you give it is redundant. A has all the information it needs, so no
problem. Now if B can also observe C's moves, the same algorithm as before is optimal. But if not,
it becomes possible for A to exploit this fact. How to do this is non-trivial.

c) Same as in case b, but now C can see A's moves, while A cannot see the moves made by C.
The same algorithm will not be optimal, and it is not easy to change it to make it optimal.

d) Show an example 3-ply game tree with branching factor 2, and show
the values computed in this tree in case a.

3 ply means just one move for each of the 3 players. Example tree (A picks move A1,
for an expected score of 5).

We will call moves by A: A1 and A2, etc.

         B (min)    C (avg)

A1 5 --- B1 5 ---- C1   0
     |        ---- C2  10
     --- B2 7 ---- C1  -1
              ---- C2  15

A2 3 --- B1 40 --- C1 -10
     |         --- C2  90
     --- B2 3 ---- C1  10
              ---- C2  -4
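The values in this tree can be computed with a direct expectiminimax recursion. A short Python sketch over the example tree, in our own nested-tuple encoding:

```python
def expectiminimax(node):
    """node: a number (leaf), ('max', children), ('min', children),
    or ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance node

uniform = lambda a, b: ('chance', [(0.5, a), (0.5, b)])
tree = ('max', [                        # A's move (A and C maximize)
    ('min', [uniform(0, 10),            # A1: B1 -> avg 5, B2 -> avg 7
             uniform(-1, 15)]),
    ('min', [uniform(-10, 90),          # A2: B1 -> avg 40, B2 -> avg 3
             uniform(10, -4)]),
])
print(expectiminimax(tree))   # 5.0: A picks A1, as stated above
```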

6)  (Game-tree search - alpha-beta pruning)
a) Give an example of a 3-ply game tree (branching factor 2)
where alpha-beta pruning saves NOTHING, and another where the savings
is maximal. How many nodes are pruned in each case?

ANSWER: using the same scheme as before to designate moves:

          MIN        MAX

A1 10 --- B1 20 ---- A1  10
      |         ---- A2  20
      --- B2 10 ---- A1   5
                ---- A2  10

A2 27 --- B1 35 ---- A1  30
      |         ---- A2  35
      --- B2 27 ---- A1  25
                ---- A2  27

In this case, performing the search in the order indicated by the ordering of lines,
nothing is saved. The value is determined only on the last branch in each
case, and nothing is pruned. At the end, A will pick A2 for a value of 27.
Reversing the ordering at each level, we get:

          MIN        MAX

A1 27 --- B1 27 ----- A1  27
      |         ----- A2  25
      --- B2 >=30 --- A1  30
                  --- A2  35  (but pruned, no need to check, as B will never pick B2)

A2 <=10 --- B1 10 --- A1  10
        |         --- A2   5
        --- B2 X ---- A1  20  (pruned, as well as everything beyond this point, since B is assured
                 ---- A2  10   a move that results in no more than 10, so A will never pick A2 anyway)

Here A can pick A1 and be assured a value of 27, pruning (i.e. not looking at)
A1B2A2, and also A2B2 and its children. Total pruning 3 nodes at the bottom level, but also one node at the next higher level.
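The pruning claims can be checked by instrumenting alpha-beta to record which leaves it actually evaluates. A Python sketch over the two orderings of the example tree, encoded as nested lists (our own encoding):

```python
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf'),
              visited=None):
    """Alpha-beta over nested lists; leaves are numbers. Appends every
    leaf actually evaluated to `visited`."""
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):
        visited.append(node)
        return node
    value = float('-inf') if maximizing else float('inf')
    for child in node:
        v = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:        # remaining siblings are pruned
            break
    return value

# MAX root (A), MIN level (B), MAX level (A), 8 leaves.
worst = [[[10, 20], [5, 10]], [[30, 35], [25, 27]]]   # first ordering shown
best = [[[27, 25], [30, 35]], [[10, 5], [20, 10]]]    # reversed ordering

for tree in (worst, best):
    visited = []
    value = alphabeta(tree, True, visited=visited)
    print(value, len(visited))   # 27 8, then 27 5 (3 leaves pruned)
```

Both orderings return the value 27, but the reversed ordering evaluates only 5 of the 8 leaves, matching the count above.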

b) Suppose that we know ahead of time that the terminal values can only
be integers between -2 and 2. Is there a case where alpha-beta can save even
more search than the best case above (show it, or prove that it cannot help).
Having a prior bound can save a lot. Example (X is "don't care"):
          MIN        MAX

A1 2 --- B1 2 ---- A1   2
     |        ---- A2   0   (pruned, A will pick A1 anyway at this level)
     --- B2 2 ---- A1   2
              ---- A2  -1   (but pruned, no need to check, as A will never pick A2 at this level)

A2 <=2 --- B1 X ---- A1   X  (entire rest of tree pruned, A will play A1 at top level anyway)
       |        ---- A2   X
       --- B2 X ---- A1   X
                ---- A2   X

c) Provide an example of a 2-ply + 2 chance node level game tree where
one can apply an adapted alpha-beta to prune nodes,
and a similar example where changing the distribution on the chance node
edges results in no savings for pruning.

ANSWER: assume that scores are limited by 10 in absolute value. (If there are no bounds, we cannot prune.)
We will let the chance nodes be biased.

CHANCE       MIN      CHANCE

A1 7.1 -- .9   9 ---- B1   9 -- .5  11
       |         |           -- .5   7
       |         ---- B2  10 -- .5  10
       |                     -- .5  10
       -- .1 -10 ---- B1 -10 -- .5 -10
                             -- .5 -10   (B2 pruned: -10 is already the lower bound)

A2 <=1 -- .9 <=0 ---- B1   0
       |         ---- B2   5   (pruned, as well as anything below; A will not pick A2 anyway)
       -- .1   X ---- B1   5
                 ---- B2   0

So A can pick A1 to get 7.1 expected value, no need to continue checking below A2.
Changing the branch distributions at the top-level chance nodes to go the other way, we get:

CHANCE       MIN      CHANCE

A1 -8.1 -- .1   9 ---- B1   9 -- .5  11
        |         |           -- .5   7
        |         ---- B2  10 -- .5  10
        |                     -- .5  10
        -- .9 -10 ---- B1 -10 -- .5 -10
                              -- .5 -10

A2 0 -- .1   0 ---- B1   0 -- .5   5
     |         |           -- .5  -5
     |         ---- B2   5 -- .5   0
     |                     -- .5  10
     -- .9   0 ---- B1   5 -- .5   0
               |           -- .5  10
               ---- B2   0 -- .5 -10
                           -- .5  10

In this case, A will pick A2 but cannot stop the search before
checking everything below A2.