Evolutionary Computation and Artificial Life
(202-1-5171), Semester A, 2005/6
Exercise 3 (Painted Desert)
David Gabay – 038483384
Ziv Ben-Eliahu – 033267766
You can view the classes specifically concerning this problem representation in:
Evolving a program for a collective operation of several instances (ants) whose task is to sort colored squares (grains) in a grid (desert). We implement the solution in Java.
·
Fitness = two types of fitness were used: one as in Koza's paper, and the other gives
the same fitness for the desert and adds to it the number of nodes of the tree
multiplied by a size penalty factor. We tried factors between 0.1 and 0.25.
·
Select = As in Koza's paper – tournament
selection with tournament size 7. We also tried to add the possibility of
selecting a random participant in the tournament with some probability (0.01 –
0.03).
·
Crossover = As in Koza's paper, crossover is
done by exchanging between two random sub-trees.
o
Crossover-probability = we tried the values 0.6 – 0.8.
·
Mutation = with some probability on each node, replace the sub-tree
on the node with a random tree of depth 0 – 3.
o
Mutation-probability = we tried the values 0 (no mutation) and 0.01 (per
node).
·
Population size = 500 – 600.
·
#Generations = 51
·
Initial population structure random trees, whose depth is in the
range 2 - 5. (For the first runs we tried 3-6). A full
tree is grown until the minimal depth is reached, and then (and until the
maximal depth) each node is set according to a growth probability (we
used 0.3-0.4). With that probability, the node is an internal root of a
sub-tree, and with prob. 1 - growth probability it is a terminal.
·
Time steps, maximal number of side effects, and annealing: Koza's paper set a
limit of 300 time steps and 1000 side effects operations per ant. To save
computation time, we didn't allow the first generations to perform 300 steps.
For each generation n, each program was allowed to perform n*m (we tried m=10
and m=15) steps, or at most 300 steps. (The maximum being achieved at the 20'th or 30'th generation).
·
Values of colors: For the values returned by functions on different
send grains colors, we tried the values 1-3 and 0-2 for the colored grains, -1
for colorless grains. (For all functions)
·
Initial desert: we usually tried to evaluate each generation on a different
Results with no annealing (i.e. up to 300 operations per program,
for all generations) population size 500 and no penalties on trees' sizes
The four following graphs show typical results, in experiments where no annealing was used, the trees initial depth was 3-6, and the variables we were changing were the values that functions return for different colors (1-3 or 0-2) and the existence of mutation.
Parameters: XO prob. 0.6, no mutation, Black-color value = 1




With an attempt to reduce computability time, we tried start with smaller trees and allow less operations on the first generations. (see Parameters section above).
Parameters for the following runs were XO prob. 0.6, mutation prob. 0.01, initial trees depth 2-5, and m (number of operation for generation 1) 10 and 15
m =



Introducing penalties on trees sizes:
In all the above runs, the best-of-runs trees included 100-200 nodes.
We next tried to produce smaller trees by adding to the fitness a penalty for tree size. Since we expect this to be harder task, and since we were using an annealing procedure that reduces computation time, we increased the population size to 600.
A few runs with penalty 0.2 (i.e. a tree of size n will have fitness penalty n*0.2) produced bad results – the best fitness (without the penalty) was about 110, and tress sizes were around 20 - 30. Setting the penalty to 0.1 gave us great results on the first attempt. The three best programs of the run are given below. All performed perfectly on sorting the grains and its representing trees included only 11, 13 and 16 nodes.
Program 1, size 11 (only survived for one generation):
(IFLTE
(X)
(IF-DROP
(MOVE-RANDOM)
(PICK-UP)
)
(GO-E)
(IFLTE
(CARRYING)
(X)
(GO-W)
(GO-N)
)
)
Program 2, size 13 (survived for a few generations):
(IFLTE
(IF-DROP
(GO-W)
(GO-N)
)
(IF-DROP
(MOVE-RANDOM)
(PICK-UP)
)
(GO-E)
(IFLTE
(CARRYING)
(X)
(GO-W)
(GO-N)
)
)
Program 3, size 16:
(IFLTE
(IF-DROP
(GO-W)
(GO-N)
)
(IF-DROP
(GO-N)
(PICK-UP)
)
(IFLTZ
(GO-E)
(GO-W)
(PICK-UP)
)
(IFLTE
(CARRYING)
(X)
(GO-W)
(GO-N)
)
)
Here is the graph of this run. (Best fitness is 101.1, composed of 100 for the desert value and a 1.1 penalty for the size)

Here is graph of a bad run, with the same parameters: (The fitness without the penalty was 106 and the penalty 4.4)

Five programs that were produced on a certain run are given below, along with the result of running each of them on the same random desert, after the evolution, allowing a fair comparison. The assigned fitness of each program is composed of their size penalty (0.1*size) and the value of the desert on which they were tested during evolution, 15*generation number time steps. Note that for one of the intermediate programs, generation 5, there is a considerable difference between the value of the desert on which the program was running during evolution, for 75 time steps, and the value of the desert on which the program was run after evolution, for 300 time steps.
Generation 0:
(IFLTE
(IFLTE
(X)
(MOVE-RANDOM)
(GO-N)
(PICK-UP)
)
(X)
(IFLTE
(GO-W)
(CARRYING)
(COLOR)
(MOVE-RANDOM)
)
(IFLTZ
(IF-DROP
(COLOR)
(PICK-UP)
)
(PICK-UP)
(MOVE-RANDOM)
)
)
size: 18 fitness: 175.8
Generation 0 Desert:
Desert Value 176
|_B||__||_S||__||__||__||__||__||__||__|
|__||_S||_S||__||__||4B||__||__||__||__|
|_B||__||_S||__||__||__||__||__||__||__|
|7G||_S||_S||_B||__||__||__||__||__||__|
|_B||_G||8B||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|6S||_G||0G||__||__||__||__||__||__||__|
|__||_G||__||__||__||__||2B||__||__||__|
|1B||_G||__||__||__||__||__||__||5B||__|
|_G||_G||_S||__||9S||3G||__||__||__||__|
Generation 5:
(IFLTE
(IFLTE
(X)
(COLOR)
(GO-N)
(PICK-UP)
)
(X)
(IFLTE
(GO-W)
(CARRYING)
(COLOR)
(MOVE-RANDOM)
)
(IFLTZ
(IF-DROP
(COLOR)
(PICK-UP)
)
(PICK-UP)
(MOVE-RANDOM)
)
)
size: 18 fitness: 109.8
Generation 5 Desert:
Desert Value 141
|_B||_G||_S||__||__||__||__||__||__||__|
|_G||_G||__||__||__||__||__||__||__||__|
|__||__||_S||__||__||__||__||__||__||__|
|_B||_S||_S||__||__||__||__||1B||__||__|
|_B||8B||_S||__||__||__||__||__||__||__|
|_B||4S||_S||7B||6S||__||__||__||__||__|
|2G||_G||5S||__||__||3G||__||__||__||__|
|_B||_G||__||__||__||__||__||__||__||__|
|_B||_G||__||9_||__||__||__||__||__||__|
|_B||_G||0S||_G||__||__||__||__||__||__|
Generation 11:
(IFLTE
(IFLTE
(X)
(PICK-UP)
(GO-N)
(PICK-UP)
)
(X)
(IFLTE
(GO-W)
(CARRYING)
(GO-N)
(MOVE-RANDOM)
)
(IFLTZ
(IF-DROP
(X)
(COLOR)
)
(PICK-UP)
(MOVE-RANDOM)
)
)
size: 18 fitness: 101.8
Generation 11 desert:
Desert Value 104
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||9G||7S||__||__||__||__||__||__||__|
|_B||6G||_S||__||__||__||__||__||__||__|
|_B||_G||3S||__||__||__||_S||__||__||__|
|_B||1G||_S||__||__||__||__||__||__||__|
|_B||_G||8S||__||__||__||__||__||__||__|
|5B||2G||_S||__||__||__||__||__||__||0_|
|_B||_G||4_||__||__||__||__||__||__||__|
Generation 17:
(IFLTE
(IFLTE
(X)
(PICK-UP)
(COLOR)
(PICK-UP)
)
(X)
(GO-W)
(IFLTZ
(IF-DROP
(COLOR)
(GO-N)
)
(GO-N)
(MOVE-RANDOM)
)
)
size: 14 fitness: 101.4
Generation 17 desert:
Desert Value 100
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||2G||_S||__||__||__||__||__||__||__|
|_B||_G||_S||9_||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||_G||3S||__||__||__||__||__||__||__|
|_B||_G||0S||__||__||1_||__||__||__||__|
|6B||_G||_S||__||__||__||__||__||4_||__|
|_B||_G||7S||__||__||__||__||__||__||__|
|_B||_G||8S||__||__||__||__||__||__||__|
|_B||_G||5S||__||__||__||__||__||__||__|
Best (smallest) of run, Generation 37:
(IFLTE
(IFLTE
(X)
(PICK-UP)
(GO-E)
(PICK-UP)
)
(X)
(GO-W)
(IF-DROP
(MOVE-RANDOM)
(GO-N)
)
)
size: 11 fitness: 101.1
Best of run desert:
Desert Value 100
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||__|
|4B||_G||_S||__||__||__||__||__||__||__|
|_B||9G||_S||__||__||__||__||__||__||__|
|5B||_G||_S||__||3_||__||__||__||__||__|
|_B||_G||1S||__||__||__||__||__||__||__|
|_B||_G||8S||__||__||__||__||__||__||__|
|_B||_G||_S||__||__||__||__||__||__||7_|
|_B||2G||0S||__||__||6_||__||__||__||__|
Introducing multiple programs:
This is done simple by giving GP-Genotype several roots. Each root a program. Accessing by crossover and mutate will pick
one of the roots at
For 3 program types:

GP #0:
(IFLTE
(Y)
(COLOR)
(X)
(IFLTZ
(IFLTE
(IFLTZ
(IFLTZ
(MOVE-RANDOM)
(IFLTZ
(Y)
(GO-S)
(PICK-UP)
)
(MOVE-RANDOM)
)
(IFLTZ
(COLOR)
(IFLTE
(Y)
(COLOR)
(GO-N)
(GO-N)
)
(COLOR)
)
(IF-DROP
(COLOR)
(Y)
)
)
(Y)
(PICK-UP)
(IFLTZ
(X)
(PICK-UP)
(Y)
)
)
(Y)
(IFLTZ
(Y)
(GO-E)
(GO-W)
)
)
)
GP #1:
(IF-DROP
(X)
(IFLTE
(IF-DROP
(CARRYING)
(IFLTZ
(Y)
(PICK-UP)
(GO-N)
)
)
(Y)
(GO-E)
(GO-S)
)
)
GP #2:
(IFLTE
(IFLTZ
(IFLTE
(IFLTZ
(CARRYING)
(IFLTZ
(IF-DROP
(PICK-UP)
(GO-S)
)
(GO-E)
(COLOR)
)
(IF-DROP
(COLOR)
(Y)
)
)
(Y)
(PICK-UP)
(IFLTZ
(X)
(X)
(Y)
)
)
(Y)
(IFLTZ
(X)
(GO-E)
(GO-W)
)
)
(IF-DROP
(IFLTE
(IFLTZ
(CARRYING)
(GO-E)
(GO-E)
)
(COLOR)
(X)
(GO-N)
)
(IFLTE
(GO-N)
(IFLTE
(CARRYING)
(MOVE-RANDOM)
(X)
(Y)
)
(IFLTE
(MOVE-RANDOM)
(GO-N)
(IFLTZ
(MOVE-RANDOM)
(X)
(PICK-UP)
)
(IFLTE
(X)
(IFLTZ
(X)
(GO-S)
(GO-W)
)
(GO-W)
(MOVE-RANDOM)
)
)
(IFLTZ
(GO-N)
(COLOR)
(X)
)
)
)
(CARRYING)
(IFLTZ
(GO-W)
(PICK-UP)
(IF-DROP
(IFLTE
(MOVE-RANDOM)
(GO-S)
(GO-S)
(MOVE-RANDOM)
)
(GO-N)
)
)
)
5 program types showed no improvements,
and even worse fitness.
See another example in the last example.
Introducing ADFs:
We added ADFs via mutate. When mutate create a
small tree (to be inserted into the GP) we saved it as an ADF.
Small tree: between 2 and 8 nodes.
An example for a created ADF, ADF #2:
(IFLTE
(GO-W)
(ADF-1)
(GO-N)
(CARRYING)
)
At first, we allowed up to 100 ADFs. This gave
bad results, meaning best desert value at about 120.
So, we tried limiting to a few ADFs. Here are the
results [the graphs look the same as before (quick decline and then leveling),
just the average-best fitness achieved wasn’t as good]:
Same desert, 2 ADFs allowed:
Generation 24 - fitness 100
Generation 31 - fitness 106
Generation 33 - fitness 110
Different desert, 2 ADFs allowed:
Generation 32 - fitness 106
Generation 22 - fitness 106
Generation 27 - fitness 118
Different desert, 0 ADFs allowed:
Generation 20 - fitness 110
Generation 27 - fitness 102
Generation 31 - fitness 115
Current conclusion is that ADFs based on mutation
are not effective. They did not give better or faster results. We should try
creating ADFs based on crossover, or just selecting
part of a GP with a high fitness (which is what happens in crossover).
Full combination test
A full combination gave the worst
results:
1. Adding the penalty to large programs.
2. 10 ADFs
3. 5 GP types.
Resulted in:
(please note
that the fitness of the desert alone is: 141)

Best Phenotype:
GP #0:
(GO-E)
GP #1:
(MOVE-RANDOM)
GP #2:
(IFLTE
(MOVE-RANDOM)
(IF-DROP
(ADF-9)
(GO-E)
)
(X)
(X)
)
GP #3:
(IFLTZ
(IF-DROP
(CARRYING)
(IFLTE
(PICK-UP)
(GO-N)
(ADF-8)
(IFLTZ
(ADF-6)
(GO-W)
(IF-DROP
(COLOR)
(Y)
)
)
)
)
(IFLTE
(MOVE-RANDOM)
(CARRYING)
(ADF-6)
(GO-E)
)
(IFLTE
(MOVE-RANDOM)
(IF-DROP
(GO-E)
(GO-E)
)
(X)
(X)
)
)
GP #4:
(GO-E)
1.
Better performance was achieved when the
values that the functions move-X, color, pick-up and carrying were 0-2, and not
1-3. It is not exactly clear why those values are better for a general program,
but for the compact best-of-run program given in Koza's
paper (page 9) it is easy to see that the code section (IFLTE color color pick-up) will work better when the value of black
grains is 0: if the value for black is
2. Annealing worked well in most cases even when the initial steps-limit was as low as 10. We assume that the reason for this is that for the first generations, most programs can be evaluated using less time-steps than required to complete the ordering of the grains. For example, a program that does not change the fitness (one without a "pick-up" function, or one that only moves north and south) needs not be run for many steps in order to be compared with a program that does some progress. Another example is Koza's best-of-run example reached fitness 135 after only 30 time steps.
3. Size: it is possible, although it doesn't work on every run, to produce small trees by adding a simple size penalty to the fitness. However, this only works on some runs and completely fails on others, even when the penalty is much smaller than the desert value. It seems that there is a surprising correlation between size and performance – on runs in which trees sizes were kept small, the programs were able to sort the grains better. This means that on that given problem, bigger (and deeper) is not better. This is possibly due to the bound on side-effecting operations (which makes parts of big trees redundant). Another reason may be that it is easier for small programs to avoid destructive side effects after the desired sorting of the grains is achieved: note that the use of annealing may cause the evolution of programs that do better with fewer time-steps. A natural possibility that arise is to force a limit on trees' size, rather than just a penalty to the fitness of large trees.
4. The combination of annealing and size penalty gives on-the-edge runs: either there is very slow convergence and the best of 51 generations was over nodes big and did not sort the desert perfectly, or a fast convergence that produces a small winning program after 15 to 25 generations.
5. Y function: in our compact winning programs, as well as in Koza's example for a winning program, there was no use of the Y function. This makes sense since the fitness only depends on the X coordinate of grains. It might be a good idea not to use this function at all.