Evolutionary Computation and Artificial Life – Assignment 1

Igal Shilman ID: 304369283 email: shilman@cs.bgu.ac.il

 

Question 1

Here is the list of all the schemata that 11010 belongs to:

11010

1101*

110*0

110**

11*10

11*1*

11**0

11***

1*010

1*01*

1*0*0

1*0**

1**10

1**1*

1***0

1****

*1010

*101*

*10*0

*10**

*1*10

*1*1*

*1**0

*1***

**010

**01*

**0*0

**0**

***10

***1*

****0

*****

 

Question 2

The average fitness is:  where m(H) is the number of instances matches H in the population.

a.        

 

 

b.       

c.

 

 

d.

 

 

 

e.

 

f.

 

g.

If the assumption that all strings are present in the population were dropped then

We will get less accurate approximation to the average fitness of the schemata

For instance:

But if only the strings 00100, 00110 and 00010 were present then:

We are sampling fewer values therefore we get less accurate average.

 

 

 

 

 

 

 

 

Question 3

1.        

a.  o(*****111) = 3 ,l(*****111) = 2       o(1*****1*) = 2 , l(1*****1*)=6

 

b.      the probability for the survival of the schema *****111 after mutation is:

And the prob’ for 1*****1* to survive is:

 

c.       The probability that crossover will destroy a schema is not greater then:   therefore the probability to survive is at least:

So for *****111 the probability to survive  is : 0.75

And for 1*****1* is : 0.27

 

 

 

Question 4 (the code is at /ass1/q4.scm)

The goal is to find the minimum of the real valued function g[x,y] in the domain [-2,2] 

  

 

Text Box: The graph of g[x, y] in [-2, 2]

 


As stated in the question description we’ll use the following operators:

 

 

Parameters

Kind

Operator

 

Fitness proportionate selection

Selection

With  a crossover rate of 0.7

Single point crossover

Recombination

Mutation rate of 0.001

Bitwise mutation

Mutation

 

 

Fitness function: 

Population size: 100 genomes.

Generations: 1000

Representation:

The genome is a binary string of length 2n representing two integers each of  bits.

The first  bits represent x and the rest represent y

 

  

 x and y are in the range  And they should be mapped to real numbers in the [-2,2] interval, therefore they must be scaled. It can be done via simple transformation:

   Which maps integers from  to reals in [-2,2].

The recombination and mutation will be used without any special adaptations.

 

Experiments: 

In the first experiment let’s define n = 10, thus it is possible to sample 1024 different points (in jumps of ) and run execute the GA few times.

Run 1 - 3:

In all the runs the GA converged to the point (-2/1023,  238/1023) which gave a fitness value of 0.895956644650997 and g evaluated at this point to 0.116125435276537 which is pretty close to the minimum value that “Mathematica” found.  , in all the runs the best individuals became dominant quite early – in generations 34, 11 and 7 and quickly took over the entire population in few generations using selection and recombination.

 

 

 

In run 3 (for example) we can see how the individual takes over the population and how quickly the average fitness “catches up” with the best fitness. The best individual appeared in the population in the 7th generation. So we can see at the graph below, the best /average fitness up to the 15th generation rising rapidly.

 

After the super individual took over the population the average fitness is bounded between 0.84 and 0.9 and converges quickly to the best fitness.

 

 

 Best and average values after super individual took over

 

 

 

Run 4

This time we’ll set n = 16 thus we are able to sample 65536 different points (in jumps of  ) so we’d except to see an higher precision and to get a value that is closer to the minimum value in the domain which is approximately ~ 0.116016 .

And indeed this run gave us the point (-2/65535 15238/65535) in generation 345 which holds:

 

 

 

Summery

 

Despite the fact that this function has lots of local minimums in that range, the GA succeed to converge to a good  value, yet the convergence usally happened in first (< 100 )generations of evolution.  so maybe a diffrent operators or representation is needed here, for instance maybe rank selection that make it harder for a supper individual to take over the population or maybe dynamic mutation paramter that will favor exploration at the beginning and at later phase it will favor explotation of good regions.

 

Question 5

 

If after a certain number of generations fitness no longer increases then it is possible that the algorithm had reached the optimum, but it is more likely that the algorithm converged prematurely to a local optimum.

In that case either a different selection method should be considered like rank selection

That makes it harder for a super individual to take over the population and to destroy the variance in the population, or maybe higher mutation rate should be considered to allow the algorithm to escape from a local optimum.

 

 

Question 6 (the code is at /ass1/q6.scm)

The goal is to find a subset  (list of primes less then 100) which sums up to 437.

Representation:

The genome is a binary string of length 25.

And the genotype to phenotype will be evaluated as follows:

Where iff genome[i] = 1 then the prime[i] will be included in subset S.

 

Fitness function: 

Where

( is the ith prime number.)

 is in the range [0,1060]  therefore  is in the range [0,623]

Plot of f(x) in [0,1060]

 

The maximum of  in that range, is at, ()

So trivially if we’ll find the maximum of  then the binary string will indicate the subset of the primes that sums up to 437.

 

 

Parameters

Kind

Operator

 

Fitness proportionate selection

Selection

With  a crossover rate of 0.7

Single point crossover

Recombination

Mutation rate of 0.001

Bitwise mutation

Mutation

 

Population size: 50 genomes.

Generations: 100

Experiments: 

Runs 1-3: All the first 3 runs succeeded finding a subset that’s sums up to 437, and managed to do so in generations 4 , 5 and 15. This means that all of them received the max fitness value: 623 Here is one genome for example: (1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0) which defines the subset: {2 5 7 11 23 67 71 79 83 89} that sums to 437.

 

Average of the 3 first runs

 

Another constraint added: find the subset that sums to 437 and has minimal length.

To handle this constraint we’ll change the fitness function to:

 

we’ll test different values for  = 16,32 , 40

Now we’d except to maximize the number of zeros and to minimize the difference between the subset sum and 437.

 

 

Results: (the code is at /ass1/q6b.scm)

 

Table shows best solutions found per α

Best fitness

generation

sum

length

16

911

25

437

7

32

895

100

437

8

40

1341

6

435

7

 

 

We can see here that for big values for  ( > 40) the “left” side of the fitness functions takes over and the right side which is responsible for the sum of the subset has smaller effect on the entire fitness value.

We’d except to see symmetric effect with small values of , yet for this problem the algorithm found genomes with 7 bits set to 1 even for small values of  such as