A) Write programs to align two sequences using
dynamic programming, weighted edit distance and
affine gap penalties for:
1) global alignment
2) end-space free variant
3) local alignment
The weights are given by the 20 by 20 symmetric
matrix of Gonnet.
Run your programs first on short examples of strings
with 10-20 letters and then run them using each of the
7 sequences below (49 runs in total for each alignment type).
The gap initiation weight is 2.7 and the
gap extension weight is 0.15.
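As a rough starting point, the three variants can share one set of affine-gap recurrences. The sketch below is not a reference solution and rests on several assumptions: toy_score is a hypothetical stand-in for the Gonnet matrix, a gap of length k is assumed to cost gap_init + k * gap_ext, and only the score is returned (no traceback). The names align_score, toy_score and mode are illustrative.

```python
NEG = float("-inf")


def toy_score(a, b):
    """Hypothetical stand-in for the Gonnet matrix: +2 match, -1 mismatch."""
    return 2.0 if a == b else -1.0


def align_score(s, t, score=toy_score, gap_init=2.7, gap_ext=0.15, mode="global"):
    """Best similarity score of s and t with affine gap penalties.

    Assumption: a gap of length k costs gap_init + k * gap_ext.
    mode is "global", "endfree" (leading/trailing gaps are free) or "local".
    """
    n, m = len(s), len(t)
    free = mode in ("endfree", "local")
    # M: s[i-1] aligned with t[j-1]
    # X: s[i-1] aligned with a gap; Y: t[j-1] aligned with a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = 0.0 if free else -(gap_init + i * gap_ext)
    for j in range(1, m + 1):
        Y[0][j] = 0.0 if free else -(gap_init + j * gap_ext)

    open_cost = gap_init + gap_ext  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if mode == "local":
                prev = max(prev, 0.0)  # a local alignment may start anywhere
            M[i][j] = prev + score(s[i - 1], t[j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost,
                          X[i - 1][j] - gap_ext,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost,
                          Y[i][j - 1] - gap_ext,
                          X[i][j - 1] - open_cost)

    def cell(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    if mode == "global":
        return cell(n, m)
    if mode == "endfree":
        # the alignment may end anywhere in the last row or last column
        return max([cell(i, m) for i in range(n + 1)] +
                   [cell(n, j) for j in range(m + 1)])
    # local: best cell that ends in a matched pair, never below zero
    ends = (M[i][j] for i in range(1, n + 1) for j in range(1, m + 1))
    return max(0.0, max(ends, default=0.0))
```

For the actual runs, score would look up the Gonnet weight of the two letters, and a traceback through M, X and Y would be needed to print the alignment itself.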
The programs should output:
- the names of the sequences followed by their alignment score
- the alignment of the two sequences
- the number of identical letters matched.
For example, for 4hhba and 4hhbb the output should look like:
4hhba 4hhbb 73.58
Identities: 62 (45%)
(the value in parentheses is the number of identities as a percentage of the length of the longer sequence).
A nicer output would show where the identities are:
| | | | | | |||| | | ||| | | | | |
| | || ||||| | || | || || || |
|| || || | || | |||| | | | | | | ||
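The identity count, the percentage and the marker line can all be derived from the two aligned strings. The following is a small sketch, assuming that gaps are written as "-" and that the percentage is rounded to the nearest integer; the name identity_report is hypothetical.

```python
def identity_report(aligned_a, aligned_b, gap="-"):
    """Return (identities, percent, marker_line) for two aligned strings.

    Assumptions: both strings have the same length, gaps are written as
    `gap`, and the percentage is relative to the longer ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have the same length")
    # "|" under each column where the same letter appears in both rows
    markers = "".join(
        "|" if x == y and x != gap else " "
        for x, y in zip(aligned_a, aligned_b)
    )
    identities = markers.count("|")
    longer = max(len(aligned_a.replace(gap, "")),
                 len(aligned_b.replace(gap, "")))
    percent = round(100 * identities / longer) if longer else 0
    return identities, percent, markers
```

Printing the first aligned string, then the marker line, then the second aligned string gives the nicer layout described above.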
To present the results of your 4 x 7 x 7 runs,
report four 7x7 tables, one for each type of alignment.
The entry i,j in each table will contain the
alignment score of sequence i with sequence j.
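One possible way to lay out such a table is sketched below. It assumes the sequences are held in a dictionary mapping names to strings and that score_fn is whatever scoring function you wrote for the alignment type; both names, and score_table itself, are hypothetical.

```python
def score_table(seqs, score_fn):
    """Format a square table of score_fn(seq_i, seq_j) for all pairs.

    seqs: dict mapping sequence name -> sequence string (assumed layout).
    score_fn: function taking two strings and returning a number.
    """
    names = list(seqs)
    width = max(8, max(len(name) for name in names) + 2)
    # header row: empty corner cell followed by the column names
    lines = [" " * width + "".join(f"{name:>{width}}" for name in names)]
    for a in names:
        cells = "".join(
            f"{score_fn(seqs[a], seqs[b]):>{width}.2f}" for b in names
        )
        lines.append(f"{a:<{width}}" + cells)
    return "\n".join(lines)
```

Row i, column j then holds the score of sequence i with sequence j, with names along both edges.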
Include in your output the alignment of:
4hhba and 4hhbb with global
4hhba and 1dxtb900 with global
4hhba and 1dxtb900 with local
4hhba and 1dxtb900 with global-local
1dxtb900 and 4hhba with local
1dxtb900 and 4hhbb with global-local
In addition, report the 5 alignments with the largest
scores using global-local (excluding self-alignments).
For parts B and D, each pair of students will choose the
type of alignment algorithm on which they will
work. Please let me know the members of each pair,
and I will assign to each pair the type of alignment
on which they will work.
B) To measure the significance of the alignment score,
one can generate a number of alignments using random
sequences, compute the mean M and standard deviation S of
the distribution of scores, and report the number N:
N = (SCORE - M) / S, where SCORE is the alignment score of
the original pair.
To generate the distribution:
- The first sequence is kept as is.
- Generate 500 random sequences of same length and composition
as the second sequence.
- Compute the alignment score of each random sequence with the
first sequence.
To generate the random sequences you can generate random permutations
of the second sequence.
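The procedure above can be sketched as follows. This is only an illustration under assumptions: score_fn stands for your own alignment-score function, random.shuffle is used to produce the permutations, and S is taken to be the sample standard deviation (the text does not say which form is intended). The names shuffled and significance are hypothetical.

```python
import random
import statistics


def shuffled(seq, rng):
    """Return a random permutation of seq (same length and composition)."""
    letters = list(seq)
    rng.shuffle(letters)  # in-place uniform random permutation
    return "".join(letters)


def significance(first, second, score_fn, trials=500, seed=None):
    """Return N = (SCORE - M) / S for the pair (first, second).

    The first sequence is kept as is; the second is permuted `trials` times.
    Assumption: S is the sample standard deviation of the random scores.
    """
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    original = score_fn(first, second)
    scores = [score_fn(first, shuffled(second, rng)) for _ in range(trials)]
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("all random scores are equal; N is undefined")
    return (original - mean) / sd
```

Passing a seed is a design choice that lets the same N be reproduced on a later run; without it, N will vary slightly from run to run.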
1.- Explain how you generate the random permutations.
2.- Write a program to report the number N for a pair of aligned
sequences (using the type of alignment you chose).
3.- Run it on the 7x7 pairs and report the results in a 7 by 7
table. Each entry will contain the value of N for the corresponding
pair.
4.- Excluding the diagonal (self-comparisons), which are the 5 pairs with
the highest value of N?
5.- Find the 5 pairs with highest alignment score from part A. Are these
the same as those identified in part B4?
C) For each type of alignment, explain in 2-3 lines its advantages/disadvantages.
Use various values of gap initiation weights. For example:
Gap initiation weight Gap extension weight
Explain in 5-10 lines for each of the alignment types, what is the effect
of the different gap weights.
Besides the printed output required, you have to provide a directory
which will contain both the source code and the executable.
I should be able to run your programs easily to reproduce your results.
Use simple command-line parameters for each option and for the input
Here are the 7 sequences. The first line is the name
of the sequence (it is not part of the sequence).
The following lines correspond to the sequence
until the first blank line.