CAFASP-1 has
encouraged developers of automated methods to make their programs
available to the community in the form of easy-to-use computer servers.
Assessing
the performance of automated programs can help a human predictor
to establish a better strategy on how to use the automated results
for his or her particular prediction.
Finally, CAFASP-1 has been useful in
identifying the requirements for a
future blind trial of automated server-based fold prediction.
Fold recognition addresses a subset of the more general problem
of prediction of the three-dimensional structure of a protein from its amino acid sequence.
Fold-recognition methods search a library of known folds to find
the most compatible protein for a given target sequence of unknown structure.
The predictive power of such methods was clearly demonstrated in blind tests, such as CASP3,
where prediction targets were not known at the time the predictions were made.
However, to be accepted as a research tool, such methods must be used by a wider
community in actual research. The recent growth of the
Internet provided a perfect opportunity
for groups developing structure prediction algorithms to make them available to
their potential users: scientists interested in seeking structural insights for their
particular research problems. Led by the astounding popularity of existing web servers,
several groups have made their fold-recognition methods available to the community.
In this manuscript, we demonstrate the performance of fully automated fold-recognition
methods on a set of sequences, selected from
the prediction targets of the CASP3 experiment. Most of
us took part in this meeting and submitted predictions, which in
some cases are discussed in other manuscripts in this special issue of Proteins.
Therefore, it is important to stress the differences between results presented in this
manuscript and those available on the official CASP3 web site and discussed in
other manuscripts in this issue.
Results presented here were generated automatically by web servers from submitted sequence
data, with no human-expert intervention involved.
In this respect, predictions presented here represent raw data that could be available
to any user of a fold prediction server. An expert could easily improve such
predictions, but his or her work often includes steps that are difficult to describe
in a quantitative way and that are not easily reproducible.
Such expertise was used extensively for predictions presented at the CASP3 meeting.
In many cases extensive work by groups of several people was necessary to prepare a
successful prediction. Someone lacking specialized expertise in fold prediction
may not be able to achieve the same prediction accuracy as some of
the predictions
submitted at the CASP3 meeting. On the other hand, results from this manuscript are
obtained with a few simple mouse clicks. Another difference between the results
presented here and those from CASP3 is that for the latter, all the predictions were
blind, whereas in CAFASP1, the experimental structures of all but 4 of the targets
were already known. Thus, CAFASP1 is not a blind experiment. In addition,
because the goals of CASP3 and CAFASP1 are different,
the evaluation procedures used were also different. Consequently,
the results shown here are not comparable with those reported in CASP3.
Our insistence on full automation of the fold prediction process is not meant to
belittle or to cast any doubt on the importance of the specialized expertise in fold
predictions. But for certain purposes, such as the evaluation and
comparison of various prediction algorithms and strategies, it is useful to have
fully automated and easily reproducible methods. This evaluation assesses the
performance of the methods only and not the performance of humans using machines.
This is what a non-expert user is most interested in: "Which program(s)
should I choose to use in order to predict the structure of this new sequence?"
And last, but not least, fully
automated methods are necessary to apply fold prediction to large groups of protein
sequences, such as those available from genome projects. CAFASP1 attempts to provide
an assessment of the capabilities of current fold-recognition servers.
For CAFASP1, we used the CASP3 targets as a benchmark for our methods.
We have classified the targets into 7 categories:
Introduction
Methods
1. The automated methods (in alphabetical order).
Seven groups actively participated in CAFASP1.
Table I
lists, for each group, the name of the server evaluated, its URL and
the corresponding reference;
Table II summarizes the main characteristics of the servers.
2. The targets.
1. Targets with folds at SCOP's [1] family level.
In this category we chose the five targets with lowest sequence
similarity to their corresponding folds from those classified in
CASP3 as having homologs at the family level; the other
family-level CASP3 targets were excluded from CAFASP1 because
they do not pose any challenge to fold-recognition methods.
The targets of this category included in CAFASP1 are
T0055, T0057, T0068, T0070, T0062 (for a full protein description see the assessors'
papers in this Issue).
2. Targets with folds at SCOP's superfamily level:
T0074, T0081, T0083, T0063, T0053, T0044, T0054, T0085, T0080.
In CAFASP1, we used an evaluation scheme that tested only fold recognition, and not alignment. Each server produced a list of top-scoring folds, and if the first correct fold appeared at rank i , then 1/i points were awarded. (This scoring scheme is similar to that used for CASP-1.)
The rationale of this scoring system is as follows: Suppose a program always has the correct answer within the top i ranks; if only a single answer is desired, then, on average, the correct fold will be predicted with probability 1/i.
For structure prediction, evaluating the quality of the sequence-structure alignments is critical, since fold-recognition methods can in some cases produce poor sequence-structure alignments. Unfortunately, for CAFASP1 evaluating alignments was not possible given the time constraints. Thus our evaluation procedure may award points to predictions that in CASP3 were considered incorrect. As progress in evaluation has been observed from CASP1 to CASP2 and CASP3, we hope that for CAFASP similar progress will be observed and alignment quality will be properly assessed in CAFASP2.
Another difficulty with this evaluation method is identifying the list of "correct" hits for each target. For the targets in categories 1 (family-level) and 2 (superfamily-level) there was almost no difficulty. Any PDB chain with the same fold type in SCOP as the target was considered a correct hit; anything outside the fold type of the target was considered to be wrong. The only exception was T0085, which belongs to a "fold" in SCOP which, according to SCOP, is not a real fold, but only a collection of different folds. Thus, for T0085 we only accepted as correct hits those entries in T0085's superfamily.
Since we used SCOP to determine what folds were to be treated as correct, any reported hits that did not have a SCOP classification were excluded from the ranking before scoring.
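The scoring rule described above can be illustrated with a minimal sketch. It assumes that each server returns an ordered list of library entries, that a lookup from entry to SCOP fold type is available, and that the set of correct fold types for the target is known. The function name, the entry identifiers and the fold labels are hypothetical and are not taken from the CAFASP1 evaluation scripts.

```python
from typing import Iterable, Mapping, Optional, Set


def inverse_rank_score(
    ranked_hits: Iterable[str],
    fold_of: Mapping[str, Optional[str]],
    correct_folds: Set[str],
) -> float:
    """Return 1/i, where i is the rank of the first correct hit, or 0.0 if none."""
    rank = 0
    for hit in ranked_hits:
        fold = fold_of.get(hit)
        if fold is None:
            # Hits without a SCOP classification are removed before ranking,
            # so they do not consume a rank position.
            continue
        rank += 1
        if fold in correct_folds:
            return 1.0 / rank
    return 0.0


# Hypothetical server output for one target (identifiers are made up).
hits = ["1abc", "9xyz", "2def", "3ghi"]
scop = {"1abc": "a.1", "2def": "c.37", "3ghi": "b.40"}  # "9xyz" is unclassified
print(inverse_rank_score(hits, scop, {"c.37"}))
```

In this example the unclassified entry is skipped, so the first correct hit counts as rank two rather than rank three.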
In addition, we had to decide how to evaluate multi-domain targets (T0063, T0081, T0083 and T0079), each of which has two domains. For the full-sequence tests, hits belonging to the fold type of either domain were considered correct. The separate domains of these targets were evaluated in the domain-level category (see below).
Unfortunately, we had to exclude T0079 from the full-sequence tests, and include it only in the domain tests. As a whole, it was not easy to select a single fold type as correct for T0079; several fold types can be considered as good hits, but only for the individual domains.
For the targets in category 3 (fold-level) we applied the same criteria (accept as correct hits those entries classified in SCOP as the same fold type). However, there were 3 targets (T0077, T0061 and T0075) for which we could not determine what single fold type should be considered as a correct hit. In addition, the similarities of these targets to known folds are weaker and do not cover the full sequences. Thus, we decided not to evaluate these targets and to place them, together with T0079, into a category of non-evaluated targets (the results of the automated methods on these targets are included in the CAFASP1 web page at http://www.cs.bgu.ac.il/~dfischer/cafasp1/cafasp1.html). The targets in category 7 can be considered as not suitable for our simple evaluation method.
In category 4 we placed the domain predictions. Although determining the exact boundaries of a domain requires knowledge of the structure, we wanted to evaluate our methods also on single domains. In many cases, rough domain boundaries are also known from the sequence alone, as has been demonstrated in several cases in the CASP3 predictions. The evaluation criteria used here were the same as for the above categories.
No evaluation was possible for category 5, as the structures of these targets are still unknown. Thus our predictions for these targets are truly blind predictions. When the structures are released we will be able to evaluate them.
Finally, the two targets considered to be novel folds were placed in category 6. An ideal fold-recognition method should also be able to identify new folds, or at least give a very low score for the top-ranking fold. We filed predictions for targets in this category to observe how high our top-ranking fold scored for novel folds. This can be helpful in setting confidence thresholds for the methods when applied at larger scales.
The lists of folds considered to be correct hits for each target are included in the CAFASP1 web page.
To identify some of the remarkable predictions, that is, good predictions for targets that few predictors did well on, we applied the following normalization. Let Sij be the score received by program i on target j (computed as one divided by the rank of the first correct hit). Let Tj be the sum of scores for target j. We define the normalized score as Sij * Sij/Tj. The larger the normalized score, the more remarkable the prediction of program i for target j is.
The normalized scores do not change the overall rankings of the programs by much, and so are not shown, though the information is available on the main web page and the summary results web page.
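As an illustration of the normalization, the following sketch computes Sij * Sij/Tj from a table of inverse-rank scores. The nested-dictionary layout, the program names and the numbers are assumptions made for the example. The convention of returning zero when no program scored on a target is also an assumption, since the text does not define that case.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # program -> target -> inverse-rank score


def normalized_scores(scores: Scores) -> Scores:
    """Return S_ij * S_ij / T_j, where T_j sums S_ij over all programs i."""
    targets = {t for per_target in scores.values() for t in per_target}
    totals = {t: sum(s.get(t, 0.0) for s in scores.values()) for t in targets}
    return {
        program: {
            # Assumed convention: 0.0 when no program scored on the target.
            t: (s * s / totals[t]) if totals[t] > 0 else 0.0
            for t, s in per_target.items()
        }
        for program, per_target in scores.items()
    }


# Hypothetical scores: every program does well on T1, only ProgA finds T2.
example = {
    "ProgA": {"T1": 1.0, "T2": 1.0},
    "ProgB": {"T1": 1.0, "T2": 0.0},
    "ProgC": {"T1": 0.5, "T2": 0.0},
}
norm = normalized_scores(example)
print(norm["ProgA"])
```

A rank-one hit on a target that every program identified is scaled down, whereas a rank-one hit that no other program found keeps its full value, which is the sense in which the normalized score highlights remarkable predictions.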
To allow for a detailed comparison of performance
we computed for each category and program a partial total of the
inverse-rank scores.
For each category we observed which programs obtained the highest
total and subsequently added all the scores into an overall grade.
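The partial totals and the overall grade can be sketched as follows. The target-to-category assignments for T0055 and T0074 follow the lists given above; the program names and score values are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict
from typing import Dict


def category_totals(
    scores: Dict[str, Dict[str, float]],
    category_of: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Sum the inverse-rank scores of each program within each category."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for program, per_target in scores.items():
        for target, score in per_target.items():
            totals[program][category_of[target]] += score
    return {program: dict(cats) for program, cats in totals.items()}


def overall_grade(totals: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Add the category totals of each program into a single grade."""
    return {program: sum(cats.values()) for program, cats in totals.items()}


categories = {"T0055": "family", "T0057": "family", "T0074": "superfamily"}
# Hypothetical inverse-rank scores, not the CAFASP1 results.
scores = {
    "ProgA": {"T0055": 1.0, "T0057": 0.5, "T0074": 0.25},
    "ProgB": {"T0055": 1.0, "T0057": 1.0, "T0074": 0.0},
}
totals = category_totals(scores, categories)
print(totals)
print(overall_grade(totals))
```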
We verified that the results reported here
accurately correspond to those that are obtained by the automated programs.
Most results were checked by at least one
other person (besides the developer of the program).
Thus, a reader can submit the sequence of any of the targets and expect
to obtain essentially the same results (excluding the differences that
will appear due to possibly updated databases). All the programs
included in CAFASP1 are available freely through the internet.
Almost all methods did well on the close homology targets, although three
methods had some trouble with target T0068.
To really distinguish between the performance of the methods in this
category,
evaluation of the alignments and scores is needed (see Discussion).
This category includes 9 targets and provides
most of the variation in the final ranking of the methods.
The two best-scoring programs (GenThreader and SDPMA2) received
5.00 and 4.43 points, respectively (column "SUPERFAMILY" in the
table).
GenThreader's performance is remarkable in that its 5.00 points were
obtained from correct predictions at rank one in five targets (and
zeroes in the other four);
frsvr_SDPMA2 had correct hits in the top 10 ranks for 8 of the
9 targets, but only scored the correct hit first for three of the targets.
GenThreader [2]
was originally designed for
the purposes of structurally annotating genomic sequences. The method uses
a combination of traditional profile-based sequence alignment, a set of
threading potentials [3], and a neural network to evaluate the
quality of the implied structural model. In
the original version of GenThreader the profiles which make up the fold
library were generated using a standard multiple sequence alignment
method, but in the current version (used here) the profiles are generated
in one step by using PSIBLAST [4]. For each sequence-structure
alignment, the threading potentials are summed for the implied model, and
the energy sums and sequence alignment score are presented to a neural
network. The neural network is used to detect favorable combinations of
energy sums and alignment scores, and has been trained on known structural
similarities found in the CATH structure classification database
[5].
In the CASP3 predictions made by the Jones group, GenThreader was only
used as a pre-filter to detect superfamily matches. Apart from targets
T0074, T0083 and T0085, the GenThreader results were not considered
significant, and so most of the results were arrived at using a full
threading method (THREADER2).
frsvr_SDPMA2 is a variation of the SDP method previously described
[6],
which takes as input a multiple alignment of sequences homologous to the target
and the predicted secondary structure given by PHD [7].
The multiple alignment or profile can be built using any method, but the current
implementation in frsvr uses a very simple approach: it compiles
significant hits from SWISSPROT [8]
using a single BLAST [9]
search
and the multiple alignment is built using PROFILEMAKE [10].
The sequence-to-structure compatibility function combines
(i) the sequence similarity
between the multiple alignment and the
sequence of a protein of known structure with
(ii) the extent of agreement between the predicted and the
observed secondary structures. SDP uses the global-local alignment algorithm
[11] for ranking the folds in the library.
The library includes full PDB chains
and single domains from multi-domain
proteins and currently contains over 2000 entries.
The top ranks are sorted by their z-scores,
which are obtained from the distribution
of scores of the folds in the library. The
folds of newer (or C-alpha only) PDB entries corresponding to the best matches
for some of the CASP3 targets were not in the library; for CAFASP1 these have
been included. For filing predictions for CASP3, the Fischer group used human intervention and
in some cases, the fold obtained at rank one by frsvr was chosen. In other
cases, because the z-score of the rank one result was below a confidence threshold,
a different fold was chosen.
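The z-score ranking mentioned above can be sketched as follows. This is a generic illustration and not the frsvr implementation: it assumes that a higher raw compatibility score is better and that the z-score is computed against the mean and population standard deviation of the scores of all library entries. The entry names and scores are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_by_z_score(raw_scores: Dict[str, float], top_n: int = 10) -> List[Tuple[str, float]]:
    """Convert raw library scores to z-scores and return the top_n entries."""
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All entries scored identically, so no entry stands out.
        z = {entry: 0.0 for entry in raw_scores}
    else:
        z = {entry: (score - mu) / sigma for entry, score in raw_scores.items()}
    # Assumption: larger z-scores indicate better compatibility.
    return sorted(z.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical raw scores of a target against a three-entry library.
library_scores = {"foldA": 10.0, "foldB": 20.0, "foldC": 30.0}
ranked = rank_by_z_score(library_scores)
print(ranked[0])
```

A confidence threshold of the kind described above would then be applied to the z-score of the top-ranked entry.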
The highest normalized scores in this category were also obtained by
GenThreader and SDPMA2 for their correct prediction at rank one of
target T0085.
The second highest normalized score was obtained by 3
programs (BASIC, Karplus2 and Karplus3) in their correct
identification at rank one for target T0044.
The results for category 3 are shown in column "FOLD"
of Table III.
The best performing methods were Karplus1 and Karplus3 with 2.06 and
1.62 points, respectively, out of a possible maximum of 5.
Clearly, the performance of our methods in the "fold-level" targets is
not as good as that in the "superfamily" targets.
The most outstanding result, judged by the normalized scores, was
obtained by Karplus1 on target T0043; it was the only program
identifying the correct fold at rank one.
The Karplus1 and Karplus3 methods are both SAM methods [12]. In
SAM-T98 a hidden Markov model (HMM) is constructed from a single sequence
and homologs that are found in a non-redundant protein database.
The method alternates between searching the database for homologs
using an HMM and realigning the homologs using Baum-Welch [12]
training on the HMM.
Only sequence information is used, not structure information.
All scoring with HMMs was done with local scoring summing over all alignments.
For Karplus1, an HMM was built for each fold in the fold library, and
the target sequence was scored against all the HMMs.
For Karplus2, an HMM was built for the target sequence, and the entire
PDB database was scored.
For Karplus3, the HMM scores for the template and target methods were added.
For CASP3, hand-selection among the top few hits and hand-realignment were
done, but subsequent analysis indicates that the fully automatic
method does about as well overall as after modification by hand.
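The combination used for Karplus3 can be illustrated with a short sketch that adds the template-HMM and target-HMM scores of each library entry and re-ranks the entries. The assumptions are that both methods report a score for the same entry identifiers, that entries missing from either list are skipped, and that lower scores are better by default. None of these details is specified in the text, and the identifiers and values are made up.

```python
from typing import Dict, List


def combine_scores(
    template_scores: Dict[str, float],
    target_scores: Dict[str, float],
) -> Dict[str, float]:
    """Add the two HMM scores for every entry scored by both methods."""
    shared = template_scores.keys() & target_scores.keys()
    return {entry: template_scores[entry] + target_scores[entry] for entry in shared}


def rank_entries(scores: Dict[str, float], lower_is_better: bool = True) -> List[str]:
    """Order entries from best to worst under the assumed score direction."""
    return sorted(scores, key=lambda entry: scores[entry], reverse=not lower_is_better)


# Hypothetical scores: template-HMM method and target-HMM method.
template_hmm = {"x": -10.0, "y": -2.0, "z": -5.0}
target_hmm = {"x": -1.0, "y": -12.0}
combined = combine_scores(template_hmm, target_hmm)
print(rank_entries(combined))
```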
It is interesting to notice that the best performers in this
category were methods based on sequence-information alone.
We have no explanation for this phenomenon, and previous tests of the
SAM-T98 method indicated that it found the correct fold only when it
found the correct superfamily. One possible partial explanation is
that the SAM-T98 method relies on local alignment, so one does not
need to match the entire fold to find a match. The
sum-over-all-alignments scoring makes the method more tolerant of
somewhat incorrect alignments than the structure-based methods, which
usually require better alignments before they provide good
scores. However, it is not possible to arrive at far-reaching
conclusions from a sample of only 5 test cases.
When computing the sub-total from categories 1 through 3 the best
performers are GenThreader, SDPMA2, and Karplus3 with 11.11, 10.22, and
10.14 points, respectively (column SUBTOTAL in Table III).
These are the same top three as for the
superfamily category alone, which contributes most of the variation
between inverse-rank scores.
This category does not strictly belong to a fully automated
context because determination of the domain boundaries required
previous knowledge.
Nevertheless, because in an actual prediction experiment the approximate
boundaries are often suspected, we also tested our programs using
the exact domain definitions.
In this category the best performers were
SDPMA and SDPMA2 with 4.78 and 4.60 points, respectively,
out of a maximum of 7.00 (data
shown on our
summary results web page).
The SDPMA methods identified the correct
fold in rank one for four targets and in rank two for one target.
The next best performer in this category was (1D+3D)-PSSM with 3.81
points.
Fold recognition by 3D-PSSMs uses the SCOP database to identify remote
homologues that are superposed in three dimensions to obtain sequence
alignments that could not be obtained from sequence alone. The fold
library consists of representative proteins with <40% identity from SCOP
(called SCOP40). A master protein in the SCOP40 library is selected (say
A0) and, using PSI-BLAST, a 1D-profile is constructed incorporating sequences
with >25% identity. This 1D-profile is directly converted into a 1D-PSSM
using the PSI-BLAST approach. Then, from the structural alignment of, say,
protein domains B0 and C0 to A0, multiple sequence alignments are piled up
to generate a 3D-profile combining the 1D-profiles of A0, B0 and C0, which
is then converted into a PSSM. The search algorithm is a global dynamic
programming algorithm in which the sequence of the probe is matched against the
template PSSM, together with a score for the similarity between the predicted
secondary structure of the probe and the experimental secondary structure of
the template. In (1D+3D)-PSSM, searches are made with the 1D- and the
3D-PSSMs and the results are pooled and sorted by expectation value. The
3D-PSSM and (1D+3D)-PSSM methods have been developed substantially since they
were applied at CASP3.
The most remarkable normalized score achieved
in this category was obtained by BASIC
on target T0071.2. BASIC identified the correct fold at rank two,
whereas only two other methods had the correct fold at ranks eight or higher.
The next most remarkable normalized result was obtained
by SDPMA and SDPMA2 for identifying the correct fold
of target T0063.1 in rank 2,
whereas only one other method had the correct fold at rank nine.
Clearly, knowing the exact domain boundaries of a target sequence
contributes significantly to the performance of most methods.
For example, for T0083, four methods did better when given the correct domain,
for T0063, 6 methods did better and one worse on the domains, and for
T0071, 9 methods did better on the domains.
When summing up all scores from all four categories, the top
programs are SDPMA2, SDPMA, GenThreader and Karplus3 with 14.82,
14.26, 13.90 and 13.44 points, respectively.
4. Reproducibility and validity of the automated results.
Results
Table III is a summary of
the results by program and by category. The individual inverse-rank scores
by target and program are available in the corresponding tables
from our main web page.
Category 1. Family-level targets
Category 2. Superfamily-level targets
Category 3. Fold-level targets
Category 4. Domain targets
Categories 5-7.
There is no evaluation for targets in these categories, but the
automated predictions submitted can be seen on our main
web page.
We will evaluate the predictions for targets of still unknown
structure when the structures are determined, and update the
appropriate web pages.
Not every aspect of CASP3 is ideal, however. Submissions in CASP3 often included manual input based on structural or functional interpretation of the results of algorithms. For example, given that a particular target was known to bind DNA, groups were quite free to ignore any highly ranked fold which was not also known to bind DNA. In CASP2, the best fold recognition results were entered by a group that did not use a fold recognition algorithm, but instead relied entirely on evolutionary inferences based on the known function of the target proteins [13]. In CAFASP1, however, the fold assignments have been made exclusively by automatic servers, without any human interpretation of the results.
Accepting the fact that CAFASP1 was not a blind test, entrants in CAFASP1 were permitted to augment their template libraries with a recently deposited protein coordinate entry if that was required to ensure there was a correct hit in their library (although not all the participants took the opportunity to augment their libraries). The evaluation process excluded newer entries that are not included in the latest release of the SCOP database, which roughly corresponds to the structures available at the time CASP3 predictions were filed. The object of CAFASP1 was to evaluate the algorithms and not how recently each group had updated their fold libraries. Of course, in real applications, the ease of updating the library is an important aspect of the utility of each method. Clearly a server which is frequently updated is going to have a significant advantage over a server using an out-of-date template library. Furthermore, entrants have been able to further develop their methods over the six months since the last CASP3 prediction deadline.
One of the conclusions from this study is that no single approach is markedly superior to the others evaluated when considered across the entire range of targets. Some methods performed better at the superfamily level, others at the domain level. All methods performed poorly in the fold-level category; the differences between the methods in this category may not be statistically significant. The distinction between superfamily recognition and fold recognition is an important one. It might be expected that the methods which strongly weight sequence similarity would be the most effective methods at recognizing distant evolutionary relationships, whereas methods which emphasize structural information (e.g. predicted secondary structure and statistical potentials) would be most effective at recognizing similar folds in the absence of common ancestry. To some extent this was true in CAFASP1, although the relatively small number of targets does not allow a general conclusion to be drawn.
One limitation of the variety of methods tested in CAFASP1 is that no results have been included from pair potential-based threading methods (e.g. [14] [15]). Most such threading methods are not available as servers, but are distributed as standalone software packages which must be installed on the user's own machine. In the assessment of fold recognition results in CASP3, three of the six groups selected to present their results made use of this type of fold recognition (see their corresponding reports in this Issue). Unlike the classic potential-based threading methods, all of the methods in CAFASP1 explicitly make use of the sequence information in one form or another. Several of the methods in CAFASP1 also incorporate some structural information from the available coordinates. The observed secondary structure of an entry in the template library is matched with that predicted for the probe in Topits, frsvr and PSSM. In addition, PSSM uses structural similarity to obtain sequence profiles. Although GenThreader does not employ 3-D information in the alignment step, and hence is not a true potential-based threading method, it does make use of pair potentials in the evaluation of sequence-structure compatibility. It is not clear from the CAFASP1 results whether these algorithms have been able to extract so much more information from the available three-dimensional structures that they provide a markedly superior performance to methods such as SAM-T98 and BASIC, which exploit the signals provided by multiple sequences alone. Nevertheless, the top performing methods in CAFASP1 do exploit structural information in addition to sequence information. To what extent their superior performance stems from their use of structural information or from other factors (such as better alignment algorithms or better statistical scoring measures) remains to be determined.
One area in which the CAFASP1 results are of particular interest is that of structural genomics. Automated approaches for fold recognition are essential if the wealth of data in genomes is to be exploited (e.g. [16], [17]). One important aspect of genomic fold assignment, however, is that folds must be assigned with a high degree of confidence. Even if a method frequently ranks correct folds in top place, if the scores for these matches are not significant then the results will be of little use for genome annotation. To assess this aspect of fold assignment it is necessary to evaluate how well a method discriminates correct match scores from incorrect match scores. Table IV shows that automatic fold-recognition methods are just beginning to discriminate correct from incorrect matches. Although a number of true positives were identified above the first false positive threshold (Th1), their scores do not necessarily lie above the methods' confidence thresholds (see last column of Table II). Improvements in this aspect will result in a wider applicability of automated fold-recognition methods at a genomic scale.
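The discrimination test described above can be sketched as a count of correct top hits that score above the first false positive. This is one plausible reading of the Th1 threshold and is not necessarily how Table IV was produced. The sketch assumes that higher scores are better, that one top-ranked hit per target is pooled across targets, and that ties between a correct and an incorrect hit are resolved against the method. The scores shown are invented.

```python
from typing import List, Optional, Tuple


def hits_above_first_false_positive(
    predictions: List[Tuple[float, bool]],
) -> Tuple[int, Optional[float]]:
    """Count correct hits scoring above the first incorrect one.

    predictions holds (score, is_correct) pairs. Returns the count and the
    score of the first false positive, or None if every hit is correct.
    """
    # Sort by decreasing score; on ties, incorrect hits (False) come first.
    ordered = sorted(predictions, key=lambda p: (-p[0], p[1]))
    count = 0
    for score, is_correct in ordered:
        if not is_correct:
            return count, score
        count += 1
    return count, None


# Hypothetical top-hit scores pooled over four targets.
pooled = [(7.4, True), (6.0, False), (9.1, True), (5.5, True)]
print(hits_above_first_false_positive(pooled))
```

Comparing the returned threshold with a method's own confidence threshold shows whether the true positives found in this way would also have been reported as confident assignments.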
Beyond genome analysis, automated fold-recognition servers give the
wider community ready access to the software. It is therefore essential
that the accuracy of automatic fold-recognition methods is evaluated,
to allow non-expert users to decide which methods are most reliable. The
CASP experiment has already highlighted the value of blind trials; of
course this must be extended to CAFASP. Although the results discussed
here are not from a blind trial, we consider that one important aspect of this
study is to explore what kind of strategy is required for comparative
blind trials of automated fold recognition. We intend CAFASP2 to be
just such a blind trial, which will thus provide an invaluable insight into
the abilities and limitations of automated protein fold recognition.
[1] Murzin, A.G. and Brenner, S.E. and Hubbard, T. and Chothia, C.,
SCOP: a structural classification of proteins database for the investigation of sequences and structures.
J. Molecular Biology, 247, 536-540, 1995.
[2] Jones, D.T. GenThreader
J. Molecular Biology, In press, 1999.
[3] Jones D.T., Taylor W.R., Thornton J.M. A new
approach to protein fold recognition. Nature, 358:86-89, 1992.
[4] Altschul, S.F. et al.,
Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res., 25, 3389-3402, 1997.
[5] Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B.
& Thornton. J.M. (1997) CATH - a hierarchic classification of protein
domain structures. Structure. 5, 1093-1108.
[6] Fischer, D. and Eisenberg, D. Fold Recognition
Using Sequence-Derived Predictions. Protein Science,
5:947-955, 1996.
[7] Rost, B. and Sander, C.,
Prediction of protein secondary structure at better
than 70% accuracy,
J. Molecular Biology 232:584-599, 1993.
[8] Bairoch, A. and Boeckmann, B.,
The SWISS-PROT protein sequence data bank.
Nucl. Acids Res. 20, 2019-2022, 1992.
[9] Altschul, S.F. and Gish, W. and Miller, W. and Myers, E.W. and Lipman, D.J.
Basic local alignment search tool.
J. Molecular Biology 215, 403-410, 1990.
[10] Genetic Computer Group, 1991.
[11] Fischer, D. and Elofsson, A. and Rice, D.W. and Eisenberg, D.
Assessing the performance of inverted protein folding
methods by means of an extensive benchmark.
Proc. 1st. Pacific Symposium on Biocomputing 300-318, Jan. 1996.
[12] K. Karplus, C. Barrett, and R. Hughey.
Hidden Markov Models for detecting Remote Protein Homologies
Bioinformatics to appear 1999.
[13] Murzin, A.G. and Bateman, A. Distant Homology Recognition Using Structural
Classification of Proteins.
Proteins, 105-112, 1997.
[14] Bryant, S.H. and Lawrence, C.E., An empirical energy function
for threading protein sequence through folding motif.
Proteins, 16, 92-112, 1993.
[15] Hendlich, M. and Lackner, P. and Weitckus, S. and Floeckner, H. and Froschauer, R. and Gottsbacher, K. and Casari, G. and Sippl, M.J.
Identification of native protein folds amongst a large number of incorrect models. The calculation of low energy conformations from potentials of mean force.
J. Molecular Biology 216, 167-180, 1990.
[16] Fischer, D. and Eisenberg, D.
Assigning folds to the proteins encoded by the genome of Mycoplasma
genitalium.
Proc. Nat. Acad. Sci. 94:11929-11934, 1997.
[17] Rychlewski, L., Zhang, B., Godzik, A.,
Functional insights from structural predictions: analysis of the Escherichia
coli genome
Protein Science, in the press, 1999.
[18] Arne Elofsson, Daniel Fischer, Danny W. Rice, Scott M. LeGrand & David Eisenberg,
A study of combined structure-sequence profiles. Folding & Design
1, 451-461, 1996
[19] Huynen et al, Homology-based fold predictions for Mycoplasma genitalium Proteins.
J. Molecular Biology 280(3):323-6, 1998.
[20] Rost, B., TOPITS: Threading one-dimensional predictions into
three-dimensional structures.
Proc. Conf. Intelligent Systems in Molecular Biology, ISMB-95,
314-321, 1995.
[21] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler,
T. Hubbard, and C. Chothia.
Sequence Comparisons Using Multiple Sequences Detect
Twice as Many Remote Homologues As Pairwise Methods.
J. Molecular Biology
284(4):1201-1210, 1998.
Acknowledgments
We thank
J. Moult for his interest and support and for
inviting us to join this Special Issue of Proteins;
the many users of our servers and the many CASP3 participants
who encouraged us to bring about this event in very strict time
constraints;
the experimentalists who permitted
their sequences to be used as benchmarks in CASP3 (and CAFASP1);
and last, we thank the Internet technology advances that
allowed CAFASP1 to take place.
L A Kelley is supported by GlaxoWellcome.
LAK, RMM and MJES thank Dr Saqi (GlaxoWellcome) for helpful discussions.
A.G. and K.P. are supported by NIH grant no. GM60049.
KJK is supported by NSF Grants BIR-9408579 and DBI-9808077,
and DOE grant DE-FG03-95ER62112.
References