CAFASP2 ALIGNMENT ACCURACY EVALUATION
This document contains four sections:
- 1. Evaluation Description.
- 2. Main Results.
- 3. Additional Automatic Evaluations.
- 4. Conclusions.
DISCLAIMER
The results of the automatic Fold Recognition evaluation are presented here
and are independent of any other human
assessment that may be carried out.
1. Evaluation Description
In this section we briefly summarize the
evaluation procedure applied:
- The Scoring Functions used.
- Multiple models considered for each target.
- Multi-domain partitioning.
- Target classification into Homology Modeling Targets, Easy, and Hard Fold Recognition Targets.
- The "N-1" ranking procedure.
- The categories of predictions: on-time and late.
- Links to the targets, models and all the data used in the evaluation.
2. Main Results
WARNING
In the results below, the servers are identified by their casp-id number.
The name and casp-id number of each server are
given here.
These results correspond to the MaxSub evaluation
applied on the on-time, first models only.
The results are shown in a number of partitions.
The main result of the evaluation is based on the
MaxSub evaluation on the 26 Fold Recognition Targets,
but we also present the evaluation on the 15 Homology
Modeling Targets.
| Set | # Targets | Maximum Correct | Best Servers | Second Best Servers | Third Best Servers | Summary |
| Homology Modeling Targets (All) | 15 | 15 | 093 106 107 132 260 | 108 111 259 | 103 395B | HM Table |
| Fold Recognition Targets (All) Main Result | 26 | 5 | 093 106 108 395B | 103 259 | 132 260 | FR Table |
Servers Appearing at the top ranks:
- 093 106 107 108 132 260 395B.
We subdivide the FR Targets into 5 easy targets
(T0100, T0101, T0104, T0109 and T0127)
and 21 hard targets, and present the subtotals for the
easy and hard targets separately:
| Set | # Targets | Maximum Correct | Best Servers | Second Best Servers | Third Best Servers | Summary |
| Easy Fold Recognition Targets (Easy) | 5 | 4 | 093 106 108 395B | 259 260 | 395 | Easy FR |
| Hard Fold Recognition Targets (Hard) | 21 | 2 (results not significant) | 103 127 132 | 216 | 108 280 | Hard FR |
As this subdivision shows, it is hard to derive significant conclusions from
the evaluation on the hard FR targets alone, because our evaluation awards
credit to very few models. To gain better insight into the servers' performance
on the hard FR targets, other evaluation methods are needed, including
those with more lenient thresholds, sequence independent ones, or others
(see below).
A further subdivision of the HM targets
into 4 harder HM targets
(T0089, T0090, T0092 and T0103)
and 11 easy HM targets, and a partition combining the 4 harder HM
targets with the 5 easy FR targets, are presented here.
In addition, for each target we present the
detailed results of the
MaxSub evaluation, which includes individual statistics on all models
(including models 2 to 5 and late submissions).
Specificity Analysis.
Fold recognition servers usually report a list of top hits for each prediction, along
with an assigned score. We used this information to compute
the specificity (selectivity) of the servers, along the lines of CAFASP1.
We classified the hits as correct or incorrect using the
MaxSub scores (FR Table above).
Then, for each server, we
counted the correct hits that scored above the
server's highest-scoring incorrect hit. We did the same
for the scores of the subsequent incorrect hits.
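The counting step described above can be sketched as follows. This is a minimal illustration, not the evaluation code actually used: the function name `correct_before_false_positives`, the `(score, is_correct)` hit representation, and the example scores are all hypothetical. It also assumes that a higher server score means a more confident hit (for servers reporting e-value-like scores, the sign would need to be flipped) and that a correct hit tied with an incorrect one does not count as scoring above it.

```python
from typing import List, Tuple

# A hit is (server-assigned score, is_correct), where is_correct is the
# correct/incorrect label derived from the MaxSub evaluation.
Hit = Tuple[float, bool]


def correct_before_false_positives(hits: List[Hit], max_fp: int = 3) -> List[int]:
    """Return, for k = 1..max_fp, the number of correct hits that score
    strictly above the k-th highest-scoring incorrect hit.

    The list is shorter than max_fp if the server has fewer incorrect hits.
    """
    # Sort by descending score; on ties, incorrect hits (False) come first,
    # so a tied correct hit is not counted as scoring "above" the false positive.
    ordered = sorted(hits, key=lambda h: (-h[0], h[1]))

    counts: List[int] = []
    correct_so_far = 0
    for _score, is_correct in ordered:
        if is_correct:
            correct_so_far += 1
        else:
            counts.append(correct_so_far)
            if len(counts) == max_fp:
                break
    return counts


# Hypothetical scores for one server (not real CAFASP2 data).
example_hits = [
    (9.1, True), (8.7, True), (8.0, False),
    (7.5, True), (6.0, False), (5.5, True),
]
print(correct_before_false_positives(example_hits))
```

The first element of the returned list corresponds to the number of correct predictions before the first false positive; the second corresponds to the count when one false positive is allowed, and so on.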
The specificity or selectivity results show that the servers
with the best specificity (106, 107, 259 and 260)
were able to identify most of the easy FR targets with better scores
than their first false positive.
The most selective server had 4 correct
predictions before its first false positive.
Using the classification of correct and incorrect predictions according
to the MaxSub scores at the larger threshold of 5.0 A (see the Additional Automatic Evaluations section below),
a similar result is obtained, but at this
threshold, servers 108 and 395 ranked higher.
Allowing one false positive, servers 132, 260 and 395 are at the top with
6 correct answers. (See also the specificity analysis using a CAFASP1-like
scoring system below).
3. Additional Automatic Evaluations
In addition to the above "official" evaluation, we have performed
various other evaluations, including:
- MaxSub evaluation using
  - the best of the top 5 models and
  - normalization by Structural Alignment
  - first models but at a MaxSub threshold of 5.0 A
- lgscore evaluation
- CAFASP1-like evaluation
- touch evaluation (experimental)
The results from these additional evaluations were very similar to
the official result above, showing only minor differences in the final ranking
of the servers.
4. Conclusions
What did the servers succeed in predicting?
It is clear that with the strict MaxSub evaluation criteria,
which considered only on-time first models,
good models were predicted for all 15 HM targets and for the 5
easy FR targets. However, there was much lower success among
the 21 hard FR targets. When it is clearly determined which of
these 21 targets correspond to new folds, then we will be
able to know how many of these targets could not possibly be
predicted by fold recognition. Nevertheless, even for a target
with a known fold,
the fact that MaxSub scored a prediction with a zero
does not necessarily imply that the prediction is totally wrong. In some
cases, a prediction may have identified the correct parent but
due to alignment errors, only small regions were modeled accurately.
Because of the relatively low sensitivity of our automated evaluation,
a "human-expert" evaluation is required
to learn more about the prediction capabilities of the servers among the
21 hard FR targets. The casp human assessor may provide some
additional insights when he/she assesses the servers' models of these
hard FR targets.
In addition,
we searched for any predictions that may have
captured something about the true fold, and we have found that:
- No potentially valuable prediction could be found for targets
T0086, T0094, T0096.2, T0105, T0106, T0118, T0120, T0126.
- Valuable predictions that may have captured (at least in part) the
correct fold or a correct motif were found for targets
T0087.1 , T0087.2, T0091, T0097, T0098, T0102, T0107, T0108,
T0110, T0114, T0115, T0116, T0121.2.
- Some of the especially interesting predictions not scored in the above tables
were:
Here is a list of main conclusions that we can draw from CAFASP2:
- Hard to assess beyond the 15 HM targets and the 5 easy FR targets.
- Hard to use automatic evaluation on hard cases and especially when only a few
hard targets exist. To discriminate borderline predictions, more accurate automatic evaluation methods
are needed, although it is not clear how useful such predictions might be.
- At this point, the conclusions listed here are mainly based on the
evaluation within the easier FR targets.
- 4 servers are better than the rest: ffas (395B), threaders, 3dpssm and inbgus,
but fugue is not far behind.
These servers are significantly better than pdbblast, even within the HM targets alone.
sam-t99 also appears to follow closely after the top 4 servers, also showing
excellent performance in the HM targets, although with lgscore it ranked second.
- HM servers not better than FR servers on HM targets.
- The additional automatic evaluations generally confirmed the above findings, with very minor exceptions.
- Selectivities as bad as in cafasp1, but the difficulty of targets
has increased significantly. Selectivity on the 5 easy FR targets is
good.
- Among the new servers that did not participate in CAFASP1, fugue is approaching the performance of the
top 4 servers.
- The ab initio server isites appears to give interesting, promising models for the targets where FR fails.
- For future CAFASP experiments, the raw output will be required to be
in PDB format containing at least C-alpha atoms.
- Taken together, the servers as a group identified roughly twice as
many correct targets as the best individual server.
To determine how useful the servers as a group might have been for
a human predictor, it would be interesting to evaluate human participants
at casp who used the servers' results, as well as the cafasp-consensus
group predictions filed at casp.
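The comparison between the servers as a group and the best single server amounts to comparing the union of correctly predicted targets with the largest per-server set. A minimal sketch, assuming each server's correct targets are available as a set; the function name `group_vs_best` and the server and target labels below are hypothetical placeholders, not CAFASP2 data:

```python
from typing import Dict, Set, Tuple


def group_vs_best(correct_by_server: Dict[str, Set[str]]) -> Tuple[int, str, int]:
    """Return (targets correct for at least one server,
    name of the best single server, number of targets that server got right)."""
    if not correct_by_server:
        raise ValueError("no servers given")
    # A target counts for the group if any server predicted it correctly.
    group_correct = set().union(*correct_by_server.values())
    # The best server is the one with the most correct targets
    # (the first such server in insertion order if there is a tie).
    best = max(correct_by_server, key=lambda s: len(correct_by_server[s]))
    return len(group_correct), best, len(correct_by_server[best])


# Hypothetical placeholder data.
example = {
    "server_a": {"t1", "t2"},
    "server_b": {"t2", "t3"},
    "server_c": {"t4"},
}
n_group, best_server, n_best = group_vs_best(example)
print(n_group, best_server, n_best, n_group / n_best)
```

A ratio near 2 in the last printed value would correspond to the "roughly twice as many" observation above; note that this upper bound assumes a human predictor could always pick the correct server for each target.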
CAFASP2 url: http://www.cs.bgu.ac.il/~dfischer/CAFASP2