Here is a list of the main conclusions that we can draw from CAFASP2:
- Hard to assess beyond the 15 HM targets and the 5 easy FR targets.
- Hard to use automatic evaluation on hard cases, especially when only a few
hard targets exist. To discriminate between borderline predictions, more accurate
automatic evaluation methods are needed, although it is not clear how useful
such predictions might be.
- At this point, the conclusions listed here are mainly based on the
evaluation within the easier FR targets.
- 4 servers are better than the rest: ffas (395B), threaders, 3dpssm and inbgus,
but fugue is not far behind.
These servers are significantly better than pdbblast, even within the HM
targets alone. sam-t99 also appears to follow closely behind the top 4 servers
and shows excellent performance on the HM targets, although with lgscore it
ranked second.
- HM servers not better than FR servers on HM targets.
- The additional automatic evaluations generally confirmed the above findings, with very minor exceptions.
- Selectivities are as bad as in CAFASP1, but the difficulty of the targets
has increased significantly. Selectivity on the 5 easy FR targets is
good.
- Among the new servers that did not participate in CAFASP1, fugue is approaching
the performance of the top 4 servers.
- The ab initio server isites appears to give interesting, promising models for
the targets where FR fails.
- For future CAFASP experiments, the raw output will be required to be
in PDB format containing at least C-alpha atoms.
- Taken together, the servers as a group identified roughly twice as many
correct targets as the best single server.
To determine how useful the servers as a group might have been for
a human predictor, it would be interesting to evaluate the human participants
at CASP who used the servers' results, as well as the CAFASP-consensus
group predictions filed at CASP.
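
The comparison in the last point (servers as a group versus the best single
server) can be illustrated with a minimal sketch. It assumes that, for each
server, we already have the set of targets judged correct. The function name
group_vs_best, the server labels and the target labels are all hypothetical
toy values, not CAFASP2 data.

```python
from typing import Dict, Optional, Set, Tuple


def group_vs_best(
    correct_by_server: Dict[str, Set[str]]
) -> Tuple[int, Optional[str], int]:
    """Compare the servers as a group with the best single server.

    correct_by_server maps a server name to the set of target IDs for which
    that server produced a correct prediction (how "correct" is decided is
    outside the scope of this sketch).

    Returns (targets covered by the group, best server name, best server count).
    """
    if not correct_by_server:
        return 0, None, 0

    # A target counts for the group if at least one server got it right.
    group_correct = set().union(*correct_by_server.values())

    # The best single server is the one with the most correct targets.
    best_server = max(correct_by_server, key=lambda s: len(correct_by_server[s]))

    return len(group_correct), best_server, len(correct_by_server[best_server])


# Hypothetical toy input, for illustration only.
toy = {
    "server_a": {"t1", "t2", "t3"},
    "server_b": {"t3", "t4"},
    "server_c": {"t5", "t6"},
}

group_count, best_name, best_count = group_vs_best(toy)
print(group_count, best_name, best_count)
```

In this toy input the group covers 6 targets while the best single server
covers 3. Note that this count says nothing about selectivity: knowing which
server to trust for a given target is a separate problem.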