Dr. Daniel Fischer
Mini-Project in Bioinformatics
Build a web server to assess the quality of protein models.
- Milestone for Nov. 1 (postponed to Nov. 4 before midnight):
- Learn HTML, Perl, and CGI basics, and build an HTML form to upload two files.
- Milestone for Nov. 15 (next meeting: Nov. 15):
- Have the HTML interface ready to accept a predicted model and a real protein as input.
- Obtain the list of C-alphas that appear in both models and superimpose them using besl.
- Report the RMS and the transformation.
- Show each model in a different window using a graphics program (RasMol, Molscript, or PDB3D, all freely available through the internet). You can choose the one you like,
but you will have to learn how to use Molscript anyway.
- Show the superposition of the predicted model onto the real protein in a third window. Color
each atom of the predicted model according to its distance to the real protein.
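The C-alpha matching and per-atom distance steps of this milestone can be sketched as follows. This is a minimal illustration, not the required implementation: it assumes both files use standard fixed-column PDB ATOM records, that each file holds a single chain, and that the model and the real protein share residue numbering. The function names are hypothetical, and the superposition itself (done with besl) is not shown; the distances are meaningful only after the model coordinates have been transformed.

```python
import math

def read_ca_atoms(pdb_lines):
    """Map residue number (plus insertion code) to the (x, y, z) of its C-alpha."""
    atoms = {}
    for line in pdb_lines:
        # Atom name occupies columns 13-16; residue number and insertion code 23-27.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = line[22:27].strip()
            if key not in atoms:  # keep only the first alternate location
                atoms[key] = (float(line[30:38]),
                              float(line[38:46]),
                              float(line[46:54]))
    return atoms

def common_residues(model, real):
    """Residues whose C-alpha appears in both structures, in model order."""
    return [key for key in model if key in real]

def ca_distances(model, real, keys):
    """Distance between corresponding C-alphas (model assumed already superimposed)."""
    result = {}
    for key in keys:
        dx, dy, dz = (m - r for m, r in zip(model[key], real[key]))
        result[key] = math.sqrt(dx * dx + dy * dy + dz * dz)
    return result

def rms(distances):
    """Root mean square of the per-atom distances."""
    values = list(distances.values())
    return math.sqrt(sum(d * d for d in values) / len(values))
```

The per-residue distances returned by `ca_distances` are the values that would drive the coloring in the third window; how distances map to colors is left to the graphics program chosen.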
- Milestone for Nov. 29 (next meeting: Nov. 29):
- Allow the user to enter a file in AL format, and automatically convert it to PDB format using AL2TS
(at http://predictioncenter.llnl.gov/local/al2ts/al2ts.html). You can manually get AL files
at http://PredictionCenter.llnl.gov/casp3/results/casp3-browsers.cgi. There you can
click "3D coordinate predictions", select target number(s) and group name(s) or all groups,
click start, click "look at the results", and then if the second entry name contains AL,
click on the first column (the name of the group), and a list of that group's predictions
in AL format appears.
- Allow the user to enter a PDB code, and automatically download the PDB file from the PDB site (at
ftp://ftp.rcsb.org/data/structures/all/pdb/). To know which PDB codes correspond to the
T00xx models, you can see the table at http://PredictionCenter.llnl.gov/casp3/targets/templates/targets.html (for example, T0043 has PDB code 1HKA).
- Milestone for Dec. 6:
- Add an automatic option to fetch AL files from http://PredictionCenter.llnl.gov/casp3/results/casp3-browsers.cgi.
- Run MaxSub for the model and the PDB file, and report numerical and graphic results. The object code of
MaxSub is available at ~dfischer/.html/miniproject/maxsub. To run it, call it with 3 arguments:
pred_file pdb_file threshold, where pred_file is the prediction file, pdb_file is the PDB file, and threshold
is a float value of the maximum threshold allowed (for now use 4.0). This is an example
output file. What you need is the list of CAs given by "REMARK YOUR GOOD MODEL COORDINATES".
You also have to compute a "GL" score using the following formula: for each CA, compute its distance
d_i to its corresponding atom. Compute the score as the sum over all i of 1.0 / (1.0 + (d_i/d0)^2), where d0
is the threshold used in MaxSub. Then compute ngaps as the number of gaps in the subset. Finally,
output the GL score as: score - ngaps/2.0.
- After submission DO NOT automatically open the graphical window. Provide only clickable
items that open the window when clicked. Also allow for a numerical output,
listing the distance after superposition of each atom (this needs to be done for the
full superposition and also for the subset superposition of MaxSub).
- Generate files NOT in your writable directory, and with dynamic names.
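The GL score from this milestone can be sketched as below. This is only an illustration under stated assumptions: the distances d_i are assumed to be already computed for the CAs of the subset, and a "gap" is assumed to mean a break in otherwise consecutive residue numbers within the subset (the milestone does not define it more precisely). The function names `count_gaps` and `gl_score` are hypothetical.

```python
def count_gaps(residue_numbers):
    """Count breaks in consecutive numbering (assumed definition of a gap)."""
    numbers = sorted(residue_numbers)
    return sum(1 for a, b in zip(numbers, numbers[1:]) if b - a > 1)

def gl_score(distances, residue_numbers, d0=4.0):
    """GL = sum_i 1.0 / (1.0 + (d_i/d0)^2) - ngaps/2.0.

    distances       -- d_i for each CA in the subset
    residue_numbers -- integer residue numbers of the same CAs
    d0              -- the threshold that was passed to MaxSub
    """
    score = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    ngaps = count_gaps(residue_numbers)
    return score - ngaps / 2.0
```

For example, with d0 = 4.0, three CAs at distances 0.0, 4.0 and 4.0 contribute 1.0 + 0.5 + 0.5 = 2.0; if their residue numbers are 1, 2 and 5 there is one gap, so the GL score is 2.0 - 0.5 = 1.5.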
- Milestone for Dec. 13 (next meeting Dec. 13):
- Let the user select the subset to consider, instead of the one obtained by MaxSub. This should be
done with two options. The first is to show a list
of the CAs and let the user click those to select. The second option is to show the model
graphically in an applet and let the user select the desired atoms by clicking in the applet. In this option,
color the selected atoms in some special color, and in another window, show the list of selected atoms.
A double click unselects an atom. After selection, provide a "done" button. The applet should show the model and PDB structure
superimposed using all atoms. The deadline for this second option is the end of the project.
- Milestone for the end of project:
- Run MaxSub (not through the web) and compute the GL score for each of the predictions of each of the PDBs
listed below, and
create a summary table of scores, with totals by group. The list will appear here soon. The table should
be a web document with clickable items to see details. Produce 5 different tables, for 5 different thresholds:
2.0, 3.0, 4.0, 6.0 and 8.0. Think of how to compare the tables and say something clever about them.
You can find here the list of targets.
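The "totals by group" part of the summary table could be computed along these lines. This is a sketch only: it assumes the GL scores for one threshold have been collected into a dictionary keyed by (group, target), which is a hypothetical layout, and it leaves out the HTML generation and the clickable details.

```python
from collections import defaultdict

def totals_by_group(scores):
    """Sum GL scores per group; scores maps (group, target) -> GL score."""
    totals = defaultdict(float)
    for (group, _target), gl in scores.items():
        totals[group] += gl
    return dict(totals)

def summary_rows(scores):
    """Rows (group, total) sorted from the highest total to the lowest."""
    return sorted(totals_by_group(scores).items(),
                  key=lambda row: row[1], reverse=True)
```

Calling `summary_rows` once per threshold (2.0, 3.0, 4.0, 6.0 and 8.0) would give five rankings whose order can then be compared.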
Material for the mini project
- Routines to compute the optimal superposition: besl.c and eigen.f.
- How to compute T(p) (the transformation T of a point p).
- Directory with predicted protein models can be found at: ~bioinbgu/erez/CASP3/
- Directory with real protein models can be found at: ~bioinbgu/erez/pdbs/
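As a rough illustration of applying a transformation T to a point p, the sketch below assumes T is given as a 3x3 rotation matrix R and a translation vector t, applied as T(p) = R p + t. That convention is an assumption; the order of rotation and translation actually produced by besl.c should be checked against the course material. The function name `transform_point` is hypothetical.

```python
def transform_point(rotation, translation, p):
    """Return R p + t for a 3x3 rotation (list of rows), a translation and a point."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Example: rotate 90 degrees about the z axis, then shift by (1, 2, 3).
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = (1.0, 2.0, 3.0)
moved = transform_point(R, t, (1.0, 0.0, 0.0))
```

Applying the same function to every atom of the predicted model gives the coordinates to display in the superposition window.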