Computational Geometry and 3D
Structure in Molecular Biology
|
Honors class
Spring 2002
|
Klara
Kedem - klara@cs.bgu.ac.il , room 308,
Bldg 58, 647-7845.
Contents
Overview
My goal is to look at some algorithmic problems
related to three-dimensional structures in chemistry and molecular biology,
primarily from the perspective of computational geometry. There will be a brief
introduction on computational geometry.
Schedule
- Week 1:
Introduction and some neat examples in computational geometry
- Week 2:
Analyzing protein shape with the unit-vector RMS
- Week 3: Finding
the consensus shape for a protein family
Abstracts of talks
- Week 1: TBA
- Week
2 (and paper): We consider the problem of identifying common
three-dimensional substructures between proteins. Our method is based
on comparing the shape of the alpha-carbon backbone structures of the
proteins in order to find 3D rigid motions that bring portions of the
geometric structures into correspondence. We propose a geometric
representation of protein backbone chains that is compact yet allows for
similarity measures that are robust against noise and outliers. We
represent the structure of the backbone as a sequence of unit vectors,
defined by each adjacent pair of $\alpha$-carbons; we then define a measure
of the similarity of two protein structures based on the RMS (root mean
squared) distance between corresponding orientation vectors of the two
proteins.
Our measure has several advantages over measures that are commonly used for
comparing protein shapes, such as the minimum RMS distance between the 3D
positions of corresponding atoms in two proteins. This new measure
behaves well for identifying common substructures, in contrast with
position-based measures where the nonmatching portions of the structure
dominate the measure. At the same time, it avoids the quadratic space
and computational difficulties associated with methods based on distance
matrices and contact maps. We show applications of our approach to
detecting common contiguous substructures in pairs of proteins, as well as
the more difficult problem of identifying common protein domains (i.e.,
larger substructures that are not necessarily contiguous along the protein
chain).
- Week
3 (and paper): We define and prove properties of the
"consensus shape for a family of proteins", a protein-like
structure that provides a compact summary of the significant structural
information for a protein family. If all members of a protein family
exhibit a geometric relationship between corresponding alpha carbons then
that relationship is preserved in the consensus shape. In particular,
distances and angles that are consistent across family members are
preserved. For the consensus shape, the spacing between successive
alpha carbons is variable, with small distances in regions where the members
of the protein family exhibit significant variation and large distances (up
to the standard spacing of about 4 angstrom) in regions where the family
members agree. Despite this non-protein-like characteristic, the
consensus shape preserves and highlights important structural
information. We describe an iterative algorithm for computing the
consensus shape and prove that the algorithm converges. We also present the
results of experiments in which we build consensus shapes for several known
protein families.
List of Papers
Below, we list some papers on relevant topics that might
be interesting:
We use some abbreviations for various journals
and conference proceedings below:
- SCG = Proceedings of the ACM Symposium on Computational
Geometry
- JMB = Journal of Molecular Biology
- PSFG = Proteins: Structure, Function, and Genetics
- JCICS = J. Chem. Inf. Comput. Sci.
Structural Domains in Proteins
- G.M. Crippen, ``The tree structural organization
of proteins.'' JMB 126(1979), 315.
- G.D. Rose. ``Hierarchic organization of domains
in globular proteins.'' JMB 134(1979), 447.
- L. Holm, C. Sander. ``Parser for protein folding
units.'' PSFG 19(1994), 256.
- R. Srinivasan, G.D. Rose. ``LINUS: A hierarchic
procedure to predict the fold of a protein.'' PSFG 22(1995), 81.
- L. Holm, C. Sander. ``Searching protein structure
databases has come of age.'' PSFG 19(1994), 165.
Distance Geometry and Molecular Conformation
- G.M. Crippen, T.F. Havel, "Distance Geometry
and Molecular Conformation." Research Studies Press LTD, 1988, pp.1--23,
139--191.
- B. Hendrickson, "Conditions For Unique Graph
Realizations." Siam Journal of Computing, Vol. 21, No. 1, February
1992, pp. 65--84.
- B. Hendrickson, "The Molecule Problem: Exploiting
Structure in Global Optimization." Siam Journal of Computing, Vol.
5, No. 4, November 1995, pp. 835--857.
Overview Articles on Molecular Similarity
and Protein Resemblance
- L. Holm, C. Sander. ``Mapping the protein universe.'' Science 273(1996),
595.
- J-P. Doucet, J. Weber. ``Molecular similarity.''
in Computer-Aided Molecule Design: Theory and Applications.". Springer,
1996. pp. 328--362.
Geometric Similarity in Drug Design
- G. Milne, S. Wang, M. Nicklaus. ``Molecular modeling
in the discovery of drug leads.'' JCICS 36(1996), 726.
- P. Indyk, R. Motwani, S. Venkatasubramanian.
"Geometric Matching under Noise: Combinatorial Bounds and Algorithms."
- P.W. Finn, L.E. Kavraki, J.C. Latombe, R. Motwani,
S. Venkatasubramanian. ``Search techniques for rational drug design.''
- P.W. Finn, L.E. Kavraki, J.C. Latombe, R. Motwani,
C. Shelton, S. Venkatasubramanian, A. Yao. ``RAPID: Randomized pharmacophore
identification for drug design.'' SCG 1997, 324.
- T. Akutsu, H. Tamaki, T. Tokuyama. ``Distribution
of distances and triangles in a point set and algorithms for computing
the largest common point sets." SCG 1997, 314.
- T. Akutsu, M.M. Halldorsson. ``On the approxiamtion
of largest common subtrees and largest common point sets.'' Lecture Notes
in Computer Science 834, Springer, 1994, 405.
- Lydia E. Kavraki. "Geometry and the Discovery
of New Ligands." Stanford University, Stanford, CA.
Geometric Similarity for Protein Structure
Comparison
- Liisa Holm and Chris Sander. "An
Evolutionary Treasure: Unification of a Broad Set of Amidohydrolases Related
to Urease." PROTEINS: Structure, Function, and Genetics 28:72-82,
Wiley-Liss, Inc., 1997.
- Gerrit Vriend and Chris Sander,
"Detection of Common Three-Dimensional Substructures in Proteins."
PROTEINS: Structure, Function, and Genetics 11:52-58 (1991).
- Christine A. Orengo, Nigel P. Brown,
and William R. Taylor, "Fast Structure Alignment for Protein Databank
Searching." PROTEINS: Structure, Function, and Genetics 14:139-167
(1992).
- Helen M. Grindley, Peter J. Artymiuk,
David W. Rice, and Peter Willett, "Identification of Tertiary Structure
Resemblance in Proteins Using a Maximal Common Subgraph Isomorphism Algorithm."
J. Mol. Biol. (1993) 229, 707-721.
- Liisa Holm and Chris Sander, "Protein
Structure Comparison by Alignment of Distance Matrices." J. Mol. Biol.
(1993) 233, 123-138.
- Unit-vector
RMS (URMS) as a tool to analyze molecular dynamics trajectories, K.
Kedem, L.P. Chew and R. Elber, Proteins: Structure, Function and
Genetics, 37(1999), pp. 554-564.
- Fast
detection of geometric substructure in proteins, L.P. Chew, K. Kedem,
D.P. Huttenlocher and J. Kleinberg, Journal of Computational Biology,
6:(3-4)(1999), pp. 313-325.
- Finding
the consensus shape of a protein family, L.P.Chew and K. Kedem, ACM
18th Symp. on Computational Geometry,
Barcelona, Spain, June 2002.
Geometric Hashing
- Daniel Fischer, Raquel Norel, Ruth Nussinov, Haim
J. Wolfson. "3-D Docking of Protein Molecules."
- Daniel Fischer, Ruth Nussinov, Haim J. Wolfson.
"3-D Substructure Matching in Protein Molecules."
- Haim J. Wolfson. "Model-Based Object Recognition
by Geometric Hashing."
- Yehezkel Lamdan, Haim J. Wolfson. "Geometric
Hashing: A General and Efficient Model-Based Recognition Scheme."
Proceedings of the Second International Conference on Computer Vision,
Tampa, FL, December 5-8, 1988.
- Bilha Sandak, Ruth Nussinov and Haim J. Wolfson.
"Docking of Conformationally Flexible Proteins." Lecture Notes
in Computer Science, Combinatorial Pattern Matching, 7th Annual Symposium,
CPM 96, Laguna Beach, CA, June 1996.
- Daniel Fischer, Shuo Liang Lin, Haim J. Wolfson
and Ruth Nussinov. "A Geometry-based Suite of Molecular Docking Processes."
JMB (1995) 248, 459-477.
- Raquel Norel, Shuo L. Lin, Haim
J. Wolfson, and Ruth Nussinov. "Shape Complementarity at Protein-Protein
Interfaces." Biopolymers, Vol. 34, 933-940 (1994).
Geometric similarity in drug design
- G. Milne, S. Wang, M. Nicklaus. ``Molecular modeling
in the discovery of drug leads.'' JCICS 36(1996), 726.
Simplified Geometric Models of
Protein Folding
- S. Sun. P.D. Thomas, K. Dill. ``A simple protein
folding algorithm using a binary code and secondary structure constraints.''
Protein Engineering 8(1995), 769.
- K. Yue, K. Dill. ``Folding proteins with a simple
energy function and extensive conformational searching.'' Protein Science
5(1996), 254.
Some Relevant
WWW Links
- TRADES
background
on proteins from the Samuel Lunenfeld Research Institute
- The Protein Data Bank (PDB)
- Structural classification of
proteins (SCOP)
- A
large resource page on computational biology at George Mason University.
- A
large resource page on bioinformatics at the Institut Pasteur.
- CARB Biocomputing
Resources.
- A
list of protein folding groups on the web.
- The
WWW Virtual Library page on biomolecules.
Return to top of page