Thomas Hamelryck
PLEASE SEE MY GROUP'S WEBSITE FOR UP-TO-DATE INFORMATION
I am group leader (associate professor) of the structural bioinformatics group in the Bioinformatics Center (led by Prof. Anders Krogh) at the University of Copenhagen. I have a background in biotechnology and macromolecular crystallography, but am currently active in the field of structural bioinformatics. My main aim is the development of protein and RNA structure prediction, simulation and design methods that make use of probabilistic models.
Address
Thomas Hamelryck
Bioinformatics Center
University of Copenhagen
Room 1.2.22
Ole Maaloes Vej 5
2200 Copenhagen
Denmark
E-mail: thamelry -at- binf.ku.dk
Tel: +45 35321278
Research in structural bioinformatics
Bioinformatics is the study of large-scale problems in molecular biology using computational tools. Structural bioinformatics studies problems that are associated with macromolecular structure. My research focuses on the prediction of protein and RNA 3D structure, and on related problems such as protein design, simulation of protein dynamics and inferential protein structure determination.
Probabilistic models of protein structure
FB5-HMM and TorusDBN are probabilistic models of protein structure, based on Dynamic Bayesian Networks and directional statistics. The models can be used to generate protein-like conformations that are compatible with a given amino acid sequence, on a local length scale. We expect that this approach to protein structure sampling will replace fragment library approaches in the very near future, since it is conceptually elegant, computationally efficient and fully probabilistic. The latter is of great importance in Markov Chain Monte Carlo simulations of protein structure. An article describing FB5-HMM, a model of protein C-alpha geometry, made the cover of the September 2006 issue of PLoS Computational Biology. In 2008, we published a related model of the full protein backbone (called TorusDBN) in PNAS.
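To give a feel for what "generating conformations from a probabilistic model" means, here is a minimal toy sketch in Python. It is not TorusDBN or FB5-HMM: it assumes a two-state hidden Markov chain whose states emit backbone (phi, psi) angles from independent von Mises distributions, and all state names, means, concentrations and transition probabilities are made up for illustration.

```python
import math
import random

# Hypothetical hidden states with mean (phi, psi) angles in radians and a
# concentration parameter kappa; all numbers are illustrative only.
STATES = {
    "helix": {"phi": math.radians(-60), "psi": math.radians(-45), "kappa": 20.0},
    "strand": {"phi": math.radians(-120), "psi": math.radians(130), "kappa": 10.0},
}
# Hypothetical transition probabilities between hidden states (rows sum to 1).
TRANSITIONS = {
    "helix": {"helix": 0.9, "strand": 0.1},
    "strand": {"helix": 0.1, "strand": 0.9},
}

TWO_PI = 2.0 * math.pi


def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % TWO_PI - math.pi


def sample_angles(length, rng):
    """Sample a list of (state, phi, psi) tuples from the toy model."""
    state = rng.choice(sorted(STATES))
    samples = []
    for _ in range(length):
        p = STATES[state]
        # random.vonmisesvariate expects a mean angle in [0, 2*pi).
        phi = wrap(rng.vonmisesvariate(p["phi"] % TWO_PI, p["kappa"]))
        psi = wrap(rng.vonmisesvariate(p["psi"] % TWO_PI, p["kappa"]))
        samples.append((state, phi, psi))
        # Draw the next hidden state from the current state's transition row.
        row = TRANSITIONS[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
    return samples


samples = sample_angles(10, random.Random(0))
```

Because every step is a draw from an explicit distribution, the probability of any sampled angle sequence can in principle be evaluated, which is the property that matters for Markov Chain Monte Carlo.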
Loop closure
In protein structure prediction, it is often important to construct a protein segment that bridges two fixed segments. This non-trivial problem is often tackled using algorithms from the field of robotics, i.e. inverse kinematics methods. We recently designed a novel algorithm called Full Cyclic Coordinate Descent (FCCD) that is fast, easy to implement and extremely flexible. It is especially efficient for rebuilding the protein backbone making use of C-alpha positions only. An article describing the method is published in BMC Bioinformatics.
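The inverse kinematics idea can be illustrated with the classic cyclic coordinate descent (CCD) scheme from robotics. The sketch below is not FCCD: it assumes a simplified planar chain of rigid links with free joint angles, and the function names, link lengths and target are hypothetical. Each joint in turn is rotated so that the chain end points towards the target.

```python
import math


def forward(angles, lengths):
    """Return the joint positions of a planar chain that starts at the origin."""
    x = y = heading = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        heading += angle  # joint angles are relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ccd(angles, lengths, target, tol=1e-3, max_iter=500):
    """Adjust joint angles so that the chain end approaches the target."""
    angles = list(angles)
    for _ in range(max_iter):
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            pivot, end = points[i], points[-1]
            # Rotate everything beyond joint i so the end lies on the
            # line from the pivot to the target.
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            angles[i] += to_target - to_end
        if distance(forward(angles, lengths)[-1], target) < tol:
            break
    return angles


lengths = [1.0, 1.0, 1.0]
target = (1.5, 1.0)
closed = ccd([0.3, 0.3, 0.3], lengths, target)
gap = distance(forward(closed, lengths)[-1], target)
```

In a real loop closure setting the adjustable parameters are backbone dihedral (or pseudo-dihedral) angles in 3D and the target is the fixed anchor segment, but the one-joint-at-a-time update is the same.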
Measuring Solvent Exposure
Half Sphere Exposure (HSE) is a new method to measure amino acid solvent exposure in a protein structure. It is in many ways superior to the conventional measures and only requires the coordinates of the C-alpha atoms. We are using this measure in the context of structure prediction. An article on HSE is published in Proteins.
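As a rough illustration of how such a measure can be computed from C-alpha coordinates alone, here is a minimal Python sketch. It assumes that the side chain direction of residue i is approximated by the sum of the unit vectors pointing from its two chain neighbours towards it, and that the sphere radius is 13 Angstrom; the function name and both choices are assumptions of this sketch, so consult the article for the exact definition.

```python
import math


def half_sphere_exposure(ca_coords, radius=13.0):
    """Return a list of (up, down) C-alpha counts per residue.

    ca_coords is a list of (x, y, z) tuples in chain order. Terminal
    residues lack one neighbour, so their entry is None.
    """
    def sub(a, b):
        return tuple(ai - bi for ai, bi in zip(a, b))

    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return tuple(c / norm for c in v)

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    result = []
    for i, ca in enumerate(ca_coords):
        if i == 0 or i == len(ca_coords) - 1:
            result.append(None)
            continue
        # Approximate side chain direction from the two neighbouring C-alphas.
        d_prev = unit(sub(ca, ca_coords[i - 1]))
        d_next = unit(sub(ca, ca_coords[i + 1]))
        direction = tuple(p + n for p, n in zip(d_prev, d_next))
        up = down = 0
        for j, other in enumerate(ca_coords):
            if j == i:
                continue
            offset = sub(other, ca)
            if math.sqrt(dot(offset, offset)) > radius:
                continue
            # The plane through the C-alpha, perpendicular to the
            # direction vector, splits the sphere into two halves.
            if dot(offset, direction) > 0:
                up += 1
            else:
                down += 1
        result.append((up, down))
    return result


# Tiny made-up example: residue 1 points along +y.
coords = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (1, 5, 0), (1, -5, 0)]
hse = half_sphere_exposure(coords)
```

The pair of counts carries more information than a single contact number, because it distinguishes neighbours on the side chain side from neighbours on the opposite side.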
Mocapy
Mocapy is a toolkit for inference and learning in Dynamic Bayesian Networks (DBNs). A Dynamic Bayesian Network is a machine learning method that can be used to develop probabilistic models of sequences. A DBN can be considered a generalization of the better-known Hidden Markov Model (HMM), but it has much more modelling power. DBNs can for example be used to model protein sequences, or for speech recognition. Inference and maximum likelihood (ML) or maximum a posteriori (MAP) parameter learning are done using Gibbs sampling/Stochastic Expectation Maximization. Currently discrete (that is, Multinomial), Gaussian, Kent, Von Mises-Fisher and Dirichlet nodes are implemented. In practice this means that you can model sequences of symbols (i.e. discrete observations), floats, vectors (of any dimension) and even unit vectors (using the Kent and Von Mises-Fisher nodes). The latter makes it possible, for example, to model bond angles in molecules. Mocapy can handle large datasets and can be run on a cluster computer or a desktop computer with a single CPU. Mocapy was originally implemented in Python, making use of the numpy, SciPy and pyMPI libraries. Mocapy is freely available from SourceForge under the LGPL license, and comes with a 50+ page manual. Mocapy++, a recent, fast re-implementation of Mocapy in C++, is available as well.
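For readers unfamiliar with the HMM special case mentioned above, the following minimal Python sketch computes the likelihood of a symbol sequence under a small discrete HMM with the exact forward algorithm. It does not use Mocapy or its API, and it is not the Gibbs sampling/Stochastic EM scheme that Mocapy uses; the function name and all probabilities are made up for illustration.

```python
def forward_likelihood(observations, start, transition, emission):
    """Return P(observations) under a discrete HMM (forward algorithm).

    start[s]         : probability of starting in hidden state s
    transition[s][t] : probability of moving from state s to state t
    emission[s][o]   : probability that state s emits symbol o
    """
    n_states = len(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            emission[t][symbol]
            * sum(alpha[s] * transition[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model over a two-symbol alphabet.
start = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]

likelihood = forward_likelihood([0, 1, 1], start, transition, emission)
```

A DBN generalizes this picture by allowing several hidden and observed nodes per sequence position, including continuous and directional ones, which is where exact recursions like the one above give way to sampling-based inference.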
Bio.PDB
This is a Python library that allows you to access the data in PDB and mmCIF files. The data in the PDB file is represented by a Structure/Model/Chain/Residue/Atom data structure. The parser also does some integrity checks (e.g. do all atoms and residues have a unique name?). This Python library is part of the Biopython project, a set of freely available Python modules that deal with various aspects of bioinformatics. Be sure to try out the CVS version, which contains some additional goodies and bug-fixes. People who want to contribute are welcome, BTW. An article describing this toolkit is published in Bioinformatics. Bio.PDB comes with extensive documentation.
Bio.PDB's features include:
- Support for mmCIF and PDB files
- Multiple models (e.g. in NMR structures) supported
- Insertion codes are taken into account
- Deals with anisotropic B-values
- Disorder is adequately handled (of atoms or complete residues, e.g. due to point mutations)
- It does a lot of sanity checking
- It's quite fast (10 s for the large ribosomal subunit - 64000 atoms)
- Fast atom neighbor lookup using a KD tree
- Identification of polypeptides
- Superposition of structures
- Various analysis tools (DSSP, residue depth, etc.)
- Coordinates are available as full-fledged Vector objects
- Keeping a local copy of the PDB up-to-date
- Writing PDB files
- Calculation of Half Sphere Exposure (a new solvent exposure measure)
- New features are added regularly!
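A short sketch of how the Structure/Model/Chain/Residue/Atom hierarchy is typically traversed. `PDBParser` and `get_structure` are part of Bio.PDB; the helper names `load_structure` and `count_levels` and the file name in the usage note are hypothetical. The Biopython import is deferred into the loading function so that the counting helper itself has no dependencies.

```python
def load_structure(structure_id, path):
    """Parse a PDB file with Bio.PDB (requires Biopython to be installed)."""
    from Bio.PDB import PDBParser  # deferred import, see the note above

    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    return parser.get_structure(structure_id, path)


def count_levels(structure):
    """Count entities at each level by iterating down the hierarchy."""
    counts = {"models": 0, "chains": 0, "residues": 0, "atoms": 0}
    for model in structure:
        counts["models"] += 1
        for chain in model:
            counts["chains"] += 1
            for residue in chain:
                counts["residues"] += 1
                for _atom in residue:
                    counts["atoms"] += 1
    return counts
```

With Biopython installed and a local file available, usage would look like `count_levels(load_structure("1fat", "1fat.pdb"))`, where the file name is only an example.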
Function from structure
I developed a new algorithm that makes it possible to identify recurring 3D patterns of side chains in a large set of structures (the method was applied to about 800 superfamily domains from the SCOP classification). It can also be used to identify potentially interesting sites in a single structure. The method incorporates a number of novel features that are not found in other similar methods:
- It deals with conservative amino acid substitutions
- It deals with shifted C-alpha positions (i.e. the side chain atom positions coincide, but the backbone position is shifted)
- It can find mirror imaged side chain patterns
- It takes atom label ambiguities into account
- It is very fast and memory efficient thanks to its use of an SR-tree data structure
The method has been used to identify various interesting novel active site similarities, and also identified a putative active site in bacterial luciferase.
The project won the Ishango prize 2001. The Ishango prize is part of the Operation Ishango campaign launched by the Brussels-Capital region to increase awareness of science and encourage young people to take up scientific careers. The competition awards two prizes of 2,500 euro to young researchers or science students working in the region, one French-speaking and one Dutch-speaking.
An article describing the method is published in Proteins.
Presentations
- Biopython's Bio.PDB module (PDF): Computational Representation of Macromolecules workshop, UCSD, San Diego, USA, 9-10/09/03.
- A 2D measure provides a new view of solvent exposure (PDF): Lund University, Lund, Sweden, 25/10/2004.
- ISCB, Greece, July 2007
Teaching
I'm teaching the obligatory Structural Bioinformatics course at the Bioinformatics Center. Topics include introduction to protein structure, prediction of function from structure and prediction of local structure, solvent exposure and tertiary structure. I'm also teaching the Structural Bioinformatics section of the Introduction to Bioinformatics course, and an introduction to Dynamic Bayesian Networks and Mocapy as part of the advanced bioinformatics PhD course.
Former research interests
Protein-carbohydrate interactions
I received my PhD from the Free University Brussels (VUB), Ultrastructure Department, in 1999 on the subject of crystal studies of protein-carbohydrate interactions. I used the legume lectins as a model system to study the general features of carbohydrate binding sites in proteins. This led for example to the discovery of a general give-and-take mechanism that these proteins use to distinguish very similar carbohydrates.
My PhD thesis (gzipped PS|PDF) contains a broad introduction to legume lectin structure.
The proteins I studied include (click on the structure identifiers to go to the PDB):
- Lentil lectin (1LES): Used in a multi-disciplinary (NMR, molecular modeling and crystallography) study of protein-carbohydrate interactions.
- Phytohaemagglutinin-L (1FAT): the thing that makes raw beans toxic by binding to your gut. Every year a substantial number of people get sick from eating raw or insufficiently cooked beans, in which the PHA fraction is not or not fully denatured. The cause of the illness is almost always wrongly attributed to bacterial food poisoning, which has similar symptoms. PHA-L exhibited a novel quaternary structure, which was shown to be important for binding plant hormones (cytokinins).
- Arcelin-5 (1IOA): an insecticidal protein from wild bean strains. This protein is a "truncated" legume lectin, with some surprising features. The biggest surprise was the presence of a specific cis-peptide bond (a conserved feature of the legume lectin family) without the stabilisation of a neighboring metal ion binding site. This site was thought to be necessary for the stabilisation of the cis-peptide bond.
- DBL in complex with adenine (1BJQ), with the Forssman disaccharide (1LU1) and with the blood group A trisaccharide (1LU2) and DB58 (1LUL): two lectins from Dolichos biflorus. The DBL structure (see picture) led to a better understanding of the specificity of proteins that bind N-acetylated sugars, and to the discovery of a general give-and-take mechanism that lectins use to distinguish closely similar carbohydrates. DB58 has a very peculiar quaternary structure. DBL and DB58 both bind plant hormones (cytokinins) as well, in an unusual binding site that depends on the quaternary structure.
- FRIL (1QMO): Flt3 Interacting Lectin, a lectin that keeps haematopoietic progenitors alive in vitro. FRIL forms a very complicated crosslinked lattice in the crystal, which is probably important for its unique biological activity. The FRIL structure showed for the first time how weak protein-protein interactions can become important when so-called cross-linked lectin-sugar lattices are formed. These lattices are believed to be responsible for the creation of a higher-level specificity, and are thought to be of high importance for the biological effects of lectins. For some years, Phylogix, Inc. (located in Boston) developed FRIL-based therapeutics to protect and repair tissues damaged by chemotherapy.
My PhD work was awarded the shared second place by the jury of the DSM prize for chemistry and technology 1999.
Protein architecture
And in addition...
- CV in PDF format
- Biopython logo: Made by 3D graphics designer Henrik Vestergaard.
- Static logo (JPG)
- Animated logo (AVI format) (MPEG format)
- Machine learning in bioinformatics conference, 17 October 2003, Brussels. I organized this conference when I was working in the COMO lab at the Free University Brussels (VUB), Belgium. The speakers' slides are available.
- My Python for Bioinformatics/Scientific computing page.
- My favorite structure prediction-related textbooks on Amazon Listmania (UK)(US).
Publications
BibTeX format
PubMed Query
1995
- Casset, F., Hamelryck, T., Loris, R., Brisson, J., Tellier, C., Dao-Thi, M., Wyns, L., Poortmans, F., Pérez, S. & Imberty, A. (1995) NMR, molecular modeling and crystallographic studies of lentil lectin-sucrose interaction. J. Biol. Chem., 270, 25619-25628 (PDF)
1996
- Dao-Thi, M.-H., Hamelryck, T. W., Poortmans, F., Voelker, T. A., Chrispeels, M. J. & Wyns, L. (1996) Crystallization of Glycosylated and Nonglycosylated Phytohemagglutinin-L. Proteins Struct. Func. Genet., 24, 134-137 (PDF)
- Hamelryck, T.W., Dao-Thi, M., Poortmans, F., Chrispeels, M.J., Wyns, L. & Loris, R. (1996) The Crystallographic Structure of Phytohemagglutinin-L. J. Biol. Chem., 271, 20479-20485. (PDF)
- Hamelryck, T. W., Poortmans, F., Goossens, A., Angenon, G., Van Montagu, M., Wyns, L., & Loris, R. (1996) Crystal Structure of Arcelin-5, a Lectin-like Defense Protein from Phaseolus vulgaris. J. Biol. Chem. 271, 32796-32802 (PDF)
1998
- Loris, R., Hamelryck, T., Bouckaert, J. & Wyns, L. (1998) Legume Lectin Structure. Biochim. Biophys. Acta, 1383, 9-36. (PDF) (Highly cited - see Google Scholar)
- Dao-Thi, M., Hamelryck, T.W., Bouckaert, J., Körber, F., Burkow, V., Poortmans, F., Etzler, M., Strecker, G., Wyns, L. & Loris, R. (1998) Crystallization of two related lectins from the legume plant Dolichos biflorus. Acta Cryst., D54, 1446-1449.
- Hamelryck, T.W., Loris, R., Bouckaert, J. & Wyns, L. (1998) Properties and Structure of the Legume Lectin Family. Trends Glycosci. Glycobiol., 10, 349-404 (PDF) (You'll need Japanese fonts for this one :-)
1999
- Hamelryck, T.W., Loris, R., Bouckaert, J., Dao-Thi, M.-H., Strecker, G., Imberty, A., Fernandez, E., Wyns, L. & Etzler, M.E. (1999) Carbohydrate Binding, Quaternary Structure and a Novel Hydrophobic Binding Site in Two Legume Lectin Oligomers from Dolichos biflorus. J. Mol. Biol., 286, 1161-1177 (PDF)
- Bouckaert, J., Hamelryck, T., Wyns, L., Loris, R. (1999) Novel structures of plant lectins and their complexes with carbohydrates. Curr. Opin. Struct. Biol., 9, 572-577. (PDF)
- Bouckaert, J., Hamelryck, TW., Wyns, L., Loris, R. (1999) The crystal structures of Man(alpha1-3)Man(alpha1-O)Me and Man(alpha1-6)Man(alpha1-O)Me in complex with concanavalin A. J. Biol. Chem. 274, 29188-29195. (PDF)
2000
- Hamelryck, T.W., Moore, JG., Chrispeels, MJ., Loris, R., Wyns, L. (2000) The Role of Weak Protein-Protein Interactions in Multivalent Lectin-Carbohydrate Binding: Crystal Structure of Cross-linked FRIL. J. Mol. Biol. 299, 875-883. (PDF)
2001
- Buts, L., Dao-Thi, M., Loris, R., Wyns, L., Etzler, M., Hamelryck, T. (2001) Weak protein-protein interactions in lectins: the crystal structure of a vegetative lectin from the legume Dolichos biflorus. J. Mol. Biol. 309, 193-201. (PDF)
- Hamelryck, T.W., Kjeldgaard, M. (2001) An Open Source Multi-purpose Programming Environment for Macromolecular Crystallography. CCP4 newsletter, 39 (PS) (HTML@CCP4)
2003
- Hamelryck, T. (2003) Efficient identification of side-chain patterns using a multidimensional index tree. Proteins Struct. Func. Genet., 51, 96-108. (PDF) (Faculty of 1000 evaluation: "must read")
- Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. Bioinformatics, 19, 2308-2310. (PDF@Bioinformatics)
2005
- Hamelryck T. (2005) An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins Struct. Func. Bioinf., 59, 38-48. (PDF)
- Boomsma, W., Hamelryck, T. (2005) Full Cyclic Coordinate Descent: Solving the protein loop closure problem in Calpha space, BMC Bioinformatics, 6:159 (Abstract&PDF@BioMed)
- Won, KJ., Hamelryck, T., Prugel-Bennett, A., Krogh, A. (2005) Evolving Hidden Markov Models for Protein Secondary Structure Prediction, Proceedings of the 2005 IEEE Congress on Evolutionary Computation, pp. 33-40, Edinburgh. (PDF)
- Kent, J.T., Hamelryck, T. (2005). Using the Fisher-Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets, pp. 57-60. Leeds, Leeds University Press. (PDF@LASR)
2006
Note that all publications in 2006 were open access!
- Boomsma, W., Kent, J.T., Mardia, K.V., Taylor, C.C. & Hamelryck, T. (2006) Graphical models and directional statistics capture protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), Interdisciplinary Statistics and Bioinformatics, pp. 91-94. Leeds, Leeds University Press. (PDF@LASR)
- Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol., 2(9): e131 (PDF@PLoS)(cover picture, large file)
- Baranov, PV., Vestergaard, B., Hamelryck, T., Gesteland, RF., Nyborg, J., Atkins, JF. (2006) Diverse bacterial genomes encode an operon of two genes, one of which is an unusual class-I release factor that potentially recognizes atypical mRNA signals other than normal stop codons. Biology Direct, 1:28 (PDF@Biology Direct)
- Paluszewski, M., Hamelryck, T. and Winter, P. (2006) Reconstructing protein structure from solvent exposure using Tabu Search. Algorithms Mol. Biol., 1:20 (PDF@AlgMolBiol)
2007
- Won, KJ., Hamelryck, T., Prugel-Bennett, A. and Krogh, A. (2007) An evolving method for learning HMM Structure: prediction of protein secondary structure. BMC Bioinformatics, 8, 357 (PDF@BMC Bioinformatics)
2008
- Boomsma, W., Mardia, KV., Taylor, CC., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA, 105, 8932-8937. (PDF@PNAS)
- Boomsma, W., Borg, M., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Krogh, A., Mardia, KV. and Hamelryck, T. (2008) PHAISTOS: protein structure prediction using a probabilistic model of local structure. Proceedings of CASP8, Cagliari, Sardinia, Italy, December 3-7 2008. pp 82-83
2009
- Hamelryck, T. (2009) Probabilistic models and machine learning in structural bioinformatics. Statistical Methods in Medical Research, Review. 18, 505-526.
- Cock, P., Antao, T., Chang, J., Chapman, B., Cox, C., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11),1422-1423.
- Frellsen, J., Moltke, I., Thiim, M., Mardia, KV., Ferkinghoff-Borg, J., Hamelryck, T. (2009) A probabilistic model of RNA conformational space. PLoS Computational Biology, 5(6), e1000406.
- Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. (2009) A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds University Press, Leeds, UK.
2010
- Paluszewski, M., Hamelryck, T. (2010) Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics, 11:126.
- Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11:306.
- Paulsen, J., Paluszewski, M., Mardia, KV., Hamelryck, T. (2010) A probabilistic model of hydrogen bond geometry in proteins. LASR 2010 - High-throughput sequencing, proteins and statistics, pp. 61-64. Leeds University Press, Leeds, UK.
- Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics, 11:429.
- Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated and generalized. PLoS ONE, 5(11): e13714.

