Dr. Lawrence Kelley

Dr. Lawrence A. Kelley
Structural Bioinformatics Group
Centre for Bioinformatics
Division of Molecular Biosciences
Department of Life Sciences
Faculty of Natural Sciences
Imperial College London
London SW7 2AZ, United Kingdom
Tel +44 (0)20 7594 5776
E-mail: l.a.kelley@imperial.ac.uk

Research Interests

My primary research goal is the understanding of the relationship between the amino acid sequence of a protein and its consequent three-dimensional structure and function. To this end I have been developing computational methods for the prediction of protein structure and function from sequence.

I am the chief developer of the Phyre2 server for protein structure prediction. Phyre2 receives approximately 1,000 submissions per day and has been cited over 1,800 times. Phyre2 is the successor to my previous system, 3D-PSSM, which was one of the first widely used web-based protein structure prediction servers and has received over 1,600 citations.

My most recent work has been on improving and extending the Phyre system by adding new functionality to analyse model quality, predict function and the effects of mutations, and build complexes.

Member of the Editorial board of Biology


A quick introduction to the basics of protein structure prediction by homology as performed by the Phyre2 server

Presentation about the ab initio folding technique (known as Poing) used in Phyre2

Poing was written by Dr. Benjamin Jefferys and is open source and available here. We applied Poing to simulations of a virtual cell, described in Jefferys, Kelley & Sternberg (2010) [pdf].

The work made the cover of the Journal of Molecular Biology, was highlighted by the Faculty of 1000 and selected for presentation in the highlights track at ISMB 2010.


Philosophy of Science

I have a continuing collaboration with Dr. Michael Scott at the University of Manchester on the philosophy of science. We recently published "The evolution of biology. A shift towards the engineering of prediction-generating tools and away from traditional research practice." Kelley LA and Scott MA. EMBO reports 9, 12, 1163-1167 (2008). [pdf]


ePlant and the 3D display initiative

I have collaborated with Dr. Geoff Fucile and Prof. Nicholas Provart, University of Toronto, on the ePlant initiative [article] (2011).

ePlant is a suite of open-source world wide web-based tools for the visualization of large-scale data sets from the model organism Arabidopsis thaliana. This includes sequence homology relationships and single nucleotide polymorphism data, protein structure models (produced by Phyre), molecular interaction networks, subcellular localization and gene expression data.



For the last 3 years I have been in close collaboration with the computer artist Prof. William Latham at Goldsmiths, University of London, and the rest of the Mutators team. Our first piece of work combined William's techniques for data visualisation with the evolutionary history of a protein that diverged into two forms: one present in the lens of the eye and one present in the liver. This became the "History of the Species" film, which you can watch on the right.

This work has been exhibited at several events, including SIGGRAPH '07 and the Medical Research Council (MRC) in Mill Hill, and appeared as a centrefold poster insert in the Jan. 24 issue of New Scientist.

More recently we have been developing, with the expertise of Stephen Todd and pilot work by Ben Jefferys, a system to interactively explore and visualise protein folding and protein-protein docking in real time. This system is known as Foldsynth and I will be putting demos up soon. A little snapshot can be seen below.


  • 2001 - present - Post-doctoral Research Fellow at Imperial College London

  • Poing

    With Dr. Ben Jefferys we developed the Poing model of protein folding and studied the effect of macromolecular crowding on simulated folding from a virtual ribosome [pdf]. See videos above for more information.


    With Dr. Mark Wass we developed the 3DLigandSite web server for the prediction of potential ligand binding sites given a protein structure, whether experimental or modelled. 3DLigandSite was ranked as one of the top-performing methods for binding site prediction for the past 4 years at CASP. It is also tightly coupled to the new Phyre2 web server.

    Toward a map of the global proteome

    With Dr. Daniel Chubb we investigated the effect of the exponential growth in the protein sequence database on our ability to detect remote homologies over time [pdf]. A surprising result was that, despite such enormous growth in sequence data, our ability to detect remote homology using the most widely cited method, PSI-Blast, reached a plateau as early as 2004. This work has received recognition including an invited talk and poster prize at MASAMB 2009 and a special presentation at CASP9.


    Together with Dr. Riccardo Bennett-Lovsey, I developed the Phyre protein structure prediction server, which is now one of the most widely used systems of its kind. Phyre is based on the application of profile-profile matching to detect remote evolutionary relationships between proteins [pdf].


    April II - Applications in probabilistic inductive logic. Worked in the large international European-funded AprilII project to apply the machine learning technique of Inductive Logic Programming combined with probability measures to protein fold classification. This work involved a collaboration with Prof. Stephen Muggleton.

    SVILP predicting binding specificity

    Applied the technique of Support Vector Inductive Logic Programming to the problem of predicting protein-ligand binding site specificity. SVILP involves learning rules using ILP. These rules then form the attributes of a feature vector representing a training or testing data example. A support vector machine is then used to learn the relationship between input examples and their classification based on these feature vectors. [pdf].
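The first half of that pipeline, turning learned rules into a feature vector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the three rules, the fact names and the example data are all hypothetical stand-ins for clauses an ILP system would learn, and a simple binary "does the rule cover this example" encoding is assumed. The resulting vectors are what a support vector machine would then be trained on; that step is not shown.

```python
# Hypothetical rules standing in for clauses learned by an ILP system.
# Each rule takes the set of facts describing one example and returns True/False.

def rule_has_aromatic(facts):
    return "aromatic_residue" in facts


def rule_charged_and_metal(facts):
    return "charged_residue" in facts and "metal_ion" in facts


def rule_small_pocket(facts):
    return "small_pocket" in facts


RULES = [rule_has_aromatic, rule_charged_and_metal, rule_small_pocket]


def to_feature_vector(facts, rules=RULES):
    """One binary attribute per rule: 1 if the rule covers the example, else 0."""
    return [1 if rule(facts) else 0 for rule in rules]


# Hypothetical binding-site descriptions (illustrative only).
examples = [
    {"aromatic_residue", "small_pocket"},
    {"charged_residue", "metal_ion"},
    {"charged_residue"},
]

feature_matrix = [to_feature_vector(e) for e in examples]
# feature_matrix == [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```

Each row has one column per rule, so adding or removing learned rules changes the dimensionality of the space in which the classifier operates.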


    Argumentation is an established technique for reasoning about situations where absolute truth or precise probability is impossible to determine. Together with Dr. Ben Jefferys we developed an argumentation system for 3D-PSSM to automate the application of expert knowledge in interpreting protein structure prediction results [article].

  • 1997 - 2001 - Post-doctoral Research Fellow at the Imperial Cancer Research Fund, London, UK
  • Worked in the Biomolecular Modelling Laboratory, funded by Glaxo-Wellcome and the Imperial Cancer Research Fund (ICRF) (now Cancer Research UK), on protein structure prediction and the use of textual information.

    While at the ICRF I developed, with Dr. Bob MacCallum, the 3D-PSSM web server for protein structure prediction. The Critical Assessment of Structure Prediction (CASP) is a meeting held every 2 years as an international blind trial of protein structure prediction techniques. At CASP4, 3D-PSSM was found to be the best-performing fully automatic method for structure prediction. In addition, our manual, human-crafted predictions were ranked 3rd out of the 100+ groups attending. 3D-PSSM has received over 1,400 citations in the literature.

    Using textual annotation information - SAWTED

    There is a wealth of largely untapped information available on proteins in the form of human annotation and journal abstracts. Part of the reason experts using structure prediction programs such as 3D-PSSM outperform the purely automatic use of such algorithms is the human expert's ability to read textual annotation and other human-readable information. Dr. Bob MacCallum and I developed the SAWTED algorithm (Structure Assignment With TExt Description) (MacCallum et al., 1999). The purpose of SAWTED is to use the extent of shared human-assigned keywords between potentially remote homologues to lend confidence to tenuous homology assignments. For example, the SwissProt homologues of a user's query sequence may contain keywords such as "cytochrome" or "p450", while the SwissProt homologues of a known structure may contain keywords such as "cytochrome" and "mitochondria". These keywords are represented as "term vectors", which can be compared using the vector cosine model of text retrieval. Even when the sequence match between two such proteins is very weak using an algorithm such as PSI-Blast, the shared-keyword (SAWTED) score can automatically provide an independent source of evidence for genuine homology.
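The vector cosine comparison mentioned above can be sketched in a few lines. This is only an illustration of the general technique, using the keywords from the example in the text; it assumes raw, unweighted keyword counts, whereas the published SAWTED method may weight or filter terms differently. The function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(keywords):
    """Build a sparse term-count vector from a list of annotation keywords."""
    return Counter(k.lower() for k in keywords)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyword set shares nothing with anything
    return dot / (norm_a * norm_b)


# Keywords pooled from the homologues of the query and of the known structure.
query_terms = term_vector(["cytochrome", "p450"])
template_terms = term_vector(["cytochrome", "mitochondria"])

score = cosine_similarity(query_terms, template_terms)  # about 0.5
```

One shared keyword out of two on each side gives a score of about 0.5; identical keyword sets give 1, and sets with no keyword in common give 0.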

    Predicting sub-cellular localisation

    Dr. Ben Stapley and I used tools from machine learning, specifically Support Vector Machines (SVMs), in conjunction with collated Medline abstracts to represent proteins as very high-dimensional (40,000+) vectors of English language terms. SVMs are binary classifiers that permit such large spaces to be quickly and automatically partitioned given training data. Once trained, such systems can be used to classify new proteins based on text alone, or on combinations of text and sequence features (Stapley et al., 2001; Stapley et al., 2002).

  • 1994 - 1997 - Ph.D. Biomolecular Computing, Leicester University, UK
  • My PhD involved the development of various methods of analysing ensembles of protein structures determined by nuclear magnetic resonance spectroscopy. I was supervised by Dr. Mike Sutcliffe.

    Designed and programmed NMRCLUST (Kelley et al., 1996)[pdf], a tool to cluster ensembles of NMR-derived protein structures into conformationally-related sub-families. This required the development of an automated method of clustering without rigid cut-offs or user intervention.

    NMRCLUST required the development of a new clustering technique, now incorporated in the R statistics package as the Kelley-Gardner-Sutcliffe (KGS) measure. This has been applied to a wide range of fields beyond proteins, such as HIV virtual screening of drugs, measuring biodiversity, and even coffee bean morphology and taste. It has over 170 citations.

    Worked for two months in the Bioinformatics group at Oxford Molecular Ltd. incorporating NMRCLUST into Architect, the IDITIS protein database generation tool.

    Designed and programmed NMRCORE (Kelley et al., 1997)[pdf], a tool to define automatically the core atoms and domains in an ensemble of protein structures.

    Developed OLDERADO: On-Line Database of Ensemble Representatives And DOmains (Kelley et al., 1997)[pdf]. This is a searchable database of the results of NMRCORE and NMRCLUST on the current set of PDB-deposited NMR-derived ensembles.

  • 1990 - 1994 - B.A. Biochemistry, Christ Church, Oxford University, UK
  • Distinction, scholar, extended essay on protein structure prediction.

    Publications 2008-present

    Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling. MacDonald JT, Kelley LA, Freemont PS. PLoS ONE (2013) 8(6): e65770. doi:10.1371/journal.pone.0065770

    High-quality protein backbone reconstruction from alpha carbons using gaussian mixture models. B.L. More, L.A. Kelley, J. Barber, J.W. Murray, J.T. MacDonald J. Comput. Chem. 2013, 34, 1881–1889. DOI: 10.1002/jcc.23330

    Functional assignment of Mycobacterium tuberculosis proteome by genome-scale fold-recognition. Tuberculosis Issue 1, Volume 93, January 2013

    A new structural model of the acid-labile subunit: pathogenetic mechanisms of short stature-causing mutations. David A, Kelley LA, Sternberg MJ. J Mol Endocrinol. (2012) 49(3):213-20. doi: 10.1530/JME-12-0086

    Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucl. Acids Res.(2013) 41 (D1): D499-D507. doi: 10.1093/nar/gks1266

    Functional significance of mutations in the Snf2 domain of ATRX. Mitson M, Kelley LA, Sternberg MJE, Higgs DR and Gibbons RJ. Human Molecular Genetics (2011) doi: 10.1093/hmg/ddr163

    ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web. Fucile G, Di Biase D, Nahal H, La G, Khodabandeh S, Chen Y, Easley K, Christendat D, Kelley LA, Provart NJ. PLoS ONE (2011) 6(1): e15237. doi:10.1371/journal.pone.0015237

    Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Bioinformatics (2010) 26(21):2664-71.

    3DLigandSite: Predicting ligand binding sites using similar structures. Wass M, Kelley LA and Sternberg MJE Nucleic Acids Research (2010) 38 Suppl:W469-73

    Protein Folding Requires Crowd Control in a Simulated Cell. Jefferys BR, Kelley LA and Sternberg MJE Journal of Molecular Biology (2010) Volume 397, Issue 5, 16 April 2010, Pages 1329-1338

    Review: Computational Structural Biology - Methods and Applications. Kelley LA and Sternberg MJE in Crystallography Reviews Volume 16 Issue 4, 303, 2010

    Discovering rules for protein-ligand specificity using support vector inductive logic programming. Kelley LA Shrimpton PJ Muggleton SH and Sternberg MJE Protein Engineering Design and Selection (2009); doi: 10.1093/protein/gzp035

    Protein structure prediction on the web: a case study using the Phyre server. Kelley LA and Sternberg MJE. Nature Protocols 4, 363 - 371 (2009)

    The evolution of biology. A shift towards the engineering of prediction-generating tools and away from traditional research practice. Kelley LA and Scott MA. EMBO reports 9, 12, 1163-1167 (2008).

    Using DNA to Generate 3D Organic Art Forms Latham W, Shaw M, Todd S, Fol Leymarie F, Jefferys B, Kelley LA. Lecture Notes in Computer Science, Vol 4974. Springer Berlin (2008)

    Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Bennett-Lovsey RM, Hebert AD, Sternberg MJE, Kelley LA. Proteins. vol 70, 3, 611-625. (2008)

    Using DNA to generate 3D organic art forms, W. Latham, M. Shaw, S. Todd, F.F. Leymarie, B. Jefferys, L. Kelley, EvoMUSART, Sixth European Workshop on Evolutionary and Biologically Inspired Music, Sound, Art and Design, Napoli, 2008.


    "From DNA to 3D Organic Art Forms," W. Latham, M. Shaw, S. Todd, F.F. Leymarie, L. Kelley & B. Jefferys, Sketch at SIGGRAPH, San Diego, USA, Aug. 2007.

    Multi-class prediction using stochastic logic programs. Chen J, Kelley LA, Muggleton S, Sternberg M. ILP '06 conference proceedings. S. Muggleton and R. Otero, eds., Springer LNAI, Santiago de Compostela, Spain 24-27 August, 2006 (to be published in Feb. 2007 by Springer)

    Capturing expert knowledge with argumentation: a case study in bioinformatics. Jefferys BR, Kelley LA, Sergot MJ, Fox J, Sternberg MJ. Bioinformatics 22(8):924-33. (2006)

    The proteome: structure, function and evolution. Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. Philos Trans R Soc Lond B Biol Sci. 361(1467):441-51. (2006)

    The extent and importance of intragenic recombination. de Silva E, Kelley LA, Stumpf MP. Hum Genomics. Nov;1(6):410-20. (2004)

    Predicting sub-cellular localization from text using support vector machines. Stapley, B.J., Kelley, L.A. & Sternberg, M.J.E. Pacific Symposia in Biocomputing. (2002)

    Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. (2001) Proteins. 45 Suppl 5:39-46.

    Protein Functional Classification by Text Data-Mining. Stapley, B.J., Kelley, L.A. & Sternberg, M.J.E. Proceedings of the 19th Twente Workshop on Language Theory. (2001)

    On John Allen's critique of induction. Kelley LA & Scott M (2001) Bioessays. 23(9):860-1.

    Enhanced Genome Annotation using Structural Profiles in the Program 3D-PSSM. Kelley LA, MacCallum RM & Sternberg MJE (2000). J. Mol. Biol. 299(2):499-520.

    SAWTED: Structure assignment with text description - enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. MacCallum, R. M., Kelley, L. A. & Sternberg, M. J. E. (1999). Bioinformatics. 16(2):125-9.

    CAFASP-1: Critical Assessment of Fully Automated Structure Prediction. Fischer, D., Bryson, K., Elofsson, A., Godzik, A., Jones, D., Karplus, K., Kelley, L. A., MacCallum, R. M., Pawlowski, K., Rost, B., Rychlewski, L. & Sternberg, M. J. E. (1999). Proteins Suppl. 3, 209-217.

    Recognition of remote protein homologies using three-dimensional information to generate a point specific scoring matrix in the program 3D-PSSM. Kelley, L. A., MacCallum, R. M. & Sternberg, M. J. E. (1999). In "RECOM99 - Proceedings of the third annual conference on computational biology", (ed. Istrail, S., Pevzner, P. and Waterman, M.). pp. 218-225. Association for Computing Machinery, New York

    Progress in protein structure prediction: assessment of CASP3. Sternberg, M. J. E., Bates, P. A., Kelley, L. A. & MacCallum, R. M. (1999). Curr. Opin. in Struct. Biol. 9, 368-373.

    OLDERADO: On-Line Database of Ensemble Representatives And Domains. Kelley, L.A., and Sutcliffe, M.J. (1997) Prot.Sci., 6, 2628-2630.

    An automated approach for defining core atoms and domains in an ensemble of NMR-derived protein structures. Kelley, L.A., Gardner, S.P. and Sutcliffe, M.J. (1997) Protein Eng. 10, 737-741.

    An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally-related subfamilies. Kelley, L.A., Gardner, S.P. and Sutcliffe, M.J. (1996) Protein Eng. 9, 1063-1065.


    Fold Recognition. Kelley LA, in From Protein Structure to Function with Bioinformatics. Springer, Dordrecht. 2008

    Protein fold discovery using stochastic logic programs. Chen J, Kelley LA, Muggleton SH and Sternberg MJE. in Probabilistic Inductive Logic Programming. Springer 2008.

    Protein Structure Prediction. Kelley LA. in Computational Genomics - Theory and Application. Horizon Bioscience 2004.