Lawrence Kelley

Dr. Lawrence A. Kelley
Structural Bioinformatics Group
Centre for Bioinformatics
Division of Molecular Biosciences
Department of Life Sciences
Faculty of Natural Sciences
Imperial College London
London SW7 2AZ, United Kingdom
Tel +44 (0)20 7594 5776
E-mail: l.a.kelley@imperial.ac.uk

  • 2001-present - Post-doctoral Research Fellow at Imperial College, London, UK
    Research Interests

    Current Research

    My primary research goal is the understanding of the relationship between the amino acid sequence of a protein and its consequent three-dimensional structure and function. To this end I have been developing computational methods for the prediction of protein structure and function from sequence.

    Currently I am developing a system for the prediction of protein-ligand specificity using a new machine learning technique called Support Vector Inductive Logic Programming (SVILP).

    Fold recognition: Phyre

    Together with PhD students Riccardo Bennett-Lovsey and Alex Herbert, and Dr. Kieran Fleming, I have developed a new protein fold recognition system (Phyre) to replace and surpass the popular 3D-PSSM system that Dr. Bob MacCallum and I developed 3 years ago.

    Ab initio structure prediction using fragments.

    Following the research by the lab of David Baker, we have developed a system for ab initio protein structure prediction.

    Machine learning

    Given the overwhelming amount of empirical information from protein sequence and structure databases, and the lack of substantial progress in understanding the protein folding problem from physical first principles, it is imperative to use modern techniques from the field of artificial intelligence and data mining to develop practical solutions to this problem.

    In particular, I am interested in using Support Vector Machines, Random Forests, Graph Theory, and statistical schemes such as Monte Carlo simulation, Gibbs Sampling, Genetic algorithms and empirically derived potentials based on the Boltzmann hypothesis.

    Pointless Research Finding of the Month April 2008

    As an exercise in utter stupidity I undertook to determine the longest English word present in the current protein sequence database of 6 million sequences. The answer? WARRANTER - a person who warrants or makes a warranty.

    Well, as anti-climaxes go, this ranks highly.
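
    For the curious, a search of this kind can be sketched in a few lines. This is a minimal illustration, not the script actually used: the function name, the toy word list and the toy sequences are all made up for the example, and a real run would need a full English dictionary and the full sequence database.

```python
# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def longest_word_in_sequences(words, sequences):
    """Return the longest word that occurs as a substring of any sequence.

    Words containing letters outside the amino acid alphabet (B, J, O, U,
    X, Z) can never match, so they are discarded before searching.
    """
    candidates = [w.upper() for w in words if set(w.upper()) <= AMINO_ACIDS]
    # Try the longest candidates first and stop at the first hit.
    for word in sorted(candidates, key=len, reverse=True):
        if any(word in seq for seq in sequences):
            return word
    return None


# Hypothetical toy data, for illustration only.
words = ["warranter", "star", "bob", "protein"]
sequences = ["MKWARRANTERLL", "GGSTARAA"]
longest = longest_word_in_sequences(words, sequences)
```

    Filtering the dictionary by alphabet first keeps the substring search small, since many English words contain letters that never appear in a standard protein sequence.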

    Brief History

  • 1972 - Born in Syracuse, New York, U.S.A.

  • 1994 - B.A. Biochemistry, Christ Church, Oxford University, UK

  • 1997 - Ph.D. Biomolecular Computing, Leicester University, UK

    My PhD involved the development of various methods of analysing ensembles of protein structures determined by nuclear magnetic resonance spectroscopy. I was supervised by Dr. Mike Sutcliffe.

    Designed and programmed NMRCLUST (Kelley et al., 1996), a tool to cluster ensembles of NMR-derived protein structures into conformationally-related sub-families. This required the development of an automated method of clustering without rigid cut-offs or user intervention.

    Spent two months in the Bioinformatics group at Oxford Molecular Ltd. incorporating NMRCLUST into Architect, the IDITIS protein database generation tool.

    Designed and programmed NMRCORE (Kelley et al., 1997), a tool to define automatically the core atoms and domains in an ensemble of protein structures.

    Developed OLDERADO: On Line Database of Ensemble Representatives And DOmains. This is a searchable database of the results of NMRCORE and NMRCLUST on the current set of PDB-deposited NMR-derived ensembles.


  • 1997-2001 - Post-doctoral Research Fellow at the Imperial Cancer Research Fund, London, UK

    Worked in the Biomolecular Modelling Laboratory, funded by Glaxo-Wellcome and the Imperial Cancer Research Fund (now Cancer Research UK), on protein structure prediction and the use of textual annotation information.

    Protein Structure Prediction/Remote homology detection

    Designed and developed the fold recognition algorithm (3D-PSSM) to detect remote homology relationships between an uncharacterised protein sequence and proteins of known structure. This has been my major focus to date. The system is described in detail in Kelley et al. (2000). Briefly, the system scores the match between a user's sequence and each structure in a representative fold library. The scoring methodology involves the use of predicted and known secondary structure, sequence profiles (PSSMs) generated by the widely-used program PSI-Blast, a solvation potential, and an extended structural profile which we have called a 3D-PSSM. The 3D-PSSMs are generated from structural superpositions of proteins that belong to the same structural superfamily, as determined by the protein experts at SCOP (Structural Classification of Proteins). The structural superposition of proteins sharing very little sequence similarity permits the construction of sequence profiles that better represent the diversity of a given superfamily and thus cover a larger area of sequence space. All these factors are scored using a modified dynamic programming algorithm.
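
    To give a flavour of the dynamic programming step, the sketch below aligns a query sequence against a position-specific scoring matrix using a plain global alignment with a linear gap penalty. It is a simplified illustration and not the 3D-PSSM scoring function: the secondary structure and solvation terms are omitted, and the function name, the gap and mismatch penalties and the toy profile are all assumptions made for the example.

```python
def align_to_profile(query, profile, gap=-2.0, unseen=-1.0):
    """Best global alignment score of a sequence against a profile.

    `profile` is a list with one dict per position, mapping a residue
    letter to its score at that position. Residues missing from a
    position's dict receive the `unseen` score.
    """
    n, m = len(query), len(profile)
    # score[i][j] = best score aligning query[:i] with profile[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + profile[j - 1].get(query[i - 1], unseen)
            skip_residue = score[i - 1][j] + gap   # query residue unaligned
            skip_column = score[i][j - 1] + gap    # profile position unaligned
            score[i][j] = max(match, skip_residue, skip_column)
    return score[n][m]


# Hypothetical three-position profile, for illustration only.
toy_profile = [{"A": 2, "G": 1}, {"C": 3}, {"D": 2, "E": 1}]
full_match = align_to_profile("ACD", toy_profile)
with_gap = align_to_profile("AD", toy_profile)
```

    In a fold recognition setting a score of this kind would be computed against every profile in the fold library and the library entries ranked by score.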

    Designed and worked on the 3D-Crunch supercomputing project to assign folds to the then known bacterial genomes in collaboration with Dr. Manuel Peitsch.

    Took part in CASP3 and CASP4. CASP is a meeting held every 2 years as an international blind trial of protein structure prediction techniques. At CASP4, 3D-PSSM was found to be the best-performing fully automatic method for structure prediction. In addition, our manual, human-crafted predictions were ranked 3rd out of the 100+ groups attending.

    Using textual annotation information

      SAWTED

    There is a wealth of largely untapped information available on proteins in the form of human annotation and journal abstracts. Part of the reason for the superior performance of experts using structure prediction programs such as 3D-PSSM over the purely automatic use of such algorithms is the human expert's ability to read textual annotation and other human-readable information. Dr. Bob MacCallum and I developed the SAWTED algorithm (MacCallum et al., 1999). The purpose of SAWTED is to use the extent of shared human-assigned keywords between potentially remote homologues to lend confidence to tenuous homology assignments. The SwissProt homologues of a user's query sequence may contain keywords such as "cytochrome" or "p450". Similarly, the SwissProt homologues of a known structure may contain keywords such as "cytochrome" and "mitochondria". These terms are represented as abstract "term vectors" which can be compared using the vector cosine model of text retrieval. Even when the sequence match between these two proteins from an algorithm such as PSI-Blast is very weak, the shared keyword or SAWTED score can automatically provide an independent source of evidence for genuine homology.
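
    The vector cosine comparison can be illustrated with the keywords from the example above. This is a minimal sketch using raw keyword counts; any term weighting or keyword filtering used by SAWTED itself is not reproduced here, and the function and variable names are chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Keywords gathered from the homologues of the query sequence
# and from the homologues of a known structure.
query_terms = Counter(["cytochrome", "p450"])
structure_terms = Counter(["cytochrome", "mitochondria"])
shared_keyword_score = cosine(query_terms, structure_terms)
```

    Here one of the two keywords on each side is shared, so the score is about 0.5. Two proteins with no keywords in common would score 0, and identical keyword sets would score 1.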

      Predicting sub-cellular localisation

    Continuing with this idea of using textual information automatically, Dr. Ben Stapley and I used tools from machine learning, specifically Support Vector Machines (SVMs), in conjunction with collated Medline abstracts to represent proteins as very high-dimensional (40,000+) vectors of English language terms. SVMs are binary classifiers that permit such large spaces to be quickly and automatically partitioned given training data. Once trained, such systems can be used to classify new proteins based on text alone, or on combinations of text and sequence features (Stapley et al., 2001; Stapley et al., 2002).
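
    The idea can be sketched on a toy scale as follows. This is an illustration under stated assumptions, not the published system: the four "abstracts" are invented, the vocabulary has a few dozen terms rather than 40,000+, and the small sub-gradient trainer stands in for a proper SVM package. All function names and parameter values are hypothetical.

```python
def bag_of_words(text, vocab):
    """Represent a text as a vector of term counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Fit a linear SVM by sub-gradient descent on the regularised hinge loss.

    Labels in `y` must be +1 or -1. Returns the weight vector and bias.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrink every weight slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss step: only examples inside the margin move the plane.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating plane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented toy abstracts: +1 = nuclear, -1 = mitochondrial.
training = [
    ("nuclear transcription factor binds dna", 1),
    ("histone chromatin nuclear protein", 1),
    ("mitochondrial membrane respiratory chain", -1),
    ("mitochondrial matrix enzyme", -1),
]
vocab = sorted({word for text, _ in training for word in text.split()})
X = [bag_of_words(text, vocab) for text, _ in training]
y = [label for _, label in training]
w, b = train_linear_svm(X, y)

nuclear_call = predict(w, b, bag_of_words("nuclear dna binding protein", vocab))
mito_call = predict(w, b, bag_of_words("mitochondrial respiratory enzyme", vocab))
```

    Because an SVM is a binary classifier, handling several sub-cellular locations needs one such classifier per location (or per pair of locations), with the individual decisions combined afterwards.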





    Publications

    Books

    Fold Recognition. Kelley LA, in From Protein Structure to Function with Bioinformatics. Springer, Dordrecht. 2008

    Protein fold discovery using stochastic logic programs. Chen J, Kelley LA, Muggleton SH and Sternberg MJE. in Probabilistic Inductive Logic Programming. Springer 2008.

    Protein Structure Prediction. Kelley LA. in Computational Genomics - Theory and Application. Horizon Bioscience 2004.


    Reviews

    Review: Computational Structural Biology - Methods and Applications. Kelley LA and Sternberg MJE in Crystallography Reviews in press. 2009


    Journal Articles and Proceedings

    3DLigandSite: Predicting ligand binding sites using similar structures. Wass M, Kelley LA and Sternberg MJE Nucleic Acids Research (2010) in press

    Protein Folding Requires Crowd Control in a Simulated Cell. Jefferys BR, Kelley LA and Sternberg MJE Journal of Molecular Biology (2010) Volume 397, Issue 5, 16 April 2010, Pages 1329-1338

    Discovering rules for protein-ligand specificity using support vector inductive logic programming. Kelley LA Shrimpton PJ Muggleton SH and Sternberg MJE Protein Engineering Design and Selection (2009); doi: 10.1093/protein/gzp035

    Protein structure prediction on the web: a case study using the Phyre server. Kelley LA and Sternberg MJE. Nature Protocols 4, 363 - 371 (2009)

    The evolution of biology. A shift towards the engineering of prediction-generating tools and away from traditional research practice. Kelley LA and Scott MA. EMBO reports 9, 12, 1163-1167 (2008).

    Using DNA to Generate 3D Organic Art Forms Latham W, Shaw M, Todd S, Fol Leymarie F, Jefferys B, Kelley LA. Lecture Notes in Computer Science, Vol 4974. Springer Berlin (2008)

    Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Proteins. vol 70, 3, 611-625. (2008)

    Using DNA to generate 3D organic art forms, W. Latham, M. Shaw, S. Todd, F.F. Leymarie, B. Jefferys, L. Kelley, EvoMUSART, Sixth European Workshop on Evolutionary and Biologically Inspired Music, Sound, Art and Design, Napoli, 2008.

    "From DNA to 3D Organic Art Forms," W. Latham, M. Shaw, S. Todd, F.F. Leymarie, L. Kelley & B. Jefferys, Sketch at SIGGRAPH, San Diego, USA, Aug. 2007.

    Multi-class prediction using stochastic logic programs. Chen J, Kelley LA, Muggleton S, Sternberg M. ILP '06 conference proceedings. S. Muggleton and R. Otero, eds., Springer LNAI, Santiago de Compostela, Spain 24-27 August, 2006 (to be published in Feb. 2007 by Springer)

    Capturing expert knowledge with argumentation: a case study in bioinformatics. Jefferys BR, Kelley LA, Sergot MJ, Fox J, Sternberg MJ. Bioinformatics 22(8):924-33. (2006)

    The proteome: structure, function and evolution. Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. Philos Trans R Soc Lond B Biol Sci. 361(1467):441-51. (2006)

    The extent and importance of intragenic recombination. de Silva E, Kelley LA, Stumpf MP. Hum Genomics. Nov;1(6):410-20. (2004)

    Predicting sub-cellular localization from text using support vector machines. Stapley, B.J., Kelley, L.A. & Sternberg, M.J.E. Pacific Symposia in Biocomputing. (2002)

    Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. (2001) Proteins. 45 Suppl 5:39-46.

    Protein Functional Classification by Text Data-Mining. Stapley, B.J., Kelley, L.A. & Sternberg, M.J.E. Proceedings of the 19th Twente Workshop on Language Theory. (2001)

    On John Allen's critique of induction. Kelley LA & Scott M (2001) Bioessays. 23(9):860-1.

    Enhanced Genome Annotation using Structural Profiles in the Program 3D-PSSM. Kelley LA, MacCallum RM & Sternberg MJE (2000). J. Mol. Biol. 299(2):499-520.

    SAWTED: Structure assignment with text description - enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. MacCallum, R. M., Kelley, L. A. & Sternberg, M. J. E. (1999). Bioinformatics. 16(2):125-9.

    CAFASP-1: Critical Assessment of Fully Automated Structure Prediction. Fischer, D., Bryson, K., Elofsson, A., Godzik, A., Jones, D., Karplus, K., Kelley, L. A., MacCallum, R. M., Pawlowski, K., Rost, B., Rychlewski, L. & Sternberg, M. J. E. (1999). Proteins Suppl. 3, 209-217.

    Recognition of remote protein homologies using three-dimensional information to generate a point specific scoring matrix in the program 3D-PSSM. Kelley, L. A., MacCallum, R. M. & Sternberg, M. J. E. (1999). In "RECOMB99 - Proceedings of the third annual conference on computational biology", (ed. Istrail, S., Pevzner, P. and Waterman, M.). pp. 218-225. Association for Computing Machinery, New York

    Progress in protein structure prediction: assessment of CASP3. Sternberg, M. J. E., Bates, P. A., Kelley, L. A. & MacCallum, R. M. (1999). Curr. Opin. in Struct. Biol. 9, 368-373.

    OLDERADO: On-Line Database of Ensemble Representatives And Domains. Kelley, L.A., and Sutcliffe, M.J. (1997) Prot.Sci., 6, 2628-2630.

    An automated approach for defining core atoms and domains in an ensemble of NMR-derived protein structures. Kelley, L.A., Gardner, S.P. and Sutcliffe, M.J. (1997) Protein Eng. 10, 737-741.

    An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally-related subfamilies. Kelley, L.A., Gardner, S.P. and Sutcliffe, M.J. (1996) Protein Eng. 9, 1063-1065.
