Recognition of analogous and homologous protein folds:
Analysis of sequence and structure conservation

Robert B. Russell (1)+, Mansoor A. S. Saqi (2), Roger A. Sayle (2),
Paul A. Bates (1) & Michael J. E. Sternberg (1)*

(1) Biomolecular Modelling Laboratory
Imperial Cancer Research Fund
Lincoln's Inn Fields, P.O. Box 123
London, WC2A 3PX, U.K.

(2) Bioinformatics Group
Glaxo-Wellcome Medicines Research Centre
Gunnels Wood Road
Stevenage, Herts, SG1 2NY, U.K.

+ Present Address:
SmithKline Beecham Pharmaceuticals
Research & Development
Bioinformatics
New Frontiers Science Park
Harlow, Essex, CM19 5AW, U.K.

* To whom correspondence should be addressed.

ABSTRACT

An analysis was performed on 335 pairs of structurally aligned proteins derived from the Structural Classification of Proteins (SCOP) database. The pairs were divided into analogues and homologues. Analogues were defined as proteins with similar three-dimensional structures (same SCOP fold classification) but generally with different functions and little evidence of a common ancestor (different SCOP superfamily classification). Homologues were defined as pairs of similar structures likely to be the result of evolutionary divergence (same superfamily) and were divided into remote, medium and close sub-divisions based on percentage sequence identity. Particular attention was paid to the differences between analogues and remote homologues, since both types of similarity are generally undetectable by sequence comparison and their detection is the aim of fold recognition methods. Distributions of sequence identities and substitution matrices suggest a higher degree of sequence similarity in remote homologues than in analogues. Matrices for remote homologues show similarity to existing mutation matrices, providing some validity for their use in previously described fold recognition methods. In contrast, matrices derived from analogous proteins show little conservation of amino acid properties beyond broad conservation of hydrophobic or polar character. Secondary structure and accessibility were more conserved on average in remote homologues than in analogues, though there was no apparent difference in the RMS deviation between these two types of similarity. Alignments of remote homologues and analogues show a similar number of gaps, openings (one or more sequential gaps) and inserted/deleted secondary structure elements, and both generally contain more gaps, openings and omitted elements than medium and close homologues. These results suggest that gap parameters for fold recognition should be more lenient than those used in sequence comparison.
Parameters were derived from the analogue and remote homologue datasets for potential use in fold recognition methods. Implications for protein fold recognition and evolution are discussed.
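
The alignment statistics named in the abstract (percentage sequence identity, gaps, and openings, where an opening is a run of one or more sequential gaps) can be illustrated with a minimal sketch. This is not the authors' code: the function name `alignment_stats`, the use of `-` as the gap character, and the choice to compute identity over columns where both residues are present are all assumptions made for illustration.

```python
def alignment_stats(seq_a, seq_b, gap="-"):
    """Count identities, gaps and openings in a pairwise alignment.

    Hypothetical helper: both sequences must already be aligned
    (equal length, gaps marked with `gap`). Percentage identity is
    computed over columns where neither sequence has a gap, which
    is one of several possible conventions (an assumption here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    aligned = identical = gaps = openings = 0
    prev_gapped = None  # which sequence was gapped in the previous column

    for a, b in zip(seq_a, seq_b):
        a_gap, b_gap = a == gap, b == gap
        if a_gap and b_gap:
            continue  # column empty in both sequences: ignore it
        if a_gap or b_gap:
            gapped = "a" if a_gap else "b"
            gaps += 1
            # A new opening starts whenever a run of gaps begins.
            if gapped != prev_gapped:
                openings += 1
            prev_gapped = gapped
        else:
            aligned += 1
            if a == b:
                identical += 1
            prev_gapped = None

    pct_identity = 100.0 * identical / aligned if aligned else 0.0
    return {
        "aligned": aligned,
        "identical": identical,
        "pct_identity": pct_identity,
        "gaps": gaps,
        "openings": openings,
    }


# Toy alignment (invented sequences, not data from the study).
stats = alignment_stats("ACD--FGHIK", "ACDEEFG-IR")
print(stats)
```

In this toy alignment there are three gap positions but only two openings, because the two adjacent gaps in the first sequence form a single run.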