Protein fold recognition from secondary structure assignments

Robert B. Russell, Richard R. Copley & Geoffrey J. Barton
From Proc. 29th Ann. Hawaii Int. Conf. Sys. Sci., 5, 302-311, 1995.

Abstract

A novel method is described for finding protein tertiary folds consistent with a set of secondary structure assignments. Given a secondary structure pattern and other restraints for a protein or protein family, the method finds all matches within a non-redundant database of known protein three-dimensional structural domains that are both structurally sensible and consistent with any experimental information provided. All possible matches between the query pattern and every database structure are first generated by comparing secondary structure strings. This comparison allows a user-defined number of deletions of whole secondary structure elements, which accounts for likely errors in predicted secondary structure elements and for likely variations between the query and the database structure. The matches are then passed through a series of filters that keep only those structures which are compact, have good beta-sheet bonding, and allow the given loop or turn lengths to bridge the distance between adjacent secondary structures. Matches are then filtered further by user-defined restraints: the requirement for particular secondary structures (e.g. those predicted strongly, or those containing active-site residues) and any distance restraints known from experiment (e.g. disulphide bonds). The final list of matches provides a set of plausible topologies for the protein of unknown 3D structure, which can be inspected visually with computer graphics or tested by experiment. To demonstrate the power of the method, a prediction for the Src homology 2 (SH2) domain is used to search the database. The search reveals 13 possible topologies, one of which is a portion of the E. coli bio operon protein, which is known to adopt a structure similar to that of the SH2 domain. The use and further development of the method are discussed.
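The first matching stage and one of the geometric filters described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each secondary structure element is encoded as one character ('H' helix, 'E' strand); a deletion is counted for every query element left unmatched and for every database element skipped inside the matched region, while database elements before the first or after the last match are free (so the query may match a portion of a larger domain); and the loop filter assumes a maximum C-alpha to C-alpha step of about 3.8 Å per peptide bond. The names `enumerate_matches`, `loop_can_bridge` and `CA_CA_STEP` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One match = list of (query element index, database element index) pairs.
Match = List[Tuple[int, int]]

# Assumed upper bound on the distance spanned per residue step (angstrom).
CA_CA_STEP = 3.8


def enumerate_matches(query: str, db: str, max_del: int) -> List[Match]:
    """Enumerate order-preserving, type-consistent matches of query SSEs
    onto database SSEs, allowing at most max_del whole-element deletions."""
    results: List[Match] = []

    def extend(qi: int, last_db: int, dels: int, pairs: Match) -> None:
        if dels > max_del:  # too many deletions: prune this branch
            return
        if qi == len(query):
            if pairs:  # ignore the trivial match that deletes everything
                results.append(list(pairs))
            return
        # Option 1: delete query element qi (e.g. an over-predicted element).
        extend(qi + 1, last_db, dels + 1, pairs)
        # Option 2: pair qi with a later database element of the same type.
        start = 0 if last_db < 0 else last_db + 1
        for dj in range(start, len(db)):
            if db[dj] != query[qi]:
                continue
            # Database elements skipped between matches count as deletions;
            # those before the first match do not.
            skipped = 0 if last_db < 0 else dj - last_db - 1
            pairs.append((qi, dj))
            extend(qi + 1, dj, dels + skipped, pairs)
            pairs.pop()

    extend(0, -1, 0, [])
    return results


def loop_can_bridge(end_xyz: Sequence[float], start_xyz: Sequence[float],
                    loop_len: int) -> bool:
    """Return True if a loop of loop_len residues could span the gap between
    the last C-alpha of one element and the first C-alpha of the next.
    A loop of n residues gives n + 1 C-alpha steps across the gap."""
    return math.dist(end_xyz, start_xyz) <= CA_CA_STEP * (loop_len + 1)
```

For example, `enumerate_matches("EHE", "EEHE", 0)` returns the single match `[[(0, 1), (1, 2), (2, 3)]]`, using the leading database strand as free. Raising `max_del` to 1 admits partial matches such as `[(0, 0), (2, 1)]`, in which the query helix is treated as a mis-prediction. Each surviving match would then be screened with checks such as `loop_can_bridge` on the database coordinates.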