
Establishment of the protein topology database

The database will be constructed in four steps. (i) A non-redundant set of protein structures, selected by sequence identity (Hobohm & Sander, 1994), will be divided into domains using a combination of our assignment algorithm (Islam et al., 1995) and visual inspection. Secondary structure will then be assigned. (ii) Structural comparison using STAMP, developed by Drs Russell (now at the ICRF) and Barton (Russell and Barton, 1992), will identify weaker homologies, and these will be removed. (iii) The following topological features will be encoded: sequence; secondary structure packing geometry; loop length; chirality; local hydrophobicity; a generic description of the function (e.g. DNA binding); and the predictability of the secondary structure, e.g. by the method of Robson and coworkers (Gibrat et al., 1987). (iv) Data will be encoded as PROLOG clauses based on principles established in the TOPOL and PIPS databases of Rawlings and coworkers (Rawlings et al., 1985; Clark et al., 1991).
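
As an illustration of step (iv), the following Python sketch shows one way that per-domain features could be written out as PROLOG facts. It is not the TOPOL or PIPS schema: the predicate names (domain/1, sequence/2, chirality/2, function/2, loop_length/3), the DomainRecord fields, and the example values are all hypothetical, chosen only to show the general shape of such an encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainRecord:
    """Hypothetical container for a few of the features listed in step (iii)."""
    domain_id: str
    sequence: str
    loop_lengths: List[int]
    chirality: str
    function: str


def quote_atom(text: str) -> str:
    """Render text as a single-quoted PROLOG atom, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_clauses(rec: DomainRecord) -> List[str]:
    """Return one PROLOG fact per feature (predicate names are assumptions)."""
    d = quote_atom(rec.domain_id)
    clauses = [
        f"domain({d}).",
        f"sequence({d}, {quote_atom(rec.sequence)}).",
        f"chirality({d}, {quote_atom(rec.chirality)}).",
        f"function({d}, {quote_atom(rec.function)}).",
    ]
    # One fact per loop, numbered from 1 along the chain.
    for index, length in enumerate(rec.loop_lengths, start=1):
        clauses.append(f"loop_length({d}, {index}, {length}).")
    return clauses


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from any real structure.
    example = DomainRecord("dom1", "ACDEFGH", [3, 5], "right", "DNA binding")
    print("\n".join(to_clauses(example)))
```

Storing each feature as a separate fact keyed on the domain identifier keeps the clauses easy to query individually, and new feature types can be added later without changing the existing facts.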



Marcel Turcotte
1999-10-20