There are now more than 300 protein domain folds, and the number is doubling every two years (e.g. Orengo et al., 1994; Islam et al., 1995). Extracting the principles that govern these folds is important in several areas (e.g. see Sternberg, 1996a): (i) for a fundamental understanding of protein architecture, namely what the common building blocks and construction rules are; (ii) for identifying possible relationships between the function of a protein and its conformation, which is particularly important as genes are increasingly being sequenced and protein structures solved (or predicted) without information about the biological role of the molecule; and (iii) as a key component in translating protein sequence into structure, following the strategy of secondary structure prediction and subsequent identification of a common tertiary fold by threading.
Most present approaches to extracting such principles concentrate on a single feature, e.g. coordinates (Orengo & Taylor, 1993; Russell & Barton, 1992) or packing geometry (Grindley et al., 1993). However, structural principles are often described using concepts such as chirality, strand arrangements and a hierarchical description of protein architecture (Orengo & Thornton, 1993; Sternberg, 1996b). It is therefore appropriate to encode these concepts directly into the database and to use search algorithms that can reason with such concepts.