STRUCTURAL DATA USED FOR BENCHMARKING

The algorithm was applied to 284 chains (Table 1) drawn from the representative data set established by Hobohm et al. (1992). We used the most restrictive set, in which no two chains share more than 25% sequence identity over aligned subsequences of 80 or more residues. The list of chains was obtained from the fileserver (netser@embl-heidelberg.de), and the coordinates were taken from the October 1993 release of the Brookhaven databank (Bernstein et al., 1977). Coordinates were not available for a few chains in the Hobohm et al. (1992) list. In addition, for some chains no suitable domain assignment could be found in the literature; such a chain was replaced by a homologue if one was available and was otherwise excluded.
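
The selection rules above can be illustrated with a minimal Python sketch. It is not the procedure of Hobohm et al. (1992) nor the code used in this work. The names PairAlignment, violates_cutoff and select_chain, the chain labels, and the homologue table are all hypothetical. The sketch also assumes that a substitute homologue must itself have both coordinates and a literature domain assignment, which the text does not state explicitly.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class PairAlignment(NamedTuple):
    """Summary of one pairwise alignment (hypothetical record layout)."""
    chain_a: str
    chain_b: str
    aligned_length: int       # residues in the aligned subsequence
    percent_identity: float   # identity over that subsequence, in percent


def violates_cutoff(aln: PairAlignment,
                    max_identity: float = 25.0,
                    min_length: int = 80) -> bool:
    """True if the pair is too similar for the most restrictive set:
    more than 25% identity over an alignment of 80 or more residues."""
    return (aln.aligned_length >= min_length
            and aln.percent_identity > max_identity)


def select_chain(chain: str,
                 usable: Set[str],
                 homologues: Dict[str, List[str]]) -> Optional[str]:
    """Return the listed chain if usable, else the first usable homologue,
    else None (chain excluded).

    Assumption: `usable` holds chains that have both coordinates and a
    literature domain assignment.
    """
    for candidate in [chain, *homologues.get(chain, [])]:
        if candidate in usable:
            return candidate
    return None
```

For example, with usable = {"chainA", "chainC"} and homologues = {"chainB": ["chainC"]}, select_chain keeps chainA, replaces chainB with chainC, and excludes any chain that has no usable homologue.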