Supplementary material for: Muller, A., MacCallum, R.M. & Sternberg, M.J.E. (1999). Benchmarking PSI-BLAST in genome annotation. J. Mol. Biol. 293, 1257-1271

(See also: enhanced fold recognition for proteins in Mycoplasma genitalium via 3D-PSSM.)

Overview & Introduction

The recognition of remote protein homologies is a major aspect of the structural and functional annotation of newly determined genomes. Here we benchmark the coverage and error rate of genome annotation using the widely used homology-searching program PSI-BLAST (position-specific iterated basic local alignment search tool). This study evaluates the one-to-many success rate for recognition, as often there are several homologues in the database and only one needs to be identified for annotating the sequence. In contrast, previous benchmarks considered one-to-one recognition, in which a single query was required to find a particular target. The benchmark constructs a model genome from the full sequences of the Structural Classification of Proteins (SCOP) database and searches against a target library of remote homologous domains (<20% sequence identity). The structural benchmark provides a reliable list of correct and false homology assignments. PSI-BLAST successfully annotated 40% of the domains in the model genome that had at least one homologue in the target library. This coverage is more than twice that obtained when one-to-one recognition is evaluated (11% coverage of domains). Although a structural benchmark was used, the results apply equally to purely sequence-based homology searches. Accordingly, structural and sequence assignments were made to the sequences in the genomes of Mycoplasma genitalium and Mycobacterium tuberculosis.
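To make the difference between one-to-many and one-to-one evaluation concrete, here is a minimal Python sketch on hypothetical toy data (not benchmark data). It assumes that a hit is correct when the target is a true remote homologue of the query (for instance, in the same SCOP superfamily), that one-to-many coverage is the fraction of queries with at least one homologue in the library for which at least one such homologue is hit, and that one-to-one coverage is the fraction of all true query-target pairs hit. The function names and data are illustrative only, not the code used in the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: true remote homologues of each query domain
# (e.g. same SCOP superfamily) and the targets PSI-BLAST reported as hits.
true_homologues: Dict[str, Set[str]] = {
    "q1": {"t1", "t2", "t3"},
    "q2": {"t4"},
    "q3": {"t5", "t6"},
    "q4": set(),  # no homologue in the target library: excluded from coverage
}
hits: Dict[str, Set[str]] = {
    "q1": {"t2"},
    "q2": set(),
    "q3": {"t5", "t9"},  # t9 is not a homologue of q3: a false assignment
    "q4": {"t7"},        # any hit for q4 is a false assignment
}


def one_to_many_coverage(truth: Dict[str, Set[str]], found: Dict[str, Set[str]]) -> float:
    """Fraction of queries (with >= 1 homologue) that hit at least one true homologue."""
    eligible = [q for q, homs in truth.items() if homs]
    annotated = [q for q in eligible if truth[q] & found.get(q, set())]
    return len(annotated) / len(eligible)


def one_to_one_coverage(truth: Dict[str, Set[str]], found: Dict[str, Set[str]]) -> float:
    """Fraction of all true query-target homologue pairs that were hit."""
    pairs = [(q, t) for q, homs in truth.items() for t in homs]
    detected = [(q, t) for q, t in pairs if t in found.get(q, set())]
    return len(detected) / len(pairs)


def false_assignments(truth: Dict[str, Set[str]], found: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Hits whose target is not a true homologue of the query."""
    return [
        (q, t)
        for q, targets in found.items()
        for t in sorted(targets)
        if t not in truth.get(q, set())
    ]


if __name__ == "__main__":
    print(f"one-to-many coverage: {one_to_many_coverage(true_homologues, hits):.2f}")  # 0.67
    print(f"one-to-one coverage:  {one_to_one_coverage(true_homologues, hits):.2f}")  # 0.33
    print("false assignments:", false_assignments(true_homologues, hits))
```

Query q1 has three homologues but needs to find only one, so it counts fully towards one-to-many coverage while contributing just one of three pairs to one-to-one coverage; this is why the one-to-many figure comes out higher on the same set of hits.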

These web pages provide detailed structural and functional annotations for the two genomes, together with the data files essential for the benchmarks.

If you are interested in the complete benchmark (e.g. the SCOP + NRPROT database we used), please contact Arne Muller, a.mueller@cancer.org.uk.

a.mueller@cancer.org.uk
Generated: Thu Jun 27, 2002