
Interesting rules

In constructing rules, Progol looks for motifs which are common to all the domains of a given fold but almost never encountered in others, except for a limited number of cases set by a user-defined threshold (noise). Features which are important for structure and/or function tend to be conserved amongst members of the same fold, at least up to the level of the super-family. Hence the rules learnt by Progol should be useful in identifying those conserved motifs. In this section we review six rules which were produced automatically and present a possible biological interpretation for each. The complete set of rules is available on our Web site.
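The role of the noise threshold can be illustrated with a minimal sketch. It assumes that a rule is a plain Python predicate over a domain and that the threshold is an absolute count of tolerated negative examples. The names `coverage`, `within_noise` and the toy domain dictionaries are hypothetical and are not part of Progol.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a domain is a dictionary of features,
# and a rule is a predicate that returns True when the domain matches.
Domain = Dict[str, str]
Rule = Callable[[Domain], bool]


def coverage(rule: Rule, positives: List[Domain],
             negatives: List[Domain]) -> Tuple[int, int]:
    """Count how many positive and negative examples the rule covers."""
    pos = sum(1 for d in positives if rule(d))
    neg = sum(1 for d in negatives if rule(d))
    return pos, neg


def within_noise(rule: Rule, positives: List[Domain],
                 negatives: List[Domain], noise: int) -> bool:
    """Accept a rule that covers some positives and at most `noise` negatives.

    Assumption: `noise` is an absolute number of tolerated exceptions.
    """
    pos, neg = coverage(rule, positives, negatives)
    return pos > 0 and neg <= noise


# Toy data (not real proteins): domains of the target fold start with a helix.
positives = [{"first": "helix"}, {"first": "helix"}, {"first": "helix"}]
negatives = [{"first": "strand"}, {"first": "helix"}, {"first": "strand"}]


def starts_with_helix(domain: Domain) -> bool:
    return domain["first"] == "helix"
```

With these toy examples the rule covers all three positives and one negative, so it would be accepted with `noise=1` and rejected with `noise=0`.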

Rule 1 (Globin fold)   Helix A at position 1 is followed by helix B. B contains a proline residue. 

The Globin fold is a good example of divergent evolution. In SCOP, this fold comprises diverse sequences such as myoglobin, hemoglobin and phycocyanins (but not colicins). Yet the three-dimensional structure of these proteins is well preserved. One hallmark of this fold is the presence of a conserved proline residue in helix B, which causes a sharp bend in the main chain. This observation has been reported previously by Bashford et al. [7]; see the corresponding Web page.

Rule 2 (4-helical cytokines)   The first helix is long and followed by another helix. 

Rule 3 (4-helical cytokines)   The second strand is immediately followed by a helix. 

Often, Progol produces more than one rule to cover all the positive examples of a fold. Similarly, the SCOP classification often has more than one family and/or more than one super-family per fold. Sometimes the mapping of the rules onto the examples matches that of SCOP. This occurs for the 4-helical cytokines, which have two families, the long-chain and the short-chain cytokines. Members of the long-chain cytokines family all start with a long helix, as observed by Progol (Rule 2), while the distinctive feature of the short-chain cytokines is the absence of a coil within the last strand-helix pair (Rule 3). Although these proteins have been classified in the same family, their sequences are quite divergent (with pairwise distances within the so-called twilight zone).

Rule 4 (lambda repressor)   Helix C at position 3 is followed by helix D. The protein is between 53 and 88 residues long. The coil between C and D is about 6 residues long. 

The Cro and repressor proteins of all three bacteriophages, $\lambda$, 434 and P22, bind DNA in a similar way. The second helix of a helix-turn-helix motif makes sequence-specific contacts with the edge of base pairs situated in the major groove of the DNA molecule; see the corresponding Web page. The specificity for the different operator regions results from surface complementarity. A specific DNA sequence, a TA-rich region, allows for a specific deformation which makes favorable contacts with the loop region between the second helix of the helix-turn-helix motif and the following one. Rule 4 covers the following bacteriophage domains: lambda C1 repressor (1lmb), 434 C1 repressor (2r63), cro 434 (2cro) and P22 C2 repressor (1adr), but also the eukaryotic Oct-1 POU-specific domain (1pou). The POU-specific domain contains a helix-turn-helix motif and binds DNA in a similar way to the prokaryotic transcription factors Cro and repressor. Based on this evidence a common origin has been suggested. Here, we find that the similarity also extends to the conservation of the length of the loop between the recognition helix (C) and the one that follows (D).
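To make the three conditions of Rule 4 concrete, the following sketch expresses them as a Python predicate. The segment-list representation, the function name `rule4`, the inclusive length bounds and the tolerance of one residue used to interpret "about 6" are all assumptions made for illustration. They do not describe how Progol encodes or evaluates the rule.

```python
from typing import List, Tuple

# Hypothetical representation: a domain is an ordered list of
# (kind, length) segments, where kind is "helix", "strand" or "coil".
Segment = Tuple[str, int]


def rule4(segments: List[Segment], tolerance: int = 1) -> bool:
    """Check the three conditions of Rule 4 on a list of segments.

    Assumptions: positions count helices and strands only, the length
    bounds are inclusive, and "about 6" means 6 +/- `tolerance`.
    """
    # Condition: the protein is between 53 and 88 residues long.
    total = sum(length for _, length in segments)
    if not 53 <= total <= 88:
        return False

    # Indices of the secondary structure elements (everything but coils).
    sse = [i for i, (kind, _) in enumerate(segments) if kind != "coil"]
    if len(sse) < 4:
        return False

    # Condition: the element at position 3 is a helix followed by a helix.
    third, fourth = sse[2], sse[3]
    if segments[third][0] != "helix" or segments[fourth][0] != "helix":
        return False

    # Condition: the coil between the two helices is about 6 residues long.
    coil_length = sum(length for _, length in segments[third + 1:fourth])
    return abs(coil_length - 6) <= tolerance


# Toy domain (not a real protein): five helices, 70 residues in total,
# with a 6-residue coil between the third and the fourth helix.
toy_domain = [
    ("coil", 2), ("helix", 12), ("coil", 3), ("helix", 8), ("coil", 4),
    ("helix", 9), ("coil", 6), ("helix", 10), ("coil", 5), ("helix", 8),
    ("coil", 3),
]
```

The toy domain satisfies all three conditions. Shortening the coil between the third and fourth helices to 2 residues, or replacing the fourth helix with a strand, makes the predicate fail.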

Rule 5 (Rossmann fold)   Strand A at position 1 is followed by helix B. Strand C at position 6 is followed by helix D. The coil between A and B is about one residue long. 

NAD-binding domains of the Rossmann fold all have a similar binding mechanism. The adenosine is bound to the short loop between the first strand and the following helix. The region is embedded in a $\beta-\alpha-\beta$ motif which is highly conserved and contains the sequence motif G-X-G-X-X-G. The fifth and sixth secondary structures clamp the nicotinamide moiety of NAD; see the corresponding Web page.

Rule 6 (P-loop)   Strand A at position 1 is followed by helix B. The coil between A and B is about 5 residues long. 

The first loop of the P-loop fold is necessary for the proper binding of the guanine nucleotide; see the corresponding Web page. The loop is also called the diphosphate-binding loop or P-loop, and gives its name to the fold. The strand-loop-helix region contains the conserved sequence motif G-X-X-X-X-G-K-S/T, which is used in the PROSITE database [8] to characterise the ATP/GTP-binding motif.
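Sequence motifs written in the dash notation used in this section, such as G-X-X-X-X-G-K-S/T or G-X-G-X-X-G, can be searched for with ordinary regular expressions. The sketch below assumes that X stands for any residue and that S/T means either S or T. The helper names `motif_to_regex` and `find_motif` are hypothetical, and the test sequences are synthetic strings rather than real proteins.

```python
import re
from typing import List


def motif_to_regex(motif: str) -> str:
    """Translate a dash-separated motif into a regular expression.

    Assumptions: "X" matches any residue and "S/T" matches S or T.
    """
    parts = []
    for token in motif.split("-"):
        if token.upper() == "X":
            parts.append(".")  # any residue
        elif "/" in token:
            parts.append("[" + "".join(token.split("/")) + "]")  # alternatives
        else:
            parts.append(re.escape(token))  # a fixed residue
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> List[int]:
    """Return the 0-based start of every (possibly overlapping) occurrence."""
    # A lookahead matches without consuming, so overlapping hits are kept.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


P_LOOP_MOTIF = "G-X-X-X-X-G-K-S/T"  # motif quoted for the P-loop fold
ROSSMANN_MOTIF = "G-X-G-X-X-G"      # motif quoted for the Rossmann fold

# Synthetic sequences built to contain exactly one occurrence each.
p_loop_hits = find_motif(P_LOOP_MOTIF, "AAAGAAAAGKSAAA")
rossmann_hits = find_motif(ROSSMANN_MOTIF, "AAGAGAAGAA")
```

A sequence match of this kind says nothing about structure. The rules above add the structural context, namely that the motif sits in the coil between the first strand and the following helix.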


Marcel Turcotte
1999-10-20