PHYRE Protein Fold Recognition Server

Subscribe to Phyre at Google Groups

Email:

Visit Phyre at Google Groups

Protein Homology/analogY Recognition Engine V 2.0

Help

When to run Intensive mode - example

This example illustrates when you might want to run an intensive job after examining the results from a normal mode. The really important thing to remember is that running an intensive job is normally a really bad idea.

Note that the top hit that Phyre2 gives details for is the same solution as hit no. 1 in the list of (up to) 20 that are actually modelled.

Normal mode result and analysis

A normal mode job for the UniProt accession P46060 (which has 587 amino acids) will give a top hit with high confidence and high sequence identity, but with low coverage:

In this particular case, the output from the normal mode job tells you that running intensive mode may be a good idea. There is additional information that supports this suggestion.

The first thing to note is that the "top hit" result has 100% confidence but only 27% coverage, indicating that there is 73% of the sequence which has not been modelled.

Immediately below that output, Phyre2 tells you that

Additional confident templates have been detected (see Domain analysis) which cover other regions of your sequence.
587 residues (100%) could be modelled at > 90% confidence using multiple-templates.
You may wish to try resubmitting your sequence in "intensive" mode to model more of your sequence.

In addition to this, if you look at the "Detailed template information", there are two quite different well-ordered models in the list of results, each of which covers a large proportion of the sequence. The first two models in the list correspond to alpha-alpha superhelices from the C-terminal domains from RanGAP1 proteins, but the third Phyre2 model is an alpha-beta solenoid (or a right-handed beta-alpha superhelix) composed of a leucine-rich repeat region.

The Phyre2 template for hit no. 1 is "d2grrb1";

the leading "d" indicates that it is based on a SCOP2 domain, and was added to the template library quite some time ago (we haven't used SCOP2 classifications to identify fragments for the library since before 2010).
"2grr" is the PDB entry 2GRR
"b" tells us the chain B from the PDB entry, and
The trailing "1" tells us this model was identified as domain 1 by SCOP2. All entries added since 2010 have an underscore "_" in this field.

Hit no. 3 corresponds to the template library entry c1k5dL_, i.e. it's based on chain L from PDB entry 1k5d.

Apart from this having a completely different fold to hits 1 and 2, note that the confidence of this model is also 100%, but the % ID is only 33%.

Checking the Alignment Coverage bar charts (to the left of the ribbon diagrams) it can be seen that the red bars for hits 1 and 2 broadly overlap with each other (~residues 431 - 587) but not with that for hit 3 (22-361).

Taken together, we would like to be able to knit the two different parts together and that is precisely what intensive mode tries to do.

There is one important caveat to be borne in mind. The template library has no information about the linkage between these two templates or about the relative orientation between them, so any model produced from intensive mode that includes both templates is very highly speculative.

Intensive mode contra-indications

The message that "You may wish to try resubmitting your sequence in "intensive" mode" is insufficient on its own to justify running intensive mode. Even when you get this message, there are several pointers that indicate that running it will not give additional useful information.

For example;

If the Coverage is high (say, > ~70%) and there are fewer than ~100 - 150 residues that have not been modelled.
The amount of the sequence that could be modelled at high confidence is only a little more than has been modelled in Normal mode (say, another 10 or 15%)
You get this message:
Warning: XX% of your sequence is predicted disordered.
Disordered regions cannot be meaningfully predicted.
The models in the "detailed template information" are not mostly made up of recognised secondary structure elements like alpha-helices or beta-strands

These are all indicators that it is very unlikely that Intensive mode will help.

Intensive mode result and analysis

If we look at the top of the output from the intensive job run using the same sequence, we can see that the ribbon diagram shows a model that has been built using the models from templates 1 and 3 from the normal mode run. The output on the right of the ribbon diagram has been replaced with some that shows how much of the sequence has been modelled by each of the two templates; in this case, 87% of the sequence has been modelled with > 90% confidence.

A note on Intensive mode and AlphaFold

Observant readers will have noticed that the linked page for UniProt accession P46060 includes a reference to the model produced by AlphaFold (at the time of writing there are ~220,000,000 AlphaFold models available). It's interesting to note that, while the two individual domains predicted by Phyre2 and Alphafold are broadly similar, their mutual orientation is somewhat different.

As noted above, the linking performed by Phyre2 intensive mode is not very reliable - but the linkage in the Alphafold model is also unreliable; the residues concerned are predicted with "very low" confidence, and the PAE matrix plot shows that the spatial relationship of the two domains is speculative.

It is likely that, while both results may be useful, neither should be taken to represent the true complete 3D structure.

Phyre is now FREE for commercial users!

All images and data generated by Phyre2 are free to use in any publication with acknowledgement

Accessibility Statement

Please cite: The Phyre2 web portal for protein modeling, prediction and analysis

Kelley LA et al. Nature Protocols 10, 845-858 (2015) [paper] [Citation link]

© Structural Bioinformatics Group, Imperial College, London
Michael Sternberg
Disclaimer
Terms and Conditions

Phyre2 is part of Genome3D