Please see our recently published paper in Nature Protocols for a detailed tutorial on how to use Phyre:
Protein structure prediction on the web: a case study using the Phyre server. Kelley LA and Sternberg MJE. Nature Protocols 4, 363-371 (2009)
A detailed tutorial will hopefully follow one day. For now, here is a FAQ.
How do I cite Phyre?
What do the E-values and Estimated Precision Values mean?
What does 'Query Sequence Conservation XX%' mean?
What does 'Query Sequence Evolutionary Trace' mean?
How, basically, does Phyre work?
Can I model the effect of single residue mutations?
How are the "Functional Keywords" generated?
Phyre only modelled a small portion of my sequence - why?
Protein structure prediction on the web: a case study using the Phyre server. Kelley LA and Sternberg MJE. Nature Protocols 4, 363-371 (2009)
For more detail regarding the algorithms used, please see: Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre.
Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Proteins: Structure, Function, Bioinformatics, vol 70, 3, (2008).
For a more historical perspective, please see our paper on 3D-PSSM:
Enhanced genome annotation using structural profiles in the program 3D-PSSM
LA Kelley, RM MacCallum, MJ Sternberg - J. Mol. Biol, vol 299, pg 499-520, 2000
E-values should be considered an internal scoring scheme. We have benchmarked the system on an extensive set of known remote homologies and calculated how often a given e-value is assigned to true and false homologies. This is the basis for our 'Estimated precision' calculation. A typical result will present an e-value and, next to it, an estimated precision value. The estimated precision is the percentage of benchmark matches at that e-value or better that were found to be true homologies. Thus 80% estimated precision says that, on our benchmark set, 80% of hits with an e-value equal to or lower than that reported were correct homologues, and 20% were false positives.
The simple way to interpret the estimated precision is 'the likelihood that the match is correct'. However, it does not necessarily reflect detailed accuracies or inaccuracies in the model caused by alignment errors. This remains an outstanding and difficult problem in protein structure prediction.
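As a rough illustration of the calculation described above, the following Python sketch computes an estimated precision from a list of benchmark hits. The function name, the data layout and the benchmark values are all hypothetical; this is not Phyre's own code or benchmark, only the arithmetic implied by the description.

```python
from typing import List, Tuple


def estimated_precision(benchmark: List[Tuple[float, bool]], e_value: float) -> float:
    """Percentage of benchmark hits with an E-value <= e_value that are true homologues.

    `benchmark` is a list of (e_value, is_true_homologue) pairs (hypothetical layout).
    """
    selected = [is_true for e, is_true in benchmark if e <= e_value]
    if not selected:
        raise ValueError("no benchmark hits at or below this E-value")
    return 100.0 * sum(selected) / len(selected)


# Made-up benchmark data, for illustration only.
benchmark = [
    (1e-10, True), (1e-8, True), (1e-5, True), (1e-3, True),
    (1e-3, False), (0.1, True), (0.5, False), (1.0, False),
]

print(estimated_precision(benchmark, 1e-3))  # 4 of the 5 hits at or below 1e-3 are true
```

With this toy data, a hit reported at an e-value of 1e-3 would carry an estimated precision of 80%, and a hit at 1e-5 would carry 100%.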
Your query sequence is scanned against a non-redundant database of all currently known protein sequences using PSI-Blast. This permits us to generate a pseudo-multiple sequence alignment of all the detectable homologues of your sequence. We would like to know which positions in your sequence are more conserved than others as this may indicate that these residues are involved in function. However, relative conservation is heavily influenced by redundancy - i.e. having many similar sequences in the alignment. Thus Phyre performs a conservation calculation at different levels of redundancy. 'Query Sequence Conservation 40%' means all sequences with >40% mutual sequence identity are removed from the alignment before conservation calculation.
Red positions are most conserved, blue least conserved.
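The redundancy filtering and conservation steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the greedy filter (query kept first), the identity measure and the "fraction of the most common residue" conservation score are illustrative choices, and the function names are hypothetical. The actual conservation measure used by Phyre is not specified here.

```python
from collections import Counter
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_redundancy(alignment: List[str], max_identity: float) -> List[str]:
    """Greedily keep sequences (query first) whose identity to every
    already-kept sequence is <= max_identity. Assumed strategy."""
    kept = [alignment[0]]
    for seq in alignment[1:]:
        if all(percent_identity(seq, k) <= max_identity for k in kept):
            kept.append(seq)
    return kept


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of non-gap residues in each column matching the most common residue."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(residues))
    return scores


# Toy alignment; the first row plays the role of the query.
alignment = ["ACDEF", "ACDEF", "ACDEY", "AKLMF", "GCWNP"]
kept = filter_redundancy(alignment, 40.0)
print(kept)
print(column_conservation(kept))
```

Here the two near-identical sequences are removed at the 40% level, so they no longer inflate the apparent conservation of every column.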
Evolutionary trace is a technique originated and popularised by Lichtarge and colleagues (Lichtarge 1996, Mihalek 2004). This method ranks the relative functional importance of amino acids in a protein sequence by correlating their variations during evolution with divergences in the phylogenetic tree of that sequence family. Lichtarge et al. have shown that the best-ranked residues typically cluster spatially in the protein structure (Madabushi 2002) and thereby reveal the location of functional sites (Yao 2003). This approach is similar to laboratory-based mutational scanning, but it exploits the vast number of mutations and assays that have already been tested through evolution, and that are increasingly retrievable from sequence and structure databases.
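A much simplified sketch of the ranking idea is given below in Python. It assumes the integer-rank form of the method, in which a position's rank is the smallest number of tree branches for which the position is invariant within every branch. The tree is supplied as precomputed partitions rather than built from the sequences, and the function name and data are hypothetical. It is not the implementation used by Phyre.

```python
from typing import List


def trace_ranks(alignment: List[str], partitions: List[List[List[int]]]) -> List[int]:
    """Rank each alignment column.

    `partitions` lists successive cuts of the phylogenetic tree, ordered from
    coarse (one group) to fine (one group per sequence). Each partition is a
    list of groups of sequence indices. A column's rank is the number of groups
    in the first partition for which the column is invariant within every group.
    Lower ranks suggest greater functional importance.
    """
    ranks = []
    for column in zip(*alignment):
        rank = len(alignment) + 1  # fallback if no partition makes the column invariant
        for groups in partitions:
            if all(len({column[i] for i in group}) == 1 for group in groups):
                rank = len(groups)
                break
        ranks.append(rank)
    return ranks


# Toy family of four sequences with the assumed tree ((0,1),(2,3)).
alignment = ["AKLD", "AKMD", "ARLE", "ARME"]
partitions = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
    [[0, 1], [2], [3]],
    [[0], [1], [2], [3]],
]
print(trace_ranks(alignment, partitions))
```

In this toy example the first column is invariant across the whole family (rank 1), the second and fourth columns vary only between the two main branches (rank 2), and the third column varies within branches (rank 4).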
The models produced by Phyre are based on finding a sequence alignment to a known structure, copying the coordinates and relabelling the residues according to your sequence (based on the alignment).
We can detect remotely homologous structures that can't be found by conventional methods. This is because we use profiles (or PSSMs) generated by PSI-Blast for both your sequence and the sequences of the known structures. Phyre performs a profile-profile matching algorithm together with predicted secondary structure matching.
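To give a feel for profile-profile matching combined with secondary structure matching, here is a toy Python sketch. Everything in it is an assumption made for illustration: the column score (a dot product of profile rows plus a fixed bonus or penalty for secondary structure agreement), the linear gap penalty, the Smith-Waterman style local alignment and all names and values. The scoring function actually used by Phyre is not described here.

```python
from typing import List, Sequence


def column_score(p: Sequence[float], q: Sequence[float],
                 ss_p: str, ss_q: str, ss_bonus: float = 0.5) -> float:
    """Score one query profile column against one template profile column."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot + (ss_bonus if ss_p == ss_q else -ss_bonus)


def local_profile_score(query_profile: List[Sequence[float]],
                        template_profile: List[Sequence[float]],
                        query_ss: str, template_ss: str,
                        gap_penalty: float = 1.0) -> float:
    """Best local alignment score between two profiles (Smith-Waterman recursion)."""
    rows, cols = len(query_profile), len(template_profile)
    h = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            s = column_score(query_profile[i - 1], template_profile[j - 1],
                             query_ss[i - 1], template_ss[j - 1])
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + s,        # align the two columns
                          h[i - 1][j] - gap_penalty,  # gap in the template
                          h[i][j - 1] - gap_penalty)  # gap in the query
            best = max(best, h[i][j])
    return best


# Toy profiles over a two-letter alphabet, with predicted secondary structure strings.
query = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]
similar = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]
dissimilar = [[-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]

print(local_profile_score(query, similar, "HHE", "HHE"))
print(local_profile_score(query, dissimilar, "HHE", "EEH"))
```

The template whose profile and secondary structure agree with the query scores higher than the one that disagrees, which is the behaviour a fold library search relies on when ranking templates.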
The only changes Phyre makes to the backbone of the known structure (the template) are at insertions or deletions, which are modelled by searching our loop library for compatible loops.
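The "copy the coordinates and relabel the residues" step can be illustrated with a short Python sketch. It assumes a gapped pairwise alignment given as two equal-length strings and a list of template C-alpha coordinates. The function name and data are hypothetical, and loop modelling is deliberately left out: query residues aligned to a gap in the template are simply reported as unmodelled.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def thread_onto_template(query_aln: str, template_aln: str,
                         template_ca: List[Coord]):
    """Copy template coordinates onto aligned query residues.

    Returns (model, unmodelled), where model is a list of
    (query_position, query_residue, coordinate) with 1-based positions, and
    unmodelled lists query positions that fall in insertions.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    if sum(c != "-" for c in template_aln) != len(template_ca):
        raise ValueError("template residues and coordinates do not match")

    model, unmodelled = [], []
    q_i = t_i = 0  # residues consumed so far in query and template
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            model.append((q_i + 1, q, template_ca[t_i]))  # relabel template residue
        elif q != "-":
            unmodelled.append(q_i + 1)  # insertion: would need a loop
        # a gap in the query (deletion) just skips the template residue
        q_i += q != "-"
        t_i += t != "-"
    return model, unmodelled


query_aln = "MK-LV"
template_aln = "MRTL-"
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

model, unmodelled = thread_onto_template(query_aln, template_aln, template_ca)
print(model)
print(unmodelled)
```

In this example the template residue R is relabelled as K, the template residue T is dropped (a deletion), and query position 4 is left for loop modelling (an insertion).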
Small sequence changes will have almost no effect on the resulting model except for sidechains. This is because the same alignment to the same known structure will probably be generated. Phyre cannot model very small structural changes.
The functional keywords are found by gathering homologues to your sequence from Swissprot, taking the keywords associated with the Swissprot homologues and weighting them according to their background frequency across the whole Swissprot database.
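One plausible way to weight keywords against their background frequency is a log-odds score, sketched below in Python. The formula, the function name and the frequencies are assumptions for illustration. The exact weighting used by Phyre is not given here.

```python
import math
from collections import Counter
from typing import Dict, List


def score_keywords(homologue_keywords: List[List[str]],
                   background_freq: Dict[str, float]) -> Dict[str, float]:
    """Log2 ratio of a keyword's frequency among the homologues to its
    background frequency across the whole database (assumed scoring scheme)."""
    n = len(homologue_keywords)
    counts = Counter(kw for kws in homologue_keywords for kw in set(kws))
    scores = {}
    for kw, count in counts.items():
        bg = background_freq.get(kw)
        if not bg:
            continue  # no background estimate available for this keyword
        scores[kw] = math.log2((count / n) / bg)
    return scores


# Keywords of four hypothetical homologues and made-up background frequencies.
homologue_keywords = [
    ["Kinase", "Transferase"],
    ["Kinase", "ATP-binding"],
    ["Kinase", "Transferase"],
    ["Transferase"],
]
background_freq = {"Kinase": 0.03, "Transferase": 0.15, "ATP-binding": 0.10}

scores = score_keywords(homologue_keywords, background_freq)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

"Kinase" and "Transferase" appear equally often among the homologues, but "Kinase" ranks higher because it is rarer in the background, which is the effect the weighting is meant to achieve.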
By default, Phyre performs a local alignment of your sequence against our fold library. This means that if a small portion of your sequence shows significant homology to a domain in our library, often only that domain will be modelled. We will instate a global-local mode for Phyre shortly. In the meantime, however, if Phyre returns a confident prediction for a subsequence of your sequence, I suggest you cut out that confidently modelled section and resubmit the two remaining portions to see if confident matches can be found for those subsequences. Lawrence Kelley
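The cut-and-resubmit workaround can be prepared with a few lines of Python. This sketch assumes the confidently modelled region is given as 1-based inclusive positions in the query. The function name and the `min_length` parameter are hypothetical conveniences, not part of Phyre.

```python
from typing import List, Tuple


def remaining_portions(sequence: str, start: int, end: int,
                       min_length: int = 1) -> List[Tuple[int, int, str]]:
    """Return the portions on either side of the modelled region [start, end].

    Positions are 1-based and inclusive. Each result is
    (first_position, last_position, subsequence). Portions shorter than
    min_length are dropped.
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("modelled region must lie within the sequence")
    n_term = (1, start - 1, sequence[:start - 1])
    c_term = (end + 1, len(sequence), sequence[end:])
    return [p for p in (n_term, c_term) if len(p[2]) >= min_length]


sequence = "MKTAYIAKQR"
# Suppose residues 4-7 ("AYIA") were confidently modelled.
for first, last, subseq in remaining_portions(sequence, 4, 7):
    print(first, last, subseq)
```

Keeping the original positions alongside each portion makes it easier to map any new models back onto the full-length sequence.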
© Structural Bioinformatics Group, Imperial College, London
Lawrence Kelley, Riccardo Bennett-Lovsey & Alex Herbert