A Guide to Structure Prediction (version 2)

These pages were created to accompany a joint CCP11/British Biophysical Society Meeting:
Getting the Most from your Protein Sequence
which was held at the Wellcome Trust, London on the 11th of March 1996.

Preface to version 2 (September 1999)

Thanks to the hundreds of people who e-mailed me saying that they wanted this site maintained. I only had one day to update the server, so I expect that there will be several omissions and problems. Please let me know if you find any, or if you have any suggestions.

The world has moved on considerably since I did the original pages, so many of the details have been removed or modified substantially to reflect the changing times. Please let me know if, in the process of doing this, I have removed anything that you found useful.


This is by no means intended to be a comprehensive guide to predicting protein 3D structure. Rather, I have tried as best I can to summarise my general approach to the problem in a manner that I hope is useful and not too difficult to follow. I apologise in advance for failing to include various references, WWW sites, etc. I would strongly recommend exploring the WWW pages given here, and following their links to "related sites". In this way you should get a more comprehensive picture of what is available.

The assumption is that you have the sequence of a protein that you want to know more about. Before you start, remember that this approach will not always provide satisfying or complete answers. However, it is increasingly rare that the techniques described here fail to shed any light on a protein sequence. Spending just a little time analysing a sequence can save time and money by aiding experimental design.

I should emphasise that the title of my talk at the above meeting was Secondary structure prediction and fold recognition. The contents of these pages are thus heavily biased towards these two subjects (e.g. there are no figures for most of the other sections). In most cases, however, the other sections contain links that can give you more information about their subjects.

A Flowchart for Structure Prediction

The clickable flowchart shows what I think is a generalised approach to predicting protein structure. Most regions of the flowchart are described in separate sections below. Ideally, a protein sequence goes in one end and a protein structure comes out of the other. Be warned that this is not always possible. Nevertheless, the other sections of the flowchart can provide useful insights into protein structure and function, and provide information that can aid experimental design.

The contents of this guide (all reachable via the flowchart) have been divided into several sections:

About the figures

Quite a few people have asked me about the figures in the pages above. Pictures of protein three-dimensional structures were drawn using Suhail Islam's program PREPI (available from this server).

The pretty alignment shown in the secondary structure prediction section was drawn using Geoff Barton's ALSCRIPT program.

I recommend them both.

More information

There are hundreds, if not thousands, of WWW sites offering more information about tools for analysis in molecular biology. Some good starting points are:

This server also contains a relatively up-to-date list of links related to protein structure prediction and analysis, see here.

What to do for help?

Obviously, much of the above involves a lot of technical detail that is impossible to put into this document. For technical help, I suggest you get in touch with someone who does this sort of work, or alternatively you can try the newsgroups, such as:

I hope this guide has helped, and I wish you every success in your predictions. Please send me comments if you have them, particularly if you think of other things that should be included in this document.