Locating domains

If you have a sequence of more than about 500 amino acids, you can be nearly certain that it is divided into discrete functional domains. If possible, split such large proteins up and consider each domain separately. You can predict the location of domains in several ways. The methods below are listed (approximately) from most to least reliable.
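Once you have domain boundaries, splitting the sequence is simple bookkeeping. The sketch below assumes the boundaries are already known (from whichever prediction method you used) and are given as 1-based, inclusive residue ranges. It writes one FASTA record per domain so that each can be submitted to a search on its own. The function name, the record-naming scheme and the 500-residue constant are illustrative choices, not part of any particular tool.

```python
from typing import List, Tuple

# Rule-of-thumb length above which a protein is probably multi-domain
# (illustrative constant taken from the ~500-residue guideline).
MULTI_DOMAIN_LENGTH = 500


def split_into_domains(name: str, seq: str,
                       boundaries: List[Tuple[int, int]]) -> str:
    """Return FASTA text with one record per domain.

    `boundaries` holds hypothetical domain limits as 1-based, inclusive
    (start, end) residue positions, e.g. from a domain prediction.
    """
    records = []
    for i, (start, end) in enumerate(boundaries, start=1):
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"bad boundary ({start}, {end}) for length {len(seq)}")
        fragment = seq[start - 1:end]  # convert 1-based inclusive to a 0-based slice
        records.append(f">{name}_dom{i}_{start}-{end}\n{fragment}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    protein = "M" + "A" * 299 + "G" * 300  # toy 600-residue sequence
    if len(protein) > MULTI_DOMAIN_LENGTH:
        print(f"{len(protein)} residues: probably multi-domain")
    print(split_into_domains("query", protein, [(1, 300), (301, 600)]), end="")
```

Keeping the coordinates in the header (e.g. `query_dom2_301-600`) makes it easy to map hits back onto the full-length sequence later.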

If you have separated a sequence into domains, it is very important to repeat all the database searches and alignments using each domain separately. Searches with sequences containing several domains may not find all sub-homologies, particularly if the domains are abundant in the database (e.g. kinases, SH2 domains). There may also be "hidden" domains. For example, if a stretch of 80 amino acids with few homologues lies between a kinase and an SH2 domain, a search with the whole sequence may miss matches to that stretch, since the hit list will be dominated by the many kinase and SH2 homologues. Searching the stretch on its own gives those weaker matches a chance to be found.
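A simple way to spot candidate "hidden" domains is to list the stretches of the sequence that none of your assigned domains cover. The sketch below assumes the domains are given as 1-based, inclusive (start, end) ranges, possibly overlapping, and reports every uncovered stretch of at least `min_len` residues. The function name, the `min_len` threshold and the kinase/SH2 coordinates in the example are hypothetical, chosen only to mirror the 80-residue case in the text.

```python
from typing import List, Tuple


def find_unassigned_regions(seq_length: int,
                            domains: List[Tuple[int, int]],
                            min_len: int = 50) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) stretches not covered by any domain.

    Only stretches of at least `min_len` residues are reported; overlapping
    domain ranges are handled by tracking the first uncovered residue.
    """
    gaps = []
    next_free = 1  # first residue not yet covered by a domain
    for start, end in sorted(domains):
        gap_len = start - next_free
        if gap_len > 0 and gap_len >= min_len:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    tail_len = seq_length - next_free + 1  # uncovered residues after the last domain
    if tail_len > 0 and tail_len >= min_len:
        gaps.append((next_free, seq_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical 450-residue protein: kinase at 1-250, SH2 at 331-430.
    print(find_unassigned_regions(450, [(1, 250), (331, 430)], min_len=50))
```

For the hypothetical layout above this reports the 80-residue linker at 251-330, which would then be searched against the database by itself.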

