
Knowledge mining

i) The PROLOG-encoded protein topology database will be the data input to PROGOL.
ii) Known higher-level principles and tendencies of protein structure will be encoded as background knowledge, with frequencies assigned from the raw data. In addition, stereochemical constraints will be encoded.
iii) PROGOL (Muggleton, 1995) will be used to discover new principles and regularities of protein topology which explain the data.
iv) Rules will be accepted if they have sufficient explanatory power and are sensible in terms of the fundamental principles of protein chemistry.
v) Based on the rules from iii) and iv), a hierarchy of inter-related principles will be constructed. This hierarchy will express the substructure organisation of protein topology.
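
As an illustration of steps ii) and iv), the following minimal sketch shows how a frequency could be assigned to a tendency from raw data and then compared with an acceptance threshold. It is written in Python rather than PROLOG purely for readability and is not PROGOL's own evaluation measure. The records, the handedness labels, the function names and the 2/3 threshold are all hypothetical choices made for this example.

```python
from fractions import Fraction

# Toy, made-up records standing in for entries of the topology database.
# Each record is one beta-alpha-beta unit with a hypothetical handedness label.
UNITS = [
    {"protein": "p1", "handedness": "right"},
    {"protein": "p1", "handedness": "right"},
    {"protein": "p2", "handedness": "left"},
    {"protein": "p3", "handedness": "right"},
]


def tendency_frequency(records, holds):
    """Return the fraction of records for which the predicate `holds` is true."""
    if not records:
        raise ValueError("no records to count")
    hits = sum(1 for record in records if holds(record))
    return Fraction(hits, len(records))


def accept(frequency, min_frequency=Fraction(2, 3)):
    """Hypothetical acceptance test: keep a rule whose frequency is high enough."""
    return frequency >= min_frequency


# Frequency of the tendency "a beta-alpha-beta unit is right-handed" in the toy data.
freq = tendency_frequency(UNITS, lambda record: record["handedness"] == "right")
print(freq, accept(freq))
```

In the project itself the second acceptance criterion, that a rule be sensible in terms of protein chemistry, would remain a matter of expert judgement and is not captured by a numeric threshold of this kind.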

The following features of PROGOL should allow us to meet these objectives: (a) the ability to represent and search for relationships within labelled graphs; (b) the ability to maintain hierarchies of related concepts; (c) the ability to encode tendencies in background knowledge, and to express learnt hypotheses, using probabilistic logic programming.
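
To make feature (a) concrete, the sketch below represents a small protein topology as a labelled graph and searches it for one relationship. It is a Python illustration only and does not reproduce the project's PROLOG encoding. The node names, the edge labels `next` and `parallel`, and the strand-helix-strand pattern being searched for are assumptions chosen for this example.

```python
# Nodes: secondary structure elements labelled by type (toy data).
NODES = {"s1": "strand", "h1": "helix", "s2": "strand", "h2": "helix"}

# Edges: (source, target) -> label.
# "next" means the target follows the source along the chain;
# "parallel" means the two strands are parallel neighbours in a sheet.
EDGES = {
    ("s1", "h1"): "next",
    ("h1", "s2"): "next",
    ("s2", "h2"): "next",
    ("s1", "s2"): "parallel",
}


def successors(node, label):
    """Return the nodes reached from `node` by an edge carrying `label`."""
    return [b for (a, b), lab in EDGES.items() if a == node and lab == label]


def find_strand_helix_strand():
    """Find strand-helix-strand runs whose two strands are parallel neighbours."""
    units = []
    for first, kind in NODES.items():
        if kind != "strand":
            continue
        for helix in successors(first, "next"):
            if NODES[helix] != "helix":
                continue
            for second in successors(helix, "next"):
                is_strand = NODES[second] == "strand"
                is_parallel = EDGES.get((first, second)) == "parallel"
                if is_strand and is_parallel:
                    units.append((first, helix, second))
    return units


print(find_strand_helix_strand())
```

In a logic programming setting the same graph would typically be written as ground facts, and the pattern as a clause whose body conjoins the node and edge conditions, which is the form of hypothesis an inductive logic programming system searches over.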



Marcel Turcotte
1999-10-20