Next: Knowledge mining and bioinformatics Up: Background Previous: Protein structure

Knowledge mining

Several advances in computer science have been brought together under the title of "data mining" (Langley & Simon, 1995). Techniques range from simple pattern searching to advanced data visualisation and neural networks. Since our aim is to extract comprehensible and communicable scientific knowledge, our approach is better characterised as "knowledge mining". Accordingly, we will apply the machine learning techniques known collectively as inductive logic programming (ILP) (see Muggleton, 1991; Muggleton & De Raedt, 1991; De Raedt, 1996). Induction is the process of obtaining general rules from example data. Logic facilitates the explicit encoding of constraints and relevant prior knowledge, together with machine-generated and testable hypotheses.
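As a rough illustration of what "obtaining general rules from example data" can mean in a logical setting, the sketch below computes a least general generalisation (anti-unification) of two ground facts. It is a toy under stated assumptions, not the algorithm of any particular ILP system: the predicate name `adjacent`, the constants, and the tuple encoding of terms are all hypothetical and chosen only for illustration.

```python
def lgg(t1, t2, table=None):
    """Least general generalisation (anti-unification) of two terms.

    Terms are encoded (as an assumption of this sketch) as:
      - strings for constants, e.g. "s1"
      - tuples for compound terms, e.g. ("adjacent", "s1", "s2")
    Each distinct pair of mismatching subterms is replaced by one variable,
    and the same pair always maps to the same variable.
    """
    if table is None:
        table = {}
    if t1 == t2:
        # Identical subterms are kept as they are.
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # Same functor and arity: generalise argument by argument.
        return (t1[0],) + tuple(
            lgg(a, b, table) for a, b in zip(t1[1:], t2[1:])
        )
    # Mismatch: introduce (or reuse) a variable for this pair of subterms.
    if (t1, t2) not in table:
        table[(t1, t2)] = "X%d" % len(table)
    return table[(t1, t2)]


# Two hypothetical observations about pairs of strands.
example_1 = ("adjacent", "s1", "s2", "antiparallel")
example_2 = ("adjacent", "s3", "s4", "antiparallel")

# The shared structure survives; the differing constants become variables.
print(lgg(example_1, example_2))
```

The result keeps what the two examples have in common (the predicate and the constant `antiparallel`) and replaces what differs with variables, which is the sense in which a general rule is induced from specific observations.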

The system GOLEM (Muggleton & Feng, 1990) was groundbreaking in demonstrating the feasibility of applying ILP to real-world problems (Bratko & Muggleton, 1995). Various restrictions in GOLEM (some identified by its application to bioinformatics; see below) were removed in its successor PROGOL (Muggleton, 1995). For scientific discovery, the data consist mainly of observations from the real world (i.e. positive examples). Previously, ILP systems have also required negative examples, i.e. statements about the way the world is not. Recent advances in PROGOL now enable learning from positive examples only, using Bayesian statistics (Muggleton, 1996). This approach needs to be tested on a real-world problem, and the complexity of protein topology provides an ideal application area.


Marcel Turcotte
1999-10-20