Workshop at

APRIL II: Probabilistic Inductive Logic Learning and its Applications to Biology

Luc De Raedt, Francois Fages, Paolo Frasconi, Heikki Mannila, Stephen Muggleton, Mike Sternberg

A tutorial/workshop to be held concurrently with the ECML/PKDD conference in Berlin.

The Workshop will be held on September 22-23, 2006 in conjunction with the 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases in Berlin, Germany.

If you would like to register for the workshop or submit a poster, please e-mail Lawrence Kelley.

Workshop Description

The APRIL II project is a consortium consisting of 5 partners from 5 EU member states with expertise in various areas of machine learning and biology. In this combined tutorial and workshop, we will explore recent work by the consortium on the theory and application of probabilistic logic representations to difficult problems in biology.

The term probabilistic in our context refers to the use of probabilistic representations and reasoning mechanisms grounded in probability theory, such as Bayesian networks, hidden Markov models and stochastic grammars. Such representations have been successfully used across a wide range of applications and have resulted in a number of robust models for reasoning about uncertainty. Application areas include genetics, bioinformatics, computer vision, speech recognition and understanding, diagnostics and troubleshooting, information retrieval, software debugging, data mining and user modelling.

The term logic in the present description of work refers to representations based on the predicate calculus (i.e., first-order logic) such as those studied within the field of computational logic. The primary advantage of such representations is that they allow one to elegantly represent complex situations involving a variety of objects as well as relations among those objects.

The term learning in the context of probabilistic logic refers to deriving the different aspects of a model in a probabilistic logic on the basis of data. Typically, one distinguishes various learning algorithms on the basis of the given data (fully or partially observable data) or on the aspect being learned (the parameters of the probabilistic representation or its logical structure). The motivation for learning is that it is often easier to obtain data for a given application domain and learn the model than to build the model using traditional knowledge engineering techniques.

Probabilistic logic learning thus aims to combine its three underlying constituents: learning, probabilistic reasoning and first-order logic representations.
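As a rough illustration of how the three constituents fit together, the following minimal Python sketch estimates the parameters of a tiny probabilistic logic model from fully observable data and then answers a query. It is a toy under stated assumptions, not a method used by the consortium: the facts `mutation` and `exposure`, the rule for `disease`, the data and the helper names are all hypothetical, the facts are assumed to be independent, and inference is done by brute-force enumeration of truth assignments.

```python
from itertools import product

# Hypothetical toy model (all names and data are illustrative assumptions):
#   p1 :: mutation.      p2 :: exposure.      (independent probabilistic facts)
#   disease :- mutation, exposure.            (a purely logical rule)


def estimate_parameters(observations):
    """Maximum-likelihood estimate for each probabilistic fact from fully
    observed data: the relative frequency with which the fact is true."""
    facts = observations[0].keys()
    n = len(observations)
    return {f: sum(obs[f] for obs in observations) / n for f in facts}


def query_probability(params, rule):
    """Sum the probabilities of all truth assignments to the facts
    (possible worlds) in which the logical rule holds."""
    facts = sorted(params)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        weight = 1.0
        for f in facts:
            weight *= params[f] if world[f] else 1.0 - params[f]
        if rule(world):
            total += weight
    return total


# Fully observable data: every fact is observed in every example.
observations = [
    {"mutation": True, "exposure": True},
    {"mutation": False, "exposure": True},
    {"mutation": False, "exposure": False},
    {"mutation": True, "exposure": False},
]

params = estimate_parameters(observations)
p_disease = query_probability(params, lambda w: w["mutation"] and w["exposure"])
```

Here only the parameters are learned while the logical structure (the rule) is given. Learning the structure itself, or learning from partially observable data, requires more elaborate algorithms than this counting scheme.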

Several biological applications of these methods are being developed by the APRIL II consortium, including:

Protein folding: Understanding the organisation of protein fold space. Structural genomics is revealing numerous different protein folds: today more than 600 folds are known, and this number is expected to double over the next few years. Understanding the organisation of the complex arrangement of the component secondary structures is central to understanding the relationships that result from evolutionary constraints and to predicting function from structure. This knowledge is a major component of extracting functional information from protein folds, with its potential medical benefit. The complexity of the inter-relationships requires a robust formal method of learning that combines probability and logic, rather than ad hoc combinations derived for individual applications. With a robust learning structure, there is considerable scope for computationally driven advances in both fundamental and applied research.

Metabolic pathways: As was shown in the assessment project, probabilistic logic learning is needed for properly integrating the current knowledge about the complex systems formed by biochemical processes. Applying these techniques to a real-size problem, such as mammalian cell cycle control, would allow one to enrich the existing models, fill their gaps with respect to some temporal properties, and correct errors or suggest biological experiments.

Haplotype structure for gene mapping: Gene mapping, i.e., discovery of genes predisposing to diseases, is crucial for understanding the genetic background of diseases and for finding good targets for drug development. Recent advances in genetic data measurement techniques, such as the development of dense SNP marker maps, require new techniques in data analysis as well. Understanding the haplotype structure of the human genome is crucial for gene mapping. The goal of this application area is to develop probabilistic logic methods for finding the haplotype structure in human populations, and to develop techniques for comparing the haplotype information against phenotypic data. The results can be immediately applied to gene mapping.

Programme

The full programme is now available.

Call for Posters and Registration

Authors are invited to submit original work in the form of a poster. Accepted posters will appear in electronic proceedings, and authors will be asked to display their posters during the workshop.

Topics of Interest

We ask authors to submit work related to the application of probabilistic logic learning techniques to real-world data. Submissions are not restricted to biological applications, although these are preferred.

If you would like to register for the workshop or submit a poster, please e-mail Lawrence Kelley.

Please note that you do not have to be registered for ECML 06 in order to attend this workshop.

For further details on the APRIL II project, please visit the main APRIL site.