Graduate student in Mike Sternberg's group at Cancer Research UK & Imperial College, London, between 1998 and 2002.
I'm now working in Bioinformatics/Systems Toxicology at Novartis. Email contact: forename.surname [AT] gmail com
In 1998 I got my Diploma in Biology (at the Georg-August University of Goettingen,
Germany). My special interest was proteins and their three-dimensional
structure. With some programming skills in C and some development work
on RasMol (thanks Roger!) I jumped into protein-structure-based bioinformatics.
Why? Hmm ..., because I realized that I can be a software developer
and a biologist at the same time - and I don't have to work in a wet-lab
anymore ;-) . Anyway, after four years of bioinformatics I have to admit
things are not much easier just because there is no protein to purify or
cells to grow, so here we go!
I was working on structural genome annotation.
My latest project in Mike Sternberg's lab (2004) involved the development and maintenance of a database for structural and functional genome annotation. The generated annotation is compared across fully sequenced genomes. In particular, the protein domain family compositions of genomes are compared in several different ways and contexts. The goals of the project were:
A) General evolutionary insights, e.g. how protein families and superfamilies have evolved.
B) How domain families have been recruited and are used in a new functional context (e.g. domain combinations, evolution of repeats within proteins, globular domains in transmembrane proteins, domains from disease genes).
C) Provide access to a broad database of structural and functional annotation.
D) Provide a research platform for projects within our lab. The database is accessed through a high-level, object-oriented Perl API (Perl is the language consensus in our group ;-), allowing e.g. fast retrieval of precalculated homologous sequences, alignments, and other features.
The analysis pipeline currently focuses on protein sequences, for which we perform several analysis steps: identification of transmembrane regions, coiled coils, low-complexity regions, Prosite patterns, Pfam and SCOP domains, repeats and homologous sequences, and secondary structure prediction. Structural information (fold classification) is assigned to the sequences of the genomes via homology (using BLAST, PSI-BLAST and our in-house software 3D-PSSM).
The database is accessible via the web as 3D-GENOMICS.
My Ph.D. thesis "A protein structure based annotation of genomes" describes the above and other projects in detail and is available on the web.
I have worked on a project that deals with PSI-BLAST in genome annotation. We have developed a benchmark that evaluates the performance of PSI-BLAST in terms of coverage and errors in genome annotation. Another part of the work is to identify ORFs with homologues of known structure in the genomes of Mycoplasma genitalium and Mycobacterium tuberculosis. Results and data of the project can be found here ....
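One common way to summarise a benchmark in terms of coverage and errors is sketched below in Python. This is an illustration only, not the benchmark used in the project: it assumes each hit has already been labelled as a true or false relationship (for example against a structural classification), and the function names `coverage_vs_errors` and `coverage_at_errors` are hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage_vs_errors(
    hits: Iterable[Tuple[float, bool]], total_true_pairs: int
) -> List[Tuple[int, float]]:
    """Walk through hits from most to least significant E-value.

    Each hit is (e_value, is_true_relationship). After every hit, record
    the number of errors so far and the fraction of all true relationships
    recovered so far (the coverage).
    """
    if total_true_pairs <= 0:
        raise ValueError("total_true_pairs must be positive")
    curve = []
    true_found = 0
    errors = 0
    for _e_value, is_true in sorted(hits, key=lambda hit: hit[0]):
        if is_true:
            true_found += 1
        else:
            errors += 1
        curve.append((errors, true_found / total_true_pairs))
    return curve


def coverage_at_errors(curve: List[Tuple[int, float]], max_errors: int) -> float:
    """Return the coverage reached before exceeding max_errors errors."""
    best = 0.0
    for errors, coverage in curve:
        if errors > max_errors:
            break
        best = coverage
    return best
```

Reading the curve at a fixed error tolerance gives a single number that can be compared between search settings; the labels and the total number of true relationships have to come from an independent reference.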
RasMol. Between 1996 and 1998 I did some development on Roger Sayle's molecular viewer RasMol, which was published as RasMol2.6b2x1 (eXtended RasMol). These changes have been taken up by OpenRasMol, a project coordinated by Herbert Bernstein to integrate, maintain and develop the different derivatives of RasMol that have been around.
Software tools downloadable from this site.
For parts of my work I use PYTHON as programming language. Python is a high-level, object-oriented scripting language. You can download one software package: a parser for BLAST/PSI-BLAST output written in Python (see below).
The parser reads in an output file from a BLAST/PSI-BLAST/tBlastN-run
and represents it as a data structure. You can access the individual bits
of information of the BLAST results. Please note, the parser is still not
perfect and was developed exclusively for my own needs. The README
of the package and the source code itself provide some documentation.
The software may be part of the BIOPYTHON
project and can be used under the terms of the BIOPYTHON license (also
included in the README file).
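To give an idea of what "representing a BLAST report as a data structure" can look like, here is a minimal sketch in Python. It is not the code of the package described above: the names `Hit` and `parse_hit_summary` are hypothetical, and it only reads the one-line hit summaries of the first "Sequences producing significant alignments" section of a plain-text report, assuming each summary line ends with a bit score and an E-value.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    identifier: str
    description: str
    bit_score: float
    e_value: float


# Assumed layout of a summary line: identifier, free-text description,
# bit score, E-value (whitespace separated, score and E-value at the end).
_HIT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$")


def parse_hit_summary(report: str) -> List[Hit]:
    """Collect the hit summary lines of the first result section."""
    hits = []
    in_summary = False
    for line in report.splitlines():
        if line.startswith("Sequences producing significant alignments"):
            in_summary = True
            continue
        if not in_summary:
            continue
        if line.startswith(">"):
            break  # the detailed alignments start here
        match = _HIT_LINE.match(line)
        if match is None:
            continue  # blank or unrecognised line
        identifier, description, score, e_value = match.groups()
        if e_value.startswith("e"):
            # very small E-values may be printed without mantissa ("e-100")
            e_value = "1" + e_value
        try:
            hits.append(Hit(identifier, description, float(score), float(e_value)))
        except ValueError:
            continue  # last column was not a number after all
    return hits
```

Each `Hit` can then be used like an ordinary record, e.g. `[h.identifier for h in parse_hit_summary(text) if h.e_value < 1e-3]`. A full parser also has to handle the alignments themselves and, for PSI-BLAST, several search rounds.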
NEW in version 1.2 (major changes):
Download the blast parser:
For most of my scientific writing I use LaTeX. Unfortunately, the PubMed literature database at NCBI does not provide any export filter for the BibTeX format that LaTeX uses for managing a bibliography. Therefore I've written a simple web-based interface to PubMed that lets you query PubMed as if you were directly on the PubMed server. You can select articles and store them in a "shopping basket"; once you think "that's about it", you can export these articles in one go in BibTeX format.
Enter TeXMed here ...
Another PubMed to BibTeX site: http://www.pmbrowser.info/
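The export step of such a tool, turning a basket of selected articles into BibTeX, could look roughly like the following Python sketch. It is not the TeXMed code: the record layout (a dictionary with `authors`, `title`, `journal`, `year` and optional `volume` and `pages`) and the function names are assumptions made for illustration, and special LaTeX characters are not escaped.

```python
from typing import Dict, List


def make_cite_key(record: Dict) -> str:
    """Build a key such as 'Doe2001' from first author surname and year."""
    surname = record["authors"][0].split(",")[0]
    letters = "".join(ch for ch in surname if ch.isalnum())
    return f"{letters}{record['year']}"


def to_bibtex(record: Dict) -> str:
    """Format one article record as a BibTeX @article entry."""
    fields = [
        ("author", " and ".join(record["authors"])),
        # extra braces keep the capitalisation of the title
        ("title", "{" + record["title"] + "}"),
        ("journal", record["journal"]),
        ("year", str(record["year"])),
    ]
    for optional in ("volume", "pages"):
        if optional in record:
            fields.append((optional, str(record[optional])))
    lines = [f"@article{{{make_cite_key(record)},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields]
    lines.append("}")
    return "\n".join(lines)


def export_basket(basket: List[Dict]) -> str:
    """Export all selected records in one go, separated by blank lines."""
    return "\n\n".join(to_bibtex(record) for record in basket)
```

Author names are expected as "Surname, Forename" strings, which is the form BibTeX handles most reliably when names are joined with "and".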
This is a sort of tutorial or review-like document based on the introduction of my Ph.D. thesis. It covers subjects like protein sequence databases, annotation procedures, sequence comparisons and sequence database searches, the use of protein structure in protein annotation and modelling, and the sequence-structure-function relationship.
Download the PDF
file (52 pages excluding references, size is 3.7 MB) or download the
sources (tar.gz file, size is 2.5 MB)