-----------------------------------------------
Capturing expert knowledge with argumentation:
a case study in bioinformatics
-----------------------------------------------
Benjamin R. Jefferys(1), Lawrence A. Kelley(1), Marek J. Sergot(1),
John Fox(2), Michael J. E. Sternberg(1)

(1) - Imperial College, London
(2) - Cancer Research UK, London
-----------------------------------------------

This is the output of 3DPSSM for the 123 searches performed to validate
the argumentation system. Each query protein sequence was taken from the
database itself, so the first match is always, trivially, the query
sequence itself. Any of the remaining 19 matches with the same SCOP
superfamily as the number 1 match is therefore a "correct prediction":
that is, a homologue has been found which could be a good model for the
protein.

Each filename is a 16-digit hexadecimal number with ".xml" on the end.
The file contents are formatted as XML, written by the Perl XML::Simple
module from a Perl data structure constructed by parsing the various
files which make up the output of 3DPSSM. The XML format is therefore
quite naive and inefficient, but easily parsed. If using Perl,
XML::Simple can read the XML back in and reconstruct the original data
structure precisely.

The format should be obvious from looking at a file, but some
explanation follows. Each section below has a tag name as its heading,
followed by the attributes the tag may have, then the other tags it may
contain. There is a single top-level tag, which is described first. The
textual order of most tags is significant and should therefore be
preserved on parsing.
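For readers not using Perl, the points above can be sketched in Python
with the standard library. This is only an illustration under stated
assumptions, not the tooling used to produce the files: the helper names
are invented here, and the tag names "results", "entry" and "other" in
the sample string are placeholders rather than the real tag names.

```python
import re
import xml.etree.ElementTree as ET

# 16 hexadecimal digits followed by ".xml", as described above.
# Accepting both upper and lower case digits is an assumption.
RESULT_FILENAME = re.compile(r"[0-9A-Fa-f]{16}\.xml")


def is_result_filename(filename):
    """Return True if the name follows the 16-hex-digit ".xml" convention."""
    return RESULT_FILENAME.fullmatch(filename) is not None


def child_tags_in_order(xml_text):
    """Parse XML text; return the top-level tag name and its child tag names."""
    root = ET.fromstring(xml_text)
    # Iterating over an Element yields its children in document order,
    # so the textual order of the tags is preserved.
    return root.tag, [child.tag for child in root]


# Placeholder tag names, invented for this example only.
sample = '<results><entry rank="1"/><entry rank="2"/><other/></results>'

if __name__ == "__main__":
    print(is_result_filename("0123456789abcdef.xml"))
    print(child_tags_in_order(sample))
```

ElementTree keeps child elements in a list, so no extra work is needed
to preserve ordering; a parser that maps tag names to dictionary keys
would lose the order between tags with different names.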
For more information see:
http://www.sbg.bio.ic.ac.uk/~brj03/argumentation/paper/

For further help email:
benjamin dot jefferys at imperial dot ac dot uk

-----------------------------------------------

This is the single top-level tag

ATTRIBUTES
  numHomologues: (unsigned integer) number of homologues from which the
                 query sequence PSSM is made

CONTAINS
  20 * Multiple

-----------------------------------------------

Each one of these represents a SINGLE match of a template against the
query.

ATTRIBUTES
  name: (string) name of the template; usually looks like a SCOP
        reference, e.g. d1a77_1
  rank: (unsigned integer) rank of the template match in the 3DPSSM
        result table. Where rank=1, the match is exactly the sequence
        that was used as the query.
  eval3D: (float) 3DPSSM E-value
  confidence3d: (float) 3DPSSM percentage confidence that the match is
                correct, derived from the E-value
  numHomologues: (unsigned integer) number of homologues from which the
                 template PSSM is made
  qlen: (unsigned integer) number of amino acids in the query sequence
  tlen: (unsigned integer) number of amino acids in the template
        (matched) sequence
  segOutput: (string) output from the SEG low-complexity masking tool,
             for debugging purposes
  rZs, corescore, csc, normsc, lognormsc: ignore
  method: ignore - indication of the algorithm used to find the match

CONTAINS
  tlen *
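
As a hedged illustration of the attributes listed above, the following
Python sketch reads the per-match attributes into typed records and
separates the trivial rank=1 self-match from the remaining matches. It
assumes the matches are direct children of the top-level tag and that
the values are stored as XML attributes spelled exactly as listed. To
avoid depending on tag names, it treats any child with a "rank"
attribute as a match. The Match class, the function names, the tag names
"top" and "hit", the second template name and all numeric values in the
sample are made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    rank: int
    eval3d: float
    confidence3d: float
    num_homologues: int
    qlen: int
    tlen: int


def read_matches(xml_text):
    """Return (query numHomologues, list of Match) in document order."""
    root = ET.fromstring(xml_text)
    matches = []
    for element in root:
        attrs = element.attrib
        # Assumption: only match elements carry a "rank" attribute.
        if "rank" not in attrs:
            continue
        matches.append(Match(
            name=attrs["name"],
            rank=int(attrs["rank"]),
            eval3d=float(attrs["eval3D"]),
            confidence3d=float(attrs["confidence3d"]),
            num_homologues=int(attrs["numHomologues"]),
            qlen=int(attrs["qlen"]),
            tlen=int(attrs["tlen"]),
        ))
    return int(root.attrib["numHomologues"]), matches


def without_self_match(matches):
    """Drop the rank=1 match, which is the query sequence itself."""
    return [m for m in matches if m.rank != 1]


# Illustrative data only: tag names and values are invented.
sample = (
    '<top numHomologues="50">'
    '<hit name="d1a77_1" rank="1" eval3D="0.001" confidence3d="95.0" '
    'numHomologues="40" qlen="120" tlen="120"/>'
    '<hit name="dxxxx__" rank="2" eval3D="0.5" confidence3d="60.0" '
    'numHomologues="12" qlen="120" tlen="98"/>'
    '</top>'
)

if __name__ == "__main__":
    query_homologues, matches = read_matches(sample)
    print(query_homologues, [m.name for m in without_self_match(matches)])
```

The ignored attributes (rZs, corescore, csc, normsc, lognormsc, method)
and segOutput are deliberately left unread here.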