k-SLAM

k-mer Sorted List Alignment and Metagenomics

k-SLAM is a fast algorithm for assigning taxonomy to reads by aligning them to a database of genomes. Output is a Krona visualisation of the taxonomic abundance of the sample.
Uploading data
k-SLAM takes as input gzipped FASTQ files of whole-metagenome shotgun reads. Please download the following two files for the tutorial:
Test R1 file
Test R2 file
On the home page, please upload these two files by clicking "Choose file" and selecting them in your downloads folder.
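Before uploading your own data, it can be useful to check locally that both files are intact gzipped FASTQ and contain the same number of reads. The sketch below is not part of k-SLAM: the function name count_fastq_reads is a placeholder chosen for illustration, and it assumes standard four-line FASTQ records.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, checking the four-line layout."""
    count = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            sequence = handle.readline().rstrip("\n")
            separator = handle.readline()
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record {count + 1} in {path}")
            if len(sequence) != len(quality):
                raise ValueError(f"sequence/quality length mismatch in record {count + 1}")
            count += 1
    return count
```

For a correctly paired data set, calling this function on the R1 file and on the R2 file should return the same number.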
Waiting for job to complete
When a job is submitted, a unique job ID should be displayed. If you do not want to leave the web page open, either note the ID and enter it on the home page later to check the status of the job, or simply bookmark the link displayed.
Interpreting the output formats
The web server's primary output is a Krona visualisation, which provides an easy-to-interpret per-read taxonomic breakdown of the sample.
For an example output, please see here.
The visualisation breaks down the taxonomic content of the uploaded sample. Each section of the pie chart shows the per-read abundance of a particular taxon. Double-clicking a node within the visualisation magnifies it so that the taxa within it can be seen more clearly. For more detailed help regarding Krona visualisations please see here.
For a more detailed output, please download the gzipped results file which contains files in the following formats:
Summary XML: Each identified taxon is listed in descending order of the number of reads assigned to it. Because a Lowest Common Ancestor (LCA) method is used, taxa may be at any rank. Each read is assigned to at most one taxon: a genus entry contains only reads that mapped to several species whose LCA was that genus, not reads that mapped to an individual species within that genus. Each taxon has the following tags: abundance (number of reads and percentage of total reads), taxonomyID, lineage (from NCBI taxonomy), name, genes and reads. Note: Because the output must be valid XML, the characters <, >, &, ' and " in gene annotations and similar fields are replaced with the relevant entity references.
For each taxon, the genes found are listed using the gene tag. A maximum of one gene is inferred for each aligned read, based on its position on the genome. The "count" field gives the number of reads that overlapped that particular gene. The protein, locus, product, GeneID and reference sequence are listed (using NCBI data), along with the CDS range. The same gene may appear in multiple taxa.
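The entity replacement described in the note above can be illustrated with Python's standard library. This is a sketch of the general XML escaping rule, not k-SLAM's own code: the product string is made up, and the helper names to_xml_text and from_xml_text are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; quotes need explicit entries.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}
QUOTE_CHARACTERS = {entity: char for char, entity in QUOTE_ENTITIES.items()}


def to_xml_text(text):
    """Replace the five special characters with entity references."""
    return escape(text, QUOTE_ENTITIES)


def from_xml_text(text):
    """Turn entity references back into the original characters."""
    return unescape(text, QUOTE_CHARACTERS)


# Made-up annotation, used only to show the round trip.
product = "5'-3' <exo> & \"x\""
escaped = to_xml_text(product)
print(escaped)  # 5&apos;-3&apos; &lt;exo&gt; &amp; &quot;x&quot;
print(from_xml_text(escaped) == product)  # True
```

A standard XML parser such as xml.etree.ElementTree performs this unescaping automatically when the file is read, so manual conversion is only needed when the file is processed as plain text.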
Tab Separated Taxonomy: Mapped reads are listed with their LCA taxonomy. Unmapped reads are not listed.
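The Lowest Common Ancestor idea behind these assignments can be shown with a small sketch. It is not k-SLAM's implementation: the function name is hypothetical, and the lineages are simplified, illustrative lists ordered from root to leaf.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages."""
    lca = None
    # Walk down the ranks in parallel until the lineages diverge.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        lca = taxa_at_rank[0]
    return lca


# Simplified example lineages (illustrative, not taken from k-SLAM output).
shared = ["Bacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Escherichia"]
coli = shared + ["Escherichia coli"]
fergusonii = shared + ["Escherichia fergusonii"]

# A read aligning to both species is reported at genus level.
print(lowest_common_ancestor([coli, fergusonii]))  # Escherichia
# A read aligning to one species only keeps the species-level assignment.
print(lowest_common_ancestor([coli]))  # Escherichia coli
```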
Abbreviated Taxonomy: Each identified taxon is listed along with the percentage of reads that mapped to it.
SAM: Output in the Sequence Alignment/Map format. A Bowtie-style output is used (for each read there is a primary line for each reference that it aligned to). Alternatively, a BWA-style XA tag can be printed (using the --sam-xa parameter) instead of listing all hits. Unmapped reads are not printed unless they belong to a pair in which the other read was mapped.
The following k-SLAM specific tags are used:
XS: alignment score assigned by k-SLAM, using pseudo assembly.
XO: number of hits for this segment.
XT: taxonomy ID of this reference.
XG: gene at this position in the reference.
XP: protein ID of this gene.
XR: product of this gene.
XA: BWA style alternate hits in format (chr,pos,CIGAR,NM;)*.
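The tags above follow the SAM convention of optional fields written as TAG:TYPE:VALUE after the eleven mandatory columns, so they can be read with a few lines of code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the example line is made up, and the value types shown for each tag (such as i for XS, XO and XT) are assumptions rather than documented k-SLAM behaviour. The position in an XA entry is treated as optionally signed, as in BWA, where the sign encodes the strand.

```python
def parse_optional_fields(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM line as a dict."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in columns[11:]:  # the first 11 columns are mandatory
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":
            value = int(value)
        elif value_type == "f":
            value = float(value)
        tags[tag] = value
    return tags


def parse_xa(xa_value):
    """Split an XA value of the form (chr,pos,CIGAR,NM;)* into hits."""
    hits = []
    for entry in xa_value.split(";"):
        if not entry:
            continue  # skip the empty piece after the final ';'
        ref, pos, cigar, nm = entry.rsplit(",", 3)
        hits.append({
            "ref": ref,
            "strand": "-" if pos.startswith("-") else "+",
            "pos": abs(int(pos)),
            "cigar": cigar,
            "nm": int(nm),
        })
    return hits


# Made-up alignment line; tag types and values are illustrative only.
example_line = "\t".join([
    "read1", "0", "ref1", "100", "255", "50M", "*", "0", "0", "*", "*",
    "XS:i:42", "XO:i:2", "XT:i:562", "XA:Z:ref2,+250,50M,1;",
])
tags = parse_optional_fields(example_line)
print(tags["XT"])  # 562
print(parse_xa(tags["XA"]))
```

Splitting each XA entry from the right keeps reference names that contain commas intact.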


Please cite
k-SLAM: Accurate and ultra-fast taxonomic classification and gene identification for large metagenomic datasets.
Ainsworth DJ, Sternberg MJE, Raczy C & Butcher A
Genome Biology (under review)