k-SLAM Web Application

k-SLAM

k-mer Sorted List Alignment and Metagenomics

Introduction

The inefficiency of sequence analysis algorithms is a major bottleneck in metagenomic research. Currently, sequence alignment methods provide the most information about the composition of a sample, but these methods are prohibitively slow. This inefficiency has led to reliance on faster, but less accurate, algorithms which only produce simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments of reads against annotated genomes.

k-SLAM is a novel, ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer-based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment-based approaches. The alignments found can also be used to find variants and genes present in a mixed sample, along with their taxonomic origins. The alignment positions can be used to estimate per-genome coverage, helping to identify false-positive species. k-SLAM uses a novel pseudo-assembly method to produce more specific taxonomic classifications for species which have high sequence homology within their genus. This provides a significant (up to 40%) increase in accuracy on these species.

Uses of k-SLAM

The primary use of k-SLAM is as a taxonomic classifier for whole metagenome shotgun data. It can be used to analyse both microbiome and environmental samples.

k-SLAM’s gene identification can be used to characterise novel bacterial strains by finding genes shared with known strains from the database. A specific example of this would be an analysis of a Shiga toxin-producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin-producing genes.

k-SLAM taxonomic classification can also be used for binning prior to metagenomic assembly or to screen for contaminants in an isolate dataset. A possible use case would be a quality control step that identifies contaminants in a single-strain dataset in order to find their origin and remove them from further sequencing experiments. Another application would be to screen out bacterial sequences from human saliva samples prior to mapping to the human genome.

Method

k-SLAM is an alignment-based metagenomic classifier that finds k-mer overlaps between reads and genomes, which are then verified using Smith-Waterman pairwise alignment.
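To illustrate the verification step, here is a minimal sketch of Smith-Waterman local alignment scoring. It is a textbook version with a linear gap penalty, not k-SLAM's own implementation; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not k-SLAM's parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A candidate overlap would then be kept only if its score passes some threshold; the threshold itself is not specified here.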
k-mer overlaps are found by splitting each read into overlapping k-mers and adding them to a list. Each genome is split into non-overlapping k-mers (to save memory), which are added to the same list. The list is then sorted lexicographically, placing identical k-mers next to one another, and iterated over to find overlaps between reads and genomes.
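The sorted-list idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the k-SLAM source: the function names, the tuple layout and the in-memory dictionaries of sequences are all hypothetical.

```python
from itertools import groupby

def read_kmers(seq, k):
    """Overlapping k-mers: one per start position in the read."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]

def genome_kmers(seq, k):
    """Non-overlapping k-mers: step by k so each base is stored once."""
    return [(seq[i:i + k], i) for i in range(0, len(seq) - k + 1, k)]

def find_overlaps(reads, genomes, k):
    """Return sorted (read_id, read_pos, genome_id, genome_pos) tuples."""
    entries = []  # (kmer, source, sequence id, position)
    for rid, seq in reads.items():
        entries += [(kmer, "read", rid, pos) for kmer, pos in read_kmers(seq, k)]
    for gid, seq in genomes.items():
        entries += [(kmer, "genome", gid, pos) for kmer, pos in genome_kmers(seq, k)]
    entries.sort()  # lexicographic sort places identical k-mers side by side
    overlaps = []
    for _, group in groupby(entries, key=lambda e: e[0]):
        group = list(group)
        from_reads = [e for e in group if e[1] == "read"]
        from_genomes = [e for e in group if e[1] == "genome"]
        overlaps += [(r[2], r[3], g[2], g[3]) for r in from_reads for g in from_genomes]
    return sorted(overlaps)

reads = {"r1": "TACGTT"}
genomes = {"g1": "ACGTTTGA"}
print(find_overlaps(reads, genomes, k=4))  # [('r1', 1, 'g1', 0)]
```

Because the read k-mers overlap, any read k-mer window that lines up with a genome k-mer boundary is still found, even though the genome contributes only one k-mer per k bases.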
For each read, the best scoring alignments are selected and a lowest common ancestor method is used to infer taxonomy. Output is in the form of an XML report containing the taxonomies found and genes identified. SAM output is also possible, reporting alignments and variants.
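As a rough illustration of the lowest common ancestor step, the sketch below keeps the top-scoring hits for a read and reports the deepest taxon shared by all of them. The child-to-parent dictionary, the function names and the tie-handling rule (exact score equality) are assumptions made for this example; the text above does not specify how k-SLAM selects its best alignments.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root of the taxonomy."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(taxa, parent):
    """Deepest node that appears in the lineage of every given taxon."""
    taxa = list(taxa)
    common = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon, parent))
    # Walking up from one taxon, the first shared node is the deepest one.
    for node in lineage(taxa[0], parent):
        if node in common:
            return node
    return None  # no shared ancestor (disconnected taxonomy)

def classify(alignments, parent):
    """alignments: list of (taxon, score) pairs for a single read."""
    best = max(score for _, score in alignments)
    return lowest_common_ancestor([t for t, s in alignments if s == best], parent)

# Toy taxonomy: child -> parent (hypothetical subset for illustration).
parent = {
    "E. coli": "Escherichia",
    "E. fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
}
hits = [("E. coli", 50), ("E. fergusonii", 50), ("Salmonella enterica", 30)]
print(classify(hits, parent))  # Escherichia
```

A read that aligns equally well to two species in the same genus is therefore assigned to the genus rather than to either species.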
This web server has additional output, in the form of a Krona visualisation, providing a quick overview of the per-read taxonomic breakdown of the sample.
For additional information on output formats, please see the help page.

Please cite
k-SLAM: Accurate and ultra-fast taxonomic classification and gene identification for large metagenomic datasets.
Ainsworth DJ, Sternberg MJE, Raczy C & Butcher A
Genome Biology (under review)