=================
=SuSPect Package=
=================

Christopher M Yates
Centre for Integrative Systems Biology & Bioinformatics
Imperial College London
c.yates11@imperial.ac.uk
www.sbg.bio.ic.ac.uk/suspect

BACKGROUND
----------
SuSPect (disease-SUsceptibility-based nsSNV PrEdiCTor) is a method for
predicting the phenotypic impact of missense mutations. Using a support vector
machine, SuSPect combines sequence and structural features with systems-level
information. In a blind test, we have found SuSPect to perform significantly
better than other methods. Scores range from 0 (neutral) to 100 (deleterious).

SuSPect is available for use online at www.sbg.bio.ic.ac.uk/suspect

This download is useful if you have large numbers of mutations and would like
a local copy. Scores have been precalculated for the entire human UniProt
database (2013_03 version) and stored in a SQLite database. This database can
be queried directly with SQLite or using the enclosed Perl script (suspect.pl).
The database also includes a table containing scores for nsSNPs from dbSNP.

INSTALLATION
------------
Simply untar the downloaded tarball:

    tar -zxvf suspect_package-vX.X.tar.gz

This should give a directory, suspect_package, containing:

    suspect.pl    Perl script. Run this to get pre-calculated scores
    README.txt    This file. If you're reading this, you've probably already
                  done it.
    data/         Various data files needed by suspect.pl, including score
                  database

If you want to use suspect.pl, change the paths at the top of the file:
1) Where you have placed the suspect_package directory,
2) The paths to the directories containing blastall/blastpgp and annovar.

USAGE
-----
suspect.db (in suspect_package/data/) is an SQLite database. If you wish, you
can query this directly using SQLite. There are two tables, uniprot_201303 and
dbSNP.
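
The following is a minimal sketch of querying the two tables directly, written
in Python with the standard sqlite3 module purely for illustration (the package
itself ships only suspect.pl). It does not open the real suspect.db: it builds
a small in-memory stand-in holding only the example rows given in this README.
The column names follow the table descriptions in this section, but naming the
20 score columns by one-letter amino acid code in alphabetical order is an
assumption, and the helper names (build_demo_db, uniprot_score, dbsnp_scores)
are hypothetical. Check the actual schema of suspect.db before relying on it.

```python
import sqlite3

# ASSUMPTION: the 20 score columns are named by one-letter amino acid code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_demo_db():
    """Create an in-memory stand-in for suspect.db with the README example rows."""
    conn = sqlite3.connect(":memory:")
    score_cols = ", ".join(f'"{aa}" INTEGER' for aa in AMINO_ACIDS)
    conn.execute(
        f"CREATE TABLE uniprot_201303 (uniprot TEXT, pos INTEGER, wt TEXT, {score_cols})"
    )
    conn.execute(
        "CREATE TABLE dbSNP (rsID TEXT, chr TEXT, uniprot TEXT, nsSNP TEXT, score INTEGER)"
    )
    # Example row for Q9NQG7 position 74, as printed in this README.
    scores = [38, 41, 41, 34, 29, 44, 41, 18, 41, 12,
              34, 41, 44, 41, 41, 31, 26, 20, 41, 31]
    placeholders = ", ".join("?" * 23)
    conn.execute(
        f"INSERT INTO uniprot_201303 VALUES ({placeholders})",
        ["Q9NQG7", 74, "L", *scores],
    )
    conn.executemany(
        "INSERT INTO dbSNP VALUES (?, ?, ?, ?, ?)",
        [
            ("rs147060589", "22:26873014A>C", "Q9NQG7", "L74R", 41),
            ("rs147060589", "22:26873014A>C", "E5RG08", "L69R", 51),
        ],
    )
    return conn


def uniprot_score(conn, accession, pos, mutant):
    """Return the score for one substitution, or None if there is no such row."""
    # Column names cannot be bound as parameters, so validate before interpolating.
    if len(mutant) != 1 or mutant not in AMINO_ACIDS:
        raise ValueError(f"not a one-letter amino acid code: {mutant!r}")
    row = conn.execute(
        f'SELECT "{mutant}" FROM uniprot_201303 WHERE uniprot = ? AND pos = ?',
        (accession, pos),
    ).fetchone()
    return None if row is None else row[0]


def dbsnp_scores(conn, rsid):
    """Return (uniprot, nsSNP, score) for every UniProt entry mapped to an rsID."""
    return conn.execute(
        "SELECT uniprot, nsSNP, score FROM dbSNP WHERE rsID = ? ORDER BY uniprot",
        (rsid,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(uniprot_score(conn, "Q9NQG7", 74, "R"))
    print(dbsnp_scores(conn, "rs147060589"))
```

To try this against the real data, replace build_demo_db() with
sqlite3.connect("suspect_package/data/suspect.db"), adjusting the path to where
you unpacked the package. On the stand-in data, the score for Q9NQG7 L74R is 41,
which agrees with the dbSNP example for the same variant.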

=uniprot_201303=
Has 23 columns:

    uniprot    UniProt accession (6 characters)
    pos        Position in the protein
    wt         Wild-type amino acid at this position
    A-Y        20 columns, one per amino acid, with the SuSPect score for each.

e.g.
    Q9NQG7  74  L  38  41  41  34  29  44  41  18  41  12  34  41  44  41  41  31  26  20  41  31

=dbSNP=
Has 5 columns:

    rsID       The dbSNP rsID for the variant
    chr        The chromosomal location of the SNP
    uniprot    UniProt accession
    nsSNP      Effect of the SNP on the protein
    score      SuSPect score for this variant

The same rsID can map to multiple entries in UniProt due to isoforms, etc.
Scores are returned for all entries for which there is data.

e.g.
    rs147060589  22:26873014A>C  Q9NQG7  L74R  41
    rs147060589  22:26873014A>C  E5RG08  L69R  51

suspect.pl
----------
suspect.pl takes an input file of nsSNPs and scores them. The input file can
be in one of four formats:

    vcf        (see http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-4)
    rsID       e.g. rs147060589
    chr        e.g. 22:26873014A>C
    uniprot    e.g. Q9NQG7 L74R    #Comments are ignored
                    E5RG08 69 R    #Wild-type amino acid is not needed

To get SuSPect scores, simply run:

    perl suspect.pl --input data/test.txt --output test.out

    --input, --in, -i      input file (vcf, rsID, chr or uniprot format)
    --output, --out, -o    output file
    --type, -t             chr/rsID/vcf/uniprot (default=uniprot)
    --verbose, -v
    --help, -h             print a help message

REQUIREMENTS
------------
For scoring nsSNPs using UniProt or rsID, there are no further requirements.
For scoring of VCF/chr files:

    BLAST (blastall or blastpgp)
    ANNOVAR (with hg19 in humandb directory)

CITATION
--------
If you use SuSPect, please cite the following:

Yates CM, Filippis I, Kelley LA, Sternberg MJE (2014) SuSPect: Enhanced
prediction of non-synonymous single nucleotide polymorphism (nsSNP) phenotype
using network features (Manuscript in preparation)

HELP
----
If you need any further help, or have any comments/queries, email me at:
c.yates11@imperial.ac.uk