SSAKE is a genomics application for assembling millions of very short DNA sequences.
Project Description
The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets.
Note: Best performance is achieved by quality-trimming your reads before assembly.
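The search-and-extend idea described above can be illustrated with a minimal Python sketch. This is not SSAKE's code (SSAKE itself is written in Perl): the function names `build_prefix_index` and `extend_3prime`, the `min_overlap` parameter, and the use of a plain dictionary of read prefixes in place of a real prefix tree are all assumptions made for illustration. The sketch also ignores reverse complements, read multiplicity, and any handling of ambiguous extensions.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map every read prefix of length >= min_overlap to the reads starting with it.

    A dictionary of prefixes stands in for a prefix tree in this sketch.
    """
    index = defaultdict(set)
    for read in reads:
        # Stop one base short of the full read so an overlap always
        # leaves at least one new base to append.
        for k in range(min_overlap, len(read)):
            index[read[:k]].add(read)
    return index


def extend_3prime(seed, reads, min_overlap=4):
    """Greedily extend `seed` at its 3' end using perfect suffix/prefix overlaps."""
    unused = set(reads) - {seed}
    index = build_prefix_index(unused, min_overlap)
    contig = seed
    extended = True
    while extended:
        extended = False
        longest_read = max((len(r) for r in unused), default=0)
        max_k = min(len(contig), longest_read - 1)
        # Try the longest possible overlap first, then shorter ones.
        for k in range(max_k, min_overlap - 1, -1):
            kmer = contig[-k:]  # 3'-most k-mer of the growing contig
            candidates = sorted(index.get(kmer, set()) & unused)
            if candidates:
                read = candidates[0]   # deterministic pick, for the sketch only
                contig += read[k:]     # append the non-overlapping overhang
                unused.discard(read)   # each read is used at most once
                extended = True
                break
    return contig


if __name__ == "__main__":
    toy_reads = ["ACGTACGA", "TACGATTC", "GATTCGGA"]
    print(extend_3prime("ACGTACGA", toy_reads))  # ACGTACGATTCGGA
```

In this toy run the seed `ACGTACGA` overlaps `TACGATTC` by five bases, and the extended contig then overlaps `GATTCGGA` by five bases. Extension stops once no unused read begins with a 3'-most k-mer of at least `min_overlap` bases.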
Summary
SSAKE is written in Perl and runs on Linux. SSAKE cycles through short sequence reads stored in a hash table and progressively searches through a prefix tree for the longest possible identical overlap between any two sequences. The algorithm was used to assemble 25-36 bp sequence reads from viral, bacterial and fungal genomes, and forty million 25-mers simulated from the whole-genome shotgun (WGS) sequence data of the Sargasso Sea metagenomics project. Considering the number of sequences to assemble, SSAKE is robust and tractable.
Documentation
René L Warren, Granger G Sutton, Steven JM Jones, Robert A Holt. 2007 (epub 2006 Dec 8). Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 23:500-501.
License
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Credits
René Warren, Granger Sutton, Steven Jones and Robert Holt
Source: http://www.bcgsc.ca/platform/bioinfo/software/ssake