Fast local fragment chaining using sum-of-pair gap costs

Research output: Contribution to journalJournal articleResearchpeer-review

Documents

Background: Fast seed-based alignment heuristics such as BLAST and BLAT have become indispensable tools in comparative genomics for all studies aiming at the evolutionary relations of proteins, genes, and non-coding RNAs.
This is true in particular for the large mammalian genomes. The sensitivity and specificity of these tools, however, crucially depend on parameters such as seed sizes or maximum expectation values. In settings that require high sensitivity the amount of short local match fragments easily becomes intractable. Then, fragment chaining is a powerful leverage to quickly connect, score, and rank the fragments to improve the specificity.
Results: Here we present a fast and flexible fragment chainer that for the first time also supports a sum-of-pair gap cost model. This model has proven to achieve a higher accuracy and sensitivity in its own field of application. Due to a highly time-efficient index structure our method outperforms the only existing tool for fragment chaining under the linear gap cost model. It can easily be applied to the output generated by alignment tools such as segemehl or BLAST. As an example we consider homology-based searches for human and mouse snoRNAs demonstrating that a highly sensitive BLAST search with subsequent chaining is an attractive option. The sum-of pair gap costs provide a substantial advantage is this context.
Conclusions: Chaining of short match fragments helps to quickly and accurately identify regions of homology that may not be found using local alignment heuristics alone. By providing both the linear and the sum-of-pair gap cost model, a wider range of application can be covered. The software clasp is available at http://www.bioinf.uni-leipzig.de/Software/clasp/.
Original languageEnglish
JournalAlgorithms for Molecular Biology
Volume6
Issue number4
ISSN1748-7188
Publication statusPublished - 2011

Number of downloads are based on statistics from Google Scholar and www.ku.dk


No data available

ID: 37640532