October 13, Tuesday
14:00 – 16:00

Seeded Search Techniques for DNA Homology Detection and Mapping of Next Generation Sequencing Reads
Computer Science seminar
Lecturer : Gary Benson
Lecturer homepage : http://tandem.bu.edu/benson.html
Affiliation : Boston University
Location : 202/37
Host : Dr Dekel Tsur
Standard search techniques for detecting homology in DNA sequences start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models (k-mers, k-tuples, etc.) have existed for many years and are used in programs like BLAST and BLAT. Newer models include spaced seeds and indel seeds. Both of these seed models have been shown to be more sensitive than contiguous seeds while maintaining similar specificity, where sensitivity measures the ability to find true homologies and specificity measures the ability to avoid wasting computation time on false candidates for homology. The domains of application for the two seed classes differ: spaced seeds are superior under alignment models that allow only matches and mismatches, while indel seeds are superior under models that also allow insertions and deletions in the alignments. For any value k, there is only one contiguous seed of length k, but there can be a very large number of spaced seeds and indel seeds. Optimal seed selection is a resource-intensive activity because essentially all possible seed shapes must be tested. In this talk, I describe the various seed models, show how to efficiently compute optimal seeds, and discuss an application in the context of new technologies for genome sequencing, in particular the mapping of short sequencing reads to a reference genome.
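To make the difference between contiguous and spaced seeds concrete, here is a minimal sketch in Python. It is not taken from the talk. It assumes the common notation in which a seed is a string of 1s (positions that must match) and 0s (positions that are ignored), and it considers only an ungapped alignment of two equal-length sequences. The function names, the seed shapes, and the toy sequences are all hypothetical, chosen for illustration only.

```python
from math import comb


def seed_hits(seed, a, b):
    """Return the start offsets where `seed` hits the ungapped alignment of a and b.

    seed: string over {'1', '0'}; '1' = position must match, '0' = don't care.
    (Assumed notation; indel seeds are not modelled here.)
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (ungapped alignment)")
    required = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(a) - len(seed) + 1):
        if all(a[start + i] == b[start + i] for i in required):
            hits.append(start)
    return hits


def count_spaced_seeds(weight, span):
    """Count seed shapes with `weight` required positions over `span` columns.

    Assumes the convention that the first and last positions are required,
    so the remaining weight - 2 ones are placed among span - 2 columns.
    """
    return comb(span - 2, weight - 2)


# Hypothetical seeds of equal weight (four required matches each).
contiguous = "1111"
spaced = "110101"

# Toy sequences with mismatches at positions 3 and 6.
query = "ACGTACGTAC"
database = "ACGAACTTAC"

contiguous_hits = seed_hits(contiguous, query, database)
spaced_hits = seed_hits(spaced, query, database)
```

On this toy pair the longest run of matches has length 3, so the contiguous seed finds no hit, while the spaced seed of the same weight hits at offset 4 because its ignored positions absorb a mismatch. The counting helper also hints at why seed selection is expensive: under the stated convention the number of candidate shapes grows combinatorially with the span.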