link

April 19, Tuesday
12:00 – 14:00

Edit Distance and Large Data Sets
Computer Science seminar
Lecturer : Dr. Ziv Bar-Yossef
Affiliation : Dep. of Electrical Engineering. Technion
Location : -101/58
Host : Dr. Kobbi Nisim
Edit distance is a fundamental measure of distance between strings, having applications in text processing, web search, and genomics. In some of these applications the two inputs strings are very long. In others, one needs to maintain edit distance on massive repositories of strings. The current best algorithms for edit distance are too inefficient for these tasks, even if one settles for reasonable approximations.

In this talk I will describe the first "sketching schemes" for approximating edit distance. In these schemes, each string is individually compressed into a small "sketch", and the distance between strings is estimated using the sketches alone. Sketching schemes are a main building block in nearest neighbor schemes and are related to low distortion embeddings. They are thus very useful for handling large repositories of strings.

If time permits, I will also outline a new linear time algorithm, which achieves the best known approximation factor for edit distance. This algorithm is useful for estimating edit distance between long strings.

Joint work with T.S. Jayram, Robi Krauthgamer, and Ravi Kumar.