May 25, Wednesday
14:00 – 15:30
Clustering and Approximating High-Dimensional Streaming Data using Coresets
Computer Science seminar
Lecturer : Dan Feldman
Affiliation : The Center for the Mathematics of Information, Caltech
Location : 202/37
Host : Dr. Kobbi Nissim
A coreset (or, core-set) for a given problem is a "compressed" representation of its input, in the sense that a solution for the
problem with the (small) coreset as input would yield an approximate solution to the problem with the original (large) input.
Using traditional techniques, a coreset usually implies provable linear time algorithms for the corresponding optimization problem,
which can be computed in parallel, via one pass over the data, and using only polylogarithmic space (i.e, in the streaming model). During the recent years, coresets were suggested for problems such as k-means clustering, classification, facility location, linear regression, PCA, and matrix approximation.
I will give an introduction to this new paradigm, including recent implementation in the context of computer vision.
Based on a paper with Michael Langberg (STOC'11), and a paper Nir Sochen and Micha Feigin (SSVM'11)