March 22, Thursday
15:30 – 17:00

Advances in Data Mining
Computer Science seminar
Lecturer : Prof. Jeffrey Ullman
Affiliation : CS, Stanford
Location : 202/37
Host : Dr. Michael Elkin
We shall talk about three topics in which there has been recent progress. First, PageRank, the key idea behind Google, has been modified to involve a "teleport set," making it possible to measure the importance of Web pages according to their relevance to a specific topic. The concept may also lead to methods for detecting spam pages. Second, a number of problems involving massive data are solved using a pair of "hashing" techniques together: minhashing and locality-sensitive hashing. We shall explain how these work and how they are used, for example, to find similar Web pages. Finally, we look at the classic problem of frequent itemsets in market baskets ("what do people buy together?"). The a-priori algorithm can be improved by careful attention to how main memory is managed.
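
As a rough illustration of the first topic, the following is a minimal sketch of PageRank with a teleport set, not the formulation presented in the talk. The function name `topic_pagerank`, the damping value `beta=0.85`, the iteration count, and the four-page example graph are all assumptions chosen for illustration. The only change from ordinary PageRank is that a random jump lands on a page in the teleport set rather than on an arbitrary page, which biases the ranking toward pages near that set.

```python
def topic_pagerank(links, teleport_set, beta=0.85, iterations=50):
    """Power iteration for PageRank with a teleport set (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
           Every link target must also appear as a key.
    teleport_set: pages that random jumps are allowed to land on.
    beta: probability of following a link instead of jumping (assumed value).
    """
    pages = list(links)
    teleport = [p for p in pages if p in teleport_set]
    if not teleport:
        raise ValueError("teleport_set must contain at least one known page")

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        jump_mass = 0.0  # rank that gets redistributed over the teleport set
        for page, outs in links.items():
            if outs:
                # Follow a link with probability beta, split evenly.
                share = beta * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
                jump_mass += (1.0 - beta) * rank[page]
            else:
                # Dead end: all of its rank jumps to the teleport set.
                jump_mass += rank[page]
        for p in teleport:
            new_rank[p] += jump_mass / len(teleport)
        rank = new_rank
    return rank


# Hypothetical four-page Web; page "a" is the only page in the teleport set.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
example_rank = topic_pagerank(example_links, teleport_set={"a"})
```

In this example page "d" has no in-links and is outside the teleport set, so its rank drops to zero, while "a" and the pages reachable from it share all of the rank.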