This course is offered by Jeff Ullman. The schedule has changed; the first class is 16:30 May 14 in the Saal Auditorium.
More information about office hours, schedule, etc. will be posted before the first class. The Course Poster.
Scheduling News: the missing class will be made up on Thursday May 22, 4:30PM-8PM in the Saal Auditorium.
Instructor Email: ullman (at) gmail (dot) com
Gradiance News: Because we did not finish all I planned, I will move the deadline for the first two assignments back to May 21. But please do try to work out what you can now, especially the harder problems in the "Algorithms" set.
Regarding the discussion of the best number of hash functions to use when there are 8 times as many bits as there are members of the set S, I'm going to let you work this out. However, if we use k hash functions, the expected fraction of bits that will be turned to 1 is $1 - e^{-k/8}$. You want to maximize the probability that a given member of file F that is not in S will hash at least once to a bit that is not 1. What is this probability as a function of k? What value of k maximizes the probability?
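Once you have a formula by hand, a short numeric check may help confirm it. The sketch below is only an illustration, not part of the course materials: the function names are made up, and it assumes each of the k hash functions lands on a 1 bit independently with probability equal to the fraction of 1 bits given above.

```python
import math

def false_positive_rate(k, bits_per_member=8.0):
    """Chance that a non-member of S hits a 1 bit under all k hash functions.

    Assumption: each hash lands on a 1 bit independently, with probability
    equal to the expected fraction of 1 bits, 1 - e^(-k / bits_per_member).
    """
    fraction_ones = 1.0 - math.exp(-k / bits_per_member)
    return fraction_ones ** k

def pass_probability(k, bits_per_member=8.0):
    """Chance that a non-member hashes at least once to a bit that is not 1."""
    return 1.0 - false_positive_rate(k, bits_per_member)

def best_k(candidates=range(1, 17), bits_per_member=8.0):
    """Integer k among the candidates with the highest pass probability."""
    return max(candidates, key=lambda k: pass_probability(k, bits_per_member))

if __name__ == "__main__":
    # Tabulate the probability so it can be compared with a hand derivation.
    for k in range(1, 17):
        print(k, round(pass_probability(k), 5))
```

Running the script prints one line per value of k; comparing that table with your own derivation is a reasonable sanity check, and `best_k()` reports the integer that comes out on top under these assumptions.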
| Topic | Class | Slides | PDF Versions |
|---|---|---|---|
| Introduction | May 14 | PPT | |
| A-Priori Algorithm | May 14 | PPT | |
| Hash-Based Improvements | May 14 and May 21 | PPT | |
| PageRank and Related Topics | May 21 | PPT | |
| Shingling-Minhashing-LSH | May 22 | PPT | |
| Applications of LSH | May 22 | PPT | |
| Map-Reduce | May 28 | PPT | |
| Stream-Mining 1 | May 28 | PPT | |
| Stream-Mining 2 | May 28 | PPT | |
Go to The Gradiance Home Page and create an account. Then, sign up for course FA335CA1.
There will be weekly automated homeworks posted. You should work the problems and then answer random short-answer questions about them. If you get a question wrong, you are given a hint and allowed to try again.
Note: if you get "assignment temporarily closed," wait 10 minutes. The system prevents rapid guessing.
Assignments:
| Assignment | Due at Sundown on: |
|---|---|
| Frequent Itemsets - Basics | May 28 |
| Frequent Itemsets - Algorithms | May 28 |
| PageRank | May 28 |
| Minhash-LSH | May 29 |
| Map/Reduce-Streams | June 4 (appears May 28) |