Class materials

A recent book, Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar, gives excellent coverage of the topics of our course.

Another common reference is The Elements of Statistical Learning, freely available online.

You are encouraged to peruse the following excellent machine learning course pages:


For background on convex optimization (relevant to the SVM material), see chapters 4 and 5 in Boyd and Vandenberghe's book (available online free of charge).

Here is a classic tutorial on Support Vector Machines by Christopher Burges.

Here are my notes on the EM algorithm.

Lectures

  1. We briefly reviewed some basic notions of probability, defined the PAC learning model, and gave the example of learning axis-aligned rectangles (a tightest-fit learner for this class is sketched after the lecture list). All of this material may be found in Mohri, lectures 1-2.
  2. We spent a good portion of the lecture discussing HWK1. This brief explanation of conditional expectations might be helpful. We proved that any finite concept class $\mathcal{C}$ is PAC-learnable with sample complexity $O((1/\epsilon)\log(|\mathcal{C}|/\delta))$. We gave an efficient algorithm for learning conjunctions (a sketch of the elimination algorithm appears after the lecture list); see Mohri lecture 2, slides 17-28. We gave a reduction proving that a concept class for which it is hard to find consistent hypotheses is also hard to PAC-learn; see Kearns-Vazirani, Chapter 1.4. Next time we will continue with Mohri's lecture 2 and work towards characterizing which concept classes are PAC-learnable and which are not.
    • Introduced the agnostic PAC model.
    • Showed that any finite concept class $\mathcal{C}$ is agnostic PAC-learnable with sample complexity $O((1/\epsilon^2)\log(|\mathcal{C}|/\delta))$; the standard Hoeffding-plus-union-bound derivation is written out after the lecture list.
    • Defined the VC-dimension.
    • Claimed (without proof) that a concept class is PAC-learnable (in either the consistent or the agnostic setting) iff it has finite VC-dimension. A proof of the consistent case may be found in Kearns-Vazirani, Chapter 3, and the agnostic case is proved in Mohri et al.'s book. We covered these proofs in a seminar I offered last fall, which I hope to offer again next year.
    • We started talking about linear separators. To prepare for the next lecture, it would help to review linear algebra and basic notions of convexity.
    • The Perceptron algorithm [Mohri lecture 7, slide 25 onwards]; a minimal implementation is sketched after the lecture list.
    • The Mistake Bound model and a proof that MB ⇒ PAC.
    • Support vector machines [Mohri lecture 4, slides 1-9]; the hard-margin primal and dual are written out after the lecture list.
    • See B+V's book for background on convex optimization, Lagrange multipliers, KKT conditions, etc. Mohri et al.'s book also has the relevant background.
  3. SVM
  4. Boosting; see also the book Boosting: Foundations and Algorithms by Schapire and Freund.
  5. Nearest Neighbors
  6. Regression
  7. Online learning
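Supplementary sketches

The forward references in the lecture list point here. For the axis-aligned rectangles example from lecture 1, below is a minimal sketch of the standard tightest-fit learner: output the smallest axis-aligned rectangle containing all positive training points (and label everything negative if there are none). The function names and NumPy representation are mine, not from the lecture slides.

    # Illustrative sketch; names and data representation are not from the course materials.
    import numpy as np

    def learn_tightest_rectangle(X, y):
        """Smallest axis-aligned rectangle containing all positive points.

        X : (m, 2) array of points in the plane
        y : (m,) array of +1/-1 labels
        Returns (x_min, x_max, y_min, y_max), or None if there are no positives.
        """
        pos = X[y == 1]
        if len(pos) == 0:
            return None  # hypothesis that labels everything negative
        return pos[:, 0].min(), pos[:, 0].max(), pos[:, 1].min(), pos[:, 1].max()

    def predict(rect, point):
        """Label a point +1 iff it lies inside the learned rectangle."""
        if rect is None:
            return -1
        x_min, x_max, y_min, y_max = rect
        return 1 if (x_min <= point[0] <= x_max and y_min <= point[1] <= y_max) else -1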
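For the conjunction learner from lecture 2, here is a sketch of the usual elimination algorithm, assuming Boolean examples: start with the conjunction of all $2n$ literals and delete every literal falsified by a positive example (negative examples are ignored). Again, the encoding and names are illustrative only.

    # Illustrative sketch; the literal encoding is my own choice.
    def learn_conjunction(examples):
        """Learn a conjunction of literals by elimination.

        examples : list of (x, label), with x a tuple of 0/1 values and label 0/1.
        A literal is encoded as (i, v), meaning "bit i must equal v"
        (v = 1 for x_i, v = 0 for its negation).
        """
        n = len(examples[0][0])
        literals = {(i, v) for i in range(n) for v in (0, 1)}  # all 2n literals
        for x, label in examples:
            if label == 1:
                # A positive example eliminates every literal it falsifies.
                literals = {(i, v) for (i, v) in literals if x[i] == v}
        return literals

    def evaluate(literals, x):
        """The learned conjunction accepts x iff every surviving literal holds."""
        return all(x[i] == v for (i, v) in literals)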
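The agnostic bound for finite classes quoted in the bullet list follows from Hoeffding's inequality plus a union bound over $\mathcal{C}$ (this matches the treatment in Mohri et al.'s book up to constants):

$$
\Pr\Big[\exists\, h \in \mathcal{C}:\ \big|\widehat{R}_S(h) - R(h)\big| > \epsilon\Big] \;\le\; 2\,|\mathcal{C}|\, e^{-2m\epsilon^2},
$$

so setting the right-hand side to $\delta$ and solving for $m$ gives

$$
m \;\ge\; \frac{1}{2\epsilon^2}\left(\ln|\mathcal{C}| + \ln\frac{2}{\delta}\right) \;=\; O\!\left(\frac{1}{\epsilon^2}\log\frac{|\mathcal{C}|}{\delta}\right),
$$

and on this event the empirical risk minimizer is within $2\epsilon$ of the best hypothesis in $\mathcal{C}$.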
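For the Perceptron bullet, here is a minimal sketch of the classical mistake-driven update (homogeneous separator, no bias term; variable names are mine):

    # Illustrative sketch of the classical Perceptron; not taken from the lecture slides.
    import numpy as np

    def perceptron(X, y, max_passes=100):
        """On each mistake, add y_i * x_i to the weight vector.

        X : (m, d) array of feature vectors
        y : (m,) array of +1/-1 labels
        Returns a weight vector w; predict with sign(w @ x).
        """
        m, d = X.shape
        w = np.zeros(d)
        for _ in range(max_passes):
            mistakes = 0
            for i in range(m):
                if y[i] * (w @ X[i]) <= 0:   # mistake (or point on the boundary)
                    w += y[i] * X[i]
                    mistakes += 1
            if mistakes == 0:                # data separated: stop early
                break
        return w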
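For the SVM bullet, the hard-margin primal covered in Mohri's lecture 4 and in Burges's tutorial is

$$
\min_{\mathbf{w},\, b}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, m,
$$

and the Lagrangian dual obtained via the KKT conditions (see B+V, chapters 4-5) is

$$
\max_{\boldsymbol{\alpha} \ge 0}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j \quad \text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0,
$$

with $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ at the optimum.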