# Class materials

The following textbooks will be useful for this course:

# Lectures

• 10-03-2016: We proved Markov's and Chebyshev's inequalities, and used these to conclude that the Maximum Likelihood Estimator $\hat p_n = \frac{1}{n}\sum_{i=1}^n X_i$, applied to i.i.d. Bernoulli($p$) random variables $X_1,\dots,X_n$, converges in probability to the true value $p$. (This is essentially the Weak Law of Large Numbers.) Later, we stated (but did not yet prove) the Chernoff-Hoeffding inequality, and showed how it implies the almost-sure convergence of $\hat p_n$ to $p$ via the Borel-Cantelli lemma. In our proof of the Borel-Cantelli lemma, we used two facts. Fact 1: For non-negative $a_n$ such that

$$\sum_{n=1}^\infty a_n < \infty,$$

we have

$$\lim_{N\to\infty} \sum_{n=N}^\infty a_n = 0.$$

Fact 2:

$$\mathbb{P}\Big(\bigcup_{n=1}^\infty A_n\Big) \le \sum_{n=1}^\infty \mathbb{P}(A_n).$$

You should prove both for yourself.
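
The convergence claims above are easy to probe numerically. Below is a minimal sketch (the function name and all parameter choices are mine, not from the lecture) that simulates the Bernoulli MLE and compares the observed deviation frequency with Chebyshev's bound:

```python
import random

def chebyshev_demo(p=0.3, n=1000, eps=0.05, trials=1000, seed=0):
    """Estimate P(|p_hat - p| >= eps) for the MLE p_hat = (1/n) sum X_i
    of n Bernoulli(p) coin flips, and compare it with Chebyshev's bound
    Var(p_hat) / eps^2 = p(1-p) / (n eps^2)."""
    rng = random.Random(seed)
    deviations = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if abs(p_hat - p) >= eps:
            deviations += 1
    empirical = deviations / trials
    bound = p * (1 - p) / (n * eps ** 2)
    return empirical, bound
```

The empirical frequency comes out far below Chebyshev's bound, consistent with the (much sharper) Chernoff-Hoeffding inequality stated in class.
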
• 17-03-2016: We proved Hoeffding's Inequality, which relies on Hoeffding's Lemma (which we also proved). Then we proved McDiarmid's inequality and used it to prove that

$$\frac{1}{n}\sum_{i=1}^n f(X_i) \;\longrightarrow\; \mathbb{E}\, f(X_1)$$

almost surely, for bounded $f$.

Next, we defined Uniform Glivenko-Cantelli (UGC) and Universal UGC (UUGC) function classes. Suppose $(\mathcal{X}, \mathcal{A}, P)$ is a probability space and $\mathcal{F} \subseteq [0,1]^{\mathcal{X}}$. Let $X_1,\dots,X_n \sim P$ be i.i.d. and define $Pf = \mathbb{E}\, f(X_1)$ and $\hat P_n f = \frac{1}{n}\sum_{i=1}^n f(X_i)$. Define also

$$\Delta_n(\mathcal{F}) = \sup_{f\in\mathcal{F}} \bigl|\hat P_n f - Pf\bigr|.$$

We defined $\mathcal{F}$ to be UGC with respect to $P$ if

$$\mathbb{E}\,\Delta_n(\mathcal{F}) \xrightarrow[n\to\infty]{} 0,$$

and to be UUGC if

$$\sup_P \mathbb{E}\,\Delta_n(\mathcal{F}) \xrightarrow[n\to\infty]{} 0,$$

where the supremum is over all probability distributions $P$ on $(\mathcal{X}, \mathcal{A})$.

(In both cases, convergence almost surely – and hence in probability – also follows.)

We observed that any finite $\mathcal{F}$ is UUGC and also exhibited an infinite UUGC class over $\mathcal{X} = \mathbb{R}$: the class of threshold indicator functions $\{x \mapsto \mathbf{1}[x \le t] : t \in \mathbb{R}\}$.
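
For the threshold-indicator class under the uniform distribution on $[0,1]$, $\Delta_n(\mathcal{F})$ is exactly the Kolmogorov-Smirnov statistic $\sup_t |F_n(t) - t|$, which can be computed exactly from the order statistics. A quick sketch (names and parameters mine) illustrating that its average decays with $n$:

```python
import random

def ks_deviation(sample):
    """Delta_n for the threshold-indicator class under Uniform[0,1]:
    sup_t |F_n(t) - t|, computed exactly from the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, 1))

def avg_deviation(n, trials=200, seed=0):
    """Monte Carlo estimate of E[Delta_n]."""
    rng = random.Random(seed)
    return sum(ks_deviation([rng.random() for _ in range(n)])
               for _ in range(trials)) / trials
```

Quadrupling $n$ roughly halves the average deviation, consistent with the $O(1/\sqrt{n})$ rate that the Rademacher/VC machinery of the following lectures yields.
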

• 07-04-2016: We showed how to control the expected uniform deviation $\mathbb{E}\,\Delta_n(\mathcal{F})$ in terms of Rademacher averages. The lecture material is outlined here, with references to sources; see also these lecture notes. Along the way, we proved that $\mathbb{E}\max_{i\le n} X_i \le \sigma\sqrt{2\ln n}$, where the $X_i$ are independent $\sigma$-subgaussian random variables. This proof also shows that $\mathbb{E}\max_{i\le n} |X_i| \le \sigma\sqrt{2\ln(2n)}$; can you see why? [The answer is in the linked note.] It is also true that for independent Gaussian random variables with variance $\sigma^2$, we have $\mathbb{E}\max_{i\le n} X_i \ge c\,\sigma\sqrt{\ln n}$ for some universal constant $c > 0$.
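
The $\sqrt{2\ln n}$ growth of the expected maximum is easy to see empirically. This sketch (names and trial counts mine) estimates $\mathbb{E}\max_{i\le n} X_i$ for standard Gaussians (which are $1$-subgaussian) and checks it against the bound:

```python
import math
import random

def expected_max_gaussian(n, trials=400, seed=0):
    """Monte Carlo estimate of E[max_{i<=n} X_i] for i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(n))
               for _ in range(trials)) / trials

def subgaussian_max_bound(n, sigma=1.0):
    """The bound proved in class: E[max_{i<=n} X_i] <= sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))
```

For $n = 1000$ the estimate falls below, but within a constant factor of, the bound $\sqrt{2\ln 1000} \approx 3.72$, illustrating that the bound is tight up to constants.
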

• 14-04-2016: We proved the Perles-Sauer-Shelah-VC lemma. Our proof was by induction, along the same lines as here, or here (don't have the Kearns-Vazirani book?! Write to me and I'll provide a scan of the relevant section). In fact, there are at least 4 distinct proofs of this fundamental combinatorial fact; three are presented here and my favorite is via Pajor's lemma (which I unfortunately didn't have time to present in class). It is in general not easy to compute the VC-dimension of a hypothesis class exactly, but various compositional properties of the VC-dimension are also known.
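
As a sanity check on the lemma, the sketch below (all names mine) compares the number of labelings that thresholds (a class of VC-dimension $1$) induce on a sample against the Sauer-Shelah bound $\Phi_d(n) = \sum_{i=0}^{d} \binom{n}{i}$:

```python
from math import comb

def phi(d, n):
    """Sauer-Shelah bound on the number of behaviors a class of
    VC-dimension d can induce on n points: sum_{i=0}^{d} C(n, i)."""
    return sum(comb(n, i) for i in range(d + 1))

def threshold_behaviors(points):
    """Distinct labelings of `points` induced by thresholds x -> 1[x <= t]."""
    xs = sorted(points)
    thresholds = [xs[0] - 1.0] + xs  # a below-minimum t gives the all-zeros labeling
    return {tuple(1 if x <= t else 0 for x in points) for t in thresholds}
```

Thresholds realize exactly $n + 1$ labelings on $n$ distinct points, meeting $\Phi_1(n) = n + 1$ with equality; the point of the lemma is that $\Phi_d(n)$ grows only polynomially in $n$, rather than as $2^n$.
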

In the second half of the class, I explained the connection between Universal Glivenko-Cantelli classes and PAC learnability. The material is summarized in these notes; please read them carefully.
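
To see the Glivenko-Cantelli-to-PAC connection concretely, here is a toy agnostic learner (the distribution, noise level, and all names are my own illustration, not from the notes): ERM over thresholds $h_t(x) = \mathbf{1}[x \ge t]$ trained on noisy labels, with its excess risk computed analytically:

```python
import random

def erm_threshold(xs, ys):
    """Empirical Risk Minimization over thresholds h_t(x) = 1[x >= t]:
    return the candidate threshold with the fewest training mistakes."""
    best_t, best_err = None, float("inf")
    for t in [0.0] + sorted(xs) + [1.0]:
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def excess_risk(t, eta=0.1):
    """Excess risk of h_t when X ~ Uniform[0,1], the clean label is
    1[x >= 0.5], and labels are flipped with probability eta:
    R(t) - R(0.5) = (1 - 2*eta) * |t - 0.5|."""
    return (1.0 - 2.0 * eta) * abs(t - 0.5)

def avg_excess_risk(n, trials=30, eta=0.1, seed=0):
    """Average excess risk of threshold ERM over random samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [(x >= 0.5) != (rng.random() < eta) for x in xs]
        total += excess_risk(erm_threshold(xs, ys), eta)
    return total / trials
```

Because thresholds form a UUGC class, the excess risk of ERM vanishes as $n$ grows, which is exactly the agnostic PAC guarantee discussed in the notes.
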

• 05-05-2016: We showed that UUGC implies agnostic PAC learnability; this is proved in Theorem 4.1 here. Let $\hat f_n$ be the Empirical Risk Minimizer of the sample and define the Excess Risk

$$R_n = \operatorname{err}(\hat f_n) - \inf_{f\in\mathcal{F}} \operatorname{err}(f).$$
We showed that, for fixed confidence parameter $\delta$, a sample of size

$$n = O\!\left(\frac{d\log(1/\epsilon) + \log(1/\delta)}{\epsilon^2}\right)$$

suffices to guarantee $R_n \le \epsilon$ with probability at least $1-\delta$, where $d$ is the VC-dimension of $\mathcal{F}$, and claimed that the $\log(1/\epsilon)$ factor can be removed with (much) additional effort. For now, however, we are focusing on a matching lower bound:

$$\sup_P \mathbb{E}\, R_n \ge c\sqrt{\frac{d}{n}},$$

meaning that for any learner and any sample size $n$, we can construct an adversarial distribution that forces an expected excess risk proportional to $\sqrt{d/n}$. We started in this direction by analyzing the optimal predictor of a random coin's bias and proving Pinsker's inequality.
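
Pinsker's inequality, $\mathrm{TV}(P,Q) \le \sqrt{\mathrm{KL}(P\|Q)/2}$, can be sanity-checked on Bernoulli pairs, the relevant case for the coin-bias argument. A minimal sketch (function names mine):

```python
import math

def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) in nats, with the convention 0 * log(0/b) = 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)

def tv_bernoulli(p, q):
    """Total variation distance between Ber(p) and Ber(q)."""
    return abs(p - q)

def pinsker_gap(p, q):
    """sqrt(KL/2) - TV; Pinsker's inequality says this is >= 0."""
    return math.sqrt(kl_bernoulli(p, q) / 2.0) - tv_bernoulli(p, q)
```

In the lower-bound argument the inequality is used in the contrapositive: if two coins' biases are close in KL divergence, no test can reliably distinguish them from a sample.
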
• 26-05-2016: We proved the agnostic PAC lower bound (actually, a lower bound on the minimax expected excess risk – which you're asked to convert into a sample complexity lower bound in one of the exercises).
• 09-06-2016: We proved the agnostic PAC upper bound using the Dudley chaining integral and the Dudley-Pollard packing bound.
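
The packing bound concerns how many $\epsilon$-separated behaviors a VC class can have under the empirical $L_1$ (normalized Hamming) distance. The greedy sketch below (the construction and all names are mine) builds an $\epsilon$-separated set for the threshold class on an evenly spaced sample:

```python
def empirical_l1(a, b):
    """Empirical L1 (normalized Hamming) distance between two labelings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_packing(labelings, eps):
    """Greedily collect labelings that are pairwise at distance >= eps
    (a lower bound on the eps-packing number)."""
    packing = []
    for lab in labelings:
        if all(empirical_l1(lab, q) >= eps for q in packing):
            packing.append(lab)
    return packing

# Threshold-class projections onto n evenly spaced points: the n + 1 "prefix" labelings.
n = 20
labelings = [tuple(1 if i < k else 0 for i in range(n)) for k in range(n + 1)]
pack = greedy_packing(labelings, eps=0.25)
```

Here the $0.25$-packing has $5$ elements, far below the exponential $2^n$: for a class of VC-dimension $d$, the packing number grows only polynomially in $1/\epsilon$, with exponent depending on $d$, which is what drives the Dudley chaining argument.
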