## Follow-up notes

Regarding the papers
L. Kontorovich, C. Cortes and M. Mohri. Kernel Methods for Learning Languages. Theoretical Computer Science 405 (2008), pp. 223-236
and
L. Kontorovich, C. Cortes and M. Mohri. Learning Linearly Separable Languages. In ALT 2006.

• Aug 2010: To achieve a proper bias-variance tradeoff (i.e., to avoid overfitting), instead of defining the single kernel $tex$ in equation (10), we should really have considered a family of kernels $tex$, where the summation is over all shuffle ideals of length up to $tex$. This technique is fleshed out here.
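To make the length-truncated family concrete, here is a small Python sketch. It rests on an assumption not spelled out in this note: that the summation runs over principal shuffle ideals, so a string $x$ lies in the shuffle ideal of $u$ exactly when $u$ is a subsequence of $x$ (written $u \sqsubseteq x$), and that the member of the family indexed by $n$ is $K_n(x, y) = \sum_{1 \le |u| \le n} \mathbf{1}[u \sqsubseteq x]\,\mathbf{1}[u \sqsubseteq y]$, i.e. the number of distinct nonempty common subsequences of length at most $n$. Whether the empty string is counted is also an assumption (here it is excluded; including it would add 1). The function name `truncated_subsequence_kernel` is hypothetical.

```python
from functools import lru_cache


def truncated_subsequence_kernel(x: str, y: str, n: int) -> int:
    """Hypothetical sketch: count distinct nonempty strings u with |u| <= n
    that are subsequences of both x and y (equivalently, x and y both lie
    in the shuffle ideal of u). Assumes this is the truncated kernel K_n."""
    # Only symbols occurring in both strings can appear in a common u.
    alphabet = sorted(set(x) & set(y))

    def next_table(s: str) -> list:
        # nxt[i][c] = smallest index p >= i with s[p] == c (key absent if none).
        nxt = [dict() for _ in range(len(s) + 1)]
        for i in range(len(s) - 1, -1, -1):
            nxt[i] = dict(nxt[i + 1])
            nxt[i][s[i]] = i
        return nxt

    nx, ny = next_table(x), next_table(y)

    @lru_cache(maxsize=None)
    def count(i: int, j: int, r: int) -> int:
        # Distinct nonempty common subsequences of x[i:] and y[j:] with length <= r.
        if r == 0:
            return 0
        total = 0
        for c in alphabet:
            p, q = nx[i].get(c), ny[j].get(c)
            if p is not None and q is not None:
                # u = c by itself, plus u = c + v for every nonempty common v
                # of x[p+1:] and y[q+1:]. Jumping to the leftmost occurrence
                # of c in each string counts every distinct u exactly once.
                total += 1 + count(p + 1, q + 1, r - 1)
        return total

    return count(0, 0, n)
```

For example, `truncated_subsequence_kernel("abc", "acb", 2)` gives 5, counting `a`, `b`, `c`, `ab` and `ac`. Each $K_n$ is an inner product of explicit 0/1 feature vectors indexed by strings of length at most $n$, so it is a valid kernel; the memoized recursion touches at most $(|x|+1)(|y|+1)(n+1)$ states, each doing $O(|\Sigma|)$ work. In this sketch $n$ plays the role of the capacity parameter, and one would typically choose it on held-out data.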