Follow-up notes

regarding the papers
L. Kontorovich, C. Cortes and M. Mohri. Kernel Methods for Learning Languages. Theoretical Computer Science 405 (2008), pp. 223-236.
and
L. Kontorovich, C. Cortes and M. Mohri. Learning Linearly Separable Languages. In ALT 2006.

  • Aug 2010: To achieve a proper bias-variance tradeoff (i.e., to avoid overfitting), instead of defining the single kernel in equation (10) we should really have considered a family of kernels indexed by a maximum length, where the summation for each kernel is over all shuffle ideals of length up to that bound. This technique is fleshed out here.
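
As a rough illustration of the length-bounded family described above, the following Python sketch assumes the kernel in question is the subsequence kernel: the number of distinct strings u that occur as a (not necessarily contiguous) subsequence of both inputs, so that each such u stands for one shuffle ideal. That reading, the function names, and the decision to count the empty string are assumptions made for illustration and are not taken from the papers. The code is a brute-force enumeration, exponential in the string length, and is meant only for short strings.

```python
from itertools import combinations


def subsequences_up_to(x, n):
    """Return the set of distinct subsequences of x of length at most n.

    The empty string is included (an assumption of this sketch).
    """
    subs = set()
    for k in range(min(n, len(x)) + 1):
        # Each increasing index tuple of size k picks one subsequence;
        # the set removes duplicates arising from repeated symbols.
        for idx in combinations(range(len(x)), k):
            subs.add("".join(x[i] for i in idx))
    return subs


def truncated_subsequence_kernel(x, y, n):
    """Hypothetical member of the family: count the distinct strings of
    length at most n that are subsequences of both x and y."""
    return len(subsequences_up_to(x, n) & subsequences_up_to(y, n))


if __name__ == "__main__":
    for n in range(4):
        print(n, truncated_subsequence_kernel("aab", "ab", n))
```

Under these assumptions the bound acts as a capacity parameter: a small bound gives a coarse feature space, a larger one a richer space, and the value is never decreasing in the bound. Which bound to use would be chosen by some model-selection procedure, which this sketch does not cover.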