Follow-up notes
regarding the papers
L. Kontorovich, C. Cortes and M. Mohri.
Kernel Methods for Learning Languages.
Theoretical Computer Science 405 (2008), pp. 223-236
and
L. Kontorovich, C. Cortes and M. Mohri.
Learning Linearly Separable Languages. In
ALT 2006.
- Aug 2010: To achieve a proper bias-variance tradeoff (i.e., to avoid overfitting),
instead of defining the single kernel in equation (10),
we should really have considered a family of kernels K_n, where the summation is over all shuffle ideals of length up to n.
This technique is fleshed out here.
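
As a rough illustration of the family described in this note, the sketch below writes K_n for the kernel truncated at length n (notation assumed here). It assumes that the kernel in equation (10) counts the strings u whose shuffle ideals contain both x and y, i.e. the strings u that are subsequences of both x and y, with the empty string included. The function names are hypothetical, and the brute-force enumeration is meant only to make the definition concrete; it is not the computation used in the papers.

```python
from itertools import combinations


def subsequences_up_to(x, n):
    """Return the set of distinct subsequences of x of length at most n.

    The empty string is included (assumption: the sum ranges over all
    strings u, including the empty one).
    """
    subs = set()
    for k in range(min(n, len(x)) + 1):
        # Each increasing index tuple selects one (not necessarily new) subsequence.
        for idx in combinations(range(len(x)), k):
            subs.add("".join(x[i] for i in idx))
    return subs


def truncated_kernel(x, y, n):
    """K_n(x, y): number of strings u with |u| <= n such that both x and y
    lie in the shuffle ideal of u, i.e. u is a subsequence of both."""
    return len(subsequences_up_to(x, n) & subsequences_up_to(y, n))


if __name__ == "__main__":
    # Increasing n enlarges the feature space; n acts as a capacity parameter.
    for n in range(4):
        print(n, truncated_kernel("aab", "ab", n))
```

Under these assumptions, the bound n controls how many shuffle-ideal features are used, so it could be chosen by a model-selection procedure such as cross-validation rather than fixed in advance.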