Quiz 07: Classification for NLP

This quiz covers material from the seventh lecture, on classification for NLP.
  1. When performing feature extraction over words, list 3 types of features that are typically useful. For each one, provide an estimate of the expected number of values for the feature given a training dataset containing N tokens and V distinct words.






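For reference when reviewing, here is an illustrative sketch (not a model answer) of common word-level features; the helper name and the example token are my own choices:

```python
# Illustrative sketch: typical feature templates extracted per token.
# Comments give rough bounds on the number of distinct values each can take.
def word_features(token):
    return {
        "word": token,                        # word identity: up to V values
        "lowercase": token.lower(),           # <= V values (merges case variants)
        "suffix2": token[-2:],                # bounded by |alphabet|^2 values
        "is_capitalized": token[0].isupper(), # boolean: 2 values
        "is_digit": token.isdigit(),          # boolean: 2 values
    }

print(word_features("Paris"))
```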
  2. List 3 methods that are useful to reduce the dimensionality of a Bag of Words feature representation for documents. For each method, explain why the simplification induced by the method is intuitively justified.









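As a study aid, a minimal sketch of three common vocabulary-reduction steps (case folding, stopword removal, and a frequency cutoff); the stopword list and threshold are arbitrary choices for the demo:

```python
from collections import Counter

# Tiny, hand-picked stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "of"}

def reduced_vocabulary(tokens, min_count=2):
    tokens = [t.lower() for t in tokens]                 # case folding
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    counts = Counter(tokens)
    # frequency cutoff: drop rare words, which carry little reusable signal
    return {w for w, c in counts.items() if c >= min_count}

doc = "The cat sat the cat is a cat of cats".split()
print(reduced_vocabulary(doc))
```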
  3. Describe the TF-IDF method: why is it useful, and how is it computed (intuitively, without a detailed formula)?










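For intuition, an illustrative sketch of TF-IDF on a toy corpus; this unsmoothed IDF variant is one common choice, not the only definition:

```python
import math

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)            # how frequent in this document
    df = sum(1 for d in corpus if term in d)   # how many documents contain it
    idf = math.log(len(corpus) / df)           # rarer across corpus -> larger
    return tf * idf

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
# "the" appears in every document, so idf = log(3/3) = 0 and its weight vanishes,
# while "cat" appears in only 2 of 3 documents and keeps a positive weight.
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("cat", corpus[0], corpus))
```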
  4. Learning a Bayes classifier without independence assumptions requires an unrealistic number of training examples. Compute the number of parameters to be learned for a model Y = f(X) where Y is a boolean variable and X is a vector of N boolean features. What is the number of parameters under the Naive Bayes assumptions?
    p(Y | X) = p(X | Y) p(Y) / p(X)
    


    Number of parameters for p(Y):

    Number of parameters for p(X | Y):

    Number of parameters under Naive Bayes independence assumption:


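To check your counting after attempting the question, a worked-arithmetic sketch (using one standard parameterization; the function names are my own):

```python
def full_bayes_params(n):
    # p(Y): 1 free parameter for a boolean variable;
    # p(X|Y): a distribution over 2**n joint configurations per class,
    # i.e. (2**n - 1) free parameters, for each of the 2 classes.
    return 1 + 2 * (2**n - 1)

def naive_bayes_params(n):
    # p(Y): 1; p(X_i|Y): one Bernoulli parameter per feature per class.
    return 1 + 2 * n

print(full_bayes_params(30))   # exponential in n: infeasible to estimate
print(naive_bayes_params(30))  # linear in n
```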
  5. Consider the task of classifying first names as masculine vs. feminine. Explain why a Naive Bayes model that considers two features (suffix[-1], suffix[-2]) (that is, one feature for each of the last 2 letters of the word) behaves differently from one that considers a single feature (suffix[-12]), that is, a single feature containing the last 2 letters.









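A toy numerical sketch of the contrast (the names are invented for illustration): with two features, Naive Bayes multiplies per-letter probabilities as if the letters were independent given the class, whereas the single bigram feature estimates the joint probability directly and so can capture correlations between the two letters.

```python
# Invented feminine training names; all probabilities below are toy estimates.
names_f = ["anna", "karen", "nina", "ellen"]

def prob(feature_fn, value):
    # Relative frequency of a feature value among the training names.
    return sum(feature_fn(n) == value for n in names_f) / len(names_f)

# Two independent features: p(last='a'|F) * p(second-to-last='n'|F)
indep = prob(lambda n: n[-1], "a") * prob(lambda n: n[-2], "n")
# One joint feature: p(last-two='na'|F), estimated directly
joint = prob(lambda n: n[-2:], "na")

# Here 'a' and 'n' each occur in half the names but co-occur as "na" in half,
# so the product underestimates the joint probability.
print(indep, joint)
```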
Last modified 02 Dec 2018