Quiz 07: Classification for NLP

This quiz covers material from the seventh lecture, on classification for NLP.
  1. When performing feature extraction over words, list 3 types of features that are typically useful. For each one, provide an estimate of the expected number of values for the feature given a training dataset containing N tokens and V distinct words.






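For reference when reviewing, here is an illustrative sketch (not a model answer) of common word-level features; the helper name and the example token are my own choices:

```python
# Illustrative sketch: typical feature templates extracted per token.
# Comments give rough bounds on the number of distinct values each can take.
def word_features(token):
    return {
        "word": token,                        # word identity: up to V values
        "lowercase": token.lower(),           # <= V values (merges case variants)
        "suffix2": token[-2:],                # bounded by |alphabet|^2 values
        "is_capitalized": token[0].isupper(), # boolean: 2 values
        "is_digit": token.isdigit(),          # boolean: 2 values
    }

print(word_features("Paris"))
```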
  2. List 3 methods that are useful to reduce the dimensionality of a Bag of Words feature representation for documents. For each method, explain why the simplification induced by the method is intuitively justified.









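As a study aid, a minimal sketch of three common vocabulary-reduction steps (case folding, stopword removal, and a frequency cutoff); the stopword list and threshold are arbitrary choices for the demo:

```python
from collections import Counter

# Tiny, hand-picked stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "of"}

def reduced_vocabulary(tokens, min_count=2):
    tokens = [t.lower() for t in tokens]                 # case folding
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    counts = Counter(tokens)
    # frequency cutoff: drop rare words, which carry little reusable signal
    return {w for w, c in counts.items() if c >= min_count}

doc = "The cat sat the cat is a cat of cats".split()
print(reduced_vocabulary(doc))
```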
  3. Describe the TF-IDF method: why is it useful, and how is it computed (intuitively, without a detailed formula)?










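For intuition, an illustrative sketch of TF-IDF on a toy corpus; this unsmoothed IDF variant is one common choice, not the only definition:

```python
import math

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)            # how frequent in this document
    df = sum(1 for d in corpus if term in d)   # how many documents contain it
    idf = math.log(len(corpus) / df)           # rarer across corpus -> larger
    return tf * idf

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
# "the" appears in every document, so idf = log(3/3) = 0 and its weight vanishes,
# while "cat" appears in only 2 of 3 documents and keeps a positive weight.
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("cat", corpus[0], corpus))
```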
  4. Learning a Bayes classifier without independence assumptions requires an unrealistic number of training examples. Compute the number of parameters to be learned for a model Y = f(X) where Y is a boolean variable and X is a vector of N boolean features. What is the number of parameters under the Naive Bayes assumptions?
    p(Y | X) = p(X | Y) p(Y) / p(X)
    


    Number of parameters for p(Y):

    Number of parameters for p(X | Y):

    Number of parameters under Naive Bayes independence assumption:


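To check your counting after attempting the question, a worked-arithmetic sketch (using one standard parameterization; the function names are my own):

```python
def full_bayes_params(n):
    # p(Y): 1 free parameter for a boolean variable;
    # p(X|Y): a distribution over 2**n joint configurations per class,
    # i.e. (2**n - 1) free parameters, for each of the 2 classes.
    return 1 + 2 * (2**n - 1)

def naive_bayes_params(n):
    # p(Y): 1; p(X_i|Y): one Bernoulli parameter per feature per class.
    return 1 + 2 * n

print(full_bayes_params(30))   # exponential in n: infeasible to estimate
print(naive_bayes_params(30))  # linear in n
```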
  5. Consider the task of classifying first names as masculine vs. feminine. Explain why a Naive Bayes model that considers two features (suffix[-1], suffix[-2]) (that is, one feature for each of the last 2 letters of the word) behaves differently from one that considers a single feature (suffix[-12]), that is, a single feature containing the last 2 letters.









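A toy numerical sketch of the contrast (the names are invented for illustration): with two features, Naive Bayes multiplies per-letter probabilities as if the letters were independent given the class, whereas the single bigram feature estimates the joint probability directly and so can capture correlations between the two letters.

```python
# Invented feminine training names; all probabilities below are toy estimates.
names_f = ["anna", "karen", "nina", "ellen"]

def prob(feature_fn, value):
    # Relative frequency of a feature value among the training names.
    return sum(feature_fn(n) == value for n in names_f) / len(names_f)

# Two independent features: p(last='a'|F) * p(second-to-last='n'|F)
indep = prob(lambda n: n[-1], "a") * prob(lambda n: n[-2], "n")
# One joint feature: p(last-two='na'|F), estimated directly
joint = prob(lambda n: n[-2:], "na")

# Here 'a' and 'n' each occur in half the names but co-occur as "na" in half,
# so the product underestimates the joint probability.
print(indep, joint)
```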
Last modified 02 Dec 2018