Quiz 06: Classification

This quiz covers material from the sixth lecture on Classification.
  1. Consider the feature extraction function we reviewed for POS tagging using a classifier:
    def features(sentence, index):
        """ sentence: [w1, w2, ...], index: the index of the word """
        return {
            'word': sentence[index],
            'is_first': index == 0,
            'is_last': index == len(sentence) - 1,
            'is_capitalized': sentence[index][0].upper() == sentence[index][0],
            'is_all_caps': sentence[index].upper() == sentence[index],
            'is_all_lower': sentence[index].lower() == sentence[index],
            'prefix-1': sentence[index][0],
            'prefix-2': sentence[index][:2],
            'prefix-3': sentence[index][:3],
            'suffix-1': sentence[index][-1],
            'suffix-2': sentence[index][-2:],
            'suffix-3': sentence[index][-3:],
            'prev_word': '' if index == 0 else sentence[index - 1],
            'next_word': '' if index == len(sentence) - 1 else sentence[index + 1],
            'has_hyphen': '-' in sentence[index],
            'is_numeric': sentence[index].isdigit(),
            'capitals_inside': sentence[index][1:].lower() != sentence[index][1:]
        }
    
    Indicate which of these features are of the following types:


    Lexical:
    Morphological:
    Syntactic:
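
    For reference, a minimal sketch of how this function might be used (the
    example sentence and the downstream classifiers named in the comments are
    illustrative, not part of the lecture code):

    sentence = ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']

    # One feature dictionary per token; such dictionaries can feed a
    # dictionary-based classifier (e.g. nltk.NaiveBayesClassifier, or
    # scikit-learn's DictVectorizer followed by a linear model).
    feature_dicts = [features(sentence, i) for i in range(len(sentence))]

    print(feature_dicts[1]['word'], feature_dicts[1]['suffix-2'])  # cat at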

  2. Why is the classifier-based approach to POS tagging superior to combining multiple taggers with backoff (for example, a unigram tagger that backs off to an affix-based tagger)?
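
    For reference, a minimal sketch of the backoff combination mentioned above,
    assuming NLTK and its Penn Treebank sample (the training split is
    illustrative):

    import nltk
    from nltk.corpus import treebank   # may require nltk.download('treebank')

    train_sents = treebank.tagged_sents()[:3000]

    # The affix-based tagger guesses a tag from word endings; the unigram
    # tagger consults it only for words it has never seen (backoff).
    affix_tagger = nltk.AffixTagger(train_sents)
    combined_tagger = nltk.UnigramTagger(train_sents, backoff=affix_tagger)

    print(combined_tagger.tag(['Revenue', 'rose', 'sharply']))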





  3. Explain the intuition behind TF*IDF word weighting for bag-of-words document encoding.
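
    For reference, one common formulation of the weight, as a sketch (log base
    and smoothing conventions vary between implementations):

    import math

    def tf_idf(term, doc, docs):
        """Weight of `term` in `doc` (a token list) against the collection `docs`."""
        tf = doc.count(term) / len(doc)                # how often the term occurs in this document
        df = sum(1 for d in docs if term in d)         # how many documents contain the term at all
        idf = math.log(len(docs) / df) if df else 0.0  # rarer terms receive a larger weight
        return tf * idf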








  4. Why is it desirable to reduce the dimensionality of the bag-of-words representation before applying a classifier?







  5. Give three examples of dimensionality reduction techniques used for the bag-of-words representation.









Last modified 10 Dec 2017