Quiz 07: Word Embeddings

This quiz covers material from the sixth lecture, on Word Embeddings.
  1. Consider the task of predicting the POS tag of a word using a model similar to the one we discussed in class for predicting the language of documents. Given a tagset of size T, what task-specific word embeddings would the model learn, and what would their dimension be? (A minimal sketch of such a model follows the answer space below.)





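For reference, here is a minimal NumPy sketch of the kind of classifier the question describes: an embedding matrix feeding a softmax over the tagset. The vocabulary size V, embedding dimension d, tagset size T, and all identifiers are hypothetical placeholders, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, T = 1000, 50, 17   # hypothetical vocabulary size, embedding dim, tagset size

E = rng.normal(scale=0.1, size=(V, d))   # word embedding matrix, learned during training
W = rng.normal(scale=0.1, size=(d, T))   # projection from embedding space to tag scores

def predict_tag_probs(word_id):
    """Return a softmax distribution over the T tags for a single word."""
    scores = E[word_id] @ W              # (d,) @ (d, T) -> (T,)
    exp = np.exp(scores - scores.max())  # shift for numerical stability
    return exp / exp.sum()

probs = predict_tag_probs(42)
print(probs.shape)   # (17,)
```
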
  2. List three key properties of word embedding representations that distinguish them from one-hot encodings. (A small illustration follows the answer space below.)





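As a small illustration of the contrast the question is after, the toy vectors below (made-up values, chosen only for this example) show that distinct one-hot vectors are always orthogonal, while dense embeddings can encode graded similarity in far fewer dimensions.

```python
import numpy as np

V = 5                      # toy vocabulary size
one_hot = np.eye(V)        # one-hot vectors: one dimension per word

# Any two distinct one-hot vectors are orthogonal: similarity is always 0.
print(one_hot[0] @ one_hot[1])   # 0.0

# Dense embeddings (illustrative values) can encode graded similarity.
emb = np.array([
    [0.9, 0.1],   # "cat"
    [0.8, 0.2],   # "dog"
    [0.1, 0.9],   # "car"
])

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos(emb[0], emb[1]))   # high: cat ~ dog
print(cos(emb[0], emb[2]))   # low:  cat vs. car
```
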
  3. Explain the intuition behind distributional methods: why do we believe that solving the task of predicting a word given its context yields embeddings that capture lexical semantics? (A toy distributional sketch follows the answer space below.)










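To make the distributional intuition concrete, here is a toy sketch that builds vectors from word/context co-occurrence counts and factorizes them. The corpus, window size, and rank are all illustrative choices, not the method from the lecture; the point is only that words sharing contexts end up with similar vectors.

```python
import numpy as np
from collections import Counter

# Toy corpus: "cat" and "dog" occur in the same contexts ("the _ sat").
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count word/context co-occurrences within a +/-1 token window.
counts = Counter()
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            counts[(idx[w], idx[corpus[j]])] += 1

M = np.zeros((len(vocab), len(vocab)))
for (wi, ci), c in counts.items():
    M[wi, ci] = c

# A low-rank factorization of the co-occurrence matrix gives dense vectors.
U, S, _ = np.linalg.svd(M)
vectors = U[:, :2] * S[:2]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

# Shared contexts make the "cat" and "dog" vectors nearly identical.
print(cos(vectors[idx["cat"]], vectors[idx["dog"]]))
```
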
Last modified 24 Dec 2017