Quiz 08: Structured Prediction

This quiz covers material from the eighth lecture, on structured prediction and RNNs.
  1. Explain how structured prediction differs from the greedy POS-tagging model we reviewed in previous lectures. What quantity are we trying to optimize differently?





  2. What is the difference between a generative probabilistic model and a discriminative one? Explain which formal tool is used to specify a generative model.





  3. In a Hidden Markov Model of a tagging problem, given a sequence of words xi and the predicted labels yi (with yn+1 = STOP), we model the joint distribution p(x, y) as:
    p(x1 … xn, y1 … yn+1) = Πi=1..n+1 q(yi | yi−2, yi−1) · Πi=1..n e(xi | yi)
    
    assuming y0 = y−1 = *.
    
    Write down the independence assumptions which correspond to this equation.








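    To make the factorization above concrete, here is a minimal sketch that multiplies out the trigram transition and emission terms for a given (x, y) pair. The probability tables q and e below are hypothetical toy values, purely for illustration; they are not from the lecture.

    ```python
    def joint_prob(words, labels, q, e):
        """p(x_1..x_n, y_1..y_{n+1}) under the trigram HMM factorization,
        with y_0 = y_{-1} = * and y_{n+1} = STOP."""
        n = len(words)
        # Pad so that y[i + 1] holds y_i: y[0] = y_{-1}, y[1] = y_0.
        y = ["*", "*"] + list(labels) + ["STOP"]
        p = 1.0
        for i in range(1, n + 2):      # transitions q(y_i | y_{i-2}, y_{i-1})
            p *= q(y[i + 1], y[i - 1], y[i])
        for i in range(1, n + 1):      # emissions e(x_i | y_i)
            p *= e(words[i - 1], y[i + 1])
        return p

    # Hypothetical toy parameters (made-up numbers, for illustration only).
    def q(s, w, u):
        table = {("D", "*", "*"): 0.9, ("N", "*", "D"): 0.8,
                 ("STOP", "D", "N"): 0.9}
        return table.get((s, w, u), 0.01)

    def e(x, s):
        table = {("the", "D"): 0.6, ("dog", "N"): 0.4}
        return table.get((x, s), 0.001)
    ```

    For instance, joint_prob(["the", "dog"], ["D", "N"], q, e) multiplies three transition terms (including the STOP transition) and two emission terms, mirroring the two products in the equation.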
  4. Consider the basic Viterbi algorithm:
    Input: a sentence x1 ... xn, parameters q(s|u, v) and e(x|s).
    Definitions: Define K to be the set of possible tags. Define K−1 = K0 = {*}, and Kk = K for k = 1 ... n.
    Initialization: Set π(0, *, *) = 1.
    Algorithm:
    For k = 1 ... n,
    	For u ∈ Kk−1, v ∈ Kk,
    		π(k, u, v) = max over w ∈ Kk−2 of [π(k − 1, w, u) × q(v | w, u) × e(xk | v)]
    Return max over u ∈ Kn−1, v ∈ Kn of [π(n, u, v) × q(STOP | u, v)]
    
    What is the time complexity of this algorithm, which tags a sequence of length n using a tag set K of size |K|?










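The pseudocode in question 4 can be sketched directly in Python, which may help when counting the loop iterations. The dictionaries pi and bp store the dynamic-programming table and back-pointers; the toy parameters q and e are hypothetical, purely for illustration.

```python
def viterbi(sentence, tags, q, e):
    """Trigram Viterbi: return the highest-probability tag sequence."""
    n = len(sentence)
    # K_{-1} = K_0 = {*}, K_k = tags for k = 1 .. n
    K = lambda k: {"*"} if k <= 0 else tags
    pi = {(0, "*", "*"): 1.0}   # pi(k, u, v): best score ending in (u, v)
    bp = {}                     # back-pointers for recovering the sequence
    for k in range(1, n + 1):
        for u in K(k - 1):
            for v in K(k):
                best_w, best_p = None, 0.0
                for w in K(k - 2):
                    p = pi.get((k - 1, w, u), 0.0) * q(v, w, u) * e(sentence[k - 1], v)
                    if p > best_p:
                        best_w, best_p = w, p
                pi[(k, u, v)] = best_p
                bp[(k, u, v)] = best_w
    # Pick the best final tag pair, including the STOP transition.
    best_u, best_v, best_p = None, None, 0.0
    for u in K(n - 1):
        for v in K(n):
            p = pi[(n, u, v)] * q("STOP", u, v)
            if p > best_p:
                best_u, best_v, best_p = u, v, p
    # Follow back-pointers to recover y_1 .. y_n.
    y = [None] * (n + 1)
    y[n] = best_v
    if n > 1:
        y[n - 1] = best_u
    for k in range(n - 2, 0, -1):
        y[k] = bp[(k + 2, y[k + 1], y[k + 2])]
    return y[1:]

# Hypothetical toy parameters (made-up numbers, for illustration only).
def q(s, w, u):
    table = {("D", "*", "*"): 0.9, ("N", "*", "D"): 0.8,
             ("STOP", "D", "N"): 0.9}
    return table.get((s, w, u), 0.01)

def e(x, s):
    table = {("the", "D"): 0.6, ("dog", "N"): 0.4}
    return table.get((x, s), 0.001)
```

With these toy tables, viterbi(["the", "dog"], {"D", "N"}, q, e) recovers the tag sequence ["D", "N"]. The nested loops over u, v, and w at each of the n positions mirror the recurrence in the pseudocode.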
Last modified 21 Dec 2018