Quizz 10: Statistical Syntactic Parsing with PCFGs

This quizz covers material from the 9th lecture on Statistical Syntactic Parsing with PCFGs.
  1. Consider the tree from the Penn Treebank:
    "Had Casey thrown the ball harder, it would have reached home plate in time."
    (S (SBAR-ADV (SINV had
    		   (NP-SBJ Casey)
    		   (VP thrown
    		       (NP the ball)
    		       (ADVP-MNR harder))))
       ,
       (NP-SBJ it)
       (VP would
           (VP have
    	   (VP reached
    	       (NP home plate)
    	       (PP-TMP in
    		       (NP time))))))
    
    Write the set of productions (rules) one can derive from this tree - ignore the suffixes of the non-terminals (e.g., NP-SBJ is like NP):













  2. Consider a PCFG in Chomsky Normal Form (CNF).
    Non-terminal rules:
    X → Y1 Z1 | p1 
    X → Y2 Z2 | p2 
    ...
    X → Yn Zn | pn 
    
    Σpi = 1.0
    
    Complete the form of pre-terminal rules for a CNF PCFG:









  3. What is the generative story of a PCFG generative probabilistic model (Terminals, Non-Terminals, Productions, S)


    Start from the start symbol S




  4. Consider the CYK adaptation to parse with PCFGs in CNF format - complete the missing line:
    # p[X, i, l] = p(wi....wi+l-1 is covered by category X)
    
    Init:
    For i = 1...n:
      For X in NT(G):
        p[X, i, 1] = p(X | wi)  # These probabilities of pre-terminals 
                                # can be estimated using a POS tagger.
    
    Recursive step:
    For l = 2..n:              # Length of span
      For i = 1..n-l+1:        # Start of span
        For k = 1..l-1:        # Partition of span
          For (X → Y Z, q) in Rules(G): # q = p(Y Z | X)
    	  
            new_p = _________________________________________________________________
    
            If new_p > p[X, i, l]:
              p[X, i, l] = new_p
              back[X, i, l] = (Y, Z, k)
    



Last modified 14 Jan 2018