Parsing - Compositional Semantics


A general introduction to parsing natural language can be found in the NLTK book Chapter 7.

Bottom-up Parsing

Norvig's implementation of a bottom-up parser. (and in common lisp). This parser uses the following ADTs for rules, partial parses (a tree covering a prefix of a list of words) and trees:
;; A rule of the form (lhs --> rhs)
;; For example: (NP -> (NP VP))
(define (make-rule lhs rhs) (cons lhs rhs))
(define (rule-lhs rule)     (car rule))
(define (rule-rhs rule)     (cdr rule))

;; ====================
;; Partial parse: a parse tree covering a prefix of a sentence
;; and the remaining words of the sentence
;; ====================
(define (make-parse tree rem) (cons tree rem))
(define (parse-tree parse)    (car parse))
(define (parse-rem parse)     (cdr parse))

;; ====================
;; Trees are of the form: (lhs . rhs)
;; ====================
(define (new-tree cat rhs) (cons cat rhs))
(define (tree-lhs tree)    (car tree))
(define (tree-rhs tree)    (cdr tree))

(define (parse-lhs parse) (tree-lhs (parse-tree parse)))

The code computes all possible parses that are acceptable by the grammar. The following examples illustrate its performance:
(use *grammar3*)
;;;; -> 15

(parser '(the table))
;;;; -> ((NP (ART THE) (NOUN TABLE)))

(parser '(the ball hit the table))
;;;; -> ((SENTENCE (NP (ART THE) (NOUN BALL))
;;;;               (VP (VERB HIT)
;;;;                   (NP (ART THE) (NOUN TABLE)))))

(parser '(the noun took the verb))
;;;; -> ((SENTENCE (NP (ART THE) (NOUN NOUN))
;;;;               (VP (VERB TOOK)
;;;;                   (NP (ART THE) (NOUN VERB)))))


(use *grammar4*)
;;;; -> 54

(parser '(The man hit the table with the ball))
;;;; -> ((S (NP (D THE) (N MAN))
;;;;        (VP (VP (V HIT) (NP (D THE) (N TABLE)))
;;;;        (PP (P WITH) (NP (D THE) (N BALL)))))
;;;;     (S (NP (D THE) (N MAN))
;;;;        (VP (V HIT)
;;;;            (NP (NP (D THE) (N TABLE))
;;;;                (PP (P WITH) (NP (D THE) (N BALL)))))))

(parser '(the orange saw))
;;;; -> ((S (NP (D THE) (N ORANGE)) (VP (V SAW)))
;;;;     (NP (D THE) (A+ (A ORANGE)) (N SAW)))
This version can also recognize unknown words, by hypothesizing that they belong to an open-class pre-terminal syntactic category (N, V, A):
(parser '(John liked Mary))
;;;; -> ((S (NP (NAME JOHN))
;;;;        (VP (V LIKED) (NP (NAME MARY)))))

(parser '(Dana liked Dale))
;;;; -> ((S (NP (NAME DANA))
;;;;        (VP (V LIKED) (NP (NAME DALE)))))

(parser '(the rab zaggled the woogly quax))
;;;; -> ((S (NP (D THE) (N RAB))
;;;;        (VP (V ZAGGLED) (NP (D THE) (A+ (A WOOGLY)) (N QUAX)))))

(parser '(the slithy toves gymbled))
;;;; -> ((S (NP (D THE) (N SLITHY)) (VP (V TOVES) (NP (NAME GYMBLED))))
;;;;     (S (NP (D THE) (A+ (A SLITHY)) (N TOVES)) (VP (V GYMBLED)))
;;;;     (NP (D THE) (A+ (A SLITHY) (A+ (A TOVES))) (N GYMBLED)))

(parser '(the slithy toves gymbled on the wabe))
;;;; -> ((S (NP (D THE) (N SLITHY))
;;;;        (VP (VP (V TOVES) (NP (NAME GYMBLED)))
;;;;              (PP (P ON) (NP (D THE) (N WABE)))))
;;;;     (S (NP (D THE) (N SLITHY))
;;;;        (VP (V TOVES) (NP (NP (NAME GYMBLED))
;;;;                          (PP (P ON) (NP (D THE) (N WABE))))))
;;;;     (S (NP (D THE) (A+ (A SLITHY)) (N TOVES))
;;;;        (VP (VP (V GYMBLED)) (PP (P ON) (NP (D THE) (N WABE)))))
;;;;     (NP (NP (D THE) (A+ (A SLITHY) (A+ (A TOVES))) (N GYMBLED))
;;;;         (PP (P ON) (NP (D THE) (N WABE)))))
If the recognizer was using morphological clues (e.g., a word ending in "ed" is most likely a verb), it would do a better job.

The only change to the parser to make it accept unknown words is to the following function:

;; Categories to consider for unknown words
;; open-class parts-of-speech
(define *open-categories* '(N V A Name))
  
;; All rules that can produce word as rhs
(define (lexical-rules word)
    (let ((rules-in-grammar (find-all word *grammar* rule-rhs eq?)))
        (if (null? rules-in-grammar)
            (map (lambda (cat) (make-rule cat word)) *open-categories*)
            rules-in-grammar)))


Memoization

The naive version of the bottom-up parser can be very slow because it does lots of redundant work. By using the general technique of memoization (also known as tabulation), the parser is made hundreds of times more efficient.

General purpose memoization in Scheme (same in Common Lisp).


Efficient Parsing

In the code for the parser, only the following lines are necessary to take advantage of memoization (Common Lisp example):
(memoize 'lexical-rules)
(memoize 'rules-starting-with)
(memoize 'parse :test #'eq)

(defun parser (words)
  "Return all complete parses of a list of words."
  (clear-memoize 'parse) ;*** Reinitialize the tables
  (mapcar #'parse-tree (complete-parses (parse words))))

Parsing with Semantics

Norvig's Common Lisp implementation of a bottom-up parser with semantics. (Same in Common Lisp). The important concept is to use compositional semantic analysis: the semantic interpretation of a node in the parse tree is obtained as a function of the semantic interpretation of its daughter constituents.

This is illustrated in the following simple grammar to handle commands given to a CD-player machine of the form "Play songs 1 to 5 without 3".

(define *grammar5*
  '((NP -> (NP CONJ NP) infix-funcall)
    (NP -> (N)          list)
    (NP -> (N P N)      infix-funcall)
    (N ->  (DIGIT)      identity)
    (P ->  to           integers)
    (CONJ -> and        union)
    (CONJ -> without    set-difference)
    (N -> 1 1) (N -> 2 2) (N -> 3 3) (N -> 4 4) (N -> 5 5)
    (N -> 6 6) (N -> 7 7) (N -> 8 8) (N -> 9 9) (N -> 0 0)))
Here the semantics of an NP constituent "1 to 5" is a list of integers. The semantic interpretation for complex NPs is obtained by applying the function attached to conjunctions to the semantic interpretations of the NPs they conjoin.
(use *grammar5*)
;;;; -> 17

(meanings '(1 to 5 without 3))
;;;; -> ((1 2 4 5))

(meanings '(1 to 4 and 7 to 9))
;;;; -> ((1 2 3 4 7 8 9))

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 4 5 6) (1 2 5 6))


(use *grammar6*)
;;;; -> 18

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 5 6))

(meanings '(1 and 3 to 7 and 9 without 5 and 6))
;;;; -> ((1 3 4 7 9))

(meanings '(1 and 3 to 7 and 9 without 5 and 2))
;;;; -> ((1 3 4 6 7 9 2))

(meanings '(1 9 8 to 2 0 1))
;;;; -> ((198 199 200 201))

(meanings '(1 2 3))
;;;; -> (123 (123))


Last modified Mar 18th, 2007