Natural Language Generation (201-2454101)
Notes Week 3 - Spring 1997 - Michael Elhadad
previous class main page next class

Parsing - Compositional Semantics


Bottom-up Parsing

Norvig's Common Lisp implementation of a bottom-up parser. This parser uses the following ADTs for rules, partial parses (a tree covering a prefix of a list of words) and trees:
(defstruct (rule (:type list)) lhs -> rhs)

(defstruct (parse) "A parse tree and a remainder." tree rem)

;; Trees are of the form: (lhs . rhs)
(defun new-tree (cat rhs) (cons cat rhs))
(defun tree-lhs (tree) (first tree))
(defun tree-rhs (tree) (rest tree))
The code computes all possible parses that are acceptable by the grammar. The following examples illustrate its performance:
(use *grammar3*)
;;;; -> 15

(parser '(the table))
;;;; -> ((NP (ART THE) (NOUN TABLE)))

(parser '(the ball hit the table))
;;;; -> ((SENTENCE (NP (ART THE) (NOUN BALL))
;;;;               (VP (VERB HIT)
;;;;                   (NP (ART THE) (NOUN TABLE)))))

(parser '(the noun took the verb))
;;;; -> ((SENTENCE (NP (ART THE) (NOUN NOUN))
;;;;               (VP (VERB TOOK)
;;;;                   (NP (ART THE) (NOUN VERB)))))


(use *grammar4*)
;;;; -> 54

(parser '(The man hit the table with the ball))
;;;; -> ((S (NP (D THE) (N MAN))
;;;;        (VP (VP (V HIT) (NP (D THE) (N TABLE)))
;;;;        (PP (P WITH) (NP (D THE) (N BALL)))))
;;;;     (S (NP (D THE) (N MAN))
;;;;        (VP (V HIT)
;;;;            (NP (NP (D THE) (N TABLE))
;;;;                (PP (P WITH) (NP (D THE) (N BALL)))))))

(parser '(the orange saw))
;;;; -> ((S (NP (D THE) (N ORANGE)) (VP (V SAW)))
;;;;     (NP (D THE) (A+ (A ORANGE)) (N SAW)))
This version can also recognize unknown words, by hypothesizing that they belong to an open-class pre-terminal syntactic category (N, V, A):
(parser '(John liked Mary))
;;;; -> ((S (NP (NAME JOHN))
;;;;        (VP (V LIKED) (NP (NAME MARY)))))

(parser '(Dana liked Dale))
;;;; -> ((S (NP (NAME DANA))
;;;;        (VP (V LIKED) (NP (NAME DALE)))))

(parser '(the rab zaggled the woogly quax))
;;;; -> ((S (NP (D THE) (N RAB))
;;;;        (VP (V ZAGGLED) (NP (D THE) (A+ (A WOOGLY)) (N QUAX)))))

(parser '(the slithy toves gymbled))
;;;; -> ((S (NP (D THE) (N SLITHY)) (VP (V TOVES) (NP (NAME GYMBLED))))
;;;;     (S (NP (D THE) (A+ (A SLITHY)) (N TOVES)) (VP (V GYMBLED)))
;;;;     (NP (D THE) (A+ (A SLITHY) (A+ (A TOVES))) (N GYMBLED)))

(parser '(the slithy toves gymbled on the wabe))
;;;; -> ((S (NP (D THE) (N SLITHY))
;;;;        (VP (VP (V TOVES) (NP (NAME GYMBLED)))
;;;;              (PP (P ON) (NP (D THE) (N WABE)))))
;;;;     (S (NP (D THE) (N SLITHY))
;;;;        (VP (V TOVES) (NP (NP (NAME GYMBLED))
;;;;                          (PP (P ON) (NP (D THE) (N WABE))))))
;;;;     (S (NP (D THE) (A+ (A SLITHY)) (N TOVES))
;;;;        (VP (VP (V GYMBLED)) (PP (P ON) (NP (D THE) (N WABE)))))
;;;;     (NP (NP (D THE) (A+ (A SLITHY) (A+ (A TOVES))) (N GYMBLED))
;;;;         (PP (P ON) (NP (D THE) (N WABE)))))
If the recognizer was using morphological clues (e.g., a word ending in "ed" is most likely a verb), it would do a better job.

The only change to the parser to make it accept unknown words is to the following function:

(defparameter *open-categories* '(N V A Name)
  "Categories to consider for unknown words")

(defun lexical-rules (word)
  "Return a list of rules with word on the right hand side."
  (or (find-all word *grammar* :key #'rule-rhs :test #'equal)
      (mapcar #'(lambda (cat) `(,cat -> ,word)) *open-categories*)))

Memoization

The naive version of the bottom-up parser can be very slow because it does lots of redundant work. By using the general technique of memoization (also known as tabulation), the parser is made hundreds of times more efficient.

General purpose memoization in Common Lisp.


Efficient Parsing

In the code for the parser, only the following lines are necessary to take advantage of memoization:
(memoize 'lexical-rules)
(memoize 'rules-starting-with)
(memoize 'parse :test #'eq)

(defun parser (words)
  "Return all complete parses of a list of words."
  (clear-memoize 'parse) ;*** Reinitialize the tables
  (mapcar #'parse-tree (complete-parses (parse words))))

Parsing with Semantics

Norvig's Common Lisp implementation of a bottom-up parser with semantics. The important concept is to use compositional semantic analysis: the semantic interpretation of a node in the parse tree is obtained as a function of the semantic interpretation of its daughter constituents.

This is illustrated in the following simple grammar to handle commands given to a CD-player machine of the form "Play songs 1 to 5 without 3".

(defparameter *grammar5*
  '((NP -> (NP CONJ NP) infix-funcall)
    (NP -> (N)          list)
    (NP -> (N P N)      infix-funcall)
    (N ->  (DIGIT)      identity)
    (P ->  to           integers)
    (CONJ -> and        union)
    (CONJ -> without    set-difference)
    (N -> 1 1) (N -> 2 2) (N -> 3 3) (N -> 4 4) (N -> 5 5)
    (N -> 6 6) (N -> 7 7) (N -> 8 8) (N -> 9 9) (N -> 0 0)))
Here the semantics of an NP constituent "1 to 5" is a list of integers. The semantic interpretation for complex NPs is obtained by applying the function attached to conjunctions to the semantic interpretations of the NPs they conjoin.
(use *grammar5*)
;;;; -> 17

(meanings '(1 to 5 without 3))
;;;; -> ((1 2 4 5))

(meanings '(1 to 4 and 7 to 9))
;;;; -> ((1 2 3 4 7 8 9))

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 4 5 6) (1 2 5 6))


(use *grammar6*)
;;;; -> 18

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 5 6))

(meanings '(1 and 3 to 7 and 9 without 5 and 6))
;;;; -> ((1 3 4 7 9))

(meanings '(1 and 3 to 7 and 9 without 5 and 2))
;;;; -> ((1 3 4 6 7 9 2))

(meanings '(1 9 8 to 2 0 1))
;;;; -> ((198 199 200 201))

(meanings '(1 2 3))
;;;; -> (123 (123))


Last modified March 19th, 1997 Michael Elhadad