Natural Language Processing (201-2454101)
Class 9 - Spring 1999Yael Dahan Netzer

Parsing- Cont./ Unification

Going to a semantic representation: the CD Player example.

How to get to the meaning of the sentence, not just its parse tree?
Need to define a domain to talk about: CD Player commands:
Can play back selected songs based on their track number.
Has buttons "play" "to" "and" "without" and numbers.
Can compose sentences such as:
"Play 1 to 5 without 3"

Write a parser for the limited language.

Defining the needed objects:

Add semantics to rules and to parse trees.
Realize that trees are instances of rules: use the :include feature of defstruct.
;;;; Code from Paradigms of AI Programming
;;;; Copyright (c) 1991 Peter Norvig

(defstruct (rule (:type list)) lhs -> rhs sem)

(defstruct (tree (:type list) (:include rule) (:copier nil)
                 (:constructor new-tree (lhs sem rhs))))

- don't need a copier because already have copy-tree built-in.
- Use (new-tree a b c) instead of (make-tree :lhs a :sem b :rhs c).
  [remain consistent with previous version].

What is the semantics of an object?

Compositional semantics:

The semantics of a tree is computed from the semantics of its elements.
If the tree is built by applying rule R, and rule R has semantics S,
apply function S to the constituent of the tree to compute the semantics of the tree.
(NP -> (NP CONJ NP) infix-funcall)

"1 to 5 without 3":
   "1 to 5"  -> semantics (1 2 3 4 5)
   "without" -> semantics #'set-difference
   "3"       -> semantics (3)

Evaluate: (infix-funcall '(1 2 3 4 5) #'set-difference '(3))
and get the semantics of the new tree: (1 2 4 5).

Example of grammar:

(defparameter *grammar5*
  '((NP -> (NP CONJ NP) infix-funcall)
    (NP -> (N)          list)
    (NP -> (N P N)      infix-funcall)
    (N ->  (DIGIT)      identity)
    (P ->  to           integers)
    (CONJ -> and        union)
    (CONJ -> without    set-difference)
    (N -> 1 1) (N -> 2 2) (N -> 3 3) (N -> 4 4) (N -> 5 5)
    (N -> 6 6) (N -> 7 7) (N -> 8 8) (N -> 9 9) (N -> 0 0)))

(defun integers (start end)
  "A list of all the integers in the range [start...end] inclusive."
  (if (> start end) nil
      (cons start (integers (+ start 1) end))))

(defun infix-funcall (arg1 function arg2)
  "Apply the function to the two arguments"
  (funcall function arg1 arg2))
-Note distinction lexical rules and nonlexical rules with function semantics:
(CONJ -> and union) 
(NP -> (N P N) infix-funcall)
In lexical, the function is returned as value of semantics, in nonlexical the function is called on constituents of tree.

Supporting semantic handling in the parser:

- Only minor changes are necessary.
- Convention: semantic value nil indicates failure.
(defun parse (words)
  "Bottom-up parse, returning all parses of any prefix of words.
  This version has semantics."
  (unless (null words)
    (mapcan #'(lambda (rule)
                (extend-parse (rule-lhs rule) (rule-sem rule) ;***
                              (list (first words)) (rest words) nil))
            (lexical-rules (first words)))))

(defun extend-parse (lhs sem rhs rem needed) ;***
  "Look for the categories needed to complete the parse.
  This version has semantics."
  (if (null needed)
      ;; If nothing is needed, return this parse and upward extensions,
      ;; unless the semantics fails
      (let ((parse (make-parse :tree (new-tree lhs sem rhs) :rem rem)))
        (unless (null (apply-semantics (parse-tree parse))) ;***
          (cons parse
                (mapcan
                  #'(lambda (rule)
                      (extend-parse (rule-lhs rule) (rule-sem rule) ;***
                                    (list (parse-tree parse)) rem
                                    (rest (rule-rhs rule))))
                  (rules-starting-with lhs)))))
      ;; otherwise try to extend rightward
      (mapcan
        #'(lambda (p)
            (if (eq (parse-lhs p) (first needed))
                (extend-parse lhs sem (append1 rhs (parse-tree p)) ;***
                              (parse-rem p) (rest needed))))
        (parse rem))))

(defun apply-semantics (tree)
  "For terminal nodes, just fetch the semantics.
  Otherwise, apply the sem function to its constituents."
  (if (terminal-tree-p tree)
      (tree-sem tree)
      (setf (tree-sem tree)
            (apply (tree-sem tree)
                   (mapcar #'tree-sem (tree-rhs tree))))))

(defun terminal-tree-p (tree)
  "Does this tree have a single word on the rhs?"
  (and (length=1 (tree-rhs tree))
       (atom (first (tree-rhs tree)))))

(defun length=1 (x) 
  "Is x a list of length 1?"
  (and (consp x) (null (cdr x))))

Testing the parser

(defun meanings (words)
  "Return all possible meanings of a phrase.  Throw away the syntactic part."
  (remove-duplicates (mapcar #'tree-sem (parser words)) :test #'equal))

(meanings '(1 to 5 without 3))
;;;; -> ((1 2 4 5))

(meanings '(1 to 4 and 7 to 9))
;;;; -> ((1 2 3 4 7 8 9))

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 4 5 6) (1 2 5 6))

How to deal with ambiguity?
;;;; Grammars
(defparameter *grammar6*
  '((NP -> (NP CONJ NP) infix-funcall)
    (NP -> (N)          list)
    (NP -> (N P N)      infix-funcall)
    (N ->  (DIGIT)      identity)
    (N ->  (N DIGIT)    10*N+D)
    (P ->  to           integers)
    (CONJ -> and        union*)
    (CONJ -> without    set-diff)
    (DIGIT -> 1 1) (DIGIT -> 2 2) (DIGIT -> 3 3)
    (DIGIT -> 4 4) (DIGIT -> 5 5) (DIGIT -> 6 6)
    (DIGIT -> 7 7) (DIGIT -> 8 8) (DIGIT -> 9 9)
    (DIGIT -> 0 0)))

(defun union* (x y) (if (null (intersection x y)) (append x y)))
(defun set-diff (x y) (if (subsetp y x) (set-difference x y)))
(defun 10*N+D (N D) (+ (* 10 N) D))

- Testing the new grammar:
(use *grammar6*)
;;;; -> 18

(meanings '(1 to 6 without 3 and 4))
;;;; -> ((1 2 5 6))

(meanings '(1 and 3 to 7 and 9 without 5 and 6))
;;;; -> ((1 3 4 7 9))

(meanings '(1 and 3 to 7 and 9 without 5 and 2))
;;;; -> ((1 3 4 6 7 9 2))

(meanings '(1 9 8 to 2 0 1))
;;;; -> ((198 199 200 201))

(meanings '(1 2 3))
;;;; -> (123 (123))

- Ambiguous between list of a single number and the number itself - N
and NP - but does not hurt.

Semantic Interpretation with Preferences

First, we saw a way to limit ambiguities by using a more restrictive grammar.
Now, need to improve semantic interpreter to rank semantic interpretations.
Need to keep track of score of each tree and to indicate in rules how to compute score of a new tree:

(defstruct (rule (:type list) 
                 (:constructor rule (lhs -> rhs &optional sem score)))
  lhs -> rhs sem score)

(defstruct (tree (:type list) (:include rule) (:copier nil)
                 (:constructor new-tree (lhs sem score rhs))))
Added a new constructor rule to make sem and score optional and remain compatible with previous versions. Make sure the right constructor is being called in use:
(defun use (grammar)
  "Switch to a new grammar."
  (clear-memoize 'rules-starting-with)
  (clear-memoize 'lexical-rules)
  (length (setf *grammar* 
                (mapcar #'(lambda (r) (apply #'rule r))
                        grammar))))

Maintaining score information during parsing

(defun parse (words)
  "Bottom-up parse, returning all parses of any prefix of words.
  This version has semantics and preference scores."
  (unless (null words)
    (mapcan #'(lambda (rule)
                (extend-parse (rule-lhs rule) (rule-sem rule)
                              (rule-score rule) (list (first words)) ;***
                              (rest words) nil))
            (lexical-rules (first words)))))

(defun extend-parse (lhs sem score rhs rem needed) ;***
  "Look for the categories needed to complete the parse.
  This version has semantics and preference scores."
  (if (null needed)
      ;; If nothing is needed, return this parse and upward extensions,
      ;; unless the semantics fails
      (let ((parse (make-parse :tree (new-tree lhs sem score rhs) ;***
                               :rem rem)))
        (unless (null (apply-semantics (parse-tree parse)))
          (apply-scorer (parse-tree parse)) ;***
          (cons parse
                (mapcan
                  #'(lambda (rule)
                      (extend-parse
                        (rule-lhs rule) (rule-sem rule)
                        (rule-score rule) (list (parse-tree parse)) ;***
                        rem (rest (rule-rhs rule))))
                  (rules-starting-with lhs)))))
      ;; otherwise try to extend rightward
      (mapcan
        #'(lambda (p)
            (if (eq (parse-lhs p) (first needed))
                (extend-parse lhs sem score 
                              (append1 rhs (parse-tree p)) ;***
                              (parse-rem p) (rest needed))))
        (parse rem))))

Computing the score of trees in a compositional manner

Each Rule is viewed as a tree-building function.
Its score can either be a number (to add to the sum of the scores of the children of the tree) or a function (to apply on the children).

(defun apply-scorer (tree)
  "Compute the score for this tree."
  (let ((score (or (tree-score tree) 0)))
    (setf (tree-score tree)
          (if (terminal-tree-p tree)
              score
              ;; Add up the constituent's scores,
              ;; along with the tree's score
              (+ (sum (tree-rhs tree) #'tree-score-or-0)
`                 (if (numberp score)
                     score
                     (or (apply score (tree-rhs tree)) 0)))))))

(defun tree-score-or-0 (tree)
    (if (numberp (tree-score tree)) (tree-score tree) 0))

(defun sum (numbers &optional fn)
  "Sum the numbers, or sum (mapcar fn numbers)."
  (if fn
      (loop for x in numbers sum (funcall fn x))
      (loop for x in numbers sum x)))

A Grammar with Preference Rules

- Added coverage: post-modifiers ("shuffled" and "reversed") and "repeat".
- Allow for brackets to disambiguate input "[1 to 7] without [2 to 4]".
(defvar *grammar7*
  '((NP -> (NP CONJ NP) infix-funcall  infix-scorer)
    (NP -> (N P N)      infix-funcall  infix-scorer)
    (NP -> (N)          list)
    (NP -> ([ NP ])     arg2)
    (NP -> (NP ADJ)     rev-funcall    rev-scorer)
    (NP -> (NP OP N)    infix-funcall)
    (N  -> (D)          identity)
    (N  -> (N D)        10*N+D)
    (P  -> to           integers       prefer<)
    ([  -> [            [)
    (]  -> ]            ])
    (OP -> repeat       repeat)
    (CONJ -> and        append         prefer-disjoint)
    (CONJ -> without    set-difference prefer-subset)
    (ADJ -> reversed    reverse        inv-span)
    (ADJ -> shuffled    permute        prefer-not-singleton)
    (D -> 1 1) (D -> 2 2) (D -> 3 3) (D -> 4 4) (D -> 5 5)
    (D -> 6 6) (D -> 7 7) (D -> 8 8) (D -> 9 9) (D -> 0 0)))

The semantic functions

(defun arg2 (a1 a2 &rest a-n) (declare (ignore a1 a-n)) a2)
(defun rev-funcall (arg function) (funcall function arg))
(defun repeat (list n)
  "Append list n times."
  (if (= n 0)
      nil
      (append list (repeat list (- n 1)))))
(defun integers (start end)
  "A list of all the integers in the range [start...end] inclusive.
  This version allows start > end."
  (cond ((< start end) (cons start (integers (+ start 1) end)))
        ((> start end) (cons start (integers (- start 1) end)))
        (t (list start))))
(defun permute (bag)
  "Return a random permutation of the given input list."
  (if (null bag)
      nil
      (let ((e (random-elt bag)))
        (cons e (permute (remove e bag :count 1 :test #'eq)))))) 

The scoring functions

(defun prefer< (x y) (if (>= (sem x) (sem y)) -1))
(defun prefer-disjoint (x y) (if (intersection (sem x) (sem y)) -1))
(defun prefer-subset (x y)
  (+ (inv-span x) (if (subsetp (sem y) (sem x)) 0 -3)))
(defun prefer-not-singleton (x)
  (+ (inv-span x) (if (< (length (sem x)) 2) -4 0)))

(defun span-length (tree)
  "How many words are in tree?"
  (if (terminal-tree-p tree) 1
      (sum (tree-rhs tree) #'span-length)))
(defun inv-span (tree) (/ 1 (span-length tree)))
(defun sem (tree) (tree-sem tree))

- Composition scorer: call the right scorer when needed depending on syntax:

(defun infix-scorer (arg1 scorer arg2)
  (funcall (tree-score scorer) arg1 arg2))
(defun rev-scorer (arg scorer) (funcall (tree-score scorer) arg))

Reporting the results


(defun all-parses (words)
  (format t "~%Score  Semantics~25T~a" words)
  (format t "~%=====  =========~25T============================~%")
  (loop for tree in (sort (parser words) #'> :key #'tree-score)
    do (format t "~5,1f  ~9a~25T~a~%" (tree-score tree) (tree-sem tree)
               (bracketing tree)))
  (values))

(defun bracketing (tree)
  "Extract the terminals, bracketed with parens."
  (cond ((atom tree) tree)
        ((length=1 (tree-rhs tree))
         (bracketing (first (tree-rhs tree))))
        (t (mapcar #'bracketing (tree-rhs tree)))))

(use *grammar7*)
26

(all-parses '(1 to 6 without 3 and 4))
Score  Semantics         (1 TO 6 WITHOUT 3 AND 4)
=====  =========         ============================
  0.3  (1 2 5 6)         ((1 TO 6) WITHOUT (3 AND 4))
 -0.7  (1 2 4 5 6 4)     (((1 TO 6) WITHOUT 3) AND 4)

(all-parses '(1 and 3 to 7 and 9 without 5 and 6))
Score  Semantics         (1 AND 3 TO 7 AND 9 WITHOUT 5 AND 6)
=====  =========         ============================
  0.2  (1 3 4 7 9)       (1 AND (((3 TO 7) AND 9) WITHOUT (5 AND 6)))
  0.1  (1 3 4 7 9)       (((1 AND (3 TO 7)) AND 9) WITHOUT (5 AND 6))
  0.1  (1 3 4 7 9)       ((1 AND ((3 TO 7) AND 9)) WITHOUT (5 AND 6))
 -0.8  (1 3 4 6 7 9 6)   ((1 AND (((3 TO 7) AND 9) WITHOUT 5)) AND 6)
 -0.8  (1 3 4 6 7 9 6)   (1 AND ((((3 TO 7) AND 9) WITHOUT 5) AND 6))
 -0.9  (1 3 4 6 7 9 6)   ((((1 AND (3 TO 7)) AND 9) WITHOUT 5) AND 6)
 -0.9  (1 3 4 6 7 9 6)   (((1 AND ((3 TO 7) AND 9)) WITHOUT 5) AND 6)
 -2.0  (1 3 4 5 6 7 9)   ((1 AND (3 TO 7)) AND (9 WITHOUT (5 AND 6)))
 -2.0  (1 3 4 5 6 7 9)   (1 AND ((3 TO 7) AND (9 WITHOUT (5 AND 6))))
 -3.0  (1 3 4 5 6 7 9 6) (((1 AND (3 TO 7)) AND (9 WITHOUT 5)) AND 6)
 -3.0  (1 3 4 5 6 7 9 6) ((1 AND (3 TO 7)) AND ((9 WITHOUT 5) AND 6))
 -3.0  (1 3 4 5 6 7 9 6) ((1 AND ((3 TO 7) AND (9 WITHOUT 5))) AND 6)
 -3.0  (1 3 4 5 6 7 9 6) (1 AND (((3 TO 7) AND (9 WITHOUT 5)) AND 6))
 -3.0  (1 3 4 5 6 7 9 6) (1 AND ((3 TO 7) AND ((9 WITHOUT 5) AND 6)))

(all-parses '(1 and 3 to 7 and 9 without 5 and 2))
Score  Semantics         (1 AND 3 TO 7 AND 9 WITHOUT 5 AND 2)
=====  =========         ============================
  0.2  (1 3 4 6 7 9 2)   ((1 AND (((3 TO 7) AND 9) WITHOUT 5)) AND 2)
  0.2  (1 3 4 6 7 9 2)   (1 AND ((((3 TO 7) AND 9) WITHOUT 5) AND 2))
  0.1  (1 3 4 6 7 9 2)   ((((1 AND (3 TO 7)) AND 9) WITHOUT 5) AND 2)
  0.1  (1 3 4 6 7 9 2)   (((1 AND ((3 TO 7) AND 9)) WITHOUT 5) AND 2)
 -2.0  (1 3 4 5 6 7 9 2) (((1 AND (3 TO 7)) AND (9 WITHOUT 5)) AND 2)
 -2.0  (1 3 4 5 6 7 9 2) ((1 AND (3 TO 7)) AND ((9 WITHOUT 5) AND 2))
 -2.0  (1 3 4 5 6 7 9)   ((1 AND (3 TO 7)) AND (9 WITHOUT (5 AND 2)))
 -2.0  (1 3 4 5 6 7 9 2) ((1 AND ((3 TO 7) AND (9 WITHOUT 5))) AND 2)
 -2.0  (1 3 4 5 6 7 9 2) (1 AND (((3 TO 7) AND (9 WITHOUT 5)) AND 2))
 -2.0  (1 3 4 5 6 7 9 2) (1 AND ((3 TO 7) AND ((9 WITHOUT 5) AND 2)))
 -2.0  (1 3 4 5 6 7 9)   (1 AND ((3 TO 7) AND (9 WITHOUT (5 AND 2))))
 -2.8  (1 3 4 6 7 9)     (1 AND (((3 TO 7) AND 9) WITHOUT (5 AND 2)))
 -2.9  (1 3 4 6 7 9)     (((1 AND (3 TO 7)) AND 9) WITHOUT (5 AND 2))
 -2.9  (1 3 4 6 7 9)     ((1 AND ((3 TO 7) AND 9)) WITHOUT (5 AND 2))

Reporting the results intelligently

(defun meaning (words &optional (tie-breaker #'query-user))
  "Choose the single top-ranking meaning for the words."
  (let* ((trees (sort (parser words) #'> :key #'tree-score))
         (best-score (tree-score (first trees)))
         (best-trees (delete best-score trees
                             :key #'tree-score :test-not #'eql))
         (best-sems (delete-duplicates (mapcar #'tree-sem best-trees)
                                       :test #'equal)))
    (case (length best-sems)
      (0 (format t "~&Sorry, I didn't understand that.") nil)
      (1 (first best-sems))
      (t (funcall tie-breaker best-sems)))))

(defun query-user (choices &optional
                           (header-str "~&Please pick one:")
                           (footer-str "~&Your choice? "))
  "Ask user to make a choice."
  (format *query-io* header-str)
  (loop for choice in choices for i from 1 do
        (format *query-io* "~&~3d: ~a" i choice))
  (format *query-io* footer-str)
  (nth (- (read) 1) choices))

(meaning '(1 to 5 without 3 and 4))
(1 2 5)

(meaning '(1 to 5 without 3 and 6))
(1 2 4 5 6)

(meaning '(1 to 5 without 3 and 6 shuffled))
(6 4 1 2 5)

(meaning '([ 1 to 5 without [ 3 and 6 ] ] reversed))
(5 4 2 1)

(meaning '(1 to 5 to 9))
Sorry, I didn't understand that.
NIL

(meaning '(1 to 5 without 3 and 7 repeat 2))
Please pick one:
  1: (1 2 4 5 7 1 2 4 5 7)
  2: (1 2 4 5 7 7)
Your choice? 1
(1 2 4 5 7 1 2 4 5 7)

(all-parses '(1 to 5 without 3 and 7 repeat 2))
Score  Semantics              (1 TO 5 WITHOUT 3 AND 7 REPEAT 2)
=====  =========              ============================
  0.3  (1 2 4 5 7 1 2 4 5 7)  ((((1 TO 5) WITHOUT 3) AND 7) REPEAT 2)
  0.3  (1 2 4 5 7 7)          (((1 TO 5) WITHOUT 3) AND (7 REPEAT 2))
 -2.7  (1 2 4 5 1 2 4 5)      (((1 TO 5) WITHOUT (3 AND 7)) REPEAT 2)
 -2.7  (1 2 4 5)              ((1 TO 5) WITHOUT ((3 AND 7) REPEAT 2))
 -2.7  (1 2 4 5)              ((1 TO 5) WITHOUT (3 AND (7 REPEAT 2)))

Problem with CFGs.

More on that point:
  1. talked about equivalence classes: not very much so - for example:
    the dog runs
    the dogs run
    
  2. not all phrases are structured from adjacent constituents:
    Write everything down that she tells you
    
  3. CFG not intuitive: Missing the point of process phrase vs. subject phrase forces the distinction for v.similar classes like plural and singular nouns. Also, if this distinction is ok then should be also defined for determiners etc. In languages like Hebrew: more - adjective agrees also, definiteness, gender and number.
Linguistic Methods to solve this problem:
  1. transformational grammar
    - write everything that she tells you down  ->
    - write everything down that she tells you
    
  2. Phrase Structure based formalism.
Phrase Structure formalism: Account for all sentences by the same set of rules. Although number of equivalence classes is very large, the various subsets of the classes need to be distinguished only for some purposes: for other they can be treated as a single larger class.
Use: description structures: ((cat noun)
                              (number singular))
Description refers not only to the lists of attributes but also to the operation on them.

An Example: (from Unification, Martin Kay in Computational Linguistics and Formal Semantics, ed. M.Rosner, R.Johnson).

- meet a person you don't know how he looks like
- needs a description
- refer to some attributes according to the vast of people possibly around:
the more descriptions, the more distinctive the description is.

((height 5ft10inch)
 (beard yes)
 (suit ((color blue)
        (stripes yes))))

Structure: (attribute value) pairs.
Value can be: 1. an atom 2. a description

- describe me: that is - describe the suit.. embedded description.

A value (a v) is unique:
- unique suit - can be refered to as "the suit"

Defines a partial function on description:
1. a function: F(beard)=yes only
2. partial: F(x) for most x is not defined.

- scene of descriptions:
- suppose you were late and ask the bystander if saw anyone waiting, he
gives the description:
((age 45)
 (beard yes)
 (suit ((fabric tweed))))

- not helpful: now contradictions, only beard is common.
- if you then find out it was me, then you know a broader picture of my 
description:
((height 5ft10inch)
 (beard yes)
 (suit ((color blue)
        (stripes yes)
        (fabric tweed)))
 (age 45))

This description is a combination of the (a v) sets, eliminating repetitions.
This process is called unification.

In the case of a description like
((age 45)
 (height 6ft2ins))

Unification would fail.

To sum:

Grammar Rules

(cat s) ->  ((cat np)          ((cat vp)
	       (number ?n)         (number ?n)
               (person ?p))        (person ?p))

-FDs for "the dog runs" "the dogs run"
-add a notion of a name of an fd:
allow ((cat s)
        (subj ?subj)
        (pred ?pred)) ---> ?subj is the name of the np description, 
 			   ?pred of the vp description.

Back to the course home page

Last modified May 3, 1999 Yael Dahan Netzer