Natural Language Processing - Class 2

Examples for linguistic knowledge, in various levels:

(example are from http://www.cs.columbia.edu/~kathy)

  • Example 1: Morphology
    I waited for Mary to come to class. (wait + ed, past, one time)
    I wait for Mary to come to class. (present, habitual)
  • Example 2: Syntactic Knowledge -- How words can be put together to form sentences
    I wait for Mary.
    Mary waits for me.
    Does Mary wait for me?
    The woman in blue waits for me.
  • Example 3: Semantic Knowledge -- Word meaning
    I gazed at the star through my new telescope.
    I gazed at the star making her way through the crowd of reporters.

  • Example 4: Pragmatic Knowledge -- How sentences are used in different contexts
    It was a clear night and the moon had not yet risen. I gazed at the stars intently.
    I had never gotten so close to a celebrity before. I had found a place near the Academy Awards entrance. I gazed at the stars intently.

  • Example 5: Conflict between semantics and pragmatics:
    The astronomer married the star.

  • Introduction to LISP

    (This class follows Norvig's Artificial Intelligence programming book, chaps 1,2).
    Files with source: generate1.l (procedural version) and generate2.l (rule-based version).
    1. Parenthesized prefix notation: (+ 2 4)
    2. Expressions can be nested: (- (+ 2 3 4 5) (* 2 5))
    3. Evaluation rule: apply function to value of arguments.
    4. Symbolic computation: other types
      Primitive: numbers, characters, stringsa, symbols.
      Compound: lists
      (append '(a b) '(c d)) --> (a b c d)
      Quote: block the evaluation of the following expression. '(a b) --> (a b)
      (a b) --> error: a undefined function.
    5. Self-evaluating types: numbers, characters, strings.
    6. Symbols as variables - not case sensitive.

    Variables:

    1. Define new objects in terms of others + name them.
    2. (setf p '(a b)) --> (a b) p --> (a b)
    3. Symbols also used to name functions.

    Special forms:

    1. Forms not evaluated according to the evaluation rule.
    2. Special forms: defun, defparameter, setf, let, case, if, function, quote.

    Lists:

    1. Primitive aggregating type.
    2. Primitive functions: first, second, ..., length, last. append, cons, list.

    Defining new functions:

    1. (defun  () "doc" )
         (defun last-name (name)
           "Select last name from a name represented as a list."
           (first (last name)))
      
         (last-name '(john r vandamme)) --> vandamme
         (last-name '(john quirk MD))   --> MD
      
    2. Importance of abstraction. (defun first-name (name) (first name))

    Using functions:

    1. (setf names '((john x ford) (peter p rabbit) (fabianna f wayne)))
         (mapcar #'last-name names) --> (ford rabbit wayne)
      #' from name of function to function object. mapcar primitive.
    2.   (defparameter *titles* '(Mr Mrs Miss Madam Ms Sir Dr Admiral Major General))
         (defun first-name (name)
           "Select first name from a list representing a name."
           (if (member (first name) *titles*)
               (first-name (rest name))
               (first name)))
          
         (if   )
      
         (first-name '(Madam Major General Paula Jones)) --> Paula
      
    3. Trace functions

    Higher order functions

    1. Functions as first-class objects: can be manipulated, created, modified by running code.
    2. Apply: (apply #'+ '(1 2 3 4)) --> 10
    3. Funcall:
      (funcall #'+ 1 2) --> 3
                  (funcall #'+ '(1 2)) --> error (1 2) is not a number.
    4. Function constructor: lambda.
         (lambda (parameter ...) body...): non atomic name of a function.
         ((lambda (x) (+ x 2)) 4) --> 6
         (funcall #'(lambda (x) (* x 2)) 4) --> 8
      

    Other data types

    1. Strings: (length "abc") --> 3
    2. Many number types.

    Lisp evaluation rule

    1. Every expression is either a list or an atom.
    2. Every list to be evaluated is either a special form or a function application.
    3. A special form expression is a list whose first element is a special form operator and is evaluated according to the special rule of that operator.
    4. A function application is evaluated by first evaluating the arguments (the rest of the list) and then finding the function named by the first element of the list and applying it to the list of evaluated arguments.
    5. Every atom is either a symbol or a non-symbol.
    6. A symbol evaluates to the most recent value assigned to the variable.
    7. A non-symbol atom evaluates to itself.

    What makes lisp different

    1. Built-in support for Lists.
      Easy to use lists. Very versatile. Other aggregates available (arrays, hash-tables, structures).
    2. Automatic memory management.
      Garbage collection: In Pascal or C, variables allocated on the heap or on the stack. If on the heap, must be freed. If on the stack, must be simple types.
      /* Pascal */				;;; Lisp
      a * (b + c)				(* a (+ b c))
      
      With matrices:
      var temp, result: matrix;		(mult a (add b c))
      add(b, c, temp);
      mult(a, temp, result);
      return(result);
      
      Write Pascal functions which allocate new matrices on the heap: mult(a, add(b, c)) but then cannot free the new matrices!
      var a,b,c,x,y: matrix;
      x:=add(b,c);
      y:=mult(a,x);
      free(x);
      return(y);
      
    3. Dynamic typing -- polymorphic functions. Easier development but also danger of having errors undetected.
    4. First-class functions - can create functions at run time, pass them as arguments etc.
    5. Uniform syntax - simple syntax: easy to write programs that manipulate other programs to define new languages. Easy for text editors to parse Lisp. Same syntax for data and programs.
    6. Extensibility - macros - first to integrate new developments since 1958 (structured programming, rule-based, object-oriented).

    Problems:

    1. Write a function (power 3 2) = 3^2 = 9
    2. Write a function that counts the number of atoms in an expression. (count-atoms '(a (b) c)) --> 3
    3. (count-anywhere 'a '(a ((a) b) a)) --> 3
    4. (dot-product '(10 20) '(3 4)) --> 10x3 + 20x4 = 110
    5. Write a function (flatten '(a (b) () ((c)))) --> (a b c) which removes all levels of parenthesis and returns a flat list of atoms.
    6. Write a function (remove-dups '(a 1 1 a b 2 b)) --> (a 1 b 2) which removes all duplicate atoms from a flat list. (Note: there is a built-in remove-duplicates in Common Lisp, do not use it).

    First lisp program: random sentence generation

    First case study of a very simple program. Main point: design a little-language to solve your problem and write your program as an interpreter of your little-language. Example: write a little-language to describe grammar rules. Objective: generate random English sentences which are well-formed according to a mini-grammar of English. First tiny grammar (CFG):
    Sentence --> Noun-Phrase + Verb-Phrase
    Noun-Phrase --> Article + Noun
    Verb-Phrase --> Verb + Noun-Phrase
    Article --> the, a ...
    Noun --> man, ball, woman, table...
    Verb --> hit, took, saw, liked...
    

    First solution: function for each syntactic category.

    (defun sentence () (append (noun-phrase) (verb-phrase)))
    (defun noun-phrase () (append (article) (noun)))
    (defun verb-phrase () (append (verb) (noun-phrase)))
    (defun article () (one-of '(the a)))
    (defun noun () (one-of '(man ball woman table)))
    (defun verb () (one-of '(hit took saw liked)))
    
    (defun one-of (set)
      "Pick one element of set and make a list of it."
      (list (random-elt set)))
    
    (defun random-elt (choices)
      "Choose an elt from a list at random"
      (elt choices (random (length choices))))
    
    ;; (elt list index) index in [0..l-1]
    ;; (random n) --> integer between 0 and n-1
    
    > (sentence) --> (the woman hit the ball)
    > (sentence) --> (the woman hit the man)
    > (sentence) --> (the ball saw the woman)
    > (noun-phrase) --> (the man)
    > (verb-phrase) --> (liked the woman)
    
    

    A more complex grammar.

    Noun-phrase --> Article + Adj* + Noun + PP*
    Adj* --> 0, Adj + Adj*
    PP* --> 0, PP + PP*
    PP --> prep + Noun-phrase
    Adj --> big, little, blue, green
    Prep --> to,in,by,with,...
    
    (defun adj* ()
      (if (= (random 2) 0)
          nil
          (append (adj) (adj*))))
    
    (defun pp* ()
      (if (random-elt '(t nil))
          (append (pp) (pp*))
          nil))
    
    (defun noun-phrase () (append (article) (adj*) (noun) (pp*)))
    (defun pp () (append (prep) (noun-phrase)))
    (defun adj () (one-of '(big little blue green)))
    (defun prep () (one-of '(to in by with on)))
    
    ;; BAD DEFINITIONS OF ADJ*
    (defun adj* ()
      (one-of '(nil (append (adj) (adj*)))))
    (defun adj* ()
      (one-of (list nil (append (adj) (adj*)))))
    
    

    A rule based solution

    Express knowledge declaratively, and write an interpreter for this knowledge.
    ;;; (NORVIG Chap 2)
    ;;; ==============================
    
    (defparameter *simple-grammar*
      '((sentence -> (noun-phrase verb-phrase))
        (noun-phrase -> (Article Noun))
        (verb-phrase -> (Verb noun-phrase))
        (Article -> the a)
        (Noun -> man ball woman table)
        (Verb -> hit took saw liked))
      "A grammar for a trivial subset of English.")
    
    (defvar *grammar* *simple-grammar*
      "The grammar used by generate.  Initially, this is
      *simple-grammar*, but we can switch to other grammers.")
    
    ;;; ==============================
    
    ;;; Define how to manipulate this data
    ;;; Accessors for the Knowledge base.
    
    (defun rule-lhs (rule)
      "The left hand side of a rule."
      (first rule))
    
    (defun rule-rhs (rule)
      "The right hand side of a rule."
      (rest (rest rule)))
    
    (defun rewrites (category)
      "Return a list of the possible rewrites for this category."
      (rule-rhs (assoc category *grammar*)))
    
    
    ;;; ==============================
    ;;; Use the knowledge base to solve the problem:
    ;;; 
    
    ;;; That's all it takes!
    (defun generate (phrase)
      "Generate a random sentence or phrase"
      (cond ((listp phrase)
             (mappend #'generate phrase))
            ((rewrites phrase)
             (generate (random-elt (rewrites phrase))))
            (t (list phrase))))
    
    
    ;; A function to call fn on each element of the-list and return the append
    ;; of the results (note that mapcar returns the list of the results).
    (defun mappend (fn the-list)
      "Apply fn to each element of list and append the results."
      (if (null the-list)
          nil
          (append (funcall fn (first the-list))
                  (mappend fn (rest the-list)))))
    
    
    ;;; ==============================
    ;;; Reuse the same code on a different knowledge base.
    
    (defparameter *bigger-grammar*
      '((sentence -> (noun-phrase verb-phrase))
        (noun-phrase -> (Article Adj* Noun PP*) (Name) (Pronoun))
        (verb-phrase -> (Verb noun-phrase PP*))
        (PP* -> () (PP PP*))
        (Adj* -> () (Adj Adj*))
        (PP -> (Prep noun-phrase))
        (Prep -> to in by with on)
        (Adj -> big little blue green adiabatic)
        (Article -> the a)
        (Name -> Pat Kim Lee Terry Robin)
        (Noun -> man ball woman table)
        (Verb -> hit took saw liked)
        (Pronoun -> he she it these those that)))
    
    (setf *grammar* *bigger-grammar*)
    
    
    
    ;;; ==============================
    ;;; Reuse the same knowledge base to do something slightly different.
    ;;; In this case, return the parse tree of the generated sentence instead
    ;;; of just the sentence.
    
    (defun generate-tree (phrase)
      "Generate a random sentence or phrase,
      with a complete parse tree."
      (cond ((listp phrase)
             (mapcar #'generate-tree phrase))
            ((rewrites phrase)
             (cons phrase
                   (generate-tree (random-elt (rewrites phrase)))))
            (t (list phrase))))
    
    
    ;;; ==============================
    ;;; Generate the whole language described by a grammar.
    ;;; Yet another use of the same knowledge base.
    
    (defun generate-all (phrase)
      "Generate a list of all possible expansions of this phrase."
      (cond ((null phrase) (list nil))
            ((listp phrase)
             (combine-all (generate-all (first phrase))
                          (generate-all (rest phrase))))
            ((rewrites phrase)
             (mappend #'generate-all (rewrites phrase)))
            (t (list (list phrase)))))
    
    (defun combine-all (xlist ylist)
      "Return a list of lists formed by appending a y to an x.
      E.g., (combine-all '((a) (b)) '((1) (2)))
      -> ((A 1) (B 1) (A 2) (B 2))."
      (mappend #'(lambda (y)
                   (mapcar #'(lambda (x) (append x y)) xlist))
               ylist))
    
    
    

    For any question, contact me: yaeln@cs.bgu.ac.il
    Back to course homepage

    Last modified February 24, 2000