Natural Language Processing (201-2454101)
Class 19 - Spring 1999 - Yael Dahan Netzer

SURGE - Systemic Unification Realization Grammar of English

From Elhadad's class.

Syntactic Realisation - Motivation

Through the process of microplanning, a discourse plan is turned into a
text specification: a complete specification of the document to be generated.
A text specification is in turn composed of logical elements and phrase specifications.

Phrase specifications correspond to grammatical objects.
Varieties of phrase specifications:
Input for a syntactic realizer should be in the form of a lexicalised case frame.
This form includes both the full semantic content and the specific lexical
items to be used in the realization: the content has been decided, but not the
final form of the realization.

What should the input to a realizer look like? This is a big question.
Three realizers, three different levels of abstraction.

Steps on the way to surface realization:

Starting point - a skeletal proposition
John gives Mary the blue ball
The word give here is a realization of a relation or event.

give(g,m,b)
Here give is the symbol for a relation, which could be realized in various ways and in various languages.

Meaning specification
A relation such as give(g,m,b) does not provide all the needed information, such as:
is-a(ball,b) and color(b,blue).

The planner will decide which information from the knowledge base will be included in each phrase specification:
the meaning specification of the sentence.
The predicate-argument structure is referred to in more syntactic terms: a process and its participants.
A specification will further include an index and semantics
(extension and intension, reference and sense).
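
As a rough illustration of how a planner might pick the facts that go into a meaning specification, here is a minimal Python sketch. It is not part of SURGE or FUF. The function name `meaning_specification`, the `keep` parameter and the `weight` fact are assumptions made up for this example; only give(g,m,b), is-a(ball,b) and color(b,blue) come from the notes.

```python
# Toy knowledge base: each fact is a tuple (predicate, arg1, arg2, ...).
KB = [
    ("give", "g", "m", "b"),
    ("is-a", "ball", "b"),
    ("color", "b", "blue"),
    ("weight", "b", "light"),  # hypothetical extra fact, not in the notes
]


def meaning_specification(proposition, kb, keep):
    """Return the proposition plus the facts about its arguments
    whose predicate is listed in `keep` (the planner's choice)."""
    args = set(proposition[1:])
    extra = [
        fact for fact in kb
        if fact != proposition and fact[0] in keep and args & set(fact[1:])
    ]
    return [proposition] + extra


spec = meaning_specification(("give", "g", "m", "b"), KB, keep={"is-a", "color"})
for fact in spec:
    print(fact)
```

The planner here keeps the type and color of the ball and leaves out its weight, which mirrors the idea that only part of the knowledge base ends up in each phrase specification.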

A lexicalized case frame
A meaning representation after lexemes have been chosen for the semantic content.
Abstract Syntactic Structures
With these representations of phrases, the realizer now performs syntactic inference: it decides how each phrase is to be realized (active vs. passive, for example).
Change of focus - passive realization:
The blue ball was given to Mary by John
Mary was given a blue ball.
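
The choice between these realizations can be pictured as choosing which participant becomes the subject. The following Python sketch is only an illustration of that idea, not the way SURGE works. The `frame` layout, the role names `recipient` and `possessed`, and the function `realize` are assumptions for this example, and the verb forms are simply listed by hand instead of being computed by a morphology component.

```python
def realize(frame, subject="agent", mention_agent=True):
    """Build one sentence from a lexicalized frame, putting the
    participant named by `subject` in subject position."""
    agent = frame["agent"]
    recipient = frame["recipient"]
    thing = frame["possessed"]
    verb = frame["verb"]
    if subject == "agent":            # active voice
        words = [agent, verb["present"], recipient, thing]
    elif subject == "possessed":      # passive, focus on the object given
        words = [thing, "was", verb["participle"], "to", recipient]
    elif subject == "recipient":      # passive, focus on the recipient
        words = [recipient, "was", verb["participle"], thing]
    else:
        raise ValueError(f"unknown subject role: {subject}")
    if subject != "agent" and mention_agent:
        words += ["by", agent]        # optional by-phrase in the passive
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


frame = {
    "verb": {"present": "gives", "participle": "given"},
    "agent": "John",
    "recipient": "Mary",
    "possessed": "the blue ball",
}
print(realize(frame))
print(realize(frame, subject="possessed"))
print(realize(frame, subject="recipient", mention_agent=False))
```

The same frame yields all three sentences. Only the syntactic decisions change, which is the point of keeping the input at the level of participants and not of word order.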

Motivation - why NLG and not just string manipulations?
Consider the following problem: you are given two strings, "S1: John eats" and "S2: John sleeps" and you are asked to build a correct English sentence that combines the two events into one sentence expressing that S2 occurred after S1 in the past (something like: "S3: After eating, John slept").
Working directly on the strings, such transformations are very difficult to achieve. One needs to know the syntactic structure of the small sentences in order to combine them appropriately. The input to SURGE is a good level of representation for such manipulations. Instead of starting with the strings S1 and S2, we start with the Functional Descriptions (FDs) I1 and I2 describing S1 and S2 respectively.
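
To make the contrast with string manipulation concrete, here is a small Python sketch in which I1 and I2 are nested dictionaries standing in for FDs. It only shows that structured descriptions compose easily. The feature names `tense`, `circumstances`, `time`, `relation` and `event`, and the function `after`, are assumptions for this example and are not claimed to be the features SURGE actually uses.

```python
import copy

# Stand-ins for the FDs I1 ("John eats") and I2 ("John sleeps").
I1 = {
    "cat": "clause",
    "process": {"lex": "eat"},
    "participants": {"agent": {"cat": "proper", "lex": "John"}},
}
I2 = {
    "cat": "clause",
    "process": {"lex": "sleep"},
    "participants": {"agent": {"cat": "proper", "lex": "John"}},
}


def after(first, second):
    """Describe `second` as happening, in the past, after `first`.
    The inputs are copied, so I1 and I2 stay reusable."""
    combined = copy.deepcopy(second)
    combined["tense"] = "past"
    combined["circumstances"] = {
        "time": {"relation": "after", "event": copy.deepcopy(first)}
    }
    return combined


I3 = after(I1, I2)
print(I3["process"]["lex"])                                    # main event
print(I3["circumstances"]["time"]["event"]["process"]["lex"])  # earlier event
```

Nothing in `after` needs to know how "eat" or "sleep" are spelled or inflected. Those decisions are left to the realizer, which is what makes this level of input convenient.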
The second motivation for using a syntactic realization module like SURGE is to provide an interface between a lexical chooser and the grammar within a complete generation system. The responsibility of the realization module is to abstract away from the complexity of the syntax and to present a simple and compositional interface to the lexical chooser.
Our main goal in this tutorial is to learn how to write inputs for sentences in SURGE. Inputs to SURGE look like the following example:
(def-test give1
  "John does not often give it to Mary."
  ((cat clause)
   (adverb ((lex "often")))
   (polarity negative)
   (process ((type composite)
             (relation-type possessive)
             (lex "give")))
   (participants ((agent ((cat proper) (lex "John")))
                  (affected ((cat proper) (lex "Mary")))
                  (possessor {^ affected})
                  (possessed ((cat pronoun)))))))
The theory of grammar implemented in SURGE defines terms such as "clause", "process", "participants", "agent" and "possessor". The following notes highlight the main points of this theory.
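
In the example above, the value of possessor is the path {^ affected} and not a new description. As I read the FUF path notation, each ^ moves up one level from the place where the path is written, so the possessor and the affected participant are one and the same constituent (Mary). The Python sketch below mimics that reading on a nested dictionary. The function `resolve` is a hypothetical helper written for this example and is not part of FUF.

```python
# The participants of give1, written as a nested dictionary.
give1 = {
    "cat": "clause",
    "participants": {
        "agent": {"cat": "proper", "lex": "John"},
        "affected": {"cat": "proper", "lex": "Mary"},
        "possessor": ["^", "affected"],   # stands for {^ affected}
        "possessed": {"cat": "pronoun"},
    },
}


def resolve(fd, location, relative_path):
    """Follow a relative path written at `location` (a tuple of
    attribute names). '^' goes up one level; any other step goes down."""
    path = list(location)
    for step in relative_path:
        if step == "^":
            path.pop()
        else:
            path.append(step)
    node = fd
    for attribute in path:
        node = node[attribute]
    return tuple(path), node


target, value = resolve(give1, ("participants", "possessor"), ["^", "affected"])
print(target, value)
```

Under this reading, the path written at participants/possessor ends up at participants/affected, so whatever the grammar later adds to one role is automatically visible in the other.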

Role of a syntactic realization component

  1. Map thematic structure onto syntactic roles
  2. Control syntactic paraphrasing and alternations
  3. Prevent over-generation
  4. Provide defaults for syntactic features
  5. Propagate agreement features (down to morphology)
  6. Select closed-class words
  7. Provide linear-precedence constraints among syntactic constituents
  8. Inflect open-class words
  9. Linearize syntactic tree into a string of inflected words
  10. Perform syntactic inference
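
A few of these roles can be illustrated on a very small scale. The Python sketch below covers only defaults (4), agreement (5), inflection (8) and linearization (7 and 9) for a subject-verb clause. It is a toy under stated assumptions: the clause layout, `inflect_present` and `linearize` are made up for this example, and the morphology handles regular present-tense verbs only.

```python
def inflect_present(verb, person="third", number="singular"):
    """Toy morphology: regular present-tense verbs only."""
    if person == "third" and number == "singular":
        return verb + "s"
    return verb


def linearize(clause):
    subject = clause["subject"]
    # Role 4: provide defaults for features the input left out.
    person = subject.get("person", "third")
    number = subject.get("number", "singular")
    # Roles 5 and 8: the subject's features reach the verb, which is inflected.
    verb = inflect_present(clause["verb"], person, number)
    # Roles 7 and 9: the subject precedes the verb; join into a string.
    sentence = f"{subject['lex']} {verb}"
    return sentence[0].upper() + sentence[1:] + "."


print(linearize({"subject": {"lex": "John"}, "verb": "sleep"}))
print(linearize({"subject": {"lex": "the children", "number": "plural"},
                 "verb": "sleep"}))
```

The input never mentions the form "sleeps". It follows from the subject's (defaulted) features, which is the kind of detail a realization component is meant to hide from the rest of the generator.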

Evaluation Criteria for Syntactic Realization Components

Desired features:

Structure of the grammar

SURGE supports the following parts of speech:
  1. Clause (cat clause)
  2. Nominal group (cat np)
  3. Verb group (cat verb-group)
  4. Adjectival group (cat ap)
  5. Prepositional phrase (cat pp)
  6. Adverbs (cat adv)
At the top level, the grammar is an alternation of subgrammars (one for each part of speech). Each subgrammar is in turn subdivided into a set of "systems", which are the main decision points of the subgrammar. Every constituent in a SURGE input must have a well-specified cat feature, or one that can be inferred from its position within a higher-level constituent. For example, in the following FD:
(def-test t1 
  "This car is expensive."
  ((cat clause)
   (process ((type ascriptive)))
   (participants ((carrier ((lex "car")
                            (cat common)
                            (distance near)))
                  (attribute ((lex "expensive")))))))
The cat of the participant attribute is not specified because, by default, attributes are adjectival phrases, so it can be inferred from its position.
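
The idea of inferring a category from position can be sketched as a lookup keyed by where the constituent sits. This Python fragment is an illustration only. The table `DEFAULT_CAT` and the function `infer_cat` are hypothetical, and the table holds just the one default stated in these notes (attribute defaults to ap).

```python
# Position -> default category. Only this default is stated in the notes.
DEFAULT_CAT = {("participants", "attribute"): "ap"}


def infer_cat(constituent, position):
    """Return the explicit cat if there is one, else the default
    for this position, else fail."""
    if "cat" in constituent:
        return constituent["cat"]
    if position in DEFAULT_CAT:
        return DEFAULT_CAT[position]
    raise ValueError(f"no cat given and no default for position {position}")


carrier = {"lex": "car", "cat": "common", "distance": "near"}
attribute = {"lex": "expensive"}

print(infer_cat(carrier, ("participants", "carrier")))      # explicit cat
print(infer_cat(attribute, ("participants", "attribute")))  # inferred cat
```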

Syntactic Theories

SURGE is a syntactic realizer that is, in spirit, mainly a combination of the following two theories, but it also draws on other sources, such as HPSG and descriptive grammars (such as Quirk et al.).

Systemic Grammar

Systemic-Functional linguistics - language as a resource for expressing meaning in context (Halliday).
Sentences are collections of functions - the grammar is a set of rules for mapping these functions onto explicit grammatical forms.
Good approach for generation. A sentence can be viewed in several layers:
                John      will      eat          the apple
  Mood          subject   finite    predicator   object
  Transitivity            process                goal
  Theme         theme     rheme
A systemic grammar is structured as a directed acyclic and/or graph, called a system network. The grammar for the clause category is the most complex. It consists of four main systems:
  1. Transitivity system: determines the type of the main process and its participants.
  2. Mood system: determines whether the clause is finite (declarative, interrogative or relative) or non-finite (imperative, infinitive, participial).
  3. Voice system: active, passive, causative etc.
  4. Circumstantials: determines the structure of modifiers to the predicate and to the clause as a whole.
According to systemic theory, a clause can be viewed as realizing several layers of meaning in a single linguistic constituent. The most important way to classify these layers of meaning is by referring to the three meta-functions that language satisfies: ideational, interpersonal and textual. Each function of the clause belongs to one of these three meta-functions. For example, the transitivity system belongs to the ideational meta-function, mood to the interpersonal, and voice to the textual. This explains why each function can be studied independently of the others, as each system is largely orthogonal to the others.
Eventually, though, all the decisions taken on the clause must be combined into one coherent linguistic structure. This is where unification plays a crucial role: it allows the grammar writer to combine decisions from several orthogonal systems in a natural way, through the values shared by a set of attributes. One way to view the decision process inside the grammar is that each system posts constraints on the values of a set of attributes, and the unification mechanism finds a combined set of values that satisfies all these constraints at once.
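
The way unification combines constraints posted by separate systems can be shown with a minimal Python sketch. This is a simplified textbook version of unification on nested dictionaries, not FUF's implementation: it has no paths, no disjunctions and no special values. The feature names and the three sample constraint sets are assumptions for this example.

```python
class UnificationError(Exception):
    """Raised when two descriptions carry incompatible values."""


def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).
    Returns a new structure and leaves the inputs unchanged."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            result[key] = unify(result[key], value) if key in result else value
        return result
    if a == b:
        return a
    raise UnificationError(f"cannot unify {a!r} with {b!r}")


# Constraints posted by three systems (hypothetical feature names).
transitivity = {"process": {"type": "material", "lex": "eat"},
                "subject": {"role": "agent"}}
mood = {"mood": "declarative", "subject": {"case": "nominative"}}
voice = {"voice": "active", "subject": {"role": "agent"}}

clause = unify(unify(transitivity, mood), voice)
print(clause)

try:
    unify(clause, {"subject": {"role": "affected"}})
except UnificationError as err:
    print("conflict:", err)
```

Each system only states what it cares about. The subject ends up with both a role and a case because the three descriptions share the subject attribute, and an incompatible constraint makes the whole combination fail.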

The transitivity system

The transitivity system determines what participants contribute to the meaning of the clause, when the clause is viewed as a description of an event or relation in the world. At its heart, the clause is the description of a process, a generic term that can refer to either an event or a relation (and has no relation to the aspect of the clause, as in process vs. event vs. state). Participants surface as linguistic constituents that satisfy a set of linguistic criteria. NOTE: Each one of these criteria taken alone is not sufficient to characterize participants, but taken together they have proven quite reliable.
Semantically, participants correspond to the nuclear roles of the process. In knowledge representation terms, a process is a relation among terms. The participants are the terms that fill the basic arity of the relation. Additional terms can then be added compositionally to modify the meaning of the process or of the predicate (these correspond to sentence and predicate adjuncts).

Run FUF:

From Emacs:
M-x load-system 
fug5
From CL:
(setenv "fug5" "~elhadad/fuf/fuf53")
(load "$fug5/fug53.l")
:pa fug5
Load grammar:
(load "~yaeln/surge23/code/gr.l")
(load "~yaeln/surge23/code/linearize2.l")
Files of examples:
~yaeln/surge23/inputs/ir.l
~yaeln/surge23/inputs/code.l

You can load them and use them as a tutorial for writing inputs.

Run a test:
(test :item 'test-name)
Run all defined tests:
(test :item *ordered-tests*)
Clear tests:
(clear-tests)

Look in the FUF manual for more help.



Last modified June 12, 1999 Yael Dahan Netzer