
      
Control of the Ordering: the PATTERN Keyword

As mentioned previously, the generation of a sentence includes two subprocesses: unification and linearization. Unification produces a complex description of a sentence, made of several constituents. Each constituent is described by an FD, and can recursively contain other subconstituents.

Linearization takes such a complex, unordered description and outputs a linear, ordered string of words. This operation is constrained by directives placed within the FD. These ordering constraints appear as the value of the special attribute pattern.

For example, in a sentence containing the constituents prot, goal and verb, the following pattern can be used:

          
(PATTERN (PROT VERB GOAL))
          
          
This means that the linearizer should output a string made of the linearization of the constituent prot first, followed by the linearization of the constituent verb, and ending with the linearization of the constituent goal. It also means that nothing can come before prot or after goal, and nothing can come between two adjacent constituents.

The constituents correspond to features of the FD describing the sentence. That is, this FD must contain pairs with the attributes prot, verb and goal. For example:

          
((cat S)
 (PROT (...))
 (GOAL (...))
 (VERB (...))
 (PATTERN (PROT VERB GOAL)))
          
          

If a constituent mentioned in the pattern is not present in the FD, nothing happens: the linearization of an empty (or nonexistent) constituent is the empty string.
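The behavior described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the linearizer itself: an FD is modeled as a plain dict, leaf constituents are plain strings, and the function name linearize is hypothetical.

```python
# Hypothetical model: an FD is a dict of attribute -> value, and a leaf
# constituent is a plain string. Real FDs are richer feature structures.

def linearize(fd):
    """Return the words of fd in the order given by its 'pattern' attribute."""
    if isinstance(fd, str):              # a leaf is already a string of words
        return fd
    parts = []
    for name in fd.get("pattern", ()):
        sub = fd.get(name)               # None when the constituent is absent
        if sub is None:
            continue                     # absent constituent -> empty string
        text = linearize(sub)            # constituents can nest recursively
        if text:
            parts.append(text)
    return " ".join(parts)

sentence = {
    "cat": "s",
    "prot": "John",
    "verb": "eats",
    "goal": {"cat": "np", "det": "an", "head": "apple",
             "pattern": ("det", "head")},
    "pattern": ("prot", "verb", "goal"),
}

# Same FD without the goal constituent: the pattern still mentions it.
no_goal = {k: v for k, v in sentence.items() if k != "goal"}

print(linearize(sentence))
print(linearize(no_goal))
```

The second call shows that a pattern may mention a constituent the FD does not contain; that constituent simply contributes nothing to the output.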

The pattern directives are generally added by the grammar, since the input to the unifier should be a semantic representation and therefore does not contain any constraint on word ordering.

NOTE: Patterns can contain full paths to specify constituents. For example, the following is a legal pattern:
          
(PATTERN ({prot n} {verb v} goal))
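As a rough illustration of how a path in a pattern designates a constituent, the following Python sketch follows a path attribute by attribute. It assumes that an FD is modeled as nested dicts and that the path is followed from the top of the FD carrying the pattern; the helper name resolve and the sample values are hypothetical.

```python
# Hypothetical model: FDs are nested dicts; a pattern element is either a
# simple name ("goal") or a path written as a tuple (("prot", "n")).

def resolve(fd, element):
    """Follow a pattern element down from fd; return None if it is absent."""
    path = (element,) if isinstance(element, str) else tuple(element)
    node = fd
    for attr in path:
        if not isinstance(node, dict) or attr not in node:
            return None                  # missing constituent
        node = node[attr]
    return node

fd = {
    "prot": {"n": "dog"},
    "verb": {"v": "barks"},
    # Counterpart of (PATTERN ({prot n} {verb v} goal))
    "pattern": (("prot", "n"), ("verb", "v"), "goal"),
}

found = [resolve(fd, element) for element in fd["pattern"]]
print(found)
```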
          
          
A given grammar can generate several ordering constraints; that is, it can add two or more pattern pairs to the result. The unifier therefore includes a pattern unifier. The role of the pattern unifier is to take several constraints on the ordering and to output one ordering that subsumes all of them.

The following symbols have a special meaning for the pattern unifier: dots and pound (standing respectively for the notations `...' and `#').        

A pattern (c1 ... c2) (written (c1 dots c2) in the program) indicates that the constituent c1 must precede the constituent c2, but they need not be adjacent. Zero, one or many other constituents can come in between. The pattern (c1 ... c2) still requires the sentence to start with constituent c1 and to end with c2. The pattern (... c1 ... c2 ...) only forces c1 to come before c2.

The pound (#) symbol is used to represent 0 or 1 constituent. For example, if you want to allow a sentence to start with an optional adverbial, you can specify it with the pattern (# prot ... verb ...). This directive will be compatible with both (prot verb goal) and (adverb prot verb goal) for example.
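The meaning of the two symbols can be made concrete with a small matcher that tests whether one fully specified order of constituents satisfies a pattern. This is a sketch written for illustration, not the pattern unifier: it only checks a concrete order against a single pattern, and the function name matches is hypothetical.

```python
DOTS, POUND = "dots", "pound"

def matches(pattern, order):
    """True if the concrete constituent order satisfies the pattern."""
    if not pattern:
        return not order                 # both must be exhausted together
    head, rest = pattern[0], pattern[1:]
    if head == DOTS:                     # zero, one or many constituents
        return any(matches(rest, order[i:]) for i in range(len(order) + 1))
    if head == POUND:                    # zero or one constituent
        return matches(rest, order) or (bool(order) and matches(rest, order[1:]))
    # An ordinary constituent name must appear exactly here.
    return bool(order) and order[0] == head and matches(rest, order[1:])

optional_adverbial = (POUND, "prot", DOTS, "verb", DOTS)

print(matches(optional_adverbial, ("prot", "verb", "goal")))
print(matches(optional_adverbial, ("adverb", "prot", "verb", "goal")))
print(matches(("c1", DOTS, "c2"), ("c1", "c2", "x")))
```

The first two calls correspond to the two orders named in the text. The third shows that (c1 dots c2) rejects an order in which something follows c2.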

As a consequence of the use of the two symbols pound and dots, the constraints described by pattern directives are PARTIAL orderings.

NOTE: because of the presence of dots and pound, the unification of patterns is a non-deterministic operation. It can produce several results for a given input, and there is no way to predict in which order these possible solutions will be tried. Caution should be exercised when specifying patterns: they should be specific enough to allow only acceptable word orderings (do not use too many dots), but not so specific that they rule out constituents that are not yet supported (for example, a sentence can start with an Adverbial, not necessarily an NP).

The following example illustrates the fact that pattern unification is non-deterministic in general:

          

Pattern Unification:
p1: (pattern (dots a dots b dots))
p2: (pattern (dots c dots d dots))

Compatible Results:
(pattern (dots a dots b dots c dots d dots))
(pattern (dots a dots c dots b dots d dots))
(pattern (dots a dots c dots d dots b dots))
(pattern (dots c dots a dots b dots d dots))
(pattern (dots c dots a dots d dots b dots))
(pattern (dots c dots d dots a dots b dots))

Pattern Unification:
p3: (pattern (dots a dots b))
p4: (pattern (dots b c))

Pattern Unification fails.
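The six results for p1 and p2 are the order-preserving interleavings of (a b) and (c d). The Python sketch below enumerates them. It is an illustration under a narrow assumption: both patterns are fully dotted and share no constituent. It does not model the general pattern unifier or the failure case of p3 and p4, and the function names are hypothetical. The order in which it yields results is an artifact of the sketch, not a statement about the order in which the unifier tries solutions.

```python
def interleavings(xs, ys):
    """Yield every merge of xs and ys that keeps each sequence's own order."""
    if not xs or not ys:
        yield tuple(xs) + tuple(ys)      # only one way left to finish
        return
    for rest in interleavings(xs[1:], ys):
        yield (xs[0],) + rest            # next constituent taken from xs
    for rest in interleavings(xs, ys[1:]):
        yield (ys[0],) + rest            # next constituent taken from ys

def as_pattern(order):
    """Write an order as a fully dotted pattern in the program's notation."""
    items = ["dots"]
    for name in order:
        items += [name, "dots"]
    return "(pattern (" + " ".join(items) + "))"

results = list(interleavings(("a", "b"), ("c", "d")))
for order in results:
    print(as_pattern(order))
```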

Patterns are eventually interpreted by the linearization component to produce a string out of an FD.

Appendix [*] describes some advanced uses of pattern unification.


Michael Elhadad - elhadad@cs.bgu.ac.il