Next: Implicit and Incremental CSET Up: Precise Characterization of FDs Previous: Control of the Ordering:

       
Explicit Specification of Sub-constituents: the CSET Keyword

The unifier works top-down and recursively: it first unifies the top-level FD (which generally represents a sentence) against the grammar, and then recursively unifies each of its constituents. For example, to unify a sentence, the unifier first unifies the whole FD with the grammar for sentences (cat S), then unifies the prot and goal with the grammar for NPs (cat np), and then unifies the verb with the grammar for VPs (cat vp).

You can specify explicitly which features of an FD correspond to constituents and therefore need to be recursively unified. To do that, add a pair:

          
(CSET (c1 ... cn))

For example: (CSET (PROT VERB GOAL))

The value of a cset (which stands for Constituent SET) is treated as an unordered set. Therefore the following two pairs unify successfully:

          
(CSET (PROT VERB GOAL))
(CSET (VERB GOAL PROT))
          
          

More precisely, two cset pairs unify if and only if their values are equal as sets.
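
The set-equality rule can be illustrated with a small model. The following is only a Python sketch, not FUF's Lisp implementation: the function name unify_cset and the representation of a cset value as a list of strings are assumptions made for the illustration.

```python
def unify_cset(left, right):
    """Unify two cset values modelled as lists of constituent names.

    Hypothetical helper: returns the constituent set as a frozenset when
    the two values are equal as sets, and None when unification fails.
    """
    left_set, right_set = frozenset(left), frozenset(right)
    return left_set if left_set == right_set else None


# Same members in a different order: the two values unify.
same = unify_cset(["prot", "verb", "goal"], ["verb", "goal", "prot"])

# One value lacks "goal": the sets differ, so unification fails.
different = unify_cset(["prot", "verb"], ["prot", "verb", "goal"])
```

In this model a cset is never extended by unification: a value that is a strict subset of the other does not unify with it.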

NOTE: A cset value can contain full paths to specify constituents. For example, the following is a legal feature:

          
(cset ({prot n} {verb v} goal))
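
As a rough illustration of what a path entry designates, the Python sketch below models an FD as nested dictionaries and a path such as {prot n} as a tuple of attribute names. It assumes that the FD holding the cset is the top-level FD, so that every path is followed from it. The name resolve_constituent and the cat values are hypothetical, and the code is not taken from FUF.

```python
def resolve_constituent(fd, entry):
    """Return the sub-fd designated by a cset entry, or None if absent.

    An entry is either a plain attribute name ("goal") or a tuple of
    attribute names standing for a path such as {prot n}.
    """
    path = entry if isinstance(entry, tuple) else (entry,)
    node = fd
    for attribute in path:
        if not isinstance(node, dict) or attribute not in node:
            return None
        node = node[attribute]
    return node


# Hypothetical FD: the constituents sit one level below prot and verb.
sentence = {
    "prot": {"n": {"cat": "noun"}},
    "verb": {"v": {"cat": "verb"}},
    "goal": {"cat": "np"},
}

# Model of (cset ({prot n} {verb v} goal)).
cset = [("prot", "n"), ("verb", "v"), "goal"]
resolved = [resolve_constituent(sentence, entry) for entry in cset]
```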
          
          
FUF does not rely exclusively on csets to find the constituents to be recursively unified. FUF generally tries to infer the value of cset from the value of pattern and from the features of the current FD (assuming that features containing a cat attribute are constituents). The exact procedure followed to identify the implicit constituent set of an FD is:

1. If a feature (cset (c1 ... cn)) is found in the FD, the constituent set is just (c1 ... cn).

2. If no feature cset is found, the constituent set is the union of the following sub-fds:
   (a) If a pair contains a feature (cat xx), it is considered a constituent.
   (b) If a sub-fd is mentioned in the pattern, it is considered a constituent.
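
The procedure above can be sketched in a few lines of Python. This is a simplified model rather than FUF's code: FDs are represented as nested dictionaries, pattern and cset as lists of attribute names, and the function name constituent_set is an assumption. Only pattern entries that name a sub-fd of the current FD are counted.

```python
def constituent_set(fd):
    """Return the constituent set of an FD modelled as a dict."""
    # Step 1: an explicit cset wins and is used as is.
    if "cset" in fd:
        return set(fd["cset"])
    # Step 2(a): every sub-fd that carries a cat feature.
    by_cat = {
        attr for attr, value in fd.items()
        if isinstance(value, dict) and "cat" in value
    }
    # Step 2(b): every sub-fd mentioned in the pattern.
    by_pattern = {
        name for name in fd.get("pattern", [])
        if isinstance(fd.get(name), dict)
    }
    return by_cat | by_pattern


# Hypothetical sentence FD: "time" has no cat but appears in the pattern.
sentence = {
    "cat": "s",
    "prot": {"cat": "np"},
    "verb": {"cat": "vp"},
    "goal": {"cat": "np"},
    "time": {"lex": "yesterday"},
    "pattern": ["prot", "verb", "goal", "time"],
}
inferred = constituent_set(sentence)

# Adding an explicit cset keeps "time" out of the constituent set.
explicit = constituent_set({**sentence, "cset": ["prot", "verb", "goal"]})
```

In this model, inferred contains prot, verb, goal and time, while explicit contains only the three names listed in the cset.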

As a consequence, explicit csets are rarely necessary. They are generally used when an FD contains a sub-fd that either is mentioned in the pattern or contains a feature cat, but that you do NOT want to unify recursively. In that case, you can specify the cset explicitly and leave the unwanted sub-fd out of it. For larger grammars, however, you should emphasize a clean constituent structure, and therefore use the explicit CSET facilities deliberately instead of relying blindly on FUF's inference. In this case, the advanced CSET facilities described below will prove helpful.



 
Michael Elhadad - elhadad@cs.bgu.ac.il