We present here a minimal grammar that contains just enough to generate the simplest complete sentences. It is included in the file gr0.l in the directory containing the examples. A slightly more complex grammar, handling the active/passive distinction, is available in gr1.l, and a more interesting one in gr2.l.
((alt MAIN (
;; a grammar always has the same form: an alternative
;; with one branch for each constituent category.
|
A few comments on the form of this grammar. The skeleton of a grammar is always the same: a big alt, that is, an alternation of possible branches from which the unifier picks one compatible branch to unify with the input. Each branch of this alternation corresponds to a single category (here, S, NP and VP).
The second remark is about the form of the input: as shown in the following example, an input is an FD that gives constraints on certain constituents. The grammar decides what grammatical category corresponds to each constituent.
The next main function of the grammar is to constrain the ordering of the words. This is done using the special attribute pattern. A pattern is followed by a picture of how the constituents of the current FD should be ordered: (pattern (prot verb goal)) means that the prot constituent should come just before the verb constituent, which in turn should come just before the goal constituent.
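The effect of a pattern can be sketched with a toy model in Python. This is not FUF's linearizer: the function name `linearize` and the encoding of FDs as nested dicts are assumptions made for this illustration, and the words are given already inflected, whereas the toy skips any morphology step.

```python
def linearize(fd):
    """Return the words of a toy FD in the order given by its 'pattern'.

    Assumed encoding (not FUF's): an FD is a dict. A terminal constituent
    carries a 'lex' string; a non-terminal carries a 'pattern' tuple that
    names its sub-constituents in left-to-right order.
    """
    if "pattern" not in fd:
        return [fd["lex"]]
    words = []
    for name in fd["pattern"]:
        # Each name in the pattern selects one sub-constituent of this FD.
        words.extend(linearize(fd[name]))
    return words


sentence = {
    "pattern": ("prot", "verb", "goal"),
    "prot": {"lex": "John"},
    "verb": {"lex": "likes"},
    "goal": {"lex": "Mary"},
}
print(" ".join(linearize(sentence)))  # John likes Mary
```

Changing the tuple to ("goal", "verb", "prot") would reverse the word order without touching the constituents themselves, which is the point of keeping ordering in a separate attribute.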
In the first branch, the only thing to notice is how subject/verb agreement is described: the number of the PROT will appear in the input as a feature of the FD appearing under PROT, as in:
(prot ((number plural) (lex "car")))
standing for "cars". To enforce subject/verb agreement, the grammar picks the feature number from the prot sub-FD and requests that it be unified with the corresponding feature of the verb sub-FD. This is expressed by:
(verb ((number {prot number})))
which means: the value of the number feature of verb must be the same as the value of the number feature of prot. The curly-brace notation denotes what is called a "path", which is a pointer within an FD. Note that in this line of the grammar, we refer to {prot number} even though the number feature does not appear under prot anywhere else in the grammar. This is a general feature of FUF: any attribute can appear in an FD, and its value can be supplied by the grammar directly at the place where it appears, by the input, or by the grammar from a distant place through a path.
Note also that the agreement constraint could have been written in the "opposite" direction:
(prot ((number {verb number})))
Or even:
({prot number} {verb number})
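A rough Python sketch of what such a path constraint achieves is given below. It is a simplification under stated assumptions: FDs are nested dicts, paths are tuples of attribute names read from the top of the FD, and the helper names `get_path`, `set_path` and `equate` are invented for this illustration. A real unifier makes the two paths share one value; the toy only copies or compares atomic values.

```python
def get_path(fd, path):
    """Follow attribute names from the top of the FD; None if absent."""
    node = fd
    for attr in path:
        if not isinstance(node, dict) or attr not in node:
            return None
        node = node[attr]
    return node


def set_path(fd, path, value):
    """Store value at path, creating intermediate sub-FDs as needed."""
    node = fd
    for attr in path[:-1]:
        node = node.setdefault(attr, {})
    node[path[-1]] = value


def equate(fd, path_a, path_b):
    """Require both paths to hold the same atomic value.

    Returns True on success, filling in whichever side is missing, and
    False when the two sides carry different values. If neither side has
    a value the toy does nothing (a real unifier would still link them).
    """
    a, b = get_path(fd, path_a), get_path(fd, path_b)
    if a is not None and b is not None:
        return a == b
    if a is not None:
        set_path(fd, path_b, a)
    elif b is not None:
        set_path(fd, path_a, b)
    return True


# The input gives the number under prot only; the constraint fills in verb.
fd = {"prot": {"number": "plural", "lex": "car"}, "verb": {}}
ok = equate(fd, ("verb", "number"), ("prot", "number"))
print(ok, fd["verb"]["number"])  # True plural

# An input that sets both sides differently cannot satisfy the constraint.
clash = {"prot": {"number": "plural"}, "verb": {"number": "singular"}}
print(equate(clash, ("verb", "number"), ("prot", "number")))  # False
```

Because `equate` treats its two paths symmetrically, swapping the arguments gives the same result, which mirrors the remark that the constraint can be written in either direction.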
In the second branch, describing the NPs, we have two cases, corresponding to proper and common nouns. Common nouns are preceded by an article, whereas proper nouns just consist of themselves, e.g., "the car" vs. "John". If the feature proper is not given in the input, the grammar will add it. By default, the current unifier will always try the first branch of an alt first. That means that in this grammar, proper nouns are the default.
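The "first compatible branch wins" behaviour can be illustrated with a small Python model. Everything here is an assumption made for the sketch: FDs are flat dicts of atomic features, the function names `unify_flat` and `unify_alt` are invented, and the branch contents (the values yes/no and the constituent name det) are hypothetical stand-ins rather than the text of gr0.l.

```python
def unify_flat(fd, branch):
    """Merge two flat attribute -> atom dicts; None signals a clash."""
    merged = dict(fd)
    for attr, value in branch.items():
        if attr in merged and merged[attr] != value:
            return None
        merged[attr] = value
    return merged


def unify_alt(fd, branches):
    """Try the branches in order and keep the first compatible one."""
    for branch in branches:
        result = unify_flat(fd, branch)
        if result is not None:
            return result
    return None


# Hypothetical NP branches: proper nouns first, then common nouns.
np_branches = [
    {"proper": "yes", "pattern": ("n",)},
    {"proper": "no", "pattern": ("det", "n")},
]

# No 'proper' feature in the input: the first branch is chosen and adds it.
print(unify_alt({"cat": "np"}, np_branches)["proper"])  # yes

# An explicit (proper no) clashes with branch 1, so branch 2 is used.
print(unify_alt({"cat": "np", "proper": "no"}, np_branches)["pattern"])  # ('det', 'n')
```

Reordering the list would change the default, which is why the order of branches in an alt matters even when all branches are individually correct.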
Finally, a brief word about the general mechanism of unification: the unifier first unifies the input FD with the grammar. In the following example, this will be the first pass through the grammar. Then, each sub-constituent of the resulting FD that is part of the cset (constituent-set) of the FD will be unified again with the whole grammar. In this way, the sub-constituents prot, verb and goal are unified as well. This is how recursion is triggered in the grammar. The next section describes how the cset is determined. All you need to know at this point is that if a constituent contains a feature (cat xxx) it will be tried for unification.
In the input FDs, the sign === is a shorthand for giving the lex feature of a constituent; the two notations below are equivalent:
(n === John) <===> (n ((lex John)))
|
For every "terminal" constituent, the lex feature contains the single string that is to be used in the English sentence.
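The === shorthand amounts to a simple rewriting step, sketched here in Python. The tuple encoding of a feature and the function name `expand_shortcut` are assumptions made for this illustration and say nothing about how FUF reads its input.

```python
def expand_shortcut(pair):
    """Rewrite (attr, '===', word) as (attr, {'lex': word}).

    Any feature that does not use the shorthand is returned unchanged.
    """
    if len(pair) == 3 and pair[1] == "===":
        attr, _, word = pair
        return (attr, {"lex": word})
    return pair


print(expand_shortcut(("n", "===", "John")))  # ('n', {'lex': 'John'})
print(expand_shortcut(("cat", "np")))         # ('cat', 'np')
```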
When unified with the following FD, the grammar will output the sentence "John likes Mary".
|
Following the trace of the program is the easiest way to see what is going on:
LISP> (uni ir01)
->
>STARTING CAT S AT LEVEL {}
|
In the figure, you can identify each step of the unification. First, the top-level category is identified: (cat s). The input is unified with the corresponding branch of the grammar (branch #1). Then the constituents are identified. We have here 3 constituents: PROT of cat NP, VERB of cat VP and GOAL of cat NP. Each constituent is unified in turn. Then, for each constituent, the unifier identifies the sub-constituents. In this case, no constituent has a sub-constituent, and unification succeeds. Note that in general, the tree of constituents is traversed breadth-first.
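To make the breadth-first order concrete, here is a toy traversal in Python. It relies on the simplified rule stated earlier (a sub-FD is treated as a constituent when it contains a cat feature); the real cset computation is richer. The function name `constituents_breadth_first`, the dict encoding, and the nested example with noun and verb sub-constituents are all hypothetical, chosen only so that a second level exists to visit.

```python
from collections import deque


def constituents_breadth_first(fd):
    """Yield (path, category) for each constituent, level by level.

    Toy rule: a sub-FD counts as a constituent when it has a 'cat'
    feature. Paths are tuples of attribute names from the top-level FD.
    """
    queue = deque([((), fd)])
    while queue:
        path, node = queue.popleft()
        if "cat" in node:
            yield path, node["cat"]
        for attr, value in node.items():
            # Only sub-FDs carrying a category are queued for a later pass.
            if isinstance(value, dict) and "cat" in value:
                queue.append((path + (attr,), value))


# Hypothetical FD with one extra level below each top-level constituent.
sentence = {
    "cat": "s",
    "prot": {"cat": "np", "n": {"cat": "noun"}},
    "verb": {"cat": "vp", "v": {"cat": "verb"}},
    "goal": {"cat": "np", "n": {"cat": "noun"}},
}

for path, cat in constituents_breadth_first(sentence):
    print(path, cat)
```

The loop visits the top level first, then prot, verb and goal, and only afterwards the three second-level constituents, rather than descending fully into prot before moving on to verb.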
It is also important to know when unification fails. The following example tries to override the subject/verb agreement, which causes unification to fail:
|