next up previous contents
Next: Indexing Up: Control in FUF Previous: Control in FUF

     
Complexity

The complexity of a grammar can be described by the number of possible paths through it, each path corresponding to the choice of one branch for each alternation.

To understand this measure of complexity, it is useful to define the notion of disjunctive normal form  . In general, a grammar is made up of embedded disjunctions, that is, of alts within alts. Embedded disjunctions can always be rewritten into top-level disjunctions. This transformation works as indicated below:

          

Embedded form                	
Normal form

((size ((alt (1 2 3))))) ((alt (((size 1)) ((size 2)) ((size 3)))))

((alt (((cat ((alt (np clause)))) ((alt (((cat np) (rank group)) (rank group)) ((cat word) ((cat clause) (rank word))))) (rank group)) ((cat word) (rank word)))))

((size ((alt (1 2)))) ((alt (((size 1) (weight ((alt (1 2))))) (weight 1)) ((size 2) (weight 1)) ((size 1) (weight 2)) ((size 2) (weight 2))))

The number of branches in the top level alt of an FD in normal form is exponential in the depth of the embedding of alts in the original FD. In practice, for large grammars, the normal form can contain up to 1029 branches.

In certain implementations, disjunctions are expanded and kept in extensive form instead of being ``resolved'' as in FUF, by choosing one branch out of the set of branches. In such an approach, the unification of an FD with a disjunction yields a new disjunction:

unify(FD, ((alt (d1...dn)))) = ((alt (unify FD d1)...(unify FD dn)))
If one of the branches di is not compatible with FD, the branch is canceled, by virtue of the equivalences:
          
((alt (d1 fail))) = d1
((alt ())) = fail
          
          
where fail is the symbolic representation of an incompatible FD. The problem with such an approach is that in practice many disjuncts stay ``alive'' during the unification process, and the result tends to take on sizes close to the normal form expansion of the grammar, which is clearly unmanageable. The FUF implementation does not follow this approach, although one form of the wait construct presented in Section [*] allows the grammar writer to selectively keep certain disjunctions unresolved when desired.

The measure of the complexity of a grammar is thus the number of branches the grammar would have in disjunctive normal form. Indexing the grammar actually divides this measure of complexity by a great number.

In FUF, the functions complexity and avg-complexity compute different measures of the complexity of a grammar.

          
(COMPLEXITY &optional grammar with-index)
 
-> number of branches of grammar in disjunctive normal form.
- By default, grammar is *u-grammar*
- By default, with-index is T. When it is T, all indexed alts are
  considered as one single branch, when it is nil, they are
  considered as regular alts.

(AVG-COMPLEXITY &optional grammar with-index rough-avg)   -> `average' number of branches tried when input contains no constraint. - By default, grammar is *u-grammar* - By default, with-index is T. When it is T, all indexed alts are considered as one single branch, when it is nil, they are considered as regular alts. - By default, rough-avg is nil. When it is nil, the average of an alt is the sum of the complexity of the half first branches. When it is T, the average is half of the sum of the complexity of all branches.

Note that these functions do not currently measure the ambiguity of the patterns included in the grammar.


next up previous contents
Next: Indexing Up: Control in FUF Previous: Control in FUF
Michael Elhadad - elhadad@cs.bgu.ac.il