The complexity of a grammar can be described by the number of possible paths through it, each path corresponding to the choice of one branch for each alternation.
To understand this measure of complexity, it is useful to define the notion of disjunctive normal form . In general, a grammar is made up of embedded disjunctions, that is, of alts within alts. Embedded disjunctions can always be rewritten into top-level disjunctions. This transformation works as indicated below:
Embedded form
Normal form
|
The number of branches in the top level alt of an FD in normal form is exponential in the depth of the embedding of alts in the original FD. In practice, for large grammars, the normal form can contain up to 1029 branches.
In certain implementations, disjunctions are expanded and kept in extensive form instead of being ``resolved'' as in FUF, by choosing one branch out of the set of branches. In such an approach, the unification of an FD with a disjunction yields a new disjunction:
unify(FD, ((alt (d1...dn)))) = ((alt (unify FD d1)...(unify FD dn)))If one of the branches di is not compatible with FD, the branch is canceled, by virtue of the equivalences:
((alt (d1 fail))) = d1
((alt ())) = fail
|
allows the grammar writer to selectively keep certain
disjunctions unresolved when desired.
The measure of the complexity of a grammar is thus the number of branches the grammar would have in disjunctive normal form. Indexing the grammar actually divides this measure of complexity by a great number.
In FUF, the functions complexity and avg-complexity compute different measures of the complexity of a grammar.
Note that these functions do not currently measure the ambiguity of the patterns included in the grammar.