Next: Control in FUF Up: Morphology and Linearization Previous: The Dictionary

    
Linearization and Punctuation

The linearizer interprets the pattern ordering constraints and assembles the words of the sentence into a linear string. In addition, the linearizer deals with punctuation and capitalization. The general algorithm followed by the linearizer is:

1.
If the current FD contains the feature gap, its linearization is the empty string.

2.
Otherwise, look for the pattern feature in the current FD. If a pattern is found:
(a)
For each constituent of the pattern, recursively linearize the constituent.

(b)
The linearization of the FD is the concatenation of the linearizations of the constituents, in the order prescribed by the pattern feature. (During linearization, the dots (...) and pound signs (#) in the pattern are ignored.)

3.
If there is no pattern feature but a lex feature is found:
(a)
Retrieve the lex feature of the FD and, depending on the category of the constituent, the morphological features it needs. For example, if the FD is of (cat verb), the features needed are person, number and tense.

(b)
Send the lexical item and the appropriate morphological features to the morphology module.

(c)
Look up the punctuation feature, which can contain three sub-features: before, after and capitalize. Depending on their values, prepend or append punctuation to the string computed by the morphology module, and capitalize the string if requested. The linearization of the FD is the resulting string.

(d)
If the current cat is not known to the morphology module, the linearization of the constituent is a string of the form <unknown cat X: S>.

4.
If neither a pattern nor a lex feature is found, the linearization of the current FD is the empty string.
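The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FUF's own (Lisp) implementation: FDs are assumed to be nested Python dicts, a pattern is assumed to be a list of constituent names, the presence of any gap feature is taken to mean "gapped", and the morphology module is replaced by a hypothetical toy_morphology function that only knows a few categories. The sketch returns a list of words rather than a single string, leaving spacing and sentence-level punctuation to the rules described next.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation: an FD is a dict of feature -> value, a
# constituent is a nested dict, and a pattern is a list of feature names.
FD = Dict[str, Any]
Morphology = Callable[[str, Optional[str], FD], Optional[str]]

# Pattern markers skipped during linearization.
PATTERN_MARKERS = ("...", "#")

# Morphological features passed on for each category (assumed subset).
MORPH_FEATURES = {
    "verb": ("person", "number", "tense"),
    "noun": ("number",),
}


def toy_morphology(lex: str, cat: Optional[str], feats: FD) -> Optional[str]:
    """Hypothetical stand-in for the morphology module.

    Returns None when the category is unknown to it."""
    if cat == "verb":
        if (feats.get("person") == "third" and feats.get("number") == "singular"
                and feats.get("tense") == "present"):
            return lex + "s"
        return lex
    if cat == "noun":
        return lex + "s" if feats.get("number") == "plural" else lex
    if cat in ("article", "adj", "adv"):
        return lex
    return None


def linearize(fd: FD, morph: Morphology = toy_morphology) -> List[str]:
    """Return the words of fd, in order, following steps 1-4."""
    # Step 1: a gapped constituent (here: any FD with a gap feature) is empty.
    if "gap" in fd:
        return []
    # Step 2: concatenate the constituents in pattern order, skipping markers.
    pattern = fd.get("pattern")
    if pattern is not None:
        words: List[str] = []
        for name in pattern:
            if name not in PATTERN_MARKERS:
                words.extend(linearize(fd.get(name, {}), morph))
        return words
    # Step 3: inflect the lex, then apply the punctuation feature.
    lex = fd.get("lex")
    if lex is not None:
        cat = fd.get("cat")
        feats = {f: fd[f] for f in MORPH_FEATURES.get(cat, ()) if f in fd}
        word = morph(lex, cat, feats)
        if word is None:  # step 3(d): category unknown to the morphology
            return [f"<unknown cat {cat}: {lex}>"]
        punct = fd.get("punctuation", {})
        if punct.get("capitalize") == "yes":
            word = word[:1].upper() + word[1:]
        return [punct.get("before", "") + word + punct.get("after", "")]
    # Step 4: neither pattern nor lex.
    return []


clause = {
    "pattern": ["...", "subj", "adv", "proc", "obj"],
    "subj": {"pattern": ["det", "head"],
             "det": {"cat": "article", "lex": "the"},
             "head": {"cat": "noun", "lex": "linearizer", "number": "singular"}},
    "adv": {"gap": "yes", "cat": "adv", "lex": "quickly"},
    "proc": {"cat": "verb", "lex": "assemble", "person": "third",
             "number": "singular", "tense": "present"},
    "obj": {"cat": "noun", "lex": "word", "number": "plural"},
}
print(linearize(clause))  # ['the', 'linearizer', 'assembles', 'words']
```

Returning a word list keeps the recursion simple: concatenating constituents in step 2 is just list concatenation, and the gapped adv constituent disappears without leaving an empty word behind.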

The linearizer also handles sequences of punctuation signs, spaces between words, and the "liaison" article "an", as in "an interesting case" or "an RPS". The following rules are implemented:

1.
A space is inserted between each pair of words, except:
(a)
after an opening bracket, ( or [;

(b)
before a closing bracket, ) or ];

(c)
before a punctuation sign.

2.
When the indefinite singular article "a" is followed either by a word that starts with a vowel or by a word whose FD contains the feature (a-an yes), the form "an" is used instead of "a". For example, if the word following "a" is produced from the FD ((lex "RPS") (a-an yes)), the linearizer outputs "an RPS".

3.
The first word of the string produced by the linearizer is capitalized.

4.
If an FD contains the feature (punctuation ((capitalize yes))), the string produced by the linearizer is capitalized. If the value of capitalize is no, the string is not capitalized, even if it starts the sentence.

5.
If an FD contains the feature (punctuation ((before ",") (after ","))), the string produced by the linearizer starts and ends with a comma. Any string can be given as the value of before or after.

6.
Leading punctuation signs are removed from the final string. (The linearizer recognizes six punctuation signs: , . ; : ! ?)

7.
A final period (.) is added to the sentence if it does not already end with a punctuation sign. If the mood of the sentence is interrogative (or a specialization of it), a final question mark (?) is added instead.

8.
Sequences of two adjacent punctuation signs are reduced according to the following rules, grouped by the first sign of the pair (sequence -> result):
(a)
,,  ->  ,
,.  ->  .
,;  ->  ;
,:  ->  :
,!  ->  !
,?  ->  ?

(b)
.,  ->  .,
..  ->  .
.;  ->  .;
.:  ->  .:
.!  ->  .!
.?  ->  .?

(c)
;,  ->  ;
;.  ->  .
;;  ->  ;
;:  ->  :
;!  ->  !
;?  ->  ?

(d)
:,  ->  :
:.  ->  :
:;  ->  :
::  ->  :
:!  ->  !
:?  ->  ?

(e)
!,  ->  !,
!.  ->  !,
!;  ->  !;
!:  ->  !:
!!  ->  !
!?  ->  !?

(f)
?,  ->  ?,
?.  ->  ?
?;  ->  ?
?:  ->  ?
?!  ->  ?
??  ->  ?
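Rules 1 and 2 (spacing and the choice between "a" and "an") can be sketched as two small passes over the words produced by linearization. The sketch rests on hypothetical assumptions: punctuation signs and brackets arrive as separate tokens, the indefinite article is recognized by its string "a", and the (a-an yes) feature has already been copied onto a boolean flag of a hypothetical Token class.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical token type: the word string plus whether its FD had (a-an yes).
@dataclass
class Token:
    text: str
    a_an: bool = False


OPENING_BRACKETS = ("(", "[")
CLOSING_BRACKETS = (")", "]")
PUNCTUATION = (",", ".", ";", ":", "!", "?")
VOWELS = ("a", "e", "i", "o", "u")


def choose_articles(tokens: List[Token]) -> List[str]:
    """Rule 2: use "an" before a vowel-initial word or an (a-an yes) word."""
    words = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.text == "a" and nxt is not None and (
                nxt.a_an or nxt.text[:1].lower() in VOWELS):
            words.append("an")
        else:
            words.append(tok.text)
    return words


def join_with_spaces(words: List[str]) -> str:
    """Rule 1: one space between words, except around brackets and punctuation."""
    out = ""
    for i, word in enumerate(words):
        no_space = (
            out.endswith(OPENING_BRACKETS)         # (a) after ( or [
            or word.startswith(CLOSING_BRACKETS)   # (b) before ) or ]
            or word.startswith(PUNCTUATION)        # (c) before punctuation
        )
        if i > 0 and not no_space:
            out += " "
        out += word
    return out


tokens = [Token("a"), Token("RPS", a_an=True), Token("("), Token("a"),
          Token("interesting"), Token("case"), Token(")"), Token("and"),
          Token("a"), Token("big"), Token("one")]
print(join_with_spaces(choose_articles(tokens)))
# an RPS (an interesting case) and a big one
```

The explicit flag is what lets "RPS", which starts with a consonant letter but a vowel sound, take "an"; a purely spelling-based test would produce "a RPS".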

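Rules 3, 6, 7 and 8 amount to a final clean-up of the sentence string. The sketch below gathers them in one hypothetical function, finish_sentence. The pair table is transcribed from rule 8; the single left-to-right pass, the order of the steps, and the capitalize_first flag (standing in for rule 4) are assumptions, not a description of FUF's code. Rule 7 is read here as: add "?" for an interrogative sentence, "." otherwise, and only when no final punctuation sign is present.

```python
from typing import Dict, List

PUNCTUATION = (",", ".", ";", ":", "!", "?")

# Rule 8: reduction of a pair of adjacent punctuation signs (first, second).
PAIR_RULES: Dict[str, str] = {
    ",,": ",", ",.": ".", ",;": ";", ",:": ":", ",!": "!", ",?": "?",
    ".,": ".,", "..": ".", ".;": ".;", ".:": ".:", ".!": ".!", ".?": ".?",
    ";,": ";", ";.": ".", ";;": ";", ";:": ":", ";!": "!", ";?": "?",
    ":,": ":", ":.": ":", ":;": ":", "::": ":", ":!": "!", ":?": "?",
    "!,": "!,", "!.": "!,", "!;": "!;", "!:": "!:", "!!": "!", "!?": "!?",
    "?,": "?,", "?.": "?", "?;": "?", "?:": "?", "?!": "?", "??": "?",
}


def filter_punctuation(text: str) -> str:
    """Rule 8, applied in one left-to-right pass (an assumed strategy)."""
    out: List[str] = []
    for ch in text:
        if ch in PUNCTUATION and out and out[-1] in PUNCTUATION:
            # Replace the previous sign by the reduced form of the pair.
            out[-1:] = list(PAIR_RULES[out[-1] + ch])
        else:
            out.append(ch)
    return "".join(out)


def finish_sentence(text: str, interrogative: bool = False,
                    capitalize_first: bool = True) -> str:
    """Rules 3, 6, 7 and 8 on an already space-joined sentence."""
    text = filter_punctuation(text)
    # Rule 6: drop leading punctuation signs (and the spaces they leave).
    text = text.lstrip(",.;:!? ")
    # Rule 7: add a final sign unless one is already there.
    if not text.endswith(PUNCTUATION):
        text += "?" if interrogative else "."
    # Rule 3; rule 4's (capitalize no) is modeled by capitalize_first=False.
    if capitalize_first:
        text = text[:1].upper() + text[1:]
    return text


print(finish_sentence("the linearizer assembles words"))
# The linearizer assembles words.
print(finish_sentence(", is it done,?", interrogative=True))
# Is it done?
```

Filtering before adding the final sign means a stray comma followed by a question mark collapses to "?" and is then recognized as final punctuation, so no extra sign is appended.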

Michael Elhadad - elhadad@cs.bgu.ac.il