.

HUGG - Hebrew Unification Grammar for Generation of text


Department of Mathematics & Computer Science
Ben-Gurion University of the Negev
Yael (Dahan) Netzer

Advised by Michael Elhadad

.
Motivation


.
Method of Research


.

Examples - paraphrasing, default items


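חלון דיירי המרתף (lit. "window-of tenants-of the-basement")
החלון של הדיירים במרתף (lit. "the-window of the-tenants in-the-basement")
חלונם של הדיירים במרתף (lit. "window-their of the-tenants in-the-basement")
החלון של דיירי המרתף (lit. "the-window of tenants-of the-basement")
and more ...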




Example - paraphrasing

קופסאות סיגריות אנגליות מפח (lit. "boxes-of cigarettes English from-tin")
קופסאות פח עם סיגריות אנגליות (lit. "boxes-of tin with cigarettes English")




.
Process of Generation


Generation is a process with two main stages:

  • Plan - what should be said.
  • Realize - lexically and syntactically.

This work concerns Syntactic Realization - the grammar.

Input for grammar: lexicalized representation of phrase in various stages of abstraction.

Output: A grammatical string, representing most accurately the info in the input.


.
Implementation



The grammar is written in FUF - Functional Unification Formalism [Elhadad]

Input: FD (functional description) - a list of (att val) pairs, where val = atom | FD | path.
Grammar: meta-FD: disjunction with ALT, control with NONE, GIVEN, ANY.

All components in the generation process can be implemented with this formalism.
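The idea of an FD as a list of (att val) pairs, unified against a grammar with alternatives, can be sketched in Python. This is only an illustration under stated assumptions, not FUF itself: FDs are modeled as nested dicts, `unify` and `unify_alt` are hypothetical helper names, and paths and the control values NONE, GIVEN and ANY are not modeled.

```python
FAIL = object()  # sentinel: unification failed


def unify(fd1, fd2):
    """Unify two FDs modeled as nested dicts (atoms are plain strings)."""
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for att, val in fd2.items():
            if att in result:
                sub = unify(result[att], val)
                if sub is FAIL:
                    return FAIL
                result[att] = sub
            else:
                # Attribute missing from the input: the grammar supplies it.
                result[att] = val
        return result
    # Atoms (or an atom against an FD) unify only if they are equal.
    return fd1 if fd1 == fd2 else FAIL


def unify_alt(fd, branches):
    """Try each branch of a disjunction in order; keep the first that unifies."""
    for branch in branches:
        result = unify(fd, branch)
        if result is not FAIL:
            return result
    return FAIL


# Hypothetical input and grammar fragment (transliterated lexeme).
np_input = {"cat": "np", "lex": "yeled", "definite": "yes"}
grammar_alt = [
    {"cat": "clause"},
    {"cat": "np", "number": "singular"},  # default added by the grammar
]
result = unify_alt(np_input, grammar_alt)
print(result)
```

The second branch unifies with the input and enriches it with a default value for `number`, which is how a partial input can be completed by the grammar.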


.
Guidelines for choosing input


  • General features of SR
  • Input of SURGE
  • Special features of Hebrew
    1. Smixut
    2. Definiteness
    3. Quantifiers and Determiners
    4. Adjectives
  • Conclusions


.
Guidelines for choosing input


  1. General requirement for input in generation systems:
    • semantic input.
    • partial info (defaults from grammar).
    • expressive -- general enough to suit different systems.
  2. Input for English NP in SURGE [Elhadad]:
    • basis for bilingual applications
    • use knowledge of SURGE: e.g., argumentative features for quantifiers.
  3. Special features of Hebrew NPs


.
Special features of Hebrew NPs


Features of Hebrew that need special treatment in comparison to English:

  1. Smixut
  2. Definiteness
  3. Quantifiers and Determiners
  4. Adjective sequences

Assumption: Syntactic knowledge should be in syntax.


.
General functions in NP


Using the functional-systemic definitions
A set of functional modifiers is defined (the constituent filling each function is bracketed) -

  1. Head - My old science [book] on the table
  2. Classifier - My old [science] book on the table
  3. Describer - My [old] science book on the table
  4. Qualifier - My old science book [on the table]
  5. Possessor - [My] old science book on the table


.
Smixut - when is it generated?


Using the systemic definitions in Hebrew:

one syntactic structure - more than one semantic mapping:

  • possessor:
  • classifier:

Features can cooccur -

.

constraints on producing smixut


  1. syntactic constraints
    • type of head (quantifier, noun, adjective).
    • definiteness (somex and nismax are marked for the same definiteness value).
    • number of modifiers - only one somex is possible.
      • English: leather house shirt
      • Hebrew: *חולצת בית עור (lit. "shirt-of house leather")
    • not all nouns can be in smixut (foreign origin and more).
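The syntactic constraints listed above can be read as a simple admissibility check. The following Python sketch is only an illustration: the `Noun` class, its `allows_smixut` flag and the function `can_form_smixut` are hypothetical names, lexemes are transliterated, and the constraint on the type of head is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Noun:
    lex: str                     # transliterated lexeme
    definite: bool = False
    allows_smixut: bool = True   # hypothetical lexical flag, e.g. False for some loanwords


def can_form_smixut(nismax: Noun, candidates: List[Noun]) -> bool:
    """Check the syntactic constraints for a smixut headed by `nismax`."""
    if not nismax.allows_smixut:
        return False
    if len(candidates) != 1:
        # Only one somex is possible.
        return False
    somex = candidates[0]
    # Somex and nismax must carry the same definiteness value.
    return nismax.definite == somex.definite


shirt = Noun("xultsa")
house = Noun("bayit")
leather = Noun("or")

print(can_form_smixut(shirt, [house]))            # one somex
print(can_form_smixut(shirt, [house, leather]))   # two modifiers, as in the starred example
```

When the check fails, as with the two modifiers of the starred Hebrew example, the grammar has to fall back on another realization for at least one of the modifiers.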

.

Semantic constraints


Relations - rel(nismax, somex) [Levi, Azar, Glinert and others].
Define a set of semantic relations that can be deleted to form smixut (recoverably deletable predicates, RDPs).

Main relations (based on theories and corpus):


.
how to deal with smixut?


purpose: keep syntactic knowledge in syntax
provide alternatives for generation when smixut cannot be formed.

method:

  • provide enough semantic information in input: define a set of semantic relations that can be realized in smixut.
  • provide enough defaults in grammar - other possible paraphrases for the same input, and a suitable preposition for the qualifier ("ל" for purpose, "מ" for material).


.
when should smixut be generated?


Membership in a semantic set is neither sufficient nor necessary:
  • not necessary: naming - a smixut used without context may not be understood, but in a marked context it becomes acceptable.
    ילד שוקולד ("chocolate boy") ← love(ילד, שוקולד)
  • not sufficient: not every two words in the semantic relations
    will generate the desired meaning:

    vs.

  • smixut is not obligatory: relations can be realized in other forms:
Yet producing smixut seems to be constrained:
not every relation keeps its meaning when deleted:

  • ביצת שוקולד ("chocolate egg") ← material(ביצה, שוקולד)
  • ילד שוקולד? ("chocolate boy") ← love(ילד, שוקולד)
    - needs context.

.

Smixut - conclusions


Therefore, to form smixut:
  1. There should be a demand for production *
    • being a member of the semantic set or
    • need for naming or contrastive context.

* paradoxical conclusion...

Pending problems:

  • what can motivate smixut: length, distinction, social, style
  • discourse wins? [Lascarides & Copestake]


.
Smixut - implementation


To realize a modifier in a pre-determined way - use the feature:
(realize-function-as classifier/qualifier/describer)

Default: if semantically and syntactically possible, use smixut.
Implementation: using bk-class - controlling backtracking for efficiency.
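The interaction between the explicit feature and the default can be sketched as a small decision function. This is not the HUGG grammar, which is written in FUF. The function `choose_realization` and both tables are hypothetical, the relation set is only an illustrative subset, and the sketch assumes that a classifier surfaces as smixut and a qualifier as a prepositional phrase.

```python
# Illustrative subset only; the full set of relations is not reproduced here.
SMIXUT_RELATIONS = {"material", "purpose"}

# Default prepositions for the qualifier, as named on the earlier slide.
DEFAULT_PREPOSITION = {"purpose": "ל", "material": "מ"}


def choose_realization(relation, smixut_possible, realize_function_as=None):
    """Return (function, preposition) for one modifier.

    `realize_function_as` mirrors the input feature of the same name;
    when it is absent, smixut (classifier) is the default if possible.
    The preposition is None unless a qualifier is chosen.
    """
    function = realize_function_as
    if function is None:
        if smixut_possible and relation in SMIXUT_RELATIONS:
            function = "classifier"
        else:
            function = "qualifier"
    preposition = None
    if function == "qualifier":
        preposition = DEFAULT_PREPOSITION.get(relation)
    return function, preposition


print(choose_realization("material", smixut_possible=True))
print(choose_realization("material", smixut_possible=False))
print(choose_realization("purpose", True, realize_function_as="qualifier"))
```

In FUF the same choice would be expressed as alternatives in the grammar, with bk-class used to limit backtracking between them.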


.
Relational nouns, Nominalization


Two additional constructions of NP that use smixut:
  1. Relational nouns - nouns that lexically require a complement. The relation with the complement can be unique and need not belong to a given semantic set (example below).
  2. Nominalizations - representing a process with a noun; these allow input similar to that of the clause. This issue was not handled in HUGG (lack of a clause grammar).

Relational-noun: example

מהירות המכונית ("the speed of the car")



Nominalization: example
אכילת התפוח ("the eating of the apple")



semantic modifiers - example


.
Definiteness


Definiteness is a semantic feature of NP
Defining NPs as definite should be done with context -
features such as uniqueness, existence and specificity are not sufficient.
Rely on planner to define this feature in suitable context.

English has the features: (definite yes/no) - one article in NP.

Hebrew is polydefinite - different constituents of NP (describer, cardinal etc.) are marked as definite.
Definite NPs can be both syntactically marked/unmarked as definite

.

Realizing Definiteness


Definite NPs
Realize the definite mark as a separate article, not as a morphological feature: "ha-yeled" ("the boy").

Use an additional feature: (mark-definite yes/no).
mark-definite can be a feature of the whole NP, as in yeled zeh ("this boy"), or a feature of the components that can be marked, as in Hanah ha-Hamuda ("cute Hanah").

Non-definite NPs:

  • Unmarked (default): (definite no), realized with a null article.
    Functionally, used when the referent is not yet in shared knowledge.
  • Marked indefinite: used for a reason other than "discourse introduction". Marked with a non-empty indefinite article.
    • Selective feature: (selective yes)
      איזה דלת אחת ("some door or other")



    • forced indefinite: (definite indef)
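The polydefinite marking described above can be illustrated with a toy realizer over transliterated words. This is a sketch under stated assumptions, not the HUGG implementation: the function `realize_np` is hypothetical, the single (mark-definite yes/no) feature is split into two hypothetical flags so that head and modifiers can be marked separately, and the selective and forced-indefinite cases are not modeled.

```python
def realize_np(head, modifiers=(), definite=False,
               mark_head=True, mark_modifiers=True):
    """Return a transliterated NP with the article 'ha-' on each marked part.

    `definite` is the semantic feature; `mark_head` and `mark_modifiers`
    (hypothetical flags) say which constituents carry the syntactic mark.
    """
    article = "ha-"
    words = [article + head if (definite and mark_head) else head]
    for modifier in modifiers:
        if definite and mark_modifiers:
            words.append(article + modifier)
        else:
            words.append(modifier)
    return " ".join(words)


# Definite and marked on every constituent.
print(realize_np("yeled", ["tov"], definite=True))
# Definite but syntactically unmarked.
print(realize_np("yeled", ["zeh"], definite=True,
                 mark_head=False, mark_modifiers=False))
# Definite, marked only on the component that can take the mark.
print(realize_np("Hanah", ["Hamuda"], definite=True, mark_head=False))
# Unmarked non-definite default: null article.
print(realize_np("yeled", ["tov"]))
```

Keeping the semantic feature apart from the marking flags reflects the point that a definite NP may or may not be syntactically marked as definite.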

Definite - examples

נער הכפר טוב הלב הזה ("this good-hearted village boy")



Definite - examples

כל החיילים ("all the soldiers")




.
Quantifiers and Determiners


HUGG uses SURGE features to generate quantifiers and determiners.

Distinction of quantifiers and determiners (following [Glinert])

  • Determiners - express identity.
  • Quantifiers - express quantity i.e., amount or portion.
    Portion can be expressed either by partitive quantifiers or by amount-quantifiers in partitive-construction.
    Ex. English: all children - all of the children
        Hebrew: col hayeladym - *col mhayeladym
This separation defines the order of the quantifiers/determiners in the NP.

Quantifiers - examples

כמה מהילדים ("some of the children")




.
Describers - Adjectival Modifiers


Two issues:
  • Semantic input: what semantic relations are mapped to adjectival modifiers
  • Syntactic structure: broken vs. unbroken sequences, and the order of adjectival modifiers
    English: unbroken sequence as default - dark old house
    Hebrew: broken sequence:

Semantic Origin of Adjectival Modifiers

Adjectives can be generated by:

  • classifiers (non-predicative semantic relation)
  • describers or pronominals (status)

From RDP modifier to NCA modifier when adjective given as modifier.

From non-RDP semantic relation to CA or NCA depending on adjective lexical property (derived or given).

.

Syntactic Structure of Adjective Sequence


What affects the syntactic structure of adjectival modifiers?
  • Order: non-comparative (NCA) before comparative (CA)
  • NCA: temporal, locative or adverbial at end of the NCA-sequence.
  • CA: broken sequence, order is determined by "heaviness"

Implementation: two categories of describers: NCA-describer (mapped from classifier) and describers.
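The ordering rules above can be sketched as a two-way partition followed by a sort within each group. This is only an illustration: the `Describer` class and `order_describers` are hypothetical names, words are transliterated, and since the slides do not say how "heaviness" is measured or in which direction it orders the sequence, the sketch assumes a numeric weight with lighter items first.

```python
from dataclasses import dataclass


@dataclass
class Describer:
    word: str                      # transliterated adjective
    comparative: bool              # True for CA, False for NCA
    circumstantial: bool = False   # temporal, locative or adverbial NCA
    weight: int = 1                # hypothetical "heaviness" score


def order_describers(describers):
    """Order post-nominal adjectives: NCA sequence first, then CA sequence."""
    nca = [d for d in describers if not d.comparative]
    ca = [d for d in describers if d.comparative]
    # Stable sort: temporal/locative/adverbial NCAs move to the end of the NCA run.
    nca.sort(key=lambda d: d.circumstantial)
    # Assumption: lighter comparative adjectives come first.
    ca.sort(key=lambda d: d.weight)
    return [d.word for d in nca + ca]


small = Describer("katan", comparative=True)
electric = Describer("xashmali", comparative=False)
ordered = order_describers([small, electric])
print("kumkum " + " ".join(ordered))
```

With these inputs the non-comparative adjective is placed next to the head noun and the comparative one follows it.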

Limitations: "כדור קטן לבן" vs. "כדור לבן קטן" vs. "כדור קטן ולבן" (lit. "ball small white" vs. "ball white small" vs. "ball small and-white")

Describers - Examples

כדור לבן וקטן ("a white and small ball")



קומקום חשמלי קטן ("a small electric kettle")



.

Conclusions


In this work we presented a wide-coverage grammar for the generation of NPs in Hebrew.
We assumed that syntactic knowledge should be known to the SR only, but it seems that this assumption cannot be enforced all the time.

Limitations: no grammar for the clause; generation of relative clauses is still needed.


.
Contribution and Future work


  • Contributions
    1. A fragment of the Hebrew Grammar for Generation
    2. Organizing syntactic knowledge of NP in Hebrew for Generation
  • Future Work
    1. Expand HUGG to Clause level
    2. Realize semantic representation in Hebrew and English
    3. Evaluate coverage on large-scale corpora

Yael Netzer
Tue Mar 3 12:29:54 IST 1998