Natural Language Processing - Class 4

First mission in understanding:

Tagging


Words

What is a word?

The smallest meaning-bearing part of language that can stand independently.
book, table

Not a word:
-er, -ed

Word?
she'd 
In written language, not every unit surrounded by spaces is a single word.
For instance, in Hebrew:
V-Axalty-hu (V-Any-Axalty-Ooto)

Morphology:

Words are made of morphemes (havarot), the smallest meaning-bearing units.

Some words are made of a single morpheme.

Bound vs. Free morphemes

words: car, son
suffixes: -ing : having, eating
          -est : best , smallest
prefixes: re- : reproducing, re-arranging
infixes - 
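
A minimal sketch of splitting a word into prefix, stem and suffix by plain string matching. The affix lists are hypothetical and contain only the affixes from the examples above; the function name and the minimum-stem-length checks are assumptions made for illustration. String matching alone over-segments (for instance, it would strip "re" from "really"), so a real analyzer also needs a lexicon.

```python
# Hypothetical affix lists, limited to the examples in these notes.
PREFIXES = ["re"]
SUFFIXES = ["ing", "est"]

def split_affixes(word):
    """Return (prefix, stem, suffix); an empty string means no affix was found."""
    word = word.lower().replace("-", "")
    # Require some material after the prefix so that short words like "red" stay whole.
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) > len(p) + 2), "")
    rest = word[len(prefix):]
    # Require a stem of at least two letters so that "ring" or "best" stay whole.
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) > len(s) + 1), "")
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

print(split_affixes("eating"))       # ('', 'eat', 'ing')
print(split_affixes("smallest"))     # ('', 'small', 'est')
print(split_affixes("reproducing"))  # ('re', 'produc', 'ing')
```

Note that the stem "produc" is not a free morpheme: spelling changes at morpheme boundaries are not handled here.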

Other kinds of word formation:

   Morpheme internal changes: 

      ring rang rung 
      sing sang sung 
      man men 
      foot feet 

   Suppletion: complete replacement: 

      I am I was 
      I go I went 

   Templatic morphology (found in Semitic languages) 

      Consonantal root; inflection is indicated by the vowels and by the pattern of consonants and vowels. 
      Root: ktb 

      katav 'write active perfective' 
      extov 'active imperfective' 
      nixtav 'passive imperfective' 
      huxtab 'cause to write' 
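
A minimal sketch of root-and-pattern word formation: the consonants of the root are slotted into a template. The template notation (digits mark root consonants, everything else is copied) is a hypothetical convention chosen for this sketch. It does not model the consonant alternations visible in the surface forms above (k/x, b/v), so its output is an underlying form, not the pronounced one.

```python
def apply_template(root, template):
    """Fill the digit slots 1..n of the template with the root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)

root = "ktb"
print(apply_template(root, "1a2a3"))   # katab
print(apply_template(root, "ni12a3"))  # niktab
print(apply_template(root, "hu12a3"))  # huktab
```

The same templates apply to any three-consonant root, which is what makes the system templatic.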



Inflection:

Some words change their base form to meet syntactic needs, such as marking number and gender on nouns and verbs, and tense on verbs.
For instance:
to mark plural - boy -> boys
to mark person - he runs / they run
In Hebrew, the system is much more complicated.

Derivation:

Changes the major class (part of speech) of a word:
sweet - adjective
sweetness - noun

halixa ('walking') - noun
holex ('walks') - verb

Inflection vs. Derivation


   Inflection 

   doesn't change part of speech or content 

      walk vs. walks 

   required by the syntax 

      She walks/*walk 

   productive 

      -s appears on almost all verbs 

   occur at margins of words 

      (outside derivational) 

      solid+ify+s

   Derivation 

   can change part of speech or content. 

      solid (A) -> solidify (V) 

   not required by syntax (not grammatical) 

   not productive 

      brotherhood *friendhood 

   applies before inflection (closer to the stem) 

Morphology can be used to find the parts of speech of words.
Morphology and POS tagging for English:
English has little inflectional morphology:
English relies more on word order and prepositions than on morphology of words.
Systematic POS ambiguities:
 plural noun vs. third-person singular present verb:
 I write plays.
 My son plays the guitar.
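
A minimal sketch of how this ambiguity looks to a program: a lexicon maps each word form to every tag it can carry, and a form with more than one tag cannot be tagged by lookup alone. The lexicon and the tag names are hypothetical and cover only the two example sentences.

```python
# Toy lexicon (hypothetical): word form -> set of possible tags.
LEXICON = {
    "i": {"PRON"},
    "write": {"VERB"},
    "plays": {"NOUN-PL", "VERB-3SG"},
    "my": {"DET"},
    "son": {"NOUN"},
    "the": {"DET"},
    "guitar": {"NOUN"},
}

def ambiguous_words(sentence):
    """Return the words of the sentence that have more than one possible tag."""
    return [w for w in sentence.lower().split() if len(LEXICON.get(w, ())) > 1]

print(ambiguous_words("I write plays"))            # ['plays']
print(ambiguous_words("My son plays the guitar"))  # ['plays']
```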
  
English has a lot of derivational morphology.

What is a part of speech?


Time flies like an arrow;
Fruit flies like a banana.
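(The first "flies" is a verb, the second is a noun. The first "like" is a preposition expressing comparison, the second "like" is a verb.)

Problems with the semantic definition (e.g. "a noun names a thing, a verb names an action"): we can assign parts of speech even to nonsense words whose meaning we do not know.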

   The yinkish dripner blorked quastofically into the nindin with the pidibs. 

      yinkish -adj 
      dripner -noun 
      blorked -verb 
      quastofically -adverb 
      nindin -noun 
      pidibs -noun 
We determine the P.O.S of a word by the affixes attached to it and by the syntactic context (where in the sentence) in which it appears.

   The definition of P.O.S is distributional 
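
A minimal sketch of the morphological half of this idea, applied to the nonsense words above: guess the class from the ending alone. The suffix table is hypothetical (the "-ish" entry in particular is an assumption added for this sentence), and the order of the entries matters because the first match wins.

```python
# Hypothetical suffix hints, checked in order; the first match wins.
SUFFIX_HINTS = [
    ("ly", "adverb"),
    ("ish", "adj"),
    ("ed", "verb"),
    ("er", "noun"),
    ("s", "noun"),
]

def guess_by_suffix(word):
    """Return a guessed class, or None when no suffix hint applies."""
    for suffix, tag in SUFFIX_HINTS:
        if word.endswith(suffix):
            return tag
    return None

for w in ["yinkish", "dripner", "blorked", "quastofically", "nindin", "pidibs"]:
    print(w, guess_by_suffix(w))
```

"nindin" gets no guess from its ending; only its position after "the" identifies it as a noun, which is why the syntactic distribution is needed as well.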

Two kinds of distribution

   Morphological distribution 

      (affixes that appear on the word) 

   Syntactic distribution 

      (position relative to nearby words.)

P.O.S distributionally (English)

   Nouns 

      take -s, 's, -ness, -ment, -er affixes 
      appear after [ the _____ ] 
      can be subject of sentence 

   Verbs 

      take -s, -ed, -ify, -ing, re- affixes 
      appear after auxiliaries [ will ______ ] 
      [Please _______!]


   Adjectives 

      take -er, -est, -ate, -ity affixes 
      appear between the & noun [ the _____ book ] 
      can follow very [very _______] 
      can appear in [John is __________] 

   Adverbs 

      take -ly affix 
      appear before adjectives and verbs 
      [very ______] 
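
A minimal sketch of the syntactic frames above as a lookup on the neighbouring words. The function name, the auxiliary list and the one-word noun list are hypothetical; the frames themselves are the ones listed in these notes.

```python
AUXILIARIES = {"will", "can", "should", "must", "would"}
KNOWN_NOUNS = {"book"}  # hypothetical: nouns we already know, as in [ the ___ book ]

def guess_by_frame(prev, nxt=None):
    """Guess the class of a word from the word before it and (optionally) after it."""
    prev = prev.lower()
    if prev in AUXILIARIES or prev == "please":
        return "verb"                 # [ will ___ ], [ Please ___! ]
    if prev == "very":
        return "adjective or adverb"  # [ very ___ ] fits both classes
    if prev == "the":
        # [ the ___ book ] is an adjective slot; a bare [ the ___ ] is a noun slot.
        return "adjective" if nxt in KNOWN_NOUNS else "noun"
    return None

print(guess_by_frame("the", "book"))  # adjective
print(guess_by_frame("the"))          # noun
print(guess_by_frame("will"))         # verb
```

The [ very ___ ] frame alone cannot separate adjectives from adverbs, so the sketch returns both.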

Open vs. Closed P.O.S

   Open POS: 

      allow neologisms (new words) 
      express content 
      N, V, Adj, Adv

   Closed POS: 

      don't allow new additions 
      express function 
      Prepositions, conjunctions, modals, auxiliaries, determiners (articles), pronouns.

Some closed POS

   Prepositions: to, from, under, over, with, by, etc. 

   Conjunctions: and, or 

   Determiners/Deictics/quantifiers: this, that, the, a, my, your, our, his, her, their, each, every, some. 

   Complementizers: that, which, for 

   Auxiliaries/Modals: will, have, can, should, is, must, would 

   Negators: no, not, n't 

   Intensifiers: very, right. 
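
Because closed classes do not take new members, they can simply be listed. A minimal sketch, using only the words given above (the dictionary layout and function name are assumptions): a word found in no list is treated as open class.

```python
# Closed-class word lists, copied from the examples in these notes.
CLOSED_CLASS = {
    "preposition": {"to", "from", "under", "over", "with", "by"},
    "conjunction": {"and", "or"},
    "determiner": {"this", "that", "the", "a", "my", "your", "our", "his",
                   "her", "their", "each", "every", "some"},
    "complementizer": {"that", "which", "for"},
    "auxiliary/modal": {"will", "have", "can", "should", "is", "must", "would"},
    "negator": {"no", "not", "n't"},
    "intensifier": {"very", "right"},
}

def closed_classes(word):
    """Return the sorted list of closed classes that contain the word."""
    w = word.lower()
    return sorted(name for name, members in CLOSED_CLASS.items() if w in members)

print(closed_classes("with"))  # ['preposition']
print(closed_classes("that"))  # ['complementizer', 'determiner']
print(closed_classes("blog"))  # [] -> treat as open class
```

Even closed-class words can be ambiguous: "that" appears in two lists.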

Tagging - why?

Identify phrases.
Identify structures.


Tagging - how to do tagging?

Preliminaries

Tagset: the set of possible tags for parts of speech. (Its size varies across applications and languages.)

A tagset should include the information that is needed for the next steps in the process, and that people can annotate well.
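
A minimal sketch of how tagset granularity can be adapted to the next step: fine-grained Penn Treebank tags such as NN, NNS, VBZ, JJR are collapsed into the four open classes when the finer distinctions are not needed. The coarse labels and the function name are assumptions for illustration.

```python
# Penn Treebank tag prefixes mapped to coarse classes (coarse labels are hypothetical).
COARSE_BY_PREFIX = (("NN", "N"), ("VB", "V"), ("JJ", "Adj"), ("RB", "Adv"))

def coarse_tag(penn_tag):
    """Collapse a Penn Treebank tag into N, V, Adj, Adv, or 'other'."""
    for prefix, coarse in COARSE_BY_PREFIX:
        if penn_tag.startswith(prefix):
            return coarse
    return "other"

print([coarse_tag(t) for t in ["NNS", "VBZ", "JJR", "RB", "IN", "DT"]])
# ['N', 'V', 'Adj', 'Adv', 'other', 'other']
```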

Penn treebank tagset

Practice tagging:

Tagset of Nouns:

practice

Tagset of Verbs

Tagset of Adjectives and adverbs:

Tagset of Prepositions and conjunctions:

More tagsets:

Here's text to practice on.

How to tag?

  1. Ensure that people can reproduce the tagging.
  2. Check the data.
  3. Tagging needs some context: simple rules.
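
A minimal sketch of points 1 and 3, under stated assumptions. For reproducibility, it measures plain agreement between two annotators (the share of tokens given the same tag); this is one simple measure, not necessarily the one used in the course. For context, it tags with a toy lexicon and a single illustrative rule that looks at the previous tag. The lexicon, the tag names and the rule are all hypothetical.

```python
def agreement(tags_a, tags_b):
    """Fraction of tokens on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("annotations must cover the same non-empty token sequence")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

# Toy lexicon (hypothetical): possible tags per word, first one is the default.
LEXICON = {
    "i": ["PRON"], "write": ["VERB"], "plays": ["NOUN", "VERB"],
    "my": ["DET"], "son": ["NOUN"], "the": ["DET"], "guitar": ["NOUN"],
}

def tag(sentence):
    """Tag each word, using the previous tag to resolve NOUN/VERB ambiguity."""
    tags = []
    for word in sentence.lower().split():
        options = LEXICON.get(word, ["NOUN"])  # assumption: unknown words are nouns
        choice = options[0]
        if len(options) > 1 and tags:
            # Illustrative context rule: after a noun prefer the verb reading,
            # after a verb prefer the noun reading.
            if tags[-1] == "NOUN" and "VERB" in options:
                choice = "VERB"
            elif tags[-1] == "VERB" and "NOUN" in options:
                choice = "NOUN"
        tags.append(choice)
    return tags

print(tag("I write plays"))            # ['PRON', 'VERB', 'NOUN']
print(tag("My son plays the guitar"))  # ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
print(agreement(["DET", "NOUN", "VERB"], ["DET", "NOUN", "NOUN"]))
```

The last call compares two annotations that differ on one token out of three, giving an agreement of two thirds.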

For any questions, contact me: yaeln@cs.bgu.ac.il

Last modified , 2000