Natural Language Processing - Class 4
First mission in understanding:
Tagging
Words
What is a word?
The smallest meaning-bearing part of language that can stand independently.
book, table
Not a word:
-er, -ed
Word?
she'd
In written language, not every unit surrounded by spaces is a single
word.
For instance, in Hebrew:
V-Axalty-hu (V-Any-Axalty-Ooto)
Morphology:
Words are made of morphemes (havarot) -
The smallest meaning bearing units.
Some words are made of a single morpheme.
Bound vs. Free morphemes
- Free morphemes can stand alone (e.g. cat)
- Bound morphemes must be attached to another morpheme to form a word (e.g. -s)
- The most common kind of bound morpheme is an affix (prefix, suffix,
infix)
words: car, son
suffixes: -ing : having, eating
-est : best , smallest
prefixes: re- : reproducing, re-arranging
infixes -
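The prefix and suffix examples above can be sketched as a toy segmenter. This is only an illustration under strong assumptions: the affix lists PREFIXES and SUFFIXES and the function split_affixes are made up for this example, spelling changes are ignored (having would give the stem "hav"), and words that merely look affixed are over-segmented (reading would be split as re + ad + ing).

```python
# Hypothetical, tiny affix inventory based on the examples in these notes.
PREFIXES = ("re",)
SUFFIXES = ("ing", "est", "s")


def split_affixes(word):
    """Return (prefix, stem, suffix); an empty string means no affix matched."""
    word = word.lower().replace("-", "")
    # Accept a prefix only if enough material is left over to be a stem.
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2), ""
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 1), ""
    )
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


for w in ["cat", "eating", "smallest", "re-arranging"]:
    print(w, split_affixes(w))
```

Free morphemes such as cat come back unchanged, with empty prefix and suffix slots.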
Other kinds of word formation:
Morpheme internal changes:
ring rang rung
sing sang sung
man men
foot feet
Suppletion: complete replacement:
I am I was
I go I went
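Internal changes and suppletion cannot be undone by stripping an affix, so a program usually needs an exception table. A minimal sketch, assuming a hand-written table built from the examples above (the names IRREGULAR and base_form are illustrative, and the fallback suffix stripping is deliberately naive):

```python
# Irregular forms from the examples above, mapped to their base form.
IRREGULAR = {
    "rang": "ring", "rung": "ring",
    "sang": "sing", "sung": "sing",
    "men": "man", "feet": "foot",
    "am": "be", "was": "be",
    "went": "go",
}


def base_form(word):
    """Look up irregular forms first, then fall back to naive suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ed", "s"):
        # Require a stem of at least three letters so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


for w in ["went", "feet", "boys", "walked"]:
    print(w, "->", base_form(w))
```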
Other kinds of word formation:
Templatic morphology (found in Semitic languages)
Consonantal root, inflection indicated by the vowels and pattern of consonants and vowels.
ktb
katav 'write active perfective'
extov 'active imperfective'
nixtav 'passive imperfective'
huxtab 'cause to write'
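The root-and-pattern idea can be sketched as slot filling: digits in a template stand for the root consonants, and everything else is copied as is. The template notation ("1a2a3") and the function apply_template are assumptions made for this illustration. The sketch does not model sound alternations, so it yields "katab" and "niktab" where the actual surface forms are katav and nixtav.

```python
def apply_template(root, template):
    """Fill the digits 1, 2, 3 in the template with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch for ch in template
    )


root = "ktb"
print(apply_template(root, "1a2a3"))   # consonant slots filled around a-a
print(apply_template(root, "ni12a3"))  # prefix ni- plus a different vowel pattern
```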
Inflection:
Some words change their base form to meet a syntactic need,
for example to mark number and gender on nouns and verbs,
or tense on verbs.
For instance:
to mark plural - boy -> boys
to mark person - he runs / they run
In Hebrew, the system is much more complicated.
Derivation:
change major classes of words:
sweet - adjective
sweetness - noun
halixa - noun
holex - verb
Inflection vs. Derivation
Inflection
doesn't change part of speech or content
walk vs. walks
required by the syntax
She walks/*walk
productive
-s appears on almost all verbs
occur at margins of words
(outside derivational)
solid+ify+s
Derivation
can change part of speech or content.
solid (A) -> solidify (V)
not required by syntax (not grammatical)
not productive
brotherhood *friendhood
before inflection
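The ordering claim (derivation applies before inflection, so inflectional suffixes sit at the margin of the word, as in solid+ify+s) can be checked mechanically once a word is already split into morphemes. A small sketch, assuming two hand-picked suffix sets; DERIVATIONAL, INFLECTIONAL and the function name are illustrative only:

```python
# Hypothetical suffix classes, limited to examples from these notes.
DERIVATIONAL = {"ify", "ness", "hood"}
INFLECTIONAL = {"s", "ed", "ing"}


def inflection_is_outermost(suffixes):
    """True if no derivational suffix follows an inflectional one."""
    seen_inflection = False
    for suffix in suffixes:
        if suffix in INFLECTIONAL:
            seen_inflection = True
        elif suffix in DERIVATIONAL and seen_inflection:
            return False
    return True


print(inflection_is_outermost(["ify", "s"]))  # solid+ify+s
print(inflection_is_outermost(["s", "ify"]))  # *solid+s+ify
```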
Morphology can be used to find the parts of speech of words.
Morphology and POS tagging for English:
English has little inflectional morphology:
English relies more on word order and prepositions than on morphology
of words.
Systematic POS ambiguities:
plural noun vs. third-person singular present verb (both end in -s)
I write plays
My son plays the guitar
English has a lot of derivational morphology.
What is Part of Speech:
- Equivalence class
- Basic semantic classification:
- Nouns - persons, places or things (book, table, imagination)
- Verbs - actions, occurrences or states of being (walk, eat, ...)
- Adjectives - modifiers that express quality, quantity or
extent (big, blue, lazy)
- Adverbs - modifiers that express manner, quality, place, time, degree, number, cause, opposition, affirmation or denial
- Prepositions - modifiers that indicate location or origin
- and more: coordination (and, or, since..) and determiners (the, this, many..).
- Closed-class (no new words, prepositions, determiners..) and
open-class words.
Time flies like an arrow;
Fruit flies like a banana.
(The first "flies" is a verb, the second is a noun. The first "like" is a comparative conjunction, the second is a verb.)
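To see why such sentences are hard for a program, one can count how many tag sequences a lexicon allows before any disambiguation. A minimal sketch, assuming a made-up mini-lexicon (LEXICON, its tag names and count_tag_sequences are illustrative; "Conj" stands for the non-verb reading of "like" mentioned above):

```python
from math import prod

# Hypothetical mini-lexicon: each word maps to the tags it could take.
LEXICON = {
    "time": {"N", "V"},
    "fruit": {"N"},
    "flies": {"N", "V"},
    "like": {"V", "Conj"},
    "an": {"Det"},
    "a": {"Det"},
    "arrow": {"N"},
    "banana": {"N"},
}


def count_tag_sequences(sentence):
    """Number of tag sequences the lexicon allows for the sentence."""
    words = sentence.lower().replace(";", "").replace(".", "").split()
    return prod(len(LEXICON[w]) for w in words)


print(count_tag_sequences("Time flies like an arrow;"))
print(count_tag_sequences("Fruit flies like a banana."))
```

With this lexicon the first sentence has 8 candidate sequences and the second has 4; a tagger has to pick one of them.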
Problems with the semantic def.
The yinkish dripner blorked quastofically into the nindin with the pidibs.
yinkish -adj
dripner -noun
blorked -verb
quastofically -adverb
nindin -noun
pidibs -noun
We determine the P.O.S of a word by the affixes that are attached to it and by the syntactic context (where in the
sentence) it appears in.
The definition of P.O.S is distributional
Two kinds of distribution
Morphological distribution
(affixes that appear on the word)
Syntactic distribution
(position relative to nearby words.)
P.O.S distributionally (English)
Nouns
take the affixes -s, 's, -ness, -ment, -er
appear after [ the _____ ]
can be subject of sentence
Verbs
take -s, -ed, -ify, -ing, re- affixes
appear after auxiliaries [ will ______ ]
[Please _______!]
Adjectives
take the affixes -er, -est, -ate, -ity
appear between the & noun [ the _____ book ]
can follow very [very _______]
can appear in [John is __________]
Adverbs
take -ly affix
appear before adjectives and verbs
[very ______]
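The two kinds of distribution are enough to tag the nonsense sentence above. A rough sketch, under stated assumptions: the closed-class list CLOSED and the suffix table SUFFIX_HINTS are hand-picked for this example (the suffix -ish is an addition not listed in these notes), and the only syntactic frames used are [the ___] and [the Adj ___].

```python
# Hypothetical closed-class words needed for the example sentence.
CLOSED = {"the": "determiner", "into": "preposition", "with": "preposition"}

# Morphological hints, checked in order; the first matching suffix wins.
SUFFIX_HINTS = [
    ("ly", "adverb"), ("ish", "adjective"), ("est", "adjective"),
    ("ness", "noun"), ("ment", "noun"),
    ("ed", "verb"), ("ify", "verb"), ("ing", "verb"),
]


def guess_tags(sentence):
    """Guess a part of speech for each word from suffixes and left context."""
    tags = []
    for word in sentence.lower().rstrip(".").split():
        if word in CLOSED:
            tag = CLOSED[word]
        else:
            # Morphological distribution: look at the affix on the word.
            tag = next(
                (pos for suf, pos in SUFFIX_HINTS if word.endswith(suf)), None
            )
            # Syntactic distribution: [the ___] or [the Adj ___] suggests a noun.
            if tag is None and tags and tags[-1] in ("determiner", "adjective"):
                tag = "noun"
        tags.append(tag or "unknown")
    return tags


sentence = "The yinkish dripner blorked quastofically into the nindin with the pidibs."
for word, tag in zip(sentence.rstrip(".").split(), guess_tags(sentence)):
    print(word, "-", tag)
```

None of the content words is in any dictionary, yet the sketch assigns the same parts of speech as the list above, using only form and position.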
Open vs. Closed P.O.S
Open POS:
allow neologisms (new words)
express content
N, V, Adj, Adv
Closed POS:
don't allow new additions
express function
Prepositions, conjunctions, modals, auxiliaries, determiners
(articles), pronouns.
Some closed POS
Prepositions: to, from, under, over, with, by, etc.
Conjunctions: and, or
Determiners/Deictics/Quantifiers: this, that, the, a, my, your, our, his, her, their, each, every, some.
Complementizers: that, which, for
Auxiliaries/Modals: will, have, can, should, is, must, would
Negators: no, not, n't
Intensifiers: very, right.
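Because closed classes do not accept new members, they can simply be listed. A minimal sketch that stores the lists above as sets and reports which closed classes contain a word (the names CLOSED_CLASS and closed_classes are illustrative); an empty result means the word is a candidate for an open class:

```python
# Closed-class word lists copied from the notes above.
CLOSED_CLASS = {
    "preposition": {"to", "from", "under", "over", "with", "by"},
    "conjunction": {"and", "or"},
    "determiner": {"this", "that", "the", "a", "my", "your", "our", "his",
                   "her", "their", "each", "every", "some"},
    "complementizer": {"that", "which", "for"},
    "auxiliary": {"will", "have", "can", "should", "is", "must", "would"},
    "negator": {"no", "not", "n't"},
    "intensifier": {"very", "right"},
}


def closed_classes(word):
    """Return the closed classes that list this word (may be empty)."""
    w = word.lower()
    return {name for name, members in CLOSED_CLASS.items() if w in members}


for w in ["that", "very", "book"]:
    print(w, sorted(closed_classes(w)))
```

Note that a word such as "that" belongs to more than one closed class, so even closed-class words can be ambiguous.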
Tagging - why?
Identify phrases
Identify structures.
Tagging - how to do tagging?
Preliminaries
Tagset: the set of possible tags for parts of speech.
(Its size varies across applications and languages.)
A tagset should include the information that is needed for the next
steps in the process, and that people can annotate well.
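One simple way to check whether people can annotate well with a given tagset is to let two annotators tag the same tokens and measure how often they agree. A minimal sketch, assuming plain observed agreement (the fraction of tokens with identical tags); the function name and the example tags are made up for illustration, and more refined measures exist.

```python
def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which two annotators chose the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same tokens")
    if not tags_a:
        raise ValueError("nothing to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# "My son plays the guitar", tagged by two hypothetical annotators.
annotator_a = ["Det", "N", "V", "Det", "N"]
annotator_b = ["Det", "N", "N", "Det", "N"]
print(observed_agreement(annotator_a, annotator_b))
```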
Penn Treebank tagset
Practice tagging:
Tagset of Nouns:
- Common nouns:
- singular NN child, book
- plural NNS children, books
- Proper nouns: NNP
- Pronoun (closed class)
- Personal pronoun: PRP I, him, me, we
- Possessive pronoun: PRP$ my his our
Tagset of Verbs
- Infinitive: VB (untensed verb, usually preceded by "to" or a modal)
(to go, to help)
- Tensed:
- MD modals (closed class): will, can, may..
- VBZ (3rd present singular, ends in -s):
she goes, runs, walks..
- VBP (present, non-3rd person singular):
we are, they have, you do, I feel
- VBD (past tense, regularly ends in -ed or -d):
we were, they had, wanted, ...
- VBG (present participle, ends in "-ing"):
going, being, running
Note: -ing forms can also be adjectives (interesting) or
nouns (building).
- VBN (past participle): sent, written, been.
Tagset of Adjectives and adverbs:
- Adjectives (modify nouns)
- JJ - interesting, yellow, difficult
- JJR - comparative form ending in -er (bigger)
- JJS - superlatives (most, earliest)
- Adverbs (modify things other than nouns: adjectives, verbs and adverbs):
- RB: quickly, fast, perhaps...
- RBR - comparative - faster, later
- RBS - superlative - fastest
Tagset of Prepositions and conjunctions:
- Prepositions
- TO the word to
- IN all others that are associated with noun, and
subordinating conjunctions (because, like, so...)
- Particle: RP - associated with verb
covered it up.
- Conjunction:
CC - coordinating conjunction
and, but, or, nor..
More tags:
- Possessive endings: POS 's (John's, students')
- Number - CD two, 152
- Determiner
- DT a, every, ...
- PDT - pre-determiner (preceding 'the')
- Wh-words
- WDT - which, that
- WP - who, whom, what
- WP$ - whose
- WRB - when, why, where, how
- Miscellaneous
- Existential "there": EX
there is no way.
- Interjection or exclamation: UH hey, oh, mmm
- foreign word - FW perestroika
- symbols: SYM 2*x=y
- list LS
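For practice, the tags listed above can be stored in a table and used to check hand-tagged text. A small sketch, assuming the common word/TAG notation for tagged text (the notation, the table PENN_TAGS and the function parse_tagged are choices made for this example; the table covers only the tags mentioned in these notes, not the full Penn Treebank tagset):

```python
# Tags mentioned in these notes, with short descriptions.
PENN_TAGS = {
    "NN": "common noun, singular", "NNS": "common noun, plural",
    "NNP": "proper noun", "PRP": "personal pronoun",
    "PRP$": "possessive pronoun", "VB": "verb, base form", "MD": "modal",
    "VBZ": "verb, 3rd person singular present",
    "VBP": "verb, non-3rd person singular present",
    "VBD": "verb, past tense", "VBG": "verb, present participle",
    "VBN": "verb, past participle", "JJ": "adjective",
    "JJR": "adjective, comparative", "JJS": "adjective, superlative",
    "RB": "adverb", "RBR": "adverb, comparative",
    "RBS": "adverb, superlative", "TO": "the word to",
    "IN": "preposition or subordinating conjunction", "RP": "particle",
    "CC": "coordinating conjunction", "POS": "possessive ending",
    "CD": "number", "DT": "determiner", "PDT": "pre-determiner",
    "WDT": "wh-determiner", "WP": "wh-pronoun",
    "WP$": "possessive wh-pronoun", "WRB": "wh-adverb",
    "EX": "existential there", "UH": "interjection",
    "FW": "foreign word", "SYM": "symbol", "LS": "list item marker",
}


def parse_tagged(text):
    """Split word/TAG tokens into pairs, rejecting tags outside the table."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        if not word or tag not in PENN_TAGS:
            raise ValueError(f"bad token: {token!r}")
        pairs.append((word, tag))
    return pairs


for word, tag in parse_tagged("My/PRP$ son/NN plays/VBZ the/DT guitar/NN"):
    print(word, tag, "-", PENN_TAGS[tag])
```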
Here's text to practice on.
HOW to tag?
- ensure people can reproduce tagging.
- check data
- needs some context - simple rules.
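The last point can be illustrated with the ambiguous word "plays". A deliberately crude sketch: every word first gets a default tag from a tiny hand-made lexicon, and then one context rule re-tags a plural noun as a verb when it directly follows a singular noun. The lexicon DEFAULT_TAG, the function simple_tag and the rule itself are assumptions for this example; the rule would fail on noun compounds such as "car doors".

```python
# Hypothetical default (most likely) tag for each known word.
DEFAULT_TAG = {
    "i": "PRP", "write": "VBP", "plays": "NNS",
    "my": "PRP$", "son": "NN", "the": "DT", "guitar": "NN",
}


def simple_tag(sentence):
    """Assign default tags, then apply one simple context rule."""
    words = sentence.lower().split()
    # Unknown words default to NN, a common fallback for open-class words.
    tags = [DEFAULT_TAG.get(w, "NN") for w in words]
    for i in range(1, len(words)):
        # Context rule: an -s form right after a singular noun is read as a verb.
        if tags[i] == "NNS" and tags[i - 1] == "NN":
            tags[i] = "VBZ"
    return list(zip(words, tags))


print(simple_tag("I write plays"))
print(simple_tag("My son plays the guitar"))
```

In the first sentence "plays" keeps the tag NNS; in the second the rule changes it to VBZ.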
For any question, contact me: yaeln@cs.bgu.ac.il
Last modified , 2000