General Intro to NLP
Structural Analysis of Text
Infinitely many possible utterances / finite means of processing.
Text is structured, but structure is not manifest.
Therefore, problems of ambiguity (determining the intended structure).
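Local vs global ambiguity: a local ambiguity is resolved later in the
same sentence; a global ambiguity remains after the whole sentence has
been read.
Problem of parsing a linear sequence of words into a constituent tree.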
The questions to address are:
Why is syntactic structure important?
- Structure determines meaning.
- Structure affects form (for example, agreement).
Structural vs Functional
Map meanings to linguistic forms
- What is the structural position of a constituent?
- What is the function of a constituent within a sentence, the discourse?
Language is intentional - used for a purpose
- Obtain info or services
- Provide info or services
- Just be in touch
- Argue for a conclusion
Several parallel levels to language
- Syntax: structure, grammar.
- Semantics: meaning, propositional content.
- Pragmatics: usage, speech-acts, presuppositions
- Textual: contribute to extended coherent text.
Representation of meaning
- Option 1: meaning = procedure - a set of instructions to achieve
what the speaker wants.
Simplest form:
- meaning of a command = procedure to carry out the action
- meaning of a question = procedure to find the answer
- meaning of a statement = procedure to update the hearer's beliefs
Problem: what is a procedure? Is it opaque?
- if yes: how can we explain refusal, or reflection on meaning?
- if no: how does it differ from a regular representation?
- Option 2: network-based representations.
nodes = concepts, links = relationships.
Facilitates certain types of inference (inheritance, propagation).
Intuitive.
- Option 3: logic-based representation. Compositionality principle
(Frege 1890, Montague).
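As a rough illustration of the network-based option, the sketch below stores concepts as nodes and labelled links as relationships, and answers a query by following "is_a" links upward (inheritance). The concept names, link names and the lookup function are all assumptions made for this example, not part of any particular system.

```python
from typing import Optional

# Toy semantic network: each node maps link labels to values or other nodes.
# All concept and link names are illustrative assumptions.
network = {
    "animal": {"can_move": "yes"},
    "bird": {"is_a": "animal", "can_fly": "yes"},
    "penguin": {"is_a": "bird", "can_fly": "no"},
    "canary": {"is_a": "bird", "color": "yellow"},
}


def lookup(concept: Optional[str], prop: str) -> Optional[str]:
    """Follow is_a links upward until the property is found (inheritance)."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)  # guard against cycles in the network
        links = network.get(concept, {})
        if prop in links:
            return links[prop]
        concept = links.get("is_a")
    return None


print(lookup("canary", "can_fly"))   # inherited from bird
print(lookup("penguin", "can_fly"))  # local value overrides the inherited one
print(lookup("canary", "can_move"))  # inherited from animal, via bird
```

The penguin case shows why such networks are intuitive but need a policy for exceptions: here the most specific node simply wins.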
Use of knowledge
"City officials refused the demonstrators a permit because"
--- they feared violence
--- they advocated violence
Resolving "they" is difficult with simple selectional markers alone.
We need to reason about the goals and plans of the participants to
generate expectations about meaning.
--- Excuse me, do you know if there's a Bank Hapoalim near here?
--- Today is Monday!
Why is he telling me this?
Regard utterances as actions: given certain preconditions, an utterance
will achieve an effect.
The intended plan is not marked in the syntax:
Can you pass the salt?
It's rather cold!
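The view of utterances as actions with preconditions and effects can be sketched in the style of a simple planning operator. Everything below is an assumption made for illustration: the SpeechAct class, the perform function and the proposition names are hypothetical, and real plan-based models of dialogue are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    """An utterance seen as an action (hypothetical, simplified model)."""
    name: str
    preconditions: frozenset
    effects: frozenset


def perform(act, state):
    """Return the new state if the preconditions hold, otherwise None."""
    if not act.preconditions <= state:
        return None
    return state | act.effects


# "Can you pass the salt?" read as a request, not as a yes/no question.
REQUEST_SALT = SpeechAct(
    name="request(pass_salt)",
    preconditions=frozenset({"speaker_wants_salt", "hearer_can_pass_salt"}),
    effects=frozenset({"hearer_knows_speaker_wants_salt"}),
)

state = frozenset({"speaker_wants_salt", "hearer_can_pass_salt"})
after = perform(REQUEST_SALT, state)
print(sorted(after))
```

Nothing in the surface form of the question selects this operator; the hearer has to infer it from the situation, which is the point of the examples above.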
Parts of Speech and Morphology
Syntactic analysis starts with the following questions:
- What is a word? (basic terminal unit in the parse tree)
Issues in tokenization.
- What is a sentence? (top level unit in the parse tree)
Sentence delimitation.
- What are the basic properties of words?
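The first two questions can be made concrete with a deliberately naive sketch. The regular expressions below are assumptions chosen for illustration; they fail on abbreviations, numbers, hyphenation and many other cases, which is exactly where the issues in tokenization and sentence delimitation arise.

```python
import re


def split_sentences(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Words (with internal apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


text = "It's rather cold! Can you pass the salt?"
for sentence in split_sentences(text):
    print(tokenize(sentence))

# The naive rule breaks on abbreviations: "Dr." is treated as a sentence.
print(split_sentences("Dr. Smith arrived."))
```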
Linguists define groups of words which "behave in a similar manner" in
syntactic contexts. The basic test is substitution:
A {large, small, blue, enormous, ...} box is in the room.
Basic classes are: verb, noun, adjective. These are called parts of
speech.
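The substitution test can be approximated mechanically: words that fill the same gap in otherwise identical sentences are grouped together. The tiny corpus and the function name below are assumptions for illustration only; on real text the frames would have to be much shorter than a whole sentence.

```python
from collections import defaultdict

# Hypothetical mini-corpus built around the example frame above.
sentences = [
    "a large box is in the room",
    "a small box is in the room",
    "a blue box is in the room",
    "a large table is in the room",
]


def substitution_classes(sentences):
    """Group words that occur in exactly the same frame (sentence with a gap)."""
    frames = defaultdict(set)
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            frame = tuple(words[:i]) + ("_",) + tuple(words[i + 1:])
            frames[frame].add(w)
    # Keep only frames in which at least two different words were observed.
    return [ws for ws in frames.values() if len(ws) > 1]


classes = substitution_classes(sentences)
for cls in classes:
    print(sorted(cls))
```

On this corpus the procedure finds one adjective-like group and one noun-like group, without being told anything about parts of speech.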
For the basic parts of speech we can provide a semantic interpretation:
- verb: action or state
- noun: entity (people, concepts, things)
- adjective: property
Distinguish:
- Open-class (many members, new members often added)
- Closed-class (few members, functional use): article, preposition...
Morphological features:
- Features of words: number, gender, tense, person, case
- Processes: Inflection, derivation, compounding.
Inflection is the process of modifying a root form by combining it with
prefixes and suffixes to indicate the presence of morphological features.
Derivation is the process of creating a new lexical item from an
existing, more basic one. For example, the derivation of the noun of
action "destruction" from the verb "destroy", or the chain of
derivations "luck" (noun), "lucky" (adjective), "luckily" (adverb).
Compounding is the process of combining several lexical items into a
new one, whose properties can be derived from the compounded elements,
or can be independent. For example, in Hebrew "beit sefer" (school) is
derived from "bayit" (house) and "sefer" (book).
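The difference between inflection and derivation can be sketched with toy English rules. The rules and function names below are assumptions for illustration: they cover only regular cases, and a real morphological analyzer needs a lexicon and exception lists. Compounding is left out because, as the Hebrew example shows, the combined elements may themselves change form.

```python
def inflect_noun(root, number):
    """Inflection: same lexical item, different value of the number feature."""
    if number == "singular":
        return root
    if root.endswith(("s", "x", "z", "ch", "sh")):
        return root + "es"
    return root + "s"


def noun_to_adjective(word):
    """Derivation noun -> adjective with the suffix -y (luck -> lucky)."""
    return word + "y"


def adjective_to_adverb(word):
    """Derivation adjective -> adverb with the suffix -ly (lucky -> luckily)."""
    if word.endswith("y"):
        return word[:-1] + "ily"
    return word + "ly"


print(inflect_noun("box", "plural"))
print(noun_to_adjective("luck"))
print(adjective_to_adverb(noun_to_adjective("luck")))
```

Inflection keeps the part of speech and fills in a feature; each derivation step produces a new lexical item, here with a new part of speech.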
For more information on morphology, refer to this excellent
Introduction to morphology.
Phrase Structure / Dependency
Constraints on word order.
Words occur in groups, with dependencies among them.
Constituent / Phrase.
Common Phrases:
- Noun Phrase (NP)
- Adjectival Phrase (AP)
- Verb Phrase (VP)
- Clause
Phrase Structure Grammars (PSG) / Context-Free Grammars (CFG)
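To connect context-free grammars with the ambiguity problem raised at the start, here is a minimal sketch of a chart parser in the CKY style that counts the constituent trees a toy grammar assigns to a sentence. The grammar, the lexicon and the function name are assumptions for this example; the grammar is written in Chomsky normal form (binary and lexical rules only) to keep the algorithm short.

```python
from collections import defaultdict

# Toy grammar: (parent, left child, right child). Illustrative only.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """chart[i][j][A] = number of trees for words[i:j] rooted in category A."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, []):
            chart[i][i + 1][cat] += 1
    for span in range(2, n + 1):            # width of the constituent
        for i in range(n - span + 1):       # start position
            j = i + span
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I saw the man".split()))
print(count_parses("I saw the man with the telescope".split()))
```

The second sentence gets two trees because the prepositional phrase can attach either to the verb phrase or to the noun phrase: a global ambiguity that syntax alone cannot resolve.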
Last modified Mar 18, 2007