Topics in Natural Language Processing
Assignment 1 - Spring 2007 - Michael Elhadad
Natural Language Processing - Assignment 1
This assignment includes 2 parts:
- Random Generation from Context Free Grammars
- Bottom-up Parsing with Semantics and Generation from Semantics
Random Generation - Limits of Context Free Grammars
The following files implement two solutions to the problem of generating
random sentences from a context-free grammar. They serve as the basis for
the assignment:
- generate1.scm: procedural representation.
- generate2.scm: explicit representation of
grammars with interpreter.
Question 1.1
Write an extension of the bigger grammar in generate2.scm that limits the depth
of the generated tree to 5 levels. In other words, the grammar is finite and
can only generate trees of depth up to 5. You should not change the
interpreter, but change the grammar (the rules).
Question 1.2
Write a new interpreter for grammars that receives as an additional
parameter the maximum depth of the tree to be generated.
Question 1.3
Write a new interpreter for grammars that generates exhaustively all the
sentences of a grammar in increasing order of tree depth. The interpreter
has a parameter n, for the maximum number of sentences to output, and
generates all the sentences with depth 1 first, then all the sentences of
depth 2, 3... until n sentences have been produced.
Question 1.4
Write a CFG grammar that can generate sentences with transitive and
intransitive verbs appropriately. For example:
John runs
John buys a pizza.
but not:
* John runs a pizza.
* John buys.
(Note: The * before a sentence is a convention in linguistics indicating
that it is not grammatical.)
Question 1.5
Write a grammar for Hebrew with the following features:
- Coverage similar to that of *bigger-grammar*
- (Prep -> le be al im)
- (Adj -> katan gadol cahol tsahov)
- (Article -> ha)
- (Name -> Avraham Yitzhak Yaakov)
- (Noun -> shulhan cise luah iparon)
- (Verb -> lakah raa ahav ratz)
- (Pronoun -> hu hi ze)
The grammar must work properly with definite and indefinite noun phrases:
Avraham lakah et ha shulhan ha katan
Avraham lakah shulhan katan
* Avraham lakah et ha shulhan katan
* Avraham lakah et shulhan ha katan
Bottom-up Parsing with Semantics
This question builds on the bottom-up parser studied in class:
Norvig's Common Lisp implementation of a bottom-up
parser with semantics.
Question 2.1: Extended semantic grammar
Imagine an interface to a dual tape recorder. The CD player only had one
verb, "play", left implicit in the semantics (which was an ordered list of
tracks to play). The new unit contains three devices - CD, Tape1 and Tape2
- and it can do three types of operations:
- Play: from one of the devices
- Record: from one of the devices to one of the tapes
- Erase: from one of the tapes
The unit also contains a timer to determine when to start an operation.
The meaning of an expression in this case will be a program with the
following predicates:
- (play <list of tracks> :from <unit>)
- (record <list of tracks> :from <unit> :to <unit>)
- (erase <list of tracks> :from <unit>)
Assume that the :from and :to parameters have appropriate defaults.
In addition, the timer parameter can be added to all the predicates, as
follows:
> (meaning '(play 1 to 5 from CD and record 1 to 5 from CD at 300))
(PROGN (play '(1 2 3 4 5) :from 'cd)
(record '(1 2 3 4 5) :from 'cd :at 300))
Extend the grammar *grammar6*
to support this new situation.
You may have to change the support function meaning
as well.
Question 2.2: Generate from Semantics
In this question we want to use the same grammar *grammar6*
to
generate sentences from their semantics. In this case, the original
semantics is a sequence of track numbers. Write the following function:
(generate-all <list-of-number> <depth>)
The function must work in 3 stages to operate correctly:
- Content organization: turn the list of numbers into a
list of structured representations using the following semantic
predicates:
(sequence <from> <to>)
(set-diff <set1> <set2>)
(set-union <set1> <set2>)
Only the structures of depth less than or equal to <depth> will
be returned.
- Realization: turn the structured representation into a parse tree
using the rules of the grammar.
- Linearization: turn the parse tree into a linear string.
Accordingly write the following functions:
-
(structure <list-of-numbers> <depth>)
Given a list of numbers, return all the possible semantic structures
using the 3 predicates listed above and of maximum depth
<depth>
and that evaluate to the desired list of
numbers.
-
(realize <semantic-structure>)
Given a semantic structure, turn it into a parse tree according to
the rules of *grammar6*
.
-
(linearize <parse-tree>)
Given a parse tree, return an ordered list of words.
Finally, write a function:
(generate-best <list-of-number> <scoring-function>)
which returns the "best" sentence for a given semantics according to the
scoring function. For example, one scoring function can count the number
of words (shortest sentence is best) or the syntactic complexity (minimum
number of constituents). Experiment to find a scoring function that will
eliminate semantically bizarre sentences (for example, those where the same
numbers are repeated).
The end.
Last modified Mar 18th, 2007