Natural Language Processing Spring 07 - Assignment 1

Topics in Natural Language Processing
Assignment 1 - Spring 2007 - Michael Elhadad

Natural Language Processing - Assignment 1

This assignment includes 2 parts:

Random Generation from Context Free Grammars
Bottom-up Parsing with Semantics and Generation from Semantics

Random Generation - Limits of Context Free Grammars

The following files implement two solutions to the problem of generating random sentences from a context-free grammar. They serve as the basis for the assignment:

generate1.scm: procedural representation.
generate2.scm: explicit representation of grammars with interpreter.

Question 1.1

Write an extension of the bigger grammar in generate2.scm that limits the depth of the generated tree to 5 levels. In other words, the grammar is finite and can only generate trees of depth up to 5. You should not change the interpreter, but change the grammar (the rules).

Question 1.2

Write a new interpreter for grammars that receives as an additional parameter the maximum depth of the tree to be generated.

Question 1.3

Write a new interpreter for grammars that generates exhaustively all the sentences of a grammar in increasing order of tree depth. The interpreter has a parameter n, for the maximum number of sentences to output, and generates all the sentences with depth 1 first, then all the sentences of depth 2, 3... until n sentences have been produced.

Question 1.4

Write a CFG grammar that can generate sentences with transitive and intransitive verbs appropriately. For example:

John runs

John buys a pizza.

but not:

* John runs a pizza.

* John buys.

(Note: The * before a sentence is a convention in linguistics indicating that it is not grammatical.)

Question 1.5

Write a grammar for Hebrew with the following features:

Coverage similar to that of *bigger-grammar*
(Prep -> le be al im)
(Adj -> katan gadol cahol tsahov)
(Article -> ha)
(Name -> Avraham Yitzhak Yaakov)
(Noun -> shulhan cise luah iparon)
(Verb -> lakah raa ahav ratz)
(Pronoun -> hu hi ze)

The grammar must work properly with definite and indefinite noun phrases:

Avraham lakah et ha shulhan ha katan

Avraham lakah shulhan katan

* Avraham lakah et ha shulhan katan

* Avraham lakah et shulhan ha katan

Bottom-up Parsing with Semantics

This question builds on the bottom-up parser studied in class: Norvig's Common Lisp implementation of a bottom-up parser with semantics.

Question 2.1: Extended semantic grammar

Imagine an interface to a dual tape recorder. The CD player only had one verb, "play", left implicit in the semantics (which was an ordered list of tracks to play). The new unit contains three devices - CD, Tape1 and Tape2 - and it can do three types of operations:

Play: from one of the devices
Record: from one of the devices to one of the tapes
Erase: from one of the tapes

The unit also contains a timer to determine when to start an operation. The meaning of an expression in this case will be a program with the following predicates:

(play <list of tracks> :from <unit>)
(record <list of tracks> :from <unit> :to <unit>)
(erase <list of tracks> :from <unit>)

Assume that the :from and :to parameters have appropriate defaults. In addition, the timer parameter can be added to all the predicates, as follows:

> (meaning '(play 1 to 5 from CD and record 1 to 5 from CD at 300))

(PROGN (play '(1 2 3 4 5) :from 'cd)
       (record '(1 2 3 4 5) :from 'cd :at 300))

Extend the grammar *grammar6* to support this new situation. You may have to change the support function meaning as well.

Question 2.2: Generate from Semantics

In this question we want to use the same grammar *grammar6* to generate sentences from their semantics. In this case, the original semantics is a sequence of track numbers. Write the following function:

(generate-all <list-of-number> <depth>)

The function must work in 3 stages to operate correctly:

Content organization: turn the list of numbers into a list of structured representations using the following semantic predicates:
```
     (sequence <from> <to>)
     (set-diff <set1> <set2>)
     (set-union <set1> <set2>)
     
```
Only the structures of depth less than or equal to <depth> will be returned.
Realization: turn the structured representation into a parse tree using the rules of the grammar.
Linearization: turn the parse tree into a linear string.

Accordingly write the following functions:

```
(structure <list-of-numbers> <depth>)
```
Given a list of numbers, return all the possible semantic structures using the 3 predicates listed above and of maximum depth <depth> and that evaluate to the desired list of numbers.
```
(realize <semantic-structure>)
```
Given a semantic structure, turn it into a parse tree according to the rules of *grammar6*.
```
(linearize <parse-tree>)
```
Given a parse tree, return an ordered list of words.

Finally, write a function:

(generate-best <list-of-number> <scoring-function>)

which returns the "best" sentence for a given semantics according to the scoring function. For example, one scoring function can count the number of words (shortest sentence is best) or the syntactic complexity (minimum number of constituents). Experiment to find a scoring function that will eliminate semantically bizarre sentences (for example, those where the same numbers are repeated).

The end.

Last modified Mar 18th, 2007