Yoav Goldberg

Hebrew NP Chunking

August 2005

NP chunking is the task of labelling noun phrases in natural-language text. The input is free text annotated with part-of-speech (POS) tags, which indicate nouns, adjectives, and so on; the output is the same text with brackets around each base noun phrase. A base noun phrase is an NP that does not contain another NP, i.e. it is not recursive.
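To make the input and output concrete, here is a minimal Python sketch of one possible representation: a sentence as a list of (word, tag) pairs and base NPs as flat, non-overlapping token spans. The toy English sentence and its Penn-Treebank-style tags are assumptions chosen for readability (the Hebrew tag set differs), and the helper names `is_flat` and `bracket` are hypothetical, not part of the original system.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); tags here are illustrative only
Span = Tuple[int, int]   # base-NP span as token indices, end exclusive


def is_flat(spans: List[Span]) -> bool:
    """True if every span is non-empty and no span overlaps or contains another.

    Base NPs are non-recursive, so a valid chunking is a flat set of spans.
    """
    ordered = sorted(spans)
    if any(start >= end for start, end in ordered):
        return False
    return all(a_end <= b_start for (_, a_end), (b_start, _) in zip(ordered, ordered[1:]))


def bracket(tokens: List[Token], spans: List[Span]) -> str:
    """Render the words of a sentence with square brackets around each base NP."""
    if not is_flat(spans):
        raise ValueError("base NP spans must not be empty, overlap, or nest")
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    out = []
    for i, (word, _tag) in enumerate(tokens):
        if i in starts:
            word = "[" + word
        if i + 1 in ends:  # this token is the last one inside a span
            word = word + "]"
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    sentence = [("the", "DT"), ("boy", "NN"), ("ate", "VB"),
                ("a", "DT"), ("red", "JJ"), ("apple", "NN")]
    print(bracket(sentence, [(0, 2), (3, 6)]))  # [the boy] ate [a red apple]
```

Representing chunks as spans rather than as nested brackets makes the "not recursive" constraint easy to check: a chunking is valid exactly when its spans are disjoint.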

The standard definition of base NPs has to be adapted for Hebrew so that smixut (the construct state, a noun-noun construction such as beit sefer, "school", literally "house of book") is handled correctly.

This system was trained on 5,000 sentences manually parsed at the Knowledge Center for Hebrew Processing (the treebank is documented in both Hebrew and English). On a test set of about 15,000 NPs, it achieves about 83% precision and 88% recall.
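Precision is the fraction of predicted NPs that are correct, and recall is the fraction of gold-standard NPs that were found. The Python sketch below assumes exact-match span scoring (an NP counts as correct only if both boundaries match the treebank); the original evaluation procedure is not stated here, so that criterion is an assumption. It also combines the two reported figures into an F-measure. Because 83% and 88% are rounded, the result (roughly 85.4%) is only approximate and is not a figure reported by the original work. The function names are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # base-NP span as token indices, end exclusive


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span evaluation (assumed criterion): both boundaries must match."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 2), (3, 6), (7, 8)}
    predicted = {(0, 2), (3, 5), (7, 8)}  # one NP has a wrong right boundary
    print(precision_recall_f1(gold, predicted))
    # Approximate F-measure implied by the rounded figures above.
    print(round(f1_from(0.83, 0.88), 3))
```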

The method learns a set of part-of-speech patterns that covers the training set. The best results were achieved with about 1,800 learned patterns.
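The idea of POS patterns can be illustrated with a deliberately simple Python sketch: collect the tag sequence of every gold base NP in the training data, then chunk new text by greedily taking, at each position, the longest tag sequence that matches a learned pattern. This is not the author's algorithm. The page does not say how patterns are learned, scored, or pruned down to the roughly 1,800 used, and the greedy longest-match strategy, toy tags, and function names (`learn_patterns`, `chunk`) are all assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]    # (word, POS tag); tags here are illustrative only
Span = Tuple[int, int]     # base-NP span as token indices, end exclusive
Pattern = Tuple[str, ...]  # a sequence of POS tags


def learn_patterns(corpus: Iterable[Tuple[List[Token], List[Span]]]) -> Set[Pattern]:
    """Collect the POS-tag sequence of every gold base NP (no scoring or pruning)."""
    patterns: Set[Pattern] = set()
    for tokens, spans in corpus:
        tags = [tag for _, tag in tokens]
        for start, end in spans:
            patterns.add(tuple(tags[start:end]))
    return patterns


def chunk(tokens: List[Token], patterns: Set[Pattern]) -> List[Span]:
    """Greedy left-to-right chunking: at each position take the longest matching pattern."""
    tags = [tag for _, tag in tokens]
    max_len = max((len(p) for p in patterns), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tags):
        for length in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + length]) in patterns:
                spans.append((i, i + length))
                i += length
                break
        else:
            i += 1  # no pattern starts here; leave this token outside any NP
    return spans


if __name__ == "__main__":
    train = [([("the", "DT"), ("boy", "NN"), ("ate", "VB"),
               ("a", "DT"), ("red", "JJ"), ("apple", "NN")], [(0, 2), (3, 6)])]
    patterns = learn_patterns(train)
    test = [("a", "DT"), ("big", "JJ"), ("dog", "NN"),
            ("saw", "VB"), ("the", "DT"), ("cat", "NN")]
    print(chunk(test, patterns))  # [(0, 3), (4, 6)]
```

Keeping every observed pattern maximizes coverage of the training set but tends to hurt precision, since some tag sequences are NPs in one context and not in another; selecting a subset of patterns, as the reported 1,800 suggests, is one way to trade coverage against precision.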




Last modified September 1st 2005, Yoav Goldberg