
Automatic Linguistic Analysis of Hebrew Text

  • Project number: 202-08-09
  • Students: Meni Adler, Yoav Goldberg
  • Supervisor: Michael Elhadad

Natural language processing (NLP) is a subfield of artificial intelligence that studies the automated generation and understanding of human languages. Processing Hebrew is particularly difficult. The language has a very rich morphology, and it is highly ambiguous: short vowels are usually omitted in writing, and particles such as prepositions, conjunctions, and the definite article attach to words as prefixes, so a single written token often admits several analyses. We present a Hebrew text analysis system that combines several algorithms and models and exploits the special characteristics of the Hebrew language. Given a Hebrew text, the system assigns a full set of morphological features to each word, extracts noun phrases, and recognizes named entities (persons, locations, organizations, and temporal and numeric expressions). A fully operational version of the system is available online at: http://www.cs.bgu.ac.il/~nlpproj/demo.
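The prefix ambiguity described above can be illustrated with a small sketch. This is not the project's analyzer. The lexicon, the glosses, and the function `analyses` are hypothetical, and the sketch ignores the real constraints on which Hebrew proclitics may combine and in what order. It enumerates every way to split a written token into a run of single-letter prefixes followed by a known stem. For example, בצל comes out both as the single word "onion" and as ב+צל, "in shadow", while הקפה is either "circuit" or ה+קפה, "the coffee".

```python
# Toy illustration of Hebrew prefix ambiguity.
# Hypothetical data: not the project's lexicon or analyzer.

# Single-letter proclitics and rough English glosses.
PREFIXES = {
    "ו": "and",
    "ה": "the",
    "ב": "in",
    "ל": "to",
    "מ": "from",
    "ש": "that",
    "כ": "as",
}

# Hypothetical mini-lexicon of stems with glosses.
LEXICON = {
    "בצל": "onion",
    "צל": "shadow",
    "הקפה": "circuit",
    "קפה": "coffee",
    "בית": "house",
}


def analyses(token: str) -> list[tuple[tuple[str, ...], str]]:
    """Return every (prefix glosses, stem) split of `token` whose stem is in LEXICON.

    Simplification: any sequence of prefix letters is accepted; real Hebrew
    restricts the order and combination of proclitics.
    """
    results = []
    # i is the prefix length; stopping at len(token) - 1 keeps the stem non-empty.
    for i in range(len(token)):
        prefix, stem = token[:i], token[i:]
        if all(ch in PREFIXES for ch in prefix) and stem in LEXICON:
            results.append((tuple(PREFIXES[ch] for ch in prefix), stem))
    return results


for word in ["בצל", "הקפה", "ובבית"]:
    print(word, analyses(word))
```

A real analyzer must also handle vowel-less spellings, inflectional suffixes, and features such as gender, number, and tense, which is why the number of candidate analyses per token grows well beyond what this toy shows.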

The system uses machine learning methods to acquire knowledge of the Hebrew language from statistical analysis of large quantities of text. The project also advances our understanding of Hebrew linguistic phenomena by providing access to a carefully annotated corpus of Hebrew texts.
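As a minimal illustration of learning from counted text, the sketch below implements the simplest statistical baseline. For each token, it picks whichever candidate analysis was seen most often in an annotated sample. This is an assumption made for illustration, not the method used by the system, which this page does not detail. The toy corpus, the `+` notation for prefix boundaries, and the functions `train` and `disambiguate` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotated sample: (surface token, gold analysis),
# with "+" marking a prefix boundary.
CORPUS = [
    ("בצל", "בצל"),
    ("בצל", "בצל"),
    ("בצל", "ב+צל"),
    ("הקפה", "ה+קפה"),
]


def train(annotated):
    """Count how often each analysis was chosen for each surface token."""
    counts = defaultdict(Counter)
    for token, analysis in annotated:
        counts[token][analysis] += 1
    return counts


def disambiguate(counts, token, candidates):
    """Return the candidate seen most often with `token`.

    Unseen candidates count as zero, so ties (including a never-seen token)
    resolve to the earliest candidate in the list. Returns None if there
    are no candidates.
    """
    if not candidates:
        return None
    seen = counts.get(token, Counter())
    return max(candidates, key=lambda analysis: seen[analysis])


model = train(CORPUS)
print(disambiguate(model, "בצל", ["בצל", "ב+צל"]))
print(disambiguate(model, "הקפה", ["הקפה", "ה+קפה"]))
```

Because it ignores context, this baseline always gives the same answer for a given token and cannot resolve cases where the correct analysis depends on the neighbouring words. Practical taggers therefore condition on context, for example with sequence models.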