Hebrew Named Entity Recognition (NER)

About NER

HebrewNER is a named entity recognition package for Hebrew. Named entity recognition (NER) is a form of information extraction in which we seek to classify every word in a document as a person name, organization, location, date, time, monetary value, percentage, or “none of the above”. The task matters largely because of the commercial potential of an accurate NER system. Such a system could contribute to many areas, including more accurate search engines, machine translation, and the general organization and indexing of documents and books. NER is also a foundation for more complex information extraction tasks.
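To make the task concrete, here is a minimal sketch of word-level NER output, assuming a simple representation in which each token is paired with exactly one label. The `Label` enum, the `entities` helper, and the example sentence ("Moshe Cohen works at Bank Leumi in Tel Aviv") are illustrative assumptions, not HebrewNER's actual output format.

```python
from enum import Enum

class Label(Enum):
    """One label per category named above (hypothetical names)."""
    PERSON = "person"
    ORGANIZATION = "organization"
    LOCATION = "location"
    DATE = "date"
    TIME = "time"
    MONEY = "money"
    PERCENT = "percent"
    NONE = "none"  # "none of the above"

# Hypothetical example: "Moshe Cohen works at Bank Leumi in Tel Aviv".
tagged = [
    ("משה", Label.PERSON),
    ("כהן", Label.PERSON),
    ("עובד", Label.NONE),
    ("בבנק", Label.ORGANIZATION),
    ("לאומי", Label.ORGANIZATION),
    ("בתל", Label.LOCATION),
    ("אביב", Label.LOCATION),
]

def entities(tagged_tokens):
    """Keep only the tokens that received a named-entity label."""
    return [(tok, lab) for tok, lab in tagged_tokens if lab is not Label.NONE]
```

In this sketch a multi-word entity such as בתל אביב is simply labeled token by token; a practical system also needs a way to mark where one entity ends and the next begins (for example BIO-style tags), which is left out here.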

In European languages the problem, at least on the surface, does not seem very complicated, since most named entities start with a capital letter. Hebrew, however, has no capital letters, which makes the problem considerably harder. Further difficulties arise from the high degree of ambiguity in Hebrew. Many other features unique to the Hebrew language and culture can also affect named entity recognition, for example smihut (the construct state), the Hebrew calendar, and agglutination.
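Two of these points can be illustrated with a short sketch. The capitalization cue that English NER systems lean on is always false for Hebrew script, and agglutination means a single token may hide a prefix plus a name. `is_capitalized` and `prefix_splits` are hypothetical helpers; `prefix_splits` is a deliberately naive enumeration of prefix/stem segmentations using the single-letter prefixes ו, ה, ב, ל, מ, ש, כ, and it is not how HebrewNER segments words.

```python
# Single-letter prefixes that can attach to a Hebrew word:
# ו "and", ה "the", ב "in", ל "to", מ "from", ש "that", כ "as".
PREFIX_LETTERS = set("והבלמשכ")

def is_capitalized(word: str) -> bool:
    """The usual English NER cue; Hebrew letters have no case, so this is always False for them."""
    return word[:1].isupper()

def prefix_splits(word: str, max_prefix_len: int = 3, min_stem_len: int = 2):
    """Enumerate candidate (prefix, stem) segmentations of a token.

    Naive sketch: every leading run of prefix letters is treated as a
    possible prefix, so many of the candidates are spurious.
    """
    splits = [("", word)]  # the unsegmented reading is always a candidate
    for i in range(1, min(max_prefix_len, len(word) - min_stem_len) + 1):
        if word[i - 1] not in PREFIX_LETTERS:
            break  # the prefix run has ended
        splits.append((word[:i], word[i:]))
    return splits
```

The naive enumeration also exposes the ambiguity problem: the name משה (Moshe) yields the candidate מ + שה ("from a lamb"), and ובבית yields the spurious ובב + ית alongside the correct וב + בית, so a real system has to choose among candidates using context or a lexicon.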

To address these issues, we used maximum entropy modeling. Our system builds a statistical model that estimates the probability that each word belongs to each of the categories listed above.
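A maximum entropy classifier has the same form as multinomial logistic regression: P(c | x) = exp(Σ_i λ_i f_i(x, c)) / Z(x), where the f_i are feature functions over a word's context and a candidate label, the λ_i are learned weights, and Z(x) normalizes over all labels. The following is a minimal sketch of that model, not HebrewNER's implementation. The feature templates, the reduced label set, the toy training sentences, and the `MaxEnt` class are all hypothetical, and the weights are fit with plain batch gradient ascent on the L2-regularized log-likelihood rather than with iterative scaling (GIS/IIS) or L-BFGS, which MaxEnt toolkits commonly use.

```python
import math
from collections import defaultdict

# Hypothetical reduced label set for the sketch.
LABELS = ["PERSON", "ORGANIZATION", "LOCATION", "NONE"]

def features(tokens, i):
    """Hypothetical context features for token i (not HebrewNER's feature set)."""
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<S>"
    return ["word=" + word, "prev=" + prev, "first=" + word[:1], "bias"]

class MaxEnt:
    def __init__(self, labels):
        self.labels = labels
        # One weight (lambda) per (feature, label) pair; unseen pairs default to 0.
        self.w = defaultdict(float)

    def probs(self, feats):
        # score(c) = sum of lambdas over active (feature, c) pairs; P(c|x) = exp(score) / Z(x)
        scores = {c: sum(self.w[(f, c)] for f in feats) for c in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def train(self, data, epochs=300, lr=0.5, l2=0.01):
        """data: list of (features, gold_label); batch gradient ascent on the average log-likelihood."""
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0 / n      # observed feature count
                    for c in self.labels:
                        grad[(f, c)] -= p[c] / n    # feature count expected by the model
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] - l2 * self.w[key])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

# Toy training data (hypothetical): tokens with gold labels.
train_sents = [
    (["משה", "כהן", "עובד", "בבנק", "לאומי"],
     ["PERSON", "PERSON", "NONE", "ORGANIZATION", "ORGANIZATION"]),
    (["דנה", "גרה", "בחיפה"], ["PERSON", "NONE", "LOCATION"]),
]
data = [(features(toks, i), tags[i]) for toks, tags in train_sents for i in range(len(toks))]

model = MaxEnt(LABELS)
model.train(data)
```

After training, `model.probs(features(tokens, i))` returns the estimated distribution over labels for token i, and `model.predict(...)` returns its most probable label. Because the model only multiplies weights by feature indicators, richer cues (gazetteer membership, prefix candidates, calendar words) can be added as extra feature strings without changing the training code.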