About NER
HebrewNer is a named entity recognition (NER) package for Hebrew. Named entity
recognition is a form of information extraction in which we seek to classify
every word in a document as a person name, organization, location, date, time,
monetary value, percentage, or “none of the above”. The task matters largely
because of the commercial potential of an accurate named entity recognition
system. Such a system could benefit many areas, including more accurate search
engines, machine translation, and the general organization and indexing of
documents and books. Named entity recognition is also a foundation for more
complex information extraction tasks.

In European
languages the problem, at least on the surface, does not seem very
complicated, since most named entities start with a capital letter. Hebrew,
however, has no capital letters, which makes the problem considerably harder.
Further difficulties arise from the high degree of ambiguity in Hebrew. Many
other properties unique to the Hebrew language and culture may also affect
named entity recognition, for example smihut (the construct state), the Hebrew
calendar, and agglutination (the definite article and some prepositions and
conjunctions attach directly to the following word as prefixes).

To address these issues, we used maximum entropy, a probabilistic modeling
technique. Our system builds a statistical model that estimates, for every
word, the probability that it belongs to each of the categories listed above.
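The model above is described only at a high level. As an illustration, a
conditional maximum entropy classifier scores each category y for a word in
context x with binary feature functions f_i(x, y) and learned weights λ_i, so
that p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where Z(x) normalizes over all
categories (the same form as multinomial logistic regression). The sketch below
is a minimal, hypothetical version of such a classifier. The feature set (the
word, the word with one possible prefix letter stripped, neighboring words, a
digit cue), the label names, the toy sentences and the stochastic gradient
trainer are all our own assumptions, not HebrewNer's actual features, data or
training algorithm (maxent toolkits commonly use GIS, IIS or L-BFGS instead).

```python
import math
from collections import defaultdict

# Label names chosen for this sketch, one per category listed above.
LABELS = ["PERSON", "ORGANIZATION", "LOCATION", "DATE",
          "TIME", "MONEY", "PERCENT", "NONE"]

# Single-letter Hebrew prefixes (ve-, ha-, be-, le-, mi-, she-, ke-).
# Illustrative only; real prefix segmentation is more involved.
PREFIXES = "והבלמשכ"


def features(tokens, i):
    """Hypothetical binary context features for token i."""
    w = tokens[i]
    feats = ["bias", "w=" + w]
    # Agglutination: also expose the word with one possible prefix removed.
    if len(w) > 2 and w[0] in PREFIXES:
        feats.append("stem=" + w[1:])
    feats.append("prev=" + (tokens[i - 1] if i > 0 else "<s>"))
    feats.append("next=" + (tokens[i + 1] if i + 1 < len(tokens) else "</s>"))
    # Hebrew has no capitalization cue, so rely on other surface cues.
    feats.append("has_digit" if any(c.isdigit() for c in w) else "no_digit")
    return feats


class MaxEnt:
    """Conditional maxent classifier: p(y|x) is proportional to exp(score)."""

    def __init__(self, labels):
        self.labels = labels
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def train(self, data, epochs=100, lr=0.1, l2=0.001):
        """Stochastic gradient ascent on the conditional log-likelihood,
        with a small L2 penalty on the weights touched by each example."""
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    # Observed feature value minus its expected value.
                    g = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        key = (f, y)
                        self.w[key] += lr * (g - l2 * self.w[key])


def predict(model, tokens):
    """Label each token independently with its most probable category."""
    out = []
    for i in range(len(tokens)):
        p = model.probs(features(tokens, i))
        out.append(max(p, key=p.get))
    return out


# Toy training data, invented for illustration.
TRAIN = [
    (["דוד", "גר", "בירושלים"], ["PERSON", "NONE", "LOCATION"]),
    (["משה", "נסע", "לחיפה"], ["PERSON", "NONE", "LOCATION"]),
    (["הבנק", "נפתח", "ב-1902"], ["NONE", "NONE", "DATE"]),
]

data = [(features(toks, i), tags[i])
        for toks, tags in TRAIN for i in range(len(toks))]
model = MaxEnt(LABELS)
model.train(data)
print(predict(model, ["דוד", "נסע", "לחיפה"]))
```

This sketch labels each word independently; a practical tagger would usually
also take neighboring labels into account, for example with beam search or a
sequence model over the per-word probabilities.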