Summarization Using Lexical Chains


Summarization proceeds in four steps:


Lexical chain construction

Generic algorithm

Lexical cohesion is a frequent surface property of text (Halliday, Hasan). This property makes it possible to compute lexical chains, which we define as groups of semantically related words. Several algorithms for computing chains have been presented recently: Morris & Hirst (1995), Stairmand (1996), Barzilay & Elhadad (1997). A general algorithm for computing a chain can be presented in the following way:

The parameters of the algorithm are the definition of candidate words, the relatedness criterion, and the choice of the appropriate chain for a new candidate word. By assigning different values to these parameters, all existing chaining algorithms can be produced.
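The three parameters can be pictured as pluggable functions passed to one generic loop. The following is a minimal sketch under that assumption, not the implementation evaluated here: the names build_chains, is_candidate, related and choose_chain are hypothetical, and the toy relatedness table stands in for a real criterion such as WordNet relations.

```python
from typing import Callable, List

Chain = List[str]


def build_chains(
    words: List[str],
    is_candidate: Callable[[str], bool],
    related: Callable[[str, str], bool],
    choose_chain: Callable[[str, List[Chain]], Chain],
) -> List[Chain]:
    """Generic chaining loop driven by three pluggable parameters."""
    chains: List[Chain] = []
    for word in words:
        # Parameter 1: which words are candidates at all.
        if not is_candidate(word):
            continue
        # Parameter 2: which existing chains the word is related to.
        matching = [c for c in chains if any(related(word, m) for m in c)]
        if matching:
            # Parameter 3: which of the related chains receives the word.
            choose_chain(word, matching).append(word)
        else:
            # No related chain exists, so the word starts a new one.
            chains.append([word])
    return chains


# Toy relatedness table (hypothetical, for illustration only).
TOY_RELATED = {
    frozenset(("car", "vehicle")),
    frozenset(("car", "wheel")),
    frozenset(("bank", "money")),
}


def toy_related(a: str, b: str) -> bool:
    return a == b or frozenset((a, b)) in TOY_RELATED


chains = build_chains(
    ["car", "bank", "vehicle", "money", "runs", "wheel"],
    is_candidate=lambda w: w != "runs",   # stand-in for "nouns only"
    related=toy_related,
    choose_chain=lambda w, cs: cs[-1],    # most recently created related chain
)
print(chains)
```

Swapping any of the three callables changes the chaining behavior without touching the loop, which is the sense in which different parameter values yield different chaining algorithms.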


Our algorithm is obtained with the following parameter settings:

We verified our parameter settings through an extensive empirical evaluation of the lexical chainer, measuring how successfully it disambiguates nouns in context. The evaluation method measures to what extent our implementation of the lexical chaining algorithm accomplishes this task. We use this measure to compare different parameter settings for the generic algorithm.

Evaluation Data
The evaluation data consists of 37 scientific articles from the Semantic ConCordance corpus (SemCor). The words in SemCor are manually tagged with word senses from WordNet.
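As a rough illustration of a disambiguation measure over sense-tagged data, the sketch below scores a chainer by the fraction of nouns whose chosen sense matches the manual tag. The function name, the token-index keys and the sense labels are hypothetical, and restricting the score to nouns the chainer actually labeled is an assumption, not necessarily the exact procedure used in the evaluation.

```python
from typing import Dict


def disambiguation_ratio(predicted: Dict[int, str], gold: Dict[int, str]) -> float:
    """Fraction of labeled nouns whose predicted sense equals the gold sense.

    Keys are token positions; values are sense labels.
    Only positions present in both mappings are scored (an assumption).
    """
    scored = [pos for pos in gold if pos in predicted]
    if not scored:
        return 0.0
    correct = sum(1 for pos in scored if predicted[pos] == gold[pos])
    return correct / len(scored)


# Made-up sense labels; real data would carry WordNet senses from SemCor.
gold = {0: "bank#1", 1: "interest#4", 2: "rate#2", 3: "loan#1"}
predicted = {0: "bank#1", 1: "interest#1", 2: "rate#2"}

ratio = disambiguation_ratio(predicted, gold)
print(f"{ratio:.2f}")
```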


Segmentation strategy

The issues to be verified:

Evaluation procedure:
We compare four strategies: chaining the whole text without division, Hearst segmentation, paragraph division, and random division.
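Three of the four division strategies are simple enough to sketch directly; Hearst segmentation is omitted because it needs a full topic-segmentation algorithm. The function names and the seed parameter below are hypothetical, and the sketch assumes the text is already split into sentences and paragraphs.

```python
import random
from typing import List


def whole_text(sentences: List[str]) -> List[List[str]]:
    """No division: the whole text is one segment."""
    return [list(sentences)]


def by_paragraph(paragraphs: List[List[str]]) -> List[List[str]]:
    """Paragraph division: each non-empty paragraph is one segment."""
    return [list(p) for p in paragraphs if p]


def random_division(sentences: List[str], n_segments: int, seed: int = 0) -> List[List[str]]:
    """Random division: cut the sentence sequence at random boundaries."""
    rng = random.Random(seed)
    # Clamp so that every segment can hold at least one sentence.
    n_segments = max(1, min(n_segments, len(sentences)))
    cuts = sorted(rng.sample(range(1, len(sentences)), n_segments - 1))
    bounds = [0] + cuts + [len(sentences)]
    return [sentences[a:b] for a, b in zip(bounds, bounds[1:])]


sentences = [f"s{i}" for i in range(10)]
segments = random_division(sentences, 3, seed=1)
print([len(seg) for seg in segments])
```

Each strategy returns a list of segments, so the same chainer can be run per segment and the resulting disambiguation scores compared across strategies.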

Results:

Lessons learned:


Dynamic vs. greedy strategy

The issue to be verified:

Results:

The disambiguation measure shows a significant advantage of the dynamic strategy over the greedy one.


Candidate terms

We examine how adding noun compounds to the candidate words, rather than using single nouns only, affects chaining.

Results:

Lessons learned:
Contrary to our assumption, adding noun compounds does not improve the disambiguation ratio.


Conclusions:

Our evaluation confirmed the following parameter settings as the most appropriate:

Summary Evaluation

The goal of the experiment is to measure how similar the summaries of the lexical-chain-based summarizer are to human summaries. To study agreement among human subjects, 40 documents were selected; for each document, 10 summaries were constructed by 5 human subjects using sentence extraction. Each subject constructed 2 summaries of a document: one at 10% length and the other at 20%. For convenience, the length percentage was computed in terms of number of sentences.

We also chose 2 additional automatic summarizers, the Microsoft summarizer and a discourse-structure-based summarizer, and ran them on the 40 documents to generate 10% and 20% summaries for each document. In total, 16 summaries were produced for each document (10 human and 6 automatic).

The documents were selected from the TREC collection. They are news articles on computers, terrorism, hypnosis and nuclear treaties. The average length of the articles is 30 sentences.

We measured agreement among human subjects using percent agreement, a metric defined by Gale92. The results show high agreement:

Length   Avg. Agreement   Max    Min
10%      96%              100%   87%
20%      90%              100%   83%
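Percent agreement is commonly read as the ratio of observed agreements with the majority opinion to possible agreements with the majority opinion. The sketch below applies that reading to sentence extraction, treating each subject's choice for each sentence as an in-or-out decision. The function name and the representation of a summary as a set of sentence indices are assumptions made for illustration.

```python
from typing import List, Set


def percent_agreement(selections: List[Set[int]], n_sentences: int) -> float:
    """Observed agreements with the majority divided by possible agreements.

    selections[i] holds the sentence indices subject i put in the summary.
    With an odd number of subjects no ties can occur.
    """
    n_subjects = len(selections)
    agreements = 0
    for s in range(n_sentences):
        votes_in = sum(1 for sel in selections if s in sel)
        majority_in = votes_in * 2 > n_subjects
        # Subjects who made the same decision as the majority agree with it.
        agreements += votes_in if majority_in else n_subjects - votes_in
    return agreements / (n_subjects * n_sentences)


# Five hypothetical subjects, each picking 1 sentence from a 10-sentence text.
picks = [{0}, {0}, {0}, {1}, {0}]
agreement = percent_agreement(picks, n_sentences=10)
print(f"{agreement:.0%}")
```

Because most sentences are left out by every subject at short summary lengths, agreement on exclusions contributes heavily to the score.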

Results of the experiment:
The "ideal" summary was constructed by taking the majority opinion of the 5 human summaries at the same length. The precision and recall of each summarizer against this ideal summary are shown in the table below.

         Microsoft       Lexical Chain    Discourse Structure
         Summarizer      Summarizer       Summarizer
Length   Prec   Recall   Prec   Recall    Prec   Recall
10%      33     37       61     67        46     64
20%      32     39       47     64        36     55
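The scoring described above can be sketched as follows: a sentence enters the ideal summary when more than half of the human summaries contain it, and a system summary is then scored by precision and recall against that set. The function names and the sentence indices are hypothetical, and the strict-majority threshold is an assumption about how "majority opinion" was applied.

```python
from collections import Counter
from typing import List, Set, Tuple


def ideal_summary(human_summaries: List[Set[int]]) -> Set[int]:
    """Sentences chosen by a strict majority of the human summaries."""
    counts = Counter(s for summary in human_summaries for s in summary)
    return {s for s, c in counts.items() if c * 2 > len(human_summaries)}


def precision_recall(system: Set[int], ideal: Set[int]) -> Tuple[float, float]:
    """Precision and recall of a system summary against the ideal summary."""
    overlap = len(system & ideal)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(ideal) if ideal else 0.0
    return precision, recall


# Five hypothetical human summaries as sets of sentence indices.
humans = [{0, 3}, {0, 3}, {0, 5}, {0, 3, 7}, {1, 3}]
ideal = ideal_summary(humans)

precision, recall = precision_recall({0, 5, 7}, ideal)
print(sorted(ideal), round(precision, 2), round(recall, 2))
```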