Summarization Using Lexical Chains
Summarization proceeds in four steps:
- The original text is segmented (using M. Hearst's segmentation algorithm);
- Lexical chains are constructed;
- Strong chains are identified;
- Significant sentences are extracted from the text.
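The four steps do not say how strong chains are scored or which sentences are extracted. The sketch below illustrates steps 3 and 4 under an assumption. It borrows the heuristics as we recall them from Barzilay & Elhadad (1997): a chain's score is its length times its homogeneity (1 minus the number of distinct members divided by the length), a chain is strong if its score exceeds the mean score plus two standard deviations, and each strong chain contributes the sentence in which one of its members first appears. The chain representation, the function names, and the example chains are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical representation: a chain is a list of (word, sentence_index)
# pairs, one pair per occurrence of a chain member in the text.


def chain_score(chain):
    """Assumed score: length * homogeneity."""
    length = len(chain)
    distinct = len({word for word, _ in chain})
    homogeneity = 1 - distinct / length
    return length * homogeneity


def strong_chains(chains):
    """Assumed criterion: score > mean + 2 * standard deviation of all scores."""
    scores = [chain_score(c) for c in chains]
    threshold = mean(scores) + 2 * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]


def extract_sentences(chains):
    """For each strong chain, pick the sentence where one of its members first occurs."""
    picked = {min(i for _, i in chain) for chain in strong_chains(chains)}
    return sorted(picked)


chains = [
    [("computer", 2), ("machine", 3), ("computer", 5),
     ("computer", 7), ("machine", 8), ("computer", 9)],
    [("dog", 1)], [("river", 4)], [("bank", 6)], [("loan", 10)], [("tree", 11)],
]
print(extract_sentences(chains))  # -> [2]
```

With the mean-plus-two-deviations threshold, a chain is strong only if it stands out clearly from the rest, so a text with a handful of similar chains may yield no strong chain at all.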
Lexical chain construction
Generic algorithm
Lexical cohesion is a frequent surface property of text (Halliday & Hasan).
This property makes it possible to compute lexical chains.
We define lexical chains as groups of semantically related words. Several
chaining algorithms have recently been presented: Morris & Hirst (1995),
Stairmand (1996), and Barzilay & Elhadad (1997). A generic algorithm for
computing chains can be described as follows:
- Select a set of candidate words from the text;
- For each candidate word, find an appropriate chain to receive it, relying
on a relatedness criterion between the members of the chains and the
candidate word;
- If such a receiving chain is found, insert the candidate word into this
chain and update the chain accordingly; otherwise, create a new chain.
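The generic algorithm can be sketched in a few lines of Python. The candidate list, the `related` function, and the toy `RELATED` table are hypothetical stand-ins (a real chainer would derive relatedness from thesaurus relations, for example in WordNet). Choosing the first related chain is only one possible choice strategy, which is exactly the parameter the following paragraphs vary.

```python
# Hypothetical toy relatedness table; stands in for thesaurus relations.
RELATED = {
    frozenset({"machine", "computer"}),
    frozenset({"computer", "processor"}),
    frozenset({"dog", "animal"}),
}


def related(a, b):
    """Relatedness criterion: identical words or a listed related pair."""
    return a == b or frozenset({a, b}) in RELATED


def build_chains(candidates, related):
    """Generic chaining over an ordered list of candidate words."""
    chains = []
    for word in candidates:
        # Choice strategy (a parameter): the first chain with a related member.
        receiving = next(
            (chain for chain in chains if any(related(word, m) for m in chain)),
            None,
        )
        if receiving is not None:
            receiving.append(word)  # insert the word into the receiving chain
        else:
            chains.append([word])  # no receiving chain: start a new one
    return chains


words = ["computer", "dog", "machine", "processor", "animal", "computer"]
print(build_chains(words, related))
# -> [['computer', 'machine', 'processor', 'computer'], ['dog', 'animal']]
```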
The parameters of the algorithm are the definition of candidate words, the
relatedness criterion, and the strategy for choosing the receiving chain for
a new candidate word. By assigning different values to these parameters, all
existing chaining algorithms can be reproduced.
Our algorithm is obtained with the following parameter settings:
- Choice of the receiving chain for a new candidate word - dynamic
strategy.
- Candidate terms - noun compounds.
- Building chains within each segment and merging the resulting chains in a
final step.
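Building chains segment by segment and merging them afterwards might look like the sketch below. It simplifies each chain to a set of words, and the merge rule (two chains are merged when they share a word) is an assumption made for illustration, since the text does not state how chains from different segments are merged. The `RELATED` table is hypothetical.

```python
# Hypothetical toy relatedness table.
RELATED = {
    frozenset({"machine", "computer"}),
    frozenset({"computer", "processor"}),
    frozenset({"machine", "processor"}),
}


def related(a, b):
    return a == b or frozenset({a, b}) in RELATED


def chain_segment(words):
    """Chain one segment: each word joins the first chain with a related member."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, m) for m in chain):
                chain.add(word)
                break
        else:
            chains.append({word})
    return chains


def merge_chains(chains):
    """Assumed merge rule: union chains that share at least one word.

    Chains in `merged` stay pairwise disjoint, so absorbing every overlapping
    chain in one pass also handles transitive overlaps.
    """
    merged = []
    for chain in chains:
        chain = set(chain)
        for other in [m for m in merged if m & chain]:
            chain |= other
            merged.remove(other)
        merged.append(chain)
    return merged


def chain_text(segments):
    per_segment = [c for segment in segments for c in chain_segment(segment)]
    return merge_chains(per_segment)


segments = [["computer", "machine"], ["dog"], ["machine", "processor"]]
print(sorted(sorted(c) for c in chain_text(segments)))
# -> [['computer', 'machine', 'processor'], ['dog']]
```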
We verified our parameter settings through an extensive empirical evaluation.
The quality of the lexical chainer is measured by how successfully it
disambiguates nouns in context, that is, to what extent our implementation of
the lexical chaining algorithm performs this task. We compare different
parameter settings of the generic algorithm using this measure.
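The disambiguation measure can be read as an accuracy score: every noun the chainer places in a chain receives the sense under which it was chained, and that sense is compared with a manually assigned sense tag. The sketch below assumes exactly that, keys tokens by hypothetical identifiers, uses made-up sense labels, and counts gold-tagged nouns that the chainer left unassigned as errors; whether the original evaluation treated unassigned nouns the same way is not stated here.

```python
def disambiguation_accuracy(predicted, gold):
    """Share of gold-tagged nouns whose chainer-assigned sense matches the gold tag.

    predicted, gold: dicts mapping a (hypothetical) token id to a sense label.
    Nouns missing from `predicted` count as errors.
    """
    if not gold:
        return 0.0
    correct = sum(predicted.get(token) == sense for token, sense in gold.items())
    return correct / len(gold)


gold = {1: "bank.n.01", 2: "river.n.01", 3: "money.n.01", 4: "bank.n.02"}
predicted = {1: "bank.n.01", 2: "river.n.01", 3: "money.n.02"}
print(disambiguation_accuracy(predicted, gold))  # -> 0.5
```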
Evaluation Data
The evaluation data consist of 37 scientific articles from the Semantic
Concordance corpus (SemCor). The words in SemCor are manually tagged with word
senses from WordNet.
Segmentation strategy
The issues to be verified:
- Does the division of the text into pieces improve chaining?
- Is Hearst's segmentation better than other possible ones?
Evaluation procedure:
We compare chaining over the whole, undivided text with chaining over Hearst
segments, over paragraphs, and over a random division.
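Two of the compared baselines are simple to reproduce. The sketch below treats the whole text as a single segment and builds a random division by cutting the sentence sequence at randomly chosen boundaries; using a chosen number of segments `k` (for instance, the number another strategy produced) is an assumption, and Hearst's algorithm itself is not reproduced here. Paragraph division simply uses each paragraph as one segment.

```python
import random


def whole_text(sentences):
    """No division: the entire text is one segment."""
    return [list(sentences)]


def random_division(sentences, k, seed=None):
    """Split sentences into k contiguous, non-empty segments at random boundaries."""
    n = len(sentences)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sentences")
    cuts = sorted(random.Random(seed).sample(range(1, n), k - 1))
    bounds = [0, *cuts, n]
    return [sentences[a:b] for a, b in zip(bounds, bounds[1:])]


sentences = [f"s{i}" for i in range(10)]
print(random_division(sentences, 3, seed=1))
```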
Results:
Lessons learned:
- Chaining over divided text is much better than chaining over the whole
text: an improvement of 15%.
- The differences among the division strategies are small; Hearst's
segmentation is slightly better.
Dynamic vs. greedy strategy
The issue to be verified:
- Does the dynamic strategy improve chaining?
Results:
The disambiguation measure shows that the dynamic strategy yields a
significant advantage over the greedy one.
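This section does not define the two strategies. One common reading, following Barzilay & Elhadad (1997), is that a greedy chainer commits to a sense for each word as soon as the word is read, whereas a dynamic chainer keeps several alternative interpretations (sense assignments together with their chains) and picks the best one only after more of the text has been seen. The sketch below contrasts the two on a toy example; the sense inventory, the relatedness table, the cohesion score, and the beam width are all hypothetical.

```python
# Hypothetical sense inventory and sense-level relatedness table.
SENSES = {"bank": ["bank.river", "bank.finance"], "money": ["money"], "loan": ["loan"]}
RELATED = {frozenset({"bank.finance", "money"}), frozenset({"money", "loan"})}


def related(a, b):
    return a == b or frozenset({a, b}) in RELATED


def add_to(interp, word, sense):
    """Return a copy of interp with (word, sense) placed in the first related chain."""
    chains = [list(chain) for chain in interp]
    for chain in chains:
        if any(related(sense, s) for _, s in chain):
            chain.append((word, sense))
            break
    else:
        chains.append([(word, sense)])
    return chains


def score(interp):
    """Hypothetical cohesion score: number of words that joined an existing chain."""
    return sum(len(chain) - 1 for chain in interp)


def greedy(words):
    interp = []
    for word in words:
        for sense in SENSES[word]:
            if any(related(sense, s) for chain in interp for _, s in chain):
                break  # commit to the first sense that fits an existing chain
        else:
            sense = SENSES[word][0]  # nothing fits: fall back to the first sense
        interp = add_to(interp, word, sense)
    return interp


def dynamic(words, beam=10):
    interps = [[]]
    for word in words:
        # Branch on every sense of the word, then keep the best `beam` interpretations.
        candidates = [add_to(i, word, s) for i in interps for s in SENSES[word]]
        candidates.sort(key=score, reverse=True)
        interps = candidates[:beam]
    return max(interps, key=score)


def sense_of(interp, word):
    return next(s for chain in interp for w, s in chain if w == word)


words = ["bank", "money", "loan"]
print(sense_of(greedy(words), "bank"))   # -> bank.river
print(sense_of(dynamic(words), "bank"))  # -> bank.finance
```

The greedy chainer must fix a sense for "bank" before it has seen "money", while the dynamic chainer keeps both readings alive until the later words make the financial one more cohesive.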
Candidate terms
We measure the effect of adding noun compounds to the candidate words,
compared with using single nouns only.
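The two candidate-term definitions can be illustrated on part-of-speech-tagged text. The sketch below assumes Penn Treebank-style tags (noun tags start with `NN`) and treats a maximal run of adjacent nouns as one noun compound; both the tagging scheme and this compound rule are assumptions for illustration, and the tagged sentence is invented.

```python
def candidate_terms(tagged, use_compounds):
    """Extract candidate terms from (word, tag) pairs.

    use_compounds=False: every noun is its own candidate.
    use_compounds=True: each maximal run of adjacent nouns becomes one candidate.
    """
    terms, run = [], []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN"):
            run.append(word)
            continue
        if run:
            if use_compounds:
                terms.append(" ".join(run))
            else:
                terms.extend(run)
            run = []
    return terms


tagged = [("the", "DT"), ("computer", "NN"), ("network", "NN"),
          ("failed", "VBD"), ("yesterday", "NN")]
print(candidate_terms(tagged, use_compounds=False))  # -> ['computer', 'network', 'yesterday']
print(candidate_terms(tagged, use_compounds=True))   # -> ['computer network', 'yesterday']
```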
Results:
Lessons learned:
Contrary to our assumption, adding noun compounds does not improve the
disambiguation ratio.
Conclusions:
Our evaluation confirmed the following parameter settings as the most
appropriate:
- Dynamic strategy.
- Segmentation using paragraphs.
- Single nouns as candidate terms.
Summary Evaluation
The goal of the experiment is to measure the similarity between the summaries
produced by the lexical-chain-based summarizer and human summaries. To study
agreement among human subjects, 40 documents were selected; for each document,
10 summaries were constructed by 5 human subjects using sentence extraction.
Each subject constructed 2 summaries of each document: one at 10% of the
document's length and the other at 20%. For convenience, length percentages
were computed in terms of the number of sentences.
We also ran 2 additional automatic summarizers, the Microsoft summarizer and a
discourse-structure-based summarizer, on the 40 documents to generate 10% and
20% summaries of each document.
A total of 16 summaries were produced for each document. The documents were
selected from the TREC collection. They are news articles on computers,
terrorism, hypnosis and nuclear treaties. The average length of the articles
is 30 sentences.
We measured agreement among the human subjects using percent agreement, a
metric defined by Gale et al. (1992). The results show high agreement:
| Length | Avg. agreement | Max  | Min |
|--------|----------------|------|-----|
| 10%    | 96%            | 100% | 87% |
| 20%    | 90%            | 100% | 83% |
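Percent agreement, as we understand the definition of Gale et al. (1992), is the ratio of observed agreements with the majority opinion to possible agreements with the majority opinion. One way to apply it to sentence extraction is to treat every sentence as a binary include/exclude decision by each subject; the sketch below does that, and the selections in the example are invented.

```python
def percent_agreement(selections, num_sentences):
    """Agreement of each subject's include/exclude decisions with the majority.

    selections: one set of extracted sentence indices per subject.
    """
    subjects = len(selections)
    agree = 0
    for i in range(num_sentences):
        votes = sum(i in s for s in selections)
        majority = 2 * votes > subjects  # included by a strict majority?
        agree += sum((i in s) == majority for s in selections)
    return agree / (subjects * num_sentences)


# 5 subjects, 10 sentences: four pick sentence 0, one picks sentence 1 instead.
selections = [{0}, {0}, {0}, {1}, {0}]
print(percent_agreement(selections, 10))  # -> 0.96
```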
Results of the experiment:
The "ideal" summary was constructed by taking the majority opinion of the 5
human summaries of the same length. The results are shown in the table below.
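| Length | Microsoft Prec (%) | Microsoft Recall (%) | Lexical Chain Summarizer Prec (%) | Lexical Chain Summarizer Recall (%) | Discourse Structure Summarizer Prec (%) | Discourse Structure Summarizer Recall (%) |
|--------|----|----|----|----|----|----|
| 10%    | 33 | 37 | 61 | 67 | 46 | 64 |
| 20%    | 32 | 39 | 47 | 64 | 36 | 55 |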
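The ideal summary and the precision and recall scores can be computed along the lines of the sketch below. It assumes that a sentence belongs to the ideal summary when a strict majority of the subjects (at least 3 of 5) extracted it, and that precision and recall are the usual set-based measures over extracted sentences; the sample selections are invented.

```python
from collections import Counter


def ideal_summary(human_selections):
    """Sentences chosen by a strict majority of the human subjects."""
    counts = Counter(i for selection in human_selections for i in selection)
    return {i for i, c in counts.items() if 2 * c > len(human_selections)}


def precision_recall(system, ideal):
    """Set-based precision and recall of a system extract against the ideal summary."""
    hits = len(system & ideal)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(ideal) if ideal else 0.0
    return precision, recall


human = [{0, 3, 7}, {0, 3, 8}, {0, 4, 7}, {0, 3, 7}, {1, 3, 9}]
ideal = ideal_summary(human)
print(sorted(ideal))                       # -> [0, 3, 7]
print(precision_recall({0, 3, 5}, ideal))  # precision and recall are both 2/3
```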