Authors: Meru Brunn, Yllias Chali, Christopher J. Pinchak
Until now, the posts here have been about "Text Simplification", so you must be wondering how "Text Summarization" is relevant to our project. I suggest you read on; by the end you will be convinced of the relevance. So let's begin with the details.
What is Text Summarization?
Summarization is the process of condensing a source text into a shorter version while preserving its information content. It is very useful for readers who have no time to read a whole paper just to decide whether it matters to them. Legal documents, for example, are usually very lengthy and full of jargon, which makes them difficult to understand; a summarization tool can be of great help in such situations.
Introduction
Summarization is usually done by extracting important sentences from the source text and compiling them into a coherent summary. In this paper, the authors provide an algorithm that identifies important sentences by forming lexical chains.
The overall architecture of the system is shown in Figure 1. It consists of several modules organized as a pipeline.
Preprocessing
1. Segmentation: To start the summarization process, the original text is first sent to the text segmenter, whose role is to divide the given text into segments that each address the same topic. This segmentation allows later modules to better analyze the source text and generate a summary.
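The paper does not spell out how its segmenter works, but the idea of cutting a text at topic shifts can be sketched with a simple vocabulary-overlap heuristic (all names and the threshold below are my own illustration, not the authors' method):

```python
import re

# Toy topic segmentation: place a boundary wherever the word overlap
# between two adjacent sentences drops below a threshold.

def tokenize(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))

def segment(sentences, threshold=0.1):
    """Split a list of sentences into topic segments."""
    segments, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        a, b = tokenize(prev), tokenize(cur)
        overlap = len(a & b) / len(a | b)   # Jaccard similarity
        if overlap < threshold:             # low overlap => topic shift
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

docs = [
    "The court reviewed the contract terms.",
    "The contract terms were found valid by the court.",
    "Meanwhile stock prices rose sharply today.",
]
print(segment(docs))   # two segments: legal sentences, then the stock one
```

A real segmenter would compare windows of sentences rather than single ones, but the boundary-detection idea is the same.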
2. Tagging: This module performs part-of-speech tagging. Words are considered individually; the semantic structure is not taken into account.
3. Parsing: In this module, tagged words are collected and organized into their syntactic structure. We can then select various components (or phrases) depending on their syntactic position within a sentence; for example, we could decide to select all noun phrases within a given text, which is a trivial task given a parsed representation. Since the parser and tagger are not entirely compatible with respect to input/output, the tagger output is refined so that it becomes compatible with the parser: the parser expects tagged words of the form ’word TAG’, while the tagger outputs them as ’word_TAG’, so the underscore is simply removed.
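The underscore-removal step is a one-line adapter; a minimal sketch (the function name is mine):

```python
# The tagger emits 'word_TAG'; the parser expects 'word TAG'.
# Replace the final underscore with a space so underscores inside
# the word itself (if any) are left alone.

def retag(tagged_word):
    word, _, tag = tagged_word.rpartition("_")
    return f"{word} {tag}"

tagged = ["The_DT", "court_NN", "adjourned_VBD"]
print([retag(t) for t in tagged])   # ['The DT', 'court NN', 'adjourned VBD']
```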
4. Noun filtering: Noun filtering improves the accuracy of the summarization by selectively removing nouns from the parsed text. These nouns come from the source text and are identified by the tagger. However, some nouns contribute to the subject of the text while others detract from it.
Consider an analogy to analogue data transmission. During data transmission, there is both a signal component and a noise component. Data transmission conditions are ideal when there is a strong signal and low noise. It is when the signal is overcome by noise that it becomes difficult to detect. This is similar to the presence of nouns within the source text. Those nouns that contribute to the subject of the text are part of the ’signal’, and those that do not are part of the ’noise’. The noun filter’s job is to reduce the ’noise’ nouns while still retaining as many ’signal’ nouns as possible.
There are a number of different heuristics that could be used to filter out the ’noise’ nouns. The authors designed theirs around the idea that nouns contained within subordinate clauses are less useful for topic detection than those contained within main clauses. However, main and subordinate clauses are not easily identified, so they selected a relatively simple heuristic: in the first sub-sentence of each sentence, the first noun phrase and the noun phrase included in the first verb phrase are treated as the main clause, with other phrases being subordinate.
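Assuming a flat representation of the first sub-sentence's phrases (the labels and data below are hypothetical, just to make the heuristic concrete), the filter can be sketched as:

```python
# Sketch of the noun-filter heuristic: keep only the nouns in the
# first noun phrase and in the noun phrase inside the first verb
# phrase; nouns in other (subordinate) phrases are dropped.

def filter_nouns(parse):
    """parse: list of (label, [nouns]) phrases in sentence order."""
    kept = []
    for label, nouns in parse:          # first NP of the sentence
        if label == "NP":
            kept.extend(nouns)
            break
    for label, nouns in parse:          # NP embedded in the first VP
        if label == "VP-NP":
            kept.extend(nouns)
            break
    return kept

parse = [
    ("NP", ["court"]),        # main-clause subject
    ("VP-NP", ["contract"]),  # object inside the first verb phrase
    ("SBAR-NP", ["lawyer"]),  # subordinate clause: filtered out
]
print(filter_nouns(parse))    # ['court', 'contract']
```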
5. Lexical chainer:
The steps of the algorithm for lexical chain computation are as follows:
- The set of candidate words is selected. A candidate word is an open-class word that functions as a noun phrase or proper name, as produced by the noun filtering process.
- All senses of each candidate word are considered, as obtained from the thesaurus; in this experiment, the WordNet thesaurus was used. Each word sense is represented by distinct sets treated as levels: the first is the set of synonyms and antonyms, the second is the set of first hypernyms/hyponyms and their variations (i.e., meronyms/holonyms, etc.), and so on.
- The semantic relatedness among the sets of senses is computed from these representations. If the sense representations of two distinct words match, the senses are said to be semantically related. Each semantic relationship is associated with a measure indicating the length of the path taken in the matching, with respect to the levels of the two compared sets.
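With a toy stand-in for the WordNet representations (the mini-thesaurus below is invented for illustration), the level-based matching can be sketched as:

```python
# Each sense is a list of "levels": level 0 holds synonyms/antonyms,
# level 1 first hypernyms/hyponyms, and so on. Two senses are related
# if any pair of their level sets share a word; the combined level
# depth of the match stands in for the path-length measure.

SENSES = {  # hypothetical mini-thesaurus, not real WordNet data
    "car":  [{"car", "auto", "automobile"}, {"vehicle", "wheel"}],
    "auto": [{"auto", "car"},               {"vehicle"}],
    "bank": [{"bank", "shore"},             {"land"}],
}

def relatedness(word_a, word_b):
    """Return the smallest combined level depth of a match, or None."""
    best = None
    for i, level_a in enumerate(SENSES[word_a]):
        for j, level_b in enumerate(SENSES[word_b]):
            if level_a & level_b:
                depth = i + j
                best = depth if best is None else min(best, depth)
    return best

print(relatedness("car", "auto"))   # 0 (synonym-level match)
print(relatedness("car", "bank"))   # None (unrelated)
```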
- Chains are built up as sets of word senses in which each member is semantically related to the others.
- We retain the longest chains by relying on the following preference criterion:
word repetition >> synonym/antonym . . .
In this implementation, the preference is handled by assigning a score to each pairwise semantic relation in the chain and then summing those pairwise scores. Hence, the score of a chain depends both on its length and on the types of relationships among its members.
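In other words, a short chain with strong links can outrank a longer chain with weak ones. A minimal sketch of this scoring (the weight values are illustrative, not the paper's actual numbers):

```python
# Chain score = sum of the weights of its pairwise relations.
# Repetition is weighted highest, then synonym/antonym, then the
# deeper thesaurus relations -- matching the stated preference order.

WEIGHTS = {"repetition": 10, "synonym": 8, "antonym": 7, "hypernym": 4}

def chain_score(relations):
    """relations: list of relation types linking the chain members."""
    return sum(WEIGHTS[r] for r in relations)

chain_a = ["repetition", "synonym"]             # short, tightly related
chain_b = ["hypernym", "hypernym", "hypernym"]  # longer, weaker links
print(chain_score(chain_a), chain_score(chain_b))   # 18 12
```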
In the lexical chaining method, each word-sense has to be semantically related to every other word-sense in the chain. The order of the open-class words in the document plays no role in the building of chains. However, it turned out that the number of lexical chains could be extremely large, and thus problematic, for larger segments of text. To cope with this, for long text segments they reduced the word-sense representation to synonyms only. Lexical chains are computed for each text segment.
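The every-member-related-to-every-member requirement can be sketched with a greedy builder (the `RELATED` lookup is a stand-in for the thesaurus-based test; a full implementation would enumerate all chain combinations per sense):

```python
# A chain is a set of candidate words in which every pair is
# semantically related. Each new word joins the first chain whose
# members it is all related to, else it starts a new chain.

RELATED = {frozenset(p) for p in [("car", "auto"), ("car", "vehicle"),
                                  ("auto", "vehicle"), ("court", "judge")]}

def related(a, b):
    return a == b or frozenset((a, b)) in RELATED

def build_chains(words):
    chains = []
    for w in words:
        for chain in chains:
            if all(related(w, m) for m in chain):  # related to every member
                chain.append(w)
                break
        else:
            chains.append([w])                     # start a new chain
    return chains

print(build_chains(["car", "auto", "court", "vehicle", "judge"]))
# [['car', 'auto', 'vehicle'], ['court', 'judge']]
```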
6. Sentence Extraction: Each sentence is ranked with reference to the total number of lexical cohesion scores collected. The objective of such a ranking process is to assess the importance of each score and to combine all scores into a rank for each sentence. In performing this assessment, provisions are made for a threshold which specifies the minimal number of links required for sentences to be lexically cohesive. Ranking a sentence according to this procedure involves summing the lexical cohesion scores associated with the sentence which are above the threshold.
Each sentence is ranked by summing the number of shared chain members over the sentence. More precisely, the score for sentence(i) is the number of words that belong to sentence(i) and also to those chains that have been considered in the segment selection phase. The summary consists of the ranked list of top-scoring sentences, according to the desired compression ratio, and ordered in accordance with their appearance in the source text.
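Putting the ranking and reordering together, a minimal sketch (the function and data names are mine, and the word-membership score here ignores the threshold for brevity):

```python
import re

# Score each sentence by how many of its words belong to a retained
# lexical chain, keep the top n, then restore source order.

def summarize(sentences, chain_words, n=1):
    def score(s):
        return sum(1 for w in re.findall(r"[a-z]+", s.lower())
                   if w in chain_words)
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]   # original text order

sents = ["The court upheld the contract.",
         "It rained that day.",
         "The judge praised the contract terms."]
chains = {"court", "judge", "contract", "terms"}
print(summarize(sents, chains, n=2))   # first and third sentences
```

Note that the selected sentences are emitted in their order of appearance, not by score, which matches the paper's description.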
They participated in the single-document DUC evaluation. The task was, given a document, to create a generic summary of it with a length of approximately 100 words. Thirty sets of approximately 10 documents each were provided as system input for this task. According to their analysis, the results seem promising: the grammaticality of their summaries scored an average of 3.73/4, while the cohesion and organization scores averaged 2.55/4 and 2.66/4, respectively.
Conclusions and Future Work
This paper presents an efficient implementation of the lexical cohesion approach as the driving engine of the summarization system. The ranking procedure, which computes the text ’aboutness’ measure, is used to select the most salient and best-connected sentences in a text, up to the summary ratio requested by the user. In the future, they plan to investigate the following problems:
- Their methods extract whole sentences as single units. The use of compression techniques will increase the condensation of the summary and improve its quality.
- Their summarization method uses only lexical chains as representations of the source text. Other clues could be gathered from the text and considered when generating the summary.
- In the noun filtering process, their hypothesis eliminates the terms in subordinate clauses. Rather than eliminating them, it may also prove fruitful to investigate weighting terms according to the kind of clause in which they occur.
The concept of a lexical chain is similar to our idea of constructing the Markov chain matrix. We can treat this matrix as a source of sense disambiguation and as a tool that tells us the "aboutness" of the text, helping us simplify the text in the right manner.
P.S.: I have used an equation in my post, and I feel proud to say that I learnt it from one of our Sir's blog posts. You can also refer to it by visiting the blog Academic Me! :-)