Monday, January 31, 2011

Automatic Summarization for Text Simplification: Evaluating Text Understanding by Poor Readers


Authors: Paulo R. A. Margarido, Thiago A. S. Pardo, Gabriel M. Antonio, Vinícius B. Fuentes,
Rachel Aires, Sandra M. Aluísio, Renata P. M. Fortes


This paper presents experiments on text summarization and text simplification. The authors show that each simplification approach affects readers at different literacy levels differently, and that all of the approaches improve text understanding to some degree.

The authors claim to be the first to effectively use summarization for text simplification (TS) and to evaluate its effectiveness for text understanding.

Since we are all already familiar with the basics of text simplification, let us jump directly into the summarization methods:


  1. Method Based on Keyword Extraction: This is a simple technique. Given a text and a set of keywords, any sentence that contains at least one keyword is selected for the summary. Keywords were selected by looking for word patterns classified as <NOUN> or <NOUN+PREPOSITION+NOUN>, or for adjectives, at any position in the text. A second variant was built on the same idea: instead of keeping every sentence that contains a keyword, all sentences are first ranked by the number of keywords they contain, and then the highest-ranked ones are selected to form the summary.
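The two keyword-based variants can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the keyword list is assumed to be supplied by the caller (the part-of-speech pattern matching is not shown), keywords are treated as single words, and the helper names `split_sentences`, `keyword_hits`, `select_any` and `select_top` are made up for this example.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def keyword_hits(sentence, keywords):
    """Count how many of the given keywords occur as whole words in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return sum(1 for k in keywords if k.lower() in words)

def select_any(text, keywords):
    """Variant 1: keep every sentence that contains at least one keyword."""
    return [s for s in split_sentences(text) if keyword_hits(s, keywords) > 0]

def select_top(text, keywords, n):
    """Variant 2: rank sentences by keyword count and keep the top n,
    returned in their original order."""
    sents = split_sentences(text)
    ranked = sorted(range(len(sents)),
                    key=lambda i: keyword_hits(sents[i], keywords),
                    reverse=True)
    return [sents[i] for i in sorted(ranked[:n])]

text = "The cat sat on the mat. Dogs bark loudly. The cat chased the dog on the mat."
print(select_any(text, ["cat", "mat"]))
print(select_top(text, ["cat", "mat"], 1))
```

Returning the selected sentences in their original order is a design choice made here so that the summary stays readable; the post does not say how the paper orders them.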

  2. Method Based on Gist Identification: GistSumm is one of the first summarizers created for Brazilian Portuguese and, according to the authors, it is the system with the highest precision for this language. To produce the summary, the system first computes the frequency of every stem in the text. Each sentence receives a score, which is the sum of the frequencies of every stem that belongs to it. The sentence with the highest score is then elected the gist sentence. The remaining sentences of the summary must satisfy two restrictions: they must have at least one stem in common with the gist sentence, and their scores must be above a threshold, which is the mean score of all sentences.
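The scoring and selection steps described for GistSumm can be sketched as below. This is a minimal illustration under stated assumptions, not the GistSumm code: the `stem` function is a placeholder that only lowercases (a real system would use a Portuguese stemmer), stopwords are not removed, and a stem that appears twice in a sentence is counted twice in its score.

```python
import re
from collections import Counter

def stem(word):
    # Placeholder stemmer (assumption): lowercasing only.
    return word.lower()

def gist_summary(sentences):
    """Return (index of the gist sentence, list of summary sentences)."""
    if not sentences:
        return None, []
    stems = [[stem(w) for w in re.findall(r"\w+", s)] for s in sentences]
    # Frequency of every stem over the whole text.
    freq = Counter(t for sent in stems for t in sent)
    # Sentence score: sum of the frequencies of its stems.
    scores = [sum(freq[t] for t in sent) for sent in stems]
    gist = max(range(len(sentences)), key=scores.__getitem__)
    threshold = sum(scores) / len(scores)  # mean score of all sentences
    gist_stems = set(stems[gist])
    keep = [i for i in range(len(sentences))
            if i == gist
            or (scores[i] > threshold and gist_stems & set(stems[i]))]
    return gist, [sentences[i] for i in keep]

sentences = ["the cat sat on the mat", "the cat ate the fish", "birds fly", "rain"]
print(gist_summary(sentences))
```

In this toy input the second sentence is kept because it scores above the mean and shares stems with the gist sentence, while the last two fail both restrictions.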


  3. Method Based on Machine Learning: This system uses several features to classify each sentence of the text according to its importance. Some of the features are sentence length and position, word frequency, presence of importance-signaling phrases, and occurrence of proper nouns.
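To make the feature list above concrete, here is a rough sketch of how such features might be computed per sentence before being handed to a classifier. Everything specific in it is an assumption made for illustration: the `SIGNAL_PHRASES` list is invented, the proper-noun count is a crude capitalization heuristic, and the post does not say which classifier or exact feature definitions the system uses.

```python
import re
from collections import Counter

# Hypothetical examples of importance-signaling phrases (not from the paper).
SIGNAL_PHRASES = ("in conclusion", "in summary", "the main")

def sentence_features(sentences):
    """Build one feature dictionary per sentence."""
    tokenized = [re.findall(r"\w+", s) for s in sentences]
    freq = Counter(w.lower() for toks in tokenized for w in toks)
    n = len(sentences)
    rows = []
    for i, (s, toks) in enumerate(zip(sentences, tokenized)):
        rows.append({
            "length": len(toks),
            # Relative position: 0.0 for the first sentence, 1.0 for the last.
            "position": i / (n - 1) if n > 1 else 0.0,
            # Mean text-wide frequency of the words in this sentence.
            "avg_word_freq": (sum(freq[w.lower()] for w in toks) / len(toks)
                              if toks else 0.0),
            "has_signal_phrase": any(p in s.lower() for p in SIGNAL_PHRASES),
            # Heuristic: capitalized words that are not sentence-initial.
            "proper_nouns": sum(1 for w in toks[1:] if w[0].isupper()),
        })
    return rows

for row in sentence_features(["In summary, Maria visited Lisbon.", "It rained."]):
    print(row)
```

These dictionaries would then be converted to vectors and used to train a classifier on sentences labeled as important or not.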


  4. Methods Based on Graphs: A language-independent method based on Google's PageRank algorithm, called TextRank, was presented recently. It represents the sentences of a text as nodes in a graph and adds edges by measuring the similarity between sentences, which is basically computed with a word overlap measure. Versions of TextRank enriched with thesaurus synonym and antonym relations (to improve the word overlap measure) were evaluated for Portuguese and achieved very good results.
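The graph-based idea can be sketched in a few lines. This is a simplified illustration and not the TextRank reference implementation: Jaccard overlap is swapped in as the similarity measure, the `damping` and `iterations` defaults are assumptions, and the thesaurus enrichment mentioned above is not included.

```python
import re

def overlap(a, b):
    """Jaccard word overlap between two token sets (a stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style iteration."""
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    # Edge weights: similarity between every pair of distinct sentences.
    w = [[overlap(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per node
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores

sentences = ["the cat sat on the mat", "the cat ate the fish", "birds fly south"]
for sentence, score in zip(sentences, textrank(sentences)):
    print(round(score, 3), sentence)
```

A summary would then be formed from the highest-scoring sentences. In the toy input, the third sentence shares no words with the others, so it receives only the baseline score.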

EVALUATION: Three experiments were conducted to evaluate all of the methods above and determine which one yields the best results. This gave the authors the best summarization tool to use for TS.

Summarization can be used for TS purposes in various ways: showing the reader only the summary, showing the text with only the main sentence highlighted, showing the text with all important sentences highlighted, and so on. Experiments were conducted with people of varied literacy levels.
Among people with up to 5 years of study: 66% considered the summary easier to understand; 100% considered the original text and the text with the important sentences in bold equally understandable; and 60% considered the text with the main sentence in bold more difficult to understand.

These numbers varied for people with up to 8-10 years of study. In general, the authors found that people at each literacy level find different simplification strategies useful: simplification could not help people with up to 2 years of study, summaries helped people with up to 5 years of study, the important sentences in bold helped people with up to 8 years of study, and the main sentence in bold helped people with more than 10 years of study.
