Hi All......
IEEE papers are difficult for most of us to understand, in fact for all of us, so in this post I walk through an IEEE paper I read recently and try to simplify it according to my own understanding. I hope it helps you guys and is fun to read too!
The paper is by Caroline Gasperin, Lucia Specia, Tiago F. Pereira and Sandra M. Aluisio.
The paper looks at two groups of readers in Brazil: people at the rudimentary literacy level and people at the basic literacy level.
This paper aims at producing text simplification tools for promoting digital inclusion and accessibility for people with such levels of literacy, and possibly other kinds of reading disabilities. More specifically, the goal is to help these readers process documents available on the web. Additionally, it could help children learning to read texts of different genres, or adults who are learning to read and write. There are two kinds of simplification: natural (for the basic literacy level) and strong (for the rudimentary level).
The difference between these two is the degree to which simplification operations are applied to the sentences.
The focus in this paper is on natural simplification. Based on observations of what the annotator (analyst) did, natural simplified texts are produced when sentences are simplified by splitting them. So the paper focuses on which sentences to split and how to split them.
They say that none of the previous text simplification systems aims to provide varying degrees of simplification according to the user's needs. Moreover, none of the existing systems addresses the language under consideration (Brazilian Portuguese). The corpus for simplification is taken from two Brazilian newspapers (Zero Hora and Folha de São Paulo). A tool called Simplification Annotation Editor is used by the annotator (analyst) for this manual simplification task.
They use a set of eleven simplification rules that can be applied to the original texts, such as non-simplification, replacing collocations, subject-verb-object ordering, changing to active voice, etc.
When performing natural simplification, no fixed order of operations is imposed and they can be applied in any order, whereas strong simplification is driven by explicit rules (when and how to apply each operation). Either way, the end result should be simplified text :)
The sentence splitting operation, which is the focus of this paper, can usually be applied when a sentence contains apposition, relative clauses, or coordinate or subordinate clauses, but it is not a mandatory operation for natural simplification.
The parallel corpora of original and simplified texts :- Zero Hora, number of sentences in the original, natural and strong corpora:
Original: 2,116
Natural: 3,104
Strong: 3,537
In the simplified versions the overall text is longer than the original, which was expected, since simplification usually repeats information across different sentences, particularly when splitting operations are performed.
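To get a feel for how much the text grows, here is a tiny sketch that just does arithmetic on the three sentence counts quoted above. The variable and function names are mine, not the paper's.

```python
# Sentence counts quoted above for the Zero Hora corpus.
sentence_counts = {"original": 2116, "natural": 3104, "strong": 3537}


def growth_ratio(counts, version):
    """Return how many sentences the given version has per original sentence."""
    return counts[version] / counts["original"]


for version in ("natural", "strong"):
    ratio = growth_ratio(sentence_counts, version)
    print(f"{version}: {ratio:.2f}x the original number of sentences")
```

So the natural version has roughly 1.47 sentences for every original sentence, and the strong version roughly 1.67.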
Natural simplification system :- A binary classifier is trained with a large number of features in order to identify which sentences should be split to produce a natural simplified text.
Feature set :- From the analysis of the annotated corpora, they extract a number of features that describe the characteristics of the sentences involved (or not) in splitting operations, such as the number of words, characters, nouns, pronouns, verbs, etc. (there are 29 of them!).
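Just to make the idea of "features of a sentence" concrete, here is a minimal sketch that computes a couple of the simplest ones (words and characters). It is not the paper's feature extractor: the function name and the comma count are my own assumptions, and the part-of-speech counts (nouns, pronouns, verbs) are left out because they would need a tagger.

```python
import re


def basic_features(sentence):
    """Compute a few surface features of one sentence (illustrative subset only)."""
    words = re.findall(r"\w+", sentence)
    return {
        "num_words": len(words),
        "num_chars": len(sentence),
        # Hypothetical extra feature (not from the paper): commas often
        # show up around appositions and clauses.
        "num_commas": sentence.count(","),
    }


example = "Maria, who lives in Porto Alegre, reads the newspaper every day."
print(basic_features(example))
```

Each sentence ends up as a small dictionary of numbers, which is the kind of input a classifier can be trained on.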
In order to improve the performance of the classifier, they run two kinds of feature selection experiments. They select all features that performed above the average accuracy in the first case, and all features that caused the classifier's performance to drop below the average accuracy in the second case. The best performing features are then added to the basic set.
Classification :-Sentences are tagged as positive instances if they were annotated as containing a splitting operation; otherwise they are negative.
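The labelling step can be sketched like this. The data format (a sentence paired with the list of operations the annotator applied) and the operation name "split" are assumptions of mine for illustration; the real annotation format is not described in this post.

```python
def label_sentences(annotated):
    """Turn (sentence, operations) pairs into (sentence, label) pairs.

    The label is 1 (positive) if a splitting operation was annotated, else 0.
    """
    return [
        (sentence, 1 if "split" in operations else 0)
        for sentence, operations in annotated
    ]


# Hypothetical annotations, made up for this example.
annotated = [
    ("Sentence A", ["split", "active_voice"]),
    ("Sentence B", ["non_simplification"]),
    ("Sentence C", []),
]
labels = label_sentences(annotated)
print(labels)
```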
Adding the best performing features to the basic (baseline) set yielded a slight increase in the performance of the classifier.
Simplification :- The binary classifier only tells whether to split a sentence or not. The actual simplification, when recommended by the classifier, is performed by a rule-based system that implements simplification rules for all syntactic constructions that are considered complex.
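Here is a rough sketch of that two-stage idea: first decide, then split. Everything in it is a stand-in of mine. The "classifier" is just a word-count threshold and the single toy rule splits English sentences at ", and ", whereas the real system uses a trained classifier and rules for Brazilian Portuguese.

```python
def should_split(sentence, max_words=12):
    """Stand-in for the trained binary classifier (hypothetical length threshold)."""
    return len(sentence.split()) > max_words


def split_coordination(sentence):
    """Toy rule: split a sentence at the first ', and ' into two sentences."""
    left, sep, right = sentence.partition(", and ")
    if not sep or not right:
        return [sentence]  # the rule does not apply
    left = left.rstrip(".") + "."
    right = right[0].upper() + right[1:]
    return [left, right]


def simplify(sentence):
    """The classifier decides first; the rule runs only on a positive decision."""
    if should_split(sentence):
        return split_coordination(sentence)
    return [sentence]


long_sentence = (
    "The government announced a new reading program for adults, "
    "and the first classes will start in March."
)
print(simplify(long_sentence))
print(simplify("She reads, and he writes."))
```

The point of the design is that the rule-based part never has to decide whether splitting is a good idea. It only has to know how to split once the classifier says yes.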
Concluding remarks :- They have presented a corpus-based system for natural text simplification, focusing on the sentence splitting operation as the main point of distinction between this and the strong level of simplification.
This simplification framework, corpus-based classifier followed by rule-based simplifier, will be the core of a tool for online simplification of texts on the Web, aiming at people with low literacy levels.
Future work :- Instead of using a classifier to make a decision about the whole sentence (split vs. non-split), they aim to have a classification step for each potential splitting point within the sentence. This would allow them to simplify just specific points of a sentence.
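The per-point idea could look something like the sketch below. This is only my guess at the shape of it, since the paper leaves it as future work: the marker words, the function names and the dummy per-point classifier are all hypothetical.

```python
# Hypothetical marker words that might introduce a clause (not from the paper).
CANDIDATE_MARKERS = {"and", "but", "which", "who", "because"}


def candidate_split_points(sentence):
    """Return the token indices where a split could be considered."""
    tokens = sentence.split()
    return [
        i for i, tok in enumerate(tokens)
        if tok.lower().strip(",.;") in CANDIDATE_MARKERS
    ]


def decide_per_point(sentence, point_classifier):
    """Ask the per-point classifier about each candidate and keep the accepted ones."""
    tokens = sentence.split()
    return [
        i for i in candidate_split_points(sentence)
        if point_classifier(tokens, i)
    ]


def dummy_point_classifier(tokens, i):
    """Dummy stand-in: accept a point only if 3+ tokens lie before it and from it on."""
    return i >= 3 and len(tokens) - i >= 3


sentence = (
    "The mayor, who was elected last year, opened the library "
    "and thanked the volunteers."
)
print(candidate_split_points(sentence))
print(decide_per_point(sentence, dummy_point_classifier))
```

Compared with one yes/no decision for the whole sentence, this gives one decision per candidate position, so a sentence can be split in one place and left alone in another.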
Hope you guys understand the paper! Feel free to shoot your doubts and questions, if any.
Hard work brings prosperity; playing around brings poverty.
Cya....