This paper shares its methods, motivation and features with the previous paper I blogged about, and it is just as interesting.
The authors are Caroline Gasperin, Erick Maziero, Lucia Specia, Thiago Pardo and Sandra M. Aluisio.
Text simplification is a research area of Natural Language Processing whose goal is to make texts easier to understand by simplifying their grammatical structure. To this end, the authors propose a two-layer architecture, which I explain below.
Simplification involves replacing complex words with common words that readers understand, and simplifying the syntactic structure of sentences.
The proposed architecture will be the core of two online applications: a web browser plug-in that simplifies texts on the Web, and an authoring tool that helps writers create simplified texts.
The paper takes a corpus-based approach: the relevant simplification operations, and the degree of simplification needed for a given task, are learned from a corpus.
There are two types of simplification, natural and strong, and this architecture handles both.




The architecture is composed of two layers:
The first is a machine-learning system that learns from manually simplified texts when to apply simplification operations to a sentence so that the resulting simplified text is considered natural.
The second is a rule-based system that implements all simplification operations and executes them when recommended by the first layer.
For strong simplification, the text only needs to pass through the second layer.
For natural simplification, each sentence of the original text first passes through the first layer, the natural simplification classifier. If the classifier decides that the sentence should be simplified, the sentence proceeds to the second layer, where the simplification actually happens; otherwise it is left untouched.
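To make the control flow concrete, here is a minimal Python sketch of how the two layers could be wired together. The functions `needs_simplification` (layer 1) and `apply_rules` (layer 2) are hypothetical toy stand-ins, not the authors' classifier or rule set: the first uses a crude length-and-comma test and the second only splits off a ", which" clause.

```python
from typing import List

def needs_simplification(sentence: str) -> bool:
    """Hypothetical layer-1 stand-in: flag long sentences that contain a comma."""
    return len(sentence.split()) > 12 and "," in sentence

def apply_rules(sentence: str) -> str:
    """Hypothetical layer-2 stand-in: turn 'A, which B.' into 'A. It B.'."""
    head, sep, tail = sentence.partition(", which ")
    if not sep:
        return sentence  # no rule applies; leave the sentence as it is
    return f"{head}. It {tail}"

def simplify(sentences: List[str], mode: str) -> List[str]:
    if mode == "strong":
        # Strong simplification: every sentence goes straight to layer 2.
        return [apply_rules(s) for s in sentences]
    if mode == "natural":
        # Natural simplification: layer 2 only runs on sentences flagged by layer 1.
        return [apply_rules(s) if needs_simplification(s) else s for s in sentences]
    raise ValueError(f"unknown mode: {mode!r}")

text = [
    "The proposed system uses a syntactic parser, which identifies the clauses of each input sentence.",
    "Eve owns a cat, which sleeps.",
]
print(simplify(text, "natural"))
print(simplify(text, "strong"))
```

In natural mode only the long first sentence is split; in strong mode the short second sentence is split as well.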
The proposed method uses a binary classifier in the first layer to decide which sentences should be split to obtain a naturally simplified text. The second layer applies simplification rules for the most complex syntactic constructs.
The Simplification Annotation Editor is used to support simplification. It has two modes: lexical (it proposes replacements for complex words) and syntactic (it proposes syntactic operations based on syntactic clues provided by a parser).
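The lexical mode can be pictured as a lookup that proposes, but does not apply, simpler words. This sketch assumes a hypothetical English word list `SIMPLER` and a hypothetical helper `propose_substitutions`; the editor's real lexical resources are not described in this post.

```python
from typing import Dict, List, Tuple

# Hypothetical complex -> simple word list (illustration only).
SIMPLER: Dict[str, str] = {
    "utilize": "use",
    "commence": "start",
    "approximately": "about",
}

PUNCT = ".,;:!?"

def propose_substitutions(sentence: str, lexicon: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """List (token position, complex word, simpler suggestion) for a human to accept or reject."""
    proposals = []
    for i, token in enumerate(sentence.split()):
        word = token.strip(PUNCT)  # ignore trailing punctuation when looking up
        suggestion = lexicon.get(word.lower())
        if suggestion is not None:
            proposals.append((i, word, suggestion))
    return proposals

print(propose_substitutions("We utilize approximately ten rules.", SIMPLER))
```

Returning suggestions instead of rewriting the sentence mirrors the editor's role as an aid: the annotator still makes the final call.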
The Simplification Annotation Editor follows a 3-step architecture. In the first step, the original text is created (or simply opened from a file). In the second step, natural simplifications are produced, and from these, strong simplifications are generated (step 3).
Let me briefly explain the two layers.
First layer (natural simplification machine-learning system):
A binary classifier is trained on a large number of features to identify which sentences should be split in order to produce a naturally simplified text. The features describe characteristics of sentences that were (or were not) involved in splitting operations.
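The learner the authors used is not named in this post, so the sketch below swaps in a plain perceptron as a stand-in binary classifier. The three features (word count, clause count, presence of a relative pronoun) and the six training examples are hypothetical; label 1 means the sentence was split in the simplified corpus and 0 means it was kept.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (feature vector, label)

def predict(w: Sequence[int], x: Sequence[int]) -> int:
    """Return 1 (split the sentence) if the weighted score is positive, else 0."""
    score = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if score > 0 else 0

def train_perceptron(data: Sequence[Example], max_epochs: int = 10000) -> List[int]:
    """Mistake-driven perceptron; w[0] is the bias, w[1:] are the feature weights."""
    w = [0] * (len(data[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, x) != y:
                step = 1 if y == 1 else -1  # push the score towards the correct side
                w[0] += step
                for j, xj in enumerate(x):
                    w[j + 1] += step * xj
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training data separated
            break
    return w

# Hypothetical features: (word count, clause count, has relative pronoun 0/1).
TRAIN: List[Example] = [
    ((30, 3, 1), 1), ((8, 1, 0), 0), ((25, 2, 1), 1),
    ((12, 1, 0), 0), ((28, 3, 0), 1), ((10, 1, 1), 0),
]
weights = train_perceptron(TRAIN)
print(weights, [predict(weights, x) for x, _ in TRAIN])
```

Integer features keep the weight updates exact, and because this toy data is linearly separable, the perceptron is guaranteed to stop with every training sentence classified correctly. A real system would use far more features and a held-out test set.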
Since the cue-phrase and rhetorical-relation features are usually very sparse, a basic feature set is built, and the n best-performing features are then added to it (the Basic+50 row below).
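One simple reading of "adding the n best-performing features" is: score each extra feature on its own, rank the features, and append the top n to the basic set. The sketch below assumes that reading; the helper `basic_plus_n`, the feature names and the scores are all hypothetical, and the paper's actual selection criterion may differ.

```python
from typing import Dict, List

def basic_plus_n(basic: List[str], scores: Dict[str, float], n: int) -> List[str]:
    """Return a new list: `basic` plus the n highest-scoring features not already in it."""
    candidates = [f for f in scores if f not in basic]
    candidates.sort(key=lambda f: scores[f], reverse=True)  # best first
    return basic + candidates[:n]

# Hypothetical feature names and per-feature scores.
basic = ["num_words", "num_verbs"]
scores = {
    "num_commas": 0.30,
    "has_relative_pronoun": 0.25,
    "rhetorical_elaboration": 0.05,
    "cue_phrase_because": 0.02,
}
print(basic_plus_n(basic, scores, 2))
```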
Feature set    Precision (%)    Recall (%)
Petersen       71.68            71.54
Basic          72.48            72.34
All            72.50            72.48
Basic+50       73.50            73.42
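Adding well-chosen features to the basic set improves performance: Basic+50 gives the best precision (73.50) and recall (73.42), ahead of both the full feature set (All) and the Petersen feature set. Note that using all features is barely better than the basic set, so it is the selection of features, not simply their number, that brings the gain.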
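Since precision and recall move together here, a single F1 score (their harmonic mean) makes the ranking easier to see. This is just arithmetic on the table values, not a figure reported in the paper.

```python
# Precision and recall (%) copied from the table above.
RESULTS = {
    "Petersen": (71.68, 71.54),
    "Basic": (72.48, 72.34),
    "All": (72.50, 72.48),
    "Basic+50": (73.50, 73.42),
}

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, (p, r) in RESULTS.items():
    print(f"{name:<10} F1 = {f1(p, r):.2f}")
```

With these numbers Basic+50 comes out on top with an F1 of about 73.46.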
Second layer (rule-based simplification system):
This layer consists of the simplification operations. Each sentence of the text is analyzed to identify linguistic phenomena (clauses, relative clauses, subordinate clauses, passive voice), and the appropriate operations (for example, turning passive into active voice) are called.
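Here is a minimal sketch of the "identify the phenomenon, then call the matching operation" step. It assumes toy English string patterns in place of the parser the system actually relies on, and covers only two hypothetical operations: splitting off a ", which" relative clause and turning a simple "-ed" passive into the active voice.

```python
import re
from typing import Callable, Dict, Optional

# Toy pattern for a simple English passive: "<patient> was/were <verb>ed by <agent>."
PASSIVE = re.compile(r"^(?P<patient>.+?) (?:was|were) (?P<verb>\w+ed) by (?P<agent>.+?)\.$")

def detect(sentence: str) -> Optional[str]:
    """Return the name of the first (toy) phenomenon found, or None."""
    if ", which " in sentence:
        return "relative_clause"
    if PASSIVE.match(sentence):
        return "passive"
    return None

def split_relative(sentence: str) -> str:
    """'A, which B.' -> 'A. It B.'"""
    head, _, tail = sentence.partition(", which ")
    return f"{head}. It {tail}"

def passive_to_active(sentence: str) -> str:
    """'X was Ved by Y.' -> 'Y Ved x.'"""
    m = PASSIVE.match(sentence)
    if m is None:
        return sentence
    agent, verb, patient = m.group("agent"), m.group("verb"), m.group("patient")
    return f"{agent[0].upper()}{agent[1:]} {verb} {patient[0].lower()}{patient[1:]}."

# Dispatch table: phenomenon name -> operation that handles it.
OPERATIONS: Dict[str, Callable[[str], str]] = {
    "relative_clause": split_relative,
    "passive": passive_to_active,
}

def simplify_sentence(sentence: str) -> str:
    phenomenon = detect(sentence)
    return OPERATIONS[phenomenon](sentence) if phenomenon else sentence

for s in ["The text was simplified by the editor.", "Eve owns a cat, which sleeps.", "Short sentences stay."]:
    print(simplify_sentence(s))
```

Keeping detection separate from the operations means new rules can be added to the table without touching the analysis step.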
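These operations can also be cascaded: the output of one operation can be passed to another, so a sentence with several complex constructs is simplified step by step.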
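Cascading can be sketched as re-running the rules on their own output until nothing changes. In this hypothetical example each rule performs at most one split per call, so a sentence with two relative clauses needs two rounds to be fully simplified. The rules and the `cascade` helper are illustrations, not the authors' implementation.

```python
from typing import Callable, List

def split_relative(sentence: str) -> List[str]:
    """Toy rule: 'A, which B.' -> ['A.', 'It B.'] (one split per call)."""
    head, sep, tail = sentence.partition(", which ")
    if not sep:
        return [sentence]
    return [f"{head}.", f"It {tail}"]

def split_coordination(sentence: str) -> List[str]:
    """Toy rule: 'A, and B.' -> ['A.', 'B.'] with B capitalised."""
    head, sep, tail = sentence.partition(", and ")
    if not sep:
        return [sentence]
    return [f"{head}.", tail[0].upper() + tail[1:]]

RULES: List[Callable[[str], List[str]]] = [split_relative, split_coordination]

def cascade(sentences: List[str], max_rounds: int = 10) -> List[str]:
    """Re-apply every rule to every output sentence until nothing changes (or max_rounds is hit)."""
    for _ in range(max_rounds):
        out: List[str] = []
        for s in sentences:
            parts = [s]
            for rule in RULES:
                parts = [p for part in parts for p in rule(part)]
            out.extend(parts)
        if out == sentences:
            break
        sentences = out
    return sentences

print(cascade(["Eve owns a cat, which chases a mouse, which hides."]))
print(cascade(["Eve owns a cat, and Bob owns a dog, which barks."]))
```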
Future work:
To assess whether the sentences were correctly simplified, a manual evaluation is necessary. The output of the rule-based simplifier cannot be compared automatically with the annotated corpus, because the sentences in the corpus have undergone operations that the simplifier does not perform (such as lexical substitution). The authors are currently preparing this manual evaluation.
That brings us to the end of the second paper. Hope you guys found it clear :)
Cya...
Are the code and the corpus that were used for this work available somewhere?