Tuesday, February 1, 2011

Automatic Induction of Rules for Text Simplification

Authors: R. Chandrashekar, B. Srinivas

This paper aims at converting complex sentences into simple sentences. Before getting into the details, let's have a quick overview of what the authors are aiming at.
What is a complex sentence?
A complex sentence is a sentence with one independent clause and at least one dependent clause.
Example:
Talwinder Singh, who masterminded the Kanishka crash in 1984, was killed in a fierce two-hour encounter.

The simplified version of the above sentence is shown below:

Talwinder Singh was killed in a fierce two-hour encounter. Talwinder Singh masterminded the Kanishka crash in 1984.

The simplification above was done by hand; the paper aims to automate this process.

The paper splits the simplification process into two stages:
1. Analysis phase: this provides a structural description of the input.
2. Transformation phase: this uses the representation produced by the analysis phase to carry out the simplification.
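The two-stage design can be sketched in Python as below. Everything concrete here is a hypothetical toy for illustration: the analyser just tokenises on whitespace, and the single rule splits at the word "and", which is not one of the paper's rules. The choice to re-apply the rules to their own output until no rule fires is also an assumption of this sketch, not something stated in the post.

```python
from typing import Callable, List, Optional, Tuple

Chunk = Tuple[str, str]                                       # (text, label)
Analyser = Callable[[str], List[Chunk]]                       # phase 1
Rule = Callable[[List[Chunk]], Optional[List[List[Chunk]]]]   # phase 2

def simplify(sentence: str, analyse: Analyser, rules: List[Rule]) -> List[str]:
    """Analyse once, then apply transformation rules until none fires."""
    agenda = [analyse(sentence)]           # phase 1: structural description
    simple: List[List[Chunk]] = []
    while agenda:
        current = agenda.pop(0)
        for rule in rules:
            parts = rule(current)
            if parts is not None:
                agenda = parts + agenda    # assumption: outputs are simplified further
                break
        else:
            simple.append(current)         # no rule applies: keep as a final sentence
    return [" ".join(text for text, _ in s) for s in simple]

# Hypothetical toy analyser: every word is labelled "W" except "and" ("CC").
def toy_analyse(sentence: str) -> List[Chunk]:
    return [(w, "CC" if w == "and" else "W") for w in sentence.split()]

# Hypothetical toy rule: split at the first conjunction with material on both sides.
def split_at_conjunction(chunks: List[Chunk]) -> Optional[List[List[Chunk]]]:
    for i, (_, label) in enumerate(chunks):
        if label == "CC" and 0 < i < len(chunks) - 1:
            return [chunks[:i], chunks[i + 1:]]
    return None

if __name__ == "__main__":
    print(simplify("Ram sings and Sita dances and Hari watches",
                   toy_analyse, [split_at_conjunction]))
```

Keeping the analyser and the rules as plain functions makes it easy to swap in a finite-state or a dependency-based analysis without touching the transformation loop.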
Analysis phase
The paper describes two alternative methods for analysing text: 1. a finite-state grammar approach, and 2. a dependency-based approach.
In the finite-state grammar approach, articulation points are the places where a sentence can be split for simplification. Segments of a sentence between two articulation points may be extracted as simplified sentences. The nature of the segments delineated by the articulation points depends on the type of structural analysis performed. If sentences are viewed as linear strings of words, we could define articulation points to be, say, punctuation marks. If the words in the input are also tagged with part-of-speech information, we can split sentences based on the category of each word (for example, at relative pronouns).
On the other hand, if the sentence is annotated with phrasal bracketings, the beginnings and ends of phrases could also be articulation points.
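The first two views of articulation points (punctuation marks and part-of-speech tags) can be sketched in Python as follows. The tagged version of the example sentence and its simplified tag names (N, V, RelPron, Punct and so on) are hypothetical, chosen for illustration rather than taken from the paper's tagger.

```python
import string
from typing import List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, tag); the tag set is a hypothetical simplification

TAGGED = ("Talwinder/N Singh/N ,/Punct who/RelPron masterminded/V the/Det "
          "Kanishka/N crash/N in/Prep 1984/Num ,/Punct was/V killed/V in/Prep "
          "a/Det fierce/Adj two-hour/Adj encounter/N ./Punct")
EXAMPLE: List[Token] = [(w, t) for w, t in (item.rsplit("/", 1) for item in TAGGED.split())]

def punctuation_points(tokens: Sequence[Token]) -> List[int]:
    """Linear-string view: every token made only of punctuation is an articulation point."""
    return [i for i, (word, _) in enumerate(tokens)
            if all(ch in string.punctuation for ch in word)]

def tag_points(tokens: Sequence[Token], tags: Set[str]) -> List[int]:
    """Part-of-speech view: tokens whose tag is in `tags` are articulation points."""
    return [i for i, (_, tag) in enumerate(tokens) if tag in tags]

def segments(tokens: Sequence[Token], points: List[int]) -> List[str]:
    """Non-empty word sequences lying between consecutive articulation points."""
    cuts = [-1] + sorted(points) + [len(tokens)]
    result = []
    for start, end in zip(cuts, cuts[1:]):
        words = [word for word, _ in tokens[start + 1:end]]
        if words:
            result.append(" ".join(words))
    return result

if __name__ == "__main__":
    print(segments(EXAMPLE, punctuation_points(EXAMPLE)))
    print(segments(EXAMPLE, tag_points(EXAMPLE, {"RelPron"})))
```

On the example, the punctuation view yields "Talwinder Singh", "who masterminded the Kanishka crash in 1984" and "was killed in a fierce two-hour encounter". These segments are not yet well-formed sentences on their own, which is why rules such as the relative-clause rule are needed to reassemble them.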

For example, a sentence containing a relative clause, once annotated with phrasal bracketing, can be simplified into two sentences using the following rule:

If a sentence starts with some segment W and a noun phrase (X:NP), and is then followed by a phrase of the form (, RelPron Y ,) followed by some segment (Z), where Y and Z are arbitrary sequences of words, then the sentence may be simplified into two sentences: the sequence (W X) followed by (Z), and the sequence (X) followed by (Y).
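The rule can be sketched in Python over a sentence that has already been chunked into labelled phrases. The chunk representation (a list of (text, label) pairs, with each comma as its own chunk and nested phrases flattened) and the bracketing of the example, including the "in 1984" PP chunk, are assumptions for illustration; the paper's own rule format may differ.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (text, label); a comma is its own chunk labelled ","

def split_relative_clause(chunks: List[Chunk]) -> Optional[Tuple[List[Chunk], List[Chunk]]]:
    """W X:NP , RelPron Y , Z  ->  (W X Z) and (X Y); None if the pattern does not match."""
    for i in range(len(chunks) - 2):
        if (chunks[i][1], chunks[i + 1][1], chunks[i + 2][1]) != ("NP", ",", "RelPron"):
            continue
        # The relative clause Y runs from just after the pronoun up to the next comma.
        commas = [j for j in range(i + 3, len(chunks)) if chunks[j][1] == ","]
        if not commas:
            continue
        j = commas[0]
        w, x, y, z = chunks[:i], chunks[i], chunks[i + 3:j], chunks[j + 1:]
        if y and z:
            return w + [x] + z, [x] + y
    return None

def render(chunks: List[Chunk]) -> str:
    """Join chunk texts into a sentence ending with a full stop."""
    return " ".join(text for text, _ in chunks) + "."

# Hypothetical chunking of the example sentence (final full stop omitted).
SENTENCE: List[Chunk] = [
    ("Talwinder Singh", "NP"), (",", ","), ("who", "RelPron"),
    ("masterminded", "V"), ("the Kanishka crash", "NP"), ("in 1984", "PP"),
    (",", ","), ("was killed", "V"), ("in a fierce two-hour encounter", "PP"),
]

if __name__ == "__main__":
    result = split_relative_clause(SENTENCE)
    if result is not None:
        main_clause, relative_clause = result
        print(render(main_clause), render(relative_clause))
```

Returning None when the pattern does not match lets a caller try other rules in turn.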

Applying this to the example shown above:

[Talwinder Singh]:NP , who:RelPron masterminded:V [the Kanishka crash]:NP [in 1984]:PP , [was killed]:V [in [a fierce two-hour encounter]:NP]:PP .

This is of the form:
W X:NP , RelPron Y , Z -> W X:NP Z . X:NP Y .

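Applying the rule, with W empty, X = [Talwinder Singh], Y = "masterminded the Kanishka crash in 1984" and Z = "was killed in a fierce two-hour encounter", we get: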

Talwinder Singh was killed in a fierce two-hour encounter. Talwinder Singh masterminded the Kanishka crash in 1984.
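The rewrite notation W X:NP , RelPron Y , Z -> W X:NP Z . X:NP Y . can also be read as a pattern over the bracketed annotation string itself. Below is a hedged Python sketch that encodes it as a regular expression. The regex, the requirement that X be a single unnested NP bracket, the annotated input string, and the helper names are illustrative assumptions, not the paper's implementation.

```python
import re
from typing import Optional

# W X:NP , RelPron Y , Z .  encoded as a regex over the annotated string.
RULE = re.compile(
    r"^(?P<W>.*?)(?P<X>\[[^\[\]]+\]:NP) , \S+:RelPron (?P<Y>.+?) , (?P<Z>.+?) \.$"
)

def strip_annotation(annotated: str) -> str:
    """Remove brackets and labels, leaving only the words."""
    text = re.sub(r"\]:\w+", "", annotated)        # closing brackets with their labels
    text = re.sub(r":\w+", "", text.replace("[", ""))  # opening brackets, word labels
    return " ".join(text.split())

def apply_rule(annotated: str) -> Optional[str]:
    """Rewrite to 'W X Z. X Y.' or return None if the rule does not match."""
    m = RULE.match(annotated)
    if m is None:
        return None
    first = strip_annotation(f"{m['W']} {m['X']} {m['Z']}")
    second = strip_annotation(f"{m['X']} {m['Y']}")
    return f"{first}. {second}."

ANNOTATED = ("[Talwinder Singh]:NP , who:RelPron masterminded:V "
             "[the Kanishka crash]:NP [in 1984]:PP , [was killed]:V "
             "[in [a fierce two-hour encounter]:NP]:PP .")

if __name__ == "__main__":
    print(apply_rule(ANNOTATED))
```

The non-greedy Y group stops at the first " , " after the relative pronoun, which matches the rule's reading that the relative clause ends at the next comma.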
To be continued...
