Project Text Simplification: Motivations and Methods for Text Simplification

Hello.

The authors of the above paper are R. Chandrasekar, Christine Doran and B. Srinivas.

As the title suggests, the paper discusses the reasons for text simplification and the methods used to achieve it.

They say that to simplify a sentence we need an idea of its structure, so that we can identify the components to be separated out. A parser could be used to obtain the complete structure of the sentence. However, since parsers are prone to errors on long and complex sentences, the authors use two alternatives to a full parser for simplification.

The first approach uses a Finite State Grammar (FSG) to produce noun and verb groups, while the second uses a supertagging model to produce dependency linkages.

Now let us discuss the reasons for text simplification:
1) Simple sentences are easier for both programs and human readers to process.
2) Simple sentences are easier to parse because they involve less ambiguity.
3) Simple sentences improve the quality of machine translation.
4) Information retrieval becomes easier, i.e. only the specific relevant sentences need be retrieved in response to a query.
5) Simplification can be used to weed out irrelevant text with greater precision, and thus aid in summarization.
6) Simplified text is clearer.

Simplification is a two-step process: first obtain the structure of the sentence, then apply simplification rules to that structure to identify the components that can be simplified.

In order to simplify, one needs to identify the articulation points, i.e. the points where the sentence can be logically split. Possible articulation points include the beginnings and ends of phrases, punctuation marks, subordinating and coordinating conjunctions, and relative pronouns.
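As a rough illustration of the idea, the sketch below marks the articulation points that can be found from the words alone (punctuation, conjunctions and relative pronouns). The word lists and the function name `articulation_points` are assumptions made for this example and do not come from the paper; phrase boundaries are left out because they need a chunker.

```python
import re

# Small illustrative word lists (assumptions of this sketch, not from the paper).
RELATIVE_PRONOUNS = {"who", "whom", "whose", "which"}
COORDINATING = {"and", "but", "or", "nor"}
SUBORDINATING = {"because", "although", "while", "if", "when"}

def articulation_points(sentence):
    """Return (token_index, token, kind) for each candidate split point."""
    # Words (hyphenated words stay whole) or single punctuation characters.
    tokens = re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)
    points = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if not tok[0].isalnum():
            kind = "punctuation"
        elif low in RELATIVE_PRONOUNS:
            kind = "relative_pronoun"
        elif low in COORDINATING:
            kind = "coordinating_conjunction"
        elif low in SUBORDINATING:
            kind = "subordinating_conjunction"
        else:
            continue  # ordinary word, not a split candidate
        points.append((i, tok, kind))
    return points

example = "Talwinder Singh, who masterminded the Kanishka crash in 1984, was killed."
for point in articulation_points(example):
    print(point)
```

A word list like this only proposes candidates; whether a sentence should really be split at a given point is decided by the simplification rules.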

These articulation points are used to define a set of rules that map an original sentence pattern to a simpler sentence pattern. The rules are applied repeatedly until none of them applies any more.
Example:
Original: Talwinder Singh, who masterminded the Kanishka crash in 1984, was killed in a fierce two-hour encounter...
Simplified: Talwinder Singh was killed in a fierce two-hour encounter ... Talwinder Singh masterminded the Kanishka crash in 1984.

FSG-based Simplification:

Here we treat a sentence as a sequence of word groups, or chunks, and take the chunk boundaries as articulation points.
Chunking lets us describe both the syntax of a sentence and the structure of the simplification rules at a coarser granularity, since we no longer need to be concerned with the internal structure of the chunks.

Each chunk is a word group consisting of a verb phrase or a noun phrase, with some attached modifiers. The noun phrase recognizer also marks the number (singular/plural) of the phrase. The verb phrase recognizer provides some information on tense, voice and aspect.

The chunked sentences are then simplified using a set of ordered simplification rules.

An example rule that simplifies sentences with a relative pronoun:

X:NP, RelPron Y, Z -> X:NP Z. X:NP Y

The rule is interpreted as follows. If a sentence starts with a noun phrase (X:NP), and is followed by a phrase with a relative pronoun, of the form (, RelPron Y ,), followed by some (Z), where Y and Z are arbitrary sequences of words, then the sentence may be simplified into two sentences, namely the sequence (X) followed by (Z), and (X) followed by (Y). The resulting sentences are then recursively simplified, to the extent possible.
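The relative-pronoun rule can be sketched on a chunked sentence as follows. This is only a minimal illustration: the chunk representation as (tag, text) pairs, the tags `PUNCT`, `RELPRON` and `PP`, and the function names are assumptions made for the example, not the paper's actual data format or implementation.

```python
def apply_relative_rule(chunks):
    """Try X:NP, RelPron Y, Z -> X:NP Z. X:NP Y on a list of (tag, text) chunks.

    Returns the two resulting chunk lists, or None if the rule does not match.
    """
    if len(chunks) < 4 or chunks[0][0] != "NP":
        return None
    if chunks[1] != ("PUNCT", ",") or chunks[2][0] != "RELPRON":
        return None
    # Y runs up to the next comma; Z is everything after that comma.
    for end in range(3, len(chunks)):
        if chunks[end] == ("PUNCT", ","):
            break
    else:
        return None
    x, y, z = chunks[0], chunks[3:end], chunks[end + 1:]
    if not y or not z:
        return None
    return [x] + z, [x] + y

def simplify(chunks):
    """Apply the rule recursively until it no longer matches."""
    result = apply_relative_rule(chunks)
    if result is None:
        return [chunks]
    simplified = []
    for part in result:
        simplified.extend(simplify(part))
    return simplified

def render(chunks):
    """Join chunk texts back into a sentence string."""
    return " ".join(text for _, text in chunks) + "."

sentence = [
    ("NP", "Talwinder Singh"), ("PUNCT", ","), ("RELPRON", "who"),
    ("VP", "masterminded"), ("NP", "the Kanishka crash"), ("PP", "in 1984"),
    ("PUNCT", ","), ("VP", "was killed"), ("PP", "in a fierce two-hour encounter"),
]
for simple in simplify(sentence):
    print(render(simple))
```

Because the rule only looks at chunk tags and boundaries, it never needs to inspect the words inside a chunk, which is the coarser granularity that chunking provides.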

A Dependency-based model:
This model is based on a simple dependency representation provided by LTAG (Lexicalized Tree Adjoining Grammar).

LTAG: The grammar consists of elementary trees, which are of two kinds: initial trees and auxiliary trees.
Initial trees cover nouns, PPs, simple sentences, etc.
Auxiliary trees cover relative clauses, adverbials, etc.

Supertagging: LTAG localizes dependencies by requiring that the elements which depend on each other be present in the same elementary tree.
As a result of this localization, a lexical item may be associated with more than one elementary tree. We call these elementary trees supertags.
We use a trigram model to disambiguate the supertags, so as to assign one supertag to each word, in a process called supertagging.
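To make the trigram step concrete, here is a small sketch of trigram-based disambiguation using a Viterbi search over pairs of tags. The supertag names (`Det`, `N_head`, `N_mod`, `V_intrans`), the probability tables and the `floor` fallback for unseen events are all toy assumptions invented for the illustration; they are not taken from the paper or from any trained model.

```python
import math

def supertag(words, candidates, trigram, emit, floor=1e-6):
    """Pick one supertag per word with a trigram model (Viterbi search).

    candidates[w]       -> list of possible supertags for word w
    trigram[(a, b, c)]  -> P(c | a, b)
    emit[(w, t)]        -> P(w | t)
    Unseen events fall back to the small constant `floor`.
    """
    start = "<s>"
    # best[(previous tag, current tag)] = (log-probability, tags chosen so far)
    best = {(start, start): (0.0, [])}
    for w in words:
        new_best = {}
        for (a, b), (score, seq) in best.items():
            for c in candidates[w]:
                s = (score
                     + math.log(trigram.get((a, b, c), floor))
                     + math.log(emit.get((w, c), floor)))
                if (b, c) not in new_best or s > new_best[(b, c)][0]:
                    new_best[(b, c)] = (s, seq + [c])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Toy model: every name and number below is made up for the example.
candidates = {
    "the": ["Det"],
    "price": ["N_head", "N_mod"],   # head noun vs. noun used as a modifier
    "rose": ["V_intrans", "N_head"],
}
trigram = {
    ("<s>", "<s>", "Det"): 0.5,
    ("<s>", "Det", "N_head"): 0.6,
    ("<s>", "Det", "N_mod"): 0.4,
    ("Det", "N_head", "V_intrans"): 0.7,
    ("Det", "N_head", "N_head"): 0.05,
    ("Det", "N_mod", "N_head"): 0.6,
    ("Det", "N_mod", "V_intrans"): 0.05,
}
emit = {
    ("the", "Det"): 0.5,
    ("price", "N_head"): 0.01,
    ("price", "N_mod"): 0.01,
    ("rose", "V_intrans"): 0.02,
    ("rose", "N_head"): 0.001,
}
print(supertag(["the", "price", "rose"], candidates, trigram, emit))
```

With these toy numbers the search picks the head-noun reading of "price" and the verb reading of "rose", since that sequence has the highest combined trigram and lexical probability.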

To establish the dependency links among the words of the sentence, we exploit the dependency information present in the supertags. Each supertag associated with a word allocates slots for the arguments of the word. These slots have a polarity value reflecting their orientation with respect to the anchor of the supertag. Also associated with a supertag is a list of internal nodes that appear in the supertag. Using this information, a simple algorithm may be used to annotate the sentence with dependency links.

EVALUATION:

The objective of the evaluation is to examine the advantages of the dependency-based model (DSM) over the FSG-based model for simplification. In the FSG approach, since the input to the simplifier is a set of noun and verb groups, the rules for the simplifier have to identify basic predicate-argument relations to ensure that the right chunks remain together in the output. The simplifier in the DSM has access to information about argument structure, which makes it much easier to specify simplification patterns involving complete constituents.
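The benefit of having dependency links can be sketched as follows: once every word points to its head, a complete constituent such as a relative clause is simply the set of words below the clause's verb. The head-index encoding, the word list and the function names here are assumptions made for this illustration, not the format produced by the paper's system, and the sketch assumes the relative pronoun is the subject of its clause.

```python
RELATIVE_PRONOUNS = {"who", "which", "that"}

def subtree(heads, root):
    """Indices of `root` and of every word that depends on it, directly or not."""
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return sorted(found)

def split_relative_clause(words, heads, rel_verb):
    """Split off the relative clause headed by words[rel_verb].

    heads[i] is the index of the head of word i, or -1 for the sentence root.
    """
    clause = subtree(heads, rel_verb)
    # The noun phrase the clause modifies, without the clause itself.
    noun_phrase = [i for i in subtree(heads, heads[rel_verb]) if i not in clause]
    main = [i for i in range(len(words)) if i not in clause]
    second = noun_phrase + [i for i in clause
                            if words[i].lower() not in RELATIVE_PRONOUNS]

    def render(indices):
        return " ".join(words[i] for i in indices) + "."

    return render(main), render(second)

words = ["Talwinder", "Singh", "who", "masterminded", "the", "crash", "was", "killed"]
#        Talwinder->Singh, Singh->killed, who->masterminded, masterminded->Singh,
#        the->crash, crash->masterminded, was->killed, killed is the root
heads = [1, 7, 3, 1, 5, 3, 7, -1]
print(split_relative_clause(words, heads, rel_verb=3))
```

No rule here has to guess where the clause ends: the links keep "the crash" with "masterminded" and "was killed" with the main sentence, which is the kind of grouping the FSG rules have to work out from chunk sequences.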

Tuesday, March 1, 2011
