Project Text Simplification: Complex Lexico-Syntactic Reformulation of Sentences using Typed Dependency Representations

Author: Advaith Siddharthan, Department of Computing Science, University of Aberdeen

Authors may choose one formulation of a given content over another for several reasons: to avoid shifts in focus, to handle issues of salience and end weight, and to account for differences in reading skill and domain knowledge among readers. This paper presents an approach to automating such complex reformulation. The goal is to reformulate complex sentences so that they are easier to understand for readers with low literacy levels.

Consider the following four discourse markers for causation studied by the author: cause, because of, because and cause of. The formulations below differ in the lexico-syntactic properties of the discourse marker.

Example (1)
a. An incendiary device caused the explosion. [A-CAUSE-B]
b. The explosion occurred because of an incendiary device. [B-BECAUSEOF-A]
c. The explosion occurred because there was an incendiary device. [B-BECAUSE-A]
d. The cause of the explosion was an incendiary device. [CAUSEOF-B-A]

Here A denotes the cause ("an incendiary device") and B denotes the effect ("the explosion").

The discourse markers can be verbs (cause), prepositions (because of), conjunctions (because) or nouns (cause of). Additionally, the order of presentation can be varied, giving four more forms:

(1) e. The explosion was caused by an incendiary device. [B-CAUSEBY-A]
f. Because of an incendiary device, the explosion occurred. [BECAUSEOF-A-B]
g. Because there was an incendiary device, the explosion occurred. [BECAUSE-A-B]
h. An incendiary device was the cause of the explosion. [A-CAUSEOF-B]

These examples show that some formulations of a given content can be more felicitous than others. For instance, (1e) "The explosion was caused by an incendiary device." is preferable to (1g) "Because there was an incendiary device, the explosion occurred."
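To make the pattern notation concrete, the eight formulations of Example (1) can be produced from a single cause A and effect B by filling surface templates. This is only a sketch under stated assumptions. The templates, the verb "occurred" and the existential "there was" are hypothetical choices that suit an event noun such as "the explosion". They are not the reformulation method described in the paper.

```python
# Hypothetical surface templates for the eight causal formulations in
# Example (1). A is the cause and B the effect; "occurred" and "there was"
# are assumptions that only suit event nouns such as "the explosion".
TEMPLATES = {
    "A-CAUSE-B": "{A} caused {B}.",
    "B-BECAUSEOF-A": "{B} occurred because of {A}.",
    "B-BECAUSE-A": "{B} occurred because there was {A}.",
    "CAUSEOF-B-A": "the cause of {B} was {A}.",
    "B-CAUSEBY-A": "{B} was caused by {A}.",
    "BECAUSEOF-A-B": "because of {A}, {B} occurred.",
    "BECAUSE-A-B": "because there was {A}, {B} occurred.",
    "A-CAUSEOF-B": "{A} was the cause of {B}.",
}


def realize(pattern: str, a: str, b: str) -> str:
    """Fill the template for `pattern` and capitalise the first letter."""
    text = TEMPLATES[pattern].format(A=a, B=b)
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for pattern in TEMPLATES:
        print(f"[{pattern}] {realize(pattern, 'an incendiary device', 'the explosion')}")
```

The templates encode only the word order and the discourse marker. Deciding which of the eight to choose for a given reader is the part the templates leave open.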

Related work on text reformulation:

1. Discourse Connectives and Comprehension

This line of work involved manually reformulating complex sentences. The sentences were rewritten to make the language more accessible or the content more transparent.

Drawback: Manual reformulation depends on how a particular person perceives the text.

Example (2)
a. Because Mexico allowed slavery, many Americans and their slaves moved to Mexico during that time.
b. Many Americans and their slaves moved to Mexico during that time, because Mexico allowed slavery.

Version (b) would be preferred for children who can grasp causation but who are not yet comfortable with alternative clause orders.
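For contrast, the reordering from (2a) to (2b) can be approximated with a purely string-level pattern. The sketch below is hypothetical and deliberately naive. It assumes that the fronted because-clause ends at the first comma.

```python
import re

# Hypothetical surface pattern: "Because <reason>, <main clause>."
# Assumes the because-clause ends at the first comma in the sentence.
FRONTED_BECAUSE = re.compile(r"^Because (?P<reason>[^,]+), (?P<main>.+)\.$")


def postpose_because(sentence: str) -> str:
    """Move a sentence-initial because-clause after the main clause."""
    m = FRONTED_BECAUSE.match(sentence)
    if m is None:
        return sentence  # pattern did not apply; leave the sentence unchanged
    main = m.group("main")
    # Capitalise the new first word and append the postposed reason clause.
    return f"{main[0].upper()}{main[1:]}, because {m.group('reason')}."


print(postpose_because("Because Mexico allowed slavery, many Americans and their "
                       "slaves moved to Mexico during that time."))
```

A surface pattern like this fails as soon as the because-clause itself contains a comma. This brittleness is why the approaches discussed below operate on parsed representations.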

2. Connectives and Text (Re)Generation

Much of the work on (re)generating text based on discourse connectives aims to simplify text in certain ways, to make it more accessible to particular classes of readers. The PSET project, which I have already blogged about, considered simplifying news reports for aphasic readers. That work mainly focused on lexical simplification, replacing difficult words with simpler ones. The syntactic simplification in PSET was restricted to string substitution and sentence splitting based on pattern matching over chunked text. The technique in this paper aims to extend these strands of research by allowing more sophisticated insertion, deletion and substitution operations that reorganise and modify content within a sentence.

Drawback: However, to date, these systems do not consider syntactic reformulations of the type we are interested in.

3. Sentence Compression:

Sentence compression is a research area that aims to shorten sentences in order to summarise their main content. Approaches to sentence compression focus on deletion operations, mostly performed low down in the parse tree to remove modifiers.

Drawback: However, given their focus on sentence compression, they restrict themselves to local transformations near the bottom of the parse tree.
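The deletion operations described above can be sketched as local pruning of modifier nodes. This is a minimal illustration that assumes a toy tuple-based tree and a hypothetical set of Penn Treebank-style modifier tags. It deletes every such node unconditionally, whereas a real compressor would decide which modifiers to drop.

```python
from typing import List, Union

# A parse tree is (label, child, ...); a leaf child is a plain word string.
Tree = Union[str, tuple]

# Hypothetical modifier labels to prune (Penn Treebank-style tags assumed).
MODIFIER_LABELS = {"JJ", "RB", "ADJP", "ADVP"}


def compress(tree: Tree) -> Tree:
    """Delete modifier subtrees anywhere in the tree, keeping all other nodes."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    kept = [compress(child) for child in children
            if isinstance(child, str) or child[0] not in MODIFIER_LABELS]
    return (label, *kept)


def leaves(tree: Tree) -> List[str]:
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


sentence = ("S",
            ("NP", ("DT", "A"), ("JJ", "small"), ("JJ", "incendiary"), ("NN", "device")),
            ("VP", ("VBD", "caused"),
             ("NP", ("DT", "the"), ("JJ", "huge"), ("NN", "explosion"))))
print(" ".join(leaves(compress(sentence))))  # A device caused the explosion
```

JJ nodes sit directly above the words, so every deletion here happens at the bottom of the tree. Operations of this kind cannot reorder clauses or change the discourse marker.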

Regeneration using Transfer Rules

In this section, we first describe our data and then report our experience of performing text reformulation using three representations: phrasal parse trees, Minimal Recursion Semantics (MRS) and typed dependencies.

DATA:

We use a corpus of 144 examples of complex lexico-syntactic reformulations, such as those in Example (1) above.

1. Reformulation using Phrasal Parse Trees:

The following parse trees show the passive and active voice with "cause" as the verb. A transfer rule is derived by aligning nodes between the two parse trees, so that the rule contains only the differences in structure between them.

Passive voice: The explosion was caused by an incendiary device.

(NP (AT The) (NN1 explosion))
(VP (VBDZ be+ed)
    (VP (VVN cause+ed)
        (PP (II by)
            (NP (AT1 an) (JJ incendiary) (NN1 device)))))

Active voice: An incendiary device caused the explosion.

(NP (AT1 An) (JJ incendiary) (NN1 device))
(VP (VVD cause+ed)
    (NP (AT the) (NN1 explosion)))

Derived rule:

(??X0[NP])
(VP (VBZ be+s)
    (VP (VVN cause+ed) (PP (II by+) (??X1[NP]))))
↓
(??X1[NP])
(VP (VVZ cause+s) (??X0[NP]))

In the derived rule, the variable ??X0[NP] matches any node (subtree) with the label NP. In this example, ??X0 matches the NP "The explosion" and ??X1 matches the NP "an incendiary device".
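The following is a minimal sketch of how a transfer rule like this could be matched and applied. It assumes that trees are nested Python tuples of the form (label, child, ...). For the sketch, the two fragments of each tree are wrapped in a hypothetical S node. The rule is written with the past-tense nodes of the example trees (be+ed, cause+ed) so that it matches the passive tree shown above. Capitalisation and morphological generation (cause+ed to caused) are left to a later step.

```python
import re
from typing import Dict, Optional, Union

# A tree is (label, child, ...) or a leaf token string.
Tree = Union[str, tuple]
VAR = re.compile(r"^\?\?(X\d+)\[(\w+)\]$")  # a rule variable such as "??X0[NP]"


def match(pat: Tree, tree: Tree, env: Dict[str, Tree]) -> bool:
    """Match a rule pattern against a tree, binding ??Xn[LABEL] variables in env."""
    if isinstance(pat, str):
        m = VAR.match(pat)
        if m:  # a variable matches any subtree whose root has the given label
            if not isinstance(tree, tuple) or tree[0] != m.group(2):
                return False
            env[m.group(1)] = tree
            return True
        return pat == tree  # a literal label or word must be identical
    if not isinstance(tree, tuple) or len(pat) != len(tree):
        return False
    return all(match(p, t, env) for p, t in zip(pat, tree))


def instantiate(template: Tree, env: Dict[str, Tree]) -> Tree:
    """Build the output tree, replacing each variable by its bound subtree."""
    if isinstance(template, str):
        m = VAR.match(template)
        return env[m.group(1)] if m else template
    return tuple(instantiate(t, env) for t in template)


def apply_rule(lhs: Tree, rhs: Tree, tree: Tree) -> Optional[Tree]:
    """Rewrite `tree` with the rule lhs -> rhs, or return None if it does not match."""
    env: Dict[str, Tree] = {}
    return instantiate(rhs, env) if match(lhs, tree, env) else None


# Passive-to-active rule; the enclosing S node is an assumption of this sketch.
LHS = ("S", "??X0[NP]",
       ("VP", ("VBDZ", "be+ed"),
        ("VP", ("VVN", "cause+ed"), ("PP", ("II", "by"), "??X1[NP]"))))
RHS = ("S", "??X1[NP]", ("VP", ("VVD", "cause+ed"), "??X0[NP]"))

PASSIVE = ("S", ("NP", ("AT", "The"), ("NN1", "explosion")),
           ("VP", ("VBDZ", "be+ed"),
            ("VP", ("VVN", "cause+ed"),
             ("PP", ("II", "by"),
              ("NP", ("AT1", "an"), ("JJ", "incendiary"), ("NN1", "device"))))))

print(apply_rule(LHS, RHS, PASSIVE))
```

Every non-variable node must match exactly. This is why a rule learnt from one parse stops applying when the parser produces a slightly different structure for another sentence.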

Drawback: In practice, however, the parse tree representation is too dependent on the grammar rules employed by the parser.

2. Reformulation using MRS (Minimal Recursion Semantics):

Another option is to use a bidirectional grammar and perform the transforms at a semantic level, on MRS representations.

Consider a very short example for ease of illustration:

Tom ate because of his hunger.

The MRS representation of the above sentence is shown below:

named(x5,Tom), _eat_v_1(e2,x5), _because_of(e2,x11), poss(x11,x16), pron(x16), _hunger_n(x11)

The grammar treats "because of" as a multiword expression and assigns it a semantics comparable to that of a preposition. A possible rule is as follows:

_because_of(e,x), P(e,y) <-> _cause_v_1(e10,x,y,l1), l1:P(e,y)

Here P stands for any predicate. After applying the rule and generating from the result, the example becomes:

His hunger caused Tom to eat.
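The rule can be sketched as a rewrite over a flat list of elementary predications. This is a simplified, hypothetical encoding. Real MRS structures also carry handles, scope constraints and variable properties, all of which are omitted here. The new variables e10 and l1 are taken verbatim from the rule rather than freshly generated.

```python
from typing import List, NamedTuple, Optional


class EP(NamedTuple):
    """A simplified elementary predication: optional label, predicate, arguments."""
    label: Optional[str]
    pred: str
    args: tuple


def because_of_to_cause(eps: List[EP]) -> List[EP]:
    """Apply _because_of(e,x), P(e,y) -> _cause_v_1(e10,x,y,l1), l1:P(e,y)."""
    marker = next((ep for ep in eps if ep.pred == "_because_of"), None)
    if marker is None:
        return eps
    e, x = marker.args
    # P is any other two-place predication whose first argument is the event e.
    p = next((ep for ep in eps
              if ep is not marker and len(ep.args) == 2 and ep.args[0] == e), None)
    if p is None:
        return eps
    y = p.args[1]
    # Drop the discourse marker and label P with l1 so _cause_v_1 can embed it.
    out = [ep._replace(label="l1") if ep is p else ep for ep in eps if ep is not marker]
    out.append(EP(None, "_cause_v_1", ("e10", x, y, "l1")))
    return out


tom = [
    EP(None, "named", ("x5", "Tom")),
    EP(None, "_eat_v_1", ("e2", "x5")),
    EP(None, "_because_of", ("e2", "x11")),
    EP(None, "poss", ("x11", "x16")),
    EP(None, "pron", ("x16",)),
    EP(None, "_hunger_n", ("x11",)),
]
for ep in because_of_to_cause(tom):
    print(ep)
```

Here x binds to x11 (the hunger) and y binds to x5 (Tom). The new _cause_v_1 predication therefore reads as "his hunger caused Tom [to eat]".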

Drawback: The problem, however, is that bidirectional grammars fail to parse ill-formed input. They also fail to analyse some well-formed input because of limited coverage of unusual constructions.

3. Reformulation using Typed Dependencies:

Let us consider the following example:

The explosion was caused by an incendiary device.

The set of typed dependencies for this sentence forms a tree. Phrase structure trees represent the nesting of constituents, with the actual words at the leaf nodes. Dependency trees, by contrast, have words at every node.

To generate from a dependency tree, we need to know the order in which to process nodes. In general, tree traversal will be “inorder”; i.e., left subtrees are processed before the root and right subtrees after. These generation decisions would usually be guided by the type of dependency and by statistical preferences for word and phrase order. However, we can simply use the word positions (1–8) from the original sentence.
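The following is a minimal sketch of this position-based traversal. The typed dependencies are an assumption: Stanford-style basic dependencies, in which "by" attaches to the verb with prep and "device" attaches to "by" with pobj. Each node is written as a (word, position) pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (word, position in the original sentence)

# Assumed basic typed dependencies for
# "The explosion was caused by an incendiary device." as (relation, head, dependent).
DEPS: List[Tuple[str, Node, Node]] = [
    ("det", ("explosion", 2), ("The", 1)),
    ("nsubjpass", ("caused", 4), ("explosion", 2)),
    ("auxpass", ("caused", 4), ("was", 3)),
    ("prep", ("caused", 4), ("by", 5)),
    ("pobj", ("by", 5), ("device", 8)),
    ("det", ("device", 8), ("an", 6)),
    ("amod", ("device", 8), ("incendiary", 7)),
]


def children_of(deps) -> Dict[Node, List[Node]]:
    """Index the dependents of each head node."""
    kids: Dict[Node, List[Node]] = defaultdict(list)
    for _, head, dep in deps:
        kids[head].append(dep)
    return kids


def generate(node: Node, kids: Dict[Node, List[Node]]) -> List[str]:
    """In-order traversal: emit the node and its subtrees sorted by original
    position, so dependents to the left of the head come before it."""
    items = sorted(kids.get(node, []) + [node], key=lambda n: n[1])
    words: List[str] = []
    for item in items:
        if item == node:
            words.append(node[0])
        else:
            words.extend(generate(item, kids))
    return words


print(" ".join(generate(("caused", 4), children_of(DEPS))))
# The explosion was caused by an incendiary device
```

Sorting subtrees by the position of their head word works here because each subtree covers a contiguous span of the sentence.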

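The first step of the transformation replaces one list of dependency predicates with another: the passive relations are deleted and the corresponding active relations are inserted. Applying this transformation creates a new dependency tree.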
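The replacement of one predicate list by another can be sketched as backtracking pattern matching over the dependency list. The Deletion and Insertion lists below are a hypothetical reconstruction, not the rule from the paper. They are written over assumed basic typed dependencies, with "by" as its own node (??X4). They are chosen to agree with the traversal order specification for ??X0 given below, in which ??X2 is the agent and ??X3 is the passive subject.

```python
from typing import Dict, List, Optional, Tuple

Dep = Tuple[str, str, str]  # (relation, head, dependent); nodes are "word:position"

# Assumed basic typed dependencies for
# "The explosion was caused by an incendiary device."
PASSIVE: List[Dep] = [
    ("det", "explosion:2", "The:1"),
    ("nsubjpass", "cause+ed:4", "explosion:2"),
    ("auxpass", "cause+ed:4", "be+ed:3"),
    ("prep", "cause+ed:4", "by:5"),
    ("pobj", "by:5", "device:8"),
    ("det", "device:8", "an:6"),
    ("amod", "device:8", "incendiary:7"),
]

# Hypothetical passive-to-active rule over node variables ??Xn.
DELETE: List[Dep] = [
    ("nsubjpass", "??X0", "??X3"),
    ("auxpass", "??X0", "??X1"),
    ("prep", "??X0", "??X4"),
    ("pobj", "??X4", "??X2"),
]
INSERT: List[Dep] = [("nsubj", "??X0", "??X2"), ("dobj", "??X0", "??X3")]


def unify(pattern: Dep, dep: Dep, env: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return env extended so that pattern equals dep, or None if impossible."""
    if pattern[0] != dep[0]:
        return None
    new = dict(env)
    for p, d in zip(pattern[1:], dep[1:]):
        if p.startswith("??"):
            if new.setdefault(p, d) != d:
                return None  # variable already bound to a different node
        elif p != d:
            return None
    return new


def match_all(patterns: List[Dep], deps: List[Dep],
              env: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Backtracking search for one binding under which every pattern is in deps."""
    if not patterns:
        return env
    for dep in deps:
        extended = unify(patterns[0], dep, env)
        if extended is not None:
            result = match_all(patterns[1:], deps, extended)
            if result is not None:
                return result
    return None


def apply_rule(deps: List[Dep]) -> Optional[List[Dep]]:
    """Delete the matched relations and add the instantiated new ones."""
    env = match_all(DELETE, deps, {})
    if env is None:
        return None

    def bind(pattern: Dep) -> Dep:
        return tuple(env.get(term, term) for term in pattern)

    deleted = {bind(p) for p in DELETE}
    return [d for d in deps if d not in deleted] + [bind(p) for p in INSERT]


for dep in apply_rule(PASSIVE):
    print(dep)
```

After the rule applies, "be+ed:3" and "by:5" are no longer attached to the tree, so they are not generated.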

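If this new tree were traversed using the original word positions, "the explosion" (positions 1 and 2) would still precede the verb (position 4), and "an incendiary device" (positions 6 to 8) would still follow it. We would therefore generate "The explosion caused an incendiary device." For this reason, our transformation rules must specify the tree traversal order in addition to the Deletion and Insertion operations. These specifications only need to be provided for nodes where the transform has reordered subtrees (“??X0”, which instantiates to “cause+ed:4” in the trees). Our rule would thus include: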

3. Traversal Order Specifications:

(a) Node ??X0: [??X2, ??X0, ??X3]

This states that for node ??X0, the traversal order should be subtree ??X2, followed by the current node ??X0, followed by subtree ??X3. With this specification, we can traverse the tree using the original word order for nodes with no order specification, and the specified order where a specification exists. In the above instance, this would lead us to generate:

An incendiary device caused the explosion.
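The generation step with a traversal order specification can be sketched as follows. This sketch rests on three simplifying assumptions. The transformed tree is written out directly as the assumed result of the rule. Words are stored in lower case, and the first generated word is capitalised. A one-entry table mapping cause+ed to caused stands in for a real morphological generator.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (lemma+affix or word, original position)

# Assumed dependency tree after the passive-to-active transform.
ACTIVE: List[Tuple[str, Node, Node]] = [
    ("nsubj", ("cause+ed", 4), ("device", 8)),
    ("dobj", ("cause+ed", 4), ("explosion", 2)),
    ("det", ("explosion", 2), ("the", 1)),
    ("det", ("device", 8), ("an", 6)),
    ("amod", ("device", 8), ("incendiary", 7)),
]
SURFACE = {"cause+ed": "caused"}  # hypothetical stand-in for morphological generation

# "Node ??X0: [??X2, ??X0, ??X3]" with X0 = cause+ed:4, X2 = device:8, X3 = explosion:2.
ORDER: Dict[Node, List[Node]] = {
    ("cause+ed", 4): [("device", 8), ("cause+ed", 4), ("explosion", 2)],
}


def generate(node: Node, kids: Dict[Node, List[Node]],
             order: Dict[Node, List[Node]]) -> List[str]:
    """Use the specified order where one exists, else the original word positions."""
    items = order.get(node) or sorted(kids[node] + [node], key=lambda n: n[1])
    words: List[str] = []
    for item in items:
        if item == node:
            words.append(SURFACE.get(node[0], node[0]))
        else:
            words.extend(generate(item, kids, order))
    return words


def realise(root: Node, deps, order: Dict[Node, List[Node]]) -> str:
    """Generate a sentence string from the dependency tree rooted at `root`."""
    kids: Dict[Node, List[Node]] = defaultdict(list)
    for _, head, dep in deps:
        kids[head].append(dep)
    words = generate(root, kids, order)
    return " ".join([words[0].capitalize()] + words[1:]) + "."


print(realise(("cause+ed", 4), ACTIVE, {}))     # The explosion caused an incendiary device.
print(realise(("cause+ed", 4), ACTIVE, ORDER))  # An incendiary device caused the explosion.
```

Without the specification, the sketch reproduces the wrong word order that motivates it. With the specification, the subject subtree is emitted before the verb, and all other nodes keep their original order.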


Wednesday, March 2, 2011
