Project Text Simplification: Practical Simplification of English Newspaper Text

Authors: John Carroll, Guido Minnen, Yvonne Canning, Siobhan Devlin and John Tait, Computing and Information Systems, University of Sunderland.

Some people find newspaper text very difficult to understand. One cause is aphasia, a language disability that can result from a head injury or stroke. Aphasia is a major problem worldwide: the National Aphasia Association reports that one million Americans have this disability. This paper introduces a system that assists people with aphasia by automatically simplifying newspaper text available on the Internet.

The architecture of the system is as follows:

The system is divided into two main parts: the analyser and the simplifier. The analyser is in turn divided into three components: the lexical tagger, the morphological analyser and the parser. The simplifier consists of two components: the lexical simplifier and the syntactic simplifier.
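The two-part architecture can be sketched as a pipeline in which each component hands its output to the next. This is a minimal Python sketch under stated assumptions, not the system's actual implementation: the classes `Analyser` and `Simplifier`, the function `simplify_text`, and the order in which the two simplifiers run are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Each component is modelled as a function from one representation to the next.
Component = Callable[[Any], Any]


@dataclass
class Analyser:
    tagger: Component          # lexical tagger
    morph_analyser: Component  # morphological analyser
    parser: Component          # parser

    def analyse(self, text: Any) -> Any:
        # Components run in the order described: tagger, morphology, parser.
        return self.parser(self.morph_analyser(self.tagger(text)))


@dataclass
class Simplifier:
    lexical: Component    # lexical simplifier
    syntactic: Component  # syntactic simplifier

    def simplify(self, analysis: Any) -> Any:
        # Assumption: lexical first, then syntactic, following the order listed in the text.
        return self.syntactic(self.lexical(analysis))


def simplify_text(text: Any, analyser: Analyser, simplifier: Simplifier) -> Any:
    """Analyse the input text, then pass the analysis to the simplifier."""
    return simplifier.simplify(analyser.analyse(text))


if __name__ == "__main__":
    # Stub components that only record the order in which they are called.
    def mark(name: str) -> Component:
        return lambda x: f"{x} -> {name}"

    analyser = Analyser(mark("tagger"), mark("morph"), mark("parser"))
    simplifier = Simplifier(mark("lexical"), mark("syntactic"))
    print(simplify_text("text", analyser, simplifier))
    # prints: text -> tagger -> morph -> parser -> lexical -> syntactic
```

Keeping each component as a plain function makes it easy to swap in a different tagger or simplifier without changing the pipeline itself.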

How the system works:

The Analyser:

The newspaper text is first fed into the analyser, which consists of the lexical tagger, the morphological analyser and the parser. The lexical tagger assigns a part-of-speech tag to each word and punctuation mark in the sentences that may be difficult to understand. Next, the morphological analyser takes the output of the lexical tagger and improves on the original by introducing changes such as language translation, word definitions and thesaurus capability. Finally, the parser constructs a representation of the words that were tagged by the tagger and analysed by the morphological analyser.
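To show what tagged output might look like, here is a minimal sketch of a dictionary-lookup tagger that labels both words and punctuation. The lexicon, the Penn Treebank-style tag names and the helper functions `tokenize` and `tag` are hypothetical; the post does not describe how the real tagger works, and a practical tagger would typically use a statistical model rather than a hand-written word list.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon with Penn Treebank-style tags (an assumption).
LEXICON: Dict[str, str] = {
    "the": "DT", "council": "NN", "approved": "VBD", "plan": "NN",
}
# Tags for punctuation; unlisted punctuation falls back to "SYM".
PUNCT_TAGS: Dict[str, str] = {".": ".", ",": ",", "?": ".", "!": "."}


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Attach a part-of-speech tag to every token, punctuation included."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, PUNCT_TAGS.get(tok, "SYM")))
        else:
            # Unknown words default to "NN", a simple baseline guess.
            tagged.append((tok, LEXICON.get(tok.lower(), "NN")))
    return tagged


if __name__ == "__main__":
    print(tag(tokenize("The council approved the plan.")))
    # [('The', 'DT'), ('council', 'NN'), ('approved', 'VBD'),
    #  ('the', 'DT'), ('plan', 'NN'), ('.', '.')]
```

The list of (token, tag) pairs is the kind of input the later analysis stages would build on.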

The Simplifier:

The lexical simplifier is the main component of the system, where the actual simplification is carried out. For each complicated word, a file is created containing the synonyms for that entry. The simplifier reads this file and, depending on the level of simplification specified by the user, extracts some of the synonyms and queries the Oxford Psycholinguistic Database.
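The per-word synonym files and the level-dependent extraction can be sketched as follows. The file naming scheme (`<word>.syn`), the one-synonym-per-line format, and the `SYNONYMS_PER_LEVEL` mapping from the user's level to the number of synonyms kept are all hypothetical; the post does not specify them.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical mapping from the user's simplification level to the number of
# synonyms considered; the post does not say how the level is interpreted.
SYNONYMS_PER_LEVEL = {1: 1, 2: 3, 3: 5}


def write_synonym_file(directory: Path, word: str, synonyms: List[str]) -> Path:
    """Create one file per complicated word, one synonym per line (assumed format)."""
    path = directory / f"{word}.syn"
    path.write_text("\n".join(synonyms) + "\n", encoding="utf-8")
    return path


def extract_synonyms(path: Path, level: int) -> List[str]:
    """Read a synonym file and keep only as many entries as the level allows."""
    limit = SYNONYMS_PER_LEVEL[level]
    entries = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    return [e for e in entries if e][:limit]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_synonym_file(Path(tmp), "commence",
                                  ["begin", "start", "initiate", "launch"])
        print(extract_synonyms(path, 2))  # ['begin', 'start', 'initiate']
```

The extracted synonyms are the candidates whose frequency counts are then looked up in the database.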

The Oxford Psycholinguistic Database provides frequency counts for the input word and each of its extracted synonyms. The word with the highest frequency count becomes the replacement in the system's simplified output; if the original word has the highest frequency, it remains in the output text. The selected word is written to an output file, so that the text is reconstituted.
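The selection rule amounts to taking the most frequent word among the original and its synonyms. In this minimal sketch, the `FREQUENCY` table holds invented counts standing in for the database lookups, and `choose_word` and `simplify_tokens` are hypothetical helpers; the sketch returns a string instead of writing an output file.

```python
from typing import Dict, List

# Invented counts for illustration only; NOT values from the Oxford
# Psycholinguistic Database, which the real system queries instead.
FREQUENCY: Dict[str, int] = {
    "commence": 10, "begin": 80, "start": 120,
    "purchase": 30, "buy": 95,
}


def choose_word(word: str, synonyms: List[str], freq: Dict[str, int]) -> str:
    """Return the candidate with the highest frequency count.

    The original word is placed first, and max() keeps the first of equal
    maxima, so the original stays in the text unless a synonym is strictly
    more frequent. Words missing from the table count as 0.
    """
    candidates = [word] + synonyms
    return max(candidates, key=lambda w: freq.get(w.lower(), 0))


def simplify_tokens(tokens: List[str], synonym_table: Dict[str, List[str]],
                    freq: Dict[str, int]) -> str:
    """Replace each token by its most frequent candidate and rejoin the text."""
    output = [choose_word(t, synonym_table.get(t.lower(), []), freq) for t in tokens]
    return " ".join(output)


if __name__ == "__main__":
    synonyms = {"commence": ["begin", "start"], "buy": ["purchase"]}
    print(simplify_tokens(["They", "will", "commence", "work"], synonyms, FREQUENCY))
    # They will start work
    print(choose_word("buy", ["purchase"], FREQUENCY))  # buy
```

Breaking ties in favour of the original word is a design choice in this sketch; the post only states that the original is kept when it has the highest frequency.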

Because it does not perform an elaborate analysis of the meaning of the text, lexical simplification could change that meaning. In practice, however, this is unlikely to be a problem: less frequent words, which are the candidates for replacement, often have a very specific meaning and are therefore less likely to be ambiguous.

Tuesday, February 1, 2011