Project Text Simplification: January 2011

Monday, January 31, 2011

Automatic Summarization for Text Simplification: Evaluating Text Understanding by Poor Readers

Authors: Paulo R. A. Margarido, Thiago A. S. Pardo, Gabriel M. Antonio, Vinícius B. Fuentes, Rachel Aires, Sandra M. Aluísio, Renata P. M. Fortes

This paper presents experiments on text summarization and text simplification (TS). The authors show that each simplification approach has a different effect on readers with varied levels of literacy, and that all of them improve text understanding to some degree.

In this paper the authors claim to be the first to effectively use summarization for TS and to evaluate its effectiveness for text understanding.

Since we are all already familiar with the basics of text simplification, let us jump directly into some of the summarization methods:

Method Based on Keyword Extraction: This is a simple technique. Given a text and a set of keywords, any sentence that contains at least one keyword is selected for the summary. The keywords were selected by looking for word patterns classified as <NOUN> or <NOUN+PREPOSITION+NOUN>, or adjectives, at any position in the text. A second technique builds on this one: instead of taking every sentence that contains a keyword, all sentences are first ranked by the number of keywords they contain and the highest-ranked ones are then selected to form the summary.
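Both keyword-based variants can be sketched in a few lines of Python. This is only a rough illustration, not the authors' system: the keywords are passed in by hand (the paper finds them with part-of-speech patterns), the sentence splitter is naive, and the function names and the `top_n` parameter are made up for this example.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def keyword_count(sentence, keywords):
    # Number of distinct keywords that appear in the sentence.
    words = set(re.findall(r"\w+", sentence.lower()))
    return len(words & keywords)

def summarize_any_keyword(text, keywords):
    # Variant 1: keep every sentence that contains at least one keyword.
    keywords = {k.lower() for k in keywords}
    return [s for s in split_sentences(text) if keyword_count(s, keywords) > 0]

def summarize_ranked(text, keywords, top_n=1):
    # Variant 2: rank sentences by keyword count and keep the top_n,
    # presented in their original order.
    keywords = {k.lower() for k in keywords}
    sentences = split_sentences(text)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: keyword_count(sentences[i], keywords),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]

text = ("The cat sat on the mat. Dogs bark loudly. "
        "The cat chased the dog across the mat.")
print(summarize_any_keyword(text, {"cat", "mat", "dog"}))
print(summarize_ranked(text, {"cat", "mat", "dog"}, top_n=1))
```

Note that "Dogs" does not match the keyword "dog" here, because the sketch does no stemming.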

Method Based on Gist Identification: GistSumm is one of the first summarizers created for Brazilian Portuguese and, to the best of the authors' knowledge, it is the system with the highest precision for this language. To produce the summary, the system first computes the frequency of every stem in the text. Each sentence receives a score, which is the sum of the frequencies of the stems it contains. The sentence with the highest score is then elected the gist sentence. The remaining sentences of the summary must satisfy two restrictions: they must have at least one stem in common with the gist sentence, and their scores must be above a threshold, which is the mean score of all sentences.
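The scoring steps described above can be tried out with a small Python sketch. It is not GistSumm itself: the `stem` function is a crude stand-in for a real stemmer, and the sketch assumes that a stem occurring twice in a sentence is counted twice in that sentence's score.

```python
import re
from collections import Counter

def stem(word):
    # Crude stand-in for a real stemmer: lowercase and drop a trailing "s".
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def gist_summary(sentences):
    stems = [[stem(w) for w in re.findall(r"\w+", s)] for s in sentences]
    freq = Counter(st for sent in stems for st in sent)
    # Sentence score = sum of the text-wide frequencies of its stems.
    scores = [sum(freq[st] for st in sent) for sent in stems]
    gist = max(range(len(sentences)), key=scores.__getitem__)
    threshold = sum(scores) / len(scores)  # mean score of all sentences
    gist_stems = set(stems[gist])
    keep = [i for i in range(len(sentences))
            if i == gist
            or (scores[i] > threshold and gist_stems & set(stems[i]))]
    return sentences[gist], [sentences[i] for i in keep]

gist, summary = gist_summary([
    "Cats chase mice",
    "Cats and dogs chase cats",
    "Birds sing",
    "Rain",
])
print(gist)
print(summary)
```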

Method Based on Machine Learning: This system uses several features to classify each sentence from the text according to its importance. Some of the features are sentence length and position, word frequency, presence of importance signaling phrases, and occurrence of proper nouns.

Methods Based on Graphs: A language-independent method based on Google's PageRank algorithm was recently presented under the name TextRank. It represents the sentences of a text as nodes in a graph and adds edges by measuring the similarity between sentences, which is basically computed by a word overlap measure. Versions of TextRank enriched with thesaurus synonym and antonym relations (to improve the word overlap measure) were evaluated for Portuguese and achieved very good results.
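To get a feel for the graph idea, here is a small Python sketch of a TextRank-style ranking. The assumptions are mine: similarity is plain Jaccard word overlap (the published method normalises the overlap differently), the damping factor of 0.85 and the fixed iteration count are the usual PageRank defaults, and no thesaurus relations are used.

```python
import re

def overlap(a, b):
    # Word-overlap similarity: shared words / all distinct words (Jaccard).
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    # Weighted, undirected graph: one node per sentence.
    weights = [[overlap(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbours,
        # in proportion to the edge weight.
        scores = [(1 - damping) / n + damping * sum(
                      scores[j] * weights[j][i] / out_sum[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores

sentences = ["the cat sat on the mat",
             "the cat ate the fish",
             "stock prices fell sharply"]
for sentence, score in zip(sentences, textrank(sentences)):
    print(round(score, 3), sentence)
```

The two cat sentences share words and support each other, so they end up with higher scores than the unrelated third sentence.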

EVALUATION: Three experiments were conducted to evaluate all previous methods and define which one yields the best results. This would provide them with the best summarization tool to be used for TS.

Summarization can be used for TS purposes in varied ways: showing only the summary for the reader, showing the text with only the main sentence highlighted, showing the text with all important sentences highlighted, etc. Experiments were conducted on people with varied literacy levels.

Among people with up to 5 years of study: 66% considered that the summary was easier to understand; 100% considered that the original text and the text with the important sentences in bold were equally understandable; and 60% considered that the text with the main sentence in bold was more difficult to understand.

These numbers varied for people with up to 8-10 years of study. In general, the authors found that people at each literacy level find different simplification strategies useful: simplification could not help people with up to 2 years of study, summaries helped people with up to 5 years of study, the important sentences in bold helped people with up to 8 years of study, and the main sentence in bold helped people with more than 10 years of study.

Synonymy in Collocation Extraction

This paper is by Darren Pearce et al., School of Cognitive and Computing Sciences (COGS), University of Sussex.
It describes collocation extraction using WordNet and concentrates mainly on the methods employed for extracting collocations. Before going through the details, let us take a quick look at what collocations are.

A collocation is two or more words that often go together. These combinations just sound "right" to native English speakers, who use them all the time.
Ex: fast food, quick meal, strong coffee, etc. It's just a coincidence that all my examples are about food! :P

Complicated PaPeR Definition: A pair of words is considered a collocation if one of the words significantly prefers a particular lexical realisation of the concept the other represents.

The exact definition of a collocation varies from one author to another. It is often defined simply as a habitual word combination.

Techniques:
Collocations may have other words in between, for example: I break down the door; I broke down the door; I broke down the battered, old door. All of these contain the collocation (break-down, door) with a varying number of words in between.

1. Church and Hanks (1990): This technique uses mutual information to measure the strength of association between words. Potentially, this could be used directly for collocation extraction.

Problems: This leads to some strange collocations such as (doctor, hospital). Just because words occur together frequently does not mean they form a collocation.
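The mutual information score from technique 1 can be sketched in Python as below. This is a simplified illustration and not Church and Hanks' exact setup: it only looks at adjacent word pairs (they count co-occurrences inside a window of words), and the toy token list is made up.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    # Pointwise mutual information for adjacent word pairs:
    # log2( P(w1, w2) / (P(w1) * P(w2)) )
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = "strong coffee and strong tea and strong coffee".split()
for pair, score in sorted(pmi_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score means the pair occurs together more often than chance would predict, which is also why frequently co-occurring but non-collocational pairs such as (doctor, hospital) can score well.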

2. Smadja (1993): To overcome the above problem, the relationship between the two words is determined from the graph of the distribution of counts between the two collocates. If the spread is narrow and peaked, this indicates that there is a syntactic relation between the two words.

Problems: This relies on an implicit strategy for extracting syntactic information.

3. New Approach: This approach makes use of the WordNet database. With respect to a particular target word, it is possible to partition a synonym set into three disjoint subsets:

Those words which are collocations of the target word;
Those words which tend not to be used with the target word although, if used, do not lead to unnatural readings;
Those words which must not be used with the target word since they will lead to unnatural readings.

This last subset has been named anti-collocations.

In the paper the authors discuss examples that lead to a classification of potential collocations into 4 categories: collocation, potential, unknown and wrong.

They derive several formulas to measure collocation strength.

Formulation:

Occurrence count, c(w1, w2): the number of occurrences of word w1 in combination with word w2.
Co-occurrence set, CSw: for a given word w, at least two elements in the synset must have non-zero co-occurrence counts.
Candidate Collocation Synsets, CCSw: synsets are filtered with respect to w to obtain CCSw.
For a synset S that belongs to CCSw, a word w1 is selected as the element that co-occurs most frequently with the word w. Its corresponding frequency is f1.
The highest co-occurrence frequency of the remaining words, f11, is then calculated.
Collocation strength: the difference between the occurrence counts of these two top-ranked elements, f1 - f11, can be used to rate the 'collocation strength', s, in the following way: s = (f1 - f11) / f1.

Please refer to the paper for the formulas of the terms mentioned above.
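The collocation strength formula s = (f1 - f11) / f1 is easy to try out in Python. In this sketch the function name is made up and the co-occurrence counts are invented numbers for the target word "coffee"; a real run would take the synset from WordNet and the counts from a corpus.

```python
def collocation_strength(cooccurrence_counts):
    """cooccurrence_counts maps each synset member to its count with the target word."""
    nonzero = {w: c for w, c in cooccurrence_counts.items() if c > 0}
    if len(nonzero) < 2:
        # Fewer than two members co-occur with the target word:
        # the synset is not a candidate collocation synset.
        return None
    ranked = sorted(nonzero.items(), key=lambda item: item[1], reverse=True)
    (w1, f1), (_, f11) = ranked[0], ranked[1]
    return w1, (f1 - f11) / f1

# Hypothetical counts of each synonym appearing together with "coffee".
counts = {"strong": 90, "powerful": 10, "potent": 0}
print(collocation_strength(counts))
```

A strength close to 1 means the top-ranked word is strongly preferred over every other synonym in the synset, and a strength close to 0 means the two top synonyms are used about equally often.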

Conclusions and Future Work

This paper discusses several techniques for collocation extraction, among which the new technique makes extensive use of the WordNet resource. This includes lexical relations such as hypernymy and meronymy.

Drawbacks:

In the formulation above, it is assumed that for any synset there is one and only one element that forms a collocation with a particular target word. This is not always the case, as we saw above in the discussion of anti-collocations. Such situations could also be accounted for by a probabilistic approach.

Learning When to Simplify Sentences for Natural Text Simplification

Hi All......

IEEE papers are difficult to understand for most of us, in fact all of us, so in this post I elaborate on an IEEE paper which I read recently and have tried to simplify according to my understanding. I hope it helps you guys and is fun to read and understand too!

The paper is by Caroline Gasperin, Lucia Specia, Tiago F. Pereira and Sandra M. Aluisio.

The paper is concerned with readers in Brazil at two literacy levels: rudimentary and basic. It aims at producing text simplification tools that promote digital inclusion and accessibility for people with such levels of literacy, and possibly other kinds of reading disabilities. More specifically, the goal is to help these readers process documents available on the web. Additionally, it could help children learning to read texts of different genres, or adults who are learning to read and write.

There are two kinds of simplification: natural (for the basic literacy level) and strong (for the rudimentary level).
The difference between the two is the degree to which simplification operations are applied to the sentences.

The focus of this paper is on natural simplification. Based on observations made by the annotator (analyst), natural simplified text is produced when a sentence is simplified by splitting it. So the focus here is on which sentences to split and how to split them.

The authors say that none of the previous text simplification systems aims to provide varying degrees of simplification according to the user's needs. Moreover, none of the existing systems addresses the language under consideration (Brazilian Portuguese).

The corpus for simplification is taken from two Brazilian newspapers (Zero Hora and Folha de São Paulo).

A tool called the Simplification Annotation Editor is used by the annotator (analyst) for this manual simplification task.

A set of eleven simplification rules is applied to the original texts (for example non-simplification, replacing collocations, subject-verb-object order, changing to active voice, etc.).

In natural simplification, no fixed order of operations is imposed and the rules can be applied in any order, whereas strong simplification is driven by explicit rules (when and how to apply each operation). The ultimate result should be simplified text ..... :)

The sentence splitting operation, which is the focus of this paper, can usually be applied when a sentence contains apposition, relative clauses, or coordinate or subordinate clauses, but it is not a mandatory operation for natural simplification.

The parallel corpora of original and simplified texts:-

Number of sentences in the original, natural and strong corpora (Zero Hora):
Original: 2,116
Natural: 3,104
Strong: 3,537

In the simplified version the overall text length is longer than in the original, which was expected, since simplification usually yields the repetition of information in different sentences, particularly when splitting operations are performed.

Natural simplification system:-
A binary classifier is trained with a large number of features in order to identify which sentences should be split to produce a natural simplified text.

Feature set :- From the analysis of the annotated corpora, the authors extract a number of features which aim to describe the characteristics of the sentences involved (or not) in splitting operations, such as the number of words, characters, nouns, pronouns, verbs, etc. (there are 29 of them!).

To improve the performance of the classifier, feature selection was done in two ways: in the first case, all features that performed above the average accuracy were selected; in the second case, the features that caused the classifier's performance to drop below the average accuracy were selected. The best performing features were then added to the basic set.

Classification :- Sentences are tagged as positive instances if they were annotated as containing a splitting operation; otherwise they are negative.
Adding the best performing features to the basic (baseline) set yielded a slight increase in the performance of the classifier.

Simplification:- The binary classifier tells whether to split the sentence or not but the actual simplification, when recommended by the classifier, is performed by a rule-based system that implements simplification rules for all syntactic constructions that are considered complex.
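The two-stage design (first decide whether to split, then split with rules) can be illustrated with a toy Python sketch. Everything specific here is my own assumption and not the authors' system: the trained classifier is replaced by a hand-set word-count threshold, only two of the features are computed, and the single splitting rule only handles ", and" / ", but" coordination in English.

```python
import re

def features(sentence):
    # Two of the simple surface features mentioned above.
    words = re.findall(r"\w+", sentence)
    return {"n_words": len(words), "n_chars": len(sentence)}

def should_split(sentence, max_words=12):
    # Stand-in for the trained binary classifier: a hand-set length threshold.
    return features(sentence)["n_words"] > max_words

def split_on_coordination(sentence):
    # Toy rule: split at the first ", and " or ", but ".
    parts = re.split(r",\s+(?:and|but)\s+", sentence, maxsplit=1)
    if len(parts) < 2 or not parts[1]:
        return [sentence]
    first, second = parts
    return [first.rstrip(".") + ".", second[0].upper() + second[1:]]

def simplify(sentence):
    # Stage 1 decides whether to split; stage 2 applies the rule.
    return split_on_coordination(sentence) if should_split(sentence) else [sentence]

print(simplify("The committee approved the new budget after a long debate, "
               "and the mayor signed it the following morning."))
print(simplify("The cat sat."))
```

Keeping the decision and the rewriting separate means the decision step can be retrained on new annotated data without touching the splitting rules.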

Concluding remarks :-
They have presented a corpus-based system for natural text simplification, focusing on the sentence splitting operation as the main point of distinction between this and the strong level of simplification.
This simplification framework, a corpus-based classifier followed by a rule-based simplifier, will be the core of a tool for online simplification of texts on the Web, aimed at people with low literacy levels.

Future work......
Instead of using a classifier to make a decision about the whole sentence (split vs. non-split), they aim to have a classification step for each potential splitting point within the sentence. This would allow them to simplify just specific points of a sentence.

Hope you guys understood the paper! You can shoot your doubts and questions, if any..

Hard work brings prosperity; playing around brings poverty.

Cya....

Saturday, January 29, 2011

My Saturday so far...

Normally, on a Saturday morning, the first thought that occurs, as soon as I manage to pull myself off my bed, is….. It’s an offffffffffffffff and I’m gonna finish this Sidney Sheldon novel ( which will be right next to my pillow, as I would have fallen asleep while reading it the previous night!) or watch so-and-so movie on TV or on my comp. But today, the first thought that came to my mind was… I JUST HAVE TO read up and actually set those thinking wheels on the roll!

Well, I did have a Sidney Sheldon to read (“Rage of Angels” for all the Sheldon enthusiasts or people who just wanna know the name of the book :P ). But I ignored the urge to open it and switched on my comp so that I could stick to my just-before-hitting-the-sack oath ( which I strictly adhere to by the way) : gather, learn, take in and put-to-use as much as you can read up about Python and NLTK and yes, also BLOG!!! :P

I’m sure the question will arise, “What were you doing all this while? “, “Project work started like 15 days back”,“What were you doing till now??” Honestly, I was just reading papers and articles, going through bits and pieces of Python programming and NLTK. And also gathering my thoughts about text simplification and complication. Today, I decided to go about it in a more structured and systematic fashion.

My progress so far you ask?

· I went through Python programming from the site : http://software-carpentry.org/ . It’s a very good site with video tutorials. Makes learning much more fun. And since audio and video is a great combo for the learning process (at least for me), it did really help!

· I covered the first 150 pages of the book- Natural Language Processing with Python.

Well, it’s not much for now, but it’s a start for the weekend. And I couldn’t resist the urge to blog right away, despite the fact that I haven’t really covered much for the day! Oh my God, I forgot I skipped lunch! (I’m sure Bhuvan is beyond surprised that I actually missed a meal :P ) I’ll go grab a quick bite… Will be back with more updates!!

Thursday, January 27, 2011

Work on DSI field ;)

A week ago I heard this: "I wrote a good article, all in simple English, and it got rejected. I rewrote it and it got rejected again. Then, after many reviews, I wrote the same article in highly complicated, twisted English and I was awarded" - Dr. Sudarshan Iyengar,
in a talk given by Sir for our teachers who are PhD aspirants and researchers. Now you know why all the papers you find on the IEEE website are rocket science..

We just answered a question that is so common among all our batch-mates, who are busy downloading papers and asking "what the crap is this?!".. Making things complicated is a way of getting a paper approved :D (no offence), and the same policy holds good for our synopsis hehe :D :D

I spoke about people of high standards, but what about us common people, what's their take on complicating English?? That's today's task, work on the DSI field ;) Go catch students, borrow a little of their time, ask them to write a few lines and analyse why they complicate their write-ups! And I'm hoping for loads of short write-ups from our DSI students!! ;) ;)

But what does this have to do with our project!??
Well, before getting into text simplification we just want to know why people complicate English!..

We are all ready to find out why they DO IT!??
Apoo and Ambika are finding it out with our teachers.
Anussaa and I are finding it out with students..

I'm all pumped up, with my new Cello Technotip pen and a handbook ready in hand for some short and sweet write-ups from the DSI girls hehe :D :D :D

Field Work

In the morning, while reading, I got stuck on a sentence which read:

"The teacher descended upon the exams, sank his talons into their pages, ripped the answers to shreds, and then, perching in his chair, began to digest."

As an average reader I didn’t understand these lines. They seriously need our text simplification. I felt the same thing when I was reading some IEEE papers on text simplification. The sentences in the papers are very difficult to understand. There were at least 3 complicated words in each sentence. It took me five to ten minutes to read a sentence, find the meanings of the words and then understand the context of that sentence. I don’t know why people make such statements with complicated words. Why do people need to complicate English??

We in the text simplification group are going to do some field work on this tomorrow, which will be an exciting and useful task: an experiment on our beloved and highly qualified teachers. The text simplification group has done lots and lots of literature survey on simplification of text, but we are more interested in finding out why people make the English language so complicated that average readers find it difficult to read and understand. So we start this research task by experimenting on our teachers. We want to know, in practice, what makes people complicate text. This experiment is going to be very exciting, and I am sure that we will get lots of input from our highly qualified teachers, which will be an important factor to take into consideration.

In preparation for the "Field Work" for our Text Simplification project, I am very excited to speak to my teachers and experiment on them.

Coming to the process of the field work, we are going to ask our teachers to first write a simple text on any topic. Then we will ask them to complicate the written text. Obviously our teachers do not need a thesaurus or dictionary as we did :) :). I am very sure that if I had tried this experiment on Sudarshan Sir, he would have made us use a thesaurus for each simple sentence :)

This episode will continue:) I’ll update the status of our field work tomorrow:)

Field Work and Fun....

Before we dive into the implementation of Text "Simplification", we need to understand the aesthetics of text "complication". We wanted to understand why writers complicate their written English when their spoken English is so simple and easy to understand. For example, take a look at Akshatha's post, which is quite complicated for a person for whom English is a foreign language. But you will not feel so when you speak with her! Why the difference?

While we were debating this topic in our lab, our classmate Neeraj Chettri walked in. He is an avid reader, mostly interested in Indian politics and history. We could not have found a better subject for our experiment. (Neeraj exclaimed- "what??! you are making me a guinea pig?" ) haha! We asked him to write a few lines on any topic he wished, and if you know him well then you would know what he would choose ;) Anyway, he managed to write a half-page essay on George Bernard Shaw which was simple and lucid. Later, we asked him to complicate the same essay using the online thesaurus. And what he came out with was truly remarkable! I could not deduce the meaning of a few words in his sentences. He actually managed to replace every word that he could with a synonym which he thought we would not know. His essay read something like this-

" George Bernard Shaw, the magnanimous writer of the bygone century aforesaid 'Only mountaineering, boxing and racing are sports; the others are mere derisions'. He, being a man of superlative intelligence and having humongous credentials did of course camouflage a deep disposition in the above adage."

Alright. People who understood this without the help of any dictionary or thesaurus, raise your hands! Hmmm... there would not be many. So we asked him why he chose those particular words and not the other available synonyms. He promptly replied that the main reason was showmanship: he was trying to sell his writing. If not for those words, people would not be very interested in reading it. Though that was one of the reasons, he argued that he would put 70% of the emphasis on his idea and 30% on his written English. Maybe, maybe not!

Let us find out.......

It would be wrong to conclude anything this early. It may vary from person to person. So we have planned to get our hands dirty by getting onto the field. Tomorrow is going to be a fun day at college! We are going to try this experiment on people with varied backgrounds (Fyi: Bhuvan will be covering the MBA block and no!...he is not going there to check out the MBA girls ;) ). It will be interesting to see how people complicate English!

I am all excited to try this on our lecturers in the department (except Prasad Sir of course!). Is he following this blog by any chance? :P So that is our plan so far. Follow this post for more updates :)

Field work!!..:)

Hi All..

'Shoot for the moon. Even if you miss, you'll land among the stars'. This, I must say, is an inspirational quote for all our text simplification mates :).. AIM HIGHER!!
Our project has started with great zeal!...
As I mentioned in the earlier post, a lot of work has been done on the literature survey and on the tools needed for implementing the project.

In this post I will elaborate on what was done this week..
We did a lot of literature survey and collected many research papers, which will help us get to know the methods and ideas used by many scientists in this field.

Coming to the actual field work, we have been assigned the task of finding out 'HOW DO PEOPLE COMPLICATE THE ENGLISH LANGUAGE AND WHY DO THEY DO IT'.

The experiment has to be tried on as many people as possible to find an appropriate answer.

When the tasks were divided I could see a non-stop SMILE from Bhuvan, because he was more than happy with the responsibility given to him.. (lucky Bhuvan....) the reason being...
The task has been rightly divided :P with Bhuvan managing the girls while I have to try the experiment on the guys :) ....interesting, isn't it? :P And Apoorva and Ambika are responsible for the staff.
The motives behind the task:-
1) Why do people complicate English?
2) How do they do it?
3) Given a topic, the vocabulary people use to speak is different from what they write. Why?
4) What makes them complicate English?

Well.... we are all set for the tasks to get started!!.........Hope to see a good result at the end:)..
Cya soon.....

Monday, January 24, 2011

How Does One go About Complicating English ?

I would be very interested to see one of you (or all of you) blog about how people complicate English, and how and when a writer uses a thesaurus. You can try asking one of your friends to write a quick article for you with a thesaurus given to her :-) . You can then watch her use the thesaurus to write her extra polished paragraph. This would shine a flashlight on the problem. A thorough understanding of how people complicate English would give you ideas for reversing that process.

I am reminded of how people thought of building a flying vehicle by understanding the aerodynamics of a dragonfly.

Quoting from Wikipedia:

"Insects are the only group of invertebrates known to have evolved flight. Insects possess some remarkable flight characteristics and abilities, still far superior to attempts by humans to replicate their capabilities. Even our understanding of the aerodynamics of flexible, flapping wings and how insects fly is imperfect. "

So, you may want to take a look at how people take a flight in English writing (or should I say take the reader for a ride with their extra-complicated vocabulary usage).

Would love to hear your thoughts on this. You can blog with the following points in mind:

1) How do people complicate the language?

2) How can you experiment on this using some human subjects?

Get your gears on for some field work folks!!!! :-))

Saturday, January 22, 2011

Akshatha's project experience so far...

Text simplification... Well, I realised and discovered the intricacies associated with these two words when I decided to do my final semester project with Mr Sudarshan Iyengar from IISc. On listening to the project idea and details, I realized this topic falls under something that interests me the most, the English language! It would definitely provide a rich learning experience and be extremely fun to work on!

Text simplification will aid in tackling one of the problems faced by many readers: failing to comprehend the text being read by them and losing interest in the article, blog or excerpt due to excess complexity of language. Genuine interest may turn into impatience! ( I say this from past experiences of attempting to read philosophical novels and autobiographies and giving up on completing the book because of the language complexity involved.)

My work on the project kick-started when I heard the line “I was flabbergasted by your flamboyance”. That’s when I realized, I sure do need text simplification to help me understand that sentence! I joined my team members to write a Python program that was a part of our task-to-be-done-soon exercise, something that triggers the interest to learn and ignites the spark to finish the assignment before the given deadline!!

From then on, it was a continuous learning process, whether it was learning about Python, installing Python and various packages, understanding the Natural Language Toolkit (NLTK), the Python debugger and WordNet, or learning and sharing the information that goes into understanding and implementing the project. Each of them was fun to do! The discussions help to provide the prerequisites for the project's progress and raise issues and doubts that will go into achieving the aim of this project, i.e. breaking down the complexity of the text by natural language processing to make it readable and understandable by the human reader.

More importantly, this project provides scope for a lot of creative inputs and ideas. Amalgamation of these ideas will help develop this project and help surpass the target that has been set for this project. We may actually overcome the semantic hurdles associated with the text and understand what the writer or author of the complex text had in mind while putting it forth and retain the ideology and essence of the original text and not just perform a direct or crude conversion of the text under consideration.

All in all, it has been quite an experience so far, which includes the rule to treat the other team members if you don’t make it by the time set by our guide, Mr. Sudarshan, for the day (the rule is applicable to our guide too, so there is impartiality!), tea and lunch breaks that extend to indefinite periods and the oh-my-God-I-actually-need-to-brush-up-my-English-vocabulary-and-grammar-skills feeling! I’m sure I will have sharpened and improved my English vocabulary and verbal skills by the end of this project!

Jaweriya's story so far...

It feels great to write my first blog post on my project, Text Simplification. I would like to start with the phrase "Text Simplification" itself, which simply means simplifying what is complicated. Since the day I started working on the project, whenever I give the gist of it to others I start with an example phrased by Sudarshan Sir: "I am flabbergasted by your flamboyance", which means "I am amazed by your behaviour". Most people get exasperated when they lose the flow of their reading after getting stuck on complicated sentences. This may ultimately result in losing interest. The solution to this problem is "Text Simplification".

Here we are going to provide a plugin that presents you with the simplified version. My first day started with a well-planned task assigned to each of us by Sudarshan Sir. That day I learnt how to use Python, which has its own style of indentation. The tools that I have become acquainted with so far, with the help of my team members, and that we need in order to proceed with the project are:

Python (the open source interpreted programming language)

NLTK (Natural Language Toolkit), which houses several packages such as WordNet, a large lexical database of English.

SVN (Subversion)

PDB (the Python debugger), which provides interactive step-by-step execution of a program.

Abstractly, the process of text simplification starts with scanning and parsing the given text, breaking it into sentences, breaking each sentence into words, identifying the complex words based on their frequency of usage, and replacing those words with synonyms (using the WordNet package). The replacement is chosen carefully, because we look at the frequency of usage of each candidate synonym in that sentence. All this has already driven me into a process of learning new things every day, which I am very zealous about.
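To make these steps concrete, here is a tiny Python sketch of the word-replacement idea. It is only an illustration under loud assumptions: the frequency table and the synonym lists are made-up toy data (the real project would take frequencies from a corpus and synonyms from WordNet through NLTK), and the `min_frequency` cut-off is an arbitrary choice.

```python
import re

# Hypothetical toy data, invented for this example only.
WORD_FREQUENCY = {"amazed": 120, "astonished": 40, "flabbergasted": 2,
                  "behaviour": 300, "flamboyance": 3}
SYNONYMS = {"flabbergasted": ["amazed", "astonished"],
            "flamboyance": ["behaviour"]}

def simplify_sentence(sentence, min_frequency=10):
    def replace(match):
        word = match.group(0)
        key = word.lower()
        # Leave the word alone if it is common enough or has no known synonyms.
        if key not in SYNONYMS or WORD_FREQUENCY.get(key, 0) >= min_frequency:
            return word
        # Otherwise pick the most frequently used synonym.
        best = max(SYNONYMS[key], key=lambda s: WORD_FREQUENCY.get(s, 0))
        if WORD_FREQUENCY.get(best, 0) > WORD_FREQUENCY.get(key, 0):
            return best
        return word
    return re.sub(r"[A-Za-z]+", replace, sentence)

print(simplify_sentence("I am flabbergasted by your flamboyance."))
```

A plain word-for-word swap like this ignores word sense and grammar, so the output can read oddly in other contexts.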

One prolific use of text simplification that I would be working on is grading of text, to tell the reader the proficiency level of the text. This would help readers know whether they can read the graded document or not. Another use may be text summarization, which may find great applications on Twitter, to derive conclusions from tweets.

The project would help people with disabilities who have difficulty reading. It is a great invention to me, because I always feel zonked after using a dictionary, and for everyone who likes simplicity with no complications.

It is again a great time for me to be working on the project and with my team members.

Apoorva's story so far...

Less than 2 weeks ago, I was just another VTU student in my final year of B.E. who had got placed in an IT company and who wanted to be done with college and join work. A job in hand, a project in a company, and I thought my life was set, until I met my project mentor Sudarshan Iyengar from the CSA dept, IISc. He once discussed with me the importance of the final year project in my career and the value of doing it in a research institute such as IISc. I had ample time to think it over and make the right decision. But to be honest, forming a team to work on this project was not an easy task. But once the team was confirmed, there was no looking back.

We had our first project meeting at IISc in a restaurant called "Nesara" (don't miss the 'Bonda Soup' there ;-)) where we were briefed about the project and the meetings henceforth. I have never felt more interested in a project or an idea before. Though the first two weeks were spent getting familiar with the tools required for the project, it has been almost 7 hours of work a day. Oh! no....not complaining. It is always nice to learn something new. Some days are very productive while other days turn out to be discouraging, without any progress. But it is all part of a project. We started off by learning the basics of the Python programming language. With this come a lot of packages and APIs which ease your job (as Sir calls them -"The magic wands"). One of the most indispensable tools which we learnt was WordNet, which is a package available in NLTK. Bhuvan was assigned the task of learning the basics of WordNet and teaching us some of it. It was hilarious when Bhuvan was giving us some examples in class and it returned ridiculous synonyms and we all burst out laughing. He is the most enthusiastic in the team (only boy too ;-)), so it is quite obvious who gets ragged daily in class. We are doing a lot of literature survey, and learning a lot of English grammar in the process. Anusha was so keen on teaching all of us about Subversion that she caught hold of Bhuvan when she realized none of us were listening to her. This is a funny picture that I captured on that day. I remember Bhuvan shouting out to her- "aye! bitbide nannana please" :D

But Anusha did not let him go so easily. She managed to make him sit and listen to her entire explanation. ;-)

We usually meet at 7.30 in the morning at DSCE and start off the day with discussions and a to-do list for the day. Since most of us Indians are very punctual (count me first!), we decided that anyone who comes late to class has to treat the teammates to lunch that day. And if I may recall an incident, we were treated to a delicious lunch by our Sir the very next day ;-). Let me get a little technical and tell you what our project is all about.

Text Simplification is a self-explanatory phrase: it means simplifying text. The definition changes only with the target users. But the real question to ask is: how do you simplify text? There has been remarkable development in this field over the last decade. Countries like the UK, the US and Japan have developed systems that simplify text for people with cognitive disabilities and that perform text-to-speech conversion. If you notice, these are all NLP tasks. There are very simple ways to do it, one of them being the replacement of complicated words with simpler synonyms. But one must understand the complexities behind such a process. You may end up paraphrasing a few sentences in a way that no longer fits the flow of the text. This is widely known as the discourse problem. A lot of research is going on in this field, and we hope to contribute to it in our own creative way.

It feels good to know that you are contributing to society in some way and doing your duty as an engineer, which I am sure I would not have felt had I bought a project from outside.

I not only get a chance to do a nice project here but it comes with some added benefits:

1. If we come out with a nice paper, we can publish it in an International Conference.

2. I would add value to my profile in case I decide to apply to any university for my higher education. (selfish motive!)

With great tools like the Internet and the facilities provided to us students, I am confident that each one of us can develop a very nice project in the given time.

Finally, this post would be incomplete if I didn't mention my teammates who are immensely talented. With Ambika's intelligence and dedication, Bhuvan's enthusiasm and programming skills, Anusha's determination and attitude, I don't see any reason why our project wouldn't be an extraordinary success!

Cheers :-)

Bhuvan's story so far!..

"Bhuvaaaaaan edheelooooo" mum shouting at 9am that was on friday and my mob's vibrating, got a msg asking "dude are u interested in doing proj at IISc??" from apoo, i was like WoOw! definitely Yes!

Journey started!....

That same evening we met Sudarshan sir. My thoughts: "dude he looks so young, has he really got a PhD!?" hehe :D Lesson: don't get deceived by looks at IISc! :P The discussion was superb, over a nice cup of tea.

The idea of Text Simplification! The first thought running in my mind: "damn it, i struggle so much reading these books, stopped reading novels because of the complicated phrases they put into them, y didn't i get this thought before!!??"

Woow sooper, we are going to build some software that can actually simplify text. It's going to be a big help to me and to many people like me who get frustrated reading Dan Brown's huge novels!..

Well, at the same time Sudarshan sir promised us he'll give tasks that are always achievable with genuine effort.. That's one of the big reasons why all of us have been able to pick up so many things in such a short span. On the first day he told us we had to learn a new language, Python. The first person i remembered then was Madhura, my classmate; we always used to tease her "hebbav suthkondidhya" haha :D Well, now i'm the one learning Python. Super! The language is awesome, no wonder Google has turned towards Python! So many inbuilt functions, i'm in love with the Snake!!..

And that was just the beginning, i had seen nothing yet. i was made to go through the language's libraries and the Natural Language Toolkit, and i just started loving it more and more... How much they have improved the language with all the libraries! The corpora they have collected, like wordnet, movie_reviews, genesis etc, are mind boggling. Going through WordNet was amazing and i used to be fully pumped up when it came to teaching it, but the first day was the biggest flop show :P Those 3 didn't understand much of the concepts. i felt really guilty and made it a point to do better next time. Then i got to teach a bigger crowd hehe :D i taught 6 at a time and was happy that everyone could understand, and i was satisfied! :) :)

I think the best part of Python is its ease of use.. And then sir was very particular about using Ubuntu. i was like, everything is there and can be done in Windows, till the day i installed Ubuntu. It's awesome: what do u want to install?? Just give a command and go have a cup of coffee, ur software will be running on the machine :) :)

We have taken up the task of converting text into a simplified version. One of the biggest challenges is identifying complexity, which can be done by seeing how frequently people use a word and then replacing a rarely used word with a more frequently used one.. The challenge is retaining the meaning of the sentence even after the replacements.
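The frequency idea above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the project's actual code: the reference text, the synonym table, the `min_count` threshold and all function names are made up for the example, and a real system would count words over a large corpus and take candidates from something like WordNet.

```python
from collections import Counter
import re

# Hypothetical reference text standing in for a large corpus of everyday English.
REFERENCE_TEXT = (
    "we use simple words every day and we start work early "
    "people use help and people start to use simple words "
    "we help people and we start again"
)

# Hypothetical hand-written synonym table (an assumption for this sketch).
SYNONYMS = {
    "utilize": ["use", "employ"],
    "commence": ["start", "begin"],
}

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

# How often each word occurs in the reference text.
FREQ = Counter(tokenize(REFERENCE_TEXT))

def is_complex(word, min_count=1):
    """Treat a word as complex if it occurs fewer than min_count times."""
    return FREQ[word] < min_count

def simplest_synonym(word):
    """Return the most frequent candidate synonym, or the word itself if none is known."""
    candidates = SYNONYMS.get(word, [])
    if not candidates:
        return word
    return max(candidates, key=lambda w: FREQ[w])

def simplify(sentence):
    """Replace each complex word that has an entry in the synonym table."""
    out = []
    for word in sentence.split():
        key = word.lower()
        if is_complex(key) and key in SYNONYMS:
            out.append(simplest_synonym(key))
        else:
            out.append(word)
    return " ".join(out)

print(simplify("We utilize tools and commence work"))
```

Note that this sketch ignores punctuation and, more importantly, does nothing to check that the meaning of the sentence survives the replacement, which is exactly the hard part mentioned above.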

I think text simplification will really be a boon to mankind; it would help so many people with their reading. Here we are, trying to make text simplification happen, with all our combined efforts under the guidance of a high-voltage guide, Dr. Sudarshan!

Anusha 's story so far......

Hi All,

The project 'Text simplification' which we are working on is most interesting and has tremendous benefits for a varied group of people.

The phrase text simplification says it all: simplifying a complicated text, which ordinary users find difficult to read, into a simpler one that they can read without much effort. What makes it interesting is that the task is not just to replace complex words with simple ones but also to retain the meaning of the sentence in that context.

Many research scientists have already published papers on this topic, which act as a source of reference for our project. Our project takes inputs from various existing systems, and we are interested in implementing an existing approach while enhancing and improving it.

We started off this project as a group of four, and the learning process so far has been absolutely interesting, exploring new methods to solve this problem.

I was able to learn Python (a programming language, free to use), NLTK (the Natural Language Toolkit), which houses sample data, libraries, documentation and various other useful packages, and also Subversion, which is a version control system for managing the files and directories of this project.

We were also introduced to the Python debugger and WordNet (the most interesting and useful package). These two are very useful for all Python programmers.

The learning process with my friends has been absolutely fascinating all the way.... We were assigned individual tasks that made us explore more and work on a topic, which we then had to present the next day..

Sharing ideas and discussing them with the team helped us get into the new topics.

I was never bored by any of the tasks given to us... including the 'coffee and lunch breaks' :P

Since the day I started my project, the phrase text simplification has been all over my mind :P. From then on, whenever I read a newspaper, blog or Wikipedia article and need to look up a thesaurus or dictionary, I just say to myself that this part of the text needs simplification, and I am glad that I am working on one such project. :)

Each day there was something to learn and some progress to make, and that made me feel happy and contented.

The idea for text simplification is to break the passage into sentences, scan those sentences for complicated words and then replace those words with simpler ones.

Each candidate replacement is rated by its frequency of usage, and the most frequently used one is chosen to make the sentence simple.
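The steps above (split into sentences, scan each one, replace) could look roughly like this in Python. It is a minimal sketch and not our final design: the `SIMPLER` table and the function names are hypothetical, the sentence splitter is a naive regular expression, and the frequency rating is left out in favour of a fixed lookup table.

```python
import re

# Hypothetical table mapping complicated words to simpler ones (an assumption).
SIMPLER = {
    "purchase": "buy",
    "residence": "house",
    "sufficient": "enough",
}

def split_sentences(passage):
    """Naively split a passage after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [p for p in parts if p]

def simplify_sentence(sentence):
    """Replace each word found in SIMPLER, leaving punctuation and other words alone."""
    def swap(match):
        word = match.group(0)
        simple = SIMPLER.get(word.lower())
        if simple is None:
            return word
        # Keep a leading capital so sentence-initial words still look right.
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", swap, sentence)

def simplify_passage(passage):
    """Split the passage, simplify each sentence and join the results again."""
    return " ".join(simplify_sentence(s) for s in split_sentences(passage))

print(simplify_passage("They purchase a residence. Is it sufficient?"))
```

A naive splitter like this breaks on abbreviations such as "Dr.", so a proper sentence tokenizer would be needed for real articles.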

Ambika's story so far

A reader's task in the early stage of reading involves word identification and sentence processing, with the goal of extracting meaning from the basic component units of the text. If the text involves very complicated words, the reader may find it difficult and time-consuming to understand. There is no facility available online for web readers to simplify text automatically with just a plug-in. This project intends to develop such a plug-in.

A list of linguistic issues needs to be addressed, including: resolution of pronouns and anaphoric references, assigning the correct tense to verbs that depend on governing verbs or other elements, deciding the implicit subject of the verb in relative clauses, etc.

The package that will be developed in this project will help readers simplify Wikipedia articles in particular, as well as various other articles on the web. This package will prove to be a unique contribution in the field of text simplification. Many systems have been developed over the years for text simplification, such as PSET, HAPPI and KURA (for users with language disabilities), SKILLSUM (for people without disabilities who have low literacy) and ATA (for language teachers, children and adult secondary learners). But each of these has its own drawbacks. The project is intended to simplify text and enable readers with different levels of vocabulary to understand it easily.

In order to develop such a package we need:

Python
Technical typesetting
NLTK (Natural Language Toolkit)
WordNet
Web development APIs
CGI

The learning task is divided among the team members. After learning each tool/package, the team member teaches the other members.

Started off with learning the basics of Python, which is a very easy programming language.

Learnt WordNet, which is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept.
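To make the idea of "sets of synonyms, each expressing a distinct concept" concrete, here is a toy stand-in written in plain Python. It is not WordNet or the NLTK API: the entries, glosses and function names are made up for illustration, and only the id style (word.pos.number) imitates the way NLTK names synsets.

```python
# Toy stand-in for WordNet-style synsets: each entry groups words sharing one concept.
# The ids, words and glosses are made up for illustration (assumptions, not real data).
SYNSETS = {
    "bank.n.01": {"lemmas": ["bank", "riverbank"], "gloss": "sloping land beside water"},
    "bank.n.02": {"lemmas": ["bank", "depository"], "gloss": "institution that keeps money"},
    "big.a.01": {"lemmas": ["big", "large"], "gloss": "above average in size"},
}

def synsets_for(word):
    """Return the ids of every synset that lists the word, in sorted order."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["lemmas"])

def synonyms(word, synset_id=None):
    """Collect synonyms from one chosen synset, or from all senses if none is chosen."""
    ids = [synset_id] if synset_id else synsets_for(word)
    found = set()
    for sid in ids:
        found.update(SYNSETS[sid]["lemmas"])
    found.discard(word)
    return sorted(found)

print(synsets_for("bank"))            # both senses of "bank"
print(synonyms("bank"))               # synonyms mixed across senses
print(synonyms("bank", "bank.n.02"))  # synonyms for the money sense only
```

Mixing synonyms across all senses of a word is one way to end up with odd replacements, so choosing the sense that fits the context matters before substituting anything.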

Also learnt PDB, the Python debugger. It provides an interactive debugging environment that lets you pause your program, look at the values of variables and watch the step-by-step execution of your program. Hence it helps you understand what the code actually does.

Learnt NLTK usage. I am a bit familiar with it as of now.

Coming to my own idea for text simplification: it would be very helpful for children and visually disabled people if we replaced the complicated words in sentences with appropriate pictures or figures that best fit the context, so that they can understand the text more easily.
