Sunday, May 22, 2011
Our penultimate show...
Thursday, May 19, 2011
Kickass Ending! Part-1
Wednesday, May 4, 2011
Points to Take Care of While Preparing the Project Report
Any project is incomplete without a proper report. Now it's time to prepare the report for our project. Today we got the format for writing the report. Before we start writing, we should keep the following points in mind and come up with a good report. The points to take care of while writing the report are:
No sentence in the report should be copy-pasted.
Sentences in the report should be as short as possible.
Paragraphs should not go beyond 10 lines.
Include as many diagrams as possible (at least 20 diagrams must be there in the report).
Maintain the same tense throughout the report.
The literature survey should include all the papers on this blog and also the papers on the http://www.citeulike.org site.
The introduction should include:
1. Why this problem is important.
2. A summary of the entire report.
3. Definitions of necessary terms such as co-occurrence.
4. A brief introduction to packages such as NLTK, WordNet, PyGTK, etc.
The conclusion should include the work we have done (in brief) and its future enhancements.
Keeping all these points in mind, friends, let's put in our best effort and come up with a great report :):)
Tuesday, April 12, 2011
20% progress!
- Never use technical terms while you are presenting.
- Never bring up the term Markov matrix, or any matrix for that matter!
- Be confident and go for it.
Review story.....
SAD MONDAY!..:(:(
Well, Monday, being the first day of the week, is supposed to be the best day: energetic, fully pumped, etc., but sadly for our text simplification group it turns out to be the most disappointing day every single time :(
Every time we plan out ideas on how to showcase our project, i.e., prepare for the presentation before entering the board room, we are all pumped up and fully confident about how to speak, but when we start our presentation, the entire storyline changes. Disappointing!...:(
The same thing repeated this Monday too. We had our project review and we were supposed to explain what work has been done so far and the status of the project, so we reached college at 7.30 to discuss how to present our work in a nice filmy way (I mean, to give a great show :P). We listed out a few key points on what to speak about, like
1)What is Text Simplification?
2) Why is text simplification needed?
3) Our idea of implementing text simplification, etc.
We were all set to face the HOD and give a grand show as decided. Messages started flooding our inboxes saying that we were supposed to be in the board room at 10.00 for the review. We were in the HOD's cabin at 10.30, when our review was at 10.00, thinking that our turn would take time since ours was the 3rd batch; anyway, after a few scoldings from Ramakrishna sir, we got our chance to enter the board room.
Now starts the main play...........
As soon as the HOD saw us, he started scolding us for the mass bunk, indiscipline, etc.
Then all of a sudden he pointed out Ambika's name and told her to explain what role she played in the project and what her contribution was. I was quite shocked when he started off like that, but Ambika somehow completed her explanation; the HOD was not convinced, though. Then it was Bhuvan's turn. He had almost everything in his mind, yet he went blank and couldn't explain even a single line of what he had done. He was angry, and we could see his frustration from the way he was scribbling on the paper when the HOD told him to explain the co-occurrence concept :P. Then it was Apoorva's turn, followed by mine. I had only a few contributions to explain, and even for that the HOD was not convinced. :(
We were all looking at each other's faces, perplexed, while all this happened, and the most interesting part is that our guide was sitting right in front of us, watching it all. He had explained very well beforehand how to face project reviews and how to speak confidently, but sadly all this drama happened in front of him :(
After we came out, we discussed a few points with our guide about what work is left and how to face reviews in the future.
We assigned tasks amongst ourselves and started on them, and now we are waiting for this Saturday, as it is our deadline for completing the project and showing the HOD that our project indeed helps others with TEXT SIMPLIFICATION.
A party is due this Saturday for two reasons: one is project completion, and the other is Bhuvan's treat ..:P
hoping to post very soon ..
Cya GN!.........
Monday, April 11, 2011
Text Simplification --------Part 2
Sunday, March 6, 2011
Text Summarization Using Lexical Chains
- They select the set of candidate words. A candidate word comes from an open class of words and functions as a noun phrase or proper name, as a result of the noun filtering process.
- The senses of all the candidate words, obtained from a thesaurus, are considered. In this experiment the WordNet thesaurus was used. At this step all senses of the word are considered, and each word sense is represented by distinct sets treated as levels: the first constitutes the set of synonyms and antonyms, the second the set of first hypernyms/hyponyms and their variations (i.e., meronyms/holonyms, etc.), and so on.
- They find the semantic relatedness among the sets of senses according to their representations. If two sense representations of two distinct words match, then the words are said to be semantically related. Each semantic relationship is associated with a measure that indicates the length of the path taken in the matching, with respect to the levels of the two compared sets.
- They build up chains, which are sets of semantically related word senses.
- They retain the longest chains by relying on a preference criterion.
- Their methods extract whole sentences as single units. The use of compression techniques will increase the condensation of the summary and improve its quality.
- Their summarization method uses only lexical chains as representations of the source text. Other clues could be gathered from the text and considered when generating the summary.
- In the noun filtering process, their hypothesis eliminates the terms in subordinate clauses. Rather than eliminating them, it may also prove fruitful to investigate weighting terms according to the kind of clause in which they occur.
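The chain-building steps above can be sketched in a few lines of Python. This is only a toy illustration of the greedy chaining idea, not the authors' implementation: the `RELATED` table is a tiny hand-made stand-in for the WordNet sense sets the paper uses, and the retention criterion is reduced to plain chain length.

```python
# A toy sketch of lexical chain building: candidate nouns are linked when
# their sense representations overlap, and chains grow greedily.
RELATED = {                                  # word -> set of related senses
    "car": {"vehicle", "auto"},
    "truck": {"vehicle"},
    "driver": {"person", "motorist"},
    "apple": {"fruit"},
}

def related(w1, w2):
    """Two words are related if their sense sets intersect."""
    return bool(RELATED.get(w1, set()) & RELATED.get(w2, set()))

def build_chains(candidates):
    """Greedily attach each candidate to the first chain holding a related word."""
    chains = []
    for word in candidates:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])            # no related chain: start a new one
    return chains

chains = build_chains(["car", "truck", "driver", "apple"])
print(chains)
longest = max(chains, key=len)               # the retention criterion: keep long chains
print(longest)
```

With a real thesaurus, the sense sets would come from WordNet and the relatedness measure would also weigh which level (synonym, hypernym, etc.) produced the match.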
Wednesday, March 2, 2011
Complex Lexico-Syntactic Reformulation of Sentences using Typed Dependency Representations
Author: Advaith Siddharthan, Department of Computing Science, University of Aberdeen
The reason most authors prefer one formulation over another is to avoid shifts in focus, issues of salience and end weight, and to account for differences in reading skills and domain knowledge. This paper is about an approach to automating complex reformulation. Complex sentences are reformulated for better understanding by readers with low literacy levels.
b. The explosion occurred because of an incendiary device. [B-BECAUSE OF-A]
d. The cause of the explosion was an incendiary device. [CAUSE OF-B-A]
Tuesday, March 1, 2011
Motivations and Methods for Text Simplification
The authors of the above paper are R. Chandrasekar, Christine Doran and B. Srinivas.
As the title suggests the paper talks about the methods and reasons for Text Simplification.
They say that to simplify a sentence we need an idea of the structure of the sentence, in order to identify the components to be separated out. A parser could be used to get the complete structure of the sentence, but since parsers are prone to errors on long and complex sentences, they use two alternatives to a full parser for simplification.
The first approach uses a Finite State Grammar (FSG) to produce noun and verb groups, while the second uses a supertagging model to produce dependency linkages.
Now let us discuss the reasons for Text simplification :
1) If sentences are simple, they are easy for both programs and users to process.
2) Simple sentences are easy to parse because they involve less ambiguity.
3) Simple sentences improve the quality of machine translation.
4) Information retrieval becomes easier, i.e., only the specific relevant sentences are retrieved in response to queries.
5) Simplification can be used to weed out irrelevant text with greater precision, and thus aid summarization.
6) Clarity of text.
The simplification process is a two-step procedure: first obtain the structure of the sentence, then apply simplification rules on that structure to identify the components that can be simplified.
In order to simplify, one needs to identify the articulation points, i.e., the points where the sentence can be logically split. Possible articulation points include the beginnings and ends of phrases, punctuation marks, subordinating and coordinating conjunctions, and relative pronouns.
These articulation points define a set of rules which map an original sentence pattern to simpler sentence patterns; a rule is applied repeatedly until it is no longer applicable.
ex:
Talwinder Singh, who masterminded the Kanishka crash in 1984, was killed in a fierce two hour encounter...
Talwinder Singh was killed in a fierce two-hour encounter ... Talwinder Singh masterminded the Kanishka crash in 1984.
FSG based Simplification:
Here we consider sentences as word groups or chunks and treat the chunk boundaries as articulation points.
Chunking lets us determine the syntax of the sentence and state the simplification rules at a coarser granularity, since we need no longer be concerned with the internal structure of the chunks.
Each chunk is a word group consisting of a verb phrase or a noun phrase, with some attached modifiers. The noun phrase recognizer also marks the number (singular/plural) of the phrase. The verb phrase recognizer provides some information on tense, voice and aspect.
The chunked sentences are then simplified using a set of ordered simplification rules.
An example rule that simplifies sentences with a relative pronoun:
X:NP , RelPron Y , Z → X Z . X Y
The rule is interpreted as follows. If a sentence starts with a noun phrase (X:NP), followed by a phrase with a relative pronoun, of the form (RelPron Y,), followed by some (Z), where Y and Z are arbitrary sequences of words, then the sentence may be simplified into two sentences, namely the sequence (X) followed by (Z), and (X) followed by (Y). The resulting sentences are then recursively simplified, to the extent possible.
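The rule above can be sketched with plain string matching. This is only a rough illustration, not the paper's FSG-based chunker: real chunk boundaries come from noun- and verb-group recognizers, while here a regular expression plays that role and only handles the simple "X, RelPron Y, Z" shape.

```python
# A minimal sketch of the relative-clause rule X, RelPron Y, Z -> X Z. X Y.
import re

REL_PRONOUNS = r"(?:who|which|that)"

def split_relative_clause(sentence):
    """Split 'X, who Y, Z.' into 'X Z.' and 'X Y.'; return the input unchanged otherwise."""
    m = re.match(
        rf"(?P<X>[^,]+), {REL_PRONOUNS} (?P<Y>[^,]+), (?P<Z>.+)",
        sentence.rstrip("."),
    )
    if not m:
        return [sentence]
    x, y, z = m.group("X"), m.group("Y"), m.group("Z")
    return [f"{x} {z}.", f"{x} {y}."]

s = ("Talwinder Singh, who masterminded the Kanishka crash in 1984, "
     "was killed in a fierce two-hour encounter.")
for out in split_relative_clause(s):
    print(out)
```

On the paper's own example this reproduces the two simplified sentences shown earlier; a serious implementation would apply the rules recursively over chunked input.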
A dependency-based model:
This model is based on the simple dependency representation provided by LTAG (Lexicalized Tree-Adjoining Grammar).
LTAG: the grammar contains elementary trees, called initial trees and auxiliary trees.
Initial trees include nouns, PPs, simple sentences, etc.
Auxiliary trees include relative clauses, adverbials, etc.
Supertagging: LTAG localizes dependencies, so dependent elements appear within the same elementary tree. As a result of this localization, a lexical item may be associated with more than one elementary tree; these elementary trees are called supertags. Trigrams are used to disambiguate the supertags, assigning one supertag to each word, in a process called supertagging.
To establish the dependency links among the words of the sentence, they exploit the dependency information present in the supertags. Each supertag associated with a word allocates slots for the arguments of the word. These slots have a polarity value reflecting their orientation with respect to the anchor of the supertag. Also associated with a supertag is a list of the internal nodes that appear in the supertag. Using this information, a simple algorithm may be used to annotate the sentence with dependency links.
EVALUATION:
The objective of the evaluation is to examine the advantages of the DSM over the FSG-based model for simplification. In the FSG approach, since the input to the simplifier is a set of noun and verb groups, the rules for the simplifier have to identify basic predicate-argument relations to ensure that the right chunks remain together in the output. The simplifier in the DSM has access to information about argument structure, which makes it much easier to specify simplification patterns involving complete constituents.
Sunday, February 27, 2011
Our working Sunday..
Friday, February 25, 2011
Ideas, Coding and High Spirits!!............
Tuesday, February 22, 2011
It's just the Beginning(V1.0)!..
Busy Monday!...:)
It's Monday!!.. The first day and also a new start for the week. I had to get up at 5 o'clock (AM) since our sir had called us for a project discussion at 7.30 am.
I boarded my bus at 6.30. It was amazing weather: a cool breeze, a window seat, and what else could I ask for... I took my cell out to read all the forwards I had received last night, and I must tell you, the nice road near PESIT reminds me of a Jab We Met song, though it is funny.. :P..
I was the first one to reach college at 7.10, and since I was jobless, I started taking a few good pics of my college on my camera. Surprisingly, Bhuvan was very late, and the four of them, including sir, had to treat the three of us (the celebrities being ANUSHA, AKSHATHA and JAWERIYA ;) ) for coming late. This is our SFPE rule!!! (stomach full, pocket empty), which we assessors and simplifiers follow.
Jaweriya was the star of the day, as she gave us a wonderful explanation on grading a text.
The talk was really interesting as our sir filled in examples to make it a lively session.
She wrote a few formulas for readability assessment and content information, which involved a lot of mathematical equations.
Her main explanation was on grading the text. She stressed two points for this: one is readability assessment and the other is content information, and how much of each has to be used to reach the peak, most efficient value.
For example: Bhuvan likes coding, which is 20% of the work of our project, and Apoo likes reading, which involves 80% of the work, but both are important for the completion of our project.
Work ---------------------------------------------- Grade
100% of Bhuvan's coding ---------------------------- 0
90% of Bhuvan's coding & 10% of Apoo's reading ----- 10
50% of Bhuvan's work & 50% of Apoo's work ---------- 50
20% of Bhuvan's work & 80% of Apoo's work ---------- 85
0% of Bhuvan's work & 100% of Apoo's work ---------- 5
Here in the example we can note that at one point there is maximum efficiency, and that combination of work from the two of them gives the maximum result. Similarly, the same concept applies to grading a text, where the two contenders are readability assessment and content information....:).
After a good discussion on this topic, we were supposed to continue our work, and we did that till 12 o'clock with continuous debugging and coding.
Exactly at 12 we were told to solve Prasad sir's problem, and that went on for an hour. We were supposed to trace an algorithm for Chain Matrix Multiplication (given a set of matrices like A1, A2, A3, A4, ..., which order of matrix multiplication results in the least number of multiplication steps).
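The chain matrix multiplication problem we traced can be written as the standard dynamic program. This is a textbook sketch, not what we actually submitted in class: `p` lists the matrix dimensions, so matrix Ai is p[i-1] x p[i].

```python
# Classic O(n^3) DP for the matrix chain ordering problem: find the minimum
# number of scalar multiplications needed to compute the product A1..An.
import sys

def matrix_chain_order(p):
    n = len(p) - 1                      # number of matrices in the chain
    # m[i][j] = min cost of multiplying the (0-based) chain Ai..Aj
    m = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):      # chain length being considered
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = sys.maxsize
            for k in range(i, j):       # split point: (Ai..Ak)(Ak+1..Aj)
                cost = m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                m[i][j] = min(m[i][j], cost)
    return m[0][n - 1]

# Four matrices: 10x30, 30x5, 5x60, 60x10
print(matrix_chain_order([10, 30, 5, 60, 10]))
```

The trick is exactly what we were tracing on paper: solve all shorter chains first, then try every split point for the longer ones.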
After this long stretch of tracing, we went to have lunch in the NRI canteen, and I was waiting for it because Bhuvan was supposed to treat us.. yahooooooooo!!..:)
Then we all had a small birthday party for Madhura, which we enjoyed a lot, and not to forget the pastry cake; it was yummmmm...:P.
The end of the day was the most important part, the one Apoo and I were waiting for eagerly.. that is, for our code to be free of errors, which we managed at around 1.30 am thanks to our sir's help..
But one thing, people.. this "index out of range" error, I tell you, is so damn irritating if you don't overcome it.
I recommend all of you to please use the Python debugger (pdb), which saves a lot of time when you encounter errors while coding.
The Python debugger is a really wonderful tool of Python....
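For anyone fighting the same error, here is a minimal sketch (the function names are made up for illustration) of the kind of off-by-one that raises IndexError, with comments on how pdb helps:

```python
# A hypothetical off-by-one that raises the dreaded IndexError.
def last_word(words):
    return words[len(words)]            # bug: valid indices are 0 .. len(words) - 1

# Running a script under the debugger:  python -m pdb script.py
# pdb stops at the uncaught IndexError and lets you inspect `words` with `p words`.
# You can also drop into the debugger programmatically at any point:
#   import pdb; pdb.set_trace()

def last_word_fixed(words):
    return words[len(words) - 1]        # or simply words[-1]

print(last_word_fixed(["simplify", "the", "text"]))  # -> text
```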
We then started on the next module of our coding, which searches for sub-sentences in the corpus by taking adjacent words.
Our search module was a grand success...
Lastly, credits to Bhuvan, because he wrote an amazing piece of code which simplifies text with complicated words into simpler text, and it did make sense when the words were replaced... Kudos, Bhuvan.....
This is the story of my BUSY monday........more posts to come.........
Cya. for now....
Monday, February 21, 2011
Lunch at IISc :-)
Aah! A nice day spent at IISc... Yummy lunch and a fun game played under the shade of the trees. More details to follow this post.
Monday, February 14, 2011
Progress for the day :)
- Removing the stop words.
- Looking up the list of 1500 simple English words and filtering out the simple words.
- Identifying the complex words (keywords).
- Finding the synonyms for the keywords.
- Finding the co-occurrence rate for the keywords.
- Based on the rate, selecting the proper synonym and fitting it in as the replacement.
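The steps above can be sketched as a toy pipeline. This is only an illustration of the flow, not our actual module: the stop-word list, the simple-word list and the synonym table are tiny hand-made stand-ins (the real project uses a 1500-word simple-English list and WordNet via NLTK), and the co-occurrence ranking from the last two steps is replaced here by a simple first-match rule.

```python
# Toy pipeline: drop stop words, drop known simple words, then try to
# replace each remaining "complex" word with a simple synonym.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "was"}
SIMPLE_WORDS = {"house", "big", "old", "man"}      # stand-in for the 1500-word list
SYNONYMS = {                                        # stand-in for WordNet synsets
    "domicile": ["house", "residence"],
    "elderly": ["old", "aged"],
}

def complex_words(sentence):
    """Drop stop words and known simple words; what remains is 'complex'."""
    tokens = [w.lower().strip(".,") for w in sentence.split()]
    return [w for w in tokens if w not in STOP_WORDS and w not in SIMPLE_WORDS]

def simplify_word(word):
    """Pick the first synonym that is in the simple-word list, else keep the word."""
    for syn in SYNONYMS.get(word, []):
        if syn in SIMPLE_WORDS:
            return syn
    return word

sentence = "The elderly man lived in a big domicile."
keys = complex_words(sentence)
print(keys)                                         # the extracted complex words
print([simplify_word(w) for w in keys])
```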
My progress......
I must say that today was not my day :( I got up late and somehow managed to get to the mess; it was exactly 9 o'clock by then. When I reached the mess, there was a very long queue, resembling the queue for tickets to the first-day-first-show of a Rajnikant movie. By the time I finished my breakfast it was 9:10. Then I rushed to the library, where I thought my teammates would be waiting for me, but that was not the case :). They came a few minutes later. We were very energetic and very excited to code together. We planned that the four of us would work together, that today we would definitely finish off our first module, and that we would show it to sir. But our plan didn't work well :( First, the WiFi did not connect. I don't know what happens to the WiFi sometimes (you could say most of the time). We tried and tried, but it didn't connect. Then we stepped into the digital library, but the net was very slow there too. Only the system in front of which Bhuvan was sitting was working fine. By that time it was almost 10 o'clock, and we were getting tensed that sir would scold us properly, so we were preparing our minds for that. Then we left the library and went to the lab, where only two systems were free. We started off with the coding, but we could not code because of the disturbance there. Then we decided to go home. As the WiFi was not connecting, I went to Bhuvan's home, and Anusha and Apoorva went to their homes.
Then we started with the actual coding. We were able to remove the nouns, proper nouns and prepositions, and we extracted a few keywords. Then we searched for these keywords in a list of the most frequently used English words. If a word is present in that list, there is no need to replace it, and we move on to the next word. The code worked fine. Next, we found the set of synonyms for the tokenized words and collected them in one variable. Now the main and most important step of our module is to replace each word with the proper synonym, one that retains the context of the sentence. As of now we don't know how to do this, but we are trying to find a way. I tried different techniques, but they were not so fruitful. In short, my progress in coding today is that I am now able to extract the keywords, that is, the difficult words, sentence by sentence. The next thing is to replace each word with the appropriate synonym that matches the context. This is the most difficult and most important step of our coding. Hope we will come up with a solution by tomorrow :)
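One possible way to attack the context-matching problem described above is to score each candidate synonym by how often it co-occurs with the other words of the sentence in a reference corpus, which is what our step list hints at. The sketch below is a hypothetical illustration, not our code; the corpus and the candidate words are made up.

```python
# Score candidate synonyms by sentence-level co-occurrence with the context.
from collections import Counter
from itertools import combinations

CORPUS = [
    "the river bank was muddy after the rain",
    "she sat on the river bank reading",
    "he deposited money at the bank branch",
]

def cooccurrence_counts(corpus):
    """Count unordered word pairs that appear in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        words = set(sentence.split())
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts

def score(candidate, context_words, counts):
    """Sum co-occurrence counts between the candidate and the context words."""
    total = 0
    for w in context_words:
        pair = tuple(sorted((candidate, w)))
        total += counts[pair]
    return total

counts = cooccurrence_counts(CORPUS)
context = ["river", "muddy"]                 # the sentence's other keywords
candidates = ["bank", "branch"]              # hypothetical synonym candidates
best = max(candidates, key=lambda c: score(c, context, counts))
print(best)
```

The candidate scoring highest against the surrounding words wins; with a large corpus this is one cheap proxy for "fits the context".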