Sunday, February 27, 2011

Our working Sunday..

It was nearing midnight on Saturday and all of us were lethargic and drowsy. We hadn't progressed much all week, since we had started to think that there was lots of time left and that the others had not even started. So we all went to bed, deciding to meet online on Sunday (a little later than 9 am). I didn't set my alarm either.

I got up to the tinkling sound of the "ghante" (bell) at 9.30 am (Anirudh does sandhyavandane every day without fail). I was sipping my milk when Anusha called me online. We had all received a mail from Sir which truly was a wake-up call for us. It was time we understood a simple fact: there is nothing wrong in him having certain expectations from us, when he gives us his time and guidance in return for nothing. The mail hit us hard, and we all worked for more than 8 hours, of course with some progress. (Ms Ambika excluded; she has been absconding for 10 days ;) )


We decided to be sincere students and work harder.

We started the day off by learning SVN. It became less complicated since Anusha had practically done a PhD on it over the past month. She knew almost all the commands and links useful for us. We successfully committed version 1 of a test file. We created a repository to store our files but got stuck while importing the files into the repository. I am guessing there is a problem with the permissions, which we will resolve soon enough. Bhuvan was watching tutorials on how to create a plug-in (though I doubt whether he was live-streaming the cricket match in between ;) ).

It has been 2 weeks since we started implementation and we could not develop more than 2 modules. We are lagging in this aspect because we don't know the simple tricks of programming. We were stuck on an error for 2 full days without knowing what it was, while our Sir could debug it in less than 15 minutes. We failed because we didn't use pdb, the Python debugger. Had we learnt this tool a month back, it would have saved us those 2 days. That was the day we first realized the importance of the tool. It is so easy to debug errors now!
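To make the point concrete, here is a minimal sketch of the kind of helper we end up debugging (the function and the data are invented for illustration; pdb itself is driven interactively):

```python
# Two ways to use the Python debugger:
#   1. run the whole script under it:  python -m pdb myscript.py
#   2. drop into it at a chosen line:  pdb.set_trace()
import pdb  # imported to show the usual entry point

def frequency_count(words, corpus):
    """Count how many times each word occurs in the corpus (a word list)."""
    counts = {}
    for w in words:
        counts[w] = corpus.count(w)
    return counts

# Uncomment the next line to step through the call interactively:
# pdb.set_trace()
counts = frequency_count(["the", "text"], ["the", "simple", "text", "the"])
print(counts)  # {'the': 2, 'text': 1}
```

Inside the debugger, commands like `n` (next), `s` (step) and `p counts` let you watch the values change line by line instead of guessing.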

We have snippets of code written and stored all over the disk under different names. As the number of modules keeps increasing, I am finding it difficult to maintain all of them. When I want to include some module in a program, I go and search for it on the disk, which clearly shows that I am wasting my time. So Sir taught us how to modularize all our code and create a library which will help us stay organized. I can create my own library, and when Bhuvan or Anusha want to use one of my modules, they can directly import it. This saves a lot of time and effort!
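A minimal sketch of the idea, assuming the shared functions live in a file like textutils.py (the file and function names here are just illustrative, not our actual library):

```python
# Suppose the reusable functions live in one module, say textutils.py:
#
#     # textutils.py
#     def remove_stop_words(words, stop_words):
#         return [w for w in words if w not in stop_words]
#
# Then anyone on the team can reuse it with a plain import:
#
#     from textutils import remove_stop_words
#
# The same function, defined inline here so the sketch is runnable:

def remove_stop_words(words, stop_words):
    """Keep only the words that are not in the stop list."""
    return [w for w in words if w not in stop_words]

print(remove_stop_words(["this", "is", "a", "test"], {"is", "a"}))
# ['this', 'test']
```

Once the module sits in one agreed place (or in the SVN repository), nobody has to hunt the disk for the latest copy.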

I had forgotten for a while that we are not doing an ordinary project but an extraordinary one!  :-) I feel glad to be part of such a project, with constant motivation from our Sir and my fellow simplifiers. I am sure all "text-simplify" mates are motivated and charged up to work harder. In the end, we all want to see the project become a huge success.


Cheers!


P.S: I really want to appreciate Bhuvan, who sat in front of the computer doing the TS project (hopefully) despite the over-hyped India vs England WC match going on since afternoon :)

Friday, February 25, 2011

Ideas, Coding and High Spirits!!............

Helloooooo people..

Well, it was Thursday morning and we had our project discussion at 7.30, and this time the meeting was only for Simplifiers...

I was the first one to reach, and since I was jobless again, I started thinking about Apoo's super Kannada, Bhuvan's healthy diet and Ambika's hidden secret... lol, these are really interesting topics once you get to know the details, which I will be sharing very soon...

Since all 3 of us (Bhuvan, me, Apoo) slept a little late we were quite sleepy, but our Sir's evergreen voice and his energy made us active.

Just before Sir came and joined us, all three of us were deep in discussion on how to proceed further in our modules.

We had finished our module on sub-sentence matching with the corpus and it was working fine...
It takes sub-sentences of adjacent words and compares them with the corpus, so we were all thinking about how to simplify these sub-sentences, and we had many questions in our minds:
1) Should we simplify all the words in the sub-sentence, replace them, find the frequency and then compare it with the original frequency?
2) Should we replace a single word with a synonym, take the adjacent words and then find the frequency?
3) We thought of considering a graph for each sub-sentence, where the one with the peak value in the graph would be considered for simplification, but we didn't know which words and how many words to consider for replacement................
and many more......

Finally our Sir came and raised a fantastic question: "How do we check the appropriateness of the sentence after simplification?" He started the discussion with the Markov matrix and told us that it might help in solving this problem.
I will explain what it is actually about....

Consider a set of words arranged column-wise in one matrix and the same words arranged row-wise in another, and multiply the two matrices. The resulting matrix holds the frequency counts, i.e. the probability of one word following another. This probability is used to judge whether the already simplified sentence is SIMPLE AND MOST FREQUENTLY USED.

He says the Markov principle talks about the set of events which immediately follow an event.
For example, soon after eating papad in a restaurant it is most likely that one will order roti.. :P.. not a perfect example though.
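The same idea can be sketched with bigram counts on a toy corpus (entirely made-up data): the estimated probability that word b follows word a is count(a, b) / count(a):

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Estimate P(next_word | word) from a list of tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        for a, b in zip(sent, sent[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    # Conditional probability of each observed word pair
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

# Toy corpus: after "eat" we see "papad" twice and "roti" once.
corpus = [["eat", "papad"], ["eat", "papad"], ["eat", "roti"]]
probs = bigram_probabilities(corpus)
print(probs[("eat", "papad")])  # 0.666...
```

A sentence whose word pairs all have high probability under such a model is, in this sense, a "frequently used" sentence.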

At last we thought of replacing the complicated word (assuming there is only one such word) with its synonym and then continuing the process of sub-sentence searching/matching.

hmmmmm full technical stuff till now .ufffffff

In the midst of all this, Ambika is nicely enjoying her vacation in her home town. When I call her she says the weather is good, etc. etc. Don't know why Ambika is acting as if familiar things happening around her are strange.. :P

Then we had a discussion on Apoo's diet and FITNESS mantra.. lol, that is quite interesting, you know.. :P She is up to something, because she says "WAIT FOR 2 MONTHS" to everything we ask.. :P No idea what she is up to..

We had a good breakfast and headed to our classrooms...
Classes are very much interesting, with all those interesting lectures ;)

Bhuvan was eagerly waiting to show his code to the team, and the afternoon session was a kind of project review.. oh no, it was in fact a code review..
We all showed our code to Sir and he was quite happy seeing it working ...:)

We then discussed the next few tasks for the week and wound up.

Still searching for a good fine day to release my learning on SVN to friends. Hope that day comes soon.......

Lastly, I feel like mentioning this quote which is in my diary; it is quite inspiring too!...

"Your dreams must come from your heart's deepest desires. Only then will the barriers come down before you."


feeling sleepy people......cya good night!..

Tuesday, February 22, 2011

It's just the Beginning(V1.0)!..

It was just another Saturday: "I should work on the project.. no no, it's a weekend, I should enjoy, or else I should sleep!" But last Saturday was different: I was active, I was charged up, I was on fire, I was all pumped up, and I burnt the midnight oil!!
By the way, the reason behind it!!???
A cold war with our guide. He sent across a mail saying "you people aren't progressing".. :P :P Taunting us with his videos :P :P We all got fired up! My only aim was proving Sir wrong.

Coding is never difficult once you imagine how the flow of the program should go. If you know the output, and if you can imagine how the flow of your program should go, then half your program is ready. The rest is your little programming skills, Google and your ideas for using the language to the best effect.

I just knew what to do and got the idea of which functions I should work on; after all, coding is the part I enjoy a lot. But seriously, I had turned lazy and was actually postponing things until Sir's mail and his taunting status! :P
Here I go simplifying the text; it's a simple, straightforward approach to simplify!! :P :P

Simple is never simple. My sweet computer had to face all my emotions: my anger, my joy at getting outputs, frustration at untraceable errors, breaking my head over how to code a particular part. Google is god if it shows solutions for the problems; if not, I start cursing it for wasting my time hehe :P
It was Sunday evening around 6 when I first checked my program's output after combining all the small blocks of code into a single file. It showed the output and I was jumping and dancing all around the place hehe :D

Learnt a lot coding through the program, many thanks to our Sir's taunts and mails ;) Yeah, this is http://stackoverflow.com, one of the best websites I always look to for help. For the past one and a half years it has been my savior. If you have doubts regarding any programming language, you can seek help there.

By the way, Version 1.0 is ready! It's just a matter of a few days before I come up with a newer, better version. I have rated the present one as just 35% accurate, and "my expectations are more" ;) ;) hehe :D

Signing off Bhuvan :)

Busy Monday!...:)

Hello.....

It's Monday!!.. The first day and also a new start for the week. I had to get up at 5 o'clock (AM) since our Sir had called us for project discussion at 7.30 am.
I boarded my bus at 6.30. It was amazing weather: cool breeze, window seat, and what else could I ask for... I took my cell out to read all the forwards which I had received last night, and I must tell you the nice road near PESIT reminds me of the Jab We Met song, though it is funny.. :P..

I was the first one to reach college at 7.10, and since I was jobless I started taking a few good pics of my college on my camera. Surprisingly Bhuvan was very late, and 4 of them including Sir had to treat the 3 of us (the celebrities being ANUSHA, AKSHATHA and JAWERIYA ;) ) for coming late. This is our SFPE rule!!! (stomach full, pocket empty), which we Assessors and Simplifiers follow.

Jaweriya was the star for today, as she gave us a wonderful explanation on grading a text.
The talk was really interesting, as our Sir filled in examples to make it a lively session.
She wrote a few formulas for readability assessment and content information, which included a lot of mathematical equations.

Her main explanation was on grading the text. She stressed two points for this: one is readability assessment and the other is content information, and how much of each has to be used to get the peak, most efficient value.

For example: Bhuvan likes coding, which is 20% of the work of our project, and Apoo likes reading, which involves 80% of the work of our project, but both are important for the completion of the project.
Work                                            Grade
100% of Bhuvan's coding                         0
90% of Bhuvan's coding & 10% of Apoo's reading  10
50% of Bhuvan's work & 50% of Apoo's work       50
20% of Bhuvan's work & 80% of Apoo's work       85
0% of Bhuvan's work & 100% of Apoo's work       5

Here in the example we can note that at one point there is a maximum efficiency, and that combination of work from both of them gives the maximum result. The same concept is applied to grading a text, where the two contenders are readability assessment and content information....:)

After a good discussion on this topic, we were supposed to continue our work, and we did that till 12 o'clock with continuous debugging and coding.
Exactly at 12 we were told to solve Prasad Sir's problem, and that went on for 1 hour. We were supposed to trace an algorithm for Chain Matrix Multiplication (given a set of matrices like A1, A2, A3, A4..., which order of matrix multiplication results in the least number of scalar multiplications?).
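For the record, here is a standard dynamic-programming solution to the chain matrix multiplication problem we traced (this is the textbook algorithm, not code from our project):

```python
def matrix_chain_order(dims):
    """Minimum number of scalar multiplications needed to multiply
    matrices A1..An, where Ai has dimensions dims[i-1] x dims[i]."""
    n = len(dims) - 1
    # cost[i][j] = cheapest way to multiply the chain Ai..Aj (1-indexed)
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):              # length of the sub-chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)            # split point
            )
    return cost[1][n]

# Example: A1 is 10x30, A2 is 30x5, A3 is 5x60.
# (A1*A2)*A3 costs 1500 + 3000 = 4500, while A1*(A2*A3) costs 27000.
print(matrix_chain_order([10, 30, 5, 60]))  # 4500
```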

After this long spell of tracing, we went to have lunch in the NRI canteen, and I was waiting for it because Bhuvan was supposed to treat.. yahooooooooo!!!..:)

Then we all had a small birthday party for Madhura, which we enjoyed a lot, and not to forget the pastry cake; it was yummmmm...:P

The end of the day was the most important part, which Apoo and I were eagerly waiting for.. that is, for our code to be free of errors, which we achieved at around 1.30 am thanks to our Sir's help..
But one thing, people.. this "index out of range" error, I tell you, is so damn irritating until you overcome it.
I recommend all of you to please use the Python debugger (pdb), which saves a lot of coding time when you encounter errors.
The Python debugger is a really wonderful tool....

We then started on the next module of our coding, which searches for sub-sentences in the corpus by taking adjacent words.
Our search module was a grand success...
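A rough sketch of what such a search module might look like (a toy corpus and invented function names, not our actual code):

```python
def subsentence_frequency(window, corpus_sentences):
    """Count how many corpus sentences contain the words of `window`
    as an adjacent run (a sub-sentence)."""
    n = len(window)
    count = 0
    for sent in corpus_sentences:
        for i in range(len(sent) - n + 1):
            if sent[i:i + n] == window:
                count += 1
                break  # count each sentence at most once
    return count

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "cat", "ran"],
]
print(subsentence_frequency(["the", "cat"], corpus))  # 2
```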

Lastly, credit to Bhuvan, because he wrote an amazing piece of code which simplifies a text with complicated words into a simpler one, and the sentence did make sense when the word was replaced... Kudos Bhuvan.....

This is the story of my BUSY monday........more posts to come.........
Cya. for now....

Monday, February 21, 2011

Lunch at IISc :-)

Delicious lunch at the mess






Bhuvan and Ambika relishing the food!






Sir: What is this, you are all eating as if you have never seen food before :P

L-R: Heroine's sister, Heroine, Don





Sir: Haaaa haaaa, enjoy yourselves :)






Bhuvan trying to show off his photography skills ;-)




Hmmm..... If any of you want autographs, please stand in the queue ;)



Aah! A nice day spent at IISc... Yummy lunch and a fun game played under the shade of the trees. More details to follow this post. 

Monday, February 14, 2011

Progress for the day :)

Uh! It's coding, it will be an easy day for me.. It's never the case when you plan to work at DSCE, with its "facilities"! :P We get reminded of this all the time, the rough way, and today was one such example.. I went to the college library at five past nine and found Ambika texting {messages cost money ;) }. Come on, it's Valentine's Day after all ;) I shouldn't ask whom.. :P
"Are the private rooms at the library the best place to do a project?" If you think yes, you must be the biggest fool :P Or maybe they are just not meant for it hehe :P :P I went to see if there was a plug point in the private rooms. First shock of the day: no plug points in those rooms. Second: the internet would not connect.
What a start to the week!! :P

We went to the study section, where we thought we could find plug points; we got one, but still no internet. All the while everyone was chanting the same sentence: "did we really need all this!!??" We should have just sat at home and coded with no problems whatsoever!! Anyway, the sad story continues: we went to our department and saw juniors sitting at the computers. So much for the final-year project; students are supposed to come to college, and there is no computer to work on!! Anyway, we went inside, saw very few final years, got 2 computers and started coding. We had internet, but not on Ubuntu, and on top of that we would have to install all the packages all over again!! Once again: "did we really need all this!?"
After all the fuss we decided to go home, the first good decision we took!!..

The sad story doesn't end here. I came home and saw my internet was still not working :( Anyway, we had the computer with all the necessary packages.. :) We started coding and the good times began; the internet got connected too :) Yeah, not to forget all the fun when Ambika started blushing while getting messages, and she wanted to leave early as she was "feeling sleepy". I think we all know today's specialty, and we could make out the urgency too ;) ;)
Getting serious...
We wrote the code for:
  • Removing the stop words.
  • Looking up the 1500 simple English words and filtering out the simple words.
  • Identifying the complex words (keywords).
  • Finding the synonyms for the keywords.
To-do list for tomorrow:
  • Finding the co-occurrence rate for the keywords.
  • Based on the rate, selecting the proper synonym and fitting it in as the replacement.
That's my progress for the day.. :)
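The steps above can be sketched roughly like this (the word lists here are tiny made-up stand-ins; the real code uses the 1500-word simple-English list and WordNet for synonyms):

```python
# Toy stand-ins for the real resources:
STOP_WORDS = {"the", "a", "an", "is", "of"}
SIMPLE_WORDS = {"house", "big", "old"}
SYNONYMS = {"domicile": ["house", "home"], "enormous": ["big", "huge"]}

def find_keywords(tokens):
    """Drop stop words, then drop words already in the simple list;
    whatever remains is a candidate complex word (keyword)."""
    content = [w for w in tokens if w not in STOP_WORDS]
    return [w for w in content if w not in SIMPLE_WORDS]

def keyword_synonyms(keywords):
    """Look up candidate replacements for each keyword."""
    return {w: SYNONYMS.get(w, []) for w in keywords}

tokens = "the domicile is enormous".split()
keys = find_keywords(tokens)
print(keys)                    # ['domicile', 'enormous']
print(keyword_synonyms(keys))
```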

My progress......

I must say that today was not my day :( I got up late and somehow managed to get to the mess; it was exactly 9 o'clock by then. When I reached the mess, there was a very long queue, resembling the queue for tickets to the first-day-first-show of a Rajnikant movie. By the time I finished my breakfast it was 9:10. Then I rushed to the library, where I thought my teammates would be waiting for me, but that was not the case :). They came a few minutes later. We were very energetic and very excited to code together. We planned that the four of us would work together, and that today we would definitely finish off our first module and show it to Sir. But our plan didn't work out :( First, the WiFi did not connect. I don't know what happens to the WiFi sometimes (you could say most of the time). We tried and tried, but it didn't. Then we stepped into the digital library, but there too the net was very slow. Only the system in front of which Bhuvan was sitting worked fine. By that time it was almost 10 o'clock and we were getting tense that Sir would scold us properly, and we were preparing our minds for that. Then we left the library and went to the lab, where only two systems were free. We started off with the coding, but we could not code because of the disturbance there. Then we decided to go home. As the WiFi was not connecting, I went to Bhuvan's home, and Anusha and Apoorva went to their homes.

Then we started with the actual coding. We were able to remove the nouns, proper nouns and prepositions, and we extracted a few keywords. Then we searched for these keywords in a list of the most frequently used English words. If a word is present in that list, there is no need to replace it, and we move on to the next word. The code worked fine. Next we found the set of synonyms for the tokenized words and listed them in one variable. Now, the main and most important step of our module is to replace each word with the proper synonym, one which retains the context of the sentence. As of now we don't know how to do this, but we are trying to find a way. I tried different techniques, but they were not so fruitful. In short, my progress in coding today is that I am now able to extract the keywords, that is, the difficult words, sentence by sentence. The next thing is to replace each word with the appropriate synonym which matches the context. This is the difficult and most important step of our coding. Hopefully we will come up with a solution by tomorrow :)

My Progress today...

I always like Monday since it is the first day of the week... a fresh morning and a fresh start! It was 14-02-2011 (I need not elaborate on this!), the day when most of us students bunk and go out with our beloveds. But we "text-simplify" mates decided to work on the project and show some real progress, since not much had been done the previous week. We decided to meet at DSCE and sit together and work on the modules, since there would be a collective exchange of efforts and ideas which would not be possible if we stayed at home!

Here we are at the DSCE library at sharp 9 o'clock, trying to find a plug point for our laptops and trying to connect to the Wi-Fi.. Uh-huh! Nothing works our way. Finally we decide to log in through the systems at the digital library, but the connectivity is so bad that we could not even log in to Gmail. We keep running around, trying to figure out what to do, and at 10 o'clock we decide to go back to our department and work in the labs. We enter the lab to find two systems unoccupied. What could possibly go wrong now? Take a wild guess! Yeah.. the internet does not work with Ubuntu.. Ugh! We tried to code the module with the help of the materials we already had, but it was not working.

Bhuvan said- "I am never going to come to college and work, you people do whatever you want!"

Ambika said- "Aha! Bhuvan.. Today you came to college just because the internet was not working at your place. So keep quiet" ;)

Anusha said- "Hey... it's like we dug a big pit ourselves and then fell into it" :P

Ha ha ha ha! The situation was so funny. I was laughing in spite of zero progress. Thereafter we decided that we would work from home efficiently!

We worked for the rest of the day at home (of course, I took a number of breaks in between). We each wrote a piece of a module on our own.
The module we are presently aiming at will scan for the presence of a particular sentence in the corpus and return the frequency count of that sentence.

I am presently reading two papers. Simplified versions of them will follow soon.
  • Integrating selectional preferences in WordNet by Eneko Agirre and David Martinez
  • Text Simplification for Language Learners: A Corpus Analysis by Sarah E. Petersen, Mari Ostendorf

    A disappointing yet fun day! I am not very happy with the progress though. I will do better in my next post.

    Experience programming the first module.....

    Hi All,

    Early in the morning I got up with great enthusiasm, thinking of the fact that our team would collectively start coding a module today and we were going to see some output at the end of the day.

    Personally, we thought team work would add more ideas to the project and the work would finish at a faster pace...
    As usual it took me one hour to reach college, and as soon as I entered the library, my teammates gave me the sad news about our so-called college internet connection.

    Due to the bad internet connection in the library, we shifted to our department lab and started the actual planning for the module that we were supposed to code.

    We divided our work into points and started off with coding.........
    We took the Brown corpus for searching the text and Shakuntala's blog as our sample test, and scanned each and every sentence.

    After scanning the text, we split each sentence on spaces.
    We take each sentence and match it with the Brown corpus. If the string is present, the program prints yes.

    The Brown Corpus was the first million-word electronic corpus of English, created in 1961 at Brown University. This corpus contains text from 500 sources, and the sources have been categorized by genre, such as news, editorial, and so on.
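A minimal sketch of the yes/no matching described above (a toy corpus stands in for Brown here; with NLTK the real corpus is available as nltk.corpus.brown after downloading it):

```python
# Toy stand-in for the Brown corpus: a list of sentence strings.
toy_corpus_sentences = [
    "the quick brown fox jumps over the lazy dog",
    "a stitch in time saves nine",
]

def sentence_in_corpus(sentence, corpus_sentences):
    """Print 'yes' if the sentence occurs in the corpus, 'no' otherwise."""
    found = any(sentence in s for s in corpus_sentences)
    print("yes" if found else "no")
    return found

sentence_in_corpus("stitch in time", toy_corpus_sentences)  # yes
sentence_in_corpus("hello world", toy_corpus_sentences)     # no
```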


    So this was the progress done today and it continues.........
    Cya......

    Wednesday, February 9, 2011

    Implementation Details of the First Two Modules...

    After a week-long discussion, we managed to come up with a lucid plan to develop a package to simplify text (one paragraph at least, if not more) ;-)

    We intend to come out with two modules, each exhibiting a different idea. So let us see what those 2 modules are:

    MODULE 1
     The module which is going to be developed by Bhuvan and Ambika will of course simplify the text, but let us see how they are going to do it.

    1. After the input text is scanned and broken into sentences, we remove the stop words such as pronouns, prepositions, etc.
    2. Now we are left with few words which may be noun, verb or adjective. We need to choose the keywords for replacement and hence it is the most important step in the module.
    3. Assuming a long sentence will not consist of more than 5 keywords, we limit our count to 5 and process those words.
    4. Consider a sentence to have 3 keywords namely:  $S=\{w_1, w_2, w_3\}$. 
    5. If $w_1$ has synonyms namely - $[s_1, s_2, s_3]$, then replace $w_1$ with $s_1$ and find the frequency count of $\{s_1, w_2,w_3\}$. Similarly, replace $w_1$ with $s_2$ and find the frequency count of $\{s_2, w_2, w_3\}$ and so on for all the synonyms of $w_1$. Finally $w_1$ is replaced with its synonym $s$ which has the highest frequency count compared to the other synonyms. For example, if $s_1$ has a frequency $(f=100)$ in $\{s_1, w_2, w_3\}$,  $s_2$ $(f=500)$ and $s_3$ $(f=300)$ then we replace $w_1$ with $s_2$.
    6. The same process is continued for the remaining words i.e $w_2$, $w_3$. (repeat step 5)
    7. An existing corpus will be used to check the presence of the keywords in context  and their frequency.
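Steps 4-7 above can be sketched as follows (the corpus and the frequency function are toy stand-ins for the real corpus lookup):

```python
def corpus_frequency(words, corpus_sentences):
    """Toy stand-in for the corpus lookup: count the sentences that
    contain all the given keywords."""
    return sum(1 for s in corpus_sentences if all(w in s for w in words))

def best_synonym(word, synonyms, other_keywords, corpus_sentences):
    """Step 5: try each synonym in place of `word` and keep the one
    whose keyword set is most frequent in the corpus."""
    scores = {s: corpus_frequency([s] + other_keywords, corpus_sentences)
              for s in synonyms}
    return max(scores, key=scores.get)

corpus = [["big", "house", "garden"],
          ["big", "house", "garden", "path"],
          ["huge", "house", "garden"],
          ["big", "house", "tree"]]
# w1 = "enormous" with synonyms ["big", "huge"]; w2, w3 = "house", "garden"
print(best_synonym("enormous", ["big", "huge"], ["house", "garden"], corpus))
# big
```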
      MODULE 2
       The module which is going to be developed by Anusha and me will also simplify the text. This is a colorful one which will involve graphs as well! Here are the details:

      1.  After the input text is scanned and broken into sentences, we remove the stop words such as pronouns, prepositions, etc.
      2. Now we are left with few words which may be noun, verb or adjective. We need to choose the keywords for replacement and hence it is the most important step in the module.
      3. Assuming a long sentence will not consist of more than 5 keywords, we limit our count to 5 and process those words.
      4. Let us consider the keywords as $\{w_1, w_2, w_3, w_4, w_5\}$.
      5. Find the word in the mean position ($w_3$ in this case), let $w_3$ have the synonyms $\{s_1, s_2, s_3\}$.
      6. Substitute $w_3$ with $s_1$ and find the frequency count for $\{w_2, s_1, w_4\}$. Now consider $\{w_1, w_2, s_1, w_4, w_5\}$ and plot the graph of the frequency curve.
      7. Repeat step 6 by replacing $w_3$ with the remaining synonyms $\{s_2, s_3\}$.
      8. Compare the graphs and finally choose the best synonym to be replaced with $w_3$.
      9. Repeat steps 5-8 for all the keywords in the list.
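A rough sketch of steps 5-8, with the frequency "curve" kept as a plain list of adjacent-trigram frequencies instead of an actual plot (toy corpus and invented names; matplotlib could plot the curves if wanted):

```python
def trigram_curve(keywords, corpus_sentences):
    """Frequency 'curve': corpus frequency of each adjacent keyword
    triple in the (modified) keyword list."""
    def freq(words):
        return sum(1 for s in corpus_sentences
                   if all(w in s for w in words))
    return [freq(keywords[i:i + 3]) for i in range(len(keywords) - 2)]

def best_synonym_by_curve(keywords, mid, synonyms, corpus_sentences):
    """Try each synonym in the mean position and keep the one whose
    curve has the highest peak."""
    curves = {}
    for s in synonyms:
        trial = keywords[:mid] + [s] + keywords[mid + 1:]
        curves[s] = trigram_curve(trial, corpus_sentences)
    return max(curves, key=lambda s: max(curves[s]))

corpus = [["red", "big", "house", "garden", "tree"],
          ["red", "big", "house", "garden", "wall"]]
keywords = ["red", "big", "domicile", "garden", "tree"]
print(best_synonym_by_curve(keywords, 2, ["house", "hut"], corpus))  # house
```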
      We intend to simplify the text partially if not completely through these modules.
      So Best of luck "Text-Simplify" mates!


      Tuesday, February 8, 2011

      An Information Retrieval Approach to Sense Ranking

      Authors: Mirella Lapata and Frank Keller, School of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK


      Text simplification mainly involves converting complicated text into simpler text. This is done by replacing complicated words with simpler and more frequently used synonyms. It is obvious that each complicated word will have at least two ambiguous synonyms, or senses, or meanings. One needs to resolve this ambiguity. How we do that is what this paper is about.


      As I mentioned in my previous post, Word Sense Disambiguation (WSD) is the ability to identify the intended meaning (sense) of a word in context. In WSD, choosing the most frequent sense of an ambiguous word is a powerful heuristic. In this paper, an information-retrieval-based method for sense ranking is proposed: it submits queries to an IR (Information Retrieval) engine to estimate the degree of association between a word and its senses.

      WSD can be achieved with the help of WordNet and a corpus. Now let us understand what WordNet and a corpus are????
      WordNet:
      It is a semantically oriented dictionary of English, similar to a traditional thesaurus but with a richer structure. There is a so-called synset function in WordNet; a synset is nothing but a synonym set, that is, a collection of synonymous words.
      Corpora: a corpus is a large body of text.


      Method used: Central to this approach is the assumption that context provides important cues regarding a word's meaning. Documents are typically written with certain topics in mind, which are often indicated by word distributional patterns.

      For example, documents talking about "congressional tenure" are likely to contain words such as "term of office" or "incumbency", whereas documents talking about "legal tenure" (i.e., the right to hold property) are likely to include words like "right" or "land". Now, we could estimate which sense of tenure is most prevalent simply by checking whether tenure co-occurs more often with "term of office" than with "land", provided we knew that both of these terms are semantically related to tenure.

      Fortunately, senses in WordNet are represented by synonym terms. So all we need to do to estimate word sense frequencies is count how often a word co-occurs with its synonyms.
      The co-occurrence definition is that two words co-occur if they are attested in the same document. After finding the synonym set, the next step is to find which is the dominant sense or synonym. This is explained as follows.



      Dominant Sense Acquisition:
      Throughout the paper, the term frequency is used as shorthand for document frequency, that is, the number of documents that contain a word or a set of words, which may or may not be adjacent. For this we use the synset function of WordNet (which I explained in an earlier portion of this post).
      As an example consider the noun "tenure", which has the following senses in WordNet:
      (1) Sense 1
      tenure, term of office, incumbency(synonym set of tenure)
      => term(hypernym of above senses)
      (2) Sense 2
      tenure, land tenure (synonym set of tenure)
      => legal right(hypernym of above senses)

      The senses are represented by the two synsets {tenure, term of office, incumbency} and {tenure, land tenure}. (The hypernyms for each sense are also listed, indicated by the arrows.) We can now approximate the frequency with which a word "w1" occurs with the sense "s" by computing its synonym frequencies.

      Synonym frequencies: for each word "S1" in syns(s), the set of synonyms of s, we issue a query of the form w1 AND S1. These synonym frequencies can then be used to determine the most frequent sense of w1 in a variety of ways (detailed below).
      So the queries for the above example of tenure will be as follows:
      (1) a. "tenure" AND "term of office"
      b. "tenure" AND "incumbency"
      (2) "tenure" AND "land tenure"
      For example, query (1-a) will return the number of documents in which tenure and term of office co-occur. Presumably, tenure is mainly used in its dominant sense in these documents. In the same way, query (2) will return documents in which tenure is used in the sense of land tenure.
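A toy illustration of these document-frequency queries (the documents here are made up; in the paper a real IR engine is queried instead):

```python
# Toy document collection: each "document" is a string.
documents = [
    "his tenure as chairman, a long term of office",
    "tenure, the term of office of a professor",
    "tenure and incumbency in parliament",
    "disputes over land tenure and property rights",
]

def doc_frequency(query_terms, docs):
    """Number of documents containing ALL the query terms
    (the 'w1 AND S1' queries from the paper)."""
    return sum(1 for d in docs if all(t in d for t in query_terms))

# Queries (1-a), (1-b) and (2) for the noun "tenure":
print(doc_frequency(["tenure", "term of office"], documents))  # 2
print(doc_frequency(["tenure", "incumbency"], documents))      # 1
print(doc_frequency(["tenure", "land tenure"], documents))     # 1
```

On this toy collection, sense 1 (term of office) accumulates 2 + 1 = 3 documents against 1 for sense 2, so it would be picked as dominant.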


      Hypernym frequencies:
      Apart from synonym frequencies, we also generate hypernym frequencies by submitting queries of the form w1 AND S1, for each S1 in hype(s), the set of immediate hypernyms of the sense s. The hypernym queries for the two senses of tenure are:
      (3) "tenure" AND "term"
      (4) "tenure" AND "legal right"
      Hypernym queries are particularly useful for synsets of size one, i.e., where a word in a given sense has no synonyms, and is only differentiated from other senses by its hypernyms.

      Once the synonym frequencies and hypernym frequencies are in place, we can compute a word's predominant sense in a number of ways.


      First way: first, we can vary the way the frequency of a given sense is estimated based on synonym frequencies:

      • Sum: The frequency of a given synset (set of synonyms) is computed as the sum of the synonym frequencies. For example, the frequency of the dominant sense of tenure would be computed by adding up the document frequencies returned by the queries "tenure AND term of office" (1-a) and "tenure AND incumbency" (1-b).

      • Average (Avg): The frequency of a synset is computed by taking the average of synonym
      frequencies.

      • Highest (High): The frequency of a synset is determined by the synonym with the highest
      frequency.


      Second way: we can vary whether or not hypernyms are taken into account:

      • No hypernyms (−Hyp): Only the synonym frequencies are included when computing the frequency of a synset. For example, only the queries "tenure AND term of office" (1-a) and "tenure AND incumbency" (1-b) are relevant for estimating the dominant sense of tenure.
      • Hypernyms (+Hyp): Both synonym and hypernym frequencies are taken into account when computing sense frequency. For example, the frequency for the senses of tenure would be computed based on the document frequencies returned by the queries "tenure AND term of office" (1-a), "tenure AND incumbency" (1-b) and "tenure AND term" (3) (by summing, averaging, or taking the highest value, as before).


      Third way: this option relates to whether the sense frequencies are used in raw or in normalized form:

      • Non-normalized (−Norm): The raw synonym frequencies are used as estimates of sense frequencies.

      • Normalized (+Norm): Sense frequencies are computed by dividing the word-synonym frequency by the frequency of the synonym in isolation.

      For example, the normalized frequency for "tenure AND term of office" (1-a) is computed by dividing the document frequency for "tenure" AND "term of office" by the document frequency
      for "term of office". Normalizing takes into account the fact that the members of the synset of a sense may differ in frequency.
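The three choices can be sketched in one small function (the frequency numbers below are made up, purely for illustration):

```python
def sense_score(pair_freqs, syn_freqs=None, mode="sum", normalize=False):
    """Combine synonym frequencies into one score for a sense.
    pair_freqs: doc frequency of (word AND synonym) for each synonym.
    syn_freqs:  doc frequency of each synonym alone (needed for +Norm)."""
    freqs = pair_freqs
    if normalize:
        freqs = [p / s for p, s in zip(pair_freqs, syn_freqs)]
    if mode == "sum":
        return sum(freqs)
    if mode == "high":
        return max(freqs)
    if mode == "avg":
        return sum(freqs) / len(freqs)
    raise ValueError(mode)

# Sense 1 of "tenure": synonyms "term of office" and "incumbency".
pair = [100, 20]    # made-up doc freq of tenure AND each synonym
alone = [200, 400]  # made-up doc freq of each synonym on its own
print(sense_score(pair, mode="sum"))           # 120   (Sum, -Norm)
print(sense_score(pair, mode="high"))          # 100   (High, -Norm)
print(sense_score(pair, alone, "high", True))  # 0.5   (High, +Norm)
```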

      One of these three configurations is used to obtain the word's sense ranking. Model selection can be done as follows:
      the goal is to establish which model configuration is best suited for the WSD task. We thus vary how the overall frequency is computed (Sum, High, Avg), whether hypernyms are included (±Hyp), and whether the frequencies are normalized (±Norm).
      The following table shows the precision (P) and recall (R) for each configuration on the evaluation data.


      Model        -Norm                    +Norm
               +Hyp        -Hyp        +Hyp        -Hyp
               P     R     P     R     P     R     P     R
      Sum      42.3  40.8  46.3  44.6  45.9  44.3  48.6  46.8
      High     51.6  49.8  51.1  49.3  57.2  55.1  59.7  57.9
      Avg      44.1  42.6  48.5  46.8  49.6  47.8  51.5  49.6
      In sum, the best performing model is High, +Norm, −Hyp, achieving a precision of 59.7% and a recall of 57.9%.
      Once the model has been selected, the complicated word is replaced with the dominant sense found by the selected model. This is how the word sense ranking is obtained. Depending on the rank, the most dominant sense is chosen as the replacement for the complicated word. This is done for each and every sentence of the text. That's it, we get the simplified version of the text :)