We in the text simplification group want to turn English text into a simpler version that helps people in one way or another. I am sure that our system will help 8th semester engineering students understand IEEE papers while doing their projects :) Apart from that, it will also help people with a limited vocabulary. To implement this "text simplification" system we have been doing a lot of literature survey to get an idea of how it can be accomplished, and we now have some ideas for achieving it.
The main objective of this post is to lay out some ideas for implementing the text simplification process. The task of the "text simplification" group is to turn a text containing complicated words into a simpler one. This is achieved by replacing the complicated words with simpler synonyms.
The text that needs to be simplified is given to our text simplification system as input. It is then scanned sentence by sentence, and each sentence is in turn parsed to remove the prepositions and adjectives. After removing these, we are left with only the words that might be complicated. The next task is to find the complicated ones among them and replace them.
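The scanning step described above can be sketched roughly as follows. This is only an illustration, not our final code: the SKIP_WORDS set is a made-up stand-in for a real part-of-speech tagger, and the function names split_sentences and candidate_words are hypothetical.

```python
import re

# Assumption: a tiny hand-written list standing in for a real
# part-of-speech tagger that would detect prepositions, articles
# and adjectives.
SKIP_WORDS = {"in", "on", "of", "at", "to", "with", "the", "a", "an", "big", "new"}

def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def candidate_words(sentence):
    """Return the lowercase words of a sentence that are not in SKIP_WORDS."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    return [w for w in words if w not in SKIP_WORDS]

text = "The algorithm converges in the new domain. We evaluate it on a corpus."
for sentence in split_sentences(text):
    print(candidate_words(sentence))
```

A real system would replace SKIP_WORDS with proper tagging, since a fixed list cannot tell whether a word is used as an adjective in a given sentence.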
Now the question is how to decide which words in a sentence are complicated. The complexity of each word is decided through its frequency count, which is simply how often the word is used in the corpus. If a word's frequency count is too low, that word is treated as complicated. We can set a threshold frequency and compare the frequency count of each word against it to decide whether the word is common or rare. We can use the inbuilt function of the WordNet package to find the frequency count.
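To make the threshold idea concrete, here is a minimal sketch. It counts words in a toy corpus with plain Python instead of using WordNet, and the corpus, the threshold value and the function names are all assumptions chosen only for illustration.

```python
import re
from collections import Counter

def word_frequencies(corpus_text):
    """Count how often each lowercase word occurs in the corpus."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))

def is_complicated(word, freqs, threshold=2):
    """A word is 'complicated' if it occurs fewer than `threshold` times.

    Words never seen in the corpus have a count of 0, so they are
    also treated as complicated.
    """
    return freqs[word.lower()] < threshold

# Assumption: a toy corpus; a real one would be far larger.
corpus = "we use the tool. we use the method. we utilize the tool."
freqs = word_frequencies(corpus)

print(is_complicated("utilize", freqs))  # occurs once, below the threshold
print(is_complicated("use", freqs))      # occurs twice, meets the threshold
```

The right threshold depends on the size of the corpus, so it would need tuning rather than being fixed at a small number like this.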
Our aim is not only to replace the complicated words with simpler ones, but also to ensure that each replacement retains the meaning of the sentence and fits its context. So after replacing a complicated word with a simpler one, we need to check whether the simplified sentence still means the same thing. If it does not, we need to repeat the above process with another candidate word. It is a tough task, but we can achieve it using some APIs.
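The replace-and-check loop could look something like the sketch below. Everything in it is assumed for illustration: the SYNONYMS and FREQ tables are invented toy data, and fits_context is a hypothetical hook where a real meaning or context check would go (by default it accepts every candidate).

```python
import re

# Assumption: toy synonym and frequency tables; a real system would
# get these from a thesaurus and a corpus.
SYNONYMS = {"utilize": ["employ", "use"], "commence": ["begin", "start"]}
FREQ = {"use": 950, "employ": 40, "utilize": 3,
        "begin": 500, "start": 800, "commence": 2}

def simplify(sentence, threshold=10,
             fits_context=lambda sentence, old, new: True):
    """Replace rare words with their most frequent acceptable synonym.

    `fits_context` is a hypothetical check; if it rejects a candidate,
    the next most frequent synonym is tried. If none passes, the
    original word is kept.
    """
    def replace(match):
        word = match.group(0)
        key = word.lower()
        if FREQ.get(key, 0) >= threshold:
            return word  # common enough, leave it alone
        candidates = sorted(SYNONYMS.get(key, []),
                            key=lambda w: FREQ.get(w, 0), reverse=True)
        for cand in candidates:
            if FREQ.get(cand, 0) >= threshold and fits_context(sentence, key, cand):
                return cand
        return word  # no suitable simpler synonym found

    return re.sub(r"[A-Za-z]+", replace, sentence)

print(simplify("We utilize the tool."))
# Reject "use" to show the fallback to the next candidate.
print(simplify("We utilize the tool.", fits_context=lambda s, old, new: new != "use"))
```

Note that this sketch does not preserve capitalisation or verb forms of the replaced word, which a real system would have to handle.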
This is just a first step towards our coding part. I am sure that I will enjoy coding with my team :):)