Sciences, University of Sussex
When one thinks of converting a complicated text into a simpler one for better understanding, the first thing that comes to mind is replacing the complicated words with simpler ones. This involves finding the set of synonyms for a complicated word and replacing it with a simpler synonym whose sense (meaning) fits the context of the given sentence. This raises an ambiguity: which sense of the word should be selected? The paper answers this question by showing how the ambiguity is resolved when a word is replaced with one of its synonyms.
Let us start with the title of the paper, which is "Word Sense Disambiguation Using Automatically Acquired Verbal Preferences".
Word Sense Disambiguation (WSD) is an open problem: identifying which sense (meaning) of a word is used in a sentence when the word has multiple meanings. This paper proposes a system that performs WSD.
A sentence normally consists of a subject, a verb and an object (and sometimes prepositional phrases). There are two types of objects: direct and indirect.
An example helps in understanding direct and indirect objects:
Ex: He gave Mary a rose.
He--subject
gave--verb
rose--direct object
Mary--indirect object
The method in "Word Sense Disambiguation Using Automatically Acquired Verbal Preferences" is applied only to sentences containing direct objects, as indirect objects are less common.
Now you know what word sense disambiguation is! Next we will see how it is achieved through verbal preferences.
The term "Verbal Preferences" refers to a disjoint set of noun classes covering all nouns, with a preference attached to each class. That was the formal definition; put simply, a preference is the importance given to each sense (meaning) of a word. It is called a "verbal preference" because the preferences are acquired for verbs. Depending on this preference, the sense with the higher preference is chosen. Disambiguation only takes place when the preferences can discriminate between the senses.
This paper introduces a system that achieves disambiguation through automatically acquired verbal preferences. The system consists of a shallow parser, SCF (Subcategorization Frame) acquisition, WSD, and selectional preference acquisition through the ATCM (Association Tree Cut Model).
Before moving to how the whole system works, let us understand the functionality of these components.
The shallow parser, along with SCF (Subcategorization Frame) acquisition, extracts the verbs from sentences that have been tagged using an HMM (Hidden Markov Model). The HMM is used mainly for Part-of-Speech (PoS) tagging. PoS tagging is the process of marking up each word in a text as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is taught to school-age children as identifying words as nouns, verbs, adjectives, etc.
WSD (Word Sense Disambiguation): this component selects the correct sense of the word being replaced. For a sense to be selected:
1. the sense should be seen more than three times
2. the sense should be seen more than twice as often as the next most frequent sense.
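The two conditions above can be sketched in a few lines of Python. This is only an illustration, assuming the frequency of each sense is available as a dictionary of counts; the function name, the sense labels and the counts are made up for this post and do not come from the paper.

```python
from collections import Counter


def pick_dominant_sense(sense_counts, min_count=3, ratio=2):
    """Return the most frequent sense if it passes both conditions, else None.

    sense_counts: hypothetical mapping {sense label: times seen}.
    """
    ranked = Counter(sense_counts).most_common(2)
    if not ranked:
        return None
    top_sense, top_count = ranked[0]
    # If there is no second sense, treat its count as zero.
    next_count = ranked[1][1] if len(ranked) > 1 else 0
    seen_enough = top_count > min_count              # condition 1
    clearly_ahead = top_count > ratio * next_count   # condition 2
    return top_sense if seen_enough and clearly_ahead else None


# Made-up counts: 9 > 3 and 9 > 2 * 4, so the first sense is selected.
print(pick_dominant_sense({"sense_a": 9, "sense_b": 4}))
# Made-up counts: 8 is not more than 2 * 4, so nothing is selected.
print(pick_dominant_sense({"sense_a": 8, "sense_b": 4}))
```

Returning None when either condition fails keeps the choice conservative: no sense is picked unless one clearly dominates.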
Selectional preference acquisition: as specified earlier, preferences are disjoint sets of noun classes. Each class is assigned an “association score”, which indicates the degree of preference between the verb and the class. This brings in one more term that plays an important role in the disambiguation process: the ATCM (Association Tree Cut Model), which is a collection of classes with association scores. WSD using the ATCM simply selects the sense whose class has the highest association score.
For example: the sense of “chicken” under FOOD (association score = 75) would be preferred over the senses under LIFE FORM (association score = 3) when “chicken” occurs as the direct object of “eat”.
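The chicken example can be written out as a small Python sketch. It assumes the association scores for the direct object of "eat" are already available as a dictionary from noun class to score; the function name, the sense labels and the data layout are hypothetical and only meant to show the selection step, not the paper's implementation. In line with the earlier remark that disambiguation only takes place when the preferences can discriminate between senses, the sketch returns None on a tie.

```python
def disambiguate(candidate_senses, class_scores):
    """Pick the sense whose noun class has the highest association score.

    candidate_senses: hypothetical mapping {sense label: noun class}.
    class_scores: hypothetical mapping {noun class: association score}
                  for one verb and its direct object slot.
    Returns None when no class is scored or when the top score is tied.
    """
    scored = [
        (class_scores[noun_class], sense)
        for sense, noun_class in candidate_senses.items()
        if noun_class in class_scores
    ]
    if not scored:
        return None
    best_score = max(score for score, _ in scored)
    winners = [sense for score, sense in scored if score == best_score]
    # The preferences must discriminate between senses, so a tie gives no answer.
    return winners[0] if len(winners) == 1 else None


# Scores taken from the example in the post (direct object of "eat").
eat_scores = {"FOOD": 75, "LIFE FORM": 3}
chicken_senses = {"chicken (meat)": "FOOD", "chicken (bird)": "LIFE FORM"}

print(disambiguate(chicken_senses, eat_scores))  # chicken (meat)
```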
Now let me briefly explain the whole system after integrating the above components.
Working of the whole system:
The input to the system is raw text that has been tagged using the HMM. This is fed to the shallow parser. The output of the shallow parser, that is, the verbs extracted from the sentences, is then fed to the next component, SCF acquisition. The set of synonyms or senses for each verb is obtained. The resulting output is passed to selectional preference acquisition, which assigns preferences to the senses. Based on these, the next component, WSD, selects the sense or synonym with the highest preference and replaces the word with it. This is how the whole system works!
Hope this post is understandable to readers and will be helpful :) I now realize the importance of text simplification. Manually simplifying the text of the paper took a lot of effort and was a tough task, but it feels great to have understood, if not the whole paper, at least what the authors are trying to say. This is an advantage of text simplification: it saves a lot of time otherwise spent looking up the meaning of each and every word of an IEEE paper. The whole paper takes at least four hours to understand, but the simplified version I have presented here will not take more than 10 minutes :)