Saturday, January 22, 2011

Ambika's story so far

A reader's task in the early stage of reading involves word identification and sentence processing, with the goal of extracting meaning from the basic component units of the text. If the text contains very complicated words, the reader may find it difficult and time-consuming to understand. No facility is currently available online that lets web readers simplify text automatically with just a plug-in. This project intends to develop such a plug-in.

A number of linguistic issues need to be addressed, including: resolving pronouns and anaphoric references, assigning the correct tense to verbs that depend on governing verbs or other elements, deciding the implicit subject of the verb in relative clauses, etc.

The package developed in this project will help readers simplify Wikipedia articles in particular, as well as other articles on the web. This package aims to be a unique contribution to the field of text simplification. Many text simplification systems have evolved over the years, such as PSET, HAPPI, KURA for users with language disabilities, SKILLSUM for people without disabilities who have low literacy, and ATA for language teachers, children and adult secondary learners. But each of these has its own drawbacks. The project is intended to simplify text so that readers with different levels of vocabulary can understand it easily.

To develop such a package we need:
  • Python
  • Technical typesetting
  • NLTK (Natural Language Toolkit)
  • WordNet
  • Web development APIs
  • CGI
The learning task is divided among the team members. After learning each tool/package, the team member should teach it to the other members.
I started off by learning the basics of Python, which is a very easy programming language.
I learnt WordNet, a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
I also learnt pdb, the Python debugger. It provides an interactive debugging environment that lets you pause your program, look at the values of variables, and watch the step-by-step execution of your program. Hence it helps to understand what the code actually does.
I learnt NLTK usage and am a bit familiar with it as of now.
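
To make the synset idea concrete, here is a minimal sketch of how words can be grouped into sets of synonyms, each standing for one concept. It is only a toy model: the entries in TOY_SYNSETS and the helper names synsets_for and synonyms_for are made up for illustration and are not taken from the real WordNet database. With NLTK, the real database is queried through nltk.corpus.wordnet (for example wordnet.synsets("car")), which needs the WordNet corpus to be downloaded first.

```python
# Toy model of WordNet-style synsets. The entries below are hand-written
# for illustration only and are NOT taken from the real WordNet database.
TOY_SYNSETS = {
    "car.n.01": {"pos": "noun", "lemmas": ["car", "auto", "automobile", "motorcar"]},
    "big.a.01": {"pos": "adjective", "lemmas": ["big", "large"]},
    "buy.v.01": {"pos": "verb", "lemmas": ["buy", "purchase"]},
    # The same word can belong to several synsets, one per meaning.
    "bank.n.01": {"pos": "noun", "lemmas": ["bank"]},                # land beside water
    "bank.n.02": {"pos": "noun", "lemmas": ["bank", "depository"]},  # financial institution
}


def synsets_for(word):
    """Return the sorted names of every toy synset that contains `word`."""
    word = word.lower()
    return sorted(
        name for name, entry in TOY_SYNSETS.items() if word in entry["lemmas"]
    )


def synonyms_for(word):
    """Return the other words that share at least one synset with `word`."""
    word = word.lower()
    found = set()
    for name in synsets_for(word):
        found.update(TOY_SYNSETS[name]["lemmas"])
    found.discard(word)  # a word is not its own synonym
    return sorted(found)


if __name__ == "__main__":
    print(synsets_for("bank"))
    print(synonyms_for("automobile"))
```

For text simplification, a lookup like synonyms_for is the starting point: once the synonyms of a complicated word are known, the simplest one can be picked as its replacement.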

Coming to my idea for text simplification: it would be very helpful for children and visually disabled people if we replaced the complicated words in a sentence with appropriate pictures or figures that best fit the context, so that they can understand it more efficiently.
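
A rough sketch of this picture idea could look like the following. Everything in it is an assumption made for illustration: the PICTURES table, the image paths and the [image: ...] placeholder format are hypothetical, and the lookup is a plain word match that does not yet choose the picture that best fits the context.

```python
import re

# Hypothetical mapping from a "complicated" word to a picture file.
# The words and the paths are invented for this sketch.
PICTURES = {
    "automobile": "images/car.png",
    "canine": "images/dog.png",
    "residence": "images/house.png",
}


def replace_with_pictures(sentence, pictures=PICTURES):
    """Swap every word found in `pictures` for an [image: path] placeholder."""

    def swap(match):
        word = match.group(0)
        path = pictures.get(word.lower())
        # Words without a picture are left unchanged.
        return f"[image: {path}]" if path else word

    # Match runs of letters so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", swap, sentence)


if __name__ == "__main__":
    print(replace_with_pictures("The canine sat near the residence."))
```

In a browser plug-in the placeholder would be turned into an actual image, but that part is left out here.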

4 comments:

  1. hey Ambika,
    nice post.. it gives proper information about the project :)

  2. Great job Ambika, this is way beyond what I expected from someone who joined the project a couple of days back. Thumbs up!

    Your writing is both lucid and striking. This is exactly what our text simplification engine should achieve.

    You have had a very serious tone throughout the blog :-) I wish that you make it less serious the next time.

    BTW, a serious tone like this is what makes a project report a great success. You will play a phenomenal role in the thesis write-up :-). I am going to assign that mega task to you :-).

  3. I know.. U just compiled everything we did over the past few weeks in a lucid manner.
    We would like to see you tickle ur funny bone too :-)
