CS297 Proposal
Japanese Kanji Suggestion Tool
Sujata Dongre (sujata.dongre@gmail.com)
Advisor: Dr. Chris Pollett
Description: When a misspelled search term is entered into a search engine such as Google, the engine often offers help of the form "Did you mean: ...". Similarly, in my project I am trying to provide suggestions when a user enters incorrect Japanese text. Japanese uses three scripts: Hiragana, Katakana, and the most difficult, Kanji. In the past, Japanese was written only vertically; horizontal writing is more common nowadays. A single Kanji may be used to write one or more different compound words. From the reader's point of view, a Kanji is said to have one or more different "readings". Hence it is sometimes very difficult to tell which characters form a Kanji term and how to read them, even if you know Japanese.
Various translation tools are now available that provide translation help. Well-known translation websites are as follows:
Yahoo: http://babelfish.yahoo.com/translate_txt
But what if the Japanese term that you enter for the search is itself wrong? For example, suppose you are reading Japanese text on a website and come across a sentence like "まずは、正しい英語学習法に頭をCHANGEしてください。" You do not understand the sentence because you cannot read the Kanji, so you decide to use one of the above tools for translation. However, you do not know which Kanji to copy, and you mistakenly select "習法". The search results given by the above three translation websites are as follows:
Yahoo: Learning Method
Hence, the purpose of my project is to ask the user "Did you mean: 学習法?", where 学習法 is the correct Japanese term and has the equivalent English meaning.
Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Report on experiments with installing and running a Japanese text corpus
2. Report on standard Japanese grammar techniques for computers
3. Report on experiments with parsers and algorithms
4. Report on experiments with online dictionaries or MySQL
5. CS297 Report
References:
Kyoto University Text Corpus 4.0. Retrieved August 26, 2009, from http://www-lab25.kuee.kyoto-u.ac.jp/nl-resource/corpus-e.html
The Tanaka Corpus. Retrieved August 26, 2009, from http://www.csse.monash.edu.au/~jwb/tanakacorpus.html
[1996] Statistical Language Learning. Eugene Charniak. MIT Press. 1996.
Nagata Masaaki, Saito Teruka, Suzuki Kenji (2001). Using the web as a bilingual dictionary. Proceedings of the workshop on Data-driven methods in machine translation, p. 1-8, Toulouse, France.
Qu Yan, Grefenstette Gregory, Evans David (2003). Automatic Transliteration for Japanese-to-English Text Retrieval. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, p. 353-360, Toronto, Canada.
Japanese WordNet. Retrieved August 26, 2009, from http://nlpwww.nict.go.jp/wn-ja/index.en.html
WWWJDIC: Online Japanese Dictionary Service. Retrieved August 26, 2009, from http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C
The EDICT Dictionary File. Retrieved August 26, 2009, from http://www.csse.monash.edu.au/~jwb/j_edict.html
Kanji. Retrieved August 26, 2009, from http://en.wikipedia.org/wiki/Kanji
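Illustrative sketch: the "Did you mean" behaviour from the Description can be pictured with a small Python example. This is only a minimal sketch under stated assumptions, not the algorithm the project will use. The name `suggest_terms` and the tiny `LEXICON` list are hypothetical; in practice the word list might come from a dictionary such as EDICT. Given the fragment the user selected and the sentence it was copied from, the sketch looks for known words that contain the fragment and actually occur in the sentence at that position.

```python
# Hypothetical stand-in for a real dictionary word list (e.g. loaded from EDICT).
LEXICON = ["正しい", "英語", "学習", "学習法", "頭"]

SENTENCE = "まずは、正しい英語学習法に頭をCHANGEしてください。"


def suggest_terms(selection, context, lexicon):
    """Return lexicon words that strictly extend `selection` and occur in
    `context` overlapping the place where `selection` was found."""
    suggestions = []
    start = context.find(selection)
    while start != -1:
        for word in lexicon:
            if word == selection or selection not in word:
                continue
            # Try every position of the selection inside the candidate word
            # and check whether the word lines up with the context there.
            offset = word.find(selection)
            while offset != -1:
                word_start = start - offset
                if word_start >= 0 and context.startswith(word, word_start):
                    if word not in suggestions:
                        suggestions.append(word)
                offset = word.find(selection, offset + 1)
        start = context.find(selection, start + 1)
    # Prefer the shortest extension of the user's selection.
    return sorted(suggestions, key=len)


for term in suggest_terms("習法", SENTENCE, LEXICON):
    print(f'Did you mean: "{term}"?')
```

With the sample sentence and the mistaken selection "習法", the only lexicon word that both contains the fragment and lines up with the sentence is 学習法. A real tool would also need to handle fragments that are not copied from a known context, which this sketch does not attempt.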