CS298 Proposal
Improved Chinese Language Processing for an Open Source Search Engine
Xianghong "Forrest" Sun (xianghong.sun@sjsu.edu)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Robert Chun, Dr. Mike Wu

Abstract:
Yioop is an open source search engine that can process and analyze text in many languages. It currently has some support for Chinese language processing, such as Chinese text segmentation, but this support is limited and needs improvement. In this project, I will implement a better algorithm for segmenting Chinese text, an algorithm for part-of-speech tagging, and a Chinese question answering system.

CS297 Results
Proposed Schedule
Key Deliverables:
Innovations and Challenges
References:
Sproat, Richard & Shih, Chilin & Gale, William & Chang, Nancy. (2002). A Stochastic Finite-State Word-Segmentation Algorithm For Chinese. Computational Linguistics. 22. 10.3115/981732.981742.
Xue, Nianwen. 2003. Chinese Word Segmentation as Character Tagging. International Journal of Computational Linguistics and Chinese Language Processing, 8(1).
Huihsin Tseng, Daniel Jurafsky, Christopher Manning. 2005. Discriminative Reordering with Chinese Grammatical Relations Features.
Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft), CH 23: Question Answering. Draft chapters in progress, August 29, 2019.
J Prager. 2006. Open-Domain Question-Answering. Information Retrieval.
Mengqiu Wang and Christopher D. Manning. 2013. Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning. Transactions of ACL 2013.
Mengqiu Wang, Wanxiang Che and Christopher D. Manning. 2013. Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition. ACL 2013.
Mengqiu Wang, Wanxiang Che and Christopher D. Manning. 2013. Effective Bilingual Constraints for Semi-supervised Learning of Named Entity Recognizers. AAAI 2013.
Wanxiang Che, Mengqiu Wang and Christopher D. Manning. 2013. Named Entity Recognition with Bilingual Constraints. NAACL 2013.