Implement a Hindi Part of Speech Tagger

Description: The aim of this deliverable was to implement a Hindi part of speech (PoS) tagger. Given a Hindi sentence, the tagger should label each word in the sentence with its most likely part of speech.

The main challenge in implementing the tagger was obtaining a Hindi lexicon. Once I found one, I cleaned the lexicon file so that its format was similar to that of the English lexicon. With the lexicon in place, I implemented a rule-based part of speech tagger.
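To illustrate the general idea, here is a minimal sketch of a lexicon-driven, rule-based tagger in Python. It is not the code from the patch. The lexicon format (a word followed by its possible tags, most likely first), the sample entries, the tag names, the default tag for unknown words, and the single contextual rule are all assumptions made for this example, as are the function names `load_lexicon` and `tag_sentence`.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon text (assumed format, not the project's actual file):
# each line holds a word followed by its possible tags, most likely tag first.
SAMPLE_LEXICON = """\
राम NNP
घर NN
जाता VM
है VAUX VM
"""


def load_lexicon(text: str) -> Dict[str, List[str]]:
    """Parse lexicon text into a mapping from word to its list of tags."""
    lexicon: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        lexicon[parts[0]] = parts[1:]
    return lexicon


def tag_sentence(
    sentence: str,
    lexicon: Dict[str, List[str]],
    default_tag: str = "NN",  # assumed fallback for words not in the lexicon
) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated word, then apply one contextual rule."""
    words = sentence.split()
    # Step 1: give every word its most likely tag from the lexicon.
    tagged = [(w, lexicon.get(w, [default_tag])[0]) for w in words]
    # Step 2: illustrative contextual rule. An auxiliary verb (VAUX) that
    # does not follow a main verb (VM) is retagged as a main verb, provided
    # the lexicon lists VM as a possible tag for that word.
    for i, (word, tag) in enumerate(tagged):
        follows_main_verb = i > 0 and tagged[i - 1][1] == "VM"
        if tag == "VAUX" and not follows_main_verb:
            if "VM" in lexicon.get(word, []):
                tagged[i] = (word, "VM")
    return tagged


if __name__ == "__main__":
    lexicon = load_lexicon(SAMPLE_LEXICON)
    for word, tag in tag_sentence("राम घर जाता है", lexicon):
        print(word, tag)
```

In this sketch the lexicon supplies the initial guess for each word, and the rules only correct tags that the surrounding context makes unlikely. A real tagger would need a much larger lexicon, more rules, and handling for punctuation such as the danda.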

The patch with the Hindi part of speech tagger: Hindi Part of Speech Tagger