CS297-298 Project News Feed

Meeting Date	Discussion Topics	Suggestions	TO DO
Nov 16, 2010	Integrate the HMM program into the Nutch web crawler	Dr. Pollett reviewed the first draft of the CS298 report and suggested changes	Incorporate the HMM program into Nutch. Finish the first draft of the CS298 report. Write a program to save the final HMM matrix to a file and use that file in all experiment programs.
Nov 9, 2010	Integrate the HMM program into the Nutch web crawler	Dr. Pollett suggested a few changes for adding the HMM program to the Nutch web crawler	Incorporate the HMM program into Nutch. Start writing the experiments and the report. Write a program to save the final HMM matrix to a file and use that file in all experiment programs.
Nov 2, 2010	Discuss the Nutch web crawler	Dr. Pollett suggested a few changes to make the Nutch web crawler work	Try running the Nutch web crawler on the Windows platform. Start writing the experiments and the report. Write a program to save the final HMM matrix to a file and use that file in all experiment programs.
Oct 26, 2010	Discuss how to merge all experiments and a front end to show the results	Download the Nutch web crawler. Dr. Pollett asked me to start writing the experiments and the report	Download the Nutch web crawler. Start writing the experiments and the report. Write a program to save the final HMM matrix to a file and use that file in all experiment programs. The experiment 2 bug is fixed; all experiments are working now.
Oct 19, 2010	Make changes to the experiments	Dr. Pollett suggested a few experiments using the n-gram approach	Work on the experiments. Experiment 1: Sort the binary search tree by the number of occurrences of each word. Experiment 2: Make changes in the code. Assume the user enters two characters C1 and C2; if the count for that pair is positive, pass it to the HMM to find out what C3 would be, otherwise ignore that string. Experiment 3: Done. How to make changes in this program? Read about existing crawlers; download and experiment with them.
Oct 12, 2010	Make changes to the experiments	Dr. Pollett suggested a few experiments using the n-gram approach	Work on the experiments. Experiment 1: Make changes in the code. Use a binary search tree that stores each word as the key and its number of occurrences in the corpus file as the value. Experiment 2: Make changes in the code. Use a substring (array slice) of K = 2. Experiment 3: Done. Read about existing crawlers; download and experiment with them.
Oct 5, 2010	Different experiments with the HMM Japanese parser and the Tanaka Corpus	Dr. Pollett suggested a few experiments using the n-gram approach	Work on the experiments. Experiment 1: Read characters from the Tanaka Corpus file assuming a window size of 2. This experiment is useful for suggesting the next character of the user's input string. Experiment 2: Read characters from the Tanaka Corpus file such that if a special character is found, 1 is added to the count, otherwise 1 is subtracted. This experiment is useful for detecting the end of a word. Experiment 3: Read characters from the Tanaka Corpus file until a special character is found. This experiment is useful for building a dictionary of Japanese words from the corpus file. Read about existing crawlers; download and experiment with them.
Sept 28, 2010	Different experiments and making the program more flexible for user input	Dr. Pollett suggested a few experiments, one using the HMM model and the other using the n-gram approach	Use generics as per JDK1.6 standards. Modify the program for the first experiment: accept user input and search for the string in the Tanaka Corpus. Modify the program for the second experiment: accept user input, replace some of the characters, and display the suggested string. Assuming a window size of 2, experiment with n-grams.
Sept 21, 2010	How to decide which character to show to the user	Store the probabilities in an array, sort the array, and print it. Based on the probabilities, choose the letters to show to the user as the suggested string. Use the Tanaka Corpus to decide which one is better	Add the sorted array functionality. Print sentences from the Tanaka Corpus file that contain the highest-probability string.
Sept 14, 2010	Modifications to the Viterbi program	Modify the Viterbi program to accept input from the user. Append 191 Japanese characters at the beginning, in the middle, and at the end of the user input and run the Viterbi program on all these combinations. Get the string with the highest probability from the Viterbi program and, using the command line, print the sentences from the Tanaka Corpus file that contain this string	Modify the Viterbi program as suggested. Print sentences from the Tanaka Corpus file that contain the highest-probability string. Upload sub-deliverables.
Sept 7, 2010	Viterbi program	With the given HMM model for Japanese language parsing, check the highest probability and the path with the highest probability	Work on the Viterbi program and check whether it outputs the correct probability and path.
Aug 31, 2010	Analyzing probabilities after running the HMM on the Japanese text corpus	Come up with some rules for state transitions after analyzing the probabilities. Use those rules to detect word boundaries	With the given HMM and a user input string, write a program that outputs the string with the highest probability.
May 12, 2010	Discuss analyzing the Japanese corpus file and the work to be done in the future	Work on Japanese characters in the HMM training algorithm	Complete the HMM training program for Japanese characters.
May 5, 2010	Discuss analyzing the Japanese corpus file	Update the HMM training program for Japanese characters	Read Japanese characters from the corpus file. Assign each hiragana character its own observation number. Assign each katakana character its own observation number. Assign all kanji to a single observation. The Yahoo search API code is working.
Apr 28, 2010	Discuss the HMM training program and how to implement search functionality	The HMM training program is working now	Read Japanese characters from the corpus file. Assign each hiragana character its own observation number. Assign each katakana character its own observation number. Assign all kanji to a single observation. Implement search functionality using the Google or Yahoo search APIs.
Apr 21, 2010	Discuss the HMM training program and how to implement search functionality	The HMM training program is working now	Change the HMM training program for the Japanese corpus. Implement search functionality using the Google or Yahoo search APIs.
Apr 14, 2010	Discuss the tested HMM training program	We found that the total number of characters in the file read and write code is less than 50,000	Replicate the results from Dr. Mark Stamp's paper by hard-coding the initial probabilities. Use these probabilities to check whether the HMM converges in only the first iteration.
Apr 7, 2010	Discuss the tested HMM training program	Dr. Pollett checked the HMM training program, including the reading and writing of the observation sequence file. We debugged the program and found that there might be a problem with the file read and write code	Test the file reading and writing code.
Mar 24, 2010	Discuss the log probabilities in the HMM training program	Dr. Pollett checked the HMM training program and suggested debugging the code for just 2 iterations. He asked me to calculate all the probabilities manually to make sure the program is bug-free	Check for bugs in the program by running it for two iterations.
Mar 17, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett checked the HMM training program and suggested a few changes based on Dr. Mark Stamp's paper	Check for bugs in the program and try to make it work for English text using the Brown Corpus file.
Mar 10, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett checked the HMM training program and suggested a few changes based on Dr. Mark Stamp's paper	Make changes in the HMM training program with reference to Dr. Mark Stamp's paper.
Mar 3, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett suggested meeting Professor Stamp to get his advice on the HMM training algorithm	Make changes in the HMM training program for calculating the count C and hence the probability. Professor Mark Stamp asked me to read his paper about HMMs. He also suggested thinking about the total number of characters to be considered for the HMM training algorithm.
Feb 24, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett suggested a few changes in the program for calculating transition probabilities	Make changes in the HMM training program for calculating the count C and hence the probability.
Feb 17, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett suggested a few changes in the program for calculating transition probabilities	Make changes in the HMM training program for calculating the count C and hence the probability.
Feb 10, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett explained the steps for implementing the HMM training algorithm in detail	Complete the HMM training program by implementing the steps.
Feb 3, 2010	Discuss the program for the HMM training algorithm	Dr. Pollett explained the steps for implementing the HMM training algorithm. He also reviewed the program I had written	Complete the HMM training program by implementing the steps.
Jan 27, 2010	Decide the meeting time for CS298		Start working on the HMM training algorithm.
Dec 1, 2009	Discuss all the deliverables and the CS297 report	Dr. Pollett reviewed all the deliverables and suggested a few changes in some of the PDF files	Make changes in deliverable 2. Print the CS297 report and submit it. Check that all the pages validate as XHTML 1.1. Also do a full check using Acrobat Pro. Prepare the CS298 proposal and decide on committee members.
Nov 24, 2009	Discuss deliverable 4	Dr. Pollett suggested a few changes to resolve the installation errors of the MySQL N-gram parser plugin	Prepare deliverable 4 for the MySQL N-gram parser installation experiment. Prepare the CS297 report.
Nov 17, 2009	Discuss HMM training	Dr. Pollett explained the HMM training algorithm to me.	Add a probability calculation table for HMM training to deliverable 2. Experiment with the MySQL full-text plugin for the Japanese language.
Nov 10, 2009	Discuss deliverable 3 and deliverable 4	Dr. Pollett verified the HMM example from deliverable 2. He asked me to add the HMM learning algorithm to deliverable 3 and to start work on deliverable 4.	Make slides/PDF file for the HMM learning algorithm: deliverable 3 changes. Search for MySQL full-text search for the Japanese language: deliverable 4.
Nov 2, 2009	Discuss deliverable 3	I asked a few questions about the HMM model example in deliverable 2. Dr. Pollett explained the transition probabilities and emission probabilities to me, and why it is necessary to consider the emission probabilities.	Make changes in the deliverable 2 HMM model example. Understand HMM learning/training. Upload deliverable 3: the Viterbi and Forward Viterbi algorithm programs.
Oct 27, 2009	Discuss deliverable 3	Dr. Pollett suggested some changes in the example for the Viterbi algorithm. He also asked me to understand the HMM training algorithm and write programs for the Viterbi and Forward Viterbi algorithms.	Make changes in the deliverable 2 report. Understand HMM learning/training. Write programs for the Viterbi and Forward Viterbi algorithms.
Oct 20, 2009	Finalize the contents of deliverable 2 and discuss deliverable 3	Dr. Pollett reviewed the example for the HMM and the Viterbi algorithm. He explained the difference between the Viterbi and Forward Viterbi algorithms to me.	Make changes in the deliverable 2 report. Understand HMM learning/training. Write programs for the Viterbi and Forward Viterbi algorithms.
Oct 6, 2009	Progress on deliverable 2 and discussion of algorithms for parsing Japanese text	Dr. Pollett suggested a few changes in the HMM report. He asked me to explain the HMM and the Viterbi algorithm with my own example. He then suggested that I start working on deliverable 3 by understanding the Viterbi algorithm and writing a program for it.	Make changes in the deliverable 2 report. Prepare slides for chapters 2, 3, and 4 from SLP. Start working on deliverable 3: program for the Viterbi algorithm.
Sept 29, 2009	Progress on deliverable 2 and Japanese parsing techniques	Two parsers used for Japanese text are the Chasen Morphological Analyzer and MeCab. Chasen is based on the Hidden Markov Model and MeCab is based on CRFs. Dr. Pollett explained some NLP concepts to me, such as entropy. He asked me to read about and understand what HMMs and CRFs are, and why and how they work.	Read the second and third chapters from SLP and prepare slides. Read and get a better understanding of HMMs and CRFs. Write a report on HMMs and CRFs. Work on deliverable 2.
Sept 15, 2009	Progress on deliverable 1 and Japanese parsing techniques	Dr. Pollett suggested a few changes in the Theory of Computing slides. He also suggested a few solutions for developing a program to remove English sentences from the Tanaka Corpus file. We then discussed deliverable 2. Dr. Pollett asked me to find out which techniques are used for parsing Japanese text.	Upload the new program for removing English sentences from the Tanaka Corpus file. Update the Theory of Computing slides with examples. Read the second chapter from SLP and prepare slides. Describe the Tanaka Corpus in more detail in deliverable 1. Make changes in the link tag. Find out which techniques are used for Japanese text. Work on deliverable 2.
Sept 8, 2009	Japanese corpus	Dr. Pollett asked me to read the first chapter of the SLP book and prepare slides for it. He also gave me the Theory of Computing book for understanding finite automata concepts. After that we discussed the Kyoto Text Corpus and the Tanaka Text Corpus. The Kyoto Text Corpus could not be tested because it requires purchasing a CD. The Tanaka Text Corpus works. Dr. Pollett suggested that I make changes in the existing Tanaka Text Corpus file and write a program that takes a kanji as input and displays all the lines from the file containing that kanji character.	Write a program: input = kanji or any Japanese character, output = lines containing that kanji/character. Read the first chapter from SLP and prepare slides. Read some sections from the Theory of Computing book and prepare slides. Work on deliverable 1.
Aug 26, 2009	Initial proposal	Dr. Pollett suggested a few changes in the initial proposal. He reviewed the description of the project and suggested a few changes to the stated purpose of the project. Dr. Pollett also asked me to refer to the Statistical Language Processing book and some of the Japanese corpora.	Make the required changes in the proposal and submit the final copy to the CS department. Start reading the Statistical Language Learning book. Search for Japanese corpora.
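
The Oct 5, 2010 entry describes reading the corpus with a window size of 2 in order to suggest the next character of a user's input string. A minimal sketch of that idea follows. It is an illustration only, not the project's actual program: the function names and the sample lines are hypothetical, and the sample lines stand in for lines read from the Tanaka Corpus file.

```python
from collections import Counter, defaultdict


def build_bigram_counts(lines):
    """Count how often each character is followed by each other character.

    Sliding a window of size 2 over every line yields (first, second) pairs.
    """
    counts = defaultdict(Counter)
    for line in lines:
        for first, second in zip(line, line[1:]):
            counts[first][second] += 1
    return counts


def suggest_next(counts, text, k=3):
    """Return up to k characters that most often follow the last character of text."""
    if not text:
        return []
    followers = counts.get(text[-1])
    if not followers:
        return []
    return [ch for ch, _ in followers.most_common(k)]


# Hypothetical stand-in for lines read from the corpus file.
sample_lines = ["abab", "abc", "bcd"]
counts = build_bigram_counts(sample_lines)
print(suggest_next(counts, "ab"))
```

Because the sketch works on Python strings character by character, the same functions apply unchanged to hiragana, katakana, or kanji input once the corpus lines are decoded as Unicode text.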