CS298 Proposal
A Fast Algorithm For Data Mining
Aarathi Raghu (aarathi99@yahoo.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Chris Pollett, Dr. T.Y. Lin, Dr. Mark Stamp

Abstract:
Data mining is a growing field, and a variety of algorithms have been proposed for it. The Apriori algorithm is one of the most commonly used. Recently, Lin and Louie [Lin2002] proposed using bitmaps to improve the speed of this algorithm. Further, Lin, Hu, and Louie [Lin2003] proposed a way to view association rules as a lattice. A basis for this lattice could be used to generate any association rule for a given dataset. However, algorithms for generating a basis for this lattice potentially take exponential time.

We studied modifications of the Apriori algorithm and the lattice-based algorithm in CS297. Implementations of these algorithms formed the three deliverables for CS297, all of which we have successfully completed: the bitmap-based Apriori, a disk-based improvement of that implementation, and the lattice-based algorithm.

The goal of this project is to implement and test the lattice-generating algorithm on datasets that follow a particular kind of distribution, for example, a Zipf distribution. We hope that on data following such distributions the algorithm will run faster and be easier to improve. Based on our experiments, we intend to make improvements to this algorithm in CS298. We also intend to try a binning mechanism that makes data conform more closely to a particular distribution, and to run the lattice algorithm against the binned data.

CS297 Results
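To make the bitmap idea behind the first CS297 deliverable concrete, here is a minimal sketch in Python. It is not the implementation from [Lin2002] or from the CS297 deliverables. It assumes each item is represented by an integer used as a bit vector, where bit i is set when transaction i contains the item, so the support of an itemset is the number of set bits in the AND of its items' bitmaps. The function names (build_bitmaps, support, frequent_itemsets) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i has the item."""
    bitmaps = {}
    for row, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def support(itemset, bitmaps, n_rows):
    """Count transactions containing every item: AND the bitmaps, count set bits."""
    mask = (1 << n_rows) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search using bitmap support counting."""
    bitmaps = build_bitmaps(transactions)
    n_rows = len(transactions)
    level = [frozenset([item]) for item in sorted(bitmaps)
             if support([item], bitmaps, n_rows) >= min_support]
    result = {}
    while level:
        for itemset in level:
            result[itemset] = support(itemset, bitmaps, n_rows)
        # Join step: merge frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent
        # and its own support meets the threshold.
        level = [c for c in candidates
                 if all(frozenset(sub) in result
                        for sub in combinations(c, len(c) - 1))
                 and support(c, bitmaps, n_rows) >= min_support]
    return result
```

Calling frequent_itemsets on a list of item sets with a minimum support count returns a dictionary from each frequent itemset to its support. Because every support count reduces to bitwise ANDs, no rescan of the transactions is needed after the bitmaps are built, which is the intuition behind the speedup described in the abstract.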
Proposed Schedule
Key Deliverables:
Innovations and Challenges
References:
[Ramakrishnan] R. Ramakrishnan and J. Gehrke. Fundamentals of Database Systems. McGraw-Hill, 2002.
[Molina] H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
[Agarwal1994] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Proc. Intl. Conf. on Very Large Databases. 1994. pp. 1522-1534.
[Lin2003] T.Y. Lin, X.T. Hu, and E. Louie. Using Attribute Value Lattice to Find Frequent Itemsets. Data Mining and Knowledge Discovery: Theory, Tools and Technology. 2003. pp. 28-36.
[Lin2002] T.Y. Lin and E. Louie. Finding Association Rules by Granular Computing: Fast Algorithm for Finding Association Rules. Data Mining, Rough Sets and Granular Computing. 2002. pp. 23-42.