CS297 Proposal

Fast algorithm for data mining

Aarathi Raghu (aarathi99@yahoo.com)

Advisor: Dr. Chris Pollett (pollett@cs.sjsu.edu)

Description:

The Apriori algorithm [Agarwal1994] is a widely used algorithm for mining association rules from large datasets. Recently, Lin and Louie [Lin2002] have proposed using bitmaps to speed up this algorithm. Further, Lin, Hu, and Louie [Lin2003] have proposed viewing association rules as a lattice, a basis of which could be used to generate any association rule for a given dataset. However, algorithms for generating a basis for this lattice potentially take exponential time. The goal of this project is to implement and test this lattice-generating algorithm on datasets that follow a particular kind of distribution, for example a Zipf distribution. We hope that for some distributions the algorithm runs faster and can be more easily improved. Based on our experiments, we will suggest improvements to the algorithm.
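To make the bitmap idea concrete, here is a minimal in-memory sketch of Apriori in which each itemset carries a bitmap of the transactions that contain it, so that counting support reduces to a bitwise AND followed by a bit count. It is an illustration under our own assumptions, not the formulation of [Lin2002]: the function names `build_bitmaps` and `apriori_bitmap` are hypothetical, Python integers stand in for bitmaps, and `min_support` is taken to be an absolute transaction count.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an integer whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def popcount(bits):
    """Number of set bits, i.e. number of supporting transactions."""
    return bin(bits).count("1")


def apriori_bitmap(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    item_bits = build_bitmaps(transactions)
    # Level 1: keep the single items that meet the support threshold.
    current = {
        (item,): bits
        for item, bits in item_bits.items()
        if popcount(bits) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset, bits in current.items():
            frequent[itemset] = popcount(bits)
        keys = sorted(current)
        candidates = {}
        # Join step: merge two frequent k-itemsets that share their first k-1 items.
        for a, b in combinations(keys, 2):
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune step: every k-item subset of the candidate must be frequent.
            if any(sub not in current for sub in combinations(cand, k)):
                continue
            # Support counting: AND the two bitmaps, then count the set bits.
            bits = current[a] & current[b]
            if popcount(bits) >= min_support:
                candidates[cand] = bits
        current = candidates
        k += 1
    return frequent
```

Calling `apriori_bitmap(transactions, 3)` on a list of item sets returns every itemset contained in at least three transactions. The sketch holds all bitmaps in memory, so it corresponds only to the non-disk-based variant.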

Schedule:

Weeks 1-3 (8/24 - 9/17): Read Chapter 26 of [Ramakrishnan], Chapter 11 of [Molina], and [Agarwal1994]. Deliverable 1 due.
Weeks 4-6 (9/18 - 10/8): Read [Lin2002]. Deliverable 2 due.
Weeks 7-9 (10/9 - 10/29): Read [Lin2003]. Deliverable 3 due.
Weeks 10-12 (10/30 - 11/19): Deliverable 4 due.
Weeks 13-15 (11/20 - 12/10): Deliverable 5 due.

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Apriori algorithm using bitmaps (not yet a disk-based implementation)

2. Disk-based implementation of the Apriori algorithm

3. Lattice-basis generating algorithm

4. Preliminary experimental results

5. CS297 report
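For the preliminary experiments, test data whose item frequencies follow a Zipf distribution could be generated along the following lines. This is only a sketch under our own assumptions: the names `zipf_weights` and `generate_transactions`, the default exponent of 1.0, and the fixed number of draws per transaction are illustrative choices, not part of the proposal.

```python
import random


def zipf_weights(num_items, exponent=1.0):
    """Unnormalised Zipf weights: the item of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]


def generate_transactions(num_transactions, num_items, items_per_transaction,
                          exponent=1.0, seed=0):
    """Draw transactions whose item frequencies roughly follow a Zipf law."""
    rng = random.Random(seed)  # fixed seed so that experiments are repeatable
    items = list(range(num_items))
    weights = zipf_weights(num_items, exponent)
    transactions = []
    for _ in range(num_transactions):
        # Sample with replacement, then deduplicate, so a transaction may hold
        # fewer than items_per_transaction distinct items.
        drawn = rng.choices(items, weights=weights, k=items_per_transaction)
        transactions.append(set(drawn))
    return transactions
```

Item 0 is the most popular item and item `num_items - 1` the rarest. Raising `exponent` skews the data further toward a few very frequent items, which is one way to vary the distribution between experimental runs.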

References:

[Ramakrishnan] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, 2002.

[Molina] H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.

[Agarwal1994] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Proc. Intl. Conf. on Very Large Databases. 1994. pp. 1522-1534.

[Lin2003] T.Y. Lin, X.T. Hu, and E. Louie. Using Attribute Value Lattice to Find Frequent Itemsets. Data Mining and Knowledge Discovery: Theory, Tools and Technology. 2003. pp. 28-36.

[Lin2002] T.Y. Lin and E. Louie. Finding Association Rules by Granular Computing: Fast Algorithm for Finding Association Rules. Data Mining, Rough Sets and Granular Computing. 2002. pp. 23-42.