Amith

Deliverable 2

Deliverable 2 provides slides describing the Stochastic Approach for Link-Structure Analysis (SALSA) algorithm and the Nutch search engine.

SALSA Algorithm

SALSA is a variation of the HITS algorithm, which was developed by Jon Kleinberg. It takes a result set R as input and constructs a neighborhood graph from R in precisely the same way as HITS. Like HITS, it computes an authority score and a hub score for each vertex in the neighborhood graph, and these scores can be viewed as the principal eigenvectors of two matrices. However, instead of using the plain adjacency matrix that HITS uses, SALSA weights the entries according to the in-degrees and out-degrees of the vertices. Please refer to the presentation slides below for further information.

    [SALSA Algorithm - PDF]
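
The degree weighting described above can be illustrated with a small sketch. This is not the code from the slides or the project; it is a minimal version that assumes the neighborhood graph is already given as a list of directed (source, target) edges, and the function name salsa and its parameters are hypothetical. Each authority step walks backward along a uniformly chosen in-link and then forward along a uniformly chosen out-link; the hub step does the reverse. Repeating these steps is power iteration on the two degree-weighted matrices.

```python
def salsa(edges, iterations=100):
    """Return (authority, hub) score dicts for a directed edge list.

    Hypothetical helper for illustration only. Scores start uniform and
    are updated by repeated two-step random-walk transitions.
    """
    edges = set(edges)  # ignore duplicate links
    out_links, in_links = {}, {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        in_links.setdefault(dst, []).append(src)

    def two_step(scores, first, second):
        # Step 1: spread each score evenly over the node's `first` links.
        middle = {}
        for node, score in scores.items():
            share = score / len(first[node])
            for nxt in first[node]:
                middle[nxt] = middle.get(nxt, 0.0) + share
        # Step 2: spread the intermediate mass evenly over `second` links.
        result = {node: 0.0 for node in scores}
        for node, score in middle.items():
            share = score / len(second[node])
            for nxt in second[node]:
                result[nxt] += share
        return result

    # Authorities are nodes with in-links; hubs are nodes with out-links.
    auth = {n: 1.0 / len(in_links) for n in in_links}
    hub = {n: 1.0 / len(out_links) for n in out_links}
    for _ in range(iterations):
        auth = two_step(auth, in_links, out_links)  # back, then forward
        hub = two_step(hub, out_links, in_links)    # forward, then back
    return auth, hub


if __name__ == "__main__":
    authority, hub = salsa([("a", "x"), ("a", "y"), ("b", "y"), ("c", "y")])
    print(authority)
    print(hub)
```

On a connected neighborhood graph such as the one in the usage lines, the authority scores come out proportional to in-degree and the hub scores proportional to out-degree, which is where SALSA differs from HITS in practice.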

Nutch search engine

Nutch is an open source search engine that uses Lucene for its search and index components. Nutch has a highly modular architecture, which allows developers to create plug-ins for activities such as media-type parsing, data retrieval, querying, and clustering. Lucene is a free and open source search and indexing API released by the Apache Software Foundation. It is written in Java and released under the Apache License. Lucene is only the core of a search engine: it does not include components such as a web spider or parsers for different document formats. A developer who uses Lucene must add these components.

Please refer to the presentation slides below for further information.

    [NUTCH search engine - PDF]
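
The split between a search-and-index core and separately added parsers can be pictured with a toy sketch. Nothing here is the Nutch or Lucene API: TinyIndex, PARSERS, and index_document are hypothetical names, and the in-memory inverted index only stands in for the role that Lucene plays. The point is that the core knows nothing about document formats, and each format is handled by a parser registered per media type.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy index-and-search core (hypothetical; not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return ids of documents that contain every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def parse_plain(raw):
    return raw


def parse_html(raw):
    # Crude tag stripping, enough for the illustration.
    return re.sub(r"<[^>]+>", " ", raw)


# Parsers are registered per media type, outside the core.
PARSERS = {"text/plain": parse_plain, "text/html": parse_html}


def index_document(index, doc_id, media_type, raw):
    parser = PARSERS[media_type]
    index.add(doc_id, parser(raw))


if __name__ == "__main__":
    idx = TinyIndex()
    index_document(idx, "d1", "text/html", "<p>Nutch uses <b>Lucene</b></p>")
    index_document(idx, "d2", "text/plain", "Lucene is a library")
    print(idx.search("nutch lucene"))
```

Supporting a new document format in this sketch means adding one entry to PARSERS, with no change to the index itself.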