Chris Pollett > Students >
Darshan

Deliverable 1: Experiment with Heritrix

Description: Studied Heritrix, built the project from its source code, and ran it to perform sample crawls.

Obtaining Source Code: Download the Heritrix source code from the SourceForge SVN repository using the following command: svn co https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/release-branches/heritrix-1.14.4 heritrix-1.14.4
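
The checkout step above can be wrapped in a small script. This is only a minimal sketch: the helper name checkout_heritrix and the variable names are illustrative assumptions, while the URL and target directory are the ones given above. It assumes the svn command-line client is installed.

```shell
#!/bin/sh
# Repository URL and target directory, as given in the text above.
HERITRIX_SVN_URL="https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/release-branches/heritrix-1.14.4"
HERITRIX_DIR="heritrix-1.14.4"

# checkout_heritrix (hypothetical helper): check out the release branch
# unless the target directory already exists.
checkout_heritrix() {
    if [ -d "$HERITRIX_DIR" ]; then
        echo "$HERITRIX_DIR already exists; skipping checkout"
        return 0
    fi
    svn co "$HERITRIX_SVN_URL" "$HERITRIX_DIR"
}

# Usage: uncomment to run the checkout.
# checkout_heritrix
```

Skipping the checkout when the directory exists keeps the script safe to re-run; a fresh copy can be forced by deleting the directory first.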

Building Heritrix:

  • Heritrix can be built from source using Maven. Note: Do not use Maven 2.x.
  • Set up the required plugins as described in section 2.2 of the Heritrix Developer's Manual. [1]
  • Go to the heritrix-1.14.4 folder created by the checkout: cd heritrix-1.14.4
  • Run the Maven command to start the build: maven dist
  • If an error occurs, follow the on-screen instructions and manually download the package that Maven requires.
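
The build steps above could be scripted roughly as follows. This is a sketch under stated assumptions: build_heritrix is a hypothetical helper name, and the check relies on Maven 1.x being launched with the maven command whereas Maven 2.x uses mvn. It does not install the plugins from the Developer's Manual; that step still has to be done by hand.

```shell
#!/bin/sh
# build_heritrix (hypothetical helper): run "maven dist" inside the source tree.
# $1 is the source directory; it defaults to heritrix-1.14.4.
build_heritrix() {
    src_dir="${1:-heritrix-1.14.4}"

    if [ ! -d "$src_dir" ]; then
        echo "source directory not found: $src_dir" >&2
        return 1
    fi
    # Maven 1.x installs a launcher named "maven"; Maven 2.x uses "mvn".
    if ! command -v maven >/dev/null 2>&1; then
        echo "the Maven 1.x 'maven' command is not on PATH" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$src_dir" && maven dist )
}

# Usage: build_heritrix heritrix-1.14.4
```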

Running Heritrix:

  • Go to the heritrix-1.14.4/bin directory and run the following command to launch the web interface for Heritrix: heritrix --admin=LOGIN:PASSWORD
  • Here, LOGIN=admin and PASSWORD=letmein.
  • Open a web browser and go to the following address to access the web-based user interface for Heritrix: http://127.0.0.1:8080 [2]
Screenshots: Running Heritrix from the command prompt; background process for Heritrix; Login screen; Console; Jobs; Logs; Reports.
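
To confirm that the web interface is reachable after launch, one could poll the address given above before opening a browser. This is a minimal sketch, assuming curl is installed; wait_for_ui is a hypothetical helper, and the default retry count and delay are arbitrary choices.

```shell
#!/bin/sh
# wait_for_ui (hypothetical helper): poll a URL until it answers or attempts run out.
# $1 = URL, $2 = number of attempts (default 10), $3 = seconds between attempts (default 2).
wait_for_ui() {
    url="$1"; tries="${2:-10}"; delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s silences progress output; -o /dev/null discards the page body.
        # Without -f, curl succeeds on any HTTP response, including a login page.
        if curl -s -o /dev/null "$url"; then
            echo "web interface is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no response from $url after $tries attempts" >&2
    return 1
}

# Usage (from heritrix-1.14.4/bin, with the credentials given above):
#   ./heritrix --admin=admin:letmein
#   wait_for_ui http://127.0.0.1:8080
```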