
Deliverable 1: Experiment with Heritrix

Description: Studied Heritrix, built the project from source, and ran it to perform sample crawls.

Obtaining Source Code: Check out the Heritrix source code from the SourceForge SVN repository using the following command: svn co heritrix-1.14.4

Building Heritrix:

  • Heritrix can be built from source using Maven. Note: Do not use Maven 2.x.
  • Set up the required plugins as described in section 2.2 of the Heritrix Developer's Manual. [1]
  • Go to the checked-out heritrix-1.14.4 folder: cd heritrix-1.14.4
  • Run the Maven command to start the build: maven dist
  • If an error occurs, follow the on-screen instructions and manually download the packages that Maven requires.
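
The build steps above can be collected into a small script. This is only a sketch: it assumes the source has already been checked out into heritrix-1.14.4 and that the Maven 1.x executable is on the PATH as maven. The DRY_RUN variable and the run and build_heritrix helpers are hypothetical names introduced here for illustration; they are not part of Heritrix or Maven. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the build steps listed above.
# DRY_RUN, run() and build_heritrix() are illustrative helpers, not Heritrix tooling.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually execute them

run() {
    # Print the command, then execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_heritrix() {
    # Enter the checked-out source folder, then start the Maven build.
    # "maven" is the Maven 1.x executable; Maven 2.x should not be used here.
    run cd heritrix-1.14.4 &&
    run maven dist
}

build_heritrix
```

Running the script with DRY_RUN=0 executes the two commands for real; if the build stops with an error, the manual package download described in the last step still applies.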

Running Heritrix:

  • Go to the heritrix-1.14.4/bin directory and run the following command to launch the web interface for Heritrix: heritrix --admin=LOGIN:PASSWORD
  • Here, LOGIN=admin and PASSWORD=letmein.
  • Open a web browser and go to the following address to access the web-based user interface for Heritrix: [2]
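
The launch step can be sketched the same way. The login and password are the values given above; DRY_RUN and launch_cmd are hypothetical names used only for this illustration. The command is written exactly as in the steps above, which assumes the bin directory is on the PATH; on some setups it may need to be invoked as ./heritrix instead.

```shell
#!/bin/sh
# Sketch of the launch step. LOGIN and PASSWORD are the values from the text;
# DRY_RUN and launch_cmd() are illustrative names, not part of Heritrix.
LOGIN="admin"
PASSWORD="letmein"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually execute them

launch_cmd() {
    # Build the command line that starts the Heritrix web interface.
    printf 'heritrix --admin=%s:%s\n' "$LOGIN" "$PASSWORD"
}

echo "+ cd heritrix-1.14.4/bin"
echo "+ $(launch_cmd)"
if [ "$DRY_RUN" = "0" ]; then
    # Unquoted on purpose so the command string splits into program and argument.
    cd heritrix-1.14.4/bin && $(launch_cmd)
fi
```

Once the process is running, the web interface is reached in the browser with the same login and password.
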

Screenshots: Running Heritrix from command prompt; Background process for Heritrix; Login screen; Console; Jobs; Logs; Reports