Chris Pollett > Students > Bocage

    [CS280 Proposal]

    [CS280 Report]

Project Blog:

May 17, 2016

  • We discussed that the patches were good
  • We discussed that I will have the report posted soon
  • This is our last meeting for the semester

May 10, 2016

  • We went over the 6 top sites we will be adding by default
  • We discussed that I added test cases for them
  • I was able to make the database patch but not the CMS patch
  • I need to change the test file names for Joomla and Yioop to not include the exclamation point
  • I will get the patch done tonight
  • I will try to get the report done this weekend

May 03, 2016

  • We talked about the previous performance of the detector and I explained that it was a coding mistake on my part
  • Moving the get head tags method into the constructor decreased the execution time to something manageable
  • I noted that upgrading PHP to version 7 did not provide any speed increase
  • We will include the top 5 sites: WordPress, Drupal, Blogger, Joomla! and vBulletin
  • I need to get the patch done after I do my best on detecting the top 5 sites
  • Eventually I will get the report done, but not until I finish my CS299 presentation

April 26, 2016

  • We looked at the ManyDetectorsExperiment.php experiment
  • Noticed that the run with 239 detectors had some really slow results
  • Noticed that the run with 20 detectors was 4 times slower
  • I will take the top ten from this site and add what I can for detection
  • We looked at the code to see if we can optimize the code
  • I will create a constructor, move the get head tags method into it and make the call in the checkHeadTags method to a variable and remove the page parameter if possible
  • I will run the tests again to see if we get an improvement
  • If I have time, I will upgrade my PHP version and rerun the experiment
  • We discussed the report
  • Dr. Pollett said I could add some graphs to help with page count
  • Experiment bar graphs
  • Top ten pie chart of my own creation, and cite it
  • Work performed
  • Background - how it works
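
The optimization planned above (parse the head tags once in a constructor, then have the check method read the stored result instead of taking the page as a parameter) can be sketched roughly as follows. This is only an illustration in Python, not the project's PHP code: the class names, the rule format, and the parse_count counter are hypothetical and exist just to show that the page is parsed once no matter how many detectors are checked.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collects (tag, attributes) pairs for tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


class CmsDetector:
    """Hypothetical detector: parses the page once, then checks many rules."""

    parse_count = 0  # how many times a page was parsed (illustration only)

    def __init__(self, page):
        collector = HeadTagCollector()
        collector.feed(page)
        CmsDetector.parse_count += 1
        self.head_tags = collector.tags  # stored once, reused by every check

    def check_head_tags(self, tag, attribute, needle):
        """True if some head tag has `attribute` containing `needle`."""
        for name, attrs in self.head_tags:
            value = attrs.get(attribute) or ""
            if name == tag and needle.lower() in value.lower():
                return True
        return False


page = ('<html><head><meta name="generator" content="WordPress 4.5">'
        '</head><body></body></html>')
detector = CmsDetector(page)
rules = [("meta", "content", "WordPress"), ("meta", "content", "Drupal")]
matches = [detector.check_head_tags(*rule) for rule in rules]
print(matches, "parses:", CmsDetector.parse_count)
```

If the parsing were done inside the check method instead, the page would be parsed once per rule, which is the kind of cost that grows with the number of detectors.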

April 19, 2016

  • We looked at the clean method
  • I proposed a fix of converting the encoding
  • Dr. Pollett marked that area of the code with a comment so he can look at it later
  • We talked about how to submit the work I have done
  • I will make two patches
  • One with the db changes
  • One with the other changes
  • We discussed that I need to create a ManyDetectorsExperiment.php file in tests that can be run ad-hoc
  • Its purpose is to add a large number of detectors (239) to the database
  • After I create ManyDetectorsExperiment.php, I will run a crawl with and without the many detectors
  • In the results, I need to take note of the number of summarized pages and process pages time
  • I will try to add the top 10 or 20 good detectors if the ManyDetectorsExperiment.php experiment proves sluggish, otherwise put them all in
  • We discussed that the createdb and upgradedb testing went well
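
The with/without comparison described above can be pictured with a small stand-in like the one below. It is a sketch under stated assumptions, not ManyDetectorsExperiment.php: the real experiment runs a Yioop crawl and reads the summarized page count and process pages time from the crawl results, whereas here the pages, the detector format, and the function names are all made up for illustration.

```python
import time


def make_detectors(count):
    """Builds `count` synthetic detectors as (name, needle) pairs."""
    return [(f"cms{i}", f'"cms-{i}"') for i in range(count)]


def process_pages(pages, detectors):
    """Returns (pages processed, elapsed seconds, detections) for one run."""
    start = time.perf_counter()
    detections = 0
    for page in pages:
        for _name, needle in detectors:
            if needle in page:
                detections += 1
                break  # stop at the first detector that matches
    elapsed = time.perf_counter() - start
    return len(pages), elapsed, detections


# Synthetic pages; some carry a marker that no detector knows about.
pages = [f'<meta name="generator" content="cms-{i % 300}">' for i in range(1000)]

baseline = process_pages(pages, [])                 # run without detectors
loaded = process_pages(pages, make_detectors(239))  # run with 239 detectors

print(f"without: {baseline[0]} pages in {baseline[1]:.4f} s")
print(f"with 239: {loaded[0]} pages in {loaded[1]:.4f} s, {loaded[2]} detected")
```

The timings will vary by machine; the point is only that both runs process the same pages, so any difference in elapsed time is attributable to the detectors.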

April 12, 2016

  • We talked about what I have done so far
  • We looked at what I need to do for my additions when the install is fresh
  • We made the changes and I need to test that the create and the upgrade work
  • We discussed why summarization was showing blank results
  • Dr. Pollett and I debugged the code and found that the call to $parent->clean on line 1269 of CrawlComponent.php is the issue
  • I will debug the $parent->clean method and let Dr. Pollett know what I find
  • I forgot to clean up the bold values in the CMS Detectors table
  • I will clean these up before our next meeting

April 05, 2016

  • No meeting this week

March 29, 2016

  • No meeting ... Spring Break

March 22, 2016

  • No meeting this week

March 15, 2016

  • No meeting this week

March 08, 2016

  • No meeting this week

March 01, 2016

  • We discussed the help page I created
  • Dr. Pollett signed off on the content
  • We discussed the table layout of the CMS detectors
  • When the values get long, the table stretches into the help area
  • I will fix the width settings so the text wraps better
  • I need to change the delimiter for the Important Content XPath value from a semicolon to three pound signs
  • I also need to update the help page to reflect that change
  • I need to unbold the text in the value columns
  • We discussed the universal CMS detector code I wrote
  • We tested it by entering settings that would detect sites created with Yioop
  • It was able to detect the Yioop site
  • Since it was the only detector, we also confirmed that it would not detect other sites as Yioop sites
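
The delimiter change for the Important Content XPath value can be illustrated with a short sketch. The notes do not say why the semicolon was dropped; one plausible reason, assumed here, is that a semicolon can appear inside an XPath string literal, while three pound signs are far less likely to. The function name and the sample XPath expressions are hypothetical.

```python
XPATH_DELIMITER = "###"  # three pound signs, as named in the meeting notes


def split_xpaths(value, delimiter=XPATH_DELIMITER):
    """Splits a stored multi-XPath setting into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]


# Hypothetical stored value holding two XPath expressions.
stored = "//div[@id='content'] ### //p[contains(text(), 'a; b')]"

xpaths = split_xpaths(stored)
semicolon_split = split_xpaths(stored, ";")  # old delimiter, for comparison

print(xpaths)
print(semicolon_split)
```

With the "###" delimiter the two expressions come back intact; splitting the same value on a semicolon cuts the second expression in the middle of its string literal.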

February 23, 2016

  • We discussed what I did with adding the new activity to the admin page
  • Next I will start working on getting the code to use the settings
  • I will create a ManyDetectorsExperiment.php to create a couple hundred detectors and test the pages
  • I will change the titles from CMS Detection to CMS Detectors
  • I will create the help page
  • I will edit the help page to include the necessary help content

February 16, 2016

  • Dr. Pollett took a look at my site setup and approved
  • We discussed my failed attempt to add a new activity to the web site
  • It turned out that I was only missing one item, adding the role to the admin role
  • We discussed more about what needs to be done to incorporate my changes into the next release
  • I need to make a change in the createdb.php file
  • I need to make a change in the AdminController.php file
  • I need to make a change in the CrawlComponent.php file
  • I need to make a change in the upgradedb.php file
  • I need to make a change in the config.php file

February 09, 2016

  • I need to get my site updated with the bio entry and proposal etcetera
  • We talked about my first deliverable
  • I will add an activity to the crawls section called CMS Detection
  • I need to look at the CrawlComponent.php file to see where I need to hook in my new activity