Deliverable 1

Goals

The goal of this deliverable was to perform a crawl using Yioop and to make changes to its UI. The following two UI issues were fixed:

  • The Restrict Sites By Url check box on the Edit crawl options page did not work in Internet Explorer.
  • A logged-in user was automatically logged out when the search button was clicked on the main search page.

Installing Yioop and Performing a Crawl

The machine used for installing Yioop ran Windows XP Professional on an Intel Core 2 Duo with 2 GB of memory and had the latest XAMPP bundle for Apache and PHP. The latest version of Yioop was downloaded from www.Seekquarry.com and placed in the document root of the XAMPP installation. The one-time configuration was completed and a data directory for the crawl was created. The crawl steps given on www.seekQuarry.com were followed, and the queue_server and fetcher scripts were started from the PHP command line.

However, the crawl was unsuccessful because the fetcher was unable to send information about the crawl back to the queue_server. This information is sent to the queue_server as POST data using curl. Careful observation and experimentation revealed that the failure was caused by an incorrect queue_server URL in the configuration settings: a / must be appended to the path of the queue server. If the / is missing from the end of the URL, the request is redirected, and the POST data is not sent again once the redirect occurs.

That is, the URL should be http://localhost/yioop/ rather than http://localhost/yioop
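
The rule can be illustrated with a short JavaScript sketch. It is only an illustration: ensureTrailingSlash is a hypothetical helper and is not part of Yioop, where the URL was simply corrected by hand in the configuration.

```javascript
// Hypothetical helper (not part of Yioop): make sure a queue server URL
// ends with "/" so that a POST to it is not answered with a redirect.
function ensureTrailingSlash(url) {
  return url.endsWith("/") ? url : url + "/";
}

console.log(ensureTrailingSlash("http://localhost/yioop"));  // http://localhost/yioop/
console.log(ensureTrailingSlash("http://localhost/yioop/")); // http://localhost/yioop/
```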

The URL was modified and the crawl was started again. This time the fetcher successfully sent information about the crawled sites back to the queue server, where it was stored in the data directory. The Yioop search screen then returned results for user queries against the crawl data.

crawl result

Description and Fix for First Issue in UI

To perform a crawl using Yioop, a user logs into the application and navigates to the Manage crawl link on the main admin screen. The Option link next to the start new crawl button takes the user to the Edit crawl options page. Checking the Restrict Sites By URL check box should dynamically display an Allowed To Crawl Sites text area. This functionality worked in Mozilla Firefox but not in Internet Explorer. To fix the issue, crawloptions_element.php and basic.js were analyzed. These files contained the code for the restrict-sites-by-url check box and the allowed-sites text area. The check box element used an onchange DOM event to invoke a JavaScript function that changed the visibility of the text area element. The onchange event occurs when the content of a fileupload, select, text or textarea JavaScript object changes; hence the event was not triggered in Internet Explorer when the check box was selected. To fix the issue, the onchange DOM event was replaced with the onclick DOM event, which is supported by the checkbox element and is triggered when the element is clicked to change its state. After this change the check box worked in Internet Explorer and continued to work in Firefox.
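
A minimal sketch of this kind of toggle is shown here. The function name toggleAllowedSites and the use of style.display are assumptions made for illustration; the sketch does not reproduce the actual code in basic.js. Plain objects stand in for the DOM elements so that it can be run outside a browser.

```javascript
// Hypothetical sketch: names are illustrative, not the code in basic.js.
function toggleAllowedSites(checkbox, textArea) {
  // Show the text area only while the check box is checked.
  textArea.style.display = checkbox.checked ? "block" : "none";
}

// In a page, the handler would be attached to the click event, not change:
//   checkbox.onclick = function () { toggleAllowedSites(checkbox, textArea); };

// Plain objects standing in for the two DOM elements.
const checkbox = { checked: true };
const textArea = { style: { display: "none" } };
toggleAllowedSites(checkbox, textArea);
console.log(textArea.style.display); // block
```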

The Yioop code fragments for the elements and the JavaScript function used to investigate the issue are in the file below. DOM test

Description and Fix for Second Issue in UI

A registered user can log into Yioop and perform admin tasks. A logged-in user can also return to the main Yioop search page and use the search engine. When such a user enters a query and clicks the search button, the search results for the query are displayed. However, the user was also automatically logged out of the system when this was done. To fix this issue, the following files from the Yioop source code were investigated:

  • admin_controller.php
  • controller.php
  • search_controller.php
  • search_view.php
  • signin_view.php

To prevent CSRF attacks, Yioop uses a token to authenticate activities performed by a logged-in user. This token is generated by the function generateCSRFToken in controller.php from a hash of the user ID, a timestamp and the authentication key. The token is checked by the checkCSRFToken function in controller.php, which takes the token name and the user ID as input parameters and returns a boolean value. This function extracts the token value from the $_REQUEST variable using the token name passed as a parameter. When a user logs in, the token is generated by the admin controller and stored in the $data array variable. It is then extracted from the array and placed in a hidden field of the form inside the view files.

The search_controller, which is invoked during the search process, verifies the token by calling the checkCSRFToken function. If the function returns false, the controller terminates the user session. This step was always executed because the form in search_view.php did not contain the hidden field with the token value. As a result, checkCSRFToken never found the token value in the $_REQUEST variable and always returned false. The hidden field was therefore added to the search_view form to fix the issue. The code fragment added to the form in search_view.php is given below.

<input type="hidden" name="YIOOP_TOKEN" value="<?php e($data['YIOOP_TOKEN']); ?>" />
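
The token mechanism described above can be sketched in JavaScript (Node.js) as follows. This is only an illustration under stated assumptions: the SHA-256 hash, the "hash|timestamp" token layout and the AUTH_KEY value are invented for the example and do not reproduce the PHP code in controller.php. Only the function names and the YIOOP_TOKEN field name are taken from the description above, and token expiry is not modelled.

```javascript
const crypto = require("crypto");

// Hypothetical sketch: the hash function, token layout and AUTH_KEY value are
// assumptions for illustration and do not reproduce Yioop's controller.php.
const AUTH_KEY = "example-secret-key";

// Build a token from a hash of user ID, timestamp and key; the timestamp is
// appended in clear text so that the check can recompute the hash.
function generateCSRFToken(userId, timestamp) {
  const hash = crypto
    .createHash("sha256")
    .update(`${userId}|${timestamp}|${AUTH_KEY}`)
    .digest("hex");
  return `${hash}|${timestamp}`;
}

// Return true only if the named token is present in the request and matches.
function checkCSRFToken(request, tokenName, userId) {
  const token = request[tokenName];
  if (typeof token !== "string") {
    return false; // hidden field missing from the form: the check fails
  }
  const timestamp = token.split("|")[1];
  return token === generateCSRFToken(userId, timestamp);
}

const token = generateCSRFToken("42", 1300000000);
console.log(checkCSRFToken({ YIOOP_TOKEN: token }, "YIOOP_TOKEN", "42")); // true
console.log(checkCSRFToken({}, "YIOOP_TOKEN", "42"));                     // false
```

The second call corresponds to the search form before the fix: the request carries no token field, so the check returns false.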