CS 297 Deliverable 03

Adding Relationship Links to Yioop

Description

Every Wikipedia page has a "What links here" link in the left sidebar under Tools. It lists all the other Wikipedia pages that link to the current page. We implement the same feature for Yioop by adding relationship links to Yioop wiki pages, so that it is easy to see which pages link to which other pages.

To do this, three steps need to be followed:

1. Fetch all the links from all pages and store them

2. Obtain their page IDs

3. Store them in a database along with the relationship type. For now, the relationship type field in the table is set to the constant integer -1. Determining the actual relationship type is left to the next deliverable.

Steps to be followed

Step 01

To fetch all links from a particular page, we look at the file where the MediaWiki markup is translated to HTML. There, regular expressions that indicate the presence of a link are used to extract the links. The extracted links are stored and converted to a format suitable for looking up their page IDs in the existing database.
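
As a minimal sketch of this step, the function below pulls link targets out of a piece of wiki markup with one regular expression. The function name extractWikiLinkTargets and the sample text are hypothetical, and the pattern covers only the plain [[Target]] and [[Target|label]] forms. It is not the expression used in WikiParser.php.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Yioop): returns the distinct link
 * targets found in a string of wiki markup.
 */
function extractWikiLinkTargets(string $wikiText): array
{
    // Group 1 captures the target; the optional non-capturing group
    // consumes a "|label" suffix if one is present.
    $pattern = '/\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/';
    if (!preg_match_all($pattern, $wikiText, $matches)) {
        return [];
    }
    $targets = array_map('trim', $matches[1]);
    // Drop repeated targets and reindex from 0
    return array_values(array_unique($targets));
}

$sample = "See [[Main Page|the home page]], [[Help]] and [[Main Page]].";
print_r(extractWikiLinkTargets($sample));
```

Capturing the part before the pipe follows the standard MediaWiki convention, where the target comes first and the display text second.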

Step 02

Once we have all the links, their page IDs can be obtained from the Yioop database. After obtaining the parent page ID and the IDs of all pages that the parent page links to, it's time to move on to Step 03.
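
The lookup in this step can be sketched as follows. The page-name normalization mirrors the one used in the code snippet later in this document (spaces become underscores, commas are removed). The function names and the $knownPages array are hypothetical stand-ins for the Yioop database. In the actual code the lookup is done by GroupModel's getPageId method.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: turns link text into a page name. */
function toPageName(string $linkText): string
{
    return str_replace(',', '', str_replace(' ', '_', $linkText));
}

/**
 * Hypothetical helper: maps each link to a page ID using $lookup, a
 * callable that returns an integer ID or false when no page exists.
 */
function lookupPageIds(array $links, callable $lookup): array
{
    $ids = [];
    foreach ($links as $key => $link) {
        $ids[$key] = $lookup(toPageName($link));
    }
    return $ids;
}

// Stand-in for the database: page name => page ID (made-up values)
$knownPages = ['Main_Page' => 7, 'San_Jose_California' => 12];
$lookup = function (string $name) use ($knownPages) {
    return $knownPages[$name] ?? false;
};

var_dump(lookupPageIds(['Main Page', 'San Jose, California', 'Missing'], $lookup));
```

Links with no matching page map to false, so the next step can skip them.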

Step 03

The GROUP_PAGE_LINK table has three fields: the relationship type (LINK_TYPE_ID), the parent page ID (FROM_ID), and the child page ID (TO_ID). A database connection is established and the table is populated with these values. As mentioned above, -1 is used for the relationship type at this point. Identifying the relationship type will be the major focus of Deliverable 04.
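
A small sketch of how the rows for GROUP_PAGE_LINK could be assembled before insertion: children with no page ID are skipped and every row carries the placeholder type -1. The function buildLinkRows and the constant UNKNOWN_LINK_TYPE are hypothetical names used only for illustration. The array keys follow the column names in the INSERT statement of the code snippet further down.

```php
<?php
declare(strict_types=1);

// Placeholder relationship type used until Deliverable 04
const UNKNOWN_LINK_TYPE = -1;

/**
 * Hypothetical helper: builds one row per child page that has an ID.
 */
function buildLinkRows(int $parentId, array $childIds,
    int $linkTypeId = UNKNOWN_LINK_TYPE): array
{
    $rows = [];
    foreach ($childIds as $childId) {
        if ($childId === false || $childId === null) {
            continue; // no page found for this link
        }
        $rows[] = [
            'LINK_TYPE_ID' => $linkTypeId,
            'FROM_ID' => $parentId,
            'TO_ID' => (int) $childId,
        ];
    }
    return $rows;
}

print_r(buildLinkRows(3, [7, false, 12]));
```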

Deliverables

Mantis Bug ID: 0000179

src/configs/Config.php

src/library/WikiParser.php

src/models/GroupModel.php

Code Snippet

For fetching URLs, looking up their IDs, and inserting them into the database:

<?php
/**
 * Title: fetchingURL
 * Description: Calls methods to fetch and display URLs in a MediaWiki
 * document. CS 297 Deliverable 03
 * COPYRIGHT (C) 2016
 * Date 04/24/2016
 * @author Yashi Kamboj
 * @version 1.0
 */
namespace Deliverable03\fetchURL_composer;

use seekquarry\yioop\models\GroupModel;

require_once "vendor/autoload.php";

/**
 * This class is used to fetch internal links from a MediaWiki document
 * @author Yashi Kamboj
 */
class FetchURL {
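    /**
     * Used to get the data present in the MediaWiki documents of a directory
     *
     * @param string $directorypath directory containing the *.wiki files
     * @return string concatenated contents of all *.wiki files found
     */
    public function getData($directorypath)
    {
        $data = "";
        // Build the glob pattern from the path instead of calling chdir(),
        // so the caller's working directory is left unchanged
        $fileArray = glob(rtrim($directorypath, '/') . '/*.wiki');
        if ($fileArray === false) {
            return $data;
        }
        foreach ($fileArray as $myFileName) {
            $contents = file_get_contents($myFileName);
            if ($contents !== false) {
                // Append rather than overwrite, so every file contributes
                $data .= $contents . "\n";
            }
        }
        return $data;
    }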
    /**
     * Used to get the URLs present in a MediaWiki document
     *
     * This function finds links of the form [[target|label]] by matching a
     * regex against the wiki text and collects the matches in an array.
     *
     * @param string $data MediaWiki text to scan
     * @return array text after the pipe of each [[target|label]] link
     */
    public function getURLs($data)
    {
        $matches = [];
        // $matches[1] holds the text before the pipe, $matches[2] the text
        // after it; links written without a pipe, such as [[target]], are
        // not matched by this pattern
        preg_match_all("/\[\[([^\[\]]+?)\|([^\[\]]+?)\]\]/s", $data, $matches,
            PREG_PATTERN_ORDER);
        return $matches[2] ?? [];
    }
    /**
     * Used to get the IDs of the URLs present in a MediaWiki document
     *
     * This function gets the ID of each URL by calling
     * GroupModel::getPageId(), which queries the database.
     *
     * @param array $URLArray link names returned by getURLs()
     * @return array page IDs, keyed like $URLArray
     */
    public function getIDs($URLArray)
    {
        $groupmodel = new GroupModel();
        $group_id = 100; // group ID is hard-coded for this deliverable
        $pageId = []; // initialized so an empty input returns an empty array
        foreach ($URLArray as $key => $value) {
            // Convert the link text to a page name: spaces become
            // underscores and commas are dropped
            $pageName = str_replace(',', '', str_replace(' ', '_', $value));
            $pageId[$key] = $groupmodel->getPageId($group_id, $pageName, "en-US");
        }
        return $pageId;
    }
    /**
     * Used to update the database for parent and child page links with their IDs
     *
     * This function skips the pages for which no ID is present and enters
     * the remaining pages into the database by adding a relationship
     * between the parent and the child page.
     *
     * @param int $linkTypeId relationship type (-1 for now)
     * @param int $parentID ID of the page containing the links
     * @param array $pageID IDs of the linked (child) pages
     */
    public function updateIDAndGroupLink($linkTypeId, $parentID, $pageID)
    {
        $groupmodel = new GroupModel();
        foreach ($pageID as $childID) {
            // Skip links whose target page has no ID in the database
            if ($childID == false) {
                continue;
            }
            $groupmodel->updateGroupPageLink($linkTypeId, $parentID, $childID);
        }
    }
}

GroupModel (src/models/GroupModel.php) also needs a new method that inserts rows into the GROUP_PAGE_LINK table. The code is as follows:

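    /**
     * Used to update the database for parent and child page links with their IDs
     *
     * @param int $linkTypeId relationship type, stored in LINK_TYPE_ID
     * @param int $parentID ID of the linking page, stored in FROM_ID
     * @param int $pageID ID of the linked page, stored in TO_ID
     */
    public function updateGroupPageLink($linkTypeId, $parentID, $pageID)
    {
        $db = $this->db;
        $sql = "INSERT INTO GROUP_PAGE_LINK (LINK_TYPE_ID, FROM_ID, TO_ID) VALUES (?,?,?)";
        $db->execute($sql, [$linkTypeId, $parentID, $pageID]);
    }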
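
Once the table is populated, a "What Links Here" listing for a page amounts to selecting the FROM_ID of every row whose TO_ID is that page, for example with a query such as SELECT FROM_ID FROM GROUP_PAGE_LINK WHERE TO_ID = ?. The sketch below performs the same selection over an in-memory array so that it runs without a database. The function whatLinksHere and the sample rows are hypothetical and not part of GroupModel.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the IDs of pages that link to $pageId,
 * given rows shaped like the GROUP_PAGE_LINK columns.
 */
function whatLinksHere(array $rows, int $pageId): array
{
    $fromIds = [];
    foreach ($rows as $row) {
        if ($row['TO_ID'] === $pageId) {
            $fromIds[] = $row['FROM_ID'];
        }
    }
    // A page may link to the same target more than once
    return array_values(array_unique($fromIds));
}

// Made-up sample rows
$rows = [
    ['LINK_TYPE_ID' => -1, 'FROM_ID' => 3, 'TO_ID' => 7],
    ['LINK_TYPE_ID' => -1, 'FROM_ID' => 5, 'TO_ID' => 7],
    ['LINK_TYPE_ID' => -1, 'FROM_ID' => 3, 'TO_ID' => 12],
];
print_r(whatLinksHere($rows, 7));
```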

References

Yioop Documentation: Link