
CS 297 Deliverable 03

Adding Relationship Links to Yioop


Every Wikipedia page has a "What Links Here" link under Tools on the left side of the page. It lists all the other Wikipedia pages that link to that page. We will implement the same feature in Yioop by adding relationship links to Yioop pages, so that it is easy to see which page links to which other pages.

This requires three steps:

1. Fetch all the links from all pages and store them

2. Obtain their page IDs

3. Store them in a database along with the relationship type. For now, the constant integer "-1" is stored in the relationship type field of the table. Identifying the actual relationship type will be taken care of in the next deliverable.

Steps to be followed

Step 01

To fetch all the links from a particular page, we look at the file where the MediaWiki content is translated to HTML. There, regular expressions that match link syntax are used to extract the links. The extracted links are stored and converted to a format suitable for looking up their page IDs in the existing database.
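As a rough illustration of this step, the sketch below runs the same link pattern used in this deliverable's code against a short piece of markup. The sample text and the function name `extractLinkParts` are made up for the example; it is a minimal sketch, not the Yioop parser itself.

```php
<?php
/**
 * Extracts both parts of every [[first part|second part]] link.
 * Hypothetical helper written for this example only.
 */
function extractLinkParts(string $wikiText): array
{
    // Same pattern as in the deliverable's code: two capture groups
    // separated by a pipe, neither of which may contain square brackets.
    $pattern = "/\[\[([^\[\]]+?)\|([^\[\]]+?)\]\]/s";
    $matches = [];
    preg_match_all($pattern, $wikiText, $matches, PREG_PATTERN_ORDER);
    // With PREG_PATTERN_ORDER, $matches[1] holds every first part and
    // $matches[2] holds every second part, in document order.
    return [$matches[1], $matches[2]];
}

// Made-up sample markup for illustration
$sample = "See [[Main Page|the main page]] and [[Search Engine|search]].";
list($firstParts, $secondParts) = extractLinkParts($sample);
print_r($firstParts);
print_r($secondParts);
```

Text without a pipe inside the double brackets is not matched by this pattern, so such links would need a separate expression.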

Step 02

Once we have all the links, their page IDs can be obtained from the Yioop database. After obtaining the ID of the parent page and the IDs of all the pages that the parent page links to, it is time to move on to Step 03.
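The lookup can be sketched as follows. The deliverable's code normalizes each link by replacing spaces with underscores and removing commas before asking the database for the page ID. In this sketch the database is replaced by a plain array, `$knownPages`, and both function names are hypothetical, so it only illustrates the shape of the step under those assumptions.

```php
<?php
/**
 * Converts link text into the page-name form used for the lookup:
 * spaces become underscores and commas are dropped.
 */
function normalizePageName(string $link): string
{
    return str_replace(',', '', str_replace(' ', '_', $link));
}

/**
 * Maps each link to a page ID, or false when no page is known.
 * $knownPages (page name => page ID) is a stand-in for the database.
 */
function lookupPageIds(array $links, array $knownPages): array
{
    $ids = [];
    foreach ($links as $key => $link) {
        $name = normalizePageName($link);
        $ids[$key] = $knownPages[$name] ?? false;
    }
    return $ids;
}

// Made-up page names and IDs for illustration
$knownPages = ['Main_Page' => 12, 'Search_Engine' => 15];
$ids = lookupPageIds(['Main Page', 'Search, Engine', 'Missing Page'], $knownPages);
var_dump($ids);
```

Keeping the array keys aligned with the input links makes it easy to tell afterwards which link had no matching page.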

Step 03

There is a table named GROUP_PAGE_LINK with fields for the relationship type, the parent page ID, and the child page ID. A database connection is established and the table is populated with these values. As mentioned above, "-1" is used for the relationship type at this point. Identifying the relationship type will be the major focus of Deliverable 04.
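The rows written in this step can be sketched without a database. The function below is a hypothetical helper that pairs a parent page ID with each child page ID, skips children whose ID could not be found, and uses -1 as the placeholder relationship type. The IDs in the usage lines are made up.

```php
<?php
// Placeholder relationship type used until real types are identified
const UNKNOWN_RELATIONSHIP = -1;

/**
 * Builds one row per child page in the field order described above:
 * [relationship type, parent page ID, child page ID].
 */
function buildLinkRows(int $parentId, array $childIds,
    int $linkTypeId = UNKNOWN_RELATIONSHIP): array
{
    $rows = [];
    foreach ($childIds as $childId) {
        // A false ID means the lookup found no page, so nothing is stored
        if ($childId === false) {
            continue;
        }
        $rows[] = [$linkTypeId, $parentId, $childId];
    }
    return $rows;
}

// Made-up IDs: parent page 7 links to pages 12 and 15; one lookup failed
$rows = buildLinkRows(7, [12, false, 15]);
print_r($rows);
```

Each resulting row corresponds to one insert into GROUP_PAGE_LINK.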


Mantis Bug ID: 0000179




Code Snippet

For fetching URLs, obtaining their IDs, and inserting them into the database:

<?php
/**
 * Title: fetchingURL
 * Description: Calls methods to fetch and display URLs in a MediaWiki document.
 * Class 297 Deliverable-03
 * COPYRIGHT (C) 2016
 * Date 04/24/2016
 * @author Yashi Kamboj
 * @version 1.0
 */
namespace Deliverable03\fetchURL_composer;

use seekquarry\yioop\models\GroupModel;

require_once "vendor/autoload.php";

/**
 * This class is used to fetch internal links from a MediaWiki document
 * @author Yashi Kamboj
 */
class FetchURL
{
    /**
     * Used to get the data present in the MediaWiki documents of a directory
     * @param string $directorypath directory containing the .wiki files
     * @return string $data MediaWiki data content
     */
    public function getData($directorypath)
    {
        $data = "";
        // Look for .wiki files inside the given directory
        $fileArray = glob(rtrim($directorypath, '/') . '/*.wiki') ?: [];
        foreach ($fileArray as $myFileName) {
            // Append each file so that earlier files are not overwritten
            $data .= file_get_contents($myFileName) . "\n";
        }
        return $data;
    }

    /**
     * Used to get the URLs present in a MediaWiki document.
     * Matches the data against the regex pattern of a MediaWiki URL and
     * stores the matched URLs in an array.
     * @param string $data MediaWiki data content
     * @return array MediaWiki URLs
     */
    public function getURLs($data)
    {
        $matches = [];
        preg_match_all("/\[\[([^\[\]]+?)\|([^\[\]]+?)\]\]/s", $data, $matches,
            PREG_PATTERN_ORDER);
        // Report how many URLs were found
        echo count($matches[2]) . "\n";
        return $matches[2];
    }

    /**
     * Used to get the IDs of the URLs present in a MediaWiki document.
     * Calls getPageId, defined in Models -> GroupModel, which queries the
     * database, and stores the IDs it returns in an array.
     * @param array $URLArray MediaWiki URLs
     * @return array IDs of MediaWiki URLs
     */
    public function getIDs($URLArray)
    {
        $groupmodel = new GroupModel();
        $group_id = 100;
        $pageId = [];
        foreach ($URLArray as $key => $value) {
            // Replace spaces with underscores and remove commas
            $URLArray[$key] = str_replace(',', '', str_replace(' ', '_', $value));
            $pageId[$key] = $groupmodel->getPageId($group_id, $URLArray[$key], "en-US");
        }
        return $pageId;
    }

    /**
     * Used to update the database for parent and child page links with their IDs.
     * Skips the pages for which no ID is present and enters the remaining
     * pages into the database by adding a relationship between the parent
     * and the child page.
     */
    public function updateIDAndGroupLink($linkTypeId, $parentID, $pageID)
    {
        $groupmodel = new GroupModel();
        foreach ($pageID as $key => $valuePageID) {
            // Skip links for which no page ID was found
            if ($valuePageID == false) {
                continue;
            }
            $groupmodel->updateGroupPageLink($linkTypeId, $parentID, $valuePageID);
        }
        echo "Done";
    }
}

A method that updates the GROUP_PAGE_LINK table also needs to be added to Models -> GroupModel. Its code is as follows:

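    /** Used to update the database for parent and child page links with their IDs */
    public function updateGroupPageLink($linkTypeId, $parentID, $pageID)
    {
        $db = $this->db;
        // Assumed statement: the listing omits $sql; field order follows GROUP_PAGE_LINK above
        $sql = "INSERT INTO GROUP_PAGE_LINK VALUES (?, ?, ?)";
        $db->execute($sql, [$linkTypeId, $parentID, $pageID]);
    }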


Yioop Documentation: Link