
CS298 Proposal

Improving User Experiences for Wiki Systems

Parth Patel (parthamrutbhai.patel@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Philip Heller and Dr. Robert Chun

Abstract:

Yioop is an open-source web portal with a search engine, a wiki system, and a discussion board. Yioop has a recommendation system for recommending groups and discussion threads, but it lacks support for recommending media items such as books, magazines, and videos. This project will extend the current recommendation system in Yioop to support recommending media items in PDF, image, video, and e-doc file formats. This will be achieved by completing two main tasks. The first task is to create a media job (a background process that can either be scheduled or triggered by some event) in Yioop that updates the descriptions of media items by crawling appropriate web sites, chosen according to the file format of each media item. The second task is to use the descriptions of media items to generate Hash2Vec word embedding vectors and combine them with TF-IDF scores for words to create the final vector representation of each media item. Finally, the system will compute the cosine similarity between media item vectors and recommend the top few media items most similar to those the user has already viewed.
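The second task can be illustrated with a minimal Python sketch. It is not Yioop code: the function names, the particular TF-IDF formula, and the choice to sum the viewed items' vectors into a single user profile are all assumptions made for illustration. Word vectors are taken as given (in the project they would come from Hash2Vec).

```python
import math
from collections import Counter


def tfidf(docs):
    """Map each item id to {word: tf-idf weight} for its description tokens.

    Assumed formula: tf = count / length, idf = ln(N / df) + 1.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for item_id, tokens in docs.items():
        counts = Counter(tokens)
        weights[item_id] = {
            word: (count / len(tokens)) * (math.log(n_docs / doc_freq[word]) + 1.0)
            for word, count in counts.items()
        }
    return weights


def item_vector(word_weights, word_vectors, dim):
    """TF-IDF-weighted sum of the word vectors of one description."""
    vec = [0.0] * dim
    for word, weight in word_weights.items():
        embedding = word_vectors.get(word)
        if embedding is None:  # skip words that have no embedding
            continue
        for i in range(dim):
            vec[i] += weight * embedding[i]
    return vec


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(viewed, item_vectors, k=3):
    """Return up to k unviewed item ids most similar to the viewed items."""
    dim = len(next(iter(item_vectors.values())))
    profile = [0.0] * dim  # assumption: user profile = sum of viewed vectors
    for item_id in viewed:
        for i, x in enumerate(item_vectors[item_id]):
            profile[i] += x
    candidates = [item_id for item_id in item_vectors if item_id not in viewed]
    candidates.sort(key=lambda c: cosine(profile, item_vectors[c]), reverse=True)
    return candidates[:k]
```

Given tokenized descriptions in `docs` and a `word_vectors` dictionary, one would build `item_vectors` by calling `item_vector` on each entry of `tfidf(docs)` and then call `recommend` with the ids the user has viewed.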

CS297 Results

  • Implemented an Emoji Picker tool for the Direct Messages activity in Yioop, which allows users to include emojis in text messages
  • Developed a Yioop UI testing project in Node.js using Selenium web drivers, which verifies correct behavior of the Yioop UI
  • Integrated Stripe APIs for converting unused Yioop credits into real money and transferring it to users' bank accounts
  • Studied the Hash2Vec word embedding technique and refactored former student Anirudh Mallya's Hash2Vec code to generate recommendations for groups and threads in Yioop
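For readers unfamiliar with Hash2Vec, the following Python sketch shows the core idea of feature hashing for word embeddings as described in [1]: each context word is hashed to one coordinate of a fixed-size vector and contributes a signed, distance-weighted amount. It is a simplified illustration, not the Yioop implementation; the function names, the MD5-based hashes, and the 1/distance weighting are assumptions.

```python
import hashlib


def _stable_hash(word, salt):
    """Deterministic hash (Python's built-in hash() is randomized per run)."""
    digest = hashlib.md5((salt + word).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash2vec(tokens, dim=16, window=2):
    """Build one vector per distinct word from its hashed context words."""
    vectors = {}
    for i, word in enumerate(tokens):
        vec = vectors.setdefault(word, [0.0] * dim)
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue
            context = tokens[j]
            # One hash picks the coordinate, a second picks the sign.
            index = _stable_hash(context, "index:") % dim
            sign = 1.0 if _stable_hash(context, "sign:") % 2 == 0 else -1.0
            # Assumed weighting: closer context words count more.
            vec[index] += sign / abs(i - j)
    return vectors
```

Because no training step is involved, the vectors can be updated incrementally as new descriptions arrive, which suits a background job. Words that occur in the same contexts end up with the same or similar vectors.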

Proposed Schedule

Week 1:
Aug 23 - Aug 29
  • Organizational meeting with advisor
Week 2 - Week 5:
Aug 30 - Sept 26
  • Discuss the functionality of a media job for detecting media items that are either new or missing descriptions
  • Discuss and find suitable web sources for the different media item file formats
  • Discuss the database schema updates needed for the above task
  • Implement the media job discussed above and assign web sources based on file formats
Week 6 - Week 8:
Sept 27 - Oct 17
  • Discuss the functionality for crawling the web sources to find information about media items based on their titles
  • Implement the above functionality and update the descriptions of media items
Week 9 - Week 13:
Oct 18 - Nov 21
  • Discuss updating the many users experiment in Yioop to create test data about user interactions with media items
  • Implement the above experiment
  • Discuss using the Hash2Vec word embedding technique on descriptions of media items to create word embedding vectors
  • Update the recommendation media job to use Hash2Vec word vectors and TF-IDF scores to calculate similarities between media items
  • Discuss showing the top similar media items to users based on their view history of media items
  • Implement the above feature
Week 14 - Week 16:
Nov 22 - Dec 12
  • Test and verify the media recommendations using the experiment data created previously
  • Prepare final report
  • Prepare slides for defense

Key Deliverables:

  • Software
    • Create a Yioop media job that detects media items that are either new or lack a description and, depending on the file format of each media item, assigns web sources to crawl for descriptions
    • Extend the above media job to crawl the assigned web sources, find information about media items based on their titles, and update their descriptions in Yioop
    • Update the many users experiment in Yioop to create test data about user interactions with media items
    • Extend the Yioop recommendation feature to create recommendations for media items using Hash2Vec word embeddings of their descriptions
    • Test and verify the media item recommendations using the test data created by the many users experiment
  • Report
    • CS298 Report
    • CS298 Presentation
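The first software deliverable can be sketched in Python as follows. This is only an illustration of the detection and source-assignment step, not Yioop code: `MediaItem`, `SOURCE_BY_EXTENSION`, `plan_crawl`, and the source labels are hypothetical names, and the actual web sources are still to be chosen during the project.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a label for a web source.
# The real sources are to be decided during the project.
SOURCE_BY_EXTENSION = {
    ".pdf": "book-source",
    ".epub": "book-source",
    ".mp4": "video-source",
    ".jpg": "image-source",
    ".png": "image-source",
}


@dataclass
class MediaItem:
    title: str
    file_name: str
    description: str = ""


def needs_description(item):
    """New items and items with blank descriptions both qualify."""
    return not item.description.strip()


def plan_crawl(items):
    """Return (title, source) pairs for items whose description is missing."""
    plan = []
    for item in items:
        if not needs_description(item):
            continue
        extension = PurePosixPath(item.file_name).suffix.lower()
        source = SOURCE_BY_EXTENSION.get(extension)
        if source is not None:  # unsupported formats are skipped
            plan.append((item.title, source))
    return plan
```

A later step of the job would take each pair, query the assigned source by title, and store the text it finds as the item's description.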

Innovations and Challenges

  • The major challenge for this project is updating the descriptions of media items by crawling websites for information using only their titles
  • The innovative aspect of this project is recommending media items in different file formats using Hash2Vec word embeddings, which is challenging because it has not been done in the past
  • Another challenge for the project is handling the cold start problem, where a user has never interacted with any media items
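One common way to handle the cold start case is to fall back to overall popularity when a user has no view history. The Python sketch below illustrates that fallback only as a possibility; the proposal does not commit to this approach, and `recommend_with_fallback` and its parameters are hypothetical names.

```python
from collections import Counter


def recommend_with_fallback(user_views, all_views, similar_items, k=3):
    """Use similarity-based recommendations when history exists,
    otherwise return the k most viewed items across all users.

    user_views: list of item ids the user has viewed
    all_views: dict mapping user id to a list of viewed item ids
    similar_items: callable (viewed, k) -> list of recommended item ids
    """
    if user_views:
        return similar_items(user_views, k)
    popularity = Counter(
        item_id for views in all_views.values() for item_id in views
    )
    return [item_id for item_id, _ in popularity.most_common(k)]
```

Once the user has viewed at least one item, the same call switches to the content-based recommendations without any change on the caller's side.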

References:

[1] Argerich, Luis, Joaquin Torre Zaffaroni, and Matias J. Cano. "Hash2vec, feature hashing for word embeddings." arXiv preprint arXiv:1608.08940 (2016)

[2] Wenga, Carmel, et al. "A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems." arXiv preprint arXiv:2106.10679 (2021)

[3] Pramudita, Y. D., et al. "Extraction System Web Content Sports New Based On Web Crawler Multi Thread." J. Phys.: Conf. Ser. 1569 022077 (2020)

[4] Luong Vuong, Nguyen & Nguyen, Tri-Hai & Jung, Jason. (2020). Content-Based Collaborative Filtering using Word Embedding: A Case Study on Movie Recommendation. 10.1145/3400286.3418253

[5] Zisopoulos, Charilaos & Karagiannidis, Savvas & Demirtsoglou, Georgios & Antaris, Stefanos. (2008). Content-Based Recommendation Systems