CS298 Proposal

AI for Classic Video Games Using Reinforcement Learning

Shivika Sodhi (shivika.sodhi@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Your_Committee.

Abstract:

Deep reinforcement learning is a promising route to building artificially intelligent machines that can perform tasks much as human beings do, without being explicitly taught each task. Following DeepMind's recent research in deep learning [3], computers can now learn to play Atari games automatically from raw pixels. This project follows similar lines: we want to build an artificially intelligent agent using a deep learning model called a convolutional neural network. This image-processing network will be used to approximate a Q function, so that rewards can be estimated for the actions the agent takes in a specific environment, i.e., when the agent plays a classic game. Initially the agent will make random decisions while playing the game, and screenshots of the game will be taken every tenth of a second. Those screenshots will then be fed into the neural network to calculate rewards based on the decisions the agent took and on the increase in game score when a particular decision is taken. Once the neural network function is computed, those images will be discarded to save memory. The next iteration of the game will use what was learned in the previous iteration to make the best possible decision while playing. Our primary motive for choosing this topic was to understand the impact of deep learning on artificial intelligence. Hence, through this project we aim to demonstrate the basic ability of a neural network to train an agent to learn automatically to play, at human level, a classic video game that it has never played before.
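
To make the learning loop described above concrete, the following is a minimal sketch of Q-learning with random exploration. It is an illustration under stated assumptions, not the project's implementation: it uses a lookup table in place of the convolutional neural network, and a hypothetical five-cell corridor (train_chain) in place of a real game. The names epsilon_greedy, q_update, train_chain, alpha, gamma and epsilon are chosen for this example only.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so an untrained agent still explores evenly.
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

def train_chain(n_states=5, episodes=200, epsilon=0.2, seed=0):
    """Toy stand-in for a game: a corridor with reward 1 only at the right end."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q
```

In the proposed system the table q would be replaced by a network that maps a screenshot to one estimated value per action, while the update target keeps the same form.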

CS297 Results

  • Designed a Python program that can play a classic video game. While the game is being played, the program takes screenshots of it, downscales them, and supplies them to the neural network model
  • Python program to control the game
  • Simple multi-layer AI using Keras (built on top of TensorFlow and Theano)
  • Designed a Python program that predicts a digit in real time, while it is being drawn by the user
  • Final project report
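
The screenshot downscaling step listed above can be sketched as follows. This is a dependency-free illustration, assuming a frame is given as rows of (r, g, b) tuples; the function names to_grayscale and downscale are hypothetical, and the actual CS 297 program may well have used an imaging library instead.

```python
def to_grayscale(frame):
    """Convert rows of (r, g, b) tuples to rows of luminance values in 0..255."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in frame]

def downscale(gray, factor):
    """Shrink a grayscale image by averaging non-overlapping factor x factor blocks."""
    height = len(gray) // factor
    width = len(gray[0]) // factor
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            # Collect every pixel of the block whose top-left corner is
            # (by * factor, bx * factor), then store the integer mean.
            block = [gray[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Calling downscale(to_grayscale(frame), 4) on a 320 x 200 frame would yield an 80 x 50 grid of integers, a much smaller input for the network.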

Proposed Schedule

Week 1: 08 Feb 2017 - 13 Feb 2017: Discuss the deliverables for CS 298
Week 2, 3: 14 Feb 2017 - 27 Feb 2017: Automate playing Archon: design a program that plays Archon until two pawns enter fight mode, quits the game once that mode is reached, and then starts again
Week 4: 28 Feb 2017 - 06 Mar 2017: Implement the Q-learning algorithm
Week 5, 6: 07 Mar 2017 - 20 Mar 2017: Implement the Q-learning algorithm, where policies come from neural networks
Week 7: 21 Mar 2017 - 27 Mar 2017: Work further on the algorithm
Week 8: 28 Mar 2017 - 03 Apr 2017: Repeat training and testing with improvements and updates
Week 9: 04 Apr 2017 - 10 Apr 2017: Final code cleanup and restructuring
Week 10: 11 Apr 2017 - 17 Apr 2017: Complete first draft of the report
Week 11: 18 Apr 2017 - 24 Apr 2017: Complete next draft of the report
Week 12: 25 Apr 2017 - 01 May 2017: Finalize the report
Week 13: 02 May 2017 - 07 May 2017: Submit report to committee for feedback

Key Deliverables:

  • Software
    • Design a bot that plays Archon on a Commodore 64 emulator
    • Implement the Q-learning algorithm
    • Implement the Q-learning algorithm, where policies come from neural networks
    • Automate playing Archon and experiment with the training
  • Report
    • CS 298 Report

Innovations and Challenges

  • To our knowledge, there has not yet been a successful attempt to have an agent learn to play Archon.
  • Training the system with reinforcement learning is difficult because there is a time lag between an action and its reward.
  • Reinforcement learning provides only occasional feedback, and supplying that feedback in real time to train the agent to make correct decisions is challenging.
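
One common way to handle the lag between action and reward noted above is to credit earlier actions with a discounted share of later rewards. The sketch below illustrates that idea only; the function name discounted_returns and the discount factor gamma are assumptions for this example, not part of the proposal.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * r_(t+1) + gamma^2 * r_(t+2) + ... for each step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with rewards [0, 0, 1] and gamma = 0.5, the single reward at the last step yields returns [0.25, 0.5, 1.0], so the first two actions still receive some credit for the eventual score increase.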

References:

[1] Playing Atari with Deep Reinforcement Learning: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

[2] Temporal difference learning and TD-Gammon:

[3] The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research