CS 298 Proposal
Navigating Classic Atari Games using Deep Learning
Ayan Abhiranya Singh (ayanabhiranya.singh@sjsu.edu)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Mark Stamp, Prof. Kevin Smith
Abstract:
Modern deep learning techniques such as convolutional neural networks have been used to play classic Atari games like Breakout and Space Invaders at a near-superhuman level. This project examines how a model of our own, developed for Ms. PacMan, performs under conditions of varying difficulty. First, we develop a framework that provides input to a deep Q-network as video at a constant frame rate. The model is then enhanced and tested on input at varying frame rates, rather than the constant frame rate input used in our previous research. Finally, the enhanced model is used in transfer learning, where it is deployed on Ms. PacMan mazes that it has not encountered before. The results of these experiments are collated and summarized, and the performance of the enhanced model is compared with that of the previous models.
CS 297 Results
- 1. Developed a Q-learning implementation to learn dirt generation policy systems for Vacuum World.
- 2. Developed a neural network model to solve the Vacuum World problem.
- 3. Applied Q-learning to build a simple agent that can play Ms. PacMan.
- 4. Implemented a deep Q-network (a deep learning model) that can play Ms. PacMan.
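For reference, the tabular Q-learning update that underlies items 1 and 3 can be sketched as follows. This is the generic textbook form rather than the CS 297 implementation; the state and action names, the learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical Vacuum World step: cleaning a dirty square earns a reward of 1.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right", "suck"]
new_value = q_update(q, "dirty", "suck", 1.0, "clean", actions)
```

A deep Q-network replaces the table `q` with a neural network that estimates the same values from game frames.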
Proposed Schedule
Week 1: January 31 - February 7 | Finalize project proposal and find relevant research material
Week 2: February 7 - February 14 | Research and discuss scope for Del-1
Week 3 - Week 4: February 14 - February 28 | Discuss progress on Del-1 and the parameters that have been tweaked in the existing deep Q-network
Week 5: February 28 - March 7 | Read [4] and demonstrate the effects of parameter modification on the existing deep Q-network
Week 6: March 7 - March 14 | [Del-1] Modified deep Q-network implementation with variable frame rate input capability
Week 7: March 14 - March 21 | Discuss Del-2 and expectations for model performance on varying frame rates
Week 8 - Week 9: March 21 - April 4 | [Del-2] Modified deep Q-network implementation that is able to learn from and react to inconsistent frame rate input
Week 10: April 4 - April 11 | Read [3] and discuss scope for Del-3
Week 11: April 11 - April 18 | Experiment with different forms of transfer learning using the existing model and present results
Week 12: April 18 - April 25 | [Del-3] Transfer learning of the deep Q-network to different Ms. PacMan mazes
Week 13: April 25 - May 2 | Work on CS 298 Report and begin preparation for Master's defense
Week 14: May 2 - May 9 | Review and revise CS 298 Report and continue preparation for Master's defense
Week 15: May 9 - May 16 | [Del-4] CS 298 Report and Master's defense
Key Deliverables:
- Software
- 1. Modify the existing deep Q-network to accept input at different frame rates instead of the constant frame rate input used in the CS 297 deliverable.
- 2. Modify the existing deep Q-network to learn from input at irregular frame rates and select effective actions on it.
- 3. Apply transfer learning to deploy our deep Q-network on different Ms. PacMan mazes.
- Report
- 4. CS 298 Report
- 5. CS 298 Presentation
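One way to produce the irregular frame rate input described in software deliverables 1 and 2 is to subsample a recorded frame sequence with a random gap between observations. The sketch below is only an illustration under stated assumptions: the function name, the `min_skip` and `max_skip` bounds, and the idea of passing the elapsed gap to the agent alongside each frame are all hypothetical and are not part of the existing model.

```python
import random


def sample_irregular_frames(frames, min_skip=1, max_skip=4, seed=0):
    """Subsample frames with a random gap between consecutive observations.

    Returns a list of (frame, gap) pairs, where gap is the number of source
    frames elapsed since the previous observation (0 for the first one).
    """
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    observations = []
    index, previous = 0, 0
    while index < len(frames):
        observations.append((frames[index], index - previous))
        previous = index
        # Assumption: gaps are drawn uniformly from [min_skip, max_skip].
        index += rng.randint(min_skip, max_skip)
    return observations


# Stand-in "frames": integers instead of real game screens.
frames = list(range(20))
observations = sample_irregular_frames(frames)
```

Setting `min_skip` equal to `max_skip` reproduces constant frame rate input, so the same code path could serve as the baseline for comparison.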
Innovations and Challenges
- A major challenge in this project is experimenting with constant video input to a deep Q-agent to observe how its in-game performance is affected in real time. The papers researched as part of CS 297 (in particular, the work of the DeepMind team on classic Atari games) have not discussed this topic and have restricted model demonstrations to a simple cycle of single-frame input and single-frame output. As a result, the parameters that must be tuned to make this work are unknown.
- An additional technical challenge is recording gameplay as video and providing it as input to the deep learning model.
- There is little comprehensive research in which this kind of deep learning model has been deployed through transfer learning on different Ms. PacMan mazes. This provides an interesting sandbox for observing the model's performance on unseen mazes.
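The proposal does not commit to a particular form of transfer learning. As one possibility, the sketch below shows fine-tuning with frozen feature layers: the weights of a trained network are copied, the output layer is re-initialized, and only that layer is marked trainable on the new maze. The layer names, the plain-list weight representation, and the initialization range are all hypothetical stand-ins for a real deep Q-network.

```python
import copy
import random


def prepare_for_new_maze(source_weights, output_layer, seed=0):
    """Copy a trained network's weights, reset the output layer, freeze the rest."""
    rng = random.Random(seed)
    target = copy.deepcopy(source_weights)  # leave the source model untouched
    # Re-initialize the output layer with small random values (assumed range).
    target[output_layer] = [
        rng.uniform(-0.1, 0.1) for _ in source_weights[output_layer]
    ]
    # Only the re-initialized output layer is marked trainable.
    trainable = {name: name == output_layer for name in target}
    return target, trainable


# Hypothetical weights for a network trained on the original maze.
source = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.3], "q_values": [1.0, 2.0, 3.0]}
target, trainable = prepare_for_new_maze(source, "q_values")
```

Comparing this variant against training from scratch on the new maze would indicate how much of the learned representation carries over.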
References:
[1] S. Russell and P. Norvig, "Part V, Machine Learning, Chapter 22 Reinforcement Learning," in Artificial Intelligence: A Modern Approach, 2021, pp. 789-821.
[2] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning," arXiv [cs.LG], Dec. 19, 2013. [Online]. Available: http://arxiv.org/abs/1312.5602
[3] S. Gamrian and Y. Goldberg, "Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation," in Proceedings of the 36th International Conference on Machine Learning, Jun. 9-15, 2019, vol. 97, pp. 2063-2072.
[4] A. Braylan, M. Hollenbeck, E. Meyerson, and R. Miikkulainen, "Frame skip is a powerful parameter for learning to play Atari." https://nn.cs.utexas.edu/downloads/papers/braylan.aaai15.pdf (accessed Feb. 01, 2023).