CS298 Proposal

Navigating Classic Atari Games using Deep Learning

Ayan Abhiranya Singh (ayanabhiranya.singh@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Mark Stamp, Prof. Kevin Smith

Abstract:

Modern deep learning techniques such as convolutional neural networks have been used to play classic Atari games like Breakout and Space Invaders at a near-superhuman level. This project experiments with the performance of our own model, developed for Ms. PacMan, under various conditions of difficulty. First, we develop a framework that provides input to a deep Q-network in the form of video at a constant frame rate. The model is then enhanced and tested against input at varying frame rates rather than the constant frame-rate input from our previous research. Finally, the enhanced model is used in transfer learning, where it is deployed on Ms. PacMan mazes it has not encountered before. The results of these experiments are collated and summarized, and the enhanced model's performance is compared with that of the previous models.
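
To make the frame-rate experiments concrete, the sketch below shows one way an agent could be fed input at an irregular frame rate: a wrapper that holds each chosen action for a randomly drawn number of emulator frames. The Gym-style reset()/step() interface and the skip range are assumptions for illustration, not the project's actual framework.

    # Illustrative sketch: wrap any Gym-style Atari environment so that each agent
    # action is held for a variable number of emulator frames, i.e., the agent sees
    # input at an irregular frame rate. The skip range and interface are assumptions.
    import random

    class VariableFrameSkip:
        def __init__(self, env, min_skip=2, max_skip=6):
            self.env = env
            self.min_skip = min_skip
            self.max_skip = max_skip

        def reset(self):
            return self.env.reset()

        def step(self, action):
            """Repeat `action` for a random number of frames, summing the reward."""
            skip = random.randint(self.min_skip, self.max_skip)
            total_reward, obs, done, info = 0.0, None, False, {}
            for _ in range(skip):
                obs, reward, done, info = self.env.step(action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info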

CS297 Results

  • 1. Developed a Q-learning implementation to learn dirt-generation policies for Vacuum World (see the update-rule sketch after this list).
  • 2. Developed a neural network model to solve the Vacuum World problem.
  • 3. Applied the Q-learning work to build a simple agent that can play Ms. PacMan.
  • 4. Implemented a deep Q-network, a deep learning model that can play Ms. PacMan.
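
For reference, the update rule behind the Q-learning results above looks roughly like the sketch below; the action set, state encoding, and hyperparameters are placeholders, not the CS 297 code.

    # Minimal tabular Q-learning sketch (illustrative only; the action set and
    # hyperparameters are placeholders, not the CS 297 implementation).
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1        # learning rate, discount, exploration rate
    ACTIONS = ["up", "down", "left", "right", "suck"]

    Q = defaultdict(float)                        # Q[(state, action)] -> estimated return

    def choose_action(state):
        """Epsilon-greedy action selection over the current Q estimates."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state):
        """One-step Q-learning backup: Q <- Q + alpha * (target - Q)."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])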

Proposed Schedule

Week 1 (January 31 - February 7): Finalize the project proposal and find relevant research material.
Week 2 (February 7 - February 14): Research and discuss the scope of Del-1.
Weeks 3-4 (February 14 - February 28): Discuss progress on Del-1 and the parameters that have been tweaked within the existing deep Q-network.
Week 5 (February 28 - March 7): Read [4] and demonstrate the effects of parameter modification on the existing Q-network.
Week 6 (March 7 - March 14): [Del-1] Modified deep Q-network implementation with variable frame-rate input capability.
Week 7 (March 14 - March 21): Discuss Del-2 and expectations for model performance on varying frame rates.
Weeks 8-9 (March 21 - April 4): [Del-2] Modified deep Q-network implementation that can learn from and react to inconsistent frame-rate input.
Week 10 (April 4 - April 11): Read [3] and discuss the scope of Del-3.
Week 11 (April 11 - April 18): Experiment with different forms of transfer learning using the existing model and present the results.
Week 12 (April 18 - April 25): [Del-3] Transfer learning of the model to unseen Ms. PacMan mazes.
Week 13 (April 25 - May 2): Work on the CS 298 report and begin preparing for the Master's defense.
Week 14 (May 2 - May 9): Review and revise the CS 298 report and continue preparing for the Master's defense.
Week 15 (May 9 - May 16): [Del-4] CS 298 report and Master's defense.

Key Deliverables:

  • Software
    • 1. Modify the existing deep Q-network so that it accepts input at different frame rates instead of the constant frame-rate input used in the CS 297 deliverable.
    • 2. Modify the existing deep Q-network to learn from, and act efficiently on, input arriving at irregular frame rates.
    • 3. Apply transfer learning to deploy our deep Q-model on Ms. PacMan mazes it has not seen before (a rough sketch follows this list).
  • Report
    • 4. CS 298 Report
    • 5. CS 298 Presentation
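
As a rough illustration of deliverable 3, the sketch below shows one common way transfer learning could be set up in PyTorch: reuse the convolutional feature extractor of a trained deep Q-network and retrain only the fully connected head on a new maze. The architecture follows the network of [2], but the checkpoint path, action count, and freezing choice are assumptions rather than the project's actual setup.

    # Illustrative transfer-learning setup (PyTorch): keep the convolutional layers
    # of a trained DQN frozen and retrain only the fully connected head on a new maze.
    # The checkpoint path and action count below are hypothetical placeholders.
    import torch
    import torch.nn as nn

    class DQN(nn.Module):
        def __init__(self, n_actions):
            super().__init__()
            self.features = nn.Sequential(               # convolutional feature extractor
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(), nn.Flatten(),
            )
            self.head = nn.Sequential(                   # action-value head
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),   # 7x7 assumes 84x84 input frames
                nn.Linear(512, n_actions),
            )

        def forward(self, x):
            return self.head(self.features(x))

    model = DQN(n_actions=9)                             # action count assumed for illustration
    # model.load_state_dict(torch.load("dqn_mspacman.pt"))   # hypothetical checkpoint

    for p in model.features.parameters():                # freeze the learned features
        p.requires_grad = False

    # Only the head's parameters are passed to the optimizer for the new maze.
    optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-4)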

Innovations and Challenges

  • A major challenge in this project is experimenting with constant video input to a deep Q-agent to observe how its in-game performance is affected in real time. The papers studied in CS 297 (in particular, the DeepMind work on classic Atari games) do not discuss this setting and restrict their demonstrations to a simple cycle of single-frame input and single-frame output, so the parameters that need to be tweaked to make this work are unknown.
  • There is an additional technical challenge in actually recording gameplay as video and providing it as input to the deep learning model; one possible preprocessing sketch appears after this list.
  • There is little comprehensive research in which this kind of deep learning model has been deployed through transfer learning on different Ms. PacMan mazes. This provides an interesting sandbox for observing the model's performance on unknown mazes.
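
One possible starting point for the video-input challenge above is to buffer raw gameplay frames and convert them into the stacked, downsampled input a deep Q-network expects. The sketch below uses OpenCV and NumPy; the 84x84 grayscale, four-frame stacking follows [2], while the rest is an assumption rather than the project's pipeline.

    # Illustrative sketch of turning a stream of raw gameplay frames into network
    # input: grayscale, downsample to 84x84, and keep a rolling stack of the most
    # recent frames. The stack size and resolution follow [2]; the rest is assumed.
    from collections import deque

    import cv2
    import numpy as np

    STACK_SIZE = 4                          # number of consecutive frames the network sees
    frame_stack = deque(maxlen=STACK_SIZE)

    def preprocess(frame_rgb):
        """Convert one raw RGB frame to an 84x84 grayscale image in [0, 1]."""
        gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
        small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
        return small.astype(np.float32) / 255.0

    def push_frame(frame_rgb):
        """Add one frame to the rolling stack and return the stacked network input."""
        frame_stack.append(preprocess(frame_rgb))
        while len(frame_stack) < STACK_SIZE:            # pad at the start of an episode
            frame_stack.append(frame_stack[-1])
        return np.stack(frame_stack, axis=0)            # shape: (4, 84, 84)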

References:

[1] S. Russell and P. Norvig, "Part V, Machine Learning, Chapter 22, Reinforcement Learning," in Artificial Intelligence: A Modern Approach, 2021, pp. 789-821.

[2] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning," arXiv [cs.LG], Dec. 19, 2013. [Online]. Available: http://arxiv.org/abs/1312.5602

[3] S. Gamrian and Y. Goldberg, "Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation," in Proceedings of the 36th International Conference on Machine Learning, Jun. 2019, vol. 97, pp. 2063-2072.

[4] A. Braylan, M. Hollenbeck, E. Meyerson, and R. Miikkulainen, "Frame skip is a powerful parameter for learning to play Atari." [Online]. Available: https://nn.cs.utexas.edu/downloads/papers/braylan.aaai15.pdf (accessed Feb. 01, 2023).