
Project Blog


Week 15: May 9, 2023

Final meeting of 298.

Presented defense slides.

Week 14: May 2, 2023

Presented second draft of 298 report.

To-Do:

1. Make changes to second draft of 298 Report.

2. Prepare defense slides.

Week 13: April 25, 2023

Presented first draft of 298 report.

To-Do:

1. Finish second draft of 298 Report.

Week 12: April 18, 2023

Presented deliverable 3: transfer learning.

To-Do:

1. Prepare first draft of 298 Report.

Week 11: April 11, 2023

Presented progress on transfer learning through user gameplay.

Discussed superimposing consecutive frames and bounding-box tracking to transfer user trials into training data for the agent.

Discussed documenting results on different mazes for a trained vs an untrained agent.

To-Do:

1. Continue with the transfer learning deliverable. Experiment with bounding boxes to track PacMan's movement across frames (a sketch follows below).
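For reference, a minimal sketch of the bounding-box idea, assuming OpenCV with BGR frames captured from gameplay video; the yellow HSV range and the helper names are placeholders that would need tuning against real frames:

```python
import cv2

def pacman_bbox(frame_bgr):
    """Return an (x, y, w, h) box around PacMan, or None if not found.
    Assumes PacMan is the largest yellow blob; the HSV range is a guess."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))

def movement_direction(prev_box, cur_box):
    """Infer a coarse action (UP/DOWN/LEFT/RIGHT) from the displacement
    of box centers in consecutive frames."""
    if prev_box is None or cur_box is None:
        return None
    (px, py, pw, ph), (cx, cy, cw, ch) = prev_box, cur_box
    dx = (cx + cw / 2) - (px + pw / 2)
    dy = (cy + ch / 2) - (py + ph / 2)
    if abs(dx) >= abs(dy):
        return "RIGHT" if dx > 0 else "LEFT"
    return "DOWN" if dy > 0 else "UP"
```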

Week 10: April 4, 2023

Presented progress on transfer learning through user gameplay, combining OpenCV contour detection with PyTesseract OCR to extract user actions and per-frame rewards.
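A rough sketch of the OCR half of this pipeline, assuming PyTesseract is installed and that the score sits in a fixed HUD region; the crop coordinates are placeholders:

```python
import cv2
import pytesseract

def frame_reward(prev_score, frame_bgr, score_region=(0, 0, 160, 15)):
    """Read the on-screen score with OCR and return (score, reward).
    The score_region crop is a placeholder; the real coordinates depend
    on the game's HUD layout."""
    x, y, w, h = score_region
    crop = frame_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Upscale and threshold: small Atari fonts OCR poorly at native size.
    gray = cv2.resize(gray, None, fx=4, fy=4, interpolation=cv2.INTER_NEAREST)
    _, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
    text = pytesseract.image_to_string(binary, config="--psm 7 digits")
    try:
        score = int("".join(ch for ch in text if ch.isdigit()))
    except ValueError:
        return prev_score, 0  # OCR failed; assume no reward this frame
    return score, score - prev_score
```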

Discussed brute-forcing agent actions to play a different maze in the game.

To-Do:

1. Continue with transfer learning deliverable.

Week 9: March 28, 2023

Break.

Week 8: March 21, 2023

Presented experiment results with the network input as the difference between adjacent frames.
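The preprocessing itself is simple; a minimal sketch, assuming grayscale frames as in the standard Atari pipeline:

```python
import numpy as np

def frame_difference(prev_frame, cur_frame):
    """Signed difference between adjacent grayscale frames, scaled to
    [-1, 1]. Static background cancels out, so the network mostly sees
    motion (PacMan, ghosts, disappearing pellets)."""
    return (cur_frame.astype(np.float32) - prev_frame.astype(np.float32)) / 255.0
```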

Presented experiment results with transfer learning across Ms PacMan and Alien.

Discussed potential ways to utilize frame difference to obtain input commands from external video for transfer learning.

Discussed an issue where the agent gets stuck in local minima (corners of the screen) after clearing most of the food pellets from the screen.

To-Do:

1. Experiment with possible attention models to avoid corners and focus on positions more central to the screen.

2. Increase the learning rate while transferring the model to Alien.

3. Experiment with alternating training between Ms PacMan and Alien to observe potential changes in performance.

4. Research potential methods of using OpenCV and the agent's acceleration across frames to obtain an input signal for videos that are used in training an agent.

Week 7: March 14, 2023

Presented experiment results simulating lag and stuttering video input.

Presented experiment results with different learning rates used while training an agent over 60,000 episodes.

Presented a first trial of transfer learning by deploying the trained Ms PacMan agent to play Alien.

Discussed enhancements to current learning rate experiments.

Discussed potential ways to obtain input commands from video for transfer learning.

Discussed potential ways to transfer Ms PacMan learning to Alien such that the agent is able to play it to some reasonable standard.

To-Do:

1. Experiment with the neural-net input as the difference between adjacent frames.

2. Experiment with training an agent using the Adam optimizer for more episodes.

3. Experiment with training an agent on Ms PacMan and then training the same agent on Alien to observe potential changes in performance (a fine-tuning sketch follows this list).

4. Research potential methods of using OpenCV to obtain an input signal for videos that are used in training an agent.
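A sketch of how items 2 and 3 might be set up with Keras; the checkpoint filename and learning rate are placeholders, and it assumes both games use the full 18-action Atari action space so the output layer can be reused:

```python
import tensorflow as tf

# Hypothetical checkpoint saved from the Ms PacMan training runs.
q_net = tf.keras.models.load_model("mspacman_dqn.h5")

# Re-compile with the Adam optimizer at a fresh (placeholder) learning
# rate, then continue training the same network on Alien.
q_net.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2.5e-4),
              loss=tf.keras.losses.Huber())
```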

Week 6: March 7, 2023

Presented experiment results simulating lag and stuttering video input; larger replay buffers performed slightly worse than expected.

Discussed roadmap for next two weeks, encompassing deliverable 1 and deliverable 2.

Discussed potential forms of transfer learning that could be used for deliverable 3.

To-Do:

1. Experiment with the learning rate of the deep learning model to improve performance with 8-frame (or larger) replay buffers.

2. Experiment with larger replay buffers.

3. Start looking into building an external training dataset for transfer learning to our agent. For example, research whether existing recorded gameplay of PacMan, Ms PacMan (console version), and Lock and Chase can be aggregated from YouTube or Twitch to form a learning dataset for the Q-agent.

Week 5: February 28, 2023

Presented an understanding of the Gym environment, its reset() and step() functions, and DeepMind's wrapper enhancements that store successive frames in a deque.
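A minimal sketch of that loop, assuming the gym >= 0.26 API (older versions return fewer values from reset() and step()):

```python
import gym
import numpy as np
from collections import deque

env = gym.make("ALE/MsPacman-v5")
frames = deque(maxlen=4)          # the DeepMind-style frame stack

obs, info = env.reset()
for _ in range(4):
    frames.append(obs)            # bootstrap the stack with frame 1

done = False
while not done:
    action = env.action_space.sample()   # random policy, for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    frames.append(obs)                   # oldest frame falls off the deque
    state = np.stack(frames, axis=0)     # network input: the last 4 frames
    done = terminated or truncated
env.close()
```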

Presented experiment results estimating that 1 frame within the Gym environment corresponds to 0.085 seconds of gameplay.

Presented experiment results duplicating input frames for every 2 moves.

Discussed roadmap for next two weeks, encompassing deliverable 1 and deliverable 2.

To-Do:

1. Experiment with frame rate input: Update to a new frame only every 2 time steps.

2. Experiment with frame rate input: Update the net architecture to remember 8 frames instead of 4.

3. Experiment with frame rate input: Every 2 time steps, flip a coin to select which of the 2 available frames to use (sketched below).
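A sketch of what item 3's coin-toss selection could look like; purely illustrative:

```python
import random

class CoinTossFrameSelector:
    """Every 2 time steps, flip a coin to decide which of the two frames
    generated in that window gets pushed to the frame stack."""

    def __init__(self):
        self.pending = []

    def observe(self, frame):
        """Feed one frame per time step; returns a frame to stack every
        second step, and None in between."""
        self.pending.append(frame)
        if len(self.pending) < 2:
            return None
        chosen = random.choice(self.pending)
        self.pending = []
        return chosen
```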

Week 4: February 21, 2023

Presented experiment results simulating delay in gameplay input to the agent.
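One plausible way to simulate that delay is a wrapper that occasionally re-serves a stale observation; the drop probability is a placeholder, and the step() signature assumes gym >= 0.26:

```python
import random

class StutterWrapper:
    """Simulates laggy/stuttering video by sometimes re-serving the
    previous observation instead of the fresh one."""

    def __init__(self, env, stutter_prob=0.2):
        self.env = env
        self.stutter_prob = stutter_prob
        self.last_obs = None

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.last_obs is not None and random.random() < self.stutter_prob:
            obs = self.last_obs          # the agent sees a stale frame
        else:
            self.last_obs = obs
        return obs, reward, terminated, truncated, info
```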

To-Do:

1. Understand Gym's stepping function and how frames are generated every time an action is executed in the game.

2. Time gameplay against the number of frames generated to estimate what fraction of a second is represented by 1 frame.

3. Experiment with an input frame stack consisting of duplicated frames every 2 moves.

Week 3: February 14, 2023

Presented progress on deliverable 1, including preliminary experiment results for various frame-selection schemes applied to agent gameplay.

Discussed experiments for the upcoming week, including which frame rate parameters to adjust and how to break down the input according to the 60 fps frame rate of classic Atari games.

To-Do:

1. Run new experiments using new frame rate parameters.

2. Prepare slides documenting experiments and results.

Week 2: February 07, 2023

Discussed expectations for deliverable 1, including potential experiments for frame rate inputs within the OpenAI Atari wrapper.

To-Do:

1. Run preliminary experiments on frame rate inputs for the current DQN.

Week 1: January 31, 2023

Discussed deliverables and requirements for the CS 298 Proposal.

To-Do:

1. Complete CS 298 Proposal.

|---- CS 297 ----|

Week 15: December 06, 2022

Presented Deliverable 5: 297 Report.

Discussed changes to improve 297 report.

To-Do:

1. Make the recommended changes to the 297 report.

Week 14: November 29, 2022

Presented Deliverable 4: Deep Q-Network for Ms PacMan.

Discussed access to the HPC cluster for potential deep learning programs next semester.

Discussed the final project report and actions to be taken over the last week.

To-Do:

1. Complete the 297 report.

Week 13: November 22, 2022

Presented Deliverable 3: Q-agent for Ms PacMan.

Discussed compute power limitations of the M1 machine.

Discussed final deliverables for the semester and actions to be taken over the last 2 weeks.

To-Do:

1. Enhance the agent to utilize a deep Q-network for deliverable 4 (see the architecture sketch after this list).

2. Prepare a rough draft for the 297 report.
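For item 1, one standard starting point is the DeepMind architecture from the Atari paper; a minimal Keras sketch (layer sizes follow Mnih et al., everything else is a placeholder):

```python
import tensorflow as tf

def build_dqn(n_actions, input_shape=(84, 84, 4)):
    """Classic DQN: three conv layers over a stack of 4 grayscale
    frames, then a dense layer, with one Q-value output per action."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(n_actions),
    ])
```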

Week 12: November 15, 2022

Presented an understanding of the Arcade Learning Environment.

Discussed dividing the scope of deliverable 3 over 2 weeks, replacing the previous deliverable 4 (frame rate experimentation).

Discussed doing frame rate experimentation as part of 298.

To-Do:

1. Prepare a first-pass agent that can functionally play PacMan on Atari.

2. Read deep Q-learning implementations for Atari game-playing agents.

Week 11: November 08, 2022

Presented an understanding of the Atari paper.

Discussed scope for deliverable 3.

Discussed strategies for winning at PacMan.

To-Do:

1. Prepare slides for Arcade Learning Environment.

2. Read [5].

Week 10: November 01, 2022

Presented completed version of Deliverable 2.

Presented progress on the Atari paper.

Discussed deliverables for November, including the development of the "PacMan solver".

Discussed updates to CS 297 schedule.

To-Do:

1. Prepare slides for Atari paper.

2. Research PacMan or Lock and Chase games and how to fit those into our model.

3. Research existing strategies/exploits for beating PacMan/Lock and Chase.

4. Read [4].

Week 9: October 25, 2022

Presented Deliverable 2 and discussed enhancements to the current solution.

To-Do:

1. Continue working on deliverable 2 and present progress.

2. Read [4].

Week 8: October 18, 2022

Discussed progress on Deliverable 2 and the potential to refine the current product.

To-Do:

1. Continue working on deliverable 2 and present progress.

Week 7: October 11, 2022

Presented an understanding of mobile object detection using TFLite.

Discussed common CNN architectures and pooling layers used in image recognition.

To-Do:

1. Develop a first-pass neural net for judging random paths within Vacuum World.

Week 6: October 4, 2022

Presented a tentative solution for upgrading the current Vacuum World program to a neural net.

Discussed adding a random-walk generator to the current vacuum world solution and using a neural net that can detect whether a chosen path is good or not.
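A toy sketch of what such a path judge could look like in Keras; the fixed-length path encoding and all sizes are assumptions, not the actual design:

```python
import tensorflow as tf

PATH_FEATURES = 32   # placeholder: a random-walk path encoded as a
                     # fixed-length feature vector

# Binary classifier: outputs the probability that a path is "good"
# (e.g., cleans the dirty cells efficiently).
judge = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(PATH_FEATURES,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
judge.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```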

To-Do:

1. Complete slides for mobile object detection using TFLite.

2. Research further into building an NN solution for Vacuum World in which the AI judges whether the agent is playing well or not.

Week 5: September 27, 2022

Presented updates to deliverable 1.

Discussed the reading of paper [2] and possible implementations of TensorFlow Lite within the vacuum world game.

Discussed possible neural net architectures for our problem, including function spaces, possible inputs, and the number of neurons.

To-Do:

1. Upload slides for transfer learning.

2. Continue reading [2] and prepare slides for presentation.

3. Research and propose a neural network architecture for the vacuum world problem we're trying to solve.

Week 4: September 20, 2022

Presented deliverable 1: vacuum world through Q-learning.

Discussed enhancing deliverable 1 to handle a dirt randomizer over k-many squares.

Discussed start of research into deliverable 2, converting the Q-learning vacuum world table into a neural net.

To-Do:

1. Research into deep Q-learning, specifically enhancing our vacuum world to utilize a neural net.

2. Begin reading research paper [2].

3. Modify the existing vacuum world to handle k-many dirty cells.

Week 3: September 13, 2022

Presented progress on research into reinforcement learning.

Pitched possible solutions for our vacuum world problem.

Discussed the utility function for passive reinforcement learning.

Discussed state definitions for vacuum world, including exploration scenarios, what constitutes a percept, and how to develop the Q-function for our agent.
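The core of the Q-function under discussion is the one-step tabular update; a minimal sketch, with the state encoding and rewards left as placeholders:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # placeholder hyperparameters
ACTIONS = ["left", "right", "suck"]     # classic vacuum world moves
Q = defaultdict(float)                  # keyed by (state, action)

def choose_action(state):
    """Epsilon-greedy exploration over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning backup: Q(s,a) += alpha * TD-error."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```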

To-Do:

1. Research further into developing a Q-function for the agent within vacuum world.

2. Develop a first-pass implementation for vacuum world.

Week 2: September 6, 2022

Reviewed proposal, bio and blog updates.

Discussed the implementation of the Q-function in vacuum world, including exploration patterns, utility function design, and randomized dirt generation policies.

To-Do:

1. Research further into the utility function for the states across the vacuum world.

2. Prepare slides to demonstrate current understanding and progress.

Week 1: August 30, 2022

Discussed the project proposal and took suggestions on how to bring it up to standard.

To-Do:

1. Make the required changes to the project proposal.

2. Read chapter 22 of Russell-Norvig (specifically Reinforcement Learning).