CS297 Proposal

Virtual Robot Climbing using Reinforcement Learning

Ujjawal Garg (ujjawal.garg@sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

Reinforcement Learning is a field of Artificial Intelligence that has gained a lot of traction in recent years. Unlike supervised learning, which requires labeled training data, reinforcement learning defines an environment in which an agent can perform a specific set of actions. Each action yields a reward that depends on the previous state and the new state, and the agent learns to choose actions that maximize its cumulative reward. In July 2017, Google's DeepMind published a paper and video showing a simulated body trained with reinforcement learning to navigate a set of challenging terrains. My project will use a similar approach to train a simulated body to climb a rocky wall. This task is expected to be more complex because every joint plays an important role in the movement.
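To make the agent, action, and reward terms above concrete, here is a minimal sketch of tabular Q-learning on a toy one-dimensional "wall". Everything in it is an illustrative assumption rather than part of the planned system: the wall height, the two actions (one hold down, one hold up), the reward (height gained between the previous and the new state), and the names step, train, and greedy_policy are all hypothetical.

```python
import random

HEIGHT = 5          # hypothetical wall: holds at heights 0..5, top at 5
ACTIONS = (-1, 1)   # hypothetical actions: move down one hold, move up one hold


def step(state, action):
    """Apply an action and return (new_state, reward, done)."""
    new_state = min(max(state + action, 0), HEIGHT)
    # The reward depends on both the previous and the new state: height gained.
    reward = float(new_state - state)
    done = new_state == HEIGHT
    return new_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(HEIGHT + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            new_state, reward, done = step(state, action)
            # Bootstrap from the best action in the new state (0 at the top).
            best_next = 0.0 if done else max(q[(new_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = new_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-terminal height."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(HEIGHT)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should be to move up at every hold. The actual climbing task would involve a far larger, continuous state and action space, which is why the later deliverables turn to policy gradient methods.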

Schedule:

Week 1: Jan. 31 - Feb. 6	Finalise Project proposal, and find relevant papers
Week 2: Feb. 7 - Feb. 13	Read and give presentation on "Q-learning" paper
Week 3: Feb. 14 - Feb. 20	Deliverable 1: Q-Learning implementation for Tic-Tac-Toe
Week 4: Feb. 21 - Feb. 27	Read "Strange Beta" paper on rock climbing
Week 5: Feb. 28 - Mar. 6	Read "FABRIK: A fast, iterative solver for the Inverse Kinematics problem" paper
Week 6: Mar. 7 - Mar. 13	Read "Playing Atari with Deep Reinforcement Learning" paper
Week 7: Mar. 14 - Mar. 20	Deliverable 2: Implement a rock-wall simulation environment
Week 8: Mar. 21 - Mar. 27	---Spring Break---
Week 9: Mar. 28 - Apr. 3	Read "Trust region policy optimization" paper
Week 10: Apr. 4 - Apr. 10	Deliverable 3: Add 3D locomotion behaviour to the agent
Week 11: Apr. 11 - Apr. 17	Read "Continuous control with deep reinforcement learning" paper
Week 12: Apr. 18 - Apr. 24	--- ---
Week 13: Apr. 25 - May 1	Deliverable 4: Implement a policy gradient algorithm to start the training
Week 14: May 2 - May 8	Prepare CS 297 report
Week 15: May 9 - May 15	Deliverable 5: CS 297 report

Deliverables:

The full project will be completed by the end of CS 298. The following will be done by the end of CS 297:

1. Q-Learning implementation for Tic-Tac-Toe

2. Implement a rock-wall simulation environment (2D)

3. Add 3D locomotion behaviour to the agent

4. Implement a policy gradient algorithm to start the training

5. CS 297 report

References:

[1992] "Q-Learning". Christopher Watkins, Peter Dayan. Machine Learning, 8(3-4), 279-292. 1992.

[2004] "Physiology of difficult rock climbing". P. B. Watts. DOI: 10.1007/s00421-003-1036-7. 2004.

[2011] "Strange Beta: An Assistance System for Indoor Rock Climbing Route Setting Using Chaotic Variations and Machine Learning". Caleb Phillips, Lee Becker, Elizabeth Bradley. arXiv:1110.0532v1 [cs.AI]. 2011.

[2011] "FABRIK: A fast, iterative solver for the Inverse Kinematics problem". Andreas Aristidou, Joan Lasenby. Graphical Models. 2011.

[2013] "Playing Atari with Deep Reinforcement Learning". Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. arXiv:1312.5602 [cs.LG]. 2013.

[2015] "Trust region policy optimization". John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. 2015.

[2015] "Continuous control with deep reinforcement learning". Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. 2015.

[2016] "Deep reinforcement learning through policy optimization". Pieter Abbeel, John Schulman. 2016.

[2017] "Emergence of Locomotion Behaviours in Rich Environments". Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver. arXiv:1707.02286. 2017.

[2017] "AI Safety Gridworlds". Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg. arXiv:1711.09883v2 [cs.LG]. 2017.

[2017] "HoME: a Household Multimodal Environment". Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville. arXiv:1711.11017v1 [cs.AI]. 2017.