CS297 Proposal

Virtual Robot Climbing using Reinforcement Learning

Ujjawal Garg (ujjawal.garg@sjsu.edu)

Advisor: Dr. Chris Pollett

Description: Reinforcement learning is a field of artificial intelligence that has gained a lot of traction in recent years. Unlike supervised learning, which requires labeled training data, reinforcement learning defines an environment in which each actor, or agent, can perform a specific set of actions. Each action earns a reward that depends on the previous state and the new state. In July 2017, Google published a paper and video showing a simulated body trained with reinforcement learning to navigate a set of challenging terrains. My project will use a similar approach to train a simulated body to climb a rocky wall. This task is more complex because every joint plays an important role in the movement.

Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Q-Learning implementation for Tic-Tac-Toe
2. Implement a rock-wall simulation environment (2D)
3. Add 3D locomotion behaviour to the agent
4. Implement a policy gradient algorithm to start the training
5. CS 297 report

References:

[1992] "Q-Learning". Christopher Watkins, Peter Dayan. Nature Publishing Group. 1992.

[2004] "Physiology of difficult rock climbing". Watts PB. DOI: 10.1007/s00421-003-1036-7. 2004.

[2011] "Strange Beta: An Assistance System for Indoor Rock Climbing Route Setting Using Chaotic Variations and Machine Learning". Caleb Phillips, Lee Becker, Elizabeth Bradley. arXiv:1110.0532v1 [cs.AI]. 2011.

[2011] "FABRIK: A fast, iterative solver for the Inverse Kinematics problem". Andreas Aristidou, Joan Lasenby. Graphical Models. 2011.

[2013] "Playing Atari with Deep Reinforcement Learning". Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. arXiv:1312.5602 [cs.LG]. 2013.

[2015] "Trust region policy optimization". John Schulman, Sergey Levine, Pieter Abbeel, Michael I Jordan, Philipp Moritz. 2015.

[2015] "Continuous control with deep reinforcement learning". Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. 2015.

[2016] "Deep reinforcement learning through policy optimization". Pieter Abbeel, John Schulman. 2016.

[2017] "Emergence of Locomotion Behaviours in Rich Environments". Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver. arXiv:1707.02286. 2017.

[2017] "AI Safety Gridworlds". Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg. arXiv:1711.09883v2 [cs.LG]. 2017.

[2017] "HoME: a Household Multimodal Environment". Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville. arXiv:1711.11017v1 [cs.AI]. 2017.
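
To make the reward-driven setup in the description concrete, the following is a minimal sketch of the tabular Q-learning update from Watkins and Dayan (1992), the algorithm named in the first deliverable. It does not use Tic-Tac-Toe or the climbing environment: the five-state corridor, the function names (`step`, `train`, `greedy_policy`), and the hyperparameter values are all assumptions chosen only for illustration.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4, with the goal at state 4.
N_STATES = 5
ACTIONS = (-1, 1)                       # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0       # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the learned action for each non-goal state. A table like this only works for small, discrete state spaces, which is one reason the later deliverables turn to policy gradient methods for the joint-level control of a simulated body.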