CS298 Proposal

Virtual Robot Climbing using Reinforcement Learning

Ujjawal Garg (ujjawal.garg@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Katerina Potika, Dr. Robert Chun

Abstract:

Reinforcement Learning is a field of Artificial Intelligence that has gained a lot of attention in recent years. Unlike supervised learning, which requires labeled training data, here we define an environment in which each actor, or agent, can perform a set of specific actions; each action yields a reward that depends on the previous state and the new state. In July 2017, Google DeepMind published a paper and video showing a simulated body trained, using reinforcement learning, to navigate a set of challenging terrains [8]. My project will use a similar approach to train a simulated body to climb a rock wall. This task is more complex because every joint of the body plays an important role in the movement.
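
To make the state-action-reward loop described above concrete, the short sketch below runs a random policy in an OpenAI Gym environment. It assumes Gym and its MuJoCo-based Humanoid-v2 task are installed; the random policy is only a stand-in for a learned one and is not part of the proposed implementation.

    import gym

    # Agent-environment loop: observe a state, pick an action, receive a
    # reward that depends on the resulting transition.
    env = gym.make("Humanoid-v2")  # assumes gym and mujoco-py are installed

    for episode in range(5):
        state = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            action = env.action_space.sample()           # random action in place of a policy
            state, reward, done, info = env.step(action)
            total_reward += reward
        print("episode", episode, "return", total_reward)

    env.close()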

CS297 Results

  • Implemented a rock-wall simulation using MuJoCo
  • Implemented an OpenAI Gym environment for the training (a skeleton of such an environment is sketched after this list)
  • Implemented a policy gradient algorithm (DDPG) [6] to start the training
  • Trained the simulation to take its first step
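
The environment in the second bullet can be sketched as follows. This is a minimal skeleton only: it assumes mujoco-py is available, climber_wall.xml is a hypothetical name for the MuJoCo model of the climber and wall, and the reward and termination rules shown are placeholders rather than the ones used in the project.

    import numpy as np
    import gym
    from gym import spaces
    import mujoco_py

    class RockWallEnv(gym.Env):
        """Sketch of a Gym environment wrapping a MuJoCo rock-wall simulation."""

        def __init__(self, model_path="climber_wall.xml"):     # hypothetical model file
            model = mujoco_py.load_model_from_path(model_path)
            self.sim = mujoco_py.MjSim(model)
            n_act = self.sim.model.nu                           # number of actuators
            n_obs = self.sim.model.nq + self.sim.model.nv       # joint positions + velocities
            self.action_space = spaces.Box(-1.0, 1.0, shape=(n_act,), dtype=np.float32)
            self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_obs,), dtype=np.float32)

        def _obs(self):
            return np.concatenate([self.sim.data.qpos, self.sim.data.qvel]).astype(np.float32)

        def reset(self):
            self.sim.reset()
            return self._obs()

        def step(self, action):
            self.sim.data.ctrl[:] = action
            self.sim.step()
            height = self.sim.data.qpos[2]       # placeholder: root height as progress up the wall
            reward = float(height)
            done = bool(height > 5.0)            # placeholder termination condition
            return self._obs(), reward, done, {}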

Proposed Schedule

Week 1,2,3: Sep. 4 - Sep. 24 Train the simulation to grab a hold and pull upwards
Week 4,5,6: Sep. 25 - Oct. 15 Train the simulation to switch between holds
Week 7,8,9: Oct. 16 - Nov. 5 Train the simulation to reach the top
Week 10,11,12: Nov. 6 - Nov. 26 Improve the performance
Week 13: Nov. 27 - Dec. 3 CS 298 report and presentation.

Key Deliverables:

  • Software:
    • Implementations of the DDPG and PPO algorithms, based on [6] and [9], using TensorFlow (the core update rules are sketched after this list). The models trained with these implementations will allow a humanoid simulation to climb randomly generated rock walls.
    • An OpenAI Gym environment that generates random rock-wall configurations and a simulated humanoid body. Anyone can use this environment to train the simulation.
  • Experiments:
    • Analyze the performance of both implementations by comparing their training times vs success rates on random rock walls.
  • Report:
    • CS298 Report
    • Presentation
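
To indicate the kind of TensorFlow code the algorithm deliverable involves, the sketch below shows the clipped surrogate loss of PPO [9] and the soft target-network update of DDPG [6]. The function names and hyperparameter values are illustrative choices, not the project's final implementation.

    import tensorflow as tf

    def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        """Clipped surrogate objective from [9], negated so it can be minimized.

        logp_new / logp_old: log-probabilities of the taken actions under the
        current policy and the data-collecting policy; advantages: advantage estimates.
        """
        ratio = tf.exp(logp_new - logp_old)                    # pi_new(a|s) / pi_old(a|s)
        clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        surrogate = tf.minimum(ratio * advantages, clipped * advantages)
        return -tf.reduce_mean(surrogate)

    def soft_target_update(target_vars, online_vars, tau=0.001):
        """DDPG-style soft update of the target-network weights [6]."""
        for target, online in zip(target_vars, online_vars):
            target.assign(tau * online + (1.0 - tau) * target)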

Innovations and Challenges

  • Since Google DeepMind's humanoid experiment was for walking, only the joints below the abdomen played an important role. For rock climbing, however, every joint of the body plays an important role in the movement. The first key deliverable will therefore be a challenge, as I need to make these implementations work in this new environment.
  • One of the major challenges with previous reinforcement learning research is reproducibility. The second key deliverable (the OpenAI Gym environment) standardizes the experiment so that anyone can easily reproduce the results.

References:

1. Watts, P. B. (2004). Physiology of difficult rock climbing. European Journal of Applied Physiology, 91(4), 361-372. doi:10.1007/s00421-003-1036-7.

2. Phillips, C., Becker, L., & Bradley, E. (2012). Strange beta: An assistance system for indoor rock climbing route setting. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(1), 013130. doi:10.1063/1.3693047.

3. Aristidou, A., & Lasenby, J. (2011). FABRIK: A fast, iterative solver for the Inverse Kinematics problem. Graphical Models, 73(5), 243-260. doi:10.1016/j.gmod.2011.05.003.

4. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

5. Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015, June). Trust region policy optimization. In International Conference on Machine Learning (pp. 1889-1897).

6. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.

7. Abbeel, P., & Schulman, J. (2016). Deep reinforcement learning through policy optimization. Tutorial at Neural Information Processing Systems.

8. Heess, N., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., ... & Silver, D. (2017). Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286.

9. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.