Chris Pollett > Students > Garg
[Bio] [Blog] [Q-Learning-Presentation - PDF] [Deep-Q-Networks-Presentation - PDF]
Q-Learning implementation for Tic-Tac-Toe
Developed a Python module implementing an agent and a bot, using the Q-Learning technique to train the agent to play Tic-Tac-Toe.
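The training described here presumably relies on the standard tabular Q-learning update. The sketch below illustrates that update in isolation; the names (`q_table`, `update`) and the hyperparameter values are assumptions for illustration, not the module's actual API.

```python
ALPHA = 0.3   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

q_table = {}  # maps (state, move) -> learned value


def q_value(state, move):
    """Learned value for a (state, move) pair, defaulting to 0."""
    return q_table.get((state, move), 0.0)


def update(state, move, reward, next_state, next_moves):
    """Standard tabular Q-learning update: nudge the old estimate toward
    the observed reward plus the discounted best value of the next state."""
    best_next = max((q_value(next_state, m) for m in next_moves), default=0.0)
    old = q_value(state, move)
    q_table[(state, move)] = old + ALPHA * (reward + GAMMA * best_next - old)
```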
Here, by design, the value returned by instant_reward is 0 for most moves (all non-terminating moves, in fact). We hypothesized that replacing this function with q_lookup could speed up learning: q_lookup returns the previously learned value if one is present, and the instant reward otherwise.
However, since our bot plays random moves (unless there is a winning or blocking move), we also need to be able to reduce a learned value. With the q_lookup change, the values in the Q-table started to accumulate, which made reducing a learned value difficult. To fix this, we adjusted the update rule to prevent the values from accumulating.
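The exact rule the module uses is not shown here, but the contrast below sketches one standard way to stop values from accumulating: blend the old value with each new estimate instead of adding to it. The function names and the α weight are assumptions for illustration.

```python
ALPHA = 0.5  # blending weight; an assumed hyperparameter

q_table = {}  # (state, move) -> learned value


def accumulating_update(state, move, estimate):
    """The problematic form: values only ever grow,
    so a learned value can never be pulled back down."""
    q_table[(state, move)] = q_table.get((state, move), 0.0) + estimate


def blended_update(state, move, estimate):
    """A convergent alternative: store a weighted average of the old value
    and the new estimate, so repeated updates move toward the estimate
    instead of summing without bound."""
    old = q_table.get((state, move), 0.0)
    q_table[(state, move)] = (1 - ALPHA) * old + ALPHA * estimate
```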
With this fix, we obtained better results.
Download: Deliverable 1.zip