CS298 Proposal
Bluff with AI
Tina Philip (talk2tinaphilip@gmail.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Robert Chun, Dr. Philip Heller
Abstract:
The goal of this project is to build an AI that learns how to play Bluff. Bluff is a multi-player card game in which each player makes a sequence of decisions based on a partially observed game state that evolves under uncertainty. The main challenge is that Bluff is a game of imperfect information: no player knows what the other players hold. Reinforcement learning is a powerful paradigm for this kind of problem. It allows software agents to automatically determine the ideal behavior within a specific context in order to maximize performance, learning from simple reward feedback known as the reinforcement signal. The backbone of the implementation is a neural network trained with back-propagation. We will use it to approximate a Q function, with rewards calculated from the actions the agent takes during the game. Each iteration of the game will take inputs from the previous iteration in order to make the best decision possible. Through this project we try to understand the impact of deep learning on games and aim to demonstrate the basic ability of a neural network to train an agent to automatically learn to play a card game.
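As a rough illustration of the Q-function approximation mentioned in the abstract, the sketch below shows one Q-learning update in Python. It is a minimal sketch under stated assumptions, not the project's implementation: a linear model stands in for the neural network to keep the example short, and the function names, the feature encoding, the discount factor `gamma`, and the learning rate `lr` are all hypothetical.

```python
def q_value(weights, features):
    """Linear stand-in for the network: Q(s, a) = weights . features."""
    return sum(w * x for w, x in zip(weights, features))


def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r at game end."""
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(weights, features, target, lr=0.1):
    """One gradient step on the squared error 0.5 * (target - Q(s, a))^2."""
    error = target - q_value(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]


# Hypothetical encoding of one (state, action) pair; the values are placeholders.
features = [1.0, 0.5, 0.0]
weights = [0.0, 0.0, 0.0]

# Suppose the move ended the game with a reward of 1.0 (an assumed reward scheme).
target = td_target(reward=1.0, next_q_values=[], terminal=True)
for _ in range(50):
    weights = q_update(weights, features, target)

final_q = q_value(weights, features)  # moves toward the target with each update
```

In a neural-network version, the gradient step in `q_update` would be replaced by back-propagation through the network, while the target computation would stay the same.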
CS297 Results
- Implemented the Base classes for the Bluff Game
- Implemented the Bluff Program for Human Players
- Implemented the Simple AI player to play Bluff
- Heuristic approach for Bluff
- Project report
Proposed Schedule
Week 1:
August 29, 2017 - September 5, 2017 | Complete the implementation of Bluff playing AI |
Week 2:
September 5, 2017 - September 12, 2017 | Complete the implementation of Bluff playing AI |
Week 3:
September 12, 2017 - September 19, 2017 | Develop a database of current state - Deliverable 1 due |
Week 4:
September 19, 2017 - September 26, 2017 | Q-Learning neural net to determine whether to call Bluff |
Week 5:
September 26, 2017 - October 3, 2017 | Q-Learning neural net to determine whether to call Bluff |
Week 6:
October 3, 2017 - October 10, 2017 | Deliverable 2 due |
Week 7:
October 10, 2017 - October 17, 2017 | Q-Learning neural net to determine which cards to play |
Week 8:
October 17, 2017 - October 24, 2017 | Deliverable 3 due |
Week 9:
October 24, 2017 - October 31, 2017 | Performance experiments and results of algorithm vs. Human agents |
Week 10:
October 31, 2017 - November 7, 2017 | Deliverable 4 due |
Week 11:
November 7, 2017 - November 14, 2017 | Deliverable 5 - CS298 Report |
Week 12:
November 14, 2017 - November 21, 2017 | Submit report to committee for feedback |
Week 13:
November 21, 2017 - November 28, 2017 | Prepare for presentation |
Week 14:
December 5, 2017 | Project presentation |
Key Deliverables:
- Software
- 1. Complete the implementation of Bluff playing AI
- 2. Q-Learning neural net to determine whether to call Bluff
- 3. Q-Learning neural net to determine which cards to play
- 4. Performance experiments and results of algorithm vs. Human agents
- Report
- CS298 Report
- CS298 Presentation
Innovations and Challenges
- AIs for Bluff have not been considered before, which makes this an area worth investigating.
- Unlike other games that use Q-learning, Bluff uses Q-learning in two different contexts: (1) to determine which cards to play (this decision also determines whether to bluff), and (2) to decide whether to challenge an opponent by calling Bluff.
- Multi-player games with incomplete information are more challenging than standard adversarial games.
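The two Q-learning contexts listed above can be sketched as a single selection routine applied to two separate tables of action values. This is only an illustrative sketch: the epsilon-greedy rule is one common exploration strategy and is assumed here rather than taken from the proposal, and the action names, card labels, and Q-values are hypothetical placeholders.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])


# Context 1: which cards to play. Each key pairs the cards actually laid down
# with the rank claimed, so a mismatch between the two is a bluff.
play_q = {
    (("7H",), "7"): 0.4,        # truthful play (placeholder value)
    (("2C", "9D"), "7"): 0.1,   # bluff (placeholder value)
}

# Context 2: whether to challenge the previous player.
challenge_q = {"call_bluff": -0.2, "pass": 0.3}

# With epsilon=0.0 the choice is purely greedy, which makes the result deterministic.
chosen_play = epsilon_greedy(play_q, epsilon=0.0)
chosen_challenge = epsilon_greedy(challenge_q, epsilon=0.0)
```

Keeping the two tables separate mirrors the two proposed networks; in practice each table would be replaced by the output of its own Q-function approximator.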
References:
[1] Hurwitz, Evan, and Tshilidzi Marwala. "Learning to bluff." Systems, Man and
Cybernetics, 2007. ISIC. IEEE International Conference on. IEEE, 2007. Available:
https://pdfs.semanticscholar.org/ff49/bcf422168c6bfe4f115f02d098ad7bf49065.pdf
[2] Billings, Darse. "Algorithms and Assessment in Computer Poker." Ph.D. Dissertation,
2006, University of Alberta, Edmonton, Alta., Canada. AAINR22991.
[3] Russell, Stuart, and Peter Norvig. Artificial Intelligence: A Modern Approach.
Prentice-Hall, Englewood Cliffs, 1995.
Available:
https://pdfs.semanticscholar.org/137b/8be0b10c645bb4ec56a4eac3958d4a60ac6a.pdf
[4] Pollett, C. (2015, Feb 2), Random Permutations, the Birthday Problem, Ball and
Bins Arguments. [Powerpoint slides]. Retrieved from:
http://www.cs.sjsu.edu/faculty/pollett/255.1.15s/Lec02022015.html#(1)
[5] Eastaugh, B. (2014, Feb 8). The Mathematics of Bluffing [Blog post]. Retrieved from
https://ibmathsresources.com/2014/02/08/the-mathematics-of-bluffing/
[6] Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., and Bowling, M.
(2017). DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker. arXiv
preprint arXiv:1701.01724.