CS298 Proposal

Bluff with AI

Tina Philip (talk2tinaphilip@gmail.com)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Robert Chun, Dr. Philip Heller

Abstract:

The goal of this project is to build an AI that learns how to play Bluff. Bluff is a multi-player card game in which each player makes a sequence of decisions based on a partially observed game state that evolves under uncertainty. The main challenge is that Bluff is a game of imperfect information: each player knows nothing about the other players' hands. Reinforcement learning is a powerful paradigm for this kind of problem. It allows software agents to automatically determine the ideal behavior within a specific context in order to maximize performance. The agent learns its behavior from simple reward feedback, known as the reinforcement signal. The backbone of the implementation is a neural network trained with back-propagation. We will use it to approximate a Q function, with rewards calculated based on the actions taken by the agent during the game. Each iteration of the game will take inputs from the previous iteration in order to make the best decision possible. Through this project we try to understand the impact of deep learning on games, and we aim to demonstrate the basic ability of a neural network to train an agent to automatically learn to play a card game.
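As a rough illustration of the reward-driven learning described in the abstract, the following is a minimal sketch of a single Q-learning update. It is not the project's implementation: for brevity it stores Q-values in a lookup table rather than approximating them with a neural network, and the state names, action names, reward value, learning rate and discount factor are all assumptions chosen for the example.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, for illustration only)
GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def q_update(q, state, action, reward, next_state, next_actions):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Best estimated value reachable from the next state (0.0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Move the current estimate a step of size ALPHA toward the target.
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


# Q-table with every unseen (state, action) pair starting at 0.0.
q = defaultdict(float)

# Hypothetical transition: the agent called bluff in state "s0",
# was right, and received a reward of +1.
new_value = q_update(q, "s0", "call_bluff", 1.0, "s1", ["call_bluff", "pass"])
```

In the proposed system the table `q` would be replaced by the network's output, and the difference between `target` and the current estimate would serve as the error signal for back-propagation.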

CS297 Results

Implemented the base classes for the Bluff game
Implemented the Bluff program for human players
Implemented a simple AI player to play Bluff
Heuristic approach for Bluff
Project report

Proposed Schedule

Week 1: August 29, 2017 - September 5, 2017	Complete the implementation of Bluff playing AI
Week 2: September 5, 2017 - September 12, 2017	Complete the implementation of Bluff playing AI
Week 3: September 12, 2017 - September 19, 2017	Develop a database of current state - Deliverable 1 due
Week 4: September 19, 2017 - September 26, 2017	Q-Learning neural net to determine whether to call Bluff
Week 5: September 26, 2017 - October 3, 2017	Q-Learning neural net to determine whether to call Bluff
Week 6: October 3, 2017 - October 10, 2017	Deliverable 2 due
Week 7: October 10, 2017 - October 17, 2017	Q-Learning neural net to determine which cards to play
Week 8: October 17, 2017 - October 24, 2017	Deliverable 3 due
Week 9: October 24, 2017 - October 31, 2017	Performance experiments and results of the algorithm vs. human agents
Week 10: October 31, 2017 - November 7, 2017	Deliverable 4 due
Week 11: November 7, 2017 - November 14, 2017	Deliverable 5 - CS298 Report
Week 12: November 14, 2017 - November 21, 2017	Submit report to committee for feedback
Week 13: November 21, 2017 - November 28, 2017	Prepare for presentation
Week 14: December 5, 2017	Project presentation

Key Deliverables:

Software
- 1. Complete the implementation of Bluff playing AI
- 2. Q-Learning neural net to determine whether to call Bluff
- 3. Q-Learning neural net to determine which cards to play
- 4. Performance experiments and results of the algorithm vs. human agents
Report
- CS298 Report
- CS298 Presentation

Innovations and Challenges

AIs for Bluff have not been considered before, which makes this an area worth investigating.
Unlike other games that use Q-learning, Bluff uses Q-learning in two different contexts: (1) to determine which cards to play (this decision helps identify whether to bluff or not), and (2) to decide whether to challenge an opponent by calling Bluff on them.
Multi-player games with incomplete information are more challenging than adversarial games.
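The two Q-learning contexts above could be sketched as two separate sets of Q-value estimates, one per decision. The snippet below is only an illustration under stated assumptions: the action names and Q-values are hypothetical, and epsilon-greedy selection is a common choice that the proposal itself does not specify.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a dict mapping action -> estimated Q-value."""
    # With probability epsilon, explore by picking a random action.
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))
    # Otherwise exploit: pick the action with the highest estimated value.
    return max(q_values, key=q_values.get)


# Hypothetical Q-value estimates for a single game state.
# Context 1: which cards to play (truthfully or as a bluff).
play_q = {"play_truthfully": 0.4, "bluff": 0.6}
# Context 2: whether to challenge the opponent's last play.
challenge_q = {"call_bluff": 0.2, "pass": 0.5}

# epsilon=0.0 disables exploration, so both choices are purely greedy.
play_action = choose_action(play_q, epsilon=0.0)
challenge_action = choose_action(challenge_q, epsilon=0.0)
```

Keeping the two decisions in separate estimators lets each one be trained and evaluated on its own, which matches the separate deliverables for calling Bluff and for choosing cards.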

References:

[1] Hurwitz, Evan, and Tshilidzi Marwala. "Learning to bluff." Systems, Man and Cybernetics, 2007. ISIC. IEEE International Conference on. IEEE, 2007. Available: https://pdfs.semanticscholar.org/ff49/bcf422168c6bfe4f115f02d098ad7bf49065.pdf

[2] Darse Billings. "Algorithms and Assessment in Computer Poker," Ph.D. Dissertation, 2006, University of Alberta, Edmonton, Alta., Canada. AAINR22991.

[3] Russell, Stuart, and Peter Norvig. "Artificial Intelligence: A Modern Approach." Prentice-Hall, Englewood Cliffs, 1995. Available: https://pdfs.semanticscholar.org/137b/8be0b10c645bb4ec56a4eac3958d4a60ac6a.pdf

[4] Pollett, C. (2015, Feb 2), Random Permutations, the Birthday Problem, Ball and Bins Arguments. [Powerpoint slides]. Retrieved from: http://www.cs.sjsu.edu/faculty/pollett/255.1.15s/Lec02022015.html#(1)

[5] Eastaugh, B. (2014, Feb 8). The Mathematics of Bluffing [Blog post]. Retrieved from https://ibmathsresources.com/2014/02/08/the-mathematics-of-bluffing/

[6] Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., and Bowling, M. (2017). DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker. arXiv preprint arXiv:1701.01724.