Introduction

Last Wednesday, we started talking about search problems that arise in multi-agent, competitive environments; we called them adversarial search problems, or games.
We gave a formal definition of such games in terms of an initial state, PLAYER(s), ACTIONS(s), RESULT(s,a), TERMINAL-TEST(s), and UTILITY(s,p).
We then focused our attention on two-player, zero-sum games with perfect information.
Starting from the initial state, determining the legal actions in each state using ACTIONS(s), and generating the resulting states using RESULT(s,a), we can define a game tree for a given game.
In class, we gave the game tree for tic-tac-toe.
Finally, last time, we defined the MINIMAX function for a game state, which gives the payoff value of that state to MAX assuming both players play optimally from there to the end of the game.
Today, we use this function to compute what a player should do in a given board situation.

The Minimax Algorithm

The minimax algorithm below computes the minimax decision from the current state. So we could use it to actually make an agent that could play a game.

Here `argmax_(a in S) f(a)` returns the element `a` of the set `S` that has the maximum value of `f(a)`.

function MINIMAX-DECISION(state) returns an action // assumes it is MAX's turn
    return argmax_(a in ACTIONS(state)) MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state, MAX)
    v := -infty
    for each a in ACTIONS(state) do
        v := MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state, MAX)
    v := infty
    for each a in ACTIONS(state) do
        v := MIN(v, MAX-VALUE(RESULT(state, a)))
    return v
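
The pseudocode above can be tried out on a small hand-built tree. The following is a minimal Python sketch, not a definitive implementation: the tree `TREE`, its state names, and its leaf utilities are made up for illustration, and the helper names simply mirror the pseudocode.

```python
import math

# Hypothetical two-ply game tree: MAX moves at the root "A", MIN moves at
# "B", "C", "D", and the numbers are terminal utilities to MAX.
TREE = {
    "A": {"a1": "B", "a2": "C", "a3": "D"},
    "B": {"b1": 3, "b2": 12, "b3": 8},
    "C": {"c1": 2, "c2": 4, "c3": 6},
    "D": {"d1": 14, "d2": 5, "d3": 2},
}

def terminal_test(state):
    return not isinstance(state, str)   # leaves are plain numbers

def utility(state):
    return state                        # payoff to MAX

def actions(state):
    return list(TREE[state])

def result(state, action):
    return TREE[state][action]

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for a in actions(state):
        v = max(v, min_value(result(state, a)))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for a in actions(state):
        v = min(v, max_value(result(state, a)))
    return v

def minimax_decision(state):
    # Assumes it is MAX's turn in `state`.
    return max(actions(state), key=lambda a: min_value(result(state, a)))

print(minimax_decision("A"), max_value("A"))  # a1 3
```

On this tree the MIN nodes back up 3, 2, and 2, so MAX's minimax decision is `a1` with value 3.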

Remarks on Minimax Algorithm

The result that always playing the minimax decision guarantees a payoff of at least the minimax value, regardless of how one's opponent plays, is due to John von Neumann (1928); his Minimax Theorem is actually a little more general.
If the maximum depth of the game tree is `m` and the branching factor is `b`, then the time complexity of minimax is `O(b^m)`.
With a depth-first implementation the space complexity is linear in the depth (`O(bm)` if all actions at a node are generated at once, `O(m)` if they are generated one at a time), so space complexity is not an issue.
`O(b^m)` for time complexity is impractical.
Can we do better?
There are a variety of pruning strategies that allow a player to quickly determine that a branch is not worth following.
We look next at one such strategy that, in the best case, improves the time complexity to `O(b^(m/2))`.

Quiz

Which of the following is true?

An admissible heuristic for the `A^(star)`-algorithm never underestimates the cost to a solution.
`k`- Local beam search is another name for hill-climbing with `k`-restarts.
If the UTILITY to MAX of a winning tic-tac-toe board is 1, a losing board is -1, and a tie board is 0, then the MINIMAX value of an empty tic-tac-toe board is 0.

Alpha-beta Pruning

Consider a two-level tree:
MAX has a current backed-up value of `3`. On the first branch under node C, MIN finds a 2, so the value MIN backs up from C is at most 2, which is less than 3. So MAX doesn't need to expand the other two nodes under C. The backed-up value `3` is called the alpha value, and ignoring the two remaining branches under C is called alpha pruning, or an alpha cut of the tree.
The analogous quantity for MIN is the beta value, and MIN can do beta pruning (make beta cuts) of the tree.
Alpha is the largest value MAX is already guaranteed along the current path, so it is a lower bound on MAX's payoff; beta is the smallest value MIN is already guaranteed, so it is an upper bound.
With the best possible move ordering, alpha-beta pruning reduces the time complexity of minimax to `O(b^(m/2))`; if moves are examined in random order, it is roughly `O(b^(3m/4))` for moderate `b`.
So with good ordering, in the same amount of time one can search a tree twice as deep as straight minimax.
To achieve this, one needs to expand the nodes in close to best-first order. For chess, one can get within about a factor of two of the best case by considering captures first, then threats, then forward moves, then backward moves.
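
To make the cuts concrete, here is a minimal Python sketch of minimax with alpha-beta pruning. It is only an illustration under stated assumptions: the tree is a made-up nested list (a number is a terminal utility to MAX, a list is an internal node), and the `visited` parameter is a hypothetical helper that records which leaves were actually evaluated.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)        # record the evaluated leaf
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:               # MIN above would never allow this: beta cut
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                  # MAX above already has better: alpha cut
            return v
        beta = min(beta, v)
    return v

# Hypothetical tree: MAX at the root, three MIN nodes B, C, D below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
value = alphabeta(tree, visited=seen)
print(value, seen)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

After the first MIN node backs up 3, the second MIN node is abandoned as soon as its first leaf (2) is seen, which is exactly the alpha cut described above. The third MIN node is fully expanded only because its leaves happen to be in a bad order; with its children reordered as `[2, 5, 14]`, only five leaves are evaluated in total, which illustrates why move ordering matters.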
According to Wikipedia and our book, the alpha-beta pruning technique was discovered multiple times: It was described by John McCarthy to his students in 1956, was used by Newell and Simon in 1958 in NSS Chess, and was used by Kotok in McCarthy-Kotok Chess 1962. It was published by Brudno in 1963, but also by Hart and Edwards in 1961. Arthur Samuel, Richards, Levine also have a claim in the same period. Asymptotic optimality of alpha-beta pruning among all fixed-depth game-tree search algorithms was shown by Pearl in 1982.

Imperfect Real-Time Decisions

For a game like chess, we can't completely expand out the game tree.
So we can't determine all the leaf nodes, so how do we do minimax?
Typically, we have a heuristic function EVAL(s) which gives an estimate of how good a given state is.
We also have a CUTOFF-TEST(s,d) which returns whether or not to keep evaluating given we are at a depth `d` and in state `s`.
Given these we can define a heuristic minimax as:
`H-MINIMAX(s, d) :=`
`{ ( EVAL(s), if CUTOFF-TEST(s,d)), (max_(a in ACTIONS(s))H-MINIMAX(RESULT(s, a), d+1),if PLAYER(s) = MAX), (min_(a in ACTIONS(s))H-MINIMAX(RESULT(s, a), d+1),if PLAYER(s) = MIN):}`
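
The definition of H-MINIMAX translates almost line for line into code. The sketch below is only illustrative: the toy game (a pile of stones from which players alternately take 1 or 2, where taking the last stone wins), the crude evaluation function that scores every unfinished position as 0, and the `limit` parameter passed to the cutoff test are all assumptions made for this example.

```python
# A state is (stones_left, player_to_move); the game itself is hypothetical.

def player(s):
    return s[1]

def actions(s):
    return [k for k in (1, 2) if k <= s[0]]

def result(s, a):
    return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def terminal_test(s):
    return s[0] == 0

def eval_fn(s):
    if terminal_test(s):
        # Whoever is to move at an empty pile did not take the last stone.
        return -1 if player(s) == "MAX" else 1
    return 0    # crude heuristic: treat every unfinished position as even

def cutoff_test(s, d, limit):
    return terminal_test(s) or d >= limit

def h_minimax(s, d=0, limit=4):
    if cutoff_test(s, d, limit):
        return eval_fn(s)
    values = [h_minimax(result(s, a), d + 1, limit) for a in actions(s)]
    return max(values) if player(s) == "MAX" else min(values)

print(h_minimax((4, "MAX"), limit=10))  # 1: deep enough to see the win
print(h_minimax((4, "MAX"), limit=1))   # 0: cut off before anything is decided
```

With a generous depth limit the search reaches the terminal states and returns the true minimax value; with a shallow limit the answer is only as good as the evaluation function.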

Stochastic Games

A stochastic game is a game (adversarial search problem) which also has a random element to it.
Examples of such games would be things like Backgammon, Parcheesi, Sorry, or the Connect Earthquake game of the homework.
To simulate things like rolling dice, we can add chance nodes after player turns in our game tree (as in the above game tree).
Such chance layers might be before or after every player move as in backgammon, or only after pairs of moves as in Connect Earthquake.
We can modify the minimax function to handle such moves by considering the expected value of a board rather than its exact value. This gives the EXPECTIMINIMAX function (Michie, 1966): `EXPECTIMINIMAX(s) := `
` { ( UTILITY(s), if TERMINAL-TEST(s)), (max_(a in ACTIONS(s))EXPECTIMINIMAX(RESULT(s, a)),if PLAYER(s) = MAX), (min_(a in ACTIONS(s))EXPECTIMINIMAX(RESULT(s, a)),if PLAYER(s) = MIN), (sum_rP(r)EXPECTIMINIMAX(RESULT(s, r)),if PLAYER(s) = CHANCE):}`
In the last line `r` is cycling over the elementary random events (like different dice values), `P(r)` denotes the probability of that event, so the sum is the expected value of the board given the random events.
Ballard (1983) showed a way to adapt alpha-beta pruning to this situation. One way to do so is to compute upper and lower bounds on the values of boards reached through chance moves, and to use those bounds when determining cuts.
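
As a small illustration of the EXPECTIMINIMAX definition (without any pruning), here is a Python sketch over an explicit tree. The node encoding is an assumption made for the example: a number is a terminal utility, and an internal node is a pair `(kind, children)` where `kind` is `"max"`, `"min"`, or `"chance"`, and chance children are `(probability, child)` pairs.

```python
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                                    # terminal: UTILITY(s)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: sum over random events r of P(r) * value.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical tree: MAX chooses a move, a fair coin is flipped, then MIN replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [4, 8])), (0.5, ("min", [2, 10]))]),
    ("chance", [(0.5, ("min", [0, 6])), (0.5, ("min", [7, 9]))]),
])
print(expectiminimax(tree))  # 3.5
```

The first move has expected value `0.5*4 + 0.5*2 = 3.0` and the second `0.5*0 + 0.5*7 = 3.5`, so MAX prefers the second move even though its worst case (0) is lower.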

Partially Observable Games

Often in a card game, you can see your own cards but not your opponent's, so the game is partially observable.
You know, though, that the game is being played with a standard deck of 52 cards.
One way to choose moves in such a situation is via a Monte Carlo simulation, in which you repeat the following steps some fixed number of times:
1. Pick a random permutation of the cards you can't see.
2. Run minimax with alpha-beta pruning on the resulting fully observable game.
3. Record the move it selects.
From the moves chosen this way, you then pick the one that occurred most often.
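
The three steps above can be sketched as follows. This is only a skeleton under loud assumptions: `choose_move` is a hypothetical stand-in for "run minimax with alpha-beta pruning on the fully observable game", and the toy card game (each side plays one card, high card wins) exists purely so that the skeleton runs.

```python
import random
from collections import Counter

def monte_carlo_move(unseen_cards, opponent_hand_size, choose_move, trials=100, rng=None):
    rng = rng or random.Random()
    votes = Counter()
    for _ in range(trials):
        deal = list(unseen_cards)
        rng.shuffle(deal)                          # step 1: random permutation
        opponent_hand = deal[:opponent_hand_size]
        votes[choose_move(opponent_hand)] += 1     # steps 2 and 3: solve and record
    return votes.most_common(1)[0][0]              # the move chosen most often

# Toy setting (hypothetical): we hold 5 and 9, the opponent holds one unseen card.
MY_HAND = [5, 9]
UNSEEN = [2, 3, 4, 6, 7, 8, 10]

def choose_move(opponent_hand):
    # Stand-in for a full game-tree search: play the cheapest card that wins,
    # or the cheapest card overall if nothing wins.
    target = max(opponent_hand)
    winners = [c for c in MY_HAND if c > target]
    return min(winners) if winners else min(MY_HAND)

move = monte_carlo_move(UNSEEN, 1, choose_move, trials=200, rng=random.Random(0))
```

In a real card game, `choose_move` would be replaced by a search over the game tree obtained once the hidden cards are fixed.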

What is a Constraint Satisfaction Problem?

So far when discussing search, we have looked at environments in which the states were indivisible, that is, atomic.
We now consider states which are allowed to have field variables, i.e., we consider environments with factored representations.
For factored representations, we say a state solves the problem when each field variable has a value that satisfies all the constraints on that variable.
A problem described in this way is called a constraint satisfaction problem, or CSP.
Sometimes using a factored representation allows one to eliminate large portions of the search space all at once by identifying variable/value combinations that violate the constraints.

CSP Definition

A constraint satisfaction problem consists of three components `X`, `D`, and `C` where:
- `X` is a set of variables, `{X_1, ..., X_n}`
- `D` is a set of domains, `{D_1, ..., D_n}`, one for each variable.
- `C` is a set of constraints that specify allowable combinations of values.
Each domain `D_i` consists of a set of allowable values, `{v_1, ..., v_k}`, for variable `X_i`.
Each constraint in `C` consists of a pair `langle scope, rel rangle`, where `scope` is a tuple of variables that participate in the constraint and `rel` is a relation that defines the values those variables can take on.

Definition Example

Suppose `X = {X_1, X_2}` and `D={{A,B}, {A,B}}`.
We would like to have the constraint that `X_1` and `X_2` take different values.
To do this we could set `C = {langle (X_1, X_2), rel rangle}` where `rel` is the relation `{ (A,B), (B,A) }`.
It is often convenient to use common abbreviations for well-known relations.
For example, we could write `C` as `{langle (X_1, X_2), X_1 ne X_2 rangle}`.
Notice the variables themselves are obvious from the relation, so we often abbreviate `langle (X_1, X_2), X_1 ne X_2 rangle` further as just `X_1 ne X_2`.
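
To see that the abbreviated and the explicit forms describe the same constraint, the following Python sketch expands the predicate `X_1 ne X_2` over the domains and compares it with the explicit relation. The names `explicit`, `expand`, and `abbreviated` are made up for this illustration.

```python
from itertools import product

D = {"X1": {"A", "B"}, "X2": {"A", "B"}}

# Explicit form: a scope plus the set of allowed value tuples.
explicit = (("X1", "X2"), {("A", "B"), ("B", "A")})

def expand(scope, predicate, domains):
    """Enumerate the relation that a predicate defines over the given domains."""
    value_lists = [sorted(domains[v]) for v in scope]
    return {vals for vals in product(*value_lists) if predicate(*vals)}

# Abbreviated form: the same scope plus the predicate X1 != X2.
abbreviated = expand(("X1", "X2"), lambda x1, x2: x1 != x2, D)

print(abbreviated == explicit[1])  # True
```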

A CSP Solution

To solve a CSP we need to define a state space and the notion of a solution.
Each state in a CSP is defined by an assignment of values to some or all of the variables, `{X_i = v_i, X_j = v_j, ...}`.
An assignment which does not violate any constraints is called a consistent or legal assignment.
A complete assignment is one in which every variable is assigned; an assignment which only assigns values to some of the variables is called a partial assignment.
A solution to a CSP is a complete, consistent assignment.
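
These definitions are easy to state in code. Here is a minimal Python sketch, assuming an assignment is a dictionary from variable names to values and a constraint is a `(scope, rel)` pair with `rel` given as a set of allowed tuples; the function names are chosen for this example only.

```python
def is_complete(assignment, variables):
    return all(v in assignment for v in variables)

def is_consistent(assignment, constraints):
    """A constraint can only be violated once its whole scope is assigned."""
    for scope, rel in constraints:
        if all(v in assignment for v in scope):
            if tuple(assignment[v] for v in scope) not in rel:
                return False
    return True

def is_solution(assignment, variables, constraints):
    return is_complete(assignment, variables) and is_consistent(assignment, constraints)

# The two-variable example from the previous slide.
X = ["X1", "X2"]
C = [(("X1", "X2"), {("A", "B"), ("B", "A")})]

print(is_consistent({"X1": "A"}, C))                # True: partial but legal
print(is_solution({"X1": "A"}, X, C))               # False: not complete
print(is_solution({"X1": "A", "X2": "A"}, X, C))    # False: violates X1 != X2
print(is_solution({"X1": "A", "X2": "B"}, X, C))    # True
```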

Example: Map Coloring

Australia consists of seven states and territories. Let's call a state or territory a region.
We are given the task of coloring each region red, green, or blue on a map in such a way that no neighboring regions have the same color.
For this problem, `X = {WA, NT, Q, NSW, V, SA, T}`
The domain `D_i` for each of these variables is `{red, green, blue}`.
`C = {SA ne WA, SA ne NT, SA ne Q, SA ne NSW, SA ne V, WA ne NT, NT ne Q, Q ne NSW, NSW ne V}`
An example solution to the problem might be:
`{WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red}`.
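
The example solution can be checked mechanically, and a solution can also be found by a simple search. The Python sketch below is an illustration only: it uses a plain chronological backtracking search, which is an assumption of this example rather than a method described on these slides.

```python
VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
COLORS = ["red", "green", "blue"]
# One pair per "not equal" constraint in C.
NEIGHBORS = [("SA", "WA"), ("SA", "NT"), ("SA", "Q"), ("SA", "NSW"), ("SA", "V"),
             ("WA", "NT"), ("NT", "Q"), ("Q", "NSW"), ("NSW", "V")]

def consistent(assignment):
    # Only constraints whose two regions are both colored can be violated.
    return all(assignment[a] != assignment[b]
               for a, b in NEIGHBORS
               if a in assignment and b in assignment)

def backtrack(assignment=None):
    assignment = dict(assignment or {})
    if len(assignment) == len(VARIABLES):
        return assignment                       # complete and consistent
    var = next(v for v in VARIABLES if v not in assignment)
    for color in COLORS:
        assignment[var] = color
        if consistent(assignment):
            solution = backtrack(assignment)
            if solution is not None:
                return solution
        del assignment[var]                     # undo and try the next color
    return None

example = {"WA": "red", "NT": "green", "Q": "red", "NSW": "green",
           "V": "red", "SA": "blue", "T": "red"}
print(consistent(example))   # True
solution = backtrack()
```

Note that `T` appears in no constraint, so any color works for it.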

Remarks

There exist general-purpose CSP-solving systems. So if you can formulate your problem as a CSP, you can just run one of these systems on it to get an answer.
We could have formulated the map problem as a state-space search problem.
However, in the CSP formulation we can eliminate large portions of the search space more quickly.
For example, we might start with the empty map and then choose `SA = blue`. Due to the constraints, we can conclude immediately that none of the five neighbors can take on the value blue.
On the other hand, a state-space searcher would have to consider `3^5` assignments to the neighbors, rather than the reduced `2^5` we get because of this constraint.
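
The two counts can be confirmed by brute-force enumeration. This short Python sketch simply lists the color combinations for the five neighbors of `SA`, with and without blue; the variable names are chosen for this example.

```python
from itertools import product

COLORS = ["red", "green", "blue"]
NEIGHBORS_OF_SA = ["WA", "NT", "Q", "NSW", "V"]

# Every way to color the five neighbors, ignoring the constraints with SA.
unpruned = list(product(COLORS, repeat=len(NEIGHBORS_OF_SA)))
# After SA = blue, no neighbor may be blue.
pruned = [combo for combo in unpruned if "blue" not in combo]

print(len(unpruned), len(pruned))  # 243 32
```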

Example: Job-shop Scheduling

Factories have the job of scheduling a day's worth of jobs, subject to various constraints.
Consider the problem of scheduling the assembly of a car.
The whole job is composed of tasks, and each task can be modeled as a variable. The value of each variable is the time the task starts, expressed as an integer number of minutes.
Constraints for job scheduling express things like one task must occur before another task (put wheel on before hubcap), and that certain jobs take a certain amount of time to complete.
As a concrete example of job scheduling, our variables `X` might be:
`X = {Axl\e_F, Axl\e_B, Wheel_(RF), Wheel_(LF), Wheel_(RB), Wheel_(LB), Nuts_(RF),`
`quad quad Nuts_(LF), Nuts_(RB), Nuts_(LB), Cap_(RF), Cap_(LF), Cap_(RB), Cap_(LB), Inspect}`
Next we model the precedence constraints between tasks. These are constraints of the form:
`T_1 + d_1 le T_2`
indicating that task `T_1` must be done before task `T_2` and that `T_1` takes `d_1` time to complete.
For our example, the precedence constraints look like:
`Axl\e_F + 10 le Wheel_(RF), quad quad Axl\e_F + 10 le Wheel_(LF)`
`Axl\e_B + 10 le Wheel_(RB), quad quad Axl\e_B + 10 le Wheel_(LB)`
`Wheel_(RF) +1 le Nuts_(RF), quad quad Nuts_(RF) + 2 le Cap_(RF)`
`Wheel_(LF) +1 le Nuts_(LF), quad quad Nuts_(LF) + 2 le Cap_(LF)`
`Wheel_(RB) +1 le Nuts_(RB), quad quad Nuts_(RB) + 2 le Cap_(RB)`
`Wheel_(LB) +1 le Nuts_(LB), quad quad Nuts_(LB) + 2 le Cap_(LB)`

More Job-shop scheduling

Suppose we had four workers to install wheels, but they have to share one tool that puts the axle in place.
We need a disjunctive constraint to say that `Axl\e_F` and `Axl\e_B` must not overlap in time; either one comes first or the other does:
`(Axl\e_F + 10 le Axl\e_B) or (Axl\e_B + 10 le Axl\e_F)`
We also might need to assert that the inspection comes last and takes 3 minutes.
To do this, for every variable `X` except Inspect we add a constraint of the form `X + d_X le Inspect`.
As a final constraint, we might have the requirement that the whole assembly be done in 30 minutes.
We can achieve this by limiting the domain of all the variables to:
`D_i = {1, 2, 3, ..., 27}.`
Since the inspection takes 3 minutes, it can start no later than minute 27.
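
A candidate schedule can be checked against all of these constraints with a few lines of code. The Python sketch below is illustrative only. The durations for axles (10), wheels (1), nuts (2), and inspection (3) come from the constraints above; the 1-minute duration for hubcaps is an assumption, since these notes do not state it. The schedule `good` is likewise just one hand-picked candidate.

```python
# Durations in minutes; "Cap": 1 is an assumption, not stated in these notes.
DURATION = {"Axle": 10, "Wheel": 1, "Nuts": 2, "Cap": 1, "Inspect": 3}

def duration(task):
    return DURATION[task.split("_")[0]]

# Precedence constraints T1 + d1 <= T2, written as (T1, T2) pairs.
PRECEDENCE = [("Axle_F", "Wheel_RF"), ("Axle_F", "Wheel_LF"),
              ("Axle_B", "Wheel_RB"), ("Axle_B", "Wheel_LB")]
for w in ("RF", "LF", "RB", "LB"):
    PRECEDENCE += [(f"Wheel_{w}", f"Nuts_{w}"), (f"Nuts_{w}", f"Cap_{w}")]

def violations(schedule):
    bad = []
    for t1, t2 in PRECEDENCE:
        if schedule[t1] + duration(t1) > schedule[t2]:
            bad.append(f"{t1} must finish before {t2}")
    f, b = schedule["Axle_F"], schedule["Axle_B"]
    if not (f + 10 <= b or b + 10 <= f):          # disjunctive constraint
        bad.append("Axle_F and Axle_B overlap")
    for task, start in schedule.items():
        if task != "Inspect" and start + duration(task) > schedule["Inspect"]:
            bad.append(f"{task} must finish before Inspect")
        if not 1 <= start <= 27:                  # domain D_i
            bad.append(f"{task} starts outside 1..27")
    return bad

good = {"Axle_F": 1, "Axle_B": 11,
        "Wheel_RF": 11, "Wheel_LF": 11, "Wheel_RB": 21, "Wheel_LB": 21,
        "Nuts_RF": 12, "Nuts_LF": 12, "Nuts_RB": 22, "Nuts_LB": 22,
        "Cap_RF": 14, "Cap_LF": 14, "Cap_RB": 24, "Cap_LB": 24,
        "Inspect": 27}
print(violations(good))  # []
```

Moving `Axle_B` to start at minute 5 would make the two axle tasks overlap, and `violations` would report it.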

Variations on the CSP Formalism

The simplest kind of CSP variables have discrete, finite domains.
Map-coloring and job-scheduling with time limits are both of this kind.
The 8-queens problem can be formulated in this way, where the variables are `Q_1, ..., Q_8`, each ranging over the domain `D_i = {1,...,8}`, the position of a given queen.
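
As a quick illustration of this formulation, the Python sketch below assumes that `Q_i` is the row of the queen in column `i` (the notes do not fix this interpretation) and enumerates solutions by brute force over permutations, which already enforces that all rows differ.

```python
from itertools import permutations

def no_attacks(rows):
    # rows[i] is the row (1..8) of the queen in column i + 1.
    n = len(rows)
    return all(rows[i] != rows[j] and abs(rows[i] - rows[j]) != j - i
               for i in range(n) for j in range(i + 1, n))

solutions = [p for p in permutations(range(1, 9)) if no_attacks(p)]
print(len(solutions))  # 92
```

Brute force is fine for 8 queens, but the number of candidate assignments grows quickly with board size, which is part of the motivation for smarter CSP techniques.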
A discrete domain can be infinite, for example, it might be the integers.
In such cases, a constraint language such as `T_1 + d le T_2` must be used to express constraints without having to enumerate the set of pairs of allowable values `(T_1, T_2)`.
There are special solution algorithms for linear constraints on integers, but the book doesn't discuss them (look up ellipsoid method).
The situation where the domain is the integers and the constraints are nonlinear can be shown to be undecidable.
If we take the domains to be a complete, totally ordered field such as the reals, the domain is said to be continuous.
Continuous CSPs with linear constraints are known as linear programming problems.

Finish Adversarial Games, Constraint Satisfaction

Outline