Turtle Tournaments

Turtle Tournaments customize the BAM2 framework. The active turtle interacts with a selected candidate by playing a mutual dilemma game:

to interact [candidate]
   play-game-with candidate
   ; etc.
end

In a mutual dilemma game two players, A and B, are presented with a mutual dilemma. Each must choose one of two options: fight or flee, cooperate or defect, hold 'em or fold 'em, etc. The choice may be based on strategy and history, but is made without knowing the opponent's choice.

After the choices are made, each player is awarded points based on the game's payoff matrix:

Payoffs

             B: FALSE   B: TRUE
A: FALSE     a1/b1      a2/b2
A: TRUE      a3/b3      a4/b4

The a3/b3 entry indicates that if A chooses TRUE and B chooses FALSE, then A receives a3 points and B receives b3 points.

An Implementation

Here's the basic game playing procedure:

to play-game-with [candidate]
  let my-choice choice? candidate
  let candidate-choice [choice? myself] of candidate
  update-attributes candidate my-choice candidate-choice
end

The choice can be based on knowledge of the candidate's history and the strategy used by the active turtle. The details must be filled in. For now the choice is random:

to-report choice? [candidate]
  report (random 2) < 1 ; for now
end

Note the use of myself when asking the candidate to make a choice:

let candidate-choice [choice? myself] of candidate

The of reporter is similar to the ask command. The ask command allows turtle T1 to ask turtle T2 to execute a block of commands, while of allows T1 to ask T2 to evaluate a reporter block and hand the resulting value back to T1. In both cases, the active turtle inside the block is T2. T2 sits on top of T1 on the turtle stack. Inside the block "self" refers to T2 and "myself" refers to T1.
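The difference between self and myself can be checked with a small experiment. The procedure below is a hypothetical demo, not part of the model; it assumes at least two turtles exist (for example after crt 2):

```netlogo
; Hypothetical demo: run "crt 2" first, then "demo-myself".
to demo-myself
  ask turtle 0 [
    ; Turtle 0 is the active turtle here (T1 in the text above).
    ; Inside the of block, turtle 1 is active (T2), so myself is turtle 0.
    let id-of-myself [[who] of myself] of turtle 1
    ; who, evaluated by turtle 1, is turtle 1's own id (self).
    let id-of-self [who] of turtle 1
    show (list id-of-myself id-of-self)
  ]
end
```

Turtle 0 should show the list [0 1]: myself resolves to the asking turtle, while who is evaluated by the turtle being asked.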

After the choice is made, energy, history, and number of games played must be updated:

to update-attributes [opponent my-choice opponent-choice]
  set num-games-played num-games-played + 1
  ask opponent [set num-games-played num-games-played + 1]
  ; update my history & opponent's history if necessary
  ; the active turtle plays the role of A, the opponent the role of B
  if my-choice and opponent-choice
  [
    set energy energy + payoff-a4
    ask opponent [set energy energy + payoff-b4]
    stop
  ]
  if my-choice and not opponent-choice
  [
    set energy energy + payoff-a3
    ask opponent [set energy energy + payoff-b3]
    stop
  ]
  if not my-choice and opponent-choice
  [
    set energy energy + payoff-a2
    ask opponent [set energy energy + payoff-b2]
    stop
  ]
  if not my-choice and not opponent-choice
  [
    set energy energy + payoff-a1
    ask opponent [set energy energy + payoff-b1]
    stop
  ]
end

The payoff matrix is stored in eight global variables:

globals [
  halt?
  world-diam
  ; payoffs:
  payoff-a1   ; payoff for A if A & B choose FALSE
  payoff-a2   ; payoff for A if A chooses FALSE and B TRUE
  payoff-a3   ; payoff for A if A chooses TRUE and B FALSE
  payoff-a4   ; payoff for A if A & B choose TRUE
  payoff-b1   ; payoff for B if A & B choose FALSE
  payoff-b2   ; payoff for B if A chooses FALSE and B TRUE
  payoff-b3   ; payoff for B if A chooses TRUE and B FALSE
  payoff-b4   ; payoff for B if A & B choose TRUE
]

Examples

Chicken

In Rebel Without a Cause James Dean's character must prove his courage by playing a dangerous game of Chicken. In the version of the game considered here, two drivers race cars toward each other. Each driver must choose: to swerve or not to swerve. The first to swerve is labeled "chicken".

Here's a typical payoff matrix for Chicken:

Payoffs

             B: FALSE   B: TRUE
A: FALSE     0/0        5/0
A: TRUE      0/5        1/1

The 0/5 entry in the matrix says that if A chooses to swerve (swerve = TRUE) and B chooses not to swerve (swerve = FALSE), then A is awarded 0 points and B is awarded 5 points. In other words, A is the chicken and B is the hero.
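As a sketch of how this matrix could be loaded into the eight payoff globals, the procedure below copies the entries cell by cell. The procedure name set-chicken-payoffs is hypothetical; it assumes the globals declared earlier and the layout of the general payoff matrix (a1/b1 in the FALSE/FALSE cell, a4/b4 in the TRUE/TRUE cell):

```netlogo
; Hypothetical helper: loads the Chicken matrix into the payoff globals.
; TRUE means "swerve".
to set-chicken-payoffs
  set payoff-a1 0  set payoff-b1 0   ; A FALSE, B FALSE: nobody swerves
  set payoff-a2 5  set payoff-b2 0   ; A FALSE, B TRUE: B is the chicken
  set payoff-a3 0  set payoff-b3 5   ; A TRUE, B FALSE: A is the chicken
  set payoff-a4 1  set payoff-b4 1   ; A TRUE, B TRUE: both swerve
end
```

The other example games could be loaded the same way, one procedure per matrix, and called from setup.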

We see the game of Chicken being played out in the real world all the time. Price wars and arms races are examples.

Coordination

Coordination is similar to Chicken: Two cars are speeding toward each other on a narrow road. Instead of choosing to swerve or not to swerve, the cars must choose between swerving left or swerving right. If they make the same choice, then a terrible crash is averted.

Here's a typical payoff matrix:

Payoffs

             B: FALSE   B: TRUE
A: FALSE     3/3        0/0
A: TRUE      0/0        3/3

The 3/3 entry in the upper-left corner says that if A and B both choose not to swerve left (SL? = FALSE), then each receives 3 points. In other words, they both swerve right and therefore a crash is averted.

Coordination is the game companies play when they want to develop a product that adheres to the same standards as similar products their competitors are developing, but don't want to reveal information about the product.

Battle of the Sexes

It's George and Martha's anniversary. They want to be together and agreed to meet after work at the movies. The trouble is no decision was made about which movie to see. The phones are out, so they can't communicate. George really wants to see Planet of the Apes, but Martha wants to see Wuthering Heights. George and Martha must each decide independently: should I show up at the theater playing Wuthering Heights?

Here's a typical payoff matrix for Battle of the Sexes:

Payoffs

             B: FALSE   B: TRUE
A: FALSE     3/2        1/1
A: TRUE      0/0        2/3

The 3/2 entry says that if George (= A) and Martha (= B) both decide not to go to Wuthering Heights, but go to Planet of the Apes instead (go to WH? = FALSE), then George receives 3 points because he gets to be with Martha and gets to see the movie he wants to see. Martha only receives 2 points. She gets to be with George, but must endure Planet of the Apes.

Of course Battle of the Sexes is played out every time two collaborators must choose between cake and eating it too.

Prisoner's Dilemma

Prisoner's Dilemma (PD) is the most famous dilemma game. Two men are accused of a crime. They are separated and each is asked to testify against the other. If both refuse, then both receive light sentences on a lesser charge. If both agree, then both receive moderate sentences reduced as a reward for their testimony. However, if one agrees to testify and the other refuses, then the former goes free while the latter receives a stiff sentence.

Here's a typical payoff matrix (both refusing pays each player more than both testifying, as described above):

Payoffs

             B: FALSE   B: TRUE
A: FALSE     3/3        0/5
A: TRUE      5/0        1/1

The 5/0 entry says that if A chooses to testify (testify? = TRUE) and B refuses (testify? = FALSE), then A receives a 5-year prison reduction while B receives no reduction.

Prisoner's Dilemma is played each time a business deal is made. Shall A cheat B for a big payoff or be honest for a moderate payoff? If both cheat, then the payoff is minimal.

Iterated Prisoner's Dilemma

In an Iterated Prisoner's Dilemma (IPD) tournament, turtles repeatedly play the Prisoner's Dilemma game. Each turtle has a strategy and a history, and keeps track of the number of games played:

turtles-own [energy vision mobility strategy history num-games-played]

PD Strategies

Four commonly used PD strategies are:

to-report choice? [candidate]
  if strategy = "never-cheat" [report false] ; I won't cheat
  if strategy = "always-cheat" [report true] ; I will cheat
  if strategy = "randomly-cheat" [report random 2 < 1] ; cheat half the time
  if strategy = "tit-for-tat" [???] ; to be filled in, depends on candidate
  report false ; default
end

Out of 100 turtles, assume each strategy is used by 25 turtles. Which one is best in the long run?
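One possible way to set up such a population is sketched below. The procedure name setup-strategies is hypothetical, and the strategy names in the list must match the strings tested in choice? exactly. Because who numbers are consecutive, who mod 4 gives each of the four strategies to exactly 25 of the 100 new turtles:

```netlogo
; Hypothetical setup fragment: 100 turtles, 25 per strategy.
; The names below must be spelled exactly as they are in choice?.
to setup-strategies
  let names ["never-cheat" "always-cheat" "randomly-cheat" "tit-for-tat"]
  create-turtles 100 [
    set strategy item (who mod 4) names
    set num-games-played 0
  ]
end
```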

Tit-for-Tat

The Tit-for-tat strategy says to choose what your opponent chose the last time you played him. This means each turtle must maintain a history list of N Booleans, where N = the turtle count:

history = [true true false ... true]

A turtle's choice is simply the item in the history list at position p where p is the id (who) of the candidate.

Of course a turtle must replace this item with the candidate's choice after each game.
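The bookkeeping for this history list might look like the sketch below. Both procedure names are hypothetical, and the code assumes that who numbers run from 0 to N - 1 (no turtle has died) and that every opponent is presumed not to have cheated before the first game. The lookup that turns the history into a choice is left for the problem below.

```netlogo
; Hypothetical helpers for the tit-for-tat history list.
; Assumes who numbers run from 0 to (count turtles - 1).

to init-history
  ; One entry per turtle; false = "did not cheat last time".
  set history n-values (count turtles) [false]
end

to remember-choice [opponent opponent-choice]
  ; Overwrite the entry stored at the opponent's who number.
  set history replace-item ([who] of opponent) history opponent-choice
end
```

A turtle would call init-history once during setup (after all turtles have been created) and remember-choice after each game, for example from update-attributes.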

Problem

Complete ipd1.nlogo by adding the tit-for-tat strategy.

Vindictiveness

In a society of cheaters and saints, the cheaters will always win. So why are there any saints at all? The answer: vindictiveness. Vindictiveness is the tendency for an agent who has been cheated to punish his cheater.

There are a couple of caveats. First, punishing a cheater isn't free. There is a fixed energy cost. For example, the cheater might not like getting punished and might punch the punisher in the nose!

Second, cheaters don't punish other cheaters. This is the thieves' honor code.

Add vindictiveness to the ipd1.nlogo model.

In this extension every turtle has a vindictiveness attribute set to a random number less than the vindictiveness of the society:

set vindictiveness random max-vindictiveness

Where max-vindictiveness is a global set by a slider.

If a turtle has been cheated by his opponent, but didn't himself cheat, then in addition to energy points awarded, he does this:

if random 100 < vindictiveness
[
  set energy energy - punishment-cost
  ask opponent [set energy energy - punishment]
]

Where punishment and punishment-cost are also globals set by sliders.

Of course if the opponent cooperated but the active turtle cheated, then the opponent might punish the active turtle.
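Both directions of punishment can be combined in one procedure. The sketch below is one possible arrangement, not the model's actual code: the name maybe-punish is hypothetical, true is taken to mean "cheated", and the procedure is assumed to be called by the active turtle from play-game-with right after update-attributes.

```netlogo
; Hypothetical helper, run by the active turtle after the payoffs are awarded.
; true = cheated, false = did not cheat.
to maybe-punish [opponent my-choice opponent-choice]
  ; I was cheated and did not cheat myself: I may punish my opponent.
  if opponent-choice and not my-choice [
    if random 100 < vindictiveness [
      set energy energy - punishment-cost
      ask opponent [ set energy energy - punishment ]
    ]
  ]
  ; I cheated and my opponent did not: my opponent may punish me.
  if my-choice and not opponent-choice [
    ask opponent [
      ; Here vindictiveness and energy belong to the opponent.
      if random 100 < vindictiveness [
        set energy energy - punishment-cost
        ask myself [ set energy energy - punishment ]
      ]
    ]
  ]
end
```

When both turtles cheat, neither branch runs, which matches the rule that cheaters don't punish other cheaters.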

Genetically grown strategies

In the genetic version of IPD turtles mate every 100 ticks:

to interact [candidate]
 ifelse ticks mod 100 = 0
 [
   mate-with candidate
 ]
 [
   play-game-with candidate
 ]
end

All turtles use the same kind of strategy. Here's how it works. A turtle remembers the last three choices made by every other turtle:

history = [[true true false] [false true true] ...]

Note that for any opponent, there are eight possible histories.

A turtle's strategy maps each possible opponent history onto a Boolean value (initially chosen at random). This value is the turtle's choice. For example, suppose the active turtle's opponent is the turtle with id = 1, and this opponent's history is [false true true]. This means the last time the active turtle played with this turtle, he chose false. The previous two times he chose true. The active turtle's strategy might map this history to false:

strategy: [false true true] -> false

This means the active turtle will not cheat turtle #1 in the next game (as in choice? above, false means "don't cheat").

One way to implement this is to define strategy to be a list of eight random Booleans:

strategy = [false false true false false true true false]

We can use the opponent's history to compute an index into this list. We do this by translating false = 0 and true = 1 and computing the corresponding binary number. For example:

[false true true] = 2 * ((2 * 0) + 1) + 1 = 3

Our choice is then:

item 3 strategy = false

Of course we must update the opponent's history list after each game.
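The index computation and the history update can be sketched as two small procedures. The names history-index and remember-last-three are hypothetical, the anonymous-procedure syntax (b -> ...) assumes NetLogo 6 or later, and the newest choice is assumed to sit at the front of each three-item list, as in the example above:

```netlogo
; Hypothetical reporter: reads a list of Booleans as a binary number,
; first item most significant (false = 0, true = 1).
to-report history-index [bits]
  let index 0
  foreach bits [ b ->
    set index (2 * index) + (ifelse-value b [1] [0])
  ]
  report index
end

; Hypothetical helper: records an opponent's newest choice.
; opponent-id is the position of that opponent in the history list.
to remember-last-three [opponent-id opponent-choice]
  let old item opponent-id history
  ; The newest choice goes first and the oldest (last) one is dropped.
  let new fput opponent-choice but-last old
  set history replace-item opponent-id history new
end
```

With these, history-index [false true true] reports 3, and the active turtle's choice against an opponent would be item (history-index item opponent-id history) strategy.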

Mating

When mating the active turtle invokes hatch, then dies. This is a form of population control that keeps the number of turtles fixed.

hatch 1
[
  set strategy hatchling-strategy
  set vision random world-diam
  set mobility random world-diam
  set energy random 100
  set history []
  repeat count turtles
  [
    set history fput [true true true] history
  ]
  set id hatchling-id
]
die

Each turtle has a programmer-defined id number. This is different from the system-defined id number called who. The id number is used to select the opponent history from the history list. The hatchling inherits the id number of the dying parent.

The hatchling's strategy is computed by appending the last 8 – N entries of the candidate's strategy to the first N entries of the active turtle's strategy, where N is a random number below 8. This is genetic splicing. Next, a random item in the hatchling's strategy is changed with a probability p where p is very small.
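The splicing and mutation steps might be written as a single reporter. This is only a sketch: the name spliced-strategy is hypothetical, mutation-rate is an assumed global (for example a slider) holding the small probability p, and "changed" is interpreted as flipping the chosen Boolean.

```netlogo
; Hypothetical reporter: builds a hatchling strategy from two parents.
; mine and theirs are lists of eight Booleans.
; mutation-rate is an assumed global holding a small probability.
to-report spliced-strategy [mine theirs]
  let n random 8   ; splice point, 0 to 7
  ; First n entries of mine followed by the last (8 - n) entries of theirs.
  let child sentence (sublist mine 0 n) (sublist theirs n 8)
  ; With small probability, flip one randomly chosen entry.
  if random-float 1 < mutation-rate [
    let i random 8
    set child replace-item i child (not item i child)
  ]
  report child
end
```

The active turtle could then compute let hatchling-strategy spliced-strategy strategy [strategy] of candidate before calling hatch.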

Problem

Modify the ipd1.nlogo model by replacing all strategies with genetic strategies and by adding mating.