Prisoner's Dilemma Labs

PDLab is a collection of programs that allow users (sociologists, economists, political scientists, etc.) to experiment with negotiation strategies and the establishment of social norms.

Structure of PDLab

PDLab consists of five packages:

The pd package contains a model of a PD tournament played between two agents.

The ipd package contains models of PD tournaments played between many agents. At the end of these tournaments genetic algorithms are used to generate a new population of agents and the cycle repeats.

pdUI and ipdUI contain user interfaces for pd and ipd, respectively.

pdUtils contains utility classes used by the other packages.

Simple Iterated Prisoner's Dilemma

In a simple prisoner's dilemma tournament two agents play N rounds of Prisoner's Dilemma. At the end of the tournament the total fitness (score) of each agent is computed and displayed. The design of the program makes it easy to experiment with different PD strategies.

Structure of pd

Here are most of the classes contained in the pd package:

Implementation of pd

Agent

In an agent competition, two agents repeatedly play Prisoner's Dilemma. During an iteration, each agent decides whether to cooperate with the other; the decision is obtained by calling its getMove method. If both cooperate, the fitness of each is incremented by 3 points (MID_PAYOFF). If neither cooperates, the fitness of each is incremented by 1 point (MIN_PAYOFF). If one cooperates and the other doesn't, then the defector's fitness is incremented by 5 points (MAX_PAYOFF) and the cooperator gets nothing.

Here's the implementation:

public void compete(Agent other) {
   for (int i = 0; i < PDConsts.TOURNAMENT_LENGTH; i++) {
      boolean coop1 = this.getMove();
      boolean coop2 = other.getMove();

      if (coop1 && coop2) {
         this.setFitness(this.getFitness() + PDConsts.MID_PAYOFF);
         other.setFitness(other.getFitness() + PDConsts.MID_PAYOFF);
      } else if (coop1) {
         other.setFitness(other.getFitness() + PDConsts.MAX_PAYOFF);
      } else if (coop2) {
         this.setFitness(this.getFitness() + PDConsts.MAX_PAYOFF);
      } else {
         this.setFitness(this.getFitness() + PDConsts.MIN_PAYOFF);
         other.setFitness(other.getFitness() + PDConsts.MIN_PAYOFF);
      }
      this.opponentHistory <<= 1;
      other.opponentHistory <<= 1;
      if (coop2) opponentHistory += 1;
      if (coop1) other.opponentHistory += 1;
   }
}
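The PDConsts class referenced throughout is not shown in this handout. Here is a minimal sketch consistent with the payoffs described above; the value of TOURNAMENT_LENGTH is an assumption (the projects suggest 20 to 50 iterations):

```java
import java.util.Random;

public class PDConsts {
    // Iterations per tournament (illustrative value; the projects use 20-50).
    public static final int TOURNAMENT_LENGTH = 20;
    // Payoffs from the handout's description of compete:
    public static final int MAX_PAYOFF = 5;  // temptation: defect against a cooperator
    public static final int MID_PAYOFF = 3;  // reward: mutual cooperation
    public static final int MIN_PAYOFF = 1;  // punishment: mutual defection
    // Shared random number generator used by random strategies.
    public static final Random generator = new Random();
}
```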

Some strategies depend on the opponent's history. An agent keeps track of its opponent's history in its opponentHistory field. Although this is an integer, an agent thinks of it as 32 independent bits. The bit in position n (counting from right to left) is 0 if the opponent cheated n + 1 iterations ago, and 1 if the opponent cooperated in that iteration. After each iteration, each agent's opponentHistory field is shifted one bit to the left:

this.opponentHistory <<= 1;
other.opponentHistory <<= 1;

The shift operation fills the right-most bit with a 0. This is fine if the opponent cheated in the current iteration. If not, then we must replace this 0 by a 1. This can be done by incrementing opponentHistory by 1.
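The shift-then-increment idea can be seen in isolation in a small demo; the class and method names here are illustrative, not part of PDLab:

```java
public class HistoryDemo {
    // Shift the history left and record the opponent's latest move in bit 0:
    // 1 = cooperated, 0 = defected.
    static int recordMove(int history, boolean cooperated) {
        history <<= 1;        // bit n now describes the move n+1 iterations ago
        if (cooperated) {
            history += 1;     // replace the new right-most 0 bit with a 1
        }
        return history;
    }

    public static void main(String[] args) {
        int history = 0;
        history = recordMove(history, true);   // opponent cooperated
        history = recordMove(history, false);  // opponent defected
        history = recordMove(history, true);   // opponent cooperated
        // Bits, right to left: 1 (last move), 0, 1 -> binary 101 = 5
        System.out.println(history);           // prints 5
    }
}
```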

As dictated by the Strategy Pattern, the getMove method delegates to the getMove method of the strategy:

public boolean getMove() {
   return strategy.getMove();
}

Strategies

Strategies extend the abstract Strategy class:

abstract public class Strategy {
   protected Agent agent;
   abstract public boolean getMove();
   // etc.
}

The Evil Strategy

An agent employing the evil strategy always defects:

public boolean getMove() {
   return false;
}

The Naive Strategy

An agent employing the naive strategy always cooperates:

public boolean getMove() {
   return true;
}

The Random Strategy

An agent employing the random strategy chooses its next move at random, using Java's Random.nextBoolean() method:

public boolean getMove() {
   return PDConsts.generator.nextBoolean();
}

The Tit-for-Tat Strategy

An agent employing the Tit-for-Tat strategy cooperates if the opponent cooperated on the last iteration, otherwise it defects:

public boolean getMove() {
   if (agent == null) return false;
   return (agent.getOpponentHistory() & 1) == 1;
}

If the opponent cooperated on the last iteration, then the bit in position 0 of the opponentHistory field should be 1. Bitwise conjunction (the & operator) of opponentHistory with 1 produces 0 if the least significant bit is 0, and 1 otherwise.


Project 1

A. If a tournament consists of a single iteration, what is the most logical move an agent can make? How could you prove your answer is correct?

B. Complete the implementation of the pd package.

C. Implement an Unforgiving strategy. An agent using this strategy cooperates until its opponent defects. After that, it always defects.

D. Implement a Forgiving-Tit-for-Tat strategy. This is like the Tit-for-Tat strategy except that when its opponent defects, the agent retaliates only with probability p; otherwise it forgives and cooperates. This can prevent cycles of revenge.
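One possible sketch of the Forgiving-Tit-for-Tat decision, written as a standalone demo. The class name, the P constant, and passing the history as a parameter are illustrative; in PDLab the logic would live in a Strategy subclass's getMove method:

```java
import java.util.Random;

public class ForgivingDemo {
    static final Random generator = new Random();
    static final double P = 0.9;   // retaliation probability (illustrative value)

    // Decide the next move from the low bit of the opponent's history.
    static boolean getMove(int opponentHistory) {
        boolean opponentCooperated = (opponentHistory & 1) == 1;
        if (opponentCooperated) return true;      // mirror cooperation, as in Tit-for-Tat
        // Opponent defected: retaliate only with probability P,
        // otherwise forgive and cooperate.
        return generator.nextDouble() >= P;
    }

    public static void main(String[] args) {
        System.out.println(getMove(1));   // opponent cooperated last time -> prints true
    }
}
```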

E. Create a PDMain class with a main method that pits every strategy against every other strategy. Since we have discussed 6 strategies, there will be 36 tournaments. Each tournament should consist of the same number of iterations (20 – 50). The output should be a table consisting of 6 rows and 6 columns. The entry in row i, column j should contain the fitnesses of agent1 (playing strategy i) and agent2 (playing strategy j).

F. Run PDMain.main several times. Are the results similar? Is there a pattern?

G. Which strategy is best if the goal is to maximize the sum of the fitness of both agents? Which is the second best? Which is the worst?

H. Create a GUI for PDTournament. The GUI should consist of three panels: an Agent1 panel, an Agent2 panel, and a Tournament panel. The agent panels provide drop-down menus that allow the user to select a strategy for the agent and a label that displays the agent's current fitness. The Tournament panel contains a text field that lets the user decide how many iterations to run, a run button that iterates the competition the specified number of times, and a reset button that resets the fitness of both agents back to 0 (without changing their strategies).

Multi-generational Prisoner's Dilemma

In the multi-generational iterated prisoner's dilemma tournament a population of agents compete against each other using the Agent.compete method described earlier. For example, if there are N agents, then there will be N(N – 1)/2 competitions. (Proof?)
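The pairings can be generated with a nested loop, which also makes the N(N – 1)/2 count easy to see. A sketch (the countPairs helper is illustrative; in PDLab the inner loop body would call compete on the pair of agents):

```java
public class RoundRobinDemo {
    // Count the distinct pairs visited by a round-robin over n agents.
    static int countPairs(int n) {
        int pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Here the tournament would call agents.get(i).compete(agents.get(j)).
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each agent meets every other agent exactly once: n(n-1)/2 pairs.
        System.out.println(countPairs(5));   // prints 10 = 5*4/2
    }
}
```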

After all competitions are finished, the median fitness is computed. Those agents with fitness greater than or equal to the median fitness are allowed to mate M times, producing at most N offspring. These offspring replace the previous population and the cycle repeats:

for(int i = 0; i < PDConsts.TOURNAMENT_LENGTH; i++) {
   tournament.compete(); // agents compete
   tournament.mate(); // some mate, offspring takeover
   System.out.println(tournament);
}

Structure of ipd

Here are the classes in the ipd package:

Breeders and Breeder Tournaments

A breeder only competes with other breeders. It uses the inherited compete method:

public void compete(Breeder other) {
   super.compete(other);
}

But it overrides the inherited getMove method with an abstract method.

A breeder can also mate with other breeders using the (abstract) mate method. This method returns a breeder offspring.

Historians and Historian Tournaments

An historian agent determines its next move based on the last K bits of the inherited opponentHistory field. These K bits are used as an index into an array of 2^K boolean values. Initially, the values in this array are random.
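Extracting the last K bits is a masking operation. A sketch (class name and the value of K are illustrative):

```java
public class HistorianDemo {
    static final int K = 3;                              // illustrative history length
    static final boolean[] moves = new boolean[1 << K];  // 2^K entries, initially false

    // Use the last K bits of the opponent's history as an index into the table.
    static boolean getMove(int opponentHistory) {
        // (1 << K) - 1 is a mask of K one-bits; & keeps only the low K bits.
        int index = opponentHistory & ((1 << K) - 1);
        return moves[index];
    }

    public static void main(String[] args) {
        moves[5] = true;                      // cooperate after the pattern 101
        System.out.println(getMove(0b11101)); // low 3 bits are 101 -> prints true
    }
}
```

In PDLab the table entries would be randomized in the first generation; here they default to false except where set.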

When historians mate they produce an offspring with fitness 0 and an array obtained by splicing the first X entries of the mother's array with the last Y = 2^K – X entries of the father's array, where X is randomly determined. This is called crossover. Also, some of the booleans in the offspring's array may be negated with a mutation probability of P.
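Crossover and mutation can be sketched as follows; the class name and the mutation probability are illustrative:

```java
import java.util.Random;

public class CrossoverDemo {
    static final Random generator = new Random();
    static final double MUTATION_PROB = 0.01;   // illustrative value of P

    // Splice mom's first X entries with dad's remaining entries, then mutate.
    static boolean[] crossover(boolean[] mom, boolean[] dad) {
        boolean[] child = new boolean[mom.length];
        int x = generator.nextInt(mom.length + 1);   // X is randomly determined
        for (int i = 0; i < child.length; i++) {
            child[i] = (i < x) ? mom[i] : dad[i];    // crossover
            if (generator.nextDouble() < MUTATION_PROB) {
                child[i] = !child[i];                // mutation: negate this entry
            }
        }
        return child;
    }

    public static void main(String[] args) {
        boolean[] child = crossover(new boolean[8], new boolean[8]);
        System.out.println(child.length);   // prints 8
    }
}
```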

An historian tournament is a breeder tournament that populates the agent list with historians.

Project 2

A. Implement the HistorianTournament class. Of course this means you will also have to implement all of the classes it depends upon. Provide a main method that performs N iterations of the compete-mate cycle. At the end of each cycle display the generation, population size, and median fitness.

B. Does the median fitness rise?

C. What happens with the population size?

D. Experiment with different values of K, the number of past opponent moves an agent consults to determine its next move. Do larger values of K produce a higher median fitness?

E. How do the strategies used by agents in the last generation compare to the Tit-for-Tat strategy?

F. Implement a GUI for HistorianTournament. Your GUI should allow users to run the compete-mate cycle a specified number of times. When the run is finished, draw a bar graph in a panel consisting of G bars, each a different color, where G is the number of generations so far. The height of each bar should be the height of the panel times the median fitness of that generation divided by the maximum possible fitness. The width of a bar should be the width of the panel divided by G.

Opportunists and Opportunist Tournaments

According to Richard Dawkins, genes are selfish. They are only concerned with propagating their code as far into the future as possible. Genes are never altruistic. And yet we see people sacrificing themselves for a cause all of the time: war heroes, suicide bombers, firemen, policemen, etc. Why do we see altruism among humans? How come individuals with altruistic tendencies didn't become extinct 50,000 years ago? The answer is: social norms.

A social norm is an unwritten rule that can evolve in a society. For example, during WWI a conscientious objector could be exposed to scorn and derision. During the Vietnam War the opposite was true.

An opportunist is a breeder that cooperates unless he thinks he can get away with defecting. If other agents see him defecting, they can punish him.

An opportunist has three critical attributes:

0 <= boldness, vindictiveness, position <= 1

In the first generation the values of these attributes are random.

An opportunist defects if the percentage of nearby opportunists is less than the agent's boldness.

An opportunist punishes a defector if the defector is nearby and if a random number is less than the punisher's vindictiveness. Punishing reduces the fitness of the defector by I and the punisher by J, where J < I.
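The punishment rule can be sketched as a small demo class; the class name, the nearness test, and the values of I and J are illustrative assumptions:

```java
import java.util.Random;

public class PunishDemo {
    static final Random generator = new Random();
    static final int I = 9;   // fitness lost by the punished defector (illustrative)
    static final int J = 2;   // fitness lost by the punisher; note J < I

    int fitness;
    double vindictiveness;    // in [0, 1]
    double position;          // in [0, 1]

    // Punish a defector if it is nearby and a random number falls below
    // this agent's vindictiveness. Returns true if punishment occurred.
    boolean punish(PunishDemo defector, double nearRadius) {
        boolean isNear = Math.abs(position - defector.position) <= nearRadius;
        if (isNear && generator.nextDouble() < vindictiveness) {
            defector.fitness -= I;   // defector pays the larger cost
            this.fitness -= J;       // punishing is not free
            return true;
        }
        return false;
    }
}
```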

When opportunists mate they produce an offspring with a random position, and with boldness and vindictiveness equal to weighted averages of its parents' boldness and vindictiveness values. The weight is randomly determined.
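The weighted average can be sketched as follows; the class and method names are illustrative:

```java
import java.util.Random;

public class MateDemo {
    static final Random generator = new Random();

    // Weighted average of the parents' values; w is the random weight in [0, 1).
    static double blend(double mom, double dad, double w) {
        return w * mom + (1 - w) * dad;
    }

    public static void main(String[] args) {
        double w = generator.nextDouble();
        // Offspring boldness always lies between the parents' boldness values.
        System.out.println(blend(0.2, 0.8, w));
    }
}
```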

An opportunist tournament is a breeder tournament that populates the agent list with opportunists. It also provides a bulk punish method that gives every agent in the agent list an opportunity to punish a defector.

Project 3

A. Implement the OpportunistTournament class. Of course this means you will also have to implement all of the classes it depends upon. Provide a main method that performs N iterations of the compete-mate cycle. At the end of each cycle display the generation, population size, median fitness, average vindictiveness, and average boldness.

B. What happens to these measures from generation to generation?

C. How does changing the cost of defecting (I) and the cost of punishing (J) change the medians and averages? Does it make a difference if I = J = 0?

D. Implement a GUI for OpportunistTournament. Your GUI should allow users to run the compete-mate cycle a specified number of times. When the run is finished, draw red, green, and blue paths in a panel. Each path connects G points, where G is the current number of generations. A point (x, y) on the red path means that in generation x the average boldness was y. On the blue path it means the average vindictiveness of generation x was y. On the green path it means the average fitness of generation x divided by the maximum possible fitness was y.