Implementing a PL Virtual Machine

In Brainerd and Landwebers' book, Theory of Computation, they define a simple assembly-like language called PL. They define the notion of PL-Computable, which means computable using a PL algorithm. Of course assuming Church's Thesis we can replace PL with any programming language.

PL Syntax

A PL program is a sequence of commands separated by newline characters:

COMMAND ::= (LABEL:)?LOAD | (LABEL:)?INC | (LABEL:)?GOTO | (LABEL:)?LOOP
LOAD ::= load VAR, (VAR | NAT)
INC ::= inc VAR
GOTO ::= goto LABEL
LOOP ::= loop (VAR | NAT) COMMAND+ end
VAR ::= [a-z]+
LABEL ::= [A-Z]+
NAT::= [0-9]+

Notice that commands can have labels. (But not "end".) Labels are uppercase strings, while variables are lowercase strings.

PL Semantics

PL programs are executed by a PL virtual machine (VM):

Users can load programs into a VM one command at a time using the add method or from a file using the compile method. The compile method reads commands from a file one line at a time and passes them to the add method. The add method parses commands as strings into instances of the Command class, where command fields can be more easily accessed. Our version of add matches its input to a regular expression. A matcher can then be used to extract these fields into the command object. Note that a regular expression can't verify that loop and end commands are matched. (Why?) Thus, unmatched loop or end commands will turn into runtime errors. Otherwise, add should do as much error checking as possible.

The run method uses the program counter (pc) to iterate through the program executing each command using the execute method:

execute(program.get(pc++));

The heart of the VM, where command semantics are specified, is the execute method:

void execute(Command cmmd) {
   if (cmmd.opcode.equals("load")) { ... }
   else if (cmmd.opcode.equals("inc")) { ... }
   else if (cmmd.opcode.equals("goto")) { ... }
   else if (cmmd.opcode.equals("loop")) { ... }
   else if (cmmd.opcode.equals("end")) { ... }
   else { error, unrecognized opcode ... }
}

As an example, here's a simple program:

public class TestVM {

   public static void main(String[] args) {
      try {
         VM vm = new VM();
         vm.add("load x 10");
         vm.add("load y 5");
         vm.add("loop x");
         vm.add("inc y");
         vm.add("end");
         vm.add("goto AAA");
         vm.add("inc y");
         vm.add("AAA: inc y");
         vm.run();
         System.out.println(vm);
      } catch(Exception e) {
         System.out.println(e);
      }
   }
}

Here's the program output:

pc = 8
vars = {x=10, y=16, AAA=7}

A variable is a row in the vars table. The key is the name of the variable and the associated value is its content. The load instruction assigns a new content to a variable. The increment instruction increments the content of a variable. (Note: variables are automatically defined and initialized to 0 when they are first referenced.)

A loop command, like loop x, creates a frame and pushes it onto the loop control stack. A frame contains the current pc (which should be the address of the next instruction, and the count, x, which is the number of times the loop will be executed.

The end command peeks at the top most framed on the control stack. If the program is properly written, this should be the frame created by a matching loop command. If the count is 0, the frame is popped, otherwise the count is decremented and the VM's pc is set to the frame's pc.

When a labeled command is added to the program, its label is turned into a variable with content equal to the command's position in the list. The goto label command sets the VM's pc to the content of the label variable.

In the example above the loop x command executes the inc y command 10 times before the end command pops the loop command's frame off of the control stack. At this point the value of y is 15. The goto AAA command skips the next increment command by setting the pc to the value of AAA, which is 7, the position of the final inc y instruction in the program. Now the value of y is 17 and the program ends.

Notes:

We can think of PL programs as parse trees. The root node represents the entire program. Parent nodes represent loop/end pairs, and leaf nodes represent inc, load, and goto instructions. The children of a parent node corresponds to the body of a loop instruction.

Transferring control into the body of a loop command from outside is not allowed. Instead, control is transferred to the outer-most loop containing the target. In effect, this would be a sibling of the goto instruction.

Transferring control to a non-existent label halts the program.

Labels should be unique.

Labs

Lab 1

Write a regular expression that matches PL instructions. Create groups (using parentheses) for each instruction field: label, opcode, arg1, and arg2.

Note: You will need to modify the grammar since a regular expression can't enforce the balance of begin/end statements. (Why?) Instead, treat end as another instruction that could, in principle, occur anywhere. However, executing an end statement that is not properly paired with a loop statement may cause the program to crash. (The advantage of a really good grammar is that a lot of errors are caught at compile time rather than runtime.)

Lab 2

Implement the VM add and compile methods. Test them by loading a program, then printing it to see if fields are being properly extracted. (Hint: it helps to provide the Command class with a toString method.

Lab 3

Complete the implementation of the VM class. Test it by writing a PL program that computes n + m.

Lab 4

Further test your VM class by writing PL programs that compute the following functions:

a)     n * m

b)    max(m, n)

c)     m – n (which returns 0 if m < n)

Lab 5

Create and test a console for the VM. The console allows users to load and inspect variables, load and run programs from files, and step through programs one command at a time.

Lab 6

Add macros to PL. A macro is a file containing a sub-routine. When it appears in a program it is automatically replaced by the file's contents. Be careful with macros as subroutine variables and labels may conflict with variables and labels of the parent program. This problem can be somewhat mitigated by using weird naming conventions in sub-routines.

Lab 7

Add a feature to your VM that prints the number of variables used and the number of instructions executed at the end of each program execution.

Lab 8

Implement the following functions in PL – {goto}

double(x) = x + x
exp(x) = double(double(... double(0)...)) // m-times
hyperExp(m: Int) = exp(exp(...exp(0)...)) // m-times
hyper2Exp(m: Int) = hyperExp(hyperExp(... hyperExp(0)...)) // m-times
hyper3Exp(m: Int) = hyper2Exp(hyper2Exp(... hyper2Exp(0)...)) // m-times