In Brainerd and Landwebers' book, Theory of Computation, they define a simple assembly-like language called PL. They define the notion of PL-Computable, which means computable using a PL algorithm. Of course assuming Church's Thesis we can replace PL with any programming language.
There was a bug in the design of version 1.0 of the PL Virtual Machine. The problem seemed simple enough: the command loop 0 executed its body instead of skipping over it. I managed a very hacky fix by having each command in a loop body keep track of its enclosing loop parent. If the parent's count was <= 0, then the command was ignored.
To avoid the hack I decided to redesign the Command and VM classes. I was pleased with the result. No control stack is needed in the new version. Instead, goto, loop, and end commands are treated similarly in that each knows the position in the program of its target. The loop command also maintains a count variable that is used by the matching end command to determine how many times the loop body has been executed. The resulting execute method is much simpler.
The price paid for the simplicity of execute is that after all commands have been loaded into the program, a special resolveLabels method must be called. This method iterates through the program setting the targets mentioned above.
A PL program is a sequence of commands separated by newline characters:
COMMAND ::= (LABEL:)?LOAD | (LABEL:)?INC | (LABEL:)?GOTO |
(LABEL:)?LOOP
LOAD ::= load VAR, (VAR | NAT)
INC ::= inc VAR
GOTO ::= goto LABEL
LOOP ::= loop (VAR | NAT) COMMAND+ end
VAR ::= [a-z]+
LABEL ::= [A-Z]+
NAT::= [0-9]+
Notice that commands can have labels. (But not "end".) Labels are uppercase strings, while variables are lowercase strings.
PL programs are executed by a virtual machine (VM) equipped with a table of variable-value bindings:
Users can load programs into a VM one command at a time using the add method or from a file using the compile method. The compile method reads commands from a file one line at a time and passes them to the add method.
The add method passes the command (as a string) and the pc (program counter) to the Command constructor, which uses a pattern matcher to extract the opcode, label (if any), and arguments (if any) and assign them to the corresponding command fields:
program.add(new Command(cmmd, pc++));
Note that each command knows its position in the program (pc). Loop, goto, and end commands also know the position of their targets (these will be set by the resolveLabels method). Loop commands also keep track of the number of times the loop has been executed (this is count, which is set by the execute method).
The run method resolves all labels, then uses the program counter (pc) to iterate through the program executing each command using the execute method:
public void run() throws Exception {
resolveLabels();
pc = 0;
while(pc < program.size()) {
execute(program.get(pc++));
}
}
The heart of the VM, where command semantics are specified, is the execute method:
void execute(Command cmmd) {
if
(cmmd.getOpcode().equals("load")) { ... }
else if
(cmmd.getOpcode().equals("inc")) { ... }
else if
(cmmd.getOpcode().equals("goto")) { ... }
else if
(cmmd.getOpcode().equals("loop")) { ... }
else if
(cmmd.getOpcode().equals("end")) { ... }
else { error, unrecognized opcode ... }
}
As an example, here's a simple program:
public class TestVM {
public static void main(String[] args)
{
try {
VM vm = new VM();
vm.add("load x 10");
vm.add("load y 5");
vm.add("loop x");
vm.add("inc y");
vm.add("end");
vm.add("goto AAA");
vm.add("inc y");
vm.add("AAA: inc y");
vm.run();
System.out.println(vm);
} catch(Exception e) {
System.out.println(e);
}
}
}
Here's the program output:
pc = 8
vars = {x=10, y=16}
A variable is a row in the vars table. The key is the name of the variable and the associated value is its content.
The load instruction assigns the value of arg2 (which can be a variable or an integer) to arg1 in the vars table. If arg1 isn't present in the vars table, a new entry for it is first added to the table.
The increment instruction increments the content of arg1 in the vars table.
Note: variables are automatically defined and initialized to 0 when they are first referenced.
Note: Do not add a new row to the table if the variable being loaded or incremented is already in the table.
The goto command simply loads its target into the pc.
The loop command sets its count to the value of arg1 (which can be a variable or an integer). If the count is initially <= 0, pc is set to the target + 1. This is the index of the instruction immediately following the matching end command.
The end command fetches the corresponding loop command:
Command myLoop = program.get(cmmd.getTarget());
It decrements the loop command count. If the count is > 0, then the pc is set to the loop command's pc + 1. This is the index of the first instruction in the loop body.
Resolving labels uses a command stack to match loop commands with corresponding end commands and a hash map to match labels with program indices:
private void resolveLabels() {
Stack<Command> loopStack = new
Stack<Command>();
Map<String, Integer> targets =
new HashMap<String, Integer>();
// pass 1
for(Command cmmd: program) { ... }
// pass 2
for(Command cmmd: program) { ... }
}
Labels and targets are resolved by iterating through the program twice.
During the first pass:
· If a labeled command is encountered, its label and pc are put into the targets map
· If a loop command is encountered, it is pushed onto the loop stack
· If an end command is encountered, we assume the corresponding loop command is at the top of the loop stack. We pop it off the stack, set its target to the pc of the end command, and set the end command's target to the pc of the loop command.
During the second pass we set the targets of goto commands. This is done by searching the target map for the pc of the label (arg1) and load this into the goto command's target.
Note that it makes the most sense to resolve labels after all of the commands have been added to the program.
We can think of PL programs as parse trees. The root node represents the entire program. Parent nodes represent loop/end pairs, and leaf nodes represent inc, load, and goto instructions. The children of a parent node corresponds to the body of a loop instruction.
Transferring control into the body of a loop command from outside is not allowed. Instead, control is transferred to the outer-most loop containing the target. In effect, this would be a sibling of the goto instruction.
Transferring control to a non-existent label halts the program.
Labels should be unique.
Write a regular expression that matches PL instructions. Create groups (using parentheses) for each instruction field: label, opcode, arg1, and arg2.
Note: You will need to modify the grammar since a regular expression can't enforce the balance of begin/end statements. (Why?) Instead, treat end as another instruction that could, in principle, occur anywhere. However, executing an end statement that is not properly paired with a loop statement may cause the program to crash. (The advantage of a really good grammar is that a lot of errors are caught at compile time rather than runtime.)
Implement the VM add and compile methods. Test them by loading a program, then printing it to see if fields are being properly extracted. (Hint: it helps to provide the Command class with a toString method.)
Complete the implementation of the VM class. Test it by writing a PL program that computes n + m.
Further test your VM class by writing PL programs that compute the following functions:
n * m
max(m, n)
m – n (which returns 0 if m < n)
Create and test a console for the VM. The console allows users to load and inspect variables, load and run programs from files, and step through programs one command at a time.
Add macros to PL. A macro is a file containing a sub-routine. When it appears in a program it is automatically replaced by the file's contents. Be careful with macros as subroutine variables and labels may conflict with variables and labels of the parent program. This problem can be somewhat mitigated by using weird naming conventions in sub-routines.
Add a feature to your VM that prints the number of variables used and the number of instructions executed at the end of each program execution.
Implement the following functions in PL – {goto} (Hint: these require nested loops.)
double(x) = x + x
exp(x) = double(double(... double(1)...)) // m-times
hyperExp(m: Int) = exp(exp(...exp(1)...)) // m-times
hyper2Exp(m: Int) = hyperExp(hyperExp(... hyperExp(1)...)) // m-times
hyper3Exp(m: Int) = hyper2Exp(hyper2Exp(... hyper2Exp(1)...)) //
m-times