Assignment #1

CS 152
due September 27, 2005
100 points

In this exercise you are to write a parser for the simple grammar G given below. Your parser should also be able to simulate the maintenance of a symbol table for the simple block-structured language L(G) generated by G. For this assignment, you are to use Java. Version 5.0 of Java is recommended.

Specifically, you are to define a class Parser with a method parse that takes a purported program in L(G), represented as a String, and constructs and periodically prints the current version of a symbol table (as described below). You are also to print an informative error message for those inputs that do not represent programs in L(G). You only need to do this for one error; you needn't attempt to find them all. You may stop parsing the input once you find the first error. It is okay if you have already printed one or more versions of the symbol table before you find the first error. The parse method needn't return a value; the presence of an error message is to be necessary and sufficient evidence that the input did not parse.

The grammar G has the productions given below. The start symbol is <program>. Note that the second production is recursive, so programs in L(G) have a nested block structure. Also note that every declaration declares an identifer to be of a particular type, and that each declaration belongs to a block. Finally, note that variable names, block names, and type names are to be tokens of a particular type that is not defined by the grammar. As this type name suggests, your parser is to enforce the constraint that such tokens must begin with a letter (upper or lower case). Any other behavior counts as a syntax error. It is not an error to use the token declare as a name of one of these types.

  <program> ::= <block>
  <block> ::= { <block-name> <declaration>* <block>* <statement>* } 
  <statement> ::= <letter-initial-token>* ;
  <declaration> ::= declare <variable-name> <type-name> ;
  <variable-name> ::= <letter-initial-token>
  <block-name> ::= <letter-initial-token>
  <type-name> ::= <letter-initial-token>
Braces (and semicolons} are nonterminal symbols of G, as opposed to metasymbols. You may assume that the tokens of a program in L(G) are separated by those white space characters that are used as default delimiters by Java's StringTokenizer class. In fact, it will be helpful to use an object of this class to extract tokens from the input.

Your parse method should proceed through its input from top to bottom. The block-structured nature of legal programs in L(G) means that at any point during the parser's execution, it will be processing some particular point in the program. This point will be contained in one or more nested blocks. If a variable declaration or a new block is encountered at this point, the variable name or block name should be associated with the innermost of these blocks.

The symbol table is to contain a representation, for each active identifer, of its associated memory address and of its type. Active identifiers for L(G) may be either variable names or block names (not type names). The type of a variable is the type given by its declaration; the type of a block is to be function. Since the set of active identifiers changes as processing enters and exits blocks, the symbol table will need to change as well. In other words, each block has its own version of the symbol table.

Your parse method is to print each block's version of the symbol table when processing of that block is complete. The entries in each symbol table are to be printed in alphabetical order.

The associated memory address in a symbol table entry is to be represented as an ordered pair of nonnegative integers. For a variable x in block B's version of the symbol table, the first integer is to represent the distance in blocks between B and the block B' associated with the declaration of x. The second integer is to represent the position of x among the identifiers associated with B'. So for example, in the program of Figure 5.24 of the text, the variable b in D's symbol table would be associated with the pair (1,1), and the variable x with the pair (0,0).

Block names are treated similarly (remember we are thinking of block names as names of functions which can be called). In the Assignment 1 analog of the program of Figure 5.24, the block name B would be associated with (2,2) in D's symbol table, since the name B belongs to block A, which is 2 blocks distant from D (note that C is not nested inside B, as seen also in the second figure on page 161), and since B follows x and y among the identifers belonging to A.

A second declaration of the same symbol in the same block is to be treated as an error. In this context, blocks count as declarations of their block names. That is, a block is not to have a variable and a subblock with the same name. In other words, a single name space is to be used for variable names and block names.

Section 5.3 of the text gives a strategy for updating the symbol table when processing enters or exits a block. Here each name is associated with a linked stack. Typically the collection of names and their associations is stored in a hash table, although for this assignment the requirement of frequent printing in sorted order might suggest a TreeMap rather than a HashMap.

When you print a symbol table, each identifier is to be associated with just one memory address and type. So if you use a stack as suggested above, do not print the entire stack for each variable as part of the symbol table -- just print its first element.

In the terminology of Section 5.3, this assignment uses static scoping rather than dynamic scoping.

Your parser should treat the entire program as a global block called main. This symbol table for this global block should contain the name of the outermost block defined by the program, as well as the predefined global identifers "sqrt" (of type "function") and "null" (of type "null-type"). These three names should of course be available to other blocks in the program. Your parser should print this symbol table at the very end of its processing.

Your program needn't concern itself with any representation of function bodies, environments and their frames, or object code.

You may define any classes and methods that you find useful, in addition to Parser and parse.

You are to test your program with the test class A1.java available from the class web site. You are to turn in both hard and soft copies of all of your code (but not A1.java, unless you modify it) as well as a hard copy of the results of your testing. Unless I say otherwise in class, the soft copy of your code should be turned in on a diskette.