#| The "parse" function of this file will take a program represented as a character string, determine whether it is a legal program with respect to the block-structured language defined by the grammar given below, and print symbol tables for each of the blocks. The grammar is hard-coded into the function definition and that of its auxiliary functions. The program representation is as a string of tokens separated by whitespace characters as recognized by the char-whitespace? predicate. The determination of legality is implicit; if there is an error then an appropriate error message will be printed; otherwise the program has been determined to be legal. The symbol table for each block associates each symbol with a stack of values, each with 3 components. The components represent (1) the distance in blocks between the given block and the block where the symbol is resolved, (2) the offset in symbols between the beginning of this latter block and the declaration of the symbol in that block, and (3) the type of the symbol in that block. A machine address is reprsented by a pair of integers corresponding to (1) and (2) above. Each block is associated with a symbol of type "function". Whenever two blocks have the same immediately enclosing block, the second block is assumed to be inaccessible from the first one, and its symbol doesn't appear in the symbol table of the first block. The grammar for programs assumed by the parser has productions ::= ::= { * * * } ; ::= * ; ::= declare ; ::= ::= ::= One symbol of lookahead is sufficient for top-down parsing, including recursive descent parsing. |# #| This structure is for values associated with symbols in a symbol table The fields represent the offset in blocks from the main block the offset in symbols from the first symbol of the block the type of the symbol |# (define-struct symbol-table-value (offset1 offset2 type-name)) (define (symbol-table-value->string v) (string-append (symbol-table-value-offset1 v) ":" (symbol-table-value-offset2 v) + ", " (symbol-table-value-type-name v))) ; This function adds a binding of name ; to v to an assocation list alist, ; and returns the resulting association ; list (define (add-to-alist name v alist) (cons (cons name v) alist)) #| Prints the symbol table represented by the assocation list "alist" Each symbol appears on a separate line, with a value as documented above. class and as printed by its toString method. The auxiliary function "get-unique-names" returns a list of unduplicated first values of pairs in the assocation list. The auxiliary function "sort-names" sorts the names returned by get-unique-names, alphabetically. The main function maps an anonymous printing function to the result of sorting the unique names. There is no check that the assocation list is empty. In the current appication, this cannot happen. |# (define (print-symbol-table alist) (define (get-unique-names alist) (if (null? alist) null (let ((rest-of-names (get-unique-names (cdr alist)))) (if (member (caar alist) rest-of-names) rest-of-names (cons (caar alist) rest-of-names))))) (define (sort-names ls) (define (insert name ls) (cond ((null? ls) (list name)) ((string-cistring (reverse current-token)) output-list)) ((not (char-whitespace? (car char-list))) (rec-tokenize (cdr char-list) (cons (car char-list) current-token) output-list)) ((null? current-token) (rec-tokenize (cdr char-list) null output-list)) (else (rec-tokenize (cdr char-list) null (cons (list->string (reverse current-token)) output-list))))) (reverse (rec-tokenize (string->list s) null null))) ; This is a structure for the information about the state of ; a parser that is most commonly returned from, and passed to, ; functions. The components represent the list of unseen tokens ; (excluding the very next one, which is the lookahead), the ; lookahead token itself, the current association list of bindings, ; and the offset within the current block (the second component ; of an offset pair). (define-struct parser-state (token-list lookahead alist offset)) ;;;;;;;;;;;;;;;;;; THE PARSER ;;;;;;;;;;;;;;;;;;;;;;;;;; ; The main parse function ; To parse s, convert it to tokens, and parse the list ; of tokens as a block with a suitably initialized ; association list and the first token as lookahead. ; If s is a legal program according to the grammar above, ; the function will print the symbol tables of each ; of its blocks, in the order that their parsing is ; finished. Otherwise, it will print an appropriate ; error message. ; The program works by recursive descent. ; In the auxiliary functions, the program is assumed to exist ; as a list of tokens (represented as strings) in the ; token-list component of a parser-state argument ; called "state", except that the first token ; resides in the lookahead component of that argument. ; When parsing a block, this token will be a left brace. ; The auxiliary methods generally return an updated version ; of the parser state if there has been no error, and ; a string representing the error otherwise. (define (parse s) ; This checks the return value from a parse function ; to see whether it is an error value, as opposed ; to a normal return value (typically a parser-state value) (define (error-returned? return-value) (string? return-value)) ; This determines whether its input string is letter initial (define (letter-initial? string) (let ((c (car (string->list string)))) (and (char-ci>=? c #\A) (char-ci<=? c #\Z)))) #| This method parses a block in a program, according to the grammar that appears in the class documentation. The other arguments represent the distance between the parent block and the outermost block, and the name of the parent block. The parsing proceeds by recursive descent. The method adds the block's identifer to the parent's frame of the symbol table, after checking whether the identifier is a legal, nonduplicate block name. This makes the block name available for subsidiary blocks. It does not add its own block name into the list of the parent's declared symbols. This is the caller's responsibility, so this method needs to return its own id to the caller. The method that parses declarations adds the declared symbols to the current symbol table frame. The method that parses blocks does not. This is because declarations may be of different types, while block id's cannot be. Neither method updates the offset counter; this is the repsonsibility of the current method. |# ; The lookahead variable contains the initial left brace ; upon entry to the method. (define (parse-block state parent-level parent-blockname) ; a predicate to determine whether a given name has ; a local binding in a given association list. ; Here a local binding is one declared at the parent level (define (in-parent-block? name alist) (let ((stack (assoc name alist))) (if (not stack) #f (= parent-level (symbol-table-value-offset1 (cdr stack)))))) (let ((token-list (parser-state-token-list state))) (cond ; make sure there's a next token ((null? (parser-state-token-list state)) (string-append "premature end of program " " at beginning of block in " parent-blockname)) ; if so, this token is the block name ; make sure it's not already declared locally ((in-parent-block? (car token-list) (parser-state-alist state)) (string-append "duplicate declaration of " (car token-list) " in " parent-blockname)) ; and make sure that it's a legal block name ((not (letter-initial? (car token-list))) (string-append (car token-list) " is an illegal subblock name in " parent-blockname)) ; if there are more tokens left, ((null? (cdr token-list)) (string-append "premature end of program after declaring subblock " (car token-list) " of " parent-blockname)) ; proceed with the recursive descent parsing ; by updating the parent's alist to contain ; the new block name, ; and parsing the declarations (else (let* ((block-name (car token-list)) (level (add1 parent-level)) (parent-alist (add-to-alist (car token-list) (make-symbol-table-value parent-level (parser-state-offset state) 'function) (parser-state-alist state))) (return-value (parse-declarations (make-parser-state (cddr token-list) (cadr token-list) parent-alist 0) level block-name))) ; if the declarations parsed ok, ; try parsing the subblocks (if (error-returned? return-value) return-value (let ((return-value (parse-subblocks return-value level block-name))) ; if the subblocks ok, try parsing the statements (if (error-returned? return-value) return-value (let ((return-value (parse-statements return-value block-name))) ; if the statements parsed ok, print the ; symbol table ; if there's no more text left (at the top level) ; or there's more text left (below the top level) ; build and return the current parser state (if (error-returned? return-value) return-value (begin (print-symbol-table (parser-state-alist return-value)) (if (= parent-level 0) (if (null? (parser-state-token-list return-value)) (make-parser-state null null parent-alist (add1 (parser-state-offset state))) "additional text after end of program") (if (null? (parser-state-token-list return-value)) (string-append "premature end of program " "after declaration of subblock " block-name " in " parent-blockname) (make-parser-state (cdr (parser-state-token-list return-value)) (car (parser-state-token-list return-value)) parent-alist (add1 (parser-state-offset state)) )))))))))))))) ; The arguments to the parsing function for subblocks are ; the current parser state, ; the current nesting level (distance from the outermost block) ; and the current block name ; The lookahead token should be a left brace. ; The function simply parses a single block, and calls itself ; recursively unless there's been an error (define (parse-subblocks state level block-name) (if (equal? (parser-state-lookahead state) "{") (let ((return-value (parse-block state level block-name))) (if (error-returned? return-value) return-value (parse-subblocks return-value level block-name))) state)) ; The arguments to the parsing function for statements are ; the current parser state, ; the current nesting level (distance from the outermost block) ; and the current block name ; The lookahead token should be anything but a right brace. ; The function simply parses a single statement, and calls itself ; recursively unless there's been an error (define (parse-statements state block-name) (if (equal? (parser-state-lookahead state) "}") state (let ((return-value (parse-statement state block-name))) (if (error-returned? return-value) return-value (parse-statements return-value block-name))))) #| This method parses a statement in a program, according to the grammar that appears at the beginning of this file. The arguments are the current state of the parser, and the name of the current block. The lookahead component of the state contains the first token of the statement upon entry to the function. Anything preceding a semicolon is assumed to be part of the current statement. So the algorithm proceeds as follows: If the next token is illegal (not letter initial), or doesn't exist, an error is returned. If it's a semicolon, the current state of the parser is collected and returned. Otherwise the function calls itself recursive with updated values for the lookahead and remaining token components of the current state. |# (define (parse-statement state block-name) (cond ((null? (parser-state-token-list state)) (string-append "premature end of program in statements in " block-name)) ((equal? (parser-state-lookahead state) ";") (make-parser-state (cdr (parser-state-token-list state)) (car (parser-state-token-list state)) (parser-state-alist state) (parser-state-offset state))) ((not (letter-initial? (parser-state-lookahead state))) (string-append "bad token " (parser-state-lookahead state) " in statement in " block-name)) (else (parse-statement (make-parser-state (cdr (parser-state-token-list state)) (car (parser-state-token-list state)) (parser-state-alist state) (parser-state-offset state)) block-name)))) ; The arguments to the parsing function for declarations are ; the current parser state, ; the current nesting level (distance from the outermost block) ; and the current block name ; The lookahead token should be "declare". If it isn't, the ; method simply returns the current state so that processing ; can continue. ; The function simply parses a single declaration, and calls itself ; recursively unless there's been an error (define (parse-declarations state level block-name) (if (equal? (parser-state-lookahead state) "declare") (let ((return-value (parse-declaration state level block-name))) (if (error-returned? return-value) return-value (parse-declarations return-value level block-name))) state)) #| This function parses a declaration in a program, according to the grammar that appears in the class documentation. The arguments represent the current state of the program the nesting depth of the current block (that is, the distance from the outermost block), and the name of the current block. The function updates the symbol table for the variable being declared (i.e., updates the frame of the block immediately enclosing the declaration. This function is reponsible for rejecting a multiply declared variable and an illegal variable. It's responsible for updating the updating symbol table (since only it knows the variable's type) and the current block's list of local variables. The lookahead variable has the value "declare" upon entry to the method |# (define (parse-declaration state level block-name) ; Returns an error message. ; To be called if no tokens remain to be read. (define (announce-exhaustion) (string-append "premature end of program in declarations in " block-name)) ; a predicate to determine whether a given name has ; a local binding in a given association list. ; Here a local binding is one declared at the current level (define (in-current-block? name alist) (let ((stack (assoc name alist))) (if (not stack) #f (= level (symbol-table-value-offset1 (cdr stack)))))) (cond ; if there are tokens yet unread, ; the next token is the variable being declared, ; so check whether it's legal ((null? (parser-state-token-list state)) (announce-exhaustion)) (else (let ((variable (car (parser-state-token-list state))) (token-list (cdr (parser-state-token-list state)))) (cond ((in-current-block? variable (parser-state-alist state)) (string-append "doubly declared variable " variable " in block " block-name)) ((not (letter-initial? variable)) (string-append "bad variable " variable " declared in " block-name)) ; if there are tokens yet unread, ; the next token is the type for the declaration, ; so check whether it's legal ((null? token-list) (announce-exhaustion)) (else (let ((type-name (car token-list)) (token-list (cdr token-list))) (cond ((not (letter-initial? type-name)) (string-append variable " has bad type in " block-name)) ; the next token should be a semicolon ; if so, and if additional tokens remain, ; update the parser state by adding the ; binding of variable to the ordered ; pair that represents its address, ; and updating the offset counter giving the ; next second component of the pair. ; Return the updated state. ((null? token-list) (announce-exhaustion)) (else (let ((semicolon (car token-list)) (token-list (cdr token-list))) (cond ((not (equal? semicolon ";")) (string-append "missing semicolon in declarations for " block-name)) ((null? token-list) (string-append "premature end of program " "after declaration in " block-name)) (else (make-parser-state (cdr token-list) (car token-list) (add-to-alist variable (make-symbol-table-value level (parser-state-offset state) type-name) (parser-state-alist state)) (add1 (parser-state-offset state))))))))))))))) ; begin parse ; Processing begins by tokeninzing the input. ; Assuming that there is at least one token, and that the initial ; token is a left brace, the function constructs the initial ; state with the correct predefined variables in the symbol table, ; and parses the input as a block. If an error is returned, ; the corresponding error message will be printed. Otherwise, ; the symbol table for the block will be printed. (let ((token-list (tokenize s))) (cond ((null? token-list) "empty program") ((equal? (car token-list) "{" ) (let ((result (parse-block (make-parser-state (cdr token-list) (car token-list) (add-to-alist "sqrt" (make-symbol-table-value 0 0 "function") (add-to-alist "null" (make-symbol-table-value 0 1 "null-type") null)) 2) 0 "main"))) (cond ((error-returned? result) (display result)) (else (print-symbol-table (parser-state-alist result)))))) (else (display "program must begin with left brace")))))