Parsing Tools




CS152

Chris Pollett

Feb. 25, 2009

Outline

A Simple Lex Example

%{
#include <stdio.h>
int wordCount = 0;
%}
 /* make an abbreviation "word" for the expression [^ \t\n]+ */
word [^ \t\n]+
%%
[\t\n ]+ { /* action to run when the pattern matches */ printf("I see whitespace\n"); }
{word} {wordCount++;}
%%
int main()
{
   yylex(); //call the lexer. Reads standard input until end of file (^D)
   printf("word count: %d\n", wordCount);
   return 0;
}
To compile:
lex  -o lextest.c lextest.l #default output is lex.yy.c
gcc  lextest.c -o lextest -ll  #-ll supplies yywrap(); with flex the library is usually -lfl
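
For comparison, the two rules in the lex specification above can be written by hand in plain C. This is only a sketch of the matching behavior, not the table-driven code that lex generates; the names is_space and count_words are made up for this illustration.

```c
#include <stddef.h>

/* Mirrors the character class [\t\n ] from the lex example. */
static int is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: returns the number of {word} matches in input.
 * If ws_runs is non-NULL, the number of whitespace runs (matches of
 * [\t\n ]+) is stored there. */
int count_words(const char *input, int *ws_runs)
{
    int words = 0, runs = 0;
    size_t i = 0;

    while (input[i] != '\0') {
        if (is_space(input[i])) {
            /* consume the longest run of whitespace, like [\t\n ]+ */
            while (is_space(input[i]))
                i++;
            runs++;
        } else {
            /* consume the longest run of non-whitespace, like [^ \t\n]+ */
            while (input[i] != '\0' && !is_space(input[i]))
                i++;
            words++;
        }
    }
    if (ws_runs != NULL)
        *ws_runs = runs;
    return words;
}
```

Like the lex version, each loop takes the longest possible match before moving on, so a run of several blanks counts as one whitespace match.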

A Yacc Example

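%{
#include <stdio.h>
int yylex(void);              /* supplied by the lex-generated scanner */
void yyerror(const char *s);  /* called by yyparse() on a syntax error */
%}
%token ARTICLE NORMAL_NOUN PROPER_NOUN
%%
noun_phrase : PROPER_NOUN { printf("Proper Noun\n"); }
    | ARTICLE NORMAL_NOUN { printf("Usual Noun\n"); }
    ;
%%
void yyerror(const char *s)
{
   fprintf(stderr, "%s\n", s);
}

int main(int argc, char **argv)
{
   extern FILE *yyin;
   if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) { //sets up lexer to use this file as input
      fprintf(stderr, "usage: %s file\n", argv[0]);
      return 1;
   }
   yyparse();
   fclose(yyin);
   return 0;
}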
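
To make the grammar above concrete, here is a hand-written C sketch that accepts the same two alternatives of noun_phrase. It is not what yacc generates (yacc builds a table-driven parser); the token codes and the function name parse_noun_phrase are assumptions made for this illustration. The real codes are chosen by yacc, and yylex() returns 0 at end of input.

```c
/* Hypothetical token codes; yacc assigns its own values in y.tab.h.
 * END = 0 mirrors the convention that yylex() returns 0 at end of input. */
enum token { END = 0, ARTICLE = 258, NORMAL_NOUN, PROPER_NOUN };

/* Returns 1 if the tokens match  noun_phrase : PROPER_NOUN,
 * 2 if they match  noun_phrase : ARTICLE NORMAL_NOUN,
 * and 0 on a syntax error. The array must be terminated by END. */
int parse_noun_phrase(const enum token *tokens)
{
    if (tokens[0] == PROPER_NOUN && tokens[1] == END)
        return 1;
    if (tokens[0] == ARTICLE && tokens[1] == NORMAL_NOUN && tokens[2] == END)
        return 2;
    return 0;
}
```

The return values 1 and 2 stand in for the two printf actions in the grammar; a return of 0 corresponds to the case where yyparse() would call yyerror().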

More on Yacc Example

Still More on Yacc Example

Yacc $ variables

Yacc refers to parts of a rule using variables which begin with a dollar sign:

expression : expression '+' expression {$$ = $1 + $3;}
        | expression '-' expression {$$ = $1 - $3;}
        | NUMBER {$$ = $1;}
        ;

$$ refers to the value of the left-hand side of the rule. $n refers to the value of the nth item on the right-hand side. The operator token counts as item 2, which is why the second operand is $3.
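
One way to picture the $ variables is as slots on a value stack that the parser keeps next to its state stack. The C sketch below assumes a simplified integer stack and made-up names (value_stack, push, reduce_binary); real yacc output differs in detail, but the idea of popping the right-hand side and pushing $$ is the same.

```c
#define STACK_MAX 64

typedef struct {
    int values[STACK_MAX];
    int top;                /* number of values currently on the stack */
} value_stack;

void push(value_stack *s, int v)
{
    s->values[s->top++] = v;
}

/* Sketch of a reduction by  expression : expression op expression.
 * The top three stack entries are the right-hand side:
 * rhs[0] plays the role of $1, rhs[1] of $2 (the operator's slot),
 * and rhs[2] of $3. */
int reduce_binary(value_stack *s, char op)
{
    int *rhs = &s->values[s->top - 3];
    int result = (op == '+') ? rhs[0] + rhs[2]    /* $$ = $1 + $3 */
                             : rhs[0] - rhs[2];   /* $$ = $1 - $3 */

    s->top -= 3;        /* pop the three right-hand-side values */
    push(s, result);    /* push $$ as the value of the new expression */
    return result;
}
```

After the reduction the three right-hand-side slots are replaced by a single value, which is what a later rule sees as its own $1 or $3.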

Typing Tokens

More on Typed Tokens

To set up YYSTYPE in your grammar (the definition will appear in the y.tab.h file after running yacc with the -d option):

%{
//stuff
%}
%union {
      double dval; // in this case we have two possibilities,
      int ival;    // but there could be more. In the real world the
                   // possibilities would include a struct for a syntax tree.
}
%token <ival> INTEGER
%token <dval> DOUBLE
%type <dval> expression /* notice we can also give the type of a nonterminal */
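
The %union declaration above becomes a C union type named YYSTYPE, and the lexer passes a token's value to the parser by filling in the matching member. The C sketch below shows the idea under some assumptions: the token codes are placeholders for the ones yacc would choose, and number_token is a hypothetical helper that writes into a caller-supplied YYSTYPE rather than the global yylval a real lexer would use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; the real ones are chosen by yacc. */
enum { INTEGER = 258, DOUBLE = 259 };

/* The shape of the type yacc derives from the %union declaration. */
typedef union {
    double dval;
    int ival;
} YYSTYPE;

/* Hypothetical helper: convert the text of a numeric lexeme, store the
 * value in the union member that matches the token's declared type,
 * and return the token code. */
int number_token(const char *text, YYSTYPE *lval)
{
    if (strchr(text, '.') != NULL) {
        lval->dval = strtod(text, NULL);      /* %token <dval> DOUBLE */
        return DOUBLE;
    }
    lval->ival = (int)strtol(text, NULL, 10); /* %token <ival> INTEGER */
    return INTEGER;
}
```

Because INTEGER is declared with <ival> and DOUBLE with <dval>, a $n that refers to one of these tokens in a grammar action reads the corresponding member.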

Typed Tokens and the Lexer

Error Handling in Your Grammar