HW#2 --- last modified February 28 2019 22:25:29..
Solution set.
Due date: Mar 11
Files to be submitted:
Hw2.zip
Purpose: To learn about Lex (Flex) and Yacc (Bison) and in the process create
an interpreter for the command line of a simple database system.
Related Course Outcomes:
The main course outcomes covered by this assignment are:
LO5 -- Read and produce context-free grammars.
LO6 -- Write recursive-descent parsers for simple languages, by hand or with a parser generator.
LO8 -- Write interpreters for simple languages that involve arithmetic expressions, bindings of values to names, and function calls.
Specification:
For this homework you will create a very simple database management system called YoctoDB.
In many real world situations it is useful to have a database system that supports both
a command line interface and is tiny enough that it can also be embedded in other application
which make use of it only through a C API. For instance, Firefox, the web-browser, makes use of
SQLite. There is also the very classical Berkeley DB API. The O'Reilly book for Lex and Yacc
contains a parser for the SQL language, so we will develop our own language for YoctoDB.
To build your project, the grader will unzip your Hw2.zip and switch into that directory. The grader
will then do a "make all". The all target should depend on two subtargets: commandline and apitest.
The commandline target should build a commandline interface program for YoctoDB. This should be runnable
by the grader with a line like:
./yoctodb
This program should then display the yoctodb prompt:
yoctodb>
At which point the grader can enter a YoctoDB command, see its output, get a new prompt, enter a new command, see its output and continue doing this for as long as desired.
The apitest target should be runnable by the grader by typing a line like:
./apitest
The program should test the functions of your C API to YoctoDB. To use the C API, a programmer should
only need to include the header yoctodb.h . Your api can have as many functions as you like; however,
it must support at least the following functions:
int yoctoInit(int *instance);
int yoctoLock(int instance, int *lock);
int yoctoUnlock(int instance, int *lock);
int yoctoExec(int instance, int lock, char *cmd, int *responseSize);
int yoctoResult(int instance, int lock, char *result, int numBytes);
int yoctoDestroy(int instance);
Each of the functions above return 0 if they worked successfully
and -1 otherwise. In order to understand what these functions do, we first briefly describe what
a database is. For us, a database instance will be a collection of tables. A table
is a collection of rows. Each row, for us, will be a tuple of strings. For example
one might have the table $person(fname, lname, age) and an example might be
("Chris", "Pollett", "21+"). The goal of a database management system such as
YoctoDB is to make it easy to manage databases.
We want the YoctoDB to be fast and
so we want as much as possible to keep things in memory rather than on disk. We might
use a yoctoExec call to read in a database into memory from disk, but thereafter, if
we execute further yoctoExec command on this database, we don't want to have to re-read it
in. So we use global data structures to store our databases. To fix
a specific representation for these data yoctoDB manages, let's assume you use a vector array of databases, each database
being a hash table of database tables, and each table being a linked-list of rows. Try to
keep your design flexible. Prefix the given global variables by "yocto" to avoid namespace
collisions.
yoctoInit tries to create a new empty database in the vector array of databases.
If it is successful then the value the variable instance points to will be set to the index of this new
database in the vector array.
yoctoDestroy is used to delete the database instance making
sure to free up any memory allocated for its tables.
Now in the real world one might be trying
to use yoctoDB in a threaded environment. We want to avoid situations where weird interleaving
of operations from two yoctoDB using threads leave a database in an inconsistent state. To avoid
this yoctoDB will support a crude database level locking: In order to use the database at all, you
must have the lock for it.
yoctoLock checks if the value of the yoctoDatabase[instance].lock is 0.
If it is not 0 the function returns 1. If it is 0, yoctoLock generates a random number less than some MAX_LOCK_NAME.
It sets the value of yoctoDatabase[instance].lock to this number and also sets this as the value stored where lock pointer
points.
yoctoUnlock checks if the value stored at lock is yoctoDatabase[instance].lock. If it isn't
then yoctoUnlock returns with an error. Otherwise, yoctoUnlock sets yoctoDatabase[instance].lock = 0 .
Now in some of the discussion above I glossed over some of the checking that will be needed by the given function.
So for yoctoExec I will say things in a little more detail.
yoctoExec checks that instance is less than
the yoctoDatabase array size, that yoctoDatabase[instance] is not NULL, that yoctoDatabase[instance].lock is not 0
and that yoctoDatabase[instance].lock is equal to the value to which lock points. If any one of these conditions
fails, then the function returns -1. Otherwise, the command pointed to by cmd is executed on yoctoDatabase[instance].
The number of bytes needed for the response that YoctoDB gives is set as the value to which responseSize points.
yoctoResult performs the same checks on instance and lock that yoctoExec does. If these succeed, then
yoctoResult returns the first numByte many characters of the response from the last command issued by yoctoExec.
Probably, the easiest way to create the command line interface is to first write the C API above and then use it
to build the command line program. We assume that the command line program when run sets up a blank database
using yoctoInit, and then the user issues commands on this database.
There are two possible command types that can be sent to yoctoExec: control instructions and assignment instructions.
A ';' is used to indicate the end of either kind of command. If the yoctoExec interpreter begins parsing a command but gets to the end of cmd without seeing a semicolon to terminate it then yoctoExec should return an error. For all commands whitespace is ignored.
Here are the possible control instructions and their function:
\import filename; -- reads the file filename executing each instruction in it in turn on the current database.
\export filename; -- writes to the file filename commands to create each of the tables in the current database.
\list; -- prints a list of the names of all the tables in the current database.
\rows tablename; -- prints tablename newline, then prints out each row contained in tablename. By convention, we assume the first row of the table contains column headings for each of the columns of the table.
\forget tablename; -- deletes tablename from the current database.
\quit; -- causes yoctoExec to return the value 1. If the command line interface program receives this value it knows it should terminate.
Assignment instruction make use of table names, strings, numbers, and variables. A table name is a dollar sign followed by 1 or more alphanumeric characters. For example, $0, $person, $employee122, etc. A string begins with a double quote and continue
until the next un-escaped double quote. An escaped double quote looks like \". A number consists of 1 or more of the digits 0-9. A variable consists of a capital letter followed by an optional alphanumeric string. For example, X, HiThere, Bo2b.
An assignment instruction has a left hand side followed by "=" followed by a right hand side. The left hand side of an assignment, ignoring whitespace, is always a table name. If this table name already exists, then the instruction fails. The semantics of an assignment instruction is always to create a new table of the left hand side table name using the contents obtained from evaluating the right hand side of the assignment. There are four legal right hand side types:
-
A list of n-tuples. Here an n-tuple consists of an open parenthesis followed by n comma separated strings, followed by a close parenthesis. An example of an assignment using this case is:
$employee = ("First Name", "Last Name", "Age") ("Chris", "Pollett", "21+");
- A table name followed by an operation followed by another table name. An operation can be one of -, +, *. "-" means return all the rows in the first table that are not in the second table, "+" means return the union of the rows of the two tables, "*" means table the cartesian product of the rows in the two tables. An example of an assignment using this case is:
$mytable = $table1 + $table2;
- A table name followed by an open square bracket followed by a comma separated list of numbers followed by a close square bracket.
This purpose of this operation is to allow one to create a new table by projecting columns out of an old table. To calculate this right hand side one cycles through each row of table name, outputting a new row whose first column is the column from the original row given by the first number in the list between square bracket; whose second column is the column from the original row given by the second number that appeared between square brackets, and so on. For example, after the instructions
$employee = ("First Name", "Age", "Last Name") ("Chris", "21+", "Pollett");
$fullName = $employee[3, 1];
$fullname would be the table ("Last Name", "First Name")("Pollett", "Chris").
-
A table name followed by an open parenthesis followed by a comma separated list of variables or strings followed by a close parenthesis. The purpose of this operation is to select rows from the table that match the pattern of variables and strings. To calculate this, one cycles through each row of table name. For a given row r, one checks each column. If in the right hand side expression
that column was a string s, we check if that column in r also has value s. If not, we skip to the next row. If in the right hand side expression that column was a variable, then if we haven't bound that variable we assign it the value (for the rest of the row only) the value given by that column in r. If we have bound that variable we check if the value matches the value of that column in r. If not, we skip to the next row. If, on the other hand, all columns in r match the right hand side expression pattern, then r is put into the output.
For example, after:
$setup =
("a","b",c","d", "e") ("a","b","d","a","a") ("a","b","b","a","a")
("a","b","b","a","b") ("c","d","d","c","a");
$test = $setup(X, Y, Y, X, "a");
$test would be the table ("a","b","b","a","a") ("c","d","d","c","a").
This completes the description of the YoctoDB system. You should use either Lex or Flex to recognize tokens. You should
use either Yacc or Bison or write a recursive descent parser to handle parsing. In either case, you should create a file
grammar.txt containing a grammar that could be used to recognize legal YoctoDB commands. The checking that all tuples
in a list of n-tuple assignment are in fact n-tuples can't be done only using your grammar, your parse has to verify this
itself, so you don't worry about this in the CFG in grammar.txt.
Point Breakdown
Departmental C++ coding guidelines for your C program followed |
1pt
|
grammar.txt is as described |
1pt
|
Makefile has targets as described |
1pt
|
yoctodb.h as described and apitest, tests each of these functions. |
2pts
|
From the yoctoDB command line, the grader was able to verify each of the
6 control instructions and 4 types of assignment intructions worked (1/2 point each). |
5pts
|
Total | 10pts |
|