Chris Pollett >
Students > [Bio] [Del1] [Del2] [Del3] |
Adding new server Functionality to postgresDescription: This project implements the XML Parser & the functionality to insert parsed data into a PostgreSQL DB using a C Based UDF. The file that is parsed and data for which is inserted is test.xml To compile, $ make clean; make To test, do the following: $ psql testdb psql=> \i proj1.sql This will display the parsed & inserted XML data. This deliverable serves the purpose of adding a new functionality to the already existing postgres database. The newly added functionality enables us to extract information (first name, last name, street address, state and country) from an xml file (provided as input to the user. Information from the xml file is extracted and shredded into the database. Example: Input : ------- <address-book> <entry> <person> <first>Basil</first> <last>Elton</last> </person> <street>North Point Lighthouse</street> <state>CA</state> <country>USA</country> </entry> <entry> <person> <first>Elton</first> <last>John</last> </person> <street>Monroe Street</street> <state>NY</state> <country>USA</country> </entry> <entry> <person> <first>John</first> <last>Grisham</last> </person> <street>bailey Street</street> <state>England</state> <country>UK</country> </entry> <entry> <person> <first>John R.</first> <last>Legrasse</last> </person> <street>121 Bienville St.</street> <state>LA</state> <country>USA</country> </entry> </address-book> Output: -------- $ psql testdb Welcome to psql 8.1.4, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit testdb=# \i proj1.sql DROP FUNCTION CREATE FUNCTION DROP TABLE CREATE TABLE INSERT 0 4 fname | lname | saddr | state | cntry ---------+----------+------------------------+---------+------- Basil | Elton | North Point Lighthouse | CA | USA Elton | John | Monroe Street | NY | USA John | Grisham | bailey Street | England | UK John R. | Legrasse | 121 Bienville St. | LA | USA (4 rows) As observed in the above example, the data (addresses) from the XML file have been shredded and inserted to the table. The information being observed above is the output of the Select statement from the table tblP. The below code consists of 2 important modules, proj1.sql and proj1.c proj1.sql is used to perform database related activities, for eg : Creation of tables, functions and insertion of data into the same. Most of the functionality is on proj1.c ; this c program parses the xml document, stores data into a binary tree structure. later data is extracted from this binary tree and passed to the calling sql function, which inserts the same into the database. Interaction between the .c file and .sql file is performed via shared library (created from the .c file). Makefile is the file which consists of instructions as to how to compile the .c file and create a shared library from the same. This shared library is used be the .sql file to obtain the functionality which is present in the .c file. -------------- proj1.c -------------- #include <stdio.h> #include "xmlparse.h" #include "xml_node.h" #include <string.h> #include <ctype.h> struct xml_node* curr_node = NULL; struct xml_node* head_node = NULL; struct addr_rec* rec_list = NULL; struct addr_rec* head_rec_list = NULL; #define MAX_STR_LEN 100 void delete_xml_tree(struct xml_node* head_node); void delete_addr_rec(struct addr_rec* rec_list); //Uncomment the if 0 below & the corresponding #elsif for // testing the XML driver parser. //#if 0 #include <postgres.h> #include <string.h> #include <fmgr.h> #include <funcapi.h> PG_FUNCTION_INFO_V1(get_xml_data); PG_FUNCTION_INFO_V1(str_example); Datum get_xml_data(PG_FUNCTION_ARGS) { FuncCallContext *f_ctx; MemoryContext oldC; TupleDesc tup_desc; AttInMetadata *attinmeta; int call_cntr; int max_calls; struct addr_rec* xml_addr = NULL; text* t; int32 fsize; if (SRF_IS_FIRSTCALL()) { f_ctx = SRF_FIRSTCALL_INIT(); oldC = MemoryContextSwitchTo(f_ctx->multi_call_memory_ctx); f_ctx->max_calls = PG_GETARG_UINT32(0); t = PG_GETARG_TEXT_P(1); fsize = VARSIZE(t) - VARHDRSZ; create_xml_tree((char *)VARDATA(t), fsize); if (get_call_result_type(fcinfo, NULL, &tup_desc) != TYPEFUNC_COMPOSITE) { ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("function returning record called" "in contect that cannot accept" "type record"))); } attinmeta = TupleDescGetAttInMetadata(tup_desc); f_ctx->attinmeta = attinmeta; f_ctx->user_fctx = (void *)head_rec_list; MemoryContextSwitchTo(oldC); } f_ctx = SRF_PERCALL_SETUP(); call_cntr = f_ctx->call_cntr; max_calls = f_ctx->max_calls; attinmeta = f_ctx->attinmeta; xml_addr = (struct addr_rec *)f_ctx->user_fctx; if (call_cntr < max_calls) { char **values; HeapTuple tuple; Datum result; values = (char **)palloc(5* sizeof(char *)); values[0] = (char *)palloc(MAX_STR_LEN * sizeof(char)); values[1] = (char *)palloc(MAX_STR_LEN * sizeof(char)); values[2] = (char *)palloc(MAX_STR_LEN * sizeof(char)); values[3] = (char *)palloc(MAX_STR_LEN * sizeof(char)); values[4] = (char *)palloc(MAX_STR_LEN * sizeof(char)); snprintf(values[0], MAX_STR_LEN, xml_addr->fname); snprintf(values[1], MAX_STR_LEN, xml_addr->lname); snprintf(values[2], MAX_STR_LEN, xml_addr->street); snprintf(values[3], MAX_STR_LEN, xml_addr->state); snprintf(values[4], MAX_STR_LEN, xml_addr->country); tuple = BuildTupleFromCStrings(attinmeta, values); result = HeapTupleGetDatum(tuple); pfree(values[0]); pfree(values[1]); pfree(values[2]); pfree(values[3]); pfree(values[4]); pfree(values); f_ctx->user_fctx = (void *)(xml_addr->next); SRF_RETURN_NEXT(f_ctx, result); } else { //delete_xml_tree(head_node); //delete_addr_rec(head_rec_list); SRF_RETURN_DONE(f_ctx); } } //#endif void delete_xml_tree(struct xml_node* head_node) { unsigned int idx = 0; if (head_node == NULL) return; if (head_node->nchilds == 0) { if (head_node->name != NULL) { free(head_node->name); head_node->name = NULL; } if (head_node->value != NULL) { free(head_node->value); head_node->value = NULL; } free(head_node); return; } while (head_node->child[idx] != NULL) { delete_xml_tree(head_node->child[idx]); head_node->nchilds --; idx ++; } } void delete_addr_rec(struct addr_rec* rec_list) { struct addr_rec* next_node = NULL; if (rec_list == NULL) return; next_node = rec_list->next; free(rec_list); rec_list = NULL; delete_addr_rec(next_node); } int is_empty(const char* str, int len) { unsigned int idx = 0; while (idx < len) { if (!isspace(str[idx])) return 0; idx ++; } return 1; } void charHandle(void *userData, const XML_Char *s, int len) { char* my_val; if (is_empty((const char *)s,len)) return; my_val = (char *)malloc((len+1)*sizeof(char)); memcpy(my_val, s, len); my_val[len] = '\0'; curr_node->value = my_val; } struct xml_node* create_node(const char* name, int depth) { unsigned int idx = 0; struct xml_node* new_node; char* my_nm = (char *)malloc((strlen(name) + 1)*sizeof(char)); strncpy(my_nm, name, strlen(name) + 1); my_nm[strlen(name)] = '\0'; new_node = (struct xml_node *)malloc(sizeof(struct xml_node )); memset(new_node, 0, sizeof(struct xml_node)); new_node->depth = depth; new_node->prev = NULL; for (idx = 0; idx < MAX_CHILD ; idx ++) { new_node->child[idx] = NULL; } new_node->name = my_nm; new_node->nchilds = 0; return new_node; } void startElement(void *userData, const char *name, const char **atts) { struct xml_node* tmp_node = NULL; if (curr_node == NULL) { curr_node = create_node(name, 0); head_node = curr_node; } else { tmp_node = create_node(name, (curr_node->depth + 1)); curr_node->child[curr_node->nchilds] = tmp_node; curr_node->nchilds ++; tmp_node->prev = curr_node; curr_node = tmp_node; } } struct addr_rec* create_addr_rec(void) { struct addr_rec* tmp = NULL; tmp = (struct addr_rec *)malloc(sizeof(struct addr_rec)); tmp->fname = NULL; tmp->lname = NULL; tmp->street = NULL; tmp->state = NULL; tmp->country = NULL; tmp->next = NULL; return tmp; } void endElement(void *userData, const char *name) { //printf("DEPTH[%u]:ELEM:[%s]=%s\n", curr_node->depth, curr_node->name, curr_node->value); if (rec_list == NULL) { rec_list = create_addr_rec(); head_rec_list = rec_list; } if (curr_node->nchilds == 0) { char* val = curr_node->value; if (rec_list->fname == NULL) { rec_list->fname = val; } else if (rec_list->lname == NULL) { rec_list->lname = val; } else if (rec_list->street == NULL) { rec_list->street = val; } else if (rec_list->state == NULL) { rec_list->state = val; } else if (rec_list->country == NULL) { rec_list->country = val; } else if (rec_list->next == NULL) { rec_list->next = create_addr_rec(); rec_list = rec_list->next; rec_list->fname = val; } } curr_node = curr_node->prev; } void print_xml_tree(struct xml_node* head_node) { unsigned int idx = 0; if (head_node == NULL) return; if (head_node->name != NULL) { printf("DEPTH = %u NAME = %s", head_node->depth, head_node->name); } if (head_node->value != NULL) { printf("VALUE = %s\n", head_node->value); } if (head_node->nchilds == 0) { return; } printf("\n"); fflush(stdout); for (idx = 0; idx < head_node->nchilds; idx ++) { print_xml_tree(head_node->child[idx]); } } void print_addr_rec(struct addr_rec* ll_node) { if (ll_node == NULL) return; printf("FIRST NAME: %s: " "LAST NAME: %s: " "STREET: %s: " "CITY: %s: " "COUNTRY: %s\n", ll_node->fname, ll_node->lname, ll_node->street, ll_node->state, ll_node->country); fflush(stdout); print_addr_rec(ll_node->next); } void create_xml_tree(char* fname, int size) { char buf[BUFSIZ]; int done; XML_Parser parser = XML_ParserCreate(NULL); XML_SetElementHandler(parser, startElement, endElement); XML_SetCharacterDataHandler(parser, charHandle); fname[size] = '\0'; FILE *fp = fopen(fname, "r"); /* ereport(INFO, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("FILE :%s:", fname))); FILE *fp = fopen("/home/hlthantr/proj_one/test.xml", "r"); */ do { size_t len = fread(buf, 1, sizeof(buf), fp); done = len < sizeof(buf); if (!XML_Parse(parser, buf, len, done)) { fprintf(stderr, "%s at line %d\n", XML_ErrorString(XML_GetErrorCode(parser)), XML_GetCurrentLineNumber(parser)); return; } } while (!done); fclose(fp); XML_ParserFree(parser); return; } ---------------- proj1.sql ---------------- DROP FUNCTION get_xml_data(integer, text); CREATE OR REPLACE FUNCTION get_xml_data(IN integer, IN text, OUT f6 VARCHAR, OUT f7 VARCHAR, OUT f8 VARCHAR, OUT f9 VARCHAR, OUT f10 VARCHAR) RETURNS SETOF record AS '/home/hlthantr/proj_one/libproj1', 'get_xml_data' LANGUAGE C IMMUTABLE STRICT; DROP TABLE tblP; CREATE TABLE tblP(fname VARCHAR, lname VARCHAR, saddr VARCHAR, state VARCHAR, cntry VARCHAR); INSERT INTO tblP VALUES ((get_xml_data(4, '/home/hlthantr/proj_one/test.xml')).f6, (get_xml_data(4, '/home/hlthantr/proj_one/test.xml')).f7, (get_xml_data(4, '/home/hlthantr/proj_one/test.xml')).f8, (get_xml_data(4, '/home/hlthantr/proj_one/test.xml')).f9, (get_xml_data(4, '/home/hlthantr/proj_one/test.xml')).f10 ); SELECT * FROM tblP; -------------- makefile -------------- CC=gcc # If you know what your system's byte order is, define XML_BYTE_ORDER: # use -DXML_BYTE_ORDER=12 for little-endian byte order; # use -DXML_BYTE_ORDER=21 for big-endian (network) byte order. # -DXML_NS adds support for checking of lexical aspects of XML namespaces spec # -DXML_MIN_SIZE makes a smaller but slower parser # -DXML_DTD adds full support for parsing DTDs #PGINC= #PGLIB=. PGINC=/usr/local/pgsql/include/server PGLIB=/usr/local/pgsql/lib CFLAGS=-Wall -O2 -Ixmltok -Ixmlparse -I$(PGINC) -DXML_NS -DXML_DTD -fPIC AR=ar CC=gcc LIBOBJS=xmltok/xmltok.o \ xmltok/xmlrole.o \ xmlparse/xmlparse.o \ proj1.o LIB=libproj1.so all: $(LIB) $(LIB): $(LIBOBJS) $(CC) -shared -o $(LIB) $(LIBOBJS) clean: rm -f $(OBJS) $(LIBOBJS) $(LIB) xmltok/nametab.h: gennmtab/gennmtab$(EXE) rm -f $@ gennmtab/gennmtab$(EXE) >$@ gennmtab/gennmtab$(EXE): gennmtab/gennmtab.c $(CC) $(CFLAGS) -o $@ gennmtab/gennmtab.c xmltok/xmltok.o: xmltok/nametab.h .c.o: $(CC) $(CFLAGS) -c -o $@ $< ------------------------------ Compilation of the .c file ------------------------------ make -f Makefile clean make -f Makefile --------------------------- Execution of proj1.sql --------------------------- psql testdb (Where testdb is the database which had been created) /i proj1.sql |