A relational database is a collection of related tables stored in secondary memory.
The rows of a table represent entities or events in some application domain. The columns of a table represent row attributes.
Assume the University of Northern California (UNC) wants to keep track of its students, courses, and instructors. Here's a simple domain model for UNC:
Corresponding to these classes, the UNC database might contain STUDENT, COURSE, and INSTRUCTOR tables.
The rows of the STUDENT table correspond to students enrolled at the university. The columns correspond to attributes such as student id number (SID), last name, first name, phone number, GPA, etc:
The rows of the INSTRUCTOR table correspond to the UNC instructors:
The rows of the COURSES table correspond to sections of courses taught at UNC:
An attribute with a unique value, such as the student ID number, is called a key. A key can be used in one table to refer to a row in another table. For example, the values in the COURSES.INSTRUCTOR field correspond to the values in the INSTRUCTORS.IID field. When a key from one table appears in another table it is called a foreign key. So INSTRUCTOR.IID and COURSES.CID are keys, while COURSES.INSTRUCTOR is a foreign key.
We can also create a link table that links together the keys from two different tables. For example, to keep track of which courses students are taking, we can create a table called TAKES that associates SIDs to CIDs:
Note that the rows of this table don't contain unique keys. Instead they contain only foreign keys.
The type of a table is called a schema. A schema specifies the names and types of each column of a table.
For example, the STUDENTS schema is:
STUDENTS(
SID: INTEGER {pk},
LNAME: VARCHAR(30),
FNAME: VARCHAR(20),
PHONE: CHAR(10)
GPA: DECIMAL(3, 2))
The ISO SQL standard defines six possible column types:
characer
CHAR(N) // filled with blanks to pad it out to N
VARCHAR(N)
bit
BIT(N)
exact numeric
INTEGER or INT
SMALLINT
DECIMAL(P, S) // P = precision = #digits, S = scale = .#digits
NUMERIC(P, S)
approximate numeric
FLOAT(P)
REAL
DOUBLE PRECISION
datetime
DATE
TIME ...
TIMESTAMP ...
interval
INTERVAL ...
The possible constraints include:
UNIQUE {ck} = candidate key, all values must be unique
PRIMARY KEY {pk} = a unique column containing a candidate key
FOREIGN KEY {fk} = a primary key from another table
NOT NULL {nn} = values for this column can't be null
A relational database management system (RDBMS) is a program that manages a collection of users and their relational databases. Examples of RDBMS include Oracle and MySQL.
Think of an RDBMS as a server with application programs as clients. In a typical interaction an application logs in, then asks the RDBMS to execute an SQL (Structured Query Language) command. These commands may be requests for the RDBMS to search or update the database. In the case of a search, the RDBMS returns all of the data that fit the search criteria:
SQL is an ISO standard language for searching and updating relational databases.
SQL is divided into the Data Definition Language (DDL) and the Data Manipulation Language (DML):
SQL = DDL + DML
DDL provides commands for creating and dropping (i.e., deleting) databases and tables.
DML provides commands for commands for inserting, deleting, and updating rows in a table.
DML also provides the SELECT command for searching the entire database by performing combinations of selections, projections, and joins.
The SELECT command allows users to select all rows from a table that satisfy a specific condition:
SELECT * FROM STUDENTS WHERE GPA > 3.0
The SELECT command allows users to select specific columns from a table:
SELECT LNAME, FNAME FROM STUDENTS
The product of two tables, T1 and T2, is a temporary table denoted T1 x T2. The rows of T1 x T2 consist of every possible row of T1 appended to every possible row of T2. Thus, if T1 contains 3 rows and T2 contains 5 rows, then T1 x T2 will contain 15 rows.
The (inner) join of two tables, T1 and T2, is a temporary table denoted T1 >< T2.
T1 >< T2 is got by selecting from T1 x T2 all rows that satisfy a join condition.
For example, executing the SELECT command:
SELECT TITLE, LNAME FROM COURSES, INSTRUCTORS WHERE COURSES.INSTRUCTOR = INSTRUCTORS.IID
creates the projection of the join table:
Note that the result of executing a SELECT command is a table. The table produced is not necessarily one of the original base tables in the database. For example, the table produced by the previous SELECT command is not among the four base tables: STUDENTS, COURSES, INSTRUCTORS, and TAKES. We call the tables created by executing SELECT commands virtual tables or views.
There is some correspondence between relational databases and object-oriented models, but the object-oriented paradigm is newer hense richer than the relational paradigm, so OO models contain features that don't correspond neatly to features in a relational database:
RDB OOM
table class
row object
attribute attribute
key OID, reference, or pointer
??? method
??? inheritance
??? polymorphism
??? encapsulation
Programmers and programs must spend a lot of time translating between these paradigms. The energy wasted by these translations is called impedance mismatch.
Impedance mismatch can be mitigated in several ways.
Although not part of the ISO SQL standard, there are several widely used libraries that provide programmers with an SQL interface that can talk to any relational database.
Typically, such an interface might provide an implementation of Session and RecordSet interfaces:
Here is a Java elaboration:
interface Session {
boolean connect(
String serverLocation, // URL or IP
address of RDBMS
String userID, String password,
String dbase); // = true if
connection succeeds
void disconnect();
RecordSet execute(String sqlCommand) {
... }
}
A record set is an object representing the virtual table returned by executing a SELECT command. It provides methods for accessing individual entries:
interface RecordSet {
int getNumRows();
int getNumCols();
int getColType(int col);
String getEntry(int row, int col);
}
The most widely used API is ODBC (Open Database Connectivity). This is a language-independent, procedural API.
The platform-independent Java implementation is JDBC (Java Database Connectivity).
The Gateway Pattern is one of several patterns aimed at wrapping the interface to the database into a small number of classes.