Introduction

Last Monday, we went over the syllabus for the class and then started a brief intro lecture on databases.
We said a database was a collection of related data, where data is a collection of known facts with some explicit meaning.
We said a database represents some facet of the real world (sometimes called the mini-world) and is designed for an intended purpose.
For example, one might have a database concerning the employees of a company. The database serves as an easily searched memory of certain characteristics of those employees.
We said that when changes happen in the real world, they should be reflected in the database. When employees are hired or fired, when employees get raises, etc., these events should be recorded in the database.
We said a database management system (DBMS) is used to create, provide access to, and maintain a database.
We then gave a concrete example of a student-courses database.
Today, we start by briefly describing some characteristics of databases before giving a short history of how database systems developed...

Characteristics of Database Systems

A database system is self-describing -- a DBMS uses a database (often called the catalog or data dictionary) to keep track of what databases are stored in it, what tables they have, etc.
A database system provides insulation between programs and data, so that changing how the data is stored need not force changes to the programs that use it. It supports data abstraction (a conceptual representation of the data).
A database system supports multiple views of the data.
Often a database system allows data to be shared among many users and supports multi-user transaction processing.

People Involved with Database Systems

Database Administrators (DBAs)
Database Designers
End users
- Casual end users (managers)
- Naive or parametric users (data entry)
- Sophisticated end users (engineers, scientists)
- Stand-alone users
Workers behind the scenes
- DBMS designers
- Tool developers
- Operators and maintenance personnel

In-Class Exercise

Suppose you need to efficiently maintain a list of Employee records on your computer's filesystem using only Java.
A record needs to hold: (Employee ID, First Name, Last Name, Date of Birth, Address).
You need to be able to support the following operations:
1. Add a new record
2. Look up an existing record
3. Show all records in sorted order
Suggest how you would implement a class to hold a single record.
What classes and methods would you use to read and write records, and how would you serialize and deserialize a record?
Finally, give a high-level description of how you could implement the list of records on the filesystem so as to be able to do the three operations above.
I am not expecting complete code, just English descriptions that might involve Java class names, methods, etc.
Please post your solution to the Aug 23 In-Class Exercise Thread.
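For reference, here is one possible sketch of an answer. It is not the only reasonable design, and its choices are assumptions of this sketch rather than requirements of the exercise: the class names `Employee` and `EmployeeFile` are hypothetical; records are appended to a single binary file through `java.io.RandomAccessFile`, whose `writeInt`/`writeUTF`/`writeLong` and matching `read...` methods handle serialization and deserialization; the date of birth is stored as an epoch-day `long`; and an in-memory `TreeMap` from employee ID to byte offset serves as the index. Because a `TreeMap` iterates its keys in ascending order, the same map answers both look-ups and the sorted listing. It uses a Java 16+ `record`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** One employee record (hypothetical field names chosen for this sketch). */
record Employee(int id, String firstName, String lastName,
                LocalDate dateOfBirth, String address) {

    /** Serialize: write the fields in a fixed order; strings as modified UTF-8. */
    void writeTo(RandomAccessFile out) throws IOException {
        out.writeInt(id);
        out.writeUTF(firstName);
        out.writeUTF(lastName);
        out.writeLong(dateOfBirth.toEpochDay());
        out.writeUTF(address);
    }

    /** Deserialize: read the fields back in exactly the order writeTo wrote them. */
    static Employee readFrom(RandomAccessFile in) throws IOException {
        int id = in.readInt();
        String first = in.readUTF();
        String last = in.readUTF();
        LocalDate dob = LocalDate.ofEpochDay(in.readLong());
        String address = in.readUTF();
        return new Employee(id, first, last, dob, address);
    }
}

/** Hypothetical store: append-only record file plus an in-memory id -> offset index. */
final class EmployeeFile implements AutoCloseable {
    private final RandomAccessFile file;
    // TreeMap keeps ids sorted, so walking it yields records in id order.
    private final TreeMap<Integer, Long> index = new TreeMap<>();

    EmployeeFile(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        try {
            // Rebuild the index by scanning every record already on disk.
            while (file.getFilePointer() < file.length()) {
                long offset = file.getFilePointer();
                index.put(Employee.readFrom(file).id(), offset);
            }
        } catch (IOException ex) {
            file.close(); // don't leak the handle if the file is corrupt
            throw ex;
        }
    }

    /** 1. Add: append at the end of the file and remember the offset. */
    void add(Employee e) throws IOException {
        if (index.containsKey(e.id())) {
            throw new IllegalArgumentException("duplicate id " + e.id());
        }
        long offset = file.length();
        file.seek(offset);
        e.writeTo(file);
        index.put(e.id(), offset);
    }

    /** 2. Look up: one index probe, then one seek and one record read. */
    Optional<Employee> find(int id) throws IOException {
        Long offset = index.get(id);
        if (offset == null) {
            return Optional.empty();
        }
        file.seek(offset);
        return Optional.of(Employee.readFrom(file));
    }

    /** 3. Show all: visit offsets in ascending id order. */
    List<Employee> allSortedById() throws IOException {
        List<Employee> result = new ArrayList<>();
        for (long offset : index.values()) {
            file.seek(offset);
            result.add(Employee.readFrom(file));
        }
        return result;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

This trades memory for speed: a look-up costs one map probe plus one seek, but the whole index must fit in RAM and is rebuilt by scanning the file each time it is opened. Note also that `writeUTF` limits each string to 65,535 encoded bytes.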

A Short History of Database Systems

The first important applications of DBMSs were ones where data was composed of many small items, and many queries or manipulations of these items were made.
For example:
1. Banking systems: maintaining accounts and making sure that system failures did not cause money to disappear.
2. Airline reservation systems: these also need to make sure data won't be lost, and they must accept very large volumes of small actions by customers.
3. Corporate Record Keeping: employment and tax records, inventories, sales records, and a great variety of other types of information.
Early systems required programmers to visualize data much as it was stored, either in tree-like hierarchies (hierarchical model) or in simple directed acyclic graphs (network model).
This limited the kinds of queries that could be performed on the data.

The foundations of the relational model appeared in a paper by Ted Codd in 1970.
Codd proposed that a DBMS should present the user with a view of data organized as tables called relations.
This view abstracted away the low-level details of how the records in the relations (tables) are stored.
Behind that view, storage might include data structures, such as indexes, that allow rapid responses to a variety of queries.
Queries could be expressed in a high-level language, which greatly increased the efficiency of database programmers.
Codd's paper motivated the development of several early systems, such as IBM's System R, Berkeley's Ingres, and later Oracle, and by the 1990s most DBMSs were based on the relational model.
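To make the separation between the logical view and the physical storage more concrete, here is a small Java sketch. The names `EmployeeRow`, `EmployeeTable`, and `whereDept` are hypothetical, and a real relational DBMS would accept the question in a high-level language such as SQL rather than as a Java method call. The caller only asks for the rows with a given department; whether the class answers by scanning every row or by consulting a hash index is an internal storage choice, and both plans return the same rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One tuple of a hypothetical Employee(name, dept) relation. */
record EmployeeRow(String name, String dept) {}

/**
 * Callers see a table of rows and ask questions about it. How the rows are
 * physically organized (plain list, or list plus hash index) is hidden.
 */
final class EmployeeTable {
    private final List<EmployeeRow> rows = new ArrayList<>();
    private final Map<String, List<EmployeeRow>> deptIndex = new HashMap<>();
    private final boolean useIndex;

    EmployeeTable(boolean useIndex) {
        this.useIndex = useIndex;
    }

    void insert(EmployeeRow row) {
        rows.add(row);
        if (useIndex) {
            // Keep the index in step with the table on every insert.
            deptIndex.computeIfAbsent(row.dept(), d -> new ArrayList<>()).add(row);
        }
    }

    /** Logically: SELECT * FROM Employee WHERE dept = ? */
    List<EmployeeRow> whereDept(String dept) {
        if (useIndex) {
            // Physical plan A: jump straight to the matching rows via the index.
            return List.copyOf(deptIndex.getOrDefault(dept, List.of()));
        }
        // Physical plan B: full scan, testing every row.
        List<EmployeeRow> result = new ArrayList<>();
        for (EmployeeRow row : rows) {
            if (row.dept().equals(dept)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Flipping `useIndex` changes how the answer is computed, not what the answer is, which is the kind of independence from storage details described above.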

Originally, DBMSs were large, expensive systems running on large computers.
Over time, the memory and processing power needed to run a DBMS became available on cheaper and cheaper machines.
Today, all but the cheapest cell phones, all modern browsers, etc., run some form of database, usually SQLite, a relational DBMS first released in 2000.
Apps use it to keep track of persistent information such as user preferences.
Over the same period, the quantity of semi-structured data has grown. Semi-structured data is data that is only partially organized into fields and attributes, which may be nested; examples include web data in HTML, JSON, or XML.
During this period, many relational systems also added native support for manipulating and querying this kind of data.

Another trend of the last several decades is growth in the quantity of data that systems store.
In the 1980s and 1990s, a gigabyte (`10^9` bytes) of DBMS data was considered a lot. Now systems with terabytes (`10^12` bytes) of data are common, and large corporate systems might need to deal with petabytes (`10^15` bytes) or even exabytes (`10^18` bytes) of data.
For example:
1. A billion web pages can safely be stored in less than 20 terabytes.
2. Google stores between 10 and 20 billion web pages and needs petabytes of storage for the indices used to query this data.
3. Satellites send down petabytes of information daily into specialized databases for geographical information.
4. Picture sites such as Flickr and 4chan store millions of pictures, each of several thousand bytes. Even Amazon has databases of over a million pictures.
5. An hour of video takes around a gigabyte. YouTube holds about 1.2 billion such videos and needs to keep them all available for viewing.
6. Peer-to-peer systems and content delivery networks use large networks of conventional computers to store and distribute data of various kinds.
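As a rough check on the scales in examples 1 and 5, using the powers of ten given earlier: example 1 amounts to assuming an average web page takes under about 20 KB, and if every YouTube video were about an hour (roughly `10^9` bytes) long, the collection would be on the order of an exabyte. These are back-of-the-envelope estimates under those assumptions, not measured figures.

```latex
\frac{20 \times 10^{12}\ \text{bytes}}{10^{9}\ \text{pages}}
  = 2 \times 10^{4}\ \text{bytes/page} \approx 20\ \text{KB/page}

1.2 \times 10^{9}\ \text{videos} \times 10^{9}\ \text{bytes/video}
  = 1.2 \times 10^{18}\ \text{bytes} \approx 1.2\ \text{EB}
```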
Often the systems above are so large that it is more efficient to handle the data they contain with modifications of the basic relational model.
This led to the development of various NoSQL databases.