To protect against media failures, we want to keep an archive.
That is, we want to maintain a copy of the database separate from the database itself.
There are two levels of archiving:
A full dump, in which the entire database is copied.
An incremental dump, in which only those database elements changed
since the previous full or incremental dump are copied.
Archiving is different from just backing up the log file, since if we did the latter,
then over time we would need to store way too much data.
They are connected though. To restore, we will generally use the most recent full
archive together with subsequent incremental dumps and the log file since the last archive.
Non-quiescent Archiving
As we cannot shut the database down while archiving,
a non-quiescent dump tries to capture the database as it was
at the start of the dump even though transactions continue to be processed.
We assume redo or undo/redo logging is being used.
The steps to do an archive are:
Write a `langle START\ DUMP rangle` record.
Perform a check point.
Perform a full or incremental dump of the data disks as desired,
copying the data in some fixed order, so don't copy the same info twice and eventually finish copying.
Copy enough of the log that it includes at least the prefix of the log up to an including
the end of item (2).
Write a log record `langle END\ DUMP rangle`. We can now throw away old log from
last archive to previous checkpoint.
Recovery Using an Archive and a Log
To restore the database from the archive involves:
Finding the most recent full dump and reconstruct the database from it.
If there are later incremental dumps, modifying the database according to each, earliest first.
Modifying the database using the surviving log. This in turn involves,
using the method of recovery suited to the type of logging being used.
Quiz
Which of the following statements is true?
If `langle T,X,v rangle` is an undo log record, then `v` is the new value for `X`
`langle T,X,v rangle` is a redo log record, then `v` is the new value for `X`.
The second step of nonquiescent, Undo/Redo checkpointing is to wait until all the transaction that were active
at the start of the checkpoint commit or abort.
Concurrency Control
Interactions between transactions can cause the database to become inconsistent even
when the transactions individually preserve the correctness of the database state.
This is because transactions can interleave their actions.
The job of figuring out which operation of which transaction is performed next
is done by the database scheduler.
The process of ensuring in such a concurrent set up that the database
stays in a consistent state is called concurrency control.
We are now going to study conditions which guarantee database consistency.
Schedules
A schedule is a time-ordered sequence of the important actions taken
by one or more transactions.
We are interested in reads and write and not in outputs.
A schedule is said to be a serial schedule if all of its actions consist of all
the actions of one transaction, followed by all the actions of another transaction,
etc. without interleaving of transaction operations.
The example of the last slide was a serial schedule.
If each transaction maps the database from a consistent state to a consistent state,
then a serial schedule will map the database from a consistent state to a consistent state.
Serializable Schedules
Serial schedules don't allow two transactions to be working on the DB at the same time.
So we want a better notion of a good schedule so that we can get better concurrency.
A serializable schedule is a schedule whose effect on the database is the same as some serial schedule.
has the same effect on the database as the schedule a couple slides back.
Usually, we don't record the local variables of a transaction when we write our
schedules to keep them simple. i.e., we write `W(A)` for `W(A,t)` and we wouldn't write actions like `t:=t+100`.
Conflict Serializable
Serializable is still too general, and it is hard to ensure a schedule is serializable.
We will next look at a weaker notion, which will imply serializable, allows some concurrency,
but maybe allows less concurrency than serializability.
We say a pair of operations `O_1,...,O_2` in a schedule from two different transactions `S` and `T` do not conflict if any of the following hold:
They are both reads.
They are `R_S(X)` and `W_T(Y)` and `X` is not equal to `Y`.
They are `W_S(X)` and `R_T(Y)` and `X` is not equal to `Y`.
They are `W_S(X)` and `W_T(Y)` and and `X` is not equal to `Y`.
Otherwise, the two actions are said to conflict.
Two schedules are conflict equivalent if they can be turned into each
other by swapping non-conflicting transactions.
A transaction is conflict-serializable if it is conflict equivalent to a serial schedule.
Testing for Conflict Serializability
From our definition of conflicting operation, two operations from different
transactions are in conflict if they involve the same database element and one of the operations is a write.
We can use conflicts to make a graph based on a schedule.
We put an edge `T_i -> T_j` in the graph, if there is an operation
`O_i` of `T_i` in the schedule which is in conflict with operation `O_j` of `T_j` in the schedule
and `O_i` occurs before `T_j`.
The schedule will be conflict serializable iff this precedence graph is acyclic.
If it the graph has a topological ordering, the schedule given by this ordering will be conflict
equivalent to the original schedule. This topologically ordered schedule will be a serial one.
To topologically order the graph, we repeatedly take out of the graph those elements
that have no predecessors and put them in our topological order.
As it has a cycle, so it is not conflict serializable.
If we deleted the operation, `W_1(Y)`, it would have been conflict serializable.
Serializable versus Conflict Serializable
Conflict serializable implies serializable.
However, serializable does not imply conflict serializable.
For example,
`W_1(Y)`, `W_1(X)`, `W_2(Y)`, `W_2(X)`, `W_3(X)`
is a serial schedule which has the same effect on the database as
`W_1(Y)`, `W_2(Y)`, `W_2(X)`, `W_1(X)`, `W_3(X)`,
but this latter is not conflict serializable.