Checkpointing

So far the recovery techniques we've considered require that the entire log file be looked at.
Even if a transaction has written a COMMIT to the log one might still need to look at its operations during a recovery because several other transaction might be running at the same time.
To simplify the issue we can checkpoint the log periodically.
This involves doing the following:
1. Stop accepting new transactions
2. Wait until all current transactions commit or abort and have written the COMMIT or ABORT to the log.
3. Flush the log to disk.
4. Write a log record `langle CKPT rangle` and flush the log again.
5. Resume accepting transactions.
If checkpoints are used then there is no need to look in the log file prior to the last checkpoint.

Nonquiescent Checkpointing

The problem with checkpointing is that we effectively stop the system until all current operations have committed or aborted.
Nonquiescent checkpointing is a technique that avoids this bottleneck. The steps to do this are to:
1. Write a record `langle START\ CKPT(T_1,...,T_k) rangle` and flush log. Here `T_1,..., T_k` are the active transactions.
2. Wait until all of `T_1, ..., T_k` commit or abort, but allow new transactions to start.
3. When all of `T_1,..., T_k` have written COMMIT or ABORT then write `langle END\ CKPT rangle` to the log.
If this kind of checkpoint is being used and a crash occurs, then we look backwards through the log for the first `langle START\ CKPT(T_1,...,T_k) rangle` or `langle END\ CKPT rangle`.
If we see an `langle END\ CKPT rangle`, we know we only have to consider after the `langle START\ CKPT(T_1,...,T_k) rangle`.
If we see a `langle START\ CKPT(T_1,...,T_k) rangle` but no `langle END\ CKPT rangle`, then we need to only consider after the transactions `T_1, ..., T_k` began.

In-Class Exercise

`I_i`, `O_i`, `R_i`, `W_i`, etc, indicate that transaction `i` did the input, output, read, write, etc.
Consider the sequence of operations and log records:
`langle START\ T_1 rangle`, `I_1(A)`, `R_1(A,t_1)`, `t_1:=t_1*2`, `langle START\ T_2 rangle`,
`langle T_1, A, 4 rangle`, `W_1(A,t_1)`, `langle START\ T_3 rangle`, `I_2(B)`, `R_2(B,t_2)`, `t_2:= t_2*2`,
`langle T_2, B, 12 rangle`, `W_2(B,t)`, `langle T_3, C, 1 rangle`, `W_3(C,5)`, `O_1(A)`, `langle COMMIT\ T_1 rangle`, `O_2(B)`.
Suppose we started a non-quiescent checkpoint right after the `R_1(A,t_1)`, what transactions would be listed in the start checkpoint?
When would be the earliest that an end checkpoint could be written?
If an end checkpoint was written at that location, and if a crash occurred immediately after the operations above, what would be undone during recovery?
Which of these undo operations would actually affect something on disk?
Post your solutions to the Apr 4 In-Class Exercise Thread.

Redo Logging

A problem with undo logging is that we cannot commit a transaction without first writing all its changed data to disk. This might cost us I/Os.
This requirement can be avoided using redo logging.
The principle differences between undo and redo logging are:
1. Undo logging cancels the effects of incomplete transactions and ignores committed ones; redo logging ignores incomplete transactions and redoes committed ones as necessary.
2. Undo logging requires changed DB elements to be written to disk prior to the commit log record being written to disk; redo logging requires the COMMIT appear on disk before any changed values reach the disk.
3. We had rules `U1` and `U2` for undo logging to guarantee undo logging worked. We will have a redo log rule `R1` which replaces these two rules.

The Redo Logging Rule

(R1) [Also called the write-ahead logging (WAL) rule] Before modifying X on disk, all log records pertaining to this modification of `X` (that is, both the update record `langle T, X, v rangle` and the `langle COMMIT\ T rangle` record) must appear on disk.
Here `v` is now the new value not the old value as in undo logging.
Redo logging gives rise to the following order for writing things to disk:
1. Log records of changed DB elements.
2. A Commit record.
3. The Changed DB elements themselves

Recovery with Redo Logging

To recover when using redo logging, we:
1. Identify the committed transactions.
2. Scan the log forward from the beginning. For each log record `langle T, X, v rangle` encountered:
  1. If `T` is not a committed transaction, do nothing.
  2. If `T` committed, write the value for database element `X`.
3. For each incomplete transaction `T`, write an `langle ABORT\ T rangle` record to the log and flush the log.

Checkpointing a Redo Log

Checkpointing can also be done with redo logs. The steps are:
1. Write a log record `langle START\ CKPT(T_1,...,T_k) rangle`, where `T_1,..., T_k` are the active transactions.
2. Write to disk all database elements that were written to buffer for transactions that had committed when the START CKPT began.
3. Write an `langle END\ CKPT rangle` record.
To recover with a checkpointed redo log, look for last record of type `langle END\ CKPT rangle`.
We only need to redo `T_i`'s or transaction committed after the corresponding `langle START\ CKPT(T_1,...,T_k) rangle`.

Undo/Redo Logging

Both undo and redo logging have disadvantages.
An example disadvantage of redo logging is that all modified blocks must be kept in buffers until the transaction commits and the log records have been flushed.
Both cause problems if one has a block in memory that was modified by a transaction that is ongoing and modified by a transaction that has committed.
We can try to combine the best of both worlds and use undo/redo logging.
In this set up we have the following new rules:
- (UR1) Before modifying `X` on disk because of `T`, it is necessary that the update record `langle T,X,v,w rangle` appear on disk. (`v` is the old value; `w` is the new)
- (UR2) A `langle COMMIT\ T rangle` must be flushed to disk as soon as it appears in the log.
To recover when using undo/redo logging, we:
1. Redo all the committed transactions in the order earliest first, and
2. Undo all the incomplete transactions in the order latest first.

Checkpointing an Undo/Redo Log

The steps this time are:
1. Write a `langle START\ CKPT(T_1, ..., T_k) rangle` to the log
2. Write all dirty buffers to disk.
3. Write an `langle END\ CKPT rangle`, flush the log.
To recover we can look for the last `langle END\ CKPT rangle` in the log and find its corresponding `langle START\ CKPT(T_1, ..., T_k) rangle`.
We know transactions that have committed prior to this have had all their data on disk, so don't need to be redone.
So we only need to redo committed transaction after `langle START\ CKPT(T_1, ..., T_k) rangle`.
Similarly, we only have to undo uncommitted transactions up to the `langle START\ CKPT(T_1, ..., T_k) rangle`.
We require a transaction not write any values even to memory buffers until it is certain whether or not to abort.

Logging in DBMS systems

Much of the recovery techniques currently in use in DBMS, date to IBMs System R (Gray, et al 1981) and to ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) (Mohan et al 1992) also developed at IBM.
System R did not use logging: A database was treated as a collection of pages accessed through a page table.
When a transaction changes a page, it first copies the pages to a shadow page, and then makes changes to the shadow page.
After this is done, the transaction modifies the page table to change the old page for the shadow page. When this happens the operation is viewed as "committing".
To abort a transaction its shadow pages are discarded.
This approach can lead to data fragmentation, and so was eventuallly superseded by logging.
What ARIES does is essentially what we called Undo/Redo logging with checkpointing.
It also keeps track of a dirty page table and a transaction table.
The transaction table keeps track of all the transactions that are currently running and the dirty page table keeps track of all pages that have been modified but not written to disk (makes it easier to do checkpointing).
To make it easier to scan through the logs, records have the format: `langle` Sequence Number, Transaction ID, Page ID, Redo, Undo, Previous Sequence Number `rangle`. Here Sequence number refers to log entry, and the previous sequence number gives the last entry involving the same transaction.
ARIES also adds one twist to the recovery process it also writes Compensation Log Records to record when an operation has already been undone. They have the form `langle` Sequence Number, Transaction ID, Page ID, Redo, Previous Sequence Number, Next Undo Sequence Number `rangle` and make it faster to keep going if there is a crash during a recovery.
ARIES is used by DB2, SQL Server, and InnoDB in MySQL/MariaDB. Oracle uses a variation on undo/redo logging.
Postgres uses Write Ahead Logging (i.e., redo-based recovery).
The Sqlite database command PRAGMA journal_mode = value; can be used to set the kind of logging Sqlite does. By default, its value is delete which roughly correspond to undo logging. It also support WAL.

Checkpointing, Redo, Undo/Redo Logging

Outline

Checkpointing

Nonquiescent Checkpointing

In-Class Exercise

Redo Logging

The Redo Logging Rule

Recovery with Redo Logging

Checkpointing a Redo Log

Undo/Redo Logging

Checkpointing an Undo/Redo Log

Logging in DBMS systems