Last week, we were looking at concurrency control in databases.
The idea is we have a database scheduler that receives operations from various transactions and tries to determine the order in which to execute them.
We assume that each transaction by itself (if no other transactions are running) maps the database from a consistent state to a consistent state.
Fixing some order on the transactions, and then executing each transaction in turn from start to completion before starting the next, therefore also maps the database from a consistent state to a consistent state. We called this kind of schedule for the operations a serial schedule.
Typically, to maximize the use of time on our DB, we want to allow our transactions' operations to interleave, but we would like their effect on the database to be as if a serial schedule had been executed.
We defined various notions of serializability to achieve this.
We then showed that by using two phase locking (2PL), we could achieve conflict serializable schedules.
We looked at different variations on lock types to try to increase the allowed interleaving of DB operations while still entailing conflict serializable schedules.
We said it was the job of the lock scheduler in the DBMS to handle who gets locks for what and when (not the transaction programmer).
We then looked at the data structure the lock scheduler might use to achieve this.
We begin today by looking at the kinds of database elements which might be locked in one go.
Lock Granularity
We want to be able to have hierarchies of possible locks --
from locks on only one element to a single lock for many elements.
These hierarchies will typically come in the form of trees.
For instance, a table lock would be higher in such a
tree structure than a block-level lock, which is in turn higher than a row lock.
As another example, we might want to be able to lock single index
entries or whole subtrees of a B-tree.
An example of the advantage of having big locks is that, if a transaction
is frequently accessing a table, it is faster to take one table lock than one lock for each access.
Usually, if the DBMS sees you doing multiple reads on a table, it tries to escalate
your shared locks to a coarser granularity (lock escalation).
Table Locks
One way to manage such hierarchies is to introduce new kinds of locks called warning locks (Gray, Putzolu, Traiger 1976),
`IS` and `IX`, in addition to `S` and `X`. The `I` indicates intention to acquire a lock. (The idea is similar to update locks.)
Let's assume we have three levels of locks: table, block, row.
Consider the tree of lockable elements (table, blocks, rows) of some table. The protocol is:
Start at the root of the hierarchy. If we are at the element we want to lock, request an `S` or `X` lock on it.
If not, request an `IS` or `IX` warning lock on the current node, then traverse down the tree toward the element we want, repeating this at each node on the path.
The idea is that if we want an exclusive lock on a row `A`, we first get an `IX` lock on the table containing `A`, so that no one can get a shared lock on
the whole table whose contents we are about to modify. Similarly, we get an `IX` lock on the block `A` belongs to, preventing anyone else from getting a shared lock on
the whole block.
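The path-locking step above can be sketched as follows. This is a minimal illustration, assuming a fixed table -> block -> row hierarchy; the function and node names are made up for the example.

```python
def lock_path(path, mode):
    """Return the (node, lock-mode) requests needed to lock the last
    element of `path` in `mode` ('S' or 'X').  Every ancestor on the
    path gets the corresponding intention lock ('IS' or 'IX')."""
    intent = "IS" if mode == "S" else "IX"
    # Intention locks on all ancestors, the real lock on the target.
    return [(node, intent) for node in path[:-1]] + [(path[-1], mode)]

# Exclusive-lock a row: IX on the table and block, X on the row.
print(lock_path(["movies", "block3", "row7"], "X"))
# → [('movies', 'IX'), ('block3', 'IX'), ('row7', 'X')]
# Shared-lock a whole block: IS on the table, S on the block.
print(lock_path(["movies", "block3"], "S"))
# → [('movies', 'IS'), ('block3', 'S')]
```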
Warning Lock Compatibility Matrix
Here's what the compatibility matrix looks like:
Lock Held \ Lock Requested:   IS    IX    S     X
IS                            Y     Y     Y     N
IX                            Y     Y     N     N
S                             Y     N     Y     N
X                             N     N     N     N
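The matrix can be encoded directly as a lookup table. This is a small sketch; the function name and representation are choices of this example, not a standard API.

```python
# The compatibility matrix above, as a nested dictionary.
# COMPAT[held][requested] is True iff the request can be granted.
COMPAT = {
    "IS": {"IS": True,  "IX": True,  "S": True,  "X": False},
    "IX": {"IS": True,  "IX": True,  "S": False, "X": False},
    "S":  {"IS": True,  "IX": False, "S": True,  "X": False},
    "X":  {"IS": False, "IX": False, "S": False, "X": False},
}

def can_grant(held_modes, requested):
    """A request is granted only if it is compatible with every
    lock currently held on the element by other transactions."""
    return all(COMPAT[h][requested] for h in held_modes)

print(can_grant(["IS", "IX"], "IS"))  # → True
print(can_grant(["S"], "IX"))         # → False
```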
Phantoms and Handling Insertions Correctly
Notice in our locking scheme we can only lock items that already exist.
Consider the query where we sum the lengths of all movies made by Disney.
While we are performing this query someone might insert a row with a Disney movie.
If we repeat our query in the same transaction we might see a different sum.
Such a tuple is called a phantom tuple.
One solution to the problem is to lock the whole table for reads or writes,
but this comes with a heavy performance hit.
The Tree Protocol (Silberschatz Kedem 1980)
Standard locking protocols make it very hard to get concurrency when locking B-trees.
The problem is that we would typically need to lock a whole path down the tree in case
the tree changes during an update. This prevents other transactions from accessing the tree.
Instead, to get concurrency for B-trees, we abandon two phase locking and require:
a transaction's first lock may be on any node of the tree
(typically, lock the first node that could be changed by your transaction);
subsequent locks may be obtained only if the transaction currently holds a lock on the node's parent;
nodes can be unlocked at any time;
a transaction may not relock a node it has released.
It turns out that if we use this protocol for our tree structures, then the resulting schedules will be serializable
(you can prove this by induction on the number of nodes in the tree).
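The rules above can be sketched as a small checker enforcing the protocol for one transaction. This is an illustrative sketch, not a real lock manager; the class and node names are assumptions of the example.

```python
class TreeProtocolTxn:
    """Enforces the tree protocol's rules for a single transaction."""

    def __init__(self, parent):
        self.parent = parent      # node -> parent node in the tree
        self.held = set()         # nodes currently locked
        self.first_done = False   # the first lock may be anywhere
        self.released = set()     # released nodes can never be relocked

    def lock(self, node):
        if node in self.released:
            raise RuntimeError("may not relock a released node")
        if self.first_done and self.parent.get(node) not in self.held:
            raise RuntimeError("must hold the parent's lock")
        self.held.add(node)
        self.first_done = True

    def unlock(self, node):
        self.held.remove(node)
        self.released.add(node)

# A tiny tree: A is the root, B and C its children, D below B.
parent = {"B": "A", "C": "A", "D": "B"}
t = TreeProtocolTxn(parent)
t.lock("B")      # first lock: allowed anywhere in the tree
t.lock("D")      # ok: the transaction holds the parent B
t.unlock("B")    # nodes can be unlocked at any time
# t.lock("B")    # would raise: relocking a released node
```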
Quiz
Which of the following statements is true?
Taking a non-quiescent dump involves flushing an end checkpoint log record to disk.
All serializable schedules are conflict serializable schedules.
The use of locks always guarantees serializability.
Other Concurrency Control Techniques
Besides locking there are two other common methods for doing concurrency control:
timestamping -- give each transaction and database element a timestamp and compare these
to determine if the schedule is equivalent to the serial schedule given by the transactions' timestamps. (This
idea originated in Bernstein Goodman 1980)
validation -- also uses timestamps but will yield schedules equivalent to the serial schedule
given by transaction commit times. (This idea is due to Kung and Robinson 1981)
We'll focus on the first technique.
Timestamps
We'll write `TS(T)` to denote a number (a timestamp) assigned by the scheduler to transaction `T`.
This number must be such that later transactions receive a higher number.
Such numbers could be based on the system clock or based on using a counter.
For each database element `X`, the scheduler also maintains the following numbers:
`RT(X)` -- timestamp of last `T` to read `X`,
`WT(X)` -- timestamp of last `T` to write `X`,
`C(X)` -- a boolean value saying whether the last transaction to write `X` has committed or not.
Physically Unrealizable Behaviors
Suppose each operation in a schedule were executed at the time given by the timestamp
the scheduler assigned to the transaction issuing it.
Two problems could occur:
Read too late: (abort `T`)
Since `T` has an earlier timestamp than `U`, when it tries to read `X`, it should get a value written by a transaction
with an earlier timestamp than its own. The scheduler in this case can look at `WT(X)`, notice that it is later than `TS(T)` (it was set to `U`'s timestamp),
and force `T` to abort.
Write too late: (abort `T`)
Since `T` has an earlier timestamp than `U`, when it tries to write `X`, no transaction with a later timestamp (namely `U`) should already have read a value
of `X` written by a transaction earlier than `T`. The scheduler in this case can look at `RT(X)`, notice that it is later than `TS(T)` (it was set to `U`'s timestamp),
and force `T` to abort.
Problems with Dirty Data
The two problems of the last slide could be resolved by comparing `TS(T)` and `WT(X)` or `RT(X)`.
What is `C(X)` used for? Consider a sequence in which `T` writes `X`, `U` reads `X`, and then `T` aborts.
The value read by `U` is bogus since `T` aborted. This is called a dirty read.
To prevent it, before granting a read we can check that `C(X) = true`.
Since the scheduler maintains `C(X)`, this check works: the scheduler can delay `U`'s read until after `T` commits or aborts
to produce a valid schedule.
Timestamp-Based Scheduling
The scheduler, when given a read or write request by `T`, can either:
Grant it,
Abort `T`,
Delay `T` until some other transaction commits or aborts.
The next slide has the exact rules.
Rules for Timestamp-Based Scheduling -- Read and Write
If the request is a `R(X)`:
If `TS(T) >= WT(X)`, the read is physically realizable.
If `C(X)` is true:
Grant the request.
If `TS(T) > RT(X)`, set `RT(X) := TS(T)`
If `C(X)` is false:
Delay `T` until either `C(X)` becomes true or the transaction that wrote `X` aborts.
If `TS(T) < WT(X)`, the read isn't physically realizable.
Roll back `T`, and restart it with a larger timestamp.
If the request is a `W(X)`:
If `TS(T) >= RT(X)` and `TS(T) >= WT(X)`, the write is physically realizable and must be performed.
Set new value for `X`.
Set `WT(X) := TS(T)`
Set `C(X) := false`
If `TS(T) >= RT(X)` but `TS(T) < WT(X)`, then the write is physically realizable, but there is already a later value of `X` stored in the DB.
If `C(X)` is true, then we can ignore the write request.
If `C(X)` is false, we must delay `T` until the other transaction commits or aborts.
If `TS(T) < RT(X)`, the write is physically unrealizable, so `T` is rolled back.
Rules for Timestamp-Based Scheduling -- Commits and Aborts
If the scheduler receives a commit request from `T`:
All of the elements `X` written by `T` have `C(X)` marked true.
Delayed transactions are notified so they can now proceed.
If `T` aborts:
Transactions waiting on `T` are notified.
These transactions can now retry the requested
operation that was blocked by `T` to see if it is now legal.
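The read and write rules above can be sketched for a single element `X`. This is a hedged, simplified model: delaying is represented by returning `"delay"` and rollback by `"abort"`, and the state names are assumptions of the example.

```python
class Element:
    """Scheduler state for one database element X."""
    def __init__(self):
        self.rt = 0        # RT(X): timestamp of last reader
        self.wt = 0        # WT(X): timestamp of last writer
        self.c = True      # C(X): last writer has committed

def read(ts, x):
    if ts >= x.wt:                 # physically realizable
        if not x.c:
            return "delay"         # wait for the writer to finish
        x.rt = max(x.rt, ts)       # update RT(X) only if later
        return "grant"
    return "abort"                 # read too late

def write(ts, x):
    if ts >= x.rt and ts >= x.wt:  # realizable: perform the write
        x.wt, x.c = ts, False
        return "grant"
    if ts >= x.rt:                 # a later value is already stored
        return "ignore" if x.c else "delay"
    return "abort"                 # write too late

x = Element()
print(write(10, x))   # → grant
print(read(5, x))     # → abort (read too late: TS < WT(X))
print(read(10, x))    # → delay (the writer has not committed)
```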
Multiversion Timestamps
We want to allow greater concurrency than basic timestamping allows.
To do this we will reduce the number of reads that cause transactions to abort.
Suppose transactions arrive in the order:`T_1`, `T_3`, `T_2`, `T_4`.
Consider the schedule:
`R_1(A)`, `W_1(A)`, `R_2(A)`, `W_2(A)`, `R_3(A)`, `R_4(A)`.
`T_3` must abort according to timestamping (`T_2` started later than `T_3`, but wrote `A` before `T_3` tried to read it).
However, if we had kept an old value of A around for `T_3` we might not need to abort it.
More on Multiversion Timestamps
How do we manage multiple versions of database elements?
When a write `W_T(X)` occurs, if it is legal, a new version of `X` is created. Its write time is `TS(T)`, and we will call it `X_t` where `t = TS(T)`.
When a read `R_T(X)` occurs, the scheduler finds the version `X_t` of `X` such that `t <= TS(T)` and there
is no `X_{t'}` with `t < t' <= TS(T)` -- that is, the latest version written at or before `TS(T)`.
Write times are now associated with versions of an element and they never change.
Each `X_t`, now has its own, independent read time, `RT(X_t)`, indicating the most recent timestamp of a transaction that read version `X_t`.
Read times are used to reject certain writes as will be indicated on the next slide.
When a version `X_t` has a write time `t` such that no active transaction has a timestamp less than `t`,
then we may delete any version of `X` previous to `X_t`.
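The version-selection rule can be sketched directly. This is a minimal illustration; the representation of versions as a `{write_time: value}` dictionary is an assumption of the example.

```python
def mv_read(versions, ts):
    """Multiversion read: among the versions of X, a reader with
    timestamp ts gets the latest version written at or before ts.
    Returns (write_time, value), or None if no version qualifies."""
    eligible = [t for t in versions if t <= ts]
    if not eligible:
        return None
    t = max(eligible)        # latest t with t <= ts
    return t, versions[t]

versions = {50: "a", 100: "b", 175: "c"}
print(mv_read(versions, 150))   # → (100, 'b')
print(mv_read(versions, 40))    # → None
```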
What kinds of writes should be rejected? A write `W_T(X)` must be rejected if some version `X_t` has `t < TS(T)` but `RT(X_t) > TS(T)`:
a later reader `S` should have read `T`'s value, but instead read an older version (say `X_50`). So `T` would be aborted in this case.
Timestamps versus Locking
Basic rule of thumb:
Timestamping works better when most transactions are read-only,
or when it is rare that concurrent transactions read and write the same element.
Locking works better in high conflict situations.
The reasoning is:
Locking frequently delays transactions, since they have to wait for locks, and it can lead to deadlocks.
If concurrent transactions frequently read and write the same elements, then timestamping
will tend to cause frequent rollbacks, making throughput lower than with locking.
Commercial DBMSs try to get the best of both worlds by allowing users to declare transactions as read-only.
Such transactions are then handled using multiversion timestamping, while the DBMS uses locking
for all other kinds of transactions.
Transactions and SQL
To indicate the start of a transaction in SQL we can use the command: START TRANSACTION.
Many DBMSs also allow the commands: BEGIN, BEGIN WORK, BEGIN TRANSACTION to do the same thing.
A transaction continues until either a COMMIT command or ROLLBACK command is executed.
Many DBMSs allow END or END TRANSACTION as an alias for COMMIT, and ABORT as an alias for ROLLBACK.
If you connect to a database and never do a START TRANSACTION, then the DBMS often implicitly assumes there
is a START TRANSACTION before each operation you do and a COMMIT after it is done.
To tell the DBMS about your transaction you can use the SET TRANSACTION command. For example,
SET TRANSACTION READ ONLY;
SET TRANSACTION READ WRITE;
The first line above says that your transaction should only be allowed to do reads, the second says
it can do both reading and writing.
In the first case above, as we said on the previous slide, the DBMS will typically use multiversion timestamping
to manage concurrency.
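The transaction commands above can be exercised from Python using SQLite. This is only an illustration of START/COMMIT/ROLLBACK behavior; note that SQLite uses `BEGIN` rather than `START TRANSACTION` and does not support `SET TRANSACTION READ ONLY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # let us issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")                # start a transaction
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("COMMIT")               # make the insert durable

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK")             # undo the second insert

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # → 1
```

Only the committed insert survives; the rolled-back one leaves no trace, which is exactly the atomicity a transaction promises.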
It should be noted that, although Postgres supports explicit locking, by default it uses multiversion concurrency control.
SQL Isolation Levels
If a transaction is READ WRITE (DBMS will typically assume this by default), we can control the degree to which
its operations are isolated from other transactions by setting its isolation level. The possible variants on this are:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; -- In this case, the DBMS can do the operations of the transaction and
not bother with locking. The transaction `T` might read data that is not committed (dirty reads) and values that `T` reads
might be changed while `T` is still operating (unrepeatable reads). Also, phantom records can be inserted while
the transaction is operating. This level might be suitable if you want to compute some statistics (which you then write), you don't care
if the statistics are perfectly accurate, but you do want them calculated quickly.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- In this case, the DBMS obtains exclusive locks before writes and holds them
until the end of the transaction. It obtains shared locks before reads, but releases them immediately after the read. Provided other transactions run
at this level or higher, this guarantees that every value read has been committed, since exclusive locks are only released when a transaction ends.
It is possible to have unrepeatable reads and phantoms with this kind of locking.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- In this case, the DBMS obtains exclusive locks before writes,
and shared locks before reads, but does not release any locks until the transaction commits or aborts. This is called strict 2PL. It is still possible
for phantom inserts to occur at this level, but all reads will be repeatable.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- In this case, the DBMS obtains locks as in the REPEATABLE READ case, but also obtains
locks on any indexes and coarser granularities that might be involved with `T`. Since an insert by another transaction `S`
would involve changing a table, `S` would need an `IX` lock on that table, as well as locks on index elements, which conflict with the locks `T` holds,
so the insert can't happen until after `T` commits or aborts.