Concurrency Control 2: Optimistic Concurrency Control

Kung & Robinson

Attractive, simple idea: optimize for the case where conflicts are rare.

Basic idea: all transactions consist of three phases:

Read. Here, all writes are to private storage (shadow copies).
Validation. Make sure no conflicts have occurred.
Write. If Validation was successful, make writes public. (If not, abort!)
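The three phases can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the class name Transaction, the dict standing in for the database, and the caller-supplied validate predicate are all assumptions made for the example.

```python
class Transaction:
    """Hypothetical OCC transaction; `db` is a plain dict standing in for shared storage."""

    def __init__(self, db):
        self.db = db
        self.read_set = set()
        self.writes = {}  # private shadow copies, invisible to other transactions

    def read(self, key):
        self.read_set.add(key)
        # See your own pending write first, otherwise the public value.
        if key in self.writes:
            return self.writes[key]
        return self.db.get(key)

    def write(self, key, value):
        self.writes[key] = value  # read phase: never touches db

    def commit(self, validate):
        # Validation phase: `validate` is an assumed caller-supplied predicate.
        if not validate(self):
            self.writes.clear()  # abort: discard the shadow copies
            return False
        self.db.update(self.writes)  # write phase: make writes public
        return True


db = {"x": 1}
t = Transaction(db)
t.write("x", t.read("x") + 1)
before = db["x"]                 # the write is still private here
ok = t.commit(lambda txn: True)  # trivially valid: no other transactions
```

Until commit runs, db is untouched, so an abort costs nothing beyond discarding the shadow copies.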

When might this make sense? Three examples:

All transactions are readers.
Lots of transactions, each accessing/modifying only a small amount of data, large total amount of data.
Fraction of transaction execution in which conflicts "really take place" is small compared to total pathlength.

The Validation Phase

Goal: to guarantee that only serializable schedules result.
Technique: actually find an equivalent serializable schedule. That is,

Assign each transaction a TN during execution.

Ensure that if you run transactions in order induced by "<" on TNs, you get an equivalent serial schedule.

Suppose TN(Ti) < TN(Tj). If any one of the following three conditions holds, the schedule is equivalent to the serial order Ti, Tj:

Ti completes its write phase before Tj starts its read phase.
WS(Ti) ∩ RS(Tj) = ∅ and Ti completes its write phase before Tj starts its write phase.
WS(Ti) ∩ RS(Tj) = ∅ and WS(Ti) ∩ WS(Tj) = ∅ and Ti completes its read phase before Tj completes its read phase.
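As a sanity check, the three conditions can be written as a predicate over phase timestamps and read/write sets. A minimal sketch, assuming integer logical timestamps; the Txn record and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    read_start: int
    read_end: int      # end of read phase
    write_start: int
    write_end: int
    rs: frozenset = frozenset()  # read set
    ws: frozenset = frozenset()  # write set


def serializable_after(ti, tj):
    """True if one of conditions 1-3 holds, given TN(ti) < TN(tj)."""
    cond1 = ti.write_end < tj.read_start
    cond2 = not (ti.ws & tj.rs) and ti.write_end < tj.write_start
    cond3 = (not (ti.ws & tj.rs) and not (ti.ws & tj.ws)
             and ti.read_end < tj.read_end)
    return cond1 or cond2 or cond3


ti = Txn(0, 2, 3, 4, rs=frozenset({"a"}), ws=frozenset({"b"}))
tj = Txn(1, 5, 6, 7, rs=frozenset({"a"}), ws=frozenset({"b"}))  # overlaps ti, never reads "b"
tk = Txn(1, 5, 6, 7, rs=frozenset({"b"}), ws=frozenset({"c"}))  # reads what ti wrote
```

Here tj passes via condition 2 (it does not read ti's writes, and ti finishes writing before tj starts writing), while tk fails all three conditions and would have to abort.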

Is this correct? Each condition guarantees that the three possible classes of conflicts (W-R, R-W, W-W) go one way only.

For condition 1 this is obvious (true serial execution!)
For condition 2,

No W-R conflicts since WS(Ti) ∩ RS(Tj) = ∅.
In all R-W conflicts, Ti precedes Tj, since the write phase (and hence the read phase) of Ti completes before the write phase of Tj starts.
In all W-W conflicts, Ti precedes Tj, since Ti completes its write phase before Tj starts its write phase.

For condition 3,

No W-R conflicts since WS(Ti) ∩ RS(Tj) = ∅.
No W-W conflicts since WS(Ti) ∩ WS(Tj) = ∅.
In all R-W conflicts, Ti precedes Tj, since the read phase of Ti precedes the write phase of Tj.

Assigning TNs: assigning them at the beginning of transactions is not optimistic; do it at the end of the read phase. Note: this satisfies the second half of condition (3).

Note: a transaction T with a very long read phase must check the write sets of all transactions that finished while T was active. This could require unbounded buffer space.
Solution: bound the buffer space, toss out the oldest write sets when it is full, and abort any transaction whose validation could have needed them.

This gives rise to starvation: a long transaction may be aborted over and over. Solve by having the starving transaction write-lock the whole DB!
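One way to picture the bounded buffer is a fixed-size log of recent write sets. A rough sketch, assuming TNs are consecutive integers; WriteSetLog and its methods are hypothetical names, and the starvation fix (locking the whole DB) is not shown.

```python
from collections import deque


class WriteSetLog:
    def __init__(self, capacity):
        # (tn, write_set) pairs; the oldest entry is dropped when the log is full.
        self.entries = deque(maxlen=capacity)
        self.last_tn = 0

    def record(self, write_set):
        self.last_tn += 1
        self.entries.append((self.last_tn, frozenset(write_set)))
        return self.last_tn

    def validate(self, start_tn, read_set):
        """start_tn is the highest TN that had been assigned when the transaction began."""
        oldest_kept = self.entries[0][0] if self.entries else self.last_tn + 1
        if oldest_kept > start_tn + 1:
            return False  # a needed write set was tossed: abort conservatively
        return all(not (ws & read_set) for tn, ws in self.entries if tn > start_tn)


log = WriteSetLog(capacity=2)
for ws in ({"a"}, {"b"}, {"c"}):
    log.record(ws)  # the entry for TN 1 is evicted by the third record
```

A transaction that started before TN 1 committed is now aborted even if it read none of the written items, which is exactly the source of the starvation risk.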

Serial Validation

Only checks conditions (1) and (2), since writes are not going to be interleaved.

Simple technique: make a critical section around <get xactno; validate (1) or (2) for everybody from your start to finish; write>. Not great if:

write takes a long time
SMP – might want to validate 2 things at once if there’s not enough reading to do

Improvement to speed up validation:

repeat as often as you want {

get current xactno.

Check if you’re valid with everything up to that xactno.

}

<get xactno; validate with new xacts; write>.

Note: read-only xacts don’t need to get xactnos! Just need to validate up to highest xactno at end of read phase (without critical section!)
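A sketch of serial validation with the speed-up, assuming a single Python lock as the critical section and a dict for the DB. SerialValidator, begin, commit, and prechecks are illustrative names, not from the paper.

```python
import threading


class SerialValidator:
    def __init__(self, db):
        self.db = db
        self.lock = threading.Lock()  # the critical section
        self.tnc = 0                  # highest xactno assigned so far
        self.write_sets = {}          # xactno -> write set of that committed xact

    def begin(self):
        return self.tnc               # remember the xactno at start

    def _valid(self, read_set, lo, hi):
        # Condition (2) against xacts lo+1..hi; xacts numbered <= start
        # satisfy condition (1) trivially.
        return all(not (self.write_sets[t] & read_set) for t in range(lo + 1, hi + 1))

    def validate_read_only(self, start_tn, read_set):
        return self._valid(read_set, start_tn, self.tnc)  # no lock, no xactno

    def commit(self, start_tn, read_set, writes, prechecks=1):
        checked = start_tn
        for _ in range(prechecks):    # repeat as often as you want, outside the lock
            mid = self.tnc
            if not self._valid(read_set, checked, mid):
                return False
            checked = mid
        with self.lock:               # get xactno; validate with new xacts; write
            if not self._valid(read_set, checked, self.tnc):
                return False
            self.db.update(writes)
            tn = self.tnc + 1
            self.write_sets[tn] = frozenset(writes)  # publish write set before bumping tnc
            self.tnc = tn
        return True


v = SerialValidator({"x": 0, "y": 0})
s1, s2 = v.begin(), v.begin()
first = v.commit(s1, {"x"}, {"x": 1})
second = v.commit(s2, {"x"}, {"y": 5})  # read "x", which the first xact overwrote
```

The pre-checks shrink the work left inside the lock to whatever committed since the last check; the write itself still happens inside the critical section.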

Parallel Validation

Want to allow interleaved writes.
Need to be able to check condition (3).

Save active xacts (those which have finished reading but not writing).
The write sets of active xacts can’t intersect your read or write set.
Validation:

<get xactno; copy active; add yourself to active>
check (1) or (2) against everything from start to finish;
check (3) against all xacts in active copy
If all’s clear, go ahead and write.
<bump xact counter, remove yourself from active>.

Small critical section.
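The validation steps above can be sketched as follows, again assuming a Python lock and a dict for the DB. ParallelValidator and its fields are hypothetical names; the extra bookkeeping for aborted members of active is omitted.

```python
import threading


class ParallelValidator:
    def __init__(self, db):
        self.db = db
        self.lock = threading.Lock()
        self.tnc = 0
        self.write_sets = {}  # xactno -> write set (committed)
        self.active = {}      # xact id -> write set (done reading, not done writing)

    def begin(self):
        return self.tnc

    def commit(self, txn_id, start_tn, read_set, writes):
        ws = frozenset(writes)
        with self.lock:       # get xactno; copy active; add yourself to active
            finish_tn = self.tnc
            active_copy = dict(self.active)
            self.active[txn_id] = ws
        # (1) or (2): committed write sets must miss our read set.
        valid = all(not (self.write_sets[t] & read_set)
                    for t in range(start_tn + 1, finish_tn + 1))
        # (3): active write sets must miss both our read set and our write set.
        valid = valid and all(not (other & (read_set | ws))
                              for other in active_copy.values())
        if valid:
            self.db.update(writes)  # write phase, outside the critical section
        with self.lock:       # bump xact counter; remove yourself from active
            if valid:
                self.write_sets[self.tnc + 1] = ws
                self.tnc += 1
            del self.active[txn_id]
        return valid


pv = ParallelValidator({})
start = pv.begin()
pv.active["other"] = frozenset({"x"})  # simulate a concurrent xact in its write phase
blocked = pv.commit("t1", start, {"x"}, {"y": 1})
clear = pv.commit("t2", start, {"a"}, {"b": 1})
```

Only the two short bookkeeping steps hold the lock; the set intersections and the writes run concurrently with other transactions.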
Problems:

a member of active that causes you to abort may itself have aborted, so your abort was unnecessary

can add even more bookkeeping to handle this
can make active short with improvement analogous to that of serial validation

© 1998, Joseph M. Hellerstein. Last modified 08/18/98.