Distributed DBMS: Overview & Concurrency Control

1. Location transparency: users don’t know where the data is.
2. Performance transparency: performance is independent of the submission site.
3. Copy transparency: objects can be copied, and copies are maintained automatically. Critical for availability.
4. Transaction transparency: transactions look like single-site xacts.
5. Fragment "transparency": tables can be fragmented across different sites.
6. Schema change transparency: a schema update at a single site affects the global schema.
7. Local DBMS transparency: it shouldn’t matter what DBMS is running at each site (ha!).

Nobody provides all of this; local DBMS transparency (#7) and copy transparency (#3) are still major research areas.

IBM Almaden’s R* was the most "real" DDBMS prototype.

Others:

SDD-1: done at CCA, never really ran. PDP-10s on the ARPANET. Strange design decisions driven by the unreasonably slow network.
Distributed INGRES: UCB, also never ran, for lack of UNIX networking software (pre-TCP/IP!).
Commercial vendors are beginning to provide this functionality.

Goals

Tables can be

Table names: user_name@user_site.object_name@birth_site
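A small Python sketch of the naming scheme above, showing how such a name splits into its four parts. The class name `SystemWideName` and the rule that no component may contain `@` or `.` are assumptions made for illustration, not R*'s actual parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    """Hypothetical model of user_name@user_site.object_name@birth_site."""
    user_name: str
    user_site: str
    object_name: str
    birth_site: str

    @classmethod
    def parse(cls, name: str) -> "SystemWideName":
        # Split "user@usite" from "object@bsite" at the first '.'.
        owner, dot, obj = name.partition(".")
        user_name, at1, user_site = owner.partition("@")
        object_name, at2, birth_site = obj.partition("@")
        parts = (user_name, user_site, object_name, birth_site)
        # Assumption: no component is empty or contains '@' or '.'.
        if not (dot and at1 and at2) or any(
            not p or "@" in p or "." in p for p in parts
        ):
            raise ValueError(f"malformed system-wide name: {name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.user_name}@{self.user_site}.{self.object_name}@{self.birth_site}"
```

For example, `SystemWideName.parse("alice@sanjose.emp@sanjose")` (hypothetical user and site) yields `birth_site == "sanjose"`. Because the birth site never changes, the name stays valid even if the table later moves to another site.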

Each site’s catalog stores info on objects stored locally and on objects born locally (the latter with pointers to their current sites).

Remote catalog info is cached as hints:

A catalog entry for a table:

To find a catalog entry:
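One plausible lookup order, consistent with the catalog rules above, is: local catalog, then a cached hint, then the birth site, which always points to the current site. The Python sketch below assumes exactly that order. `Site`, `Entry`, and `lookup` are hypothetical names; sites are in-memory dicts rather than network calls, and a hint is treated as stale when the hinted site no longer stores the table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    current_site: str  # site that stores the table now


@dataclass
class Site:
    # Objects stored here or born here (born-here entries may point elsewhere).
    catalog: Dict[str, Entry] = field(default_factory=dict)
    # Remote entries, kept only as hints: they may be stale.
    cache: Dict[str, Entry] = field(default_factory=dict)


def stored_entry(sites: Dict[str, Site], site: str, swn: str) -> Optional[Entry]:
    """Authoritative entry, if `site` currently stores the table."""
    e = sites[site].catalog.get(swn)
    return e if e is not None and e.current_site == site else None


def lookup(sites: Dict[str, Site], here: str, swn: str) -> Entry:
    # 1. Stored locally: the local catalog is authoritative.
    e = stored_entry(sites, here, swn)
    if e is not None:
        return e
    # 2. Cached hint: trust it only if the hinted site still stores the table.
    hint = sites[here].cache.get(swn)
    if hint is not None:
        e = stored_entry(sites, hint.current_site, swn)
        if e is not None:
            return e
        del sites[here].cache[swn]  # stale hint, drop it
    # 3. Ask the birth site (last '@' component of the name) for the current site.
    birth_site = swn.rsplit("@", 1)[1]
    pointer = sites[birth_site].catalog[swn]
    e = stored_entry(sites, pointer.current_site, swn)
    if e is None:
        raise KeyError(swn)
    sites[here].cache[swn] = Entry(e.current_site)  # refresh the hint
    return e
```

Keeping remote entries only as hints means a cache never has to be invalidated eagerly: a wrong hint costs one extra hop, after which the birth site supplies the correct location.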

Every transaction gets an xactno: site_name.time (or a sequence number, SN, in place of the time).

Deadlocks: shoot (abort) the youngest, i.e. largest-numbered, xact.
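A minimal Python sketch of xactno ordering and victim choice. It assumes the number compares by time first, with the site name only as a tie-breaker so that numbers are unique across sites; `XactNo` and `youngest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True, order=True)
class XactNo:
    # Field order drives comparison: time first, then site_name as tie-breaker.
    time: int
    site_name: str

    def __str__(self) -> str:
        # Printed in the notes' site_name.time form.
        return f"{self.site_name}.{self.time}"


def youngest(cycle: Iterable[XactNo]) -> XactNo:
    """Deadlock victim: the largest-numbered (youngest) xact in the cycle."""
    return max(cycle)
```

For instance, among `XactNo(5, "A")`, `XactNo(9, "B")`, and `XactNo(9, "C")`, the victim is `C.9`: it ties on time with `B.9` and wins on site name.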

Commit protocol: a site can’t just decide to commit on its own; every site involved in the xact must agree:

Two Phase Commit (2PC):

This does the right thing, but it’s costly in messages and forced log writes. The next paper improves on this.
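A failure-free, in-process Python sketch of basic 2PC. Phase 1: the coordinator asks every participant to prepare; each forces a prepare record and votes YES, or aborts and votes NO. Phase 2: the coordinator forces its decision (commit only if all voted YES), sends it to everyone, and collects acks. The Python lists stand in for stable logs. `Participant`, `two_phase_commit`, and `can_commit` are hypothetical names, and there are no timeouts or crash recovery. Per participant this is four messages (prepare, vote, decision, ack), which is where the cost comes from.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # hypothetical: whether local work succeeded
        self.log: List[str] = []      # stands in for a forced (stable) log

    def prepare(self) -> Vote:
        # Phase 1: force a prepare record before voting YES; from here on
        # the participant may not abort unilaterally.
        if self.can_commit:
            self.log.append("prepared")
            return Vote.YES
        self.log.append("abort")
        return Vote.NO

    def finish(self, decision: str) -> str:
        # Phase 2: record the decision (a NO voter has already aborted), then ack.
        if not self.log or self.log[-1] != "abort":
            self.log.append(decision)
        return "ack"


def two_phase_commit(coord_log: List[str], participants: List[Participant]) -> str:
    votes = [p.prepare() for p in participants]        # phase 1: collect all votes
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
    coord_log.append(decision)                         # force decision before sending
    acks = [p.finish(decision) for p in participants]  # phase 2: send decision
    if len(acks) == len(participants):
        coord_log.append("end")                        # all acked: forget the xact
    return decision
```

A single NO vote is enough to abort everywhere, while a YES voter must wait for the coordinator's decision. That waiting is why a prepared participant is blocked if the coordinator fails.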

Distributed Deadlock Detection:
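The core difficulty is that a cycle can span sites, so no single site's wait-for graph shows it. The Python sketch below takes the simplest centralized approach: union every site's local wait-for edges, search for a cycle, and then apply the youngest-victim rule. It is not claimed to be R*'s algorithm. The function names are hypothetical, and xacts are assumed to be `"site.time"` strings.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (waiter, holder): waiter waits for a lock held by holder


def global_wfg(local_graphs: Dict[str, List[Edge]]) -> Dict[str, Set[str]]:
    """Union the per-site wait-for graphs into one global graph."""
    g: Dict[str, Set[str]] = {}
    for edges in local_graphs.values():
        for waiter, holder in edges:
            g.setdefault(waiter, set()).add(holder)
    return g


def find_cycle(g: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of xacts, or None (recursive DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def dfs(u: str) -> Optional[List[str]]:
        color[u] = GRAY
        stack.append(u)
        for v in sorted(g.get(u, ())):
            c = color.get(v, WHITE)
            if c == GRAY:  # back edge: v is on the current path
                return stack[stack.index(v):]
            if c == WHITE:
                found = dfs(v)
                if found:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for u in sorted(g):
        if color.get(u, WHITE) == WHITE:
            found = dfs(u)
            if found:
                return found
    return None


def victim(cycle: List[str]) -> str:
    """Youngest xact in the cycle: largest time, site name as tie-breaker."""
    def key(x: str) -> Tuple[int, str]:
        site, time = x.rsplit(".", 1)
        return (int(time), site)
    return max(cycle, key=key)
```

With site A holding the edge `A.1 -> B.7` and site B holding `B.7 -> A.1`, neither local graph has a cycle, but the union does. The victim is then `B.7`. Shipping every local graph to one site is simple but creates a bottleneck and a single point of failure, which is what distributed detection schemes try to avoid.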