Access Methods: B-Trees, R-Trees & GiSTs
First, a Review of Disks
- hardware: platters, arms, spindle. Often a cache (often 1 track) -- very
device-specific.
- data layout: sectors, blocks, tracks, cylinders
- "access time": ha!
- seek time, rotational delay, transfer time
- random vs. sequential I/O
- Examples:
- Seagate Hawk 2XL (2.15Mb) -- 5.5 msec average rotation delay, 9 msec average seek time
(varies from 1 to 22 msec), 5MB/sec transfer rate
- Note: average disk access time is about 10-15 msec, RAM access is more like 5 microsecs!
- Another note: while transfer rate and capacity are growing wildly every year, seek time
is not shrinking nearly as fast!
Access Methods: General Issues
- Come in two flavors: heap files and indexes
- Support AMI interface:
- open (possibly with selection condition)
- get_next
- close
- insert, delete, update_field
- indexes usually do two things:
- partition: partition a dataset or domain into buckets
- label: provide a label for each bucket
- this is sometimes done hierarchically (trees), sometimes not (hashing)
- the whole game boils down to how you partition & label!
- index's utility defined by queries, not data!
- typically random I/O in index
- Performance goals:
- cold vs. hot lookups
- # of I/Os vs. # of blocks worth of output
- because of random vs. sequential, should always compare to heap file!
- Implementation issues:
- concurrency & recovery
- cost estimation
- bulk loading
The "Ubiquitous" B-tree
Query load: equality & linear range predicates
Basics
- Basic B-tree properties: balanced, high fanout, some minimal fill factor (order)
- in practice, may want minimal fill factor to be less than 50% (delete at empty has been
shown good in TP applications)
- B-tree vs. B+-tree. Note that B+-tree internal keys simply "direct
traffic"; this is very useful.
- intra-node operations
Compression
- rear & front key compression
- pointer compression
- Very common in practice, since B+-tree is on critical path of TP apps
Variable-length keys
- not too hard -- fill factor now expressed in terms of bytes, not entries.
Multiuser & Recovery issues
- tricky, solved problem
- high concurrency locking: need not be 2PL
- watch repeatable read, too!
- we will revisit this issue in detail later
Leaves usually contain record pointers (as in VSAM)
VSAM
- unified B+-tree-based implementation for a file-system (recall Stonebraker's gripe)
- store sibling leaves and parent in one cylinder
- replicate parent along one track to minimize rotation time
R-trees
Query load: Equality and Range predicates in n dimensions
- point in polygon
- polygon in polygon
- Overlaps polygon
- Contains polygon
Applications: GIS, VLSI, n-d range queries over traditional data where n is typical
Notation.
R-tree
- Structure: like a B+-tree, but keys are MBRs (rectilinear minimum bounding rectangles),
one key per pointer
- Search: Traverse multiple paths! (DFS)
- Insertion:
- Where: ChooseLeaf. Least enlargement. Ties broken by picking smallest key.
- Node splitting as in B+-tree. Question: what goes left, what goes right?
- AdjustTree required. Why not in B+-tree?
- Deletion:
- Seek and destroy
- CondenseTree: throw entries from under-full pages back into tree. Make sure they end up
at same level. Why different from B+-tree?
- Update: delete, update, reinsert
- Split algorithms:
- exhaustive
- quadratic: start with most distant seeds, greedily add based on maximal preference
- linear: like quadratic, but picks seeds differently
- Performance study
- RISC-II VLSI data set
- In practice, people (Postgres & SHORE) seem to use M/2, quadratic split
R-Tree Variants
R*-tree (Beckman, et al., SIGMOD 90)
- like R-tree, but 3 distinctions:
- insertion minimizes different badness metric (depends on tree level)
- new & improved split algorithm (nlogn, more effective)
- forced reinsert of 30% most "extreme" entries
- decreases overlap of siblings
- improves storage utilization
- splits less often
- shapes tend to be more quadratic (pack better, generate smaller parents, minimizes split
of associated pixels)
- Popular, since it out-performs R-tree on search. Concurrency problems
because of reinsertion.
R+-Tree (Sellis, et al, VLDB87)
- instead of overlapping keys, use multiple inserts (i.e. replicate items in leaves)
- speeds search, complicates insert/delete/update
- buggy!
- NOTE: data vs. space partitioning. R-trees partition data. R+-trees partition space.
- other space-partitioning trees: K-D-B trees, Quad trees
TV(telescopic vector)-tree (Lin, et al, VLDB Journal, 1994)
- Goal: solve high-d problems
- Keys: compressed (center, radius) pairs in n-d
- compression "telescopes" out to only the appropriate dimensions at each level
(other dimensions are "wildcards")
- The "dimensionality curse" remains! (See discussion of indexability
below)
hB (holey-brick B) trees (Lomet & Salzburg, TODS 90)
- solve three problems
- naturally handles swiss-cheese polygons
- speeds intra-node search
- space partitioning & data partitioning
- idea: put a main-memory k-d tree (space-partitioning tree) on each node. This tree can
represent a holey brick!
- Neat idea, pretty complicated. Not really a tree -- can be a DAG. Journal paper is
buggy.
- Not necessarily more effective than R*-trees
...and a recent flurry of new trees. In practice, most 2-d access methods have
proved to be within 20-30% of each other, though no systematic benchmarking has been done.
Generalized Search Trees
Indexing in OO/OR Systems
- Quick access to user-defined objects
- Support queries natural to the objects
- Two previous approaches
- Specialized Indices (ABCDEFG-trees)
- redundant code: most trees are very similar
- concurrency control, etc. tricky!
- Extensible B-trees & R-trees (Postgres/Illustra)
- B-tree or R-tree lookups only!
- E.g. WHERE movie.video < Terminator 2
A Third Approach
- A generalized search tree. Must be:
- Extensible in terms of queries
- General (B+-tree, R-tree, etc.)
- Easy to extend
- Efficient (match specialized trees)
- Highly concurrent, recoverable, etc.
Uses for GiSTs
- New indexes needed for new apps...
- find all supersets of S
- find all molecules that bind to M
- your favorite query here (multimedia?)
- ...and for new queries over old domains:
- find all points in region from 12 to 1 oclock
- find all strings that match R. E.
Structure: balanced tree of (p, ptr) pairs
- p is a key predicate
- p holds for all objects below ptr
- keys on a page may overlap
- Key predicates: a user-defined class
- This is the only extensibility required!
Key Methods
- Search
- Consistent(E,q): E.p && q? (no/maybe)
- Labeling
- Union(P): new key that holds for all tuples in P
- Partitioning
- Penalty(E1,E2): penalty of inserting E2 in subtree E1
- PickSplit(P): split P into two groups of entries
Search
- General technique:
- traverse tree where Consistent is TRUE
- For fancier things, see [Aoki98].
Insert
- descend tree along least increase in Penalty
- if theres room at leaf, insert there
- else split according to PickSplit
- propagate changes using Union
- Notes:
- on overflow, can do R*-tree style reinsert
Delete
- find the entry via Search, and delete it
- propagate changes using Union
- on underflow:
- ordered keys, do B+-tree style borrow/coalesce
- else reinsert stuff on page and delete page
GiSTS over (B+-trees)
- Logically, keys represent ranges [x,y)
- Queries:
- Consistent(E,q): (x<b) && (y > a)
- Union(P): [MIN(xi), MAX(yi))
- Penalty(E1, E2): return MAX(y2 - y1, 0) + MAX(x1 - x2, 0)
- if E1 is leftmost or rightmost, drop a term
- PickSplit(P): split evenly in order
Key Compression
- Keys may take up too much room on a page
- Two extra key methods:
- Compress(E)/Decompress(E)
- Compression can be lossy: over-generalization OK
- B+-tree Compression
- Compress(E=([x,y), ptr)):
- if E is leftmost return NULL, else return x
- Decompress(E=(p, ptr)):
- if E is leftmost, let x = -infinity, else let x = p.
- if E is rightmost, let y = infinity, else let y be the value stored in the next key on
the right.
- if E is rightmost on a leaf page, let y = x+1.
GiSTs over R2 (R-tree)
- Logically, keys represent bounding boxes
- Queries: Contains, Overlaps, Equals
- Consistent(E,q): E.p overlap/contain q?
- Union(P): bounding box of all entries
- Compress(E): form bounding box
- Decompress(E): identity function
- Penalty(E,F): size(Union({E,F}) - size(E)
- PickSplit(P): R-tree or R*-tree methods
GiSTs over P(Z) (RD-tree)
- Logically, keys represent bounding sets
- Queries: Contains, Overlaps, Equals
- Consistent(E,q):
- Union(P): set-union of keys
- Compress(E): Bloom filters, rangesets, etc.
- Decompress(E): match compress
- Penalty(E,F): |E.p U F.p| - |E.p|
- PickSplit(P): R-tree algorithms
Research Issues
|