Mariposa: An Agoric Distributed DBMS for WANs
Theme: to scale DDBMS to 1000's of sites, need loose coupling. This
can be done with an economic paradigm for resource allocation: query optimization
(scheduling) and data placement.
Bad things about old DDBMSs:
-
Static data allocation
-
Single administrative structure
-
Homogeneity of hardware, software, networks
Goals of Mariposa:
-
Scale to 1000's of sites
-
Data Mobility
-
Local autonomy
-
Easily and autonomously configurable policies
Mechanism:
-
Economic (Agoric) paradigm (agora = marketplace)
-
Allows point-to-point decision-making to work in a more global context
of resource utilization
-
removes need for centralized decision-making (e.g. global optimizer)
-
in essence, implicit aggregate information (i.e. feedback) is passed
around the network, resulting in rational behavior
Some important cautionary notes:
-
Economics is just a useful metaphor for doing decentralized resource allocation.
-
This decentralized scheme will not be optimal. Goal is to scale at
the expense of controlling optimality.
-
One can spend years arguing about policy, and trying to tune the system
by changing policy.
-
Mariposa implemented mechanisms, but did very little research on policy.
-
Premature discussions of policy can be VERY distracting -- first put mechanism
in place, THEN play!
Architecture and Life of a Query
-
Data layout: horizontally partitioned table fragments (logical or not),
and replicas (of varying freshness)
-
3-tier architecture: clients, middleware, local site manager. Mariposa
requires local sites to run Postgres.
-
Query is generated by application, with accompanying "bid curve" ($/Delay).
-
Query planning is multi-step, working (more or less) as follows:
-
optimizer (middleware) runs Selinger as if it were a local single-site
query
-
fragmenter (middleware) breaks resulting query plan into pieces,
arranges pieces in trivially parallelizable "strides" (which are NOT pipelined
together, for no apparent reason)
-
broker (middleware) sends out Requests For Bid (RFBs) to sites that
might be interested (more on this later)
-
bidder (local site) returns a "bid" for a piece of work, consisting
of triple (Cost,Delay,Expiration)
-
coordinator (middleware) accepts bids, constructs a final plan,
and informs local sites of their jobs
-
coordinator and local executors process the plan
Details of Bidding
-
Budget B(t) (bid curve) a non-increasing function of time
-
Broker "bids out" single-table subplans or joins of two fragments
-
Two protocols: the Long Protocol ("expensive bid") and the Short Protocol
("purchase order")
-
Long protocol is as described above
-
In short protocol, heuristics determine where to send work orders (e.g.
scans at storage sites)
-
Choosing among bids:
-
they do this stride-by-stride, but sum of all strides delay must be <=
B(D)
-
note that there's parallelism within a stride, so delay for a stride's
bid collection is MAX of delays of the bids in the collection (whereas
cost is SUM of costs)
-
ideally, want to pick the "lowest point below the bid curve" -- i.e. maximize
difference
= B(D) - C
-
greedy heuristic used in Mariposa:
-
start with minimal-delay bid collection for each stride.
-
for each longer-delay collection, compute cost gradiant: (decrease
in cost)/(increase in time) if we switch to this collection
-
swap in maximum cost-gradiant alternative if difference increases.
Recompute cost gradiants.
-
repeat until difference will not be improved
-
This is described as a "bottom-up" strategy. An alternative "top-down"
strategy bids out large sub-pieces, which can in turn be busted up by the
bidders and "subcontracted" into separate pieces
Discussion: backing off from the details, what is the plan space,
what is "bidding" for, and what about other approaches?
Metadata
-
How does the broker find sites to get bids? Yellow Pages and
advertisements.
-
Servers can register services (with prices) at a variety of yellow page
servers. These are timestamped so that freshness can be considered
in bidding.
-
Discussion of various costing policy issues: sale prices, coupons, bulk
purchase contracts, etc.
-
Don't think any fancy economics got implemented in Mariposa
Costing
-
Simple: charge for CPU cycles and I/O bandwidth, translate to a delay via
site-specific multiplicative factor
-
Scale delay or cost by current machine load: gives a simple (though crude)
system-wide feedback for load-balancing. An example of economics
taking the place of a centralized load leveler
-
Can set pricing per fragment (why?)
-
They talk about network bandwidth reservation, though this was never implemented
(and is not clearly a win in the network world)
Storage
-
Sites can buy and sell fragments; access history required to judge value
of fragments
-
To run a subquery, Mariposa requires the site to buy any fragments that
aren't resident; this may require evicting (selling) other fragments.
Note lack of pipelining causing a problem here.
-
No discussion here of copies; there was a great deal of thought
about how sites could buy copies and contract for updates at varying rates.
In turn, queries could reason about the age of copies. Since they
built on Postgres, time-travel could allow for consistent (though perhaps
dated) query results regardless of the copies chosen.
-
Fragments could get too big or too small, and splitting/coalescing fragments
was seen as an important optimization issue. Either this could be
determined by economics, or by a more direct mechanism.
Names & Nameservice
-
Mariposa used a different naming scheme than R*.
-
Really no reason to think it's better than R*'s
Status
-
Mariposa worked well enough for Jeff Sidell to run some TPC-D queries on
half a dozen machines. The simple load-balancing pricing strategy
easily adapted to changing workloads.
-
Running queries across the internet resulted in very unpredictable delays.
This undercuts the economic model? See Franklin's "Query Scrambling"
work tomorrow.
-
Cohera Corp
-
Commercializing Mariposa
-
Sssshh....wait 'til March 29 (www.cohera.com)
Discussion II: Why economics? When to use it, when to use
more direct mechanisms? If economics is a metaphor for doing resource
allocation, what are other plausible metaphors/mechanisms that would scale?
|