Databases are often distributed across multiple sites for two potential reasons: availability (so the database is accessible even when one site is down) and scalability (so the database can be accessed efficiently even as it faces higher utilization).
One major design issue is how to distribute data across the multiple sites. The first option is replication, where we store a copy of the data at each site. Replication provides higher availability, and it allows read-only transactions to execute more quickly. But transactions that modify the database will inherently go more slowly, since they must now update every copy, and keeping the replicas synchronized can become a major issue.
An alternative is fragmentation, where we store different portions of the data at different sites. The more common technique is horizontal fragmentation, where each site stores a different set of rows from a table; for example, a San Francisco site might store the data for users in the Mountain and Pacific time zones, while a Chicago site holds the data for users in the Eastern and Central time zones. The alternative is vertical fragmentation, where different sites store different columns of a table: perhaps we have an orders table, and we choose to store its id and shipping_addr columns in Memphis, where our shipping department is, while our New York office (which has the billing department) stores the id, customer_id, and credit_card columns.
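As a concrete sketch of horizontal fragmentation, a routing table can map each user's time zone to the site holding that user's rows. The site names and zone assignments below are just the hypothetical ones from the example:

```python
# Hypothetical routing table for horizontal fragmentation: each site
# stores the rows for users in its assigned time zones.
SITE_FOR_ZONE = {
    "Pacific": "san_francisco",
    "Mountain": "san_francisco",
    "Central": "chicago",
    "Eastern": "chicago",
}

def site_for_user(time_zone: str) -> str:
    """Return the site that stores the row for a user in this time zone."""
    return SITE_FOR_ZONE[time_zone]
```

A query for a single user can then be sent to just one site, while a query spanning all users must be sent to every site and the results merged.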
In practice, a database designer will often choose a combination of these techniques, perhaps using horizontal fragmentation but still designing in some replication.
In a single-site database, deciding whether to commit requires no coordination at all. But this becomes surprisingly complex in a distributed system, since every site involved in a transaction must reach the same decision about whether to commit it. The standard approach is called the two-phase commit protocol.
- Phase 1 begins: The coordinator logs and flushes “PREPARE T”.
- The coordinator sends “prepare T” to each other participating site.
- Each non-coordinating site, upon receiving “prepare T”, decides whether it believes T should commit or abort based on the local information at the site. Note in some cases the site may need to do some work before deciding this (like flushing buffers to disk if it is using an undo log).
- If a non-coordinating site decides to commit, it performs whatever steps it needs to ensure T's data cannot be lost, it logs and flushes “READY T”, and finally it sends “ready T” to the coordinator. If it decides to abort, it logs and flushes “DON'T COMMIT T”, and then it sends “don't commit T” to the coordinator.
- Phase 2 begins: If the coordinator receives “ready T” from all sites, it commits locally, logs and flushes “COMMIT T”, and finally sends “commit T” to each other participating site.
- Each non-coordinating site, upon receiving “commit T”, commits locally and logs and flushes “COMMIT T”.
- If the coordinator receives “don't commit T” from any site, or if enough time has elapsed to give up on receiving a response from a site, the coordinator aborts locally, logs and flushes “ABORT T”, and then sends “abort T” to all sites.
- Each non-coordinating site, upon receiving “abort T”, aborts locally and logs and flushes “ABORT T”.
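The steps above can be sketched as a small simulation. The Site class, its log list, and log_and_flush are illustrative stand-ins, not from any real library; in a real system, log_and_flush would force the record to stable storage, and the “prepare”, “ready”, and “commit” messages would travel over a network rather than as direct calls:

```python
class Site:
    """One participating site, with an in-memory stand-in for its log."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit   # the site's local commit/abort decision
        self.log = []

    def log_and_flush(self, record):
        self.log.append(record)          # real system: force the record to disk

    def prepare(self, t):
        # Phase 1: decide locally, log READY or DON'T COMMIT, then vote.
        if self.will_commit:
            self.log_and_flush(f"READY {t}")
            return "ready"
        self.log_and_flush(f"DON'T COMMIT {t}")
        return "don't commit"

    def finish(self, t, decision):
        # Phase 2: record the coordinator's global decision.
        self.log_and_flush(f"{decision} {t}")

def two_phase_commit(coordinator, participants, t):
    coordinator.log_and_flush(f"PREPARE {t}")          # phase 1 begins
    votes = [p.prepare(t) for p in participants]       # "send prepare, await votes"
    decision = "COMMIT" if all(v == "ready" for v in votes) else "ABORT"
    coordinator.log_and_flush(f"{decision} {t}")       # phase 2 begins
    for p in participants:
        p.finish(t, decision)
    return decision
```

A single unwilling participant is enough to force a global abort, and every site's log ends with the same global decision.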
If a non-coordinating site crashes, it must determine upon recovery what happened to each transaction in its log. In each case, it can look at the final record regarding the transaction. If the last record for the transaction is START, WRITE, DON'T COMMIT, or ABORT, then the transaction must have been aborted. If the last record is COMMIT, then the transaction must have been committed. The hard case is when the last record is READY; in that case, the site cannot determine on its own what happened to the transaction, so it must ask another site what it knows about the transaction.
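This case analysis can be sketched as a function from a transaction's last log record to the site's recovery action (the function name and return strings are illustrative):

```python
def recover(last_record: str) -> str:
    """Decide a transaction's fate from its last log record after a crash."""
    if last_record in ("START", "WRITE", "DON'T COMMIT", "ABORT"):
        return "aborted"            # the transaction never reached READY
    if last_record == "COMMIT":
        return "committed"
    if last_record == "READY":
        return "ask another site"   # cannot be decided locally
    raise ValueError(f"unexpected log record: {last_record}")
```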
A much more difficult case is when the coordinator crashes. In this case, the remaining sites must elect a new coordinator to determine how to dispose of the transaction. Each remaining site reports to the newly elected coordinator what its log says about the transaction. If any remaining site lacks READY in its log, then the crashed coordinator can never have received “ready” from that site, so the new coordinator can safely decide to abort. If any site has COMMIT in its log, then the crashed coordinator must have reached that decision, and the new coordinator commits. However, if all sites have READY in their logs but none has COMMIT, then nobody knows what the coordinator was about to decide; the only options are to wait until the crashed coordinator recovers and reports its decision, or to pull in a human operator to resolve the dilemma.
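The new coordinator's reasoning can be sketched as a function over the surviving sites' last log records for the transaction (the function name and the "BLOCKED" outcome label are hypothetical):

```python
def decide_after_coordinator_crash(last_records):
    """Given each surviving site's last log record for T, choose an outcome."""
    if any(r == "COMMIT" for r in last_records):
        return "COMMIT"    # the crashed coordinator had decided to commit
    if any(r != "READY" for r in last_records):
        return "ABORT"     # some site never voted ready, so abort is safe
    return "BLOCKED"       # all READY, none COMMIT: wait for the coordinator
                           # to recover, or escalate to a human operator
```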
Many issues arise with distributed databases, and most of the concepts we've studied must be revisited for them. We won't do that here, but we will look at one more issue: how can we support locking for concurrency control?
The simplest technique is to have a single, centralized site that manages all locks. But when that single site becomes unavailable, then the entire database becomes unavailable, which is hardly an appealing solution.
Another technique is to have a group of sites, each tracking its own locks. When a transaction requires an exclusive lock, it must acquire the lock from all of the sites; but if it requires a shared lock, it need only acquire the lock from one site. If a site goes down, then nobody can get exclusive locks, but we can still acquire shared locks, which permits read-only transactions to continue.
(One issue to be addressed is a renewed possibility of deadlock: if two transactions each want an exclusive lock on the same element and happen to send their first requests to different sites, each will hold a lock the other needs, and neither can ever acquire the lock at every site.)
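This read-one/write-all scheme can be sketched as follows. The LockSite class and grant rules are a hypothetical simplification, and a down site is modeled as None:

```python
class LockSite:
    """One lock-managing site; only shared locks may coexist on an element."""

    def __init__(self):
        self.locks = {}   # element -> "S" (shared) or "X" (exclusive)

    def grant(self, element, mode):
        held = self.locks.get(element)
        if held is None or (held == "S" and mode == "S"):
            self.locks[element] = mode
            return True
        return False

def acquire_shared(sites, element):
    # A shared lock needs a grant from just one reachable site.
    return any(site.grant(element, "S") for site in sites if site is not None)

def acquire_exclusive(sites, element):
    # An exclusive lock needs a grant from every site, so a single down
    # site (None) blocks all exclusive locks.
    return all(site is not None and site.grant(element, "X") for site in sites)
```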
Finally, an alternative technique again uses a group of sites managing locks, but both exclusive locks and shared locks require that a majority of the managers agree to grant the lock. This reduces performance for read-only transactions, but writing transactions can still hope to complete even if one of the managing sites becomes unavailable.
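The availability trade-off of majority locking reduces to a simple check (a hypothetical sketch, treating every reachable site as willing to grant the lock):

```python
def can_lock(reachable_sites: int, total_sites: int) -> bool:
    """Under majority locking, any lock (shared or exclusive) can be
    acquired only if a strict majority of lock sites grants it."""
    return reachable_sites > total_sites // 2
```

With five lock sites and one down, both readers and writers can still proceed, since four of five is a majority; with three of five down, neither can.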