Final Review B
In a distributed database, a table's data may be distributed among sides by fragmenting the data horizontally or vertically. Distinguish these two fragmenting approaches.
In horizontal fragmentation, we place different rows of a table between different sites within the distributed database. In vertical fragmentation, different sites have their own selection of columns for a table.
Consider the following protocol for committing a transaction in a distributed database.
- The coordinator sends the message “commit-request T” to all sites involved.
- Each site commits T, logs that T is committed, and sends back the response “committed T.”
- The coordinator logs that T is now committed.
What is wrong with this protocol (which leads to the introduction of the two-phase commit protocol)?
If after receiving the “commit-request” message, any site finds that it is not in a state to commit T — or if any site has become unavailable due to network or computer failure — then this protocol would end up where some sites have committed but others have not. This would violate atomicity in a rather ugly way.
Describe the problem that leads to the definition of the two-phase commit protocol.
Since a transaction can involve changes to data at multiple sites, all sites involved with a transaction must agree to an identical decision regarding whether the transaction should be committed: If one site cannot commit while another thinks it should, the transaction's atomicity would be violated. One site may not be able to commit due to some local failure (two possibilities are deadlock or a hardware error), and in that case all sites will need to make it so that the transaction had no effect.
Describe briefly the two phases of the two-phase commit protocol.
In the first phase, the coordinator asks all sites whether they are ready to commit the transaction, and all sites respond. In the second phase, the coordinator decides whether the transaction will abort or commit, and it sends this decision to all involved sites.
In the two-phase commit protocol, what must a non-coordinating site do if it recovers from a crash and sees that the last log entry for a transaction is “READY T”?
It must query another site to determine the fate of the transaction. Once it determines the answer for the other site, it logs the answer into its own log.
As we saw, a distributed database that has just one host maintaining the lock for a data element suffers from being unable to access the element when that host is down. Describe how a system might use three hosts to maintain a lock for an element so that it can tolerate any one host being unavailable.
Any transaction that requires a lock of the element will only be required to obtain the lock from any two of the three systems. If any is down, then of course it will request the lock from the remaining two systems. But it will never be that two transactions manage to obtain the same exclusive lock, since each transaction requires two sites to agree, and there are only three sites.
Describe at least two features that distinguish an object-relational DBMS from a relational DBMS.
Here are three:
- Support for aggregate types like sets, lists, and structs.
- Methods can be associated with a row/object.
- References allow for direct references between tuples — essentially pointers, except that the tuples are actually stored on disk.
Explain the concept of reference in an object-relational DBMS, and give an example illustrating how references might be used in an SQL query.
A reference essentially behaves as a memory address describing a row in a table. (It does not in fact reference memory, since the database is stored on disk.) It allows one to have a table that includes direct references to rows in other tables. By contrast, in relational databases, a reference to a row in another table is stored indirectly, by listing the primary key values for that other table.
References allow a query such as the following, in which the
SELECT clause includes an implicit lookup into an
table to identify the name associated with each account. With a
relational DBMS, we would need to join the
SELECT acct->name, SUM(amount)
GROUP BY acct