Final Review A
Recall that a DBMS has transactions interact with memory buffers using READ and WRITE operations, while it performs LOAD and STORE operations to send buffers to and from the disk. Why not have transactions simply modify the disk directly?
Disk accesses are quite slow relative to memory accesses, so it pays to cache the disk contents in memory, accumulating writes in memory until the information must appear on disk (as with committing a transaction in an undo-log system) or the buffer must be used for other purposes. Also, deferring writes makes canceling a transaction easier.
How does logging help to achieve the ACID transaction properties?
Logging helps primarily to achieve durability: We place information into the log whenever we want to ensure that information will be available should the database immediately crash and lose all information in RAM. It is also helpful for atomicity, as it gives us a way of undoing the transaction should it end up needing to abort.
Summarize what happens with an undo log when a transaction reaches the point where it is ready to commit. Explain all that must be placed onto disk at that point.
When a transaction requests a commit, the DBMS must first ensure that all buffers containing data modified by the transaction are flushed to disk. Then the DBMS inserts a COMMIT record into the log, and it flushes the log to disk. Only at that point can the DBMS report that the transaction has successfully completed.
What is a checkpoint in the context of database logs, and why is checkpointing worthwhile?
A checkpoint is a mark in the log indicating that on recovery, the log records previous to the checkpoint do not require recovery. (Sometimes, the checkpoint may identify a small set of transactions who may have records requiring recovery previous to the checkpoint.)
Without checkpointing, the database log can grow extremely long, and the recovery process would take many hours or even days. Checkpointing allows the recovery process to be completed in a more realistic time period.
Suppose we have a system using an undo log, where each data element is a full block of data consisting of a single number, and where we have three buffers in memory — one for the log, and two to hold two different blocks of memory. We have three data elements A, B, and C, whose initial data values are 1, 2, and 3 respectively. As two transactions T and U requests each of the following operations (to be inserted into the undo log), what must the DBMS do?
T writes 4 to A
T writes 5 to B
U writes 6 to B
U writes 7 to C
|T starts||write |
START Tinto log buffer
|T writes 4 to A||load A into buffer, write |
WRITE T, A, 1into log buffer, modify A buffer to 4
|U starts||write |
START Uinto log buffer
|T writes 5 to B||load B into buffer, write |
WRITE T, B, 2into log buffer, modify B buffer to 5
|U writes 6 to B||write |
WRITE U, B, 5into log buffer, modify B buffer to 6
|T commits||flush log buffer to disk,
flush A and B buffers to disk, write |
COMMIT Tto log and flush log again
|U writes 7 to C||load C into buffer (in place of A),
WRITE U, C, 3into log buffer, modify C buffer to 7
|U commits||flush log buffer to disk,
flush C buffer to disk,
COMMIT Uto log and flush log again
Recall that at any time in its execution a transaction might be canceled, perhaps in response to a user action, or perhaps because of an internal database problem (such as detection of deadlock). When a transaction is canceled, the database state must be rolled back so that it exists as if the transaction never happened. How can this be managed if our system uses an undo log?
When a transaction is canceled, we go backward through the undo log to find WRITE records by the canceled transaction, and we write the logged value back to the element. After ensuring that all such database elements are flushed to disk, we add an ABORT record into the log. (We would also abort any transactions that had read the value as updated by the transaction, but there will be no such transactions if the DBMS uses strict locking.)
Suppose we have a system using an redo log, where each data element is a full block of data consisting of a single number, and where we have three buffers in memory — one for the log, and two to hold two different blocks of memory. We have three data elements A, B, and C, whose initial data values are 1, 2, and 3 respectively. As two transactions T and U requests each of the following operations (to be inserted into the undo log), what must the DBMS do?
T writes 4 to A
T writes 5 to B
U writes 6 to B
U writes 7 to C
|T starts||write “START T” into log buffer|
|T writes 4 to A||load A into buffer, write “WRITE T, A, 4” into log buffer, modify A buffer to 4|
|U starts||write “START U” into log buffer|
|T writes 5 to B||load B into buffer, write “WRITE T, B, 5” into log buffer, modify B buffer to 5|
|U writes 6 to B||write “WRITE U, B, 6” into log buffer, modify B buffer to 6|
|T commits||write “COMMIT T” to log buffer, flush log buffer to disk|
|U writes 7 to C||flush A buffer to disk, load C into buffer (in place of A), write “WRITE U, C, 7” to log buffer, modify C buffer to 7|
|U commits||write “COMMIT U” to log buffer, flush log buffer to disk|
If our DBMS uses a redo log for recovery, and we have a transaction that writes a value to a database element X, what is the rule for when X can be saved to disk?
X cannot be written to disk until the transaction commits.
Describe the recovery process when a DBMS uses a redo log.
The DBMS first determines the set of all transactions that have a COMMIT record in the log. It then goes through the redo log starting from the beginning, and for each WRITE record performed by a COMMITted transaction, it re-performs that write. (Finally, an ABORT record is written into the log for all transactions that have a START record but neither a COMMIT nor an ABORT record.)
Contrast the advantages of undo logs and redo logs.
A redo log requires less disk activity than undo log, since database buffers need not be flushed to disk until it must be replaced with another buffer. On the other hand, the redo log requires that enough buffers in RAM to store all current transactions' modifications, limiting how many modifications a single transaction can perform; an undo log does not have such a restriction. Also (and less significantly), the recovery process from a redo log following a crash is less efficient than recovery from an undo log.
Explain what a DBMS must do during the checkpoint process if it uses redo logging.
It first logs a record BEGIN CKPT including a list of all active transactions. It then forces onto disk all memory buffers that were modified by transactions that have completed. After finishing, it logs a record END CKPT and flushes the log onto disk.
When a transaction requests to commit:
a. What happens during the commit process if the system uses an undo log?
b. What if the system uses a redo log?
a. With an undo log, the log must be flushed to disk, then all memory buffers modified by the transaction must be flushed to disk, and finally “COMMIT T” should be entered into the log and flushed to disk.
b. With a redo log, “COMMIT T” should be entered into the log and flushed to disk. (After this, any memory buffers modified by the transaction may be flushed to disk, immediately or much later.)
Explain the notion of templating in the context of Web development and describe why having a templating system is advantageous.
Distinguish browser-side templating (such as Dust) versus server-side templating.
In browser-side templating, the server passes JSON data back to the browser in response to AJAX requests, and the browser applies the template to generate the HTML that appears on the page. By contrast, with server-side templating, the server applies the template to generate the HTML, which is passed to the browser for it to plug directly into the page.
Identify at least two advantages of server-side templating over browser-side templating.
- Server-side templating has been around longer, so the solutions are more established.
- Server-side templating requires less code to be sent to the browser (no dust-core.js or my-template.js), so the browser can load the page faster. (This comes at the expense of network response time to later AJAX requests, where JSON provides a more compact representation.)
- Server-side templating requires less browser-side computation, which is particularly useful for weak processors like those often found on smartphones.
- Servers can apply caching to avoid re-applying templates.
Recall that NoSQL systems typically support the BASE properties (Basically Available, Soft state, Eventual consistency) rather than the ACID properties. Summarize what these BASE properties mean.
The database tries to remain fully operational, it allows for queries retrieving slightly stale values, and distributed systems will eventually come to agreement about what the correct value is for each element.
The main task of Memcached could be accomplished through a hash table in the regular Web server's memory. Identify two advantages of using Memcached over this approach.
- Memcached can distribute the hash table over multiple servers, allowing the hash table to occupy more RAM.
- Memcached runs in a separate process, so the Web server can be restarted without losing what is in the hash table.
- Memcached has a built-in procedure for throwing out old entries once memory becomes full.
How is data structured in a Cassandra database?
A Cassandra database includes several tables similar to a relational database, but the number of columns can easily be quite large since each row only stores the data for columns that are applicable to that row. Unlike relational databases, the value saved in a column could be a collection such as a set or list.
How is data structured in a MongoDB database?
A MongoDB database consists of named collections. Each collection is a set of JSON objects called documents. As a JSON object, each document is a map associating keys (that are strings) to arbitrary values.
A MongoDB database includes several collections, each containing a set of documents. What differentiates its way of structuring data from a relational database, which includes several tables, each containing a set of rows?
A MongoDB database allows sparse representation, where a particular key can exist in only a handful of documents; with a relational database, we would need a full column with NULL values for all rows for which the column does not apply. Also, a relational table requires that each cell contains one, and only one, value, but with MongoDB, a value associated with a key is allowed to be an aggregate type such as a list or dictionary.