CS133 Lab 4: SimpleDB Transactions

Deadlines:

Part 1: Exercise 1-2: Thursday, April 6 11:59 PM PT
Final: Exercise 3-5: Thursday, April 13 11:59 PM PT

Lab Description

In this lab, you will implement a simple locking-based transaction system in SimpleDB using page-level locking. You will need to add lock and unlock calls at the appropriate places in your code, as well as code to track the locks held by each transaction and grant locks to transactions as they are needed.

The remainder of this document describes what is involved in adding transaction support and provides a basic outline of how you might add this support to your database.

As with the previous lab, we recommend that you start as early as possible. Locking and transactions can be quite tricky to debug!

Quick jump to exercises:

Section 2.4 for Exercise 1
Section 2.5 for Exercise 2
Section 2.6 for Exercise 3
Section 2.7 for Exercise 4
Section 2.8 for Exercise 5

Jump to Submission instructions.

1. Getting started

You should begin with the code you submitted for Lab 2 (or Lab 3). (If you did not submit code for Lab 2, or your solution didn't work properly, contact us to discuss options.) In the tar file you will download below, we have provided extra test cases as well as two new source code files (DeadlockException and a skeleton LockManager) that are not in the original code distribution you received. We reiterate that the unit tests we provide are meant to help guide your implementation, but they are not intended to be comprehensive or to establish correctness.

Use of the skeleton LockManager class is optional; it is provided as a guide for encapsulating the logic for managing locks. You may decide to create a different helper class or to implement all locking functionality directly within the BufferPool class. The test cases only use methods from the BufferPool.

You will need to add the new files to your release. The easiest way to do this is to untar the new code in the same directory as your top-level simpledb directory, as follows:

Make a backup of your Lab 2 (or Lab 3) solution into a new directory for Lab 4 by typing:

$ cp -r cs133-lab2 cs133-lab4

Download the new tests and skeleton code for Lab 4 from https://www.cs.hmc.edu/~beth/courses/cs133/lab/cs133-lab4-supplement.tar.gz:
```
$ wget https://www.cs.hmc.edu/~beth/courses/cs133/lab/cs133-lab4-supplement.tar.gz
```
Extract the new files for Lab 4 by typing:
```
$ tar -xvzkf cs133-lab4-supplement.tar.gz
```

Now all files from Lab 2 and Lab 4 will be in the cs133-lab4 directory.

To work in Eclipse, create a new Java project named cs133-lab4 like you did for previous labs.

2. Transactions, Locking, and Concurrency Control

Before starting, you should make sure you understand what a transaction is and how strict two-phase locking (which you will use to ensure isolation and atomicity of your transactions) works.

In the remainder of this section, we briefly overview these concepts and discuss how they relate to SimpleDB.

2.1. Transactions

A transaction is a group of database actions (e.g., inserts, deletes, and reads) that are executed atomically; that is, either all of the actions complete or none of them do, and it is not apparent to an outside observer of the database that these actions were not completed as a part of a single, indivisible action.

2.2. The ACID Properties

To help you understand how transaction management works in SimpleDB, we briefly review how it ensures that the ACID properties are satisfied:

Atomicity: Strict two-phase locking and careful buffer management ensure atomicity.
Consistency: The database is transaction consistent by virtue of atomicity. Other consistency issues (e.g., key constraints) are not addressed in SimpleDB.
Isolation: Strict two-phase locking provides isolation.
Durability: A FORCE buffer management policy ensures durability (this topic is covered in the lecture on Recovery and reviewed in Section 2.3 below).

2.3. Recovery and Buffer Management

To simplify your job, in this lab you will implement a NO STEAL/FORCE buffer management policy. As we discussed (or will discuss) in class, this means that:

You shouldn't evict dirty (updated) pages from the buffer pool if they are locked by an uncommitted transaction (this is NO STEAL).
On transaction commit, you should force dirty pages to disk (e.g., write the pages out) (this is FORCE).

To further simplify your life, you may assume that SimpleDB will not crash while processing a transactionComplete command. Note that these three points mean that you do not need to implement log-based recovery in this lab, since you will never need to undo any work (you never evict dirty pages) and you will never need to redo any work (you force updates on commit and will not crash during commit processing).

You will implement this NO STEAL/FORCE policy in Exercise 3 (Section 2.6).

2.4. Granting Locks

You will need to add calls to SimpleDB (in BufferPool, for example), that allow a caller to request or release a (shared or exclusive) lock on a specific object on behalf of a specific transaction. You may decide to encapsulate most of this functionality in a Lock Manager class.

You will need to create data structure(s) that keep track of which locks each transaction holds and that check to see if a lock should be granted to a transaction when it is requested.

You will implement shared and exclusive locks; recall that these work as follows:

Before a transaction can read an object, it must have a shared lock on it.
Before a transaction can write an object, it must have an exclusive lock on it.
Multiple transactions can have a shared lock on an object.
Only one transaction may have an exclusive lock on an object.
If transaction t is the only transaction holding a shared lock on an object o, t may upgrade its lock on o to an exclusive lock.

A transaction will need to acquire a shared lock on any page before it reads it, and an exclusive lock on any page before it writes it. You will notice that we are already passing around Permissions objects in the BufferPool; these objects indicate the type of lock that the caller would like to have on the object being accessed (see Permissions.java). Permissions.READ_ONLY indicates you need a shared lock, while Permissions.READ_WRITE requires an exclusive lock.
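The lock rules above can be written as a single compatibility check. The following is only a sketch under stated assumptions: Mode is a hypothetical stand-in for Permissions, transaction ids are plain long values, and LockRules.canGrant is an illustrative name that does not appear in the provided code.

```java
import java.util.Set;

// Hypothetical stand-in for SimpleDB's Permissions; only the two lock modes matter here.
enum Mode { SHARED, EXCLUSIVE }

final class LockRules {
    /**
     * Returns true if transaction tid may be granted `wanted` on an object that is
     * currently held in `heldMode` by `holders` (heldMode is ignored when holders is empty).
     */
    static boolean canGrant(long tid, Mode wanted, Set<Long> holders, Mode heldMode) {
        if (holders.isEmpty()) return true;               // nobody holds the object
        boolean soleHolder = holders.size() == 1 && holders.contains(tid);
        if (soleHolder) return true;                      // re-request or upgrade by the only holder
        if (heldMode == Mode.EXCLUSIVE) return false;     // another transaction holds it exclusively
        return wanted == Mode.SHARED;                     // shared locks coexist; exclusive must wait
    }
}
```

For example, with transactions 1 and 2 both holding a shared lock, a further shared request succeeds, but transaction 1 cannot upgrade until transaction 2 releases its lock.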

If a transaction requests a lock that it should not be granted, your code should block, waiting for that lock to become available (i.e., be released by another transaction running in a different thread). Example code for waiting (a "sleep") can be found in the LockManager class.
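As a rough illustration of blocking, here is a minimal sketch of a single exclusive lock that makes callers wait. It uses Java's built-in wait()/notifyAll() rather than the sleep-based loop in the provided LockManager, and the names BlockingLock and BlockingDemo are hypothetical. Treat it as one possible way to wait, not as the skeleton's approach.

```java
final class BlockingLock {
    private Long owner = null;   // id of the transaction holding the lock, or null if free

    synchronized void acquire(long tid) throws InterruptedException {
        while (owner != null && owner != tid) {
            wait();              // releases the monitor until another thread calls notifyAll()
        }
        owner = tid;
    }

    synchronized void release(long tid) {
        if (owner != null && owner == tid) {
            owner = null;
            notifyAll();         // wake every waiter so each can re-check the condition
        }
    }

    synchronized boolean heldBy(long tid) {
        return owner != null && owner == tid;
    }
}

final class BlockingDemo {
    /** Returns true if transaction 2 obtained the lock only after transaction 1 released it. */
    static boolean run() {
        BlockingLock lock = new BlockingLock();
        try {
            lock.acquire(1L);
            Thread waiter = new Thread(() -> {
                try {
                    lock.acquire(2L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            waiter.start();
            waiter.join(100);    // give the waiter time to block
            boolean blocked = waiter.isAlive() && !lock.heldBy(2L);
            lock.release(1L);
            waiter.join(2000);
            return blocked && lock.heldBy(2L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The while loop (rather than an if) matters: a woken thread must re-check the condition, because another waiter may have taken the lock first.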

Exercise 1. The Buffer Pool will be responsible for granting and releasing page-level locks; in this exercise you will write the methods that acquire and release locks in the BufferPool. More details on using the LockManager class are below, but at a high level, you will need to do the following:

Modify getPage() in BufferPool to block and acquire the desired lock for a transaction as described above. As needed, it should catch a DeadlockException and re-throw that exception as a TransactionAbortedException. If you are using the LockManager class, you'll accomplish this by adding a call to LockManager.acquireLock() at the very beginning of your BufferPool.getPage() like this (assuming your Lock Manager instance is called lockmgr):
```
try {
      lockmgr.acquireLock(tid, pid, perm);
} catch (DeadlockException e) { 
      throw new TransactionAbortedException(); // caught by caller, 
                                               // who calls transactionComplete()
}
```

Implement releasePage(). This method is primarily used for testing, and at the end of transactions. If you are using the LockManager class, uncomment the call to releaseLock() in BufferPool and complete LockManager.releaseLock(), described below.

Implement holdsLock() so that logic in Exercise 2 can determine whether a page is already locked by a transaction. If you are using the LockManager class, uncomment the call to holdsLock() in BufferPool and complete LockManager.holdsLock(), described below.

Here are some further implementation hints and details if you are using the provided LockManager class.

You won't need to add more code to acquireLock() until Exercise 5, but you may wish to look at it briefly now as it uses methods you will implement in this exercise.

LockManager constructor: Your constructor should create whatever data structure(s) you will be using to represent your lock table. The design of this is entirely up to you. You may decide to use multiple data structures, create a helper class, etc. As a design guideline, you should ensure your data structure(s) allow you to answer these questions:
- Given a transactionId, which pages does it have locked?
- Given a page Id, which transactions hold a lock on the page?
- Given a page, which Permissions is it locked with?
If you create a helper class for this and want to be able to check equality of instances of that class, be sure to implement its equals() method.
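One possible shape for such a lock table, offered only as a sketch: two maps, one keyed by page and one keyed by transaction. The class LockTable, its Entry helper, and the use of Integer and Long in place of PageId and TransactionId are all assumptions made for illustration. The sketch records locks without checking whether they should be granted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LockTable {
    // Hypothetical entry: who holds a page and whether the lock is exclusive.
    static final class Entry {
        final Set<Long> holders = new HashSet<>();
        boolean exclusive;
    }

    private final Map<Integer, Entry> byPage = new HashMap<>();       // page id -> holders and mode
    private final Map<Long, Set<Integer>> byTxn = new HashMap<>();    // transaction id -> locked pages

    /** Records a granted lock; the caller is assumed to have checked compatibility already. */
    void record(long tid, int pageId, boolean exclusive) {
        Entry e = byPage.computeIfAbsent(pageId, k -> new Entry());
        e.holders.add(tid);
        e.exclusive = exclusive;
        byTxn.computeIfAbsent(tid, k -> new HashSet<>()).add(pageId);
    }

    Set<Integer> pagesLockedBy(long tid) {
        return byTxn.getOrDefault(tid, Collections.<Integer>emptySet());
    }

    Set<Long> holdersOf(int pageId) {
        Entry e = byPage.get(pageId);
        return e == null ? Collections.<Long>emptySet() : e.holders;
    }

    boolean isExclusive(int pageId) {
        Entry e = byPage.get(pageId);
        return e != null && e.exclusive;
    }
}
```

Note that Map.get() returns null for a missing key, which is why each lookup checks for null. If your keys are objects of your own class, a HashMap needs both equals() and hashCode() to be implemented consistently.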

lock(): To get a lock, this method first checks if the given transaction can acquire a lock using locked(), which you will implement next. Right now you should just add code to lock() to update your lock table assuming it's okay (this is the "else" case in the code).

locked(): This method returns a boolean indicating whether a transaction is "locked out" from acquiring a lock on the given page with the given permissions. Logic for this method appears in the code in comments above the method. Be careful with == vs. equals and be sure to check Java documentation for whatever data structures you are using for your lock table to see when methods can return null.

holdsLock(): Simple method used by Buffer Pool to determine whether the given transaction has any type of lock on the given page.

releaseLock(): Release whatever lock the given transaction has on the given page, updating your lock table. Used for testing (used by BufferPool.releasePage()) and to help at the end of transactions, as you'll see in Exercise 4.

You may need to implement the next exercise before your code passes the unit tests in LockingTest.

Note: if LockingTest seems to hang forever before it even runs any of its tests, the problem is likely in the setup() for LockingTest! Check out what happens there. In particular, does your implementation of locked() allow a transaction to get a lock it already holds? Does it allow a transaction to get a lock on a page if no other transaction holds a lock on that page?

Debugging tip: When running tests with ant, note that the printing of standard output will be delayed until the tests complete. If this is making debugging hard, you could try temporarily commenting out parts of the unit tests.

2.5. Lock Lifetime

Now that you've implemented the core functionality for acquiring and releasing locks, you will need to implement strict two-phase locking. Recall that this means that transactions should acquire the appropriate type of lock on any object before accessing that object and shouldn't release any locks until after the transaction commits (or aborts).

Depending on your implementation, it is possible that you may not have to acquire a lock anywhere besides what you've already implemented in Buffer Pool. It is up to you to verify this in the next exercise!

You will need to think about when to release locks as well. It is clear that you should release all locks associated with a transaction after it has committed or aborted to ensure strict 2PL. You will implement this later in Exercise 4. However, it is possible for there to be other scenarios in which releasing a lock before a transaction ends might be useful. For instance, you may release a shared lock on a page after scanning it to find empty slots (as described in Exercise 2 below).

Exercise 2. Ensure that you acquire and release locks throughout SimpleDB. Some (but not necessarily all) actions that you should verify work properly:

Reading tuples off of pages during a SeqScan (if you implemented locking in BufferPool.getPage(), this should work correctly as long as your HeapFile.iterator() uses BufferPool.getPage().)

Inserting and deleting tuples through BufferPool and HeapFile methods. Note that your implementation of HeapFile.insertTuple() and HeapFile.deleteTuple(), as well as the implementation of the iterator returned by HeapFile.iterator(), should access pages using BufferPool.getPage(). Double check that these different uses of BufferPool.getPage() pass the correct permissions object (e.g., Permissions.READ_WRITE or Permissions.READ_ONLY).

Marking dirty pages. You may also wish to double check that your implementations of BufferPool.insertTuple() and BufferPool.deleteTuple() call markDirty() on any of the pages they access (you should have done this when you implemented this code in Lab 2, but we did not test for this case).

You will also want to think about acquiring and releasing locks in the following situations:
- Adding a new page to a HeapFile. When do you physically write the page to disk? Are there race conditions with other transactions (on other threads) that might need special attention at the HeapFile level, regardless of page-level locking? You may need to synchronize the part of your code where you write a blank page to disk, e.g.,
  synchronized (this) { // code to write blank page }
- Looking for an empty slot into which you can insert tuples. Most implementations scan pages looking for an empty slot, and will need a READ_ONLY lock to do this. Surprisingly, however, if a transaction t finds no free slot on a page p, t may immediately release the lock on p. Although this apparently contradicts the rules of two-phase locking, it is OK because t did not use any data from the page, so a concurrent transaction t' that updated p cannot possibly affect the answer or outcome of t.
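To make the synchronized (this) hint for adding a new page more concrete, here is a small sketch of appending a blank page to a file under a lock. It is not HeapFile code: the PageFile and PageFileDemo classes and the 4096-byte page size are assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class PageFile {
    static final int PAGE_SIZE = 4096;   // assumed page size for this sketch
    private final File file;

    PageFile(File file) { this.file = file; }

    /** Appends one zero-filled page and returns its page number. */
    int appendBlankPage() throws IOException {
        synchronized (this) {            // only one thread may extend the file at a time
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                long oldLength = raf.length();
                raf.seek(oldLength);
                raf.write(new byte[PAGE_SIZE]);
                return (int) (oldLength / PAGE_SIZE);
            }
        }
    }
}

final class PageFileDemo {
    /** Two threads append five pages each; returns the resulting page count, or -1 on error. */
    static long run() {
        try {
            File f = File.createTempFile("heap", ".dat");
            f.deleteOnExit();
            PageFile pf = new PageFile(f);
            Runnable appender = () -> {
                try {
                    for (int i = 0; i < 5; i++) pf.appendBlankPage();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            };
            Thread a = new Thread(appender), b = new Thread(appender);
            a.start(); b.start();
            a.join(); b.join();
            return f.length() / PageFile.PAGE_SIZE;
        } catch (IOException | InterruptedException e) {
            return -1;
        }
    }
}
```

Without the synchronized block, two threads could both read the same file length and write their blank page at the same offset, so one new page would be lost.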

At this point, your code should pass the unit tests in LockingTest.
Code hanging? See the notes at the end of Exercise 1.

2.6. Implementing NO STEAL

In a NO STEAL policy, updates from a transaction cannot be written to disk before it commits. This means we must be sure not to evict dirty pages from the buffer pool until commit time.

Note that, in general, evicting a clean page that is locked by a running transaction is OK when using NO STEAL, as long as your lock manager keeps information about evicted pages around, and as long as none of your operator implementations keep references to Page objects which have been evicted. You don't need to do this for the lab.

Exercise 3. Implement the necessary logic for page eviction without evicting dirty pages. You will need to modify the evictPage method in BufferPool. In particular, it must never evict a dirty page. If your eviction policy currently chooses a dirty page for eviction, you will have to find a way to evict an alternative page. In the case where all pages in the buffer pool are dirty, you should throw a DbException.
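A minimal sketch of this idea, assuming a toy pool that tracks only a dirty flag per page and evicts the oldest clean page. TinyPool is a hypothetical class, and IllegalStateException stands in for SimpleDB's DbException so that the example compiles on its own; your eviction policy may differ.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class TinyPool {
    // page id -> dirty flag, kept in insertion order (oldest first)
    private final LinkedHashMap<Integer, Boolean> pages = new LinkedHashMap<>();

    void put(int pageId, boolean dirty) { pages.put(pageId, dirty); }

    boolean contains(int pageId) { return pages.containsKey(pageId); }

    /** Evicts the oldest clean page and returns its id; never evicts a dirty page. */
    int evictPage() {
        Iterator<Map.Entry<Integer, Boolean>> it = pages.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Boolean> e = it.next();
            if (!e.getValue()) {             // clean page: safe to drop under NO STEAL
                int id = e.getKey();
                it.remove();
                return id;
            }
        }
        // Stand-in for DbException in this sketch.
        throw new IllegalStateException("all pages in the buffer pool are dirty");
    }
}
```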

This functionality is not tested until you've completed Exercise 4.

2.7. Transactions

In SimpleDB, a TransactionId object is created at the beginning of each query. This object is passed to each of the operators involved in the query. When the query is complete, the BufferPool method transactionComplete is called.

Calling transactionComplete either commits or aborts the transaction, as specified by the parameter flag commit. At any point during its execution, an operator may throw a TransactionAbortedException exception, which indicates an internal error or deadlock has occurred. The test cases we have provided you with create the appropriate TransactionId objects, pass them to your operators in the appropriate way, and invoke transactionComplete when a query is finished. We have also implemented TransactionId.

Exercise 4. You will now implement transactionComplete in BufferPool.java to finish a transaction, adhering to a FORCE policy and Strict 2PL.

transactionComplete(tid,commit): This method should first deal with the transaction's dirty pages in the buffer pool, to adhere to a FORCE policy. If the xact is committing (i.e., commit==true), you should flush dirty pages associated with the xact to disk. Else, if the xact is aborting, you should throw away its changes to pages in the Buffer Pool (e.g., by restoring each page's "before image" or even just removing the page from the Buffer Pool).

It should then release all locks associated with that transaction. If you are using the LockManager class, you can do this by uncommenting the call to releaseAllLocks and implementing that method in the Lock Manager as described next.
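The order of operations described above might look like the following sketch. Everything here is a simplified stand-in: MiniBufferPool, its CachedPage helper, the in-memory disk map, and the String page contents are assumptions for illustration, and the abort path takes the "just remove the page" option.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class MiniBufferPool {
    // Hypothetical cached page: contents plus the id of the transaction that dirtied it (null if clean).
    static final class CachedPage {
        String data;
        Long dirtiedBy;
        CachedPage(String data) { this.data = data; }
    }

    final Map<Integer, String> disk = new HashMap<>();         // stand-in for the file on disk
    final Map<Integer, CachedPage> cache = new HashMap<>();    // stand-in for the buffer pool
    final Map<Long, Set<Integer>> locks = new HashMap<>();     // transaction id -> locked page ids

    void write(long tid, int pageId, String data) {
        CachedPage p = cache.computeIfAbsent(pageId, k -> new CachedPage(disk.get(k)));
        p.data = data;
        p.dirtiedBy = tid;
        locks.computeIfAbsent(tid, k -> new HashSet<>()).add(pageId);
    }

    void transactionComplete(long tid, boolean commit) {
        Iterator<Map.Entry<Integer, CachedPage>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, CachedPage> e = it.next();
            CachedPage p = e.getValue();
            if (p.dirtiedBy == null || p.dirtiedBy != tid) continue;   // not this transaction's update
            if (commit) {
                disk.put(e.getKey(), p.data);   // FORCE: write the update out at commit
                p.dirtiedBy = null;
            } else {
                it.remove();                    // abort: discard; the next read reloads from disk
            }
        }
        locks.remove(tid);                      // strict 2PL: locks are released only at the end
    }
}
```

Releasing locks last is deliberate: if locks were released before the flush or discard, another transaction could observe a page in an intermediate state.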

Note that there is another version of transactionComplete that takes a single argument; you do not need to add any code there.

releaseAllLocks(tid): This method in the Lock Manager should update your lock table data structure(s) to release all locks held by the given xact. Be careful: Java doesn't like it if you are iterating over a collection while you are also removing elements from that collection; you may end up seeing a ConcurrentModificationException.
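One common way to avoid the ConcurrentModificationException is to iterate over a snapshot of the keys. The sketch below assumes a lock table that is a single map from page id to holder set; the class name LockRelease and the Integer/Long id types are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LockRelease {
    final Map<Integer, Set<Long>> holdersByPage = new HashMap<>();   // page id -> transaction ids

    void grant(long tid, int pageId) {
        holdersByPage.computeIfAbsent(pageId, k -> new HashSet<>()).add(tid);
    }

    void releaseAllLocks(long tid) {
        // Iterate over a copy of the key set so the map can be modified safely inside the loop.
        for (Integer pageId : new ArrayList<>(holdersByPage.keySet())) {
            Set<Long> holders = holdersByPage.get(pageId);
            holders.remove(tid);
            if (holders.isEmpty()) holdersByPage.remove(pageId);
        }
    }
}
```

Using an explicit Iterator and calling its remove() method is another option that avoids the copy.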

At this point, your code should pass the TransactionTest unit test and the AbortEvictionTest system test. You may find the TransactionTest system test illustrative, but it will likely fail until you complete the next exercise.

2.8. Deadlocks and Aborts

It is possible for transactions in SimpleDB to deadlock due to a cycle of transactions waiting for each other to release locks. You will need to detect and resolve this situation!

There are different ways to detect deadlock. For example, you may:

Implement a timeout policy that aborts a transaction if it has not completed after a given period of time. (Easier to implement, but might not pass all the unit tests).
Alternately, implement cycle-detection in a "waits-for" dependency graph data structure as discussed in class. In this scheme, you would check for cycles in a waits-for graph whenever a xact tries to acquire a new lock and abort that xact if a cycle would be created.

After you have detected that a deadlock exists, you must improve the situation. Suppose you have detected a deadlock while transaction t is waiting for a lock; you can decide to abort t to give other transactions a chance to make progress. This is most easily done by aborting the xact t (by throwing a DeadlockException) when it tries to acquire a lock that will cause a cycle (if you are trying the waits-for-graph approach) or if too much time has elapsed (if using the timeout approach).
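For the waits-for-graph approach, the cycle check could be sketched as a depth-first search from the lock holder back to the requester. WaitsForGraph and its method names are hypothetical, and transaction ids are plain long values here. The sketch shows only the graph logic, not how it connects to acquireLock().

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WaitsForGraph {
    private final Map<Long, Set<Long>> waitsFor = new HashMap<>();   // waiter -> transactions it waits on

    void addEdge(long waiter, long holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** Call when a waiter gets its lock or aborts, so stale edges do not cause false cycles. */
    void removeWaiter(long waiter) {
        waitsFor.remove(waiter);
    }

    /** True if adding the edge waiter -> holder would close a cycle, i.e. holder already reaches waiter. */
    boolean wouldDeadlock(long waiter, long holder) {
        Deque<Long> stack = new ArrayDeque<>();
        Set<Long> seen = new HashSet<>();
        stack.push(holder);
        while (!stack.isEmpty()) {
            long t = stack.pop();
            if (t == waiter) return true;
            if (!seen.add(t)) continue;          // already explored this transaction
            for (long next : waitsFor.getOrDefault(t, Collections.<Long>emptySet())) {
                stack.push(next);
            }
        }
        return false;
    }
}
```

For example, if transaction 1 waits for 2 and 2 waits for 3, then a request by 3 for a lock held by 1 would close the cycle, and 3 is the natural transaction to abort.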

Exercise 5. Implement deadlock detection and resolution in BufferPool.java. If you are using the Lock Manager class, you will be checking for deadlocks (and possibly throwing a DeadlockException) in acquireLock().

You will want to check for a deadlock whenever a transaction attempts to acquire a lock and finds another transaction is holding the lock (note that this by itself is not a deadlock, but may be symptomatic of one). E.g., for the waits-for-graph approach, you could check if a cycle of transactions waiting has formed. Please describe your approach to dealing with deadlock in the lab writeup.
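For the timeout approach, a minimal sketch of a retry loop with a deadline follows. The TimeoutWait class, the BooleanSupplier used to model "try to take the lock", the 5 ms polling interval, and the unchecked DeadlockSuspected exception (a stand-in for DeadlockException) are all assumptions made so the example is self-contained.

```java
import java.util.function.BooleanSupplier;

final class TimeoutWait {
    /** Unchecked stand-in for the lab's DeadlockException, used only in this sketch. */
    static final class DeadlockSuspected extends RuntimeException {}

    /** Polls tryLock until it succeeds; gives up after timeoutMillis and signals a suspected deadlock. */
    static void acquireWithTimeout(BooleanSupplier tryLock, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!tryLock.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) throw new DeadlockSuspected();
            try {
                Thread.sleep(5);                      // back off briefly before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new DeadlockSuspected();
            }
        }
    }
}
```

A timeout cannot distinguish a real deadlock from a long wait, so the timeout value trades needless aborts against slow recovery.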

Aborting a transaction
To abort a transaction in the Lock Manager's acquireLock, you can simply throw a DeadlockException, which should be caught by BufferPool.getPage() and re-thrown as a TransactionAbortedException as described in Exercise 1. This TransactionAbortedException will be caught by the code executing the transaction (e.g., TransactionTest.java), which calls transactionComplete() to clean up after the transaction. You are not expected to automatically restart a transaction which fails due to a deadlock -- you can assume that higher level code in the unit tests will take care of this.

Unit testing
We have provided some (not-so-unit) tests in DeadlockTest. They are actually a bit involved, so they may take more than a few seconds to run (depending on your policy). If they seem to hang indefinitely, then you probably have an unresolved deadlock. These tests construct simple deadlock situations that your code should be able to escape. The tests will print to the console the TransactionAbortedExceptions corresponding to the deadlocks they successfully resolved.

Note that there are two timing parameters near the top of DeadlockTest.java; these determine the frequency at which the test checks if locks have been acquired and the waiting time before an aborted transaction is restarted. You may observe different performance characteristics by tweaking these parameters if you use a timeout-based detection method.

In addition to DeadlockTest, your code should now pass the TransactionTest system test (which may also run for quite a long time, but times out at 10 minutes). Note that if you used a timeout approach to deadlock detection, you might not be able to pass all the sub-tests.

Debugging tip: TransactionTest runs actual queries--see the run method in XactionTester. So if DeadlockTest passes but TransactionTest does not, the issue may lie with the query plan operators that are used by this test, namely, Insert.java and Delete.java. If your implementation of those operators suppresses TransactionAbortedExceptions because it catches all exceptions, that could be the issue. Feel free to consult Lab 2 solution code.

3. Submission and Grading Details

You must submit your code (see below) as well as a short (2 page, maximum) writeup describing your approach. This writeup should:

Describe design decisions, such as deadlock detection policy, and justify any changes you made to the API.
Describe any missing or incomplete elements of your code.
Describe how long you (and your partner) spent on the lab, and whether there was anything you found particularly difficult or confusing.

3.1. Collaboration

This lab can be completed alone or with a partner. Please indicate clearly who you worked with, if anyone, on your writeup. Only one person needs to submit. On Gradescope, click "Group Members" at the bottom of the page after uploading your files to add your partner.

3.2. Submitting your assignment

You will submit a tarball of your code on Gradescope for intermediate deadlines and for your final version. You only need to include your writeup for the final version.

Generating Tarball

You can generate the tarball by using the ant handin target. This will create a file called cs133-lab.tar.gz that you can submit. You can rename the tarball file if you want, but the filename must end in tar.gz.

The autograder won't be able to handle it if you package your code any other way!

Submitting on Gradescope

Click Lab 4 on your Gradescope dashboard. For deadlines besides the final version, you only need to upload or resubmit cs133-lab.tar.gz.
For the final version: click Lab 4 and then click the "Resubmit" button on the bottom of the page; upload both cs133-lab.tar.gz and writeup.txt containing your writeup.

If you worked with a partner, be sure to enter them as a group member on Gradescope after uploading your files.

3.3. Grading

Your grade for the lab will be based on the final version after all exercises are complete.

75% of your grade will be based on whether or not your code passes the test suite. Before handing in your code, you should make sure it produces no errors (passes all of the tests) from both ant test and ant systemtest.

Important: before testing, we will replace your build.xml and the entire contents of the test directory with our version of these files. This means you cannot change the format of .dat files! You should also be careful changing our APIs. You should test that your code compiles with the unmodified tests. In other words, we will untar your tarball, replace the files mentioned above, compile it, and then grade it. It will look roughly like this:

[untar your tar.gz file]
[replace build.xml and test]
$ ant test
$ ant systemtest

If any of these commands fail, we'll be unhappy, and, therefore, so will your grade.

An additional 25% of your grade will be based on the quality of your writeup, our subjective evaluation of your code, and on-time submission for the intermediate deadlines.

ENJOY!!

Acknowledgements

Thanks to our friends and colleagues at MIT and UWashington for doing all the heavy lifting on creating SimpleDB!