In theory, database transactions present a very simple interface to developers: group related operations together and they’ll be atomically committed if doing so is possible and doesn’t conflict with other changes. Otherwise, the transaction will fail and you can just try again. Rinse and repeat. This is very simple, but also very powerful because the same semantics work whether you are writing a single user application or a distributed system with thousands of concurrent users and random outages from time to time.
Unfortunately, many database system implement transactions in an overly complicated way by exposing internals like read phenomena and deadlocks for developers to reason about. So, when we added support for transactions in version 0.3, we had three major design goals: 1) a simple API, 2) strong consistency, and 3) high performance. Creating the API was easy, but there is a natural tension between strong guarantees and high performance, so the last two goals required some creative engineering.
Why not use snapshot isolation
Before I explain how Concourse solves the problem of offering transactions with high performance and strong consistency, I want to explain why we rejected the most popular approach–snapshot isolation.
Snapshot isolation uses multiple versions of data to guarantee that all reads within a transaction see a consistent snapshot and avoid all read phenomena. And since Concourse is a version control database, implementing transactions in this fashion seemed almost trivial. But snapshot isolation is prone to another anomaly called write skew that we found unacceptable.
Write skews generally happen when there are application level constraints that can’t be detected when snapshots contain stale data. An example (which I’m borrowing from Wikipedia) is the scenario where a user has two bank accounts, both with $100. There is a rule in place that allows a single account to have a negative balance as long as the sum of both accounts is not negative. Seems reasonable, but a bank using snapshot isolation could end up losing money.
Let’s assume that a woman goes to the ATM to withdraw $200 from one account. And, at the exact same time, her spouse goes to another ATM and tries to withdraw $200 from the other account. The flow that handles the withdrawals looks something like:
- start transaction
- set the balance in account equal to the current balance - 200
- if the balance in account and the balance in otherAccount is >= 0 - commit
Under snapshot isolation, the final check reads stale data from an old snapshot and does not account for new updates that have been committed by the other transaction. Thus both transactions commit and the balance in each account ends up being -$100, which violates the application constraint.
So, if snapshot isolation allows write-skews, why do most databases prefer it to the stronger and anomaly free serializable isolation? Because serializable locking is a big blow to performance…
Not your grandfather’s serializability
Classic serializability is pessimistic because it assumes concurrent transactions are likely to conflict and therefore grabs locks to block those outside changes. For Concourse, this is unacceptable because it degrades performance, forces developers to deal with potential deadlocks and may be done in vain if the client fails or decides to abort the transaction before committing. So, we needed a solution that was much more optimistic.
We initially tried classic optimistic concurrency control measures that, instead of locking, check data versions before committing to see if a transaction’s work has been invalidated. Unfortunately, this approach is prone to a race conditions and doesn’t guarantee the strong consistency we require. It became clear that we couldn’t achieve serializable isolation without locking, so we decided to come up with an approach that avoids it until absolutely necessary: just in time locking.
As the name suggest, JIT locking views a lock as a resource that should not be invested unless and until its necessary, less it be wasted. With JIT locking, the Concourse transaction protocol works as follows:
- Each transaction makes changes in an isolated buffer. When changes are made, the transaction registers itself to be notified of any conflicting commits by other transactions in real-time. At this point, NO locking is done.
- If the transaction is notified about a conflicting commit, it fails immediately and the client is notified. This means that there is generally no locking cost associated with failed transactions.
- If the transaction is never notified of conflicts, it is allowed to commit, at which point it attempts to lock any needed resources. During this process, the transaction may still be notified of conflicting updates or fail to grab a lock it needs (because another transaction is in the process of committing and got to the lock first). In both cases, the transaction fails immediately and the caller is notified. Any locks that were grabbed are released immediately.
- If the transaction grabs all the necessary locks, it takes a backup (for crash recovery purposes) and immediately commits all of its data.
- After the transaction data is committed, the backup is deleted and all the locks are released.
JIT locking offers higher throughput and better performance than classic serializability. It also prevents deadlocks because no locking occurs until a transaction is absolutely sure it can commit without conflict.
In addition to only locking resources until absolutely necessary, Concourse is able to handle high concurrency because locks are incredibly granular. When reading, Concourse grabs a shared lock on only the key in the record you are reading (this is the equivalent of locking a single cell in a relational database table). Concourse only ever locks the entire record If the read touches all the data in the record (i.e. browse(record)).
Shared locks block writers but allow multiple concurrent readers. This means that multiple transactions that read the values for name in record 1 can commit at the same time, but no transaction that writes to name in record 1 can commit until all the readers are done. On the other hand, other transactions are free to write to other records or other keys in record 1 while the values are read from name in record 1.
When writing, Concourse grabs an exclusive lock on the key in the record to which you are writing (this is the equivalent of locking a single cell in a relational database table) . Exclusive locks block both readers and other writers. But, since these locks are granular, other transactions are free to commit reads from or writes to other records or other keys in the same record concurrently.
When performing a find query, Concourse grabs a range lock on the values and potential values that could be included in the query. Range locks are shared, so they allow concurrent readers within the range, but they block writers.
For example, consider the age key which has the following values in each of the specified records:
Record | Value
1 | 15
2 | 18
3 | 21
4 | 23
5 | 27
6 | 32
7 | 49
8 | 55
9 | 70
10 | 70
If you were to do find(“age”, Operator.GREATER_THAN, 17) Concourse would grab a range lock that prevents other transactions from committing writes with values that are greater than 17 to the age key (i.e. you wouldn’t be able to do add(“age”, 50, 100)) until the transaction was done committing. If you were to change the query to find(“age”, Operator.BETWEEN, 17, 34) then the range lock would only block writers trying to write to the age key if the values they were writing fell between 17 and 34. That means another transaction could simultaneously commit a write with a value of 50 to the key.