Last Updated on Feb 26, 2021

Data persistence can seem like a straightforward process, but it has many pitfalls. A system has to deal with a wide variety of faults and ensure they don't cause catastrophic failures. But implementing Fault Tolerance is a lot of work: most of the time goes into imagining the different kinds of failures and devising ways to test them.

Databases have traditionally relied on the concept of transactions to simplify these issues.

A transaction is a way for an application to group several reads and writes into one logical unit. Conceptually, all the reads and writes in a transaction are executed as a single operation: either the entire transaction succeeds (commits), or it fails (aborts) and none of its changes take effect.
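To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory SQLite database. The `transaction` helper, the `stock` and `orders` tables, and `place_order` are hypothetical names chosen for illustration. The helper issues `BEGIN`, then `COMMIT` if the block finishes normally or `ROLLBACK` if it raises, so the read and the two writes inside `place_order` take effect together or not at all.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Group the enclosed statements into one unit: COMMIT on success, ROLLBACK on error."""
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")  # abort: discard every write made since BEGIN
        raise
    else:
        conn.execute("COMMIT")  # commit: make all writes since BEGIN visible together

# isolation_level=None turns off the sqlite3 module's implicit transactions,
# so the BEGIN/COMMIT/ROLLBACK above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('widget', 10)")

def place_order(conn, item):
    """Hypothetical example: one read and two writes executed as a single transaction."""
    with transaction(conn):
        (qty,) = conn.execute("SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()
        if qty <= 0:
            raise ValueError(f"{item} is out of stock")
        conn.execute("UPDATE stock SET qty = qty - 1 WHERE item = ?", (item,))
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
```

Calling `place_order(conn, "widget")` decrements the stock and records the order in one commit. With the stock at zero it raises `ValueError`, the transaction is rolled back, and no order row is written.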

Transactions simplify the programming model for applications. Databases take on the responsibility of preventing Dirty Reads and Dirty Writes, which rules out many common concurrency problems, while still allowing long-running transactions to read from consistent snapshots.
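As an illustration of dirty-write prevention, the sketch below uses SQLite through Python's `sqlite3` module. SQLite takes a coarse approach: only one connection may hold the write lock on a database file at a time, so a second writer cannot overwrite data that another transaction has modified but not yet committed. Many client-server databases lock individual rows instead; the function name, table, and file path here are hypothetical.

```python
import os
import sqlite3
import tempfile

def try_concurrent_writes(path):
    """Hypothetical demo: can writer B change a row that writer A has modified but not committed?"""
    a = sqlite3.connect(path, isolation_level=None)
    b = sqlite3.connect(path, isolation_level=None, timeout=0.1)  # give up after 0.1 s
    try:
        a.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, buyer TEXT)")
        a.execute("INSERT INTO listing VALUES (1, NULL)")

        a.execute("BEGIN IMMEDIATE")  # A acquires SQLite's database-wide write lock
        a.execute("UPDATE listing SET buyer = 'alice' WHERE id = 1")  # not committed yet

        try:
            b.execute("BEGIN IMMEDIATE")  # B asks for the same write lock
            b_blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            b_blocked = True

        a.execute("COMMIT")  # A's write is committed and the lock released

        # B can now write, but only on top of A's committed value.
        b.execute("BEGIN IMMEDIATE")
        b.execute("UPDATE listing SET buyer = 'bob' WHERE id = 1")
        b.execute("COMMIT")

        (buyer,) = a.execute("SELECT buyer FROM listing WHERE id = 1").fetchone()
        return b_blocked, buyer
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_concurrent_writes(os.path.join(tmp, "demo.db")))
```

B's first attempt fails while A's change is uncommitted, and its later write lands only after A has committed, so B never overwrites A's in-progress data.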

Error handling also becomes much simpler, because the application does not have to worry about partial failure (where some operations succeed and others fail, for whatever reason). If a transaction aborts, none of its writes persist, so it can simply be retried, though automatic retries have pitfalls of their own.
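Because an aborted transaction leaves no partial writes behind, a retry can simply run the whole unit again. A minimal sketch, assuming SQLite and treating only `sqlite3.OperationalError` (raised, for example, when the database is locked) as retryable; `run_with_retry` and its parameters are hypothetical.

```python
import sqlite3
import time

def run_with_retry(conn, work, attempts=3, base_delay=0.05):
    """Run work(conn) as one transaction, retrying it on transient lock errors.

    Hypothetical helper: only sqlite3.OperationalError (e.g. "database is locked")
    is treated as retryable; any other exception propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            with conn:  # commit if work() returns, roll back if it raises
                return work(conn)
        except sqlite3.OperationalError:
            # Make sure no half-finished transaction survives into the next attempt,
            # even if the failure happened during COMMIT itself.
            if conn.in_transaction:
                conn.rollback()
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

For example, a `work` function that inserts a row and then fails twice with a lock error before succeeding leaves exactly one row behind, because the first two attempts are rolled back. One classic pitfall: with a client-server database, a commit can succeed while the acknowledgment is lost, and blindly retrying would then apply the work twice.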

Databases provide data safety guarantees through a set of properties called ACID.

ACID stands for Atomicity, Consistency, Isolation, and Durability. Any sequence of database operations that satisfies the ACID properties can be called a transaction.

Databases that provide these guarantees are said to be ACID-compliant. In practice, ACID implementations differ subtly between databases, so a claim of ACID compliance does not by itself tell you which guarantees are actually being met. It is worth digging into each property to avoid ambiguity.

Atomicity is the guarantee against partial writes.

A database that supports the Atomicity property guarantees that when multiple objects are updated, either all of them will be persisted to the database or none of them will. Without atomicity, if an error were to occur halfway through a transaction, the system would be left in an intermediate state.
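The sketch below (SQLite via Python's `sqlite3`; the `accounts` table and `transfer` function are hypothetical) credits the recipient first and then debits the sender. When the debit violates a `CHECK (balance >= 0)` constraint, the connection's context manager rolls back the whole transaction, so the earlier credit is undone rather than left behind as a partial write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 50), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two hypothetical accounts as a single transaction."""
    with conn:  # commits if both UPDATEs succeed, rolls back if either raises
        # Credit first, so a failing debit would otherwise leave a partial write.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # Raises sqlite3.IntegrityError if src would go below zero (CHECK constraint).
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

`transfer(conn, "alice", "bob", 80)` raises `sqlite3.IntegrityError` and both balances keep their original values. If each UPDATE were committed on its own instead, Bob's credit would have persisted.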

Consistency is the idea that certain statements about application data should always remain true.

Such facts are usually called the application's invariants, and they are part of the application's business logic. Invariants define what it means for application data to be in a valid state. If the data starts out valid and every write preserves that validity, the invariants are guaranteed to remain satisfied.
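As a hypothetical example, take a ledger whose invariants are that no balance is negative and that writes never change the total amount of money. The pure-Python sketch below checks both invariants before accepting a write; the names and the invariants themselves are assumptions for illustration, not taken from any particular system.

```python
def total(accounts):
    return sum(accounts.values())

def invariants_hold(accounts, expected_total):
    """Hypothetical ledger invariants: no negative balances, total unchanged."""
    no_overdraft = all(balance >= 0 for balance in accounts.values())
    return no_overdraft and total(accounts) == expected_total

def apply_write(accounts, deltas):
    """Apply balance changes to a copy of the state; reject writes that break an invariant."""
    new_state = dict(accounts)
    for name, delta in deltas.items():
        new_state[name] = new_state.get(name, 0) + delta
    if not invariants_hold(new_state, expected_total=total(accounts)):
        raise ValueError(f"write {deltas} would violate an invariant")
    return new_state
```

Starting from `{"alice": 50, "bob": 0}`, the balanced transfer `{"alice": -30, "bob": 30}` is accepted, while a write that only credits Bob, or one that overdraws Alice, raises `ValueError` and leaves the original state untouched.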

But in reality, Consistency is the responsibility of the application, not the database. Apart from certain kinds of invariants, such as foreign key or uniqueness constraints, that the database can enforce, what counts as valid data is ultimately defined by the application.
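Uniqueness and foreign key constraints are the kind of invariant a database can enforce on its own. A sketch using SQLite through Python's `sqlite3` (the table names and the `is_rejected` helper are hypothetical); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id))"
)
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.commit()

def is_rejected(conn, sql, params=()):
    """Return True if the database itself refuses the write (constraint violation)."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True
```

A duplicate email or an order for a nonexistent user is rejected by the database. A business rule such as "suspended users may not place orders" is not expressed by these constraints, so in this sketch it would be up to the application to check it.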

Isolation defines data visibility rules under conditions of concurrency.

Isolation is a guarantee against partial reads. It guarantees that concurrent transactions don't accidentally interfere with each other. Without it, a transaction could read another transaction's uncommitted, in-progress changes, leading to incorrect decisions and data corruption.
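To see dirty-read prevention in action, the sketch below opens two connections to the same SQLite database file through Python's `sqlite3` (file, table, and function names are hypothetical). The writer updates a row inside an open transaction, and the reader, querying before the commit, still sees the last committed value.

```python
import os
import sqlite3
import tempfile

def read_balance(conn):
    cur = conn.execute("SELECT balance FROM accounts WHERE id = 1")
    (balance,) = cur.fetchone()
    cur.close()  # finish the query so the reader holds no lock afterwards
    return balance

def dirty_read_demo(path):
    """Hypothetical demo: what does a reader see while another transaction's update is uncommitted?"""
    writer = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT below
    reader = sqlite3.connect(path)  # reader only runs single SELECTs
    try:
        writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
        writer.execute("INSERT INTO accounts VALUES (1, 100)")

        writer.execute("BEGIN")
        writer.execute("UPDATE accounts SET balance = 50 WHERE id = 1")
        during = read_balance(reader)  # writer has not committed yet
        writer.execute("COMMIT")
        after = read_balance(reader)
        return during, after
    finally:
        writer.close()
        reader.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(dirty_read_demo(os.path.join(tmp, "demo.db")))
```

The reader sees 100 while the update is still uncommitted and 50 only after the writer commits.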

Isolation keeps concurrently executing transactions apart from each other. In its strongest form, each transaction runs as if it were the only transaction running on the entire database.

© 2022 Ambitious Systems. All Rights Reserved.