Large applications benefit from being schemaless

Data in relational databases always conforms to one schema for both reads and writes: every schema change must be applied to the database, and the existing data migrated, before the corresponding application changes are introduced.

But there is a class of databases that do not enforce a schema when data is written. Collectively called schemaless, document and graph databases store data without associating it with a schema.

In truth, schemaless is a misnomer.

Even when the database imposes no schema, there is always a schema implicitly assumed by the application when it reads data. This schema-on-read may be different for different use cases but is specified by the application during the read operation.
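To make schema-on-read concrete, here is a minimal sketch in Python. The documents, field names, and the two reader functions are hypothetical and stand in for whatever a schemaless store might return; the point is only that each use case applies its own schema at read time.

```python
import json

# Hypothetical raw documents, stored without any schema enforced on write.
RAW_DOCS = [
    '{"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"}',
    '{"id": 2, "name": "Alan Turing", "signup": "2021-03-04", "tags": ["vip"]}',
]


def read_contact(raw):
    """Schema assumed by a (hypothetical) mailing use case: id, name, optional email."""
    doc = json.loads(raw)
    return {"id": doc["id"], "name": doc["name"], "email": doc.get("email")}


def read_signup(raw):
    """Schema assumed by a (hypothetical) reporting use case: id and optional signup date."""
    doc = json.loads(raw)
    return {"id": doc["id"], "signup": doc.get("signup")}


# The same stored data, read through two different schemas.
contacts = [read_contact(raw) for raw in RAW_DOCS]
signups = [read_signup(raw) for raw in RAW_DOCS]
```

Neither reader fails on fields it does not know about, and each one decides for itself which fields are required and which are optional.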

This nature of schemaless databases fits well with cases where enforcing one single schema is challenging. For example, when ingesting external data or deriving data from a document, we have no real control over the data format. It is preferable to be free of schema concerns when persisting such data.

Large applications benefit in multiple ways from this schemaless nature.

While it is true that applications have to make up for the shortcomings of the data model, the benefits of going schemaless can outweigh the extra development effort in large applications. A few examples:

Systems that cannot afford downtime, either because they are mission-critical or because their users are spread across the globe with no obvious maintenance window, can work with old and new data formats at the same time.
Teams can be agile and deploy multiple times a day, without the headache of migrating the database repeatedly.
Upgrade cycles of client-side applications are at the mercy of the user, so multiple versions will have to work with the same database simultaneously.
Applications can perform a rolling upgrade, where the new version is deployed to a few nodes at a time and checked for performance and stability. If all is well, the version is gradually deployed to all the nodes. This staged rollout further encourages more frequent releases without service downtime.

But there is a common denominator to all these situations.

Code enhancements and data schema changes do not happen simultaneously.

There are many strategies for deploying code with minimal downtime, but data upgrades need special attention. A schemaless database can hold a mixture of data written in old and new formats.

It is a fair assumption that every combination of code version and data format will need to be supported.

Newer code should be able to read data written by older code. This backward compatibility is often easy to achieve because the authors of the newer code already know the existing data structures and can handle them explicitly.

On the other hand, older code should be able to read data written by newer code. This forward compatibility can be trickier because older code will need to ignore changes introduced by the more recent code.
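The two directions of compatibility can be sketched as follows. This is an illustrative Python example under assumed names: the `locale` field, its default value, and the `v1`/`v2` functions are all hypothetical. The older code ignores fields it does not recognize and copies them through on update, while the newer code supplies a default for fields that older documents lack.

```python
def read_v1(doc):
    """Older reader: knows only 'name' and ignores every other field."""
    return {"name": doc["name"]}


def rename_v1(doc, new_name):
    """Older writer: changes one field but copies the rest through untouched,
    so fields added by newer code are not lost on update."""
    updated = dict(doc)
    updated["name"] = new_name
    return updated


def read_v2(doc):
    """Newer reader: 'locale' was added later (hypothetical field), so fall
    back to a default when the document was written by older code."""
    return {"name": doc["name"], "locale": doc.get("locale", "en")}


old_doc = {"name": "Ada"}                    # written by older code
new_doc = {"name": "Grace", "locale": "fr"}  # written by newer code

backward = read_v2(old_doc)           # newer code reading older data
forward = read_v1(new_doc)            # older code reading newer data
preserved = rename_v1(new_doc, "G.")  # older code updating newer data
```

The update path is usually where forward compatibility goes wrong: an older writer that rebuilds the document from only the fields it knows would silently drop `locale`.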

It is, of course, possible to use one single schema and achieve all of the above. But doing so leads to suboptimal performance, more complex application code, and harder deployments. For example, tables can become sparsely populated over time and then need some kind of optimization, such as sparse columns where the database supports them.

All in all, being schemaless gives advantages to large applications far beyond the database.
