Smaller Aggregates Are Better · Ambitious Systems

I love Lego. I enjoy forming a structure first in my head - at least the outlines - before I start bringing the form to life with simple building blocks.

If you have ever built a Lego structure, you know that it is pretty tricky to think through an entire assembly, brick by brick. It is far easier to visualize distinct blocks of the structure and imagine how they fit into each other.

Aggregates are such building blocks in Domain-driven Design - they bring related elements of a Domain together to represent a whole concept. Properties become data attributes of the Aggregate, while its behavior models business rules and workflows.

To model the domain well, we need to ensure that Aggregates are of the right size - not too big, not too small. But the size of an Aggregate usually emerges over time from the domain model, creating a Chicken-and-Egg situation.

You need to start somewhere and iterate.

Smaller Aggregates are a good starting point.

You would think it would be easier to naturally bundle things together and fragment as you grow and learn. But that's a sure-shot way to end up with a Big Ball of Mud. By the time you get to refactoring the system, it will be a big, ugly monster waiting to eat you up.

Here are some ideas on why Smaller Aggregates are better in general.

Aggregates are Cohesive Wholes.

Aggregates represent conceptually complete domain concepts. They need to be loaded in entirety into memory to guarantee that all Business rules are valid and satisfied. As a consequence, the smaller the Aggregate, the faster it loads.

Consider an Account Aggregate with an embedded Transaction Entity. If the domain deals with Mortgages, the number of transactions under each account will be small - amounts released and installments paid. The design works fine - there is no noticeable performance cost to load an Account with all its transactions.

If the application is a Personal Banking Application and the Account we are dealing with is a Checking Account, the picture changes. Assuming the customer maintains the banking relationship for a long time, the number of transactions will keep growing. Loading all of them will mean creating hundreds of Transaction objects in memory every time the Account is loaded.

It then makes sense to manage Transactions as a separate Aggregate, linked to an Account through an ACCOUNT_ID. The Account Aggregate remains small, even with growing transactions over time.
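A minimal sketch of that split, assuming a plain Python model. The class names, the `deposit` method, and the in-memory `TransactionRepository` are hypothetical and only illustrate the idea of linking by identity rather than embedding:

```python
from dataclasses import dataclass, field
from decimal import Decimal
from uuid import uuid4


@dataclass
class Account:
    """Aggregate root: holds only what it needs to enforce its own rules."""
    account_id: str
    balance: Decimal = Decimal("0")

    def deposit(self, amount: Decimal) -> "Transaction":
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount
        # The Transaction is a separate Aggregate; it carries only the Account's id.
        return Transaction(account_id=self.account_id, amount=amount)


@dataclass(frozen=True)
class Transaction:
    """Separate Aggregate, linked to an Account by identity, not by object reference."""
    account_id: str
    amount: Decimal
    transaction_id: str = field(default_factory=lambda: uuid4().hex)


class TransactionRepository:
    """Hypothetical in-memory repository standing in for a real data store."""

    def __init__(self) -> None:
        self._rows: list[Transaction] = []

    def add(self, transaction: Transaction) -> None:
        self._rows.append(transaction)

    def for_account(self, account_id: str, limit: int = 10) -> list[Transaction]:
        # Return only the most recent `limit` entries, never the full history.
        matching = [t for t in self._rows if t.account_id == account_id]
        return matching[-limit:]
```

Loading an Account here never touches its transaction history. Code that needs the history asks the repository for a bounded page of it.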

Aggregates are Transaction Boundaries.

Aggregates persist data changes to all Entities enclosed within them atomically. We need to ensure that persistence mechanisms guarantee this behavior.

If the application uses a database that supports Atomic Transactions (like an RDBMS), the database locks the affected records for the duration of the transaction. The larger the Aggregate, the more records each transaction touches, and the higher the chances of lock contention and deadlocks when multiple processes update the same data records.

With databases that don't support transactional guarantees, Aggregate Versioning can help manage atomicity.

An Aggregate's version is always in sync between itself and all its entities. Any inconsistency simply means that the Aggregate data is not reliable.

With large Aggregates, records are more likely to be overwritten accidentally, because multiple updates can land in the small time interval between checking the version number and persisting updates to the database.

Either way, having fewer entities under an Aggregate translates to better guarantees of atomicity.

An example of a Warehouse Management System can help clarify further. If you were to make the Warehouse an Aggregate, then there would be simply too many conflicting entries updating the status of items in the warehouse. It would be much better to use a Rack or a Shelf as an Aggregate and keep the number of updates small, thereby reducing deadlocks.
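The version check described earlier can be sketched for a Shelf Aggregate as follows. This is an illustration under assumptions, not a production implementation: `ShelfStore` is a hypothetical in-memory store, and its lock stands in for whatever conditional-write mechanism the real database offers.

```python
import threading
from dataclasses import dataclass


class StaleVersionError(Exception):
    """Raised when another writer saved the Aggregate first."""


@dataclass
class Shelf:
    shelf_id: str
    item_count: int = 0
    version: int = 0


class ShelfStore:
    """Hypothetical in-memory store; the lock stands in for a conditional write."""

    def __init__(self) -> None:
        # shelf_id -> (item_count, version)
        self._rows: dict[str, tuple[int, int]] = {}
        self._lock = threading.Lock()

    def load(self, shelf_id: str) -> Shelf:
        item_count, version = self._rows.get(shelf_id, (0, 0))
        return Shelf(shelf_id, item_count, version)

    def save(self, shelf: Shelf) -> None:
        # Check and write happen under one lock, so no update can slip in between.
        with self._lock:
            _, stored_version = self._rows.get(shelf.shelf_id, (0, 0))
            if stored_version != shelf.version:
                raise StaleVersionError(shelf.shelf_id)
            shelf.version += 1
            self._rows[shelf.shelf_id] = (shelf.item_count, shelf.version)
```

Two writers that load the same Shelf cannot both succeed: the second `save` sees a newer stored version and fails instead of silently overwriting. With one Aggregate per Shelf rather than per Warehouse, such conflicts only occur between updates to the same shelf.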

Smaller Aggregates are easier to understand and reason about.

Aggregates are Modeled Processes. As with any Model, they can have different representations:

The one in Reality explains Business processes, domain users, and interaction with the external world.
The one in Mind is an understanding of the model expressed succinctly to capture on paper or verbally.
The one in Code is a programmatic translation of the model to code and data objects.
The one in Memory is an instance of code running in a computational environment with a copy of the codebase.

These models, working together, close the loop with reality.

While a developer needs to have the same model as the business user, there are bound to be some differences. And the model that is in the developer's mind is what eventually turns into code.

Think about what somebody would understand when they hear Account. Is that a Bank Account? Or a User Account that holds the profile information? Or is it just used for Authentication purposes? The worst possible outcome would be to group all these aspects under one umbrella (which could happen if one is building a Banking Application). It would be better to isolate and locate them in the right Bounded Context (Authentication, Profile Management, Banking) to keep the concepts plain and evident.

The easier an Aggregate is to understand and capture conceptually, the better it translates into code. Also, when developers understand the domain's data and behavior better, the application tends to have fewer bugs because of unexpected edge cases.

Aggregates guarantee data sanctity.

Aggregates form distinct boundaries within a Bounded Context. All Business Invariants must be satisfied at all times in the context, and the primary responsibility for this falls on Aggregates.

Aggregates can provide this guarantee by acting as gateways to data updates. They typically expose methods that mirror a user's intention to alter the system.

Smaller Aggregates tend to have fewer business invariants to validate, so they are easy to maintain and manage over a Product's long life cycle.

Shipping is a domain that can potentially have a lot of business constraints. A simple task of deciding whether a package can go into a specific container can have a wide range of rules like the kind of bag, nature of contents, other items already in the container, route of travel, the client sending the package, discounts, prices, and more.

There are simply too many invariants to satisfy.
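To make the gateway idea concrete, here is a small sketch of a Container Aggregate that guards just two rules. The rules themselves (a weight limit and at most one hazardous package) and all names are assumptions chosen for illustration, not taken from a real shipping domain.

```python
from dataclasses import dataclass, field


class InvariantViolation(Exception):
    """Raised when a change would break a business rule."""


@dataclass(frozen=True)
class Package:
    package_id: str
    weight_kg: float
    hazardous: bool = False


@dataclass
class Container:
    container_id: str
    max_weight_kg: float
    _packages: list[Package] = field(default_factory=list)

    @property
    def loaded_weight_kg(self) -> float:
        return sum(p.weight_kg for p in self._packages)

    def load_package(self, package: Package) -> None:
        """The only way to add a package; every rule is checked before state changes."""
        if self.loaded_weight_kg + package.weight_kg > self.max_weight_kg:
            raise InvariantViolation("container would exceed its weight limit")
        if package.hazardous and any(p.hazardous for p in self._packages):
            raise InvariantViolation("only one hazardous package per container")
        self._packages.append(package)
```

Every additional rule (route, client, pricing) would add another branch to `load_package`. That growth is the signal that some of those concerns may belong in Aggregates of their own.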

Aggregates are responsible for a single, distinct part of the domain.

In line with the Single Responsibility Principle, Aggregates should perform one unique task in the system, and one only. Domain Rules tend to get complicated when too many concerns combine.

A Large Aggregate is a good indicator that the Aggregate handles too many things or addresses too many concerns. The problem is even more devious because Aggregates tend to grow in size over time, as applications grow in complexity and become mature.

Order Management, a domain popularly used to explain DDD concepts, is an excellent example of how an Aggregate can grow out of control. Most implementations of this domain have an Order Aggregate at the heart of the system. That is reasonable, but unless you create other Aggregates to support it, Order may become responsible for - amongst many other things - accumulating orders, initiating payments, sending confirmation emails, and dispatching ordered items.
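One possible way to share that load is sketched below, with Payment and Shipment as supporting Aggregates that refer to the Order only by its id. The class names, statuses, and methods are hypothetical; a real system would also need something to coordinate the steps, which is left out here.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Order:
    """Accumulates line items and knows its total; nothing else."""
    order_id: str
    lines: dict[str, Decimal] = field(default_factory=dict)  # sku -> price
    placed: bool = False

    def add_item(self, sku: str, price: Decimal) -> None:
        if self.placed:
            raise ValueError("cannot change an order after it is placed")
        self.lines[sku] = price

    def place(self) -> None:
        if not self.lines:
            raise ValueError("cannot place an empty order")
        self.placed = True

    @property
    def total(self) -> Decimal:
        return sum(self.lines.values(), Decimal("0"))


@dataclass
class Payment:
    """Separate Aggregate: refers to the Order by id only."""
    order_id: str
    amount: Decimal
    status: str = "pending"

    def capture(self) -> None:
        if self.status != "pending":
            raise ValueError("payment already processed")
        self.status = "captured"


@dataclass
class Shipment:
    """Separate Aggregate: tracks dispatch for one Order."""
    order_id: str
    status: str = "awaiting_dispatch"

    def dispatch(self) -> None:
        if self.status != "awaiting_dispatch":
            raise ValueError("shipment already dispatched")
        self.status = "dispatched"
```

Each class protects only its own state transitions, so a change to payment rules does not require touching Order at all.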

A heuristic of sticking with smaller Aggregates helps uncover potential technical debt in the system. Acting on this debt through continuous refactoring keeps the application maintainable over the long term.

Aggregates need to be 100% testable.

Complex domains tend to have a ton of conflicting and mutually exclusive rules. Ensuring these rules are not broken or violated even as the application grows is a big challenge. It is almost impossible without some automated testing process running in the background. Usually bundled under the Continuous Integration process, these tests ensure that a change to the domain does not break past functionality.

Many DDD practitioners use Behavior-Driven Development (closely related to Specification by Example) methods to capture testable requirements that both Developers and Business Users can understand. Discrete test cases in the application represent use cases in the domain.

In such a well-tested system, large and complex Aggregates become liabilities. They are harder to test for behavior and data consistencies. Without keeping the complexity low, the test harness itself can grow in complexity and become increasingly unmaintainable.
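As a rough illustration of how a small Aggregate keeps tests close to use cases, the sketch below uses Python's standard `unittest` module rather than a dedicated BDD tool. The `Account` class and its withdrawal rules are hypothetical.

```python
import unittest
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Account:
    account_id: str
    balance: Decimal = Decimal("0")

    def withdraw(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("withdrawal amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class WithdrawalRules(unittest.TestCase):
    """Each test reads as one use case: given, when, then."""

    def test_withdrawal_within_balance_reduces_balance(self):
        account = Account("acc-1", Decimal("100"))        # given
        account.withdraw(Decimal("30"))                   # when
        self.assertEqual(account.balance, Decimal("70"))  # then

    def test_withdrawal_beyond_balance_is_rejected(self):
        account = Account("acc-1", Decimal("100"))
        with self.assertRaises(ValueError):
            account.withdraw(Decimal("150"))
        # A rejected withdrawal must leave the balance untouched.
        self.assertEqual(account.balance, Decimal("100"))
```

Because the Aggregate has a single entity and two rules, each test needs one line of setup. A larger Aggregate would need correspondingly larger fixtures before any rule could be exercised.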

So should you always keep Aggregates small?

Preferably, and whenever you can. But it is crucial to understand that Smaller Aggregates serve as a Heuristic for the system's overall health and are not an end goal in themselves. It is more important to model as closely as possible to the domain in reality, than to break Aggregates into smaller chunks artificially.

If a large Aggregate serves your purpose well, requires less maintenance, and protects data sanctity while not facing too many updates, it is perfectly fine to stick with it. It is better to grow into this situation than to start with it on day one.

Start with one single Entity in an Aggregate.

Such an entity would also double up as the Aggregate Root. This keeps the Aggregate simple to understand, small in memory, and always valid. Did I mention that it also keeps your domain sane and straightforward?

It is also easier to start small and grow over time than to start big and become granular over time, just like building Lego structures.

Fixing a problem in a giant Lego structure is a nightmare.

If you misplace a single brick in a giant Lego model, going back and figuring out what you missed can be a time suck. A fix is possible only by viewing the Lego structure as distinct pieces. You can then go about debugging each segment to identify the problem.

Keeping Aggregates small helps you do the same in your domain.