State

If you look up "state" on dictionary.com, the first two definitions are:

the condition of a person or thing, as with respect to circumstances or attributes:
a state of health.
The condition of matter with respect to structure, form, constitution, phase, or the like:
water in a gaseous state.

Software is all about data, and that data is the state. State could be your basic data, such as your first name, last name, address, phone, email, bank account ID, or any other IDs and information you might have. Such data needs to be captured somewhere, usually in a relational database, but it could live in any other data store: a NoSQL database like Cassandra, a K/V store like Redis, or an object store like S3. Ideally, state should be contained and even isolated. However, in distributed systems, we often need distributed state. Buying software adds to this: it has many side effects, many of which are negative. State is necessary, but if we don't deal with it properly, we create additional complexity.

Monolith

When working with monoliths, we often mistakenly believe that state is not a problem because all of the state lives in the same shared monolithic database. Having all of the state in a single database does, in fact, make joins and queries "easier" to some degree.

However, although we can scale monoliths, they are not easy to scale, because in reality they are full of technical debt. We are coupling concerns that we should not be coupling in the first place. That is why it feels "easy," but in reality the monolith hides several problems.

Modular Monolith

State trade-offs also occur inside a monolith. Let's imagine the monolith is modular and has internal boundaries.

In a proper modular monolith, there is a separation of concerns. Imagine the UI layer has multiple SPAs, let's say S1, S2, and S3, which are decoupled from each other. The backend has modules deployed as a single deployment unit, but there is a separation of concerns, and the modules are called M1, M2, and M3. Finally, even though we use a single shared database cluster, we use schemas, and each module only accesses its own schema, being S1, S2, and S3.
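A minimal sketch of the "each module only accesses its own schema" rule, assuming SQLite from the Python standard library as a stand-in for the shared database cluster. The schema names (m1, m2, m3), the records table, and the Module class are hypothetical and only illustrate the boundary:

```python
import sqlite3


def open_shared_database(schemas):
    """Open one shared database and attach one schema per module."""
    conn = sqlite3.connect(":memory:")
    for schema in schemas:
        # Schema names are fixed by this sketch, never user input.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
        conn.execute(
            f"CREATE TABLE {schema}.records "
            "(item_key TEXT PRIMARY KEY, item_value TEXT)"
        )
    return conn


class Module:
    """A backend module that reads and writes only the schema it owns."""

    def __init__(self, conn, schema):
        self._conn = conn
        self._schema = schema

    def put(self, key, value):
        self._conn.execute(
            f"INSERT OR REPLACE INTO {self._schema}.records "
            "(item_key, item_value) VALUES (?, ?)",
            (key, value),
        )

    def get(self, key):
        row = self._conn.execute(
            f"SELECT item_value FROM {self._schema}.records WHERE item_key = ?",
            (key,),
        ).fetchone()
        return row[0] if row else None


conn = open_shared_database(["m1", "m2", "m3"])
m1, m2 = Module(conn, "m1"), Module(conn, "m2")
m1.put("user:1:email", "ana@example.com")
print(m1.get("user:1:email"))  # M1 sees its own state
print(m2.get("user:1:email"))  # M2 has no copy unless we decide to share it
```

The single connection is the shared cluster; the boundary is only a convention enforced by the Module class, which is exactly why a modular monolith needs discipline to keep it.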

Here, we must decide how much "state" (data) is shared across modules. If this state changes, some sort of "sync" must occur. The same trade-off occurs with proper SOA services and even microservices.

Trade-offs

We basically have two dimensions of trade-offs here with regard to state. The first one is how we deal with the state itself: it will be either mutable or immutable.

Usually, state is mutable (talking about databases), but we can make state immutable, as in a ledger. No matter whether the state is mutable or immutable, when it changes, we need to deal with the change. That leads us to the second spectrum of trade-offs:
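To make the mutable vs. immutable distinction concrete, here is a small sketch. The account name, amounts, and the Ledger and LedgerEntry classes are assumptions made up for illustration:

```python
from dataclasses import dataclass

# Mutable state: the previous value is overwritten and lost.
mutable_balance = {"account-1": 100}
mutable_balance["account-1"] = 70


@dataclass(frozen=True)
class LedgerEntry:
    """One immutable fact; frozen=True blocks attribute changes."""

    account: str
    amount: int  # positive is a credit, negative is a debit


class Ledger:
    """Append-only state: entries are never updated or deleted."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)

    def balance(self, account):
        # Current state is derived by folding over the history.
        return sum(e.amount for e in self._entries if e.account == account)

    def history(self, account):
        return tuple(e for e in self._entries if e.account == account)


ledger = Ledger()
ledger.append(LedgerEntry("account-1", 100))
ledger.append(LedgerEntry("account-1", -30))
print(mutable_balance["account-1"], ledger.balance("account-1"))
print(ledger.history("account-1"))
```

Both approaches end at the same balance, but only the ledger can still explain how it got there.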

State changes could be dealt with in a centralized fashion (reconciliation processes are one example) or dealt with in a distributed fashion like we see in Event Sourcing (ES).

Reconciliation

Reconciliation is the old-school, traditional way of handling state changes. There are two ways we can do reconciliation; the first one is centralized via databases.

Consider that we have four core systems with state spread across them. One example is basic user profile data, like name, email, and phone. Because we have distributed systems and, therefore, four different databases, a name change in one system must be "synced" across all four. Imagine the user calling the call center and changing their email or name in the payroll system. This is enough to create all sorts of issues.

When we do direct database calls like this (reconciliation), we are creating a distributed monolith by definition, and such an architecture is an anti-pattern and highly fragile. It can break very quickly: a single change in any of the four databases is enough to cause an issue.

Of course, this kind of reconciliation can be 10 times worse if it is done by every single system: we get not only a bigger distributed monolith but also a much more fragile one. We can make reconciliation a bit better by using services and introducing a centralized reconciliation service.

Now, we do not have a distributed monolith, and we rely solely on APIs. We will need to expose proper APIs on each of the four systems. Someone might say: hold on, what if some of these systems are proprietary, and we cannot change them because we don't have the source code? In that case, we could create a service that exposes the functionality we want. If it only reads, that's okay, but if it writes, then we are in more trouble because we risk creating a distributed monolith again. Such an approach is still centralized; now, let's see how we can fix this problem with a more distributed approach.
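A rough sketch of what a centralized, API-based reconciliation could look like. Everything here is an assumption for illustration: the SystemApi class stands in for whatever API each system exposes, the system names other than payroll are invented, and "highest version wins" is just one possible conflict rule:

```python
class SystemApi:
    """Stand-in for the API that one core system would expose (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._profiles = {}  # user_id -> (version, profile dict)

    def get_profile(self, user_id):
        return self._profiles.get(user_id, (0, {}))

    def put_profile(self, user_id, version, profile):
        self._profiles[user_id] = (version, dict(profile))


def reconcile(user_id, systems):
    """Copy the highest-versioned profile to every system that lags behind.

    Returns the names of the systems that had to be updated.
    """
    latest_version, latest_profile = max(
        (s.get_profile(user_id) for s in systems), key=lambda vp: vp[0]
    )
    updated = []
    for system in systems:
        version, _ = system.get_profile(user_id)
        if version < latest_version:
            system.put_profile(user_id, latest_version, latest_profile)
            updated.append(system.name)
    return updated


systems = [SystemApi(n) for n in ("crm", "billing", "payroll", "support")]
for system in systems:
    system.put_profile("u1", 1, {"email": "old@example.com"})

# The user calls the call center and the email changes on payroll only.
systems[2].put_profile("u1", 2, {"email": "new@example.com"})

updated = reconcile("u1", systems)
print(updated)
```

The reconciliation service only talks to APIs and never to the databases, but it is still one central place that must know every system.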

Event Sourcing

We can deal with state changes in the form of events; events are immutable, and we can deal with them in a distributed fashion. Such a solution requires all systems to listen for events in a distributed log like Kafka or an event bus like RabbitMQ.

The cool thing about Event Sourcing is that we have an immutable log of everything that happened, so we get auditing for free. We might need to store events in a permanent store like S3, because distributed logs like Kafka or similar solutions like Kinesis are often configured with low retention (a few days).

You should see Event Sourcing as a modern form of reconciliation, done in a distributed fashion. Event Sourcing is, by nature, asynchronous and will result in eventual consistency, which is fine if you know how to do proper UX with Event Sourcing (ES). IMHO, this is one of the best ways to solve the problem. Again, if you have a proprietary system, you can write some connectors that expose its data in Kafka.
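The idea can be sketched with a tiny in-memory log. This is not Kafka or RabbitMQ; the EventLog, Consumer, and EmailChanged names are hypothetical, and each consumer tracking its own offset is only a simplified model of how asynchronous consumers catch up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailChanged:
    """An immutable event describing one state change."""

    user_id: str
    new_email: str


class EventLog:
    """In-memory stand-in for a distributed log; append-only."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]  # slicing returns a copy


class Consumer:
    """A system that builds its own view of the state from the events."""

    def __init__(self, name):
        self.name = name
        self.offset = 0
        self.emails = {}

    def poll(self, log):
        for event in log.read_from(self.offset):
            self.emails[event.user_id] = event.new_email
            self.offset += 1


log = EventLog()
billing, payroll = Consumer("billing"), Consumer("payroll")
log.append(EmailChanged("u1", "a@example.com"))
log.append(EmailChanged("u1", "b@example.com"))

billing.poll(log)  # billing has caught up, payroll has not yet
stale = payroll.emails.get("u1")
payroll.poll(log)  # now payroll converges to the same state
print(stale, billing.emails, payroll.emails)
```

Between the two polls, payroll is behind billing; that window is the eventual consistency the UX has to account for.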

No Database

Another distributed solution, which, IMHO, is not so well explored, is to just perform HTTP calls. If you do not store other services' state in your database and always call a service when you need something, you are in fact "free" from synchronization issues, because there is nothing to be synced.

In this diagram, I'm showing every service calling every other service, just as one example. You can store state in your database as long as you own that state. However, if you are not the owner, you should always make a call.

Such an approach is standard and works well in telecom with Erlang OTP and Akka (or Apache Pekko, its open-source fork), because they can build cheap and effective modular monoliths. However, with fully distributed systems, HTTP calls are a bit more expensive (not super expensive), and the cost can be mitigated with caching. Of course, deciding when and how to invalidate the cache creates state sync issues again.
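Here is a small sketch of "always call the owner, optionally with a cache". The ProfileService and CachingClient classes are hypothetical, the remote HTTP call is replaced by a plain method call, and the 30-second TTL is an arbitrary assumption. The clock is injected so the example is deterministic:

```python
import time


class ProfileService:
    """The owner of the profile state (stand-in for a remote HTTP service)."""

    def __init__(self):
        self._emails = {"u1": "a@example.com"}
        self.calls = 0

    def get_email(self, user_id):
        self.calls += 1
        return self._emails[user_id]

    def set_email(self, user_id, email):
        self._emails[user_id] = email


class CachingClient:
    """Caller that stores no state of its own, only a short-lived cache."""

    def __init__(self, service, ttl_seconds, clock=time.monotonic):
        self._service = service
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # user_id -> (expires_at, value)

    def get_email(self, user_id):
        now = self._clock()
        hit = self._cache.get(user_id)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._service.get_email(user_id)
        self._cache[user_id] = (now + self._ttl, value)
        return value


fake_now = [0.0]  # mutable cell so the lambda below sees updates
owner = ProfileService()
client = CachingClient(owner, ttl_seconds=30, clock=lambda: fake_now[0])

first = client.get_email("u1")
owner.set_email("u1", "b@example.com")
cached = client.get_email("u1")  # still the old value: the cache is stale
fake_now[0] = 31.0
fresh = client.get_email("u1")  # TTL expired, so we call the owner again
print(first, cached, fresh, owner.calls)
```

The stale read in the middle is the sync problem coming back through the cache: a shorter TTL reduces it and costs more calls.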

Services

Finally, we also have another way to solve this problem, which is by using services. Sometimes, we have problems because our boundaries are not appropriately split. That problem can be solved by doing simple things like:

Merging Services: Combining two or more services into one.
New Services: Creating services that don't exist yet but are needed.
Enhancing Existing Services: Adding concepts and functionality to services we already have.

Using services, we could fix this problem. Remember I mentioned that we might need to change user data? Well, we can centralize that, and by adding new services, we can reduce the number of cross-service calls.

Here, two services were introduced. One is the "Product Catalog / Entitlement Service", which knows all the existing products we can offer in our pseudo-architecture / use case. The entitlement side tells us what service level the user has, alongside feature flags. Finally, the core user data lives in the Entitlement Service, so state like name, email, and phone is there, centralized in one place.

The Enrollment Service works with the Product Catalog service, since different products have different data requirements and different enrollment processes. The good thing about centralized enrollment is that we can reuse processes between different products. Now that we have these two new services, we can reduce how much services need to know about each other and even remove the need for storing internal IDs: we can store one unique ID coming from the Product Catalog / Entitlement Service.

These services are at the center of the universe now; they will be called by all other services, so they need to be very well tuned and have a decent SLO. But if done right, this can help us mitigate some of the state issues and trade-offs.
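A sketch of the "one unique ID" idea, under stated assumptions: the EntitlementService below owns only the core user data, and BillingService is a hypothetical consumer invented for this example. The consumer keeps the shared ID plus the state it owns, and asks the owner for everything else:

```python
import uuid


class EntitlementService:
    """Single owner of core user data (name, email, phone)."""

    def __init__(self):
        self._users = {}

    def register(self, name, email, phone):
        user_id = str(uuid.uuid4())  # the one unique ID other services keep
        self._users[user_id] = {"name": name, "email": email, "phone": phone}
        return user_id

    def update_email(self, user_id, email):
        self._users[user_id]["email"] = email

    def get_user(self, user_id):
        return dict(self._users[user_id])  # copy, so callers cannot mutate it


class BillingService:
    """Hypothetical consumer: stores the shared ID and only its own state."""

    def __init__(self, entitlements):
        self._entitlements = entitlements
        self._invoices = {}  # user_id -> list of invoice amounts

    def add_invoice(self, user_id, amount):
        self._invoices.setdefault(user_id, []).append(amount)

    def invoice_header(self, user_id):
        user = self._entitlements.get_user(user_id)
        total = sum(self._invoices.get(user_id, []))
        return f"{user['name']} <{user['email']}>: {total}"


entitlements = EntitlementService()
billing = BillingService(entitlements)
uid = entitlements.register("Ana", "ana@example.com", "555-0100")
billing.add_invoice(uid, 40)
billing.add_invoice(uid, 2)
entitlements.update_email(uid, "ana@new.example.com")
header = billing.invoice_header(uid)
print(header)
```

The email changed in one place and billing picked it up without any sync, at the cost of depending on the central service being available.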

How to make it better?

The non-obvious thing is that when we buy software, we create a lot of problems. Buying is not free. Proprietary closed-source systems are tough to deal with; they are hard to integrate, troubleshoot, upgrade, and maintain. When doing a build vs. buy analysis, always consider vendors that have APIs and proper ways to expose data and functionality. Besides that, we can also:

Configure systems to not mutate some data (disable some functionality) via configuration, so the user cannot change their email.
Prevent people (often operations) from mutating data via process. The functionality is there, but the Ops team agrees not to use it, so nobody clicks in the wrong place.
Consider designing for immutability: don't allow the user to change their name or email in the app. Make it a support call or an admin interface.
Know the difference between read and write. Reads are less bad than writes: when you write, you are mutating data, so it is better to write in a single place. Parallel reads are often not a problem; they are how we scale databases and Big Data.
Always leverage SOA and service thinking: consider adding new services, merging, or enhancing existing services.
Be explicit on how state will be managed across all services (Centralized vs Distributed).
Reduce the distributed state as much as possible. The less you need to know about other services, the better; otherwise, you might have some leaking contracts.
Consider also using Aggregator Services. Ensuring nobody calls underlying services is the first step to creating a Facade or Strangler that can be changed later (Refactoring).
Event sourcing is a modern and elegant way to propagate state changes and allow each system to manage them.
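As one illustration of the aggregator item in the list above, here is a minimal sketch. The two backend classes and their hard-coded data are hypothetical; the point is only that callers depend on the aggregator's contract and never on the underlying services:

```python
class ProfileBackend:
    """Underlying service that callers should not reach directly."""

    def get_name(self, user_id):
        return {"u1": "Ana"}[user_id]


class EntitlementBackend:
    """Another underlying service hidden behind the aggregator."""

    def get_level(self, user_id):
        return {"u1": "gold"}[user_id]


class AccountAggregator:
    """Facade: the only contract that other services get to see."""

    def __init__(self, profiles, entitlements):
        self._profiles = profiles
        self._entitlements = entitlements

    def get_account(self, user_id):
        # Backends can be merged, split, or replaced later without
        # changing this return shape.
        return {
            "user_id": user_id,
            "name": self._profiles.get_name(user_id),
            "service_level": self._entitlements.get_level(user_id),
        }


aggregator = AccountAggregator(ProfileBackend(), EntitlementBackend())
account = aggregator.get_account("u1")
print(account)
```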

Principles of Software Architecture Modernization is full of examples, principles, and techniques on how to deal with monoliths and distributed monoliths at Scale. Continuous Modernization covers the mindset, practices, and shift to better work data with teams dealing with such systems.

Cheers,

Diego Pacheco

Diego Pacheco Tech blog
