One strategy for doing this is event-based data synchronisation: X broadcasts a message to a queue (or topic) describing a change to its data, and Y picks the message up.
The onus is therefore on Y to interpret this information and store a copy of it in its own database to make use of later on. This approach seems to have several serious flaws:
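Before looking at the flaws, the basic mechanism can be sketched as follows. The in-memory queue, event shape and handler names here are all hypothetical stand-ins for a real broker and real services:

```python
import json
from collections import deque

# Hypothetical in-memory queue standing in for a real broker (RabbitMQ, Kafka, etc.).
queue = deque()

# X's side: publish an event describing a change in X's domain.
def publish_vehicle_created(vehicle_id, registration):
    queue.append(json.dumps({
        "event": "VehicleCreated",
        "vehicle_id": vehicle_id,
        "registration": registration,
    }))

# Y's side: interpret the event and store a local copy for later use.
local_vehicle_copy = {}

def consume_one():
    message = json.loads(queue.popleft())
    if message["event"] == "VehicleCreated":
        local_vehicle_copy[message["vehicle_id"]] = {
            "registration": message["registration"],
        }

publish_vehicle_created(123, "AB12 CDE")
consume_one()
```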
1. Complexity of piecing data back together
In simple cases, like taking a copy of country codes, piecing the data back together may not be difficult. On the other hand, if Y needs complex information from X to perform its function or enforce its security policies, things can escalate very quickly. Data like effective dates, join relationships between entities, and hierarchies can be very complex, and piecing this information back together perfectly is no easy task.
2. Out of order events
Given that scalability is one of the key aims of microservices, it is very likely that multiple message receivers will be called into action to deal with messages put onto a queue. This introduces a race condition and the possibility of messages arriving out of order, for example:
* Message 126: Vehicle 123 assigned to Order xyz
* Message 120: Vehicle 123 Created
Here the message for the creation of vehicle 123 is not received until after the message to take some action with that vehicle, i.e. processing is attempted before the vehicle even exists. There are various strategies to deal with this, none of them trivial or neat. One potential idea is to simply allow message 126's processing to fail and return it to the queue until message 120 has completed, though this is dubious and depends on the specifics of the queue infrastructure.
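The fail-and-requeue idea can be sketched as below. Everything here is an in-memory stand-in, a real broker's redelivery semantics may behave quite differently, and note that a message whose prerequisite never arrives would loop forever in this naive form:

```python
from collections import deque

vehicles = {}   # Y's local copy of vehicle data
orders = {}     # Y's local assignment records

# Messages delivered out of order: the assignment arrives before the creation.
queue = deque([
    {"seq": 126, "event": "VehicleAssigned", "vehicle_id": 123, "order_id": "xyz"},
    {"seq": 120, "event": "VehicleCreated", "vehicle_id": 123},
])

def handle(message):
    if message["event"] == "VehicleCreated":
        vehicles[message["vehicle_id"]] = {}
    elif message["event"] == "VehicleAssigned":
        if message["vehicle_id"] not in vehicles:
            raise KeyError("vehicle not yet known")  # prerequisite missing
        orders[message["order_id"]] = message["vehicle_id"]

# Naive retry loop: a failed message goes back to the end of the queue.
while queue:
    msg = queue.popleft()
    try:
        handle(msg)
    except KeyError:
        queue.append(msg)  # requeue and hope the prerequisite arrives first
```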
3. Publication of Bad Messages
Event-based synchronisation assumes that all messages make sense and will be interpreted successfully by the receiver. However, in reality it is very possible that a message with bad information may go out due to a coding bug, or perhaps badly validated user input. In this case how should we handle the bad message? If it goes to a dead-letter queue, what do we do? Republish it? But other events may have superseded it in the meantime, meaning republishing it would corrupt the data further. Alternatively, messages could be sequenced (although this may not be reliably possible) and the onus put on the receiving microservice to sort out the mess, which is also non-trivial.
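A minimal sketch of routing invalid messages to a dead-letter queue, using a hypothetical validation rule. Note that parking the message answers none of the questions above about what to do with it afterwards:

```python
from collections import deque

dead_letter_queue = deque()
local_copy = {}

def valid(message):
    # Hypothetical validation: a bad message is one missing a usable vehicle_id.
    return isinstance(message.get("vehicle_id"), int)

def consume(message):
    if not valid(message):
        dead_letter_queue.append(message)  # park it; a human still has to decide its fate
        return
    local_copy[message["vehicle_id"]] = message

consume({"vehicle_id": 123, "registration": "AB12 CDE"})
consume({"vehicle_id": None})  # bad message, e.g. from badly validated input
```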
4. Omitted Messages
Given the "I'm a microservice and don't care about anyone else" philosophy, it would be very easy for X to make a change to its domain, forget that other microservices are interested in the resulting data change, and make no attempt to publish the necessary events in the new code paths that have been introduced. If this code makes it into production, the result is that Y and possibly dozens or hundreds of other microservices are left with bad data.
How can hundreds of microservices with bad copies of data from X be reset with the correct information? Throw into the mix that these copies may be structured slightly differently from X, and slightly differently from each other, meaning each microservice needs a bespoke script or strategy to fix its data. Multiply this by data sourced from other bounded contexts A, B and C, and the complexity of fixing data in production becomes enormous.
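One hedge against drift is a bulk reconciliation job that re-reads the authoritative data from X and repairs the copy. The sketch below assumes the copy shares X's structure, which, as noted above, is exactly what cannot be assumed in general: each differently-structured copy needs its own mapping:

```python
# Hypothetical reconciliation: diff Y's copy against X's authoritative data and repair it.
source_of_truth = {123: {"registration": "AB12 CDE"}, 456: {"registration": "XY34 ZZZ"}}
local_copy = {123: {"registration": "WRONG"}}  # drifted because an event was never published

def reconcile(source, copy):
    repaired = 0
    # Overwrite drifted or missing records with the authoritative version.
    for key, record in source.items():
        if copy.get(key) != record:
            copy[key] = dict(record)
            repaired += 1
    # Records present in the copy but absent from the source must also go.
    for key in list(copy):
        if key not in source:
            del copy[key]
            repaired += 1
    return repaired

reconcile(source_of_truth, local_copy)
```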
Possible Solutions
Context Enrichment
One variation of this pattern suggests that the consuming bounded context receives minimal information from the publisher in a message and calls back into the publisher for any other information necessary. Does this alleviate the problems above?
Problem (1) - possibly. If the subscriber receives messages knowing that record R1 has changed, it can call back to the publisher for a new copy of record R1. This is simpler than if the publisher has several different events relating to different fields of record R1, which would all need separate handlers and update methods. With context enrichment the subscriber can have a single update mechanism, rather than a collection of granular event handlers.
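That single update mechanism can be sketched as below, with a hypothetical fetch_record call standing in for the publisher's read API:

```python
# Publisher's side: a hypothetical read API the subscriber can call back into.
publisher_records = {"R1": {"name": "Fleet A", "effective_from": "2024-01-01"}}

def fetch_record(record_id):
    return dict(publisher_records[record_id])

# Subscriber's side: the event carries only the record id, not the changed fields.
local_copy = {}

def on_record_changed(message):
    # One handler for every kind of change: always refetch the whole record.
    local_copy[message["record_id"]] = fetch_record(message["record_id"])

on_record_changed({"event": "RecordChanged", "record_id": "R1"})
```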
Problem (2) - possibly. If a message related to a vehicle that has not yet been created is received, the receiver could call back and ask for the record related to that vehicle before it attempts any further processing. This case would be fine; however, a more subtle out-of-order case may not be so easy for the receiver to detect and compensate for.
Problem (3) - limited help. If the subscriber assumes the data received is correct, it could still make mistaken assumptions (e.g. update the wrong record). However, messages with less information in them have the advantage of having less scope to be incorrect.
Problem (4) - No. If a message is not received, the subscriber has no realistic opportunity to do anything about it.
Global Read-Only Data Store
This article describes the idea of application databases and integration databases as separate concepts. There is one application database per bounded context, with each bounded context owning that data, almost as "session" data. The data is replicated, CQRS-style, to the integration database, which all contexts can access in a read-only fashion.
This removes the majority of complexity around message synchronisation, message versioning, failing messages and bounded context data that goes out of sync.
On the other hand, it introduces database schema coupling, meaning that changes to the integration schema must be carefully managed. For example, to change a table structure it may be necessary to run the new and old versions of the table in parallel for a while, allowing systems using the old structure to migrate before it is removed.
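That parallel-run period amounts to a dual write. The sketch below uses hypothetical old and new table shapes: every write lands in both until readers of the old shape have migrated:

```python
# Hypothetical dual-write during an integration-schema migration: the old flat
# "name" shape and the new split "first_name"/"last_name" shape run in parallel.
old_table = {}  # old shape, still read by unmigrated systems
new_table = {}  # new shape, read by migrated systems

def write_customer(customer_id, first_name, last_name):
    new_table[customer_id] = {"first_name": first_name, "last_name": last_name}
    # Keep old readers working until they migrate; drop this write (and the
    # table) once nothing reads the old shape any more.
    old_table[customer_id] = {"name": f"{first_name} {last_name}"}

write_customer(1, "Ada", "Lovelace")
```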