As I’ve written about a bunch of times before, we have a data synchronization process for freeing customer data from the restrictive confines of their office. We built it as something of a stopgap while we continue to develop and improve our cloud platform, so that our users can get at their valuable information when they are on the road, as they tend to be.

A fantastic side effect of having the data is that we can make it extremely easy for our customers to migrate from our legacy product to the new cloud platform, because we’ve already dealt with the step of getting their information into a place where we can easily access it. There are certainly challenges inherent in this approach, but anything we can do to make it easier to move from one system to another is going to make our customers happier when they do decide to switch.

And happy customers are the best customers.

As is always the case with software though, while the new cloud platform is pretty amazing, it’s still competing with a piece of software that’s been actively developed for almost 20 years, and not everyone wants to take the plunge just yet.

In the meantime, it would be hugely beneficial to let our customers use at least some part of the new system that we’ve been developing, both because we honestly believe it’s a simpler and more focused experience, and because the more people that use the new system, the more feedback we get, which equals a better product for everyone.

Without a link between the two systems though, it’s all very pointless. Customers aren’t going to want to maintain two sets of identical information, and rightly so.

But what if we did it for them?

Data…Remember Who You Were

We already have a decent automated migration process. It’s not perfect, but it ticks a lot of boxes with regard to bringing across the core entities that our customers rely on on a day-to-day basis.

All we need to do is execute that migration process on a regular basis, following some sort of cadence and acceptable latency that we figure out with the help of some test users.

This is all very good in theory, but the current migration process results in a brand new cloud account every time it’s executed on a customer’s data, which is pretty unusable if we’re trying to make our customers’ lives easier. Ideally we want to re-execute the migration, but targeting an existing account: new entities get created, existing entities get updated and deleted entities get deleted.

It’s basically another sync algorithm, except this time it’s more complicated. The current on-premises to remote sync is a relatively simple table by table, row by row process, because both sides need to look identical at the end. This time though, transforms occur, entities are broken down (or consolidated) and generally there is no one-to-one relationship that’s easy to follow.

On the upside, the same transformations are executed for both the initial migration and the update, so we can deal with the complexity somewhat by simply remembering the outcome of the previous process (actually all previous processes), and using that to identify what the destination was for all previously migrated entities.

What we’re left with is an update process that can be executed on top of any previous migration, which will:

  • Identify any previously migrated entities by their IDs, and perform appropriate update or replacement operations
  • Identify new entities and perform appropriate creation operations, storing the ID mapping (i.e. original to new) in a place where it can be retrieved later, as sketched below
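To make that a bit more concrete, here’s a rough sketch of what I mean, in Python. None of these names (IdMappingStore, cloud_api, transform) are real; they’re just placeholders standing in for the actual migration machinery.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class IdMappingStore:
    """Hypothetical store for the original ID -> cloud ID mapping."""
    _mapping: Dict[str, str] = field(default_factory=dict)

    def lookup(self, original_id: str) -> Optional[str]:
        return self._mapping.get(original_id)

    def save(self, original_id: str, cloud_id: str) -> None:
        self._mapping[original_id] = cloud_id


def run_migration_update(legacy_entities: Iterable[dict],
                         transform: Callable[[dict], dict],
                         cloud_api,
                         store: IdMappingStore) -> None:
    """Re-execute the migration against an existing cloud account.

    cloud_api is assumed to expose create(payload) -> cloud_id and
    update(cloud_id, payload). Deletions are deliberately ignored for now.
    """
    for entity in legacy_entities:
        payload = transform(entity)              # same transform as the initial migration
        cloud_id = store.lookup(entity["id"])
        if cloud_id is not None:
            cloud_api.update(cloud_id, payload)  # previously migrated: update/replace in place
        else:
            new_id = cloud_api.create(payload)   # new entity: create it...
            store.save(entity["id"], new_id)     # ...and remember where it ended up
```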

But what about deletions?

Well, at this point in time we’re just prototyping, so we kind of just swept that under the rug.

I’m sure it will be easy.

I mean, look at how easy deletions were to handle in the original data synchronization algorithm.

It’s All Just Too Much

Unfortunately, while the prototype I described above works perfectly fine to demonstrate that an account in the cloud system can be kept up to date with data from the legacy product, we can’t give it to a customer yet, because every migration “update” is a full migration of all of the customer’s data, not just the stuff that’s changed.

Incredibly wasteful.

For our test data set, this isn’t really a problem. A migration takes a minute at most, usually less, and it’s only really moving tens of entities around.

For a relatively average customer though, a migration takes around 45 minutes and moves thousands to tens of thousands of entities. It’s never really been a huge problem before, because we only did it once or twice while moving them to the new cloud platform, but it quickly becomes unmanageable if you want to do it in some sort of constant loop.

Additionally, we can’t easily use a dumb timer to automate the process, or we’ll get into problems with re-entrancy, where two migrations for the same data are happening at the same time. Taking that into account though, even if we started the next migration the moment the previous one finished (in some sort of eternal chain), we’re still looking at a total latency that is dependent on the size of the customer’s complete data set, not just the stuff they are actively working with. We’d also be using resources pointlessly, which is a scaling nightmare.

We should only be acting on a delta, using only the things that have changed since the last “update”.

A solvable problem, but it does add to the overall complexity. We can’t fall back on the row-based versioning that we used for the core data synchronization algorithm because of the transforms, so we have to create some sort of aggregate version from the constituent elements and store that for later use.
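Something like the following, perhaps: collapse the row versions of everything that feeds into a migrated entity into a single value, store it, and compare it on the next run. The shapes and names here are guesses, not the actual design.

```python
import hashlib
from typing import Dict, Iterable


def aggregate_version(row_versions: Iterable[bytes]) -> str:
    """Collapse the row versions of an entity's constituent rows into one value."""
    digest = hashlib.sha256()
    for version in sorted(row_versions):  # sort so row ordering doesn't change the result
        digest.update(version)
    return digest.hexdigest()


def needs_migration(entity_id: str,
                    row_versions: Iterable[bytes],
                    stored_versions: Dict[str, str]) -> bool:
    """True if the entity's constituent rows have changed since the last update."""
    current = aggregate_version(row_versions)
    if stored_versions.get(entity_id) == current:
        return False                          # nothing changed, skip this entity
    stored_versions[entity_id] = current      # remember the new version for next time
    return True
```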

It’s Time To Move On

Even if we do limit our migration updates to only the things that have changed since last time, we still have a problem if we’re just blindly running the process on a timer (or eternal chain). If there are no changes, why do a bunch of work for nothing?

Instead, we should be able to react to the changes as the existing synchronization system detects them.

To be clear, the existing process is also timer-based, so it’s not perfect. It would be nice to not add another timer to the entire thing though.

With some effort we should be able to produce a stream of information from our current sync API that describes the changes being made. We can then consume that stream in order to optimise the migration update process, focusing only on those entities that have changed, and more importantly, only doing work when changes actually occur.
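In my head the consumer looks something like this: batch up incoming change events for a little while, then kick off a targeted migration update for just the affected entities. The event shape and the batching window are placeholders; nothing here reflects the actual sync API.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterator, Mapping, Set


def react_to_sync_changes(change_stream: Iterator[Mapping],
                          run_targeted_update: Callable[[Dict[str, Set[str]]], None],
                          batch_window_seconds: float = 30.0) -> None:
    """Consume change events from the sync process and trigger targeted updates.

    Events are assumed to look like {"table": "jobs", "row_id": "42"}.
    """
    pending: Dict[str, Set[str]] = defaultdict(set)
    deadline = time.monotonic() + batch_window_seconds

    for event in change_stream:
        pending[event["table"]].add(event["row_id"])

        # Batch changes for a short window so a burst of sync activity results
        # in one migration update rather than many. A real consumer would also
        # flush on a timer when the stream goes quiet.
        if time.monotonic() >= deadline and pending:
            run_targeted_update(dict(pending))
            pending.clear()
            deadline = time.monotonic() + batch_window_seconds
```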

Conclusion

Ignoring the limitations for a moment, the good news is that the prototype was enough to prove to interested stakeholders that the approach has merit. At least enough merit to warrant a second prototype, under the assumption that the second prototype will alleviate some of the limitations and we can get it in front of some real customers for feedback.

I have no doubt that there will be some usability issues that come out during the user testing, but that’s to be expected when we do something as lean as this. Whether or not those usability issues are enough to cause us to run away from the idea altogether is a different question, but we won’t know until we try.

Even if the idea of using two relatively disparate systems on a day-to-day basis is unpalatable, we still get some nice things from being able to silently migrate our customers into our cloud platform, and keep that information up to date.

I mean, what better way is there to demo a new piece of software to a potential customer than to do it with their own data?