Isolating Danger

January 26. 2016 0 Comments

When building a series of services to allow clients to access their own (previously office locked) data over the greater internet, there are a number of considerations to be made.

The old way was simple. There is a database. Stuff is in the database. When you want stuff, access the database. As long as the database in one office was powerful enough for the users in that office, you would be fine.

Moving all of that information into the cloud though…

Now everyone needs to access all their stuff at the same time. Now efficiency and isolation matters.

Well technically it mattered before as well, just not as much to the people who came before me.

I’m going to be talking about two things briefly in this post.

The first is isolating our upload and synchronization process from the actual service that needs to be queried.

The second is isolating binary data from all other requests.

Data Coming Right Up

In order to grant remote access to previously on-premises locked data we need to get that data out somehow. Unfortunately, for this system, the source of truth needs to stay on-premises for a number of different reasons that I won’t go into in too much detail. What we’re focusing on is allowing authenticated read-only access to the data from external systems.

The simplest solution to this is to have a replica of the data available in the cloud, and use that data for all incoming remote requests. Obviously this isn’t perfect (its an eventually consistent model), but because its read-only and we have some allowances for data latency (i.e. its okay if a mobile application doesn’t see exactly what is in the on-premises data the moment that it’s changed).

Of course, all of this data constantly being uploaded can cause a considerable amount of strain on the system as a whole, so we need to make sure that if there is a surge in the quantity of synchronization requests that the service responding to queries (get me the last 100 X entities) is not negatively impacted.

Easiest solution? Simply separate the two services, and share the data via a common store of some sort (our initial implementation will have a database).

With this model we gain some protection from load on on side impacting the other.

Its not perfect mind you, but the early separation gives us a lot of power moving forward if we need to change. For example, we could queue all synchronization requests to the sync service fairly easily, or split the shared database into a master and a number of read replicas. We don’t know if we’ll have a problem or what the solution to that problem will be, the important part is that we’ve isolated the potential danger, allowing for future change without as much effort.

10 Types of People

The system that we are constructing involves a moderate amount of binary data. I say moderate, but in reality, for most people who have large databases on premises, a good percentage of that data is binary data in various forms. Mostly images, but there are a lot of documents of various types as well (ranging from small and efficient PDF files to monstrous Word document abominations with embedded Excel spreadsheets).

Binary data is relatively problematic for a web service.

If you grant access to the binary data from a service, every request ties up one of your possible request handlers (whether it be threads, pseudo-threads or various other mechanisms of parallelism). This leaves less resources available for your other requests (data queries), which can make things difficult in the long run as the total number of binary data requests in flight at any particular moment slowly rises.

If you host the data outside the main service, you have to deal with the complexity of owning something else and making sure that it is secure (raw S3 would be ideal here, but then securing it is a pain).

In our case, our plan is to go with another service, purely for binary data. This allows us to leverage our existing authentication framework (so at least everything is secure), and allows us to use our existing logging tools to track access.

The benefits of isolating the access to binary data like this is that if there is a sudden influx of requests for images or documents or something, only that part of the system will be impacted (assuming there are no other shared components). Queries to get normal data will still complete in a timely fashion, and assuming we have written our integration well, some retry strategy will ensure that the binary data is delivered appropriately once the service resumes normal operation.

Summary

It is important to consider isolation concerns like I have outlined above when you are designing the architecture of a system.

You don’t necessarily have to implement all of your considerations straight away, but you at least need to know where your flex areas and where you can make changes without having to rewrite the entire thing. Understand how and when your architecture could adapt to potential changes, but don’t build it until you need it.

In our case, we also have a gateway/router sitting in front of everything, so we can remap URLs as we see fit moving into the future. In the case of the designs I’ve outlined above, they come from past (painful) experience. We’ve already encountered those issues in the past, while implementing similar systems, so we decided to go straight to the design that caters for them, rather than implement something we know would have problems down the track.

Its this sort of learning from your prior experiences that really makes a difference to the viability of an architecture in the long run.