
Accidentally breaking backwards compatibility is an annoying class of problem that can be particularly painful to deal with once it occurs, especially if it gets out into production before you realise it.

In my experience, the main issue with this class of problem is that it's unlikely to be tested for, increasing the risk that when it does occur, you don't notice until it's dangerously late.

Most of the time, you'll write tests as you add features to a piece of software, like an API. Up until the point where you release, you don't really need to think too hard about breaking changes; you just do whatever is necessary to make sure your feature works right now and then move on. You'll probably implement tests for it, but they will be aimed at making sure it's functioning.

The moment you release though, whether it be internally or externally, you’ve agreed to maintain whatever contract you’ve set in place, and changing this contract can have disastrous consequences.

Even if you didn’t mean to.

Looking Backwards To Go Forwards

In my case, I was working with the service that uses RavenDB, adding a few new endpoints to clean up orphaned and abandoned data.

During development I noticed that someone had duplicated model classes from our models library inside the API itself, apparently so they could decorate them with attributes describing how to customise their serialisation into RavenDB.

As is normally the case with this sort of thing, those classes had already started to diverge, and I wanted to nip that problem in the bud before it got any worse.

Using attributes on classes as the means to define behaviour is one of those things that sounds like a good idea at the time, but quickly gets troublesome as soon as you want to apply them to classes that live in another library or that you don't control. EF is particularly bad in this regard, featuring attributes that define how the resulting data schema will be constructed, sometimes in ways only relevant to one particular database provider.

I much prefer an approach where classes stand on their own and something else provides the meta information normally supplied by attributes. The downside, of course, is that you can no longer see everything to do with the class in one place, which is a nicety the attribute approach provides. Even taking the increased cognitive load into account, I feel the approach that allows for a solid separation of concerns is much more effective.
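To make the contrast concrete, here is a sketch of the two styles using EF as the example; the Order class and its properties are hypothetical, purely for illustration.

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;

// Attribute approach: schema details (table name, column types, lengths) live
// on the class itself, which is awkward when the class belongs to a shared
// models library that shouldn't know anything about a particular provider.
[Table("Orders")]
public class AnnotatedOrder
{
    [Key]
    public int Id { get; set; }

    [Column(TypeName = "money")]
    public decimal Total { get; set; }

    [MaxLength(256)]
    public string Reference { get; set; }
}

// Separated approach: the class stands on its own...
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public string Reference { get; set; }
}

// ...and the persistence-specific metadata lives next to the thing that
// actually cares about it, in this case the EF context.
public class OrdersContext : DbContext
{
    public DbSet<Order> Orders { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Order>().ToTable("Orders");
        modelBuilder.Entity<Order>().HasKey(o => o.Id);
        modelBuilder.Entity<Order>().Property(o => o.Total).HasColumnType("money");
        modelBuilder.Entity<Order>().Property(o => o.Reference).HasMaxLength(256);
    }
}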

I consolidated some of the duplicate classes (can't fix everything all at once) and ensured that the custom serialisation logic was maintained without attributes, using this snippet of code that initialises the RavenDB document store.

public class RavenDbInitializer
{
    public void Initialize(IDocumentStore store)
    {
        // Types that should be stored as plain string values rather than
        // full JSON objects (handled by RavenObjectAsJsonPropertyValue below).
        var typesWithCustomSerialization = new[]
        {
            typeof(CustomStruct),
            typeof(CustomStruct?),
            typeof(CustomStringEnumerable),
            typeof(OtherCustomStringEnumerable)
        };

        // Register the custom converter with the serializer RavenDB uses for
        // every document stored through this document store, then set up indexes.
        store.Conventions.CustomizeJsonSerializer = a => a.Converters.Add(new RavenObjectAsJsonPropertyValue(typesWithCustomSerialization));
        store.ExecuteIndex(new Custom_Index());
    }
}

The RavenObjectAsJsonPropertyValue class allows for the custom serialisation of a type as a pure string, assuming that the type supplies a Parse method and a ToString that accurately represents the class. This reduces the complexity of the resulting JSON document and avoids mindlessly writing out all of the type's fields and properties (which we may not need if there is a more efficient representation).
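The converter itself isn't included in the post, but a minimal sketch of the idea might look something like the following, assuming a Json.NET style converter (the RavenDB 3.x client bundles its own copy of Json.NET under Raven.Imports.Newtonsoft.Json) and that each participating type exposes a static Parse(string) method and a round-trippable ToString.

using System;
using System.Collections.Generic;
using Newtonsoft.Json; // or Raven.Imports.Newtonsoft.Json, depending on the client version

public class RavenObjectAsJsonPropertyValue : JsonConverter
{
    private readonly HashSet<Type> types;

    public RavenObjectAsJsonPropertyValue(IEnumerable<Type> typesWithCustomSerialization)
    {
        types = new HashSet<Type>(typesWithCustomSerialization);
    }

    public override bool CanConvert(Type objectType)
    {
        return types.Contains(objectType);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        // Collapse the whole object into a single string property value.
        writer.WriteValue(value.ToString());
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null) return null;

        // Rebuild the object from its string form via the type's static Parse method.
        var underlyingType = Nullable.GetUnderlyingType(objectType) ?? objectType;
        var parse = underlyingType.GetMethod("Parse", new[] { typeof(string) });
        return parse.Invoke(null, new object[] { (string)reader.Value });
    }
}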

After making all of the appropriate changes I ran our suite of unit, integration and functional tests. They all passed with flying colours, so I thought that everything had gone well.

I finished up the changes relating to the new cleanup endpoint (adding appropriate tests), had the code reviewed and then merged it into master, which pushed it through our CI process and deployed it to our Staging environment, ready for a final test in a production-like environment.

Then the failures began.

Failure Is Not An Option

Both the new endpoint and pre-existing endpoints were failing to execute correctly.

Closer inspection revealed that the failures were occurring whenever a particular object was being read from the underlying RavenDB database and that the errors occurring were related to being unable to deserialize the JSON representation of the object.

As I still remembered making the changes to the custom serialization, it seemed pretty likely that that was the root cause.

It turned out that when I consolidated the duplicate classes, I had forgotten to actually delete the now useless duplicates. Because they were still there, a few references to the deprecated classes had sneakily remained, and because I had manually specified which classes should have custom serialisation (the non-duplicate model classes), the leftover classes were being serialised in the default way (which just dumps all of the object's properties).

But all the tests passed?

The tests passed because I had not actually broken anything from a self-contained, round-trip point of view: any data generated by the current version of the API could still be read by that same version.

Data generated by previous versions of the API, however, could no longer be deserialised, which was entirely unintentional.

Once I realised my mistake, the fix was easy enough to conceive. Delete the deprecated classes, fix the resulting compile errors and everything would be fine.

Whenever I come across a problem like this though I prefer to put measures in place such that the exact same thing never happens again.

In this case, I needed a test to verify that the currently accepted JSON representation of the entity continued to be able to be deserialized from a RavenDB document store.

With a failing test in place, I could be safe in the knowledge that when it passed I had successfully fixed the problem, and I could have some assurance that the problem would not reoccur if someone broke something in a similar way in the future (and honestly, it's probably going to be me).

Testing Is An Option Though

Since the failure was occurring at the RavenDB level, when previously stored data was being retrieved, I had to find a way to make the old data accessible during the test. I didn't want to set up and maintain a complete RavenDB instance that just had old data in it, but I couldn't just use the current models to add data to the database in the normal way either, because that would not actually detect the problem I had experienced.

RavenDB is pretty good as far as testing goes though. It's trivially easy to create an embedded data store, and while the default way of communicating with the store is to use your classes directly (i.e. Store and Query<T>), it gives access to all of the raw database commands as well.
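For reference, creating a throwaway, in-memory store for a test is only a few lines. This is a sketch assuming the RavenDB 3.x embedded client; the IntegrationTestResolverFactory used in the real test below presumably wires up something equivalent behind the scenes.

// Requires the RavenDB.Embedded package (EmbeddableDocumentStore lives in Raven.Client.Embedded).
using (IDocumentStore store = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
{
    // Apply the same conventions and indexes as the real application.
    new RavenDbInitializer().Initialize(store);

    // Sessions, queries, indexes and the raw DatabaseCommands API all behave like the real thing.
    using (var session = store.OpenSession())
    {
        session.Store(new Entity());
        session.SaveChanges();
    }
}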

The solution? Create a temporary data store, use the raw database commands to insert a JSON document of the old format into a specific key then use the normal RavenDB session to query the entity that was just inserted via the generic methods.

[Test]
[Category("integration")]
public void WhenAnEntityFromAPreviousVersionOfTheApplicationIsInsertedIntoTheRavenDatabaseDirectly_ThatEntityCanBeReadSuccessfully()
{
    using (var resolverFactory = new IntegrationTestResolverFactory())
    {
        // Sample documents captured in the old format, deployed alongside the test assembly.
        var assemblyLocation = Assembly.GetAssembly(this.GetType()).Location;
        var assemblyFolder = Path.GetDirectoryName(assemblyLocation);

        var entitySamplePath = Path.Combine(assemblyFolder, @"Data\Entity.json");
        var entityMetadataPath = Path.Combine(assemblyFolder, @"Data\Entity-Metadata.json");

        var entityJson = File.ReadAllText(entitySamplePath);
        var entityMetadataJson = File.ReadAllText(entityMetadataPath);

        var key = "Entity/1";

        // Insert the raw historical document directly, bypassing the current serialization logic.
        var store = resolverFactory.Application().Get<IDocumentStore>();
        store.DatabaseCommands.Put(key, null, RavenJObject.Parse(entityJson), RavenJObject.Parse(entityMetadataJson));

        // Read it back through a normal session to prove the current code can still deserialize it.
        using (var session = store.OpenSession())
        {
            var entities = session.Query<Entity>().ToArray();
            entities.Length.Should().Be(1);
        }
    }
}

The test failed, I fixed the code, the test passed and everyone was happy. Mostly me.

Conclusion

I try to make sure that whenever I’m making changes to something I take backwards compatibility into account. This is somewhat challenging when you’re working at the API level and thinking about models, but is downright difficult when breaking changes can occur during things like serialization at the database level. We had a full suite of tests, but they all worked on the current state of the code and didn’t take into account the chronological nature of persisted data.

In this particular case, we caught the problem before it became a real issue thanks to our staging environment accumulating data over time and our excellent tester, but that’s pretty late in the game to be catching these sorts of things.

In the future it may be useful to create a class of tests specifically for backwards compatibility shortly after we do a release, where we take a copy of any persisted data and ensure that the application continues to work with that structure moving into the future.
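Sketching the idea, such a suite could be driven by a folder of captured sample documents, one set per release, so that adding coverage after each release is mostly a matter of copying files out of the freshly deployed environment. The folder layout and names below are hypothetical, and the body reuses the same infrastructure as the test shown earlier.

[TestFixture]
[Category("integration")]
public class BackwardsCompatibilityTests
{
    // One sub-folder of captured sample documents per released version,
    // e.g. Data\Releases\1.2.0\Entity.json plus its matching -Metadata.json file.
    private static IEnumerable<string> EntitySamplesFromPreviousReleases()
    {
        var assemblyFolder = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
        return Directory.GetDirectories(Path.Combine(assemblyFolder, @"Data\Releases"))
            .Select(release => Path.Combine(release, "Entity.json"));
    }

    [TestCaseSource(nameof(EntitySamplesFromPreviousReleases))]
    public void EntityFromPreviousRelease_CanStillBeRead(string samplePath)
    {
        using (var resolverFactory = new IntegrationTestResolverFactory())
        {
            var store = resolverFactory.Application().Get<IDocumentStore>();

            // Same approach as before: insert the raw historical document,
            // then prove the current code can still load it.
            var key = "Entity/1";
            store.DatabaseCommands.Put(key, null,
                RavenJObject.Parse(File.ReadAllText(samplePath)),
                RavenJObject.Parse(File.ReadAllText(samplePath.Replace(".json", "-Metadata.json"))));

            using (var session = store.OpenSession())
            {
                session.Load<Entity>(key).Should().NotBeNull();
            }
        }
    }
}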

Honestly though, because of the disconnected nature of tests like that, it will be very hard to make sure that they are done consistently.

Which worries me.