
As I mentioned briefly last week, I’m currently working on some maintenance endpoints that allow us to clean up data that has been identified as abandoned in one of our services (specifically, the service backed by a RavenDB database).

The concept at play is not particularly difficult. We have meta information attached to each set of data belonging to a customer, and this meta information allows us to decide whether or not the data (which is supposed to be transitory anyway) should continue to live in the service, or whether it should be annihilated to reduce load.

In building the endpoints that will allow this functionality to be reliably executed, I needed to construct a set of automated tests to verify that the functionality worked as expected with no side-effects. In other words, the normal approach.

Unfortunately, that’s where the challenge mode kicked in.

Like Oil And Water

One of the key concepts in RavenDB is the index.

Indexes are used for every query you do, except for loading a document by ID, finding the set of documents by ID prefix or finding documents by updated timestamp.

Helpfully, RavenDB will create and maintain one or more indexes for you based on your querying profile (automatic indexes), which means that if you don’t mind relinquishing control over the intricacies of your persistence, it will handle all the things for you. Honestly, it does a pretty good job of this, right up until you issue a query that is somewhat different from all the previous queries, at which point RavenDB creates a new index to handle it, resulting in a massive CPU, memory and disk spike that craters your performance.

Our inexperience with RavenDB when this service was first built meant that the difference between automatic and static indexes was somewhat lost on us, and the entire service uses nothing but automatic indexes. Being that we don’t allow for arbitrary, user defined queries, this is fine. In fact, unless automatic indexes are somehow at the root of all of our performance problems, we haven’t had any issues with them at all.
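For anyone unfamiliar with the distinction, a static index is one you define yourself in code and deploy alongside the application, rather than letting RavenDB infer it from your queries. As a minimal sketch (based on the 3.x-era .NET client, where the relevant types live in Raven.Client.Indexes; the Entity document type here is just a placeholder, not one of our real documents):

using System;
using System.Linq;
using Raven.Client.Indexes;

// Placeholder document type, purely for illustration.
public class Entity
{
    public string Id { get; set; }
    public string CustomerId { get; set; }
    public DateTimeOffset LastUpdated { get; set; }
}

// A hypothetical static index. Because it is defined up front, RavenDB maintains
// it continuously instead of spinning up an automatic index the first time a
// matching query arrives.
public class Entities_ByCustomerAndLastUpdated : AbstractIndexCreationTask<Entity>
{
    public Entities_ByCustomerAndLastUpdated()
    {
        Map = entities => from entity in entities
                          select new
                          {
                              entity.CustomerId,
                              entity.LastUpdated
                          };
    }
}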

A key takeaway about indexes, regardless of whether they are automatic or static, is that they are an eventually consistent representation of the underlying data.

All writes in RavenDB take place in a similar way to writes in most relational databases like SQL Server. Once a write is accepted and completes, the data is available straight away.

The difference is that in RavenDB, that data is only available straightaway if you query for it based off the ID, the prefix (which is part of the ID) or the order in which it was updated.

Any other query must go through an index, and indexes are eventually consistent, with an unknown time delay between when the write is completed and the data is available via the index.
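To make that concrete, here is a rough sketch of the difference using the RavenDB .NET session API (again assuming the 3.x-era client and the placeholder Entity type from above; exact namespaces and method names differ a little between client versions):

using System;
using System.Linq;
using Raven.Client;

public static class ConsistencyExample
{
    public static void Demonstrate(IDocumentSession session)
    {
        // Load by ID: served straight from the document store, so a completed
        // write is visible immediately.
        var loaded = session.Load<Entity>("entities/1");

        // Query by anything else: served from an index, so very recent writes
        // may not be visible yet.
        var maybeStale = session.Query<Entity>()
            .Where(e => e.CustomerId == "customers/1")
            .ToList();

        // The client can be told to wait for the index to catch up, trading
        // latency (and a possible timeout) for freshness.
        var fresh = session.Query<Entity>()
            .Customize(c => c.WaitForNonStaleResults(TimeSpan.FromSeconds(10)))
            .Where(e => e.CustomerId == "customers/1")
            .ToList();
    }
}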

The eventually consistent model has benefits. It means that writes can be faster (because writes don’t have to touch every index before they are “finished”), and it also means that queries are quicker, assuming you don’t mind that the results might not be completely up to date.

That’s the major cost, of course: you trade off speed and responsiveness against the ability to get a guaranteed, definitive answer at any particular point in time.

Or Maybe Fire And Gasoline?

Eventual consistency is one of those concepts that infects everything it touches. If you have an API that uses a database that is eventually consistent, that API is also eventually consistent. If your API is eventually consistent, your clients are probably eventually consistent. So on and so forth all the way through the entire system.

If you’re used to relying on something being true immediately after you’ve done it, you have to completely rearrange your thinking, because that is no longer guaranteed.

This is especially painful when it comes to tests, which brings us full circle back to where I started this post.

A standard test consists of some sort of setup (arrange), followed by the execution of some operation or set of operations (act), followed by some validation (assert).

In my case, I wanted to write a Functional Test validating that if I added a bunch of data and then executed the maintenance endpoint, some of that data would be removed (the stuff identified as abandoned) and the rest would still be present. A nice high level test to validate that everything was working together as it should be, to be executed against the API itself (via HTTP calls).

Our functional tests are typically executed against a component deployed via our continuous integration/deployment system. We run Unit and Integration tests after the build, deploy via Octopus to our CI environment, then run Functional tests to validate that all is well. The success of the Functional tests determines whether or not that particular build/package should be allowed to propagate out into our Staging environment and beyond.

The endpoints in play were:

GET /entities
PUT /entities/{id}
GET /admin/maintenance/entities/abandoned
DELETE /admin/maintenance/entities/abandoned

Conceptually, the test would:

  • Seed a bunch of entities to the API (some meeting the abandoned conditions, some not)
  • Run a GET on the maintenance endpoint to confirm that the appropriate number of entities were being identified as abandoned
  • Run a DELETE on the maintenance endpoint, removing a subset of the abandoned entities, returning a response detailing what was removed and how much is left
  • Run a GET on the maintenance endpoint to validate that its response concurred with the DELETE response
  • Repeatedly do DELETE/GET calls to the maintenance endpoint to delete all abandoned entities
  • Run a GET on the entities endpoint to validate that all of the entities that should still be present are, and that all of the entities that should have been deleted are gone

Like I said, a nice, high level test that covers a lot of functionality and validates that the feature is working as it would be used in reality. Combined with a set of Unit and Integration tests, this would provide some assurance that the feature was working correctly.
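In rough outline, the first cut of the test looked something like the sketch below (the ApiClient wrapper, the response shapes and the test framework attributes are hypothetical stand-ins for our actual test helpers, and a few of the intermediate GET/DELETE cross-checks are compressed):

using System.Linq;
using FluentAssertions;
using NUnit.Framework;

[TestFixture]
public class EntityMaintenanceFunctionalTests
{
    // Hypothetical wrapper around HttpClient that talks to the deployed API.
    private ApiClient _client;

    [Test]
    public void DeletingAbandonedEntities_RemovesOnlyAbandonedEntities()
    {
        // Arrange: seed entities, some meeting the abandoned criteria, some not.
        var abandoned = _client.SeedEntities(count: 10, markAbandoned: true);
        var active = _client.SeedEntities(count: 10, markAbandoned: false);

        // The maintenance endpoint should identify exactly the abandoned set.
        _client.GetAbandoned().Total.Should().Be(abandoned.Count);

        // Delete abandoned entities in batches until none are reported as remaining.
        while (_client.DeleteAbandoned().Remaining > 0) { }

        // Only the entities that were never abandoned should be left.
        _client.GetEntities().Select(e => e.Id)
            .Should().BeEquivalentTo(active.Select(e => e.Id));
    }
}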

Unfortunately, it didn’t work.

The Right Answer Is Just A While Loop Away

The reason it didn’t work was that it was written with the wrong mindset. I was expecting to be able to add a bunch of data and have that data be immediately visible.

That’s not possible with an eventually consistent data store like RavenDB.

At first I thought I could just deal with the issue by wrapping everything in a fancy retry loop, to check that it was eventually the value that I wanted it to be. I would be able to use the same test structure that I already had, all I would have to do would be make sure each part of it (GET, DELETE, GET) executed repeatedly until it matched the conditions, up to some maximum number of attempts.

To that end I wrote a simple extension method in the flavour of FluentAssertions that would allow me to check a set of assertions against the return value from a Func in such a way that it would repeatedly execute the Func until the assertions were true or it reached some sort of limit.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using FluentAssertions;
using NSubstitute;
using Serilog;

namespace Api.Tests.Shared.Extensions
{
    public static class FluentAssertionsExtensions
    {
        public static void ShouldEventuallyMeetAssertions<TResult>(this Func<TResult> func, Action<TResult>[] assertions, IRetryStrategy strategy = null)
        {
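            // Default to a simple linear backoff (5 attempts, 2 seconds apart) with a throwaway substitute logger.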
            strategy = strategy ?? new LinearBackoffWithCappedAttempts(Substitute.For<ILogger>(), 5, TimeSpan.FromSeconds(2));
            var result = strategy.Execute
                (
                    func, 
                    r => 
                    {
                        foreach (var assertion in assertions)
                        {
                            assertion(r);
                        }

                        return true;
                    }
                );

            if (!result.Success)
            {
                throw result.EvaluationErrors.Last();
            }
        }
    }
    
    public interface IRetryStrategy
    {
        RetryResult<TResult> Execute<TResult>(Func<TResult> f, Func<TResult, bool> isValueAcceptable = null);
    }
    
    public class LinearBackoffWithCappedAttempts : IRetryStrategy
    {
        public LinearBackoffWithCappedAttempts(ILogger logger, int maximumNumberOfAttempts, TimeSpan delayBetweenAttempts)
        {
            _logger = logger;
            _maximumNumberOfAttempts = maximumNumberOfAttempts;
            _delayBetweenAttempts = delayBetweenAttempts;
        }

        private readonly ILogger _logger;
        private readonly int _maximumNumberOfAttempts;
        private readonly TimeSpan _delayBetweenAttempts;

        public RetryResult<TResult> Execute<TResult>(Func<TResult> f, Func<TResult, bool> isValueAcceptable = null)
        {
            isValueAcceptable = isValueAcceptable ?? (r => true);
            var result = default(TResult);
            var executionErrors = new List<Exception>();

            var attempts = 0;
            var success = false;
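            // Keep evaluating until the value passes the predicate or the attempts run out;
            // every failure (exception or unacceptable value) is recorded and retried after a fixed delay.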
            while (attempts < _maximumNumberOfAttempts && !success)
            {
                try
                {
                    _logger.Debug("Attempting to execute supplied retryable function to get result");
                    result = f();

                    var valueOkay = isValueAcceptable(result);
                    if (!valueOkay) throw new Exception(string.Format("The evaluation of the condition [{0}] over the value [{1}] returned false", isValueAcceptable, result));

                    success = true;
                }
                catch (Exception ex)
                {
                    _logger.Debug(ex, "An error occurred while attempting to execute the retryable function");
                    executionErrors.Add(ex);
                }
                finally
                {
                    attempts++;
                    if (!success) Thread.Sleep(_delayBetweenAttempts);
                }
            }

            var builder = new StringBuilder();
            if (success) builder.Append("Evaluation of function completed successfully");
            else builder.Append("Evaluation of function did NOT complete successfully");

            builder.AppendFormat(" after [{0}/{1}] attempts w. a delay of [{2}] seconds between attempts.", attempts, _maximumNumberOfAttempts, _delayBetweenAttempts.TotalSeconds);
            builder.AppendFormat(" [{0}] total failures.", executionErrors.Count);
            if (executionErrors.Any())
            {
                builder.AppendFormat(" Exception type summary [{0}].", string.Join(", ", executionErrors.Select(a => a.Message)));
            }

            _logger.Information("The retryable function [{function}] was executed as specified. Summary was [{summary}]", f, builder.ToString());

            return new RetryResult<TResult>(success, result, builder.ToString(), executionErrors);
        }
    }

    public class RetryResult<TResult>
    {
        public RetryResult(bool success, TResult result, string summary, List<Exception> evaluationErrors = null)
        {
            Success = success;
            Result = result;
            Summary = summary;
            EvaluationErrors = evaluationErrors;
        }

        public readonly bool Success;
        public readonly TResult Result;
        public readonly List<Exception> EvaluationErrors;
        public readonly string Summary;
    }
}

Usage of the extension method, along with some rearranging, let me write the test in a way that was aware of the eventually consistent nature of the API.
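For example, the check that the seeded data had been identified as abandoned went from a single GET and assert to something along these lines (the client wrapper, the response type and the expected count are hypothetical stand-ins):

// Rather than asserting against the first response, the query is retried via
// the strategy above until the index catches up or the attempts run out.
Func<AbandonedEntitiesResponse> getAbandoned = () => _client.GetAbandoned();

getAbandoned.ShouldEventuallyMeetAssertions(new Action<AbandonedEntitiesResponse>[]
{
    r => r.Total.Should().Be(expectedAbandonedCount)
});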

When I started thinking about it though, I realised that the approach would only work for idempotent requests, which the DELETE verb on the new endpoint was definitely not.

Repeatedly executing the DELETE was changing the output each time, and depending on whether the stars aligned with regards to the RavenDB indexes, it was unlikely to ever meet the set of assertions that I had defined.

Happiness Comes From Compromise

In the end I had to adapt to the concepts at play, rather than adapting the concepts to fit my desired method of testing.

I could still leverage the eventually consistent assertion extension that I wrote, but I had to slightly rethink what I was trying to test.

  • I would still have to seed a bunch of entities to the API (some meeting the abandoned conditions, some not)
  • Instead of doing a single GET, I would have to assert that at least all of my seeded data was eventually present.
  • Instead of running a single DELETE and checking the result, I would have to loop and delete everything until the service told me nothing was left, and then check that it deleted at least all of the things I seeded
  • Instead of doing a final GET to verify, I would have to again assert that at least all of my seeded data that was not supposed to be removed was still present

Subtly different, but still providing a lot of assurance that the feature was working as expected. I won’t copy the full test here because, including the helper methods added to improve readability, it ended up being about 150 lines long.
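To give a sense of its shape, though, the delete-until-empty portion boiled down to a loop along these lines (again with hypothetical client and response names):

// Keep issuing DELETEs until the maintenance endpoint reports nothing left,
// accumulating the removed count so we can assert that at least everything we
// seeded as abandoned was eventually deleted.
var totalRemoved = 0;
DeleteAbandonedResponse response;
do
{
    response = _client.DeleteAbandoned();
    totalRemoved += response.Removed;
}
while (response.Remaining > 0);

totalRemoved.Should().BeGreaterOrEqualTo(seededAbandonedCount);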

Conclusion

Sometimes it feels like no matter where I look, RavenDB is there causing me some sort of grief. It has definitely given me an appreciation of the simple nature of traditional databases.

This isn’t necessarily because RavenDB is bad (it’s not, even though we’ve had issues with it), but because the eventual consistency model in RavenDB (and in many other similar databases) makes testing more difficult than it has to be (and testing is already difficult and time consuming, even if it does provide a world of benefits).

In this particular case, while it did make testing a relatively simple feature more challenging than I expected, it was my own approach to testing that hurt me. I needed to work with the concepts at play rather than trying to bend them to my preferences.

Like I said earlier, working with an eventually consistent model changes how you interact with a service at every level. It is not a decision that should be taken lightly.

Unfortunately, I think the selection of RavenDB in our case was more “shiny new technology” than “this is the right fit for our service”, and that decision was given far less gravitas than it deserved.