I definitely would not say that I am an expert at load testing web services. At best, I realise how valuable it is in order to validate your architecture and implementation, to help you get a handle on weak or slow areas and fix them before they can become a problem.

One thing I have definitely learned in the last 12 months however, is just how important it is to make sure that your load profile (i.e. your simulation for how you think your system will be loaded) is as close to reality as possible. If you get this wrong, not only will you not be testing your system properly, you will give yourself a false sense of confidence in how it performs when people are using it. This can lead to some pretty serious disasters when you actually do go live everything explodes (literally or figuratively, it doesn’t matter).

Putting together a good load profile is a difficult and time consuming task. You need to make assumptions about expected usage patterns, number of users, quality (and quantity) of date and all sorts of other things. While you’re building this profile, it will feel like you aren’t contributing directly to the software being written (there is code to write!), but believe me, a good load profile is worth it when it comes to validation all sorts of things later on. Like a good test suite, it keeps paying dividends in all sorts of unexpected places.

Such a Tool

It would be remiss of me to talk about load tests and load profiles without mentioning at least one of the tools you can use to accomplish them, as there are quite a few out there. In our organisation we use JMeter, mostly because that’s the first one that we really looked into in any depth, but it helps that it seems to be pretty well accepted in the industry, as there is a lot of information already out there to help you when you’re stuck. Extremely flexible, extendable and deployable, its an excellent tool (though it does have a fairly steep learning curve, and its written in Java so for a .NET person it can feel a little foreign).

Back to the meat of this post though.

As part of the first major piece of work done shortly after I started, my team completed the implementation of a service for supporting the remote access and editing of data that was previously locked to client sites. I made sure that we had some load tests to validate the behaviour of the service when it was actually being used, as opposed to when it was just kind of sitting there, doing nothing. I think it might have been the first time that our part of the organisation had ever designed and implemented load tests for validating performance, so it wasn’t necessarily the most…perfect, of solutions.

The load tests showed a bunch of issues which we dutifully fixed.

When we went into production though, there were so many more issues than we anticipated, especially related to the underlying persistence store (RavenDB, which I have talked about at length recently).

Of course, the question on everyone’s lips at that point was, why didn’t we see those issues ahead of time? Surely that was what the load tests were meant to catch?

The Missing Pieces

There were a number of reasons why our load tests didn’t catch any of the problems that started occurring in production.

The first was that we were still unfamiliar with JMeter when we wrote those tests. This mostly just limited our ability to simulate complex operations (of which there are a few), and made our profile a bit messier than it should have been. It didn’t necessarily cause the weak load tests, but it certainly didn’t help.

The second reason was that the data model used in the service is not overly easy to use. When I say easy to use, I mean that the objects involved are complex (100+KB of JSON) and thus are difficult to create realistic looking random data for. As a result, we took a number of samples and then used those repeatedly in the load tests, substituting values as appropriate to differentiate users from each other. I would say that the inability to easily create realistic looking fake data was definitely high up there on the list as to why the load tests were ineffective in finding the issues we encountered in production.

The third reason why our load tests didn’t do the job, was the actual load profile itself. The simulation for what sort of calls we expected a single user (where user describes more than just one actual person using the system) to make was just not detailed enough. It did not cover enough of the functionality of the server and definitely did not accurately represent reality. This was unfortunate and unexpected, because we spent a significant amount of time attempting to come up with a profile, and we got agreement from a number of different parties that this profile would be good enough for the purposes of testing. The root cause of this one was simply unfamiliarity with the intended usage of the system.

Finally, and what I think is probably the biggest contributor to the ineffectiveness of the load tests, we simply did not run them for long enough. Each load test we did only went for around 48 hours (at the high end) and was focused around finding immediate and obvious performance problems. A lot of the issues that we had in production did not manifest themselves until we’d been live for a week or two. If we had of implemented the load tests sooner, and then started and kept them running on our staging environment for weeks at a time, I imagine that we would have found a lot of the issues that ended up plaguing us.


Of course, there is no point thinking about these sort of things unless you actually make changes the next time you go to do the same sort of task.

So, what did we learn?

  1. Start thinking about the load tests and simulating realistic looking data early. We came into the service I’ve been talking about above pretty late (to clean up someone else’s mess) and we didn’t really get a chance to spend any time on creating realistic looking data. This hurt us when it came time to simulate users.
  2. Think very very hard about your actual load profile. What is a user? What does a user do? Do they do it sequentially or in parallel? Are there other processes going on that might impact performance? Are there things that happen irregularly that you should include in the profile at random? How big is the data? How much does it vary? All of those sorts of questions can be very useful for building a better load profile. Make sure you spend the time to build it properly in whatever tool you are using, such that you can tweak it easily when you go to run it.
  3. To run our load tests early and then for as much time as possible. To us, this means we should run them in an infinite loop on top of our staging environment pretty much as soon as we have them, forever (well, until we’re no longer actively developing that component anyway).

The good thing to come out of the above is that the service we completed did not flop hard enough that we don’t get a second chance. We’re just now developing some other services (to meet similar needs) and we’ve taken all of the lessons above to heart. Our load test profiles are much better and we’ve started incorporating soak tests to pick up issues that only manifest over long periods of time.

At least when it breaks we’ll know sooner, rather than when there are real, paying, customers trying to use it.

I imagine though, that we will probably have to go through this process a few times before we really get a good handle on it.


The last post I made about our adventures with RavenDB outlined the plan, upgrade to RavenDB 3. First step? Take two copies of our production environment, leave one at RavenDB 2.5 and upgrade the other to RavenDB 3. Slam both with our load tests in parallel and then see which one has better performance by comparing the Kibana dashboard for each environment (it shows things like CPU usage, request latency, disk latency, etc).

The hope was that RavenDB 3 would show lower resource usage and better performance all round using approximately the same set of data and for the same requests. This would give me enough confidence to upgrade our production instance and hopefully mitigate some of the issues we’ve been having.

Unfortunately, that’s not what happened.

Upgrade, Upgrade, Upgrade, Upgrade!

Actually upgrading to RavenDB 3 was painless. For RavenDB 2.5 we build a Nuget package that contains all of the necessary binaries and configuration, along with Powershell scripts that setup an IIS website and application pool automatically on deployment. RavenDB 3 works in a very similar way, so all I had to do was re-construct the package so that it worked in the same way except with the newer binaries. It was a little bit fiddly (primarily because of how we constructed the package the first time), but it was relatively easy.

Even better, the number of binaries and dependencies for RavenDB 3 is lower than RavenDB 2.5, which is always nice to see. Overall I think the actual combined sized may have increased, but its still nice to have a smaller number of files to manage.

Once I had the package built, all I had to do was deploy it to the appropriate environment using Octopus Deploy.

I did a simple document count check before and after and everything was fine, exactly the same number of documents was present (all ~100K of them).

Resource usage was nominal during this upgrade and basically non-existent afterwards.

Time to simulate some load.

What a Load

I’ve written previously about our usage of JMeter for load tests, so all I had to do was reuse the structure I already had in place. I recently did some refactoring in the area as well, so it was pretty fresh in my mind (I needed to extract some generic components from the load tests repository so that we could reuse them for other load tests). I set up a couple of JMeter worker environments in AWS and started the load tests.

Knowing what I do now I can see that the load tests that I originally put together don’t actually simulate the real load on the service. This was one of the reasons why our initial, intensive load testing did not find any of the issues with the backend that we found in production. I’d love to revisit the load profiles at some stage, but for now all I really needed was some traffic so that I could compare the different versions of the persistence store.

RavenDB 2.5 continued to do what it always did when the load tests were run. It worked just fine. Minimal memory and CPU usage, disk latency was low, all pretty standard.

RavenDB 3 ate all of the memory on the machine (16GB) over the first 10-15 minutes of the load tests. This caused disk thrashing on the system drive, which in turn annihilated performance and eventually the process crashed and restarted.

Not a good sign.

I’ve done this test a few times now (upgrade to 3, run load tests) and each time it does the same thing. Sometimes after the crash it starts working well (minimal resource usage, good performance), but sometimes even when it comes back from the crash it does the exact same thing again.

Time to call in the experts, i.e. the people that wrote the software.

Help! I Need Somebody

We don’t currently have a support contract with Hibernating Rhinos (the makers of RavenDB). The plan was to upgrade to RavenDB 3 (based on the assumption that its probably a better product), and if our problems persisted, to enter into a support contract for dedicated support.

Luckily, the guys at Hibernating Rhinos are pretty awesome and interact regularly with the community at the RavenDB Google Group.

I put together a massive post describing my current issue (mentioning the history of issues we’ve had to try and give some context), which you can find here.

The RavenDB guys responded pretty quickly (the same day in fact) and asked for some more information (understandably). I re-cloned the environment (to get a clean start) and did it again, except this time I was regularly extracting statistics from RavenDB (using the /stats and /admin/stats endpoints), as well as dumping the memory when it got high (using procdump) and using the export Debug information functionality built into the new Raven Studio (which is so much better than the old studio that it’s not funny). I packaged together all of this information together with the RavenDB log files and posted a response.

While looking through that information, Oren Eini (the CEO/Founder of Hibernating Rhinos) noticed that there were a number of errors reported around not being able to find a Lucene.Net.dll file on the drive where I had placed the database files (we separated the database files from the libraries, the data lives on a large, high throughput volume while the libraries are just on the system drive). I don’t know why that file should be there, or how it should get there, but at least it was progress!

The Battle Continues

Alas, I haven’t managed to return to this particular problem just yet. The urgency has diminished somewhat (the service is generally running a lot better after the latest round of hardware upgrades), and I have been distracted by other things (our Octopus Deploy slowing down our environment provisioning because it is underpowered), so it has fallen to the wayside.

However, I have plans to continue the investigation soon. Once I get to the root of the issue, I will likely make yet another post about RavenDB, hopefully summarising the entire situation and how it was fixed.

Software developers, perpetually hopeful…