I definitely would not say that I am an expert at load testing web services. At best, I realise how valuable it is for validating your architecture and implementation, and for helping you find weak or slow areas and fix them before they become a problem.
One thing I have definitely learned in the last 12 months, however, is just how important it is to make sure that your load profile (i.e. your simulation of how you think your system will be loaded) is as close to reality as possible. If you get this wrong, not only will you not be testing your system properly, you will give yourself a false sense of confidence in how it performs when people are using it. This can lead to some pretty serious disasters when you actually go live and everything explodes (literally or figuratively, it doesn’t matter).
Putting together a good load profile is a difficult and time-consuming task. You need to make assumptions about expected usage patterns, number of users, quality (and quantity) of data and all sorts of other things. While you’re building this profile, it will feel like you aren’t contributing directly to the software being written (there is code to write!), but believe me, a good load profile is worth it when it comes to validating all sorts of things later on. Like a good test suite, it keeps paying dividends in unexpected places.
Such a Tool
It would be remiss of me to talk about load tests and load profiles without mentioning at least one of the tools you can use to accomplish them, as there are quite a few out there. In our organisation we use JMeter, mostly because it was the first one that we really looked into in any depth, but it helps that it seems to be well accepted in the industry, so there is a lot of information out there to help you when you’re stuck. Extremely flexible, extensible and deployable, it’s an excellent tool (though it does have a fairly steep learning curve, and it’s written in Java, so for a .NET person it can feel a little foreign).
Back to the meat of this post though.
As part of the first major piece of work done shortly after I started, my team completed the implementation of a service for supporting the remote access and editing of data that was previously locked to client sites. I made sure that we had some load tests to validate the behaviour of the service when it was actually being used, as opposed to when it was just kind of sitting there, doing nothing. I think it might have been the first time that our part of the organisation had ever designed and implemented load tests for validating performance, so it wasn’t necessarily the most… perfect of solutions.
The load tests showed a bunch of issues which we dutifully fixed.
When we went into production, though, there were far more issues than we had anticipated, especially related to the underlying persistence store (RavenDB, which I have talked about at length recently).
Of course, the question on everyone’s lips at that point was, why didn’t we see those issues ahead of time? Surely that was what the load tests were meant to catch?
The Missing Pieces
There were a number of reasons why our load tests didn’t catch any of the problems that started occurring in production.
The first was that we were still unfamiliar with JMeter when we wrote those tests. This mostly just limited our ability to simulate complex operations (of which there are a few), and made our profile a bit messier than it should have been. It didn’t necessarily cause the weak load tests, but it certainly didn’t help.
The second reason was that the data model used in the service is not easy to work with. By that I mean that the objects involved are complex (100+ KB of JSON), which makes it difficult to create realistic-looking random data for them. As a result, we took a number of samples and then used those repeatedly in the load tests, substituting values as appropriate to differentiate users from each other. I would say that the inability to easily create realistic-looking fake data was high on the list of reasons why the load tests failed to find the issues we encountered in production.
The third reason why our load tests didn’t do the job was the load profile itself. The simulation of the calls we expected a single user (where "user" describes more than just one actual person using the system) to make was just not detailed enough. It did not cover enough of the functionality of the service and definitely did not accurately represent reality. This was unfortunate and unexpected, because we spent a significant amount of time attempting to come up with a profile, and we got agreement from a number of different parties that it would be good enough for the purposes of testing. The root cause of this one was simply unfamiliarity with the intended usage of the system.
Finally, and what I think is probably the biggest contributor to the ineffectiveness of the load tests, we simply did not run them for long enough. Each load test ran for around 48 hours at most and was focused on finding immediate and obvious performance problems. A lot of the issues that we had in production did not manifest themselves until we’d been live for a week or two. If we had implemented the load tests sooner, and then started them and kept them running on our staging environment for weeks at a time, I imagine that we would have found a lot of the issues that ended up plaguing us.
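To make the sample-and-substitute approach concrete, here is a minimal sketch in Python (not our actual JMeter setup). It assumes hypothetical `siteId` and `userId` fields as the values that distinguish one user from another, and uses tiny stand-in documents where the real samples would be captured payloads. It also shows the weakness: every user ends up sending data with the same shape and size as one of a handful of samples.

```python
import random
import uuid

# Hypothetical stand-ins for captured sample documents. Real ones would be
# full payloads (100+ KB of JSON); these are trimmed for illustration.
SAMPLES = [
    {"siteId": "sample-site", "userId": "sample-user",
     "records": [{"id": 1, "value": "a"}]},
    {"siteId": "sample-site", "userId": "sample-user",
     "records": [{"id": 1, "value": "b"}, {"id": 2, "value": "c"}]},
]


def substitute(node, replacements):
    """Return a copy of `node` with the values of any keys in `replacements` swapped out."""
    if isinstance(node, dict):
        return {
            key: (replacements[key] if key in replacements
                  else substitute(value, replacements))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [substitute(item, replacements) for item in node]
    return node


def payload_for_user(user_index, rng):
    """Pick one sample at random and stamp it with identifiers unique to a virtual user."""
    template = rng.choice(SAMPLES)
    return substitute(template, {
        "siteId": f"site-{user_index}",
        # Derive the id from the seeded RNG so runs are reproducible.
        "userId": str(uuid.UUID(int=rng.getrandbits(128))),
    })
```

Because `substitute` builds new dictionaries and lists, the samples themselves are never mutated. Note what does not vary: document size, nesting depth and the number of records all come straight from the samples, which is exactly the kind of realism this approach gives up.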
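A long-running (soak) test is only useful if you look at how the numbers change over time. A single average over several days smooths slow degradation away. The following is a minimal Python sketch of one crude check. It assumes latency samples as (timestamp in seconds, latency in ms) pairs, such as could be extracted from a JMeter results file. The one-hour window and the 1.5× threshold are arbitrary placeholders, not values we used.

```python
import math
from collections import defaultdict


def p95(values):
    """95th percentile of a non-empty list, using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def windowed_p95(samples, window_seconds):
    """Group (timestamp_s, latency_ms) samples into fixed windows; return each window's p95 in time order."""
    buckets = defaultdict(list)
    for timestamp, latency in samples:
        buckets[int(timestamp // window_seconds)].append(latency)
    return [p95(buckets[key]) for key in sorted(buckets)]


def degraded(samples, window_seconds=3600, threshold=1.5):
    """True if the last window's p95 exceeds `threshold` times the first window's p95."""
    series = windowed_p95(samples, window_seconds)
    if len(series) < 2:
        return False  # not enough history to compare
    return series[-1] > threshold * series[0]
```

Comparing only the first and last windows is deliberately simple. Alerting on every window, or fitting a trend, would also catch problems that come and go. Latency is also only one signal: memory, disk usage and database size deserve the same over-time treatment.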
Of course, there is no point thinking about these sorts of things unless you actually make changes the next time you go to do the same sort of task.
So, what did we learn?
- Start thinking about the load tests and simulating realistic looking data early. We came into the service I’ve been talking about above pretty late (to clean up someone else’s mess) and we didn’t really get a chance to spend any time on creating realistic looking data. This hurt us when it came time to simulate users.
- Think very, very hard about your actual load profile. What is a user? What does a user do? Do they do it sequentially or in parallel? Are there other processes going on that might impact performance? Are there things that happen irregularly that you should include in the profile at random? How big is the data? How much does it vary? All of those sorts of questions can be very useful for building a better load profile. Make sure you spend the time to build it properly in whatever tool you are using, so that you can tweak it easily when you go to run it.
- Run your load tests early, and then for as long as possible. To us, this means we should run them in an infinite loop against our staging environment pretty much as soon as we have them, forever (well, until we’re no longer actively developing that component anyway).
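The profile questions in the list above can be turned into data that is easy to tweak. Here is a minimal sketch, in Python rather than JMeter, of a profile for one sequential user. Every action name, weight, the 1% chance of an irregular operation and the mean think time are hypothetical placeholders. The real values should come from answering those questions about your own system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    weight: float  # relative frequency within a normal session


# Hypothetical profile: names and weights are placeholders, not measured usage.
REGULAR_ACTIONS = [
    Action("load_record", 50),
    Action("search", 30),
    Action("save_record", 15),
    Action("delete_record", 5),
]
# Rare, expensive operations mixed in at random (also hypothetical).
IRREGULAR_ACTIONS = [Action("bulk_sync", 1)]
IRREGULAR_PROBABILITY = 0.01


def next_action(rng):
    """Choose the next action for one simulated user."""
    if rng.random() < IRREGULAR_PROBABILITY:
        return rng.choice(IRREGULAR_ACTIONS).name
    weights = [action.weight for action in REGULAR_ACTIONS]
    return rng.choices(REGULAR_ACTIONS, weights=weights, k=1)[0].name


def think_time(rng, mean_seconds=5.0):
    """Pause between actions, drawn from an exponential distribution (an assumption)."""
    return rng.expovariate(1.0 / mean_seconds)


def session(rng, length):
    """A sequential session: one user performing `length` actions in order."""
    return [next_action(rng) for _ in range(length)]
```

Keeping the weights and probabilities in one place, separate from the code that issues requests, is what makes the profile cheap to adjust between runs. Modelling parallel users then becomes a matter of running many independent sessions at once.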
The good thing to come out of the above is that the service we completed did not flop so hard that we didn’t get a second chance. We’re now developing some other services (to meet similar needs) and we’ve taken all of the lessons above to heart. Our load profiles are much better and we’ve started incorporating soak tests to pick up issues that only manifest over long periods of time.
At least when it breaks we’ll know sooner, rather than finding out when there are real, paying customers trying to use it.
I imagine, though, that we will probably have to go through this process a few times before we really get a good handle on it.