
As I’ve already mentioned, I’ve been doing load testing.

Obviously, the follow-up to executing load tests is to analyse the data that comes out of them.

With our new service, we now log many disparate sources of information into a single location, using Elasticsearch, Logstash and Kibana. As the load tests are executed, streams of data come into our log aggregator, ready to be sliced in various ways to see exactly how the service is operating and to find out where the bottlenecks are.

Requesting Assistance

The service leverages Nxlog to ship all IIS and application logs to the log aggregator. This gave me the ability to track average/max request times, error responses and errors occurring inside the application, in real time, as the load test was being run. A side effect of this is that it will also do the same once the service is made publicly available.

IIS logs gave me access to fields like sc-status (the returned response code, e.g. 200, 401 or 500), time-taken (the total elapsed time for the request to be completed), request path (e.g. entity/images/1) and bytes uploaded and downloaded (sc-bytes and cs-bytes), along with many others.

Our application logs mostly just gave me access to whether or not unhandled exceptions had occurred (we use Nancy, and our OnError pipeline simply logs the error out while returning a 500) as well as some information about what the service is doing internally with authentication/authorisation.

For the first run of the load test I chose 50 users with no ramp-up, i.e. they would all start working through the test plan all at once. Each user would execute a series of actions that approximated what usage on the service would be like over a day, compressed into a much smaller time period.

I used Kibana to construct a chart that showed the average and maximum request latency (using the time-taken field from IIS) across all requests over time, and another chart that showed the top 5 status codes, also over time.

I had some initial difficulties using time-taken for numerical aggregations in Kibana. When I originally set up the Nxlog config that shipped the log entries to our log aggregator, I had neglected to type the time-taken field as an integer (it’s measured in whole milliseconds), so Elasticsearch had inferred the type of the field as string. As you can imagine, strings can’t participate in numerical aggregations. I still don’t quite understand ES mappings, but I had to change the type of time-taken in Nxlog (by casting/parsing to integer in the Exec block of my input), delete the index in ES and then it correctly inferred the type as a number. When I looked at the JSON being output by Nxlog, the TimeTaken field values were enclosed in quotes, so it’s likely that’s why ES inferred them as strings.

My first set of results were informative, but they didn’t exactly point me towards the area that needed improvement.

As you can see from the chart above, latency was fairly terrible, even with only 50 users. Looking at the timeline of the test, latency was okay during the periods that didn’t involve images, rose when images were being uploaded and then rose to unacceptable levels when images were being downloaded.

I had standardised all of the images at approximately 1.2MB, so a service that scales well with traffic should show an increase in latency for image requests (to account for the time to physically upload and download the images), but that increase should stay consistent with the size of the images rather than growing with load.

Looking at the response codes, everything was mostly fine (if slow) until images started to be downloaded, and then 500’s would start to appear. Investigating the 500’s showed that they were primarily caused by timeouts, indicating that part of the system was under so much load that it could no longer service requests within the timeout period. This lined up with the increasing latency of the service.

The timeouts were coming from our underlying database, RavenDB.

Cogs in the Machine

From the IIS and application logs I knew how the service was acting from the point of view of an external party (mostly). What I didn’t know was how the underlying machines were acting, and I didn’t have any eyes on the database underlying the service at all. My first implementation had only aggregated logs from the API instances.

It was easy enough to customize the Nxlog deployment that I built to ship different configurations to different machines, so it didn’t take very long to start getting logs from the database machine (IIS).

I also needed machine statistics, to identify why the underlying database was unable to service requests in a reasonable amount of time. Ideally these statistics would be available in the same place as the other events, so that I could make correlations between the data sets.

As far as I can tell, on Windows machines if you want to log statistics about the system itself, you use performance counters. You can track just about anything, but for now I limited myself to CPU Utilization, Free Memory and Free Disk Space. I assumed that the database machine was sitting at 100% CPU or running out of memory and resorting to virtual memory or something.

I created a project similar to the Nxlog one, except focused on installing a performance counter based on a configuration file, again deployed through Octopus. It leveraged the command line tool logman to configure and start the counter, outputting system statistics to a known log file. I then altered the Nxlog configuration file for both the API and Database instances to read those statistics in and push them out to ES via Logstash, just like the IIS and application logs.
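For reference, the guts of that setup boil down to a couple of logman calls, something like the sketch below. The counter set name, the specific counters, the sample interval and the output path here are illustrative, not the exact configuration I deployed.

# A minimal sketch of creating and starting a counter collector via logman.
# Names, counters, interval and output path are placeholders.
$counters = @(
    "\Processor(_Total)\% Processor Time",
    "\Memory\Available MBytes",
    "\LogicalDisk(_Total)\% Free Space"
)

# Create a collector that samples every 5 seconds and writes CSV to a known location.
& logman create counter "system-statistics" -c $counters -si 00:00:05 -f csv -o "C:\logs\system-statistics"

# Start the collector; Nxlog can then tail the output file and ship it to the aggregator.
& logman start "system-statistics"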

With system statistics now being available, I improved the Kibana dashboard to show more information. I added charts of CPU utilization, free memory and free disk, and slightly altered my request latency charts.

Most of the time the requests with the highest latency were those which involved images. The main reason for this is that it takes time to move data around the internet, and all of my test images were 1.2MB, so the time to transfer the raw data was not trivial. In order to better understand the latency of various requests, I needed to differentiate image requests from non-image requests. At the same time I wanted to distinguish different resources from one another, so I could see which resources were slow. I added some custom code to the Nxlog config to parse out resource information from the path and stored the resource identifier as a field along with all of the other data. I used the same approach to identify the requests that dealt with images, and tagged them as such.

I then used this information to mutate my latency chart into two charts. One shows the latency (average, max) of image requests, broken down by Verb (PUT, POST, GET, DELETE) and the other shows latency for each resource in the system (one chart per resource).

At this point it was helpful to define and save some queries in Kibana that immediately narrowed down the total available data to the appropriate subset for charting. We tag our log messages with an application id (something uniquely identifying the system), component id (like API or DB) and source (like IIS or application). I defined some queries for Application + Component:API + Source:IIS + Images/Not Images, which I then used as the inputs for the latency chart visualizations I mentioned above. It’s hard to filter inside the visualizations themselves, so I recommend creating and saving queries for that purpose.

My new monitoring dashboard looked like this:

After running another load test, I could clearly see that the images were the slowest part (as expected), and that when images were being downloaded, the performance of every other resource suffered. Alas, all of the machine statistics looked fine at the time of the high latency. None of the machines were showing a bottleneck in any of the areas that I was monitoring. CPU and Free Memory were fine, and yet the requests were taking upwards of 100 seconds (on average) to complete when image downloads were occurring.

Moar Stats

As I could see the timeouts coming from the underlying database in the API logs, I knew that that machine was the issue, I just didn’t know how.

I suspected that the issue was probably disk based (because it was the one thing I wasn’t monitoring) so I added a greater selection of statistics (average disk latency, disk queue length) and re-ran the load tests.

The result was clear. During the period where the load test was downloading images, the latency of the data disk was reaching 800ms per transfer. That's insane! Normal disk latency is sub 30ms.

Finally I had found the obvious bottleneck, but now I had to fix it.

Birds are Weird

Long story short, I tried to improve the performance by making the data disk more awesome (thanks to AWS, this was fairly easy as you can configure the desired amount of IOPS),  and making the database machine itself more awesome (again, thanks AWS and its big chunky instance types) but nothing seemed to make much of a difference.

Finally we had to make the hard call and stop storing images in our underlying database. It just didn’t do it well.

We were still using an old version of RavenDB, and were using its attachment feature to store the images. I have to assume we were doing something wrong, because RavenDB is a pretty good product as far as I can see, and I don’t see why it wouldn’t work. I’m honestly not that familiar with RavenDB and the decision to use it was made before I started, so I couldn’t really determine why it wasn’t performing as we expected.

Moving our images to a different storage mechanism was fairly easy: our external API didn’t need to change, and we just moved the image storage into S3.

The next run of the load tests showed average response times under load going down to about 7 seconds, which is about the time it takes to move 1.2MB on our internet connection + some overhead.

The charts still look basically the same, but the numbers are a hell of a lot lower now, so we’ve got that going for us, which is nice.

Summary

Always load test. Always.

We had already guessed before we started load testing that the underlying database would probably be a bottleneck, as there was only one of them, and we had issues adding more in parallel during development. We had assumed that any performance issues with the database could be solved by simply making it more awesome in whatever way it was bottlenecked. Unfortunately, that turned out to be a false assumption, and we had to change the internals of the system in order to scale our performance when it came to image storage.

Without load testing and the analysis of its results via Kibana and its lovely charts, we wouldn’t have been able to identify the scaling issue and resolve it, with real numbers to back up our improvements.

I cannot recommend using a log aggregator enough. Literally, you should aggregate all the things. Once you have all of that information at your fingertips you can do all sorts of amazing analysis, which leads to greater insights into how your software is actually running (as opposed to how you think it’s running).


(Get it, like the law of Demeter, except completely different and totally unrelated).

My focus at work recently has been on performance and load testing the service that lies at the heart of our newest piece of functionality. The service is fairly straightforward, acting as a temporary data store and synchronization point between two applications, one installed at the client side and one mobile. It’s multi-tenant, so all of the clients share the same service, and there can be multiple mobile devices per client, split by user (where each device is assumed to belong to a specific user, or at least is authenticated as one).

I’ve done some performance testing before, but mostly on desktop applications, checking to see whether or not the app was fast and how much memory it consumed. Nothing formal, just using the performance analysis tools built into Visual Studio to identify slow areas and tune them.

Load testing a service hosted in Amazon is an entirely different beast, and requires different tools.

Enter JMeter.

Yay! Java!

JMeter is well and truly a Java app. You can smell it. I tried to not hold that against it though, and found it to be very functional and easy enough to understand once you get past that initial (steep) learning curve.

At a high level, there are solution-esque containers, containing one or more Thread Groups. Each Thread Group can model one or more users (threads) that run through a series of configured actions that make up the content of the test. I’ve barely scratched the surface of the application, but you can at least configure loops and counters, specify variables and of course, most importantly, HTTP requests. You can weave all of these things (and more) together to create whatever load test you desire.

It turns out, that’s actually the hard bit: deciding what the test should, well, test. Ideally you want something that approximates average expected usage, so you need to do some legwork to work out what that could be. Then, you use the expected usage example and replicate it repeatedly, up to the number of concurrent users you want to evaluate.

This would probably be difficult and time consuming, but luckily JMeter has an extremely useful feature that helps somewhat. A recording proxy.

Configure the proxy on your endpoints (in my case, an installed application on a virtual machine and a mobile device) and start it, and all requests to the internet at either of those places will be helpfully recorded in JMeter, ready to be played back. Thus, rather than trying to manually concoct the set of actions of a typical user, you can just use the software as normal, approximate some real usage and then use those recorded requests to form the baseline of your load test.

This is obviously useful when you have a mostly completed application and someone has said “now load test it” just before release. I don’t actually recommend this, as load and performance testing should be done throughout the development process, with performance metrics set early and measured often. Getting to the end only to discover that your software works great for 1 person but fails miserably for 10 is an extremely bad situation to be in. Not only that, but throwing all the load and performance testing infrastructure together at the end doesn’t give it enough time to mature, leading to oversights and other nasty side effects. Test early and test often applies to performance as much as functional correctness.

Helpfully, if you have a variable setup (say for the base URL) and you’re recording, any instances of the value in the variable will be replaced by a reference to the variable itself. This saves a huge amount of time going through the recorded requests and changing them to allow for configurability.

The variable substitution is a bit of a double-edged sword though, as I found out. I had a variable set at the global scope with a value of 10 (it was a loop counter or something) and JMeter dutifully replaced all instances of 10 with references to that variable. Needless to say, that recording session had to be thrown out, and I moved the variable to the appropriate Thread Group level scope shortly after.

Spitting Images

All was not puppies and roses though, as the recording proxy seemed to choke on image uploads. The service deals in images that are attached to other pieces of data, so naturally image uploads are part of its API.

Specifically, when the proxy wasn’t in place, everything worked fine. As soon as I configured the proxy on either endpoint, the images failed to upload. Looking at the request, all of the content that would normally be attached to an image upload was being stripped out. By the time the request got to the service, after having passed through the proxy, the request looked like there should be an image attached, but there was none available.

With the image uploads failing, I couldn’t record a complete picture of interactions with the service, so I had to make some stuff up.

It wasn’t until much later, when I started incorporating image uploads into the test plan manually that I discovered JMeter does not like file uploads as part of a PUT request. It’s fine with POST, but the multi-part support does not seem to work with PUT. I wasn’t necessarily in a position to change either of our applications to force them to use POST just so I could record a load test though, and I think PUT better encapsulates what we are doing (placing content at a known ID), so I just had to live with my artificially constructed image uploads that approximated usage.

Custom Rims

Now that I had a collection of requests defining some nice expected usage I only had one more problem to solve, kind of specific to our API, but I’ll mention it here because the general approach is probably applicable.

Our API uses an Auth Header. Not particularly uncommon, but our Auth Header isn’t as simple as a token obtained from sending the username/password to an auth endpoint, or anything similarly sane. Our Auth Header is an encrypted package containing a set of customer information, which is decrypted on the server side and then validated. It contains user identifying information plus a time stamp, for a validity period.

It needs to be freshly calculated on almost every outgoing request.

My recorded requests stored the Auth Header that they were made with, but of course, the token can’t be reused as time goes on, because it will time out. So I needed to create a fresh Auth Header from the information I had available. Obviously, this being a very custom function (involving AES encryption and some encoding), JMeter can’t help out of the box.

So it was time to extend.

I was impressed with JMeter in this respect. It has a number of ways of entering custom scripts/code when the in-built functionality of JMeter doesn’t allow you to do what you need to do. Originally I was going to use a BeanShell script, but after doing some reading, I discovered that BeanShell can be slow when executed many times (like for example on every single request), so I went with a custom function written in Java.

It’s been a long time since I’ve written Java. I wouldn’t say it’s bad, but I definitely like (and am far more experienced with) C#. Java generics are weird.

Anyway, once I managed to implement the interface that JMeter supplies for custom functions, it was a simple matter to compile it into a JAR file and include the location of the JAR in the search_path when starting JMeter. JMeter will automatically load all of the custom functions it finds, and you can freely call them just like you would a variable (using the ${} syntax, functions are typically named with __ to distinguish them from variables).

Redistributable Fun

All of the load testing goodness above is encapsulated in a Git repository, as is sane practice.

I like my repositories to stand alone when possible (with the notable exception of taking an external dependency on NuGet.org or a private NuGet feed for dependencies), so I wanted to make sure that someone could pull this repository, and just run the load tests with no issues. It should just work.

To that end, I’ve wrapped the running of JMeter with and without a GUI into 2 Powershell scripts, dealing with things like including the appropriate directory containing custom functions, setting memory usage and locating the JRE and JMeter itself.

In the interests of “it should just work”, I also included the Java 1.8 JRE and the latest version of JMeter in the repository, archived. They are dynamically extracted as necessary whenever the scripts that run JMeter are executed.
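To give a feel for the shape of those scripts, the non-GUI runner looks roughly like the sketch below. The paths, the memory settings and the search_paths property name are assumptions for illustration, not a copy of the real script.

# A rough sketch of the non-GUI runner, assuming the JRE and JMeter archives have
# already been extracted into a tools directory next to this script.
$root = Split-Path $MyInvocation.MyCommand.Path

$java = Join-Path $root "tools\jre1.8\bin\java.exe"
$jmeterJar = Join-Path $root "tools\apache-jmeter\bin\ApacheJMeter.jar"
$testPlan = Join-Path $root "tests\load-test.jmx"
$results = Join-Path $root "results\load-test-results.jtl"
$customFunctions = Join-Path $root "functions\custom-functions.jar"

# -n is non-GUI mode, -t is the test plan, -l is the results file.
# search_paths is where JMeter looks for additional JARs, like the custom auth function.
& $java -Xms512m -Xmx2048m -jar $jmeterJar -n -t $testPlan -l $results "-Jsearch_paths=$customFunctions"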

In the past I’ve shied away from including binaries in my repositories, because it tends to bloat them and make them take longer to pull down. Typically I would use a NuGet package for a dependency, or if one does not exist, create one. I considered doing this for the Java and JMeter dependencies, but it wasn’t really worth the effort at this stage.

You can find a cut down version of the repository (with a very lightweight jmx file of little to no substance) on GitHub, for your perusal.

Summary

Once I got past the initial distaste of a Java UI and then subsequently got past the fairly steep learning curve, I was impressed with what JMeter could accomplish with regards to load testing. It can take a bit of reading to really grok the underlying concepts, but once you do, you can use it to do almost anything. I don’t believe I have the greatest understanding of the software, but I was definitely able to use it to build a good load test that I felt put our service under some serious strain.

Of course, now that I had a working load test, I would need a way to interpret and analyse the results. Also, you can only do so much load testing on a single machine before you hit some sort of physical limit, so additional machines need to get involved to really push the service to its breaking point.

Guess what the topic of my next couple of blog posts will be?


I assume that, like me, you sometimes get into situations where you want to execute a script, but you definitely don’t want some of its more permanent side effects to happen. Scripts can do all sorts of crazy things, like commit files into git, make changes to the local machine or publish files into production, and you definitely don’t want those things to happen when you’re not intending them to.

This becomes even more important when you want to start writing tests for your scripts (or at least their components, like functions). You definitely don’t want the execution of your tests to change something permanently, especially if it changes the behaviour of the system under test, because then the next time you run the test it gives you a different result or executes a different code path. These are all things you want to avoid in order to get high quality tests.

In my explorations of the issue, I have come across two solutions. One helps to deal with side effects during tests and the other gives you greater control over your scripts, allowing you to develop them with some security that you are not making unintended changes.

Testing the Waters

I’ve recently started using Pester to test my Powershell functions.

The lack of testing (other than manual testing of course) in my script development process was causing me intense discomfort, coming from a strong background where I always wrote tests (before, after, during) whenever I was developing a feature in C#.

It’s definitely improved my ability to write Powershell components quickly and in a robust way, and has improved my ability to refactor, safe in the knowledge that if I mess it up (and in a dynamic scripting language like Powershell with no Visual Studio calibre IDE, you will mess it up) the tests will at least catch the most boneheaded mistakes. Maybe even the more subtle mistakes too, if you write good tests.

Alas, I haven’t quite managed to get a handle on how to accomplish dependency injection in Powershell, but I have some ideas that may turn up in a future blog post. Or they may not, because it might be a terrible idea, only time will tell.

To tie this apparently pointless and unrelated topic back into the blog post, sometimes you need a way to control the results returned from some external call or to make sure that some external call doesn’t actually do anything. Luckily, Powershell being a dynamic language, you can simply overwrite a function definition in the scope of your test. I think.

I had to execute a Powershell script from within TestComplete recently, and was surprised when my trusty calls to write-host (for various informational logging messages) would throw errors when the script was executed via the Powershell object in System.Management.Automation. The behaviour makes perfect sense when you think about it, as that particular implementation of a host environment simply does not provide a mechanism to output anything not through the normal streams. I mention this because it was a problem that I managed to solve (albeit in a hacky way) by providing a blank implementation of write-host in the same session as my script, effectively overriding the implementation that was throwing errors.
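The override itself is tiny. Something along the lines of the sketch below, where the script path is obviously made up:

# Redefine write-host in the current session so later calls hit this stub instead of
# the host implementation that throws. Everything run in the same session afterwards
# (dot-sourced or invoked) picks up the stub.
function write-host { }

# Hypothetical script that calls write-host for informational logging.
. "C:\scripts\some-script-that-calls-write-host.ps1"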

Pester provides a mechanism for doing just this, through the use of Mocks.

I’d love to write a detailed example of how to use Mocks in Pester here, but to be honest, I haven’t had the need as of yet (like I said, I’ve only very recently started using Pester). Luckily the Pester wiki is pretty great, so there’s enough information there if you want to have a read.

I’m very familiar with mocking as a concept though, as I use it all the time in my C# tests. I personally am a fan of NSubstitute, but I’ve used Moq as well.

The only point I will make is that without dependency injection advertising what your component’s dependencies are, you have to be aware of its internal implementation in order to Mock out its dependencies. This makes me a little bit uncomfortable, but still, being able to Mock those dependencies out instead of having them hardcoded is much preferred.

Zhu Li, Do the Thing (But Not the Other Thing)

The second approach that I mentioned is a concept that is built into Powershell, which I have stolen and bastardised for my own personal gain.

A lot of the pre-installed Powershell components allow you to execute the script in -WhatIf mode.

WhatIf mode essentially runs the script as normal, except it doesn’t allow it to actually make any permanent changes. It’s up to the component exactly what it considers to be permanent changes, but it’s typically things like changing system settings and interacting with the file system. Instead, it just writes out messages to the appropriate stream stating the action that would have normally occurred. I imagine that depending on how your component is written, it might not react well to the files it asked to be created or deleted not existing as expected, but it’s still an interesting feature all the same.

In my case, I had a build and publish script that changed the AssemblyInfo of a C# project and then committed those changes to git, as well as tagging git with a build number when the publish completed successfully. I had to debug some issues with the script recently, and I wanted to run it without any of those more permanent changes happening.

This is where I leveraged the -WhatIf switch, even though I used it in a slightly different way, and didn’t propagate the switch down to any components being used by my script. Those components were mostly non-Powershell, so it wouldn’t have helped anyway (things like git, MSBuild and robocopy).

I used the switch to turn off the various bits that made more permanent changes, and to instead output a message through the host describing the action that would have occurred. I left in the parts that made permanent changes to the file system (i.e. the files output from MSBuild) because those don’t have any impact on the rest of the system.
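As a sketch of the idea (the commands and names here are made up for illustration, not lifted from the real build script):

# A sketch of guarding permanent changes behind a WhatIf switch; names and commands are illustrative.
param([switch]$WhatIf)

# Build output is a file system change, but it doesn't affect anything else, so it always runs.
& msbuild "Solution.sln" /p:Configuration=Release

if ($WhatIf)
{
    Write-Host "WHATIF: would have committed the updated AssemblyInfo and tagged the build in git."
}
else
{
    & git commit -am "Update version for build"
    & git tag "build-placeholder-version"
}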

Of course you still need to test the script as a whole, which is why we have a fully fledged development environment that we can freely publish to as much as we like, but it’s still nice to execute the script safe in the knowledge that it’s not going to commit something to git.

I’ve found the WhatIf approach to be very effective, but it relies entirely on the author of the script to select the bits that they think are permanent system changes and distinguish them from ones that are not as permanent (or at least easier to deal with, like creating new files). Without a certain level of analysis and thinking ahead, the approach obviously doesn’t work.

I’ve even considered defaulting WhatIf to on, to ensure that it’s a conscious effort to make permanent changes, just as a safety mechanism, mostly to protect future me from running the script in a stupid way.

Summary

When programming, it’s important to be aware of and to limit any side effects of the code that you have written, both for testing and development. The same holds true of scripts. The complication here is that scripts tend to bundle up lots of changes to the system being acted upon as their entire purpose, so you have to be careful in selecting which effects you want to minimise while developing.

In other news, I’ve been working on setting up a walking skeleton for a new service that my team is writing. Walking skeleton is a term referenced in Growing Object Oriented Software, Guided By Tests, and describes writing the smallest piece of functionality possible first, and then ensuring that the entire build and deployment process is created and working before doing anything else.

I suspect I will make a series of blog posts about that particular adventure.

Spoiler alert, it involves AWS and Powershell.


I’ve been doing a lot of Test Automation lately. In a weird coincidence, a lot of swearing and gnashing of teeth as well. Strange that.

One of the automation tools that I’ve been using is TestComplete, and I’ve been pleasantly surprised to find that it doesn’t get in my way nearly as much as I would have expected. In comparison, I’ve had far more problems with the behaviour of the application under test rather than the execution environment, which says a lot for TestComplete’s ability to interact with horrifying VB6/WinForms chimera applications.

It’s not all puppies and roses though. TestComplete is very much a tabs and text boxes kind of application, and while parts of the tool have some nice Intellisense (more than I would have expected) it’s certainly no Visual Studio. In addition, binary files make meaningful diffs impossible, its licensing model is draconian at best and the price is insane.

But aside from those gripes, it certainly does get the job done.

When I decided that in order to test a piece of functionality involving the database, I would need to leverage some AWS EC2 images, I was unsurprised to find that it had support for executing Powershell scripts. It seems to be able to do just about anything really.

Well, I suppose support is a strong word.

Scripted Interactions

TestComplete supports a number of scripting languages for programmatically writing tests. There is also a very useful record-replay GUI which actually creates a programmatically accessible map of your application, which can then be used in scripts if you want.

When you create a TestComplete project, you can choose from VBScript, JScript and two other scripting varieties that slip my mind. After you’ve created the project, you don’t seem to be able to change this choice, so make sure you pick one you want to use.

I did not create the project.

The person who did picked JScript, which I have never written before. But at least it wasn't VBScript I suppose.

My understanding of JScript is limited, but I believe it is a Microsoft implementation of some Javascript standard. I have no idea which one, and it’s certainly not recent.

Anyway, I’m still not entirely sure how it all fits together, but from the JScript code you can access some of the .NET Framework libraries, like System.Management.Automation.

Using this namespace, you can create an object that represents a Powershell process, configure it with some commands, and then execute it.

Hence Powershell support.

I Have the Power

I’ve written about my growing usage of Powershell before, but I’ve found it to be a fantastically powerful tool. It really can do just about anything. In this case, I gravitated towards it because I’d already written a bunch of scripts for initialising AWS EC2 instances, and I wanted to reuse them to decrease the time it would take me to put together this functional test. For this test I would need to create an AMI containing a database in the state that I wanted it to be in, and then spin up instances of that AMI whenever I wanted to run the test.
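For context, the creation side of those scripts amounts to something like the sketch below, using the AWS Tools for Powershell. The AMI ID, instance type and key pair name are placeholders.

# A minimal sketch of launching an instance from a pre-baked AMI; IDs and names are placeholders.
Set-AWSCredentials -AccessKey $awsKey -SecretKey $awsSecret
Set-DefaultAWSRegion -Region $awsRegion

# Launch a single instance from the AMI that already contains the test database.
$reservation = New-EC2Instance -ImageId "ami-12345678" -InstanceType "m3.medium" -KeyName "functional-tests" -MinCount 1 -MaxCount 1

# The instance id is the single value written to the output stream, which is what
# the calling test framework picks up.
Write-Output $reservation.Instances[0].InstanceId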

In the end, as it is with these sorts of things, it wasn’t setting up the EC2 instances that took all of my time. That yak had already been shaved.

The first problem I had to solve was actually being able to run a Powershell script file using the component above.

Anyone who has ever used Powershell is probably very familiar with the Execution Policy of the system in question. Typically this defaults to Restricted, which means you are not allowed to execute scripts.

Luckily you can override this purely for the process you are about to execute, without impacting on the system at large. This, of course, means that you can change the Execution Policy without needing Administrative privileges.

Set-ExecutionPolicy RemoteSigned -Scope Process -Force

The second problem I had to solve was getting results out of the Powershell component in TestComplete. In my case I needed to get the Instance ID that was returned by the script, and then store it in a TestComplete variable for use later on. I’d need to do the same thing with the IP address, obtained after waiting for the instance to be ready.

Waiting for an instance to be ready is actually kind of annoying. There are two things you need to check. The first is whether or not the instance is considered running. The second is whether or not the instance is actually contactable through its network interface. Waiting for an instance to be running only takes about 15-30 seconds. Waiting for an instance to be contactable takes 5-10 minutes depending on how capricious AWS is feeling. As you can imagine, this can make testing your “automatically spin up an instance” script a very frustrating experience. Long feedback loops suck.
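Condensed down, the waiting logic is two polling loops, roughly like this (the port, the sleep intervals and the Test-PortOpen helper are all placeholders of my own):

# A rough sketch of the two readiness checks; intervals and the port are placeholders,
# and Test-PortOpen is a hypothetical helper wrapping System.Net.Sockets.TcpClient.
do
{
    Start-Sleep -Seconds 15
    $instance = (Get-EC2Instance -InstanceId $instanceId).Instances[0]
}
while ("$($instance.State.Name)" -ne "running")

# Running is not the same as contactable, so keep probing the network interface.
while (-not (Test-PortOpen -IpAddress $instance.PublicIpAddress -Port 80))
{
    Start-Sleep -Seconds 30
}

Write-Output $instance.PublicIpAddress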

When you execute a series of commands through the Powershell component, the return value is a collection of PSObjects. These objects are what would have normally been returned via the output stream. In my case I was returning a single string, so I needed to get the first entry in the collection, and then get its ImmediateBaseObject property. To get the string value I then had to get the OleValue property.

Tying both of the above comments together, here is the collection of functions that I created to launch Powershell from TestComplete.

function CreatePowershellExecutionObject()
{
  var powershell = dotNET.System_Management_Automation.Powershell.Create();
  // This is set before every script because otherwise we can't execute script files,
  // That's where most of our powershell lives.
  powershell.AddScript("Set-ExecutionPolicy RemoteSigned -Scope Process -Force");
  
  // This redirects the write-host function to nothingness, because otherwise any
  // script executed through this component will fail if it has the audacity to
  // write-host. This is because it's non-interactive and write-host is like a special
  // host channel thing that needs to be implemented.
  powershell.AddScript("function write-host { }");
  
  return powershell;
}

function InvokePowershellAndThrowIfErrors(powershell)
{
  var result = powershell.Invoke();
  
  if (powershell.HadErrors)
  {
    var firstError = powershell.Streams.Error.Item(0);
    if (firstError.ErrorDetails != null)
    {
      throw new Error(firstError.ErrorDetails);
    }
    else
    {
      throw new Error(firstError.Exception.ToString());
    }
  }
  
  return result;
}

function GetCommonPowershellAwsScriptArguments()
{
  var awsKey = Project.Variables.AwsKey;
  var awsSecret = Project.Variables.AwsSecret;
  var awsRegion = Project.Variables.AwsRegion;
  return "-SuppliedAwsKey " + awsKey + " -SuppliedAwsSecret " + awsSecret + " -SuppliedAwsRegion " + awsRegion + " "
}

function StartAwsVirtualMachineWithFullDatabaseAndSetInstanceId()
{
  var gatewayVersion = GetVersionOfGatewayInstalled();
  var scriptsPath = GetScriptsDirectoryPath();
  var executeInstanceCreationScript = "& \"" + 
    scriptsPath + 
    "\\functional-tests\\create-new-max-size-db-ec2-instance.ps1\" " + 
    GetCommonPowershellAwsScriptArguments() +
    "-BuildIdentifier " + gatewayVersion;

  var powershell = CreatePowershellExecutionObject();
  powershell.AddScript(executeInstanceCreationScript);
  
  Indicator.PushText("Spinning up AWS EC2 Instance containing Test Database");
  var result = InvokePowershellAndThrowIfErrors(powershell);
  Indicator.PopText();
  
  var instanceId = result.Item(0).ImmediateBaseObject.OleValue; 
  
  KeywordTests.WhenADatabaseIsAtMaximumSizeForExpress_ThenTheFilestreamConversionStillWorks.Variables.FullDbEc2InstanceId = instanceId;
}

Notice that I split the instance creation from the waiting for the instance to be ready. This is an optimisation. I create the instance right at the start of the functional test suite, and then execute other tests while that instance is being setup in AWS. By the time I get to it, I don’t have to wait for it to be ready at all. Less useful when testing the database dependent test by itself, but it shaves 6+ minutes off the test suite when run together. Every little bit counts.

Application Shmaplication

Now that the test database was being setup as per my wishes, it was a simple matter to record the actions I wanted to take in the application and make some checkpoints for verification purposes.

Checkpoints in TestComplete are basically asserts. Record a value (of many varied types, ranging from Onscreen Object property values to images) and then check that the application matches those values when it is run.

After recording the test steps, I broke them down into reusable components (as is good practice) and made sure the script was robust in the face of failures and unexpected windows (and other things).

The steps for executing the test using the application itself were easy enough, thanks to TestComplete.

Tidbits

I did encounter a few other things while setting this test up that I think are worth mentioning.

The first was that the test needed to be able to be run from inside our VPC (Virtual Private Cloud) in AWS as well as from our local development machines. Actually running the test was already a solved problem (solved when I automated the execution of the functional tests late last year), but making a connection to the virtual machine hosting the database in AWS was a new problem.

Our AWS VPC is fairly locked down (for good reason) so machines in there generally can’t access machines in the outside world except over a few authorised channels (HTTP and HTTPS through a proxy for example). Even though the database machine was sitting in the same VPC as the functional tests worker, I had planned to only access it through its public IP address (for simplicity). This wouldn’t work without additional changes to our security model, which would have been a pain (I have no control over those policies).

This meant that in one case I needed to use the Public IP Address of the instance (when it was being run from our development machines) and in the other I needed to use the Private IP Address.

Code to select the correct IP address fit nicely into my Powershell script to wait for the instance to be ready, which already returned an IP address. All I had to do was test the public IP over a specific port and depending on whether or not it worked, return the appropriate value. I did this using the TCPClient class in the .NET framework.
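The selection logic itself is only a few lines. Something like the following sketch, where the function name and the port are mine, not from the real script:

# A sketch of picking the address to use; falls back to the private IP when the public
# one can't be reached (e.g. when running inside the locked down VPC).
function Select-ReachableIpAddress($publicIp, $privateIp, $port)
{
    $client = New-Object System.Net.Sockets.TcpClient
    try
    {
        $client.Connect($publicIp, $port)
        return $publicIp
    }
    catch
    {
        return $privateIp
    }
    finally
    {
        $client.Close()
    }
}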

The second thing I had to deal with was a dependency change.

Previously our TestComplete project was not dependent on anything except itself. It could be very easily compressed into a single file and tossed around the internet as necessary. Now that I had added a dependency on a series of Powershell scripts, I had to change the execution script for our Functional Tests to pack up and distribute additional directories. Nothing too painful luckily, as it was a simple enough matter to include more things in the compressed archive.

The final problem I ran into was with the application under test.

Part of the automated Functional Test is to open the database. When you ask the application to do that, it will pop up a fairly standard dialog with server & database name.

Being the helpful VB6 application that it is, it also does a search of the network to find any database servers that are available, so you can quickly select them from a dropdown. Being a single threaded application without a great amount of thought being put into user experience, the dialog freezes for a few moments while it does the search.

If you try to find that Window or any of its components using TestComplete while it's frozen, the mapping for the dialog box changes from what you would expect (Aliases.Application.DialogName) to something completely different (Process("ApplicationName").Window("DialogName").Control("Name")). Since the test was recorded using the first alias, it then times out when looking for the control it expects, and fails.

I got around this by introducing a delay before even attempting to look for any controls on that dialog.

Depressingly, this is a common solution to solving automation problems like that.

Conclusion

If you were expecting this section to be where I tie everything together and impart on you some TL;DR lessons, then prepare yourself for disappointment.

The one thing I will say though is that setting up good automated Functional Tests takes a hell of a lot of effort. It gets easier as you spend more time doing it, but sometimes I question the benefits. Certainly it’s an interesting exercise and you learn a lot about the system under test, but the tests you create are usually fragile and overly complex.

Functional Tests that automate an application definitely shouldn’t be your only variety of test, that’s for sure.

Regardless of the adventure above, TestComplete is pretty great, and it certainly makes the whole automated testing process much easier than it would be if you used something else.

Like CodedUI.

Which I have done before.

I don’t recommend it.


Object Relational Mappers (ORMs) are useful tools. If you don’t want to have to worry about writing the interactions to a persistence layer yourself, they are generally a good idea. Extremely powerful, they let you focus on describing the data that you want, rather than manually hacking out (and then maintaining!) the queries yourself, and help with change tracking, transactional constructs and other things.

Testing code that uses an ORM, however, is typically a pain. At least in my experience.

People typically respond to this pain by abstracting the usage of their ORM away, by introducing repositories or some other persistence strategy pattern. They use the ORM inside the repository, and then use the more easily mocked repository everywhere else, where it can be substituted with a much smaller amount of effort. There are other benefits to this approach, including the ability to model the domain more accurately (can abstract away the persistence structure) and the ability to switch out the ORM you use for some other persistence strategy without having to make lots of changes. Possibly.

The downside of creating an abstraction like the above is that you lose a lot of ORM specific functionality, which can be quite powerful. One of the most useful features of ORMs in C# is being able to write Linq queries directly against the persistence layer. Doing this allows for all sorts of great things, like only selecting the properties you want and farming out as much of the work as possible to the persistence layer (maybe an SQL database), rather than doing primitive queries and putting it all together in memory. If you do want to leverage that power, you are forced to either make your abstraction leaky, exposing bits of the ORM through it (which makes mocking it harder), or to write the needed functionality again yourself, except into your interface, which is duplicate work.

Both of those approaches (expose the ORM, write an abstraction layer) have their upsides and downsides so like everything in software it comes down to picking the best solution for your specific environment.

In the past, I’ve advocated creating an abstraction to help isolate the persistence strategy, usually using the Repository pattern. These days I’m not so sure that is the best way to go about it though, as I like to keep my code as simple as possible and the downsides of introducing another layer (with home grown functionality similar to but not quite the same as the ORM) have started to wear on me more and more.

EF You

I’ve recently started working on an application that uses Entity Framework 6, which is a new experience for me, as all of my prior experience with ORMs was via NHibernate, and to be brutally honest, there wasn’t much of it.

Alas, this application does not have very many tests, which is something that I hate, so I have been trying to introduce tests into the codebase as I add functionality and fix bugs.

I’m going to assume at this point that everyone who has ever done any programming and writes tests has tried to add tests into a codebase after the fact. It’s hard. It’s really hard. You have to try and resist the urge to rebuild everything, and try to find ways to add testability to an architecture that was never intended to be testable without making too many changes, or people start to get uncomfortable.

I understand that discomfort. I mean that's one of the biggest reasons you have tests in the first place, so you can make changes without having to worry about breaking stuff. Without those tests, refactoring to introduce tests is viewed as a risky activity, especially when you first start doing it.

Anyway, I wanted to write an integration test for a particular piece of new functionality, to verify that everything worked end to end. I’ve written about what I consider an integration test before, but in essence it is any test that involves multiple components working together. These sorts of tests are usually executed with as many things configured and setup the same as the actual application, with some difficult or slow components that sit right at the boundaries being substituted. Persistence layers (i.e. databases) are a good thing to substitute, as well as non-local services, because they are slow (compared to in memory) and usually hard to setup or configure.

In my case I needed to find a way to remove the dependency on an external database, as well as a number of services. The services would be easy, because it’s relatively trivial to introduce an interface to encapsulate the required behaviour of a service, and then provide an implementation just for testing.

The persistence layer though…

This particular application does NOT use the abstraction strategy that I mentioned earlier. It simply exposes the ability to get a DbContext whenever something needs access to a persistent store.

A for Effort

Being that the application in question used EF6, I thought that it would be easy enough to leverage the Effort library.

Effort provides an in-memory provider for Entity Framework, allowing you to easily switch between whatever your normal provider is (probably SQL Server) for one that runs entirely in memory.

Notice that I said I thought that it would be easy to leverage Effort…

As is always the case with this sort of thing, the devil is truly in the details.

It was easy enough to introduce a factory to create the DbContext that the application used, instead of using its constructor directly. This allowed me to supply a different factory for the tests, one that leveraged Effort’s in-memory provider. You accomplish this by making sure that there is a constructor for the DbContext that takes a DbConnection, and then using Effort to create one of its fancy in-memory connections.

On the first run of the test with the new in-memory provider, I got one of the least helpful errors I have ever encountered:

System.InvalidOperationException occurred: Sequence contains no matching element
  StackTrace:
       at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source, Func`2 predicate)
       at System.Data.Entity.Utilities.DbProviderManifestExtensions.GetStoreTypeFromName(DbProviderManifest providerManifest, String name)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.Configure(EdmProperty column, EntityType table, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.<>c__DisplayClass1.<Configure>b__0(Tuple`2 pm)
       at System.Data.Entity.Utilities.IEnumerableExtensions.Each[T](IEnumerable`1 ts, Action`1 action)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.Configure(IEnumerable`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.BinaryPropertyConfiguration.Configure(IEnumerable`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.StructuralTypeConfiguration.ConfigurePropertyMappings(IList`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.ConfigurePropertyMappings(DbDatabaseMapping databaseMapping, EntityType entityType, DbProviderManifest providerManifest, Boolean allowOverride)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.Configure(EntityType entityType, DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.ConfigureEntityTypes(DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.Configure(DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.DbModelBuilder.Build(DbProviderManifest providerManifest, DbProviderInfo providerInfo)
       at System.Data.Entity.DbModelBuilder.Build(DbConnection providerConnection)
       at System.Data.Entity.Internal.LazyInternalContext.CreateModel(LazyInternalContext internalContext)
       at System.Data.Entity.Internal.RetryLazy`2.GetValue(TInput input)

Keep in mind, I got this error when attempting to begin a transaction on the DbContext. So the context had successfully been constructed, but it was doing…something…during the begin transaction that was going wrong. Probably initialization.

After a significant amount of reading, I managed to find some references to the fact that Effort doesn’t support certain SQL Server specific column types. Makes sense in retrospect, although at the time I didn’t even know you could specify provider specific information like that. I assume it was all based around automatic translation between CLR types and the underlying types of the provider.

There are a lot of entities in this application and they all have a large number of properties. I couldn’t read through all of the classes to find what the problem was, and I didn’t even know exactly what I was looking for. Annoyingly, the error message didn’t say anything about what the actual problem was, as you can see above. So, back to first principles: take everything out, and start reintroducing things until it breaks.

It turns out quite a lot of the already existing entities were specifying the types (using strings!) in the Column attribute of their properties. The main offenders were the "timestamp" and "money" data types, which Effort did not seem to understand.

Weirdly enough, Effort had no problems with the Timestamp attribute when specified on a property. It was only when the type “timestamp” was specified as a string in the Column attribute that errors occurred.

The issue here was of course that the type was string based, so the only checking occurred at run-time. Because I had introduced a completely different provider to the mix, and the code was written assuming SQL Server, it would get to the column type during initialisation (which is lazy, because it doesn’t happen until you try to use the DbContext) and when there was no matching column type returned by the provider, it would throw the exception above.

Be Specific

Following some advice on the Effort discussion board, I found some code that moved the SQL Server specific column types into their own attributes. These attributes would then only be interrogated when the connection of the DbContext was actually an SQL Server connection. Not the best solution, but it left the current behaviour intact, while allowing me to use an in-memory database for testing purposes.

Here is the attribute, it just stores a column type as a string.

public class SqlColumnTypeAttribute : Attribute
{
    public SqlColumnTypeAttribute(string columnType = null)
    {
        ColumnType = columnType;
    }

    public string ColumnType { get; private set; }
}

Here is the attribute convention, which EF uses to define a rule that will interpret attributes and change the underlying configuration.

public class SqlColumnTypeAttributeConvention : PrimitivePropertyAttributeConfigurationConvention<SqlColumnTypeAttribute>
{
    public override void Apply(ConventionPrimitivePropertyConfiguration configuration, SqlColumnTypeAttribute attribute)
    {
        if (!string.IsNullOrWhiteSpace(attribute.ColumnType))
        {
            configuration.HasColumnType(attribute.ColumnType);
        }
    }
}

Here is a demo DbContext showing how I used the attribute convention. Note that the code only gets executed if the connection is an SqlConnection.

public partial class DemoDbContext : DbContext
{
    public DemoDbContext(DbConnection connection, bool contextOwnsConnection = true)
        : base(connection, contextOwnsConnection)
    {
    
    }
    
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        if (Database.Connection is SqlConnection)
        {
            modelBuilder.Conventions.Add<SqlColumnTypeAttributeConvention>();
        }
    }
}

Finally, here is the attribute being used in an entity. Previously this entity would have simply had a [Column(TypeName = "timestamp")] attribute on the RowVersion property, which causes issues with Effort.

public partial class Entity
{
    [Key]
    public int Id { get; set; }

    [SqlColumnType("timestamp")]
    [MaxLength(8)]
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

Even though there were a lot of entities with a lot of properties, this was an easy change to make, as I could leverage a regular expression and find and replace.

Of course, it still didn’t work.

I was still getting the same error after making the changes above. I was incredibly confused for a while, until I did a search for "timestamp" and found an instance of the Column attribute where it supplied both the data type and the order. Of course, my regular expression wasn’t smart enough to pick this up, so I had to manually go through and split those two components (Type, which Effort didn’t support, and Order, which it did) wherever they occurred. Luckily it was only about 20 places, so it was easy enough to fix.

And then it worked!

No more SQL Server dependency for the integration tests, which means they are now faster and more controlled, with less hard to manage dependencies.

Of course, the trade-off for this is that the integration tests are no longer testing as close to the application as they could be, but that’s why we have functional tests as well, which run through the installed application, on top of a real SQL Server instance. You can still choose to run the integration tests with an SQL Server connection if you want, but now you can use the much faster and easier to manage in-memory database as well.

Conclusion

Effort is awesome. Apart from the problems caused by using SQL Server specific annotations on common entities, Effort was extremely easy to setup and configure.

I can’t really hold the usage of SQL Server specific types against the original developers though, as I can’t imagine they saw the code ever being run on a non-SQL Server provider. Granted, it would have been nice if they had isolated the SQL Server specific stuff from the core functionality, but that would have been unnecessary for their needs at the time, so I understand.

The biggest problem I ran into was the incredibly unhelpful error message coming from EF6 with regards to the unsupported types. If the exception had stated what the type was that couldn’t be found and for which property in which class, I wouldn’t have had to go to so much trouble to find out what the actual problem was.

It’s never good being confronted with an entirely useless exception message, and we have to always be careful to make sure that our exceptions fully communicate the problem so that they help future developers, instead of just getting in the way.

A little Effort goes a long way after all.