
If you’ve read any of my previous posts, you’ll know that I’m pretty big on testing. From my point of view, any code without automated tests is probably a liability in the long run. Some code doesn’t need tests of course (like prototypes and spikes), but if you intend for your code to be used or maintained (i.e. most of it) you should be writing tests for it.

I’ve blogged before about the 3 different classes of automated tests: Unit, Integration and Functional. I fully admit that the lines between them can get blurred from time to time, but I feel these 3 classifications help in thinking about tests at different levels, which in turn assists in creating a quality piece of software.

A lot of the work I do in my current job has dependencies on a legacy SQL Server database. Not one gigantic database in a central location, but many databases with the same schema, distributed at client locations. It’s a bit of a monster, but workable, and we control the schema, so we’ve got that going for us, which is nice.

Quite a few of the older components take a hard dependency on this database. Some of the newer ones use EF at least, which is nice, but others simply use direct SQL execution. Tests have typically been written using an approach I see a lot: use a local database in a known state and write some tests on top of it. Typically this database is specific to a developer’s machine, and as such the tests will not work inside our CI environment, so they just get marked as [Ignored] (so the build won’t fail) until someone wants to use them again.

Unacceptable.

Why go to all that effort writing a test if you aren’t going to execute it all the time? Tests that aren’t run are worthless after all.

Making it Work

Testing on top of a real SQL Server instance is a fantastic idea, and I would classify this sort of test as an Integration test, especially if you’re exercising the classes themselves directly (instead of going through the normal entry point for the application). It verifies that the components you have written (or at least a subset of them) work as expected when you sit them on top of a real database.

The problem comes in automating that sort of test.

In an ideal world, your test would be responsible for setting up and tearing down all of its dependencies. This would mean it creates a database, fills it with some data, runs the code that it needs to run, verifies the results and then cleans up after itself.
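As a rough sketch, that lifecycle looks something like this. NUnit here, but any test framework would do, and TestDatabases.CreateTemporaryDatabase, Customer and CustomerRepository are hypothetical placeholders for whatever your code under test actually needs:

using NUnit.Framework;

[TestFixture]
public class CustomerRepositoryIntegrationTests
{
    [Test]
    public void GetAll_WhenDatabaseContainsACustomer_ReturnsIt()
    {
        // Create a throwaway database on a shared SQL Server instance (hypothetical helper).
        using (var database = TestDatabases.CreateTemporaryDatabase())
        {
            database.Insert(new Customer { Name = "Test Customer" });

            var repository = new CustomerRepository(database.ConnectionString);
            var customers = repository.GetAll();

            Assert.That(customers.Count, Is.EqualTo(1));
        } // Disposal tears the database down, even if the test fails.
    }
}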

That’s not simple though, and it looks like a lot of unnecessary work to some developers. They think: why don’t I just create a local database and put some data in it? I can run my test while I write my code, and I can reset my database whenever I want using scripts/restores/etc. These developers don’t realise that tests live just as long as the code they verify, and if you’re developing for the long run, you need to put that effort in or suffer the consequences later on.

I’ll take a moment to mention that not every piece of code needs to be written for the long run (most do), and that it is possible to have so many tests that it becomes hard to change your code due to the barrier of having to change your tests as well (which can be a lot of work). As with most things in software, it’s a balancing act. Just like your code, your tests should be easy to change and maintain, or they will end up causing the same pain you were trying to avoid in the first place, just in a different place.

In order to facilitate the approach where each test is responsible for its own test data, you need to put some infrastructure in place.

  1. You need a common location where the tests can create a database. If your CI environment is in the same network as your developers, this can simply be a machine at a known location with the required software installed. It’s a little more complicated if they are in two different networks (our CI is in AWS for example).
  2. You need a reusable set of tools for creating, initialising and destroying database resources, to make sure your tests are self-contained. These tools must be runnable in an automated fashion (a rough sketch of what I mean follows this list).
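As a minimal sketch of that second point, something like the following is the general shape. The interface name lines up with the one that appears later in this post, but the app.config keys and the TestDatabaseConnectionString shape here are assumptions for illustration, not the exact production code:

using System.Configuration;

// Assumed shape: carries enough information to connect to a specific database
// on the shared test server.
public class TestDatabaseConnectionString
{
    public TestDatabaseConnectionString(string host, string user, string password, string databaseName)
    {
        Host = host;
        DatabaseName = databaseName;
        _user = user;
        _password = password;
    }

    private readonly string _user;
    private readonly string _password;

    public string Host { get; private set; }
    public string DatabaseName { get; private set; }

    public string GetFullConnectionString()
    {
        return string.Format("Server={0};Database={1};User Id={2};Password={3};", Host, DatabaseName, _user, _password);
    }
}

public interface ITestDatabaseConnectionStringFactory
{
    TestDatabaseConnectionString Create(string databaseName);
}

// Reads the location of the shared test SQL Server instance from app.config, so the
// same tests work on a developer machine and in CI just by changing configuration.
// The setting names are assumptions for illustration.
public class AppConfigTestDatabaseConnectionStringFactory : ITestDatabaseConnectionStringFactory
{
    public TestDatabaseConnectionString Create(string databaseName)
    {
        return new TestDatabaseConnectionString(
            ConfigurationManager.AppSettings["TestDatabaseHost"],
            ConfigurationManager.AppSettings["TestDatabaseUser"],
            ConfigurationManager.AppSettings["TestDatabasePassword"],
            databaseName);
    }
}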

Where the Wild Databases Are

The first step in allowing us to create executable database integration tests is to have a database server (specifically MSSQL server) available to both our development environment and our CI environment.

Since our CI is in AWS, the best location for the database is there as well. The main reason for this is that it will be easy enough to expose the database securely to the office, whereas it would be hard to expose resources in the office to AWS.

We can use the Amazon-supplied SQL Server 2014 Express AMI as a baseline, and create a CloudFormation template that puts all the bits in place (instance, security groups, host record, etc).

The template is fairly trivial, so I won’t go into detail about it. It’s very much the same as any other environment setup I’ve done before (I think the only publicly available example is the JMeter Workers, but that’s a pretty good example).

In order to expose the SQL instance on the canned AMI I had to make some changes to the server itself. It’s easy enough to execute a Powershell script during initialisation (via cfn-init), so I just ran the script below to enable TCP/IP, switch to mixed mode authentication, and enable the sa account with a new password. The instance is only accessible via a secure channel (internally in our VPC and via our AWS VPN) so I’m not too concerned about exposing the sa username directly, especially with a unique and secure password. Its only purpose is to hold temporary test data anyway, and it doesn’t have the ability to connect to any other resources, so I’m not particularly worried.

[CmdletBinding()]
param
(
    [Parameter(Mandatory=$true)]
    [string]$SaPassword
)

$ErrorActionPreference = "Stop"

Import-Module sqlps -DisableNameChecking

$localhostname = hostname
$instanceName = "mssqlserver"

$smo = 'Microsoft.SqlServer.Management.Smo.'
$wmi = new-object ($smo + 'Wmi.ManagedComputer')

Write-Verbose "Enabling TCP/IP."
$uri = "ManagedComputer[@Name='$localhostname']/ ServerInstance[@Name='$instanceName']/ServerProtocol[@Name='Tcp']"
$Tcp = $wmi.GetSmoObject($uri)
$Tcp.IsEnabled = $true
$Tcp.Alter()

Write-Verbose "Enabling Mixed Mode Authentication."
$s = new-object Microsoft.SqlServer.Management.Smo.Server($localhostname)

$s.Settings.LoginMode = [Microsoft.SqlServer.Management.SMO.ServerLoginMode]::Mixed
$s.Alter()

Write-Verbose "Restarting [$instanceName] so TCP/IP and Auth changes are applied."
$service = get-service $instanceName
$service.Stop()
$service.WaitForStatus("Stopped")
$service.Start()
$service.WaitForStatus("Running")

Write-Verbose "Editing sa user for remote access."
$SQLUser = $s.Logins | ? {$_.Name -eq "sa"}
$SQLUser.PasswordPolicyEnforced = 0
$SQLUser.Alter()
$SQLUser.Refresh()
$SQLUser.ChangePassword($SaPassword)
$SQLUser.Enable()
$SQLUser.Alter()
$SQLUser.Refresh()

I know the script above isn’t necessarily the neatest way to do the configuration I needed, but it’s enough for now.

How Are Baby Databases Made?

The next step is having a library that we can re-use to easily make test databases.

We already had some Entity Framework classes and a database context buried deep inside another solution, so I extracted those, put them into their own repository and built a standalone NuGet package from that. It’s not the same code that actually creates a brand new database for a client (that’s a series of SQL scripts embedded in a VB6 application), but it’s close enough for testing purposes. I hope that eventually we will use EF instead of the hardcoded scripts (leveraging the migrations functionality for database version management), but that’s probably a long way away.

I’d completed a small refactor on the EF project in the past, so it already had the concept of a DbContextFactory; all I had to do was implement a new one that connected to a known SQL Server instance and created a randomly named database. I made it disposable, so that it would destroy the database once it was done.

EF took care of actually creating the database to match the schema defined by the DTO classes (which were already there), so I didn’t have to worry about that too much.
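For context, the derived context and DTOs are standard EF code first fare. The sketch below is purely illustrative (the real entities are domain specific and proprietary), but it shows the general shape of the NamedDb context that the rest of the code works with:

using System.Data.Common;
using System.Data.Entity;

// Illustrative only: a code first context with a made-up entity. EF uses the DTO
// classes to generate the schema when the database is created for the first time.
public class NamedDb : DbContext
{
    // Used when the caller supplies an existing connection (as the factory below does).
    public NamedDb(DbConnection connection)
        : base(connection, contextOwnsConnection: true)
    {
    }

    // Used when the caller supplies a full connection string.
    public NamedDb(string connectionString)
        : base(connectionString)
    {
    }

    public DbSet<Customer> Customers { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}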

In the code below, the ITestDatabaseConnectionStringFactory implementation is responsible for knowing where the server is and how to connect to it (there are a few implementations: one takes values from the app.config, one is hardcoded, etc). The INamedDbFactory has a single Create method that returns a derived DbContext, nothing fancy.

using System;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;
using Serilog;

namespace Solavirum.Database.EF.Tests.Common
{
    public class TestDatabaseDbFactory : INamedDbFactory, IDisposable
    {
        public TestDatabaseDbFactory(ILogger logger, ITestDatabaseConnectionStringFactory connFactory)
        {
            _connFactory = connFactory;
            var databaseName = _random.GenerateString(20);
            ConnectionString = _connFactory.Create(databaseName);

            _logger = logger
                .ForContext("databaseName", ConnectionString.DatabaseName)
                .ForContext("databaseHost", ConnectionString.Host);

            _logger.Information("New factory for database {databaseName} on host {databaseHost} created");
        }

        private readonly IRandomTestData _random = new DefaultRandomTestData();
        private readonly ILogger _logger;
        private ITestDatabaseConnectionStringFactory _connFactory;

        public readonly TestDatabaseConnectionString ConnectionString;

        public NamedDb Create()
        {
            _logger.Information("Creating a new NamedDb for database {databaseName}");
            var context = new NamedDb(new SqlConnection(ConnectionString.GetFullConnectionString()));

            if (!context.Database.Exists())
            {
                _logger.Information("This is the first time {databaseName} has been created, creating backing database on host {databaseHost}");
                context.Database.CreateIfNotExists();
            }

            return context;
        }

        public void Cleanup()
        {
            using (var context = new NamedDb(ConnectionString.GetFullConnectionString()))
            {
                _logger.Information("Deleting backing database {databaseName} on host {databaseHost}");
                context.Database.Delete();
            }
        }

        bool _disposed;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        ~TestDatabaseDbFactory()
        {
            Dispose(false);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (_disposed)
                return;

            if (disposing)
            {
                try
                {
                    Cleanup();
                }
                catch (Exception ex)
                {
                    Trace.WriteLine(string.Format("An unexpected error occurred while attempting to clean up the database named [{0}] spawned from this database factory. The database may not have been cleaned up.", ConnectionString.DatabaseName));
                    Trace.WriteLine(ex.ToString());
                }
                
            }

            _disposed = true;
        }
    }

    public interface IRandomTestData
    {
        string GenerateString(int length);
    }

    public class DefaultRandomTestData : IRandomTestData
    {
        private readonly Random _random = new Random();

        public string GenerateString(int length)
        {
            var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
            var result = new string(Enumerable.Repeat(chars, length)
                .Select(s => s[_random.Next(s.Length)])
                .ToArray());

            return result;
        }
    }
}
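Using the factory in an actual test then looks something like this. It’s NUnit here, and it leans on the illustrative AppConfigTestDatabaseConnectionStringFactory and Customer entity sketched earlier, so treat it as a shape rather than production code:

using System.Linq;
using NUnit.Framework;
using Serilog;

[TestFixture]
public class NamedDbIntegrationTests
{
    [Test]
    public void Create_WhenCalled_CreatesARealDatabaseThatCanBeQueried()
    {
        var logger = new LoggerConfiguration().CreateLogger();
        var connFactory = new AppConfigTestDatabaseConnectionStringFactory();

        // The factory creates a randomly named database on the shared server
        // and deletes it again when it is disposed.
        using (var factory = new TestDatabaseDbFactory(logger, connFactory))
        using (var context = factory.Create())
        {
            context.Customers.Add(new Customer { Name = "Test Customer" });
            context.SaveChanges();

            Assert.That(context.Customers.Count(), Is.EqualTo(1));
        }
    }
}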

The Cleanup

Of course, by extracting the database classes above into their own repository/package, I had to replace all references to the old classes. It turns out the database component was quite heavily referenced in its original location, so it was a non-trivial amount of work to incorporate it properly.

As is almost always the way when you start doing things correctly after a long period of not doing that, I had to shave a few yaks to make the change happen.

On the upside, as a result of the extraction and the creation of the TestDatabaseDbFactory, I managed to create reliably executable tests for the actual database component itself, proving that it works when connected to a real SQL Server database.

Summary

For me, the takeaway from this activity is that it takes effort to set up good testing infrastructure so that you can easily create tests that can be repeatedly executed. It’s not something that just happens, and you need to be willing to accept the fact that you have to go slower to start off with in order to go faster later. Like delayed gratification.

I could have just set up a SQL Server instance in a remote location without automating any of it, but that would just have been another case of the same short-term thinking. As it stands, I now have the ability to set up any version of SQL Server that I want, or to deploy a custom version (assuming we had an Octopus package to install SQL Server, which is more than possible), and I’ve also automated the instance’s destruction every night and recreation every morning, allowing us to easily incorporate changes to the base AMI or to alter it in other ways (like a different version of SQL Server).

It can be very hard to get some people to understand this. I find the best approach is to just do it (taking into account hard deadlines of course) and to let it prove itself in the field as time goes on, when you are able to make large changes without fear that you’ve broken everything.

I like knowing that I haven’t broken everything personally.

Or even better, knowing that I have.