Tests That Aren’t Run Are Worthless

June 2, 2015

We’ve spent a significant amount of effort recently making sure that our software components are automatically built and deployed. It’s not something new, and some of our components already had it in place, but nothing was ever made generic enough to reuse. The weak spot in our build/deploy pipeline is definitely tests, though. We’ve made a few attempts in the past to get test automation happening as part of the build, and while they worked for individual components, we never took a holistic look at the process and made it easy to apply to a range of components.

I’ve mentioned this before, but to me tests fall into three categories: Unit, Integration and Functional. Unit tests cover the smallest pieces of functionality, usually algorithms or classes, with all dependencies stubbed or mocked out. Integration tests verify that all of the pieces are configured to work together properly, and can be used to verify features in a controlled environment. Functional tests cover the application from a feature point of view. For example, functional tests for a web service would run against it after it has been deployed, verifying that users can interact with it as expected.

From my point of view, the ideal flow is as follows:

Check-in → Build → Unit and Integration Tests → Deploy (CI) → Functional Tests → Deploy (Staging)
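The essential property of this flow is that each stage only runs if everything before it succeeded. As a minimal sketch of that gating (the Invoke-Pipeline function and the representation of stages as script blocks that throw on failure are illustrative assumptions, not our actual scripts):

```powershell
function Invoke-Pipeline
{
    [CmdletBinding()]
    param
    (
        # Ordered stage name -> script block; a stage signals failure by throwing.
        [System.Collections.Specialized.OrderedDictionary]$Stages
    )

    $completed = @()
    foreach ($name in $Stages.Keys)
    {
        $stage = $Stages[$name]
        try
        {
            # Discard stage output so only the result object is returned.
            $null = & $stage
        }
        catch
        {
            # Stop at the first failure: later stages, such as Deploy (Staging), never run.
            Write-Warning "Stage [$name] failed: $($_.Exception.Message)"
            return [PSCustomObject]@{ Succeeded = $false; Completed = $completed; FailedStage = $name }
        }
        $completed += $name
    }

    return [PSCustomObject]@{ Succeeded = $true; Completed = $completed; FailedStage = $null }
}
```

Returning a result object keeps the sketch easy to inspect; a real build script would more likely let the exception propagate so that TeamCity marks the build as failed.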

Obviously I’m talking about web components here (sites, services, etc.), but you could apply it to any component if you tried hard enough.

The nice part of this flow is that you can do any manual testing, exploration or early integration on the Staging environment with reasonable confidence that it won’t be broken by a bad deploy, because failing functional tests prevent promotion to Staging.

Aren’t All Cities Full of Teams

We use TeamCity as our build platform and Octopus as our deployment platform, and thanks to these tools we have the check-in, build and deployment parts of the pipeline pretty much taken care of.

My only issue with these products is that they are so configurable and powerful that people often use them to store complex build/deployment logic. This makes me sad, because that logic belongs as close to the code as possible, ideally in the same repository. I think you should be able to grab a repository and build it without having to use an external tool to put all the pieces together. It’s also an issue if you need to change your build logic but still support older builds (maybe a hotfix branch or something). If you store your build logic in source control, this situation just works, because the logic is right there with the code.

So I mostly use TeamCity to trigger builds and collect history about previous builds (and their output), which it does a fine job of. In the same vein, I use Octopus to manage environments and machines, but all the logic for how to install a component lives in the deployable package (which can be built with minimal fuss from the repository).

I should mention that these tools do have elements of change control, and do allow you to version your Build Configurations (TeamCity) and Projects (Octopus). I just prefer that this logic lives with the source, because then the same version is applied to everything.

All of our build and deployment logic lives in source control, right next to the code. There is a single PowerShell script per repository (unsurprisingly called build.ps1) acting as the entry point. The build script in each repository is fairly lightweight, leveraging a set of common scripts downloaded from our NuGet server to avoid duplicating logic.

TeamCity calls this build script with the appropriate parameters, and it takes care of the rest.
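To make the idea of a thin entry point concrete, here is a hedged sketch of what build.ps1 might do. It assumes nuget.exe is on the PATH with our feed configured, and that the shared logic lives in a file called Functions-Build.ps1 inside the CommonDeploymentScripts package; that file name, the tools folder and the Invoke-RepositoryBuild wrapper are all hypothetical. The body is written as a function so the restore step can be swapped out when testing it:

```powershell
function Invoke-RepositoryBuild
{
    [CmdletBinding()]
    param
    (
        [System.IO.DirectoryInfo]$RepositoryRoot,
        [switch]$Deploy,
        [string]$Environment = "CI",
        # Fetches the shared scripts package into the given directory (assumes nuget.exe on the PATH).
        [scriptblock]$RestoreCommonScripts = {
            param([string]$outputDirectory)
            & nuget.exe install CommonDeploymentScripts -OutputDirectory $outputDirectory -ExcludeVersion -NonInteractive
        }
    )

    $ErrorActionPreference = "Stop"

    # Restore the shared scripts next to the code, rather than keeping logic in TeamCity.
    $toolsDirectory = Join-Path $RepositoryRoot.FullName "tools"
    New-Item -ItemType Directory -Path $toolsDirectory -Force | Out-Null
    & $RestoreCommonScripts $toolsDirectory | Out-Null

    # Dot-source the shared build logic into this scope and delegate to it.
    # Functions-Build.ps1 is a hypothetical file name.
    $commonBuildScript = [System.IO.Path]::Combine($toolsDirectory, "CommonDeploymentScripts", "scripts", "common", "Functions-Build.ps1")
    . $commonBuildScript
    return Build-DeployableComponent -deploy:$Deploy -environment $Environment
}

# build.ps1 would define the function above, declare its own parameters, and end with:
# Invoke-RepositoryBuild -RepositoryRoot (Get-Item $PSScriptRoot) -Deploy:$Deploy -Environment $Environment
```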

Testy Testy Test Test

Until recently, our generic build script didn’t automatically execute tests, which was an obvious weakness. Since we are in the process of setting up a brand new service, I thought this would be the ideal time to fix that.

To tie in with the types of tests I mentioned above, we generally have two projects that live in the same solution as the main body of code (X.Tests.Unit and X.Tests.Integration, where X is the component name), and another project that lives alongside them called X.Tests.Functional. The Functional tests project is something new that we’re trying out, so it is still very much in flux. The other two projects are well accepted at this point, and consistently applied.

Both Unit and Integration tests are written using NUnit. We went with NUnit over MSTest for reasons that seemed valid at the time, but which I can no longer recall with any clarity. I think it might have been something about support for data driven tests, or the ability to easily execute the tests from the command line? MSTest offers both of those things though, so I’m honestly not sure. I’m sure we had valid reasons.

The good thing about NUnit is that the NUnit runner is a NuGet package of its own, which fits nicely into our dependency management strategy. We’ve written PowerShell scripts to manage external components (like NuGet, 7Zip, Octopus Command Line Tools, etc.), and the general pattern I’ve been using is to introduce a Functions-Y.ps1 file into our CommonDeploymentScripts package, where Y is the name of the external component. This PowerShell file contains the functions we need from the external component (for NuGet that would be Restore, Install, etc.) and also manages downloading the dependent package and getting a reference to the appropriate executable.
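A sketch of the download-and-locate half of a Functions-Y.ps1 file, generalized so the same function could serve NUnit, OpenCover or ReportGenerator. The Get-ToolExecutable name, and the NUnit.Runners package id and 2.6.4 version in the usage comment, are illustrative assumptions; the installer is a script block so that it can be faked in tests:

```powershell
function Get-ToolExecutable
{
    [CmdletBinding()]
    param
    (
        [System.IO.DirectoryInfo]$ToolsDirectory,
        [string]$PackageId,
        [string]$Version,
        [string]$ExecutableName,
        # Installs a package; defaults to nuget.exe (assumed to be on the PATH).
        [scriptblock]$InstallPackage = {
            param($packageId, $version, $outputDirectory)
            & nuget.exe install $packageId -Version $version -OutputDirectory $outputDirectory -NonInteractive
        }
    )

    # Without -ExcludeVersion, nuget install extracts into <PackageId>.<Version>.
    $packageDirectory = Join-Path $ToolsDirectory.FullName "$PackageId.$Version"
    if (-not (Test-Path $packageDirectory))
    {
        Write-Verbose "[$PackageId $Version] not found, installing into [$($ToolsDirectory.FullName)]."
        & $InstallPackage $PackageId $Version $ToolsDirectory.FullName | Out-Null
    }

    # Search the whole package, since executables usually live in a tools subfolder.
    $executable = Get-ChildItem -Path $packageDirectory -Recurse -File -Filter $ExecutableName | Select-Object -First 1
    if ($null -eq $executable) { throw "Could not find [$ExecutableName] in package [$PackageId $Version]." }
    return $executable
}

# e.g. $nunit = Get-ToolExecutable -ToolsDirectory $tools -PackageId "NUnit.Runners" -Version "2.6.4" -ExecutableName "nunit-console.exe"
```

Because the package directory is checked first, repeated builds on the same agent only pay the download cost once.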

This approach has worked fairly well up to this point, so my plan was to use the same pattern for test execution. I’d need to implement functions to download and get a reference to the NUnit runner, as well as expose something to run the tests as appropriate. NUnit wasn’t the only dependency though, as we also use OpenCover (and ReportGenerator) to get code coverage results when running the NUnit tests. Slightly more complicated, but really just another dependency to manage, just like NUnit.
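Wiring OpenCover and ReportGenerator around the NUnit runner mostly comes down to building the right command lines: OpenCover launches the runner as its target and records coverage while it executes, and ReportGenerator then turns the coverage XML into an HTML report. The sketch below only assembles the arguments, which keeps it easy to test. New-CoverageCommandLines is a hypothetical name, and it assumes NUnit 2.x console syntax and paths without spaces:

```powershell
function New-CoverageCommandLines
{
    [CmdletBinding()]
    param
    (
        [string]$NUnitExecutable,
        [string]$TestLibraryPath,
        [string]$OutputDirectory
    )

    $libraryName = [System.IO.Path]::GetFileNameWithoutExtension($TestLibraryPath)
    $testResultsFile = Join-Path $OutputDirectory "$libraryName.TestResults.xml"
    $coverageFile = Join-Path $OutputDirectory "$libraryName.Coverage.xml"
    $reportDirectory = Join-Path $OutputDirectory "$libraryName.CodeCoverageReport"

    # OpenCover runs the NUnit console as its target and records coverage while it runs.
    # -register:user registers the profiler per user (no admin rights needed);
    # -returntargetcode passes the runner's exit code through to the caller.
    $openCoverArguments = @(
        "-target:$NUnitExecutable",
        "-targetargs:$TestLibraryPath /result:$testResultsFile /noshadow",
        "-register:user",
        "-returntargetcode",
        "-output:$coverageFile"
    )

    # ReportGenerator converts OpenCover's XML output into a browsable HTML report.
    $reportGeneratorArguments = @(
        "-reports:$coverageFile",
        "-targetdir:$reportDirectory"
    )

    return [PSCustomObject]@{
        LibraryName              = $libraryName
        TestResultsFile          = $testResultsFile
        CoverageFile             = $coverageFile
        CoverageReportDirectory  = $reportDirectory
        OpenCoverArguments       = $openCoverArguments
        ReportGeneratorArguments = $reportGeneratorArguments
    }
}

# Usage (executable paths would come from the tool lookup functions):
# $commandLines = New-CoverageCommandLines -NUnitExecutable $nunit -TestLibraryPath $dll.FullName -OutputDirectory $output
# & $openCover $commandLines.OpenCoverArguments
# & $reportGenerator $commandLines.ReportGeneratorArguments
```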

Weirdly Smooth

In a rare twist of fate, I didn’t actually encounter any major issues implementing the functions for running tests. I was surprised, as I always run into some crazy thing that saps my time and will to live. It was nice to have something work as intended, but that was probably mostly because this was a refactor of existing functionality. We already had a script that ran the tests and gathered the coverage metrics; I was just restructuring it and moving it somewhere it could be easily reused.

I wrote some very rudimentary tests to verify that the automatic downloading of the dependencies was working, and then set to work incorporating the execution of the tests into our build scripts.

function FindAndExecuteNUnitTests
{
    [CmdletBinding()]
    param
    (
        [System.IO.DirectoryInfo]$searchRoot,
        [System.IO.DirectoryInfo]$buildOutput
    )

    Write-Host "##teamcity[blockOpened name='Unit and Integration Tests']"

    if ($rootDirectory -eq $null) { throw "rootDirectory script scoped variable not set. That's bad, it's used to find dependencies." }
    $rootDirectoryPath = $rootDirectory.FullName

    . "$rootDirectoryPath\scripts\common\Functions-Enumerables.ps1"
    . "$rootDirectoryPath\scripts\common\Functions-OpenCover.ps1"

    # Test assemblies must come from a Release build output (not obj)
    # and be named as either unit or integration tests.
    $testAssemblySearchPredicate = { 
            $_.FullName -like "*release*" -and 
            $_.FullName -notlike "*obj*" -and
            (
                $_.Name -like "*integration*" -or 
                $_.Name -like "*unit*"
            )
        }
    Write-Verbose "Locating test assemblies using predicate [$testAssemblySearchPredicate]."
    # Search under the supplied root rather than an undeclared script scoped variable.
    $testLibraries = Get-ChildItem -File -Path $searchRoot.FullName -Recurse -Filter "*.Test*.dll" |
        Where-Object $testAssemblySearchPredicate

    $buildOutputPath = $buildOutput.FullName
    $failingTestCount = 0
    foreach ($testLibrary in $testLibraries)
    {
        $testSuiteName = $testLibrary.Name
        Write-Host "##teamcity[testSuiteStarted name='$testSuiteName']"

        # Run the tests under OpenCover (NUnit runner plus coverage) and tally failures.
        $result = OpenCover-ExecuteTests $testLibrary
        $failingTestCount += $result.NumberOfFailingTests

        # Copy the NUnit results into the build output and ask TeamCity to import them.
        $newResultsPath = "$buildOutputPath\$($result.LibraryName).TestResults.xml"
        Copy-Item $result.TestResultsFile $newResultsPath
        Write-Host "##teamcity[importData type='nunit' path='$newResultsPath']"

        # Keep the coverage report alongside the test results.
        Copy-Item $result.CoverageResultsDirectory "$buildOutputPath\$($result.LibraryName).CodeCoverageReport" -Recurse

        Write-Host "##teamcity[testSuiteFinished name='$testSuiteName']"
    }

    # Publish everything in the build output (results and coverage) as build artifacts.
    Write-Host "##teamcity[publishArtifacts '$buildOutputPath']"
    Write-Host "##teamcity[blockClosed name='Unit and Integration Tests']"

    if ($failingTestCount -gt 0)
    {
        throw "[$failingTestCount] Failing Tests. Aborting Build."
    }
}

As you can see, it’s fairly straightforward. After a successful build, the source directory is searched recursively for DLLs matching *.Test*.dll that sit under a Release output folder (excluding anything under obj) and whose names contain either Unit or Integration. The tests in each of these DLLs are then executed using the OpenCover-ExecuteTests function from the Functions-OpenCover.ps1 file, and the results are copied into the build output directory. A running count of failing tests is kept, and if any tests have failed by the end, an exception is thrown, which is intended to prevent the deployment of faulty code.

The excerpt above comes from the build script in our CommonDeploymentScripts package, which I have replicated into this GitHub repository.
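One detail the excerpt glosses over: values are inserted into the ##teamcity[...] service messages verbatim. TeamCity expects apostrophes, vertical bars, square brackets and line breaks inside attribute values to be escaped with a | prefix (for example ' becomes |' and a newline becomes |n), so a path containing an apostrophe or bracket would produce a malformed message. A small helper along these lines could be applied to each value (the function name is our own invention, not part of TeamCity):

```powershell
function ConvertTo-TeamCityValue
{
    param([string]$Value)

    # Escape the bar itself first, so the bars added below are not escaped again.
    $escaped = $Value.Replace("|", "||")
    $escaped = $escaped.Replace("'", "|'")
    $escaped = $escaped.Replace("[", "|[")
    $escaped = $escaped.Replace("]", "|]")
    $escaped = $escaped.Replace("`n", "|n")
    $escaped = $escaped.Replace("`r", "|r")
    # Unicode next line, line separator and paragraph separator.
    $escaped = $escaped.Replace([string][char]0x0085, "|x")
    $escaped = $escaped.Replace([string][char]0x2028, "|l")
    $escaped = $escaped.Replace([string][char]0x2029, "|p")
    return $escaped
}

# Usage inside the test loop:
# Write-Host "##teamcity[importData type='nunit' path='$(ConvertTo-TeamCityValue $newResultsPath)']"
```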

I also took this opportunity to write some tests to verify that the build script was working as expected. To do that, I had to create a few dummy Visual Studio projects (one for a deployable component packaged via OctoPack and another for a simple library component). At the start of each test, these dummy projects are copied to a working directory and then mutated as necessary to set up the situation that the test needs to verify.

The best example of this is the following test:

Describe "Build-DeployableComponent" {
    Context "When deployable component with failing tests supplied and valid deploy" {
        It "An exception is thrown indicating build failure" {
            $creds = Get-OctopusCredentials

            $testDirectoryPath = Get-UniqueTestWorkingDirectory
            $newBuildOutputDirectoryPath = "$testDirectoryPath\build-output"

            # Work on a fresh copy of the dummy component, then break its tests.
            $referenceDirectoryPath = "$rootDirectoryPath\src\TestDeployableComponent"
            Copy-Item $referenceDirectoryPath $testDirectoryPath -Recurse

            MakeTestsFail $testDirectoryPath

            . "$rootDirectoryPath\scripts\common\Functions-OctopusDeploy.ps1"

            $project = "TEST_DeployableComponent"
            $environment = "CI"

            # Remember what is currently deployed, so we can check that it doesn't change.
            $releaseBefore = Get-LastReleaseToEnvironment -ProjectName $project -EnvironmentName $environment -OctopusServerUrl $creds.Url -OctopusApiKey $creds.ApiKey

            $exception = $null
            try
            {
                Build-DeployableComponent -deploy -environment $environment -OctopusServerUrl $creds.Url -OctopusServerApiKey $creds.ApiKey -projects @($project) -DI_sourceDirectory { return $testDirectoryPath } -DI_buildOutputDirectory { return $newBuildOutputDirectoryPath }
            }
            catch 
            {
                $exception = $_
            }

            $exception | Should Not Be $null

            # The failed build must not have pushed a new release to the environment.
            $releaseAfter = Get-LastReleaseToEnvironment -ProjectName $project -EnvironmentName $environment -OctopusServerUrl $creds.Url -OctopusApiKey $creds.ApiKey
            $releaseAfter | Should Be $releaseBefore
        }
    }
}

As you can see, there is a step in this test to make the dummy tests fail. All it does is rewrite one of the classes to return a different value than expected, but that’s enough to fail the tests in the solution. By doing this, we can verify that a failing test does in fact prevent deployment.
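Neither MakeTestsFail nor Get-UniqueTestWorkingDirectory is shown in this post. The sketch below is one plausible shape for them, assuming the dummy component contains a Calculator.cs with an Add method that its unit tests assert on; the file name and the exact mutation are hypothetical, and our real helpers may differ:

```powershell
function Get-UniqueTestWorkingDirectory
{
    # A fresh, uniquely named folder under the temp directory, so that tests
    # never see each other's mutated copies of the dummy projects.
    $path = Join-Path ([System.IO.Path]::GetTempPath()) ("build-tests-" + [guid]::NewGuid().ToString("N"))
    New-Item -ItemType Directory -Path $path | Out-Null
    return $path
}

function MakeTestsFail
{
    param([string]$testDirectoryPath)

    # Hypothetical target: a class whose return value the dummy unit tests check.
    $file = Get-ChildItem -Path $testDirectoryPath -Recurse -File -Filter "Calculator.cs" | Select-Object -First 1
    if ($null -eq $file) { throw "Could not find the class to mutate under [$testDirectoryPath]." }

    $original = Get-Content -Path $file.FullName -Raw
    $mutated = $original.Replace("return a + b;", "return a - b;")

    # Fail loudly if the expected code wasn't found, rather than leaving the tests passing.
    if ($mutated -eq $original) { throw "Mutation of [$($file.FullName)] had no effect." }
    Set-Content -Path $file.FullName -Value $mutated
}
```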

Summary

Nothing that I’ve said or done above is particularly ground-breaking. It’s all very familiar to anyone who is doing continuous integration/deployment. Having tests is fantastic, but unless they take part in your build/deploy pipeline they are almost useless. That’s probably a bit harsh, but if you can deploy code without running the tests on it, you will (with the best of intentions, no doubt), and that doesn’t lead anywhere good.

Our approach doesn’t leverage the power of TeamCity directly, due to my reluctance to store complex logic there. There are upsides and downsides to this; mostly, you trade having to own the implementation of test execution against keeping all of your logic in one place.

Obviously I value the latter more, but your mileage may vary.