
It makes me highly uncomfortable if someone suggests that I support a piece of software without an automated build process.

In my opinion it’s one of the cornerstones on top of which software delivery is built. It just makes everything that comes afterwards easier, and enables you to think and plan at a much higher level, allowing you to worry about much more complicated topics.

Like continuous delivery.

But let’s shy away from that for a moment, because sometimes you have to deal with the basics before you can do anything else.

In my current position I’m responsible for the engineering behind all of the legacy products in our organization. At this point in time those products make almost all of the money (yay!), but contain 90%+ of the technical terror (boo!) so the most important thing from my point of view is to ensure that we can at least deliver them reliably and regularly.

Now, some of the products are in a good place regarding delivery.

Some, however, are not.

Someone’s Always Playing Continuous Integration Games

One specific product comes to mind. Now, that’s not to say that the other products are perfect (far from it in fact), but this product in particular is lacking some of the most fundamental elements of good software delivery, and it makes me uneasy.

In fairness, the product is still quite successful (multiple millions of dollars of revenue), but from an engineering point of view, that’s only because of the heroic efforts of the individuals involved.

With no build process, you suffer from the following shortcomings:

  • No versioning (or maybe ad-hoc versioning if you’re lucky). This makes it hard to reason about what variety of software the customer has, and can make support a nightmare. Especially true when you’re dealing with desktop software.
  • Mysterious or arcane build procedures. If no-one has taken the time to recreate the build environment (assuming there is one), then it probably has all sorts of crazy dependencies. This has the side effect of making it really hard to get a new developer involved as well.
  • No automated tests. With no build process running the tests, if you do have tests, they are probably not being run regularly. That’s if you have tests at all of course, because with no process running them, people probably aren’t writing them.
  • A poor or completely ad-hoc distribution mechanism. Without a build process to form the foundation of such a process, the one that does exist is mostly ad-hoc and hard to follow.

But there is no point in dwelling on what we don’t have.

Instead, let’s do something about it.

Who Cares They’re Always Changing Continuous Integration Names

The first step is a build script.

Now, as I’ve mentioned before on this blog, I’m a big fan of including the build script in the repository, so that anyone with the appropriate dependencies can just clone the repo and run the script to get a deliverable. Release candidates will be built on some sort of controlled build server obviously, but I’ve found it’s important to be able to execute the same logic both locally and remotely in order to be able to react to unexpected issues.

Of course, the best number of dependencies outside of the repository is zero, but sometimes that’s not possible. Aim to minimise them at least, either by isolating them and including them directly, or by providing some form of automated bootstrapping.
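
To give a concrete (if entirely hypothetical) example of what I mean by automated bootstrapping, something like the following at the top of a build script will fetch a missing tool rather than requiring it to be pre-installed. The path and URL here are illustrative, not our actual framework.

$nuget = "$PSScriptRoot\tools\nuget.exe"
if (-not (Test-Path $nuget))
{
    write-host "Nuget executable not found at [$nuget]. Downloading."
    New-Item -ItemType Directory -Path (Split-Path $nuget) -Force | Out-Null
    Invoke-WebRequest "https://dist.nuget.org/win-x86-commandline/latest/nuget.exe" -OutFile $nuget
}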

This particular product is built in a mixture of Delphi (specifically Delphi 7) and .NET, so it wasn’t actually all that difficult to use our existing build framework (a horrific aberration built in Powershell) to get something up and running fairly quickly.

The hardest part was figuring out how to get the Delphi compiler to work from the command line, while still producing the same output as it would if you just followed the current build process (i.e. compilation from within the IDE).
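
For the curious, the shape of that call ends up being something like the sketch below. The install path, project file and switches are illustrative rather than lifted from the real build framework.

$dcc32 = "C:\Program Files (x86)\Borland\Delphi7\Bin\dcc32.exe"
$projectFile = "C:\projects\product\src\Product.dpr"

# -B forces a full build of all units rather than an incremental compile.
& "$dcc32" -B "$projectFile" | write-host
if ($LASTEXITCODE -ne 0)
{
    throw "Delphi compilation of [$projectFile] failed with exit code [$LASTEXITCODE]."
}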

With the compilation out of the way, the second hardest part was creating an artifact that looked and acted like the artifact that was being manually created. This comes in the form of a self-extracting zip file containing an assortment of libraries and executables that make up the “update” package.
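
As a rough illustration of that packaging step (a sketch assuming 7-Zip, which may or may not be what the real process uses), creating a self-extracting archive from a staging directory looks something like this.

$sevenZip = "C:\Program Files\7-Zip\7z.exe"
$stagingDirectory = "C:\build\update-package"
$artifact = "C:\build\output\Product-Update.exe"

# The -sfx switch prepends a self-extracting stub to the archive.
& "$sevenZip" a -sfx "$artifact" "$stagingDirectory\*" | write-host
if ($LASTEXITCODE -ne 0)
{
    throw "Failed to create self-extracting package at [$artifact]."
}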

Having dealt with both of those challenges, it’s nothing but smooth sailing.

We Just Want to Dance Here, But We Need An AMI

Ha ha ha ha no.

This being a piece of legacy software, the second step was to create a build environment that could be used from TeamCity.

This means an AMI with everything required in order to execute the build script.

For Delphi 7, that means an old version of the Delphi IDE and build tools. Good thing we still had the CD that the installer came on, so we just made an ISO and remotely mounted it in order to install the required software.
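
If you find yourself doing the same thing on a modern Windows Server machine, the mounting itself can be scripted (a sketch assuming the storage cmdlets are available; we did ours remotely through the console, and the path is illustrative).

$isoPath = "C:\installers\Delphi7.iso"

# Mount the image, then work out which drive letter it landed on.
Mount-DiskImage -ImagePath $isoPath
$driveLetter = (Get-DiskImage -ImagePath $isoPath | Get-Volume).DriveLetter

write-host "Install media mounted at [$($driveLetter):]. Run the installer from there."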

Then came the multitude of library and tool dependencies specific to this particular piece of software. Luckily, someone had actually documented enough instructions on how to set up a development environment, so we used that information to complete the configuration of the machine.

A few minor hiccups later and we had a build artifact coming out of TeamCity for this product for the very first time.

A considerable victory.

But it wasn’t versioned yet.

They Call Us Irresponsible, The Versioning Is A Lie

This next step is actually still under construction, but the plan is to use the TeamCity build number input and some static version configuration stored inside the repository to create a SemVer styled version for each build that passes through TeamCity.

Any build not passing through TeamCity, or being built from a branch should be tagged with an appropriate pre-release string (i.e. 1.0.0-[something]), allowing us to distinguish good release candidates (off master via TeamCity) from dangerous builds that should never be released to a real customer.
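
To make the plan a little more concrete, the general shape of the calculation is below. The file name and format of the static configuration are purely illustrative; the real logic lives inside the build framework.

param
(
    [int]$buildNumber,
    [string]$prereleaseTag
)

# Major and minor are owned by the repository; the patch comes from TeamCity.
$versionConfig = Get-Content ".\src\version.json" -Raw | ConvertFrom-Json
$version = "$($versionConfig.Major).$($versionConfig.Minor).$buildNumber"

if (-not [string]::IsNullOrEmpty($prereleaseTag))
{
    # Anything not built off master via TeamCity gets a pre-release suffix.
    $version = "$version-$prereleaseTag"
}

write-host "Calculated version is [$version]."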

The abomination of a Powershell build framework allows for most of this sort of stuff, but assumes that a .NET style AssemblyInfo.cs file will exist somewhere in the source.

At the end of the day, we decided to just include such a file for ease of use, and then propagate that version generated via the script into the Delphi executables through means that I am currently unfamiliar with.

Finally, all builds automatically tag the appropriate commit in Git, but that’s pretty much built into TeamCity anyway, so barely worth mentioning.

Conclusion

Like I said at the start of the post, if you don’t have an automated build process, you’re definitely doing it wrong.

I managed to summarise the whole “let’s construct a build process” journey into a single, fairly lightweight blog post, but a significant amount of work went into it over the course of a few months. I was only superficially involved (as is mostly the case these days), so I have to give all of the credit to my colleagues.

The victory that this build process represents cannot be overstated though, as it will form a solid foundation for everything to come.

A small step in the greater scheme of things, but I’m sure everyone knows the quote at this point.


Another week, another piece of the puzzle.

This time around, I’ll go through how we’ve set up the build and deployment pipeline for the Lambda function code that processes the ELB log files. It’s not a particularly complex system, but it was something completely different compared to things we’ve done in the past.

In general terms, building software is relatively straightforward, as long as you have two things:

  • A mechanism of tracking changes to the source code, i.e. source control.
  • The ability to validate the code and create a versioned package, where the behaviour of the code can be reasoned about.

With at least those two pieces in place (preferably fully automated), your builds are taken care of. Granted, accomplishing those things is not as simple as two dot points would lead you to believe, but conceptually there is not a lot going on.

Once you have versioned packages that can be reasoned about, all you need to do is figure out how to deliver them to the appropriate places. Again, ideally the whole process is automated. You shouldn’t have to remote onto a machine and manually copy files around, as that hardly ever ends well. This can get quite complicated depending on exactly what it is that you are deploying: on-premises software can be notoriously painful to deploy without some supporting system, but deploying static page websites to S3 is ridiculously easy.

My team uses TeamCity, Nuget and Powershell to accomplish the build part and Octopus Deploy to deploy almost all of our code, and I don’t plan on changing any of that particularly soon.

Some people seem to think that because it’s so easy to manage Lambda functions from the AWS management website, they don’t have to have a build and deployment pipeline. After all, just paste the code into the website and you’re good to go, right?

I disagree vehemently.

Power Up

Our ELB logs processor Lambda function follows our normal repository structure, just like any other piece of software we write.

The code for the Lambda function goes into the /src folder, along with a Nuspec file that describes how to construct the resulting versioned package at build time.

Inside a /scripts folder is a build script, written in Powershell, containing all of the logic necessary to build and verify a deployable package. It mostly just leverages a library of common functions (so our builds are consistent), and its goal is to facilitate all of the parts of the pipeline, like compilation (ha Javascript), versioning, testing, packaging and deployment.

Some build systems are completely encapsulated inside the software that does the build, like Jenkins or TeamCity. I don’t like this approach, because you can’t easily run/debug the build on a different machine for any reason. I much prefer the model where the repository has all of the knowledge necessary to do the entire build and deployment process, barring the bits that it needs to accomplish via an external system.

The build script for the ELB logs processor function is included below, but keep in mind, this is just the entry point, and a lot of the bits and pieces are inside the common functions that you can see referenced.

[CmdletBinding()]
param
(
    [switch]$deploy,
    [string]$octopusServerUrl,
    [string]$octopusApiKey,
    [string]$component,
    [string]$commaSeparatedDeploymentEnvironments,
    [string[]]$projects,
    [int]$buildNumber,
    [switch]$prerelease,
    [string]$prereleaseTag
)

$error.Clear()

$ErrorActionPreference = "Stop"

$here = Split-Path $script:MyInvocation.MyCommand.Path

. "$here\_Find-RootDirectory.ps1"

$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

. "$rootDirectoryPath\scripts\Invoke-Bootstrap.ps1"
. "$rootDirectoryPath\scripts\common\Functions-Build.ps1"

$arguments = @{}
$arguments.Add("Deploy", $deploy)
$arguments.Add("CommaSeparatedDeploymentEnvironments", $commaSeparatedDeploymentEnvironments)
$arguments.Add("OctopusServerUrl", $octopusServerUrl)
$arguments.Add("OctopusServerApiKey", $octopusApiKey)
$arguments.Add("Projects", $projects)
$arguments.Add("VersionStrategy", "SemVerWithPatchFilledAutomaticallyWithBuildNumber")
$arguments.Add("buildNumber", $buildNumber)
$arguments.Add("Prerelease", $prerelease)
$arguments.Add("PrereleaseTag", $prereleaseTag)
$arguments.Add("BuildEngineName", "nuget")

Build-DeployableComponent @arguments

I’m pretty sure I’ve talked about our build process and common scripts before, so I’m not going to go into any more detail.

Prior to deployment, the only interesting output is the versioned Nuget file, containing all of the dependencies necessary to run the Lambda function (which in our case is just the file in my previous post).

On A Rail

When it comes to deploying the Lambda function code, it’s a little bit more complicated than our standard software deployments via Octopus Deploy.

In most cases, we create a versioned package containing all of the necessary logic to deploy the software, so in the case of an API it contains a deploy.ps1 script that gets run automatically during deployment, responsible for creating a website and configuring IIS on the local machine. The most important thing Octopus does for us is provide the mechanisms to get this package to the place where it needs to be.
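
For context, that sort of deploy.ps1 does something conceptually like the following (heavily simplified, with illustrative names, and definitely not our actual script).

Import-Module WebAdministration

$siteName = "MyApi"
$sitePath = "C:\inetpub\MyApi"

if (-not (Test-Path "IIS:\Sites\$siteName"))
{
    write-host "Creating website [$siteName] pointing at [$sitePath]."
    New-Website -Name $siteName -PhysicalPath $sitePath -Port 80 | write-host
}
else
{
    write-host "Updating website [$siteName] to point at [$sitePath]."
    Set-ItemProperty "IIS:\Sites\$siteName" -Name physicalPath -Value $sitePath
}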

It usually does this via an Octopus Tentacle, a service that runs on the deployment target and allows for communication between the Octopus server and the machine in question.

This system kind of falls apart when you’re trying to deploy to an AWS Lambda function, which cannot have a tentacle installed on it.

Instead, we rely on the AWS API and what amounts to a worker machine sitting in each environment.

When we do a deployment of our Lambda function project, it gets copied to the worker machine (which is actually just the Octopus server) and it runs the deployment script baked into the package. This script then uses Octopus variables to package the code again (in a way that AWS likes, a simple zip file) and uses the AWS API to upload the changed code to the appropriate Lambda function (by following a naming convention).

The deployment script is pretty straightforward:

function Get-OctopusParameter
{
    [CmdletBinding()]
    param
    (
        [string]$key
    )

    if ($OctopusParameters -eq $null)
    {
        throw "No variable called OctopusParameters is available. This script should be executed as part of an Octopus deployment."
    }

    if (-not($OctopusParameters.ContainsKey($key)))
    {
        throw "The key [$key] could not be found in the set of OctopusParameters."
    }

    return $OctopusParameters[$key]
}

$VerbosePreference = "Continue"
$ErrorActionPreference = "Stop"

$here = Split-Path -Parent $MyInvocation.MyCommand.Path
. "$here\_Find-RootDirectory.ps1"


$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

$awsKey = $OctopusParameters["AWS.Deployment.Key"]
$awsSecret = $OctopusParameters["AWS.Deployment.Secret"]
$awsRegion = $OctopusParameters["AWS.Deployment.Region"]

$environment = $OctopusParameters["Octopus.Environment.Name"]
$version = $OctopusParameters["Octopus.Release.Number"]

. "$rootDirectoryPath\scripts\common\Functions-Aws.ps1"
$aws = Get-AwsCliExecutablePath

$env:AWS_ACCESS_KEY_ID = $awsKey
$env:AWS_SECRET_ACCESS_KEY = $awsSecret
$env:AWS_DEFAULT_REGION = $awsRegion

$functionPath = "$here\src\function"

Write-Verbose "Compressing lambda code file"
Add-Type -AssemblyName System.IO.Compression.FileSystem
[system.io.compression.zipfile]::CreateFromDirectory($functionPath, "index.zip")

Write-Verbose "Updating Log Processor lambda function to version [$environment/$version]"
(& $aws lambda update-function-code --function-name $environment-ELBLogProcessorFunction --zip-file fileb://index.zip) | Write-Verbose

Nothing fancy, just using the AWS CLI to deploy code to a known function.

Apprehension

AWS Lambda does provide some other mechanisms to deploy code, and we probably could have used them to accomplish the same sort of thing, but I like our patterns to stay consistent and I’m a big fan of the functionality that Octopus Deploy provides, so I didn’t want to give that up.

We had to make a few changes to our environment pattern to allow for non-machine based deployment, like:

  • Ensuring every automatically created environment has a machine registered in it representing the Octopus server (for running scripts that affect external systems)
    • This meant that our environment cleanup also needed to take this into account, deregistering the machine before trying to remove the Octopus environment
  • Providing a section at the end of the environment setup to deploy Octopus projects that aren’t related to specific machines
    • Previously all of our deployments happened via cfn-init on the EC2 instances in question

Once all of that was in place, it was pretty easy to deploy code to a Lambda function as part of setting up the environment, just like we would deploy code to an EC2 instance. It was one of those cases where I’m glad we wrap our usage of CloudFormation in Powershell, because if we were just using raw CloudFormation, I’m not sure how we would have integrated the usage of Octopus Deploy.

To Be Summarised

I’ve only got one more post left in this series, which will summarise the entire thing and link back to all the constituent parts.

Until then, I don’t really have anything else to say.


Like me, I assume you get into a situation sometimes where you want to execute a script, but you definitely don’t want some of its more permanent side effects to happen. Scripts can do all sorts of crazy things, like commit files into git, make changes to the local machine or publish files into production and you definitely don’t want those things to happen when you’re not intending them to.

This becomes even more important when you want to start writing tests for your scripts (or at least their components, like functions). You definitely don’t want the execution of your tests to change something permanently, especially if it changes the behaviour of the system under test, because then the next time you run the test it gives you a different result or executes a different code path. All things you want to avoid to get high quality tests.

In my explorations of the issue, I have come across two solutions. One helps to deal with side effects during tests and the other gives you greater control over your scripts, allowing you to develop them with some security that you are not making unintended changes.

Testing the Waters

I’ve recently started using Pester to test my Powershell functions.

The lack of testing (other than manual testing of course) in my script development process was causing me intense discomfort, coming from a strong background where I always wrote tests (before, after, during) whenever I was developing a feature in C#.

It’s definitely improved my ability to write Powershell components quickly and in a robust way, and has improved my ability to refactor, safe in the knowledge that if I mess it up (and in a dynamic scripting language like Powershell with no Visual Studio calibre IDE, you will mess it up) the tests will at least catch the most boneheaded mistakes. Maybe even the more subtle mistakes too, if you write good tests.

Alas, I haven’t quite managed to get a handle on how to accomplish dependency injection in Powershell, but I have some ideas that may turn up in a future blog post. Or they may not, because it might be a terrible idea, only time will tell.

To tie this apparently pointless and unrelated topic back into the blog post, sometimes you need a way to control the results returned from some external call or to make sure that some external call doesn’t actually do anything. Luckily, Powershell being a dynamic language, you can simply overwrite a function definition in the scope of your test. I think.

I had to execute a Powershell script from within TestComplete recently, and was surprised when my trusty calls to write-host (for various informational logging messages) would throw errors when the script was executed via the Powershell object in System.Management.Automation. The behaviour makes perfect sense when you think about it, as that particular implementation of a host environment simply does not provide a mechanism to output anything not through the normal streams. I mention this because it was a problem that I managed to solve (albeit in a hacky way) by providing a blank implementation of write-host in the same session as my script, effectively overriding the implementation that was throwing errors.
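
The shape of that workaround, sketched here in Powershell rather than the TestComplete script that actually hosted it (paths illustrative): because functions take precedence over cmdlets during command resolution, a blank write-host defined in the same session shadows the real one.

$ps = [System.Management.Automation.PowerShell]::Create()

$script = @"
function write-host { }
. 'C:\scripts\script-under-test.ps1'
"@

[void]$ps.AddScript($script)
$results = $ps.Invoke()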

Pester provides a mechanism for doing just this, through the use of Mocks.

I’d love to write a detailed example of how to use Mocks in Pester here, but to be honest, I haven’t had the need as of yet (like I said, I’ve only very recently started using Pester). Luckily the Pester wiki is pretty great, so there’s enough information there if you want to have a read.
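
Purely for illustration (the function names below are hypothetical, not from our scripts): wrap the permanent side effect in a function, Mock that function inside the Describe block so nothing permanent actually happens, and then assert that it would have been called.

Import-Module Pester

function Invoke-Git
{
    param([string[]]$arguments)
    & git $arguments
}

function New-VersionTag
{
    param([string]$version)
    Invoke-Git -arguments @("tag", "v$version")
}

Describe "New-VersionTag" {
    It "asks git to create the tag, but the mock stops anything permanent" {
        Mock Invoke-Git { }

        New-VersionTag -version "1.0.0"

        Assert-MockCalled Invoke-Git -Times 1
    }
}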

I’m very familiar with mocking as a concept though, as I use it all the time in my C# tests. I personally am a fan of NSubstitute, but I’ve used Moq as well.

The only point I will make is that without dependency injection advertising what your component’s dependencies are, you have to be aware of its internal implementation in order to Mock out its dependencies. This makes me a little bit uncomfortable, but still, being able to Mock those dependencies out instead of having them hardcoded is much preferred.

Zhu Li, Do the Thing (But Not the Other Thing)

The second approach that I mentioned is a concept that is built into Powershell, which I have stolen and bastardised for my own personal gain.

A lot of the pre-installed Powershell components allow you to execute the script in –WhatIf mode.

WhatIf mode essentially runs the script as normal, except it doesn’t allow it to actually make any permanent changes. It’s up to the component exactly what it considers to be permanent changes, but it’s typically things like changing system settings and interacting with the file system. Instead it just writes messages to the appropriate stream stating the action that would normally have occurred. I imagine that depending on how your component is written, it might not react well when files it asked to be created or deleted don’t actually change, but it’s still an interesting feature all the same.
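
For reference, this is what the built-in support looks like when writing your own function (a minimal example, not from our build framework): declare SupportsShouldProcess and gate the permanent bits behind ShouldProcess, and the function automatically gains a -WhatIf switch.

function Remove-OldArtifacts
{
    [CmdletBinding(SupportsShouldProcess=$true)]
    param
    (
        [string]$directory
    )

    foreach ($file in Get-ChildItem $directory -File)
    {
        # With -WhatIf supplied, ShouldProcess returns false and writes a
        # "What if: ..." message instead of letting the delete happen.
        if ($PSCmdlet.ShouldProcess($file.FullName, "Delete"))
        {
            Remove-Item $file.FullName
        }
    }
}

# Remove-OldArtifacts -directory "C:\temp\artifacts" -WhatIf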

In my case, I had a build and publish script that changed the AssemblyInfo of a C# project and then committed those changes to git, as well as tagging git with a build number when the publish completed successfully. I had to debug some issues with the script recently, and I wanted to run it without any of those more permanent changes happening.

This is where I leveraged the –WhatIf switch, even though I used it in a slightly different way, and didn’t propagate the switch down to any components being used by my script. Those components were mostly non-powershell, so it wouldn’t have helped anyway (things like git, MSBuild and robocopy).

I used the switch to turn off the various bits that made more permanent changes, and to instead output a message through the host describing the action that would have occurred. I left in the parts that made permanent changes to the file system (i.e. the files output from MSBuild) because those don’t have any impact on the rest of the system.
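
My bastardised version boils down to something like this (a sketch with illustrative commands, not the real publish script): the switch simply gates the permanent bits and describes what would have happened instead.

param
(
    [int]$buildNumber,
    [switch]$whatIf
)

if ($whatIf)
{
    write-host "WhatIf: would have committed the updated AssemblyInfo and tagged the commit as [build-$buildNumber]."
}
else
{
    & git commit -am "Update version for build $buildNumber" | write-host
    & git tag "build-$buildNumber" | write-host
}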

Of course you still need to test the script as a whole, which is why we have a fully fledged development environment that we can freely publish to as much as we like, but it’s still nice to execute the script safe in the knowledge that it’s not going to commit something to git.

I’ve found the WhatIf approach to be very effective, but it relies entirely on the author of the script to select the bits that they think are permanent system changes and distinguish them from ones that are not as permanent (or at least easier to deal with, like creating new files). Without a certain level of analysis and thinking ahead, the approach obviously doesn’t work.

I’ve even considered defaulting WhatIf to on, to ensure that it’s a conscious effort to make permanent changes, just as a safety mechanism, mostly to protect future me from running the script in a stupid way.

Summary

When programming, it’s important to be aware of and to limit any side effects of the code that you have written, both for testing and development. The same holds true of scripts. The complication here is that scripts tend to bundle up lots of changes to the system being acted upon as their entire purpose, so you have to be careful with selecting which effects you want to minimise while developing.

In other news, I’ve been working on setting up a walking skeleton for a new service that my team is writing. Walking skeleton is a term referenced in Growing Object Oriented Software, Guided By Tests, and describes writing the smallest piece of functionality possible first, and then ensuring that the entire build and deployment process is created and working before doing anything else.

I suspect I will make a series of blog posts about that particular adventure.

Spoiler alert, it involves AWS and Powershell.


So, we decided to automate the execution of our Functional tests. After all, tests that aren't being run are effectively worthless.

In Part 1, I gave a little background, mentioned the scripting language we were going to use (Powershell), talked about programmatically spinning up the appropriate virtual machine in AWS for testing purposes and then how to communicate with it.

This time I will be talking about automating the installation of the software under test, running the actual tests and then reporting the results.

Just like last time, here is a link to the GitHub repository with sanitized versions of all the scripts (which have been updated significantly), so you don't have to piece them together from the snippets I’m posting throughout this post.

Installed into Power

Now that I could create an appropriate virtual machine as necessary and execute arbitrary code on it remotely, I needed to install the actual software under test.

First I needed to get the installer onto the machine though, as we do not make all of our build artefacts publicly available on every single build (so no just downloading it from the internet). Essentially I just needed a mechanism to transfer files from one machine to another. Something that wouldn’t be too difficult to set up and maintain.

I tossed up a number of options:

  • FTP. I could set up an FTP server on the virtual machine and transfer the files that way. The downside of this is that I would have to set up an FTP server on the virtual machine, and make sure it was secure and configured correctly. I haven’t set up a lot of FTP servers before, so I decided not to do this one.
  • SSH + File Transfer. Similar to the FTP option, I could install an SSH server on the virtual machine and then use something like SCP to securely copy the files to the machine. This would have been the easiest option if the machine was Linux based, but being a Windows machine it was more effort than it was worth.
  • Use an intermediary location, like an Amazon S3 bucket. This is the option I ended up going with.

Programmatically copying files to an Amazon S3 bucket using Powershell is fairly straightforward, although I did run into two issues.

Folders? What Folders?

Even though it’s common for GUI tools that sit on top of Amazon S3 to present the information as a familiar folder/file directory structure, that is entirely not how it actually works. In fact, thinking of the information that you put in S3 in that way will just get you into trouble.

Instead, it’s much more accurate to think of the information you upload to S3 as key/value pairs, where the key tends to look like a fully qualified file path.

I made an interesting error at one point and uploaded 3 things to S3 with the following keys, X, X\Y and X\Z. The S3 website interpreted the X as a folder, which meant that I was no longer able to access the file that I had actually stored at X, at least through the GUI anyway. This is one example of why thinking about S3 as folders/files can get you in trouble.

Actually uploading files to S3 using Powershell is easy enough. Amazon supply a set of cmdlets that allow you to interact with S3, and those cmdlets are pre-installed on machines originally created using an Amazon supplied AMI.

With regards to credentials, you can choose to store the credentials in configuration, allowing you to avoid having to enter them for every call, or you can supply them whenever you call the cmdlets. Because this was a script, I chose to supply them on each call, so that the script would be self contained. I’m not a fan of global settings in general, they make me uncomfortable. I feel that it makes the code harder to understand, and in this case, it would have obfuscated how the cmdlets were authenticating to the service.

The function that uploads things to S3 is as follows:

function UploadFileToS3
{
    param
    (
        [string]$awsKey,
        [string]$awsSecret,
        [string]$awsRegion,
        [string]$awsBucket,
        [System.IO.FileInfo]$file,
        [string]$S3FileKey
    )

    write-host "Uploading [$($file.FullName)] to [$($awsRegion):$($awsBucket):$S3FileKey]."
    Write-S3Object -BucketName $awsBucket -Key $S3FileKey -File "$($file.FullName)" -Region $awsRegion -AccessKey $awsKey -SecretKey $awsSecret

    return $S3FileKey
}

The function that downloads things is:

function DownloadFileFromS3
{
    param
    (
        [string]$awsKey,
        [string]$awsSecret,
        [string]$awsRegion,
        [string]$awsBucket,
        [string]$S3FileKey,
        [string]$destinationPath
    )

    $destinationFile = new-object System.IO.FileInfo($destinationPath)
    if ($destinationFile.Exists)
    {
        write-host "Destination for S3 download of [$S3FileKey] ([$($destinationFile.FullName)]) already exists. Deleting."
        $destinationFile.Delete()
    }

    write-host "Downloading [$($awsRegion):$($awsBucket):$S3FileKey] to [$($destinationFile.FullName)]."
    Read-S3Object -BucketName $awsBucket -Key $S3FileKey -File "$($destinationFile.FullName)" -Region $awsRegion -AccessKey $awsKey -SecretKey $awsSecret | write-host

    $destinationFile.Refresh()

    return $destinationFile
}

Once the installer was downloaded on the virtual machine, it was straightforward to install it silently.

if (!$installerFile.Exists)
{
    throw "The Installer was supposed to be located at [$($installerFile.FullName)] but could not be found."
}

write-host "Installing Application (silently) from the installer [$($installerFile.FullName)]"
# Piping the results of the installer to the output stream forces it to wait until it's done before continuing on
# with the remainder of the script. No useful output comes out of it anyway, all we really care about
# is the return code.
& "$($installerFile.FullName)" /exenoui /qn /norestart | write-host
if ($LASTEXITCODE -ne 0)
{
    throw "Failed to Install Application."
}

We use Advanced Installer, which in turn uses an MSI, so you’ll notice that there are actually a number of switches being used above to get the whole thing to install without human interaction. Also note the piping to write-host, which ensures that Powershell actually waits for the installer process to finish, instead of just starting it and then continuing on its merry way. I would have piped to write-output, but then the uncontrolled information from the installer would go to the output stream and mess up my return value.

Permissions Denied

God. Damn. Permissions.

I had so much trouble with permissions on the files that I uploaded to S3. I actually had a point where I was able to upload files, but I couldn’t download them using the same credentials that I used to upload them! That makes no goddamn sense.

To frame the situation somewhat, I created a user within our AWS account specifically for all of the scripted interactions with the service. I then created a bucket to contain the temporary files that are uploaded and downloaded as part of the functional test execution.

The way that S3 defines permissions on buckets is a little strange in my opinion. I would expect to be able to define permissions on a per user basis for the bucket. Like, user X can read from and write to this bucket, user Y can only read, etc. I would also expect files that are uploaded to this bucket to then inherit those permissions, like a folder, unless I went out of my way to change them. That’s the mental model of permissions that I have, likely as a result of using Windows for many years.

This is not how it works.

Yes you can define permissions on a bucket, but not for users within your AWS account. It doesn’t even seem to be able to define permissions for other specific AWS accounts either. There’s a number of groups available, one of which is Authenticated Users, which I originally set with the understanding that it would give authenticated users belonging to my AWS account the permissions I specified. Plot twist, Authenticated Users means any AWS user. Ever. As long as they are authenticating. Obviously not what I wanted. At least I could upload files though (but not download them).

Permissions are not inherited when set through the simple options I mentioned above, so any file I uploaded had no permissions set on it.

The only way to set permissions with the granularity necessary and to have them inherited automatically is to use a Bucket Policy.

Setting up a Bucket Policy is not straightforward, at least to a novice like myself.

After some wrangling with the helper page, and some reading of the documentation, here is the bucket policy I ended up using, with details obscured to protect the guilty.

{
    "Version": "2008-10-17",
    "Id": "Policy1417656003414",
    "Statement": [
        {
            "Sid": "Stmt1417656002326",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[USER IDENTIFIER]"
            },
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::[BUCKET NAME]/*"
        },
        {
            "Sid": "Stmt14176560023267",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[USER IDENTIFIER]"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[BUCKET NAME]"
        }
    ]
}

The policy actually reads okay once you have it, but I’ll be honest, I still don’t quite understand it. I know that I’ve specifically given the listed permissions on the contents of the bucket to my user, and also given List permissions on the bucket itself. This allowed me to upload and download files with the user I created, which is what I wanted. I’ll probably never need to touch Bucket Policies again, but if I do, I’ll make more of an effort to understand them.

Testing My Patience

Just like the installer, before I can run the tests I need to actually have the tests to run.

We currently use TestComplete as our Functional test framework. I honestly haven’t looked into TestComplete all that much, apart from just running the tests, but it seems to be solid enough. TestComplete stores your functional tests in a structure similar to a Visual Studio project, with a project file and a number of files under that that define the actual tests and the order to run them in.

For us, our functional tests are stored in the same Git repository as our code. We use feature branches, so it makes sense to run the functional tests that line up with the code that the build was made from. The Build Agent that builds the installer has access to the source (obviously), so it’s a fairly simple matter to just zip up the definitions and any dependent files, and push that archive to S3 in the exact same manner as the installer, ready for the virtual machine to download.
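
The zip and upload looks much the same as it did for the installer, reusing the upload function from earlier (the paths and key structure here are illustrative).

param
(
    [string]$awsKey,
    [string]$awsSecret,
    [string]$awsRegion,
    [string]$awsBucket,
    [int]$buildNumber
)

Add-Type -AssemblyName System.IO.Compression.FileSystem

# Zip up the TestComplete project directory (path illustrative).
$testDefinitionsDirectory = "C:\source\product\tests\functional"
$archivePath = "C:\temp\functional-tests.zip"
[System.IO.Compression.ZipFile]::CreateFromDirectory($testDefinitionsDirectory, $archivePath)

# Reuse the UploadFileToS3 function from earlier to push the archive up.
$archive = new-object System.IO.FileInfo($archivePath)
UploadFileToS3 -awsKey $awsKey -awsSecret $awsSecret -awsRegion $awsRegion -awsBucket $awsBucket -file $archive -S3FileKey "functional-tests/$buildNumber/functional-tests.zip"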

Actually running the tests though? That was a bastard.

As I mentioned in Part 1, after a virtual machine is spun up as part of this testing process, I use Powershell remote execution to push a script to the machine for execution.

As long as you don’t want to do anything with a user interface, this works fantastically.

Functional tests being based entirely around interaction with the user interface therefore prove problematic.

The Powershell remote session that is created when executing the script remotely does not have any ability for user interactivity, and cannot have one to my knowledge. It’s just something that Powershell can’t do.

However, you can remotely execute another tool, PSExec, and specify that it run a process locally in an interactive session.

$testExecute = 'C:\Program Files (x86)\SmartBear\TestExecute 10\Bin\TestExecute.exe'
$testExecuteProjectFolderPath = "$functionalTestDefinitionsDirectoryPath\Application"
$testExecuteProject = "$testExecuteProjectFolderPath\ApplicationTests.pjs"
$testExecuteResultsFilePath = "$functionalTestDefinitionsDirectoryPath\TestResults.mht"

write-host "Running tests at [$testExecuteProject] using TestExecute at [$testExecute]. Results going to [$testExecuteResultsFilePath]."
# Psexec does a really annoying thing where it writes information to STDERR, which Powershell detects as an error
# and then throws an exception. The 2>&1 redirects all STDERR to STDOUT to get around this.
# Bit of a dirty hack here. The -i 2 parameter executes the application in interactive mode specifying
# a pre-existing session with ID 2. This is the session that was setup by creating a remote desktop
# session before this script was executed. Sorry.
& "C:\Tools\sysinternals\psexec.exe" -accepteula -i 2 -h -u $remoteUser -p $remotePassword "$testExecute" "$testExecuteProject" /run /SilentMode /exit /DoNotShowLog /ExportLog:$testExecuteResultsFilePath 2>&1 | write-host
[int]$testExecuteExitCode = $LASTEXITCODE

The –i [NUMBER] in the command above tells PSExec to execute the process in an interactive user session, specifically the session with the ID specified. I’ve hardcoded mine to 2, which isn’t great, but works reliably in this environment because the remote desktop session I create after spinning up the instance always ends up with ID 2. Hacky.

Remote desktop session you may ask? In order for TestComplete (well TestExecute technically) to execute the tests correctly you need to actually have a desktop session setup. I assume this is related to it hooking into the UI user mouse and keyboard hooks or something, I don’t really know. All I know is that it didn't work without a remote desktop session of some sort.

On the upside, you can automate the creation of a remote desktop session with a little bit of effort, although there are two hard bits to be aware of.

Who Are You?

There is no obvious way to supply credentials to the Remote Desktop client (mstsc.exe). You can supply the machine that you want to make the connection to (thank god), but not credentials. I think there might be support for storing this information in an RDP file though, which seems to be fairly straightforward. As you can probably guess from my lack of knowledge about that particular approach, that’s not what I ended up doing.

I still don’t fully understand the solution, but you can use the built in windows utility cmdkey to store credentials for things. If you store credentials for the remote address that you are connecting to, the Remote Desktop client will happily use them.

There is one thing you need to be careful with when using this utility to automate Remote Desktop connections. If you clean up after yourself (by deleting the stored credentials after you use them) make sure you wait until the remote session is established. If you delete the credentials before the client actually uses them you will end up thinking that the stored credentials didn’t work, and waste a day investigating and trialling VNC solutions which ultimately don’t work as well as Remote Desktop before you realise the stupidity of your mistake. This totally happened to a friend of mine. Not me at all.

Anyway, the entirety of the remote desktop script (from start-remote-session.ps1):

param (
    [Parameter(Mandatory=$true,Position=0)]
    [Alias("CN")]
    [string]$ComputerNameOrIp,
    [Parameter(Mandatory=$true,Position=1)]
    [Alias("U")] 
    [string]$User,
    [Parameter(Mandatory=$true,Position=2)]
    [Alias("P")] 
    [string]$Password
)

& "$($env:SystemRoot)\system32\cmdkey.exe" /generic:$ComputerNameOrIp /user:$User /pass:$Password | write-host

$ProcessInfo = new-object System.Diagnostics.ProcessStartInfo

$ProcessInfo.FileName = "$($env:SystemRoot)\system32\mstsc.exe" 
$ProcessInfo.Arguments = "/v $ComputerNameOrIp"

$Process = new-object System.Diagnostics.Process
$Process.StartInfo = $ProcessInfo
$startResult = $Process.Start()

Start-Sleep -s 15

& "$($env:SystemRoot)\system32\cmdkey.exe" /delete:$ComputerNameOrIp | write-host

return $Process.Id

Do You Trust Me?

The second hard thing with the Remote Desktop client is that it will ask you if you trust the remote computer if you don’t have certificates setup. Now I would typically advise that you setup certificates for this sort of thing, especially if communicating over the internet, but in this case I was remoting into a machine that was isolated from the internet within the same Amazon virtual network, so it wasn’t necessary.

Typically, clicking “yes” on an identity verification dialog isn't a problem, even if it is annoying. Of course, in a fully automated environment, where I am only using remote desktop because I need a desktop to actually be rendered to run my tests, it’s yet another annoying thing I need to deal with without human interaction.

Luckily, you can use a registry script to disable the identity verification in Remote Desktop. This had to be done on the build agent instance (the component that actually executes the functional tests).

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client]
"AuthenticationLevelOverride"=dword:00000000

Report Soldier!

With the tests actually running reliably (after all the tricks and traps mentioned above), all that was left was to report the results to TeamCity.

It was trivial to simply get an exit code from TestExecute (0 = Good, 1 = Warnings, 2 = Tests Failed, 3 = Tests Didn’t Run). You can then use this exit code to indicate to TeamCity whether or not the tests succeeded.

if ($testResult -eq $null)
{
    throw "No result returned from remote execution."
}

if ($testResult.Code -ne 0)
{
    write-host "##teamcity[testFailed name='$teamCityFunctionalTestsId' message='TestExecute returned error code $($testResult.Code).' details='See artifacts for TestExecute result files']"
}
else
{
    write-host "##teamcity[testFinished name='$teamCityFunctionalTestsId']"
}

That's enough to pass or fail a build.

Of course, if you actually have failing functional tests you want a hell of a lot more information in order to find out why they failed. Considering the virtual machine on which the tests were executed will have been terminated at the end of the test run, we needed to extract the maximum amount of information possible.

TestComplete (TestExecute) has two reporting mechanisms, not including the exit code from above.

The first is an output file, which you can specify when running the process. I chose an MHT file, which is a nice HTML document showing the tests that ran (and failed), and which has embedded screenshots taken on the failing steps. Very useful.

The second is the actual execution log, which is attached to the TestComplete project. This is a little harder to use, as you need to take the entire project and its log file and open it in TestComplete, but is great for in depth digging, as it gives a lot of information about which steps are failing and has screenshots as well.

Both of these components are zipped up on the functional tests worker and then placed into S3, so the original script can download them and attach them the TeamCity build artefacts. This is essentially the same process as for getting the test definitions and installer to the functional tests worker, but in reverse, so I won’t go into any more detail about it.

Summary

So, after all is said and done, I had automated:

  • The creation of a virtual machine for testing.
  • The installation of the latest build on that virtual machine.
  • The execution of the functional tests.
  • The reporting of the results, in both a programmatic (build pass/fail) and human readable way.

There were obviously other supporting bits and pieces around the core processes above, but there is little point in mentioning them here in the summary.

Conclusion

All up, I spent about 2 weeks of actual time on automating the functional tests.

A lot of the time was spent familiarising myself with a set of tools that I’d never (or barely) used before, like TestComplete (and TestExecute), the software under test, Powershell, programmatic access to Amazon EC2 and programmatic access to Amazon S3.

As with all things technical, I frequently ran into roadblocks and things not working the way that I would have expected them to out of the box. These things are vicious time sinks, involving scouring the internet for other people who’ve had the same issue and hoping to all that is holy that they solved their problem and then remembered to come back and share their solution.

Like all things involving software, I fought complexity every step of the way. The two biggest offenders were the complexity of handling errors in Powershell in a robust way (so I could clean up my EC2 instances) and actually getting TestExecute to run the damn tests because of its interactivity requirements.

When all was said and done though, the functional tests are now an integral part of our build process, which means there is far more incentive to add to them and maintain them. I do have some concerns about their reliability (UI focused tests are a bit like that), but that can be improved over time.


Update: I ran into an issue with the script used in this post to do the signing when using an SHA-256 certificate (i.e. a newer one). I wrote another post describing the issue and solution here.

God I hate certificates.

Everything involving them always seems to be painful. Then you finally get the certificate thing done after hours of blood, sweat and pain, put it behind you, and some period of time later, the certificate expires and it all happens again. Of course, you’ve forgotten how you dealt with it the first time.

I’ve blogged before about the build/publish script I made for a ClickOnce WPF application, but I neglected to mention that there was a certificate involved.

Signing is important when distributing an application through ClickOnce, as without a signed installer, whenever anyone tries to install your application they will get warnings. Warnings like this one.

setup.exe is not commonly downloaded and could harm your computer

For a commercial application, that’s a terrible experience. Nobody will want to install a piece of software when their screen is telling them that “the author of the software is unknown”. And it’s red! Red bad. Earlier versions of Internet Explorer weren’t quite as hostile, but starting in IE9 (I think) the warning dialog was made significantly stronger. It’s hard to even find the button to override the warning and just install the damn thing (Options –> More Options –> Run Anyway, which is really out of the way).

As far as I can tell, all ClickOnce applications have a setup.exe file. I have no idea if you can customise this, but it’s essentially just a bootstrapper for the .application file which does some additional checks (like .NET Framework version).

Anyway, the appropriate way to deal with the above issue is by signing the ClickOnce manifests.

You need to use an Authenticode Code Signing Certificate, from a trusted Certificate Authority. These can range in price from $100 US to $500+ US. Honestly, I don’t understand the difference. For this project, we picked up one from Thawte for reasons I can no longer remember.

There’s slightly more to the whole signing process than just having the appropriate Certificate and Signing the installer. Even with a fully signed installer, Internet Explorer (via SmartScreen) will still give a warning to your users when they try to install, saying that “this application is not commonly downloaded”. The only way around this is to build up reputation with SmartScreen, and the only way to do that is slowly, over time, as more and more people download your installer. The kicker here is that without a certificate the reputation is tied to the specific installer, so if you ever make a new installer (like for a version update) all that built up reputation will go away. If you signed it however, the reputation accrues on the Certificate instead.

It’s all very convoluted.

Enough time has passed between now and when I bought and setup the certificate for me to have completely forgotten how I went about it. I remember it being an extremely painful process. I vaguely recall having to generate a CSR (Certificate Signing Request), but I did it from Windows 7 first accidentally, and you can’t easily get the certificate out if you do that, so I had to redo the whole process on Windows Server 2012 R2. Thawte took ages to process the order as well, getting stuck on parts of the certification process a number of times.

Once I exported the certificate (securing the private key with a password) it was easy to incorporate it into the actual publish process though. Straightforward configuration option inside the Project properties, under Signing. The warning went from red (bad) to orange (okayish). This actually gives the end-user a Run button, instead of strongly recommending to just not run this thing. We also started gaining reputation against our Certificate, so that one day it would eventually be green (yay!).

Do you want to run or save setup.exe from

Last week, someone tried to install the application on Windows 8, and it all went to hell again.

I incorrectly assumed that once installed, the application would be trusted, which was true in Windows 7. This is definitely not the case in Windows 8.

Because the actual executable was not signed, the user got to see the following wonderful screen immediately after successfully installing the application (when it tries to automatically start).

windows protected your PC

It’s the same sort of thing as what happens when you run the installer, except it takes over the whole screen to try and get the message across. The Run Anyway command is not quite as hidden (click on More Info) but still not immediately apparent.

The root cause of the problem was obvious (I just hadn’t signed the executable), but fixing it took me at least a day of effort, which is a day of my life I will never get back. That I had to spend in Certificate land. Again.

First Stab

At first I thought I would just be able to get away with signing the assembly. I mean, that option is directly below the configuration option for signing the ClickOnce manifests, so they must be related, right?

I still don’t know, because I spent the next 4 hours attempting to use my current Authenticode Code Signing Certificate as the strong name key file for signing the assembly.

I got an extremely useful error message.

Error during Import of the Keyset: Object already exists.

After a bit of digging it turns out that if you did not use KeySpec=2 (AT_SIGNATURE) during enrollment (i.e. when generating the CSR) you can’t use the resulting certificate for strong naming inside Visual Studio. I tried a number of things, including re-exporting, deleting and then importing the Certificate trying to force AT_SIGNATURE to be on, but I did not have any luck at all. Thawte support was helpful, but in the end unable to do anything about it.

Second Stab

Okay, what about signing the actual executable? Surely I can use my Authenticode Code Signing Certificate to sign the damn executable.

You can sign an executable (not just executables, other stuff too) using the SignTool utility (signtool.exe), which is included in one of the Windows SDKs. I stole mine from “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\bin”. The nice thing is that it’s an (entirely?) standalone application, so you can include it in your repository in the tools directory so that builds and publishes can use it without needing the SDK installed everywhere.

Of course, because I’m publishing the application through ClickOnce it’s not just as simple as “sign the executable”. ClickOnce uses the hash of files included in the install in the generation of its .manifest file, so if you sign the executable after ClickOnce has published to a local directory (before pushing it to the remote location, like I was doing) it changes the hash of the file and the .manifest is no longer valid.

With my newfound Powershell skills (and some help from this awesome StackOverflow post), I wrote the following script.

param
(
    $certificatesDirectory,
    $workingDirectory,
    $certPassword
)

if ([string]::IsNullOrEmpty($certificatesDirectory))
{
    write-error "The supplied certificates directory is empty. Terminating."
    exit 1
}

if ([string]::IsNullOrEmpty($workingDirectory))
{
    write-error "The supplied working directory is empty. Terminating."
    exit 1
}

if ([string]::IsNullOrEmpty($certPassword))
{
    write-error "The supplied certificate password is empty. Terminating."
    exit 1
}

write-output "The root directory of all files to be deployed is [$workingDirectory]."

$appFilesDirectoryPath = Convert-Path "$workingDirectory\Application Files\[PUBLISH DIRECTORY ROOT NAME]_*\"

write-output "The application manifest and all other application files are located in [$appFilesDirectoryPath]."

if ([string]::IsNullOrEmpty($appFilesDirectoryPath))
{
    write-error "Application Files directory is empty. Terminating."
    exit 1
}

#Need to resign the application manifest, but before we do we need to rename all the files back to their original names (remove .deploy)
Get-ChildItem "$appFilesDirectoryPath\*.deploy" -Recurse | Rename-Item -NewName { $_.Name -replace '\.deploy','' }

$certFilePath = "$certificatesDirectory\[CERTIFICATE FILE NAME]"

write-output "All code signing will be accomplished using the certificate at [$certFilePath]."

$appManifestPath = "$appFilesDirectoryPath\[MANIFEST FILE NAME]"
$appPath = "$workingDirectory\[APPLICATION FILE NAME]"
$timestampServerUrl = "http://timestamp.globalsign.com/scripts/timstamp.dll"

& tools\signtool.exe sign /f "$certFilePath" /p "$certPassword" -t $timestampServerUrl "$appFilesDirectoryPath\[EXECUTABLE FILE NAME]"
if($LASTEXITCODE -ne 0)
{
    write-error "Signing Failure"
    exit 1
}

# mage -update sets the publisher to the application name (overwriting any previous setting)
# We could hardcode it here, but it's more robust if we get it from the manifest before we
# mess with it.
[xml] $xml = Get-Content $appPath
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace("asmv1", "urn:schemas-microsoft-com:asm.v1")
$ns.AddNamespace("asmv2", "urn:schemas-microsoft-com:asm.v2")
$publisher = $xml.SelectSingleNode('//asmv1:assembly/asmv1:description/@asmv2:publisher', $ns).Value
write-host "Publisher extracted from current .application file is [$publisher]."

# It would be nice to check the results from mage.exe for errors, but it doesn't seem to return error codes :(
& tools\mage.exe -update $appManifestPath -certFile "$certFilePath" -password "$certPassword"
& tools\mage.exe -update $appPath -certFile "$certFilePath" -password "$certPassword" -appManifest "$appManifestPath" -pub $publisher -ti $timestampServerUrl

#Rename files back to the .deploy extension, skipping the files that shouldn't be renamed
Get-ChildItem -Path "$appFilesDirectoryPath"  -Recurse | Where-Object {!$_.PSIsContainer -and $_.Name -notlike "*.manifest"} | Rename-Item -NewName {$_.Name + ".deploy"}

It’s not the most fantastic thing I’ve ever written, but it gets the job done. Note that the password for the certificate is supplied to the script as a parameter (don’t include passwords in scripts, that’s just stupid). Also note that I’ve replaced some paths/names with tokens in all caps (like [PUBLISH DIRECTORY ROOT NAME]) to protect the innocent.

The meat of the script does the following:

  • Locates the publish directory (which will have a name like [PROJECT NAME]_[VERSION]).
  • Removes all of the .deploy suffixes from the files in the publish directory. ClickOnce appends .deploy to all files that are going to be deployed. I do not actually know why.
  • Signs the executable.
  • Extracts the current publisher from the manifest.
  • Updates the .manifest file.
  • Updates the .application file.
  • Restores the previously removed .deploy suffix.

You may be curious as to why the publisher is extracted from the .manifest file and then re-supplied. This is because if you update a .manifest file and you don’t specify a publisher, it overwrites whatever publisher was there before with the application name. Obviously, this is bad.

Anyway, the signing script is called after a build/publish but before the copy to the remote location in the publish script for the application.

Conclusion

After signing the executable and ClickOnce manifest, Windows 8 no longer complains about the application, and the installation process is much more pleasant. Still not green, but getting closer.

I really do hate every time I have to interact with a certificate though. It’s always complicated, complex and confusing and leaves me frustrated and annoyed at the whole thing. Every time I learn just enough to get through the problem, but I never feel like I understand the intricacies enough to really be able to do this sort of thing with confidence.

It’s one of those times in software development where I feel like the whole process is too complicated, even for a developer. It doesn’t help that not only is the technical process of using a certificate complicated, but even buying one is terrible, with arbitrary price differences (why are they different?) and terrible processes that you have to go through to even get a certificate.

At least this time I have a blog, and I’ve written this down so I can find it when it all happens again and I’ve purged all the bad memories from my head.