
Another week, another piece of the puzzle.

This time around, I’ll go through how we’ve set up the build and deployment pipeline for the Lambda function code that processes the ELB log files. It’s not a particularly complex system, but it was something completely different compared to the things we’ve done in the past.

In general terms, building software is relatively straightforward, as long as you have two things:

  • A mechanism of tracking changes to the source code, i.e. source control.
  • The ability to validate the code and create a versioned package, where the behaviour of the code can be reasoned about.

With at least those two pieces in place (preferably fully automated), your builds are taken care of. Granted, accomplishing those things is not as simple as two dot points would lead you to believe, but conceptually there is not a lot going on.

Once you have versioned packages that can be reasoned about, all you need to do is figure out how to deliver them to the appropriate places. Again, ideally the whole process is automated. You shouldn’t have to remote onto a machine and manually copy files around, as that hardly ever ends well. This can get quite complicated depending on exactly what it is that you are deploying: on-premises software can be notoriously painful to deploy without some supporting system, but deploying static page websites to S3 is ridiculously easy.

My team uses TeamCity, Nuget and Powershell to accomplish the build part and Octopus Deploy to deploy almost all of our code, and I don’t plan on changing any of that particularly soon.

Some people seem to think that because it’s so easy to manage Lambda functions from the AWS management website, they don’t need a build and deployment pipeline. After all, just paste the code into the website and you’re good to go, right?

I disagree vehemently.

Power Up

Our ELB logs processor Lambda function follows our normal repository structure, just like any other piece of software we write.

The code for the Lambda function goes into the /src folder, along with a Nuspec file that describes how to construct the resulting versioned package at build time.
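For anyone who hasn’t hand-rolled one before, a Nuspec file is just a small chunk of XML metadata plus a set of file mappings. A minimal sketch for a package like this might look as follows (the id, description and paths are illustrative rather than our actual values; the version is stamped by the build script at build time):

<?xml version="1.0"?>
<package>
  <metadata>
    <id>ELBLogProcessor</id>
    <version>0.0.0</version>
    <authors>Us</authors>
    <description>Lambda function code for processing ELB log files.</description>
  </metadata>
  <files>
    <!-- Everything the function needs to run ends up in the package -->
    <file src="src\function\**\*" target="src\function" />
  </files>
</package>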

Inside a /scripts folder is a build script, written in Powershell, containing all of the logic necessary to build and verify a deployable package. It mostly just leverages a library of common functions (so our builds are consistent), and its goal is to facilitate all of the parts of the pipeline, like compilation (ha Javascript), versioning, testing, packaging and deployment.

Some build systems are completely encapsulated inside the software that does the build, like Jenkins or TeamCity. I don’t like this approach, because you can’t easily run/debug the build on a different machine for any reason. I much prefer the model where the repository has all of the knowledge necessary to do the entire build and deployment process, barring the bits that it needs to accomplish via an external system.

The build script for the ELB logs processor function is included below. Keep in mind, this is just the entry point; a lot of the bits and pieces are inside the common functions that you can see referenced.

[CmdletBinding()]
param
(
    [switch]$deploy,
    [string]$octopusServerUrl,
    [string]$octopusApiKey,
    [string]$component,
    [string]$commaSeparatedDeploymentEnvironments,
    [string[]]$projects,
    [int]$buildNumber,
    [switch]$prerelease,
    [string]$prereleaseTag
)

$error.Clear()

$ErrorActionPreference = "Stop"

$here = Split-Path $script:MyInvocation.MyCommand.Path

. "$here\_Find-RootDirectory.ps1"

$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

. "$rootDirectoryPath\scripts\Invoke-Bootstrap.ps1"
. "$rootDirectoryPath\scripts\common\Functions-Build.ps1"

$arguments = @{}
$arguments.Add("Deploy", $deploy)
$arguments.Add("CommaSeparatedDeploymentEnvironments", $commaSeparatedDeploymentEnvironments)
$arguments.Add("OctopusServerUrl", $octopusServerUrl)
$arguments.Add("OctopusServerApiKey", $octopusApiKey)
$arguments.Add("Projects", $projects)
$arguments.Add("VersionStrategy", "SemVerWithPatchFilledAutomaticallyWithBuildNumber")
$arguments.Add("buildNumber", $buildNumber)
$arguments.Add("Prerelease", $prerelease)
$arguments.Add("PrereleaseTag", $prereleaseTag)
$arguments.Add("BuildEngineName", "nuget")

Build-DeployableComponent @arguments

I’m pretty sure I’ve talked about our build process and common scripts before, so I’m not going to go into any more detail.

Prior to deployment, the only interesting output is the versioned Nuget file, containing all of the dependencies necessary to run the Lambda function (which in our case is just the file in my previous post).

On A Rail

When it comes to deploying the Lambda function code, it’s a little bit more complicated than our standard software deployments via Octopus Deploy.

In most cases, we create a versioned package containing all of the necessary logic to deploy the software, so in the case of an API it contains a deploy.ps1 script that gets run automatically during deployment, responsible for creating a website and configuring IIS on the local machine. The most important thing Octopus does for us is provide mechanisms to get this package to the place where it needs to be.

It usually does this via an Octopus Tentacle, a service that runs on the deployment target and allows for communication between the Octopus server and the machine in question.

This system kind of falls apart when you’re trying to deploy to an AWS Lambda function, which cannot have a tentacle installed on it.

Instead, we rely on the AWS API and what amounts to a worker machine sitting in each environment.

When we do a deployment of our Lambda function project, it gets copied to the worker machine (which is actually just the Octopus server) and it runs the deployment script baked into the package. This script then uses Octopus variables to package the code again (in a way that AWS likes, a simple zip file) and uses the AWS API to upload the changed code to the appropriate Lambda function (by following a naming convention).

The deployment script is pretty straightforward:

function Get-OctopusParameter
{
    [CmdletBinding()]
    param
    (
        [string]$key
    )

    if ($OctopusParameters -eq $null)
    {
        throw "No variable called OctopusParameters is available. This script should be executed as part of an Octopus deployment."
    }

    if (-not($OctopusParameters.ContainsKey($key)))
    {
        throw "The key [$key] could not be found in the set of OctopusParameters."
    }

    return $OctopusParameters[$key]
}

$VerbosePreference = "Continue"
$ErrorActionPreference = "Stop"

$here = Split-Path -Parent $MyInvocation.MyCommand.Path
. "$here\_Find-RootDirectory.ps1"


$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

$awsKey = $OctopusParameters["AWS.Deployment.Key"]
$awsSecret = $OctopusParameters["AWS.Deployment.Secret"]
$awsRegion = $OctopusParameters["AWS.Deployment.Region"]

$environment = $OctopusParameters["Octopus.Environment.Name"]
$version = $OctopusParameters["Octopus.Release.Number"]

. "$rootDirectoryPath\scripts\common\Functions-Aws.ps1"
$aws = Get-AwsCliExecutablePath

$env:AWS_ACCESS_KEY_ID = $awsKey
$env:AWS_SECRET_ACCESS_KEY = $awsSecret
$env:AWS_DEFAULT_REGION = $awsRegion

$functionPath = "$here\src\function"

Write-Verbose "Compressing lambda code file"
Add-Type -AssemblyName System.IO.Compression.FileSystem
# The zip is created relative to the current working directory, which during an Octopus deployment is the extracted package folder.
[System.IO.Compression.ZipFile]::CreateFromDirectory($functionPath, "index.zip")

Write-Verbose "Updating Log Processor lambda function to version [$environment/$version]"
(& $aws lambda update-function-code --function-name $environment-ELBLogProcessorFunction --zip-file fileb://index.zip) | Write-Verbose

Nothing fancy, just using the AWS CLI to deploy code to a known function.
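If you’re paranoid (I am), the CLI can read the function metadata straight back as a sanity check; a line like this tacked onto the end of the script above would do it:

(& $aws lambda get-function-configuration --function-name $environment-ELBLogProcessorFunction) | Write-Verbose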

Apprehension

AWS Lambda does provide some other mechanisms to deploy code, and we probably could have used them to accomplish the same sort of thing, but I like our patterns to stay consistent and I’m a big fan of the functionality that Octopus Deploy provides, so I didn’t want to give that up.

We had to make a few changes to our environment pattern to allow for non-machine based deployment, like:

  • Ensuring every automatically created environment has a machine registered in it representing the Octopus server (for running scripts that affect external systems)
    • This meant that our environment cleanup also needed to take this into account, deregistering the machine before trying to remove the Octopus environment (see the sketch below)
  • Providing a section at the end of the environment setup to deploy Octopus projects that aren’t related to specific machines
    • Previously all of our deployments happened via cfn-init on the EC2 instances in question
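On the cleanup side specifically, the ordering looks something like this. This is a sketch rather than our actual cleanup code; it assumes the environment id and the usual Octopus url/key variables are already in scope, and the exact API routes may differ a little between Octopus versions.

$headers = @{"X-Octopus-ApiKey" = $octopusApiKey}

# An environment cannot be removed while machines are still registered in it, so deregister those first.
$machines = Invoke-RestMethod "$octopusServerUrl/api/environments/$environmentId/machines" -Headers $headers -Method Get
foreach ($machine in $machines.Items)
{
    Invoke-RestMethod "$octopusServerUrl/api/machines/$($machine.Id)" -Headers $headers -Method Delete | Out-Null
}

# With no machines left, the environment itself can go.
Invoke-RestMethod "$octopusServerUrl/api/environments/$environmentId" -Headers $headers -Method Delete | Out-Null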

Once all of that was in place, it was pretty easy to deploy code to a Lambda function as part of setting up the environment, just like we would deploy code to an EC2 instance. It was one of those cases where I’m glad we wrap our usage of CloudFormation in Powershell, because if we were just using raw CloudFormation, I’m not sure how we would have integrated the usage of Octopus Deploy.

To Be Summarised

I’ve only got one more post left in this series, which will summarise the entire thing and link back to all the constituent parts.

Until then, I don’t really have anything else to say.


As part of the work I did recently to optimize our usage of RavenDB, I had to spin up a brand new environment for the service in question.

Thanks to our commitment to infrastructure as code (and our versioned environment creation scripts), this is normally a trivial process. Use TeamCity to trigger the execution of the latest environment package and then play the waiting game. To setup my test environment, all I had to do was supply the ID of the EC2 volume containing a snapshot of production data as an extra parameter.

Unfortunately, it failed.

Sometimes that happens (an unpleasant side effect of the nature of AWS), so I started it up again in the hopes that it would succeed this time.

It failed again.

Repeated failures are highly unusual, so I started digging. It looked as though the problem was that the environments were simply taking too long to “finish” provisioning. We allocate a fixed amount of time for our environments to complete, which includes the time required to create the AWS resources, deploy the necessary software and then wait until the resulting service can actually be hit from the expected URL.

Part of the environment setup is the execution of the CloudFormation template (and all of the AWS resources that go with that). The other part is deploying the software that needs to be present on those resources, which is where we use Octopus Deploy. At startup, each instance registers itself with the Octopus server, applying tags as appropriate and then a series of Octopus projects (i.e. the software) is deployed to the instance.
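For reference, that registration amounts to a Tentacle.exe call baked into the instance initialisation; something like the following, where the role is illustrative and the exact flags may vary a little between Tentacle versions:

& "C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe" register-with `
    --instance "Tentacle" `
    --server "$octopusServerUrl" `
    --apiKey "$octopusApiKey" `
    --environment "$environmentName" `
    --role "api-host" `
    --console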

Everything seemed to be working fine from an AWS point of view, but the initialisation of the machines was taking too long. They were only getting through a few of their deployments before running out of time. It wasn’t a tight time limit either; we’d allocated a very generous 50 minutes for everything to be up and running, so it was disappointing to see it taking so long.

The cfn-init logs were very useful here, because they show start/end timestamps for each step specified. Each of the deployments was taking somewhere between 12 and 15 minutes, which is way too slow, because there are at least 8 of them on the busiest machine.

This is one of those cases where I’m glad I put the effort in to remotely extract log files as part of the reporting that happens whenever an environment fails. It’s especially valuable when the environment creation is running in a completely automated fashion, like it does when executed through TeamCity, but it’s still very useful when debugging an environment locally. The last thing I want to do is manually remote into each machine in the environment and grab its log files in order to try and determine where a failure is occurring, when I can just have a computer do it for me.

Speed Deprived Octopus

When using Octopus Deploy (at least the version we have, which is 2.6.4), you can choose to either deploy a specific version of a project or deploy the “latest” version, leaving the decision of what exactly is the latest version up to Octopus. I was somewhat uncomfortable with the concept of using “latest” for anything other than a brand new environment that had never existed before, because initial testing showed that “latest” really did mean the most recent release.

The last thing I wanted to do was to cede control over exactly what was getting deployed to a new production API instance when I had to scale up due to increased usage.

In order to alleviate this, we wrote a fairly simple Powershell script that uses the Octopus .NET Client library to determine the best version to deploy, making sure to only use releases that have actually been deployed to the environment, and discounting deployments that failed.

# Any and First are from our library of common functions (they are not built in cmdlets); First takes a predicate and a default value to return when nothing matches.
$deployments = $repository.Deployments.FindMany({ param($x) $x.EnvironmentId -eq $env.Id -and $x.ProjectId -eq $project.Id })

if ($deployments | Any)
{
    Write-Verbose "Deployments of project [$projectName] to environment [$environmentName] were found. Selecting the most recent successful deployment."
    # Work backwards in time until we find a deployment whose task actually finished successfully.
    $latestDeployment = $deployments |
        Sort -Descending -Property Created |
        First -Predicate { $repository.Tasks.Get($_.TaskId).FinishedSuccessfully -eq $true } -Default "latest"

    $release = $repository.Releases.Get($latestDeployment.ReleaseId)
}
else
{
    Write-Verbose "No deployments of project [$projectName] to environment [$environmentName] were found."
}

$version = if ($release -eq $null) { "latest" } else { $release.Version }

Write-Verbose "The version of the most recent successful deployment of project [$projectName] to environment [$environmentName] was [$version]. 'latest' indicates no successful deployments, and will mean the very latest release version is used."

return $version

We use this script during the deployment of the Octopus projects that I mentioned above.

When it was first written, it was fast.

That was almost two years ago though, and now it was super slow, taking upwards of 10 minutes just to find what the most recent release was. As you can imagine, this was problematic for our environment provisioning, because each one had multiple projects being deployed during initialization and all of those delays added up very very quickly.

Age Often Leads to Slowdown

The slowdown didn’t just happen all of a sudden though. We had noticed the environment provisioning getting somewhat slower from time to time, but we’d assumed it was a result of there simply being more for our Octopus server to do (more projects, more deployments, more things happening at the same time). It wasn’t until it reached that critical point where I couldn’t even spin up one of our environments for testing purposes, that it needed to be addressed.

Looking at the statistics of the Octopus server during an environment provisioning, I noticed that it was using 100% CPU for the entire time that it was deploying projects, up until the point where the environment timed out.

Our Octopus server isn’t exactly a powerhouse, so I thought maybe it had just reached a point where it simply did not have enough power to do what we wanted it to do. This intuition was compounded by the knowledge that Octopus 2.6 uses RavenDB as its backend, and they’ve moved to SQL Server for Octopus 3, citing performance problems.

I dialed up the power (good old AWS), and then tried to provision the environment again.

Still timed out, even though the CPU was no longer maxing out on the Octopus server.

My next suspicion was that I had made an error of some sort in the script, but there was nothing obviously slow about it. It makes a bounded query to Octopus via the client library (.FindMany with a Func specifying the environment and project, to narrow down the result set) and then iterates through the resulting deployments (by date, so working backwards in time) until it finds an acceptable deployment for identifying the best release.

Debugging the automated tests around the script, the slowest part seemed to be when it was getting the deployments from Octopus. Using Fiddler, the reason for this became obvious.

The call to .FindMany with a Func was not optimizing the resulting API calls using the information provided in the Func. The script had been getting slower and slower over time because every single new deployment would have to retrieve every single previous deployment from the server (including those unrelated to the current environment and project). This meant hundreds of queries to the Octopus API, just to get what version needed to be deployed.

The script was initially fast because I wrote it when we only had a few hundred deployments recorded in Octopus. We have thousands now and it grows every day.

What I thought was a bounded query, wasn’t bounded in any way.

But Age Also Means Wisdom

On the upside, the problem wasn’t particularly hard to fix.

The Octopus HTTP API is pretty good. In fact, it actually has specific query parameters for specifying environment and project when you’re using the deployments endpoint (which is another reason why I assumed the client would be optimised to use them based on the incoming parameters). All I had to do was rewrite the “last release to environment” script to use the API directly instead of relying on the client.

$environmentId = $env.Id;

$headers = @{"X-Octopus-ApiKey" = $octopusApiKey}
$uri = "$octopusServerUrl/api/deployments?environments=$environmentId&projects=$projectId"
while ($true) {
    Write-Verbose "Getting the next set of deployments from Octopus using the URI [$uri]"
    $deployments = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -Verbose:$false
    if (-not ($deployments.Items | Any))
    {
        $version = "latest";
        Write-Verbose "No deployments could be found for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)]. Returning the value [$version], which will indicate that the most recent release is to be used"
        return $version
    }

    Write-Verbose "Finding the first successful deployment in the set of deployments returned. There were [$($deployments.TotalResults)] total deployments, and we're currently searching through a set of [$($deployments.Items.Length)]"
    $successful = $deployments.Items | First -Predicate { 
        $uri = "$octopusServerUrl$($_.Links.Task)"; 
        Write-Verbose "Getting the task the deployment [$($_.Id)] was linked to using URI [$uri] to determine whether or not the deployment was successful"
        $task = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -Verbose:$false; 
        $task.FinishedSuccessfully; 
    } -Default "NONE"
    
    if ($successful -ne "NONE")
    {
        Write-Verbose "Finding the release associated with the successful deployment [$($successful.Id)]"
        $release = Invoke-RestMethod "$octopusServerUrl$($successful.Links.Release)" -Headers $headers -Method Get -Verbose:$false
        $version = $release.Version
        Write-Verbose "A successful deployment of project [$projectName ($projectId)] was found in environment [$environmentName ($environmentId)]. Returning the version of the release attached to that deployment, which was [$version]"
        return $version
    }

    Write-Verbose "Finished searching through the current page of deployments for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)] without finding a successful one. Trying the next page"
    $next = $deployments.Links."Page.Next"
    if ($next -eq $null)
    {
        Write-Verbose "There are no more deployments available for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)]. We're just going to return the string [latest] and hope for the best"
        return "latest"
    }
    else
    {
        $uri = "$octopusServerUrl$next"
    }
}

It’s a little more code than the old one, but it’s much, much faster: 200 seconds (for our test project) down to something like 6 seconds, almost all of which was the result of optimizing the number of deployments that need to be interrogated to find the best release. The full test suite for our deployment scripts (which actually provisions test environments) halved its execution time too (from 60 minutes to 30), which was nice, because the tests were starting to get a bit too slow for my liking.

Summary

This was one of those cases where an issue had been present for a while, but we’d chalked it up to growth issues rather than an actual inefficiency in the code.

The most annoying part is that I did think of this specific problem when I originally wrote the script, which is why I included the winnowing parameters for environment and project. It’s pretty disappointing to see that the client did not optimize the resulting API calls by making use of those parameters, although I have a suspicion about that.

I think that if I were to run the same sort of code in .NET directly (rather than through Powershell) that the resulting call to .FindMany would actually be optimized in the way that I thought it was. I have absolutely no idea how Powershell handles the equivalent of Lambda expressions, so if it was just passing a pure Func through to the function instead of an Expression<Func>, there would be no obvious way for the client library to determine if it could use the optimized query string parameters on the API.

Still, I’m pretty happy with the end result, especially considering how easy it was to adapt the existing code to just use the HTTP API directly.

It helps that it’s a damn good API.


A long time ago, I wrote a post about fixing up our log management scripts to actually delete logs properly. Logs were being aggregated into our ELK stack and were then being deleted after reaching a certain age. It fixed everything perfectly and it was all fine, forever. The End

Or not.

The log management script itself was very simple (even though it was wrapped in a bunch of Powershell to schedule it via a Windows Scheduled Task). It looked at a known directory (C:\logs), found all the files matching *.log, narrowed that down to only those files that had not been written to in the last 7 days and then deleted them. Not exactly rocket science.
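The guts of it amounted to something like this (a sketch rather than the script verbatim; note the hardcoded retention period, which becomes relevant shortly):

Get-ChildItem -Path "C:\logs" -Filter "*.log" |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-7) } |
    Remove-Item -Force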

As we got more and more usage on the services in question though, the log file generation started to outstrip the ability of the script to clean up.

They Can Fill You Up Inside

The problem was twofold: the log management script was hardcoded to only delete things that hadn’t been touched in the last 7 days, and the installation of the scheduled task that ran the script was done during machine initialisation (as part of the cfn-init configuration). Changing the script would require refreshing the environment, which requires a migration (which is time consuming and still not perfect). Not only that, but changing the script for one service (i.e. to delete all logs older than a day) might not be the best thing for other services.

The solution was to not be stupid and deploy log management in the same way we deploy everything else: via Octopus Deploy.

With Octopus, we could build a package containing all of the logic for how to deploy a log management solution, and use that package in any number of projects with customised parameters, like retention period, directory, filter, whatever.

More importantly, it would give us the ability to update existing log management deployments without having to alter their infrastructure configuration, which is important for minimising downtime and just generally staying sane.

They Are Stronger Than Your Drive

It was easy enough to create a Nuget package containing the logic for installing the log management scheduled task. All I had to do was create a new repository, add a reference to our common scripts and then create a deploy.ps1 file describing how the existing scheduled task setup script should be called during deployment.

In fact, the only difference from calling the script directly, like we were previously, was that I grabbed variable values from the available Octopus parameters and then slotted them into the script.
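As a sketch (the variable names and the installation function here are hypothetical stand-ins rather than our real ones, and Get-OctopusParameter is the same sort of helper shown earlier), the deploy.ps1 amounted to:

$retentionDays = Get-OctopusParameter "LogManagement.RetentionPeriodInDays"
$directory = Get-OctopusParameter "LogManagement.Directory"
$filter = Get-OctopusParameter "LogManagement.Filter"

# Hand the Octopus supplied values to the same scheduled task setup script that used to run at machine initialisation.
Install-LogManagementScheduledTask -Directory $directory -Filter $filter -RetentionPeriodInDays $retentionDays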

With a nice self contained Nuget package that would install a scheduled task for log management, the only thing left to do was create a new Octopus project to deploy the package.

I mostly use a 1-1 Nuget Package to Project strategy when using Octopus. I’ve found that it’s much easier to get a handle on the versions of components being deployed when each project only has one thing in it. In this case, the Nuget package was generic, and was then referenced from at least two projects: one to manage the logs on some API instances, and the other to manage logs on a log shipper component (which gets and processes ELB logs).

In order to support a fully automated build, test and deployment cycle, I also created a TeamCity Build Configuration. Like all Build Configurations, it watches a repository for changes, runs tests, packages and then deploys to the appropriate environments. This one was slightly different though, in that it deployed multiple Octopus projects, rather than a single one like almost all of our other Build Configurations. I’m not sure how I feel about this yet (because it’s different from what we usually do), but it’s definitely easier to manage, especially since all the projects just use the same package anyway.

Conclusion

This post might not look like much, and to be honest it isn’t. Last week’s post on Expression Trees was a bit exhausting, so I went for something much smaller this week.

That is not to say that the concepts covered herein are not important.

One thing that I’ve taken away from this is that if you ever think that installing something just once on machine initialization is enough, you’re probably wrong. Baking the installation of log management into our cfn-init steps caused us a whole bunch of problems once we realised we needed to update them because they didn’t quite do the job. It was much harder to change them on the fly without paying a ridiculous penalty in terms of downtime. It also felt really strange to have to migrate an entire environment just because the log files weren’t being cleaned up correctly.

When it comes down to it, the real lesson is to always pick the path that lets you react to change and deploy with the least amount of effort.


It’s that time of the release cycle where you need to see if your system will actually stand up to people using it. Ideally this is the sort of thing you do as you are developing the solution (so the performance and load tests grow and evolve in parallel with the solution), but I didn’t get that luxury this time.

Now that I’m doing something new, that means another series of blog posts about things that I didn’t understand before, and now partially understand! Mmmmmm, partial understanding, the goal of true champions.

At a high level, I want to break our new API. I want to find the point at which it just gives up, because there are too many users trying to do things, and I want to find out why it gives up. Then I want to fix that and do the whole process again, hopefully working my way up to how many users we are expecting + some wiggle room.

Over the next couple of weeks I’m going to be talking about JMeter (our load testing application of choice), Nxlog (a log shipping component for Windows), ELK (our log aggregator), AWS (not a huge amount) and some other things that I’ve done while working towards good, reproducible load tests.

Today’s topic is something that has bugged me for a few weeks, ever since I implemented shipping logs from our API instances into our log aggregator.

Configurific

Our log shipper of choice is Nxlog, being that we primarily use Windows servers. The configuration for Nxlog, the part that says what logs to process and how and where to send them, is/was installed at initialization time for the AWS instances inside our auto scaling group. This seemed like a good idea at the time, and it got us a working solution fairly quickly, which is always helpful.
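For those who haven’t seen one, an Nxlog configuration is a plain text file describing inputs, outputs and the routes between them. A stripped down example of the kind of thing I mean, where the host and port are placeholders:

<Input api_logs>
    Module    im_file
    File      "C:\\logs\\*.log"
</Input>

<Output logstash>
    Module    om_tcp
    Host      logstash.example.com
    Port      5544
</Output>

<Route api_to_logstash>
    Path      api_logs => logstash
</Route>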

Once I started actually using the data in the log aggregator though, I realised very quickly that I needed to make fairly frequent changes to the configuration, to either add more things being logged (windows event log for example) or to change fields or fix bugs. I needed to be able to deploy changes, and I didn’t want to have to do that by manually copying configuration files directly onto the X number of machines in the group. I certainly didn’t want to recreate the whole environment every time I wanted to change the log configuration either.

In essence, the configuration quickly became very similar to a piece of software. Something that is deployed, rather than just incorporated into the environment once when it is created. I needed to be able to version the configuration, easily deploy new versions and to track what versions were installed where.

Sounds like a job for Octopus Deploy.

Many Gentle Yet Strong Arms

I’ve spoken about Octopus Deploy a few times before on this blog, but I’ll just reiterate one thing.

It’s goddamn amazing.

Honestly, I’m not entirely sure how I lived without it. I assume with manual deployment of software and poor configuration management.

We use Octopus in a fairly standard way. When the instances are created in our environment they automatically install an Octopus Tentacle, and are tagged with appropriate roles. We then have a series of projects in Octopus that are configured to deploy various things to machines in certain roles, segregated by environment. Nothing special, very standard Octopus stuff.

If I wanted to leverage Octopus to manage my log configuration, I would need to package up my configuration files (and any dependencies, like Nxlog itself) into NuGet packages. I would then have to configure at least one project to deploy the package to the appropriate machines.

Tasty Tasty Nougat

I’ve used NuGet a lot before, but I’ve never really had to manually create a package. Mostly I just publish .NET libraries, and NuGet packaging is well integrated into Visual Studio through the usage of .csproj files. You don’t generally have to think about the structure of the resulting package much, as it’s taken care of by whatever the output of the project is. Even when producing packages for Octopus, intended to be deployed instead of just referenced from other projects, Octopack takes care of all the heavy lifting, producing a self contained package with all of the dependencies inside.

This time, however, there was no project.

I needed to package up the installer for Nxlog, along with any configuration files needed. I would also need to include some instructions for how the files should be deployed.

Luckily I had already written some scripts for an automated deployment of Nxlog, which were executed as part of environment setup, so all I needed to do was package all of the elements together, and give myself a way to easily create and version the resulting package (so a build script).
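The packaging part of that build script is essentially just a nuget pack call against a hand-rolled nuspec (more on that in a moment), with the version stamped in. A sketch, where the paths and the nuget executable variable are illustrative:

$version = "1.0.$buildNumber"
& $nugetExecutable pack "$here\nxlog.nuspec" -Version $version -OutputDirectory "$here\build-output" -NoPackageAnalysis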

I created a custom nuspec file to define the contents of the resulting package, and relied on the fact that Octopus runs appropriately named scripts at various times during the deployment lifecycle. For example, the Deploy.ps1 script is run during deployment. The Deploy script acts as an entry point and executes other scripts to silently install Nxlog, modify and copy the configuration file to the appropriate location and start the Nxlog service.
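A condensed sketch of what that Deploy.ps1 does (the install path and service name are the Nxlog defaults as far as I know, and the package layout is illustrative):

$here = Split-Path $script:MyInvocation.MyCommand.Path

# Silently install Nxlog from the MSI bundled in the package, waiting for it to finish.
Start-Process "msiexec.exe" -ArgumentList "/i `"$here\installer\nxlog.msi`" /quiet" -Wait

# Put the selected configuration file where the Nxlog service expects to find it.
Copy-Item "$here\config\nxlog.conf" "C:\Program Files (x86)\nxlog\conf\nxlog.conf" -Force

# Restart the service so it picks up the new configuration.
Restart-Service "nxlog"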

Alas Octopus does not appear to have any concept of uninstallation, so there is no way to include a cleanup component in the package. This would be particularly useful when deploying new versions of the package that now run a new version of Nxlog for example. Instead of having to include the uninstallation script inside the new package, you could deploy the original package with clean up code already encapsulated inside, ready to be run when uninstallation is called for (for example, just before installing the new package).

There was only one issue left to solve.

How to select the appropriate configuration file based on where the package is being deployed? I assumed that there will be multiple configuration files, because different machines require different logging. This is definitely not a one-size fits all kind of thing.

Selection Bias

In the end, the way in which I solved the configuration selection problem was a bit hacky. It worked (hacky solutions usually do), but it doesn’t feel nice and clean. I generally shy away from solutions like that, but as this continuous log configuration deployment was done while I was working on load/performance testing, I had to get something in place that worked, and I didn’t have time to sand off all the rough edges.

Essentially each unique configuration file gets its own project in Octopus, even though they all reference the same NuGet package. This project has a single step, which just installs the package onto the machines with the specified roles. Since machines could fill multiple roles (even if they don’t in our environments) I couldn’t use the current role to select the appropriate configuration file.

What I ended up doing was using the name of the project inside the Deploy script. The project name starts with NXLOG_ and the last part of the name is the name of the configuration file to select. Some simple string operations very quickly get the appropriate configuration file, and the project can target any number of roles and environments as necessary.
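The string operations in question are nothing special; a sketch, relying on the built in Octopus.Project.Name variable (the project name in the comment is made up):

# A project named "NXLOG_Api", for example, selects Api.conf from the package.
$projectName = $OctopusParameters["Octopus.Project.Name"]
$configName = $projectName.Substring("NXLOG_".Length)
$configurationFile = "$here\config\$configName.conf"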

It definitely isn't the greatest solution, but it gets the job done. There are some other issues that I can foresee as well, like the package being installed multiple times on the same machine through different projects, which definitely won’t give the intended result. That's partially a factor of the way Nxlog works though, so I’m not entirely to blame.

Results

In the end I managed to get something together that accomplished my goal. With the help of the build script (which will probably be incorporated into a TeamCity Build Configuration at some stage), I can now build, version and optionally deploy a package that contains Nxlog along with automated instructions on how and when to install it, as well as configuration describing what it should do. I can now make log changes locally on my machine, commit them, and have them appear on any number of servers in our AWS infrastructure within moments.

I couldn’t really ask for anything more.

I’ve already deployed about 50 tweaks to the log configuration since I implemented this continuous deployment, as a result of needing more information during load and performance testing, so I definitely think it has been worthwhile.

I like that the log configuration isn’t something that is just hacked together on each machine as well. It’s controlled in source and we have a history of all configurations that were previously deployed, and can keep an eye on what package is active in what environments.

Treating the log configuration like I would treat an application was definitely the right choice.

I’ve knocked together a repository containing the source for the Nxlog package and all of the supporting elements (i.e. build scripts and tests). As always, I can’t promise that I will maintain it moving forward, as the actual repository I’m using is in our private BitBucket, but this should be enough for anyone interested in what I’ve written about above.

Next time: Load testing and JMeter. I write Java for the first time in like 10 years!