
Unfortunately for us, we had to move our TeamCity build server into another AWS account recently, which is never a pleasant experience, though it is often eye opening.

We had to do this for a number of reasons, but the top two were:

  • We’re consolidating some of our many AWS accounts, to more easily manage them across the organisation.
  • We recently sold part of our company, and one of the assets included in that sale was a domain name that we were using to host our TeamCity server on.

Not the end of the world, but annoying all the same.

Originally our goal was to do a complete refresh of TeamCity, copying it into a new AWS account, with a new URL and upgrading it to the latest version. We were already going to be disrupted for a few days, so we thought we might as well make it count. We’re using TeamCity 8, and the latest is 10, which is a pretty big jump, but we were hopeful that it would upgrade without a major incident.

I should have remembered that in software development, hope is for fools. After the upgrade to TeamCity 10, the server hung on initializing for long enough that we got tired of waiting (and I’m pretty patient).

So we abandoned the upgrade, settling for moving TeamCity and rehosting it at a different URL that we still had legal ownership of.

That went relatively well. We needed to adapt some of our existing security groups in order to correctly grant access to various resources from the new TeamCity Server/Build Agents, but nothing we hadn’t dealt with a hundred times before.

Our builds seemed to be working fine, compiling, running tests and uploading nuget packages to either MyGet or Octopus Deploy as necessary.

As we executed more and more builds though, some of them started to fail.

Failures Are Opportunities To Get Angry

All failing builds were stopping in the same place: uploading the nuget package at the end of the process. Builds uploading to Octopus Deploy were fine (it’s a server within the same AWS VPC, so that’s not surprising), but a random sampling of builds uploading packages to MyGet had issues.

Investigating, the common theme of the failing builds was largish packages. Not huge, but at least 10 MB. The nuget push call would time out after 100 seconds, retrying a few times, but always hitting the same issue.

With 26 MB of data required to be uploaded for one of our packages (13 MB package, 13 MB symbols, probably should optimize that), this meant that the total upload speed we were getting was less than 300 KB/s, which is ridiculously low for something literally inside a data centre.

The strange thing was, we’d never had an issue with uploading large packages before. It wasn’t until we moved TeamCity and the Build Agents into a new AWS account that we started having problems.

Looking into the network configuration, the main differences I could determine were:

  • The old configuration used a proxy to get to the greater internet. Proxies are the devil, and I hate them, so when we moved into the new AWS account, we put NAT gateways in place instead. Invisible to applications, a NAT gateway is a far easier way to give internet access to machines that do not need to be exposed on the internet directly. 
  • Being a completely different AWS account means that there is a good chance those resources would be spun up on entirely different hardware. Our previous components were pretty long lived, so they had consistently been running on the same stuff for months.

At first I thought maybe the NAT gateway had some sort of upload limit, but uploading large payloads to other websites was incredibly fast. With no special rules in place for accessing the greater internet, the slow uploads to MyGet were an intensely annoying mystery.

There was another thing as well. We wrap our usages of Nuget.exe in Powershell functions, specifically to ensure we’re using the various settings consistently. One of the settings we set by default with every push was the timeout. It wasn’t set to 100 seconds though, it was set to 600.
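For reference, the wrapper looks something like this (the function name, nuget.exe location and defaults here are illustrative rather than our exact code); the important part is that the timeout is always passed explicitly to nuget.exe:

function Invoke-NugetPush
{
    param
    (
        [Parameter(Mandatory=$true)]
        [string]$Package,
        [Parameter(Mandatory=$true)]
        [string]$Source,
        [Parameter(Mandatory=$true)]
        [string]$ApiKey,
        [int]$TimeoutSeconds=600
    )

    # Always pass the timeout explicitly so every push behaves consistently.
    & ".\tools\nuget.exe" push $Package -Source $Source -ApiKey $ApiKey -Timeout $TimeoutSeconds
    if ($LASTEXITCODE -ne 0)
    {
        throw "nuget.exe push failed with exit code [$LASTEXITCODE] for package [$Package]."
    }
}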

Bugs, Bugs, Bugs

A while back I had to upgrade to the latest Nuget 3.5 release candidate in order to get a fix for a bug that was stopping us from deploying empty files from a package. It’s a long story, but it wasn’t something we could easily change. Unfortunately, the latest release candidate also has a regression in it where the timeout for the push is locked at 100 seconds, no matter what you do.

It’s been fixed since, but there isn’t another release candidate yet.

Rolling back to a version that allows the timeout to work correctly stops the empty file fix from working.

That whole song and dance is how software feels sometimes.

With no obvious way to simply increase the timeout, and because all other traffic seemed to be perfectly fine, it was time to contact MyGet support.

They responded that it’s something they’ve seen before, but they do not know the root cause. It appears to be an issue with the way that AWS is routing traffic to their Azure hosts. It doesn’t happen all the time, but when it does, it tanks performance. They suggested recycling the NAT gateway to potentially get it on new hardware (and thus give it a chance at getting access to better routes), but we tried that and it didn’t make a difference. We’ve since sent them some detailed Fiddler and network logs to help them diagnose the issue, but I wouldn’t be surprised if it was something completely out of their control.

On the upside, we did actually have a solution that was already working.

Our old proxy.

It hadn’t been shut down yet, so we configured the brand new shiny build agents to use the old proxy and lo and behold, packages uploaded in a reasonable time.
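One way to do that for nuget.exe specifically (which may or may not be exactly how we wired it up) is via the http_proxy setting in NuGet’s configuration, with the proxy address below being a placeholder:

& ".\tools\nuget.exe" config -set http_proxy=http://old-proxy.internal:3128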

This at least unblocked our build pipeline so that other things could happen while we continue to investigate.

Conclusion

Disappointingly, that’s where this blog post ends. The solution that we put into place temporarily with the old proxy (and I really hate proxies) is a terrible hack, and we’re going to have to spend some significant effort fixing it properly, because if that proxy instance dies we could be returned to exactly the same place without warning (if the underlying issue is something to do with routing that is out of our control).

Networking issues like the one I’ve described above are some of the most frustrating, especially because they can happen when you least expect it.

Not only are they basically unfathomable, there is very little you can do to actively fix the issue other than moving your traffic around trying to find a better spot.

Of course, we can also optimise the content of our packages to be as small as possible, hopefully making all of our artifacts small enough to be uploaded with the paltry amount of bandwidth available.

Having a fixed version of Nuget would be super nice as well.


We use TeamCity as our Continuous Integration tool.

Unfortunately, our setup hasn’t been given as much love as it should have. It’s not bad (or broken or any of those things), it’s just not quite as well set up as it could be, which increases the risk that it will break and makes it harder to manage (and change and keep up to date) than it could be.

As with everything that has got a bit crusty around the edges over time, the only real way to attack it while still delivering value to the business is by doing it slowly, piece by piece, over what seems like an inordinate amount of time. The key is to minimise disruption, while still making progress on the bigger picture.

Our setup is fairly simple. A centralised TeamCity server and at least 3 Build Agents capable of building all of our software components. We host all of this in AWS, but unfortunately, it was built before we started consistently using CloudFormation and infrastructure as code, so it was all manually setup.

Recently, we started using a few EC2 spot instances to provide extra build capabilities without dramatically increasing our costs. This worked fairly well, up until the spot price spiked and we lost the build agents. We used persistent requests, so they came back, but they needed to be configured again before they would hook up to TeamCity because of the manual way in which they were provisioned.

There’s been a lot of instability in the spot price recently, so we were dealing with this manual setup on a daily basis (sometimes multiple times per day), which got old really quickly.

You know what they say.

“If you want something painful automated, make a developer responsible for doing it manually and then just wait.”

It’s Automatic

The goal was simple.

We needed to configure the spot Build Agents to automatically bootstrap themselves on startup.

On the upside, the entire process wasn’t completely manual. We were at least spinning up the instances from a pre-built AMI that already had all of the dependencies for our older, crappier components as well as an unconfigured TeamCity Build Agent on it, so we didn’t have to automate absolutely everything.

The bootstrapping would need to tag the instance appropriately (because for some reason spot instances don’t inherit the tags of the spot request), configure the Build Agent and then start it up so it would connect to TeamCity. Ideally, it would also register and authorize the Build Agent, but if we used controlled authorization tokens we could avoid this step by just authorizing the agents once. Then they would automatically reappear each time the spot instance came back.

So tagging, configuring, service start, using Powershell, with the script baked into the AMI. During provisioning we would supply some UserData that would execute the script.

Not too complicated.
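For a Windows instance, the UserData can just be a block of Powershell wrapped in <powershell> tags, so the provisioning side is little more than a call to the script already baked into the AMI (the path and parameters below are hypothetical):

<powershell>
# Hypothetical script path and parameters; the real script lives in the AMI.
& "C:\scripts\Invoke-BootstrapBuildAgent.ps1" -TeamCityServerUrl "http://teamcity.internal" -AgentName $env:COMPUTERNAME
</powershell>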

Like Graffiti, Except Useful

Tagging an EC2 instance is pretty easy thanks to the multitude of toolsets that Amazon provides. Our tool of choice is the Powershell cmdlets, so the actual tagging was a simple task.

Getting permission to do the tagging was another story.

We’re pretty careful with our credentials these days, for some reason, so we wanted to make sure that we weren’t supplying and persisting any credentials in the bootstrapping script. That means IAM.

One of the key features of the Powershell cmdlets (and most of the Amazon supplied tools) is that they are supposed to automatically grab credentials if they are being run on an EC2 instance that currently has an instance profile associated with it.

For some reason, this would just not work. We tried a number of different things to get this to work (including updating the version of the Powershell cmdlets we were using), but in the end we had to resort to calling the instance metadata service directly to grab some credentials.

Obviously the instance profile that we applied to the instance represented a role with a policy that only had permissions to alter tags. Minimal permission set and all that.
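As a rough sketch of the workaround (the tag values and region below are illustrative), we grabbed the temporary credentials from the metadata service and handed them to the tagging cmdlet explicitly:

$metadataRoot = "http://169.254.169.254/latest/meta-data"

# The id of the instance we are currently running on.
$instanceId = Invoke-RestMethod "$metadataRoot/instance-id"

# The first call returns the name of the role attached via the instance profile,
# the second returns temporary credentials for that role.
$roleName = Invoke-RestMethod "$metadataRoot/iam/security-credentials/"
$credentials = Invoke-RestMethod "$metadataRoot/iam/security-credentials/$roleName"

$tag = New-Object Amazon.EC2.Model.Tag
$tag.Key = "Name"
$tag.Value = "[CI] TeamCity Spot Build Agent"

New-EC2Tag -Resource $instanceId -Tag $tag -Region "ap-southeast-2" -AccessKey $credentials.AccessKeyId -SecretKey $credentials.SecretAccessKey -SessionToken $credentials.Token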

Service With a Smile

Starting/stopping services with Powershell is trivial, and for once, something weird didn’t happen causing us to burn days while we tracked down some obscure bug that only manifests in our particular use case.

I was as surprised as you are.
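For completeness, the whole step amounts to something like this (assuming the agent was installed with the default TCBuildAgent service name, which may not match ours):

$serviceName = "TCBuildAgent"
# Start the Build Agent service if it is not already running.
if ((Get-Service -Name $serviceName).Status -ne "Running")
{
    Start-Service -Name $serviceName
}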

Configuration Is Key

The final step should have been relatively simple.

Take a file with some replacement tokens, read it, replace the tokens with appropriate values, write it back.

Except it just wouldn’t work.

After editing the file with Powershell (a relatively simple Get-Content | ForEach-Object { $_ -replace {token}, {value} } | Out-File) the TeamCity Build Agent would refuse to load.

Checking the log file, its biggest (and only) complaint was that the serverUrl (which is the location of the TeamCity server) was not set.

This was incredibly confusing, because the file clearly had a serverUrl value in it.

I tried a number of different things to determine the root cause of the issue, including:

  • Permissions? Was the file accidentally locked by TeamCity such that the Build Agent service couldn’t access it?
  • Did the rewrite of the tokens somehow change the format of the file (extra spaces, CR LF when it was just expecting LF)?
  • Was the serverUrl actually configured, but inaccessible for some reason (machine proxy settings for example) and the problem was actually occurring not when the file was rewritten but when the script was setting up the AWS Powershell cmdlets proxy settings?

Long story short, it turns out that Powershell doesn’t preserve file encoding when using the Out-File functionality in the way we were using it. It was changing the encoding of the file from ASCII to Unicode Little Endian (complete with a Byte Order Mark), and the Build Agent did not like that (it didn’t throw an encoding error either, which is super annoying, but whatever).

The error message was both a red herring (yes, it was configured) and also truthful (the Build Agent was incapable of reading the serverUrl).
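The fix, sketched here with an illustrative token and path, was simply to pin the encoding when writing the file back out:

$configPath = "C:\buildAgent\conf\buildAgent.properties"
$serverUrl = "http://teamcity.internal"

# The parentheses force the file to be fully read before it is overwritten,
# and -Encoding ASCII stops Out-File from switching the file to UTF-16.
(Get-Content $configPath) |
    ForEach-Object { $_ -replace "@@SERVER_URL@@", $serverUrl } |
    Out-File $configPath -Encoding ASCII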

Putting It All Together

With all the pieces in place, it was a relatively simple matter to create a new AMI with those scripts baked into it and put it to work straightaway.

Of course, I’d been doing this the whole time in order to test the process, so I certainly had a lot of failures building up to the final deployment.

Conclusion

Even simple automation can prove to be time consuming, especially when you run into weird unforeseen problems like components not performing as advertised, or not even reporting correct errors for you to use for debugging purposes.

Still, it was worth it.

Now I never have to manually configure those damn spot instances when they come back.

And satisfaction is worth its weight in gold.


The L in ELK stands for Logstash. This is not news.

When we put together the system we use for log aggregation, we needed to find a way to deal with Logstash such that it would fit into our relatively new (and still maturing) model for automated build and deployment. We put together a system that stores a deployable version of Logstash in Bitbucket, a build/package script written in Powershell that puts everything together into a Nuget package, TeamCity to listen for changes and then Octopus Deploy for managing the deployment.

The difficult part in this was that each place that required Logstash required a different configuration (because it was processing different files or transforming the data in different ways). There was a lot of commonality, but it was mostly in the actual installation and management of Logstash, rather than the configuration Logstash happened to be running at any particular point in time.

We have one git repository that contains everything necessary to run a copy of Logstash on a machine. This includes Logstash itself, the Java Runtime Environment (JRE) and a set of Powershell scripts that allow us to install Logstash as a Windows Service and choose which configuration it should be running. This meant that all of the configuration files for each of our Logstash deployments all lived in the same repository.

This worked okay for a little while, until we started adding more configuration files.

One Change Makes Many Work?

Suddenly, we had a bunch of Build Configurations in TeamCity triggering off changes to the one repository. They only triggered off changes to their own Logstash configurations at least, but they all triggered whenever we made changes to Logstash itself or to the scripts surrounding it. Technically, they were all building (and deploying) the same package (X.Logging.Logstash.nupkg), but each would generate a different version, and deploy to a different Octopus Deploy project. Luckily our versions are based off the current time, so it wasn’t like the version kept going up and down (because of the difference in number of builds), but there was the occasional conflict when two TeamCity tasks just happened to build on two different Build Agents within a few seconds of each other (which would generate identical packages).

The bigger issue was that each package was over 100 MB! Logstash is a good 80 MB by itself and the JRE is another 40 MB, so once you add in a MB or two for everything else, your package is huge.

Yes, technically we could deal with this issue by making sure Java and Logstash are installed on the target machines ahead of time, but I don’t like this from an isolation/self-reliance point of view. I want to be able to push a package with the minimal dependencies already in existence on the target machine, ideally just an Octopus tentacle and Powershell (for Windows boxes anyway). Anything else that is required should be contained within the package itself (or, in extreme cases, bootstrapped from somewhere after deployment (but that just moves the problem around slightly)).

Suddenly a checkin to the repository would gum up our CI build agents with a set of relatively unimportant tasks, stopping other builds from progressing and interrupting people’s work.

The easiest (but somewhat fiddly) solution was to split the concept of an installed version of Logstash from the configuration it happened to be running. With this approach we could deploy Logstash to a target machine once and then not have to pay the price of shifting that much data over the network every single time we wanted to alter the config. When we did want to upgrade Logstash, we could simply build a new version and have it deployed in the same way.

The plan was relatively simple. Create one repository for a deployable version of Logstash by itself (making sure to generalise it enough such that you could easily point it at any configuration you wanted) and then split out each configuration into a repository of its own. Whenever Logstash changes, it would be built and published to Octopus, but nothing would be deployed. Each configuration repository would be able to choose to upgrade to the new version (by changing the dependency in source) and then TeamCity would pick up the changes and run the normal build/package/deploy cycle for that configuration.

Executions Are Tiresome

As is almost always the case, coming up with the idea and its execution plan was a lot more fun than actually doing it.

The Logstash repository that we had already was pretty tightly coupled to the way it handled configuration. It actually used the current Octopus Project name during the installation to determine the configuration that it should be running, and each configuration really only consisted of a single Logstash conf file.

The first task was to generalise the Logstash installation, so that we could deploy it separately and then have the configuration projects use it via a known interface. Nothing particularly interesting here from a design standpoint, just a few Powershell functions. Execute-Logstash, Install-LogstashService (and its counterpart, Remove-LogstashService) and some helpers for generating configuration files based on templates (because sometimes during deployment you need to be able to substitute some deployment specific values into your configuration, like AWS Instance Id).
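To give a feel for the shape of the interface, here is a sketch only: the use of NSSM to wrap Logstash as a Windows Service is an assumption on my part, and the real functions do considerably more validation.

function Install-LogstashService
{
    param
    (
        [Parameter(Mandatory=$true)]
        [string]$LogstashDirectory,
        [Parameter(Mandatory=$true)]
        [string]$ConfigPath,
        [string]$ServiceName="logstash"
    )

    # Assumes NSSM and the Logstash binaries are part of the deployed package.
    $nssm = Join-Path $LogstashDirectory "tools\nssm.exe"
    $logstash = Join-Path $LogstashDirectory "logstash\bin\logstash.bat"

    # Register Logstash as a Windows Service pointed at the supplied configuration file.
    & $nssm install $ServiceName $logstash "-f `"$ConfigPath`""
    & $nssm set $ServiceName AppDirectory $LogstashDirectory

    Start-Service -Name $ServiceName
}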

The next task was taking one of the current configuration files and converting it into the new pattern, a repository of its own. This repository would need to contain everything necessary for the configuration of a Logstash instance, plus some tests to verify that the config file works as expected when given a set of known inputs.

It’s not overly easy to test a Logstash configuration, especially when it has never been tested before. Like all code, you need to be able to substitute certain values (like the location of the log files to read) and then figure out a way to measure the outcome, without changing the actual configuration too much. The approach I settled on was to parameterise the log locations like I mentioned above and to add an additional output during testing that wrote everything to a file. That way I could read the file in and check to make sure that it was outputting as many lines as I expected.
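The verification itself ends up being a fairly blunt instrument, something along these lines (paths, the wait time and the expected count are all illustrative, and stopping Logstash cleanly is glossed over):

$testOutputPath = "C:\temp\logstash-test-output.txt"
$expectedEvents = 10

# The test configuration reads a known sample log file and, in addition to its
# normal outputs, writes every event to $testOutputPath so we can measure it.
$logstash = Start-Process -FilePath "C:\logstash\bin\logstash.bat" -ArgumentList @("-f", "C:\configs\test\sample.conf") -PassThru
Start-Sleep -Seconds 60
Stop-Process -Id $logstash.Id -Force

$actualEvents = (Get-Content $testOutputPath | Measure-Object).Count
if ($actualEvents -ne $expectedEvents)
{
    throw "Expected [$expectedEvents] events in the test output, but found [$actualEvents]."
}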

The last task was to rebuild the Octopus Deploy project for the configuration to deploy both the Logstash deployable component and the configuration and verify that it installed and ran correctly on deployment. The most difficult part here was that different versions of each component were required, so we had to extend our Octopus scripts to handle this properly (i.e. step 1 which deploys Logstash needs to know that it should deploy version 1.5.2, but step 2 needs to deploy version 1.0.15235.12789 of the actual configuration).

Conclusion

I really should have structured our Logstash deployments in this way from the start. It’s almost always better to separate configuration from application code, especially when you don’t control the application. Often you will find that configuration changes a lot more than the code does, and when the code itself is quite large (as is the case with Logstash and its dependencies) it can get quite painful shifting all of those bytes around for no real reason.

But, alas, you can never see into the future with any sort of clarity, and you need to be willing to change things that you’ve done in the past when problems are identified or a better solution comes along.

Can’t get sentimental about these things.


God that title is terrible. Sorry.

If you’ve ever read any of my blog before, you would know that I’ve put a significant amount of effort into making sure that our environments can be spun up and down easily. A good example of this is the environment setup inside the Solavirum.Testing.JMeter repository, which allows you to easily setup a group of JMeter worker machines for the purposes of load testing.

These environment setup scripts are all well and good, but I really don’t like the fact that I end up executing them from my local machine. This can breed a host of problems, in the same way that compiling and distributing code from your own machine can. It’s fine for the rarely provisioned environments (like the JMeter workers I mentioned above) but it’s definitely not suitable for provisioning a CI environment for a web service or anything similar.

Additionally, when you provision environments from a developer machine you risk accidentally provisioning an environment with changes that have not been committed into source control. This can be useful for testing purposes, but ultimately leads to issues with configuration management. It’s nice to tag the commit that the environment was provisioned from as well, on both sides (in source control and on the environment itself, just like versioning a library or executable).

Luckily we already have a platform in place for centralizing our compiling and packaging, and I should be able to use it to do environment management as well.

TeamCity.

I’m Not Sure Why It’s Called TeamCity

If you’re unfamiliar with TeamCity, it’s similar to Jenkins. If you’re unfamiliar with Jenkins, it’s similar to TeamCity.

Ha.

Anyway, TeamCity is a CI (Continuous Integration) service. It allows you to set up build definitions and then run them on various triggers to produce artifacts (like installers, or Nuget packages). It does a lot more to be honest, but its core idea is about automating the build process in a controlled environment, free from the tyranny of developer machines.

The team here was using TeamCity before I started, so they already had some components being built, including a 20 step monstrosity that I will one day smooth out.

As a general rule, I love a good CI server, but I much prefer to automate in something like Powershell, so that it can be run locally if need be (for debugging/investigation purposes), so I’m wary of putting too much logic inside the actual CI server configuration itself. I definitely like to use the CI server for scheduling, history and other ancillary services (like tagging on successful builds and so on) though, the things that you don’t necessarily need when running a build locally.

Anyway, the meat of this blog post is about automating environment management using TeamCity, so I should probably talk about that now.

Most of my environment provisioning scripts have the same structure (with a few notable exceptions), so it was easy enough to create a Template in TeamCity to automate the destruction and recreation of an environment via scripts that already exist. The template was simple: a link to a git repository (configurable by repo name), a simple build step that just runs some Powershell, a trigger and some build parameters.

The only thing I can really copy here is the Powershell build step, so here it is:

try
{
    if ("%teamcity.build.branch.is_default%" -eq "false") 
    {
        Write-Error "Cannot create environment from non-default branch (i.e. not master)."
        exit 1
    }

    . ".\scripts\build\_Find-RootDirectory.ps1"

    $rootDirectory = Find-RootDirectory "."
    $rootDirectoryPath = $rootDirectory.FullName

    # Consider adding Migration check and run that instead if it exists.
    $environment = "%environment%"
    
    $restrictedEnvironmentRegex = "prod"
    if ($environment -match $restrictedEnvironmentRegex)
    {
        write-error "No. You've selected the environment named [$environment] to create, and it matches the regex [$restrictedEnvironmentRegex]. Think harder before you do this."
        Write-Host "##teamcity[buildProblem description='Restricted Environment Selected']"
        exit 1
    }
    
    $invokeNewEnvironmentPath = ".\scripts\environment\Invoke-NewEnvironment.ps1"
    $invokeDeleteEnvironmentPath = ".\scripts\environment\Invoke-DeleteEnvironment.ps1"

    if(-not (test-path $invokeNewEnvironmentPath) -or -not (test-path $invokeDeleteEnvironmentPath))
    {
        write-error "One of the expected environment management scripts (New: [$invokeNewEnvironmentPath], Delete: [$invokeDeleteEnvironmentPath]) could not be found."
        Write-Host "##teamcity[buildProblem description='Missing Environment Management Scripts']"
        exit 1
    }
    
    $bootstrapPath = ".\scripts\Functions-Bootstrap.ps1"
    if (-not (test-path $bootstrapPath))
    {
        Write-Warning "The bootstrap functions were not available at [$bootstrapPath]. This might not be important if everything is already present in the repository."
    }
    else
    {
        . $bootstrapPath
        Ensure-CommonScriptsAvailable
    }
    
    $octopusUrl = "%octopusdeploy-server-url%"
    $octopusApiKey = "%octopusdeploy-apikey%"
    $awsKey = "%environment-deployment-aws-key%"
    $awsSecret = "%environment-deployment-aws-secret%"
    $awsRegion =  "%environment-deployment-aws-region%"
    
    $arguments = @{}
    $arguments.Add("-Verbose", $true)
    $arguments.Add("-AwsKey", $awsKey)
    $arguments.Add("-AwsSecret", $awsSecret)
    $arguments.Add("-AwsRegion", $awsRegion)
    $arguments.Add("-OctopusApiKey", $octopusApiKey)
    $arguments.Add("-OctopusServerUrl", $octopusUrl)
    $arguments.Add("-EnvironmentName", $environment)
    
    try
    {
        Write-Host "##teamcity[blockOpened name='Delete Environment']"
        Write-Host "##teamcity[buildStatus text='Deleting $environment']"
        & $invokeDeleteEnvironmentPath @arguments
        Write-Host "##teamcity[buildStatus text='$environment Deleted']"
        Write-Host "##teamcity[blockClosed name='Delete Environment']"
    }
    catch
    {
        write-error $_
        Write-Host "##teamcity[buildProblem description='$environment Deletion Failed']"
        exit 1
    }

    try
    {
        $recreate = "%environment-recreate%"
        if ($recreate -eq "true")
        {
            Write-Host "##teamcity[blockOpened name='Create Environment']"
            Write-Host "##teamcity[buildStatus text='Creating $environment']"
            & $invokeNewEnvironmentPath @arguments
            Write-Host "##teamcity[buildStatus text='$environment Created']"
            Write-Host "##teamcity[blockClosed name='Create Environment']"
        }
    }
    catch
    {
        write-error $_
        Write-Host "##teamcity[buildProblem description='$environment Created Failed']"
        exit 1
    }
}
catch 
{
    write-error $_
    Write-Host "##teamcity[buildProblem description='$environment Created Failed']"
    exit 1
}

Once I had the template, I created new build configurations for each environment I was interested in, and filled them out appropriately.

Now I could recreate an entire environment just by clicking a button in TeamCity, and every successful recreation was tagged appropriately in Source Control, which was awesome. Now I had some traceability.

The final step was to schedule an automatic recreation of each CI environment every morning, to constantly validate our scripts and make sure they work appropriately.

Future Improvements

Alas, I ran into one of the most annoying parts of TeamCity: beyond the initial 20, licensing is partially based on build configurations. We already had a significant number of configs, so I ran out before I could implement a build configuration to do a nightly tear down of environments that don’t need to exist overnight (for example all our CI environments). I had to settle for merely recreating them each morning (tear down followed by spin up), which at least verifies that the scripts continue to work.

If I could change build parameters based on a Trigger in TeamCity, that would also work: I could simply set up two triggers, one for the morning to recreate and the other for the evening to tear down (where they both execute the same script, just with different inputs). That’s a missing feature for now, though it has been requested for a while, so I hope they get to it at some stage.

I’ll rectify this as soon as we get more build configurations. Which actually leads nicely into my next point.

So, What’s It Cost

It’s free! Kinda.

It’s free for 3 build agents and 20 build configurations. You can choose to buy another agent + 10 build configs for a token amount (currently $300 US), or you can choose to buy unlimited build configurations (the Enterprise edition) for another token amount (currently $2000 US).

If you’re anything like me, and you love to automate all the things, you will run out of build configurations far before you need more build agents.

I made the mistake of getting two build agent + configs packs through our organization before I realized that I should have just bought the Enterprise edition, and now I’m having a hard time justifying its purchase to the people who control the money. Unfortunate, but I’m sure I’ll convince them in time, and we’ve got an extra 2 build agents as well, so that’s always nice.

Jetbrains (the creators of TeamCity) were kind of annoying in this situation actually. We wanted to switch to Enterprise, and realized we didn’t need the build agents (yet), but they wouldn’t do us a deal. I can understand that it’s probably just their policy, but it’s still annoying.

Summary

I’m never happy unless what I’ve done can be executed on a machine completely different from my own, without a lot of futzing about. I’m a big proponent for “it should just work”, and having the environments triggered from TeamCity enforces that sort of thinking. Our build agents are pretty vanilla as far as our newer code is concerned (our legacy code has some nasty installed dependencies that I won’t go into detail about), so being able to execute the environment provisioning through TeamCity constantly validates that the code works straight out of source control.

It also lets other people create environments too, and essentially documents the usage of the environment provisioning scripts.

I get some nice side effects from doing environment management in TeamCity as well, the most important of which is the ability to easily tag when environments were provisioned (and from what commit) in source control.

Now I just need more build configurations…


This will hopefully be a short post, as I haven’t been up to anything particularly interesting recently. We deployed the service we’d been working on, and everything appears to be going swimmingly. The log aggregation that we setup is doing its job (allowing us high visibility into the operations of the service) and we’re slowly introducing new users, making sure they have a great experience.

As a result of the recent deployment, we’re in a bit of a lull, so I took this opportunity to work on our development infrastructure a little bit.

Well, I didn’t start out with the intent to work on our infrastructure. I just wanted to update this component we have that seeds demonstration data into our new service. When I got into the code though, I found that we had a bunch of shared code that was duplicated. Mostly code for dealing with the models inherent in the service and with authentication to the service. The approach that had been taken was to simply copy the code around, which is not generally a good idea in the long run. The demonstration data component didn’t have the copied code (yay!) but it was referencing the libraries of another solution directly from their output directory, which meant you had to remember to compile that other solution before the demo data, which is a bit crap.

All in all, it was in need of a clean up.

Fairly straightforward: split the common code into a different project and repository, create a new TeamCity build configuration to make it output a Nuget package, and reference it like that.

It’s never straightforward.

Free As In Beer

We currently use TeamCity as our Nuget server, with the outputs of various build configurations being added to the Nuget server when they are compiled. The downside of this is that you can’t actually publish packages to the TeamCity Nuget server from outside of TeamCity. Another downside of the TeamCity Nuget server is that the packages are tied to the builds that made them. We have had issues in the past with packages disappearing from the Nuget server because the build that created the package was cleaned up. It makes sense when you think about it, and we should definitely have pinned those builds to keep them forever, but it’s still another thing that we need to remember to do, which is vulnerable to human failure.

The free version of TeamCity is pretty great. 3 build agents and 20 build configurations are usually enough for small teams, and the product itself is pretty amazing. When you hit that build configuration limit though, you need to pay a few hundred dollars. That’s fine, it’s a solid product and I have no problems in giving JetBrains (the company that makes TeamCity) money for their efforts.

Getting money out of some organizations though, that can be a giant pain. Which it was.

Time for an alternate approach. Time to setup our own Nuget server.

Nuget makes this pretty easy. Just create a new, blank ASP.NET web application and add a reference to the Nuget Server package. Easy.
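From the Package Manager Console inside Visual Studio, the package side of that setup is a one-liner:

Install-Package NuGet.Server

After that, the package adds an apiKey setting to the application’s web.config that needs a value before anything can push packages to the server.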

When you’re building a piece of infrastructure though, you want to go to a bit of trouble to make it nicely encapsulated. So I leveraged some work we’d done as part of the new service and created a deployable package for our Nuget server, an Octopus project to deploy it and an environment for it to be deployed to via AWS CloudFormation. While I was there I added the logging components to make sure we have records of what is being uploaded and downloaded from the server.

Yak Shaved, Moving On

Now that I had a Nuget server separate from TeamCity, I could split out the common components into their own repository and construct a build/publish script to publish to the Nuget server from my local machine (with the intent to turn it into a TeamCity build configuration later, when we finally pay money).

I’ve done this before so I just grabbed my .NET Library Repository template and added the new solution with the project and its tests. The build script in the template automatically picks up the solution file, restores its packages, automatically generates a version, compiles it with MSBuild and then publishes a package for the single non-test project. Its pretty simple, I just use it to get reusable libraries up and running ASAP.
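In rough terms (the project name, tool paths and push destination below are illustrative, not the actual template), the script boils down to:

# Find the single solution file in the repository.
$solution = Get-ChildItem -Path . -Filter "*.sln" -Recurse | Select-Object -First 1

# Restore the solution's packages.
& ".\tools\nuget.exe" restore $solution.FullName

# Generate a version based on the current time.
$version = [DateTime]::UtcNow.ToString("yyyy.MM.dd.HHmm")

# Compile with MSBuild.
& "C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe" $solution.FullName /p:Configuration=Release

# Pack and publish the single non-test project to our internal Nuget server.
& ".\tools\nuget.exe" pack ".\src\Common\Common.csproj" -Version $version -Properties Configuration=Release
& ".\tools\nuget.exe" push ".\Common.$version.nupkg" -Source "http://nuget.internal/" -ApiKey $env:NugetApiKey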

I did have a few difficulties with untangling the common code from the rest, which is probably why it was copied in the first place. The easiest way to go about this sort of thing is to create a brand new project and namespace and just migrate the minimum amount of code that is needed by the other components. If you try and take everything at once, you’re going to have a bad time.

Now with the magic of Nuget I just need to add a reference to the new package in the various places that need it and delete the duplicated code, running the tests (where they exist) to ensure that I haven’t broken anything.

Summary

It’s important for the health of your code base to execute this sort of clean up when you see an issue like duplicated code. Some people would say that some duplicated code isn't too bad, and I would generally agree. However, you need to be able to tell a very good story for why the code is duplicated, and not just accept that it is. It’s usually not too hard to extract a library and create a Nuget package, assuming you have the infrastructure setup to support it. Which you should. If you don’t, you should set it up.

If you’ve never heard the term “Shaving a Yak” before, you can blame Scott Hanselman for that one. He did a blog post on the concept and I have never seen anything that better describes an average day in the life of a software developer. I use the term a lot whenever I’m doing something I never expected to be doing due to some crazy causality chain, which happens more than I would like.