Pack Of Lies

March 14. 2017 0 Comments

I use Packer on and off. Mostly I use it to make Amazon Machine Images (AMI’s) for our environment management packages, specifically by creating Packer templates that operate on top of the Amazon supplied Windows Server images.

You should never use an Amazon supplied Windows Server AMI in your Auto Scaling Group Launch Configurations. These images are regularly retired, so if you’ve taken a dependency on one, there is a good chance it will disappear just when you need it most. Like when you need to auto-scale your API cluster because you’ve unknowingly burnt through all of the CPU credits you had on the machines slowly over the course of the last few months. What you should do is create an AMI of your own from the ones supplied by AWS so you can control its lifetime. Packer is a great tool for this.

A Packer template is basically a set of steps to execute on a virtual machine of some sort, where the core goal is to take some sort of baseline thing, apply a set of steps to it programmatically and end up with some sort of reusable thing out the other end. Like I mentioned earlier, we mostly deal in AWS AMI’s, but it can do a bunch of other things as well (VWWare, Docker, etc).

The benefits of using a Packer template for this sort of thing (instead of just doing it all manually) is reproducibility. Specifically, if you built your custom image using the AWS AMI for Windows Server 2012 6 months ago, you can go and grab the latest one from yesterday (with all of the patches and security upgrades), execute your template on it and you’ll be in a great position to upgrade all of the existing usages of your old custom AMI with minimal effort.

When using Packer templates though, you need to be cognizant of how errors are dealt with. Specifically:

Step failures appear to be indicated entirely by the exit code of the tool used in the step.

I’ve been bitten by this on two separate occasions.

A Powerful Cry For Help

Packer has much better support for Windows than it once did, but even taking that into account, Powershell steps can still be a troublesome beast.

The main issue with the Powershell executable is that if an error or exception occurs and terminates the process (i.e. its a terminating error of you have ErrorActionPreference set to Stop) the Powershell process itself still exits with zero.

In a sane world, an exit code of zero indicates success, which is what Packer expects (and most other automation tools like TeamCity/Octopus Deploy).

If you don’t take this into account, your Powershell steps may fail but the Packer execution will still succeed, giving you an artefact that hasn’t been configured the way it should have been.

Packer is pretty configurable though, and is very clear about the command that it uses to execute your Powershell steps. The great thing is, it also enables you to override that command, so you can customise your Powershell steps to exit with a non-zero code if an error occurs without actually having to change every line in your step to take that sort of thing into account.

Take this template excerpt below, which uses Powershell to set the timezone of the machine and turn off negative DNS result caching.

{
    "type": "powershell",
    "inline": [
        "tzutil.exe /s \"AUS Eastern Standard Time_dstoff\"",
        "[Microsoft.Win32.Registry]::SetValue('HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Dnscache\\Parameters','NegativeCacheTime',0,[Microsoft.Win32.RegistryValueKind]::DWord)"
    ],
    "execute_command": "powershell -Command \"$ErrorActionPreference = 'Stop'; try { & '{{.Path}}' } catch { Write-Warning $_; exit 1; } \""
}

The “execute_command” is the customisation, providing error handling for exceptions that occur during the execution of the Powershell snippet. Packer will take each line in that inline array, copy it to a file on the machine being setup (using WinRM) and then execute it using the command you specify. The {{.Path}} syntax is the Packer variable substitution and specifically refers to the path on the virtual machine that packer has copied the current command to. With this custom command in place, you have a much better chance of catching errors in your Powershell commands before they come back to bite you later on.

So Tasty

In a similar vein to the failures with Powershell above, be careful when doing package installs via yum on Linux.

The standard “yum install” command will not necessarily exit with a non-zero code when a package fails to install. Sometimes it will, but if a package couldn’t be found (maybe you misconfigured the repository or something) it still exits with a zero.

This can throw a pretty big spanner in the works when you’re expecting your AMI to have Elasticsearch on it (for example) and it just doesn’t because the package installation failed but Packer thought everything was fine.

Unfortunately, there is no easy way to get around this like there is for the Powershell example above, but you can mitigate it by just adding an extra step after your package install that validates the package was actually installed.

{
    "type" : "shell",
    "inline" : [
        "sudo yum remove java-1.7.0-openjdk -y",
        "sudo yum install java-1.8.0 -y",
        "sudo yum update -y",
        "sudo sh -c 'echo \"[logstash-5.x]\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo sh -c 'echo \"name=Elastic repsitory for 5.x packages\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo sh -c 'echo \"baseurl=https://artifacts.elastic.co/packages/5.x/yum\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo sh -c 'echo \"gpgcheck=1\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo sh -c 'echo \"gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo sh -c 'echo \"enabled=1\" >> /etc/yum.repos.d/logstash.repo'",
        "sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch",
        "sudo yum install logstash-5.2.2 -y",
        "sudo rpm --query logstash-5.2.2"
    ]
}

In the example above, the validation is the rpm --query command after the yum install. It will return a non-zero exit code (and thus fail the Packer execution) if the package with that version is not installed.

Conclusion

Packer is an incredibly powerful automation tool for dealing with a variety of virtual machine platforms and I highly recommend using it.

If you’re going to use it though, you need to understand what failure means in your specific case, and you need to take that into account when you decide how to signal to the Packer engine that something isn’t right.

For me, I prefer to treat every error as critical, because I prefer to deal with them at the time the AMI is created, rather than 6 months later when I try to use the AMI and can’t figure out why the Windows Firewall on an internal API instance is blocking requests from its ELB. Not that that has ever happened of course.

In order to accomplish this lofty goal of dealing with errors ASAP you need to understand how each one of your steps (and the applications and tools they use) communicate failure, and then make sure they all communicate that appropriately in a way Packer can understand.

Understanding how to deal with failure is useful outside Packer too.