Building A Better Beast, Part 3

April 18, 2017

Posted in:
elk
logging
octopus

Continuing on from last week, it’s time to talk software and the configuration thereof.

With the environments themselves (i.e. the underlying infrastructure) being managed by Octopus, we need to deal with the software side of things.

Four of the five components in the new ELK stack require configuration of some sort in order to work properly:

Logstash requires configuration to tell it how to get events, how to filter/process them and where to put them, so the Broker and Indexer layers require different, but similar, configuration.
Elasticsearch requires configuration for a variety of reasons including cluster naming, port setup, memory limits and so on.
Kibana requires configuration to know where Elasticsearch is, and a few other things.

For me, configuration is a hell of a lot more likely to change than the software itself, although with the pace that some organisations release software, that might not always be strictly true. Also, coming from a primarily Windows background, software is traditionally a lot more difficult to install and set up, and by that logic is not something you want to do all the time.

Taking those things into account, I’ve found it helpful to separate the installation of the software from the configuration of the software. What this means in practice is that a particular version of the software itself will be baked into an AMI, and then the configuration of that software will be handled via Octopus Deploy whenever a machine is created from the AMI.

Using AMIs or Docker Images to create immutable software + configuration artefacts is also a valid approach, and is superior in a lot of respects. It makes dynamic scaling easier (by facilitating a quick startup of fully functional nodes), helps with testing and generally simplifies the entire process. Docker Images in particular are something that I would love to explore in the future, just not right at this moment.

The good news is that this is pretty much exactly what the new stack was already doing, so we only need to make a few minor improvements and we’re good to go.

The Lament Configuration

As I mentioned in the first post in this series, software configuration was already being handled by TeamCity/Nuget/Octopus Deploy; it just needed to be cleaned up a bit. The first thing was to move the configuration for each layer out into its own repository and rework the TeamCity builds as necessary. The Single Responsibility Principle doesn’t just apply to classes, after all.

The next part is something of a personal preference, and it relates to the logic around deployment. All of the existing configuration deployments in the new stack had their logic (i.e. where to copy the files on the target machine, how to start/restart services and so on) encapsulated entirely inside Octopus Deploy. I’m not a fan of that. I much prefer to have all of this logic inside scripts in source control alongside the artefacts that will be deployed. This leaves projects in Octopus Deploy relatively simple, only responsible for deploying Nuget packages, managing variables (which is hard to encapsulate in source control because of sensitive values) and generally overseeing the whole process. This is the same sort of approach that I use for building software, with TeamCity acting as a relatively stupid orchestration tool, executing scripts that live inside source control with the code.

Octopus actually makes using source controlled scripts pretty easy, as it will automatically execute scripts named a certain way at particular times during the deployment of a Nuget package (for example, any script called deploy.ps1 at the root of the package will be executed after the package has been copied to the appropriate location on the target machine). The nice thing is that this also works with bash scripts for Linux targets (i.e. deploy.sh), which is particularly relevant here, because all of the ELK stuff happens on Linux.
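The same convention covers scripts that run earlier in the deployment. The ELK Broker package relies on a pre-deploy script to clear the Logstash configuration directory, but that script isn't shown in this post, so the following is only a sketch of what one might look like. The function name and the LOGSTASH_CONF_DIR variable are made up for illustration, and it assumes the deploying user is allowed to delete files in that directory.

```shell
#!/usr/bin/env bash
# Sketch of a pre-deploy script for the Logstash configuration package.
# clear_config_dir and LOGSTASH_CONF_DIR are illustrative names, not taken
# from the real package.

# Delete the *.conf files directly inside the given directory.
# A missing directory is treated as "nothing to clear".
clear_config_dir() {
    local dir="$1"
    [ -d "$dir" ] || return 0
    find "$dir" -maxdepth 1 -type f -name '*.conf' -delete
}

# Default to the directory that the deploy script copies into
clear_config_dir "${LOGSTASH_CONF_DIR:-/etc/logstash/conf.d}" || exit 1
```

Clearing the directory first means that a config file removed from source control also disappears from the machine, rather than lingering from a previous deployment.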

Actually deploying most of the configuration is pretty simple. For example, this is the deploy.sh script for the ELK Broker configuration.

# The deploy script is automatically run by Octopus during a deployment, after Octopus does its thing.
# Octopus deploys the contents of the package to /tmp/elk-broker/
# At this point, the logstash configuration directory has been cleared by the pre-deploy script

# Echo commands after expansion
set -x

# Copy the settings file
cp /tmp/elk-broker/logstash.yml /etc/logstash/logstash.yml || exit 1

# Copy the config files
cp /tmp/elk-broker/*.conf /etc/logstash/conf.d/ || exit 1

# Remove the UTF-8 BOM from the config files
sed -i '1 s/^\xef\xbb\xbf//' /etc/logstash/conf.d/*.conf || exit 1

# Test the configuration (--path.settings takes the directory that contains logstash.yml, not the file itself)
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/ -t --path.settings /etc/logstash || exit 1

# Set the ownership of the config files to the logstash user, which is what the service runs as
sudo chown -R logstash:logstash /etc/logstash/conf.d || exit 1

# Restart logstash - don't use the restart command, it throws errors when you try to restart a stopped service
sudo initctl stop logstash || true
sudo initctl start logstash

Prior to the script based approach, this logic was spread across five or six different steps inside an Octopus Project, which I found much harder to read and reason about.
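The BOM removal step is the least obvious line in that script, and it is easy to check in isolation. The sketch below wraps the same sed expression in a function and runs it against a throwaway file. The helper names (strip_bom, has_bom) are illustrative, and it assumes GNU sed, which understands the \xHH escapes.

```shell
#!/usr/bin/env bash
# Strip a UTF-8 byte order mark (EF BB BF) from the start of a file.
# LC_ALL=C makes sed treat the pattern as raw bytes.
strip_bom() {
    LC_ALL=C sed -i '1 s/^\xef\xbb\xbf//' "$1"
}

# Succeed if the file begins with the three BOM bytes
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Try it on a scratch file that starts with a BOM
sample="$(mktemp)"
printf '\xef\xbb\xbfinput { stdin {} }\n' > "$sample"

has_bom "$sample" && echo "BOM present before stripping"
strip_bom "$sample"
has_bom "$sample" || echo "BOM gone after stripping"

rm -f "$sample"
```

Anchoring the substitution to line 1 and to the start of the line means that only a leading BOM is touched and the rest of the file is left alone.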

Or Lemarchand’s Box If You Prefer

The only other difference worth talking about is the way in which we actually trigger configuration deployments.

Traditionally, we have asked Octopus to deploy the most appropriate version of the necessary projects during the initialization of a machine. For example, the ELK Broker EC2 instances had logic inside their LaunchConfiguration:UserData that said “register as Tentacle”, “deploy X”, “deploy Y” etc.
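To make the shape of that approach concrete, here is a deliberately simplified sketch of the ordering. Both helper functions are hypothetical placeholders that only print what a real step would do, and the environment name "CI" is invented; the real UserData is not shown in this post.

```shell
#!/usr/bin/env bash
# Sketch of the machine-driven approach. Both helpers are hypothetical
# placeholders that print a message instead of talking to Octopus.
register_as_tentacle() {
    echo "register as Tentacle in environment: $1"
}

deploy_project() {
    echo "deploy project: $1 to environment: $2"
}

# Register first, then ask for each project in turn.
# Every project has to be listed by whoever writes the machine definition.
bootstrap_machine() {
    local environment="$1"
    shift
    register_as_tentacle "$environment" || return 1
    local project
    for project in "$@"; do
        deploy_project "$project" "$environment" || return 1
    done
}

bootstrap_machine "CI" "X" "Y"
```

The important detail is the hardcoded project list in the last line: changing what gets deployed to a machine means changing the infrastructure definition.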

This time I tried something a little different, but which feels a hell of a lot smarter.

Instead of the machine being responsible for asking for projects to be deployed to it, we can just let Octopus react to the registration of a new Tentacle and deploy whatever Projects are appropriate. This is relatively easy to set up as well. All you need to do is add a trigger to your Project that says “deploy whenever a new machine comes online”. Octopus takes care of the rest, including picking what version is best (which is just the last successful deployment to the environment).

This is a lot cleaner than hardcoding project deployment logic inside the environment definition, and allows for changes to what software gets deployed where without actually having to edit or update the infrastructure definition. This sort of automatic deployment approach is probably more useful for our old way of handling environments (i.e. that whole terrible migration process with no update logic) than it is for the newer, easier to update environment deployments, but it’s still nice all the same.

Conclusion

There really wasn’t much effort required to clean up the configuration for each of the layers in the new ELK stack, but it was a great opportunity to try out the new trigger based deployments in Octopus Deploy, which was pretty cool.

With the configuration out of the way, and the environment creation also sorted, all that’s left is to actually create some new environments and start using them instead of the old one.

That’s a topic for next time though.