Many Logstash Makes Light Work
The L in ELK stands for Logstash. This is not news.
When we put together the system we use for log aggregation, we needed to find a way to deal with Logstash such that it would fit into our relatively new (and still maturing) model for automated build and deployment. We put together a system that stores a deployable version of Logstash in Bitbucket, a build/package script written in Powershell that puts everything together into a Nuget package, TeamCity to listen for changes and then Octopus Deploy for managing the deployment.
The difficult part in this was that each place that required Logstash required a different configuration (because it was processing different files or transforming the data in different ways). There was a lot of commonality, but it was mostly in the actual installation and management of Logstash, rather than the configuration Logstash happened to be running at any particular point in time.
We have one git repository that contains everything necessary to run a copy of Logstash on a machine. This includes Logstash itself, the Java Runtime Environment (JRE) and a set of Powershell scripts that allow us to install Logstash as a Windows Service and choose which configuration it should be running. This meant that all of the configuration files for each of our Logstash deployments all lived in the same repository.
This worked okay for a little while, until we started adding more configuration files.
One Change Makes Many Work?
Suddenly, we had a bunch Build Configurations in TeamCity triggering off changes to the one repository. They only triggered off changes to their own Logstash configurations at least, but they all triggered whenever we made changes to Logstash itself or to the scripts surrounding it. Technically, they were all building (and deploying) the same package (X.Logging.Logstash.nupkg), but each would generate a different version, and deploy a different Octopos Deploy project. Luckily our versions are based off the current time, so it wasn’t like the version kept going up and down (because of the difference in number of builds), but there was the occasional conflict when two TeamCity tasks just happened to build on two different Build Agents within a few seconds of each other (which would generate identical packages).
The bigger issue was that each package was over 100MB! Logstash is a good 80MB by itself and the JRE is another 40MB, so once you add in a MB or two for everything else, your package is huge.
Yes, technically we could deal with this issue by making sure Java and Logstash are installed on the target machines ahead of time, but I don’t like this from an isolation/self-reliance point of view. I want to be able to push a package with the minimal dependencies already in existence on the target machine, ideally just an Octopus tentacle and Powershell (for Windows boxes anyway). Anything else that is required should be contained within the package itself (or, in extreme cases, bootstrapped from somewhere after deployment (but that just moves the problem around slightly)).
Suddenly a checkin to the repository would gum up our CI build agents with a set of relatively unimportant tasks, stopping other builds from progressing and interrupting peoples work.
The easiest (but somewhat fiddly) solution was to split the concept of an installed version of Logstash from the configuration it happened to be running. With this approach we could deploy Logstash to a target machine once and then not have to pay the price of shifting that much data over the network every single time we wanted to alter the config. When we did want to upgrade Logstash, we could simply build a new version and have it deployed in the same way.
The plan was relatively simple. Create one repository for a deployable version of Logstash by itself (making sure to generalise it enough such that you could easily point it at any configuration you wanted) and then split out each configuration into a repository of its own. Whenever Logstash changes, it would be built and published to Octopus, but nothing would be deployed. Each configuration repository would be able to choose to upgrade to the new version (by changing the dependency in source) and then TeamCity would pick up the changes and run the normal build/package/deploy cycle for that configuration.
Executions Are Tiresome
As is almost always the case, coming up with the idea and its execution plan was a lot more fun than actually doing it.
The Logstash repository that we had already was pretty tightly coupled to the way it handled configuration. It actually used the current Octopus Project name during the installation to determine the configuration that it should be running, and each configuration really only consisted of a single Logstash conf file.
The first task was to generalise the Logstash installation, so that we could deploy it separately and then have the configuration projects use it via a known interface. Nothing particularly interesting here from a design standpoint, just a few Powershell functions. Execute-Logstash, Install-LogstashService (and its counterpart, Remove-LogstashService) and some helpers for generating configuration files based on templates (because sometimes during deployment you need to be able to substitute some deployment specific values into your configuration, like AWS Instance Id).
The next task was taking one of the current configuration files and converting it into the new pattern, a repository of its own. This repository would need to contain everything necessary for the configuration of a Logstash instance, plus some tests to verify that the config file works as expected when given a set of known inputs.
Its not overly easy to test a Logstash configuration, especially when it has never been tested before. Like all code, you need to be able to substitute certain values (like the location of the log files to read) and then figure out a way to measure the outcome, without changing the actual configuration too much. The approach I settled on was to parameterise the log locations like I mentioned above and to add an additional output during testing that wrote everything to a file. That way I could read the file in and check to make sure that it was outputting as many lines as I expected.
The last task was to rebuild the Octopus Deploy project for the configuration to deploy both the Logstash deployable component and the configuration and verify that it installed and ran correctly on deployment. The most difficult part here was that different versions of each component were required, so we had to extend our Octopus scripts to handle this properly (i.e. step 1 which deploys Logstash needs to know that it should deploy version 1.5.2, but step 2 needs to deploy version 1.0.15235.12789 of the actual configuration).
I really should have structured our Logstash deployments in this way from the start. Its almost always better to separate configuration from application code, especially when you don’t control the application. Often you will find that configuration changes a lot more than the code does, and when the code itself is quite large (as is the case with Logstash and its dependencies) it can get quite painful shifting all of those bytes around for no real reason.
But, alas, you can never see into the future with any sort of clarity, and you need to be willing to change things that you’ve done in the past when problems are identified or a better solution comes along.
Can’t get sentimental about these things.