Building A Better Beast, Part 4

April 25, 2017

With environments and configuration out of the way, it’s time to put all of the pieces together.

Obviously this isn’t the first time that both of those things have been put together. In order to validate that everything was working as expected, I was constantly creating/updating environments and deploying new versions of the configuration to them. Not only that, but with the way our deployment pipeline works (commit, push, build, test, deploy [test], [deploy]), the CI environment had been up and running for some time.

What’s left to do then?

Well, we still need to create the Staging and Production environments, which should be easy because those are just deployments inside Octopus now.

The bigger chunk of work is to use those new environments and to redirect all of our existing log traffic as appropriate.

Hostile Deployment

This is a perfect example of why I spend time on automating things.

With the environments set up to act just like everything else in Octopus, all I had to do to create a Staging environment was push a button. Once the deployment finished and the infrastructure was created, it was just another button push to deploy the configuration for that environment to make it operational. Rinse and repeat for all of the layers (Broker, Indexer, Cache, Elasticsearch) and Staging is up and running.

Production was almost the same, with one caveat. We use an entirely different AWS account for all our production resources, so we had to override all of the appropriate Octopus variables for each environment project (like AWS Credentials, VPC ID, Subnet IDs, etc.). With those overrides in place, all that’s left is to make new releases (to capture the variables) and deploy to the appropriate environments.

It’s nice when everything works.

Redirecting Firepower

Of course, the new logging environments are worthless without log events. Luckily, we have plenty of those:

IIS logs from all of our APIs
Application logs from all of our APIs
ELB logs from a subset of our load balancers, most of which are APIs, but at least one is an Nginx router
Simple performance counter statistics (like CPU, Memory, Disk, etc.) from basically every EC2 instance
Logs from on-premises desktop applications

We generally have CI, Staging and prod-X (green/blue) environments for all of our services/APIs (because it’s how our build/test/deployment pipeline works), so now that we have similarly partitioned logging environments, all we have to do is line them up (CI to CI, Staging to Staging and so on).

For the on-premises desktop applications, there is no CI, but they do generally have the capability to run in Staging mode, so we can use that setting to direct log traffic.
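To make the lining up concrete, here is a rough sketch in Python of how a service environment might be matched to a logging environment. None of this is our actual code: the function names are made up for illustration, and it assumes that the prod-X (green/blue) environments all share the Production logging environment and that a desktop application not running in Staging mode logs to Production.

```python
def logging_environment_for(service_environment: str) -> str:
    """Map a service environment name to a logging environment name.

    Hypothetical helper: the names and the prod-X rule are assumptions.
    """
    name = service_environment.strip().lower()
    if name == "ci":
        return "CI"
    if name == "staging":
        return "Staging"
    # Assumption: prod-green, prod-blue and so on all log to Production.
    if name == "production" or name.startswith("prod-"):
        return "Production"
    raise ValueError(f"Unknown environment: {service_environment!r}")


def desktop_logging_environment(staging_mode: bool) -> str:
    """Desktop applications have no CI, only a Staging mode switch.

    Assumption: anything not in Staging mode logs to Production.
    """
    return "Staging" if staging_mode else "Production"
```

Failing loudly on an unknown name is deliberate in this sketch, since quietly sending events from an unrecognised environment to Production would defeat the point of partitioning.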

There are a few ways in which the log events hit the Broker layer:

Internal Logstash instance running on an EC2 instance with a TCP output pointing at the Broker hostname
Internal Lambda function writing directly to the Broker hostname via TCP (this is the ELB logs processor)
External application writing to an authenticated Logging API, which in turn writes to the Broker via TCP (this is for the on-premises desktop applications)

We can change the hostname used by all of these mechanisms simply by changing some variables in Octopus Deploy, making a new release and deploying it through the environments.
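As an illustration of why this is only a variable change, here is a minimal sketch in Python of a sender that writes a log event to the Broker over TCP. It is not our Lambda function or Logging API code. The environment variable names, the default port and the newline-delimited JSON framing are all assumptions made for the example. The only point is that the hostname lives in configuration, not in code.

```python
import json
import os
import socket

# Hypothetical variable names and defaults. In a setup like ours the real
# values would be substituted from Octopus variables at deployment time.
BROKER_HOST = os.environ.get("LOG_BROKER_HOST", "localhost")
BROKER_PORT = int(os.environ.get("LOG_BROKER_PORT", "5000"))


def send_log_event(event: dict,
                   host: str = BROKER_HOST,
                   port: int = BROKER_PORT,
                   timeout: float = 5.0) -> None:
    """Send one log event to the Broker as a single line of JSON over TCP.

    Assumption: the Broker accepts newline-delimited JSON on a plain TCP port.
    """
    payload = (json.dumps(event) + "\n").encode("utf-8")
    # Open a connection, write the event, and close the connection again.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

Pointing a sender like this at a different logging environment means deploying it with a different LOG_BROKER_HOST value, with no change to the code itself.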

And that’s exactly what we did, making sure to monitor the log event traffic for each one to make sure we didn’t lose anything.

With all the logs going to their new homes, all that was left to do was destroy the old log stack, easily done manually through CloudFormation.

You might be wondering about any log events that were stored in the old stack. Well, we generally only keep around 14 days’ worth of log events in the stack itself (because there are so many), so we pretty much just left the old stack up for searching purposes until it was no longer relevant, and then destroyed it.

Conclusion

And that basically brings us to the end of this series of posts about our logging environment and the reclamation thereof.

We’ve now got our standard deployment pipeline in place for both infrastructure and configuration and have separated our log traffic accordingly.

This puts us in a much better position moving forward. Not only is the entire thing fully under our control, but we now have the capability to test changes to infrastructure/configuration before just deploying them into production, something we couldn’t do before when we only had a single stack for everything.

In all fairness though, all we really did was polish an existing system so that it was a better fit for our specific usage.

Evolutionary, not revolutionary.