
We’ve had full control over our Elasticsearch field mappings for a while now, but to be honest, once we got the initial round of mappings out of the way, we haven’t really had to deal with too many deployments. Sure, they happen every now and then, but it’s not something we do every single day.

In the intervening time period, we’ve increased the total amount of data that we store in the ELK stack, which has had an unfortunate side effect when it comes to the deployment of more mappings.

It tends to take the Elasticsearch cluster down.

Is The Highway To Hell Paved With Good Intentions?

When we originally upgraded the ELK stack and introduced the deployment of the Elasticsearch configuration, it was pretty straightforward. The deployment consisted of only those settings related to the cluster/node, and said settings wouldn’t be applied until the node was restarted anyway, so it was an easy decision to just force a node restart whenever the configuration was deployed.

When deploying to multiple nodes, the first attempt at orchestration just deployed sequentially, one node at a time, restarting each one and then waiting for the node to rejoin the cluster before moving on.
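
For reference, the “wait for the node to rejoin” part doesn’t need anything fancier than the cluster health API. A minimal sketch (the endpoint and timeout below are placeholders for illustration, not necessarily what our script uses):

# Block until the node reports the cluster as at least yellow again, or give up after two minutes.
# --fail makes curl exit non-zero if Elasticsearch responds with a timeout (HTTP 408).
curl --silent --fail "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=120s" || exit 1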

This approach… kind of worked.

It worked well enough when a fresh node was joining the cluster (i.e. after an auto scale) or when applying simple configuration changes (like log settings), but it tended to fall apart when applying major configuration changes or when the cluster did not already exist (i.e. initial creation).

The easy solution was to just deploy to all nodes at the same time, which pretty much guaranteed a small downtime while the cluster reassembled itself.

Considering that we weren’t planning on deploying core configuration changes all that often, this seemed like a decent enough compromise.

Then we went and included index templates and field mappings in the configuration deployment.

Each time we deployed a new field mapping the cluster would go down for a few moments, but it would usually come good again shortly after. Of course, this was when we still only had a week’s worth of data in the cluster, so it was pretty easy for it to crunch through all of the indexes and shards and sort them out when it came back online.

Now we have a little over a month’s worth of data, and every time the cluster goes down it takes a fair while to come back.

That’s real downtime for no real benefit, because most of the time we’re just deploying field mappings, which can actually just be updated using the HTTP API, no restart required.
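
For example, assuming the field mappings live in an index template (the template name, endpoint and file path below are placeholders, and the exact body format depends on the Elasticsearch version), pushing an update is a single HTTP call:

# Upload (or overwrite) the index template that contains the field mappings.
# New indexes matching the template's pattern pick the mappings up automatically; no restart involved.
curl --silent --fail -XPUT "http://localhost:9200/_template/logstash" \
    -H "Content-Type: application/json" \
    -d @/tmp/elk-elasticsearch/logstash-template.json || exit 1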

Dirty Deployments, Done Dirt Cheap

This situation could have easily been an opportunity to shift the field mappings into a deployment of their own, but I still had the same problem as I did the first time I had to make this decision – what’s the hook for the deployment when spinning up a new environment?

In retrospect the answer is probably “the environment is up and has passed its smoke test”, but that didn’t occur to me until later, so we went in a different direction.

What if we didn’t always have to restart the node on a configuration deployment?

We really only deploy three files that could potentially require a node restart:

  • The core Elasticsearch configuration file (/etc/elasticsearch/elasticsearch.yml)
  • The JVM options file (/etc/elasticsearch/jvm.options)
  • The log4j2 configuration file (/etc/elasticsearch/log4j2.properties)

If none of those files have changed, then we really don’t need to do a node restart, which means we can just move ahead with the deployment of the field mappings.

No fuss, no muss.

Linux is pretty sweet in this regard (well, at least the Amazon Linux baseline is) in that it provides a diff command that can be used to easily compare two files.
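
The --brief flag is the important part: it suppresses the actual differences and just communicates through the exit code (0 for identical, 1 for different, 2 for trouble, like a missing file). A rough illustration, with placeholder file names:

# Compare the freshly deployed file with the one currently in place.
diff --brief "$new_file" "$current_file"
case $? in
    0) echo "Files are identical, no restart required." ;;
    1) echo "Files differ, a restart will be required." ;;
    *) echo "diff itself failed (is one of the files missing?)"; exit 1 ;;
esac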

It was a relatively simple matter to augment the deployment script with some additional logic, like below:

… more script up here

temporary_jvm_options="/tmp/elk-elasticsearch/jvm.options"
destination_jvm_options="/etc/elasticsearch/jvm.options"

echo "Mutating temporary jvm.options file [$temporary_jvm_options] to contain appropriate memory allocation and to fix line endings"
es_memory=$(free -m | awk '/^Mem:/ { print int($2/2) }') || exit 1
sed -i "s/@@ES_MEMORY@@/$es_memory/" $temporary_jvm_options || exit 1
sed -i 's/\r//' $temporary_jvm_options || exit 1
sed -i '1 s/^\xef\xbb\xbf//' $temporary_jvm_options || exit 1

echo "Diffing $temporary_jvm_options and $destination_jvm_options"
diff --brief $temporary_jvm_options $destination_jvm_options
jvm_options_changed=$?

… more script down here

if [[ $jvm_options_changed == 1 || $configuration_file_changed == 1 || $log_config_file_changed == 1 ]]; then
    echo "Configuration change detected, will now restart elasticsearch service."
    sudo service elasticsearch restart || exit 1
else
    echo "No configuration change detected, elasticsearch service will not be restarted."
fi

No more unnecessary node restarts, no more unnecessary downtime.

Conclusion

This is another one of those cases that seems incredibly obvious in retrospect, but I suppose everything does. It’s still almost always better to go with the naive solution at first, and then improve, rather than try to deal with everything up front. It’s better to focus on making something easy to adapt in the face of unexpected issues than to try to engineer some perfect unicorn.

Regardless, with no more risk of the Elasticsearch cluster going down for an unspecified amount of time whenever we just deploy field mapping updates, we can add new fields with impunity (it helps that we figured out how to reindex).

Of course, as with anything, there are still issues with the deployment logic:

  • Linking the index template/field mapping deployment to the core Elasticsearch configuration was almost certainly a terrible mistake, so we’ll probably have to deal with that eventually.
  • The fact that a configuration deployment can still result in the cluster going down is not great, but to be honest, I can’t really think of a better way either. You could deploy the configuration to the master nodes first, but that leaves you in a tricky spot if it fails (or if the configuration is a deep enough change to completely rename or otherwise move the cluster). You might be able to improve the logic to differentiate between “first time” and “additional node”, but you still have the problem of dealing with major configuration changes. It’s all very complicated, and honestly we don’t do configuration deployments often enough to spend time solving that particular problem.
  • The index template/field mapping deployment technically occurs once on every node, simultaneously. For something that can be accomplished by an HTTP call, this is pretty wasteful (though it doesn’t have any obvious negative side effects). One possible way to avoid it is sketched after this list.
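
If we ever get around to fixing that last point, one option (purely a sketch, and it assumes each node’s name matches its hostname, which may not be true in practice) would be to let only the elected master perform the upload:

# Ask the cluster who the current master is (the cat API can return just the node name column)
# and only upload the template from that node, so the call happens once per deployment.
master_node=$(curl --silent "http://localhost:9200/_cat/master?h=node") || exit 1
if [[ "$master_node" == "$(hostname)" ]]; then
    curl --silent --fail -XPUT "http://localhost:9200/_template/logstash" \
        -H "Content-Type: application/json" \
        -d @/tmp/elk-elasticsearch/logstash-template.json || exit 1
else
    echo "Not the elected master, skipping index template upload."
fi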

There’s always room for improvement.