
This post is not as technical as some of my others. I really just want to bring attention to a tool for Elasticsearch that I honestly don’t think I could do without.

Cerebro.

In my experience, one of the hardest things to wrap my head around when working with Elasticsearch was visualizing how everything fit together. My background is primarily C# and .NET in a very Microsoft world, so I’m used to things like SQL Server, which comes with an excellent exploration and interrogation tool in the form of SQL Server Management Studio. When it comes to Elasticsearch though, there seems to be no equivalent, so I felt particularly blind.

Since starting to use Elasticsearch, I’ve become more and more fond of the command line, and I’ve come to appreciate Elasticsearch’s excellent HTTP API, but that initial learning curve was pretty vicious.

Anyway, to bring it back around, my first port of call when I started using Elasticsearch was to find a tool conceptually similar to SQL Server Management Studio: something I could use to both visualize the storage system (however it worked) and possibly even query it as necessary.

I found Kopf.

Kopf did exactly what I wanted it to do. It provided a nice interface on top of Elasticsearch that helped me visualize how everything was structured and what sort of things I could do. To this day, if I attempt to visualize an Elasticsearch cluster in my head, the pictures that come to mind are of the Kopf interface. I have it (along with the excellent Elasticsearch documentation, of course) to thank for my understanding of the cluster, the nodes that make it up and the indexes stored therein.

Later on I learnt that Kopf didn’t have to be used from the creator’s demonstration website (which is how I had been using it, connecting from my local machine to our ELK cluster). It could in fact be installed as a site plugin inside Elasticsearch itself, which was even better, because you could then access it at {es-url}/_plugin/kopf, which was a hell of a lot easier.

Unfortunately, everything changed when the fire nation attacked…

No wait, that’s not right.

Everything changed when Elasticsearch 5 was released.

I’m The Juggernaut

Elasticsearch 5 removed site plugins. No more site plugins meant no more Kopf, or at least no more Kopf hosted within Elasticsearch. This made me sad, obviously, but I could still use the standalone site, so it wasn’t the end of the world.

My memory of the next bit is a little fuzzy, but I think even the standalone site stopped working properly when connecting to Elasticsearch 5. The creator of Kopf was no longer maintaining the project either, so it was unlikely that the problems would be solved.

I was basically blind.

Enter Cerebro.

No bones about it, Cerebro IS Kopf. It’s made by the same guy and is still being actively developed. It’s pretty much a standalone Kopf (i.e. with a built-in web server), and any differences between the two (other than some cosmetic changes and the ability to easily save multiple Elasticsearch addresses) are lost on me.

As of this post, it’s only up to version 0.6.5, but as far as I can tell, it’s fully functional.

For my usage, I’ve incorporated Cerebro into our ELK stack with a simple setup (an ELB in front of a single-instance ASG), pre-configured with the appropriate Elasticsearch address in each environment that we spin up. As is our usual pattern, I’ve baked it into an AMI via Packer and I deploy its configuration via Octopus Deploy, but there is nothing particularly complicated there.
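As a minimal sketch of the per-environment configuration, assuming Cerebro reads its preconfigured clusters from a `hosts` list in `conf/application.conf` (check the documentation for your version), a deployment step could render something like the following. The environment names, URLs and the `render_hosts_block` helper are hypothetical placeholders, not our actual setup or Octopus step.

```python
"""Render a Cerebro `hosts` block for one environment.

Hypothetical sketch: environment names and Elasticsearch URLs are
placeholders, not values from a real deployment.
"""
import json

# Hypothetical mapping of environment name -> Elasticsearch URL.
ENVIRONMENTS = {
    "ci": "http://elasticsearch.ci.example.internal:9200",
    "production": "http://elasticsearch.prod.example.internal:9200",
}


def render_hosts_block(environment: str, es_url: str) -> str:
    """Return a HOCON `hosts` block pointing Cerebro at a single cluster."""
    # HOCON quoted strings use JSON string syntax, so json.dumps quotes
    # and escapes the values safely.
    return (
        "hosts = [\n"
        "  {\n"
        f"    host = {json.dumps(es_url)}\n"
        f"    name = {json.dumps(environment)}\n"
        "  }\n"
        "]\n"
    )


if __name__ == "__main__":
    env = "production"
    print(render_hosts_block(env, ENVIRONMENTS[env]))
```

Running it for `production` prints a `hosts = [ ... ]` block with that environment’s URL and name, which would then be written into the config file before Cerebro starts, so each environment’s instance opens straight onto the right cluster.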

Kitty, It’s Just A Phase

This post is pretty boring so far, so let’s talk about Cerebro a little with the help of a screenshot.

This is the main screen of Cerebro, and it contains a wealth of information to help you get a handle on your Elasticsearch cluster.

It shows an overview of the cluster status, data nodes, indexes and their shards and replicas.

  • The cluster status is shown at the top of the screen, mostly via colour. Green good, yellow troublesome (at least one replica is unassigned), red bad (at least one primary shard is unassigned). Helpfully enough, the favicon in the browser tab also changes colour according to the cluster status.
  • Data nodes are shown on the left, and display information like memory, CPU and disk, as well as IP address and name.
  • Indexes pretty much fill the rest of the screen, displaying important statistics like the number of documents and size, while allowing you to access things like mappings and index operations (like delete).
  • The intersection of index and data node gives information about shard/replica allocation. In the example above, we have 3 shards, 2 replicas and 3 nodes, so there are 3 copies of every shard, one per node, and each node has a full copy of the data. Solid squares indicate primary shards.
  • If you have unassigned or relocating shards, this information appears directly above the nodes, and shards currently being moved are shown in the same place as normal shards, but coloured blue.
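As far as I know, everything on the overview is also available from Elasticsearch’s own HTTP API, which is handy when all you have is a terminal. The following is a rough Python sketch of pulling the same information yourself. It is not how Cerebro is implemented, and `ES_URL`, `copies_per_node`, `build_grid` and `print_overview` are hypothetical names, assuming an unauthenticated cluster. It also spells out the arithmetic from the example: 3 shards with 2 replicas each is 3 × (1 + 2) = 9 shard copies, or 3 per node across 3 nodes. Because Elasticsearch won’t allocate two copies of the same shard to one node, those 3 copies are one of each shard, which is why every node ends up with a full copy of the data.

```python
import json
import urllib.request
from collections import defaultdict

ES_URL = "http://localhost:9200"  # assumption: unauthenticated local cluster


def fetch_json(path):
    """GET a path from the cluster and decode the JSON body."""
    with urllib.request.urlopen(ES_URL + path) as response:
        return json.load(response)


def copies_per_node(shards, replicas, nodes):
    """Average number of shard copies held by each data node."""
    # Each primary shard has `replicas` additional copies.
    return shards * (1 + replicas) / nodes


def build_grid(cat_shards_rows):
    """Group `_cat/shards?format=json` rows into {node: {index: [labels]}}.

    Primaries are labelled like "0p" and replicas like "0r". Unassigned
    shards have no node and are collected under the key None. Relocating
    shards may report a combined "source -> target" value in the node
    field; this sketch does not untangle that.
    """
    grid = defaultdict(lambda: defaultdict(list))
    for row in cat_shards_rows:
        label = f"{row['shard']}{row['prirep']}"
        grid[row.get("node")][row["index"]].append(label)
    return grid


def print_overview():
    """Print cluster status and a node-by-index shard listing."""
    health = fetch_json("/_cluster/health")
    print("status:", health["status"])  # "green", "yellow" or "red"
    print("unassigned:", health["unassigned_shards"],
          "relocating:", health["relocating_shards"])
    grid = build_grid(fetch_json("/_cat/shards?format=json"))
    for node, indexes in grid.items():
        print(node or "UNASSIGNED", dict(indexes))
```

Calling `print_overview()` against a live cluster prints the status and one line per node (plus an `UNASSIGNED` line if anything is unassigned), and `copies_per_node(3, 2, 3)` returns `3.0`. Cerebro’s grid is far more readable, which is rather the point.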

Honestly, I don’t use the other screens in Cerebro very much, or at least nowhere near as much as the overview screen. The dedicated nodes screen can be useful for viewing your master nodes (which aren’t shown on the overview) and for a more performance-focused display. I’ve also used the index templates screen for managing/viewing our Logstash index template, but that’s mostly done through an Octopus deployment now.
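As far as I can tell, the index templates screen sits on top of Elasticsearch’s `_template` API, so an automated deployment can do the same job with a plain HTTP PUT. A minimal Python sketch, assuming Elasticsearch 5.x (where the pattern key is `template`; 6.x and later use `index_patterns`), an unauthenticated cluster at a hypothetical `ES_URL`, and a placeholder template body. It is not our actual Octopus step.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: unauthenticated local cluster

# Placeholder template body (Elasticsearch 5.x style "template" key).
TEMPLATE = {
    "template": "logstash-*",
    "settings": {"number_of_shards": 3, "number_of_replicas": 2},
}


def template_request(name, body):
    """Build (without sending) a PUT /_template/<name> request."""
    return urllib.request.Request(
        f"{ES_URL}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it is then just `urllib.request.urlopen(template_request("logstash", TEMPLATE))`. Keeping request construction separate from sending makes it easy to check what would be pushed before touching a real cluster.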

There are other screens too (including an ad-hoc query screen), but again, I haven’t really dug into them, at least not enough to talk about them.

That first screen though, the overview, is worth its weight in gold as far as I’m concerned.

Conclusion

I doubt I would understand Elasticsearch anywhere near as well as I do without Kopf/Cerebro. Realistically, I don’t really understand it much at all, but the little understanding I do have would be non-existent without these awesome tools.

It’s not a one-horse town, though. Elastic.co provides some equivalent tools as well, such as Monitoring (formerly Marvel), which offer similar capabilities, but they are mostly paid services as far as I can tell, so I’ve been hesitant to explore them in more depth.

I’m already spending way too much on the hardware for our log stack, so adding software costs on top of that is a challenging battle that I’m not quite ready to fight.

It doesn’t help that the last time I tried to price it, their answer for “How much for the things?” was basically “How much you got?”.