
As you may (or may not) have noticed, I’ve published exactly zero blog posts over the last 3 weeks.

I was on holidays, and it was glorious.

Well, actually it was filled with more work than I would like (both from the job I was actually on holidays from as well as some other contract work I do for a company called MEDrefer), but it was still nice to be the master of my own destiny for a little while.

Anyway, I’m back now and everything is happening all at once, as these things sort of do.

Three things are going on right now: tutoring at QUT, progress on the RavenDB issue I blogged about, and some work I’m doing towards replacing RavenDB altogether (just in case). I’ll give each of those a brief explanation below. I’ve also been working on running WebdriverIO tests from TeamCity via PowerShell (and including the results), as well as fixing an issue with Logstash on Windows where you can’t easily configure it to not do a full memory dump whenever it crashes (and it crashes a lot!).

Without further ado, on with the show!

How Can I Reach These Kids?

It’s that time of the year when I start up my fairly regular Agile Project Management tutoring gig at QUT (they’ve changed the course code to IAB304 for some ungodly reason this semester, but it’s basically the same thing), so I’ve got that to look forward to. Unfortunately they are still using the DSDM material, but at least it’s changed somewhat to be more closely aligned to Scrum than to some old school project management/agile hybrid.

QUT is also offering workshops for sessional academics on how to be a better teacher/tutor, which I plan on attending. There are 4 different workshops being run over the next few months, so I might follow each one with a blog post outlining anything interesting that was covered.

I enjoy tutoring at QUT at multiple levels, even if the bureaucracy there drives me nuts. It gives me an opportunity to really think about what it means to be Agile, which is always a useful thought experiment. Meeting and interacting with people from many diverse backgrounds is also extremely useful for expanding my worldview, and I enjoy helping them understand the concepts and principles in play, and how they benefit both the practitioner and whatever business they are trying to serve.

The Birds is the Word

The guys at Hibernating Rhinos have been really helpful assisting me with getting to the bottom of the most recent RavenDB issue that I was having (a resource consumption issue that was preventing me from upgrading the production servers to RavenDB 3). Usually I would make a full post about the subject, but in this particular case it was mostly them investigating the issue, and me supplying a large number of memory dumps, exported settings, statuses, metrics and various other bits and pieces.

It turns out the issue was in an optimization in RavenDB 3 that caused problems for our particular document/workload profile. I’ve done a better explanation of the issue on the topic I made in the RavenDB Google Group, and Michael Yarichuk (one of the Hibernating Rhinos guys I was working with) has followed that up with even more detail.

I learned quite a few things relating to debugging and otherwise inspecting a running copy of RavenDB, as well as how to properly use the Sysinternals Procdump tool to take memory dumps.

A short summary:

  • RavenDB has stats endpoints which can be hit via a simple HTTP call. {url}/stats and {url}/admin/stats give all sorts of great information, including memory usage and index statistics.
    • I’ve incorporated a regular poll of these endpoints into my Logstash configuration for monitoring our RavenDB instance. It doesn’t exactly insert cleanly into Elasticsearch (too many arrays), but it’s still useful, and allows us to chart various RavenDB statistics through Kibana.
  • RavenDB has config endpoints that show what settings are currently in effect (useful for checking available options and to see if your own setting customizations were applied correctly). The main endpoint is available at {url}/debug/config but there are apparently config endpoints for specific databases as well. We only use the default, system database, and there doesn’t seem to be an endpoint specific to that one.
  • The Sysinternals tool procdump can be configured to take a full memory dump if your process exceeds a certain amount of memory usage. procdump -ma -m 4000 w3wp.exe C:\temp\IIS.dmp will take a full memory dump (i.e. not just thread and handle information) when the w3wp process exceeds 4GB of memory for at least 10 seconds, and put it in the C:\temp directory. It can be configured to take multiple dumps as well, in case you want to track memory growth over time.
    • If you’re trying to get a memory dump of the w3wp process, make sure you turn off pinging for the appropriate application pool, or IIS will detect that it’s frozen and restart it. You can turn off pinging by running the PowerShell command Set-ItemProperty "IIS:\AppPools\{application pool}" -name processmodel.pingingEnabled -Value False. Don’t forget to turn it back on when you’re done.
  • Google Drive is probably the easiest way to give specific people over the internet access to large (multiple gigabyte) files. Of course there is also S3 (where setting up permissions is annoying) and FTP/HTTP (which require setting up other stuff), but I definitely found Google Drive the easiest. OneDrive and Dropbox would probably be similarly easy.
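
As a rough illustration of the stats polling mentioned above (this is not my actual Logstash configuration, just a Python sketch of the same idea), the snippet below fetches a stats document and flattens it so that the arrays don’t get in the way. The helper names, the injected open_url parameter and the sample field names are all made up for the example; the real response shape from {url}/stats may well differ.

```python
import json
from urllib.request import urlopen


def fetch_stats(base_url, path="/stats", open_url=urlopen):
    """Fetch one stats document from the given base URL and parse it as JSON.

    open_url is injectable purely so the function can be exercised offline.
    """
    with open_url(base_url.rstrip("/") + path) as response:
        return json.loads(response.read().decode("utf-8"))


def flatten_stats(stats, prefix=""):
    """Flatten nested objects into dotted keys; replace each array with its length.

    Summarising arrays as a count is an assumption made for this sketch,
    chosen because arrays were what made indexing awkward.
    """
    flat = {}
    for key, value in stats.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten_stats(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name + ".count"] = len(value)
        else:
            flat[name] = value
    return flat
```

Something like flatten_stats(fetch_stats("{url}")) on a timer, with the result shipped off to wherever your logs go, would probably be enough to chart the basics.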

Once Hibernating Rhinos provides a stable release containing the fix, we will no longer be blocked from upgrading our troubled production instance to the latest version of RavenDB, which will hopefully alleviate some of its performance issues.

More to come on this topic as it unfolds.

Quoth The Raven, Nevermore

Finally, I’ve been putting some thought into how we can move away from RavenDB (or at least experiment with moving away from RavenDB), mostly so that we have a backup plan if the latest version does not in fact fix the performance problems that we’ve been having.

We’ve had a lot of difficulty in simulating the same level and variety of traffic that we see in our production environment (which was one of the reasons why we didn’t pick up any of the issues during our long and involved load testing), so I thought, why not just deploy any experimental persistence providers directly into production and watch how they behave?

It’s not as crazy as it sounds, at least in our case.

Our API instances are hardly utilised at all, so we have plenty of spare CPU to play with in order to explore new solutions.

Our persistence layer is abstracted behind some very basic repository interfaces, so all we would have to do is provide a composite implementation of each repository interface that calls both persistence providers. Only take the response from the one that is not experimental, and everything is golden. As long as we log lots of information about the requests being made and how long they took, we can perform all sorts of interesting analysis without ever actually affecting the user experience.
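
To make the composite idea a bit more concrete, here is a minimal sketch in Python (our actual code is not Python, and I haven’t built this yet). The CompositeRepository class, the get/save method names and the logger name are all hypothetical; the only parts taken from the plan above are that both providers get called, that timings get logged, and that only the non-experimental response is ever returned.

```python
import logging
import time

log = logging.getLogger("persistence.experiment")


class CompositeRepository:
    """Calls both providers, but only ever returns the primary's result."""

    def __init__(self, primary, experimental, clock=time.perf_counter):
        self._primary = primary
        self._experimental = experimental
        self._clock = clock

    def _timed(self, label, operation, func, *args):
        # Log how long the call took, whether it succeeded or raised.
        started = self._clock()
        try:
            return func(*args)
        finally:
            elapsed_ms = (self._clock() - started) * 1000
            log.info("%s %s took %.1f ms", label, operation, elapsed_ms)

    def _shadow(self, operation, func, *args):
        # The experimental provider must never break the real request,
        # so any failure is logged and swallowed.
        try:
            self._timed("experimental", operation, func, *args)
        except Exception:
            log.exception("experimental %s failed", operation)

    def get(self, key):
        result = self._timed("primary", "get", self._primary.get, key)
        self._shadow("get", self._experimental.get, key)
        return result

    def save(self, key, value):
        self._timed("primary", "save", self._primary.save, key, value)
        self._shadow("save", self._experimental.save, key, value)
```

Note that this sketch calls the experimental provider inline, so a slow experiment would still slow down the request. A real version would probably want to push the shadow call onto a background worker to keep the user experience genuinely unaffected.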

Well, that’s the idea anyway. Whether or not it actually works is a whole different question.

I’ll likely make a follow-up post when I finish exploring the idea properly.

Summary

As good as my kinda-holidays were, it feels nice to be back in the thick of things, smiting problems and creating value.

I’m particularly looking forward to exploring a replacement for RavenDB in our troublesome service, because while I’m confident that the product itself is solid, it’s not something we’re very familiar with, so we’ll always be struggling to make the most of it. We don’t use it anywhere else (and are not planning on using it again), so it’s stuck in this weird place where we aren’t good at it and we have little desire to get better in the long run.

It was definitely good to finally get to the bottom of why the new and shiny version of RavenDB was misbehaving so badly though, because when I have a problem with a product like that, I usually assume it’s the way I’m using it, not the product itself.

Plus, as a general rule of thumb, I don’t like it when mysteries remain unsolved. It bugs me.

Like why Firefly was cancelled.

Who does that?