
I don’t think I’ve ever had a good experience dealing with dates, times and timezones in software.

If it’s not some crazy data issue that you don’t notice until it’s too late, then it’s probably the confusing nature of the entire thing that leads to misunderstandings and preventable errors. Especially when you have to factor daylight savings into the equation, which is just a stupid and unnecessary complication.

It’s all very annoying.

Recently we found ourselves wanting to know what timezones our users were running in, but of course, nothing involving dates and times is ever easy.

Time Of My Life

Whenever we migrate user data from our legacy platform into our cloud platform, we have to take into account the timezone that the user wants to operate in. Generally this is set at the office level (i.e. the bucket that segregates one set of user profiles and data from the rest), so we need to know that piece of information right at the start of the whole process, when the office is created.

Now, to be very clear, what we need to know is the user’s preferred timezone, not their current offset from UTC. The offset by itself is not enough information, because we need to be able to safely interpret dates and times both in the past (due to the wealth of historical data we bring along) and in the future (things that are scheduled to happen, but haven’t happened yet). Two offices can share the same offset for half the year and then diverge the moment daylight savings kicks in, which is why we want the timezone; it contains enough information for us to interpret any date and time, and includes a very important piece of information:

Whether and when daylight savings is in effect, and what sort of adjustment it makes to the normal offset from UTC.

Right now we require that the user supply this information as part of the migration process. By itself, it’s not exactly a big deal, but we want to minimise the amount of involvement we require from the user in order to reduce the amount of resistance that the process can cause. The migration should be painless, and anything we can do to make it so is a benefit in the long run.

We rely on the user here because the legacy data doesn’t contain any indication as to what timezone it should be interpreted in.

So we decided to capture it.

No Time To Explain

The tricky part of capturing the timezone is that there are many users/machines within a client site that access the underlying database, and each one might not be set to the same timezone. It’s pretty likely that they are all set the same way, but we can’t guarantee it, so we need to capture information about every user who is actively interacting with the software.

So the plan is straightforward: when the user logs in, record some information in the database describing the timezone that they are currently using. Once this information exists, we can sync it up through the normal process and then use it within the migration process. If all of the users within an office agree, we can just set the timezone for the migration. If there are conflicts we have to fall back to asking the user.

Of course, this is where things get complicated.

The application login is written in VB6, and let’s be honest, is going to continue to be written in VB6 until the heat death of the universe.

That means WIN32 API calls.

The one in particular that we need is GetTimeZoneInformation, which fills out the supplied TIME_ZONE_INFORMATION structure with the current timezone settings and returns a value indicating whether the system is operating in standard time, operating in daylight savings, or in a timezone that doesn’t use daylight savings at all.

Seems pretty straightforward in retrospect, but it was a bit of a journey to get there.

At first we thought that we had to use the *Bias fields to determine whether or not daylight savings was in effect, but that was itself brought about by a misunderstanding, because we don’t actually care whether daylight savings is in effect right now, just what the timezone is (because that information is encapsulated in the timezone itself). It didn’t help that we were originally outputting the current offset instead of the timezone as well.

Then, even when we knew we had to get at the timezone, it still wasn’t clear which of the two fields (StandardName or DaylightName) to use. That is, until we looked closer at the documentation of the function and realised that the return value could be used to determine which field we should refer to.

All credit to the person who implemented this (a colleague of mine), who is relatively new to the whole software development thing, and did a fine job, once we managed to get a clear idea of what we actually had to accomplish.

It’s Time To Stop

At the end of the day we’re left with something that looks like this.

Private Type SYSTEMTIME
    wYear As Integer
    wMonth As Integer
    wDayOfWeek As Integer
    wDay As Integer
    wHour As Integer
    wMinute As Integer
    wSecond As Integer
    wMilliseconds As Integer
End Type

Public Type TIME_ZONE_INFORMATION
    Bias As Long
    StandardName(0 To 63) As Byte
    StandardDate As SYSTEMTIME
    StandardBias As Long
    DaylightName(0 To 63) As Byte
    DaylightDate As SYSTEMTIME
    DaylightBias As Long
End Type

Private Declare Function GetTimeZoneInformation Lib "kernel32" _
    (lpTimeZoneInformation As TIME_ZONE_INFORMATION) As Long

' Documented return value indicating that daylight savings is currently in effect.
Private Const TIME_ZONE_ID_DAYLIGHT As Long = 2

Private Function GetCurrentTimeZoneName() As String

    Dim tzi    As TIME_ZONE_INFORMATION
    Dim tzName As String

    ' DaylightName only applies while daylight savings is in effect; otherwise use StandardName.
    If GetTimeZoneInformation(tzi) = TIME_ZONE_ID_DAYLIGHT Then
        tzName = tzi.DaylightName
    Else
        tzName = tzi.StandardName
    End If

    ' The name is a fixed-length Unicode buffer, so strip the trailing nulls.
    GetCurrentTimeZoneName = Replace(tzName, Chr(0), "")

End Function

That function for extracting the timezone name is then used inside the part of the code that captures the set of user information that we’re after and stores it in the local database. That code is not particularly interesting though, it’s just a VB6 ADODB Recordset.
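Purely for illustration though, the capture boils down to something like the sketch below. The SaveCurrentUserTimeZone routine, the table and the column names are all made up for the purposes of this post (the real schema is different), but the ADODB plumbing is the same sort of thing.

Private Sub SaveCurrentUserTimeZone(ByVal cn As ADODB.Connection, ByVal userId As String)

    Dim rs As ADODB.Recordset
    Set rs = New ADODB.Recordset

    ' Hypothetical table and columns, purely to show the shape of the thing.
    rs.Open "SELECT UserId, TimeZoneName, CapturedAt FROM UserTimeZones WHERE UserId = '" & userId & "'", _
            cn, adOpenKeyset, adLockOptimistic

    ' Add a row for the user if one doesn't exist yet, otherwise update the existing one.
    If rs.EOF Then rs.AddNew

    rs!UserId = userId
    rs!TimeZoneName = GetCurrentTimeZoneName()
    rs!CapturedAt = Now

    rs.Update
    rs.Close
    Set rs = Nothing

End Sub

The login path calls something like this once the user has authenticated, and the existing sync process takes care of moving the data up from there.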

Hell, taken in isolation, and ignoring the journey that it took to get here, the code above isn’t all that interesting either.

Conclusion

With the required information being captured into the database, all we have to do now is sync it up, like any other table.

Of course, we have to wait until our next monthly release to get it out, but that’s not the end of the world.

Looking back, this whole dance was less technically challenging and more just confusing and hard to clearly explain and discuss.

We got there in the end though, and the only challenge left now belongs to another team.

They have to take the timezone name that we’re capturing and turn it into a Java timezone/offset, which is an entirely different set of names that hopefully map one-to-one.

Since the situation involves dates and times though, I doubt it will be that clean.


It’s the gift that keeps on giving, our data synchronization process!

Well, it keeps on giving to me anyway, because it’s fuel for the furnaces of this blog. Sometimes finding topics to write about every week can be hard, so it’s nice when they drop into your lap.

Anyway, the process has started to creak at the seams a bit, because we’re pushing more data through it than ever before.

And when I say creak at the seams, what I mean is that our Read IOPS usage on the underlying database has returned to being consistently ridiculous.

Couldn’t Eat Another Bite

The data synchronization process had been relatively stable over most of 2018. Towards the middle, we scaled the underlying database to allow for the syncing of one of the two biggest data sets in the application, and after a slow rollout, that seemed to be going okay.

Of course, with that success under our belt, we decided to sync the other biggest data set in the application. Living life on the edge.

We ended up getting about halfway through before everything started to fall apart again, with similar symptoms to last time (spiking Read IOPS capping out at the maximum allowed burst, which would consume the IO credits and then tank the performance completely). We tried a quick fix of provisioning IOPS (to guarantee performance and remove the tipping point created by the consumption of IO credits), but it wasn’t enough.

The database just could not keep up with what was being demanded of it.

I’m A Very Understanding Person

Just like last time, the first step was to have a look at the queries being run and see if there was anything obviously inefficient.

With the slow queries related to the “version” of the remote table mostly dealt with in our last round of improvements, the majority of the slow queries remaining were focused on the part of the process that gets a table “manifest”. The worst offenders were the manifest calls for one of the big tables that we had only started syncing relatively recently. Keep in mind that this table is the “special” one featuring hard deletes (compared to the soft deletes of the other tables), so it was using the manifest functionality a lot more than any of the other tables were.

Having had enough of software level optimizations last time, we decided to try a different approach.

An approach that is probably, by far, the more common approach when dealing with performance issues in a database.

Indexes.

Probably The Obvious Solution

The first time we had performance problems with the database we shied away from implementing additional indexes. At the time, we thought that the indexes that we did have were the most efficient for our query load (being a Clustered Index on the two most selective fields in the schema), and we assumed we would have to look elsewhere for optimization opportunities. Additionally, we were worried that the performance issues might have an underlying cause related to total memory usage, and adding another index (or 10) is just more things to keep in memory.

Having scaled the underlying instance and seeing no evidence that the core problem was memory related, we decided to pull the index lever this time.

Analysis showed that the addition of another index similar to the primary key would allow for a decent reduction in the amount of reads required to service a single request (in that the index would short-circuit the need to read the entire partition of the data set into memory just to figure out what the max value was for the un-indexed field). A quick replication on our performance testing environment proved it unequivocally, which was nice.

For implementation, it’s easy enough to use Entity Framework to add an index as part of a database migration, so that’s exactly what we did.

We only encountered two issues, which was nice:

  • We didn’t seem to be able to use the concurrent index creation feature in PostgreSQL with the version of EF and Npgsql that we were using (which are older than I would like)
  • Some of the down migrations would not consistently apply, no matter what we tried

Neither of those two factors could stop us though, and the indexes were created.

Now we just had to roll them out.

Be Free Indexes, Be Free!

That required a little finesse.

We had a decent number of indexes that we wanted to add, and the datasets we wanted to add them to were quite large. Some of the indexes only took a few minutes to initialise, but others took as long as twenty.

Since we couldn’t seem to get concurrent index creation working with Entity Framework data migrations, we had to roll the indexes out one at a time across sequential releases.

Not too hard, but a little bit more time consuming than we originally desired.

Of course, the sync process being what it is, it’s okay if it goes down for half an hour every now and then, so we just took everything out of service temporarily on each deployment to ensure that the database could focus on the index creation without having to worry too much about dealing with the constant flood of requests that it usually gets.

Conclusion

At the end of the day, this round of performance investigation and optimization actually took a hell of a lot less time and effort than the last, but I think that’s kind of to be expected when you’re actively trying to minimise code changes.

With the first few of the indexes deployed, we’ve already seen a significant drop in the Read IOPS of the database, and I think we’re going to be in a pretty good place to continue to sync the remainder of the massive data set that caused the database to choke.

The best indicator of future performance is past data though, so I’m sure there will be another post one day, talking all about the next terrible problem.

And how we solved it of course, because that’s what we do.


A new year means more blog posts, and there is no better time to start than now.

Or maybe a week ago I suppose when the new year actually started, but I was on holidays, and writing blog posts while I’m on holidays just seems wrong. Blog posts are written on the train on my way to work, and that pattern is far too ingrained to do anything about now.

Anyway, that’s probably enough rambling, so let’s get on with the show and discuss software prototypes, because I have opinions and this is the internet.

It’s Code, But You Throw It Away

Software prototyping is simple in concept, but quickly gets complicated in execution.

Typically a prototype consists of some engineers throwing something together, probably ignoring normal engineering practices, to prove an idea or approach. Then, once it’s served its purpose, those same engineers toss it in the garbage.

The goal is to learn, not to create a long lasting artefact, and that is often where prototypes become dangerous. If a business sees something working (and a prototype probably works, even though it might have rough edges), then it might be inclined to make plans based on that. Perhaps attempt to push it out to a larger audience than was originally intended, or to start making claims that a feature is complete and ready to use.

It’s a horrible feeling, watching the terrifying hacked-together piece of code meant to prove a possibility become a core part of a business process. Especially so when you’re the one responsible for maintaining it, probably because you’re the only one who knows how it works.

Like I said, simple in concept, but complicated in the long run.

Of course, a prototype does not strictly have to be thrown away, but in my opinion, if you’re not throwing it away at the end, you’re probably just doing iterative development. If that’s the case, you really should be following good engineering practices all the way through instead of hacking something together and then trying to build on top of unstable foundations later.

Building Things To Answer The Wrong Questions

This blog post exists because we built a prototype recently.

I’m sure you think that the next paragraph is going to describe the situation where it “accidentally” became a core part of the business and it’s causing all sorts of problems, but that is surprisingly not the case. Everyone involved understood the purpose and limitations of the prototype and it was abandoned at the appropriate time, once it had served its purpose.

I had a completely different issue with our prototype experience; we probably shouldn’t have built one at all, and the construction of the prototype felt like it was wasted effort.

The situation we found ourselves in was that we wanted to provide some new functionality to the users of our legacy application that leveraged our new and shiny cloud platform. Kind of like a typical integration, with two different systems working in tandem, but we had a lot of control over both sides.

We prototyped the process for getting the two systems to talk to each other, with the plan that once we had that working at least partially, we could go and have early conversations with customers to see if they wanted to use it and how.

The reality was that the actual data flow between the two systems never really came up in any of those early conversations, as the topics covered were almost entirely focused around the new features available. We already had a one-off data migration process that would initialize the cloud system with information from the legacy software, and honestly, that would have been enough to start the conversation.

So the first mark against the prototype was that it just didn’t feel like its existence made a difference.

Hindsight Is Misleading

To be fair, I could very well be suffering from the curse of hindsight. Being able to look back at a situation with current knowledge and see a much more efficient way to do it is not really surprising after all. That’s how learning works.

Or it could be that we simply held on to the prototype for too long, and should have switched to constructing it (iteratively) for real sooner. Possibly as soon as we had answered the question “is it even possible?”.

Instead we held on to the prototype while we engaged with customers because we wanted to give them a sense of how the system would work in practice. Of course, because we were asking them to do real work in an environment that would one day be thrown away, they were rightly resistant, so not only did we gain little to nothing from throwing together the process from a customer conversation point of view, we actually made it harder to engage with them in relation to trying out the system for real.

If we had simply started building the process out, piece by piece, following good engineering practices, we probably would have ended up at the same place in the end. It might have taken us longer in terms of constructing the real version (not having the lessons of the prototype to build on top of), but total time spent would probably have been less. Not only that, but the customers would have been able to try it out for real sooner, which would have given us the feedback that we needed sooner as well.

That’s not to say that a prototype is never beneficial, just that in my most recent experience, it didn’t really feel like it generated an appropriate amount of value.

Conclusion

Unlike the technical posts that I make, this one feels much more like a series of vaguely connected musings.

I don’t really have a concrete conclusion or lesson to take away, I’m just left with a vague sense that our most recent experiment with building a prototype was a waste of time and effort that could have been better spent elsewhere.

Of course, there’s always the possibility that the specific situation we found ourselves in was simply a bad place to apply a prototype (which seems likely looking back), or maybe the prototype actually generated a huge amount of value and it’s just hard to see it in hindsight, because we have that knowledge now and it’s hard to analyse the situation without it.

Perhaps I’ll be making another post in a few months about a situation where I wished we had built a prototype…


With no need for additional fanfare, I now present to you the continuation of last week’s post about DDD 2018.

Break It Down

My fourth session for the day was presented by the wonderful Larene Le Gassick.

As a woman in tech, Larene was curious about the breakdown of gender for the speakers participating in the various Brisbane based Meetups, so she built a bot that would aggregate all of that information together and post it into Slack, thus bringing the data out into the open.

Well, the word “bot” might be overselling it.

It was Larene. Larene was the bot.

Regardless of the mechanism, there was some good tech stuff in there (including a neat website using an NES CSS style), but the real value from the process was in the data itself, and the conversation that it started when presented in a relatively public place on a regular basis.

From my own experience, the technology industry, and software development in particular, does seem to be male dominated. I’m honestly unsure whether that’s a good or bad thing, but I am fully in favour of encouraging more participation from anyone who wants to get involved, regardless of sex, race or any other discriminating factor you can think of.

DDD in particular is pretty great for this sort of inclusiveness actually, sometimes resulting in surprising feedback.

Actually, This Time It Does Mean What You Think It Means

The fifth session that I attended was delivered by Steve Morris in his usual style. Which is to say, awesomely.

To be honest, I probably could have skipped this session as it was basically Domain Driven Design 101, but it was still pretty useful as a refresher all the same.

Domain driven design is in a weird place in my head. The blue book is legendary for how dry and difficult to read it is, but there is some really great stuff in there. Actually trying to understand and then model the domain that your software is operating in seems like an extremely good idea, but it’s one of those things that’s really hard to do properly.

I’ve inherited at least one system built by people who had clearly read some of the book, but what I ended up with was a system that was hard to maintain and understand, so I’m going to assume that they did it wrong. I don’t know how to do it right though.

Regardless, I’ll keep trying to head in that direction as best I can.

Intelligent Design

The sixth session of the day was a presentation on UX and Design by Jamie Larkin. Her first such presentation in fact, though you honestly couldn’t tell, because she did extremely well.

The session itself was fantastic.

I’ve always questioned why developers seem to shy away from design (or why designers shy away from development), and I like to think that I’ve tried to keep UX high on my priority list when implementing things in the past. Having said that, I’m definitely not cognizant of many design patterns and principles, so it was really nice to see someone with experience in both design and development talk about the topic.

The main body of the talk was focused on UX design patterns presented in such a way that they would be relevant to developers. Even better, the presentation used real websites (MailChimp and Airbnb) as examples. This was pretty great, because it paired the generic design principles with concrete examples of how they had been applied, or in some cases, how the design principles had been broken and how that was negatively affecting the resulting user experience.

Some specific takeaways:

  • Consistency is key. If you’re building something inside a system, it’s probably a good idea to match the style that is already present, even if it results in a sub-optimal experience. Disjointed design can be extremely damaging to the user experience.
  • Put things where users will expect to find them. This might mean bending towards common interaction paradigms (i.e. it looks like Word), or even just spending the time to understand your users so that interaction elements appear in places that make sense to them.
  • Understand what the user wants to accomplish and focus the experience around that. That is, don’t just present information or actions for no reason, focus them around goals and intent.
  • Consider the context in which the user wants to use the software. Are they on a train? In a car? At home in bed? Smart answers to these questions can make a huge difference to the usability of your service.
  • Feedback to the user while operating your system is essential. Things like hover highlights, immediate feedback when validating user input and loading or processing indicators can really reinforce in the user’s mind that they are doing something meaningful and that the system recognizes that.

At the end of the session I left richer in knowledge than when I arrived, so I consider that a victory.

Don’t Trust Your Brain

The last session I attended was a presentation on cognitive bias by Joseph Cooney.

For me, this was the most interesting session of the day, as it really reinforced that I should never trust the first thing that comes into my brain, because it was probably created as a result of a lazy thought process that took as many shortcuts as it could.

I’ve been aware of the concept of cognitive bias for a while now, but I didn’t really understand it all that well. To be honest, I still don’t really understand it all that well, but I think I know more about it than I did before the session, so that’s probably a good outcome.

To quote my notes from the session verbatim:

Cognitive bias is the situations where people don't make rational decisions for a number of reasons (which may not be conscious). Kind of like an optical illusion, but harder to dispel.

Not the greatest definition in the world, but good enough to be illustrative I think.

What it comes down to is that the human brain appears to operate via a combination of two systems:

  • The first system is automatic, effortless, fast and specialized. It’s always running in the background and offers up images and feelings as opposed to raw data. It thinks in stories and deals with ambiguity well, even retconning past events to fit into a new model as necessary.
  • The second system is deliberate, effortful, slow, general purpose and incredibly lazy. That is, you have to actually try to engage it, as it’s expensive to run.

The first system does a lot of work, and helps you to make decisions quickly and without fuss. Unfortunately, sometimes it takes a shortcut that is less appropriate than it could be and makes a non-ideal decision, thus cognitive bias.

As conscious beings though, we can choose to be aware of the decisions being made by the first system, question them and kick the second system into gear if we need to (performing the rational and data based analysis that we thought we were probably doing in the first place).

I’m sure I haven’t done the topic justice here though, so if you’re interested, I recommend starting with the article on Wikipedia and discovering all the ways in which I have misinterpreted and otherwise misrepresented such an interesting facet of the human psyche.

In summary, 10/10, would listen to talk again.

Conclusion

Unfortunately, I had to bug out before the locknote (a session on how to support constant change), but all in all the day was well worth it.

It’s always nice to see a decent chunk of the Brisbane Developer Community get together and share the knowledge they’ve gained and the lessons they’ve learned over the last year. DDD is one of those low-key conferences that just kind of happens (thanks to the excellent efforts of everyone involved obviously), but doesn’t seem to have the underlying agenda that others do. It really does feel like a bunch of friends getting together to just chat about software development stuff, and I appreciate that.

If you get a chance, I highly recommend attending.


This post is a week later than I originally intended it to be, but I think we can all agree that terrifying and unforeseen technical problems are much more interesting than conference summaries.

Speaking of conference summaries!

DDD Brisbane 2018 was on Saturday December 1, and, as always, it was a solid event for a ridiculously cheap price. I continue to heartily recommend it to any developer in Brisbane.

In an interesting twist of fate I actually made notes this time, so I’m slightly better prepared to author this summarization.

Let’s see if it makes a difference.

I Don’t Think That Word Means What You Think It Means

The first session of the day, and thus the keynote, was a talk on Domain Driven Design by Jessica Kerr.

Some pretty good points here about feedback/growth loops, and ensuring that when you establish a loop, you understand what indirect goal you are actually moving towards. One of the things that resonated the most with me here was how most long term destinations are actually the acquisition of domain knowledge in the brains of your people. This sort of knowledge acquisition allows for a self-perpetuating success cycle, as the people building and improving the software actually understand the problems faced by the people who use it and can thus make better decisions on a day to day basis.

As a lot of that knowledge is often sequestered inside specific people’s heads, it reinforced to me that while the software itself probably makes the money in an organization, it’s the people who put it together that allow you to move forward. Thus retaining your people is critically important, and the cost of replacing a person who is skilled in the domain is probably much higher than you think it is.

A softer, less technical session, but solid all round.

Scale Mail

The next session that I attended was about engineering for scale from a DDD staple, Andrew Harcourt.

Presented in his usual humorous fashion, it featured a purely hypothetical situation around a census website and the requirement that it be highly available. Something that would never happen in reality I’m sure.

Interestingly enough, it was a live demonstration as well, as he invited people to “attack” the website during the talk, to see if anyone could flood it with enough requests to bring it down. Unfortunately (fortunately?) no-one managed to do any damage to the website itself, but someone did manage to take out his Seq instance, which was pretty great.

Andrew went through a wealth of technical detail about how the website and underlying service was constructed (Docker, Kubernetes, Helm, React, .NET Core, Cloudflare) illustrating the breadth of technologies involved. He even did a live, zero-downtime deployment while the audience watched, which was impressive.

For me though, the best parts of the session were the items to consider when designing for scale, like:

  • Actually understand your expected load profile. Taking the Australian Census as an example, it needed to be designed for 25 million requests over an hour (i.e. after dinner as everyone logged on to do the thing), instead of that load spread evenly across a 24 hour period. In my opinion, understanding your load profile is one of the more challenging aspects of designing for scale, as it is very easy to make a small mistake or misunderstanding that snowballs from that point forward.
  • Make the system as simple as possible. A simpler system will have less overhead and generally be able to scale better than a complex one. The example he gave (his Hipster Census), contained a lot of technologies, but was conceptually pretty straight forward.
  • Provide developers with a curated path to access the system. This was a really interesting one, as when he invited people to try and take down the website, he supplied a client library for connecting to the underlying API. What he didn’t make obvious though, was that the supplied client library had rate limiting built in, which meant that anyone who used it to try and flood the service was kind of doomed from the start. A sneaky move indeed. I think this sort of thing would be surprisingly effective even against actual attackers, as it would catch out at least a few of them.
  • Do as little as possible up front, and as much as possible later on. For the census example specifically, Andrew made a good point that it’s more important to simply accept and store the data, regardless of its validity, because no-one really cares if it takes a few months to sort through it later.
  • Generate access tokens and credentials through math, so that it’s much easier to filter out bad credentials later. I didn’t quite grok this one entirely, because there was still a whitelist of valid credentials involved, but I think that might have just been for demonstration purposes. The intent here is to make it easier to sift through the data later on for valid traffic.

As is to be expected from Andrew, it was a great talk with a fantastic mix of both new and shiny technology and real-world pragmatism.

Core Competencies

The third session was from another DDD staple, Damien McLennan.

It was a harrowing tale of one man’s descent into madness.

But seriously, it was a great talk about some real-world experiences using .NET Core and Docker to build out an improved web presence for Work180. Damien comes from a long history of building enterprisey systems (his words, not mine), followed by a chunk of time entirely off the tools altogether, and the completely different nature of the work he had to do in his new position (CTO at Work180) threw him for a loop initially.

The goal was fairly straightforward; replace an existing hosted solution that was not scaling well with something that would.

The first issue he faced was selecting a technology stack from the multitude that were available; Node, Python, Kotlin, .NET Core and so on.

The second issue he faced, once he had made the technology decision, was feeling like a beginner again as he learned the ins and outs of an entirely new thing.

To be honest, the best part of the session was watching a consummate industry professional share his experiences struggling through the whole process of trying a completely different thing. Not from an “ooooo, a train wreck” point of view though, because it wasn’t that at all. It was more about knowing that this is something that other people have gone through successfully, which can be really helpful when it’s something that you’re thinking about doing yourself.

Also, there was some cool tech stuff too.

To Be Continued

With three session summaries out of the way, I think this blog post is probably long enough.

Tune in next week for the thrilling conclusion!