
You've run across a blog post that has moved!

I've got a new blog now and I'm moving over bits and pieces of content as I need them. I don't have an easy mechanism to do a permanent redirect, so I have to settle for this instead.

Click here to see the exact same content that used to be here, except now it also has a picture.


It's been two months since I posted about our foray into continuous data migration from our legacy product to our cloud platform. It's been mulling around in the back of my head ever since though, and a few weeks ago we finally got a chance to go back and poke at it again. Our intent was to extend the original prototype and get it to a place where it could be demonstrated to a real client (using their real data).

Spending development effort to build an early, extremely rough prototype and then demonstrating it to users and gathering feedback as quickly as possible is a great way to stop yourself from building the wrong thing. You can get early indicators about whether or not you're headed in the right direction without having to invest too much money, assuming you pick a representative example of your target audience, of course.

When we finished building the first prototype, it quickly became apparent we couldn’t actually show it to a user. We could barely show it to ourselves.

I can just imagine the sort of awkward statement that such a demonstration would have started with:

And when you make a change in your legacy system like this, all you have to do is wait 45+ minutes before it's available over here in the new and shiny cloud platform! How great is that!

It's a pretty hard sell, so before we could even talk to anyone, we needed to do better.

It's All So Pointless

The first prototype extended our existing migration process so that, instead of creating a brand new account in our cloud platform every time customer data was migrated, it could update an existing account.

In doing so, it just re-ran the entire migration (query, transform, publish) over the entire customer data set, focusing its efforts on identifying whether a transformed entity was new or existing and then performing the appropriate actions via the cloud APIs.

This was something of a nuclear approach (like our original strategy for dealing with database restores in the sync algorithm) and resulted in a hell of a lot of wasted effort. More importantly, it resulted in a huge amount of wasted time, as the system still had to iterate through thousands of entities only to decide that nothing needed to be done.

The reality is that customers don’t change the entire data set all the time. They make small changes consistently throughout the day, so as long as we can identify only those changes and act on them, we should be able to do an update in a much shorter amount of time.

So that’s exactly what we did.

Tactical Strike

Whoever it was that implemented row level versioning in our legacy database, I should send them a gift basket or something, as it was one of the major contributing factors to the success of our data synchronization algorithm. With all of that delicious versioning information available to the migration process, the optimisation is remarkably simple.

Whenever we do a migration and it results in a set of transformed entities, we just store the versioning information about those entities.

When an update is triggered, we can query the database for only the entities with versions greater than the maximum version we dealt with last time, vastly decreasing the amount of raw information we have to deal with and therefore the total amount of time taken.

The only complication is that because each transformed entity might be built up from many legacy entities, the version must be an aggregate, specifically the lowest version of the constituent entities.
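
To make that concrete, here is a rough sketch of the bookkeeping involved, in Kotlin since that's what the migration API is written in. Every name and type below is purely illustrative, not the real code.

// Purely illustrative sketch of the delta bookkeeping; none of these names exist in the real system.
data class LegacyEntity(val id: Long, val rowVersion: Long)

data class TransformedEntity(val cloudId: String, val constituents: List<LegacyEntity>) {
    // A transformed entity is built from many legacy rows, so its effective version
    // is the lowest row version among its constituents.
    val aggregateVersion: Long get() = constituents.minOf { it.rowVersion }
}

class DeltaTracker {
    // High water mark recorded by the last successful migration run.
    private var lastSeenVersion: Long = 0

    fun recordRun(published: List<TransformedEntity>) {
        lastSeenVersion = maxOf(lastSeenVersion, published.maxOfOrNull { it.aggregateVersion } ?: lastSeenVersion)
    }

    // Only rows changed since the last run need to be queried, transformed and published again.
    fun changedSinceLastRun(all: Sequence<LegacyEntity>): List<LegacyEntity> =
        all.filter { it.rowVersion > lastSeenVersion }.toList()
}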

With that change in place, rather than the execution time of every migration being directly dependent on the total size of the customer's data, it's now dependent on how much has changed since we last migrated. As a result, it's actually better to run the update process frequently so that it makes many small updates over the course of the day, reducing the overall latency of the process and generally giving a much better user experience.

Excellent Timing

Speaking of running many small updates over the course of a day.

With the biggest blocker to a simple timer out of the way (the time required to execute the migration), we could actually put a timer into place.

The migration API is written in Kotlin, using Spring IO and Spring Batch, so it was a relatively simple matter to implement an in-memory job that runs every minute, identifies the migrations that should be updated (by picking the last successfully completed ones) and then executes an update operation on each.

For simplicity we execute the job synchronously, so each migration update must finish before the next can start, and the entire job cannot be rescheduled for re-execution until the previous job finishes. Obviously that approach doesn't scale at all (every additional migration being updated increases the latency of the others), but in a controlled environment where we only have a limited set of migrations, it's perfectly fine.
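
As a sketch of what that sort of timer looks like: the scheduling annotation below is standard Spring, but the repository and updater types are stand-ins I've made up rather than the real API.

import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

// Sketch only; MigrationRepository and MigrationUpdater are hypothetical stand-ins.
@Component
class MigrationUpdateJob(
    private val migrations: MigrationRepository,
    private val updater: MigrationUpdater
) {
    // fixedDelay schedules the next run one minute after the previous run finishes,
    // so a slow pass can never overlap with the next one.
    @Scheduled(fixedDelay = 60_000)
    fun updateRecentlyCompletedMigrations() {
        migrations.findLastSuccessfullyCompleted().forEach { migration ->
            // Executed synchronously, one migration after another, as described above.
            updater.update(migration)
        }
    }
}

Scheduling also has to be switched on somewhere with @EnableScheduling; the interfaces above are purely illustrative.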

The only other thing we had to do in order to ensure the timer job worked as expected was to lock down the migration API to a single instance. Again, something we would never advise in production, but acceptable for a prototype. If we do end up using a timer in production, we'd probably have to leverage some sort of locking process to ensure that it only executes once.

Oops I Did It Again

We are highly unlikely to go with the delta approach if this project pushes ahead though.

It provides just enough functionality to be able to demonstrate the concept to the anchor customer (and maybe a few additional validation customers), but it does not cater for at least two critical cases:

  • Entities being deleted in the legacy system
  • The legacy database being restored to an earlier point in time

It could be augmented to cater for those cases of course.

It's just software, and software is infinitely mutable, but all we would be doing is re-implementing the data synchronization algorithm, and it was hard enough to get right the first time. I don't really want to write it all over again in a different system; we'll just mess it up in new and unique ways.

Instead, we should leverage the existing algorithm, which is already really good at identifying the various events that can happen.

So the goal would be to implement some sort of event pipeline that contains a set of changes that have occurred (i.e. new entity, updated entity, deleted entity, database restored to X and so on), and then react to those events as appropriate from the migration side.
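
Sketched out in Kotlin, the events could be as simple as a small sealed hierarchy like the one below. This is entirely hypothetical at this stage, since none of it has been built.

// Hypothetical shape of the change events the sync algorithm could publish.
sealed class ChangeEvent {
    data class EntityCreated(val table: String, val id: Long, val version: Long) : ChangeEvent()
    data class EntityUpdated(val table: String, val id: Long, val version: Long) : ChangeEvent()
    data class EntityDeleted(val table: String, val id: Long) : ChangeEvent()
    data class DatabaseRestored(val restoredToVersion: Long) : ChangeEvent()
}

// The migration side would then react to each event as appropriate.
fun handle(event: ChangeEvent) = when (event) {
    is ChangeEvent.EntityCreated, is ChangeEvent.EntityUpdated -> TODO("re-transform and publish the affected entity")
    is ChangeEvent.EntityDeleted -> TODO("remove the corresponding entity from the cloud platform")
    is ChangeEvent.DatabaseRestored -> TODO("re-baseline from the restore point")
}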

Obviously it's not that simple in practice, but it is likely the direction that we will end up going if this all pans out.

Conclusion

What we’re left with right now is a prototype that allows for changes to customer data in their legacy system to be applied automatically to the cloud platform with a latency of under 5 minutes.

Of course, it has a bunch of conditions attached to it (every additional customer makes it slower for every other customer, it doesn't handle deletes, it doesn't handle database restores, it was not built using normal engineering practices), but it's enough to demonstrate the concept to a real person and start a useful conversation.

As is always the case with this sort of thing, there is a very real risk that this prototype might accidentally become production code, so it's something that we as developers need to be eternally vigilant against.

That’s a hill I’m prepared to die on though.


Software Development as a discipline puts a bunch of effort into trying to minimise the existence and creation of bugs, but the reality is that it's an investment/return curve that flattens off pretty quickly.

Early discovery of issues is critical. Remember, the cost to the business for a bug existing is never lower than it is at development time. The longer it has to fester, the worse it's going to get.

Of course, when a bug is discovered, there are decisions to make around whether or not to fix it. For me, every bug that exists in a piece of software that might cause an issue for a user is a mark against its good name, so my default policy is to fix. Maybe not in the same piece of work that it was found in, but in general, bugs should be fixed.

That is, unless you hit that awkward conjunction of high cost/low incidence.

Why waste a bunch of money fixing a bug that might never happen?

Split Personality

I’m sure you can guess from the mere existence of this blog post that this is exactly the situation we found ourselves in recently.

While evaluating a new component in our legacy application, we noticed that it was technically possible to break what was intended to be an entirely atomic operation consisting of multiple database writes.

Normally this wouldn't even be worth talking about, as it's basically the reason that database transactions exist. When used correctly, it's a guarantee of an all-or-nothing situation.

Unfortunately, one of the writes was in Visual Basic 6 code, and the other was in .NET.

I don't know if you've ever tried to span a database transaction across a technology boundary like that, but it's not exactly the easiest thing in the world.

When we looked into the actual likelihood of the issue occurring, we discovered that if the VB6 part failed, we could easily just roll back the .NET part. If the write failed in .NET though, we had no way to go back and undo the work that had already been done in VB6. Keep in mind, this entire section was essentially creating transactions in a trust accounting application, so non-atomic operations can get users into all sorts of terrible situations.

On deeper inspection, the only way we thought the .NET part could fail was via transitory database issues. That is, connection or command timeouts or disconnects.

We implemented a relatively straightforward retry strategy to deal with those sorts of failures and then moved on. Sure, it wasn’t perfect, but it seemed like we’d covered our bases pretty well and mitigated the potential issue as best we could.
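
The shape of that kind of retry is roughly the following, sketched here in Kotlin rather than the actual .NET code, with made-up names. The important part is that only failures judged transient get retried.

// Illustrative retry wrapper for transient database failures; not the real implementation.
fun <T> withRetries(attempts: Int = 3, delayMs: Long = 500, isTransient: (Exception) -> Boolean, action: () -> T): T {
    var lastError: Exception? = null
    repeat(attempts) {
        try {
            return action()
        } catch (e: Exception) {
            // Only retry things like timeouts and disconnects; anything else is a real bug.
            if (!isTransient(e)) throw e
            lastError = e
            Thread.sleep(delayMs)
        }
    }
    throw lastError ?: IllegalStateException("Retry loop completed without executing the action")
}

That isTransient check is also exactly why the failure described in the next section sailed straight through: it wasn't transient at all.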

I Did Not See That One Coming

Of course, the code failed in a way completely unrelated to temporary connectivity issues.

In our case, we were stupid and attempted to write an Entity Framework entity to the database whose string values exceeded the column size limits. Long story short, we were concatenating an information field from some other fields and didn’t take into account that maybe the sum of those other fields would exceed the maximum.

The write failure triggered exactly the situation that we were worried about; the actual trust account record had been created (VB6) but our record of it happening was missing (.NET).

I still don’t actually know why we bothered implementing column size limits. As far as I know, there is no difference between a column of VARCHAR(60) and VARCHAR(MAX) when it comes to performance. Sure, you could conceivably store a ridiculous amount of data in the MAX column at some point, but I feel like that is a lot less destructive than the write failures (and its knock-on effects) that we got.

Even worse, from the user's point of view, the operation looked like it had worked. There were no error notifications visible to them, because we couldn't write to the table that we used to indicate errors! When they returned to their original action list though, the item that failed was still present. They then processed it again and the same thing happened (it looked like it worked but it was still in the list afterwards), at which point they twigged that something was unusual and contacted our support team (thank god).

Once we found out about the issue, we figured out pretty quickly what the root cause was thanks to our logging and cursed our hubris.

Off With Their Head!

The fix for this particular problem was easy enough and involved two extension methods: one for truncating a string and another for scanning an object and automatically truncating string lengths as per their data annotation attributes.

public static string Truncate(this string value, int maxLength, ILogger logger = null)
{
    if (string.IsNullOrEmpty(value))
    {
        return value;
    }

    if (maxLength < 0) throw new ArgumentException($"Truncate cannot be used with a negative max length (supplied {nameof(maxLength)} was [{maxLength}]). That doesn't even make sense, what would it even do?", nameof(maxLength));

    if (value.Length <= maxLength)
    {
        return value;
    }

    // If there isn't room for an ellipsis, hard cut; otherwise truncate and leave room for "...".
    var truncated = maxLength <= 3 ? value.Substring(0, maxLength) : value.Substring(0, maxLength - 3) + "...";

    logger?.Debug("The string [{original}] was truncated because it was longer than the allowed length of [{length}]. The truncated value is [{truncated}]", value, maxLength, truncated);

    return truncated;
}

public static void TruncateAllStringPropertiesByTheirMaxLengthAttribute(this object target, ILogger logger = null)
{
    var props = target.GetType().GetProperties().Where(prop => Attribute.IsDefined(prop, typeof(MaxLengthAttribute)) && prop.CanWrite && prop.PropertyType == typeof(string));

    foreach (var prop in props)
    {
        var maxLength = prop.GetCustomAttribute(typeof(MaxLengthAttribute)) as MaxLengthAttribute;
        if (maxLength != null)
        {
            prop.SetValue(target, ((string)prop.GetValue(target)).Truncate(maxLength.Length, logger));
        }
    }
}

Basically, before we write the entity in question, just call the TruncateAllStringPropertiesByTheirMaxLengthAttribute method on it.

With the immediate problem solved, we were still left with two outstanding issues though:

  • A failure occurred in the code and the user was not notified
  • An atomic operation was still being split across two completely different programming contexts

In this particular case we didn’t have time to alleviate the first issue, so we pushed forward with the fix for the issue that we knew could occur.

We still have absolutely no idea how to deal with the second issue though, and honestly, probably never will.

Conclusion

In retrospect, I don't think we actually made the wrong decision. We identified an issue, analysed the potential occurrences, costed a fix and then implemented a smaller fix that should have covered our bases.

The retry strategy would likely have dealt with transitory failures handily, we just didn’t identify the other cases in which that section could fail.

As much as I would like to, it's just not cost-effective to account for every single edge case when you're developing software.

Well, unless you’re building like pacemaker software or something.

Then you probably should.


Sometimes when maintaining legacy software, you come across a piece of code and you just wonder:

But….why?

If you're lucky the reasoning might end up being sound (i.e. this code is structured this way because of the great {insert-weird-issue-here} of 1979), but it's probably more likely that it was just a plain old lack of knowledge or experience.

It's always a journey of discovery though, so enjoy the following post, sponsored by some legacy VB6 code that interacts with a database.

Building On Shaky Foundations

As is typically the precursor to finding weird stuff, it all started with a bug.

During the execution of a user workflow, the presence of a ‘ character in a name caused an error, which in turn caused the workflow to fail, which in turn led to some nasty recovery steps. I’m not going to get into the recovery steps in this post, but suffice to say, this workflow was not correctly wrapped in a transaction scope and we’ll just leave it at that.

Now, the keen-eyed among you might already be able to guess what the issue was. Maybe because you’ve already fought this battle before.

Deep in the bowels of that particular piece of code, an SQL statement was being manually constructed using string concatenation.

Private Function Owner_LetPrepFees_CreateTransaction(ByVal psTransAudit As String, ByVal plTransLink As Long, ByVal plTransPropertyID As Long, _
                                                     ByVal plTransOwnerID As Long, ByVal psTransFromRef As String, ByVal psTransFromText As String, _
                                                     ByVal plTransCreditorID As Long, ByVal psTransToRef As String, ByVal psTransToText As String, _
                                                     ByVal psTransReference As String, ByVal psTransType As String, ByVal psTransDetails As String, _
                                                     ByVal pbGSTActive As Boolean, ByVal pnTransAmount As Currency, ByVal psTransLedgerType As String, _
                                                     ByVal pbLetFee As Boolean, ByVal pbPrepFee As Boolean, ByVal plCount As Long) As Long
    Dim sInsertFields           As String
    Dim sInsertValues           As String
    Dim lCursorLocation         As Long

    ' replaces single "'" with "''" to avoid the "'" issue in insert statement
    psTransFromRef = SQL_ValidateString(psTransFromRef)
    psTransFromText = SQL_ValidateString(psTransFromText)
    psTransToRef = SQL_ValidateString(psTransToRef)
    psTransToText = SQL_ValidateString(psTransToText)
    psTransReference = SQL_ValidateString(psTransReference)
    psTransDetails = SQL_ValidateString(psTransDetails)

    sInsertFields = "AccountID,Created,UserId,Audit,Date"
    sInsertValues = "" & GetAccountID() & ",GetDate()," & User.ID & ",'" & psTransAudit & "','" & Format(UsingDate, "yyyyMMdd") & "'"
    If plTransLink <> 0& Then
        sInsertFields = sInsertFields & ",Link"
        sInsertValues = sInsertValues & "," & plTransLink
    End If
    If plTransPropertyID <> 0 Then
        sInsertFields = sInsertFields & ",PropertyId"
        sInsertValues = sInsertValues & "," & plTransPropertyID
    End If
    If plCount = 1& Then
        sInsertFields = sInsertFields & ",OwnerID,FromRef,FromText"
        sInsertValues = sInsertValues & "," & plTransOwnerID & ",'" & Left$(psTransFromRef, 8&) & "','" & Left$(psTransFromText, 50&) & "'"
    Else
        sInsertFields = sInsertFields & ",CreditorId,ToRef,ToText"
        sInsertValues = sInsertValues & "," & plTransCreditorID & ",'" & Left$(psTransToRef, 8&) & "','" & Left$(psTransToText, 50&) & "'"
    End If
    sInsertFields = sInsertFields & ",Reference,TransType,Details,Amount,LedgerType"
    sInsertValues = sInsertValues & ",'" & psTransReference & "','" & psTransType & "','" & IIf(pbGSTActive, Left$(Trim$(psTransDetails), 50&), Left$(psTransDetails, 50&)) & "'," & pnTransAmount & ",'" & psTransLedgerType & "'"
    If pbLetFee And psTransType = "CR" And psTransLedgerType = "JC" Then
        sInsertFields = sInsertFields & ",DisburseType"
        sInsertValues = sInsertValues & "," & dtLetFeePayment
    ElseIf pbPrepFee And psTransType = "CR" And psTransLedgerType = "JC" Then
        sInsertFields = sInsertFields & ",DisburseType"
        sInsertValues = sInsertValues & "," & dtPrepFeePayment
    End If
    If pbGSTActive Then
        sInsertFields = sInsertFields & ",GST"
        sInsertValues = sInsertValues & "," & 1&
    End If
    
    lCursorLocation = g_cnUser.CursorLocation
    g_cnUser.CursorLocation = adUseClient
    Owner_LetPrepFees_CreateTransaction = RSQueryFN(g_cnUser, "SET NOCOUNT ON; INSERT INTO Transactions (" & sInsertFields & ") VALUES (" & sInsertValues & "); SET NOCOUNT OFF; SELECT SCOPE_IDENTITY() ID")
    g_cnUser.CursorLocation = lCursorLocation

End Function

Hilariously enough, I don’t even have a VB6 Syntax Highlighter available on this blog, so enjoy the glory of formatter:none.

Clearly we had run into the same problem at some point in the past, because the code was attempting to sanitize the input (looking for special characters and whatnot) before concatenating the strings together to make the SQL statement. Unfortunately, input sanitization can be a hairy beast at the best of times, and this particular sanitization function was not quite doing the job.

Leave It Up To The Professionals

Now, most sane people don’t go about constructing SQL Statements by concatenating strings, especially when it comes to values that are coming from other parts of the system.

Ignoring SQL injection attacks for a second (it's an on-premises application; you would already have to be inside the system to do any damage that way), you are still taking on a bunch of unnecessary complexity in handling all of the user input sanitization yourself. You might not even get it right, as you can see in the example above.

Even in VB6 there are libraries that do exactly that sort of thing for you, allowing you to use parameterized queries.

No muss, no fuss, just construct the text of the query (which is still built, but is built from controlled strings representing column names and parameter placeholders), jam some parameter values in and then execute. It doesn’t matter what’s in the parameters at that point, the library takes care of the rest.

Private Function Owner_LetPrepFees_CreateTransaction(ByVal psTransAudit As String, _
            ByVal plTransLink As Long, ByVal plTransPropertyID As Long, _
            ByVal plTransOwnerID As Long, ByVal psTransFromRef As String, ByVal psTransFromText As String, _
            ByVal plTransCreditorID As Long, ByVal psTransToRef As String, ByVal psTransToText As String, _
            ByVal plTransReference As Long, ByVal psTransType As String, ByVal psTransDetails As String, _
            ByVal pbGSTActive As Boolean, ByVal pnTransAmount As Currency, ByVal psTransLedgerType As String, _
            ByVal pbLetFee As Boolean, ByVal pbPrepFee As Boolean, ByVal plCount As Long) As Long            


    'Trims the parameters to fit inside the sql tables columns'
    psTransFromRef = Left$(Trim$(psTransFromRef), 8&)
    psTransFromText = Left$(Trim$(psTransFromText), 50&)
    psTransToRef = Left$(Trim$(psTransToRef), 8&)
    psTransToText = Left$(Trim$(psTransToText), 50&)
    psTransDetails = Left$(Trim$(psTransDetails), 50&)

    Dim sCommand    As String
    Dim sFieldList  As String
    Dim sValueList  As String
    Dim oCommand    As ADODB.Command
    Dim oParameter  As ADODB.Parameter

    Set oCommand = New Command
    oCommand.ActiveConnection = g_cnUser
    oCommand.CommandType = adCmdText
    oCommand.CommandTimeout = Val(GetRegistry("Settings", "SQL Command Timeout Override", "3600"))

    sFieldList = "AccountID, Created, UserId, Audit, Date"
    sValueList = "?" & ", GetDate()" & ", ?" & ", ?" & ", ?"
    oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , GetAccountID())
    oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , User.ID)
    oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 10, psTransAudit)
    oCommand.Parameters.Append oCommand.CreateParameter(, adDate, adParamInput, , UsingDate)

    If plTransLink <> 0& Then
        sFieldList = sFieldList & ", Link"
        sValueList = sValueList & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , plTransLink)
    End If

    If plTransPropertyID <> 0 Then
        sFieldList = sFieldList & ", PropertyId"
        sValueList = sValueList & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , plTransPropertyID)
    End If

    If plCount = 1& Then
        sFieldList = sFieldList & ", OwnerID" & ", FromRef" & ", FromText"
        sValueList = sValueList & ", ?" & ", ?" & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , plTransOwnerID)
        oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 8, psTransFromRef)
        oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 50, psTransFromText)
    Else
        sFieldList = sFieldList & ", CreditorId" & ", ToRef" & ", ToText"
        sValueList = sValueList & ", ?" & ", ?" & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , plTransCreditorID)
        oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 8, psTransToRef)
        oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 50, psTransToText)
    End If

    sFieldList = sFieldList & ", Reference" & ", TransType" & ", Details" & ", Amount" & ", LedgerType"
    sValueList = sValueList & ", ?" & ", ?" & ", ?" & ", ?" & ", ?"
    oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, 8, plTransReference)
    oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 2, psTransType)
    oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 100, psTransDetails)
    oCommand.Parameters.Append oCommand.CreateParameter(, adCurrency, adParamInput, , pnTransAmount)
    oCommand.Parameters.Append oCommand.CreateParameter(, adVarChar, adParamInput, 2, psTransLedgerType)

    If pbLetFee And psTransType = "CR" And psTransLedgerType = "JC" Then
        sFieldList = sFieldList & ", DisburseType"
        sValueList = sValueList & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , dtLetFeePayment)
    ElseIf pbPrepFee And psTransType = "CR" And psTransLedgerType = "JC" Then
        sFieldList = sFieldList & ", DisburseType"
        sValueList = sValueList & ",?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adInteger, adParamInput, , dtPrepFeePayment)
    End If

    If pbGSTActive Then
        sFieldList = sFieldList & ", GST"
        sValueList = sValueList & ", ?"
        oCommand.Parameters.Append oCommand.CreateParameter(, adBoolean, adParamInput, , pbGSTActive)
    End If

    Set oParameter = oCommand.CreateParameter(, adInteger, adParamReturnValue)
    oCommand.Parameters.Append oParameter
    sCommand = "INSERT INTO Transactions (" & sFieldList & ") VALUES (" & sValueList & "); SELECT ?=SCOPE_IDENTITY()"
    oCommand.CommandText = sCommand
    oCommand.Execute , , adExecuteNoRecords
    
    Owner_LetPrepFees_CreateTransaction = oParameter.Value
    Set oCommand = Nothing
    
End Function

Of course, because VB6 is a positively ancient language, it's not quite as easy as all that.

Normally, in .NET or something similar, if you are constructing a parameterized query yourself, you’d probably use a query builder of some sort, and you would name your parameters so that the resulting query was easy to read as it goes through profilers and logs and whatnot.

No such luck here, the only parameter placeholder that seems to be supported is the classic ?, which means the parameters are positional. This can be dangerous when the code is edited in the future, but if we’re lucky, maybe that won’t happen.

It's still better than jamming some strings together though.

Conclusion

I think of this as a textbook case for never just "fixing the bug" as it's stated. Sure, we could have just edited the SQL_ValidateString function to be more robust in the face of a particular arrangement of special characters, but that's just putting a bandaid on a bad solution.

Instead we spent the time to work with the language and the constructs available to put a solution in place that is a lot more robust in the face of unexpected input.

And let's be honest, at this point, the only input we get is the unexpected kind.


It's been about two months since I first posted about Dungeons and Dragons, which seems like a sane amount of time to wait before writing a followup. A lot can change in two months after all.

And change it has! As is unfortunately sometimes the case, we had to do some pretty classic organizational restructuring and as a result, our old DM has gone the way of Gary Gygax.

To be clear, he’s not dead, but he's definitely no longer with us. His absence is keenly felt, both from a D&D point of view and from an organizational point of view. Also it makes me sad.

It's not all doom and gloom though.

His departure provided a natural opportunity to refresh and revisit our D&D group and paved the way for a second “season” of D&D campaigns.

Where once there were a mere eight players in a single group, now we are legion. Surely it is only a matter of time before the entire company pivots and we flip to some sort of D&D related business.

And I for one welcome our new draconic overlords.

The Phoenix Returns To Life In 1d6 Days

While the dissolution of the first group was a sad day indeed, our attempts to reform were immediately met with a fresh wave of interest from a wide variety of people.

I like to think that it's all because of the masterfully crafted email that I sent out looking for fresh blood, but it's probably more likely that the buzz generated from the ridiculousness of the first group's campaign provided for solid word of mouth advertising.

It wasn’t long before I had a list of about twenty people who wanted to see what all the fuss was about. The best part was that the people were from all across the business, and ranged from other members of our development team, to our support department, all the way through to marketing.

Having discovered in the first season that seven players and one DM is somewhat challenging to run (similar to how large teams get less effective), the goal was to try and limit each group to five players. Using the power of maths, that meant we would have four groups.

Of course, with the departure of our sole DM, we’d need to find a new one. Actually, we’d need to find four new ones, as running even a single group can be time consuming at the best of times, let alone four completely different groups.

Taking into account everyone's availability, the desire for each group to have a mixture of newbies and experienced people, and the fact that the volunteer DMs would also like to be able to play was somewhat…challenging. My logistical skills are pretty good these days though, so once I'd exited the fugue state brought on by all of the constraints, we were left with four groups organized over Monday, Tuesday, Wednesday and Thursday evenings, and everybody reasonably happy.

Also, I was a DM, because I’m a sucker like that.

Behind The Curtain

Having only played D&D once before, I found that peeking behind the curtain was both enlightening and not really all that surprising.

To make things easy, the fresh DMs all picked solely from the pre-built adventures supplied by Wizards of the Coast. It's a lot simpler to get a handle on the responsibilities of being a DM when you are provided with some guidelines and ideas that are easy to extend (or not) as you wish, without requiring you to think of absolutely everything.

I picked Hoard of the Dragon Queen, which is a pretty big campaign that is actually the first half of a massive two-part adventure (the other half being Rise of Tiamat). Using our very first group as a high water mark, I’m not sure we’ll ever finish it, but that probably doesn’t matter as long as we have fun while we try.

For me, already being in a leadership role, the transition to being a DM is probably only about half as hard as it is for someone who has never had the experience of organising and guiding a group of people. Of course, that leaves the other half of the difficulty fully intact, so it's not all easy riding.

To compare:

  • As a player, you really only have to know the details about your own character and the basic gameplay tenets, and even those are not super important if you have a solid DM to guide you.
  • As the DM, you need to know everything. Literally everything, top to bottom, all the rules and constraints, how the game flows in the abstract and how your campaign needs to progress. You also need to be good at improvising, as you can plan for a huge variety of things, but once it gets into the hands of your players, all of that might go out the window.

Finally, the last (and possibly most critical) part is knowing when to step in and when to let things go. You have to be careful not to spoil anyone's fun (within reason) and to enable those cool D&D moments to happen as organically as possible, but you also need to maximise the amount of fun happening across the entire group.

Like I said in my first post, it's a fine line to walk.

Epic With A Capital E

Interestingly enough, with multiple groups running at the same time, we might eventually be able to do some sort of epic crossover. Imagine a campaign where one or more different groups are working together to accomplish a greater goal, or even better, maybe the groups are working against each other, possibly without even realising it.

You could probably retrofit multiple groups acting together into one of the pre-built campaigns, but to be honest, you’d be better off putting a custom one together that was built with the concept in mind from the start.

It's a neat dream, but as we have people who are both players and DMs, we would have to be pretty careful, as anyone filling both roles would know too much.

I think I’d be more than willing to give up the ability to play to put something like that together though, as it would be an amazing experience for everyone involved.

Cure Wounds, Gain 1d8 Hit Points

Last but not least, one of the completely unintended side effects of running D&D groups at work is employee retention.

This comes in two flavours.

The first is the simple presence of social activities that help to bind people together around more than just work. For us, D&D is just another addition to an already healthy space, but being focused on much smaller groups it results in slightly different benefits (tighter bonds, closer friendships, etc).

The second is the continuity of the campaign itself. D&D groups are essentially engaged in collaborative storytelling, weaving an epic, personalised tale together over time. It's hard to pull away from that on a whim, so it tends to increase the barrier to leaving for non-critical reasons.

I say non-critical, because if your workplace is bad enough, no amount of social activities will keep the good people around.

Having said all of that, I would never prevent a former employee from continuing to participate in D&D. As long as they can regularly make the sessions, they are always welcome in any campaign that they were already a member of. Obviously current employees get preference when it comes to putting together new groups, but there is no reason to be a jerk about it just because someone has moved on.

Conclusion

We started this whole thing because there were a few of us that just wanted to play D&D, and having a bunch of people who were already together on a day to day basis made it a lot easier to organise. It's pretty hard to get a group of adults with various adult responsibilities together on a regular basis, after all.

The result of that initial, relatively selfish desire has been a wealth of social activity that is effectively linking a bunch of different business units together, and is helping to smooth out some of the natural stress inherent in working in a challenging environment.

All in all, I think it's been a worthwhile use of my (and other people's) time, and I highly recommend it.