
If you’re building some sort of software that you intend to be used by real people, you should probably talk to those people as soon as possible. Preferably before you start building the software.

You might be pretty confident about your plans and direction, but in my mind it’s far more sane to validate your assumptions before you spend hundreds of thousands of dollars building something that nobody wants. Or even worse, that they only want a little bit.

Granted, engaging with customers directly is a different sort of challenge compared to software development, but I think it’s worth the effort.

I’m Kind Of A Big Deal, People Know Me

I’ve alluded to it in at least two relatively recent blog posts, but one of my teams is engaged in a project whose goal is to give existing customers the ability to use some of our new and shiny cloud features while continuing to use their older, more mature desktop software for everyone else on a day-to-day basis.

For me, the obvious first step is to talk to some of our existing customers and see if they are interested. Granted, some of them already use an equivalent third party product to do basically the same thing, but that’s no guarantee that they would accept our offering.

Now, I’m not sure whether or not this kind of early consultative customer has a specific name in the industry, but internally we’ve taken to calling them anchor/validation customers.

The core idea behind validation customers is simple: find a small set of customers that represent your wider audience and engage with them around the situation you are trying to improve. Once that representative sample of customers is happy with the new thing and actively using it, then you’re probably in a good position to roll it out to a wider audience.

Luckily for us (and for this project) our legacy product has quite a large installed user base and we’re planning to offer an incremental improvement to the way they already work, so locating potential validation customer candidates isn’t too hard.

I imagine it would be a completely different story in a different situation.

60% Of The Time It Works Every Time

Actually gathering feedback from the validation customers is possibly the most interesting part.

Early engagements can be all words if necessary, but eventually you’ll want the customer to actually perform the workflow that you’re trying to improve for real. This can be a hard sell if you’re expecting them to do double the work for no immediate gain (for example, asking them to do all their normal work and then to do some of that work again in a different system), so be careful not to expect too much from the customer. Working prototypes help quite a lot here.

Having said that, even if you’re just shadowing them while they do their normal day-to-day activities, you’ll still learn an incredible amount.

There is one caveat that you should keep in mind though: don’t take what the customer says as gospel.

Most people when presented with a problem of some sort will start to solutionise it straight away. These customers may have been struggling with a particular workflow for a while and have probably given their chosen solution a lot of thought.

It’s still incredibly useful information and you should definitely listen, but you need to focus on getting back down to what the actual problem is, rather than just focusing on the solution that they are presenting. Identify the problem or pain point and then let your team of smart people solve it.

And with that effortless topic transition, let’s talk about the team.

I’m A Glass Case Of Emotion

Traditionally you’d probably have a product owner (or maybe a business analyst) doing all of the customer interaction, abstracting away the individual customers and providing a stream of curated feedback to the team.

To be honest, this is probably the most efficient structure in terms of pure time spent interacting with customers and gathering feedback, but you do accept a risk of having everything come through one person.

This time I wanted to try something a little different, and while each validation customer has a dedicated contact within the team (so they always have a familiar face), it’s not always the same person for every single customer. Additionally, and most importantly I think, that person is not the only one that visits or observes the customer to gather feedback.

At some point, every single member of the team will go visit a customer, probably multiple times.

This approach provides a wealth of qualitative benefits, but the main one is the generation of empathy with the end-user. It’s hard not to empathise with someone when you’ve literally been out on the road with them while they are doing their job, and this affects everything you build from that point on.

The only downside that I’ve noticed is that the customer interaction can be quite mentally and emotionally exhausting, compared to the normal run of the mill software development process. Some people will take to it like a fish to water, but not everyone is the same, and you need to make sure that you leave enough space and downtime for them to recover, to prevent them from getting burnt out.

Conclusion

At the end of the day getting a team of software developers to interact with customers directly can be a scary thought for some, both from a leadership point of view and for the developers themselves.

I don’t think it’s as scary as anyone makes out though, assuming you trust your team to interact appropriately with the customers and you give them at least a modicum of coaching on good practices.

The only unfortunate thing about the whole process is that it is incredibly difficult to definitively measure the positive effect of customer interactions, as the benefits are almost all qualitative. Sure, you’ll probably deliver a better, more successful outcome, but there are hundreds of factors that could have led to that success. How can you pin any of it specifically back down to “we engaged with representative customers early and frequently”?

I think the key part is getting that core set of validation customers to use whatever you’re building for real. If you manage to put something together and they pick it up, assuming they were a good representative sample, you can probably directly link the success of any subsequent rollout back to those early engagements.

The whole empathy generation thing is just icing on the cake.


After that brief leadership interlude, it’s time to get back into the technical stuff with a weird and incredibly frustrating issue that I encountered recently when building a small command-line application using .NET 4.7.

More specifically, it failed to compile on our build server citing problems with a dependency that it shouldn’t have even required.

So without further ado, on with the show.

One Build Process To Rule Them All

One of the rules I like to follow for a good build process is that it should be executable outside of the build environment.

There are a number of reasons for this rule, but the two that are most relevant are:

  1. If something goes wrong during the build process you can try and run it yourself and see what’s happening, without having to involve the build server
  2. As a result of having to execute the process outside of the build environment, it’s likely that the build logic will be encapsulated in source control, alongside the code

With the way that a lot of software compilation works though, it can be hard to create build processes that automatically bootstrap the necessary components on a clean machine.

For example, there is no real way to compile a .NET Framework 4.7 codebase without using software that has to be installed. As far as I know you have to use either MSBuild, Visual Studio or some other component to do the dirty work. .NET Core is a lot better in this respect, because it’s all command-line driven and doesn’t feature any components that must be installed on the machine before it will work. All you have to do is bootstrap the self-contained SDK.

Thus while the dream is for the build process to be painless to execute on a clean git clone (with the intent that that is exactly what the build server does), sometimes dreams don’t come true, no matter how hard you try.

For us, our build server comes with a small number of components pre-installed, including MSBuild, and then our build scripts rely on those components existing in order to work correctly. There is a little bit of fudging involved though, so you don’t have to have exactly the same components installed locally, and it will dynamically find MSBuild for you.

This was exactly how the command-line application build process was working before I touched it.

Then I touched it and it all went to hell.

Missing Without A Trace

Whenever you go back to a component that hasn’t been actively developed for a while, you always have to decide whether or not you should go to the effort of updating its dependencies that are now probably out of date.

Of course, some upgrades are a lot easier to action than others (i.e. a NuGet package update is generally a lot less painful than updating to a later version of the .NET Framework), but the general idea is to put some effort into making sure you’ve got a strong base to work from.

So that’s exactly what I did when I resurrected the command-line application used for metrics generation. I updated the build process, renamed the repository/namespaces (to be more appropriate), did a pass over the readme and updated the NuGet packages. No .NET version changes though, because that stuff can get hairy and it was already at 4.7, so it wasn’t too bad.

Everything compiled perfectly fine in Visual Studio and the self-contained build process continued to work on my local machine, so I pushed ahead and implemented the necessary changes.

Then I pushed my code and the automated build process on our build server failed consistently with a bunch of compilation errors like the following:

Framework\IntegrationTestKernel.cs(64,13): error CS0012: The type 'ValueType' is defined in an assembly that is not referenced. You must add a reference to assembly 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'.

The most confusing part?

I had taken no dependency on netstandard as far as I knew.

More importantly, my understanding of netstandard was that it is basically a set of common interfaces to allow for interoperability between the .NET Framework and .NET Core. I had no idea why my code would fail to compile citing a dependency I didn’t even ask for.

Also, it worked perfectly on my machine, so clearly something was awry.

The Standard Response

The obvious first response is to add a reference to netstandard.

This is apparently possible via the NETStandard.Library NuGet package, so I added that, verified that it compiled locally and pushed again.

Same compilation errors.

My next hypothesis was that maybe something had gone weird with .NET Framework 4.7. There are a number of articles on the internet about similar looking topics and some of them read like later versions of .NET 4.7 (which are in-place upgrades for god only knows what reason) have changes relating to netstandard and .NET Framework integrations and compatibility. It was a shaky hypothesis though, because this application had always specifically targeted .NET 4.7.

Anyway, I flipped the projects to all target an earlier version of the .NET Framework (4.6.2) and then reinstalled all the NuGet packages (thank god for the Update-Package -reinstall command).

Still no luck.

The last thing I tried was removing all references to the C# 7 Value Tuple feature (super helpful when creating methods with complex return types), but that didn’t help either.
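For reference, the feature in question produces method signatures along the lines of the sketch below (the names are made up for illustration). On .NET Framework 4.7 the value tuple types ship with the framework, while earlier versions pull them in via the System.ValueTuple NuGet package, which is part of what made it seem like a plausible suspect.

// A made up example of a C# 7 value tuple return type, i.e. the feature being removed above.
// On .NET Framework 4.6.2 and earlier this relies on the System.ValueTuple NuGet package.
public (bool Succeeded, string Error) TryDoTheThing(string input)
{
    if (string.IsNullOrWhiteSpace(input))
    {
        return (false, "no input supplied");
    }

    return (true, null);
}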

I Compromised; In That I Did Exactly What It Wanted

In the end I accepted defeat and made the  Visual Studio Build Tools 2017 available on our build server by installing them on our current build agent AMI, taking a new snapshot and then updating TeamCity to use that snapshot instead. In order to get everything to compile cleanly, I had to specifically install the .NET Core Build Tools, which made me sad, because .NET Core was actually pretty clean from a build standpoint. Now if someone puts together a .NET Core repository incorrectly, it will probably still continue to compile just fine on the build server, leaving a tripwire for the next time someone cleanly checks out the repo.

Ah well, can’t win them all.

Conclusion

I suspect that the root cause of the issue was updating some of the NuGet packages, specifically the packages that are only installed in the test projects (like the Ninject.MockingKernel and its NSubstitute implementation), as the test projects were the only ones that were failing to compile.

I’m not entirely sure why a package update would cause compilation errors though, which is pretty frustrating. I’ve never experienced anything similar before, so perhaps those libraries were compiled to target a specific framework (netstandard 2.0) and those dependencies flowed through into the main projects they were installed into?

Anyway, our build agents are slightly less clean now as a result, which makes me sad, but I can live with it for now.

I really do hate system installed components though.


You might think that a quote from the cult classic Scarface is a poor way to start a blog post about leadership. I mean, Tony Montana isn’t exactly a beacon of good management. You probably shouldn’t look up to him in any way, shape or form.

But it’s a good quote all the same, because really, if you’re in a leadership position, one of the things that you should value the most is your word.

If you can’t keep it, then you should have kept your mouth shut in the first place.

Honesty Is Actually The Best Policy

I’m a firm believer in being honest and straightforward when communicating with anyone. It doesn’t matter if they are a general colleague, someone who you are directly responsible for or even your own boss, you should tell it like it is. Of course, nothing puts an honesty policy to the test like making a huge mistake, so when I look back, it seems to me like I’m pretty consistent about that sort of thing.

There are a bunch of benefits to being consistently straightforward and honest, but one of the most valuable is that it’s a natural trust builder. Of course, that doesn’t mean that people will necessarily like what you have to say, but that’s another thing entirely.

Either way, if people know that you are transparent, regardless of the situation, then they know what to expect and don’t have to worry about spending cognitive resources deciphering what you really meant. That means more mental power dedicated to actually getting things done, which can only be a good thing.

Expecto Patronum

Interestingly enough, setting the correct expectations can actually be quite difficult, even if you’re being straightforward and honest.

The best possible situation is when you are able to supply full, unadulterated information, clear in purpose and unambiguous. In this case, there is little to no room for assumptions, and if the other party misinterpreted the information given, it is much easier to clear things up, as the misinterpretation is usually pretty obvious. That doesn’t mean it’s easy to clear up misconceptions, just easier. People are complicated animals after all.

Other situations are more challenging.

For example, if there is a situation and you are unable to provide all of the information to the appropriate parties (maybe you don’t know, maybe it’s sensitive), then it’s entirely possible and likely that the people involved will fill in the blanks with their own assumptions. In this case there is little you can do other than be as clear as possible that this part of the picture is fuzzy and unclear, and to ensure that as soon as you know more, they know more. The quicker you learn and disseminate information, the better. It leaves less time for assumptions to fester.

The last case is quite possibly the most painful.

Sometimes you set expectations that, through no fault of your own, end up being wrong.

Expect The Unexpected

To be clear, if you can do anything at all to prevent incorrect expectations being set, you should do it. Expectations that are incorrectly set and then not met are easily one of the most damaging things to the professional happiness of a person, and can result in all sorts of other negative side effects like loss of trust (which is horrible), loss of motivation and a growing desire to be somewhere else.

A growing desire to be somewhere else is a dangerous thing. In comparison, it is normal and healthy for your people to keep themselves well informed about the job market and opportunities available to them, and you should do what you can to encourage that behaviour. Remember, you serve the overlapping interests of your people and your organization, but if push comes to shove, your people come first. This doesn’t mean that you are encouraging people to leave; quite the contrary, you want them to stay because they want to stay, not because they can’t go anywhere else.

If the worst happens and the wrong expectations have been set for some reason, then you should do everything in your power to ensure that those expectations are met.

I mean, obviously if the expectations are ridiculous then you’ve clearly screwed up, and you need to have a much longer, harder look at your own behaviour as a leader. There will be damage in this case, and you will have to do your best to mitigate it.

If the expectations are reasonable though, and your actions led to them being set (miscommunication, poorly worded statements, communicating information that you believed to be true but wasn’t), then you should move heaven and earth to follow through and hold true to your word.

Even if it costs you personally.

Consider it the price paid for a valuable lesson learned.

Conclusion

At the end of the day, as a leader, you need to be acutely aware of the impact that you can have as a result of words that come out of your mouth.

Do not underestimate the damage that you can cause as a result of incorrectly set expectations not being met.

It’s trite, but with great power comes great responsibility.

“Great power” is probably overselling it, but the phrase “with mediocre power comes great responsibility” just doesn’t have the same ring to it.


We have a lot of logs.

It’s mostly my fault to be honest. It was only a few years ago that I learned about log aggregation, and once you have an ELK stack, everything looks like a structured log event formatted as JSON.

We aggregate a wealth of information into our log stack these days.

Now, if I had my way, we would keep everything forever. My dream would be to be able to ask the question “What did our aggregate API traffic look like over the last 12 months?”

Unfortunately, I can’t keep the raw data forever.

But I might be able to keep a part of it.

Storage Space

Storage space is pretty cheap these days, especially in AWS. In the Asia Pacific region, we pay $US 0.12 per GB per month for a stock standard, non-provisioned IOPS EBS volume.

Our ELK stack accumulates gigabytes of data every day though, and trying to store everything for all of eternity can add up pretty quickly. It gets even more complicated thanks to the nature of Elasticsearch, because it likes to keep replicas of things just in case a node explodes, so you actually need more storage space than you think in order to account for redundancy.
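To put some purely illustrative numbers on it: at, say, 5 GB of new log events a day, a single replica doubles that to 10 GB, so even a single month of retention works out to around 300 GB of allocated storage, or roughly $36 US a month at the rate above. Not ruinous, but it grows with every new thing you decide to aggregate.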

In the end we somewhat randomly decided to keep a bit more than a month’s worth of data (40 days), which gives us the capability to reliably support our products, and to have a decent window for viewing business intelligence and usage. We have a scheduled task in TeamCity that leverages Curator to remove data as appropriate.

Now, a little more than a month is a pretty long time.

But I want more.

In For The Long Haul

In any data set, you are likely to find patterns that emerge over a much longer period than a month.

A good example would be something like daily active users. This is the sort of trend that is likely to show itself over months or years, especially for a relatively stable product. Unless you’ve done something extreme of course, in which case you might get a meaningful trend over a much shorter period.

Ignoring the extremes, we have all the raw data required to calculate the metric, we’re just not keeping it. If we had some way of summarising it into a smaller data set though, we could keep it for a much longer period. Maybe some sort of mechanism to do some calculations and store the resulting derivation somewhere safe?

The simplest approach is some sort of script or application that runs on a schedule and uses the existing data in the ELK stack to create and store new documents, preferably back into the ELK stack. If we want to ensure those new documents don’t get deleted by Curator, all we have to do is put them into different indexes (as Curator is only cleaning up indexes prefixed with logstash).

Seems simple enough.

Generator X

For once it actually was simple enough.

At some point in the past we actually implemented a variation of this idea, where we calculated some metrics from a database (yup, that database) and stored them in an Elasticsearch instance for later use.

Architecturally, the metric generator was a small C# command line application scheduled for daily execution through TeamCity, so nothing particularly complicated.

We ended up decommissioning those particular metrics (because it turned out they were useless) and disabling the scheduled task, but the framework already existed to do at least half of what I wanted to do; the part relating to generating documents and storing them in Elasticsearch. All I had to do was extend it to query a different data source (Elasticsearch) and generate a different set of metrics documents for indexing.

So that’s exactly what I did.

The only complicated part was figuring out how to query Elasticsearch from .NET, which, as you can see from the following metrics generation class, can be quite a journey.

public class ElasticsearchDailyDistinctUsersDbQuery : IDailyDistinctUsersDbQuery
{
    public ElasticsearchDailyDistinctUsersDbQuery
    (
        SourceElasticsearchUrlSetting sourceElasticsearch,
        IElasticClientFactory factory,
        IClock clock,
        IMetricEngineVersionResolver version
    )
    {
        _sourceElasticsearch = sourceElasticsearch;
        _clock = clock;
        _version = version;
        _client = factory.Create(sourceElasticsearch.Value);
    }

    private const string _indexPattern = "logstash-*";

    private readonly SourceElasticsearchUrlSetting _sourceElasticsearch;
    private readonly IClock _clock;
    private readonly IMetricEngineVersionResolver _version;

    private readonly IElasticClient _client;

    public IEnumerable<DailyDistinctUsersMetric> Run(DateTimeOffset parameters)
    {
        var start = parameters - parameters.TimeOfDay;
        var end = start.AddDays(1);

        var result = _client.Search<object>
        (
            s => s
                .Index(_indexPattern)
                .AllTypes()
                .Query
                (
                    q => q
                        .Bool
                        (
                            b => b
                                .Must(m => m.QueryString(a => a.Query("Application:GenericSoftwareName AND Event.Name:SomeLoginEvent").AnalyzeWildcard(true)))
                                .Must(m => m
                                    .DateRange
                                    (
                                        d => d
                                            .Field("@timestamp")
                                            .GreaterThanOrEquals(DateMath.Anchored(start.ToUniversalTime().DateTime))
                                            .LessThan(DateMath.Anchored(end.ToUniversalTime().DateTime))
                                    )
                                )
                        )
                )
                .Aggregations(a => a
                    .Cardinality
                    (
                        "DistinctUsers", 
                        c => c.Field("SomeUniqueUserIdentifier")
                    )
                )
        );

        var agg = result.Aggs.Cardinality("DistinctUsers");

        return new[]
        {
            new DailyDistinctUsersMetric(start)
            {
                count = Convert.ToInt32(agg.Value),
                generated_at = _clock.UtcNow,
                source = $"{_sourceElasticsearch.Value}/{_indexPattern}",
                generator_version = _version.ResolveVersion().ToString()
            }
        };
    }
}
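For context, the DailyDistinctUsersMetric document that the query produces isn’t shown here, but a rough sketch of its likely shape (inferred from the usage above, so the property names are guesses) is below. The important part is that these documents get indexed, via the normal IElasticClient.Index call, into an index that doesn’t match the logstash-* prefix, so Curator leaves them alone.

// A rough sketch of the metric document being generated; the real class isn't shown in this
// post, so the shape here is inferred from the query class above.
public class DailyDistinctUsersMetric
{
    public DailyDistinctUsersMetric(DateTimeOffset day)
    {
        occurred_at = day;
    }

    // The day that the metric applies to (name is a guess).
    public DateTimeOffset occurred_at { get; set; }

    // The number of distinct users seen on that day.
    public int count { get; set; }

    // Audit fields describing how, when and from what the metric was calculated.
    public DateTimeOffset generated_at { get; set; }
    public string source { get; set; }
    public string generator_version { get; set; }
}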

Conclusion

The concept of calculating some aggregated values from our logging data and keeping them separately has been in my own personal backlog for a while now, so it was nice to have a chance to dig into it in earnest.

It was even nicer to be able to build on top of an existing component, as it would have taken me far longer if I had to put everything together from scratch. I think it’s a testament to the quality of our development process that even this relatively unimportant component was originally built following solid software engineering practices, and has plenty of automated tests, dependency injection and so on. It made refactoring it and turning it towards a slightly different purpose much easier.

Now all I have to do is wait months while the longer term data slowly accumulates.


A very long time ago I wrote a post on this blog about interceptors.

The idea behind an interceptor is pretty straightforward: dynamically create a wrapper around some interface or class to either augment or alter its behaviour when used by the rest of the system, without actually having to implement the interface or override the class. For example, my original post was for an interceptor that slowed down requests heading to an API, simulating how that particular application would feel for a user with terrible internet.

I honestly haven’t touched the concept since, until recently that is.

I wanted to add some logging around usage of a third party API from our legacy application and conceptually the problem seemed like a perfect application for another interceptor. A quick implementation of a logging interceptor injected via Ninject and I’d have all the logging I could ever want, without having to mess around too much.

Reality had other ideas though.

Here’s Why I Came Here Tonight

Our legacy software is at that time in its life where it mostly just gets integrations. It’s pretty feature complete as far as core functionality goes, so until the day we finally grant it Finis Rerum and it can rest, we look to add value to our users by integrating with third party services.

The most recent foray in this space integrated a payment provider into the software, which is quite useful considering its core value proposition is trust account management. From a technical point of view, the payment provider has an API and we wrote a client to access that API, with structured request and response objects. Pretty standard stuff.

As part of our development, we included various log events that allowed us to track the behaviour of parts of the system, mostly so that we could more easily support the application and get accurate metrics and feedback from users in the wild relating to performance. This is all well and good, but those events generally cover off combined parts of the application logic; for example, an operation that queries the local DB and then augments that information by calling into the third party API to display a screen to the user.

This makes it relatively easy to see when any users are experiencing performance issues, but it makes it hard to see whether or not the local DB, the actual programming logic or the internet call was the root cause.

An improvement to this would be to also log any outgoing API requests and their responses, along with the execution time. With that information we would be able to either blame or absolve the client’s internet connection when it comes to performance questions.

Now, I’m an extremely lazy developer, so while we have a nice interface that I could implement to accomplish this (i.e. some sort of LoggingPaymentProviderClient), it’s got like twenty methods and I really don’t have the time, patience or motivation for that. Also, while it’s unlikely that the interface would change all that much over time, it’s still something of a maintenance nightmare.

Interceptors to the rescue!

But I Got The Feeling That Something Ain’t Right

As I explained in my original post all those years ago, the IInterceptor interface supplied by the Castle library allows you to implement logic via a proxy and slot it seamlessly into a call stack. Its usage is made easier by the presence of good dependency injection, but it’s definitely not required at all.
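To make that concrete, a minimal sketch of the no-dependency-injection case looks something like the following (the payment client types and SomeInterceptor are placeholders for whatever you want to wrap):

// A minimal sketch of wrapping a target in a proxy by hand with Castle DynamicProxy,
// no dependency injection required; the types involved are placeholders.
var generator = new ProxyGenerator();

IPaymentProviderClient wrapped = generator.CreateInterfaceProxyWithTarget<IPaymentProviderClient>(
    new PaymentProviderClient(),
    new SomeInterceptor()
);

// Every call made through 'wrapped' now flows through SomeInterceptor.Intercept before
// (and after) hitting the real client.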

Thus enters the logging interceptor.

public class PaymentProviderClientMethodExecutionLoggingInterceptor : IInterceptor
{
    public PaymentProviderClientMethodExecutionLoggingInterceptor(ILogger logger)
    {
        _logger = logger;
    }

    private readonly ILogger _logger;

    public void Intercept(IInvocation invocation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            invocation.Proceed();
            stopwatch.Stop();

            var log = new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed);
            _logger.Information(log);
        }
        catch (Exception ex)
        {
            stopwatch.Stop();

            var log = new PaymentProviderMethodCallFailed(invocation, stopwatch.Elapsed, ex);
            _logger.Warning(log);

            throw;
        }
    }
}

It’s not an overly complicated class, and while it’s written to be specific, it’s actually quite generic.

Given a proxied class, all methods called on that class will be logged via Serilog, with the method name, its parameters and its return value (the structured logging being provided by the dedicated event classes).
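For completeness, wiring the interceptor into the legacy application via Ninject would be something along the lines of the binding below, assuming the Ninject.Extensions.Interception package is in play (the payment provider client types are placeholder names):

// A sketch of attaching the interceptor to a binding using Ninject.Extensions.Interception;
// IPaymentProviderClient / PaymentProviderClient are placeholder names.
Bind<IPaymentProviderClient>()
    .To<PaymentProviderClient>()
    .Intercept()
    .With<PaymentProviderClientMethodExecutionLoggingInterceptor>();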

Nothing ever works the first time though, and while I’m constantly being reminded of that, I’m always hopeful all the same. Denial is a powerful motivator.

The problem was that the IInterceptor interface is old enough that it doesn’t consider the existence of asynchronous methods. It does exactly what it says on the tin: starts a timer, proceeds with the method invocation and then, because the method is asynchronous, immediately logs an event with the wrong execution time and no return value.

It has no idea that it has to wait for the invocation to complete because it thinks everything is synchronous.

Clowns To The Left Of Me, Jokers To The Right

This is where everything is going to get a little bit fuzzier than I would like, because I wrote this blog post before I had a working solution.

From what I can tell, the situation is quite complex.

The simplest solution appears to be to leverage the existing interface and simply check for the presence of a Task (or Task<T>) return value. If detected, append a continuation to that Task to perform the desired functionality. For me this would be a continuation on both faulted and success (and maybe cancelled?) that would log the completion of the method. It seems like it would work, but I do have some concerns about the scheduling of the continuation and how that makes the code harder to reason about.
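A very rough sketch of that continuation approach (untested, and glossing over Task<T> results, cancellation and the synchronous failure path) might look something like this:

// A rough sketch of the continuation based approach described above; when the intercepted
// method returns a Task, logging is deferred until that Task actually completes.
public void Intercept(IInvocation invocation)
{
    var stopwatch = Stopwatch.StartNew();
    invocation.Proceed();

    var task = invocation.ReturnValue as Task;
    if (task == null)
    {
        // Plain synchronous method; log immediately, exactly as before.
        stopwatch.Stop();
        _logger.Information(new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed));
        return;
    }

    task.ContinueWith(t =>
    {
        stopwatch.Stop();

        if (t.IsFaulted)
        {
            _logger.Warning(new PaymentProviderMethodCallFailed(invocation, stopwatch.Elapsed, t.Exception));
        }
        else
        {
            _logger.Information(new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed));
        }
    }, TaskContinuationOptions.ExecuteSynchronously);
}

The ExecuteSynchronously hint at least asks for the logging to run on whatever thread completed the task, which reduces (but doesn’t eliminate) the scheduling concerns mentioned above.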

Luckily, someone has already put together a reusable library that allows for asynchronous interceptors via a slightly different interface.

This is attractive because it’s code that I don’t have to write (remember, I’m lazy), but it not being built into the core Castle library does make me question its legitimacy. Surely if it was that critical the maintainers would have updated Castle.Core?

Regardless, I explored using the library first, but in order to use it I had to go on an adventure to upgrade a bunch of our NuGet dependencies (because it relied on the latest version of Castle.Core), which meant updates to Castle, Ninject and Ninject’s extension libraries. This caused knock-on effects because the Ninject.MockingKernel.NSubstitute library was not available for .NET 4.5 (even though all the others were), so I had to temporarily copy that particular implementation into our codebase.

Once everything was compiling, a full test run showed some failing tests that weren’t failing before the library upgrades, so I kind of stopped there.

For now.

Conclusion

Unfortunately this is one of those blog posts that comes off feeling a little hollow. I didn’t actually get to my objective (seamless per method logging for a third-party dependency), but I did learn a lot on the way, so I think it was still useful to write about.

Probably should have waited a little longer though, I jumped the gun a bit.

It’s not the only time in recent memory that asynchronous behaviour has made things more difficult than they would have been otherwise. In an unrelated matter, some of our automated tests have been flaky recently, and the core of the issue seems to be asynchronous behaviour that is either a.) not being correctly overridden to be synchronous for testing purposes or b.) not correctly being waited for before tests proceed.

It’s not hard to write tests that are purely async of course (NUnit supports tests marked with the async keyword), but when you’re testing a view model and the commands are “synchronous” it gets a bit more challenging.

That’s probably a topic for another day though.