
I’ve been using MVVM as a pattern for UI work for a while now, mostly because of WPF. It’s a solid pattern, and while I’ve not really delved into the publicly available frameworks (Prism, Caliburn.Micro, etc.), I have put together a few reusable bits and pieces to make the journey easier.

One of those bits and pieces is the ability to perform work in the background, so that the UI remains responsive and usable while other important things are happening. This usually manifests as some sort of refresh or busy indicator on the screen after the user elects to do something complicated, but the important part is that the screen itself does not become unresponsive.

People get antsy when software “stops responding” and tend to murder it with extreme prejudice.

Now, the reusable components are by no means perfect, but they do get the job done.

Except when they don’t.

Right On Schedule

The framework itself is pretty bare bones stuff, with a few simple ICommand implementations and some view model base classes giving easy access to commonly desired functions.

The most complex part is the built-in support to easily do background work in a view model while leaving the user experience responsive and communicative. The core idea is to segregate the stuff happening in the background from the stuff happening in the foreground (which is where all the WPF rendering and user interaction lives) using Tasks and TaskSchedulers from the TPL (Task Parallel Library), while also helping to manage some state to communicate what is happening to the user (like busy indicators).

Each view model is responsible for executing some long-running operation (probably started from a command), and then deciding what should happen when that operation succeeds, fails or is cancelled.

In order to support this segregation, the software takes a dependency on three separate task schedulers: one for the background (which is just a normal ThreadPoolTaskScheduler), one for the foreground (which is a DispatcherTaskScheduler or something similar) and one for tasks that need to be scheduled on a regular basis (another ThreadPoolTaskScheduler).

This dependency injection allows for those schedulers to be overridden for testing purposes, so that they execute completely synchronously or can be pumped at will as necessary in tests.
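To make that concrete, here is a rough sketch of the sort of thing the base view model offers; the names and details here are illustrative rather than the actual framework code.

using System;
using System.Threading;
using System.Threading.Tasks;

public abstract class ViewModelBase
{
    private readonly TaskScheduler _background;
    private readonly TaskScheduler _foreground;

    protected ViewModelBase(TaskScheduler background, TaskScheduler foreground)
    {
        _background = background;
        _foreground = foreground;
    }

    public bool IsBusy { get; protected set; }

    protected Task ExecuteInBackground<T>(Func<T> work, Action<T> onSuccess, Action<Exception> onFailure)
    {
        IsBusy = true;

        // The work itself runs on the background scheduler (a thread pool),
        // leaving the dispatcher free to render and respond to input.
        return Task.Factory
            .StartNew(work, CancellationToken.None, TaskCreationOptions.None, _background)
            // Success and failure handling run on the foreground scheduler,
            // so they can safely touch properties that the view is bound to.
            .ContinueWith(t =>
            {
                IsBusy = false;
                if (t.IsFaulted) onFailure(t.Exception.InnerException);
                else onSuccess(t.Result);
            }, _foreground);
    }
}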

It all worked pretty great until we started really pushing it hard.

Schedule Conflict

Our newest component to use the framework did a huge amount of work in the background. Not only that, because of the way the interface was structured, it pretty much did all of the work at the same time (i.e. as soon as the screen was loaded), in order to give the user a better experience and minimise the total amount of time spent waiting.

From a technical standpoint, the component needed to hit both a local database (not a real problem) and a remote API (much much slower), both of which are prime candidates for background work due to their naturally slow nature. Not a lot of CPU intensive work though, mostly just DB and API calls.

With 6-10 different view models all doing work in the background, it quickly became apparent that we were getting some amount of contention for resources, as not all Tasks were being completed in a reasonable amount of time. Surprisingly hard to measure, but it looked like the Tasks manually scheduled via the TaskSchedulers were quite expensive to run, and the ThreadPoolTaskSchedulers could only run so much at the same time due to the limits on parallelization and the number of threads that they could have running at once.

So that sucked.

As a bonus annoyance, the framework did not lend itself to usage of async/await at all. It expected everything to be synchronous, where the “background” nature of the work was decided by virtue of where it was executed. Even the addition of one async function threw the whole thing into disarray, as it became harder to reason about where the work was actually being executed.

In the grand scheme of things, async/await is still relatively new (but not that new, it was made available back in 2012 with C# 5 after all), but it’s generally considered a better and less resource intensive way to ensure that blocking calls (like HTTP requests, database IO, file IO and so on) are not causing both the system and the user to wait unnecessarily. As a result, more and more libraries are being built with async functions, sometimes not even exposing a synchronous version at all. It’s also somewhat difficult to make an async function synchronous, especially if you want to avoid potential deadlocks.
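The classic deadlock, for anyone who has not been bitten by it yet, looks something like the snippet below. This is purely illustrative (the names and URL are made up), but the mechanics are real.

using System.Net.Http;
using System.Threading.Tasks;
using System.Windows;

public partial class CustomerView
{
    public async Task<string> GetCustomerNameAsync()
    {
        // Without ConfigureAwait(false), the continuation after the await
        // wants to resume back on the captured UI synchronization context.
        var response = await new HttpClient().GetStringAsync("https://example.com/api/customer");
        return response.Trim();
    }

    private void Refresh_Click(object sender, RoutedEventArgs e)
    {
        // .Result blocks the UI thread while waiting, but the continuation
        // above is queued to that same (now blocked) thread, so neither side
        // can ever finish.
        var name = GetCustomerNameAsync().Result;
    }
}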

With those limitations noted, we had to do something.

Why Not Both?

What we ended up doing was allowing for async functions to be used as part of the background work wrappers inside the base view models. This retained the managed “busy” indicator functionality and the general programming model that had been put into place (i.e. do work, do this on success, this on failure, etc).

Unfortunately what it also did was increase the overall complexity of the framework.

It was now much harder to reason about which context things were executing on, and while the usage of async functions was accounted for in the background work part of the framework, it was not accounted for in either the success or error paths.

This meant that it was all too easy to use an async function in the wrong context, causing a mixture of race conditions (where the overarching call wasn’t aware that part of itself was asynchronous) and bad error handling (where a developer had marked a function as async void to get around the compiler errors/warnings).
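The async void variant is particularly sneaky, because everything compiles and appears to work. A tiny, hypothetical example of the trap:

using System;
using System.Threading.Tasks;

public class ExampleViewModel
{
    // Marked async void purely to satisfy the synchronous success-handler signature.
    private async void OnOperationSucceeded()
    {
        // The framework invokes this, sees it return immediately and assumes the
        // success path has finished, while the real work is still in flight.
        await Task.Delay(100);

        // Any exception thrown in here escapes onto the synchronization context
        // instead of flowing back through the framework's failure handling.
        throw new InvalidOperationException("The caller never sees this");
    }
}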

Don’t get me wrong, it all worked perfectly fine, assuming you knew to avoid all of the things that would make it break.

The tests got a lot more flaky though, because while it’s relatively easy to override TaskSchedulers with synchronous versions, it’s damn near impossible to force async functions to execute synchronously.

Sole Survivor

Here’s where it all gets pretty hypothetical, because the solution we actually have right now is the one that I just wrote about (the dual-natured, overly complex abomination) and it’s causing problems on and off in a variety of ways.

A far better model is to incorporate async/await into the fabric of the framework, allowing for its direct usage and doing away entirely with the segmentation logic that I originally put together (with the TaskSchedulers and whatnot).

Stephen Cleary has some really good articles in MSDN Magazine about this sort of stuff (namely async ViewModels and supporting constructs), so I recommend reading them all if you’re interested.

At a high level, if we expose the fact that the background work is occurring asynchronously (via async commands and whatnot), then not only do we make it far easier to do work in the background (literally just use the standard async/await constructs), but it becomes far easier to handle errors in a reliable way, and the tests become easier too, because they can simply be async themselves (which all major unit testing frameworks support).
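A minimal async-aware command, very much in the spirit of those articles (this is a sketch, not the actual implementation from either the framework or the articles), shows how little ceremony is left:

using System;
using System.Threading.Tasks;
using System.Windows.Input;

public class AsyncCommand : ICommand
{
    private readonly Func<Task> _execute;
    private bool _isExecuting;

    public AsyncCommand(Func<Task> execute)
    {
        _execute = execute;
    }

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter) => !_isExecuting;

    // async void is acceptable here because Execute is effectively an event handler.
    public async void Execute(object parameter)
    {
        _isExecuting = true;
        CanExecuteChanged?.Invoke(this, EventArgs.Empty);
        try
        {
            // The await resumes on the UI context all by itself, so there is
            // no manual TaskScheduler juggling and the busy state just works.
            await _execute();
        }
        finally
        {
            _isExecuting = false;
            CanExecuteChanged?.Invoke(this, EventArgs.Empty);
        }
    }
}

The view model simply hands the command an async method, and the error handling and busy indication can live wherever makes the most sense, rather than being smeared across schedulers.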

It does represent a significant refactor though, which is always a bit painful.

Conclusion

I’m honestly still not sure what the better approach is for this sort of thing.

Async/await is so easy to use at first glance, but has a bunch of complexity and tripwires for the unwary. It’s also something of an infection, where once you use it even a little bit, you kind of have to push it through everything in order for it to work properly end-to-end. This can be problematic for an existing system, where you want to introduce it a bit at a time.

On the other side, the raw TPL stuff that I put together is much more complex to use, but is relatively shallow. It’s much easier to reason about where work is actually happening and relatively trivial to completely change the nature of the application for testing purposes. Ironically enough, the ability to easily change from asynchronous background workers to purely synchronous execution is actually detrimental in a way, because it means your tests aren’t really doing the same thing as your application will, which can mask issues.

My gut feel is to go with the newer thing, even though it feels a bit painful.

I think the pain is a natural response to something new though, so it’s likely to be a temporary thing.

Change is hard, you just have to push through it.


A very long time ago I wrote a post on this blog about interceptors.

The idea behind an interceptor is pretty straightforward: dynamically create a wrapper around some interface or class to either augment or alter its behaviour when used by the rest of the system, without actually having to implement the interface or override the class. For example, my original post was for an interceptor that slowed down requests heading to an API, simulating how that particular application would feel for a user with terrible internet.

I honestly haven’t touched the concept since, until recently that is.

I wanted to add some logging around usage of a third party API from our legacy application and conceptually the problem seemed like a perfect application for another interceptor. A quick implementation of a logging interceptor injected via Ninject and I’d have all the logging I could ever want, without having to mess around too much.

Reality had other ideas though.

Here’s Why I Came Here Tonight

Our legacy software is at that time in its life where it mostly just gets integrations. It’s pretty feature-complete as far as core functionality goes, so until the day we finally grant it Finis Rerum and it can rest, we look to add value to our users by integrating with third-party services.

The most recent foray in this space integrated a payment provider into the software, which is quite useful considering its core value proposition is trust account management. From a technical point of view, the payment provider has an API and we wrote a client to access that API, with structured request and response objects. Pretty standard stuff.

As part of our development, we included various log events that allowed us to track the behaviour of parts of the system, mostly so that we could more easily support the application and get accurate metrics and feedback from users in the wild relating to performance. This is all well and good, but those events generally cover off combined parts of the application logic; for example, an operation that queries the local DB and then augments that information by calling into the third party API to display a screen to the user.

This makes it relatively easy to see when any users are experiencing performance issues, but it makes it hard to see whether or not the local DB, the actual programming logic or the internet call was the root cause.

An improvement to this would be to also log any outgoing API requests and their responses, along with the execution time. With that information we would be able to either blame or absolve the client’s internet connection when it comes to performance questions.

Now, I’m an extremely lazy developer, so while we have a nice interface that I could implement to accomplish this (i.e. some sort of LoggingPaymentProviderClient), it’s got like twenty methods and I really don’t have the time, patience or motivation for that. Also, while it’s unlikely that the interface would change all that much over time, it’s still something of a maintenance nightmare.

Interceptors to the rescue!

But I Got The Feeling That Something Ain’t Right

As I explained in my original post all those years ago, the IInterceptor interface supplied by the Castle library allows you to implement logic via a proxy and slot it seamlessly into a call stack. Its usage is made easier by the presence of good dependency injection, but it’s definitely not required at all.

Thus enters the logging interceptor.

public class PaymentProviderClientMethodExecutionLoggingInterceptor : IInterceptor
{
    public PaymentProviderClientMethodExecutionLoggingInterceptor(ILogger logger)
    {
        _logger = logger;
    }

    private readonly ILogger _logger;

    public void Intercept(IInvocation invocation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
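            // Call through to the real client (or the next interceptor in the chain).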
            invocation.Proceed();
            stopwatch.Stop();

            var log = new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed);
            _logger.Information(log);
        }
        catch (Exception ex)
        {
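            // Still record how long the call took, log the failure, then rethrow
            // so that the caller sees the original exception.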
            stopwatch.Stop();

            var log = new PaymentProviderMethodCallFailed(invocation, stopwatch.Elapsed, ex);
            _logger.Warning(log);

            throw;
        }
    }
}

It’s not an overly complicated class, and while it’s written to be specific, it’s actually quite generic.

Given a proxied class, all methods called on that class will be logged via Serilog, with the method name, its parameters and its return value (the structured logging being provided by the dedicated event classes).

Nothing ever works the first time though, and while I’m constantly being reminded of that, I’m always hopeful all the same. Denial is a powerful motivator.

The problem was that the IInterceptor interface is old enough that it doesn’t consider the existence of asynchronous methods. It does exactly what it says on the tin: starts a timer, proceeds with the method invocation and then, because the method is asynchronous, immediately logs an event with the wrong execution time and no return value.

It has no idea that it has to wait for the invocation to complete because it thinks everything is synchronous.

Clowns To The Left Of Me, Jokers To The Right

This is where everything is going to get a little bit fuzzier than I would like, because I wrote this blog post before I had a working solution.

From what I can tell, the situation is quite complex.

The simplest solution appears to be to leverage the existing interface and simply check for the presence of a Task (or Task<T>) return value. If detected, append a continuation to that Task to perform the desired functionality. For me this would be a continuation on both faulted and success (and maybe cancelled?) that would log the completion of the method. It seems like it would work, but I do have some concerns about the scheduling of the continuation and how that makes the code harder to reason about.
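For the sake of argument, that continuation approach might look something like the sketch below. This is untested, reuses the logger and event classes from the interceptor above, and glosses over cancellation, Task<T> return values and the original synchronous error handling.

public void Intercept(IInvocation invocation)
{
    var stopwatch = Stopwatch.StartNew();
    invocation.Proceed();

    var task = invocation.ReturnValue as Task;
    if (task == null)
    {
        // Synchronous method; the original behaviour still applies.
        stopwatch.Stop();
        _logger.Information(new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed));
        return;
    }

    // Asynchronous method; only log once the returned Task actually finishes.
    task.ContinueWith(t =>
    {
        stopwatch.Stop();
        if (t.IsFaulted)
        {
            _logger.Warning(new PaymentProviderMethodCallFailed(invocation, stopwatch.Elapsed, t.Exception));
        }
        else
        {
            _logger.Information(new PaymentProviderMethodCallCompleted(invocation, stopwatch.Elapsed));
        }
    }, TaskContinuationOptions.ExecuteSynchronously);
}

Note that the caller still gets the original Task back untouched; the continuation merely observes it, which is also why this version can’t easily capture Task<T> results without some extra reflection work.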

Luckily, someone has already put together a reusable library that allows for asynchronous interceptors via a slightly different interface.

This is attractive because it’s code that I don’t have to write (remember, I’m lazy), but the fact that it’s not built into the core Castle library does make me question its legitimacy. Surely if it were that critical the maintainers would have updated Castle.Core?

Regardless, I explored using the library first, but in order to use it I had to go on an adventure to upgrade a bunch of our Nuget dependencies (because it relied on the latest version of Castle.Core), which meant updates to Castle, Ninject and Ninject’s extension libraries. This caused knock-on effects because the Ninject.MockingKernel.NSubstitute library was not available for .NET 4.5 (even though all the others were), so I had to temporarily copy that particular implementation into our codebase.

Once everything was compiling, a full test run showed some failing tests that weren’t failing before the library upgrades, so I kind of stopped there.

For now.

Conclusion

Unfortunately this is one of those blog posts that comes off feeling a little hollow. I didn’t actually get to my objective (seamless per-method logging for a third-party dependency), but I did learn a lot on the way, so I think it was still useful to write about.

Probably should have waited a little longer though, I jumped the gun a bit.

It’s not the only time in recent memory that asynchronous behaviour has made things more difficult than they would have been otherwise. In an unrelated matter, some of our automated tests have been flaky recently, and the core of the issue seems to be asynchronous behaviour that is either a.) not being correctly overridden to be synchronous for testing purposes or b.) not correctly being waited for before tests proceed.

It’s not hard to write tests that are purely async of course (NUnit supports tests marked with the async keyword), but when you’re testing a view model and the commands are “synchronous” it gets a bit more challenging.
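For what it’s worth, the easy half looks something like this (all of the names here are hypothetical):

using System.Threading.Tasks;
using NUnit.Framework;

public class PaymentsViewModelTests
{
    [Test]
    public async Task Refresh_PopulatesPayments()
    {
        var viewModel = new PaymentsViewModel(new FakePaymentProviderClient());

        // Straightforward when the view model exposes something awaitable...
        await viewModel.RefreshAsync();

        Assert.That(viewModel.Payments, Is.Not.Empty);

        // ...much less so when all the test can do is poke a void-returning
        // command, because Execute comes back before the work has finished.
        // viewModel.RefreshCommand.Execute(null);
    }
}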

That’s probably a topic for another day though.