I’ve been using MVVM as a pattern for UI work for a while now, mostly because of WPF. It’s a solid pattern, and while I’ve not really delved into the publicly available frameworks (Prism, Caliburn.Micro, etc.), I have put together a few reusable bits and pieces to make the journey easier.
One of those bits and pieces is the ability to perform work in the background, so that the UI remains responsive and usable while other important things are happening. This usually manifests as some sort of refresh or busy indicator on the screen after the user elects to do something complicated, but the important part is that the screen itself does not become unresponsive.
People get antsy when software “stops responding” and tend to murder it with extreme prejudice.
Now, the reusable components are by no means perfect, but they do get the job done.
Except when they don’t.
Right On Schedule
The framework itself is pretty bare bones stuff, with a few simple ICommand implementations and some view model base classes giving easy access to commonly desired functions.
The most complex part is the built-in support for doing background work in a view model while leaving the user experience responsive and communicative. The core idea is to segregate the stuff happening in the background from the stuff happening in the foreground (which is where all the WPF rendering and user interaction lives) using Tasks and TaskSchedulers from the TPL (Task Parallel Library), while also helping to manage some state to communicate what is happening to the user (like busy indicators).
Each view model is responsible for executing some long running operation (probably started from a command), and then deciding what should happen when that operation succeeds, fails or is cancelled.
In order to support this segregation, the software takes a dependency on three separate task schedulers: one for the background (which is just a normal ThreadPoolTaskScheduler), one for the foreground (which is a DispatcherTaskScheduler or something similar) and one for tasks that need to be scheduled on a regular basis (another ThreadPoolTaskScheduler).
This dependency injection allows those schedulers to be overridden for testing purposes, so that they execute completely synchronously or can be pumped at will as necessary in tests.
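As a rough illustration of the arrangement described above (not the actual framework code), here is a minimal sketch with two of the three schedulers injected into a view model base class. Every name in it (SynchronousTaskScheduler, BackgroundViewModelBase, RunInBackground, CustomerViewModel) is hypothetical, and cancellation and the third "regular basis" scheduler are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test double: runs every queued task inline on the calling thread.
public sealed class SynchronousTaskScheduler : TaskScheduler
{
    protected override void QueueTask(Task task) => TryExecuteTask(task);

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => Array.Empty<Task>();

    public override int MaximumConcurrencyLevel => 1;
}

// Hypothetical base class: work runs on the background scheduler,
// the success and failure callbacks run on the foreground scheduler.
public abstract class BackgroundViewModelBase
{
    private readonly TaskScheduler _background;
    private readonly TaskScheduler _foreground;

    protected BackgroundViewModelBase(TaskScheduler background, TaskScheduler foreground)
    {
        _background = background;
        _foreground = foreground;
    }

    public bool IsBusy { get; private set; }

    protected Task RunInBackground<T>(Func<T> work, Action<T> onSuccess, Action<Exception> onFailure)
    {
        IsBusy = true;
        return Task.Factory
            .StartNew(work, CancellationToken.None, TaskCreationOptions.None, _background)
            .ContinueWith(t =>
            {
                IsBusy = false;
                if (t.IsFaulted) onFailure(t.Exception.GetBaseException());
                else onSuccess(t.Result);
            }, CancellationToken.None, TaskContinuationOptions.None, _foreground);
    }
}

// Hypothetical concrete view model used to exercise the base class.
public sealed class CustomerViewModel : BackgroundViewModelBase
{
    public CustomerViewModel(TaskScheduler background, TaskScheduler foreground)
        : base(background, foreground) { }

    public string Name { get; private set; } = "";
    public string Error { get; private set; } = "";

    public Task Load(Func<string> fetchName)
        => RunInBackground(fetchName, name => Name = name, error => Error = error.Message);
}
```

In a running WPF application the background scheduler would probably be TaskScheduler.Default, and the foreground one something like TaskScheduler.FromCurrentSynchronizationContext() captured on the UI thread. In a test, passing the same SynchronousTaskScheduler for both means Load has finished completely by the time it returns.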
It all worked pretty great until we started really pushing it hard.
Our newest component to use the framework did a huge amount of work in the background. Not only that, because of the way the interface was structured, it pretty much did all of the work at the same time (i.e. as soon as the screen was loaded), in order to give the user a better experience and minimise the total amount of time spent waiting.
From a technical standpoint, the component needed to hit both a local database (not a real problem) and a remote API (much much slower), both of which are prime candidates for background work due to their naturally slow nature. Not a lot of CPU intensive work though, mostly just DB and API calls.
With 6-10 different view models all doing work in the background, it quickly became apparent that we were getting some amount of contention for resources, as not all Tasks were being completed in a reasonable amount of time. It was surprisingly hard to measure, but it looked like the Tasks manually scheduled via the TaskSchedulers were quite expensive to run, and the ThreadPoolTaskSchedulers could only run so much at the same time due to the limits on parallelization and the number of threads that they could have running at once.
So that sucked.
As a bonus annoyance, the framework did not lend itself to usage of async/await at all. It expected everything to be synchronous, where the “background” nature of the work was decided by virtue of where it was executed. Even the addition of one async function threw the whole thing into disarray, as it became harder to reason about where the work was actually being executed.
In the grand scheme of things, async/await is still relatively new (but not that new, it shipped with C# 5 and .NET 4.5 in 2012 after all), but it’s generally considered a better and less resource intensive way to ensure that blocking calls (like HTTP requests, database IO, file IO and so on) are not causing both the system and the user to wait unnecessarily. As a result, more and more libraries are being built with async functions, sometimes not even exposing a synchronous version at all. It’s somewhat difficult to make an async function synchronous too, especially if you want to avoid potential deadlocks.
With those limitations noted, we had to do something.
Why Not Both?
What we ended up doing was allowing for async functions to be used as part of the background work wrappers inside the base view models. This retained the managed “busy” indicator functionality and the general programming model that had been put into place (i.e. do work, do this on success, this on failure, etc).
Unfortunately what it also did was increase the overall complexity of the framework.
It was now much harder to reason about which context things were executing on, and while the usage of async functions was accounted for in the background work part of the framework, it was not accounted for in either the success or error paths.
This meant that it was all too easy to use an async function in the wrong context, causing a mixture of race conditions (where the overarching call wasn’t aware that part of itself was asynchronous) and bad error handling (where a developer had marked a function as async void to get around the compiler errors/warnings).
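To make that trap concrete, here is a small sketch, assuming a hook shaped like a synchronous success callback. SuccessCallbackDemo and its members are hypothetical names made up for the illustration, and a TaskCompletionSource stands in for a slow call so the ordering is deterministic.

```csharp
using System;
using System.Threading.Tasks;

public static class SuccessCallbackDemo
{
    // Hypothetical stand-in for a framework hook that expects a synchronous callback.
    public static void RunSuccessPath(Action onSuccess) => onSuccess();

    // The same hook, but aware that the callback may be asynchronous.
    public static Task RunSuccessPathAsync(Func<Task> onSuccess) => onSuccess();

    // Returns whether the callback's work had finished when the synchronous hook returned.
    public static bool FinishedWhenSyncHookReturned()
    {
        var gate = new TaskCompletionSource<bool>(); // stands in for a slow call
        var saved = false;

        // Converting an async lambda to Action makes it async void:
        // the hook gets control back at the first incomplete await.
        RunSuccessPath(async () =>
        {
            await gate.Task;
            saved = true;
        });

        var observed = saved;  // the "rest of the success path" sees stale state
        gate.SetResult(true);  // the slow call completes only now
        return observed;
    }

    // Returns whether the callback's work had finished once the async hook was awaited.
    public static async Task<bool> FinishedWhenAsyncHookReturned()
    {
        var gate = new TaskCompletionSource<bool>();
        var saved = false;

        var pending = RunSuccessPathAsync(async () =>
        {
            await gate.Task;
            saved = true;
        });

        gate.SetResult(true);
        await pending;         // the caller can wait for the whole callback
        return saved;
    }
}
```

The first method reports that the work had not finished when the hook returned, which is the race. The second can wait for the callback because the hook hands back a Task. An exception thrown after the await in the async void version would also have no Task to travel through, so the caller could not catch it.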
Don’t get me wrong, it all worked perfectly fine, assuming you knew to avoid all of the things that would make it break.
The tests got a lot more flaky though, because while it’s relatively easy to override TaskSchedulers with synchronous versions, it’s damn near impossible to force async functions to execute synchronously.
Here’s where it all gets pretty hypothetical, because the solution we actually have right now is the one that I just wrote about (the dual-natured, overly complex abomination) and it’s causing problems on and off in a variety of ways.
A far better model is to incorporate async/await into the fabric of the framework, allowing for its direct usage and doing away entirely with the segmentation logic that I originally put together (with the TaskSchedulers and whatnot).
Stephen Cleary has some really good articles in MSDN magazine about this sort of stuff (async view models and supporting constructs), so I recommend reading them all if you’re interested.
At a high level, if we expose the fact that the background work is occurring asynchronously (via async commands and whatnot), then not only do we make it far easier to do work in the background (literally just use the standard async/await constructs), but it becomes far easier to handle errors in a reliable way, and the tests become easier too, because they can simply be async themselves (which all major unit testing frameworks support).
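A minimal sketch of what an async-aware command might look like, assuming a hand-rolled ICommand implementation. AsyncCommand, ExecuteAsync and AsyncCommandExample are hypothetical names, not taken from the framework described here or from any particular library.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Input;

// Hypothetical async-aware command: exposes the work as a Task so that
// callers (including tests) can await it and errors have somewhere to go.
public sealed class AsyncCommand : ICommand
{
    private readonly Func<Task> _execute;
    private readonly Action<Exception> _onError;

    public AsyncCommand(Func<Task> execute, Action<Exception> onError)
    {
        _execute = execute;
        _onError = onError;
    }

    public bool IsBusy { get; private set; }

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter) => !IsBusy;

    // ICommand.Execute has to return void, so async void is confined to this one line.
    public async void Execute(object parameter) => await ExecuteAsync();

    public async Task ExecuteAsync()
    {
        IsBusy = true;
        CanExecuteChanged?.Invoke(this, EventArgs.Empty);
        try
        {
            await _execute();
        }
        catch (Exception error)
        {
            _onError(error); // failures are routed to one place instead of being lost
        }
        finally
        {
            IsBusy = false;
            CanExecuteChanged?.Invoke(this, EventArgs.Empty);
        }
    }
}

public static class AsyncCommandExample
{
    // Shaped like the body of an async unit test: await the command, then check state.
    public static async Task<string> LoadAndReport()
    {
        var status = "idle";
        var command = new AsyncCommand(
            async () => { await Task.Delay(10); status = "loaded"; },
            error => status = "failed: " + error.Message);

        await command.ExecuteAsync();
        return status;
    }
}
```

With this shape the busy state and error handling live inside the command, and a test simply awaits ExecuteAsync rather than swapping in a synchronous scheduler.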
It does represent a significant refactor though, which is always a bit painful.
I’m honestly still not sure what the better approach is for this sort of thing.
Async/await is so easy to use at first glance, but has a bunch of complexity and tripwires for the unwary. It’s also something of an infection, where once you use it even a little bit, you kind of have to push it through everything in order for it to work properly end-to-end. This can be problematic for an existing system, where you want to introduce it a bit at a time.
On the other side, the raw TPL stuff that I put together is much more complex to use, but is relatively shallow. It’s much easier to reason about where work is actually happening and relatively trivial to completely change the nature of the application for testing purposes. Ironically enough, the ability to easily change from asynchronous background workers to a purely synchronous execution is actually detrimental in a way, because it means your tests aren’t really doing the same thing as your application will, which can mask issues.
My gut feel is to go with the newer thing, even though it feels a bit painful.
I think the pain is a natural response to something new though, so it’s likely to be a temporary thing.
Change is hard, you just have to push through it.