0 Comments

Yeah, sorry about the title. They can’t all be winners.

Anyway, I tracked down an interesting bug a few weeks ago, and I thought that it might be worthwhile to discuss it here, so that I can see it later when the memory fades. Also it might help someone else, which is nice, I guess.

The Problem

There was a web application. Some performance/load testing was being completed (for the first time, after development has “completed”, because who needs an Agile approach, right?) and the results showed that there was an unreasonable amount of failures during login. Something in the realm of 10% of all login attempts under a medium load would fail.

The Cause

At the root, the bug involved this class:

public class ExecutionEngine
{
    public string ConnectionString { get; set; }
    public SqlCommand Command { get; set; }
    public DataSet Result { get; set; }

    public void Execute()
    {
        var conn = new SqlConnection(ConnectionString);
        Command.Connection = conn;
        var adapter = new SqlDataAdapter(Command);

        Result = new DataSet();
        adapter.Fill(Result);
    }
}

Pretty terrible all things considered. Weird way to use SqlCommands and DataSets, but okay.

A typical usage of the ExecutionEngine class was as follows:

public class Demonstration
{
    public Demonstration(ExecutionEngine engine)
    {
        _Engine = engine;
    }

    private readonly ExecutionEngine _Engine;

    public IEnumerable<string> GetTheThingsForAUser(string userId)
    {
        _Engine.Command.CommandText = "GetAListOfAllTheThings";
        _Engine.Command.CommandType = CommandType.StoredProcedure;

        _Engine.Command.Parameters.Clear();
        _Engine.Command.Parameters.AddWithValue("UserId", userId);

        _Engine.Execute();

        var allTheThings = new List<string>();
        foreach (DataRow thing in _Engine.Result.Tables[0].Rows)
        {
            allTheThings.Add((string)thing["Name"]);
        }

        return allTheThings;
    }
}

There were a LOT of usages like the demonstration class above (100+), jammed into one SUPER-class called “DataAccessLayer”. This “DAL” was a dependency of the business classes, which were used by the rest of the system. An instance of a business class would be instantiated as needed, which in turn would resolve its dependencies (using Ninject) and then be used to service the incoming request.

Given that I’ve already spoiled the ending by mentioning threading in the title of this post, you can probably guess that there was a threading problem. Well, there was.

The ExecutionEngine class is obviously not thread-safe. At any point in time if you have one instance of this class being used on multiple threads, you could conceivably get some very strange results. Best case would be errors. Worst case would be someone else’s data!To illustrate:

  1. Thread A enters GetTheThingsForAUser
  2. Thread A sets the command text and type to the appropriate values.
  3. Thread B enters GetTheThingsForAUser
  4. Thread A clears the existing parameters and adds its User Id
  5. Thread B clears the parameters and adds its User Id
  6. Thread A executes, grabs the result and returns it. Thread A just returned the values for a completely different user that it asked for, but has given no indicationof this!

At the very least, the developer who created the class had thought about thread-safety (or someone had thought about it later).

public class DataAccessLayerModule : NinjectModule
{
    public override void Load()
    {
        Bind<ExecutionEngine>().ToSelf().InThreadScope();
    }
}

For those of you unfamiliar with the Thread scope, it ensures that there is one instance of the class instantiated per thread, created at the time of dependency resolution. It adds thread affinity to classes that don’t otherwise have it, but ONLY during construction.

At least there will be only one instance of this created per thread, and a single thread isn’t going to be jumping between multiple method executions (probably, I’m not sure to be honest) so at least the lack of thread safety might not be an issue.

This was, of course, a red herring. The lack of thread-safety was EXACTLY the issue. It took an embarrassingly large amount of time for me to track down the root cause. I debugged, watched the business objects being instantiated and then watched the execution engine being injected into them…with the correct thread affinity. Only the latest version of this web application was experiencing the issue, so it had to have been a relatively recent change (although this particular project did have quite a long and…storied history).

The root issue turned out to be the following:

public class DataAccessLayerModule : NinjectModule
{
    public override void Load()
    {
        // Some bindings.

        Bind<ExecutionEngine>().ToSelf().InThreadScope();

        // Some more bindings.

        Bind<Helper>().To<DefaultHelper>().InSingletonScope();
    }
}

See that second binding there, the one that’s a Singleton? It had a dependency on the ExecutionEngine. This of course threw a gigantic spanner in the works, as an instance of the ExecutionEngine was no longer only being used on one thread at a time, leaving it wide open to concurrency issues (which is exactly what was happening).

If you’re unfamiliar with the Singleton scope, it basically means that only one instance of the class is going to be instantiated in the application. This instance will then be re-used every time that dependency is requested.

At some point someone had refactored (that’s good!) one of the business classes (which were quite monolithic) and had extracted some of the functionality into that Helper class. Luckily this particular helper was only related to login, which explained why the failures only occurred during login in the load tests, so the impact was isolated.

The Solution

All of the business classes in the application were Transient scoped. This helper class was essentially a business class, but had been marked as Singleton for some reason. The simplest solution was to make it match the scoping of the other business classes and mark it as Transient too. This reduced the number of Login failures during the medium load test to 0 (yay!) which was good enough for the business.

The Better Solution

Of course, the code is still terrible underneath, and more subtle failures could still be lurking (can we be sure that every single time the ExecutionEngine is used that its only being used on the thread that it was created? not without adding thread affinity into the class itself), but you don’t always get time to fix underlying issues. As per my previous post, Champion you Code, normally I would fight pretty hard to fix the root cause of the problem (that goddamn ExecutionEngine). This time though…well the code had already been sold to someone who was going to develop it themselves and I wasn’t going to be working for that organisation for much longer, so I took the pragmatic approach and left it as it was. Granted, its essentially a tripwire for some future developer, which makes me sad, but you can’t always fix all the problems.

If given the opportunity I would probably change the way the ExecutionEngine is used at least, so that it isn’t as vulnerable to concurrency issues. The easiest way to do this would be to make the Execute method take a Command and return a DataSet, removing all of the state (except the connection string) from the class. That way it doesn’t matter how many threads attempt to Execute at the same time, they will all be isolated from each other. Not a small change, considering how many usages of the class in its current form existed, and the risk that that much change would introduce.

Summary

Singletons are dangerous. Well, I should say, Singletons that involve state in some way (either themselves, or with dependencies that involve state) are dangerous. If you go to mark something as being in Singleton scope, step away from the keyboard, go for a walk and think about it some more. There’s probably another way to do it. When using dependency injection its not always immediately obvious the impacts of making something a Singleton, so you have to be extremelycareful.