All Left Out

June 14. 2016 0 Comments

We have the following simple entities, representing a fairly basic property management system.

public class Contact
{
    public int Id { get; set; }
}

public class ContactLink
{
    public int Id { get; set; }
    public int ContactId { get; set; }
    public string Type { get; set; }
    public int LinkedId { get; set; }
}

public class Owner
{
    public int Id { get; set; }
}

public class Tenant
{
    public int Id { get; set; }
    public int? PropertyId { get; set; }
}

public class Property
{
    public int Id { get; set; }
    public int? OwnerId { get; set; }
}

Each Property has zero or one Owners, but an Owner can be linked to multiple Contacts.

A Property can be Tenanted, and each Tenant can be linked to multiple Contacts.

The ContactLink class represents multiple arbitrary links to Contacts, and is how both the Owner and Tenant to Contact relationships are specified.

This model was not created with Entity Framework in mind, which unfortunately is something that you have to deal with when working on a legacy system. The EF model was put in place after many years of the database being the closest thing the software had to a domain model. The models that I’ve listed above are obviously not complete. They contain many more fields than those displayed, and have fields for segmentation because the database is multi-tenant. Anything not specified here is not useful for the purposes of illustrating the point of post.

We need to answer a relatively simple question.

“Show me the Properties that are related to a Contact”

The goal is to answer that question while leaving as much of the work as possible up to the database. So ideally one call and minimal post processing.

Give Me All The Relationships

First things first, a test to ensure that our query works the way we expect it to. Make a database, seed it, run the query, validate the right number of results. Very straightforward.

public void WhenAContactIsRelatedToPropertiesThroughBothOwnerAndTenantRelationships_TheQueryReturnsTheCorrectNumberOfRows()
{
    using (var db = GetDbContext())
    {
        var contactId = RandomId();

        // Seed some data into the database. We use Builder objects and extension methods on DataSets to do
        // seeding, allowing us to fill a previously empty database with all the information we need for a test.

        // Specifically for this test, seed:
        // 1.) 1 Contact with the specified Id.
        // 2.) 1 Owner.
        // 3.) 2 Properties. One owned by the Owner that was just created. 
        // 4.) 1 Tenant, for the property with no Owner.
        // 5.) 2 ContactLinks, linking the Contact to the Owner and Tenant.

        var target = new QueryEncapsulation();

        var results = target.Execute(db, contactId);
        
        results.Count().Should().Be(2);
    }
}

With the test written, all that’s left is to implement the query itself.

public class QueryEncapsulation
{
    public IEnumerable<LinkedProperty> Execute(DbContext db)
    {
        return (from c in db.Contacts
            .Where(c => c.Id == specifiedId)
        from cl in db.ContactLinks
            .Where(cl => cl.ContactId == c.Id)
        from o in db.Owners
            .Where(o => o.Id == cl.LinkedId && cl.Type == "owner")
            .DefaultIfEmpty()
        from t in db.Tenants
            .Where(t => t.Id == cl.LinkedId && cl.Type == "tenant")
            .DefaultIfEmpty()
        from p in db.Properties
            .Where(p => p.OwnerId == o.Id || t.PropertyId == p.Id)
        select new LinkedProperty(o, t, p))
        .ToArray();
    }
}

Not super complicated, in fact it looks fairly similar to the query that you might put together in SQL. Join a bunch of tables together and then filter them as appropriate so only the valid links are included.

The usage of the DefaultIfEmptymethod is the mechanism by which you accomplish Left Outer Joins in Linq. This is important because even though each Owner and Tenant map to a single property, as a result of the way the relationships are structured, and the direction we are starting from (the Contact) each “row” returned by this query will only contain either an Owner or a Tenant, never both. Without the DefaultIfEmpty, the query wouldn’t return anything.

Of course, the test fails.

Failure Is Just A Stepping Stone To Greatness

Instead of getting 2 results (one Property linked via the Owner and one Property linked via the Creditor) the query returns 3 results.

The Property linked to the Tenant is coming back twice. Once for the correct link (to the Tenant) and then again for the Owner, which is nonsense.

As far as I understand, this combination is still included because a completely unspecified Owner can be linked to a null Owner Id on the Property.

Intuitively, p.OwnerId == o.Id feels like it should throw a NullReferenceException when o is null. I’m still not exactly sure I understand why this is a valid result, but I know if I make a similar query in SQL using Left Outer Joins, that it will work the same way. Essentially I’m not being specific enough in my exclusion criteria and null is indeed equal to null.

The fix?

Add a check on the Property condition that checks for Owner being null.

The final query we’re left with is:

public class QueryEncapsulation
{
    public IEnumerable<LinkedProperty> Execute(DbContext db)
    {
        return (from c in db.Contacts
            .Where(c => c.Id == specifiedId)
        from cl in db.ContactLinks
            .Where(cl => cl.ContactId == c.Id)
        from o in db.Owners
            .Where(o => o.Id == cl.LinkedId && cl.Type == "owner")
            .DefaultIfEmpty()
        from t in db.Tenants
            .Where(t => t.Id == cl.LinkedId && cl.Type == "tenant")
            .DefaultIfEmpty()
        from p in db.Properties
            .Where(p => (o != null && p.OwnerId == o.Id) || t.PropertyId == p.Id)
        select new LinkedProperty(o, t, p))
        .ToArray();
    }
}

Summary

Honestly, I didn’t expect this sort of behaviour from EF, but that’s not surprising. EF is an incredibly complicated beast and I’m barely familiar with the shallow end.

To try and understand the results, I had to build up the set by hand and account for all of the potential combinations, combined with whether or not it made sense for them to be included in the final results. Doing this helped to identify the fact that fields on what should be a null object created via DefaultIfEmpty seem to be nullable even though the underlying type on the model is not.

An interesting finding at least and a good example of how sometimes its important to understand your queries conceptually.

Especially if they are giving weird results.