0 Comments

As part of the work we did recently to add search to one of our newer API’s, we leveraged the support that Npgsql has for constructing PostgreSQL Full Text Search queries via Linq. This was a vast improvement over our previous attempts to use the PostgreSQL Full Text Search, because we could combine it with sorting and filtering, and have everything apply right at the database level for maximum efficiency.

Internally, we use a Retriever class for each entity that we offer access to via the API. This class in turn leverages a number of different dependencies to provide for filtering, sorting and searching by modifying an IQueryable. For search, there is an IFullTextSearcher<TEntity> interface, which contains a single method Search, taking an IQueryable and the query and returning a modified IQueryable.

A searcher implementation for an entity called Entity looks like this:

public class EntityFullTextSearcher : IFullTextSearcher<Entity>
{
    private readonly IPostgresQueryBuilder _postgresQueryBuilder;

    public EntityFullTextSearcher(IPostgresQueryBuilder postgresQueryBuilder)
    {
        _postgresQueryBuilder = postgresQueryBuilder;
    }

    public IQueryable<EntityListProjectionModel> Search(IQueryable<EntityListProjectionModel> queryable, SearchQuery searchQuery)
    {
        var query = _postgresQueryBuilder.Build(searchQuery);
        if (string.IsNullOrEmpty(query)) return queryable;

        queryable = queryable.Where(b => NpgsqlTextFunctions.Match
        (
            NpgsqlTextFunctions.QueryOr(
                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldA), NpgsqlWeightLabel.A),
                NpgsqlTextFunctions.QueryOr(
                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldB), NpgsqlWeightLabel.A),
                    NpgsqlTextFunctions.QueryOr(
                        NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldC), NpgsqlWeightLabel.A),
                        NpgsqlTextFunctions.QueryOr(
                            NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldD), NpgsqlWeightLabel.B),
                            NpgsqlTextFunctions.QueryOr(
                                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldE), NpgsqlWeightLabel.C),
                                NpgsqlTextFunctions.QueryOr(
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldF), NpgsqlWeightLabel.C),
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldG), NpgsqlWeightLabel.C)
                                    )
                                )
                            )
                        )
                    )
                )
            ,
            NpgsqlTextFunctions.ToTsQuery(query))
        );

        return queryable;
    }
}

Pretty straightforward, but there is a massive inline lambda function that is somewhat hard to read.

Its A Frowny Face

The essence of that massive chunk of lambda is the definition of the fields that will be indexed during the search, along with the weight at which they will be indexed and how they are combined, which in our case, is just using OR. This Full Text Search definition is then combined with the Match operator. This is the way to get Full Text Search working via Linq, as I wrote about in my previous post on the subject.

If you look at the definition of these functions inside the Npgsql source code, you’ll see that their implementations all throw an InvalidOperationException. How do they even work?

The answer lies in the magic of expression trees and Entity Framework. The Where function on an IQueryable actually takes an Expression<Func<TEntity, bool>>, which this lambda function is supplying.

At no point is this code intended to be executed directly. You could try (i.e. by running this on top of a List or In Memory Database via Effort), but it would fail miserably, throwing an InvalidOperationException as expected.

However, when the IQueryable is passed through the Npgsql EF Provider, it is transformed into actual SQL which is then executed and the results interpreted accordingly. To be honest, I’ve never had to really dig into the whole EF provider concepts too deeply, and I don’t really want to. I’m happy enough that it works, and allows us to combine Full Text Search with arbitrary selection, filtering and sorting, leaving all of the hard work up to the database.

Back to the massive chunk of lambda though, at least its encapsulated inside this tiny class. Right?

Well, it was up until we needed to add sorting by relevance.

Maybe More Of A Grimace?

Have a look at this class and see if anything looks familiar.

public class EntitySearchRelevanceSorter : ISearchRelevanceSorter<EntityListProjectionModel>
{
    private readonly IPostgresQueryBuilder _postgresQueryBuilder;

    public EntitySearchRelevanceSorter(IPostgresQueryBuilder postgresQueryBuilder)
    {
        _postgresQueryBuilder = postgresQueryBuilder;
    }

    public IQueryable<EntityListProjectionModel> Sort(IQueryable<EntityListProjectionModel> queryable, SearchQuery searchQuery)
    {
        if(searchQuery == null || !searchQuery.Terms.Any()) return queryable;
        var query = _postgresQueryBuilder.Build(searchQuery);
        if (string.IsNullOrEmpty(query)) return queryable;

        return queryable.OrderByDescending(b => NpgsqlTextFunctions.TsRank(
               NpgsqlTextFunctions.QueryOr(
                   NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldA), NpgsqlWeightLabel.A),
                   NpgsqlTextFunctions.QueryOr(
                       NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldB), NpgsqlWeightLabel.A),
                       NpgsqlTextFunctions.QueryOr(
                           NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldC), NpgsqlWeightLabel.A),
                           NpgsqlTextFunctions.QueryOr(
                               NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldD), NpgsqlWeightLabel.B),
                               NpgsqlTextFunctions.QueryOr(
                                   NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldE), NpgsqlWeightLabel.C),
                                   NpgsqlTextFunctions.QueryOr(
                                       NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldF), NpgsqlWeightLabel.C),
                                       NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldG), NpgsqlWeightLabel.C)
                                       ))))))
               ,
               NpgsqlTextFunctions.ToTsQuery(query))
               );
    }
}

The lambda expression being used as the input for the TsRank function is exactly the same as the function used inside the searcher class.

It should be the same as well, because the relevance sorting should return the results in an order appropriate for how they were found via the search.

Now we have a problem though, because if the search definition changes, the relevance sorting will no longer make sense. The same is somewhat true for the reverse option (changing the sort order).

My first attempt at fixing this so that there was only a single place where the Full Text Search definition existed, was to simply extract the lambda function to a class of its own, and take a dependency on that class in both the searcher and sorter.

The definition class then looked like this:

public class EntitySearchDefinition
{
    public Expression<Func<Entity, string>> Definition
    {
        get
        {
            return b => NpgsqlTextFunctions.QueryOr( 
                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldA), NpgsqlWeightLabel.A), 
                NpgsqlTextFunctions.QueryOr( 
                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldB), NpgsqlWeightLabel.A), 
                    NpgsqlTextFunctions.QueryOr( 
                        NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldC), NpgsqlWeightLabel.A), 
                        NpgsqlTextFunctions.QueryOr( 
                            NpgsqlTextFunctions.SetWeightt(NpgsqlTextFunctions.ToTsVector(b.FieldD), NpgsqlWeightLabel.B), 
                            NpgsqlTextFunctions.QueryOr(
                                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldE), NpgsqlWeightLabel.C), 
                                NpgsqlTextFunctions.QueryOr( 
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldF), NpgsqlWeightLabel.C), 
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldG), NpgsqlWeightLabel.C) 
                                )
                            )
                        )
                    )
                )
            )
        }
    }
}

And the usage within the searcher looked like this:

public class EntityFullTextSearcher : IFullTextSearcher<Entity>
{
    private readonly IPostgresQueryBuilder _postgresQueryBuilder;
    private readonly EntitySearchDefinition _def;

    public EntityFullTextSearcher(IPostgresQueryBuilder postgresQueryBuilder, EntitySearchDefinition def)
    {
        _postgresQueryBuilder = postgresQueryBuilder;
        _def = def;
    }

    public IQueryable<EntityListProjectionModel> Search(IQueryable<EntityListProjectionModel> queryable, SearchQuery searchQuery)
    {
        var query = _postgresQueryBuilder.Build(searchQuery);
        if (string.IsNullOrEmpty(query)) return queryable;

        queryable = queryable.Where(b => NpgsqlTextFunctions.Match
        (
            def.Definition,
            NpgsqlTextFunctions.ToTsQuery(query))
        );

        return queryable;
    }
}

This won’t compile at all though.

Nope, Definitely Rage

The problem here is that the definition of NpgsqlTextFunctions.Match is bool Match(string, string), so from the compilers point of view it doesn’t take an expression like the one I extracted out to the other class. We can’t just call it here either, because that would defeat the entire point of constructing the Where so that it can be turned into SQL and executed against the database. We have to put together an expression tree containing the logic we want to be pushed to the database.

How can we do that and maintain some common definition of the Full Text Search that will be used by both search and sorting components?

Long story short, my definition class ended up looking like this:

public class EntitySearchDefinition
{
    private Expression<Func<Entity, string>> Definition
    {
        get
        {
            return b => NpgsqlTextFunctions.QueryOr( 
                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldA), NpgsqlWeightLabel.A), 
                NpgsqlTextFunctions.QueryOr( 
                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldB), NpgsqlWeightLabel.A), 
                    NpgsqlTextFunctions.QueryOr( 
                        NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldC), NpgsqlWeightLabel.A), 
                        NpgsqlTextFunctions.QueryOr( 
                            NpgsqlTextFunctions.SetWeightt(NpgsqlTextFunctions.ToTsVector(b.FieldD), NpgsqlWeightLabel.B), 
                            NpgsqlTextFunctions.QueryOr(
                                NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldE), NpgsqlWeightLabel.C), 
                                NpgsqlTextFunctions.QueryOr( 
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldF), NpgsqlWeightLabel.C), 
                                    NpgsqlTextFunctions.SetWeight(NpgsqlTextFunctions.ToTsVector(b.FieldG), NpgsqlWeightLabel.C) 
                                )
                            )
                        )
                    )
                )
            )
        }
    }

    private static MethodInfo TsRank()
    {
        return MethodInfoFromExpression<Func<string, string, float>>((a, b) => NpgsqlTextFunctions.TsRank(a, b));
    }

    private static MethodInfo ToTsQuery()
    {
        return MethodInfoFromExpression<Func<string, string>>(a => NpgsqlTextFunctions.ToTsQuery(a));
    }

    private MethodInfo Match()
    {
        return MethodInfoFromExpression<Func<string, string, bool>>((a, b) => NpgsqlTextFunctions.Match(a, b));
    }
    
    public Expression<Func<Entity, float>> CreateOrderBy(string query)
    {
        return MakeMethodCallExpression<float>(query, TsRank());
    }

    public Expression<Func<Entity, bool>> CreateWhere(string query)
    {
        return MakeMethodCallExpression<bool>(query, Match());
    }

    private Expression<Func<Entity, TOut>> MakeMethodCallExpression<TOut>(string query, MethodInfo methodInfo)
    {
        var p = Expression.Parameter(typeof(Entity));
        var queryExpression = CreateConstantExpressionForQuery(query);
        var definitionBody = Definition.Body;
        var replacer = new ExpressionReplacer(a => a.Exp == typeof(Entity), a => p);
        var visited = replacer.Visit<TEntity>(definitionBody);
        var expression = Expression.Lambda<Func<TEntity, TOut>>(Expression.Call(methodInfo, visited, queryExpression), new List<ParameterExpression> { p });
        return expression;
    }

    private static MethodCallExpression CreateConstantExpressionForQuery(string query)
    {
        var queryConstant = Expression.Constant(query);
        var queryExpression = Expression.Call(ToTsQuery(), queryConstant);
        return queryExpression;
    }
    
    private class ExpressionReplacer : ExpressionVisitor
    {
        private readonly Func<Expression, bool> _selector;
        private readonly Func<Expression, Expression> _replace;

        public ExpressionReplacer(Func<Expression, bool> selector, Func<Expression, Expression> replace)
        {
            _selector = selector;
            _replace = replace;
        }

        public override Expression Visit(Expression exp)
        {
            if (exp == null) return exp;
            if (_selector(exp)) exp = _replace(exp);
            return base.Visit(exp);
        }
    }
}

Yup, that’s a lot of code, but a lot of it is just utility methods for interacting with expression trees (specifically the parts that get method names without resorting to Reflection or magic strings and the expression visitor that mutates the expression tree for a purpose I’ll get to in a minute).

Using this definition class the body of the search method in EntityFullTextSearcher now looks like this:

public IQueryable<EntityListProjectionModel> Search(IQueryable<EntityListProjectionModel> queryable, SearchQuery searchQuery)
{
    var query = _postgresQueryBuilder.Build(searchQuery);
    if (string.IsNullOrEmpty(query)) return queryable;

    queryable = queryable.Where(b => NpgsqlTextFunctions.Match(def.CreateWhere(query));

    return queryable;
}

With the body of the sort method looking very very similar (calling CreateOrderBy instead).

But what does it all mean?

Expressing Myself

There are two important things to notice about the search definition class above.

The first is that it dynamically constructs the expression tree that would have previously been supplied by the inline lambda expression. You can see this construction inside the MakeMethodCallExpressionmethod. It creates a ParameterExpression (for the incoming Entity from the IQueryable), and then constructs a Call node that has the definition as one parameter, and the query as the other, just like how it was originally specified inline. This method is then used in the construction of the expression trees for Where and OrderBy.

The second important thing is the usage of the ExpressionReplacerclass. The lambda expression that contains the definition for the Full Text Search (the one returned by the Definition property) contains a parameter of type Entity. The lambda expression constructed by the CreateWhere and CreateOrderBy methods also contains a parameter of type Entity.

They are not the same thing.

In order for the inner expression parameter to be correctly bound to the same value as the outer expression parameter, it was necessary to walk the expression tree, find the inner parameter expression and replace it with the outer one. Assuming that it is unlikely there will be multiple parameter expressions throughout the tree of the same type that shouldn’t be bound to the same value, this approach will work just fine to consolidate all of the parameter expressions to the same eventual value.

The end result of the replacement is a new expression tree where all parameter expressions of type Entity are the same reference (there are only two, one inner and one outer).

Conclusion

Looking back at the effort that went into this, I’m not entirely sure it was worth it. I did accomplish my original goal (only one place where the Full Text Search definition was defined), but I’m not sure if the code is more understandable.

It felt good to separate the concerns, which makes it clear that the searcher and sorter classes are dependent on the same thing, and all of the complicated and hard to understand expression tree logic is encapsulated inside a single class, but I worry that it will be almost impenetrable to anyone who comes after me who doesn’t understand expression trees. Keep in mind, at this point, even after this entire adventure, I barely understand them. Me 3 months from now will probably look at the code, get confused, google part of it, find this blog post and then relive the entire affair.

On the upside, this did generate a significant increase in understanding the way that magic of Entity Framework works, and sparked quite a few discussions within the team that let to some interesting insights for all involved. It even spawned some other changes, where we can now use our own custom methods inside the expressions being passed into Entity Framework and then have them transformed appropriately to things that EF knows how to turn into SQL.

That’s a topic for another time though.