3 Ways to Refactor EF Linq Queries w/o Killing Performance
Last week I was shocked to discover that refactoring Entity Framework LINQ queries for readability or reusability by extracting a method can quietly swap a query out of SQL and into in-memory processing and kill performance. Fortunately there are a couple of tidy, but subtle solutions.
Enumeration<problem>
Here's a simplified version of my problem.
private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
var usersQuery = Users.Where(u =>
(filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate) &&
u.Name.StartsWith(prefix));
return await usersQuery.ToListAsync();
}
I had a site-wide filtering object supplied by the front-end, but then I needed to do something else specific to the task at hand like the .StartsWith(). Then elsewhere I needed something very similar:
private async Task<List<User>> GetUsersWithoutRoles(IMainFilterDto filter)
{
var usersQuery = AbpUsers.Include(i => i.Roles).Where(u =>
(filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate) &&
!u.Roles.Any()
);
return usersQuery.ToList();
}
Uch. The common code between the two isn't DRY and feels awful. If I ever needed to change it, perhaps by replacing > with >= I'd have to track down all the places with that code. I was tempted to extract it like this:
private bool ApplyMainFilter(IMainFilterDto filter, User u)
{
return (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate);
}
And use it like this:
private async Task<List<User>> GetUsersWithoutRoles(IMainFilterDto filter)
{
var usersQuery = Users.Where(u =>
ApplyMainFilter(filter, u) &&
u.Name.StartsWith(prefix));
That certainly reads better. And when I tested it, it returns the exact same results. Sadly, when I ran it through LINQPad the original query (where filter has a non-null start date but null end date) turns from this:
SELECT [stuff] FROM [Users] AS [u] WHERE ([u].[CreationTime] > @__filter_StartDate_0) AND (([u].[Name] LIKE @__prefix_1 + N'%' AND (LEFT([u].[Name], LEN(@__prefix_1)) = @__prefix_1)) OR (@__prefix_1 = N'')) into: SELECT [stuff] FROM [Users] AS [u]WHERE ([u].[Name] LIKE @__prefix_1 + N'%' AND (LEFT([u].[Name], LEN(@__prefix_1)) = @__prefix_1)) OR (@__prefix_1 = N'')
It dropped out all the code in ApplyMainFilter()! That may not look terrible in this simple example, but imagine more complex scenarios. It could result in a lot more records returning from the database. It could create a network bottleneck or put excess strain on the middleware. Worst of all it could prevent the database from doing what it does best: use indexes to optimize query execution. This could mean bypassing existing indexes, preventing query optimization with future indexes, or reducing the effectiveness of performance recommendations in e.g. the Azure SQL database by hiding the problem from the database entirely.
Incidentally, if you'd prefer to see a video of the problem and solutions, check out Episode 22 of Code Hour:
return solution[0]
The solution turned out to be fairly easy once I identified the problem. Understanding how Entity Framework works internally helped. It's all about expression trees, which I've written about before (ok, I wrote that 11 years ago, but the fundamentals it describes are still solid).
Anticipating all the possible ways someone might pass arbitrary C# language into a where clause and turning it all into SQL is a hard problem. I needed to give Entity Framework a hand. One way to do that is to return a fully parseable expression tree like Expression<Func<User, bool> rather than just a bool. It looked like this:
private Expression<Func<User, bool>> GetMainFilterQuery(IMainFilterDto filter)
{
return u => (filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate);
}
Executed like this:
private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
var usersQuery = Users
.Where(GetMainFilterQuery(filter))
.Where(u => u.Name.StartsWith(prefix));
Isn't that an aesthetically pleasing solution? It's reusable, reads well, and converts to SQL.
But Wait, There's More
But, if you're up for reading further I thought I'd present one more more interesting option. If you're into flow style API's then an extension method approach may be perfect:
public static class QueryUtils
{
public static IQueryable AppendMainFilterQuery(
this IQueryable existingQuery, IMainFilterDto filter)
{
return existingQuery.Where(u => (
filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate));
}
}
Which is a little harder to read, but allows this:
private async Task<List<User>> GetUsersMatching(IMainFilterDto filter, string prefix)
{
var usersQuery = Users
.Where(u => u.Name.StartsWith(prefix))
.AppendMainFilterQuery(filter);
That reads nicely, is reusable, and like the 1st solution keeps the SQL exactly how it was initially.
OR LinqKit?
I ran all this by a smart co-worker who recommended I check out LinqKit in case I ever needed to do anything more complicated. Among other things LinqKit allows you to build up expressions across multiple methods. For instance if I needed an OR clause instead of an AND clause it might look like this:
private ExpressionStarter<User> GetMainFilterPredicate(IMainFilterDto filter)
{
var predicate = PredicateBuilder.New<User>().Start(u =>
(filter.StartDate == null || u.CreationTime > filter.StartDate) &&
(filter.EndDate == null || u.CreationTime <= filter.EndDate));
return predicate;
}
private Task> GetUsersMatching(IMainFilterDto filter, string prefix)
{
var predicate = GetMainFilterPredicate(filter);
predicate = predicate.Or(u => u.Name.StartsWith(prefix));
return Users.Where(predicate).ToListAsync();
}
Pretty nifty.
Summary
I like the 1st approach if I don't need anything more complex, but regardless, identifying how not to refactor LINQ queries is the important part. If you have any other creative solutions please share in the comments or hit me up on Twitter.