0 Comments

Update:I wrote the code for the seeder below outside of a development environment. It doesn’t work. I’ll try to revisit this post at a later date if I get a chance to implement a seeder class, but I’m currently using the Seed extension methods to great effect. My apologies to anyone who finds this post and is surprised when it doesn’t work.

Entity Framework and ORM’s in general have come a long way. Entity Framework in particular is pretty amazing now, compared to where it was 5 years ago. It was around then that my team made the decision to use NHibernate as our ORM, instead of EF. EF has obviously matured a lot since then, and seems to be the default choice now when working in the .NET world.

I’ve made a couple of posts on this blog involving Entity Framework and some of my adventures with it, one on creating test databases leveraging a scratch MSSQL instance in AWS and another on using a different EF provider to allow for in-memory databases.

One of the great things about working with ORMs, is that your persisted data is just objects, which means you have far more control over it than you ever did before. No need to use SQL statements (or similar) to build up some test data, just create some objects, insert them and off you go.

This post is going to talk about a mechanism for creating those objects, specifically about the concept of seeding.

Also, obviously, all of the sub-titles will be puns based on seeds and seeding.

Just Tossing My Seed Around

Most of the time I see seeding functions built into the DbContext class. They are typically executed whenever the context is created, making sure that certain data is available.

To me this is a violation of the Single Responsibility Principle, because now you have a class that is responsible for both managing data access and for putting some subset of the data there in the first place. While this alone is definitely a good reason to have a dedicated seeder class, there are others as well:

  • If you have a hardcoded seed method inside your DbContext, its much harder to customise it to seed different data based on your current needs.
  • Commonly, seed implementations inside the DbContext are wasteful, always trying to seed into the database whenever you create a new DbContext. I’m in favour of using a DbContextFactory and creating a DbContext per operation (or at least per request), which can make the time spent dealing with seeding significant.

I find that the best way to think about seeding is to use the specification pattern (or at least the concept). You want to be able to create an object that describes how you want your data to look (or what capabilities you want your data to have), and then execute it. Let the object sort out the seeding as its dedicated function.

This works fairly well. Yu define a Seeder or DataSpecification class, and expose appropriate properties and methods on it to describe the data (like how many DbFoo entries do I want, how many DbBar entries, what they look like, etc). You implement a method that takes a DbContext of the appropriate type, and in that method you use the information supplied to create and save the appropriate entities.

If you follow this approach, you find that your Seeder can become very complicated very quickly, especially because its entire purpose is to be highly configurable. Its also responsible for knowing how to construct many different varieties of objects, which is another violation of SRP.

I find SRP to be a pretty good guideline for handling class complexity. If you think about the responsibilities that your class has, and it has more than a few, then those responsibilities either need to be very tightly coupled, such that you couldn’t reasonably pull them apart, or you should really consider having more than one class. The downside of SRP is that you tend to have quite a lot of small classes, which is another form of complexity. The upside is that you have a lot of small, composable, modular classes, which are extremely useful once you get over that initial complexity bump for having many many classes.

Ready For Round Two

I didn’t like that my Seeder class had detailed knowledge about how to construct the various entities available from the DbContext. Plus it was huge and hard to understand at a glance.

The next step was to split the logic for how to create an entity into classes dedicated to that. I tend to use the naming convention of XBuilder for this purpose, and they all look very similar:

using System;

public interface IEntityBuilder<TEntity>
{
    TEntity Build();
}

public class DbFoo
{
    public int Id { get; set; }
    public string Do { get; set; }
    public string Re { get; set; }
}

public class DbFooBuilder : IEntityBuilder<DbFoo>
{
    private string _Do = "bananas";
    private string _Re = "purple";

    public DbFooBuilder WithDo(string v)
    {
        _Do = v;
        return this;
    }

    public DbFoo Build()
    {
        return new DbFoo()
        {
            Do = _Do,
            Re = _Re
        };
    }
}

As you can see, the builder features somewhat fluent syntax (the WithX methods) allowing you to chain calls to customise the constructed entity, but has sane defaults for all of the various properties that matter.

The Faker.Net package is handy here, for generating company names, streets, etc. You can also simply generate random strings for whatever properties require it, but its generally much better to generate real looking data than completely nonsensical data.

With the additional of dedicated builders for entities, the Seeder looks a lot better, being mostly dedicated to the concept of “how many” of the various entities. It could be improved though, because its difficult to use the Seeder to specify a subset of entities that meet certain criteria (like generate 10 DbFoo’s with their Do property set to “bananas”, and 200 where its set to “apples”).

We can fix that by providing some additional methods on the Seeder that allow you to customise the builders being used, instead of just letting to Seeder create X number of them to fulfil its “number of entities” requirement.

public class Seeder
{
    private List<DbFooBuilder> FooBuilders = new List<DbFooBuilder>();

    public Seeder WithDbFoos<TEntityBuilder>(IEnumerable<TEntityBuilder> builders)
        where TEntityBuilder : IEntityBuilder<DbFoo>
    {
        FooBuilders.AddRange(builders);

        return this;
    }

    public Seeder WithDbFoos<TEntityBuilder>(int number, Action<TEntityBuilder> customise = null)
        where TEntityBuilder : IEntityBuilder<DbFoo>, new()
    {
        var builders = Enumerable.Range(0, number).Select(a => customise(new TEntityBuilder()));
        return WithDbFoos(builders);
    }

    public void Seed(DbContext db)
    {
        foreach (var builder in FooBuilders)
        {
            db.DbFoos.Add(builder.Build());
        }
        db.SaveContext();
    }
}

Much better and extremely flexible.

Bad Seed

I actually didn’t quite implement the Seeder as specified above, though I think its definitely a better model, and I will be implementing it in the near future.

Instead I implemented a series of builders for each of the entities I was interested in (just like above), and then wrote a generic Seed extension method for IDbSet:

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

namespace Solavirum.Database.EF
{
    public IEntityBuilder<TEntity>
    {
        TEntity Build();
    }

    public static class SeedExtensions
    {
        private static Random _random = new Random();

        public static void Seed<TEntity, TBuilder>(this IDbSet<TEntity> set, Func<TBuilder, TBuilder> modifications = null, int number = 10)
            where TEntity : class
            where TBuilder : IEntityBuilder<TEntity>, new()
        {
            modifications = modifications ?? (a => a);

            for (int i = 0; i < number; i++)
            {
                var builder = new TBuilder();
                builder = modifications(builder);
                set.Add(builder.Build());
            }
        }

        public static T Random<T>(this IEnumerable<T> enumerable)
        {
            int index = _random.Next(0, enumerable.Count());
            return enumerable.ElementAt(index);
        }
    }
}

This is nice from a usability point of view, because I can seed any entity that has an appropriate builder, just by using the one method. The Random<T> method exists so I can get a random element out of a DbSet for linking purposes, if I need to (it was used in a method that I removed, dealing specifically with an entity with links to other entities).

What I don’t like about it:

  • Its difficult to supply dependencies to the seed method (unless you expose them in the method signature itself) because its inside a static class. This means supplying a logger of some description is hard.
  • The builders have to have parameterless constructors, again because its hard to supply dependencies. This isn’t so bad, because the builders are meant to be simple and easy to use, with sane default values.
  • Builders with dependencies on other entities (like a hypothetical DbFooBar class that has references to both a DbFoo and a DbBar) have to have their own Seed method in order to use entities that exist in the current DbContext. This isn’t a dealbreaker, but it does complicate things.

I think a well constructed Seeder class better encapsulates the concept, even though its nice to be able to just hit up a Seed method right off the IDbSet and have it all just work.

Conclusion

Being able to easily create data that meets certain criteria is an amazing tool when it comes to development and testing. Doing it in a provider agnostic way is even better, because you can push data into an in-memory database, an SQL database or a Postgres database, using the same code. In my opinion, the ability to sub out providers is one of the best parts of using an ORM (that and not having to deal with query/data manipulation languages directly).

Nothing that I’ve written above is particularly ground breaking, but its still very useful, and I highly recommend following the general strategy when working with a persistence layer via Entity Framework.

I hope you enjoyed all the terrible seed puns.

I regret nothing.