When writing tests for Django projects, you typically need to create quite a lot of instances of database model objects. This page documents the patterns I recommend, and the ones I don’t.
Before I get going, I should mention that a lot of this can be avoided altogether if you can separate out database-independent logic from your models. But you can only go so far without serious contortions, and you’ll probably still need to write a fair number of tests that hit the database.
We want the following:
Every test should specify each detail about database state it depends on
The test should not specify any detail it doesn’t depend on
We should be able to conveniently and succinctly write “what we mean”, without having to worry about lower level details, especially database schema details that are not intrinsic to the test.
These things are important so that you can understand tests in isolation, and so that changes not relevant to a test don’t break that test. Otherwise you will spend a lot of your time fixing broken tests rather than making the changes you actually need to make.
Using Django ORM create calls directly in your tests is not a great solution, because database constraints often force you to specify fields that you are not interested in.
The answer to this is simply to create your own “factory” functions, with optional keyword arguments (preferably keyword only) for almost everything. You can add parameters by hand as and when you need them.
Here are some simple but real examples from the Christian Camps in Wales booking system, which has a BookingAccount model and includes the ability to pay by cheque, which is a ManualPayment:
```python
# BookingAccount, ManualPayment and ManualPaymentType are the project's models;
# Auto and BOOKING_ACCOUNT_EMAIL_SEQUENCE are described below.


def create_booking_account(
    *,
    name: str = "A Booker",
    address_line1: str = "",
    address_post_code: str = "XYZ",
    email: str = Auto,
) -> BookingAccount:
    return BookingAccount.objects.create(
        name=name,
        email=email or next(BOOKING_ACCOUNT_EMAIL_SEQUENCE),
        address_line1=address_line1,
        address_post_code=address_post_code,
    )


def create_manual_payment(
    *,
    account: BookingAccount = Auto,
    amount: int = 1,
) -> ManualPayment:
    return ManualPayment.objects.create(
        account=account or create_booking_account(),
        amount=amount,
        payment_type=ManualPaymentType.CHEQUE,
    )
```
You can find the rest of this project’s test factory functions with this search on GitHub.
A few patterns to note:
In a number of places here we used a default value of Auto, which is a custom sentinel object.
We use Auto instead of None or something else, because:
Sometimes you need to specify None as an actual value (for nullable DB fields), but don’t want it as the default.
Often the correct default needs to be defined dynamically:
you need to create another object at runtime, as in the account: BookingAccount = Auto line above
a sensible and correct default depends on some other argument, so requires some logic in the body of the function.
We create a singleton value Auto so we can do if foo is Auto checks. We also give it a type of Any so that type checkers don’t complain about using it as a default value. It doesn’t break type checking for the functions calling our factory functions.
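The definition of Auto itself isn’t reproduced here. A minimal sketch consistent with the description above might look like the following. Besides being a singleton typed as Any, it also needs to be falsy, because the factories above use expressions like `email or next(...)` and `account or create_booking_account()`. The `describe` function is a hypothetical illustration, not part of any project.

```python
from typing import Any


class _Auto:
    """Sentinel meaning "pick a sensible default automatically"."""

    def __bool__(self) -> bool:
        # Falsy, so factories can write `value or make_default()`.
        return False

    def __repr__(self) -> str:
        return "Auto"


# Typed as Any so that annotations like `email: str = Auto` type-check.
Auto: Any = _Auto()


def describe(value: Any = Auto) -> str:
    # Hypothetical example: distinguish "not passed" from an explicit None.
    if value is Auto:
        return "default"
    if value is None:
        return "explicit None"
    return f"explicit {value!r}"
```

Because Auto is falsy, `x or default` works for most fields, but where a legitimate value could itself be falsy (an empty string, 0, None), an explicit `is Auto` check is the safer choice.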
Often you have the problem that a unique constraint on a field makes it difficult to provide a static default. As in the example above, I’m using a really simple technique to deal with this: generate a sequence of values that are unlikely to be specified manually in a test. In the code above, you can see BOOKING_ACCOUNT_EMAIL_SEQUENCE, which is defined at the module level using a small sequence helper.
Every time we call next() on this object, we get a distinct value, so we avoid issues with constraints.
The sequence utility is actually super simple, but here it is in all its type-hinted glory:
```python
import itertools
from typing import Callable, Generator, TypeVar

T = TypeVar("T")


def sequence(func: Callable[[int], T]) -> Generator[T, None, None]:
    """
    Generates a sequence of values from a sequence of integers starting at zero,
    passed through the callable, which must take an integer argument.
    """
    return (func(n) for n in itertools.count())
```
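With that helper, a module-level email sequence could be defined as below. The exact email format used in the CCiW project isn’t shown, so the `booker_{n}@example.com` pattern here is an assumption; the helper is repeated so the snippet runs on its own.

```python
import itertools
from typing import Callable, Generator, TypeVar

T = TypeVar("T")


def sequence(func: Callable[[int], T]) -> Generator[T, None, None]:
    # Same helper as above: yields func(0), func(1), func(2), ...
    return (func(n) for n in itertools.count())


# Module-level sequence of unique email addresses (format is an assumption).
BOOKING_ACCOUNT_EMAIL_SEQUENCE = sequence(lambda n: f"booker_{n}@example.com")
```

Successive next() calls then yield booker_0@example.com, booker_1@example.com, and so on.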
You could do something even simpler, though: just use a generator expression at the top level.
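As a sketch, the generator-expression version of the same email sequence (again assuming an illustrative email format) needs only itertools:

```python
import itertools

# Equivalent to sequence(lambda n: ...), without the helper.
# The email format is an illustrative assumption.
BOOKING_ACCOUNT_EMAIL_SEQUENCE = (f"booker_{n}@example.com" for n in itertools.count())
```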
There can be some cases where you need something more complicated than this (for example, to be able to reset sequences), but they are rare in my experience and fairly easy to write.
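For instance, a resettable sequence might be sketched as a small iterator class. The `ResettableSequence` name and its `reset()` method are hypothetical, just one way such a helper could look:

```python
import itertools
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ResettableSequence(Generic[T]):
    """Like the generator-based sequence, but able to restart from zero."""

    def __init__(self, func: Callable[[int], T]) -> None:
        self._func = func
        self._counter: Iterator[int] = itertools.count()

    def __iter__(self) -> "ResettableSequence[T]":
        return self

    def __next__(self) -> T:
        return self._func(next(self._counter))

    def reset(self) -> None:
        # Start counting from zero again, e.g. in a test's setUp().
        self._counter = itertools.count()
```

Resetting only makes sense when the database state from earlier values has also been rolled back; otherwise the regenerated values can collide with existing rows and reintroduce the unique-constraint problem.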
Factory functions often delegate to other factory functions, as in the examples above.
It’s also quite common to want to specify something about a sub-object. Rather than build up a tree of objects as the caller, I often add a parameter to the top-level factory itself. This gives you some independence from the actual schema.
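As an illustration, a test that only cares about the name on the account behind a payment could pass that name straight to the payment factory. In this sketch the `account_name` parameter is hypothetical, and plain dataclasses stand in for the Django models so it runs without a database:

```python
from dataclasses import dataclass
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


# Stand-ins for the Django models.
@dataclass
class BookingAccount:
    name: str


@dataclass
class ManualPayment:
    account: BookingAccount
    amount: int


def create_booking_account(*, name: str = "A Booker") -> BookingAccount:
    return BookingAccount(name=name)


def create_manual_payment(
    *,
    account: BookingAccount = Auto,
    account_name: str = Auto,  # hypothetical: describes the sub-object
    amount: int = 1,
) -> ManualPayment:
    if account is not Auto and account_name is not Auto:
        # The caller contradicted themselves.
        raise ValueError("Pass either account or account_name, not both")
    if account is Auto:
        # Delegate to the sub-factory, forwarding only what was specified.
        if account_name is Auto:
            account = create_booking_account()
        else:
            account = create_booking_account(name=account_name)
    return ManualPayment(account=account, amount=amount)
```

The caller writes `create_manual_payment(account_name="Jane")` and never needs to know how payments and accounts are linked.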
You aren’t limited to one factory function per model; you can have as many as you like. For example, you might have create_customer and a second factory function that take different parameters but both happen to return instances of the same model.
As far as possible, the factory function should pick sensible defaults, based on what parameters were passed in if any. If it can’t because the caller contradicted themselves, it should raise an exception.
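A sketch of both ideas, a default that depends on another argument and an error for contradictory input, using a stand-in user class. The rule that a superuser must also be staff is an assumed business rule for illustration:

```python
from dataclasses import dataclass
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class User:  # stand-in for a Django user model
    username: str
    is_staff: bool
    is_superuser: bool


def create_user(
    *,
    username: str = "user",
    is_staff: bool = Auto,
    is_superuser: bool = False,
) -> User:
    if is_staff is Auto:
        # Default depends on another argument: assume superusers are staff.
        is_staff = is_superuser
    elif is_superuser and not is_staff:
        # The caller asked for something contradictory, so fail loudly.
        raise ValueError("A superuser must also be staff")
    return User(username=username, is_staff=is_staff, is_superuser=is_superuser)
```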
I normally take the approach that the defaults should produce minimal and pristine objects, while being complete and usable.
For example, if your model supports soft-delete via deactivation, active=False would be a bad default. On the other hand, creating lots of related objects in order to be “realistic” would not be a good idea.
You should be pragmatic. For example, for a User object, if a brand new, “pristine” user is always forced to go through an on-boarding flow on your website, meaning that every single page but the on-boarding page is blocked until they complete it, then has_onboarded=True is probably a more sensible default; only a few of your tests will want has_onboarded=False.
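In a factory, that pragmatic default is just the parameter’s default value. A minimal sketch with a stand-in User class (not a real Django model):

```python
from dataclasses import dataclass


@dataclass
class User:  # stand-in for the project's user model
    username: str
    has_onboarded: bool


def create_user(*, username: str = "user", has_onboarded: bool = True) -> User:
    # Default to an onboarded user, since a new one can't use most pages;
    # only the on-boarding tests need to pass has_onboarded=False.
    return User(username=username, has_onboarded=has_onboarded)
```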
A good factory function will often simplify things for the caller. For example, in the CCiW project mentioned, the Camp model has a leaders relationship, which is a many-to-many. For several good reasons, the leaders are not User objects, but Person objects, where Person has some metadata and another many-to-many (!) with User objects. However, when I’m writing a test, I might want to be able to simply pass a user to the camp factory and have them treated as the camp’s leader.
In that case, I just care that the user is conceptually the leader of the camp. I don’t care:
that a camp can have more than one leader
that Camp is actually related to the User object via a Person.
Sometimes I don’t care about specifying who the leader actually is, just that there is one, so I might want to pass True instead of a specific user.
My factory function ends up accepting either a specific user or just True for the leader.
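The actual CCiW factory isn’t reproduced here. A sketch of the idea might look like this, with dataclasses standing in for the Camp, Person and User models and every other camp field omitted. Accepting a Person directly, and deriving the person’s name from the username, are assumptions of the sketch:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


# Stand-ins for the Django models described above.
@dataclass
class User:
    username: str


@dataclass
class Person:
    name: str
    users: list[User] = field(default_factory=list)  # many-to-many with User


@dataclass
class Camp:
    leaders: list[Person] = field(default_factory=list)  # many-to-many with Person


def create_user(*, username: str = "leader") -> User:
    return User(username=username)


def create_camp(*, leader: Person | User | bool = Auto) -> Camp:
    camp = Camp()
    if leader is Auto or leader is False:
        # Minimal default: no leader at all.
        return camp
    if leader is True:
        # "There is a leader, I don't care who."
        leader = create_user()
    if isinstance(leader, User):
        # Hide the User -> Person indirection from the caller.
        leader = Person(name=leader.username, users=[leader])
    camp.leaders.append(leader)
    return camp
```

A test can then write `create_camp(leader=user)` or `create_camp(leader=True)` without knowing that Person exists.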
It’s redundant, but it’s easy to use, and this approach isolates many of your tests from schema changes that would otherwise force you to update them. Sometimes my factory functions end up having a lot of parameters, and they’re unlikely to win any beauty contests, but who really cares? They are easy to understand and modify.
Type hints are great for getting good help in your editor when writing tests. Use them!
If a test requires a certain value, and it happens to be the default that the factory will use, the test should still specify it. This makes the test more robust, and allows the factory to change the defaults. If a test doesn’t specify it, it means it doesn’t care, and it should work with any value the factory happens to choose.
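For example, a test that depends on a user having completed on-boarding should say so, even when that is the factory default. A sketch with a stand-in factory and a hypothetical `can_view_dashboard` function as the code under test:

```python
import unittest
from dataclasses import dataclass


@dataclass
class User:  # stand-in for the project's user model
    username: str
    has_onboarded: bool


def create_user(*, username: str = "user", has_onboarded: bool = True) -> User:
    return User(username=username, has_onboarded=has_onboarded)


def can_view_dashboard(user: User) -> bool:
    # Hypothetical code under test.
    return user.has_onboarded


class DashboardTests(unittest.TestCase):
    def test_onboarded_user_can_view_dashboard(self) -> None:
        # has_onboarded=True is already the default, but this test depends
        # on it, so it states it explicitly.
        user = create_user(has_onboarded=True)
        self.assertTrue(can_view_dashboard(user))

    def test_new_user_cannot_view_dashboard(self) -> None:
        user = create_user(has_onboarded=False)
        self.assertFalse(can_view_dashboard(user))
```

The username is not specified in either test, which signals that neither test depends on it.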
Now for the anti-patterns. If you’re happy with the answer above, you don’t need to read this bit.
There are some legitimate cases for using fixture files in tests, in particular where you might use the same or similar fixture files for loading data in a production environment. This is typically when you have essentially static data that is defined by some external reality and happens to be stored in a database table in your app, such as a list of countries and their ISO codes.
When writing factory functions, rather than adding loads of parameters, it may be tempting to just let them accept **kwargs and pass those on to the underlying model. I usually prefer not to do that, because:
you get much less help when writing tests
you tend to end up overly tied to the actual schema
I used to use django-dynamic-fixture to avoid the tedium of manual factory functions, but have since moved away from it. You are just introducing a layer between yourself and the code that you actually need to write, and you have to stop it from doing things you don’t want. It also doesn’t understand the “business logic” needed to come up with sensible defaults.
OK, factory_boy: my comments here are like those for django-dynamic-fixture, only more so.
Let me put it this way:
You’ve been tasked with providing a procedure for creating model instances, where that procedure will have sensible defaults, but will allow the caller to override them. You have to decide what are the appropriate language features of Python to use. Do you:
A) Create a function or a method, with parameters for overriding defaults, or,
B) Define a new class that inherits from Factory, and use the body of the class statement to define a procedure?
If you chose A), congratulations, you got the right answer! You will be rewarded for using the language as it was meant to be used, by things like:
Automatic help inside your editor, both for the parameters and the returned value.
Static type checking if you want it.
Everyone being able to modify your code without looking up some documentation.
If you chose B), you get points for novelty. But you will be punished as follows:
You will have to invent a whole set of new concepts, such as “traits”.
You will have to write thousands of lines of code (1700+), thousands more of tests (5000+), and page after page of documentation (16,000+ words) to support all this.
You will have to get people to read that documentation. Instead of which, they will spend their evenings writing snarky blog posts complaining about all your hard work!
You will have an Open Source side project with hundreds of open issues, fun!
You will get less than zero help from your editor when using these factories: not only will it just display **kwargs for inputs, it will think the output is a Factory instance, which it is not.
For people to find what parameters they can pass to a Factory, they will have to look up the model, then inspect the Factory definition and decipher its “traits” etc.
I don’t want to add any further to the burden of the authors – they have suffered enough already! But I do want to deal with a few objections:
Creating model instances without saving them is useful if you want to avoid hitting the DB while testing a model method that doesn’t need the DB. In Django, it’s extremely easy to do that without help: if you aren’t going to save a model instance, you don’t need to worry about any attributes other than the ones you specify (models don’t run validation in the constructor), so you don’t need factories at all. Just call the model class with the fields the method under test uses.
If you really need it, you could always add a commit: bool = True parameter to your factory functions.
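In Django terms this means constructing the instance and only calling save() when commit is true, rather than using objects.create(), which always saves. The sketch below uses a stand-in class whose save() merely records that it was called; with a real model, the constructor and save() calls would look the same.

```python
from dataclasses import dataclass


@dataclass
class BookingAccount:
    """Stand-in for the Django model: save() just records that it was called."""

    name: str
    saved: bool = False

    def save(self) -> None:
        self.saved = True


def create_booking_account(*, name: str = "A Booker", commit: bool = True) -> BookingAccount:
    # Build the instance first, then only write it to the "database" if asked.
    account = BookingAccount(name=name)
    if commit:
        account.save()
    return account
```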
If you want randomized and realistic-looking data, you can use a library such as Faker directly in your factory functions, with almost exactly the same amount of code.
If you need to create a bunch of things, you can just call your factory function in a list comprehension, which really isn’t very hard, and also means you can have arguments that vary depending on the loop variable.
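For example, with a stand-in dataclass in place of the Django model:

```python
from dataclasses import dataclass


@dataclass
class BookingAccount:  # stand-in for the Django model
    name: str


def create_booking_account(*, name: str = "A Booker") -> BookingAccount:
    return BookingAccount(name=name)


# Ten accounts with the default name.
accounts = [create_booking_account() for _ in range(10)]

# Arguments can depend on the loop variable.
named_accounts = [create_booking_account(name=f"Booker {i}") for i in range(3)]
```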
But, because I’m very generous, I will write you a batch-creation helper for free. Not only that, I’ll add type hints for free, and I’ll put it in the public domain.
With a helper like that, your editor and static type checker will know exactly what type of objects the returned list contains.
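The helper’s original name and signature aren’t given here, so `create_batch` below is a hypothetical name; a minimal sketch might look like this, with a stand-in dataclass for the model:

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


def create_batch(factory: Callable[[], T], count: int) -> list[T]:
    """Call a zero-argument factory `count` times and return the results."""
    return [factory() for _ in range(count)]


@dataclass
class BookingAccount:  # stand-in for the Django model
    name: str


def create_booking_account(*, name: str = "A Booker") -> BookingAccount:
    return BookingAccount(name=name)


# A type checker infers list[BookingAccount] here.
accounts = create_batch(create_booking_account, 5)
```

To pass arguments, wrap the call in a lambda, e.g. `create_batch(lambda: create_booking_account(name="X"), 5)`; the inferred element type is still BookingAccount.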
You don’t need to install anything to create factory functions. Just use built-in language features, and maybe a few tiny helpers like I’ve shown, and you’re good!
The only real issue with my approach is that sometimes it can feel a bit tedious adding another parameter. But slightly tedious code that is extremely easy to understand and modify, and helps you in all the ways I’ve described, is still a big win in my book. There will be many days when you long for slightly tedious code that just works.