Test factory functions in Django

by Luke Plant

— November 25, 2022 16:07

When writing tests for Django projects, you typically need to create quite a lot of instances of database model objects. This page documents the patterns I recommend, and the ones I don’t.

Before I get going, I should mention that a lot of this can be avoided altogether if you can separate database-independent logic from your models. But you can only go so far without serious contortions, and you’ll probably still need to write a fair number of tests that hit the database.

The aim

We want the following:

Every test should specify each detail about database state it depends on
The test should not specify any detail it doesn’t depend on
We should be able to conveniently and succinctly write “what we mean”, without having to worry about lower level details, especially database schema details that are not intrinsic to the test.

These things are important so that you can understand tests in isolation, and so that changes irrelevant to a test don’t break it. Otherwise you will spend a lot of your time fixing broken tests rather than making the changes you actually need to make.

Using Django ORM create calls directly in your tests is not a great solution, because database constraints often force you to specify fields that you are not interested in.

Custom factory functions

The answer to this is simply to create your own “factory” functions, with optional keyword arguments (preferably keyword only) for almost everything. You can add parameters by hand as and when you need them.

Here are some simple but real examples from the Christian Camps in Wales booking system, which has a BookingAccount model and supports paying by cheque, recorded as a ManualPayment object:

def create_booking_account(
    *,
    name: str = "A Booker",
    address_line1: str = "",
    address_post_code: str = "XYZ",
    email: str = Auto,
) -> BookingAccount:
    return BookingAccount.objects.create(
        name=name,
        email=email or next(BOOKING_ACCOUNT_EMAIL_SEQUENCE),
        address_line1=address_line1,
        address_post_code=address_post_code,
    )

def create_manual_payment(
    *,
    account: BookingAccount = Auto,
    amount: int = 1,
) -> ManualPayment:
    return ManualPayment.objects.create(
        account=account or create_booking_account(),
        amount=amount,
        payment_type=ManualPaymentType.CHEQUE,
    )

You can find the rest of this project’s test factory functions with this search on GitHub.

A few patterns to note:

The Auto sentinel

In a number of places here we used a default value of Auto, which is a custom object defined as follows:

from typing import Any


class _Auto:
    """
    Sentinel value indicating an automatic default will be used.
    """

    def __bool__(self) -> bool:
        # Allow `Auto` to be used like `None` or `False` in boolean expressions
        return False


Auto: Any = _Auto()

We use Auto instead of None or something else, because:

Sometimes you need to specify None as an actual value (for nullable DB fields), but don’t want it as the default.
Often the correct default needs to be defined dynamically:
- you need to create another object at runtime, as in the account: BookingAccount = Auto line above
- a sensible and correct default depends on some other argument, so requires some logic in the body of the function.

We create a singleton value Auto so we can do if foo is Auto checks.

We also give it a type Any so that type checkers don’t complain about using it as a default value. It doesn’t break type checking for the functions calling our factory functions.
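To show why the `is Auto` check matters for nullable fields, here is a minimal sketch using a plain dataclass as a stand-in for a Django model; `Person`, `date_of_birth` and the default date are hypothetical. An `or`-style default would silently replace an explicit `None`, whereas checking `is Auto` keeps it:

```python
from dataclasses import dataclass
from typing import Any, Optional


class _Auto:
    """Sentinel value indicating an automatic default will be used."""

    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class Person:
    # Hypothetical stand-in for a model with a nullable date_of_birth field
    name: str
    date_of_birth: Optional[str]


def create_person(
    *,
    name: str = "A Person",
    date_of_birth: Optional[str] = Auto,
) -> Person:
    if date_of_birth is Auto:
        # Nothing was passed: choose a default of our own
        date_of_birth = "2000-01-01"
    # An explicit None falls through and is stored as-is
    return Person(name=name, date_of_birth=date_of_birth)


# Compare with `date_of_birth or "2000-01-01"`, which would turn an
# explicit None into the default and make NULL impossible to request.
```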

Constraints and sequences

Often you have the problem that a unique constraint on a field makes it difficult to provide a static default. As in the example above, I’m using a really simple technique to deal with this – generate a sequence of values that are unlikely to be specified manually in a test. In the above code, you can see BOOKING_ACCOUNT_EMAIL_SEQUENCE which is defined like this at the module level:

BOOKING_ACCOUNT_EMAIL_SEQUENCE = sequence(lambda n: f"booker_{n}@example.com")

Every time we call next() on this object, we get a distinct value, so we avoid issues with constraints.

The sequence utility is actually super simple, but here it is in all its type-hinted glory:

import itertools
from typing import Callable, Generator, TypeVar

T = TypeVar("T")


def sequence(func: Callable[[int], T]) -> Generator[T, None, None]:
    """
    Generates a sequence of values from a sequence of integers starting at zero,
    passed through the callable, which must take an integer argument.
    """
    return (func(n) for n in itertools.count())

You could do something even simpler though – just use a generator expression at the top level:

BOOKING_ACCOUNT_EMAIL_SEQUENCE = (f"booker_{n}@example.com" for n in itertools.count())

There can be some cases where you need something more complicated than this (for example to be able to reset sequences) but they are rare in my experience and fairly easy to write [1].

Delegation and sub-objects

Factory functions often delegate to other factory functions, as in the examples above.

It’s also quite common to want to specify something about a sub-object. Rather than build up a tree of objects as the caller, I often add a parameter to the top-level factory itself. This gives you some independence from the actual schema.
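As a minimal sketch of that idea, here the payment factory lets the caller say something about the account without building it. Dataclasses stand in for the models, and the `account_name` parameter is hypothetical (it is not taken from the CCiW code):

```python
from dataclasses import dataclass
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class BookingAccount:
    name: str


@dataclass
class ManualPayment:
    account: BookingAccount
    amount: int


def create_booking_account(*, name: str = "A Booker") -> BookingAccount:
    return BookingAccount(name=name)


def create_manual_payment(
    *,
    account: BookingAccount = Auto,
    account_name: str = Auto,  # hypothetical: describes the sub-object
    amount: int = 1,
) -> ManualPayment:
    if account is Auto:
        # Delegate to the sub-object's factory, forwarding only what was specified
        if account_name is Auto:
            account = create_booking_account()
        else:
            account = create_booking_account(name=account_name)
    elif account_name is not Auto:
        # Contradictory request: an existing account already has its own name
        raise ValueError("Pass either `account` or `account_name`, not both")
    return ManualPayment(account=account, amount=amount)
```

A test can then write `create_manual_payment(account_name="Joe Bloggs")` without knowing that the name lives on a separate BookingAccount row.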

Special purpose factories

You aren’t limited to one factory function per model; you can have as many as you like. For example, you might have create_staff_user and create_customer, which take different parameters but both happen to return the same User model.
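A minimal sketch of two such factories, with a dataclass standing in for the User model (the `is_staff` and `is_active` fields are assumptions made for illustration):

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in for a User model
    username: str
    is_staff: bool = False
    is_active: bool = True


def create_staff_user(*, username: str = "staff") -> User:
    # Staff users only need a username in this sketch
    return User(username=username, is_staff=True)


def create_customer(*, username: str = "customer", is_active: bool = True) -> User:
    # Customers are never staff, but some tests may want an inactive one
    return User(username=username, is_staff=False, is_active=is_active)
```

Each function only exposes the parameters that make sense for that kind of user, so a test for customers never has to think about staff flags.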

Sensible and minimal defaults

As far as possible, the factory function should pick sensible defaults, based on whichever parameters, if any, were passed in. If it can’t because the caller contradicted themselves, it should raise an exception.

I normally take the approach that the defaults should produce minimal and pristine objects, while being complete and usable.

For example, if your model supports soft-delete via deactivation, active=False would be a bad default. On the other hand, creating lots of related objects in order to be “realistic” would not be a good idea.

You should be pragmatic. For example, for a User object, if a brand new, “pristine” user is always forced to go through an on-boarding flow on your website, meaning that every single page but the on-boarding page is blocked until they complete it, then has_onboarded=True is probably a more sensible default – only a few of your tests will want has_onboarded=False.
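Here is a minimal sketch combining these points: a default that depends on another argument, an exception when the arguments contradict each other, and a pragmatic has_onboarded=True default. The rule that superusers must also be staff is a hypothetical business rule, and the dataclass stands in for the real model:

```python
from dataclasses import dataclass
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class User:
    # Hypothetical stand-in model; field names are illustrative only
    username: str
    is_staff: bool
    is_superuser: bool
    has_onboarded: bool


def create_user(
    *,
    username: str = "user",
    is_superuser: bool = False,
    is_staff: bool = Auto,
    has_onboarded: bool = True,  # pragmatic default: most tests want an onboarded user
) -> User:
    if is_staff is Auto:
        # Sensible default that depends on another argument
        is_staff = is_superuser
    elif is_superuser and not is_staff:
        # Hypothetical rule: superusers must be staff, so the caller contradicted themselves
        raise ValueError("A superuser must also be staff")
    return User(
        username=username,
        is_staff=is_staff,
        is_superuser=is_superuser,
        has_onboarded=has_onboarded,
    )
```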

In many cases, your main business logic may already have functions that initialise database objects into sensible states when creating them, or when changing their states. Test factory functions will often delegate to them, so that things are set up as close as possible to how they would be normally.
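A minimal sketch of such delegation, assuming a hypothetical `confirm_booking` function in the main code base that performs a state transition; the factory calls it rather than setting the field by hand:

```python
from dataclasses import dataclass


@dataclass
class Booking:
    # Hypothetical stand-in model
    camper_name: str
    price: int
    state: str = "new"


def confirm_booking(booking: Booking) -> None:
    # Hypothetical business-logic function from the main code base;
    # a real one might do more than set a single field
    booking.state = "booked"


def create_booking(
    *,
    camper_name: str = "A Camper",
    price: int = 100,
    confirmed: bool = True,
) -> Booking:
    booking = Booking(camper_name=camper_name, price=price)
    if confirmed:
        # Reuse the application's own transition rather than setting `state` directly
        confirm_booking(booking)
    return booking
```

If the confirmation logic later changes, bookings created in tests change with it.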

Simplified interface

A good factory function will often simplify things for the caller.

For example, in the CCiW project mentioned, the Camp model has a leaders relationship, which is a many-to-many. For several good reasons, the leaders are not User objects, but Person objects, where Person has some metadata and another many-to-many (!) with User objects. However, when I’m writing a test, I might want to be able to say something like:

user = create_user()
camp = create_camp(leader=user)
login(user)

Here, I just care that the user is conceptually the leader of the camp. I don’t care:

that a camp can have more than one leader
that the Camp is actually related to the User object via a Person object.

Sometimes I don’t care about specifying who the leader actually is, just that there is one, so I might want to pass leader=True.

My factory function ends up looking like this:

def create_camp(
    *,
    leader: Person | User | bool = Auto,
    leaders: list[Person | User] = Auto,
) -> Camp:
    ...
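The body is elided above. One possible way to fill it in is sketched below; this is not the CCiW code. Dataclasses stand in for the models, a Person’s link to users is reduced to a plain list, and the sketch assumes a camp has no leaders unless asked for:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class User:
    username: str


@dataclass
class Person:
    name: str
    users: list[User] = field(default_factory=list)  # stand-in for the many-to-many


@dataclass
class Camp:
    leaders: list[Person] = field(default_factory=list)


def create_user(*, username: str = "user") -> User:
    return User(username=username)


def _as_person(leader: Person | User) -> Person:
    # Tests may pass a User; wrap it in a Person to hide the indirection
    if isinstance(leader, Person):
        return leader
    return Person(name=leader.username, users=[leader])


def create_camp(
    *,
    leader: Person | User | bool = Auto,
    leaders: list[Person | User] = Auto,
) -> Camp:
    if leader is not Auto and leaders is not Auto:
        raise ValueError("Pass `leader` or `leaders`, not both")
    if leader is True:
        # The caller only cares that some leader exists
        leaders = [create_user()]
    elif leader is False or (leader is Auto and leaders is Auto):
        leaders = []
    elif leader is not Auto:
        leaders = [leader]
    return Camp(leaders=[_as_person(p) for p in leaders])
```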

It’s redundant, but it’s easy to use, and this approach insulates many of your tests from needing to change. Sometimes my factory functions end up having a lot of parameters, and they’re unlikely to win any beauty contests, but who really cares? They are easy to understand and modify.

Type hints

Type hints are great for getting good help in your editor when writing tests. Use them!

Don’t depend on defaults

If a test requires a certain value, and it happens to be the default that the factory will use, the test should still specify it. This makes the test more robust, and allows the factory to change the defaults. If a test doesn’t specify it, it means it doesn’t care, and it should work with any value the factory happens to choose.
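A small sketch of the difference, assuming a factory whose has_onboarded default is True and a hypothetical `can_see_dashboard` check under test. The test passes has_onboarded=True even though it matches the default, because the test depends on it:

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in model
    username: str
    has_onboarded: bool


def create_user(*, username: str = "user", has_onboarded: bool = True) -> User:
    return User(username=username, has_onboarded=has_onboarded)


def can_see_dashboard(user: User) -> bool:
    # Hypothetical application logic under test
    return user.has_onboarded


def test_onboarded_user_sees_dashboard() -> None:
    # has_onboarded=True is also the default, but this test depends on it,
    # so it says so explicitly and survives a change of default
    user = create_user(has_onboarded=True)
    assert can_see_dashboard(user)
```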

Enhancements

If you are using pytest (which I recommend, along with pytest-django), Haki Benita has a nice post that explains how to use factory functions as pytest fixtures.

What not to do

Now for the anti-patterns. If you’re happy with the answer above, you don’t need to read this bit.

JSON/YAML fixtures

Django docs used to encourage you to define model instances in JSON/YAML fixtures for use in tests. Don’t do that! I’ll let Carl Meyer tell you why.

There are some legitimate cases for using these kinds of fixtures in tests – in particular, where you might use the same/similar fixture files for loading data in a production environment. This is typically when you have essentially static data that is defined by some external reality, which happens to be stored in a database table in your app – such as a list of countries and their ISO codes.

`**kwargs`

When writing factory functions, rather than adding loads of parameters, it may be tempting to just let them accept **kwargs and pass those on to the underlying model. I usually prefer not to do that, because:

you get much less help when writing tests
you tend to end up overly tied to the actual schema

django-dynamic-fixture

I used to use django-dynamic-fixture to avoid the tedium of manual factory functions, but have since moved away from that. You are just introducing a layer between yourself and the code that you actually need to write, and have to stop it from doing things you don’t want etc. It also doesn’t understand the “business logic” needed to come up with sensible defaults.

factory_boy

OK, factory_boy, this is like my comments for django-dynamic-fixture, only more so.

Let me put it this way:

You’ve been tasked with providing a procedure for creating model instances, where that procedure will have sensible defaults, but will allow the caller to override them. You have to decide what are the appropriate language features of Python to use. Do you:

A) Create a function or a method, with parameters for overriding defaults, or,
B) Define a new class that inherits from Factory, and use the body of the class statement to define a procedure?

If you chose A), congratulations, you got the right answer! You will be rewarded for using the language as it was meant to be used, by things like:

Automatic help inside your editor, both for the parameters and the returned value.
Static type checking if you want it.
Everyone being able to modify your code without looking up some documentation.

If you chose B), you get points for novelty. But you will be punished as follows:

You will have to invent things like:
- nested class Meta for essential configuration of FactoryOptions
- nested class Params
- Trait
- PostGeneration
- @post_generation
- LazyAttribute
- @lazy_attribute
- @lazy_attribute_sequence
- LazyFunction
- SubFactory
- RelatedFactory
- SelfAttribute
- a debug mode (of course)
- and much, much more!
You will have to write thousands of lines of code (1700+), thousands more of tests (5000+), and page after page of documentation (16,000+ words) to support all this.
You will have to get people to read that documentation. Instead of which, they will spend their evenings writing snarky blog posts complaining about all your hard work!
You will have an Open Source side project with hundreds of open issues, fun!
You will get less than zero help from your editor when using these factories – not only will it just display **kwargs for inputs, it will think the output is a Factory instance, which it is not.
For people to find what parameters they can pass to a Factory, they will have to look up the model, and inspect the Factory definition and decipher its “traits” etc.

I don’t want to add any further to the burden of the authors – they have suffered enough already! But I do want to deal with a few objections:

But factory_boy can also create instances without saving them!

This is useful if you want to avoid hitting the DB while being able to test a model method that doesn’t need the DB. In Django, it’s extremely easy to do that without help, because if you aren’t going to save a model instance, you don’t need to worry about any attributes other than the ones you specify – models don’t run validation in the constructor – and so you don’t need factories at all:

def test_address_formatted():
    address = Address(line1="123 Main St", line2="London")
    assert address.formatted() == "123 Main St\nLondon\n"

If you really need it, you could always add a commit: bool = True parameter to your factory functions.
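In Django terms, that means building the instance with the model constructor and only calling `save()` when `commit` is true. A minimal sketch, with a dataclass standing in for the model (the `saved` flag and the default address are hypothetical, used only to make the behaviour observable):

```python
from dataclasses import dataclass


@dataclass
class Address:
    line1: str
    line2: str = ""
    saved: bool = False

    def save(self) -> None:
        # Stand-in for Django's Model.save(); just records that it was called
        self.saved = True


def create_address(
    *,
    line1: str = "1 High Street",
    line2: str = "",
    commit: bool = True,
) -> Address:
    address = Address(line1=line1, line2=line2)
    if commit:
        address.save()
    return address
```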

But factory_boy has faker integration!

If you want randomized and realistic looking data, you can use faker directly with almost exactly the same amount of code:

from faker import Faker

faker = Faker()

def create_user():
    return User.objects.create(
        name=faker.name(),
    )

But factory_boy has a create_batch method!

If you need to create a bunch of things, you can just do this:

payments = [create_manual_payment() for i in range(0, 100)]

which really isn’t very hard, and also means you can have arguments that vary depending on the loop variable.

But, because I’m very generous, I will write you a create_batch function for free. Not only that, I’ll add type hints for free, and I’ll leave it right here where you can find it, in the public domain:

from typing import Callable, TypeVar

T = TypeVar("T")


def create_batch(factory: Callable[..., T], count: int, /, **kwargs) -> list[T]:
    """
    Use `factory` callable to create `count` objects, passing along kwargs
    """
    return [factory(**kwargs) for _ in range(count)]

Now you can do the following, and your editor and static type checker will know exactly what type of objects payment_1 and payment_2 are:

payment_1, payment_2 = create_batch(create_manual_payment, 2, amount=10)

Conclusion

You don’t need to install anything to create factory functions. Just use built-in language features, and maybe a few tiny helpers like I’ve shown, and you’re good!

The only real issue with my approach is that sometimes it can feel a bit tedious adding another parameter. But slightly tedious code that is extremely easy to understand and modify, and helps you in all the ways I’ve described, is still a big win in my book. There will be many days when you long for slightly tedious code that just works.

Happy testing!

Footnotes

[1]

Advanced sequences:

Sometimes, you might want to reset your sequences, and perhaps automatically between every test case. I would implement that as follows. Replace the previous sequence implementation with:

from __future__ import annotations
import itertools
from typing import Callable, Generic, Iterator, TypeVar


T = TypeVar("T")


class sequence(Generic[T]):
    instances: list[sequence] = []

    def __init__(self, func: Callable[[int], T]) -> None:
        self.func = func
        self.reset_sequence()
        self.instances.append(self)

    def reset_sequence(self):
        self.seq: Iterator[T] = (self.func(n) for n in itertools.count())

    def __next__(self) -> T:
        return next(self.seq)

To reset automatically between each test case, assuming use of pytest, add the following autouse fixture to conftest.py:

import pytest


@pytest.fixture(autouse=True)
def reset_all_sequences():
    from myproject.factory_utils import sequence  # or wherever

    for instance in sequence.instances:
        instance.reset_sequence()
