This post is about why GenericForeignKey is usually something you should
stay away from. I haven't seen any other articles describing why that is, or what
the alternatives are, so this is my attempt at “GenericForeignKey considered
harmful”.
Before I get going, I think that there are some legitimate cases, where most of the problems I'll highlight below aren't an issue. In particular, the following spring to mind:
generic auditing, where changes to DB rows are tracked in a separate table – for this case, some of the disadvantages below are not so important, and might even be advantages (e.g. being able to refer to deleted rows),
generic tagging apps,
other generic applications where you have no real alternative, because you really don't know what models, or even how many different models, you might want to refer to.
However, I think there are many situations that don't fit the above, but where
people are tempted to use GenericForeignKey:
You have a case where each object of a given model needs to be connected to one, and only one, of a known set of other models.
You are developing a generic app in which a model is designed to relate to one other model, but you don't know which model yet.
Most of this post is focused on the first of these situations, but I'll also address the second briefly. First, to make things easier to talk about, I'll introduce an example.
Our example application handles “tasks”. Tasks can be “owned” by either an
individual or a group – but not both. You might be tempted to use a
GenericForeignKey for this, such as below:
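    from django.contrib.contenttypes.fields import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models


    class Person(models.Model):
        name = models.CharField(max_length=100)  # length is an arbitrary choice


    class Group(models.Model):
        name = models.CharField(max_length=100)
        # for a later example; CASCADE was the implicit default before Django 2.0
        creator = models.ForeignKey(Person, on_delete=models.CASCADE)


    class Task(models.Model):
        description = models.CharField(max_length=200)
        # owner_id and owner_type are combined into the GenericForeignKey
        owner_id = models.PositiveIntegerField()
        owner_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
        # owner will be either a Person or a Group (or perhaps
        # another model we will add later):
        owner = GenericForeignKey('owner_type', 'owner_id')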
In this case there are just two options for
owner, for simplicity, but most of
what follows will apply just as well if there are more than two.
Please be clear – the pattern above is what I'm NOT recommending! And here's why:
The database schema resulting from use of
GenericForeignKey is not great.
I've heard it said, “data matures like wine, application code matures like
fish”. Your database will likely outlast the application in its current
incarnation, so it would be nice if it makes sense on its own, without needing
the application code to understand what it is talking about.
(If this doesn't sound very convincing, you might still want to read this section – the things explained here are important for the rest of this post).
In general, helpfully named tables and columns (which Django produces), and
foreign key constraints (which Django also produces), make databases largely
self-documenting. GenericForeignKey breaks that.
For the above example, the gfks_task table in your database (I'm using SQLite for the demo app for this post) has two relevant columns, owner_id and owner_type_id.
owner_id is just an integer – any integer – with no obvious way to work
out what it refers to.
owner_type_id is better – it has a foreign key constraint, so we get another table to
look at: django_content_type, which has one row per model, identified by its app_label and model columns.
With some good guesses, someone in the future looking at the data might be able to work out how this works, which is as follows:
gfks_task.owner_type_id refers us to a row in django_content_type (this is clear from the constraints).
By putting together the app_label and model from this row, we can work out the table name by adding underscores e.g. if gfks_task.owner_type_id == 8, we need to look at the gfks_person table. (In fact this is incorrect. To do it correctly, we actually need to look at the model i.e. we need to import gfks.models.Person, and look at its ._meta.db_table attribute. This is a rather nasty little gotcha which will catch you out if the Meta.db_table attribute was set explicitly for a model, and means that we have a rather ugly dependence on being able to import our Python application in order to make sense of the database.)
We now have a table name, in which we can look up the record whose PK matches the gfks_task.owner_id value.
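The lookup described above can be sketched with Python's standard-library sqlite3 module. The table layouts are simplified approximations of what Django generates (an assumption, not a dump of the demo app), the content type ID 9 for Group is made up, and resolve_owner is a hypothetical helper. It deliberately uses the naive "app_label plus underscore plus model" rule, which is the shortcut that breaks when Meta.db_table is set explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the tables Django would create.
    CREATE TABLE django_content_type (
        id INTEGER PRIMARY KEY, app_label TEXT NOT NULL, model TEXT NOT NULL
    );
    CREATE TABLE gfks_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gfks_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gfks_task (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        owner_id INTEGER NOT NULL,  -- no constraint: any integer is accepted
        owner_type_id INTEGER NOT NULL REFERENCES django_content_type (id)
    );
    INSERT INTO django_content_type VALUES (8, 'gfks', 'person'), (9, 'gfks', 'group');
    INSERT INTO gfks_person VALUES (1, 'Alice');
    INSERT INTO gfks_group VALUES (1, 'Admins');
    INSERT INTO gfks_task VALUES (1, 'Write report', 1, 8), (2, 'Rotate keys', 1, 9);
""")


def resolve_owner(task_id):
    """Follow a task's generic reference by hand, one query per step."""
    # Read the two halves of the generic reference from the task row.
    owner_id, owner_type_id = conn.execute(
        "SELECT owner_id, owner_type_id FROM gfks_task WHERE id = ?", (task_id,)
    ).fetchone()
    # Find out which model the content type row names.
    app_label, model = conn.execute(
        "SELECT app_label, model FROM django_content_type WHERE id = ?",
        (owner_type_id,),
    ).fetchone()
    # Guess the table name (wrong if Meta.db_table was overridden).
    table = f"{app_label}_{model}"
    # The table name is data, so it has to be spliced into the SQL text.
    row = conn.execute(
        f'SELECT name FROM "{table}" WHERE id = ?', (owner_id,)
    ).fetchone()
    return (table, row[0] if row else None)


print(resolve_owner(1))
print(resolve_owner(2))
```

Note that the final query cannot be written as a fixed SQL string: the table name only exists as a value computed from another row.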
There are some obvious things to comment on:
This is clearly much more complex than just doing a foreign key lookup to a table.
The above mechanism makes writing custom SQL to query this data much harder — the join condition has become very nasty because the table name itself has become a value that has to be calculated.
But worse than these is that the database schema no longer describes your data very well.
We have a big problem with referential integrity – namely, you have none.
This is perhaps the biggest and most important problem. The consistency and
integrity of data in a database is of first importance, and with
GenericForeignKey you lose out massively compared to database foreign keys.
Since owner_id is just an integer, it can have junk in there, which means
it doesn't refer to any real data. This could happen if the field is manually
edited, or if the row it referred to is deleted, or if various other things
happen – things that your database will protect you from if you use a normal
foreign key.
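To make the difference concrete, here is a small sketch using the standard-library sqlite3 module. The table names (person, task_generic, task_fk) and the helper is_rejected are hypothetical, chosen only for this illustration. One table stores the owner as a bare integer, the way a GenericForeignKey does, and the other uses a real foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- GFK style: owner_id is a bare integer with no constraint.
    CREATE TABLE task_generic (
        id INTEGER PRIMARY KEY,
        owner_type_id INTEGER NOT NULL,
        owner_id INTEGER NOT NULL
    );
    -- Normal foreign key: the database knows what owner_id refers to.
    CREATE TABLE task_fk (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES person (id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Alice');
""")


def is_rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Junk in the generic column is accepted without complaint.
conn.execute("INSERT INTO task_generic (owner_type_id, owner_id) VALUES (8, 999)")
dangling_rows = conn.execute("SELECT COUNT(*) FROM task_generic").fetchone()[0]

# The same junk value is refused by a real foreign key...
bad_insert_rejected = is_rejected("INSERT INTO task_fk (owner_id) VALUES (999)")

# ...and so is deleting a row that a task still points at.
conn.execute("INSERT INTO task_fk (owner_id) VALUES (1)")
delete_rejected = is_rejected("DELETE FROM person WHERE id = 1")
```

The generic-style table ends up holding a row that points at nothing, while both attempts to corrupt the foreign key version are stopped by the database itself.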
A major issue with
GenericForeignKey is performance.
To get an object with its generic related object, we have to do multiple lookups:
1. Get the main object (e.g. a Task).
2. Get the ContentType object that is pointed at by Task.owner_type (this table is usually cached by Django).
3. From the ContentType object we can find the model and therefore the table name.
4. Knowing the table name from part 3, and the object ID from part 1, we can get the related object.
This is a more complex and expensive process than a normal foreign key, and it also resists optimisation, especially when you are getting a batch of objects.
For a start, you cannot use select_related, because that would require
knowing what table to join on. For prefetch_related there is some limited
support. For example, you can prefetch the generic relation itself, with Task.objects.all().prefetch_related('owner').
Django tries to be smart about this case and reduces the number of queries as much as it can. However, if, for example, you wanted to follow a further relation, as in Task.objects.all().prefetch_related('owner__creator'), then you will get an exception, because only Group has the attribute creator, and not Person.
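Roughly speaking, the best that can be done for a batch is to group the owner IDs by type and issue one query per type. The following sketch shows that idea in plain Python, with dictionaries standing in for tables. It illustrates the strategy only and is not Django's actual implementation. TABLES, TASKS, fetch_many and prefetch_owners are all hypothetical names.

```python
from collections import defaultdict

# Hypothetical stand-ins for database tables: {table: {pk: row}}.
TABLES = {
    "person": {1: {"name": "Alice"}, 2: {"name": "Bob"}},
    "group": {1: {"name": "Admins", "creator_id": 1}},
}
# (description, owner_type, owner_id) rows, as one query for tasks might return them.
TASKS = [
    ("Write report", "person", 1),
    ("Rotate keys", "group", 1),
    ("Tidy wiki", "person", 2),
]

queries = []  # records each simulated query so they can be counted


def fetch_many(table, pks):
    """Simulate SELECT ... WHERE id IN (...) against a single table."""
    queries.append((table, sorted(pks)))
    return {pk: TABLES[table][pk] for pk in pks}


def prefetch_owners(tasks):
    """Attach owners using one query per owner type instead of one per task."""
    ids_by_type = defaultdict(set)
    for _, owner_type, owner_id in tasks:
        ids_by_type[owner_type].add(owner_id)
    owners = {t: fetch_many(t, pks) for t, pks in ids_by_type.items()}
    return [(desc, owners[t][pk]) for desc, t, pk in tasks]


resolved = prefetch_owners(TASKS)
# Following a further relation only makes sense for some of the owner types.
has_creator = {desc: "creator_id" in owner for desc, owner in resolved}
```

Three tasks need two owner queries here, one per type. The has_creator mapping shows why a second hop is a problem: the batch of owners is heterogeneous, and only the group rows have a creator to follow.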
In addition, in my experience, usage of GFKs will generally make your Django
code worse, not better. It can be tempting to think that having a single
Task.owner attribute which behaves polymorphically is an attractive option,
but it soon breaks down.
First, filtering using the Django ORM works badly – the ORM cannot create joins to the right table, pushing the burden of doing DB level filtering onto you.
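For example, if you want to get only tasks assigned to groups, and filter them further on the groups' own attributes, you can't do it with a single filter() call that follows Task.owner into Group.
Instead, you have to do it in two steps: find the IDs of the matching groups, then filter tasks on owner_type and owner_id yourself.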
There are other more efficient options, but you need to be willing to get your hands dirty creating SQL joins manually.
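One shape such a manual join can take is sketched below with the standard-library sqlite3 module. The table layouts are simplified guesses at what Django would generate, the content type IDs 8 and 9 are made up, and tasks_for_groups_created_by is a hypothetical helper. In real code the content type ID would first have to be looked up in django_content_type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gfks_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gfks_group (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        creator_id INTEGER NOT NULL REFERENCES gfks_person (id)
    );
    CREATE TABLE gfks_task (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        owner_id INTEGER NOT NULL,
        owner_type_id INTEGER NOT NULL
    );
    INSERT INTO gfks_person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO gfks_group VALUES (1, 'Admins', 1), (2, 'Editors', 2);
    -- owner_type_id 8 = person, 9 = group (made-up IDs)
    INSERT INTO gfks_task VALUES
        (1, 'Write report', 1, 8),
        (2, 'Rotate keys', 1, 9),
        (3, 'Review drafts', 2, 9);
""")

GROUP_TYPE_ID = 9  # assumed content type ID for Group


def tasks_for_groups_created_by(person_id):
    """Tasks owned by a group whose creator is the given person, in one query."""
    # Both halves of the generic reference must appear in the join condition,
    # otherwise task 1 (owned by person 1) would wrongly match group 1.
    rows = conn.execute(
        """
        SELECT t.description
        FROM gfks_task AS t
        JOIN gfks_group AS g
          ON t.owner_id = g.id AND t.owner_type_id = ?
        WHERE g.creator_id = ?
        ORDER BY t.id
        """,
        (GROUP_TYPE_ID, person_id),
    ).fetchall()
    return [description for (description,) in rows]


print(tasks_for_groups_created_by(1))
```

The join works, but the type ID has to be supplied from outside the query, and a separate join like this is needed for every model the generic key might point to.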
Second, a polymorphic object rarely works out as nicely as it sounds. In my experience, you will very often have to branch on type, either in Python code or in your templates, at which point it doesn't seem so neat any more. This is especially the case when the models you are pointing to aren't under your control, so it's harder to make them all have the same interface.
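The kind of branching meant here might look like the following sketch. Plain dataclasses stand in for the Django models so that the example runs on its own, and describe_owner is a hypothetical helper, not something from the demo app.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str


@dataclass
class Group:
    name: str
    creator: Person


def describe_owner(owner):
    """Render an owner for display; each type needs its own handling."""
    if isinstance(owner, Person):
        return f"{owner.name} (individual)"
    if isinstance(owner, Group):
        return f"{owner.name} (group created by {owner.creator.name})"
    raise TypeError(f"Unsupported owner type: {type(owner).__name__}")


alice = Person("Alice")
labels = [describe_owner(o) for o in (alice, Group("Admins", creator=alice))]
print(labels)
```

Every new owner type means another branch in every function like this one, so the single polymorphic attribute has not really saved any work.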
As a necessary consequence of their design, GFKs are just more awkward to deal with, and this is also reflected in the level of support that they have from other Django features:
By default, if you delete a
Person (the target object), for
example in the admin interface, or from code, the object that refers to it won't
be updated/deleted. The admin interface won't trace through
GenericForeignKeys that might refer to that object. You will simply be left
with corrupt data.
You can, however, add a GenericRelation to the Group and
Person models, which will fix the ORM and admin to do
the deleting. But note that this is not the default, and is attempting to ensure
at the application level something that would be ensured at the database level
for a normal foreign key.
If you have a GenericForeignKey field, the admin will show you only what you would
expect for owner_id and owner_type_id – an integer field and a content
type drop-down, which is not very helpful. And yes, you can change the integer value to
anything, resulting in dangling rows i.e. corrupt data. There are some 3rd party
attempts to get a better interface.
And as mentioned above, objects referred to via GFKs don't (by default) get included in the “collect and display objects for deletion” logic of the Django admin delete page.
There are various other gotchas. For example, GFKs work badly with the admin's list filters, so you'll have to write extra code to support them, and they don't work nicely with ModelForms. You'll have to patch up a lot of stuff at the interface level yourself.
Having hopefully persuaded you to find another solution, let's look at some of the options available.
The first alternative is perhaps the simplest solution. We make an
owner field for each type
of possible owner there is. That requires making the fields nullable, and doing
application level checks to ensure we have one-and-only-one not null in
each record. For our example, Task gets two nullable foreign keys, owner_person and owner_group.
So we have restored proper foreign keys, and all the goodness that goes with
them. You will need to do None checks when you access owner_group or
owner_person, which you could wrap up in an owner property if you wanted some of the
polymorphism back.
Similarly you'll also need to ensure that one and only one of the two fields is set when saving.
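A minimal sketch of both ideas follows, using plain dataclasses in place of Django models so that it runs on its own. The field name owner_group and the validate method are assumptions for illustration. In a Django model the same check would more likely live in clean() or save().

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Person:
    name: str


@dataclass
class Group:
    name: str


@dataclass
class Task:
    description: str
    owner_person: Optional[Person] = None  # stands in for a nullable FK
    owner_group: Optional[Group] = None    # stands in for a nullable FK

    @property
    def owner(self) -> Union[Person, Group]:
        """Return whichever owner is set, hiding the None checks from callers."""
        if self.owner_person is not None:
            return self.owner_person
        if self.owner_group is not None:
            return self.owner_group
        raise ValueError("Task has no owner")

    def validate(self) -> None:
        """Enforce that exactly one of the two owner fields is set."""
        owners = (self.owner_person, self.owner_group)
        if sum(o is not None for o in owners) != 1:
            raise ValueError("Exactly one of owner_person and owner_group must be set")


task = Task("Write report", owner_group=Group("Admins"))
task.validate()
print(task.owner.name)
```

Callers that only need "the owner" use task.owner, while code that cares about the type can still reach owner_person or owner_group directly.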
This has the disadvantage that at the schema level, unless you add a check
constraint, there is the possibility of a Task pointing to both a
Person and a Group, which doesn't make sense. But this is much smaller than
the issues you have with GenericForeignKey.
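The check constraint mentioned above can be sketched at the database level with the standard-library sqlite3 module. Table and column names (app_person, app_group, app_task) and the helper try_insert are hypothetical. In Django the equivalent rule could be declared as a CheckConstraint in the model's Meta.constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
    CREATE TABLE app_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_task (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        owner_person_id INTEGER REFERENCES app_person (id),
        owner_group_id INTEGER REFERENCES app_group (id),
        -- exactly one of the two nullable foreign keys must be filled in
        CHECK ((owner_person_id IS NULL) <> (owner_group_id IS NULL))
    );
    INSERT INTO app_person VALUES (1, 'Alice');
    INSERT INTO app_group VALUES (1, 'Admins');
""")


def try_insert(person_id, group_id):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO app_task (description, owner_person_id, owner_group_id) "
            "VALUES ('demo', ?, ?)",
            (person_id, group_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True


results = {
    "person only": try_insert(1, None),
    "group only": try_insert(None, 1),
    "both": try_insert(1, 1),
    "neither": try_insert(None, None),
}
print(results)
```

With the constraint in place, the "both" and "neither" rows are refused by the database, so the one-and-only-one rule no longer depends on application code alone.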
In the second alternative, we move the nullable FKs out to a new table, Owner, where they turn into one-to-one fields, and create a non-nullable FK to it on the first table.
This has some nice advantages – we now have an Owner abstraction. If you
want to use Task.owner polymorphically, you have a place to put the logic
that understands how to treat Person and Group differently, without
having to put it on Person or Group, which is especially useful if you
don't own those models, or want the logic to be kept separate. We've also got
one place that documents all the things that can be ‘owners’.
Further, if you come to need other things that use the same definition of
Owner, you will have a very easy implementation – just another FK to
Owner, which is much nicer than for alternative 1.
It still has the disadvantages of nullable fields, but having a dedicated
Owner model to deal with that issue feels much cleaner.
It also has a few other disadvantages compared to the previous solution:
We have an extra table, increasing the number of joins required to get everything if we need it all at once.
We will need to ensure that an Owner record exists for each group/person that you want to link to. This could mean creating one when we create a group/person, or later. Also, setting the Task.owner field correctly is going to take more work than in alternative 1 – this affects both code and things like default admin interface.
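A rough picture of the tables this alternative implies, and of the extra join it costs, is sketched below with the standard-library sqlite3 module. The table and column names are hypothetical and only approximate what Django would generate for an Owner model with two nullable OneToOneFields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
    CREATE TABLE app_person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_owner (
        id INTEGER PRIMARY KEY,
        -- UNIQUE makes each link one-to-one; NULL means it is not that kind of owner
        person_id INTEGER UNIQUE REFERENCES app_person (id),
        group_id INTEGER UNIQUE REFERENCES app_group (id)
    );
    CREATE TABLE app_task (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES app_owner (id)
    );
    INSERT INTO app_person VALUES (1, 'Alice');
    INSERT INTO app_group VALUES (1, 'Admins');
    INSERT INTO app_owner VALUES (1, 1, NULL), (2, NULL, 1);
    INSERT INTO app_task VALUES (1, 'Write report', 1), (2, 'Rotate keys', 2);
""")

# Every task reaches its owner's name through app_owner, hence the extra join.
owner_names = conn.execute("""
    SELECT t.description, COALESCE(p.name, g.name)
    FROM app_task AS t
    JOIN app_owner AS o ON o.id = t.owner_id
    LEFT JOIN app_person AS p ON p.id = o.person_id
    LEFT JOIN app_group AS g ON g.id = o.group_id
    ORDER BY t.id
""").fetchall()
print(owner_names)
```

Unlike the GFK version, all of this is expressed with ordinary foreign keys, so the whole lookup is a single query with fixed table names.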
The third alternative starts with alternative 2, but moves the
OneToOneFields to the other
table, i.e. to the destination models. By doing so, they no longer need to be
nullable:
    from django.db import models


    class Owner(models.Model):
        pass


    class Person(models.Model):
        name = models.CharField(max_length=100)  # length is an arbitrary choice
        owner = models.OneToOneField(Owner, on_delete=models.CASCADE)


    class Group(models.Model):
        name = models.CharField(max_length=100)
        owner = models.OneToOneField(Owner, on_delete=models.CASCADE)
        # CASCADE was the implicit default before Django 2.0
        creator = models.ForeignKey(Person, on_delete=models.CASCADE)


    class Task(models.Model):
        description = models.CharField(max_length=200)
        owner = models.ForeignKey(Owner, on_delete=models.PROTECT)
Some notes, compared to alternative 2:
We no longer have any NULL foreign keys to worry about.
However, we are required to create rows in Owner for all Person and Group objects. In addition, those rows might never be used, e.g. a group might never be used as an owner.
This pattern requires modifying the Person and Group models.
For some access patterns this requires more queries (e.g. if you start with a Task and want to know which type of Owner you have, this will require more queries than alternative 2).
If you are aware of Django's multi-table inheritance,
you might recognise that alternative 3 above can be created in Django with less
code. Instead of explicit OneToOneFields to Owner, we can make Person and
Group inherit from Owner.
This will actually create a very similar database schema to the one above – Django adds
the OneToOneField links for you. Apart from column name differences, the one
additional schema difference is that the
owner column will also be used as a
primary key (which could also be done manually for alternative 3 if you wanted,
although I wouldn't recommend it).
At the code level, it is very similar to alternative 3 as well, and in fact
simplifies some things significantly e.g. you don't need to manually create the
Owner objects. In addition, you now get polymorphism for free (ish) – since
Person (or Group) inherits from Owner, it inherits its behaviour.
Personally I avoid using multi-table inheritance. One reason for this is because
I worry about the complexity of the inheritance mechanism Django uses. Secondly
there are performance concerns – having the
OneToOneFields explicit makes it
easier for me to be aware of joins and performance issues. Thirdly, Django
doesn't support multiple inheritance, so you can only use it once. In other
words, you are taking one “is-a” or “has-a” relationship (a Group is-a Owner and
a Person is-a Owner) and giving it special status and implementation (concrete
model inheritance), while all other similar relationships have to be dealt with
using other mechanisms. By contrast, alternatives 2 and 3 can be used as many
times as you want. My experience with OOP, real world business objects, and the
ever constant reality of ever changing requirements, is that you are better off
‘demoting’ all the relationships and implementing them all using composition
rather than inheritance.
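For completeness, however, I've added this alternative: Person and Group become subclasses of a concrete Owner model, and Task keeps its ForeignKey to Owner.
Note that this is concrete model inheritance – you can't use
abstract = True on Owner, because an abstract model would not get an
Owner table (thanks Airith).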
Finally, there is the case of needing to link to a single but unknown model (for
example in a generic 3rd party app) for which a
GenericForeignKey is a tempting choice.
For this case, there are two approaches I know of:
Make your model abstract, and require users to inherit from it, adding the ForeignKey field themselves. This can be a helpful pattern for other reasons, but can also get a bit unwieldy in some cases.
Use swappable models. Django actually has support for this, but at the time of writing it is officially for internal use only (i.e. for swapping out the django.contrib.auth.models.User model). However, Swapper is an unofficial attempt to create a public API for it, which looks to be well maintained. This looks like a better option than a GFK to me.
For all the above examples, I've created a repo: https://github.com/spookylukey/djangoadmintips/tree/master/generic_foreign_key_tests
All the examples are different apps within the same project.
It is bare bones – just for purposes of illustration. Not all things mentioned above are implemented.
In each case, the admin changelist for Task illustrates the typical N+1 (or worse) situation. In each case I've implemented prefetch_related as well as possible. Using the Django debug toolbar you can see how successful that is – for the GFK case, not very.
You will also notice that the admin interfaces vary between the different alternatives. There will be ways to make all of them better, but they illustrate what you will get without much work.
If there are other strategies or corrections, please let me know – I intend to keep this page up to date as a reference.
2018-10-19 - Added Alternative 5