lukeplant.me.uk

Handling Django’s get_query_set rename is hard

Posted in: Python, Django

Handling the rename of get_query_set to get_queryset in Django 1.6 is very tricky - much trickier than it might seem.

For many 3rd party libraries, you will have a few aims:

  1. Maintain compatibility with older Django versions, and older projects (including libraries) that might call your code, or subclass it.
  2. Update your code to be compatible with latest versions of Django
  3. Stop getting warnings when running your code/tests. Warnings you’ve already addressed are a pain, as they drown out the ones you haven’t addressed.

Subclassing makes this much, much harder.

Suppose you have:

class FooManager(models.Manager):

    def active(self):
        return self.get_query_set().filter(active=True)

    def get_query_set(self):
        return FooQuerySet(model=self.model)

How do you update this code?

The tricky bit is that you need to consider a class that subclasses FooManager, which you don’t control. That code might not be updated yet.

A very important example is the RelatedManager class in Django, which dynamically subclasses the default manager on a model to provide access to related objects. You use this all the time, in ORM calls like person.schools.all() — roughly speaking schools returns a dynamic subclass of School.objects. In Django 1.5, this subclass overrides the get_query_set method so that it calls super.get_query_set (e.g. School.objects.get_query_set) and then adds some filters to limit to the particular related object it is attached to.

You certainly need compatibility with this if you want to still support Django 1.5. But there could be other examples of things that subclass FooManager explicitly and are outside your control. Let’s represent all these subclasses as follows:

class FooManagerSubClass(FooManager):

    def get_query_set(self):
        return super(FooManagerSubClass, self).get_query_set().order_by('bar')

Now, imagine an instance of FooManagerSubClass, and calling the active method on it. The call look like:

FooManager.active ->
FooManagerSubClass.get_query_set ->
FooManager.get_query_set

So all the defined behaviours combine as they should.

Now you come to update FooManager so that it will still work with Django 1.5, but works on Django 1.6 and later without issuing warnings if everyone is using get_queryset.

How should you rewrite it?

First pass:

class FooManager(models.Manager):

    def get_queryset(self):
        return FooQuerySet(model=self.model)

    get_query_set = get_queryset  # Django 1.5 compat

    def active(self):
        return self.get_queryset().filter(active=True)

This looks correct, but FooManagerSubClass().active() is now broken, because calling it runs:

FooManager.active ->
FooManager.get_queryset

— that is, it misses out FooManagerSubClass.get_query_set, because active is calling get_queryset, and the subclass knows nothing about that, it’s only overridden the get_query_set method.

(This type of failure is particularly nasty, because the failure is silent. It applies to related managers, which are going to be broken in a nasty way - they no longer have filtering applied, so person.schools.all() returns all schools, not just the ones related to person. And you probably won’t have an explicit unit test for this).

So, now you need something like this:

class FooManager(models.Manager):

    def get_queryset(self):
        return FooQuerySet(model=self.model)

    get_query_set = get_queryset

    def active(self):
        m = (self.get_query_set
             if hasattr(self, 'get_query_set')
             else self.get_queryset)
        return m().filter(active=True)

But now you now get warnings emitted when active() is called in some cases, even on Django 1.6 and later - in particular, you get it when you call it via a RelatedManager on Django 1.6 and later, due to the way that Django sticks get_query_set wrappers on manager subclasses that don’t define get_query_set explicitly (such as RelatedManager) - via RenamedMethodsBase.

Also, if you reverse the conditional so that get_queryset is called in preference to get_query_set, then active() breaks as above, so you can’t do that.

This is trickier than it seems!

Due to the subclassing issue, I can’t find a way, even with modifying Django’s behaviour, to satisfy all the constraints, at least if you want warnings to be emitted when they need to be, and silenced when you’ve fixed your code. I’ve updated the Django docs to my best ideas so far.

Update, 2015-07-08:

With Django 1.8 in the mix, it gets much worse.

Django 1.8 removes the get_query_set compatiblity shim, and this method is no longer defined on its RelatedManager classes. Remember that these RelatedManager classes will be created by Django and will subclass whatever manager you defined on your model. It’s possible to end up with a stack like this:

  1. Base class - Django 1.8 Manager class.
  2. Manager sub class that uses the above pattern for compatiblity with Django 1.5, and therefore defines a get_query_set method (e.g. SomeUsefulManager)
  3. Another manager sub class that uses the above pattern for compatiblity with Django 1.5, (e.g. FooManager as defined above).
  4. Django 1.8 RelatedManager class, which does not use the above pattern or provide a get_query_set method.

In this case, suppose you call the .active() on the entire stack. This resolves to FooManager.active() method, which ends up calling SomeUsefulManager.get_query_set -> SomeUsefulManager.get_queryset -> Manager.get_queryset, thus missing out RelatedManager.get_queryset. The result is, as above, a major bug where queries (including updates and deletes) that are supposed to be limited to a related object apply to all records.

To fix this, you need to change the pattern so that if we are on Django 1.8, we do not provide the get_query_set method to our subclasses (but we do still call it on our super-classes). You can’t monkey-patch your way out of this, by the way, because RelatedManager classes are created dynamically. This fix also must be applied to all managers that are attempting compatibility with Django 1.5 to 1.8:

class FooManager(models.Manager):

    def get_queryset(self):
        return FooQuerySet(model=self.model)

    if django.VERSION < (1, 6):
         get_query_set = get_queryset

    def active(self):
        m = (self.get_query_set
             if hasattr(self, 'get_query_set')
             else self.get_queryset)
        return m().filter(active=True)

Working out what the conditional should be is tricky. The above is the pattern I decided to use in the Django 1.6 release notes, which only defines the method if you are on Django 1.5. An alternative would be to define it all the way up to Django 1.7, but you get problems either way.

The above pattern will only work correctly on Django 1.6 and up if all the classes in the stack follow this pattern. Otherwise you will get methods being missed out. So, despite the fact that Django 1.6 had compatibility shims for code that might be slow to upgrade, you can’t rely on that, and will get potentially serious bugs if you don’t upgrade your entire stack to cope with this.

Comments §

...loading...
blog comments powered by Disqus