Announcing django-easyfilters (with some heresy on the side)

Posted in:

This post announces django-easyfilters, and follows that with some editorial generated by the learning experience of implementing the library.

django-easyfilters is a bit like the admin's list_filter and date_hierarchy features combined, but with the following differences:

  • it's intended for use outside the admin, with any QuerySet.

  • for each field, database-level aggregation is done to show the number of results for each option presented.

  • choosing multiple values from ManyToMany fields is supported.

  • the UI is, by default, a horizontal list of links. (You can provide your own template though).

  • the UI works like buttons that you toggle on and off, and when a button is 'on', all options that are no longer possible disappear.

  • the emphasis is on simplicity, and on users browsing to find things, rather than on admins finding an exhaustive set. So some things, like an option that does an is_null query, are not implemented.

You can see an example on my church's website. (view code, template code).

I have tried to put an emphasis on performance and limiting the number of queries - each row does as few queries as possible, I think between zero and three depending on some factors, and never O(n) in the number of items in the database or the number of options.

I've also tried hard to produce an API that is dead simple to use, automatically doing sensible things like using Field.choices and selecting an appropriate filter type etc., and works well with other things like pagination. It's a few lines of code to plug it into an existing list view.

So far, it doesn't support the normal join syntax like list_filter does, I'm not sure how hard that would be to add. The aggregation may make it tricky.

The docs are fairly complete given the state of the project. With some feedback, I'd like to make the internal APIs more concrete, allowing greater levels of customization and extension.

Now, for some editorial.

What Django gets right

The thing that makes this possible (and the features like list_display and date_hierarchy in the admin) is Django's choice of abstraction: a Django 'Model' is a rich description of a chunk of database schema.

With that definition, I'm pointing out that a Model is not:

  • a description of a single table – it goes beyond that, and includes things like many-to-many fields which can imply another whole table in a single line. This allows you to intelligently do queries on the intermediate table and display the right thing to the user.

  • a bare description – it includes user-oriented metadata about the Model, including things like choices and a convention regarding __unicode__ which means we can automatically do sensible things.

Where Django fails

As I implemented this library, I was writing this editorial in my head, and it slowly progressed from “Well done us!” to “I wish I was using SQLAlchemy”. In truth, I haven't actually used SQLAlchemy, I was just reading the docs out of interest recently, but I'm pretty certain it provides tools that would have been extremely useful for this library.

django-easyfilters puts stress on an ORM due to the need for aggregation – providing the totals for each option. For some of the field types, Django's ORM was fine (although it might not be immediately obvious how to get it to do the aggregation). For ManyToManyField, the ORM was stretched to the limit. For the queries I wanted to do for DateTimeField, the ORM was stretched way beyond the limit of what it could comfortably do.

In maintaining Django's DB code, I haven't really ventured below the level of the QuerySet since the 'queryset refactor' merge (done in changeset 7477). Malcolm Tredennick did some amazing work there, fixing at least 50 bugs by re-writing the guts of our SQL generation, and opening up new possibilities. Essentially, it introduced a new layer, which consists of:

  • the Query class (and subclasses) which builds up the queries that are produced by QuerySet in an abstract way, and

  • SQLCompiler (and subclasses) which provides templates for the basic SQL operations, such as SELECT, INSERT etc. and puts the queries together as strings.

However, in trying to do other, more advanced queries, I found that this layer really does not help you out. Query does one job: it builds the queries that QuerySet might want to run (which is what it was created to do). It does not provide tools for users to build advanced queries.

However, in a library like easyfilters, that's exactly what I needed. And I needed to start from where the ORM could get me, because I needed to be able to work with arbitrary QuerySets - a user of the library could easily pass in a QuerySet was already filtered on some related join etc, so I couldn't just write my own SQL generation.

Thankfully, the ORM does provide some tools for that - you can get the query attribute of a QuerySet, and start from there. That is what I did for the DateTimeFilter queries, but I had to write a Query subclass, a SqlCompiler subclass, and add some other nasty hacks to get what I needed out. I haven't implemented range-based aggregation for Decimal fields yet, which is on the TODO list, but I imagine I will need similar things for that too.

Of course, I'm also relying on undocumented internals. I've got good test coverage, so I'm reasonably well positioned to cope with changes there, but it's very far from ideal.

Where do we go from here?

So, what are the options for Django's ORM? Let's try some:

  1. We publicly document Query and make it suitable for external use.

    I don't think this is feasible at all. Query is just never going to be that kind of code. It is not suitable for public use, and would take massive effort to make it so. A brief analysis finds:

    • It is a monster class that is now nearly 2000 lines of code.

    • The constructor has over 40 assignments to self.

    • Virtually no unit tests. (It has plenty of test coverage, via tests for QuerySet, but no unit tests for its own functionality. It's very difficult to write them, because it doesn't have a well defined job, other than ‘produce the queries that QuerySet needs’).

    It is also hindered by the unreasonable things that QuerySet asks it to do e.g. extra() which is gross hack, operating at the wrong abstraction level, but is something we are stuck with because it was there before Query even existed.

  2. We live with it, and don't provide more advanced SQL tools.

    Part of Django's popularity is its simplicity and low barrier to entry. That was achieved, in part, by making the primary API the Model, and not focussing on the lower level stuff.

  3. We move to a proper SQL generation library for the lower levels.

    As soon as you say that, you ought to be thinking about SQLAlchemy. Having had a brief look at its rather impressive features, I'm seriously thinking we should consider, probably for Django 2.0, ripping out everything at least at the layer of Query and below, and rewriting it using SQLAlchemy.

    SQLAlchemy is already implemented in careful layers. The lower one, the SQL Expression Language provides a nice programmatic layer over SQL. The ORM was built on top of that expression language - unlike Django's ORM, where it would be more appropriate to say that the our foundation of query generation was constructed after the ORM that already sat on top of it!

    Obviously, switching to SQLAlchemy would be quite a lot of work, and could introduce a lot of backwards incompatibilities, especially in terms of anyone doing lower level stuff. I suspect that once you've started down that route, the only sensible way forward is that you also hand over everything to do with making DB connections etc. to SQLAlchemy. This in turn could mean a lot of rework for our test suite and for schema creation code etc. I don't know how much of this could be done in a piecemeal way.

    I'm not sure what the consequence would be for non-relational databases. It might make some things easier. The rewrite would force us to get rid of explicitly SQL-specific things in QuerySet, like the extra() method. But providing the option to do more advanced things on the SQL level might produce more libraries that rely explicitly on SQL (like django-easyfilters does), increasing the divide between the SQL and NoSQL world.

    However, perhaps we have already created a large divide between SQL and NoSQL by our extensive support for joins. In reality, with our current code you are unlikely to have an app where you could switch from a SQL DB to a non-relational one without having to do large changes to your code.

Thoughts?

Comments §

Comments should load when you scroll to here...