lukeplant.me.uk

Double-checked locking with Django ORM

Posted in: Django, Python

The double-checked locking pattern is one that is useful when:

  1. You need to restrict access to a certain resource to stop simultaneous processes from working on it at the same time.
  2. The locking patterns available to you have significant costs.

This post is about how we can implement this pattern in Django, using the ORM and database level locking features. The pattern applies if you are not using Django, or indeed any ORM, but I have only checked it for Django, and in fact only really verified it works as expected using PostgreSQL.

The situation

You have some database records that require ‘processing’ of some kind, but need to ensure that they are only processed once. For example, in your e-commerce system, you might want to send an email to your users when their order is shipped, but make sure you only send one. You’ve got something like this:

class Order(models.Model):
    shipped_at = models.DateTimeField(null=True)
    shipped_email_sent = models.BooleanField(default=False)

Email sending may take a little time, or fail, so we have some kind of background job to do this. We could use a job queue like Celery to do this and ensure that only one process at a time attempts to do the sending, but maybe we don’t have a queue like that, or maybe we don’t trust it to have been configured correctly etc. Or, maybe we know we only have one process doing this task, but need to ensure that other, different concurrent tasks that might lock the same rows are not interfered with.

We’re using shipped_email_sent to track whether we’ve sent the emails or not, but even if we filter on that, simultaneous processes launched at the same moment could end up sending emails twice, due to the delay between querying and updating the records. We could use select_for_update(), but want to avoid locking this important table anymore than absolutely necessary. What should we do?

Solution

I’ll present my solution, and then explain afterwards. But the notes are important — don’t just copy-paste this!

def send_pending_order_shipped_emails():
    orders_to_email = Order.objects.filter(
        shipped_at__isnull=False,
        shipped_email_sent=False,
    )

    for order in orders_to_email:
        with transaction.atomic():
            for order in (orders_to_email
                          .filter(id=order.id)
                          .select_for_update(of='self', skip_locked=True)):
                send_shipped_email(order)
                order.shipped_email_sent = True
                order.save()

Explanation

  1. This whole block of code should be executed outside an atomic block.
  2. Note that the outer orders_to_email block is therefore outside a transaction, and uses no locking. So, if this query return no results, the whole process does just a single read query, without locking, and exits. This is great for performance and avoiding contention on the DB.
  3. If there are items to process, we then start an atomic transaction block.
  4. Since the first query was outside a transaction, another process may have beaten us to it, we have to do another query to make sure the record still matches the criteria — the inner query.
  5. This time we use SELECT FOR UPDATE, so that no other process will be able to enter this block for this row at the same time.
  6. We use skip_locked so that if someone else does beat us to it, we just skip the record and try the next one.
  7. We set a flag that ensures that this record will no longer be found by the orders_to_email query.

The result is that we guarantee each record gets processed at most once.

Notes

  • I’ve only checked this with PostgreSQL and the default isolation level of READ COMMITTED.

  • Note the use of Django QuerySets: We define the correctly filtered query once, and then we re-use with chaining to execute it multiple times. We are relying on the fact that the additional filter etc. are creating a new, unevaluated QuerySet, which executes a new query when we loop over it with the second for loop.

  • Making sure you read the notes for select_for_update and use of parameter as appropriate.

  • We guarantee “at most once” — but that allows the possibility of zero times. If you have other, different processes that are also locking those rows in the table (not just multiple copies of this code executing this same code), then the skip_locked=True flag means this process could exit without having processed all the rows, and without any errors. If you don’t mind having to try multiple times to be sure, that should be OK. In other words, this code is assuming “everyone else is more important than me.”

    I think you could change this by using select_for_update(nowait=True) instead, combined with appropriate try/except/looping.

    For trying multiple times, we could:

    1. Leave that to the next time our background job attempts this, or
    2. Do some counting inside the two loops, and if the inner loop comes up short, we know that for some reason we skipped some rows (it could have been because someone else already processed the row, or because someone else locked the rows for a different reason). If so, recursively call send_pending_order_shipped_emails. This recursion will definitely terminate when the orders_to_email query comes up empty, or when we succeed in processing everything in it.
  • Performance note: we are doing N+1 read queries to process all the pending records, plus N writes. You might need to be aware of that, compared to doing 1 read and 1 write if we did them all together and used some other mechanism to ensure we didn’t have multiple competing processes.

  • If you have multiple processes racing to process the pending records, the above code will naturally distribute the work between them approximately equally — you get work distribution for free.

  • I’ve tried to find ways to encapsulate this pattern more neatly in Django/Python, like with double_checked_locking(queryset) but so far had no luck at producing something materially better (like this in django-mailer, which works OK but has an awkward usage pattern). I think this one is better just doing every time, especially given some of the issues above.

  • If you processing is idempotent, or you can arrange for that, then you may be able to get away without any locking at all, and may not need this pattern. You may need to be careful to use QuerySet.update rather than Model.save() to avoiding race conditions with overwriting data etc. (thanks Haki)

Anything else? My understanding of how PostgreSQL isolation levels and transactions work, along with my experiments (using good ILtime.sleep()) seem to confirm this works as per the description and notes above, but if I’ve missed something please add a comment!

Updates

  • Adam Johnson has attempted a refactor of this idea. It’s untested, but looks like it will work. I would still recommend reading all the caveats above and adapting for your own usage rather than just dropping it in - there are plenty of subtleties here, as noted.
  • Other discussion on Twitter.

Comments §

Comments should load when you scroll to here...