The other day I got a question about some old code I had written which, instead of raising an exception for an error condition as the reader expected, returned an error object:
With your EmailVerifyTokenGenerator class, why do you return error classes instead of raising custom errors? You could still pass the email to a custom VerifyExpired exception.
I think I'm too eager to raise errors but maybe there's something I'm missing with classes 😁!
The code in question is below (slightly modified and with several uninteresting methods removed). It is part of a system for doing email address verification via magic links in emails.
    import binascii
    from dataclasses import dataclass

    from django.conf import settings
    from django.core.signing import BadSignature, SignatureExpired


    class VerifyFailed:
        pass


    VerifyFailed = VerifyFailed()  # singleton sentinel value


    @dataclass
    class VerifyExpired:
        email: str


    class EmailVerifyTokenGenerator:
        def token_for_email(self, email):
            ...

        def email_from_token(self, token):
            """
            Extracts the verified email address from the token, or a VerifyFailed
            constant if verification failed, or VerifyExpired if the link expired.
            """
            max_age = settings.EMAIL_VERIFY_TIMEOUT
            try:
                unencoded_token = self.url_safe_decode(token)
            except (UnicodeDecodeError, binascii.Error):
                return VerifyFailed

            try:
                return self.signer.unsign(unencoded_token, max_age=max_age)
            except (SignatureExpired,):
                # Expired but otherwise valid: recover the email without the age check
                return VerifyExpired(self.signer.unsign(unencoded_token))
            except (BadSignature,):
                return VerifyFailed
To sum up, we have a function that extracts an email address from a token, checking the HMAC signature that it is bundled with. There are 3 possibilities we want to deal with:
The happy case – we’ve got a valid HMAC code, we just need the email address returned.
We’ve got an invalid signature.
We’ve got a valid but expired signature. We want to handle this separately, because we’d like to streamline the user experience for getting a new token generated and sent to them, which means we need to return the email address.
It’s using Django’s signer functions to do the heavy lifting, but that doesn’t matter for our purposes, because we are wrapping it up.
To get going on designing our API for this bit of code, here are some bad options:
- We could have a pair of methods or functions, extract_email_from_token and check_signature, which can be used independently. This is bad because you could easily use extract_email_from_token and completely forget to use check_signature.

  The principle here is that we want the developer using this API to fall into the pit of success. Either the developer gets their code correct or, if they don't, it should be obviously broken and not work at all, rather than subtly flawed with some nasty bug, like a security issue.

- We could have an email_from_token() method or function with a return value of a tuple containing (email_address: str, valid_and_not_expired_signature: bool).

  This has a similar issue to above: the calling code could use email_address and forget to check the validity boolean.
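To make the tuple pitfall concrete, here is a minimal sketch. The function email_from_token_tuple and its toy token format are hypothetical stand-ins, not part of the real code; the only point is that nothing stops the caller from ignoring the boolean.

```python
def email_from_token_tuple(token: str) -> tuple[str, bool]:
    # Hypothetical stand-in for real signature checking:
    # toy tokens look like "<email>|valid" or "<email>|<anything else>".
    email, _, flag = token.partition("|")
    return email, flag == "valid"


def naive_verify(token: str) -> str:
    # Bug: the validity flag is unpacked and then silently ignored.
    email, _valid = email_from_token_tuple(token)
    return f"verified {email}"
```

With this design, naive_verify accepts a forged token without crashing or looking wrong, which is exactly the kind of subtle flaw we want the API to rule out.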
Having ruled those out, we’ve got two main contenders for how to design email_from_token():
We could make it raise exceptions for the “invalid” or “expired” cases. We need to pass extra data for the latter, but we can put it inside the exception object – as noted by the original questioner.
We could make it return error objects for the error cases, as coded above.
Both of these satisfy the “pit of success” criterion. If the developer accidentally does not handle the error cases, they won’t have a bug where we verified an email address that should not be verified. We will instead probably have a crasher of some kind, which in the case of a web app, like this one, means a 500 error page being seen, and something in our logs that makes it pretty clear what happened.
If we choose to raise exceptions, naive code which doesn’t check for the exceptions will simply get no further: the exception will propagate up and terminate the handler. With the second option where we return error objects, those objects can’t be accidentally converted into success values: the VerifyExpired object contains the email address, but it is a completely different shape of value from the happy case.
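As a rough illustration of both designs failing loudly, here is a sketch with two stand-in functions that always report an expired token. The names VerifyExpiredError, email_from_token_raising, email_from_token_returning and naive_handler are hypothetical, and the error-object case only crashes because the naive code does something string-specific with the result.

```python
from dataclasses import dataclass


@dataclass
class VerifyExpired:
    email: str


class VerifyExpiredError(Exception):
    """Hypothetical exception variant that carries the email address."""

    def __init__(self, email: str):
        super().__init__(email)
        self.email = email


def email_from_token_raising(token: str) -> str:
    # Stand-in: pretend every token has expired.
    raise VerifyExpiredError("user@example.com")


def email_from_token_returning(token: str):
    # Stand-in: pretend every token has expired.
    return VerifyExpired("user@example.com")


def naive_handler(email_from_token) -> str:
    # Naive code that assumes the happy case and treats the result as a str.
    email = email_from_token("some-token")
    return "Verified " + email.lower()
```

Calling naive_handler with the raising variant propagates VerifyExpiredError, and calling it with the returning variant raises AttributeError because VerifyExpired has no lower() method. Neither path ends up marking the address as verified.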
Both of these approaches, to some degree, respect the principle that can be summed up as Parse Don’t Validate. Instead of merely validating a token and extracting an email address as two independent things, we are parsing a token, and encoding the result of the validation in the type of objects that will then flow through the program.
But which is better?
One of the influences on my thinking is the way types work in Haskell and other similar languages, which make it very easy to create types and constructors. In Haskell, a single data declaration is all the code you need to define a return type for this kind of function, along with the 3 different data constructors you need, which then do double duty for pattern matching.
Now, Python is not nearly as succinct, but dataclasses were a big improvement for defining things like VerifyExpired.
In Haskell, due to static type checking, this pattern makes it pretty much impossible for the calling code to accidentally fail to handle the return value correctly. But even in Python, which doesn’t have that built in, I think there are some compelling advantages:
We expect the calling code to handle all the different return values at some point, and at the same point. (This is unlike some code where we can raise an exception that we never expect the calling code to specifically handle – it will be handled by more generic methods at a different layer). It therefore makes sense that we treat all 3 values as the same kind of thing — they are just different return values.
If you instead raise exceptions, you are immediately forcing the calling code into a special control flow structure, namely the try/except dance, which can be inconvenient.
In particular, if you want to hand off processing of the value to some other function or code for handling, you can’t do it easily. For example, code like this would be fine with the “return error object” method, but significantly complicated by the “raise exception” method:
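A hedged sketch of that kind of hand-off follows. The module-level email_from_token here is a hypothetical stand-in keyed on the token text, and describe is an invented helper; neither is the real code.

```python
from dataclasses import dataclass


@dataclass
class VerifyFailed:
    pass


@dataclass
class VerifyExpired:
    email: str


def email_from_token(token: str):
    # Hypothetical stand-in for the real method, keyed on the token text.
    if token == "expired":
        return VerifyExpired("user@example.com")
    if token == "bad":
        return VerifyFailed()
    return "user@example.com"


def describe(result) -> str:
    # Any function can receive the result, whichever of the three it is.
    if isinstance(result, VerifyFailed):
        return "invalid link"
    if isinstance(result, VerifyExpired):
        return f"expired link for {result.email}"
    return f"verified {result}"


# Return-object style: the value is passed straight through to the helper.
messages = [describe(email_from_token(t)) for t in ("ok", "bad", "expired")]
```

With the "raise exception" design, describe could never receive the failure cases directly. The caller would first need a try/except that catches each exception and turns it into some value to pass along, which amounts to rebuilding the error objects by hand.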
In the years since I wrote the code, however, some perhaps more compelling arguments have come along for the error object method.
First, with some small changes (specifically, removing the sentinel singleton value), we can now add a type signature for email_from_token:
(You may need typing.Union for Python versions older than 3.10.)
This is a benefit in itself from a documentation point of view, and for better IDE/editor help.
We can go further with mypy. We can structure our calling code as follows to make use of mypy exhaustiveness checking:
    from typing_extensions import assert_never

    verified_email = EmailVerifyTokenGenerator().email_from_token(token)

    if isinstance(verified_email, VerifyFailed):
        ...
    elif isinstance(verified_email, VerifyExpired):
        ...
    elif isinstance(verified_email, str):
        ...
    else:
        assert_never(verified_email)
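Now, if we remove one of these blocks, let’s say the VerifyExpired one (or if we added another option to email_from_token), mypy will catch it for us: it reports a type error on the assert_never(verified_email) line, because verified_email could still be the unhandled type at that point.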
With the error object method, we could also write our handling code using structural pattern matching. The equivalent code, including our mypy exhaustiveness check, now looks like this:
    verified_email = EmailVerifyTokenGenerator().email_from_token(token)

    match verified_email:
        case VerifyFailed():
            ...
        case VerifyExpired(expired_token_email):
            ...
        case str():
            ...
        case _:
            assert_never(verified_email)
This has destructuring of the email address in VerifyExpired built in: it is bound to the name expired_token_email in that branch.
Hopefully this gives a good justification for the approach I took with this code. There are times when exceptions are better – generally when the things mentioned above don’t apply, or the opposite applies – but I think error objects also have their place, and sometimes are a much better solution.