Never, Sometimes, Always

In software development, we often use the Zero one infinity rule (or “zero one many”) to decide how many instances of things we should allow.

For example, a customer record in your database might have zero email addresses associated with it, one email address, or many email addresses. If you currently support one and someone says they need two, you should skip two and go straight to infinity. Adding support for just a second email address is a bad idea: you would still have to cope with a variable number (one or two), so it is actually simpler to cope with any number, and you are future-proof as well.
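As a minimal sketch of what "going straight to infinity" looks like in code, here is the email example modelled with a list rather than a fixed number of fields. The `Customer` class and `notify` function are hypothetical names invented for illustration, not taken from any real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Customer:
    """Hypothetical customer record with any number of email addresses."""
    name: str
    # A list covers zero, one, and many with a single representation.
    emails: List[str] = field(default_factory=list)


def notify(customer: Customer, send: Callable[[str], None]) -> int:
    """Call `send` for every address on record and return how many were sent."""
    for address in customer.emails:
        send(address)
    return len(customer.emails)
```

The loop in `notify` has no special branch for "no address" or "a second address": all three cases run through exactly the same code.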

There is a parallel to frequencies of events: just as programmers only care about the numbers 0, 1 and ∞, the only three frequencies we care about are Never, Sometimes, and Always.

I’ve often had conversations with a client that go like this:

  • Me: is it possible that situation X will arise?

  • Client: No, no, no. That doesn’t happen. Hardly ever.

The problem is that Hardly Ever is still Sometimes, so the client’s response has gone from being a definite “no” to a definite “yes” in less than a second. It doesn’t matter to me that it rarely happens – the situation still happens, and I’ve got to write code to cope with it. The fact that the code won’t be used very often doesn’t make it cheaper to write – it’s not like engineering where a machine that is used less will require less maintenance.

The Hardly Ever cases can in fact be significantly more costly to cope with, because, for example, it might be harder to find or construct examples to use for testing. If a special branch of code is constructed to handle the rare situation, such code is likely to be less well tested and more buggy, and cost more in the long run than the code which handles the common case.

This kind of thing often comes up when converting a client’s manual system, in which all input is handled by an intelligent agent (a person). There, Hardly Ever doesn’t cause too many problems, because the person just does something sensible. So to the client it might feel like I’m being overly pedantic and focusing all my energy on the weird cases – but in converting to the computerised system, we don’t have an intelligent agent behind the wheel, so every case has got to be covered off. Understanding the weird cases as early as possible actually helps me come up with a design in which those cases are not exceptions at all: they are just normal operation and require zero additional code, which makes the design a lot more robust.

In fact, Hardly Ever means that the frequency is relatively high – it probably means that you’ve witnessed at least one case of it in the past, so that’s a definite Sometimes. More slippery cases are things like It’s Never Happened Yet, which just means you haven’t personally seen an instance of it, but there’s no theoretical reason why it couldn’t happen. In other words, Sometimes. And then there is It’ll Never Happen, at which point my spidey sense is saying “it’s going to happen, and probably sooner than we expect”. So, again, it’s a Sometimes.

Caveats

There are, of course, times when the programmer does care about relative frequency – where 1% is very different from 10% or 90% – and most of them fall under the term “optimisation”.

If you are a programmer wearing a product manager’s hat, you are of course allowed to produce something that is an “80% solution”, or even a “20%” solution, knowing that your product simply won’t cope at all with some cases, for which people will need to find other solutions. You are optimising for a common case at the level of business needs, and you’ll want to know what that common case is.

You may also care about relative frequency for other kinds of optimisation, such as performance or user interface design. You are allowed to have an interface which works great for the typical case but is more clunky for the exceptional one, for example.

Even here there are dangers though. Suppose you optimise an algorithm for what happens 99% of the time, so that you get, let’s say, linear time complexity for normal workloads. Unfortunately, for the 1% case you drop into a worse time complexity, like quadratic time. You still have to deal with the 1%, at which point your system may be brought to its knees. It may be bad enough that the code effectively does not work at all for the user. Or, if we are talking about services you run, dealing with the 1% case may end up taking so much CPU or memory that, in terms of resources, your 1% case is actually 99% of the problem.

In addition, if you are running in a hostile environment like the internet, an attacker may be able to force the worst-case performance, and now you have a denial-of-service vulnerability.
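To make the linear-versus-quadratic trap concrete, here is a toy sketch of one well-known way it can arise: a hash set whose buckets are plain lists. The function `count_comparisons` and the choice of 1024 buckets are assumptions made for illustration. Real hash tables are more sophisticated, but many share the same worst case when keys collide.

```python
def count_comparisons(keys, num_buckets=1024):
    """Insert non-negative integer keys into a toy chained hash set.

    Returns the number of key comparisons performed, as a stand-in for
    running time.
    """
    buckets = [[] for _ in range(num_buckets)]
    comparisons = 0
    for key in keys:
        chain = buckets[key % num_buckets]  # toy hash function: key mod size
        found = False
        for existing in chain:
            comparisons += 1
            if existing == key:
                found = True
                break
        if not found:
            chain.append(key)
    return comparisons


# Typical workload: keys spread across buckets, so chains stay short.
typical = count_comparisons(range(1000))

# Hostile workload: every key is a multiple of 1024, so all of them land
# in bucket 0 and each insert scans every key inserted before it.
hostile = count_comparisons([i * 1024 for i in range(1000)])
```

With the same number of keys, the typical workload needs no comparisons at all, while the hostile one needs n(n-1)/2 of them, which is 499,500 for 1,000 keys. Anyone who can choose the keys can choose which of the two workloads you get.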

You may also have qualitative as well as quantitative differences in the way that the code copes with the “rarely” cases. At the very bottom end, you might just add an assertion that fails and crashes the process if the unlikely thing happens, on the assumption that this will not be a critical problem and that you’ll get notified. Then, once you start getting those notifications, you can assess whether the case is worth devoting more resources to.
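A minimal sketch of that bottom-end approach might look like the following. It assumes a client who has said that a customer always has exactly one email address. `RareCaseError`, `primary_email` and the logger name are hypothetical, and the logging call stands in for whatever notification mechanism you actually use.

```python
import logging

logger = logging.getLogger("rare_cases")  # hypothetical logger name


class RareCaseError(RuntimeError):
    """Raised when a case we were told hardly ever happens does happen."""


def primary_email(emails):
    """Return the single email address, failing loudly in any other case."""
    if len(emails) != 1:
        # Record the event so that someone is notified, then refuse to guess.
        logger.error("expected exactly one email address, got %d", len(emails))
        raise RareCaseError(
            f"expected exactly one email address, got {len(emails)}"
        )
    return emails[0]
```

An explicit exception is used here rather than Python's `assert` statement, because `assert` is stripped out when the interpreter runs with `-O`, which would silently remove the check.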

Or, you might have slightly more graceful handling, and perhaps a manual override to deal with the exceptional case – if such a thing is possible, and isn’t itself more work than just dealing with it in the code.

A final caveat I can think of is that there is always some cut-off at which you count extremely unlikely events as Never. For example, the odds of two given inputs accidentally colliding in a 256-bit hash function like SHA-256 are approximately one in a gazillion (about one in 2^256, or roughly 10^77). That’s technically Sometimes, but it’s pretty reasonable to count as Never.

Of course, that’s what they probably said about MD5 as well…
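For a rough sense of scale on the hash example, the standard birthday-bound approximation estimates the chance of any accidental collision among n hashed items. The sketch below assumes an ideal hash whose outputs are uniformly random. The function name `collision_probability` is made up for illustration. It says nothing about deliberate attacks that exploit weaknesses in a particular algorithm.

```python
import math


def collision_probability(num_items, hash_bits):
    """Approximate chance of at least one collision among `num_items` hashes.

    Uses the birthday-bound approximation 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming uniformly random hash outputs.
    """
    exponent = -num_items * (num_items - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# A billion billion items under a 256-bit hash: roughly 4e-42.
p_256 = collision_probability(10**18, 256)

# 2^64 items under a 128-bit hash: about 0.39, far from Never.
p_128 = collision_probability(2**64, 128)
```

The cut-off depends heavily on both the hash size and how many items you expect, so it is worth running the numbers before declaring Never.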
