All Unkept
Posted in: Software development, Python, Haskell  —  May 07, 2007 at 02:52 PM

Null pointers vs None vs Maybe

by Luke Plant

In Haskell, there is a datastructure called Maybe. Maybe is used where you would often use null pointers in languages such as C# or Python. Python doesn't have null pointers as such, but it does have the None object, which in many cases has the same kind of patterns and problems associated with it. Haskell needs Maybe because no data types are nullable -- unless you are writing code that interfaces to C, you never have the possibility of a null pointer. Here is a brief explanation why Haskell's Maybe is infinitely better than null pointers.

First, a word about Maybe. It is defined, at its most basical level, something like this in Haskell:

data  Maybe a  =  Nothing | Just a
  deriving (Eq, Ord, Show, Read)

On the left hand side, a stands for any type. This definition means that a variable x which is, for example, of the type Maybe String is either Nothing or a Just y, where y is the actual string. To work out which it is, we normally use pattern matching, or a utility function that uses pattern matching -- but either way, we can't use variable x as if it was a string, we have to explicitly unpack it. This might seem laborious, especially when in other languages you get 'nullability' for free for most variables. A few examples will highlight why Haskell's way is much better. (By the way, the deriving clause tells the compiler to automatically generate methods for Maybe for testing equality, ordering, printing as a string and reading from a string).

Now, onto our examples. Suppose you have a function that is like the lookup function of a dictionary -- it takes a key, and may or may not return a value. I'll use 'K' or 'k' for the type of the key. In C# 2.0, using generics, the function signature might look something like this:

interface ILookup<K, V>
{
   V GetItem(K key);
}

The idea is that GetItem() will return a null if the item isn't found -- this is a typical in languages like C#, whereas Python would probably throw a KeyError. (In both languages, we might do foo[key] rather than foo.GetItem(key), which can be done easily in either). A more functional implementation (but less idiomatic) would be to use delegates. We would require a declaration like this:

delegate V Lookup<K, V>(K key);

In Haskell, the function would be typed like this:

lookup :: (Eq k) => k -> Maybe v

Haskell already beats both Python and C# for the following reasons:

  1. The fact that the nullability is explicit in the signature means that you can't forget that there is a possibility the object is 'null'. Even if you do forget, the compiler won't. So all those NullPointerExceptions? You can say goodbye to them forever. If you have ever seen a NullPointerException (or NoneType object has no attribute 'blah' in python), and I'm sure we all have, then you have all the evidence you need that programmers, even good ones, do forget to do these checks.
  2. Even better than not producing NullPointerExceptions etc. is the fact that you can forget about whether a certain function can return null or not. The compiler will always catch it for you, so you are free to concentrate on the programming. This is a different point -- the first means you aren't producing programs that will ever crash with this error, the second means that as you are actually programming, you have less things in your mind to distract you, so you can keep your registers full with things you need.

But let's make things more interesting. First, let's say our dictionary is caching results of potentially expensive database queries. So, the key might be the SQL query string, and the value could be objects of various kinds. In this case, Nothing is a valid value that the dictionary might hold -- it means we have done the search already and found nothing. The 'lookup' function in this case would return Just Nothing. But what about the C#/Python versions? There is nothing to stop us putting a null value in against a certain key, but when we do the lookup, a null pointer or None object could now either mean 'we already did the search and we found nothing', or 'we haven't done the search'.

Let's add another level. Suppose we are searching for a value in the database that is held in a column that can actually be NULL. For example, we might be doing a search for Mrs Jones's age, and come back with the result that, although we do actually have a record for Mrs Jones, we don't have a value for her age.

In the C# version, we are really in trouble now. We have 3 possible meanings for 'null', with no way to tell them apart. In Python, we could solve this either by throwing different types of exceptions (as we could in C#), or by additional null objects -- in Python there is nothing to stop you creating singleton null objects and assigning them to any variable. We can then check for identity against these null objects to cope with our various errors.

Our Python solution suffers some limitations -- both exceptions and additional null objects will give us headaches if we try to serialise and resurrect them (None will resurrect OK, but for other singleton null objects you need a significantly more sophisticated mechanism than the ones normally used -- try it). But we also have to remember to document and handle all the possibilities. And when it comes to writing generic code that can easily be reused and combined in novel ways, our hacks will get in the way. We could decide to use a tuple to return (success_boolean, actual_value_or_None) at one or more levels, but this would be seen as unpythonic when the vast majority of code out there will either throw exceptions or return None.

By contrast, Haskell's Maybe mechanism is a breath of fresh air. You can just wrap as many layers of Maybe as you need. Our 'lookup' function would not need modifying, but would in fact be returning an object of type Just (Just (Just Int)). This could be:

  1. Nothing, meaning we haven't got a cached result for the search,
  2. Just Nothing to mean we have done the search, but not found a record for Mrs Jones at all,
  3. Just (Just Nothing) to indicate we have already done the search, and the result was the database doesn't know the age of Mrs Jones,
  4. or Just (Just (Just 43)), if Mrs Jones is 43 years old.

We won't get confused with all these levels -- for a start, each bit of code that adds or removes a Maybe wrapper is likely to only deal with one layer. But the compiler will catch it for us anyway if we make a mistake. Furthermore, we don't have to learn various different ways to cope with null -- we only have one, and we can be sure that our code won't need hacks to keep it working in different scenarios.

And, to add some icing to the cake, Maybe has been written as an instance of a Monad. We all know the common pattern in code that is basically "if the value I get at this point is null, then return null from this function". Using Maybe as a monad, we can eliminate a lot of that kind of stuff -- the monad encapsulates this pattern for us. Sometimes it will be clearer to explicitly check for Nothing or Just x, but often you will be able avoid doing that. Using monad notation what you effectively do is write a number of functions that are bound together using the 'bind' method of Monads (though these functions look more like imperative statements in monad notation). The Maybe version of 'bind' knows to abandon ship as soon as it gets a Nothing, and just return Nothing. Of course, this way of doing things is just as type-safe, but it just allows us to avoid some of the explicit null handling.

In conclusion: this is one instance of Haskell's wonderfully designed type system. If languages like C#, C++, Java etc have put you off static typing, Haskell might be able to change your mind.

Comments §

blog comments powered by Disqus