All Unkept
Posted in: Haskell  —  4 August 2008

Haskell API docs suck. A lot.

Haskell API documentation is very lacking for newbies. For instance, I want to understand how to create and use regexes. If you start at Text.Regex.Posix documentation, it tells you that =~ and =~~ are the high level API, and the hyperlinks for those functions go to Text.Regex.Posix.Wrap, where the main functions are not actually documented at all!

So we look at the type signatures -- here is the first:

(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target

So, that leads me to the class declarations for these things. But trying to understand them is rather intimidating:

class RegexOptions regex compOpt execOpt | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where

Or how about this?

class RegexOptions regex compOpt execOpt => RegexMaker regex compOpt execOpt source | regex -> compOpt execOpt, compOpt -> regex execOpt, execOpt -> regex compOpt where

They are using multi-parameter type classes and functional dependencies. Having read bits of Haskell for a while, I happen to know what they are (vaguely), but I don't really understand them, nor does the above give me any clue as to how to actually use this API.

Google to the rescue. (This is bad: I shouldn't have to google for documentation when I'm already looking at the obvious place for something to be documented.) The first result for "haskell regex" is a completely useless and hopelessly out-of-date page, but there is a Haskell regex tutorial on a blog that shows us how to do it, and it is astonishingly simple:

> "bar" =~ "(foo|bar)" :: String
"bar"

So what is going on? It looks like the library has been designed extremely cleverly so that in the simple case (regex with default options etc), you can use it very easily, but you don't need to use different functions if you want to add regex options. Furthermore, it is polymorphic in its return type, so we can also do this:

> "bar" =~ "(foo|bar)" :: Bool
True

In fact you can get lists of matches, or lists of match offsets etc -- almost anything you can think of, just by specifying (directly or using type inference) the type of the result you want. This is beautifully elegant and clever and I'm sure it gave the designer a warm fuzzy feeling inside (well, it gives me one, and I'm just looking at it). The downside is that if you try to use =~ at a GHCi prompt without a type annotation, you just get a ridiculously unhelpful error.
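
To make the return-type polymorphism concrete, here is a minimal sketch (my own illustration, not taken from the library docs), assuming the regex-posix package is installed. The names subject, pat and the bindings below are made up for the example; the result types are ones I believe regex-base provides RegexContext instances for.

```haskell
import Text.Regex.Posix

-- Illustrative input and pattern (hypothetical, chosen for this sketch).
subject :: String
subject = "foo bar baz"

pat :: String
pat = "ba[rz]"

-- Bool: is there any match at all?
hasMatch :: Bool
hasMatch = subject =~ pat            -- True

-- String: the text of the first match.
firstMatch :: String
firstMatch = subject =~ pat          -- "bar"

-- Triple: (text before, first match, text after).
surrounding :: (String, String, String)
surrounding = subject =~ pat         -- ("foo ","bar"," baz")

-- Pair of Ints (MatchOffset, MatchLength): where the first match is.
position :: (Int, Int)
position = subject =~ pat            -- (4,3)

-- Int: the number of matches.
howMany :: Int
howMany = subject =~ pat             -- 2

-- [[String]]: every match, each as [whole match, subgroups...].
allMatches :: [[String]]
allMatches = subject =~ pat          -- [["bar"],["baz"]]

-- (=~~) is the monadic variant: failure to match becomes Nothing here.
noMatch :: Maybe String
noMatch = "xyz" =~~ pat              -- Nothing

main :: IO ()
main = do
  print hasMatch
  print firstMatch
  print surrounding
  print position
  print howMany
  print allMatches
  print noMatch
```

The same expression, subject =~ pat, appears on every right-hand side; only the type signature changes what comes back. That is also why a bare =~ at the GHCi prompt fails: with no annotation there is nothing to pick the result type.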

The problem here is that making the library so clever has also made it utterly impenetrable to the beginner. The main functions are not even documented, and there is no explanation of the crazy type signature. You might say that it is simply a documentation problem, but it is actually a combination of the two -- if the type signature had been something simple, it would have been easy to deduce how to use it. It seems to me that the documentation of a library has got to be proportional to the cleverness of its type signatures, or people are going to be absolutely lost. Since Haskell libraries are almost always implemented by Haskell gurus, and they implement them with themselves in mind (I have no objection to this, they are enthusiasts working for free), they use lots of clever code and advanced Haskell techniques. But this means that if you want people to actually use these libraries (and by consequence Haskell itself), the documentation for Haskell libraries has to be about an order of magnitude better than anything you'd find anywhere else. I suspect it is at least an order of magnitude worse than for something like .NET APIs, which means that relatively speaking the documentation of Haskell is currently in an absolutely dire state.

Sorry, I'm just saying it like it is. These libraries are great when you can get them to work, and I'm really grateful to the authors for their fantastic work, and the effort that has gone into packaging and distributing them (so that installation is literally one short command-line away), but the hurdles are still currently far too great compared to any other language for Haskell to become popular.

Moving forward, I guess one problem is contributing to a library's documentation. There is nothing on the API doc pages that shows you how to do this. I suspect you need to check out the source with darcs (not something I do normally, I just use cabal) and then start emailing patches or something. Even then, I don't know if I would contribute any documentation -- 'howto' style documentation seems out of place on the API pages, but it is desperately needed.


Comments

§ On 4 August 2008, Richard Smith wrote:
> installation is literally one short command-line away
One short command-line? Hah! cabal-install has 13 dependencies, some of which aren't packaged in most distros. In order to avoid having to learn how to install packages from hackage the hard way, you first have to learn how to install packages from hackage the hard way.

Perhaps this is merely a ploy to get people to realise how much they value cabal-install?

§ On 4 August 2008, gwern wrote:
Richard: Well, I've noticed that people seem to be very enthusiastic about cabal-install after going through everything to install it, yes. :)

But is it being hard to install really such a bad thing? It'll be a front-end to Haskell packages, used by any number of people who may want to do minimal to no Haskell development (perhaps they just want XMonad), so it behooves the Haskell community to make sure it gets a lot of testing and usage before it goes into, say, Debian Stable.

As it is, cabal-install isn't entirely done. Witness the latest arguments over how it should handle the case where it installs executables inside its ~/.cabal/ (which is obviously not in one's $PATH by default).

Besides, the prerequisites are all cabalized. If the distros don't have them it's their fault (especially given the various cabal-to-package programs on Hackage).

§ On 4 August 2008, gwern wrote:
Luke: Yes, you're entirely right. The procedure is to darcs get the appropriate repository, edit, record, and darcs send the patches to the libraries mailing list @ haskell.org.

(Where the repo is depends; regex-posix isn't a base library, so its repo could be anywhere.)

The issue is just that no one has written and submitted docs, not that Haskellers won't - I think the XMonad Haddocks are really good, and other packages by Don have good documentation as well (like ByteString's).

§ On 5 August 2008, luke wrote:
@ Richard: point taken - I guess cabal is a pain to install! It was only about 3 or 4 dependencies for me, as the rest came with GHC or was apt-gettable.

@ gwern: Yes, there are some libraries which are much better. I just started using 'template', a small library which has just the right sort of docs, and ByteString is generally very good, as you say.

I guess I am implicitly comparing to something like Python, which has a 'batteries included' standard library, and everything in it has to have documentation that is absolutely up to scratch. There are then other Python libraries for which the documentation is very variable. The problem with Haskell is that the standard libraries are much narrower in scope, so many common tasks fall into the category of having unreliable documentation.

§ On 5 August 2008, Chris K wrote:
Hi Luke.

I am the author of the regex-* packages. Sorry about the lack of tutorial-level documentation; I write all this as a hobby and hardly ever use any of it. Documentation patches are welcome...

As for where the darcs repository is, the Package-URL in the cabal file points to
http://darcs.haskell.org/packages/regex-unstable/regex-base/
which is responsible for the high-level type machinery you are quoting. Other packages are in neighboring directories; I suggest using the code under "regex-unstable" regardless of its name.

That high-level API is the fusion of two medium level APIs, class RegexMaker and class RegexLike.

The first compiles the source (byte)string into a mostly opaque regex and the second uses that regex to match against some to-be-searched (byte)string. Using these two classes makes for less type-complicated code and allows the compiled Regex to be cached and reused.

The high-level API of class RegexContext builds on RegexLike to create all the nifty dependence on the requested type. One still uses RegexMaker with RegexContext. Then =~ and =~~ are merely fusions of RegexMaker and RegexContext.
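
As a rough sketch of that two-step use (an illustration under assumptions, not an official example): it assumes Text.Regex.Posix re-exports makeRegex, matchTest and match from regex-base, and the name helloRegex and the sample strings are invented here. Note that match takes the compiled regex first and the text second, the reverse of =~.

```haskell
import Text.Regex.Posix

-- RegexMaker: compile the pattern once into an opaque Regex value.
-- (helloRegex is a hypothetical name used only in this sketch.)
helloRegex :: Regex
helloRegex = makeRegex "hel+o"

-- RegexLike: reuse the same compiled regex against several inputs.
flags :: [Bool]
flags = map (matchTest helloRegex) ["hello", "goodbye", "say helllo"]

-- RegexContext: the requested result type still selects what is returned.
-- Here AllTextMatches collects every match, unwrapped to a plain list.
allHits :: [String]
allHits = getAllTextMatches (match helloRegex "hello dfadf hello")

main :: IO ()
main = do
  print flags    -- [True,False,True]
  print allHits  -- ["hello","hello"]
```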

§ On 6 August 2008, Dave Menendez wrote:
In this case, the problem isn't so much the API documentation as the API itself. The class RegexContext, which allows (=~) to change its behavior based on the specified return type, has no real justification other than to (a) demonstrate the power of type classes and (b) one-up Perl. It complicates the type of (=~) and it makes it difficult to locate any explanation of what it does, since it's dependent on the instances of RegexContext, and Haddock can't document specific instances.

That being said, the API documentation for Haskell libraries is often quite sparse. It depends a lot on who originally wrote the code.

§ On 9 August 2008, Magnus wrote:
Interestingly I recently had to use regexps in Haskell for the very first time. I had none of these problems with documentation. I guess this was because I first looked at Text.Regex and found the sentence "Uses the POSIX regular expression interface in Text.Regex.Posix." That's when I stopped looking. Luckily the API in that module was completely self-explanatory :-)

§ On 30 April 2009, Merwok wrote:
To be fair, Python has good documentation but its libraries do not always use the nicest methods available.

Besides, Python gets you spoiled. When you get some third-party code without introspectable docstrings, even with good external documentation, you’re not happy.

§ On 22 June 2010, DudeOfInternet wrote:
I tried to use a type signature of [String] to get a list of all matches, but it failed with "No instance for (RegexContext Regex [Char] [String])". The code was:

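import Text.Regex.Posix

-- Does not compile: there is no RegexContext instance for a plain [String] result.
n :: [String]
n = "hello dfadf hello" =~ "hello"

main = print n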

However a type of AllTextMatches [] String worked. You just have to use getAllTextMatches to get the actual list.

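import Text.Regex.Posix

-- Compiles: AllTextMatches wraps the list of every match; getAllTextMatches unwraps it.
n :: AllTextMatches [] String
n = "hello dfadf hello" =~ "hello"

main = print $ getAllTextMatches n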

~~~~

Another problem I had was =~ won't work on precompiled regexes. It fails with different error messages depending on what order you put the string and regex in.

Probably I'm just dumb but I found this interface to be way too complex. I still don't fully understand it and I've been working with it for a couple hours. It's a nice idea but needs more work to make it thoughtless to use.




§ On 22 June 2010, DudeOfInternet wrote:
Oh no! I came back to write that after finally figuring out how to use the regexp syntax that I found it not so bad. It just needs a better explanation. Then I find out that I posted my comment to the wrong blog. Sorry about that, the actual state of my tabs was different from my expected state.
