All Unkept
Posted in: Python, Software development, Django  —  January 10, 2011 at 04:43 PM

Docs or it doesn't exist

by Luke Plant

tl;dr

Writing good quality documentation for the software libraries you publish always matters. Otherwise, you are doing the world a disservice by publishing.

Definitions

I realise I am talking mainly in the context of libraries published for public consumption, rather than in-house libraries. Some of the reasons given below don't apply so much for the in-house library that people may be forced to use.

By publishing, I specifically mean package repositories like PyPI, djangopackages, Hackage, CPAN — the places where developers will go looking for your library.

By ‘good quality documentation’ I do not mean ‘lots of documentation’. Often documentation can be ruined by thoughtless quantity. I remember some API docs produced by the NDoc documentation tool, which, when I last used it (admittedly about 5 years ago), flagged as warnings any methods or properties that were not documented. This led to developers jumping through the hoops, documenting the identity property of the Person class with such epiphany-inducing revelations as “Gets/sets the identity of the Person object”. The tool, however, doesn't flag in red the missing parts of the documentation that you really need, like how you would go about populating a Person object from the database or why you might want to use it etc., nor does it flag poor quality docs.

The result is documentation that is worse than useless — it promises usefulness by the fact of its existence, but instead wastes hours of your time, as you search it in vain to find anything which could not be deduced from looking at a class browser. Even if there is some documentation that is worth having, I will give up looking, and never find it, due to all the almost-auto-generated dross that surrounds it. Developers need to be persuaded of the importance of documentation, rather than simply instructed to eradicate build warnings.
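To make the contrast concrete, here is a minimal Python sketch. The `Person` classes, the `identity` property, the `people` table and the `from_row` helper are all hypothetical, invented purely for illustration; they are not NDoc output or any real library's API. The first class carries the kind of docstring that silences a build warning, while the second tries to answer the questions a reader would actually ask.

```python
class PersonTautological:
    """Represents a Person."""

    def __init__(self, identity):
        self._identity = identity

    @property
    def identity(self):
        """Gets the identity of the Person object."""  # restates the name, adds nothing
        return self._identity


class Person:
    """A person record loaded from the (hypothetical) ``people`` table.

    Build instances with ``Person.from_row(row)`` rather than calling the
    constructor directly, so that the key is normalised in one place.
    """

    def __init__(self, identity):
        self._identity = identity

    @classmethod
    def from_row(cls, row):
        """Create a Person from a mapping with an ``id`` key, e.g. a database row."""
        return cls(identity=int(row["id"]))

    @property
    def identity(self):
        """Primary key of the person, as an ``int``.

        It does not change after construction, so it is safe to use as a
        dictionary key or in URLs.
        """
        return self._identity
```

Both classes behave identically as far as `identity` is concerned, and a documentation coverage tool would pass both. Only the second tells you how to get a populated object and what you can rely on.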

Also, blog posts are not documentation. Blog posts are adverts. Some blog posts contain ‘tutorial style’ documentation. This is a valid type of documentation, but it belongs in one place, with your software, not on a blog that might disappear one day. If your documentation isn't part of your software package, I have no idea it exists. Googling for it is not a solution — docs found in blog posts are typically fragmented in content, scattered over several blogs, frustratingly incorrect because they are out of date, and always leave you wondering "is that it, or should I look somewhere else?"

Intro

I'm prompted to write this rant by the astonishing number of packages on PyPI and Hackage that have virtually no documentation or very poor documentation — not even a README sometimes. I presume that people put libraries on these repositories because they would like to think that they might be useful to other people. Often they like to think they are doing ‘Open Source’ development by publishing in this way.

However, I would argue that releasing software libraries without documentation is like dumping all your old junk on the street “because someone might want it”. You are not, in reality, being helpful or contributing to society — you are just littering. The same is true of libraries without documentation.

Arguments

A library is a risk

The problem with the person who dumps their junk on the street is that he is not considering his ‘contribution’ from the perspective of the people who might be looking for such things — or from the perspective of the people who might want to use the street.

People who are looking for old fridges etc. have places that they will go for that: junk yards, Freecycle, eBay, etc. And people walking down streets have expectations of what they will find on those streets. Once you take those things into account, you won't leave your old fridge on the street if you actually care about other people.

Now, people looking for software libraries have a specific task that they need to achieve. They are looking for a library because they believe the task can be abstracted into a library somehow, and they don't want to have to write the code themselves. They then go to Google or a framework/programming language specific repository with keywords for that task.

Every result they find is an avenue that might need exploring. And everything they find which is not suitable is junk that is just getting in their way. The developer has to do 3 things:

  1. Evaluate whether a certain package could possibly do what they need.
  2. Install the package, possibly including some kind of configuration.
  3. Write the code to get the package to do what they need.

And then, possibly, the developer may need to patch/fork your code to produce a version that does what they need.

Every step represents increasing commitment in terms of time. At each step, every second that I have to spend to find something out is time that I am spending, and therefore potentially wasting, because of your library. I will not know until I have actually finished the last step whether your library will help me — it could easily have some flaw that makes it useless for me.

So your library represents risk to me, especially as I always have alternatives — another library, or just writing the code myself. At each point, I've got to assess whether it is likely I will succeed with this package. I've made an investment in this library already, but will it reward continuing investment, or will it turn out to be another dead end? Should I get out now?

So, at the very least I need an overview that tells me what a library does. Without that, every second I spend looking at a library is probably time wasted. It seems ridiculous that someone would publish a library like this, where there is simply nothing to help you know how you are supposed to use it, but many instances exist (I picked that one at random). Thinking that you must be missing something obvious, you actually spend time searching for any docs, before concluding that there is no documentation whatsoever, not even a single source code comment that might give you a clue.

I hate your package already, and wish it didn't exist, just like the fridge on the street. You have probably not helped anyone, and you have certainly hindered me.

But I need to know more than what is covered by the overview, since it is very unlikely that the default, simple case will cover all my needs. I also need to know what customisation or extension points there are. If none are documented, I can only presume that none exist.

Even if they do exist, I cannot know from reading the source whether they are essentially accidental or not, and this leads to another point:

Documentation is an API contract

With small libraries in particular, the author is not going to write a document explaining exactly how they see the code evolving. The only thing that people can rely on is what has been documented to work. Anything else may be something that is considered an implementation detail, and relying on it is increasing the risk all the time. Even with languages that clearly distinguish between public and private, often these distinctions are not fine-grained enough for the current purpose, and I still want some assurance from the author that something isn't public by accident.

And most people will not want to fork the code to get a version they know works — they want to know they will get bug fixes.

In addition, the existence of documentation tells you not only what the author is thinking about what is really public/private, it shows that the author actually cares about this library. It shows the kind of pride in your work that helps other people to know that this library is likely to get bug fixes.

Responses

It's open source, and so people can just read the source

If you only have the source, it can take a huge investment of reading and understanding before you can conclude whether the library is even attempting to solve the problem you need to solve. I have to fit all the code inside my head and mentally execute it — or I have to download it and try to use it. Both of these represent a huge investment.

My software is too small to be worth documenting

On the contrary, the smaller the software library, the more important it is to have good documentation.

If there is a relatively small task that I want to find a library for, I am not going to spend a huge amount of time researching the libraries, simply because it is not worth it. Any library will likely have all kinds of disadvantages compared to writing my own code — overhead for features I don't need, bugs in features I do need but the author obviously didn't, the need to add customisations (like subclassing, which serves to make things harder for a maintenance programmer) etc. So for small features, I'm very tempted to write my own solution anyway, since I know that it will fit my needs.

Therefore, I need to know even faster whether the library will do what I want it to do, or whether it can be easily extended to do so. I am much less likely to get as far as step 2 or 3 above, and I am more annoyed with every library that doesn't make step 1 easy — because the time wasted represents an even bigger fraction of my time allowance for this task.

So, if your software is too small to be worth documenting, it is too small to be wasting other people's time with it, and putting it in any public package repositories is doing the world a disservice.

I have only published as a backup of my own code or to share with one friend

I think this is just about the only valid reason to not write docs. If you have put some highly experimental code on GitHub or Bitbucket etc. simply as a way to distribute to someone else (perhaps someone who is going to write the docs for it), you can justify not writing any documentation. But there is no reason to add it to any package index like PyPI or Hackage, since it will only make the useful packages harder to find. Even on Bitbucket/GitHub, which often come up as search results from Google, it might help if you label the code as not intended for public use.

If the intended way for people to ‘re-use’ your software is for them to fork it and actually make it into something useful, supporting it entirely themselves, that is fine — but it would be helpful if you don't pretend it is a ‘package’, and just say so somewhere.

Conclusion

If you don't have time to write documentation, don't bother publishing on some package repository — you are simply fooling yourself into thinking you might have done someone a favour, when all you have done is waste resources.

I think we need a new mantra for software libraries, especially in the Open Source world, that applies to entire projects or features of individual libraries:

Docs or it doesn't exist.

Perhaps, in the context of public package repositories, we should go further:

Docs or delete.


UPDATE:

Of course, I should have also mentioned that with projects like Read The Docs and Sphinx there is even less excuse for not creating great quality docs.
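As a rough idea of how little setup is involved, a minimal Sphinx `conf.py` might look something like the sketch below. The project name and author are placeholders; `sphinx.ext.autodoc` is the Sphinx extension that pulls docstrings from your code into the built pages.

```python
# conf.py: minimal Sphinx configuration (project details are placeholders)
project = "yourlibrary"
author = "Your Name"

# autodoc pulls docstrings from your code into the generated documentation.
extensions = ["sphinx.ext.autodoc"]

# Sphinx's built-in default theme.
html_theme = "alabaster"
```

This only covers the build configuration. The overview, the tutorial and the description of extension points still have to be written by a person.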
