These days you seem to hear a lot about building a web app in 20 minutes using framework X and language Y. The most exaggerated of these I read recently was (Re)writing Reddit in Lisp in 20 minutes and 100 lines. Here is a brief critique, without having watched the movie at all:
100 lines of code in 20 minutes is 300 lines an hour. Typically you would expect to be able to do 30-40 LOC/hour, so this is very good going. Here are a few explanations, all of which are possible, with varying degrees of plausibility (and I'm ignoring the possibility that the author was exaggerating in terms of time -- though he is clearly exaggerating with the use of the word 'rewrite', as other people have pointed out):
- The author is a super genius.
- He had half the code written in his head before he sat down to type.
- He knew the problem domain exceptionally well -- he wrote something similar in python (or lisp) the week before.
- He had a working installation of everything he needed, to which he could just add a few pages i.e. he had already done all the 'set up' phase.
- He was benefiting from the development honeymoon period (more below).
There are some other possibilities that probably don't apply to this case, but often contribute to the '20 minute web app' syndrome:
- Lots of copy and paste was involved
- You have a framework or piece of software that, out of the box, happens to provide a very large proportion of what you need.
The 'honeymoon' period of software development is the bit that comes after you have set your machine up etc, and you start coding from nothing. If you have nice frameworks or libraries, you can get fantastic levels of productivity at this point. There are various reasons, but I think you can summarise it by saying that you are doing zero 'maintenance' programming, and you have, at that point, a very small application that suffers from none of the problems of large applications.
For instance, achieving OAOO (Once And Only Once) is very easy when you are writing the code for the first time. Even if you do produce duplication, it's very easy to keep track of, and you haven't yet suffered the pain of not obeying OAOO.
You also haven't had to worry about the finishing touches and tying up loose ends -- it makes no sense to do them yet, so you rightly ignore that for now -- but those finishing touches take time. Nor have you had to worry about deployment etc.
The problem is that none of the above will help the more normal programmer to maintain that kind of productivity on a larger application. The honeymoon period is soon over, and finding ways to remove duplication and write maintainable, flexible code with few defects becomes much harder. Even the techniques that may have made the initial development phase so easy and productive may turn out to be the bane of your life (such as copy-and-paste, and lots of the shortcuts and methodologies typically encouraged by PHP).
So, I'm going to present to you an account of an application that took a lot longer than 20 minutes to write. I intend to make it as balanced as possible.
Since September (September 2nd, to be exact, according to my Subversion logs), I've been working on a Django project in my spare time, and it's finally complete. I've kept a pretty complete log of my hours, split by activity, so I'm hoping it will be of some use to those trying to make realistic estimations of coding time.
The project is now live at www.cciw.co.uk. It is a website for a charity that runs outdoor Christian camps, which I've been involved in all my life (literally!). The website has all the details of the camps, camp sites, leaders etc, and a community of people who have been on the camps, based around a message board and photo gallery system. [Edit 2012: the site is now quite different in emphasis since this article in 2006, since the forums are rarely used now]
Until a few days ago, the website was running under PHP with a flatfile database. The new website is mainly a re-implementation of the existing functionality, but I have added quite a few things and tidied quite a lot up, and I didn't bother trying to salvage any of the existing PHP code by porting it -- I wrote everything from scratch.
My main aims were:
- to get rid of the ropey old PHP code (not to mention the flatfile database), and produce some clean, maintainable code.
- to make it easy to moderate the message boards. The main users of the website are 11 - 17 year olds, and the camps are strongly Christian in ethos and aims, so it's very important that the camp website is always a safe and fun place for the campers to interact.
- to make the website manageable by other people instead of just me, and hopefully write myself out of the picture.
- to add some fun new features.
- to generally increase usability.
- to get a reliable database, and a proper SQL one that would enable things like the 'stats roundups' I occasionally do.
- to do some test driven web-development.
I've spent a total of 240 hours on the project, or about 6 working weeks. That's quite a lot, and quite a bit more than you typically hear quoted for Django apps, but the project is probably fairly large compared to most of the ones you hear about:
It contains 22 database models, about 60 view functions (which vary massively in size -- a handful are straight generic views, some of the message board ones are moderately complex), 15 Atom feeds and a total of 56 template files. It also has 14 custom template tags and various other bits and pieces as you would expect in a project of that size.
Also, the figure given above totals all my activities, including:
- learning Django (and Python to some extent)
- data migration (a lot of it -- I was careful to ensure none of the old message boards were lost, and some of them go back 6 years to an even older system. I even managed to rewrite any embedded URLs in message board posts etc so that they are still correct)
- design mockups (I'm not much of a designer, but I'm the only person who is working on this), and then the actual designs in XHTML 1.0 strict, the highest quality stuff I've done so far in terms of semantic HTML and clean CSS.
- all the setup and deployment issues (setup was easy, but final deployment was harder because of another project I wrote that used the same database, and some of the same tables, but was deployed earlier, so I had to do a bit of a database merge).
- a fair amount of content editing
- oh, and writing the code (models, views and templates), testing and debugging, which accounted for about 75% of the time spent (perhaps a little less, it's not always easy to divide up the time correctly).
In terms of code there are about 2000 lines of template code, 6000 lines of Python, and 900 lines of migration scripts (done in Python). I know LOC aren't that accurate a measure of program size, but hopefully that's of some help.
Given those figures, it looks like I was reasonably productive -- averaging over the complete time, that comes to about 37 LOC/hour, which is reasonable. I have also been careful to avoid cut-and-paste, which can be an easy way to get stuff done (and add LOC), but also an easy way to leave an unmaintainable mess behind! The great design of Django, including things like template inheritance, and the power of the Python language make it possible to really keep duplication to a minimum.
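As a quick check of that arithmetic, here is a small sketch that uses only the figures quoted in this article (line counts, total hours, and the '100 lines in 20 minutes' claim). The variable names are purely illustrative.

```python
# Figures quoted in the article.
template_loc = 2000
python_loc = 6000
migration_loc = 900
total_hours = 240

total_loc = template_loc + python_loc + migration_loc
loc_per_hour = total_loc / total_hours

# The '20 minute web app' claim, for comparison: 100 lines in 20 minutes.
claimed_loc_per_hour = 100 * 60 / 20

print(total_loc)               # 8900
print(round(loc_per_hour, 1))  # 37.1
print(claimed_loc_per_hour)    # 300.0
```

Note that this averages over all 240 hours, including design, data migration and content editing, so the rate for time spent purely on coding would be somewhat higher.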
Some of the things that added to the development time were:
- trying to handle my existing data properly (which added a fair amount of special casing etc)
- changing Django APIs -- which meant sometimes I had to rewrite, and sometimes I avoided features that I knew were not stable, and went for something that might not have been optimal just to crack on.
- lack of ability in the design area
- deciding to change to Postgres part way through (though that was fairly trouble free)
- and probably a bit of perfectionism.
I should also point out that the framework is more mature than when I started! In fact, some of the things I coded have become part of Django -- the hours where I was consciously hacking on Django itself I was careful to log separately, but nevertheless some of the code I wrote for CCIW ended up being generic and has made its way into the core framework -- so that's code you won't have to write if you start with Django now.
On the other hand, there are some things which would increase a realistic estimate of the coding time. The main one is that for at least half of the code I was writing, especially the message board code, I had a very good idea of what I was doing, having implemented it once already, even if it was several years ago. It's very difficult to measure the effect of this -- although the Python code I wrote bears very little superficial resemblance to the original PHP code, it is very likely that my subconscious knowledge of how it would work in general helped me a lot.
I'm quite pleased with my results! I'm not sure if I can really give a less biased view. I normally find with programming that by the time I've finished a project, I'm already quite unhappy with the quality of the code, and I have a list of 'cleanup' TODOs, or even 'rewrite this large chunk of it' TODOs, which usually never get done. By the time a few years have passed, I'm downright ashamed. So far I don't feel this way about any of the code -- let's see how long that lasts!
The quality of the HTML is pleasing - the Django validator app I wrote (development time not included) made creating the entire site using only XHTML 1.0 strict really very easy -- a task that I used to think was quite a challenge. The only part that proved tricky was writing a bbcode parser that would accept anything the users can throw at it and always produce valid XHTML that matched what the user would expect to get.
In terms of visual design, I'm reasonably pleased -- though there are quite a few places that could do with a designer's eye. And as for end user experience, I can't really say yet. I've tried to slim down the interface and make the pages a bit simpler than they were before, but some new features, especially 'tagging', have added more things back in.
In terms of making the website easy for other people to manage, Django's admin has solved pretty much all of that. For the main models that other people will have to administer (which are details of the camps we run, camp sites they run at and people who run them), it's astonishing how well the automatic admin functionality caters for it -- it does a lot more than I would have managed to create if I were writing a custom admin interface manually.
I've also made the website self-maintaining as much as possible -- for instance, every year each camp that has just finished gets a new forum, and this now creates itself on first access. The website also has a concept of the 'next year', which depends on when camps finish etc, and the 'clock' for this ticks over automatically.
For moderation, everything now has a feed, so it is very easy to aggregate message board posts for the entire website, or for an individual user, and be aware of new topics etc. I discovered some nice patterns while doing the Atom feed work -- detailed below.
On the downside, I did very little test driven development. I came to the conclusion that Django's view functions are very difficult to test. They take complex objects and return complex objects, and their output is highly dependent on what is in the database, so you have to do a lot of set up first. The view functions themselves often do very little -- in fact some were just generic views, so testing them would just have been testing Django. Some do quite a lot, however, but what exactly they do will depend on the validity of input and data in the database etc. I realised that unit tests are pretty inappropriate, but functional tests, using a tool like twill, would be perfect.
Unfortunately, after installing and playing around with twill, I never got around to writing tests with it, partly because of the pain of having to write setup code. I know that other Djangoers have done good work here, but with time constraints this was the first thing to go. I did, however, write tests for the parts of the system which could be decoupled easily from the view functions -- in particular most of the bbcode parser was developed in a test driven manner, which worked very well.
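To illustrate why that kind of code is so much easier to test than a view, here is a minimal sketch of a bbcode renderer written as a pure function. It is not the CCIW parser: the two-tag subset, the TAGS mapping and the name render_bbcode are all assumptions made for the example. The idea it demonstrates is keeping a stack of open tags so that the output is well-formed however badly the input is nested.

```python
import re
from html import escape

# Hypothetical tag map; a real parser would support many more tags.
TAGS = {"b": "strong", "i": "em"}
TOKEN = re.compile(r"\[(/?)(b|i)\]", re.IGNORECASE)


def render_bbcode(text):
    """Render a tiny bbcode subset to well-formed markup, whatever the input."""
    out = []
    open_tags = []  # stack of currently open bbcode tag names
    pos = 0
    for match in TOKEN.finditer(text):
        # Plain text between tags is always escaped.
        out.append(escape(text[pos:match.start()]))
        pos = match.end()
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            open_tags.append(name)
            out.append("<%s>" % TAGS[name])
        elif name in open_tags:
            # Close any inner tags first so that elements never overlap.
            while True:
                top = open_tags.pop()
                out.append("</%s>" % TAGS[top])
                if top == name:
                    break
        else:
            # Stray closing tag: show it as literal text.
            out.append(escape(match.group(0)))
    out.append(escape(text[pos:]))
    # Close anything the user left open.
    while open_tags:
        out.append("</%s>" % TAGS[open_tags.pop()])
    return "".join(out)


# Tests need no database or request objects, just strings in and strings out.
assert render_bbcode("[b]hi[/b]") == "<strong>hi</strong>"
assert render_bbcode("[b]bold [i]both[/b] tail") == "<strong>bold <em>both</em></strong> tail"
assert render_bbcode("[i]open") == "<em>open</em>"
assert render_bbcode("a < b [/i]") == "a &lt; b [/i]"
```

Each awkward input a user comes up with becomes one more assertion, which is what makes a test driven approach practical for this sort of function.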
I didn't use Django's 'high level' feed framework, as it didn't fit very well, but the lower level one was just perfect for what I wanted. Feeds are all available at URLs with '?format=atom' appended to the normal page. To handle this, I've got a mini-framework involving a handle_feed_request() function and a base CCIWFeed class that inherits from feeds.Feed.
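The mini-framework itself isn't reproduced in this article, so the following is only a guess at its general shape, written without Django so that it runs on its own. BaseFeed stands in for CCIWFeed, plain lists stand in for query sets, a dict stands in for the request's query parameters, and the signature of handle_feed_request() here is an assumption.

```python
class BaseFeed:
    """Hypothetical stand-in for CCIWFeed: owns the data and calls a hook."""

    def __init__(self, query_set):
        self.query_set = query_set

    def modify_query(self, query_set):
        # Default hook: leave the data unchanged. Subclasses override this.
        return query_set

    def items(self):
        return self.modify_query(self.query_set)


class NewestFirstFeed(BaseFeed):
    """Example subclass: newest items first, limited to MAX_ITEMS."""
    MAX_ITEMS = 2

    def modify_query(self, query_set):
        return sorted(query_set, reverse=True)[:self.MAX_ITEMS]


def handle_feed_request(query_params, feed_class, query_set):
    """Return the feed items if '?format=atom' was requested, else None."""
    if query_params.get("format") != "atom":
        return None  # the caller carries on and renders the normal HTML page
    return feed_class(query_set).items()


print(handle_feed_request({"format": "atom"}, NewestFirstFeed, [3, 1, 2]))  # [3, 2]
print(handle_feed_request({}, NewestFirstFeed, [3, 1, 2]))                  # None
```

The point of the sketch is the division of labour: the base class decides when the data is fetched, and each feed only overrides small hooks, which is why the individual feed classes can stay so short.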
With these in place (25 lines of code), my class for generating the feed for new message board posts, for example, looks like this:
    class PostFeed(CCIWFeed):
        template_name = 'posts'
        title = "CCIW message boards posts"

        # This is called by CCIWFeed.items()
        def modify_query(self, query_set):
            return query_set.order_by('-posted_at')[:POST_FEED_MAX_ITEMS]

        def item_author_name(self, post):
            return post.posted_by_id

        def item_author_link(self, post):
            return add_domain(get_member_href(post.posted_by_id))

        def item_pubdate(self, post):
            return post.posted_at
(Plus there are two templates to support this).
However, you can also get a PostFeed for a specific member -- i.e. all posts that were created by that member. The only thing that needs to change is the title, so the implementation is just the following:
    def member_post_feed(member):
        """Returns a Feed class suitable for the posts of a specific member."""
        class MemberPostFeed(PostFeed):
            title = "CCIW - Posts by %s" % member.user_name
        return MemberPostFeed
The view code has to call member_post_feed() with a specific member, and passes the generated class to the feed handling code. It doesn't require a special view -- it just requires two lines in the existing HTML view for a specific member's posts.
This pattern is repeated quite a number of times, and I think it is wonderfully elegant -- it's so easy to see what it is supposed to do, and using a class in the same way you use closures is so expressive.
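For anyone who wants to try the class-as-closure idea outside Django, here is a self-contained sketch. The PostFeed and Member classes below are simplified stand-ins invented for the example (the real PostFeed builds an Atom feed); only member_post_feed() follows the code shown above.

```python
class PostFeed:
    """Simplified stand-in for the PostFeed class shown above."""
    title = "CCIW message boards posts"

    def describe(self):
        return self.title


class Member:
    """Hypothetical minimal member object with only the attribute used here."""

    def __init__(self, user_name):
        self.user_name = user_name


def member_post_feed(member):
    """Returns a Feed class suitable for the posts of a specific member."""
    class MemberPostFeed(PostFeed):
        # The class body runs on every call, so each call captures its own member.
        title = "CCIW - Posts by %s" % member.user_name
    return MemberPostFeed


feed_class = member_post_feed(Member("alice"))
print(feed_class().describe())  # CCIW - Posts by alice
print(PostFeed().describe())    # CCIW message boards posts
```

Each call returns a new subclass, so feeds for different members never share state, and the base class is left untouched.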
I've enjoyed this project, and I'm very pleased with the result, but I am also happy to get it over with, as it has been dragging on in my spare time since September. The launch of the new website has been a bit of a non-event. The amount of traffic on the site varies enormously -- after the camps in the summer when everyone has just met up again, there is a massive surge in activity -- last August and September the very small active user base managed to create up to 280 posts a day. But right now, there is almost no activity and it's been like that for months. I have to tell myself that it hasn't been a wasted effort :-).
If anyone wants to play with it, you can use the forum at http://www.cciw.co.uk/website/forum/8/ and log in using user name and password 'guest'. I'll probably clear out everything that 'guest' creates every week or so, so create whatever you like.
Finally, if anyone is interested in any of the code, I'd be happy to make it available. Most of it is probably quite specific to CCIW, and not very re-usable, but I made the tagging functionality very generic (I posted to django-devs about this already), and there may be other bits people would want to glean.
For the language partisans: my reference to the Lisp article wasn't meant to imply that using Lisp for web development wouldn't scale or would be worse than Django -- I have no idea, as I have never used Lisp. (By contrast, my references to PHP did come from experience and were meant to imply that Django is much better than PHP :-). My main point was that a small project really doesn't give you much idea about a bigger project -- the techniques used for a prototype or small project may or may not scale. I do know that Django was written by experienced developers who knew the pitfalls of web development, and wrote the framework specifically to avoid them, and to make the common things very easy, and I think they have succeeded admirably.