All Unkept
Posted in: Haskell, Web development  —  7 November 2009

Haskell blog software

I finally finished the Haskell blog project that I've been doing for a long time! You're looking at it now (unless you are reading this a few months/years after I wrote it, in which case I will probably have again re-implemented my blog software in my new language-du-jour...)

The blog software itself is not particularly interesting: fairly standard features, Atom feeds etc. It uses HDBC Sqlite for storage, and HStringTemplate for rendering (a nice library, BTW). For framework stuff, it uses my own Ella library. I didn't find a forms/validation library I could use, and ended up just using a few ad hoc bits and pieces. I've used the lovely pandoc to allow reStructuredText both for my own posts and for comments, which is a nice feature IMO.

The main interest for me has been the learning process. You get a much better, rounded understanding of a language from a project like this than you do from the small code samples that people knock around.

The project nearly failed at the last hurdle. Everything was working, but when I uploaded to my server, it failed on some URLs. I realised it was a memory problem — the CGI program must have been killed for using too much memory.

At first, I thought the limits on the server must be unreasonably small. Understanding the output of +RTS -s -RTS is kind of difficult. When I eventually found out that GHC-compiled programs never release any memory back to the operating system, I realised that it's the first figure (the total amount of memory allocated in the heap) that was killing me. On the bigger pages, this was over 160 MB. At that point I stopped complaining to my web host!

By changing to ByteString instead of Data.Text for StringTemplates, and using ByteStrings in a few other places, I achieved a 4-5 fold reduction in memory usage, along with a significant speed-up. Most pages now only use about 10-15 MB to render, which is OK for a short-running process I think. It's not ideal, especially when an additional 1k comment on a page seems to require at least 300k extra memory to render, but it's good enough for now. Profiling further will be very hard, as I suspect it will mainly be to do with the guts of HStringTemplate.
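To illustrate the kind of change involved, here is a minimal sketch of building a fragment of page output from strict ByteStrings. A String is a linked list of Char, so every character costs several machine words, whereas a ByteString is a packed byte array. The function renderComment and its markup are hypothetical and are not taken from the blog's source or from HStringTemplate; the sketch also skips HTML escaping, which real code would need.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical renderer (an assumption for illustration only).
-- It concatenates packed byte arrays instead of linked lists of Char.
-- NOTE: no HTML escaping is done here; real code must escape its inputs.
renderComment :: B.ByteString -> B.ByteString -> B.ByteString
renderComment author body =
  B.concat
    [ B.pack "<div class=\"comment\"><b>"
    , author
    , B.pack "</b><p>"
    , body
    , B.pack "</p></div>"
    ]
```

Loaded into GHCi, B.putStrLn (renderComment (B.pack "Tom") (B.pack "Hi")) prints the assembled fragment.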

I'll be blogging about the experience of developing this over the next few days/weeks, and what I've learnt. It's certainly been enjoyable overall, although it's definitely had its pain points too!

I've put redirection in for all the old, crufty URLs, so there shouldn't be any broken links. Feed readers will likely be confused, sorry!

If you have problems getting through my spam protection, please let me know. It enforces a 10 second wait before it accepts submissions, which serves to prevent thoughtless comments as well as spam :-)
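The post does not say how the 10 second wait is implemented, so the following is only a sketch of one way such a check could look on the server side. It assumes the server knows when the form was issued (for example from a signed hidden field or a server-side record, neither of which is shown). The names minimumWait, Verdict, checkSubmission and exampleIssuedAt are all hypothetical.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (NominalDiffTime, UTCTime (..), addUTCTime, diffUTCTime)

-- Minimum time a visitor must spend on the form (10 seconds, as in the post).
minimumWait :: NominalDiffTime
minimumWait = 10

-- Outcome of the check; TooSoon carries the remaining wait in seconds.
data Verdict = Accept | TooSoon NominalDiffTime
  deriving (Eq, Show)

-- Compare the time the form was issued with the time it was submitted.
-- Assumption: issuedAt comes from a source the client cannot forge.
checkSubmission :: UTCTime -> UTCTime -> Verdict
checkSubmission issuedAt submittedAt
  | elapsed >= minimumWait = Accept
  | otherwise              = TooSoon (minimumWait - elapsed)
  where
    elapsed = diffUTCTime submittedAt issuedAt

-- An arbitrary fixed timestamp, used only to try the function out.
exampleIssuedAt :: UTCTime
exampleIssuedAt = UTCTime (fromGregorian 2009 11 7) 0
```

For example, checkSubmission exampleIssuedAt (addUTCTime 3 exampleIssuedAt) rejects a submission made three seconds after the form was served, while one made ten or more seconds later is accepted.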

Comments §

§ On 7 November 2009, Tom wrote:
"Feed readers will likely be confused, sorry!"

I did indeed wonder why there were 24 new posts in my reader! No worries though, every feed does it to me at some point or another.

§ On 8 November 2009, Simon Michael wrote:
Congratulations! I've just re-read this series of posts with ease via (I now realise) your new blog. It makes excellent reading for those of us going through similar struggles. Thanks for documenting and persisting!

§ On 8 November 2009, luke wrote:
@Simon: Thanks, it's nice to know that it's useful to someone! A lot of the posts have been filled with boring details (mainly documented for my own benefit) on the one hand, and moaning on the other.

I'm also pleased to know that the 'Related' section works well — it was the first time in a good while that I did some raw SQL that would have been much harder to write using an ORM than by hand. It was quite pleasing to find that I haven't completely forgotten how to get some mileage out of SQL!


§ On 9 November 2009, MightyByte wrote:
Have you looked at the formlets library (http://hackage.haskell.org/package/formlets) for form processing? If you haven't, you should check it out. I and other people have used it in Happstack applications, but it's framework independent.

§ On 9 November 2009, luke wrote:
@MightyByte: Yes, I've looked at formlets. I concluded that it was far too simplistic and inflexible, especially in terms of the HTML that it produced.

AFAICS, most of the problems I have with it are not fixable. I did actually start to write a post about Haskell web frameworks in general. Here are my comments on formlets:

  • It automatically generates control ids e.g. "input0", "input1" etc.
    • This must also mean that if you change the order in which you define the form, the names of inputs will change. This is quite nasty for client-side work (i.e. javascript) -- a change in mere presentation detail will break your app. Most frameworks will not provide automatic ways to match up javascript and the HTML, but at least you normally name controls explicitly, so that you know that if you change a name you may have to change javascript.
    • It also has the side effect that it will interact less nicely with typical browser features. Many browsers will remember past inputs to a form based on the field name. If the field name is auto-generated, or prone to changing just because other elements have been added, or fields have been reordered, this will break.
  • Often you will need to have <label>s that do not surround the element they are the label for. In that situation you need to have a for attribute that is the ID of the control the label is for. This is not possible with the current Text.XHtml -- the only way to do it would be to add a label that looks up the ID of the generated control in its output HTML, and current XHtml does not allow you to look up the attributes of elements.
  • If validation of one field depends on the value of another, I think you will have a problem. Sometimes you will be able to do it, but it will create a constraint on the order in which the fields are displayed (which is quite sucky).
  • Can it handle check boxes? With a check box, the name+value will be completely omitted from the input data if it is unchecked. Certainly the input' function and Chris Done's blog article don't handle this case (the result is a very unhelpful error message), but you should be able to implement a checkbox without it.
  • Validation errors are returned as a simple list. This means you can't display them against the relevant input field -- you have no idea where they came from.

The major problem is that it has this abstraction which is very neat, but that very neatness means that trying to adapt it for use in the real world will break it horribly. There are all kinds of real world constraints for web forms (like CSRF protection, custom HTML output, javascript integration, control over field names). It can't handle some of these at least, and I wasn't going to invest in it when I strongly suspected that it wouldn't be able to handle the rest in any nice way.
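Several of the points above (explicit field names, check boxes that are absent when unchecked, validation that depends on another field, and errors attached to the field they came from) can be sketched in plain Haskell without any forms library. This is only an illustration under stated assumptions: it is not the ad hoc code used by this blog and it does not use the formlets API. The form, its field names ("name", "email", "notify") and all function names are hypothetical.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

-- Decoded POST data as name/value pairs (assumed to be parsed already).
type Params = [(String, String)]

-- Errors keyed by the explicit field name they belong to.
type FieldErrors = Map.Map String [String]

data CommentForm = CommentForm
  { cfName   :: String
  , cfEmail  :: String
  , cfNotify :: Bool
  } deriving (Eq, Show)

-- A text field: a missing key is treated as empty input.
field :: String -> Params -> String
field name = fromMaybe "" . lookup name

-- A check box: browsers omit the key entirely when it is unchecked,
-- so presence of the name means "checked".
checkbox :: String -> Params -> Bool
checkbox name params = name `elem` map fst params

-- Validate the whole form. The email rule depends on the notify box,
-- independently of the order in which the fields are displayed.
validate :: Params -> Either FieldErrors CommentForm
validate params
  | Map.null errors = Right form
  | otherwise       = Left errors
  where
    form = CommentForm (field "name" params)
                       (field "email" params)
                       (checkbox "notify" params)
    checks =
      [ ("name",  "Name is required.", null (cfName form))
      , ("email", "Email is required for notifications.",
           cfNotify form && null (cfEmail form))
      ]
    errors = Map.fromListWith (++)
      [ (name, [msg]) | (name, msg, failed) <- checks, failed ]
```

Because the errors are keyed by field name, a template can look up Map.lookup "email" errors and print the message next to that input. What this sketch gives up is the composability that makes formlets attractive in the first place.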

