<?xml version="1.0" encoding="UTF-8"?>
<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:thr="http://purl.org/syndication/thread/1.0"
  xml:lang="en"
   >
  <title type="text">All Unkept</title>
  <subtitle type="text"></subtitle>

  <updated>2013-05-16T09:47:00Z</updated>
  <generator uri="http://blogofile.com/">Blogofile</generator>

  <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog" />
  <id>http://lukeplant.me.uk/blog/feed/atom/</id>
  <link rel="self" type="application/atom+xml" href="http://lukeplant.me.uk/blog/feed/atom/" />
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Translating sentences with substitutions]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions" />
    <id>http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions</id>
    <updated>2013-01-24T00:14:46Z</updated>
    <published>2013-01-24T00:14:46Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Translating sentences with substitutions]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions"><![CDATA[<div class="document">
<div class="section" id="the-problem">
<h1>The problem</h1>
<p>Many programs build up sentences using bits - often a template into which
different things might be substituted. However, the things you substitute into a
sentence can change the sentence, and vice-versa, in ways that are not
anticipated by the programmer.</p>
<p>For example, plurals. In English, you might try code like this:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
    <span class="k">return</span> <span class="s">&quot;I have 1 pig&quot;</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">return</span> <span class="s">&quot;I have </span><span class="si">%s</span><span class="s"> pigs&quot;</span> <span class="o">%</span> <span class="n">n</span>
</pre>
<p>Localising these strings gives problems, because the rules for how to create
plural forms are different in every language.</p>
<p>This specific problem is generally considered 'solved' by the use of gettext,
but many more exist.</p>
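<p>(For anyone who hasn't used it, here is a minimal sketch of the gettext approach
in Python. <tt class="docutils literal">NullTranslations</tt> is only a stand-in so that the snippet runs without any
catalog installed; a real project would load a catalog for the user's language, and
that catalog would supply the plural rule. The name <tt class="docutils literal">pigs_message</tt> is made up for
illustration.)</p>
<pre class="code python literal-block">
import gettext

# Stand-in catalog: with nothing loaded, ngettext falls back to the
# English rule (singular when n == 1, plural otherwise).
catalog = gettext.NullTranslations()


def pigs_message(n):
    # ngettext chooses between the forms based on n; a real catalog would
    # supply as many forms as the target language needs.
    template = catalog.ngettext("I have %d pig", "I have %d pigs", n)
    return template % n


print(pigs_message(1))
print(pigs_message(3))
</pre>
<p>That deals with the count, but with nothing else about the sentence.</p>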
<p>For example, we have another problem as soon as we start substituting nouns:</p>
<pre class="code python literal-block">
<span class="s">&quot;Delete selected </span><span class="si">%s</span><span class="s">?&quot;</span> <span class="o">%</span> <span class="n">object_name</span>
</pre>
<p>Various attributes about the noun could affect the sentence. In French, the
adjective &quot;selected&quot; needs to agree in gender with the noun being substituted
in. So you cannot look up the translations for &quot;Delete selected %s&quot; and for
<tt class="docutils literal">object_name</tt> separately. (This is a real example picked from the Django source code.)</p>
<p>Further, depending on how the sentence uses the noun, the form of the noun might
need to change. For example, the noun might appear in the accusative position
for a given sentence and language, which requires a different form of the noun
to be used compared to the nominative form.</p>
<p>Several other examples of this appeared in <a class="reference external" href="https://code.djangoproject.com/ticket/11688">Django ticket 11688</a>. One proposed solution on that
ticket would require a huge amount of knowledge and effort on the part of Django
programmers, and almost certainly would not work anyway.</p>
<p>This post is an attempt to come up with a better solution, or at least to
kick-start discussion. I haven't been able to find any solutions to this problem
online, and most people seem to be just using gettext, which is a 95% solution —
and maybe that is good enough for most people.</p>
<p>[Update 2013-02-19 - ‘Richard’ pointed me to the <a class="reference external" href="http://search.cpan.org/~toddr/Locale-Maketext-1.23/lib/Locale/Maketext/TPJ13.pod">Locale::Maketext article</a>,
which takes essentially the same approach as I have here]</p>
</div>
<div class="section" id="assumptions-and-simplifications">
<h1>Assumptions and simplifications</h1>
<p>We will assume that a sentence is a composable unit of meaning, such that
sentences can be translated independently. So, if in language A we have
sentences 1 and 2, in that order, we can translate these into language B by
translating sentence 1 and sentence 2 independently, and putting them together
in the same order.</p>
<p>This is, no doubt, a simplification. In some languages, the two sentences might
make more sense if re-ordered, or combined, or split in various ways. Indeed,
some languages may not have a truly equivalent concept of 'sentence' at all.</p>
<p>However, we have to do something, and this is a reasonable approximation.</p>
</div>
<div class="section" id="requirements">
<h1>Requirements</h1>
<p>We need a powerful way of defining sentences in a given human language. It must
be powerful enough that the person doing the translation can do anything they
need, without the programmer needing to be aware of all the things in the
language that will cause difficulty.</p>
<p>So, we'll start with a full programming language, and chop out the things we
shouldn't need.</p>
<p>We shouldn't need side effects - translation should be a pure function. So we'll
use a purely functional programming language without side effects.</p>
<p>We need something fairly readable, because translators are going to have to use
it. It should be as close as possible to declarative in style.</p>
<p>Pattern matching seems like a great fit for some of our needs.</p>
</div>
<div class="section" id="possible-solution">
<h1>Possible solution</h1>
<p>Given the above requirements, let's start with a Haskell-like pure functional
language, whose pattern matching will be extremely helpful. It will obviously
have IO removed, and no type signatures (but that won't stop us inferring them
and being able to statically type-check the code). Everything else will be
borrowed directly from Haskell, so that I can avoid having to make up my own
syntax and semantics.</p>
<p>If the concept works, we can argue about better or simpler syntax for some
constructs, or helper functions that aren't part of the Haskell prelude.</p>
<p>Hopefully, we will find a relatively small subset of Haskell that is needed to
give us all the power we need to solve this problem - a subset small enough that,
ideally, we could guarantee termination, to avoid problems with translations
created by malicious agents.</p>
<p>This will be an example based exploration.</p>
<p>Let's assume that every sentence can be generated by a function. The function
will take as parameters all the substitutions that are needed, and return the
translated string.</p>
<p>So, suppose we have the English sentence &quot;I have some pigs&quot;. For every different
language we need, we would have a translation file which contains the function
<tt class="docutils literal">iHaveSomePigs</tt>, which in this case takes zero parameters. So for French:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveSomePigs</span> <span class="ow">=</span> <span class="s">&quot;J'ai des cochons&quot;</span>
</pre>
<p>(The mapping between the English sentence &quot;I have some pigs&quot; and the function
name <tt class="docutils literal">iHaveSomePigs</tt> hasn't been defined, and we'll skate over that detail for
now).</p>
<p>If we have a variable number of pigs, for French we might have:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNPigs</span> <span class="mi">0</span> <span class="ow">=</span> <span class="s">&quot;Je n'ai pas de cochon&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="mi">1</span> <span class="ow">=</span> <span class="s">&quot;J'ai un cochon&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;J'ai &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; cochons&quot;</span>
</pre>
<p>For English we could do this:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNPigs</span> <span class="mi">1</span> <span class="ow">=</span> <span class="s">&quot;I have 1 pig&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;I have &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; pigs&quot;</span>
</pre>
<p>(For those unfamiliar with Haskell, the way that pattern matching works is that
the first definition that matches the arguments is used. Since <tt class="docutils literal">n</tt> is not a
literal, but a variable, it can match any argument.)</p>
<p>We can cope with more complicated rules, such as those used in Polish, perhaps
something like this:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNFiles</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;Mam &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; &quot;</span> <span class="o">++</span> <span class="n">pluralize</span> <span class="n">n</span> <span class="s">&quot;file&quot;</span>

<span class="nf">plurals</span> <span class="s">&quot;file&quot;</span> <span class="ow">=</span> <span class="p">[</span> <span class="s">&quot;plik&quot;</span>
                 <span class="p">,</span> <span class="s">&quot;pliki&quot;</span>
                 <span class="p">,</span> <span class="s">&quot;plików&quot;</span>
                 <span class="p">]</span>

<span class="nf">pluralize</span> <span class="n">n</span> <span class="n">word</span> <span class="ow">=</span> <span class="n">plurals</span> <span class="n">word</span> <span class="o">!!</span> <span class="n">pluralForm</span> <span class="n">n</span>

<span class="nf">pluralForm</span> <span class="n">n</span>
  <span class="o">|</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span>                                                                        <span class="ow">=</span> <span class="mi">0</span>
  <span class="o">|</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">10</span> <span class="o">&gt;=</span> <span class="mi">2</span> <span class="o">&amp;&amp;</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">10</span> <span class="o">&lt;=</span> <span class="mi">4</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> <span class="o">&lt;</span> <span class="mi">10</span> <span class="o">||</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> <span class="o">&gt;=</span> <span class="mi">20</span><span class="p">)</span> <span class="ow">=</span> <span class="mi">1</span>
  <span class="o">|</span> <span class="n">otherwise</span>                                                                     <span class="ow">=</span> <span class="mi">2</span>
</pre>
<p>Note that the complex logic in <tt class="docutils literal">pluralForm</tt> and <tt class="docutils literal">pluralize</tt> only has to be
defined once. Adding more words simply requires additional <tt class="docutils literal">plurals</tt>
lines. It's not the nicest syntax, but could probably be improved, and it's
pretty easy to copy.</p>
<p>Let's add in gender, using the sentences &quot;Delete this %s?&quot; (singular) and &quot;Delete
selected %s?&quot; (plural). We can use guards:</p>
<pre class="code haskell literal-block">
<span class="nf">deleteThisThing</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;Supprimer ce &quot;</span> <span class="o">++</span> <span class="n">singular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;?&quot;</span>
    <span class="o">|</span> <span class="n">otherwise</span>         <span class="ow">=</span> <span class="s">&quot;Supprimer cette &quot;</span> <span class="o">++</span> <span class="n">singular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;?&quot;</span>

    <span class="c1">-- (Ignoring the problem with 'ce' followed by vowel for now...)</span>

<span class="nf">deleteSelectedThings</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;Supprimer les &quot;</span> <span class="o">++</span> <span class="n">plural</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot; sélectionnés&quot;</span>
    <span class="o">|</span> <span class="n">otherwise</span>         <span class="ow">=</span> <span class="s">&quot;Supprimer les &quot;</span> <span class="o">++</span> <span class="n">plural</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot; sélectionnées&quot;</span>

<span class="nf">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">elem</span> <span class="n">thing</span> <span class="p">[</span> <span class="s">&quot;pig&quot;</span>
                               <span class="p">,</span> <span class="s">&quot;man&quot;</span>
                               <span class="c1">-- anything else masculine</span>
                               <span class="p">]</span>

<span class="nf">singular</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">pluralForm</span> <span class="mi">1</span> <span class="n">thing</span>
<span class="nf">plural</span>   <span class="n">thing</span> <span class="ow">=</span> <span class="n">pluralForm</span> <span class="mi">2</span> <span class="n">thing</span>

<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochon&quot;</span>
<span class="nf">pluralForm</span> <span class="mi">2</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochons&quot;</span>

<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;man&quot;</span> <span class="ow">=</span> <span class="s">&quot;homme&quot;</span>
<span class="nf">pluralForm</span> <span class="mi">2</span> <span class="s">&quot;man&quot;</span> <span class="ow">=</span> <span class="s">&quot;hommes&quot;</span>
</pre>
<p>Note that the only thing required by this system is that the functions
<tt class="docutils literal">deleteThisThing</tt> and <tt class="docutils literal">deleteSelectedThings</tt> exist. Everything else is
left to the translator, and better ways of defining any of these functions
are possible.</p>
<p>Of course, it isn't expected that a translator would be able to produce this by
himself/herself. However, once the basic logic has been set up, this syntax is
readable enough that a translator could easily add more of the same. Lines like:</p>
<pre class="code haskell literal-block">
<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochon&quot;</span>
</pre>
<p>are actually pretty readable. The lack of parentheses in Haskell function calls
is also a bonus (though, as I said earlier, exact syntax could be debated). This
is not really that much harder than editing a .po file if you are just wanting
to add more of the same.</p>
<p>Also, we've got flexibility. If we really don't care about getting the gender
right, we can just do &quot;sélectionné(e)s&quot; and be done with it.</p>
<p>Let's make it harder - we'll add <strong>case</strong>. I'll use NT Greek as an example,
because it has nouns that decline with case (and I don't know any similar modern
languages well enough). I'm going to introduce an enum for the different cases,
using <tt class="docutils literal">data</tt> for now, and for the different genders. I could also do the same
for number (&quot;Singular&quot; and &quot;Plural&quot;), but just using <tt class="docutils literal">1</tt> and <tt class="docutils literal">2</tt> seems
easier.</p>
<p>Our sentence will be &quot;You like the %s.&quot;. For this in Greek, we need to choose
the accusative singular form of the thing we pass in. We also need to pick the
word for &quot;the&quot; (the definite article) which matches the <em>gender</em> and <em>number</em> of
the noun, and it has to match the accusative case too. So, if we pass in a
masculine word, we need the singular accusative masculine definite article
(having fun yet?):</p>
<pre class="code haskell literal-block">
<span class="kr">data</span> <span class="kt">Case</span> <span class="ow">=</span> <span class="kt">Nominative</span> <span class="o">|</span> <span class="kt">Accusative</span> <span class="o">|</span> <span class="kt">Genitive</span> <span class="o">|</span> <span class="kt">Dative</span>
<span class="kr">data</span> <span class="kt">Gender</span> <span class="ow">=</span> <span class="kt">Masculine</span> <span class="o">|</span> <span class="kt">Feminine</span> <span class="o">|</span> <span class="kt">Neuter</span>

<span class="nf">youLikeTheThing</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;φιλεις &quot;</span>
                        <span class="o">++</span> <span class="n">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="p">(</span><span class="n">genderOf</span> <span class="n">thing</span><span class="p">)</span>
                        <span class="o">++</span> <span class="s">&quot; &quot;</span>
                        <span class="o">++</span> <span class="n">accusativeSingular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;.&quot;</span>

<span class="nf">accusativeSingular</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">nounForm</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="n">thing</span>

<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιου&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιω&quot;</span>

<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Nominative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλια&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Accusative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλια&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Genitive</span>   <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιων&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Dative</span>     <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιοις&quot;</span>

<span class="nf">genderOf</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="kt">Neuter</span>
<span class="nf">genderOf</span> <span class="s">&quot;man&quot;</span>  <span class="ow">=</span> <span class="kt">Masculine</span>
<span class="c1">-- etc</span>

<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;ο&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;τον&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;του&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;τω&quot;</span>

<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;το&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;το&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;του&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;τω&quot;</span>

<span class="c1">-- feminine etc</span>

<span class="c1">-- definiteArticle 2 (plurals) etc.</span>
</pre>
<p>Of course, you can easily define shorter aliases to avoid some typing here, and
there may be better ways to generate the tables, though as written above they
are pretty readable, and should be familiar to anyone who knows Greek.</p>
<p>The function <tt class="docutils literal">youLikeTheThing</tt> here is no longer very readable, although it
could be much worse. Some kind of substitution syntax/function could be used.</p>
<p>The code above actually works, BTW, and it ran the first time I tried it - the
only correction I needed to make its output correct was to add a space after the
definite article. You just need to put it in a file <tt class="docutils literal">test.hs</tt>, add the
following line:</p>
<pre class="code haskell literal-block">
<span class="nf">main</span> <span class="ow">=</span> <span class="n">putStrLn</span> <span class="o">$</span> <span class="n">youLikeTheThing</span> <span class="s">&quot;book&quot;</span>
</pre>
<p>and do:</p>
<pre class="literal-block">
$ runhaskell test.hs
</pre>
<p>There is not a type signature in sight, but you still get compile-time
guarantees. This is all a testimony to the clarity of Haskell's syntax.</p>
<p>The features of Haskell we've used are:</p>
<ul class="simple">
<li>functions</li>
<li>simple pattern matching on numbers and strings</li>
<li>guards</li>
<li><tt class="docutils literal">data</tt> statements, limited to union types of nullary constructors
i.e. effectively enumerated values. We could use a keyword <tt class="docutils literal">enum</tt> for
clarity.</li>
<li>string concatenation</li>
<li>lists</li>
<li>a few arithmetic and logical operators</li>
</ul>
<p>We haven't used recursion. I can imagine circumstances where it might be useful,
but if deemed too risky, you could add some rules that would disallow it
(e.g. by requiring that a function must not call itself directly, and may only call
functions that are defined before it in the source code, which rules out mutual
recursion). This would help to ensure termination.</p>
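<p>A check like that should be cheap to implement. Here is a rough sketch in Python,
assuming some (hypothetical) parser has already produced, for each function in source
order, its name and the set of names it calls. The function <tt class="docutils literal">check_no_recursion</tt> and
the shape of its input are my own invention for illustration:</p>
<pre class="code python literal-block">
def check_no_recursion(definitions, builtins=()):
    """Return True if every function only calls names defined before it.

    definitions: list of (name, set of called names), in source order,
    with all clauses of one function merged into a single entry.
    builtins: names that are always available (e.g. show, mod).
    """
    known = set(builtins)
    for name, calls in definitions:
        for callee in calls:
            if callee not in known:
                # A self call, a forward reference, or an unknown name.
                return False
        known.add(name)
    return True


polish = [
    ("plurals", set()),
    ("pluralForm", {"mod"}),
    ("pluralize", {"plurals", "pluralForm"}),
    ("iHaveNFiles", {"show", "pluralize"}),
]
print(check_no_recursion(polish, builtins={"show", "mod"}))
print(check_no_recursion([("loop", {"loop"})]))
</pre>
<p>Note that the examples in this post would need reordering to pass such a check,
since they define the sentence functions before the helpers those functions use.</p>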
<p>You might also want a module system, to be able to pull in some common
definitions and functions for a given language, for consistency across different
projects.</p>
<p>This whole approach has the advantage of being able to refine and special case
as much as you want. Take the sentence &quot;you like the %s&quot;: suppose that if the
thing is a human being e.g. &quot;man&quot; or &quot;woman&quot;, you need to use a completely
different verb. Then you just add a special case first:</p>
<pre class="code haskell literal-block">
<span class="nf">isAPerson</span> <span class="s">&quot;man&quot;</span>   <span class="ow">=</span> <span class="kt">True</span>
<span class="nf">isAPerson</span> <span class="s">&quot;woman&quot;</span> <span class="ow">=</span> <span class="kt">True</span>
<span class="nf">isAPerson</span> <span class="n">n</span>       <span class="ow">=</span> <span class="kt">False</span>

<span class="nf">youLikeTheThing</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isAPerson</span> <span class="n">thing</span> <span class="ow">=</span> <span class="o">...</span>
<span class="c1">-- fall through to the normal case here</span>
</pre>
<p>In the other direction, if you just don't have the time to care about any of
this, you can just use a really simple (and often wrong) formula:</p>
<pre class="code haskell literal-block">
<span class="nf">youLikeTheThing</span> <span class="n">thing</span> <span class="ow">=</span>  <span class="s">&quot;φιλεις τον &quot;</span> <span class="o">++</span> <span class="n">greek</span> <span class="n">thing</span>

<span class="nf">greek</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
</pre>
<p>Notice that the programmer of the main project does not know anything about
plural forms, gender, case etc., or put any of that into the source code. The
only thing he/she would do is call a function with all the things to be
substituted. We could have some mapping from English strings to function names,
or we could just use the function name as a string, e.g. from a Python project
we might call the function like so:</p>
<pre class="code python literal-block">
<span class="n">prompt</span> <span class="o">=</span> <span class="n">translate</span><span class="p">(</span><span class="s">&quot;doYouWantToDelete&quot;</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">)</span>
</pre>
<p>This would call the translation function <tt class="docutils literal">doYouWantToDelete</tt> with the parameters
<tt class="docutils literal">n</tt> and <tt class="docutils literal">object_name</tt>.</p>
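<p>On the Python side, <tt class="docutils literal">translate</tt> could be little more than a lookup by name. The
sketch below fakes the translation file with a dictionary of ordinary Python
functions. The <tt class="docutils literal">TRANSLATIONS</tt> registry and the <tt class="docutils literal">register</tt> decorator are assumptions
made for illustration; a real implementation would have to load and evaluate the
translation language instead:</p>
<pre class="code python literal-block">
# Hypothetical registry standing in for a loaded translation file:
# it maps function names to callables.
TRANSLATIONS = {}


def register(name):
    def decorator(func):
        TRANSLATIONS[name] = func
        return func
    return decorator


@register("iHaveNPigs")
def i_have_n_pigs(n):
    # Python rendering of the French iHaveNPigs defined earlier.
    if n == 0:
        return "Je n'ai pas de cochon"
    if n == 1:
        return "J'ai un cochon"
    return "J'ai %d cochons" % n


def translate(name, *args):
    # Look the function up by name and pass the substitutions through.
    return TRANSLATIONS[name](*args)


print(translate("iHaveNPigs", 0))
print(translate("iHaveNPigs", 12))
</pre>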
<p>As a refinement, we can provide a version which will work when the whole
localisation machinery is turned off i.e. we allow the programmer to provide
their own version of the translation function which returns the default language:</p>
<pre class="code python literal-block">
<span class="n">prompt</span> <span class="o">=</span> <span class="n">translate</span><span class="p">(</span><span class="s">&quot;doYouWantToDelete&quot;</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">,</span>
                   <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">:</span> <span class="s">&quot;Do you want to delete these </span><span class="si">%s</span><span class="s"> </span><span class="si">%s</span><span class="s">(s)&quot;</span> <span class="o">%</span>
                                          <span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">))</span>
</pre>
<p>As before, the provided function can be correct or simplistic as desired for
English.</p>
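<p>Here is a sketch of how <tt class="docutils literal">translate</tt> might handle that fallback, assuming a
hypothetical <tt class="docutils literal">TRANSLATIONS</tt> dictionary that stands in for the loaded translation
file. Treating the last argument as the fallback is just a convention I've chosen
for this sketch:</p>
<pre class="code python literal-block">
# Hypothetical registry of loaded translation functions; empty here, as it
# would be with the localisation machinery turned off.
TRANSLATIONS = {}


def translate(name, *args):
    # By convention in this sketch, the last argument is the fallback.
    *params, default = args
    func = TRANSLATIONS.get(name, default)
    return func(*params)


n, object_name = 2, "pig"
prompt = translate("doYouWantToDelete", n, object_name,
                   lambda n, object_name: "Do you want to delete these %s %s(s)" %
                                          (n, object_name))
print(prompt)
</pre>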
</div>
<div class="section" id="feedback">
<h1>Feedback</h1>
<p>There are a few questions in my mind:</p>
<ol class="arabic">
<li><p class="first">Would a solution like this work for the languages you know? What additional
features would be needed to cope with other human languages?</p>
</li>
<li><p class="first">Is this vaguely practical? Could you get translators to be able to edit code
like this? If not, and only programmers would be able to do this, are there
enough programmer-translators to make it a viable solution, at least for some
big projects?</p>
<p>I'm aware that the string concatenation gets ugly fairly quickly, and some
kind of interpolation might be needed (including the ability to call
functions within that interpolation). With that in place, I think you could
achieve a reasonable level of readability.</p>
<p>A translation tool could also have language-specific templates to quickly
insert the code for common forms.</p>
</li>
<li><p class="first">Is it possible to have a simpler language that would still be able to cope
with the examples here?</p>
<p>The examples I've come up with suggest to me that you need a full programming
language, and that attempting to start from the other direction (e.g. build
up from the current gettext approach) will produce a monstrosity.</p>
<p>gettext already does a 95% job, and we are at the point of diminishing
returns. So if we are going to try to tackle the final bit, we need to err on
the side of enough power to get <strong>all</strong> of that 5%, rather than put a lot
of effort in and discover we've only arrived at 96%.</p>
<p>You also cover the case of having a client who insists that the program
should output &quot;cet homme&quot; and not &quot;ce homme&quot; - while it might make your
translation file ugly, you've got the power to be able to do it if you want.</p>
</li>
</ol>
<p>Comments?</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Dynamic typing in a statically typed language]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/dynamic-typing-in-a-statically-typed-language" />
    <id>http://lukeplant.me.uk/blog/posts/dynamic-typing-in-a-statically-typed-language</id>
    <updated>2012-11-14T00:23:22Z</updated>
    <published>2012-11-14T00:23:22Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <summary type="html"><![CDATA[Dynamic typing in a statically typed language]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/dynamic-typing-in-a-statically-typed-language"><![CDATA[<div class="document">
<p>A recent question on programmers.stackexchange.com asked <a class="reference external" href="http://programmers.stackexchange.com/questions/167305/what-functionality-does-dynamic-typing-allow">What functionality
does dynamic typing allow?</a></p>
<p>I thought one of the best short answers to this was from <a class="reference external" href="http://programmers.stackexchange.com/users/1997/mark-ransom">Mark Ransom</a>:</p>
<blockquote>
Theoretically there's nothing you can't do in either, as long as the
languages are Turing Complete. The more interesting question to me is what's
easy or natural in one vs. the other.</blockquote>
<p>This post is about providing an example to back that up, and to respond to
people who claim that, since you can implement dynamic types in a statically
typed language, statically typed languages give you all the benefits of
dynamically typed languages.</p>
<p><strong>[Edit: to those who think I'm being a language or dynamic typing advocate or engaging in any kind of bashing, please read that last paragraph again, and note especially the use of the word 'all'.]</strong></p>
<p>Let's set up a problem. It's made up, but it illustrates the point I want to make:</p>
<blockquote>
Given a file, 'invoices.yaml', take the first document in it, extract the
'bill-to' field, and save the data in it as JSON in an output file
'address.json'. You can take it for granted that the contents of that field
can be serialised as JSON (e.g. doesn't contain dates), although that might
not be true for the rest of the document. To keep the example focussed and
simple, everything will be ASCII.</blockquote>
<p>The particular YAML file I used was taken from an example YAML document I found
on the web, and then expanded for the sake of illustration:</p>
<pre class="literal-block">
---
invoice: 34843
date   : 2001-01-23
bill-to:
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
---
invoice: 34844
date   : 2001-01-24
bill-to:
    given  : Pete
    family : Smith
    address:
        lines: |
            3 Amian Rd
        city    : Royal Oak
        state   : MI
        postal  : 48047
</pre>
<p>I'll use Python and Haskell as representatives of dynamic typing and static
typing, because I know them and many would consider them to be very good
representatives of their camps, and I'm a big fan of both languages.</p>
<p>I also think that examining any programming problem in the abstract, or with
respect to ideas like ‘dynamic typing’ or ‘static typing’, is not very relevant,
because in the real world we have to use real, concrete languages, and they come
with a whole set of properties (in terms of the language definition, tool sets,
communities and libraries) that make a massive impact on how you actually use
them.</p>
<p>So I'm going to try to use real libraries that actually exist, ignore solutions
that could theoretically exist but don't, and ignore problems that could
theoretically exist but don't.</p>
<div class="section" id="python">
<h1>Python</h1>
<p>Here is my Python solution:</p>
<pre class="code python literal-block">
<span class="kn">import</span> <span class="nn">yaml</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">yaml</span><span class="o">.</span><span class="n">load_all</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">'invoices.yaml'</span><span class="p">)))[</span><span class="mi">0</span><span class="p">][</span><span class="s">'bill-to'</span><span class="p">],</span>
          <span class="nb">open</span><span class="p">(</span><span class="s">'address.json'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">))</span>
</pre>
<p>Notes: I didn't have to consult docs once. This isn't just due to my
familiarity with Python — it's also the fact that I can fire up IPython and
go:</p>
<pre class="literal-block">
In [1]: import yaml
In [2]: yaml.&lt;TAB&gt;
</pre>
<p>and get a list of likely functions. I can then go:</p>
<pre class="literal-block">
In [3]: yaml.load_all?
</pre>
<p>and get help, or go:</p>
<pre class="literal-block">
In [4]: yaml.load_all??
</pre>
<p>and get the complete source code of the function/method/class/module, in case I
need it.</p>
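<p>Much the same exploration is possible in a plain Python session. The following is only a rough sketch of the equivalents. It uses the standard library's <tt class="docutils literal">json</tt> module as its subject so that it runs without PyYAML installed, and the helper name <tt class="docutils literal">public_names</tt> is made up for illustration:</p>

```python
import inspect
import json


def public_names(module):
    """Roughly what TAB completion lists: names without a leading underscore."""
    return sorted(name for name in dir(module) if not name.startswith('_'))


names = public_names(json)              # similar in spirit to pressing TAB
doc = inspect.getdoc(json.dump)         # similar in spirit to json.dump?
source = inspect.getsource(json.dump)   # similar in spirit to json.dump??

print(names)
print(doc.splitlines()[0])
print(source.splitlines()[0])
```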
</div>
<div class="section" id="haskell">
<h1>Haskell</h1>
<p>Now for the Haskell version. First, a disclaimer: I'm <strong>much</strong> less experienced
in Haskell than in Python. I did manage to <a class="reference external" href="http://lukeplant.me.uk/blog/posts/haskell-blog-software/">write my blog software in Haskell</a> at one point, but I
don't use Haskell on anything like a daily basis, and I do use Python that much.</p>
<p>I first need to parse YAML. I've got a choice of packages. Unlike in Python, for
a library like this, the choice you make is likely to have a big impact on the
code you write — switching to a different (perhaps faster) package won't be just
a case of changing an import, as we will see. The choice of packages reflects
the fact that even designing how this thing should work, in terms of API and
data structures, is not straightforward in Haskell, and it represents a much
bigger commitment, and therefore problem, for the library user. In Python, while there
are a few API choices (like supporting streaming or not, potentially), mostly
it's pretty obvious how the library should work.</p>
<p>Looking on Hackage, I first find the 'yaml' package. The first line of the
<a class="reference external" href="http://hackage.haskell.org/packages/archive/yaml/0.8.1/doc/html/Data-Yaml.html">Data.Yaml API docs</a>
reads:</p>
<blockquote>
A JSON value represented as a Haskell value.</blockquote>
<p>(Yes, you read that right). This doesn't look good. The whole file has stuff
about JSON, not YAML, with no indication why I want to be using JSON values, not
YAML. But I have a go anyway; perhaps it was deliberate.</p>
<p>When trying to use the decodeFile function, I get an error about needing a type
signature, due to the way <tt class="docutils literal">decodeFile</tt> is defined:</p>
<pre class="code haskell literal-block">
<span class="nf">decodeFile</span> <span class="ow">::</span> <span class="kt">FromJSON</span> <span class="n">a</span> <span class="ow">=&gt;</span> <span class="kt">FilePath</span> <span class="ow">-&gt;</span> <span class="kt">IO</span> <span class="p">(</span><span class="kt">Maybe</span> <span class="n">a</span><span class="p">)</span>
</pre>
<p>There are lots of instances of FromJSON to choose from, but I have to know in
advance the type of data. And it looks like I've got data that isn't going to
fit into any of those types, because it involves heterogeneous collections.
<em>[Correction in comments, see below]</em>.</p>
<p>I gave up and tried another package - Data.Yaml.Syck.</p>
<p>First try:</p>
<pre class="code haskell literal-block">
<span class="kr">import</span> <span class="nn">Data.Yaml.Syck</span>

<span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span>
  <span class="n">d</span> <span class="ow">&lt;-</span> <span class="n">parseYamlFile</span> <span class="s">&quot;invoices.yaml&quot;</span>
  <span class="n">print</span> <span class="n">d</span>
</pre>
<p>This works - well, I've got some kind of parsing going on, at least. It looks like
I've got some <tt class="docutils literal">YamlNode</tt> datastructure, and the top thing is an <tt class="docutils literal">EMap</tt> (it
looks like it has only parsed the first document, which is worrying, but doesn't
matter given my requirements, so I'll ignore that). But how do I get data out?</p>
<p>OK, let's try yaml-light - it wraps <tt class="docutils literal">HsSyck</tt> and has some easier utility
functions, like lookupYL:</p>
<pre class="code haskell literal-block">
<span class="nf">lookupYL</span> <span class="ow">::</span> <span class="kt">YamlLight</span> <span class="ow">-&gt;</span> <span class="kt">YamlLight</span> <span class="ow">-&gt;</span> <span class="kt">Maybe</span> <span class="kt">YamlLight</span>
</pre>
<p>That expects the lookup key to be a <tt class="docutils literal">YamlLight</tt>, so I need to create one from
a string, somehow. The docs show how to turn a <tt class="docutils literal">ByteString</tt> into a
<tt class="docutils literal">YamlLight</tt> node, and I need to pass in a <tt class="docutils literal">String</tt>, which from previous
experience requires doing something like <tt class="docutils literal">pack</tt> from <tt class="docutils literal">Data.ByteString.Char8</tt>.</p>
<p>My program so far:</p>
<pre class="code haskell literal-block">
<span class="kr">import</span> <span class="nn">Data.Yaml.YamlLight</span>
<span class="kr">import</span> <span class="nn">Data.ByteString.Char8</span> <span class="p">(</span><span class="nf">pack</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.Maybe</span>

<span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span>
  <span class="n">d</span> <span class="ow">&lt;-</span> <span class="n">parseYamlFile</span> <span class="s">&quot;invoices.yaml&quot;</span>
  <span class="n">print</span> <span class="o">$</span> <span class="n">fromJust</span> <span class="o">$</span> <span class="n">lookupYL</span> <span class="p">(</span><span class="kt">YStr</span> <span class="o">$</span> <span class="n">pack</span> <span class="s">&quot;bill-to&quot;</span><span class="p">)</span> <span class="n">d</span>
</pre>
<p>Which gives this output:</p>
<pre class="literal-block">
YMap (fromList [(YStr &quot;address&quot;,YMap (fromList [(YStr &quot;city&quot;,YStr &quot;Royal Oak&quot;),(YStr &quot;lines&quot;,YStr &quot;458 Walkman Dr.\nSuite #292\n&quot;),(YStr &quot;postal&quot;,YStr &quot;48046&quot;),(YStr &quot;state&quot;,YStr &quot;MI&quot;)])),(YStr &quot;family&quot;,YStr &quot;Dumars&quot;),(YStr &quot;given&quot;,YStr &quot;Chris&quot;)])
</pre>
<p>Now I have to dump to JSON. From a Python perspective, all I want is a function
that can take some ‘native values’ and dump them to JSON, like the Python
<tt class="docutils literal">json.dump</tt> function. But every piece of data in my data structure is wrapped
in things like <tt class="docutils literal">YStr</tt> and <tt class="docutils literal">YMap</tt>.</p>
<p>In addition, though I can see the structure of my data in front of me, the
requirements I've been given don't make guarantees that it will stay the same,
just that it can be converted to JSON. I need a routine that will convert
anything YAML to the equivalent in JSON, where that is possible.</p>
<p>It looks like I could create a <tt class="docutils literal">JSON</tt> instance for <tt class="docutils literal">YamlLight</tt>, so that the
<tt class="docutils literal">encode</tt> function I want to use (which dumps JSON to a string) could take
<tt class="docutils literal">YamlLight</tt> as an input directly.  I end up with this:</p>
<pre class="code haskell literal-block">
<span class="kr">import</span> <span class="nn">Data.Yaml.YamlLight</span> <span class="p">(</span><span class="nf">parseYamlFile</span><span class="p">,</span> <span class="nf">lookupYL</span><span class="p">,</span> <span class="kt">YamlLight</span><span class="p">(</span><span class="o">..</span><span class="p">),</span> <span class="nf">unStr</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.ByteString.Char8</span> <span class="p">(</span><span class="nf">pack</span><span class="p">,</span> <span class="nf">unpack</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Text.JSON</span> <span class="p">(</span><span class="kt">JSON</span><span class="p">(</span><span class="o">..</span><span class="p">),</span> <span class="nf">encode</span><span class="p">,</span> <span class="kt">JSValue</span><span class="p">(</span><span class="o">..</span><span class="p">),</span> <span class="nf">toJSString</span><span class="p">,</span> <span class="nf">toJSObject</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.Maybe</span> <span class="p">(</span><span class="nf">fromJust</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">Data.Map</span> <span class="p">(</span><span class="nf">toList</span><span class="p">)</span>

<span class="kr">instance</span> <span class="kt">JSON</span> <span class="kt">YamlLight</span> <span class="kr">where</span>
  <span class="n">showJSON</span> <span class="n">yml</span> <span class="ow">=</span>
    <span class="kr">case</span> <span class="n">yml</span> <span class="kr">of</span>
      <span class="kt">YStr</span> <span class="n">bs</span> <span class="ow">-&gt;</span> <span class="kt">JSString</span> <span class="o">$</span> <span class="n">toJSString</span> <span class="o">$</span> <span class="n">unpack</span> <span class="n">bs</span>
      <span class="kt">YMap</span> <span class="n">m</span> <span class="ow">-&gt;</span> <span class="kt">JSObject</span> <span class="o">$</span> <span class="n">toJSObject</span> <span class="o">$</span>
                <span class="n">map</span> <span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">y1</span><span class="p">,</span> <span class="n">y2</span><span class="p">)</span> <span class="ow">-&gt;</span> <span class="p">(</span><span class="n">unpack</span> <span class="o">$</span> <span class="n">fromJust</span> <span class="o">$</span> <span class="n">unStr</span> <span class="n">y1</span><span class="p">,</span> <span class="n">showJSON</span> <span class="n">y2</span><span class="p">))</span> <span class="o">$</span>
                <span class="n">toList</span> <span class="n">m</span>
      <span class="kt">YSeq</span> <span class="n">ymls</span> <span class="ow">-&gt;</span> <span class="kt">JSArray</span> <span class="o">$</span> <span class="n">map</span> <span class="n">showJSON</span> <span class="n">ymls</span>
      <span class="kt">YNil</span> <span class="ow">-&gt;</span> <span class="kt">JSNull</span>

<span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span>
  <span class="n">d</span> <span class="ow">&lt;-</span> <span class="n">parseYamlFile</span> <span class="s">&quot;invoices.yaml&quot;</span>
  <span class="n">writeFile</span> <span class="s">&quot;address.json&quot;</span> <span class="o">$</span> <span class="n">encode</span> <span class="o">$</span> <span class="n">fromJust</span> <span class="o">$</span> <span class="n">lookupYL</span> <span class="p">(</span><span class="kt">YStr</span> <span class="o">$</span> <span class="n">pack</span> <span class="s">&quot;bill-to&quot;</span><span class="p">)</span> <span class="n">d</span>
</pre>
<p>This works, and I'm sure there are other solutions. If I were cleverer, and knew
Haskell better, I could perhaps write a cleverer, shorter solution, which would
also be proportionately more difficult for someone else to understand, so I'm
not particularly interested in making this code shorter, as it does the job.</p>
<p>But this illustrates why some people like dynamically typed languages. The fact
that you can implement a variant data type in Haskell (such as <tt class="docutils literal">YamlLight</tt> or
<tt class="docutils literal">JSValue</tt>) doesn't mean much, because these data types are not used
everywhere, and therefore you have multiple competing ones that you've got to
convert between. If you <em>did</em> have a single variant datatype that was used
<em>everywhere</em>... you'd have a dynamically typed language, in effect.</p>
<p>The strictness of the type system gave rise to a choice of libraries and APIs
that made my life harder, not easier. I then had to write glue code to marshall
between the dynamic types used by the two libraries I needed.
<em>[Edit: or, as it turned out, I need to know where to find it, possibly in the form of already written type class instances, or how to get the compiler to write it for me]</em></p>
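<p>For readers who don't read Haskell, here is a rough Python sketch of the two situations. The <tt class="docutils literal">YStr</tt>, <tt class="docutils literal">YSeq</tt>, <tt class="docutils literal">YMap</tt> and <tt class="docutils literal">YNil</tt> classes are hypothetical stand-ins that mimic the shape of the <tt class="docutils literal">YamlLight</tt> constructors; they are not part of any real Python library. Wrapped values need a conversion routine before <tt class="docutils literal">json.dumps</tt> will accept them, whereas the native <tt class="docutils literal">dict</tt> and <tt class="docutils literal">str</tt> values that a Python YAML loader typically hands back need none:</p>

```python
import json
from dataclasses import dataclass


# Hypothetical wrapper types, mimicking the shape of YamlLight's constructors.
@dataclass(frozen=True)
class YStr:
    value: str


@dataclass(frozen=True)
class YSeq:
    items: tuple


@dataclass(frozen=True)
class YMap:
    pairs: tuple  # tuple of (YStr, node) pairs


@dataclass(frozen=True)
class YNil:
    pass


def to_native(node):
    """Glue code: unwrap a tagged tree into values json.dumps understands."""
    if isinstance(node, YStr):
        return node.value
    if isinstance(node, YSeq):
        return [to_native(item) for item in node.items]
    if isinstance(node, YMap):
        return {key.value: to_native(value) for key, value in node.pairs}
    if isinstance(node, YNil):
        return None
    raise TypeError('not a YAML node: %r' % (node,))


# Wrapped data has to go through the glue code first.
wrapped = YMap(((YStr('given'), YStr('Chris')), (YStr('family'), YStr('Dumars'))))
print(json.dumps(to_native(wrapped), sort_keys=True))

# Native data needs no glue at all.
native = {'given': 'Chris', 'family': 'Dumars'}
print(json.dumps(native, sort_keys=True))
```

<p>The <tt class="docutils literal">to_native</tt> function plays the same role here as the <tt class="docutils literal">JSON</tt> instance in the Haskell program.</p>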
<p>Some people might still prefer the Haskell version. It has some nice properties,
like the fact that the compiler has checked that it can indeed convert any YAML
object into JSON — you'd get a warning if you missed a case. One response to
that might be that if the two types didn't happen to match so well — for
instance if the YAML library started supporting date/time objects — this benefit
would disappear. If you need to avoid all possible problems up front, Haskell
will help you out more. Python, on the other hand, will allow you to avoid
spending time thinking about theoretical problems which may never happen in
reality.</p>
<p>But there are always runtime errors that you could come across, even in Haskell
— for example, if you want to convert this to cope with non-ASCII documents, the
compiler can't point out all the places you need to fix, and if you forget one
you could still get a runtime exception, or worse, silent data corruption.</p>
<p>So, in my opinion, this is a case where dynamic typing shines, and the ability
to implement dynamic typing on top of static typing simply doesn't give you the
benefits you get in a language that embraces dynamic typing to its core.</p>
<p>There are, incidentally, some interesting developments in Haskell that might
allow the possibility of running programs that aren't quite typed correctly, as
long as you don't encounter the type errors in practice. This could counter some
of the points I've raised — see <a class="reference external" href="http://channel9.msdn.com/Blogs/Charles/YOW-2011-Simon-Peyton-Jones-Closer-to-Nirvana">this interview with Simon Peyton Jones</a>, from 27:45 onwards.</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Is static type checking a redundant testing mechanism?]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/is-static-type-checking-a-redundant-testing-mechanism/" />
    <id>http://lukeplant.me.uk/blog/posts/is-static-type-checking-a-redundant-testing-mechanism/</id>
    <updated>2009-11-09T15:45:16Z</updated>
    <published>2009-11-09T15:45:16Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Is static type checking a redundant testing mechanism?]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/is-static-type-checking-a-redundant-testing-mechanism/"><![CDATA[<div class="document">
<p>As there has been discussion about <a class="reference external" href="http://blogs.msdn.com/cashto/archive/2009/03/31/it-s-ok-not-to-write-unit-tests.aspx">not writing unit tests</a> recently, I thought I'd use my recent experience in <a class="reference external" href="http://lukeplant.me.uk/blog/posts/haskell-blog-software/">finishing a non-trivial Haskell program</a> to comment on the issue of writing tests (unit tests and other automated tests) in the context of real code.</p>
<p>I'm especially prompted by this comment by Ned Batchelder that I came across a few weeks ago:</p>
<blockquote>
Since static type checking can't cover all possibilities, you will need
automated testing. Once you have automated testing, static type checking is
redundant.</blockquote>
<p>(that's in a comment on his own <a class="reference external" href="http://nedbatchelder.com/blog/200910/the_scalability_of_programming_languages.html">blog post</a>)</p>
<p>To some extent I agree with this, but I want to give some reasons why a strong
and powerful static type checker really does eliminate the need for
automated tests in some cases—that is to say, there are instances when the static type checking makes the automated tests redundant and not the other way around, and does a better job.</p>
<p>I have very few tests in my Haskell blog software.  There are significantly more in the <a class="reference external" href="http://lukeplant.me.uk/resources/ella/">Ella</a> library which I wrote alongside it, but still far from complete coverage.  While I like test driven development, and did it for some parts of this project, many times it felt like a waste of time.  In some cases it was perhaps misdirected laziness, but I'm not convinced it always was. So what are the characteristics of code that doesn't benefit from automated/unit tests?</p>
<div class="section" id="trivial-code">
<h1>Trivial code</h1>
<p>If code is extremely simple, it can actually be worse to have tests than to not
have them.</p>
<p>In defending that statement, the first thing to remember is that tests can have
bugs in them too.  Now, many bugs in the tests will be caught, as long as you
follow the rule of making sure the test fails, then writing the code, then
making sure it passes.  However, many bugs of <em>omission</em>, which are
also very common, will not be caught, i.e. when the test fails to test something it ought to.</p>
<p>Second, there is always a cost to writing tests.  So, as the probability of
making a mistake in your code tends to zero, the usefulness of tests against
that code also tends to zero—and not just to zero, it can go negative.  You
spent x minutes writing a test for something that didn't need testing, which is
lost time and money already, and you also have extra (test) code to maintain in
the future, and a longer test suite to run.</p>
<p>Third, you can write an infinite number of tests, and still have bugs.  You can
have 100% code coverage, and still have bugs. (I'll leave you to do the research
on code coverage if you don't believe me).  So, <strong>you have to stop somewhere, and therefore you need to know <em>when</em> to stop</strong>.</p>
<p>So suppose you write a utility function that is used to sanitise phone numbers
that people might enter.  It removes '-' and ' ' characters. (The result will of
course be validated separately, but we want to allow people to enter phone
numbers in a convenient way).  In Python:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">sanitise_phone_number</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">s</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&quot;-&quot;</span><span class="p">,</span><span class="s">&quot;&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="s">&quot;&quot;</span><span class="p">)</span>
</pre>
<p>The testing fanatics might stop to write a unit test, but not the rest of us,
because:</p>
<ol class="arabic simple">
<li>You would mainly be testing that the built-in string library works.</li>
<li>If you think of the ways that the function is <em>likely</em> to be wrong, the test
is just as likely to fail to catch it.  For example, the function above
might really need to strip newline chars as well, but that's not going to be
tested unless I think to write a test for that.</li>
<li>If there actually is a bug here, or the implementation gets more complex so
that it merits a test, I can cross that bridge when I come to it, and it
won't cost me extra.</li>
<li>It's more likely that I'll forget to use this function than that I get it
wrong. Therefore, an integration test would be far more useful.  But in some
cases, integration tests can be extremely expensive, both to write and to
run, especially when testing javascript based web frontends, or GUIs that
are not very testable.  I'm almost certainly going to test this code by at
least one manual integration test, and after that, do I really need to write
an automatic one?</li>
</ol>
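<p>Point 2 can be illustrated with a small sketch. The test below is the kind one might plausibly write for the function above (the test name and sample inputs are invented for illustration). It passes, yet it says nothing about newlines, because nobody thought to ask:</p>

```python
def sanitise_phone_number(s):
    return s.replace("-", "").replace(" ", "")


def test_sanitise_phone_number():
    # The cases the author of the test happened to think of.
    assert sanitise_phone_number("01234 567-890") == "01234567890"
    assert sanitise_phone_number("") == ""


test_sanitise_phone_number()  # passes

# A bug of omission: a trailing newline goes straight through, and no test notices.
leftover = sanitise_phone_number("01234 567890\n")
print(repr(leftover))
```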
<p>However, if I was writing the function in a language that was less capable than
Python, I might well write a test for the above.</p>
</div>
<div class="section" id="declarative-code">
<h1>Declarative code</h1>
<p>(You could argue that this is an extension of trivial code, but it feels slightly different, and the case is even stronger).</p>
<p>Imagine your spec says that you should have 5 news items on the front page of
your web site.  You are using a library that has utility code for getting the
first n items, or page x of n items each.  And of course you are going to use a
constant for that 5, rather than code it right in.  So somewhere you are going
to write (assuming Python):</p>
<pre class="code python literal-block">
<span class="n">NEWS_ITEMS_ON_HOME_PAGE</span> <span class="o">=</span> <span class="mi">5</span>
</pre>
<p>Are you going to write a test that ensures that this value stays at 5, and
doesn't accidentally get changed?  Then your code base violates DRY—you now have
<strong>two</strong> places where you are specifying the number of news items on the home
page.  That is, to some extent, the nature of all tests, but it's worse in this
case.  With non-declarative code and tests, one instance specifies behaviour,
the other implementation, and it's usually obvious which is correct.  But with
declarative code, if one instance is different, how do you know which is
correct?</p>
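<p>Spelled out, such a test would look something like the sketch below (the test name is made up). It can only ever fail when someone changes the constant on purpose, at which point they will simply change the test to match:</p>

```python
NEWS_ITEMS_ON_HOME_PAGE = 5


def test_news_items_on_home_page():
    # The value 5 now lives in two places: the constant and this assertion.
    assert NEWS_ITEMS_ON_HOME_PAGE == 5


test_news_items_on_home_page()
```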
<p>Or are you going to write a test for the actual home page having 5 items?  That
would be pointless, because it's just testing that you are capable of calling a
trivial API, which itself belongs to thoroughly tested code.  You might want a
sanity check that you haven't made a typo, but checking that the page returns
anything with a 200 code will often be enough.</p>
<p>What about something like a Django model?  Your spec says that a 'restaurant'
needs to have a 'name' which is a maximum of 100 chars.  You write the following
code:</p>
<pre class="code python literal-block">
<span class="k">class</span> <span class="nc">Restaurant</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="s">&quot;Name&quot;</span><span class="p">,</span> <span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
    <span class="c"># ...</span>
</pre>
<p>Are you going to write code to test that you've typed this in correctly?  It
would again be violating DRY.  Are you going to check that this interfaces with
the database correctly?  There are already hundreds of tests in Django which
cover this. Are you going to write tests that are effectively checking for
typos?  Well, if you use this model at all, it's going to be very obvious if
you've made a mistake, and some other simple integration test is going to catch
it.</p>
</div>
<div class="section" id="haskell">
<h1>Haskell</h1>
<p>Now, coming to Haskell. You can guess the point I'm going to make.</p>
<p>In Haskell, a lot of code is either trivial or declarative.</p>
<p>Further, many of the types of errors you could make are caught by the compiler.
Typos and missing imports etc. are always caught, and many other errors besides.</p>
<p>Functional programming languages, especially pure ones, eliminate many of the
kinds of mistakes that are easy to make in imperative languages.  Everything being an
expression helps a lot—it forces you to think about every branch and return a
value. In monadic code it becomes possible to avoid this, but a lot of your code
is pure functional.</p>
<div class="section" id="example-1">
<h2>Example 1</h2>
<p>Imagine a more complex function than our <tt class="docutils literal">sanitise_phone_number</tt> above.  It's
going to take a list of 'transformation' functions and an input value and apply
each function to the value in turn, returning the final value.  In some
languages, that would be just about worth writing a test for.  You might have to
worry about iterating over the list, boundary conditions, etc.  But in Haskell
it looks like this:</p>
<pre class="code haskell literal-block">
<span class="nf">apply</span> <span class="ow">=</span> <span class="n">foldl'</span> <span class="p">(</span><span class="n">flip</span> <span class="p">(</span><span class="o">$</span><span class="p">))</span>
</pre>
<p>In the above definition, there is basically nothing that can go wrong.  We
already know that <tt class="docutils literal">foldl'</tt> works, and isn't going to miss anything, or fail
with an empty list.  You can't forget to return the return value, like you can
in Python. The compiler will catch any type errors.  If the function doesn't do
anything approaching what it's supposed to then you'll know as soon as you try
to use it.  I've used point-free style, so there isn't any chance of doing
something silly with the input variables, because they don't even appear in the
function definition!</p>
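<p>For comparison, here is a rough sketch of how the same function might look in Python, together with the kind of slip mentioned above. Both function names are illustrative only. The second version forgets its <tt class="docutils literal">return</tt>, and nothing complains until a caller finds itself holding <tt class="docutils literal">None</tt>:</p>

```python
from functools import reduce


def apply(value, funcs):
    """Apply each function in funcs to value in turn, returning the result."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def apply_forgetful(value, funcs):
    """The same idea written as a loop, with the return statement forgotten."""
    for f in funcs:
        value = f(value)
    # Missing 'return value': Python quietly returns None instead.


steps = [str.strip, lambda s: s.replace('-', '')]
print(apply(' 0123-456 ', steps))
print(apply_forgetful(' 0123-456 ', steps))
```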
<p>For something like the above, you would often write your type signature first:</p>
<pre class="code haskell literal-block">
<span class="nf">apply</span> <span class="ow">::</span> <span class="n">a</span> <span class="ow">-&gt;</span> <span class="p">[</span><span class="n">a</span> <span class="ow">-&gt;</span> <span class="n">a</span><span class="p">]</span> <span class="ow">-&gt;</span> <span class="n">a</span>
</pre>
<p>Once you've done that, it's even harder to make a mistake.  It's almost possible
to try vaguely relevant code at random and see if it compiles.  For something
like this, if it compiles, and it looks very simple, it's probably
correct. (There are obviously times when that will fail you, but it's amazing
how often it doesn't.  You often feel like you just have to keep doing what the
compiler tells you and you'll get working code.)</p>
<p>Is the above code 'trivial' or 'declarative'? Well, that's a tough call. A lot of code in Haskell quickly becomes very declarative in style, especially when written point free.</p>
</div>
<div class="section" id="example-2">
<h2>Example 2</h2>
<p>But what about something much bigger—say the generation of an Atom feed?  With a
library that makes use of a strong static type system, this can be actually quite hard to get wrong.</p>
<p>In my blog software, I use the <a class="reference external" href="http://hackage.haskell.org/package/feed">feed</a> library for Atom feeds.  The code I've
had to write is extremely simple—a matter of creating some data structures
corresponding to Atom feeds.  The data structures are defined to force you to
supply all required elements.  Where there is a choice of data type, it forces
you to choose — for example the 'content' field has to be set with either
<tt class="docutils literal">HTMLContent &quot;&lt;h1&gt;your <span class="pre">content&lt;/h1&gt;&quot;</span></tt> or <tt class="docutils literal">TextContent &quot;Your content&quot;</tt>. (For those who don't know Haskell, it should also be pointed out that there is no equivalent to 'null'.  Optional values are made explicit using the <tt class="docutils literal">Maybe</tt> type).</p>
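<p>The same discipline can be imitated in Python, but only with run-time checks. The sketch below uses invented names throughout: <tt class="docutils literal">Entry</tt>, <tt class="docutils literal">TextContent</tt>, <tt class="docutils literal">HTMLContent</tt> and the field names are assumptions loosely modelled on an Atom entry, not the API of the <tt class="docutils literal">feed</tt> library. A missing required field or an untagged string is rejected, but only when that line of code actually runs, rather than at compile time:</p>

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TextContent:
    text: str


@dataclass(frozen=True)
class HTMLContent:
    html: str


@dataclass(frozen=True)
class Entry:
    # Required fields: leaving one out raises TypeError when Entry() is called.
    entry_id: str
    title: str
    updated: str
    content: Union[TextContent, HTMLContent]
    # Optional field, playing the role of a Maybe value.
    summary: Optional[str] = None

    def __post_init__(self):
        # Annotations are not enforced at run time, so check the tag by hand.
        if not isinstance(self.content, (TextContent, HTMLContent)):
            raise TypeError('content must be TextContent or HTMLContent')


entry = Entry(
    entry_id='tag:example.org,2009:1',
    title='Hello',
    updated='2009-11-09T15:45:16Z',
    content=TextContent('Your content'),
)
print(entry.summary)
```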
<p>After filling in all the values for these feeds, I wrote some very simple 'glue'
functions that fed in the data and returned the result as an HTTP response.  I
created 4 different feeds, all of which worked perfectly first time, as soon as
I got them to compile.  I cannot see any value, and only cost, in adding tests
for this.  A check for a 200 response code and non-empty content might be worth
it, but would be much easier to write as a bash script that uses 'curl' on a few
known URLs.</p>
<p>Had I written this in Python, I might have wanted tests to ensure that the HTML in the Atom feed content was escaped properly and various other things, in addition to a simple check for status 200.  But the API of the feed library, combined with the type checking that the compiler has done, has made that redundant, and has tested it far more easily and thoroughly than I could have done with tests.</p>
<p>And it's not in general true that the simple functional test will catch any type errors, because often it will only exercise one route through the code, ignoring the fact that in many places dynamically typed code can return values of different types, which can cause type failures etc.</p>
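<p>As a minimal sketch of that last point (the function names and data here are made up for illustration), consider a Python helper that returns an integer on one route and a string on another.  A single happy-path test passes, while the untested route fails with a type error only at run time:</p>

```python
def find_member_id(username, directory):
    """Return the member's id, or an error string when the name is unknown.

    Hypothetical helper: two routes through the code, two return types.
    """
    if username in directory:
        return directory[username]          # int on the happy path
    return "unknown member: " + username    # str on the other route


def next_member_id(username, directory):
    # Only correct when find_member_id returned an int.
    return find_member_id(username, directory) + 1


directory = {"alice": 41}

# A single happy-path functional test passes...
happy_path_result = next_member_id("alice", directory)

# ...while the untested route fails with a type error at run time.
try:
    next_member_id("bob", directory)
    other_route_error = None
except TypeError as exc:
    other_route_error = exc
```

<p>Here <tt class="docutils literal">happy_path_result</tt> is 42, and <tt class="docutils literal">other_route_error</tt> holds the <tt class="docutils literal">TypeError</tt> that a test of the first route alone would never see.</p>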
</div>
<div class="section" id="example-3">
<h2>Example 3</h2>
<p>One final example of reducing the need for automated tests is the routing system
I've used in <a class="reference external" href="http://lukeplant.me.uk/resources/ella/">Ella</a>.  OK, it's really a chance to show off the only slightly
clever bit of code that I wrote, but hopefully it will explain something of the
power of a strong type system :-)</p>
<p>Consider the following bits of code/configuration in a Django project, which are responsible for matching a URL, pulling out some bits from it and dispatching it to a view function.</p>
<pre class="code python literal-block">
<span class="c">### myproject/urls.py</span>

<span class="n">patterns</span> <span class="o">=</span> <span class="p">(</span><span class="s">''</span><span class="p">,</span>
   <span class="p">(</span><span class="s">r'^members/(\d+)/$'</span><span class="p">,</span> <span class="s">'myproject.views.member_detail'</span><span class="p">),</span>
   <span class="c"># etc...</span>
<span class="p">)</span>

<span class="c">### myproject/views.py</span>

<span class="k">def</span> <span class="nf">member_detail</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="n">memberid</span><span class="p">):</span>
    <span class="n">memberid</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">memberid</span><span class="p">)</span>
    <span class="n">member</span> <span class="o">=</span> <span class="n">get_member</span><span class="p">(</span><span class="n">memberid</span><span class="p">)</span>
    <span class="c"># etc...</span>
</pre>
<p>Now, there are a number of possible failure points in this code that you might
want some regression tests for.  For example, if in the future we change it so
that the URL uses a string such as a user name, rather than an integer, we will need
to change the URLconf, the line in <tt class="docutils literal">member_detail</tt> that calls <tt class="docutils literal">int</tt>, and the
definition of <tt class="docutils literal">get_member</tt> (or use a different function).</p>
<p>There is a <a class="reference external" href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a>  or OAOO failure here—the fact that we are expecting an integer is specified multiple times, either implicitly or explicitly.  This is one of the causes of fragility in this chunk of code — if one is changed, the others might not be updated, introducing bugs of different kinds.  Now, there are things you can do about this, with some small or large changes to how URLconfs work.  But they are not complete solutions, and one solution not open to Python developers is the one I coded in Ella.</p>
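<p>To make the fragility concrete, here is a small self-contained sketch.  The <tt class="docutils literal">dispatch</tt> function and the two compiled patterns are hypothetical stand-ins written for this illustration, not Django's actual URL dispatcher.  The "integer" assumption lives in three places, and updating only one of them produces a failure that shows up only when the new kind of URL is actually requested:</p>

```python
import re

# Hypothetical stand-ins for the URLconf entry and view shown above.
MEMBER_URL = re.compile(r'^members/(\d+)/$')      # place 1: digits only


def get_member(memberid):
    return {"id": memberid}


def member_detail(request, memberid):
    memberid = int(memberid)                      # place 2: converts to int
    return get_member(memberid)                   # place 3: expects an int


def dispatch(path, pattern=MEMBER_URL):
    match = pattern.match(path)
    if match is None:
        return None                               # would be a 404
    return member_detail(None, *match.groups())


integer_route = dispatch("members/42/")

# Change only the pattern so that it accepts user names, and forget the view:
USERNAME_URL = re.compile(r'^members/(\w+)/$')
try:
    dispatch("members/alice/", USERNAME_URL)
    forgotten_update_error = None
except ValueError as exc:
    forgotten_update_error = exc
```

<p>The old integer URL still works, so a regression test for it keeps passing; the <tt class="docutils literal">ValueError</tt> from the forgotten <tt class="docutils literal">int</tt> call only appears for the new user name URLs.</p>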
<p>The equivalent bits of code, with type signatures and explanations of them for
those who don't know any Haskell, would look like this in my system.</p>
<pre class="code haskell literal-block">
<span class="c1">----- MyProject/Routes.hs</span>

<span class="kr">import</span> <span class="nn">MyProject.Views</span>

<span class="nf">routes</span> <span class="ow">=</span> <span class="p">[</span>
   <span class="s">&quot;members/&quot;</span> <span class="o">&lt;+/&gt;</span> <span class="n">intParam</span>    <span class="o">//-&gt;</span>   <span class="n">memberDetail</span>  <span class="o">$</span> <span class="kt">[]</span>
   <span class="c1">-- etc...</span>
<span class="p">]</span>

<span class="c1">----- MyProject/Views.hs</span>

<span class="c1">-- memberDetail takes an 'Int' and an HTTP 'Request' object, and returns an</span>
<span class="c1">--  HTTP 'Response' (or 'Nothing' to indicate a 404), doing some IO on the</span>
<span class="c1">--  way.</span>
<span class="nf">memberDetail</span> <span class="ow">::</span> <span class="kt">Int</span> <span class="ow">-&gt;</span> <span class="kt">Request</span> <span class="ow">-&gt;</span> <span class="kt">IO</span> <span class="p">(</span><span class="kt">Maybe</span> <span class="kt">Response</span><span class="p">)</span>
<span class="nf">memberDetail</span> <span class="n">memberId</span> <span class="n">request</span> <span class="ow">=</span> <span class="kr">do</span>
   <span class="n">member</span> <span class="ow">&lt;-</span> <span class="n">getMember</span> <span class="n">memberId</span>
   <span class="c1">-- etc...</span>
</pre>
<p>You should read <tt class="docutils literal"><span class="pre">&lt;+/&gt;</span></tt> as ‘followed by’ and <tt class="docutils literal"><span class="pre">//-&gt;</span></tt> as ‘routes to’.  Just
ignore the <tt class="docutils literal">$ []</tt> bit for now (it exists to allow decorators to be applied
easily in the routing configuration, but we are applying no decorators, hence the empty list).</p>
<p><tt class="docutils literal">intParam</tt> is a ‘matcher’: it attempts to pull off the next chunk of the URL
(ending in a '/'), match it and parse it as an integer.  If it can do so, it
passes the parsed value on to <tt class="docutils literal">memberDetail</tt> as a parameter i.e. it partially
applies <tt class="docutils literal">memberDetail</tt> with an integer.</p>
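<p>For readers more at home in Python, the behaviour of a matcher can be sketched roughly as follows.  This is only an analogue: <tt class="docutils literal">int_param</tt> and <tt class="docutils literal">member_detail</tt> are hypothetical Python names, not Ella's API, and nothing here is checked by a compiler.</p>

```python
import re
from functools import partial


def int_param(view, remaining_path):
    """Pull off the next chunk (ending in '/') and parse it as an integer.

    Returns (partially applied view, rest of path), or None if no match.
    """
    chunk, slash, rest = remaining_path.partition("/")
    if not slash or re.fullmatch(r"[0-9]+", chunk) is None:
        return None
    return partial(view, int(chunk)), rest


def member_detail(member_id, request):
    return "member %d for %s" % (member_id, request)


matched = int_param(member_detail, "123/")
view_with_id, rest = matched
response = view_with_id("GET")          # "member 123 for GET"

no_match = int_param(member_detail, "alice/")   # None: not an integer
```

<p>The difference in the Haskell version is that the types of the matcher and the view have to line up before the program will compile at all.</p>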
<p>The beauty of this system is that nothing can go wrong any more.  We still have DRY violations at the moment, but they don't cause a problem, because the
compiler checks for consistency.</p>
<p>In fact, we can even remove the DRY violation.  We could change the code like
this:</p>
<pre class="code haskell literal-block">
<span class="c1">----- MyProject/Routes.hs</span>

<span class="kr">import</span> <span class="nn">MyProject.Views</span>

<span class="nf">routes</span> <span class="ow">=</span> <span class="p">[</span>
   <span class="s">&quot;members/&quot;</span> <span class="o">&lt;+/&gt;</span> <span class="n">anyParam</span>    <span class="o">//-&gt;</span>   <span class="n">memberDetail</span>  <span class="o">$</span> <span class="kt">[]</span>
   <span class="c1">-- etc...</span>
<span class="p">]</span>

<span class="c1">----- MyProject/Views.hs</span>

<span class="nf">memberDetail</span> <span class="n">memberId</span> <span class="n">request</span> <span class="ow">=</span> <span class="kr">do</span>
   <span class="n">member</span> <span class="ow">&lt;-</span> <span class="n">getMember</span> <span class="n">memberId</span>
   <span class="c1">-- etc...</span>
</pre>
<p>We've replaced <tt class="docutils literal">intParam</tt> with <tt class="docutils literal">anyParam</tt>, which is a polymorphic version
that can match any parameter of type class <tt class="docutils literal">Param</tt>.  You can define your own
<tt class="docutils literal">Param</tt> instances, so this is completely extensible (and you can also define
your own matchers, for complete power).  We've also removed the type signature
from <tt class="docutils literal">memberDetail</tt>.  So how can <tt class="docutils literal">anyParam</tt> know what type of thing to
match?</p>
<p>This is where type inference comes in.  The function <tt class="docutils literal">getMember</tt> will probably
have a type signature, or it will use its parameter in such a way that its type
signature can be inferred.  From that, the type of <tt class="docutils literal">memberId</tt> can be inferred.
From that, the type of value that <tt class="docutils literal">anyParam</tt> must return can be inferred.  And
from that, finally, the instance of <tt class="docutils literal">Param</tt> can be chosen.  <strong>The compiler is
using the type system to pick which method should be used to match and parse the
URL parameters based on how those parameters are eventually used.</strong></p>
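<p>Python cannot do this at compile time, but a rough run-time analogue may help to show the idea.  In the sketch below, a hypothetical <tt class="docutils literal">any_param</tt> looks at the explicit type annotation on the view's first parameter and picks a parser from a registry that plays the role of the <tt class="docutils literal">Param</tt> instances.  All of the names are assumptions made for this illustration, and unlike the Haskell version it needs the annotation to be written out and only fails when the code runs:</p>

```python
import inspect


def parse_int(chunk):
    return int(chunk) if chunk.isascii() and chunk.isdigit() else None


def parse_str(chunk):
    return chunk or None


# Hypothetical registry playing the role of 'Param' instances:
# one parser per parameter type.
PARAM_PARSERS = {int: parse_int, str: parse_str}


def any_param(view, chunk):
    """Parse 'chunk' using the parser for the view's first parameter type."""
    first_param = next(iter(inspect.signature(view).parameters.values()))
    parser = PARAM_PARSERS[first_param.annotation]
    value = parser(chunk)
    if value is None:
        return None
    return lambda request: view(value, request)


def member_by_id(member_id: int, request):
    return ("by id", member_id)


def member_by_name(username: str, request):
    return ("by name", username)


by_id = any_param(member_by_id, "42")("GET")
by_name = any_param(member_by_name, "alice")("GET")
rejected = any_param(member_by_id, "alice")     # None: not an integer
```

<p>The same <tt class="docutils literal">any_param</tt> parses "42" as an integer for one view and passes "alice" through as a string for the other, driven only by how each view declares its parameter.</p>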
<p>This is very nice. (At least I think so :-).  We've removed the DRY violation,
or, if we choose to use type signatures or explicitly specify types in
<tt class="docutils literal">routes</tt>, DRY violations don't matter because the compiler will catch them for
us.</p>
<p>Would unit or functional tests have caught any problems?  Well, they might. If
they checked the happy case, they will prove whether that still works.  But
they're unlikely to check whether the URLconf is too permissive or not.  But the
compiler <em>can</em> do that kind of consistency check.</p>
<p>The end result is that there are just fewer things that can possibly go wrong.
I'm not saying that you wouldn't bother to write any tests.  But in this case,
if <tt class="docutils literal">memberDetail</tt> was really just glue, you might decide to only test its
component parts (for example, by testing the template that it relies on).  Since
most of the glue has been constructed so that it can't go wrong, you can focus
tests on what <em>can</em> go wrong.  And some sections of the code sink below the threshold at which tests provide positive value.</p>
<p>There are many other ways in which static type checking can make automated tests
redundant.  Parsers are a great example — a spec might define a syntax in BNF
notation.  In Haskell, you might well implement that using parsec.  But if you
look at the code, it will have pretty much a one-to-one correspondence with the
BNF definitions.  Any tests you write will simply check that a few examples
happen to be parsed correctly, as you cannot begin to cover the input space.
It's therefore far better to spend your time manually checking that the code
matches the BNF spec than writing lots of tests.  Unit tests often will not catch the type of errors that a compiler can if there is any polymorphism in the code paths.</p>
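<p>The one-to-one correspondence is easiest to see with a toy grammar.  The sketch below is a hand-written recursive descent parser in Python rather than parsec, and the two-rule grammar is invented for the example, but each function mirrors exactly one rule, which is what makes reading the code against the spec a reasonable way to check it:</p>

```python
# Hypothetical grammar, written in BNF-like notation:
#   expr   ::= number | number "+" expr
#   number ::= digit | digit number


def parse_number(text, pos):
    """number ::= digit | digit number"""
    end = pos
    while end != len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError("expected a digit at position %d" % pos)
    return int(text[pos:end]), end


def parse_expr(text, pos=0):
    """expr ::= number | number "+" expr"""
    value, pos = parse_number(text, pos)
    if pos != len(text) and text[pos] == "+":
        rest, pos = parse_expr(text, pos + 1)
        return value + rest, pos
    return value, pos


# Returns the evaluated sum and the number of characters consumed.
total, consumed = parse_expr("12+30+4")
```

<p>A handful of examples like <tt class="docutils literal">&quot;12+30+4&quot;</tt> can confirm that the happy case works, but they say very little about the rest of the input space.</p>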
</div>
</div>
<div class="section" id="conclusion">
<h1>Conclusion</h1>
<p>Before you flame me, don't think that I'm attacking other languages.  This
experience with Haskell has actually proved to me that Python is still easily my
favourite language for web development, especially in combination with
Django. (I could do a follow up on why that is—I have a growing list of things I
dislike about Haskell, some of which are fixable).  But I often hear the Python
crowd saying things about static typing and testing that come from ignorance,
and the way you would imagine things to be (often based on experience of
Java/C++/C#), and not from experience of something like Haskell.</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Haskell blog software]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/haskell-blog-software/" />
    <id>http://lukeplant.me.uk/blog/posts/haskell-blog-software/</id>
    <updated>2009-11-07T15:57:15Z</updated>
    <published>2009-11-07T15:57:15Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <summary type="html"><![CDATA[Haskell blog software]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/haskell-blog-software/"><![CDATA[<div class="document">
<p>I finally finished the Haskell blog project that I've been doing for <a class="reference external" href="http://lukeplant.me.uk/blog/posts/haskell-blog-rewrite-session-7/">a</a> <a class="reference external" href="http://lukeplant.me.uk/blog/posts/ella/">long</a> <a class="reference external" href="http://lukeplant.me.uk/blog/posts/rewriting-my-blog-in-haskell/">time</a>!  You're looking at it now (unless you are reading this a few months/years after I wrote it, in which case I will probably have <em>again</em> re-implemented my blog software in my new <em>language-du-jour...</em>) <em>[EDIT: I switched to blogofile in June 2012]</em></p>
<p>The <a class="reference external" href="http://bitbucket.org/spookylukey/haskellblog/">blog software</a> itself is not particularly interesting: fairly standard features, Atom feeds etc.  It uses <a class="reference external" href="http://software.complete.org/software/projects/show/hdbc-sqlite3">HDBC Sqlite</a> for storage, and <a class="reference external" href="http://www.haskell.org/haskellwiki/HStringTemplate">HStringTemplate</a> for rendering (a nice library, BTW).  For framework stuff, it uses my own <a class="reference external" href="http://lukeplant.me.uk/resources/ella/">Ella</a> library. I didn't find a forms/validation library I could use, and ended up just using a few ad hoc bits and pieces.  I've used the lovely <a class="reference external" href="http://johnmacfarlane.net/pandoc/">pandoc</a> to allow reStructuredText both for my own posts and for comments, which is a nice feature IMO.</p>
<p>The main interest for me has been the learning process.  You get a much better, rounded understanding of a language from a project like this than you do from the small code samples that people knock around.</p>
<p>The project nearly failed at the last hurdle.  Everything was working, but when I uploaded to my server, it failed on some URLs. I realised it was a memory problem — the CGI program must have been killed for using too much memory.</p>
<p>At first, I thought the limits on the server must be unreasonably small.  Understanding the output of <tt class="docutils literal">+RTS <span class="pre">-s</span> <span class="pre">-RTS</span></tt> is kind of difficult. When I eventually found out that <a class="reference external" href="http://hackage.haskell.org/trac/ghc/ticket/698">GHC-compiled programs never release any memory back to the operating system</a>, I realised that it was the <em>first</em> figure (the total amount of memory allocated in the heap) that was killing me.  On the bigger pages, this was over 160 MB.  At that point I stopped complaining to my web host!</p>
<p>By changing to <tt class="docutils literal">ByteString</tt> instead of <tt class="docutils literal">Data.Text</tt> for <tt class="docutils literal">StringTemplate</tt>, and using <tt class="docutils literal">ByteString</tt> in a few other places, I achieved a 4-5 fold reduction in memory usage, along with a significant speed-up.  Most pages now only use about 10-15 MB to render, which is OK for a short-running process I think.  It's not ideal, especially when an additional 1 kB comment on a page seems to require at least 300 kB extra memory to render, but it's good enough for now.  Profiling further will be very hard, as I suspect it will mainly be to do with the guts of HStringTemplate.</p>
<p>I'll be blogging about the experience of developing this over the next few days/weeks, and what I've learnt.  It's certainly been enjoyable overall, although it's definitely had its <a class="reference external" href="http://lukeplant.me.uk/blog/posts/building-ghc-is-fun/">pain</a> <a class="reference external" href="http://lukeplant.me.uk/blog/posts/haskell-blog-rewrite-session-6/">points</a> too!</p>
<p>I've put redirection in for all the old, crufty URLs, so there shouldn't be any broken links.  Feed readers will likely be confused, sorry!</p>
<p>If you have problems getting through my spam protection, please let me know.  It enforces a 10 second wait before it accepts submissions, which serves to prevent thoughtless comments as well as spam :-)</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Building GHC is fun...]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/building-ghc-is-fun/" />
    <id>http://lukeplant.me.uk/blog/posts/building-ghc-is-fun/</id>
    <updated>2009-11-04T11:09:41Z</updated>
    <published>2009-11-04T11:09:41Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <summary type="html"><![CDATA[Building GHC is fun...]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/building-ghc-is-fun/"><![CDATA[<div class="document">
<p>So, I rewrote my blog software in Haskell, for kicks.  I've finally finished,
after a long time developing, trying out different ideas, learning Haskell etc.</p>
<p>I had already confirmed that I could build a binary for my target machine.  That
was a <a class="reference external" href="http://lukeplant.me.uk/blog/posts/haskell-blog-rewrite-session-7/">long process</a>, which involved installing GHC 6.4 from binaries, and using that to build GHC 6.8.3.  I have to build from source because of <a class="reference external" href="http://hackage.haskell.org/trac/ghc/ticket/2211">bug #2211</a>.</p>
<p>However, in the process of developing, things have moved on, and it was much
easier to develop with GHC 6.10 and newer libraries than the 6.8.* series.
Which means that I now need GHC 6.10.* on the VM that I'm using to build
binaries.</p>
<p>I tried 6.10.4, but due to <a class="reference external" href="http://hackage.haskell.org/trac/ghc/ticket/3179">bug #3179</a>, I found I had to downgrade
to 6.10.1.</p>
<p>Trying to build that, however, produced <a class="reference external" href="http://hackage.haskell.org/trac/ghc/ticket/3639">bug #3639</a> — it won't build with GHC
6.10.4.  I switched to using my GHC 6.8.3 install to try to build it, but it still
isn't happy:</p>
<pre class="literal-block">
Configuring ghc-6.10.1...
cabal-bin: At least the following dependencies are missing:
Cabal -any,
base &lt;3,
filepath &gt;=1 &amp;&amp; &lt;1.2,
haskell98 -any,
hpc -any,
template-haskell -any,
unix -any
make[2]: *** [boot.stage.2] Error 1
make[2]: Leaving directory `/home/build/build/ghc-6.10.1/compiler'
make[1]: *** [stage2] Error 2
make[1]: Leaving directory `/home/build/build/ghc-6.10.1'
make: *** [bootstrap2] Error 2
</pre>
<p>Now, GHC 6.8.3 comes with base = 3.0.2.0, which might be the problem here. If
that's right, then you can't build GHC 6.10.1 with 6.8.3.  So, it sounds like
I'm going to have to build GHC 6.6.1 in order to build 6.10.1.</p>
<p>This seems pretty crazy! It wouldn't be so bad if GHC was quick to build, but
every build takes many hours.</p>
<p>Anyway, here goes, wish me luck!</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Haskell string support]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/haskell-string-support/" />
    <id>http://lukeplant.me.uk/blog/posts/haskell-string-support/</id>
    <updated>2009-08-03T21:58:59Z</updated>
    <published>2009-08-03T21:58:59Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <summary type="html"><![CDATA[Haskell string support]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/haskell-string-support/"><![CDATA[<div class="document">
<p>This is my suggestion about what needs to go into the <a class="reference external" href="http://hackage.haskell.org/platform/contents.html">Haskell Platform</a>.</p>
<p>Consider the following extremely simple program:</p>
<pre class="code haskell literal-block">
<span class="nf">s</span> <span class="ow">=</span> <span class="s">&quot;λ&quot;</span>

<span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span>
    <span class="n">writeFile</span> <span class="s">&quot;test.txt&quot;</span> <span class="n">s</span>
    <span class="n">s2</span> <span class="ow">&lt;-</span> <span class="n">readFile</span> <span class="s">&quot;test.txt&quot;</span>
    <span class="n">print</span> <span class="p">(</span><span class="n">s</span> <span class="o">==</span> <span class="n">s2</span><span class="p">)</span>
</pre>
<p>No prizes for guessing that the output of this program is not &quot;True&quot;. It
highlights an essential problem with the Haskell standard library — many of
the functions provided by the Prelude, <a class="reference external" href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html">System.IO</a>, System.Posix and many
others are completely broken (by design) and silently corrupt your data,
unless it is composed only of ASCII characters.</p>
<p>The problem is that these APIs use Strings for operating system calls (such
as reading/writing files, reading environment variables etc). A String is a
list of Unicode Chars, but none of the operating system calls have a clue
what Unicode characters are; they work entirely with bytes, which are a
completely different kind of thing. Result: your program breaks without
warning if you don't happen to be using ASCII.</p>
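<p>The distinction between characters and bytes is easy to demonstrate in a language that keeps the two apart.  The following is a small Python 3 illustration, offered as an analogy rather than as a statement about any Haskell API: the same one-character string becomes different byte sequences under different encodings, and reading the bytes back with the wrong encoding silently gives different text.</p>

```python
text = "\u03bb"  # the single character λ

# One Unicode character...
char_count = len(text)

# ...but the operating system stores bytes, and which bytes depends on
# the encoding that was chosen.
utf8_bytes = text.encode("utf-8")        # b'\xce\xbb'
utf16_bytes = text.encode("utf-16-le")   # b'\xbb\x03'

# Decoding with the encoding that was used to write round-trips correctly.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding raises no error but yields other text.
misread = utf8_bytes.decode("latin-1")
```

<p>Here <tt class="docutils literal">round_trip</tt> equals the original string, while <tt class="docutils literal">misread</tt> is a two-character string; no error is raised in either case, which is exactly why this kind of corruption goes unnoticed.</p>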
<p>And even worse, many libraries are built on the use of Strings and standard
library functions, and they inherit these same problems, so as a user of
those libraries, you can end up with problems that you can't even work
around. For the library developer, too, it can be a very nasty problem — you
start developing code using Strings, which works fine for ages, but a long
time later you realise you can't support just ASCII, and really you need
Data.ByteString, which requires changing function signatures or duplicating
existing code if you don't want to break compatibility.</p>
<p>This is a rather embarrassing situation for the standard library of a modern
language. What's worse is that even if you include the Haskell Platform as it
currently stands, as far as I can see there is no solution to this bug — no
correct way to simply write a string out to disk and read it back! I presume
this is because there is no universally accepted library for dealing with
encodings. Personally, I'd like to see the standard library change to remove
the pretence that you can talk Unicode to the operating system, but at the
very least we need a standardised way of doing the right thing, so that
developers (of both programs and libraries) don't have to use those broken
functions, and know what the correct alternatives are.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[GHC bug?]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/ghc-bug/" />
    <id>http://lukeplant.me.uk/blog/posts/ghc-bug/</id>
    <updated>2009-07-04T17:38:44Z</updated>
    <published>2009-07-04T17:38:44Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <summary type="html"><![CDATA[GHC bug?]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/ghc-bug/"><![CDATA[
<div class="document">


<p>What do you do when you are dealing with what seems like a bizarre compiler bug,
with the compiler being nothing less than GHC? First, pinch yourself — check;
then try again, 3 times to be sure — check; clear out 'dist/' and any temporary
build files — check; sleep on it — check.</p>
<p>And it's still happening.</p>
<p>I'm trying to use <a class="reference external" href="http://haskell.org/haskellwiki/HStringTemplate">HStringTemplate</a> for my personal blog
software, in particular the <tt class="docutils literal"><span class="pre">renderf</span></tt> function.  I was getting tricky
compilation errors, and in the course of messing around I found the following:</p>
<p>GHC cannot compile a certain function, call it <tt class="docutils literal"><span class="pre">func1</span></tt> for now, which uses
<tt class="docutils literal"><span class="pre">renderf</span></tt>.  But it compiles and works just fine if another function <tt class="docutils literal"><span class="pre">func2</span></tt>
(which doesn't use <tt class="docutils literal"><span class="pre">renderf</span></tt>, but does use a related HStringTemplate function
<tt class="docutils literal"><span class="pre">render</span></tt>) is present in the module, even though <tt class="docutils literal"><span class="pre">func2</span></tt> is not used
<em>anywhere</em> in the project.  Changing some of the details of what <tt class="docutils literal"><span class="pre">func2</span></tt> does
causes compilation to fail again, though other details can be changed.</p>
<p>That has to be impossible, right?  Am I losing my mind?</p>
<p>Ideally I'd create a nice simple test case, but that might take hours, and
changing small things about the voodoo function <tt class="docutils literal"><span class="pre">func2</span></tt> seems to destroy its
magical properties, and I'm suspecting the problem is in me.  So I'll just
post all my code.  The bad news is there are <em>lots</em> of dependencies.  The good
news is I have used cabal, so the following instructions should suffice if you
have cabal installed.</p>
<p>Download and install 'ella' (CGI web framework I'm writing) and dependencies:</p>
<pre class="literal-block">
hg clone -r 30aa625a3f51 http://bitbucket.org/spookylukey/ella/
cd ella/
cabal configure --user &amp;&amp; cabal build &amp;&amp; cabal install --user
cd ..
</pre>
<p>(You can <a class="reference external" href="http://bitbucket.org/spookylukey/ella/get/30aa625a3f51.gz">download a tarball</a> if you don't have mercurial)</p>
<p>Download and build the blog software:</p>
<pre class="literal-block">
hg clone -r e1015168e56b http://bitbucket.org/spookylukey/haskellblog/
cd haskellblog/
cabal configure --user &amp;&amp; cabal build
</pre>
<p>(Again, <a class="reference external" href="http://bitbucket.org/spookylukey/haskellblog/get/e1015168e56b.gz">download the tarball</a> if you don't have mercurial)</p>
<p>The build should succeed.  Now, the voodoo function is at the end of
<tt class="docutils literal"><span class="pre">src/Blog/Views.hs</span></tt>.  Comment it out:</p>
<pre class="literal-block">
perl -pi -e 's/this_is_not_used/-- this_is_not_used/' src/Blog/Views.hs
</pre>
<p>Build again:</p>
<pre class="literal-block">
cabal build
</pre>
<p>Result - <a class="reference external" href="http://hpaste.org/fastcgi/hpaste.fcgi/view?id=6521">this compilation error</a>.</p>
<p>I don't know whether that compilation error is correct or not, but either way,
it seems crazy that it could depend on the existence and implementation of a
completely unused function.</p>
<p>For reference, I'm using GHC 6.10.1.</p>
<p>Any ideas?</p>
</div>

]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Haskell regex problem - help needed!]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/haskell-regex-problem-help-needed/" />
    <id>http://lukeplant.me.uk/blog/posts/haskell-regex-problem-help-needed/</id>
    <updated>2008-11-21T19:14:32Z</updated>
    <published>2008-11-21T19:14:32Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <summary type="html"><![CDATA[Haskell regex problem - help needed!]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/haskell-regex-problem-help-needed/"><![CDATA[
<p>I need some help! I did someone a good deed on a blog the other day, so I'm swallowing my pride and asking for a random kind deed from someone who knows something.</p>

<p>While trying to compile <a href="http://hpaste.org/697#a2">this snippet</a>, I get <a href="http://hpaste.org/9295">this compilation error</a> - a complaint about a missing instance.</p>

<p>This occurs with:</p>
<ul>
  <li>GHC 6.8.2</li>
  <li>Various packages installed system wide (ubuntu 8.04 packages) and locally (using cabal --user --prefix=$HOME/local)</li>
  <li>bytestring-0.9.0.2 or anything later</li>
</ul>

<p>If I revert to the bytestring that comes with my system, 0.9.0.1, the
error goes away.  Having finally looked at the differences between 0.9.0.1 and 0.9.0.2, which are tiny and do not touch the definitions of any typeclass instances, it seems clear that the bytestring version isn't really the problem and that something else very funny is going on.  But I do not have the first clue what.</p>

<p>I was just coping with it by sticking with bytestring-0.9.0.1, but I won't be able to do that forever...</p>

<p>Do I have to rebuild all the packages in my system or something evil?  Any ideas?</p>

<p>Thanks in advance!</p>

]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Ella]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/ella/" />
    <id>http://lukeplant.me.uk/blog/posts/ella/</id>
    <updated>2008-11-04T18:37:56Z</updated>
    <published>2008-11-04T18:37:56Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Ella]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/ella/"><![CDATA[
<p>I have been continuing very slowly with my <a href="http://hg.lukeplant.me.uk/haskellblog/luke/">Haskell blog</a>.  Yesterday I properly pulled out the <a href="http://www.djangoproject.com/">Django</a>-inspired framework I am writing alongside it, and called it <a href="http://lukeplant.me.uk/resources/ella/">Ella</a>, after another <a href="http://en.wikipedia.org/wiki/Ella_Fitzgerald">jazz genius</a> (though a vocalist -- I much prefer vocal jazz).</p>

<p>There were a number of reasons I didn't like the existing <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/cgi">Haskell CGI</a> package, but one of the biggest was the lack of explicit request and response objects.  Instead of that it did everything inside a CGI monad, which makes it impossible to do things like reusable pre-processing of the request and post-processing of the response, both of which I will want to be able to do.  I wanted something much more in the style of Django, with explicit request and response objects, something, ironically, much more <i>functional</i> instead of <i>imperative</i> -- it is a surprise that I got this from a Python web framework.  There were also things about the CGI API itself I really didn't like (e.g. didn't differentiate between GET and POST inputs).  Plus, I wanted to have a go at some real Haskell, so I rolled my own.</p>

<p>It is very early days at the moment, and many big things are missing from the API (like proper access to GET and POST parameters, handlers for file uploads, any kind of HTML helpers for form handling etc).  I realised that the API for all of that should only be implemented as I needed it in my actual software, otherwise I would just get it wrong.  What is implemented so far is a <a href="http://lukeplant.me.uk/resources/ella/docs/ella/Ella-Framework.html#3">strongly-typed routing mechanism</a>, and not much more.  It is enough, though, to implement a useful app -- I wrote a very simple 80-line script that handles (via clickable URLs in emails) subscription to a personal mailing list I'm organising for myself.  It also acts as an example at the moment -- see <a href="http://hg.lukeplant.me.uk/mailinglistconfirm/luke/file/16f955d2ee50/src/ConfirmCgi.hs">ConfirmCgi.hs</a>.</p>

<p>Currently there is no home page, though there is some <a href="http://lukeplant.me.uk/resources/ella/docs/ella/">good documentation</a>, and you can get <a href="http://hg.lukeplant.me.uk/ella/luke/">the source</a>.  All of the API is subject to change at any time, but I think what I've done so far is a reasonable basis.</p>

]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Haskell API docs suck. A lot.]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/haskell-api-docs-suck-a-lot/" />
    <id>http://lukeplant.me.uk/blog/posts/haskell-api-docs-suck-a-lot/</id>
    <updated>2008-08-04T20:40:16Z</updated>
    <published>2008-08-04T20:40:16Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <summary type="html"><![CDATA[Haskell API docs suck. A lot.]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/haskell-api-docs-suck-a-lot/"><![CDATA[
<P>
Haskell API documentation is sorely lacking for newbies.  For instance, I want to understand how to create and use regexes.  If you start at <A class="ext-link" href="http://hackage.haskell.org/packages/archive/regex-posix/0.93.1/doc/html/Text-Regex-Posix.html"><SPAN class="icon">Text.Regex.Posix documentation</SPAN></A>, it tells you that <TT>=~</TT> and <TT>=~~</TT> are the high-level API, and the hyperlinks for those functions go to Text.Regex.Posix.Wrap, where the main functions are not actually documented at all!
</P>
<P>
So we look at the type signatures -- here is the first:
</P>
<PRE class="wiki">(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) =&gt; source1 -&gt; source -&gt; target
</PRE><P>
So, that leads me to the class declarations for these things.  But trying to understand them is rather intimidating:
</P>
<PRE class="wiki">class RegexOptions regex compOpt execOpt | regex -&gt; compOpt execOpt, compOpt -&gt; regex execOpt, execOpt -&gt; regex compOpt where
</PRE><P>
Or how about this?
</P>
<PRE class="wiki">class RegexOptions regex compOpt execOpt =&gt; RegexMaker regex compOpt execOpt source | regex -&gt; compOpt execOpt, compOpt -&gt; regex execOpt, execOpt -&gt; regex compOpt where
</PRE><P>
They are using multi-parameter type classes and functional dependencies.  Having read bits of Haskell for a while, I happen to know what they are (vaguely), but I don't really understand them, nor does the above really give me any clue as to how to actually use this API.
</P>
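<P>
For readers who have not met these extensions, a toy example may help.  It is not taken from the regex library: the class <TT>Collection</TT> and its methods are made up purely for illustration.  The part after the <TT>|</TT> is the functional dependency, and here it says that the collection type determines the element type.
</P>
<PRE class="wiki">{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- A made-up two-parameter class: 'coll' is a collection type, 'e' its element type.
-- The dependency "coll -&gt; e" tells the compiler that knowing 'coll' fixes 'e'.
class Collection coll e | coll -&gt; e where
  insert :: e -&gt; coll -&gt; coll
  toL    :: coll -&gt; [e]

-- Lists of 'a' are collections whose elements are 'a'.
instance Collection [a] a where
  insert = (:)
  toL    = id

main :: IO ()
main = putStrLn (toL (insert 'a' "bc"))  -- prints: abc
</PRE>
<P>
Without the dependency, the compiler could not work out the element type from the collection type alone, and calls like the one in <TT>main</TT> would need extra annotations.  The regex classes above use the same mechanism, just with more parameters.
</P>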
<P>
Google to the rescue.  (This is bad: I shouldn't have to google for documentation when I'm already looking at the obvious place for something to be documented.)  The first result for "haskell regex" is a completely useless and hopelessly out-of-date page, but there is a <A class="ext-link" href="http://www.serpentine.com/blog/2007/02/27/a-haskell-regular-expression-tutorial/"><SPAN class="icon">Haskell regex tutorial</SPAN></A> on a blog that shows us how to do it, and it is astonishingly simple:
</P>
<PRE class="wiki">&gt; "bar" =~ "(foo|bar)" :: String
"bar"
</PRE><P>
So what is going on?  It looks like the library has been designed extremely cleverly so that in the simple case (regex with default options etc), you can use it very easily, but you don't need to use different functions if you want to add regex options.  Furthermore, it is polymorphic in its return type, so we can also do this:
</P>
<PRE class="wiki">&gt; "bar" =~ "(foo|bar)" :: Bool
True
</PRE><P>
In fact you can get lists of matches, or lists of match offsets etc -- almost anything you can think of, just by specifying (directly or using type inference) the type of the result you want.  This is beautifully elegant and clever and I'm sure it gave the designer a warm fuzzy feeling inside (well, it gives me one, and I'm just looking at it).  The downside is that if you try to use <TT>=~</TT> at a GHCi prompt without a type annotation, you just get a ridiculously unhelpful error.
</P>
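<P>
As a rough sketch of what that looks like in a source file rather than at the prompt, the program below asks for several different result types from the same match, and then compiles a pattern with a non-default option.  The sample strings are made up, and using <TT>makeRegexOpts</TT> together with <TT>match</TT> is just one way I believe options can be passed, not necessarily the route the library authors intended.
</P>
<PRE class="wiki">import Data.Bits ((.|.))
import Text.Regex.Posix

-- The same subject and pattern throughout; only the requested result type changes.
subject, pat :: String
subject = "foo bar baz"
pat     = "ba[rz]"

main :: IO ()
main = do
  print (subject =~ pat :: Bool)                      -- True: is there any match?
  print (subject =~ pat :: String)                    -- "bar": text of the first match
  print (subject =~ pat :: (String, String, String))  -- ("foo ","bar"," baz"): before, match, after
  print (subject =~ pat :: [[String]])                -- [["bar"],["baz"]]: every match, with submatches
  print (subject =~ pat :: Int)                       -- number of matches
  print (subject =~ pat :: (Int, Int))                -- offset and length of the first match

  -- Non-default options: compile the pattern case-insensitively, then apply it with 'match'.
  let re = makeRegexOpts (defaultCompOpt .|. compIgnoreCase) defaultExecOpt pat :: Regex
  print (match re "FOO BAR" :: Bool)                  -- True
</PRE>
<P>
Every line needs its annotation: the result type is the only thing telling the library which kind of answer is wanted, which is also why an unannotated <TT>=~</TT> at the prompt fails.
</P>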
<P>
The problem here is that making the library so clever has also made it utterly impenetrable to the beginner.  The main functions are not even documented, and there is no explanation of the crazy type signature.  You might say that it is simply a documentation problem, but it is actually a combination of the two -- if the type signature had been something simple, it would have been easy to deduce how to use it.  It seems to me that the documentation of a library has got to be proportional to the cleverness of its type signatures, or people are going to be absolutely lost.  Since Haskell libraries are almost always implemented by Haskell gurus, and they implement them with <STRONG>themselves</STRONG> in mind (I have no objection to this, they are enthusiasts working for free), they use lots of clever code and advanced Haskell techniques.  But this means that if you want people to actually use these libraries (and as a consequence Haskell itself), the documentation for Haskell libraries has to be about <STRONG>an order of magnitude better</STRONG> than anything you'd find anywhere else.  I suspect it is at least <STRONG>an order of magnitude worse</STRONG> than for something like .NET APIs, which means that relatively speaking the documentation of Haskell is currently in an absolutely dire state.
</P>
<P>
Sorry, I'm just saying it like it is.  These libraries are great when you can get them to work, and I'm really grateful to the authors for their fantastic work, and the effort that has gone into packaging and distributing them (so that <A class="ext-link" href="http://hackage.haskell.org/trac/hackage/wiki/CabalInstall"><SPAN class="icon">installation is literally one short command-line away</SPAN></A>), but the hurdles are <STRONG>still</STRONG> currently far too great compared to any other language for Haskell to become popular.
</P>
<P>
Moving forward, I guess one problem is contributing to a library's documentation.  There is nothing on the API doc pages that shows you how to do this.  I suspect you need to check out the source with darcs (not something I do normally, I just use cabal) and then start emailing patches or something.  Even then, I don't know if I would contribute any documentation -- 'howto' style documentation seems out of place on the API pages, but it is desperately needed.
</P>
]]></content>
  </entry>
</feed>
