<?xml version="1.0" encoding="UTF-8"?>
<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:thr="http://purl.org/syndication/thread/1.0"
  xml:lang="en"
   >
  <title type="text">All Unkept</title>
  <subtitle type="text"></subtitle>

  <updated>2013-05-24T11:16:23Z</updated>
  <generator uri="http://blogofile.com/">Blogofile</generator>

  <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog" />
  <id>http://lukeplant.me.uk/blog/feed/atom/</id>
  <link rel="self" type="application/atom+xml" href="http://lukeplant.me.uk/blog/feed/atom/" />
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Bundling dependencies]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/bundling-dependencies" />
    <id>http://lukeplant.me.uk/blog/posts/bundling-dependencies</id>
    <updated>2013-04-15T11:08:10Z</updated>
    <published>2013-04-15T11:08:10Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Software development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Bundling dependencies]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/bundling-dependencies"><![CDATA[<div class="document">
<p>This post is about maintenance programming and the issue of Open Source
dependencies that may need customising. It compiles some of my current thoughts,
but I'm also eager to find out what other people do.</p>
<div class="section" id="approaches-to-dependencies">
<h1>3 approaches to dependencies</h1>
<ol class="arabic">
<li><p class="first">Pure dependency</p>
<p>The source code of the dependency does not become a part of your project in
any way. For a web project with Python and virtualenv/pip, you would just
list the project name and version in requirements.txt, and it will be
installed when you deploy your project.</p>
<p>This is by far the easiest approach to dependencies.</p>
</li>
<li><p class="first">Forked dependency</p>
<p>You create a fork of the library (usually hosted publicly, but not
necessarily) and add to it the changes you need. You then use this fork from
your main project.</p>
<p>This is done either in the hope that bug fixes and feature additions that you
make will be merged into the original, so that you won't have to maintain
your fork forever, or with the aim of keeping your changes small enough
that it will always be easy to merge in fixes from upstream.</p>
</li>
<li><p class="first">Bundled dependency</p>
<p>You take a copy of the library and include it directly in your own source
code, so that you can make whatever modifications you need. The code becomes
a part of your source code forever.</p>
</li>
</ol>
<p>This post is about number 3 — the bundled dependency.</p>
<p>(There are, of course, variants and mixtures of these — for example, <a class="reference external" href="https://www.djangoproject.com/">Django</a> has often bundled dependencies, but this was
purely because of the confusing state of packaging, and the code was never
modified for use in Django. These libraries have been or will be un-bundled as
soon as possible.)</p>
</div>
<div class="section" id="avoid-it-if-you-possibly-can">
<h1>Avoid it if you possibly can</h1>
<p>The first thing to say about bundling dependencies is that you should avoid doing
so if at all possible:</p>
<ul class="simple">
<li>It can result in a large increase in the size of your code base.</li>
<li>You won't get critical fixes from upstream, and it can be hard to merge them
in.</li>
</ul>
<p>Bundling a dependency can be a drastic decision — you are taking on all the
technical debt and maintenance burden of the code you are adding. Some
developers look at Open Source libraries and think “wow, all this free source
code I can just add to my project”. Your attitude needs to be exactly the
opposite: “Wow, look at all that code I'm going to have to maintain and debug”.</p>
<p>An external dependency is often much worse from a maintenance point of view than
code you have written yourself:</p>
<ul class="simple">
<li>You may not understand the code very well at all, and you may not have access
to the original reasons for the way it is.</li>
<li>When you add it to your project, you typically lose its history, making it
harder to track down reasons for its current state.</li>
<li>Library code can often be over-generalised and complex. It copes with all
kinds of situations that you don't need, but you will have to understand and
maintain that complexity.</li>
<li>The code will not ‘fit’ into your project well — there may be all kinds of
conventions and decisions that make it alien to your project, but now it is
part of your project and needs to fit.</li>
</ul>
</div>
<div class="section" id="alternatives">
<h1>Alternatives</h1>
<p>To avoid bundling a dependency, you can go for the ‘forked dependency’ approach
above. For the missing features you need, attempt to add extension points that
will give you the flexibility you need, rather than simply hard-coding something
very specific to your project that will never get merged upstream.</p>
<p>Another alternative is to build what you need yourself, or very selectively add
parts of the dependency into your own source. This may seem more work, but could
be easier to maintain long term.</p>
<p>Finally, you could consider a <a class="reference external" href="http://en.wikipedia.org/wiki/Monkey_patch">monkey patch</a>. But be very careful, and make
sure you know all the places where you are doing that kind of thing, so that you
can assess at what point you should be switching strategy.</p>
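<p>One way to know all the places where you are monkey patching is to route every
patch through a single helper that records it. The following is only a minimal
sketch of that idea: <tt class="docutils literal">monkey_patch</tt>, <tt class="docutils literal">APPLIED_PATCHES</tt> and the
stand-in class <tt class="docutils literal">ThirdPartyCart</tt> are names invented for this illustration,
not part of any real library.</p>
<pre class="code python literal-block">
```python
# Hypothetical registry: every monkey patch in the project is recorded here,
# so one look shows how many there are and why each one exists.
APPLIED_PATCHES = []


def monkey_patch(target, name, reason):
    """Decorator that replaces target.name and records what was replaced."""
    def apply(replacement):
        original = getattr(target, name)
        setattr(target, name, replacement)
        APPLIED_PATCHES.append((target.__name__, name, reason, original))
        return replacement
    return apply


class ThirdPartyCart(object):
    """Stand-in for a class that really lives in an external dependency."""
    def total(self):
        return 100


@monkey_patch(ThirdPartyCart, "total", reason="site-wide discount")
def discounted_total(self):
    return 90
```
</pre>
<p>When the list in <tt class="docutils literal">APPLIED_PATCHES</tt> starts to grow, that may be the signal
to assess whether a fork or a bundled copy would be the more honest strategy.</p>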
</div>
<div class="section" id="when-you-should-consider-it">
<h1>When you should consider it</h1>
<p>However, there are times when you should consider bundling the dependency:</p>
<ul class="simple">
<li>When the changes you want to make are more than bug fixes.</li>
<li>When the changes can't be easily made by adding extension points to the original.</li>
<li>When the number/size of extensions is going to severely inhibit a developer's
ability to understand the code.</li>
</ul>
<p>I recently took on a project that had bundled a copy of <a class="reference external" href="http://www.satchmoproject.com/">Satchmo</a>. It was a bit of a shock, because
requirements.txt also listed Satchmo as a dependency, making me think I was in
situation 1, when actually I was in situation 3, which is much worse.</p>
<p>Sometimes, however, it is unavoidable: for example, you need to add multiple
fields to different DB models, or you need to make invasive changes in other ways. As I
looked at the number of modifications made to the bundled Satchmo, I realised
there was no way that strategies 1 or 2 would be any good. Strategy 3 had
already been chosen, it was impossible to turn back the clock, and with
hindsight it was probably the right decision.</p>
<p>But implementation of that decision was lacking in lots of ways.</p>
<p>So how do you cope when you are forced to bundle? Here are my hints so far.</p>
<ul>
<li><p class="first">Recognise that you have done a really bad thing, and you need to take equally
drastic action to cope with it. The bigger the dependency you've bundled, the
more likely it is that you have seriously damaged your ability to maintain the
project long term.</p>
</li>
<li><p class="first">Make sure you include the tests of the original dependency, and integrate them
as part of your test suite.</p>
<p>Sounds obvious, but in the project that inspired this post, the opposite had
been done — they had copied all the source code, with the exception of every
file called 'tests.py' or directory called 'tests'. I do not know what
possessed them to do this, but this decision was an extremely expensive one
for their client, and has caused massive damage to the project.</p>
</li>
<li><p class="first">Maintain the test suite properly.</p>
<p>Again, sounds obvious, but tests are extremely valuable to a project, and in
this situation it is vital that you keep them maintained.</p>
<p>It is acceptable to delete tests if they are checking requirements that you no
longer have. But you should be deleting the code that supports those tests as
well.</p>
</li>
<li><p class="first">Take complete ownership of the code.</p>
<p>Having made the decision to bundle, don't treat the code like an external
dependency. It is your code now, and only you can fix it. Don't pretend you are
going to merge in upstream changes.</p>
<p>The code should live at the same ‘level’ as the rest of your code — for
example, it should be in the same directory, not off in some 'libs' directory
that makes it harder to find. You need to embrace the fact that it is part of
your maintenance burden.</p>
<p>On the other hand, it is your code now, and you can do what you want with it. So
don't be afraid of making changes. A tentative approach will leave you with
the worst of both worlds — a library that doesn't really do what you want, but
that you have to maintain. Make it do what you want.</p>
<p>Obviously, there can be some value in maintaining a separation between &quot;your
stuff&quot; and the &quot;framework stuff&quot; or &quot;library stuff&quot;, but this is just good
coding practice — you wouldn't hard code something very specific into a
function that is supposed to be generic.</p>
</li>
<li><p class="first">Delete, delete, delete.</p>
<p>If there is code that you don't need, just delete it. The more code you can
remove, the better. There can be a case for keeping some code around if:</p>
<ol class="arabic simple">
<li>It is causing very little nuisance to maintenance efforts.</li>
<li>It is fairly likely to be needed in the <strong>near</strong> future.</li>
<li>It is not causing runtime weaknesses (e.g. security problems),
because there is no entrance point to the code.</li>
</ol>
<p>But note that just the existence of code is a maintenance problem. If, for
example, you need to change the signature of a function, you will do a search
for sites that call it. Every hit you get is something you have to
investigate, which takes time. If, in the process of this kind of
investigation, you find some code that might be unused, find out if it is, and
delete aggressively where appropriate.</p>
<p>And code that <strong>might</strong> be needed <strong>one day</strong> is better deleted. By the time
you come to need it, it might be horribly broken, or broken in subtle ways
that will take you longer to debug than to write, or too complex or badly
performing for the context of your evolved application.</p>
<p>This applies to all kinds of code, including templates etc.</p>
</li>
<li><p class="first">Clean aggressively.</p>
<p>If you delete unused code, you'll find that you may well end up with code that
has essentially unused generality, or various other things that no longer make
sense for your specific project.</p>
<p>This is my golden rule for maintenance:</p>
<blockquote>
<p>Leave the code looking as if it had always been designed that way.</p>
</blockquote>
<p>This is a general maintenance principle, but it is especially important for
the situation where you are trying to go from a larger code base to a smaller
one.</p>
<p>Ideally, there should never be artefacts that can only be explained by talking
about the history of the project. This applies to every detail, including:</p>
<ul class="simple">
<li>names of models</li>
<li>names of fields</li>
<li>names of variables and functions</li>
</ul>
<p>Altering models is not hard if you have a good database migration tool,
e.g. South for Django.</p>
<p>This principle may seem like it adds to the load of the maintenance
programming, but long term it reduces the load, and reduces the likelihood
that a project will collapse under its own weight. Even <em>with</em> this principle,
projects tend to become unmaintainable — the natural tendency of a project is
towards chaos, and you have to be very proactive about reversing that.</p>
<p>Example 1: after deleting some classes, you end up with a class hierarchy
where each base class is only used once. This adds a lot of overhead when
reading the code. You should clean aggressively — fold the classes together
(unless keeping them separate increases the clarity of the code).</p>
<p>Example 2: The code I'm maintaining uses livesettings (and uses it far too
much in my opinion, for things that ought to be in settings.py). It includes
some options that are unlikely to change for a given project, or are likely to
become ignored easily. For example, there is an &quot;Only authenticated users can
check out&quot; setting. In a project with an overridden login form or login view
(which can easily happen), it's very easy for this switch to become (at least
partly) broken. When you are working on some code that branches on the value
of this switch, there is no point fixing both branches — you won't have decent
tests to ensure that the unused branch is really working.</p>
<p>Instead, find out what the current value is, and just delete the other
branch. Then find all instances of the setting being used, and clean up
similarly. Finally, delete the code that defines the switch in the first
place. Remove every trace — you always have the history if you really need to
see how something was done before.</p>
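<p>As a rough before-and-after illustration of that clean-up (the function and
parameter names here are hypothetical, and this is not the real Satchmo or
livesettings code), assume the project has always run with the switch turned on:</p>
<pre class="code python literal-block">
```python
# Before: the code branches on a switch that nobody tests in both positions.
def can_check_out_before(user_is_authenticated, only_authenticated_setting):
    if only_authenticated_setting:
        return user_is_authenticated
    # Anonymous checkout: the unused branch that quietly rots.
    return True


# After: the unused branch, the parameter that carried the setting and the
# definition of the setting itself have all been deleted.
def can_check_out(user_is_authenticated):
    return user_is_authenticated
```
</pre>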
</li>
<li><p class="first">Lather, rinse, repeat.</p>
<p>The aggressive process of deleting and cleaning leads to more, and you should
follow this up. You may not have the time to do it right now, but you should
be doing it as you go: whenever some coding has turned up something that can be
cleaned/deleted, first do the necessary commit for whatever you were working
on. Then do a round of cleaning/deleting, finding all the code paths that are
now dead or can be simplified, commit the change, and repeat.</p>
</li>
</ul>
<p>These things have to go together. Aggressive deleting and cleaning can be made a
lot easier if you have a good test suite. Of course, when deleting code, you
will do a search for sites that might call it. But it ought to be possible to
check if you can delete code simply by running the test suite with it absent.</p>
<p>What other approaches or hints do you have for dealing with this situation?</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Best practices with Django on WebFaction]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/best-practices-with-django-on-webfaction" />
    <id>http://lukeplant.me.uk/blog/posts/best-practices-with-django-on-webfaction</id>
    <updated>2013-04-01T18:24:46Z</updated>
    <published>2013-04-01T18:24:46Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Best practices with Django on WebFaction]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/best-practices-with-django-on-webfaction"><![CDATA[<div class="document">
<p>This blog post is about deploying Django apps. It is addressed primarily to
<a class="reference external" href="http://www.webfaction.com/?affiliate=cciw">WebFaction</a>, but applies to any
other host with one-click installers for this kind of app.</p>
<p>First, many thanks to WebFaction for their great service and support. I've got
very few complaints about them in general.</p>
<p>However, I think their approach to setting up Django projects is far from ideal,
and makes life harder than it needs to be.</p>
<p>The <a class="reference external" href="http://docs.webfaction.com/software/django/getting-started.html">WebFaction Django docs</a> tell us to
create a new Django app by using the control panel and choosing an app of type
‘Django’ and then choosing your Django version. This will install an Apache
instance with mod_wsgi, and a copy of the Django sources, and set up a basic
Django app skeleton.</p>
<p>There are lots of problems with this:</p>
<ol class="arabic">
<li><p class="first">What happens when there is a security issue with Django?</p>
<p>The user will need to have an upgrade mechanism. This is covered in the docs,
but it is an <a class="reference external" href="http://docs.webfaction.com/software/django/config.html#upgrading-your-django-libraries">11 step process</a>.</p>
<p>And they will first have to go through this process in their development
environment for testing, in some way, and then again on their live box. How
many people will actually do that, especially for security releases that they
don't feel they need?</p>
<p>Experienced developers, of course, don't do this. For Django, I've used
virtualenv and pip, and removed the existing copy of Django that WebFaction
put there. Of course, I needed to use virtualenv and pip anyway, and so will
99% of all Django users. So my upgrade procedure for Django, like for any
dependency, looks like this:</p>
<ol class="arabic simple">
<li>Edit requirements.txt</li>
<li><tt class="docutils literal">$ pip install <span class="pre">-r</span> requirements.txt</tt></li>
<li>Test and fix - automated or manual etc.</li>
<li>Check in the changes to source control.</li>
<li><tt class="docutils literal">$ fab deploy</tt></li>
</ol>
<p>Note that this includes local upgrade as well as remote. It is much, much
easier and much less error prone than the process WebFaction describes.</p>
</li>
<li><p class="first">It uses an Apache instance.</p>
<p>Apache is a massive overkill for this kind of thing. Just figuring out the
mod_wsgi settings for threads and processes nearly made my head explode, and
I'm a reasonably experienced web developer.</p>
<p>Most experienced Django developers do not recommend Apache. They use
<a class="reference external" href="http://gunicorn.org/">gunicorn</a> or <a class="reference external" href="http://projects.unbit.it/uwsgi/">uWSGI</a> or similar.</p>
<p>And what happens if there is a security issue with the Apache version
installed? Would users get any notification of this? How would they upgrade
Apache? I have no idea—I know how to compile Apache from source, but I've no
idea what compilation settings WebFaction uses etc. So far the only solution I've
come up with to this worry is to migrate my projects to gunicorn.</p>
</li>
<li><p class="first">It goes against the grain of how people will want to upload their apps.</p>
<p>New Django developers will follow the tutorial, or something similar like a
django-cms setup guide, so that when they first come to deploy their app they
will have a working app, with settings file etc. that they just want to
deploy.</p>
<p>Experienced Django developers will not be following a tutorial, but will have
a very similar set of files that they want to upload.</p>
<p>They will not start with a settings file or project layout generated by
WebFaction. That is just a confusion that gets in the way.</p>
<p>Rather, they will want to customise their settings.py file, or create a new
one for deployment, and insert into it the database settings, and maybe a few
other tweaks.</p>
<p>Then they will simply want to upload it, including a requirements.txt file,
and have it run.</p>
</li>
</ol>
<p>As a contrast to the process WebFaction have documented, here is my process for deploying
a new app on WebFaction:</p>
<ul class="simple">
<li>On WebFaction control panel, set up:<ul>
<li>a 'Custom app (listening on port)' application</li>
<li>a static-only app for static assets</li>
<li>a static-only app for user uploaded media</li>
<li>a database</li>
</ul>
</li>
</ul>
<p>(I've actually got a script that automates this, but that's another matter).</p>
<ul class="simple">
<li>In my project sources:<ul>
<li>copy the database details to a settings.py file</li>
<li>update some paths in my fabfile.py</li>
<li>check in to VCS</li>
<li><tt class="docutils literal">$ fab deploy</tt></li>
</ul>
</li>
<li>On my WebFaction server, I set up a crontab command that runs gunicorn if it is
not running. (This command actually calls a command in fabfile.py, which has
been uploaded to the server).</li>
</ul>
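<p>The "run gunicorn if it is not running" check can be sketched in plain Python.
This is not the fabfile.py command mentioned above, which is not shown here; it
is a standalone approximation that assumes gunicorn was started with a pid
file. The module name <tt class="docutils literal">myproject.wsgi</tt> and the port are placeholders.</p>
<pre class="code python literal-block">
```python
import errno
import os
import subprocess


def is_running(pidfile):
    """Return True if pidfile names a process that is still alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, ValueError):
        return False  # no pid file, or nothing usable in it
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks existence
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def gunicorn_command(pidfile):
    """Hypothetical command line; module name and port are placeholders."""
    return [
        "gunicorn", "myproject.wsgi:application",
        "--bind", "127.0.0.1:12345",
        "--daemon",
        "--pid", pidfile,
    ]


def ensure_gunicorn(pidfile, command):
    """Start gunicorn unless it is already running; report whether we did."""
    if is_running(pidfile):
        return False
    subprocess.check_call(command)
    return True
```
</pre>
<p>A crontab entry would then call <tt class="docutils literal">ensure_gunicorn(pidfile, gunicorn_command(pidfile))</tt>
every few minutes, with the custom app in the control panel listening on the same port.</p>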
<p>To upgrade Django or any other dependency, including gunicorn, I do:</p>
<ul class="simple">
<li>Edit requirements.txt</li>
<li>Check in to VCS</li>
<li><tt class="docutils literal">$ fab deploy</tt></li>
</ul>
<p>For initial setup, this is easier than the process WebFaction have
documented. For upgrades, it is <em>much</em> easier. As an experienced developer, I'm
happy writing this myself, and I've got some template fabfile.py files I can use
for new projects.</p>
<p>But newbies get the much harder process, and they are still left with big
hurdles when it comes to:</p>
<ul class="simple">
<li>uploading their own code</li>
<li>adding dependencies</li>
<li>upgrading dependencies</li>
</ul>
<p>The ‘one-click installers’ WebFaction features sound great in theory, but in
practice they are doing their users a disservice. Please give us, and especially
newbies, a method that we will want to use long term.</p>
<p>Designing such a method is hard, and will probably involve a set of instructions
rather than a ‘one-click’ experience, especially when different Django
developers do things in different ways. But newbies often don't have opinions like
experienced developers do, and, providing the process allows customisation,
experienced developers will be able to customise it. IMO, the default process
should leave newbies with a setup that encourages the following best practices
and de-facto standards:</p>
<ul class="simple">
<li>virtualenv and pip</li>
<li>gunicorn (or alternative)</li>
<li>fabric (or alternative, but should be mainly Python or shell commands, so
an experienced developer can customise it)</li>
<li>./manage.py collectstatic</li>
<li>VCS - Mercurial or Git (please don't make it force one or the other)</li>
<li>settings.py</li>
</ul>
<p>I have a <a class="reference external" href="https://bitbucket.org/spookylukey/django-fabfile-starter/src/f4c87b0b2676911de2fa4a9784ca705f708b3bf1/fabfile.py?at=default">starter fabfile.py</a>
I used a while back, and have tried to update according to the way I now use
gunicorn. It isn't properly tested in its own right, but it could be a useful
starting point.</p>
<p>Perhaps one approach is to have a public repository (mirrored in different
popular VCS systems), which includes some basic files (<tt class="docutils literal">requirements.txt</tt>,
<tt class="docutils literal">fabfile.py</tt>, <tt class="docutils literal">settings_webfaction.py</tt>, <tt class="docutils literal">wsgi.py</tt>) that can be merged
into a developer's project sources. Picky developers can override what they
want, but newbies will have something that works well.</p>
<p>With that kind of process, I'd feel happy to recommend WebFaction to Django
newbies. At the moment, I could only recommend it to experienced developers
for Django deployments, and tell them to ignore WebFaction's own docs about
deployment.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Now it's your turn]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/now-its-your-turn" />
    <id>http://lukeplant.me.uk/blog/posts/now-its-your-turn</id>
    <updated>2013-03-28T16:00:00Z</updated>
    <published>2013-03-28T16:00:00Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Now it's your turn]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/now-its-your-turn"><![CDATA[<div class="document">
<p>I just watched <a class="reference external" href="http://pyvideo.org/video/1787/porting-django-apps-to-python-3">Jacob's talk on &quot;Porting Django apps to Python 3&quot;</a>, and realised
it was time to tackle my own small Django apps.</p>
<p>The problem with porting is that you need your dependencies to be ported
first. But now that <a class="reference external" href="https://docs.djangoproject.com/en/dev/faq/install/#can-i-use-django-with-python-3">Django has Python 3 support</a>,
the finger is no longer pointing at Django — it is pointing at all of us with
Django apps that have only Django as a dependency (or other dependencies that
are already ported to Python 3). As Jacob put it at the end of the talk, <strong>it’s your turn</strong>.</p>
<p>So, I took the challenge, and here is a walkthrough of what you need to do, and
what I had to do:</p>
<ol class="arabic">
<li><p class="first">Find an app/library you've written that has very few dependencies, or all
dependencies already ported to Python 3. In my case, <a class="reference external" href="https://pypi.python.org/pypi/django-easyfilters">django-easyfilters</a>. Not a massively popular
library, but it has had over 1000 downloads, and I know some people use it.</p>
</li>
<li><p class="first">Install <a class="reference external" href="http://testrun.org/tox/latest/">tox</a> and create a tox.ini file to
run your test suite on more than one version. Start with all the Django
versions you want to support, with Python 2.x combinations (Python 2.6 and
2.7 recommended), and Python 3.3.</p>
<p>My tox.ini file looks like <a class="reference external" href="https://bitbucket.org/spookylukey/django-easyfilters/src/6d6b81828d029714da1b56e9f079ea96433f8238/tox.ini?at=default">this</a>.</p>
</li>
<li><p class="first">Run tox and watch it pass with Python 2.x.</p>
<p>Of course, if you get failures at this point, fix them first. Because I am an
exceptionally good boy, I got no failures, even for Django 1.5 which I had
not tried before with this library. This is my reward for having Done Things
Right (with this particular app :-). As well as a pretty complete test suite,
I even created a small demo app, and left <a class="reference external" href="https://bitbucket.org/spookylukey/django-easyfilters/src/e05a6f11c03219595b4c33b5df782e81f3de88f3/docs/develop.rst?at=default">instructions</a>
about how to use it. I may be the only person to have ever read these
instructions, but it was well worth it.</p>
</li>
<li><p class="first">Watch the tests fail with Python 3.3</p>
<p>OK, now to start fixing it.</p>
<p>Activate the Python 3.3 virtualenv that tox created. In my case:</p>
<pre class="literal-block">
. .tox/py33-django15/bin/activate
</pre>
<p>Then run your test command. In my case:</p>
<pre class="literal-block">
./manage.py test django_easyfilters
</pre>
<p>You may need to do some work even to get it to run at all.</p>
<p>On the first run, out of 45 tests only 7 passed. Gulp.</p>
<p>Now go read <a class="reference external" href="http://docs.python.org/3/howto/pyporting.html#python-2-3-compatible-source">a porting guide, using the 'single source' method</a>,
and check out <a class="reference external" href="http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/">Armin’s Python 3 porting redux</a>.</p>
<p>You'll probably want to install 'six' and add it to your project's
dependencies (unless you are targeting only Django 1.4.2 and later, in which
case you can use <tt class="docutils literal">django.utils.six</tt>). Add that dependency to your
setup.py file and your tox.ini file, and set tox running, because it will
need to rebuild all your virtualenvs if six wasn't a dependency before.</p>
<p>Now iterate on fixing your tests. Each time you find a problem, grep the code
base for other instances of it.</p>
<p>I found a relatively small bunch of problems which caused most of my tests to fail:</p>
<ul>
<li><p class="first">Use of implicit relative imports</p>
</li>
<li><p class="first">Use of Decimal._rescale which is removed in Python 3.3</p>
</li>
<li><p class="first">map() had to be replaced with list(map()) a few times</p>
</li>
<li><p class="first">I needed <tt class="docutils literal">from six.moves import xrange</tt> to use <tt class="docutils literal">xrange</tt></p>
</li>
<li><p class="first">s/unicode/six.text_type/ and similar</p>
</li>
<li><p class="first">I copied python_2_unicode_compatible from Django 1.5 for fixing <tt class="docutils literal">__unicode__</tt> and <tt class="docutils literal">__str__</tt>:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">python_2_unicode_compatible</span><span class="p">(</span><span class="n">klass</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;
    A decorator that defines __unicode__ and __str__ methods under Python 2.
    Under Python 3 it does nothing.

    To support Python 2 and 3 with a single code base, define a __str__ method
    returning text and apply this decorator to the class.
    &quot;&quot;&quot;</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">six</span><span class="o">.</span><span class="n">PY3</span><span class="p">:</span>
        <span class="n">klass</span><span class="o">.</span><span class="n">__unicode__</span> <span class="o">=</span> <span class="n">klass</span><span class="o">.</span><span class="n">__str__</span>
        <span class="n">klass</span><span class="o">.</span><span class="n">__str__</span> <span class="o">=</span> <span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">__unicode__</span><span class="p">()</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">klass</span>
</pre>
</li>
<li><p class="first"><tt class="docutils literal">__cmp__</tt> no longer supported.</p>
<p>This was a bit of a pain to fix. For some of my classes <tt class="docutils literal">__cmp__</tt> was a much
more natural way to define sorting, and it broke my head trying to rewrite them. So
I did this:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="n">six</span><span class="o">.</span><span class="n">PY3</span><span class="p">:</span>
    <span class="c"># Support for __cmp__ implementation below</span>
    <span class="k">def</span> <span class="nf">cmp</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">a</span> <span class="o">&lt;</span> <span class="n">b</span><span class="p">)</span>
    <span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">total_ordering</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">total_ordering</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">c</span><span class="p">:</span> <span class="n">c</span> <span class="c"># no-op</span>
</pre>
<p>And in the classes, just added this:</p>
<pre class="code python literal-block">
<span class="nd">&#64;total_ordering</span>
<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>

       <span class="c"># ...</span>

       <span class="k">def</span> <span class="nf">__eq__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
           <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">__cmp__</span><span class="p">(</span><span class="n">other</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span>

       <span class="k">def</span> <span class="nf">__lt__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
           <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">__cmp__</span><span class="p">(</span><span class="n">other</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span>
</pre>
<p>You could even write a decorator to do all of this.</p>
<p>When I migrate fully away from Python 2.x I may get round to rewriting these.</p>
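<p>The decorator mentioned above might look something like this. It is only a
sketch for Python 3: the names <tt class="docutils literal">ordering_from_cmp</tt> and <tt class="docutils literal">Version</tt> are made up
for illustration, and under Python 2 you would swap in a no-op as before.</p>

```python
from functools import total_ordering


def ordering_from_cmp(cls):
    """Class decorator: derive rich comparisons from an existing __cmp__.

    Hypothetical helper, not part of six or the standard library.
    """
    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    cls.__eq__ = __eq__
    cls.__lt__ = __lt__
    # total_ordering fills in __le__, __gt__ and __ge__ from __eq__ and __lt__.
    return total_ordering(cls)


@ordering_from_cmp
class Version(object):
    """Example class (an assumption) that only knows how to __cmp__."""

    def __init__(self, number):
        self.number = number

    def __cmp__(self, other):
        # Same trick as the cmp() replacement above: negative, zero or positive.
        return (self.number > other.number) - (self.number < other.number)
```

<p>With that in place, <tt class="docutils literal">sorted()</tt> and all six comparison operators work on
<tt class="docutils literal">Version</tt> instances without touching the body of <tt class="docutils literal">__cmp__</tt>.</p>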
</li>
</ul>
</li>
<li><p class="first">You also need to check your setup.py file. If you use setuptools, it should
work fine - under Python 3 this is supported by 'distribute', a setuptools
fork that is a drop-in replacement. Tools like pip and virtualenv install
distribute for Python 3 environments. But you do need to check you don't have
syntax errors under Python 3.</p>
<p>Finally, add the Python 3.3 trove classifier to your setup.py:</p>
<pre class="literal-block">
&quot;Programming Language :: Python :: 3.3&quot;,
</pre>
</li>
</ol>
<p>Overall, this took me about 2.5 hours to complete. However, the app is small, and
has a pretty good test suite, which makes things much, much easier. On the other
hand, I've given you a clear plan of attack, which is a big part of the battle.</p>
<p><strong>Now it's your turn</strong></p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Translating sentences with substitutions]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions" />
    <id>http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions</id>
    <updated>2013-01-24T00:14:46Z</updated>
    <published>2013-01-24T00:14:46Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Haskell" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Translating sentences with substitutions]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/translating-sentences-with-substitutions"><![CDATA[<div class="document">
<div class="section" id="the-problem">
<h1>The problem</h1>
<p>Many programs build up sentences using bits - often a template into which
different things might be substituted. However, the things you substitute into a
sentence can change the sentence, and vice-versa, in ways that are not
anticipated by the programmer.</p>
<p>For example, plurals. In English, you might try code like this:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
    <span class="k">return</span> <span class="s">&quot;I have 1 pig&quot;</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">return</span> <span class="s">&quot;I have </span><span class="si">%s</span><span class="s"> pigs&quot;</span> <span class="o">%</span> <span class="n">n</span>
</pre>
<p>Localising these strings gives problems, because the rules for how to create
plural forms are different in every language.</p>
<p>This specific problem is generally considered 'solved' by the use of gettext,
but many more exist.</p>
<p>For example, we have another problem as soon as we start substituting nouns:</p>
<pre class="code python literal-block">
<span class="s">&quot;Delete selected </span><span class="si">%s</span><span class="s">?&quot;</span> <span class="o">%</span> <span class="n">object_name</span>
</pre>
<p>Various attributes about the noun could affect the sentence. In French, the
adjective &quot;selected&quot; needs to agree in gender with the noun being substituted
in. So you cannot look up the translations for &quot;Delete selected %s&quot; and for
<tt class="docutils literal">object_name</tt> separately. (This is a real example picked from the Django source code.)</p>
<p>Further, depending on how the sentence uses the noun, the form of the noun might
need to change. For example, the noun might appear in the accusative position
for a given sentence and language, which requires a different form of the noun
to be used compared to the nominative form.</p>
<p>Several other examples of this appeared in <a class="reference external" href="https://code.djangoproject.com/ticket/11688">Django ticket 11688</a>. One proposed solution on that
ticket would require a huge amount of knowledge and effort on the part of Django
programmers, and almost certainly would not work anyway.</p>
<p>This post is an attempt to come up with a better solution, or at least kick
start discussion. I haven't been able to find any solutions to this problem
online, and most people seem to be just using gettext, which is a 95% solution —
and maybe that is good enough for most people.</p>
<p>[Update 2013-02-19 - ‘Richard’ pointed me to the <a class="reference external" href="http://search.cpan.org/~toddr/Locale-Maketext-1.23/lib/Locale/Maketext/TPJ13.pod">Locale::Maketext article</a>,
which has in essence a similar approach to what I've done here]</p>
</div>
<div class="section" id="assumptions-and-simplifications">
<h1>Assumptions and simplifications</h1>
<p>We will assume that a sentence is a composable unit of meaning, such that
sentences can be translated independently. So, if in language A we have
sentences 1 and 2, in that order, we can translate these into language B by
translating sentence 1 and sentence 2 independently, and putting them together
in the same order.</p>
<p>This is, no doubt, a simplification. In some languages, the two sentences might
make more sense if re-ordered, or combined, or split in various ways. Indeed,
some languages may not have a truly equivalent concept of 'sentence' at all.</p>
<p>However, we have to do something, and this is a reasonable approximation.</p>
</div>
<div class="section" id="requirements">
<h1>Requirements</h1>
<p>We need a powerful way of defining sentences in a given human language. It must
be powerful enough that the person doing the translation can do anything they
need, without the programmer needing to be aware of all the things in the
language that will cause difficulty.</p>
<p>So, we'll start with a full programming language, and chop out the things we
shouldn't need.</p>
<p>We shouldn't need side effects - translation should be a pure function. So we'll
use a purely functional programming language without side effects.</p>
<p>We need something fairly readable, because translators are going to have to use
it. It should be as close as possible to declarative in style.</p>
<p>Pattern matching seems like a great fit for some of our needs.</p>
</div>
<div class="section" id="possible-solution">
<h1>Possible solution</h1>
<p>Given the above requirements, let's start with a Haskell-like pure functional
language, whose pattern matching will be extremely helpful. It will obviously
have IO removed, and no type signatures (but that won't stop us inferring them
and being able to statically type-check the code). Everything else will be
borrowed directly from Haskell, so that I can avoid having to make up my own
syntax and semantics.</p>
<p>If the concept works, we can argue about better or simpler syntax for some
constructs, or helper functions that aren't part of the Haskell prelude.</p>
<p>Hopefully, we will find a relatively small subset of Haskell that is needed to
give us all the power we need to solve this problem - a subset small enough that
we could ideally guarantee termination, to avoid problems with translations
created by malicious agents.</p>
<p>This will be an example based exploration.</p>
<p>Let's assume that every sentence can be generated by a function. The function
will take as parameters all the substitutions that are needed, and return the
translated string.</p>
<p>So, suppose we have the English sentence &quot;I have some pigs&quot;. For every different
language we need, we would have a translation file which contains the function
<tt class="docutils literal">iHaveSomePigs</tt>, which in this case takes zero parameters. So for French:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveSomePigs</span> <span class="ow">=</span> <span class="s">&quot;J'ai des cochons&quot;</span>
</pre>
<p>(The mapping between the English sentence &quot;I have some pigs&quot; and the function
name <tt class="docutils literal">iHaveSomePigs</tt> hasn't been defined, and we'll skate over that detail for
now).</p>
<p>If we have a variable number of pigs, for French we might have:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNPigs</span> <span class="mi">0</span> <span class="ow">=</span> <span class="s">&quot;Je n'ai pas de cochon&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="mi">1</span> <span class="ow">=</span> <span class="s">&quot;J'ai un cochon&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;J'ai &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; cochons&quot;</span>
</pre>
<p>For English we could do this:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNPigs</span> <span class="mi">1</span> <span class="ow">=</span> <span class="s">&quot;I have 1 pig&quot;</span>
<span class="nf">iHaveNPigs</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;I have &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; pigs&quot;</span>
</pre>
<p>(For those unfamiliar with Haskell, the way that pattern matching works is that
the first definition that matches the arguments is used. Since <tt class="docutils literal">n</tt> is not a
literal, but a variable, it can match any argument.)</p>
<p>We can cope with more complicated rules, such as those used in Polish, perhaps
something like this:</p>
<pre class="code haskell literal-block">
<span class="nf">iHaveNFiles</span> <span class="n">n</span> <span class="ow">=</span> <span class="s">&quot;Mam &quot;</span> <span class="o">++</span> <span class="n">show</span> <span class="n">n</span> <span class="o">++</span> <span class="s">&quot; &quot;</span> <span class="o">++</span> <span class="n">pluralize</span> <span class="n">n</span> <span class="s">&quot;file&quot;</span>

<span class="nf">plurals</span> <span class="s">&quot;file&quot;</span> <span class="ow">=</span> <span class="p">[</span> <span class="s">&quot;plik&quot;</span>
                 <span class="p">,</span> <span class="s">&quot;pliki&quot;</span>
                 <span class="p">,</span> <span class="s">&quot;plików&quot;</span>
                 <span class="p">]</span>

<span class="nf">pluralize</span> <span class="n">n</span> <span class="n">word</span> <span class="ow">=</span> <span class="n">plurals</span> <span class="n">word</span> <span class="o">!!</span> <span class="n">pluralForm</span> <span class="n">n</span>

<span class="nf">pluralForm</span> <span class="n">n</span>
  <span class="o">|</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span>                                                                        <span class="ow">=</span> <span class="mi">0</span>
  <span class="o">|</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">10</span> <span class="o">&gt;=</span> <span class="mi">2</span> <span class="o">&amp;&amp;</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">10</span> <span class="o">&lt;=</span> <span class="mi">4</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> <span class="o">&lt;</span> <span class="mi">10</span> <span class="o">||</span> <span class="n">n</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> <span class="o">&gt;=</span> <span class="mi">20</span><span class="p">)</span> <span class="ow">=</span> <span class="mi">1</span>
  <span class="o">|</span> <span class="n">otherwise</span>                                                                     <span class="ow">=</span> <span class="mi">2</span>
</pre>
<p>Note that the complex logic in <tt class="docutils literal">pluralForm</tt> and <tt class="docutils literal">pluralize</tt> only has to be
defined once. Adding more words simply requires additional <tt class="docutils literal">plurals</tt>
lines. It's not the nicest syntax, but could probably be improved, and it's
pretty easy to copy.</p>
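<p>As a quick cross-check of the guards in <tt class="docutils literal">pluralForm</tt>, here is the same rule
transcribed into Python. This is only a sketch: the names <tt class="docutils literal">plural_form</tt>,
<tt class="docutils literal">PLURALS</tt> and <tt class="docutils literal">i_have_n_files</tt> are invented for the comparison, and it assumes
<tt class="docutils literal">n</tt> is never negative.</p>

```python
# Hypothetical Python transcription of the Polish example, for checking only.
PLURALS = {"file": ["plik", "pliki", "plików"]}


def plural_form(n):
    """Return the index of the noun form to use for a count of n."""
    if n == 1:
        return 0
    # Counts ending in 2, 3 or 4 take the second form, except 12, 13 and 14
    # (and anything else whose last two digits fall in 10..19).
    if n % 10 in (2, 3, 4) and n % 100 not in range(10, 20):
        return 1
    return 2


def i_have_n_files(n):
    return "Mam %d %s" % (n, PLURALS["file"][plural_form(n)])
```

<p>The interesting cases are the teens: 12 takes the third form, whereas 22 takes
the second.</p>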
<p>Let's add in gender, using the sentences &quot;Delete this %s?&quot; (singular) and &quot;Delete
selected %s?&quot; (plural). We can use guards:</p>
<pre class="code haskell literal-block">
<span class="nf">deleteThisThing</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;Supprimer ce &quot;</span> <span class="o">++</span> <span class="n">singular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;?&quot;</span>
    <span class="o">|</span> <span class="n">otherwise</span>         <span class="ow">=</span> <span class="s">&quot;Supprimer cette &quot;</span> <span class="o">++</span> <span class="n">singular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;?&quot;</span>

    <span class="c1">-- (Ignoring the problem with 'ce' followed by vowel for now...)</span>

<span class="nf">deleteSelectedThings</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;Supprimer les &quot;</span> <span class="o">++</span> <span class="n">plural</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot; sélectionnés?&quot;</span>
    <span class="o">|</span> <span class="n">otherwise</span>         <span class="ow">=</span> <span class="s">&quot;Supprimer les &quot;</span> <span class="o">++</span> <span class="n">plural</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot; sélectionnées?&quot;</span>

<span class="nf">isMasculine</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">elem</span> <span class="n">thing</span> <span class="p">[</span> <span class="s">&quot;pig&quot;</span>
                               <span class="p">,</span> <span class="s">&quot;man&quot;</span>
                               <span class="c1">-- anything else masculine</span>
                               <span class="p">]</span>

<span class="nf">singular</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">pluralForm</span> <span class="mi">1</span> <span class="n">thing</span>
<span class="nf">plural</span>   <span class="n">thing</span> <span class="ow">=</span> <span class="n">pluralForm</span> <span class="mi">2</span> <span class="n">thing</span>

<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochon&quot;</span>
<span class="nf">pluralForm</span> <span class="mi">2</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochons&quot;</span>

<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;man&quot;</span> <span class="ow">=</span> <span class="s">&quot;homme&quot;</span>
<span class="nf">pluralForm</span> <span class="mi">2</span> <span class="s">&quot;man&quot;</span> <span class="ow">=</span> <span class="s">&quot;hommes&quot;</span>
</pre>
<p>Note that the only thing required by this system is that the functions
<tt class="docutils literal">deleteThisThing</tt> and <tt class="docutils literal">deleteSelectedThings</tt> exist. Everything else is at the
freedom of the translator, and better ways of defining any of these functions
are possible.</p>
<p>Of course, it isn't expected that a translator would be able to produce this by
himself/herself. However, once the basic logic has been set up, this syntax is
readable enough that a translator could easily add more of the same. Lines like:</p>
<pre class="code haskell literal-block">
<span class="nf">pluralForm</span> <span class="mi">1</span> <span class="s">&quot;pig&quot;</span> <span class="ow">=</span> <span class="s">&quot;cochon&quot;</span>
</pre>
<p>are actually pretty readable. The lack of parentheses in Haskell function calls
is also a bonus (though, as I said earlier, exact syntax could be debated). This
is not really that much harder than editing a .po file if you are just wanting
to add more of the same.</p>
<p>Also, we've got flexibility. If we really don't care about getting the gender
right, we can just do &quot;sélectionné(e)s&quot; and be done with it.</p>
<p>Let's make it harder - we'll add <strong>case</strong>. I'll use NT Greek as an example,
because it has nouns that decline with case (and I don't know any similar modern
languages well enough). I'm going to introduce an enum for the different cases,
using <tt class="docutils literal">data</tt> for now, and for the different genders. I could also do the same
for number (&quot;Singular&quot; and &quot;Plural&quot;), but just using <tt class="docutils literal">1</tt> and <tt class="docutils literal">2</tt> seems
easier.</p>
<p>Our sentence will be &quot;You like the %s.&quot;. For this in Greek, we need to choose
the accusative singular form of the thing we pass in. We also need to pick the
word for &quot;the&quot; (the definite article) which matches the <em>gender</em> and <em>number</em> of
the noun, and it has to match the accusative case too. So, if we pass in a
masculine word, we need the singular accusative masculine definite article
(having fun yet?):</p>
<pre class="code haskell literal-block">
<span class="kr">data</span> <span class="kt">Case</span> <span class="ow">=</span> <span class="kt">Nominative</span> <span class="o">|</span> <span class="kt">Accusative</span> <span class="o">|</span> <span class="kt">Genitive</span> <span class="o">|</span> <span class="kt">Dative</span>
<span class="kr">data</span> <span class="kt">Gender</span> <span class="ow">=</span> <span class="kt">Masculine</span> <span class="o">|</span> <span class="kt">Feminine</span> <span class="o">|</span> <span class="kt">Neuter</span>

<span class="nf">youLikeTheThing</span> <span class="n">thing</span> <span class="ow">=</span> <span class="s">&quot;φιλεις &quot;</span>
                        <span class="o">++</span> <span class="n">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="p">(</span><span class="n">genderOf</span> <span class="n">thing</span><span class="p">)</span>
                        <span class="o">++</span> <span class="s">&quot; &quot;</span>
                        <span class="o">++</span> <span class="n">accusativeSingular</span> <span class="n">thing</span> <span class="o">++</span> <span class="s">&quot;.&quot;</span>

<span class="nf">accusativeSingular</span> <span class="n">thing</span> <span class="ow">=</span> <span class="n">nounForm</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="n">thing</span>

<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιου&quot;</span>
<span class="nf">nounForm</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιω&quot;</span>

<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Nominative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλια&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Accusative</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλια&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Genitive</span>   <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιων&quot;</span>
<span class="nf">nounForm</span> <span class="mi">2</span> <span class="kt">Dative</span>     <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιοις&quot;</span>

<span class="nf">genderOf</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="kt">Neuter</span>
<span class="nf">genderOf</span> <span class="s">&quot;man&quot;</span>  <span class="ow">=</span> <span class="kt">Masculine</span>
<span class="c1">-- etc</span>

<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;ο&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;τον&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;του&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="kt">Masculine</span> <span class="ow">=</span> <span class="s">&quot;τω&quot;</span>

<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Nominative</span> <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;το&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Accusative</span> <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;το&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Genitive</span>   <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;του&quot;</span>
<span class="nf">definiteArticle</span> <span class="mi">1</span> <span class="kt">Dative</span>     <span class="kt">Neuter</span>    <span class="ow">=</span> <span class="s">&quot;τω&quot;</span>

<span class="c1">-- feminine etc</span>

<span class="c1">-- definiteArticle 2 (plurals) etc.</span>
</pre>
<p>Of course, you can easily define shorter aliases to avoid some typing here, and
there may be better ways to generate the tables, though as written above they
are pretty readable, and should be familiar to anyone who knows Greek.</p>
<p>The function <tt class="docutils literal">youLikeTheThing</tt> here is no longer very readable, although it
could be much worse. Some kind of substitution syntax/function could be used.</p>
<p>The code above actually works, BTW, and it actually ran the first time I tried - the
only correction I needed to make its output correct was to add a space after the
definite article. You just need to put it in a file <tt class="docutils literal">test.hs</tt>, add the
following line:</p>
<pre class="code haskell literal-block">
<span class="nf">main</span> <span class="ow">=</span> <span class="n">putStrLn</span> <span class="o">$</span> <span class="n">youLikeTheThing</span> <span class="s">&quot;book&quot;</span>
</pre>
<p>and do:</p>
<pre class="code haskell literal-block">
<span class="o">$</span> <span class="n">runhaskell</span> <span class="n">test</span><span class="o">.</span><span class="n">hs</span>
</pre>
<p>There is not a type signature in sight, but you have compile time
guarantees. This is all a testimony to the clarity of Haskell's syntax.</p>
<p>The features of Haskell we've used are:</p>
<ul class="simple">
<li>functions</li>
<li>simple pattern matching on numbers and strings</li>
<li>guards</li>
<li><tt class="docutils literal">data</tt> statements, limited to union types of nullary constructors
i.e. effectively enumerated values. We could use a keyword <tt class="docutils literal">enum</tt> for
clarity.</li>
<li>string concatenation</li>
<li>lists</li>
<li>a few arithmetic and logical operators</li>
</ul>
<p>We haven't used recursion. I can imagine circumstances where it might be useful,
but if deemed too risky, you could add some rules that would disallow it
(e.g. by requiring that a function doesn't call itself directly, and only calls
functions that exist prior to it in the source code, to avoid mutual recursion.)
This would be helpful to ensure termination.</p>
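<p>The ordering rule is simple enough to check mechanically. Here is a rough
sketch of such a check in Python. It assumes some parser has already reduced a
translation file to a list of <tt class="docutils literal">(name, names_called)</tt> pairs in source order;
that parser, and the name <tt class="docutils literal">find_forward_calls</tt>, are hypothetical.</p>

```python
def find_forward_calls(definitions, builtins=()):
    """Return (caller, callee) pairs that break the ordering rule.

    `definitions` is a list of (name, names_called) in source order.
    A call is allowed only if the callee is a builtin or was defined
    earlier, which rules out both direct and mutual recursion.
    """
    defined = set(builtins)
    violations = []
    for name, calls in definitions:
        for callee in sorted(calls):
            if callee not in defined:
                violations.append((name, callee))
        # Only now does `name` become callable by later definitions.
        defined.add(name)
    return violations


# The Polish example, reordered so that helpers come first.
polish = [
    ("plurals", []),
    ("pluralForm", ["mod"]),
    ("pluralize", ["plurals", "pluralForm"]),
    ("iHaveNFiles", ["show", "pluralize"]),
]
```

<p>Note that the Polish example as written earlier defines <tt class="docutils literal">iHaveNFiles</tt> before
<tt class="docutils literal">pluralize</tt>, so under this rule the definitions would have to be reordered
with the helpers first, as in the list above.</p>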
<p>You might also want a module system, to be able to pull in some common
definitions and functions for a given language, for consistency across different
projects.</p>
<p>This whole approach has the advantage of being able to refine and special case
as much as you want. Take the sentence &quot;you like the %s&quot;: suppose that if the
thing is a human being e.g. &quot;man&quot; or &quot;woman&quot;, you need to use a completely
different verb. Then you just add a special case first:</p>
<pre class="code haskell literal-block">
<span class="nf">isAPerson</span> <span class="s">&quot;man&quot;</span>   <span class="ow">=</span> <span class="kt">True</span>
<span class="nf">isAPerson</span> <span class="s">&quot;woman&quot;</span> <span class="ow">=</span> <span class="kt">True</span>
<span class="nf">isAPerson</span> <span class="n">n</span>       <span class="ow">=</span> <span class="kt">False</span>

<span class="nf">youLikeTheThing</span> <span class="n">thing</span>
    <span class="o">|</span> <span class="n">isAPerson</span> <span class="n">thing</span> <span class="ow">=</span> <span class="o">...</span>
<span class="c1">-- fall through to the normal case here</span>
</pre>
<p>In the other direction, if you just don't have the time to care about any of
this, you can just use a really simple (and often wrong) formula:</p>
<pre class="code haskell literal-block">
<span class="nf">youLikeTheThing</span> <span class="n">thing</span> <span class="ow">=</span>  <span class="s">&quot;φιλεις τον &quot;</span> <span class="o">++</span> <span class="n">greek</span> <span class="n">thing</span>

<span class="nf">greek</span> <span class="s">&quot;book&quot;</span> <span class="ow">=</span> <span class="s">&quot;βιβλιον&quot;</span>
</pre>
<p>Notice that the programmer of the main project does not know anything about
plural forms, gender, case etc., or put any of that into the source code. The
only thing he/she would do is call a function with all the things to be
substituted. We could have some mapping from English strings to function names,
or we could just use the function name as a string, e.g. from a Python project
we might call the function like so:</p>
<pre class="code python literal-block">
<span class="n">prompt</span> <span class="o">=</span> <span class="n">translate</span><span class="p">(</span><span class="s">&quot;doYouWantToDelete&quot;</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">)</span>
</pre>
<p>This would call the translation function <tt class="docutils literal">doYouWantToDelete</tt> with the parameters
<tt class="docutils literal">n</tt> and <tt class="docutils literal">object_name</tt>.</p>
<p>As a refinement, we can provide a version which will work when the whole
localisation machinery is turned off i.e. we allow the programmer to provide
their own version of the translation function which returns the default language:</p>
<pre class="code python literal-block">
<span class="n">prompt</span> <span class="o">=</span> <span class="n">translate</span><span class="p">(</span><span class="s">&quot;doYouWantToDelete&quot;</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">,</span>
                   <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">:</span> <span class="s">&quot;Do you want to delete these </span><span class="si">%s</span><span class="s"> </span><span class="si">%s</span><span class="s">(s)&quot;</span> <span class="o">%</span>
                                          <span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">object_name</span><span class="p">))</span>
</pre>
<p>As before, the provided function can be correct or simplistic as desired for
English.</p>
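<p>To make the calling convention concrete, here is a minimal sketch of what
<tt class="docutils literal">translate</tt> could look like on the Python side. Everything in it is an
assumption: the <tt class="docutils literal">TRANSLATIONS</tt> registry stands in for functions that would
really be compiled from the translation files, the <tt class="docutils literal">language</tt> keyword is
invented, and it needs Python 3 for the keyword-only argument.</p>

```python
# Hypothetical registry: language code -> {function name: callable}.
# A real system would build these callables from the translation files.
TRANSLATIONS = {
    "fr": {
        "iHaveNPigs": lambda n: (
            "Je n'ai pas de cochon" if n == 0
            else "J'ai un cochon" if n == 1
            else "J'ai %d cochons" % n
        ),
    },
}


def translate(name, *args, language=None):
    """Call the translation function `name` with the given substitutions.

    If the last positional argument is callable it is treated as the
    default-language version, used when no translation is available.
    """
    args = list(args)
    fallback = args.pop() if args and callable(args[-1]) else None
    func = TRANSLATIONS.get(language, {}).get(name)
    if func is None:
        if fallback is None:
            raise LookupError("no translation or default for %r" % name)
        func = fallback
    return func(*args)
```

<p>Treating a trailing callable as the default keeps the call site identical to
the example above, at the cost of not being able to pass a callable as an
ordinary substitution.</p>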
</div>
<div class="section" id="feedback">
<h1>Feedback</h1>
<p>There are a few questions in my mind:</p>
<ol class="arabic">
<li><p class="first">Would a solution like this work for the languages you know? What additional
features would be needed to cope with other human languages?</p>
</li>
<li><p class="first">Is this vaguely practical? Could you get translators to be able to edit code
like this? If not, and only programmers would be able to do this, are there
enough programmer-translators to make it a viable solution, at least for some
big projects?</p>
<p>I'm aware that the string concatenation gets ugly fairly quickly, and some
kind of interpolation might be needed (including the ability to call
functions within that interpolation). With that in place, I think you could
achieve a reasonable level of readability.</p>
<p>A translation tool could also have language-specific templates to quickly
insert the code for common forms.</p>
</li>
<li><p class="first">Is it possible to have a simpler language that would still be able to cope
with the examples here?</p>
<p>The examples I've come up with suggest to me that you need a full programming
language, and that attempting to start from the other direction (e.g. build
up from the current gettext approach) will produce a monstrosity.</p>
<p>gettext already does a 95% job, and we are at the point of diminishing
returns. So if we are going to try to tackle the final bit, we need to err on
the side of enough power to get <strong>all</strong> of that 5%, rather than put a lot
of effort in and discover we've only arrived at 96%.</p>
<p>You also cover the case of having a client who insists that the program
should output &quot;cet homme&quot; and not &quot;ce homme&quot; - while it might make your
translation file ugly, you've got the power to be able to do it if you want.</p>
</li>
</ol>
<p>Comments?</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Full screen WebView Android app]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app" />
    <id>http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app</id>
    <updated>2012-12-19T23:08:00Z</updated>
    <published>2012-12-19T23:08:00Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Christianity" />
    <summary type="html"><![CDATA[Full screen WebView Android app]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app"><![CDATA[<div class="document">
<p>I decided to make an Android app for my <a class="reference external" href="http://learnscripture.net/">Bible memory verse site</a> (created using Django). The main motivation for
the app is to get rid of the unnecessary and annoying address bars and status
bars when using the site on an Android phone. The site is already designed to
adapt to mobiles, so I'm not creating a native app — I just want a better way to
access the web app from an Android phone.</p>
<p>I tried <a class="reference external" href="http://www.appsgeyser.com/">appsgeyser</a>, but discovered they put
adverts on your site, which I definitely don't want — this is an entirely free
app, for a free (and ad-free) site.</p>
<p>My requirements are:</p>
<ul class="simple">
<li>Full screen<ul>
<li>without any controls ever popping up, because you don't need them.</li>
</ul>
</li>
<li>Progress bar for page loading.</li>
<li>Javascript works.</li>
<li>Links work as expected.</li>
<li>Back button works like the builtin browser, until you get back to the
home page, where it will cause the app to exit.</li>
</ul>
<p>There are lots of pages and wizards with solutions for bits of these, but
putting them together turned out to be harder — for example, it seems that the
normal way of showing a progress bar for the whole window doesn't work if you've
gone full screen.</p>
<p>Anyway, I've completed the <a class="reference external" href="https://play.google.com/store/apps/details?id=net.learnscripture.webviewapp">learn scripture app</a>
(ha, my first Java app!), and thought I'd share the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src">complete source code</a>.</p>
<p>If you want a similar app, you are probably best off creating your basic app
structure using a wizard, but it is helpful to see a complete solution. The
important bits are:</p>
<ul class="simple">
<li>permissions for Internet access: <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b09b85147f0416ce528ff85218f1f/AndroidManifest.xml?at=default#cl-11">AndroidManifest.xml</a></li>
<li>the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b/src/net/learnscripture/webviewapp/Dashboard.java?at=default">main activity source code</a></li>
<li>the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b/res/layout/activity_dashboard.xml?at=default">main layout definition</a></li>
</ul>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Why escape-on-input is a bad idea]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea" />
    <id>http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea</id>
    <updated>2012-08-06T20:59:01Z</updated>
    <published>2012-08-06T20:59:01Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Security" />
    <category scheme="http://lukeplant.me.uk/blog" term="PHP" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Why escape-on-input is a bad idea]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea"><![CDATA[<div class="document">
<p>The right way to handle issues with untrusted data is:</p>
<blockquote>
Filter on input, escape on output</blockquote>
<p>This means that you validate or limit data that comes in (filter), but only
transform (escape or encode) it at the point you are sending it as output to
another system that requires the encoding. It has been standard best practice
since just about forever <sup>[citation required]</sup>.</p>
<p>An alternative is “escape on input”: at the point that data enters your system,
you apply a transformation to it to avoid a problem further down the line when
the data is used.</p>
<p>It's come to my attention that some serious web developers (or at least, they
take themselves seriously and are taken seriously by others) are <strong>still</strong>
suggesting the practice of escape-on-input.</p>
<p>For example, with escape-on-input, to avoid XSS any data that enters your system
has HTML escaping applied to it <em>immediately</em>, before your application code
touches it.</p>
<p>I chose that example deliberately, because people are actually recommending it:</p>
<ul>
<li><p class="first"><a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html#comment-572962448">in some recent “PHP sucks” debate</a>.</p>
</li>
<li><p class="first">which, in turn, linked to a <a class="reference external" href="http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html">page by Rasmus Lerdorf recommending
escape-on-input as a sensible way to deal with XSS</a>.
The page, admittedly, is describing a ‘toy’, a ‘no-framework PHP framework’,
yet he does seem to be serious about the usefulness of escape-on-input.</p>
<p>The page is from 2006, and uses the pecl/filter extension, but the extension
has since made it into core (PHP 5.2), and the <a class="reference external" href="http://www.php.net/manual/en/filter.configuration.php">docs for it</a> suggest a
configuration that is clearly intended for XSS prevention. As recently as
2008, and probably to this day, Lerdorf is <a class="reference external" href="http://grokbase.com/p/php/php-internals/083qakz7wj/php-dev-short-open-tag">still defending and recommending
this approach</a>,
and it appears to be part of his reason for thinking that PHP templating
doesn't need an autoescape mechanism.</p>
</li>
<li><p class="first">Just as significantly, <a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security">Etsy are using and recommending escape-on-input</a> (slide
18 onward). As a very successful modern company using PHP, people will look
up to them and copy them.</p>
</li>
</ul>
<p>So, this approach, unfortunately, is popular amongst some, and I can't find a
decent post explaining why it's such a terrible idea both in theory and
practice. Here is my attempt. It should be applicable to almost any system and
any language, although I'll mainly be using examples from web development.</p>
<div class="section" id="in-theory">
<h1>In theory</h1>
<ul>
<li><p class="first">First of all, escape-on-input <strong>is just wrong</strong> — you've taken some input and
applied some transformation that is totally irrelevant to that data. If,
taking our example, you have some data collected by HTTP POST or GET
parameters, applying HTML escaping to it is a layering violation — it mixes an
output formatting concern into input handling. Layering violations make your
code much harder to understand and maintain, because you have to take into
account other layers instead of letting each component and layer do its own
job.</p>
<p>Doing things ‘right’ is very important, even if doing them ‘wrong’ seems to
work and you are tempted to be dismissive of ‘theoretical’ concerns about
purity etc. When you have to maintain code, you will be very glad if things
are in the right place, and not full of hacks and surprises.</p>
</li>
<li><p class="first">You have corrupted your data by default. The system (or the most convenient
API) is now lying about what data has come in. As you have applied a
transformation to the <strong>data itself</strong>, the layering violation is not an
isolated problem in one part of the code, but infects every part of your code,
especially if you store the corrupted data in a database.</p>
<p>Your data is <strong>everything</strong>. As I read recently, <a class="reference external" href="http://blog.datamarket.com/2012/07/08/the-11-best-data-quotes/">“data matures like wine,
applications like fish”</a>. You can
always rewrite your application, but if you corrupt your data, you've done the
worst thing you can to your system.</p>
</li>
<li><p class="first">This is exacerbated by the fact that many encodings are one-way — you cannot
losslessly or unambiguously convert them back. If at a later point you need
the original data, you might be in a pickle.</p>
</li>
<li><p class="first">Escaping your data for one output backend will only deal with <strong>that</strong>
output. A typical web app might deal with at least the following backends,
which have different characters that are dangerous, and have different
requirements for dealing with them:</p>
<ul class="simple">
<li>HTML: ' &lt; &gt; &quot; &amp;</li>
<li>SMTP and HTTP: ; : newlines</li>
<li>SQL: '</li>
<li>JSON: &quot;</li>
<li>Shell: space, quotes and various other characters</li>
</ul>
<p>Any number of others could be added, and all could have security
implications. Using escape-on-input will only fix one of these (apart from
happy coincidences where it might fix more than one), but for the others you
will still need a sensible solution to the problem. Why not have a sensible
solution for all of them?</p>
</li>
<li><p class="first">Escaping on input will not only fail to deal with the problems of more than
one output, it will actually make your data <strong>incorrect</strong> for many outputs.</p>
<p>Suppose you decide to do HTML escaping, and someone enters <em>Jack &amp; Jill</em> as a
title for something. Your escape-on-input turns this to <em>Jack &amp;amp; Jill</em> and
that goes in the DB. Suppose you want to email people and put this title in
the subject line. You now have to apply the reverse transformation to get a
sensible subject line in the email, and you have to <strong>remember</strong> to do this
for every output that is not HTML.</p>
<p>Sometimes, the bug is <a class="reference external" href="http://instagram.com/p/SVfQruppEE/">significantly more annoying</a> than an email with an incorrect title.</p>
<p>You also have daft bugs like the fact that doing a search on that field for
the string ‘amp’ (or ‘quot’, ‘apos’, ‘lt’, ‘gt’ etc. or any substrings) will get
various false matches.</p>
<p>I have seen some people respond to this by saying “it's better to have the
occasional double-encoding bug or incorrect query result than an XSS
exploit”. Well, first, that depends on your business. XSS is a problem because
it costs time and money, and so does corrupting your data. Many people have
data that actually matters, and corrupt data is a big deal, and much harder to
cope with than an XSS bug, because data lives on and on, while your code can
get replaced easily.</p>
<p>Second, this decision affects <strong>frameworks</strong> that are used to handle data of
<strong>all kinds</strong>, and the decision affects the entire code base of your
application and beyond, as described below. Data-handling frameworks that work
on the assumption that your data is not important are insanity. <a class="reference external" href="http://www.biblegateway.com/passage/?search=Psalm%2011:3&amp;version=KJV">If the
foundations be destroyed, what can the righteous do?</a></p>
<p>Third, it's entirely unnecessary. XSS is not hard to fix given decent
programming tools.</p>
</li>
<li><p class="first">At what point does data ‘enter’ your system?</p>
<p>It might sound like a simple question, but it's tricky in reality, and I'll
illustrate using an HTTP request.</p>
<p>In most web apps, the GET and POST parameters are your ‘raw input’. However,
using most normal web framework APIs, data in GET and POST parameters has
already been interpreted. The ‘raw’ data is really the bytes that make up the
HTTP request, which typically will use URL encoding for GET query parameters
and a choice of encodings for POST data (URL encoding or MIME multipart
attachment format).</p>
<p>The framework may also do another level of decoding — interpreting the
series of bytes as a series of unicode code points.</p>
<p>Both parts of this initial transformation make sense and are appropriate,
because they are reversing the encoding already applied to the data by the
protocol involved. The web browser takes the data you type in — unicode code
points — and applies a series of transformations to it, according to the HTTP
protocol, and your web framework reverses these to get the data back.</p>
<p>Now, if you want to avoid XSS problems, you have to apply the escaping
<strong>after</strong> this initial decoding has been done. But this highlights another
possibility. What if the data requires <em>further</em> decoding before you get the
‘real’ raw data? For example, some data might be sent base64 encoded for a
variety of reasons, or any other type of encoding.</p>
<p>This extra level of encoding gives two problems:</p>
<ul>
<li><p class="first">your automatic HTML escaping may have corrupted the encoded data so that it
now cannot be decoded. For example, you had a GET parameter that held a URL,
which itself had parameters in the query string:</p>
<pre class="literal-block">
GET /foo?bar=1&amp;url=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2 HTTP/1.1
</pre>
<p>Your framework's HTTP handling will produce a query dictionary that looks
something like the following:</p>
<pre class="literal-block">
{&quot;bar&quot;: 1,
 &quot;url&quot;: &quot;http://example.com/?x=1&amp;y=2&quot;
 }
</pre>
<p>But your automatic escaping turns that into:</p>
<pre class="literal-block">
{&quot;bar&quot;: 1,
 &quot;url&quot;: &quot;http://example.com/?x=1&amp;amp;y=2&quot;
 }
</pre>
<p>If you want to extract the 'y' parameter from 'url', you are stuck. You
can't correctly interpret the data in the 'url' parameter, because it has
been corrupted. You're going to have to re-decode the input, and you might
not even notice this problem.</p>
</li>
<li><p class="first">Even if the data comes through your automatic escaping unscathed
(e.g. base64 under HTML escaping), or you can undo the corruption and get it
properly decoded, after decoding you will have to <strong>manually apply</strong> HTML
escaping to make it match all the other automatically escaped data. If you
don't, you've potentially got a bug and an XSS exploit.</p>
<p>So your automatic escape-on-input has <strong>missed</strong> data, and this happens
because you can't really define the point at which the data has ‘entered’
your system and needs the escaping applied.</p>
</li>
</ul>
<p>This problem means that the escape-on-input approach is inherently flawed and
<strong>cannot</strong> be fixed. <strong>You just have to patch it up on a case-by-case basis,
which is exactly what escape-on-input is supposed to avoid.</strong></p>
<p>And then, what about other sources of data, such as data on the file system or in
a cache? Are these entry points? Well, it depends on how the data was put
there. You have to manually follow this all the way through your app; get it
wrong and you've got double escaping bugs or security flaws.</p>
<p>(By contrast, escape on output always works, because you apply it at the point
where you know it is needed — in the backend that knows the escaping rules.)</p>
</li>
<li><p class="first">Other systems putting data into your database, or getting data out, have to
abide by your data transformation rules.</p>
<p>These systems might have nothing to do with your primary domain (e.g. a web
site). Making them understand and obey rules that have nothing to do with the
data itself is insanity and extremely short-sighted.</p>
<p>You can't deal with this problem when you come to it, because you don't have
to just fix your code, you've got to fix all your data too, and by the time
you cross this bridge you might have a lot of data and might need a very
delicate database migration to get it right. The data may even have escaped
your control (e.g. been copied into other systems), or backwards compatibility
concerns might stop you from making the change you need to make.</p>
</li>
<li><p class="first">Within your main application, the decision to escape on input affects your
whole code base.</p>
<p>If you want to use any libraries, you need to make sure that they are using
all the same assumptions that you have in your main code base.</p>
<p>For example, if you've got a form/widget library in your web app, it will very
often need to echo user input back to them in the case of a form that has
validation errors. This library has to know if you already escaped the input.</p>
<p>Writing the library to work in two modes is asking for trouble. Rather, you
need it to have been written from the beginning to assume the same escaping
rules.</p>
<p>This kills code re-use — you can only use code that assumes the same input
escaping — or it means that you will end up with tons of bugs due to
incompatibilities between the assumptions made in your application code and
the library.</p>
<p>Essentially, this is the problem of a global configuration setting, but worse
since it affects the <em>operand</em> of your entire application (the data going
through it), not just the functionality of various <em>operators</em>.</p>
</li>
<li><p class="first">The confusion caused by the above is likely to <em>increase</em> security
problems. “Keep It Simple, Stupid” remains a very good maxim for developers.</p>
<p>To continue an example used above: you want to send an email with some data
that has already been HTML escaped, and so you need to unescape the data to
avoid emails with the subject &quot;Jack &amp;amp; Jill&quot; when the user entered &quot;Jack &amp;
Jill&quot;. You decide it's not sensible for the mail sending functions to do this
internally (or maybe they're provided by a 3rd party who made that decision
for you), so the calling code does the unescaping.</p>
<p>You later decide to switch to HTML emails, and the developer who implements it
thinks that since data is already escaped, there is no problem including it
without extra escaping in the body of the HTML email, leading to a
vulnerability (not classic XSS in this case, but still a problem).</p>
<p>There is also the example I gave above where an extra layer of
encoding/decoding in the raw data makes it likely you'll forget to apply the
escaping.</p>
<p>The confusion caused by escape-on-input means your entire code base becomes a
potential source not only of double-escaping bugs but of security problems as
well.</p>
</li>
</ul>
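<p>The nested URL case and the <em>Jack &amp; Jill</em> case from the list above can be
reproduced in a few lines. This is only a minimal sketch in Python: it assumes a
hypothetical escape-on-input layer that runs <tt class="docutils literal">html.escape()</tt> over every decoded
parameter, and the helper name <tt class="docutils literal">inner_query</tt> is made up for illustration.</p>

```python
import html
from urllib.parse import parse_qs, urlencode, urlsplit

# The request from the example: a URL passed as a query parameter.
inner_url = "http://example.com/?x=1&y=2"
query_string = urlencode({"bar": "1", "url": inner_url})

# Step 1: the framework reverses the URL encoding applied by the protocol.
params = {
    key: values[0]
    for key, values in parse_qs(query_string, separator="&").items()
}

# Step 2 (assumed): an escape-on-input layer HTML-escapes every value.
escaped_params = {key: html.escape(value) for key, value in params.items()}


def inner_query(url):
    """Parse the query string of a URL that itself arrived as a parameter."""
    return parse_qs(urlsplit(url).query, separator="&")


clean = inner_query(params["url"])              # has both "x" and "y"
corrupted = inner_query(escaped_params["url"])  # "y" is no longer a key

# The same layer also breaks searches on stored text.
stored_title = html.escape("Jack & Jill")
false_match = "amp" in stored_title  # matches, though the user never typed it
```

<p>Nothing in the second step raises an error, which is the worrying part: the
corrupted value looks plausible until some code tries to interpret it.</p>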
</div>
<div class="section" id="in-practice">
<h1>In practice</h1>
<p>Thankfully, we don't just have to rely on the above analysis to conclude that
escape-on-input is a terrible idea. PHP, always willing to help when it comes to
“examples of how not to do it”, provides us with a perfect case study.</p>
<div class="section" id="magic-quotes">
<h2>Magic quotes</h2>
<p>PHP used to have a feature called magic quotes. It was an escape-on-input
feature that escaped single quotes (<strong>'</strong>) with backslashes. This was to protect
you from SQL injection attacks, by making the data safe for interpolation into a
SQL query.</p>
<p>This caused all kinds of problems.</p>
<p>First, if you are not passing the data to a database, using string
interpolation to build up the SQL query, you have to remember to strip
those slashes using the function <tt class="docutils literal">stripslashes()</tt>.</p>
<p>If you don't, you get double encoding. It looks like \'this\', you\'ve
almost certainly seen it across the web, though it seems we\'re thankfully
past the worst of it.</p>
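<p>To make the failure concrete, here is a rough simulation in Python. The two
functions are hypothetical stand-ins that only approximate what PHP's
<tt class="docutils literal">addslashes()</tt> and <tt class="docutils literal">stripslashes()</tt> do (they ignore NUL bytes, for
instance), so treat this as an illustration, not a description of PHP internals.</p>

```python
def add_slashes(text):
    """Approximate escape-on-input: backslash-escape backslashes and quotes."""
    for char in ("\\", "'", '"'):
        text = text.replace(char, "\\" + char)
    return text


def strip_slashes(text):
    """Approximate reverse: remove one level of backslash escaping."""
    result = []
    chars = iter(text)
    for char in chars:
        if char == "\\":
            # Drop the backslash and keep whatever character follows it.
            char = next(chars, "")
        result.append(char)
    return "".join(result)


submitted = "you've"

# What application code receives when the escaping is applied on input.
received = add_slashes(submitted)

# Echoing it straight back to the user shows the stray backslash.
echoed_without_fix = received

# Every non-SQL code path has to remember to undo the transformation.
echoed_with_fix = strip_slashes(received)

# If the value passes through the input layer a second time, it gets worse.
double_encoded = add_slashes(received)
```

<p>The form validation code ends up calling <tt class="docutils literal">strip_slashes()</tt> even though it
has nothing to do with SQL.</p>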
<p>Second, even if you remember, you've added some hideous cruft to your code. In
the bit of code which is handling form validation (and is therefore echoing user
input back to the user without the database being involved), you've got these
bizarre <tt class="docutils literal">stripslashes()</tt> calls. What on earth does ‘reverse transforming a
string for SQL statement preparation’ have to do with the task of input
validation?</p>
<p>Third, it turns out that different databases need different escaping mechanisms
to do things fully correctly. So you now have to do <tt class="docutils literal">stripslashes()</tt> on data
even if you are passing it to a database using string-interpolated queries!</p>
<p>Then, since the above problems are common (building up SQL queries by string
interpolation was always a bad idea, and very often you pass on the data to
outputs that don't want SQL escaping at all), it's desirable to have a way to
turn this behaviour off completely.</p>
<p>To handle this, there is a php.ini setting to turn it on/off.</p>
<p>And there were more complications, for example:</p>
<ul class="simple">
<li>do you apply magic quotes to ‘all input’ (<tt class="docutils literal">magic_quotes_runtime</tt>) or just to
GET/POST/COOKIE data (<tt class="docutils literal">magic_quotes_gpc</tt>)? (This is the problem of defining
what exactly is ‘input’)</li>
<li>attempts to fix some of the above with yet more configuration options like
<tt class="docutils literal">magic_quotes_sybase</tt>.</li>
</ul>
<p>And so now you've got even more problems. Since these are global settings, you
can't have library code mess with them, since other code might set the global to
a different value or assume a certain value.</p>
<p>You could try making all code detect the current setting and have different code
paths depending on the result. This works very badly — having multiple code
paths is a recipe for code duplication and bug proliferation. It's extremely
easy to forget to do it, or get one of the paths wrong, since you will likely
only test one configuration value and one set of code paths in reality.</p>
<p>Alternatively, you can make one bit of code responsible for fixing the setting
to a sensible value (the only one being 'off'), and then make all code assume
that from then on. (If you can't turn it off, you can use the code included
<a class="reference external" href="http://www.php.net/manual/en/security.magicquotes.disabling.php">here</a> as a
horrible kludge to reverse its behaviour).</p>
<p>Eventually, this final approach was the one taken by all significant
projects. <strong>Turn the whole feature off, and assume it is off from then
on</strong>. (Which means the feature is useless, of course).</p>
<p>And of course, thankfully, the PHP developers realised that this entire thing
was a <strong>huge mistake</strong> that caused nothing but a vast amount of confusion and
bugs, and <strong>removed the whole thing</strong> for good in PHP 5.4.</p>
<p>Magic quotes, <a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">as eevee put it</a>, were “so
close to secure-by-default, and yet so far from understanding the concept at
all.”</p>
<p>To digress for a moment: we keep getting told that PHP is improving, and the
community has learnt from its mistakes. Unfortunately it seems the leaders in
the community are bent on <strong>recreating</strong> old mistakes.</p>
<p>According to Lerdorf, the much newer PHP 'filter' extension is <a class="reference external" href="http://grokbase.com/t/php/php-internals/08373a1vvf/short-open-tag/083qakz7wj#20080323qvterw1df6a006qxyg83z9qsb8">“magic_quotes
done right”</a>. But
it still suffers from almost all the problems described here, for all the
reasons described. Global HTML escaping on input is essentially the same as
magic quotes, and just as tragically bad.</p>
</div>
<div class="section" id="elgg">
<h2>Elgg</h2>
<p>In researching for this post, I came across <a class="reference external" href="http://trac.elgg.org/ticket/561">this ticket for Elgg</a>, an open source social networking engine.
Just read through the ticket and see the mess they are in. It's clear they
strongly regret the decision they made to escape-on-input, and, in their own
words, have created “horrendous” problems for themselves, especially as their
application has grown to include other interfaces such as JSON REST APIs.</p>
<p>However, fixing it is very hard. They have to coordinate many changes across
their code base with a big database migration. If data has leaked from the
databases and tables they control into other systems, such as denormalised
tables, other databases, caches etc., or if there is other code by third parties
that makes the old assumptions about encoded data, they are in even more of a
pickle. And both of those things are probably inevitable in something like an
open source framework, which is designed for other people to build on and
extend.</p>
<p>This is the pain that comes from mixing input handling and output encoding,
and from corrupting the data in your database.</p>
</div>
<div class="section" id="etsy">
<h2>Etsy</h2>
<p>According to <a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security">their security presentation</a>,
Etsy are using escape-on-input for XSS protection.</p>
<p>They claim that this is a much more secure option, as it is secure by
default. (They do note, however, the problem with input that is encoded in some
other way, like base64, so they are aware of the problems.)</p>
<p>Their presentation goes on to describe an elaborate system for detecting and
fixing XSS attacks (the slides don't give enough detail for me to understand
what exactly they are doing, but it's clearly a lot of work).</p>
<p>And <a class="reference external" href="http://www.nzinfosec.com/etsy-has-been-one-of-the-best-companies-ive-reported-holes-to/">their system does indeed catch XSS bugs in the wild and allow them to fix
them within hours</a>.</p>
<p>Wait, what?</p>
<p>They've corrupted their database by doing escape-on-input, they've inflicted on
themselves all the development pain described above, and they've <strong>still</strong>
got XSS bugs?</p>
<p>Granted, they've got impressive ways of dealing with these problems. But it's
like <a class="reference external" href="http://xkcd.com/463/">virus checkers on voting machines</a>. Needing advanced ways
of dealing with problems that shouldn't even be possible tells you that you are
doing it wrong. They've become very fast at <a class="reference external" href="http://www.red-sweater.com/blog/125/easy-programming">re-tying their shoelaces, instead
of working out how to tie shoelaces so they don't come undone</a>.</p>
<p>They claim that with escape-on-input, XSS problems are now greppable, but it
doesn't sound like it. If they were, code audits would be a massively more
efficient way to find XSS problems than the methods they are using.</p>
<p>The main problem is almost certainly that they are using an output system for
HTML that doesn't do HTML escaping by default (I'm guessing they are using PHP
as their template language). If the backend that deals with HTML <strong>actually
deals with HTML</strong> then you eliminate the vast majority of these problems
overnight.</p>
<p>I'm willing to bet that large sites that use Django (or other frameworks that
have basically solved the XSS problem by HTML escaping on output <strong>by default</strong>)
don't have teams and automated systems dedicated to this problem, and don't need
them. In Django apps, XSS problems <strong>are</strong> greppable - you grep for
<tt class="docutils literal">mark_safe</tt> in Python and the <tt class="docutils literal">|safe</tt> filter in templates (and then,
obviously, you may have to recursively grep for any functions that call
<tt class="docutils literal">mark_safe</tt> on inputs). Since all data which isn't ‘mark_safe()’d gets escaped
by the templating engine, and all HTML comes out of the template engine, that's
basically all you need to do.</p>
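<p>The idea behind escape-on-output with an explicit opt-out can be sketched with the
standard library alone. This is not Django's implementation: <tt class="docutils literal">SafeText</tt> and
<tt class="docutils literal">render</tt> are hypothetical names, loosely in the spirit of <tt class="docutils literal">mark_safe</tt>, and
the ‘template engine’ here is just <tt class="docutils literal">str.format</tt>.</p>

```python
import html


class SafeText(str):
    """Marks a string as already being valid HTML (hypothetical marker type)."""


def render(template, **context):
    """Escape every value at output time unless it was explicitly marked safe."""
    escaped = {
        name: value if isinstance(value, SafeText) else html.escape(str(value))
        for name, value in context.items()
    }
    return template.format(**escaped)


# Data is stored exactly as the user entered it.
stored_title = "Jack & Jill"
hostile = '" onmouseover="alert(1)'  # attribute-injection attempt

# HTML output: escaping happens here, in the layer that knows about HTML.
attr = render('title="{title}"', title=hostile)

# The only unescaped values are the ones wrapped in SafeText, so auditing
# means searching the code base for that one name.
trusted = render('title="x"{extra}', extra=SafeText(' class="note"'))

# A non-HTML output (say, an email subject) uses the stored data untouched.
subject = stored_title
```

<p>Because the default is to escape, forgetting to do anything is the safe case, and
the dangerous case has a name you can grep for.</p>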
</div>
</div>
<div class="section" id="now-for-some-flame-bait">
<h1>Now for some flame bait</h1>
<p>How did this happen to Etsy?</p>
<p>Are the Etsy devs stupid? I suspect not. Etsy is clearly doing well, and I
imagine they have enough money to hire top-notch developers. Some of their
<a class="reference external" href="http://www.etsy.com/careers/job_description.php?job_id=ozhhVfwM">careers pages</a> show they
are happy using a variety of languages and technologies, and their <a class="reference external" href="http://codeascraft.etsy.com/">engineering
blog</a> seems to be sane and competent. Even their
security presentation showed considerable ingenuity and technical ability in
dealing with security problems (in entirely the wrong way, unfortunately, but
still impressive).</p>
<p>I doubt they are low quality developers. Rather, I suspect that use of PHP has
addled their brains. They have become far too accustomed to working in an
environment in which insanity reigns — an environment in which <a class="reference external" href="/blogmedia/php_less_than.txt">the less than
operator pretends to work correctly with strings but it's just a trap</a>.</p>
<p>When I programmed in a Windows environment, I theorised that use of Windows
itself contributed to the poor quality of the programming in the code base, and
the fact that developers thought nothing of writing tons of tedious
code. Because Windows was so unscriptable, I imagined that Windows programmers
developed a high tolerance for tedium and repetition, which is exactly the
opposite of the qualities needed by a programmer to make a computer do everything
efficiently and reliably. (Since then, I've found that Sturgeon's law was
probably a better explanation for the quality of the code, but I still think the
fundamental idea applies).</p>
<p>With PHP, the fact that it comes with a template language that is simply not fit
for purpose — because it doesn't do HTML escaping by default, or even easily —
has somehow made the Etsy developers believe that it is normal to struggle with
XSS, that it is perfectly reasonable that even after taking the drastic action
of corrupting their entire database by HTML escaping it, they should <strong>still</strong>
need elaborate XSS-catching systems.</p>
<p>Instead of <a class="reference external" href="http://www.youtube.com/watch?v=5mdy8bFiyzY">trying</a> to fix XSS,
they should just fix it. Like <a class="reference external" href="https://docs.djangoproject.com/en/dev/topics/templates/#automatic-html-escaping">this in Django</a>. Or
<a class="reference external" href="http://pypi.python.org/pypi/MarkupSafe/">this in Turbogears and Jinja</a>. Or
<a class="reference external" href="http://www.yesodweb.com/book/shakespearean-templates#types-33">this in Yesod</a>. Or even <a class="reference external" href="http://twig.sensiolabs.org/doc/templates.html#html-escaping">this
in PHP</a> (though
due to limitations of the language you won't be able to have the convenience of
things like <a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe">mark_safe</a>
in Django). But living with an environment of pain and madness makes you think
that it ought to be hard.</p>
<p>Right the way up to Rasmus Lerdorf at the top, many people in the PHP community
live with the insanity of their tools, and add more insanity to cope with it,
rather than fix their tools or choose better ones.</p>
</div>
<div class="section" id="a-lesson-for-pythonistas">
<h1>A lesson for Pythonistas</h1>
<p>Bashing other people is fun, but when I do so I always try to get something more
valuable out of it by using the opportunity to examine myself. The problem I
discussed in the last section (which is just a manifestation of the <a class="reference external" href="http://en.wikipedia.org/wiki/Broken_windows_theory">broken
windows theory</a>) applies
to other communities, and I'll attempt to apply it to the Python community.</p>
<p>Refusing to live with stupidity is one of the reasons that Python 3 is really
important.</p>
<p>Python 3 does not represent a massive leap forward in terms of additions to the
language. Mainly it just fixes a bunch of mistakes in Python 2, and introduces a
whole lot of backwards incompatibilities in the process. One of the biggest is
unicode/bytes. Python 2 was stupid here — it went directly against the Zen of
Python, and said “in the face of ambiguity about what encoding to use, guess.”
This caused a world of pain.</p>
<p>Now, you can work around it in most cases by some sensible conventions and a
certain amount of discipline. You can also cope with the fact that <tt class="docutils literal">&quot;a&quot; &lt; 1</tt>
doesn't raise an exception. You can live with <tt class="docutils literal">next()</tt> being a method in the
iterator protocol, when it should be a method called <tt class="docutils literal">__next__()</tt> and a builtin
<strong>function</strong> <tt class="docutils literal">next()</tt>. You can live with the fact that <tt class="docutils literal">print</tt> is a totally
unnecessary keyword, since it should just be a builtin function. You can get
used to the fact that <tt class="docutils literal">class Foo:</tt> means something subtly but significantly
different from <tt class="docutils literal">class Foo(object):</tt>. You can work around or ignore dozens of
other little niggles, gotchas and inconsistencies.</p>
<p>But all the while, you are training yourself to tolerate stupidity,
inconsistency and brokenness. Removing these warts is really important, and
worth all the pain of the migration. The alternative is for Python to become the
next PHP.</p>
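<p>As a rough sketch of what a few of those fixes look like in practice, the
snippet below assumes a Python 3 interpreter. The <tt class="docutils literal">raises_type_error</tt> helper and
the <tt class="docutils literal">Countdown</tt> class are made up for illustration and are not part of any library.</p>

```python
import operator


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Ordering comparisons between unrelated types are an error in Python 3.
mixed_comparison_fails = raises_type_error(operator.lt, "a", 1)

# Text and bytes no longer mix implicitly; the encoding has to be stated.
implicit_mix_fails = raises_type_error(operator.add, "caf\u00e9", b"!")
explicit_bytes = "caf\u00e9".encode("utf-8") + b"!"


# The iterator protocol uses __next__(), driven by the builtin next().
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current == 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


first = next(Countdown(3))
```

<p>In each case the interpreter refuses to guess, so the mistake shows up where it
is made rather than somewhere in production.</p>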
<p>On top of these things, there are other types of brokenness in Python that
people in the community seem less willing to acknowledge or tackle. For some of
these I think we need exposure to completely different languages — languages
where you can spawn thousands of ‘threads’ easily and get performance benefits,
for example, or languages where you can write code that is both very high level
<strong>and</strong> extremely fast. If we live entirely with Python and its set of
limitations, we'll think that those problems are normal and unavoidable.</p>
<hr class="docutils" />
<p>Updates:</p>
<ul class="simple">
<li>2012/08/07 - corrections about turning magic_quotes_gpc off at runtime.</li>
<li>2012/10/08 - noted bug with queries returning false matches.</li>
</ul>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[PHP, Python and Persuasion]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/php,-python-and-persuasion" />
    <id>http://lukeplant.me.uk/blog/posts/php,-python-and-persuasion</id>
    <updated>2012-07-04T22:22:29Z</updated>
    <published>2012-07-04T22:22:29Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="PHP" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[PHP, Python and Persuasion]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/php,-python-and-persuasion"><![CDATA[<div class="document">
<p>I always find it fascinating to observe conversations in which people's
arguments fail to convince each other. A few days ago we witnessed some PHP
debates, kicked off by <a class="reference external" href="http://www.codinghorror.com/blog/2012/06/the-php-singularity.html">Jeff Atwood</a>.</p>
<p>I foolishly got slightly involved on one ‘<a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html">rebuttal</a>’. At
church last Sunday I also chatted with a friend who has used PHP and likes it,
and this time I tried to put myself in his shoes. It is often much more helpful
talking to people in the flesh, and I think it is always enlightening to look at
why we fail to convince.</p>
<p>One reason we can't rule out is simple irrationality. All of us are vulnerable
to <a class="reference external" href="http://en.wikipedia.org/wiki/Confirmation_bias">confirmation bias</a>, and
people will often go to great lengths to convince themselves that they are doing
the right thing and do not need to change their views or practices. You see this
all the time when two groups of people share experiences after having made
different decisions about how to spend some leisure time. Both groups are often
desperate to believe that they haven't missed out, and will seek to persuade
each other (but in reality, persuade themselves) that they were in the group
that had the most fun.</p>
<p>However, just assuming the other person is being irrational doesn't really help
you, and can in fact hinder communication. Below I will attempt to be more
constructive in looking at the ways we can fail to convince people. I'll try not
to turn it into a rebuttal to the “PHP isn't so bad” posts!</p>
<p>In his great rant <a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">PHP: a fractal of bad design</a>, ‘Eevee’
has lots of great arguments against PHP, and there were others in different
posts. But there are reasons why some will fail to hit home. (This is not a
criticism, by the way — many of these problems are unavoidable if you are
addressing an audience as mixed as “all the PHP developers in the world”).</p>
<ol class="arabic">
<li><p class="first">Expert understanding.</p>
<p>Eevee writes:</p>
<blockquote>
<p>empty($var) is so extremely not-a-function that anything but a variable,
e.g. empty($var || $var2), is a parse error. Why on Earth does the parser
need to know about empty?</p>
</blockquote>
<p>To an experienced programmer, <tt class="docutils literal">empty()</tt> is rather surprising. If you know —
or at least have some idea — about how to implement a programming language,
you'll understand terminology like lexer, parser, interpreter etc. So when
you try <tt class="docutils literal"><span class="pre">empty($var</span> || $var2)</tt>, and it returns a <strong>parse</strong> error, even
though it looks like a function, you think “this programming language must
have been designed by complete amateurs — I don't want anything to do with
it”.  [ <em>EDIT: changed this paragraph, previously I was getting mixed up
between isset() and empty()</em> ]</p>
<p>For a less experienced or able programmer, however, none of this is a problem.
Programming languages are entirely magic, and one type of magic is no more
surprising than another.</p>
<p>Talking about what the parser is doing is completely incomprehensible to such a
developer. You cannot communicate the reaction you feel, because it requires a
deeper level of understanding of how things are supposed to work. Less able or
experienced developers are simply unable to assess the quality of the tools they
work with.</p>
</li>
<li><p class="first">Craftsmanship</p>
<p>For many coders, the <strong>only</strong> thing that matters is whether PHP allows them to
get something done. The quality of the tools or materials being used does not
matter.</p>
<p>Not only are many coders <strong>unable</strong> to assess the quality of the tools they
are using, they wouldn't care even if they could, because programming is simply
about getting something done. Any thought of taking pride in your work is
absent. Such a person will never be convinced by arguments that talk about the
quality of tools.</p>
</li>
<li><p class="first">Amateur vs professional</p>
<p>Eevee accused the PHP world of being created by and filled with amateurs. I
suspect that to many PHP developers, that has the same effect as Java people
saying to Pythonistas that Python is not ‘enterprise ready’. Many Python
developers don't care about “the Enterprise” in the first place, and the word
may even have negative connotations: associations of massively overengineered
and poorly written bloatware written in Java or C#, which could be replaced by a
Python project 10 times smaller.</p>
<p>Many PHP users simply do not aspire to be professional, because they are happy to be
called amateurs — they <strong>are</strong> amateurs, doing stuff just for fun, pure
hobbyists who are not making money out of what they are doing, nor relying on
PHP to behave in a sane manner with important data. PHP has brought joy to their
life. Does it matter to them that PHP doesn't live up to some standard they
don't need?</p>
<p>Many developers simply do not care that much about the data that goes through
their website. “So what if PHP doesn't have any decent way of handling decimal
values?  It's close enough for my needs.” For these people not only is the
quality of the <strong>tool</strong> unimportant, the quality of the <strong>result</strong> is of
little consequence. No-one will die or sue them even if it has major bugs.</p>
<p>Also, professionals often have (or feel) obligations to support the code they
write, whereas amateurs do not. Therefore amateurs will always trade
long-term maintainability for initial deployment speed.</p>
</li>
<li><p class="first">Defending your day job</p>
<p>The previous argument doesn't cover a lot of users, however. Many are using PHP
‘professionally’, and even have customers who may demand that work is done with
PHP. (It is perhaps the essential problem of PHP that a language that was
designed to be a simple template language for non-programmers has turned into
the work-horse of the web, and the network effects caused by adoption amongst
amateurs have made it a language for professionals.)</p>
<p>Now, most people want to feel good about the work they are doing. And most
people are not in a position to have much influence on whether or not they
use PHP. If you tell someone “the tool you are using is Bad For The World”,
they will get defensive, even if they would rather be using something else.</p>
<p>I'm extremely fortunate in the work I do that I get to choose my projects
carefully, and then charge a high rate for work that I can take real pride
in. It's actually fairly unkind of me to berate people for using a language that
they didn't really choose — even if they think they are choosing it and will
defend it to the death. They are only doing so because the alternative is to say
“yes it sucks, and yes I'm doing the world a disservice by continuing to promote
it, but it pays the bills”. I've occasionally been in a similar situation in the
past, and know how demoralising and depressing it is, and it is more comforting
to rationalise your current situation.</p>
<p>In fact, when I was last in a situation like this, I did come up with a whole
bunch of rationalisations, which I still think are valid to some degree.</p>
<ul>
<li><p class="first">Engineers must be pragmatists at some level. You're paid to achieve things,
not for the internal beauty of your code.</p>
</li>
<li><p class="first">I should take pride in customer satisfaction, because that is what matters.</p>
</li>
<li><p class="first">The language itself is not the only factor. You also must consider:</p>
<ul class="simple">
<li>development tools</li>
<li>the availability and quality of libraries</li>
<li>documentation and support for such libraries</li>
<li>availability of people to maintain the software in the future.</li>
</ul>
<p>All of these things depend on human factors and network effects, and can be
used to justify a choice on the basis that “lots of other people are choosing
this”.</p>
</li>
</ul>
<p>(If you are a “PHP developer” reading this, I apologise for being patronising —
this post is not meant to insult, and is not aimed at you but at others).</p>
</li>
<li><p class="first">Experience</p>
<p>The vast majority of PHP developers that I've come across have not used any
other serious web frameworks. That is why they can say things like <a class="reference external" href="http://fabien.potencier.org/article/64/php-is-much-better-than-what-you-think">“PHP is the
best web platform... ever.”</a></p>
<p>To make such a statement, you need to have <strong>in-depth</strong> knowledge of a <strong>large</strong>
number of alternative web platforms — if not <strong>all</strong> the web platforms in
existence. However, the author did not demonstrate <strong>any</strong> knowledge of <strong>any</strong>
alternatives. You are never going to persuade anyone that way. Rather, it will
lead people simply to dismiss your opinions entirely. Being ready to make
pronouncements on subjects for which you clearly don't have a fraction of the
required knowledge is a sign that nothing you say is trustworthy.</p>
<p>However, the problem goes both ways. Eevee's rant, and Jeff's, both contain
statements that are to some degree out of date (or appear out of date to someone
on the other side who knows the 'standard solution' for the problems
highlighted). The reason for this is obvious. If you feel passionately about how
bad something is, you will stop using it, and your level of knowledge of the
technology will go down. This does lead to a bit of a problem. Those who are
unwilling to work with technology X because of how bad it is can always be
dismissed as being ignorant of it.</p>
<p>A facet of this problem is that it is extremely easy to <strong>underestimate</strong>
something that you don't have knowledge of — for instance, you can simply
underestimate the <strong>size</strong> of a competing community.</p>
<p>I'll admit I was surprised to learn that PHP has a package manager with <a class="reference external" href="http://packagist.org/statistics">1900
packages in its main repository</a>. However,
the <a class="reference external" href="http://fabien.potencier.org/article/64/php-is-much-better-than-what-you-think">author</a>
who pointed that out might be more surprised to know some of the following
figures:</p>
<p><strong>Python</strong>:</p>
<ul class="simple">
<li>PyPI has 20,000 packages<ul>
<li>over 2,400 are Django related</li>
<li>over 2,000 are Plone related</li>
<li>over 800 are Zope3 related</li>
<li>over 800 are Zope2 related</li>
<li>over 130 are Turbogears related</li>
</ul>
</li>
</ul>
<p><strong>Perl</strong>:</p>
<ul class="simple">
<li>CPAN has:<ul>
<li>107,764 Perl modules</li>
<li>9,827 authors</li>
</ul>
</li>
</ul>
<p>Let's add <strong>Haskell</strong>, as an example of a minority language if ever there was
one, and not the first language you'd think of for building web sites:</p>
<ul class="simple">
<li>Hackage has 5376 packages:<ul>
<li>350 in the 'Web' category</li>
<li>55 for ‘Yesod’ (one web framework)</li>
</ul>
</li>
</ul>
<p>I'm sure other languages can boast similar or much better figures. I don't even
know where to look with Java — it's quite possibly so large that the idea of a
central repository doesn't even make sense.</p>
<p>PHP, despite having huge numbers of developers, looks rather small in
comparison. Now, all of these statistics are flawed in a variety of ways, PHP's
included, and the bigger a community is, the more they will be flawed — for
example, PyPI download stats will often be way out because people are using
mirrors etc — but this doesn't affect the point I'm making.</p>
<p>The point is that all the communities are large, and these figures are just the
tip of the iceberg in terms of how much is going on in each and every
community. And from within a community, you can see some big figures and think
“well I doubt anyone could seriously be competing with <strong>that</strong>!”. This is true
no matter what community you belong to. And it makes it difficult to communicate
in a meaningful way. Very few can honestly say that they've evaluated the
alternatives in a fair way, because the alternatives are so huge.</p>
</li>
</ol>
<div class="section" id="conclusion">
<h1>Conclusion</h1>
<p>I'm not sure I have a conclusion actually. These are just some of the pitfalls
we face in communicating across different technology communities. I often forget
these things, especially when communicating on the internet, and this post is
here mainly to help me remember!</p>
<p>Because of the broad spectrum of PHP users, I do think that arguments against
PHP are going to have to be more <strong>individual</strong> to be effective.</p>
<p>For example, suppose you have a customer that wants PHP for a website that deals
with money in some way e.g. a shop. I might attempt to shock them by using
<a class="reference external" href="http://www.phpsh.org/">phpsh</a> to demonstrate <a class="reference external" href="http://marc.info/?l=php-internals&amp;m=109057070829170">PHP's inability to get some
basic arithmetic questions correct</a>. I would explain that
all languages have problems with floating point arithmetic, which is why other
languages have good solutions to this problem e.g. <a class="reference external" href="http://docs.python.org/library/decimal.html">Python's decimal module</a>, which makes it easy to get
things right.  PHP has nothing approaching this (and yes I know about BCMath and
GNU Multiple Precision bindings). Rather, PHP's fundamental attitude is to 1)
silently ignore errors, 2) attempt to paper over errors and 3) silently convert
your data even if that means losing information. Is this the language you want to
handle your data?</p>
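<p>To give an idea of what “easy to get things right” means here, this is a
minimal sketch using only Python's standard <tt class="docutils literal">decimal</tt> module. The prices and
quantities are invented for the example and are not taken from the linked bug report.</p>

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Decimal values built from strings keep the digits that were written.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_sum_is_exact = (decimal_sum == Decimal("0.30"))

# A price times a quantity, rounded to whole pence (hypothetical figures).
line_total = (Decimal("19.99") * 3).quantize(Decimal("0.01"))
```

<p>The float comparison comes out false and the decimal one comes out true, which
is roughly the demonstration I would want a customer to see.</p>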
<p>For a different situation, I think you'll need a completely different
argument. And for many situations, don't even try if you think your words are
going to come over as insulting, since that will be counter-productive. That
often applies to the internet, and I need to remember that!</p>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Never fix a bug twice]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/never-fix-a-bug-twice/" />
    <id>http://lukeplant.me.uk/blog/posts/never-fix-a-bug-twice/</id>
    <updated>2012-06-19T13:42:56Z</updated>
    <published>2012-06-19T13:42:56Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Software development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Never fix a bug twice]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/never-fix-a-bug-twice/"><![CDATA[<div class="document">
<p>Noah Sussman wrote a post on <a class="reference external" href="http://infiniteundo.com/post/25230828820/things-you-should-test">Things you should test</a>, “A checklist of things that are worth testing in pretty much any software system.”</p>
<p>Many of the things on the list are helpful reminders. However, I think the mindset it encourages is essentially wrong.</p>
<p>The mindset is basically this:</p>
<blockquote>
Identify common mistakes that developers make, and ensure you are writing tests that check you haven't made them.</blockquote>
<p>The problem with this approach is that it is essentially <a class="reference external" href="http://www.youtube.com/watch?v=kbyekup6i6U">whack-a-mole</a> debugging. There is a never ending supply of bugs to kill.</p>
<p>A much more helpful approach is found in <a class="reference external" href="http://www.red-sweater.com/blog/125/easy-programming">this post on easy programming</a> that advocates “Never fix a bug twice” (about one-third of the way down).</p>
<p>If you come across a bug or class of bugs that often occur, you should <strong>not</strong> be thinking first of all “better add that to my list of classes of bugs that need testing against”. You should rather be thinking “how can I change the system so that this class of bugs disappears entirely?”.</p>
<p>So, to take some of the items listed for testing:</p>
<ul>
<li><p class="first">Input handling bugs, such as SQL injection and XSS attacks.</p>
<p>In Django apps, I never write tests for SQL injection attacks or XSS attacks. The reason for this is that these types of bugs are very rare. The reason for that is that the APIs that are easiest to use for executing SQL or generating HTML are secure by default. If there are any errors of this type in my programs, they will be very rare and obscure, and trying to catch bugs by a black box scatter-gun approach will be a waste of time.</p>
<p>With XSS, there are times when I am slightly more tempted to generate HTML in ways that might be vulnerable to XSS — for example, using Python string interpolation. At this point, however, you still shouldn't reach for a test to ensure you did it correctly. Rather: <strong>stop writing HTML generation the stupid way!</strong></p>
<p>To do that, you first have to ask “why?”. “Why am I tempted to write HTML this way?”. The answer is usually “there isn't a convenient API for the particular way I want to write this”. The correct solution now becomes obvious: create the API that removes the temptation to introduce potentially buggy code.</p>
<p>So, here is some code in a project I'm writing, that uses string interpolation to implement a template tag for formatting a link in a standard way:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.core.urlresolvers</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="kn">from</span> <span class="nn">django.template</span> <span class="kn">import</span> <span class="n">Library</span>
<span class="kn">from</span> <span class="nn">django.utils.html</span> <span class="kn">import</span> <span class="n">escape</span>
<span class="kn">from</span> <span class="nn">django.utils.safestring</span> <span class="kn">import</span> <span class="n">mark_safe</span>

<span class="n">register</span> <span class="o">=</span> <span class="n">Library</span><span class="p">()</span>

<span class="nd">&#64;register.filter</span>
<span class="k">def</span> <span class="nf">account_link</span><span class="p">(</span><span class="n">account</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">mark_safe</span><span class="p">(</span><span class="s">u'&lt;a href=&quot;</span><span class="si">%s</span><span class="s">&quot; title=&quot;</span><span class="si">%s</span><span class="s"> </span><span class="si">%s</span><span class="s">&quot;&gt;</span><span class="si">%s</span><span class="s">&lt;/a&gt;'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">escape</span><span class="p">(</span><span class="n">reverse</span><span class="p">(</span><span class="s">'account_stats'</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">account</span><span class="o">.</span><span class="n">username</span><span class="p">,))),</span>
            <span class="n">escape</span><span class="p">(</span><span class="n">account</span><span class="o">.</span><span class="n">first_name</span><span class="p">),</span>
            <span class="n">escape</span><span class="p">(</span><span class="n">account</span><span class="o">.</span><span class="n">last_name</span><span class="p">),</span>
            <span class="n">escape</span><span class="p">(</span><span class="n">account</span><span class="o">.</span><span class="n">username</span><span class="p">),</span>
            <span class="p">))</span>
</pre>
<p>The problem with this is that I have to remember to use <tt class="docutils literal">escape</tt> on each variable. The reason I wrote it this way is that Django's <tt class="docutils literal">Template</tt> API is kind of bulky for this use case. So, I should instead have written this:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.core.urlresolvers</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="kn">from</span> <span class="nn">django.template</span> <span class="kn">import</span> <span class="n">Library</span>

<span class="kn">from</span> <span class="nn">somewhere</span> <span class="kn">import</span> <span class="n">html_fragment</span>

<span class="n">register</span> <span class="o">=</span> <span class="n">Library</span><span class="p">()</span>

<span class="nd">&#64;register.filter</span>
<span class="k">def</span> <span class="nf">account_link</span><span class="p">(</span><span class="n">account</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">html_fragment</span><span class="p">(</span><span class="s">u'&lt;a href=&quot;</span><span class="si">%s</span><span class="s">&quot; title=&quot;</span><span class="si">%s</span><span class="s"> </span><span class="si">%s</span><span class="s">&quot;&gt;</span><span class="si">%s</span><span class="s">&lt;/a&gt;'</span><span class="p">,</span>
            <span class="n">reverse</span><span class="p">(</span><span class="s">'account_stats'</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">account</span><span class="o">.</span><span class="n">username</span><span class="p">,)),</span>
            <span class="n">account</span><span class="o">.</span><span class="n">first_name</span><span class="p">,</span>
            <span class="n">account</span><span class="o">.</span><span class="n">last_name</span><span class="p">,</span>
            <span class="n">account</span><span class="o">.</span><span class="n">username</span><span class="p">,</span>
            <span class="p">)</span>
</pre>
<p>And then write the API, <tt class="docutils literal">html_fragment</tt>, that makes this work, which is just:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.utils.html</span> <span class="kn">import</span> <span class="n">conditional_escape</span>
<span class="kn">from</span> <span class="nn">django.utils.safestring</span> <span class="kn">import</span> <span class="n">mark_safe</span>

<span class="k">def</span> <span class="nf">html_fragment</span><span class="p">(</span><span class="n">template</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">mark_safe</span><span class="p">(</span><span class="n">template</span> <span class="o">%</span> <span class="nb">tuple</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">conditional_escape</span><span class="p">,</span> <span class="n">args</span><span class="p">)))</span>
</pre>
<p><em>EDIT: After some discussion on django-devs and subsequent modification, this
is now in Django 1.5 as 'django.utils.html.format_html', so use that instead
of the above</em></p>
<p>I'm now no longer tempted to write it the vulnerable way, and I don't need a test (though I may want a test or two for <tt class="docutils literal">html_fragment</tt>). I have two ways of doing it - <tt class="docutils literal">html_fragment</tt> for very small snippets, and Django templates for bigger chunks, both of them secure by default.</p>
<p>So, if you find yourself needing tests for specific SQL injection or XSS attacks in your code, you are probably doing it wrong. Fix the underlying API that makes the mistake likely in the first place.</p>
</li>
<li><p class="first">Input handling type checking - “Invalid values such as null and NaN. Strings instead of integers, arrays instead of strings.”</p>
<p>Input handling should generally only occur on trust boundaries, but it should occur systematically on such boundaries. If you have a problem with the wrong type of value being passed to an internal function, due to an external input being passed in, you shouldn't be dealing with it in the function, you should be asking “how was this even possible?”.</p>
<p>One solution, of course, is a static type system. That might not be possible/practical in your situation, but you should be thinking about it.</p>
</li>
<li><p class="first">Math-related bugs: in many of the listed instances in Noah's post, the underlying bug is that the language does funny things.</p>
<p>So you should be asking “why am I using this language? Is it the right tool for the job?”.</p>
<p>Or at least “will a library solve this problem?”.</p>
<p>If you are handling decimal values, the solution is a library that does it well, and doesn't make it easy to make mistakes. For example, the Python 'decimal' library does not allow multiplication of floats and decimals - which sounds inconvenient, but stops you from making all kinds of mistakes.</p>
</li>
<li><p class="first">Units of measurement.</p>
<p>If you have potential bugs with this, you should be thinking “why are these possible? How can I eliminate the bug entirely?”.</p>
<p>Different languages have different solutions, often involving including the unit of measurement in the value itself. Haskell has <a class="reference external" href="http://www.haskell.org/haskellwiki/Applications_and_libraries/Mathematics#Physical_units">various solutions</a> that make it impossible to add &quot;3 years&quot; to &quot;2 meters&quot;, for example, and the compiler will stop you at compile time.</p>
<p>For Python, there are things like <a class="reference external" href="http://pypi.python.org/pypi/magnitude">magnitude</a> and <a class="reference external" href="http://pypi.python.org/pypi/units/">units</a> that work at run time.</p>
<p>Of course, even if you use these internally, you'll still have boundaries where you need to do some conversions from inputs etc., and at this point you might need to know what units the inputs are in. But:</p>
<ol class="arabic simple">
<li>You should try to insist that the external system sends the unit information along with the numerical value, so that you <strong>never</strong> have to rely on common knowledge, and you will know immediately if something changed. And your internal APIs have helped you do this by forcing the issue.</li>
<li>Even if the external systems can't be changed, you'll still have very few places to check - just your input and output boundaries, each of which should be managed in a single place.  And writing tests probably won't help you very much — you should write at most one, as a single integration test. Your time will be better spent manually checking the boundaries.</li>
</ol>
</li>
<li><p class="first">Text - “Are Unicode inputs treated differently than ASCII?”</p>
<p>Again, you should be using proper internal datatypes — unicode everywhere — to eliminate potential bugs.</p>
<p>Python 2.x is notorious here - its ‘forgiving’ conversion between bytestrings and unicode causes endless bugs that are found in production, not in development.</p>
<p>And the correct solution is not just to write tests — which you may have to do for now — but rather to fix the underlying cause. This is what has been done in Python 3.</p>
</li>
</ul>
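<p>To make the units-of-measurement item above a bit more concrete, here is a
minimal run-time sketch of “including the unit in the value itself”. The
<tt class="docutils literal">Quantity</tt> class is hypothetical and is not the API of the <tt class="docutils literal">magnitude</tt> or
<tt class="docutils literal">units</tt> packages; it only assumes Python 3 and the standard library.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number that carries its unit with it (hypothetical helper)."""
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if other.unit != self.unit:
            # Refuse to guess: mismatched units are a bug, not a conversion.
            raise ValueError(f"cannot add {other.unit} to {self.unit}")
        return Quantity(self.value + other.value, self.unit)


total = Quantity(2, "m") + Quantity(3, "m")

try:
    Quantity(3, "years") + Quantity(2, "m")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

<p>With something like this in place, adding years to metres fails wherever it
happens, so the only places left to check by hand are the boundaries where bare
numbers are turned into quantities.</p>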
<p>The same kind of thing applies to almost everything on the list, and in the cases where I can't see how it applies, I imagine that this is due to my own lack of imagination :-) If I can't think of a way of completely eliminating a class of bug, then I should think harder, rather than assume it is impossible.</p>
<p>Notice that from the perspective of an older programmer, there are some notable omissions from the list - things like buffer overflows, for example. Presumably that's because the author uses languages/frameworks where those bugs are very unlikely. And that has happened not because previous generations of programmers wrote lots of tests, but because they wrote languages and systems where those bugs are impossible or almost impossible.</p>
<p>So, in all this I'm not saying that you won't ever have to write tests for these kinds of bugs. But every time you identify a class of bugs, tests are a very poor answer. You should try to ensure that you never fix the same bug twice, and so never write the same test twice.  If you are writing essentially the same test more than once, you are proving that you haven't fixed the underlying issue yet.</p>
<p>If a bug is worth adding to a list of common bugs, then you have identified a systemic problem with your platform, and it is worth <strong>eliminating entirely</strong>. We should be striving for libraries/languages/systems which do that.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Django's CBVs were a mistake]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/djangos-cbvs-were-a-mistake/" />
    <id>http://lukeplant.me.uk/blog/posts/djangos-cbvs-were-a-mistake/</id>
    <updated>2012-05-29T14:44:11Z</updated>
    <published>2012-05-29T14:44:11Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Django's CBVs were a mistake]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/djangos-cbvs-were-a-mistake/"><![CDATA[<div class="document">
<p><em>See end for updates to my ideas on this</em></p>
<hr class="docutils" />
<p>I've written before about the <a class="reference external" href="http://lukeplant.me.uk/blog/posts/class-based-views-and-dry-ravioli/">somewhat doubtful advantages of Class-Based Views</a>.</p>
<p>Since then, I've done more work as a maintenance programmer on a Django project,
and I've been reminded that library and framework design must take into account
the fact that not all developers are experts. Even if you only hire the best,
no-one can be an expert straight away.</p>
<p>Thinking through things more from the perspective of a maintenance programmer,
my doubts about CBVs have increased, to the point where I recently tweeted that
<a class="reference external" href="https://twitter.com/spookylukey/status/198452692522242049">CBVs were a mistake</a>.</p>
<p>So I thought I'd explain my reasons here. First, I'll look at the motivation
behind CBVs, how they are doing at solving what they are supposed to solve, and
then analyse the problems with them in terms of the Zen of Python.</p>
<div class="section" id="what-problems-do-cbvs-solve">
<h1>What problems do CBVs solve?</h1>
<div class="section" id="customising-generic-views">
<h2>Customising generic views</h2>
<p>People kept wanting more functionality and more keyword arguments to <a class="reference external" href="https://github.com/django/django/blob/stable/1.3.x/django/views/generic/list_detail.py">the
list_detail views</a>
(and others, but those especially, as I remember). The alternative was large-scale copy-and-paste of the code, so people understandably wanted to avoid that (and to avoid writing any code themselves).</p>
<p>So, we replaced them with classes that allow people to override just the bit they need to override. This eliminates the need for code duplication, and removes the burden of lots of feature requests for generic views.</p>
<p>Or does it? Instead of tickets for keyword arguments to list_detail, it seems we
have a bunch of <a class="reference external" href="https://code.djangoproject.com/query?status=assigned&amp;status=new&amp;status=reopened&amp;component=Generic+views&amp;col=id&amp;col=summary&amp;col=status&amp;col=owner&amp;col=type&amp;col=component&amp;order=priority">other tickets asking for changes to CBVs</a>,
many of which can't be implemented as mixins or subclasses.</p>
<p>One of the problems is that if the view calls anything else (e.g. a paginator
class, or a form class), you have to provide hooks for how it calls it, which
means implementing methods that can be overridden. If you forget any, or if the
thing you are calling gains some new keyword arguments, you've got feature
requests, or duplication because someone had to override a larger method just to
change one aspect of it. If you don't forget any, you've got dozens of little
methods to document.</p>
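<p>A framework-free sketch of that situation, for illustration only. None of these classes are Django's: <tt class="docutils literal">Paginator</tt>, <tt class="docutils literal">ListViewSketch</tt> and the <tt class="docutils literal">orphans</tt> argument are stand-ins. The base class has a hook for the page size but not for the extra argument, so changing one aspect means copying the whole method.</p>
<pre class="code python literal-block">
class Paginator:
    """Stand-in for a collaborator that later gained an 'orphans' argument."""

    def __init__(self, items, per_page, orphans=0):
        self.items = items
        self.per_page = per_page
        self.orphans = orphans


class ListViewSketch:
    """Base class that only exposes a hook for the page size."""

    paginate_by = 10

    def get_paginate_by(self):
        return self.paginate_by

    def paginate(self, items):
        # No hook for 'orphans': the call is hard-wired here.
        return Paginator(items, self.get_paginate_by())


class MyListView(ListViewSketch):
    def paginate(self, items):
        # To change one argument, the whole method has to be copied.
        return Paginator(items, self.get_paginate_by(), orphans=2)


paginator = MyListView().paginate(list(range(25)))
</pre>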
<p>Also, there are problems like this attempt to <a class="reference external" href="https://code.djangoproject.com/ticket/18158">mix FormView with ListView
functionality</a>. Fixing this will
end up with similar amounts of copy-paste, but in this case it requires a fair
bit of debugging first to realise you have a problem.</p>
<p>So I'm not convinced CBVs have made much difference here.</p>
</div>
<div class="section" id="eliminating-flow-control-boilerplate">
<h2>Eliminating flow control boilerplate</h2>
<p>The classic example is editing using a form. You see this pattern again and again using function based views (FBVs from now on):</p>
<pre class="code python literal-block">
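<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponseRedirect</span>
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>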

<span class="k">def</span> <span class="nf">contact</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">method</span> <span class="o">==</span> <span class="s">'POST'</span><span class="p">:</span>
        <span class="n">form</span> <span class="o">=</span> <span class="n">ContactForm</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">POST</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">form</span><span class="o">.</span><span class="n">is_valid</span><span class="p">():</span>
            <span class="n">send_contact_message</span><span class="p">(</span><span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'email'</span><span class="p">],</span>
                                 <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'message'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">HttpResponseRedirect</span><span class="p">(</span><span class="s">'/thanks/'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">form</span> <span class="o">=</span> <span class="n">ContactForm</span><span class="p">()</span>

    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s">'contact.html'</span><span class="p">,</span> <span class="p">{</span><span class="s">'form'</span><span class="p">:</span> <span class="n">form</span><span class="p">})</span>
</pre>
<p>Without question this is tedious and annoying. CBVs reduce this to:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.views.generic.edit</span> <span class="kn">import</span> <span class="n">FormView</span>

<span class="k">class</span> <span class="nc">ContactView</span><span class="p">(</span><span class="n">FormView</span><span class="p">):</span>
    <span class="n">form_class</span> <span class="o">=</span> <span class="n">ContactForm</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s">'contact.html'</span>
    <span class="n">success_url</span> <span class="o">=</span> <span class="s">'/thanks/'</span>

    <span class="k">def</span> <span class="nf">form_valid</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">form</span><span class="p">):</span>
        <span class="n">send_contact_message</span><span class="p">(</span><span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'email'</span><span class="p">],</span>
                             <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'message'</span><span class="p">])</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">ContactView</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">form_valid</span><span class="p">(</span><span class="n">form</span><span class="p">)</span>
</pre>
<p>Much better!</p>
<p>However...</p>
<p>It's not really that much <strong>shorter</strong>. 8 non-blank lines compared to 10, ignoring
imports. But now let's make it more realistic. We're going to have:</p>
<ul class="simple">
<li>Initial arguments to the form that are based on the request object.</li>
<li>Priority users get a form with the option to mark the message as 'urgent',
which results in a text message as well as an email. The template is
also rendered a bit differently for them and needs a flag.</li>
<li>URLs defined using <tt class="docutils literal">reverse</tt> as they should be.</li>
</ul>
<p>FBV:</p>
<pre class="code python literal-block">
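<span class="kn">from</span> <span class="nn">django.core.urlresolvers</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponseRedirect</span>
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>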

<span class="k">def</span> <span class="nf">contact</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">high_priority_user</span> <span class="o">=</span> <span class="p">(</span><span class="ow">not</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_anonymous</span><span class="p">()</span>
                          <span class="ow">and</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">get_profile</span><span class="p">()</span><span class="o">.</span><span class="n">high_priority</span><span class="p">)</span>
    <span class="n">form_class</span> <span class="o">=</span> <span class="n">HighPriorityContactForm</span> <span class="k">if</span> <span class="n">high_priority_user</span> <span class="k">else</span> <span class="n">ContactForm</span>

    <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">method</span> <span class="o">==</span> <span class="s">'POST'</span><span class="p">:</span>
        <span class="n">form</span> <span class="o">=</span> <span class="n">form_class</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">POST</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">form</span><span class="o">.</span><span class="n">is_valid</span><span class="p">():</span>
            <span class="n">email</span><span class="p">,</span> <span class="n">message</span> <span class="o">=</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'email'</span><span class="p">],</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'message'</span><span class="p">]</span>
            <span class="n">send_contact_message</span><span class="p">(</span><span class="n">email</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">high_priority_user</span> <span class="ow">and</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'urgent'</span><span class="p">]:</span>
                <span class="n">send_text_message</span><span class="p">(</span><span class="n">email</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">HttpResponseRedirect</span><span class="p">(</span><span class="n">reverse</span><span class="p">(</span><span class="s">'contact_thanks'</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">form</span> <span class="o">=</span> <span class="n">form_class</span><span class="p">(</span><span class="n">initial</span><span class="o">=</span><span class="p">{</span><span class="s">'email'</span><span class="p">:</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">email</span><span class="p">}</span>
                                  <span class="k">if</span> <span class="ow">not</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_anonymous</span><span class="p">()</span> <span class="k">else</span> <span class="p">{})</span>

    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s">'contact.html'</span><span class="p">,</span> <span class="p">{</span><span class="s">'form'</span><span class="p">:</span> <span class="n">form</span><span class="p">,</span>
                                            <span class="s">'high_priority_user'</span><span class="p">:</span> <span class="n">high_priority_user</span><span class="p">})</span>
</pre>
<p>CBV:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.core.urlresolvers</span> <span class="kn">import</span> <span class="n">reverse_lazy</span>
<span class="kn">from</span> <span class="nn">django.views.generic.edit</span> <span class="kn">import</span> <span class="n">FormView</span>

<span class="k">class</span> <span class="nc">ContactView</span><span class="p">(</span><span class="n">FormView</span><span class="p">):</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s">'contact.html'</span>
    <span class="n">success_url</span> <span class="o">=</span> <span class="n">reverse_lazy</span><span class="p">(</span><span class="s">'contact_thanks'</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">dispatch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">high_priority_user</span> <span class="o">=</span> <span class="p">(</span><span class="ow">not</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_anonymous</span><span class="p">()</span>
                                   <span class="ow">and</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">get_profile</span><span class="p">()</span><span class="o">.</span><span class="n">high_priority</span><span class="p">)</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">ContactView</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">dispatch</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">get_form_class</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">HighPriorityContactForm</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">high_priority_user</span> <span class="k">else</span> <span class="n">ContactForm</span>

    <span class="k">def</span> <span class="nf">get_initial</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">initial</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">ContactView</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">get_initial</span><span class="p">()</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">is_anonymous</span><span class="p">():</span>
            <span class="n">initial</span><span class="p">[</span><span class="s">'email'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="o">.</span><span class="n">email</span>
        <span class="k">return</span> <span class="n">initial</span>

    <span class="k">def</span> <span class="nf">form_valid</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">form</span><span class="p">):</span>
        <span class="n">email</span><span class="p">,</span> <span class="n">message</span> <span class="o">=</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'email'</span><span class="p">],</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'message'</span><span class="p">]</span>
        <span class="n">send_contact_message</span><span class="p">(</span><span class="n">email</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">high_priority_user</span> <span class="ow">and</span> <span class="n">form</span><span class="o">.</span><span class="n">cleaned_data</span><span class="p">[</span><span class="s">'urgent'</span><span class="p">]:</span>
            <span class="n">send_text_message</span><span class="p">(</span><span class="n">email</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">ContactView</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">form_valid</span><span class="p">(</span><span class="n">form</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">ContactView</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">context</span><span class="p">[</span><span class="s">'high_priority_user'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">high_priority_user</span>
        <span class="k">return</span> <span class="n">context</span>
</pre>
<p>(A few lines could be shaved by not using <tt class="docutils literal">super()</tt> in a number of places, but
only at the expense of future confusion/problems if a maintainer was expecting
the normal declarative behaviour.)</p>
<p>Notice:</p>
<ol class="arabic">
<li><p class="first">I really want <tt class="docutils literal">high_priority_user</tt> to be a local variable that is
calculated once, and used in a couple of places. With a function, that's what
I have. With a CBV, I have to simulate it using an attribute on self. I also
need to override the <tt class="docutils literal">dispatch()</tt> method just to create it. These are both ugly
hacks.</p>
</li>
<li><p class="first">The CBV version is extremely noisy, due to all the calls to <tt class="docutils literal">super()</tt>. This
would be improved in Python 3 (at the expense of a rather magical <tt class="docutils literal">super()</tt>
builtin), but is still far from perfect. Every time you override a method,
you have to mention it twice.</p>
</li>
<li><p class="first">The CBV version is now significantly longer - 24 non-blank lines compared
to 17. Since I'm using the same APIs for requests and forms, and doing the
same thing, this can only mean that the amount of boilerplate has
significantly increased.  Sure, I've removed the flow control boilerplate,
but must have added some other type.</p>
</li>
<li><p class="first">Even if you know the Form API very well, you are going to have to look up the
docs or the source code to find <tt class="docutils literal">get_initial()</tt> etc. and ensure you get the
signature correct. You have to know two sets of APIs to use a form,
instead of one.</p>
</li>
<li><p class="first">In the CBV, flow control is totally hidden. In the process of abstracting
away the duplication, we've also hidden the order of execution. I've ordered
my methods in the order they are called (I think), but that may or may not be
obvious to anyone else, and nothing forced me to do it.</p>
</li>
<li><p class="first">Because of the last point, it is massively more difficult to debug. Which of the
two would you rather maintain? Which do you think a maintenance programmer,
who has never seen this code before, would rather maintain?</p>
<p>To understand what is going on, you've got an intimidating stack of base
classes to navigate, compared to a single function.</p>
</li>
</ol>
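<p>On point 2, a small framework-free sketch of the two spellings of <tt class="docutils literal">super()</tt>. The class names are illustrative only. Both spellings behave the same on Python 3, and in either case the method name still appears once in the <tt class="docutils literal">def</tt> and once in the call.</p>
<pre class="code python literal-block">
class Base:
    def get_context_data(self, **kwargs):
        return dict(kwargs)


class Python2Style(Base):
    def get_context_data(self, **kwargs):
        # Explicit form: class and instance are spelled out.
        context = super(Python2Style, self).get_context_data(**kwargs)
        context["flag"] = True
        return context


class Python3Style(Base):
    def get_context_data(self, **kwargs):
        # Zero-argument form, available on Python 3 only.
        context = super().get_context_data(**kwargs)
        context["flag"] = True
        return context


old = Python2Style().get_context_data(form="f")
new = Python3Style().get_context_data(form="f")
</pre>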
<p>The fundamental problem here is that Python sucks at implementing custom flow
control. Not many languages shine here. Ruby has blocks, which help. Haskell has
a pretty good story due to a combination of succinct function definition, lazy
evaluation and the way that IO works. Lisp has macros.  But with Python, we are
limited to: abusing generators, abusing the <tt class="docutils literal">with</tt> statement, or classes and the
<a class="reference external" href="http://en.wikipedia.org/wiki/Template_method_pattern">template method design pattern</a> (which is basically
what CBVs use).</p>
<p>I think we decided to use the latter because it's one of the only options we've
got, but failed to notice that it's just not very good.</p>
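<p>For readers who haven't met the pattern, here is a stripped-down, framework-free sketch of the template method approach next to the plain-function equivalent. <tt class="docutils literal">FormFlow</tt>, <tt class="docutils literal">ContactFlow</tt> and <tt class="docutils literal">contact</tt> are invented names and requests are just dicts; it is only meant to show where the flow control lives in each style.</p>
<pre class="code python literal-block">
class FormFlow:
    """Template method: the base class owns the flow control."""

    def handle(self, request):
        if request["method"] == "POST":
            if self.is_valid(request["data"]):
                return self.form_valid(request["data"])
        return self.render(request)

    def is_valid(self, data):
        return bool(data.get("email"))

    def form_valid(self, data):
        raise NotImplementedError

    def render(self, request):
        return "form page"


class ContactFlow(FormFlow):
    # Only the hook is visible here; when it runs is decided elsewhere.
    def form_valid(self, data):
        return "thanks " + data["email"]


def contact(request):
    # Same behaviour, with the order of execution visible in one place.
    if request["method"] == "POST":
        data = request["data"]
        if data.get("email"):
            return "thanks " + data["email"]
    return "form page"


post = {"method": "POST", "data": {"email": "a@example.com"}}
get = {"method": "GET", "data": {}}
</pre>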
<p>There are worse things that can happen with shifting requirements. What if you
want two different forms on the same page? I've done this on more than one
occasion, and it can be a useful thing — when you present the user with two
different courses of action, and you've got completely different info they need
to fill in.</p>
<p>It would be a nightmare to get <tt class="docutils literal">CreateView</tt> or <tt class="docutils literal">UpdateView</tt> to do this. You
will either produce a monstrosity, or you'll have to start from scratch with an
FBV. You've been seduced down a tempting, calm stretch of water, and then left
high and dry when you come to the end of what CBVs offer. If you start with an
FBV, it's an easy change.</p>
<p>Overall, when I work on CBVs, even views I've created, I start to feel like I'm
working on a classic <a class="reference external" href="http://msdn.microsoft.com/en-us/library/system.web.ui.page.aspx">ASP.NET Page class</a>. It is
bringing back painful memories! We’re not as bad as that yet, but methods that
simply modify some data on <tt class="docutils literal">self</tt> are a code-smell that we are heading in that
direction.</p>
<p>Another way of looking at it is that with CBVs, views have become an instance of
the ‘framework pattern’ instead of the ‘library pattern’. With a framework, your
application code gets called by the framework code, and you can easily end up
having to <a class="reference external" href="http://blogs.thesitedoctor.co.uk/tim/2006/06/30/Complete+Lifecycle+Of+An+ASPNet+Page+And+Controls.aspx">understand how the framework is implemented</a>.</p>
<p>With a library, the library code gets called by your application code, and this
is in general much more flexible, much easier to document and much easier to
debug. We ought to be moving Django more in the direction of a library.</p>
<p>With the comparison to ASP.NET, I'm also basically saying CBVs are not
Pythonic. I'll back up this claim in terms of selected parts of the Zen of
Python.</p>
</div>
</div>
<div class="section" id="are-cbvs-pythonic">
<h1>Are CBVs Pythonic?</h1>
<div class="section" id="beautiful-is-better-than-ugly">
<h2>Beautiful is better than ugly</h2>
<p>This is a subjective one, but the noise of all the <tt class="docutils literal">super()</tt> calls,
<tt class="docutils literal">get_context_data()</tt> compared to a simple <tt class="docutils literal">{}</tt> etc. is ugly to me.</p>
</div>
<div class="section" id="simple-is-better-than-complex">
<h2>Simple is better than complex</h2>
<p>Let's notice, first off, that FBVs are simpler than CBVs, according to the basic meaning of having fewer parts.</p>
<p>A class is a datastructure with attributes and functions attached. A function is
just a function. So a class is more complex than a function. If we ever use a
class where a function would work, we have to justify the additional complexity.</p>
</div>
<div class="section" id="complex-is-better-than-complicated">
<h2>Complex is better than complicated</h2>
<p>Are CBVs complicated? Just read the Django source if you think they are not.</p>
<p>In one app I wrote, I had a mixin that implemented a bit of common flow control
for forms — namely it returned a JSON response with validation errors for
certain types of requests. This meant I didn't need separate views for the AJAX
validation, and with CBVs I eliminated all the boilerplate. Great!</p>
<p>Well, it was, until I found a crazy problem to do with the order in which
different base classes set things up. To solve my problem, I ended up writing this:</p>
<pre class="code python literal-block">
<span class="c"># MRO problem: we need BaseCreateView.post to come first in MRO, to</span>
<span class="c"># provide self.object = None, then AjaxyFormMixin must be called before</span>
<span class="c"># ProcessFormView, so that the right thing happens for AJAX.</span>

<span class="k">class</span> <span class="nc">AjaxMroFixer</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">mro</span><span class="p">(</span><span class="n">cls</span><span class="p">):</span>
        <span class="n">classes</span> <span class="o">=</span> <span class="nb">type</span><span class="o">.</span><span class="n">mro</span><span class="p">(</span><span class="n">cls</span><span class="p">)</span>
        <span class="c"># Move AjaxyFormMixin to one before last that has a 'post' defined.</span>
        <span class="n">new_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">classes</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">AjaxyFormMixin</span><span class="p">]</span>
        <span class="n">have_post</span> <span class="o">=</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">new_list</span> <span class="k">if</span> <span class="s">'post'</span> <span class="ow">in</span> <span class="n">c</span><span class="o">.</span><span class="n">__dict__</span><span class="p">]</span>
        <span class="n">last</span> <span class="o">=</span> <span class="n">have_post</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">new_list</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">new_list</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">last</span><span class="p">),</span> <span class="n">AjaxyFormMixin</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">new_list</span>
</pre>
<p>It's a metaclass, and implements the <tt class="docutils literal">mro()</tt> method that allows you to
override the method resolution order. I am not proud of this code. (For those who don't get English understatement, please understand what I'm saying: this code is horrific.) Before writing this code, I didn't know that the <tt class="docutils literal">mro()</tt> method existed. Although I value the education, if a framework forces you to learn about <tt class="docutils literal">type.mro()</tt> and implement it, it is doing something wrong.</p>
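<p>For anyone who hasn't had to think about this before, here is a framework-free sketch (Python 3 syntax) of how the order of base classes decides which <tt class="docutils literal">post()</tt> methods run, and in what order. The class names are stand-ins, not the real Django classes or the actual mixin from that app.</p>
<pre class="code python literal-block">
class ProcessForm:
    def post(self):
        # End of the chain: does not call super().
        return ["ProcessForm.post"]


class AjaxMixin:
    def post(self):
        return ["AjaxMixin.post"] + super().post()


class CreateBase(ProcessForm):
    def post(self):
        return ["CreateBase.post"] + super().post()


class MixinFirst(AjaxMixin, CreateBase):
    pass


class MixinLast(CreateBase, AjaxMixin):
    pass


mixin_first_calls = MixinFirst().post()
mixin_last_calls = MixinLast().post()
mixin_last_mro = [c.__name__ for c in MixinLast.__mro__]
</pre>
<p>With these definitions, putting the mixin first runs it before <tt class="docutils literal">CreateBase.post</tt>, and putting it last means it is never reached at all. Neither ordering gives <tt class="docutils literal">CreateBase</tt>, then the mixin, then <tt class="docutils literal">ProcessForm</tt>, which is the order the comment in the code above asks for.</p>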
</div>
<div class="section" id="flat-is-better-than-nested">
<h2>Flat is better than nested</h2>
<p>The hierarchy of classes is certainly a form of nesting, whereas functions are
flat. I think CBVs might be considerably better if you started by writing your
own, so that you had base classes that did everything you needed, but nothing
more, and ended up with a very flat hierarchy.</p>
</div>
<div class="section" id="readability-counts">
<h2>Readability counts</h2>
<p>Get some people who don't know Django to read <tt class="docutils literal">ContactView</tt> above and figure
out what it does, and some others to read the function version, and see who has
the easier time. Ask them what happens, for example, when the form is invalid. There is no contest here.</p>
</div>
<div class="section" id="explicit-is-better-than-implicit">
<h2>Explicit is better than implicit</h2>
<p>With a CBV, by inheriting from a class you are inheriting all the behaviour of
that class, and all its parent classes. You could argue this is explicit, since
you've explicitly indicated the base class, but you haven't indicated all the
parent classes — they come automatically. So it's more like an implicit request
in practice.</p>
<p>Of course, this is always true with OOP to some extent — you are inheriting a
bunch of behaviour precisely because you don't want to define it again. But
let's notice that it does have its downsides.</p>
<p>For example, let's play spot the difference between the following two views — an FBV:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>

<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s">&quot;my_template.html&quot;</span><span class="p">,</span> <span class="p">{})</span>
</pre>
<p>and a CBV:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">TemplateView</span>

<span class="k">class</span> <span class="nc">MyView</span><span class="p">(</span><span class="n">TemplateView</span><span class="p">):</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s">&quot;my_template.html&quot;</span>
</pre>
<p>Can you see the important difference? Try to answer before reading on.</p>
<p>I'm talking only about the functional differences when this view is accessed by
a client, not differences of code organisation or re-use or performance.</p>
<p>Well, <tt class="docutils literal">TemplateView</tt> inherits from <tt class="docutils literal">View</tt> which provides a <tt class="docutils literal">dispatch()</tt>
method, and <tt class="docutils literal">TemplateView</tt> provides a <tt class="docutils literal">get()</tt> method which will handle all
GET (and HEAD) requests, by the logic defined in <tt class="docutils literal">View.dispatch()</tt>. However,
neither defines a <tt class="docutils literal">post()</tt> method, which means that you will get a &quot;405 Method
Not Allowed&quot; error if you send a POST (or another HTTP verb) to the CBV, whereas
you will get a 200 with the FBV.</p>
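<p>To make the mechanism concrete, here is a minimal, framework-free sketch of that dispatch logic. It is a simplified model and not Django's actual source: <tt class="docutils literal">Response</tt>, <tt class="docutils literal">HTTP_METHOD_NAMES</tt> and the cut-down <tt class="docutils literal">View</tt> and <tt class="docutils literal">TemplateView</tt> classes are stand-ins I've made up for illustration, and the views take a bare method name instead of a request object.</p>

```python
# Simplified, framework-free model of class-based dispatch.
# This is an illustration only, not Django's real implementation.
HTTP_METHOD_NAMES = ["get", "post", "put", "patch", "delete", "head", "options", "trace"]


class Response:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status, body=""):
        self.status = status
        self.body = body


class View:
    def __init__(self):
        # HEAD falls back to the GET handler when only get() is defined.
        if hasattr(self, "get") and not hasattr(self, "head"):
            self.head = self.get

    def dispatch(self, method):
        # Look up a handler named after the lower-cased HTTP verb.
        name = method.lower()
        handler = getattr(self, name, None) if name in HTTP_METHOD_NAMES else None
        if handler is None:
            return self.http_method_not_allowed()
        return handler()

    def http_method_not_allowed(self):
        return Response(405)


class TemplateView(View):
    template_name = None

    def get(self):
        return Response(200, "rendered " + self.template_name)


class MyView(TemplateView):
    # Nothing here asks for a 405 on POST; it is inherited.
    template_name = "my_template.html"


def my_view(method):
    # The function-based version never inspects the verb at all.
    return Response(200, "rendered my_template.html")
```

<p>In this model, <tt class="docutils literal"><span class="pre">MyView().dispatch("POST").status</span></tt> is 405 while <tt class="docutils literal"><span class="pre">my_view("POST").status</span></tt> is 200, even though neither definition mentions POST.</p>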
<p>In the CBV, all of this logic has been invoked implicitly. Nothing in what I
wrote was an explicit request for 405s for POST requests, but I got it because I
inherited from <tt class="docutils literal">TemplateView</tt> — even though using a template does not imply
that behaviour.</p>
<p>(By the way, this issue caused a real problem in a site I wrote, which was
easily debugged because I knew a lot about the internals of CBVs, and therefore
easily fixed, but it still adds noise to my class — code that can only be
explained by reference to some inherited behaviour I didn't really want).</p>
<p>Of course, as already mentioned, you can say the same thing whenever you inherit
from a class. But, in my opinion, it is one of the disadvantages of OOP, and it
is showing itself here. The question is: do the advantages of inherited
behaviour outweigh the disadvantages of implicit behaviour?</p>
</div>
<div class="section" id="there-should-be-one-way-to-do-it">
<h2>There should be one way to do it</h2>
<p>I already mentioned one case where the CBV method is just not going to work for
you, or is not going to be worth it — needing two different forms on the same
page. But of the real views I write, I suspect the majority would not fit easily into a CBV.</p>
<p>So, in your project you now need both FBVs and CBVs — you've got two ways to do
it. When a maintenance programmer comes along, and needs to do some similar work
that involves a form, they will be confused. Which pattern do they start from?</p>
<p>The simple way to avoid this is to avoid the less flexible solution — avoid
using CBVs.</p>
<p>So simply having two different ways of building up views is a violation of this
principle, but <strong>within</strong> CBVs there are also violations.</p>
<p>First, there are also problems with the attempt to use declarative style, which
is perhaps the most attractive feature of the CBV API. So, you need initial
arguments to your form? Just define the <tt class="docutils literal">initial</tt> attribute on your
class. Unless, however, you need it to be dynamic — you'll have to define
<tt class="docutils literal">get_initial()</tt> instead.</p>
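<p>A rough sketch of that attribute/method pair, written without Django so it runs on its own. The <tt class="docutils literal">FormView</tt> base class here is a hypothetical stand-in, and the two subclasses and the <tt class="docutils literal">username</tt> argument are invented for the example.</p>

```python
class FormView:
    """Hypothetical, framework-free stand-in for a form-handling base class."""

    initial = {}

    def get_initial(self):
        # Default hook: hand back a copy of the class-level attribute.
        return self.initial.copy()


class StaticContactView(FormView):
    # Declarative style: fine as long as the values are known at import time.
    initial = {"subject": "Hello"}


class DynamicContactView(FormView):
    # Values that depend on the current user need the method instead.
    def __init__(self, username):
        self.username = username

    def get_initial(self):
        initial = super().get_initial()
        initial["name"] = self.username
        return initial
```

<p>Two views that do almost the same job end up using two different spellings, and a reader has to know that both exist.</p>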
<p>Also, it often isn't obvious where to add certain bits of code. There is often
more than one choice of method to override when you come to add a little bit of
functionality e.g. <tt class="docutils literal">get()</tt> or <tt class="docutils literal">dispatch()</tt>, and which you choose sometimes
has subtle implications, and sometimes doesn't matter. With FBVs, two different
Django developers tasked with the same modification to an existing view would
often produce identical or nearly identical patches, but I suspect that would be
rarer with CBVs.</p>
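<p>As an illustration of how the choice of method can matter, here is a small framework-free sketch of two plausible patches for "reject anonymous users". Everything in it is assumed for the example: the cut-down <tt class="docutils literal">View</tt>, the <tt class="docutils literal">user</tt> argument, and the bare status codes returned in place of responses.</p>

```python
class View:
    """Hypothetical base class: routes a verb to a same-named method."""

    def dispatch(self, method, user):
        handler = getattr(self, method.lower(), None)
        if handler is None:
            return 405
        return handler(user)


class ContactView(View):
    def get(self, user):
        return 200  # show the form

    def post(self, user):
        return 302  # process the form and redirect


class CheckInGet(ContactView):
    # Patch A: guard added to get() only, so POST is left unprotected.
    def get(self, user):
        if user is None:
            return 403
        return super().get(user)


class CheckInDispatch(ContactView):
    # Patch B: guard added to dispatch(), so every verb is covered.
    def dispatch(self, method, user):
        if user is None:
            return 403
        return super().dispatch(method, user)
```

<p>Both patches look reasonable in a diff, but with an anonymous user patch A still returns 302 for a POST while patch B returns 403.</p>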
</div>
<div class="section" id="if-the-implementation-is-hard-to-explain-it-s-a-bad-idea">
<h2>If the implementation is hard to explain, it's a bad idea</h2>
<p>Looking at the implementation, you'll find things like
<tt class="docutils literal">MultipleObjectTemplateResponseMixin</tt>, as well as <tt class="docutils literal">MultipleObjectMixin</tt> and
<tt class="docutils literal">TemplateResponseMixin</tt>, and the former is not just the composition of the other two. This is just one of many signs that things are going
wrong. If you can really compose behaviour just by adding mixins,
<tt class="docutils literal">MultipleObjectTemplateResponseMixin</tt> should not be needed. Explaining things
like this is hard, and the reason is that you just can't build up complex views
using classes and mixins.</p>
</div>
</div>
<div class="section" id="conclusion">
<h1>Conclusion</h1>
<p>Overall, I think CBVs make:</p>
<ul class="simple">
<li>very simple views slightly shorter, much cleaner (and significantly harder
to debug, but you don't need to debug them because they are simple);</li>
<li>views of medium complexity significantly longer, with more boilerplate and
noise and much harder to debug;</li>
<li>views of high complexity almost impossible.</li>
</ul>
<p>You only gain for the simple views, but they were simple anyway, just slightly
tedious. Is this advantage really enough to outweigh the disadvantages I've
listed?</p>
<p>To be honest, I regret dropping the function based generic views for CBVs. There
was an alternative solution to the tickets asking for them to do more: WONTFIX.</p>
<p>Generic views were simple shortcuts for common problems. They didn't actually
involve very much code, and if you needed to duplicate some of it you weren't
duplicating much. We should have just said: this is what generic views do, if
you need them to do something else then write your own. Because that is what you
have to say with class based generic views anyway, just slightly later.</p>
<p>Going forward, I say: stop the rot.</p>
<p>If you're starting a new project, I recommend avoiding CBVs, or just wrapping
them in thin functional wrappers, like <a class="reference external" href="https://gist.github.com/2596285">this one for ListView</a> (or, with <a class="reference external" href="https://github.com/ericflo/django-pagination">django-pagination</a>, a simple 3-line view function is probably all you need, and it is actually easier than subclassing <tt class="docutils literal">ListView</tt>).</p>
<p>For Django core, let's fix up the main problems and bugs CBVs have, and then leave them as solutions to simple problems.</p>
<p>But please: let's not move everything in Django-world (whether Django core/contrib or reusable apps) to CBVs. Just use a function, and <a class="reference external" href="http://pyvideo.org/video/880/stop-writing-classes">stop writing classes</a>.</p>
<hr class="docutils" />
<p><em>Update: 2013-03-09</em></p>
<p>My opinions have changed slightly since I wrote the above, due to comments below
and other helpful blog posts. I'd summarise by saying:</p>
<ul class="simple">
<li>There are some places where CBVs really shine, especially when you are writing
many similar simple views.</li>
<li>You can avoid some of the problems I mentioned by using your own class
hierarchy, that you control completely, and making it flat, and specific to
your needs.</li>
<li>I still prefer to write function based views. I've spent too many hours
debugging class hierarchies of different kinds, and seen too many view
functions that started out fitting into the kind of patterns that CBVs
provide, and then broke them in big ways.</li>
</ul>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Reasons to love Django, part x of y]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/" />
    <id>http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/</id>
    <updated>2012-05-19T14:12:19Z</updated>
    <published>2012-05-19T14:12:19Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Reasons to love Django, part x of y]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/"><![CDATA[<div class="document">
<p>I needed to add a boolean field to a model. For many web apps, this typically involves:</p>
<ol class="arabic simple">
<li>modifying the model layer, so that the field becomes available as an attribute on retrieved objects, and can be queried against etc.</li>
<li>creating a database migration script that can be run immediately on the development box, and later for staging and production.</li>
<li>running the migration against the development DB.</li>
<li>updating any admin screens for editing the field.</li>
<li>checking the changes and scripts into source control.</li>
<li>deploying - including pushing source code and running migration scripts etc.</li>
</ol>
<p>Using <a class="reference external" href="https://www.djangoproject.com/">Django</a>, from a cold start (no editor/IDE open), this just took me <strong>1 minute 45 seconds of work</strong> for steps 1 - 5, and an additional 45 seconds waiting for step 6, total 2 minutes 30 seconds, and I wasn't rushing.</p>
<p>Step 1 is a one line code addition. Pretty much everything else can and should be generated automatically.</p>
<p>Step 2 is taken care of by a one line command using <a class="reference external" href="http://south.aeracode.org/">South</a>, as are step 3 and the database part of step 6 (which is run as a matter of course from my deployment scripts).</p>
<p>Step 4 is taken care of by Django's admin, which introspects the model and generates the right form for you.</p>
<p>This is one of the reasons I love Django. It's not so much the time it saves, although that is pretty awesome, it's the <em>tedium</em> it saves.</p>
<p>This is also one of the reasons I'm not very tempted by schema-less or schema-light databases, because with Django a nice strict schema brings so little administrative overhead. I was going to have to add <em>something</em> about the change to the model anyway, even if it was only documentation, and having done that in one place, the other additional changes required by a relational DB with strong schema placed virtually no burden on me.</p>
<p>(Of course, things could be more complex on bigger apps, especially if the table is large or sharded. But then again, there's no reason why rolling out your DB change shouldn't be just as automated - it's only the 'waiting' stage that <em>has</em> to take longer for a simple change like adding a column. If the coding/work part is taking much longer than the above example, your tools probably need fixing or replacing.)</p>
</div>
]]></content>
  </entry>
</feed>
