<?xml version="1.0" encoding="UTF-8"?>
<feed
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:thr="http://purl.org/syndication/thread/1.0"
  xml:lang="en"
   >
  <title type="text">All Unkept</title>
  <subtitle type="text"></subtitle>

  <updated>2013-05-24T11:16:23Z</updated>
  <generator uri="http://blogofile.com/">Blogofile</generator>

  <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog" />
  <id>http://lukeplant.me.uk/blog/feed/atom/</id>
  <link rel="self" type="application/atom+xml" href="http://lukeplant.me.uk/blog/feed/atom/" />
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Full screen WebView Android app]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app" />
    <id>http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app</id>
    <updated>2012-12-19T23:08:00Z</updated>
    <published>2012-12-19T23:08:00Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Christianity" />
    <summary type="html"><![CDATA[Full screen WebView Android app]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/full-screen-webview-android-app"><![CDATA[<div class="document">
<p>I decided to make an Android app for my <a class="reference external" href="http://learnscripture.net/">Bible memory verse site</a> (created using Django). The main motivation for
the app is to get rid of the unnecessary and annoying address bars and status
bars when using the site on an Android phone. The site is already designed to
adapt to mobiles, so I'm not creating a native app — I just want a better way to
access the web app from an Android phone.</p>
<p>I tried <a class="reference external" href="http://www.appsgeyser.com/">appsgeyser</a>, but discovered they put
adverts on your site, which I definitely don't want — this is an entirely free
app, for a free (and ad-free) site.</p>
<p>My requirements are:</p>
<ul class="simple">
<li>Full screen<ul>
<li>without any controls ever popping up, because you don't need them.</li>
</ul>
</li>
<li>Progress bar for page loading.</li>
<li>JavaScript works.</li>
<li>Links work as expected.</li>
<li>Back button works like the built-in browser, until you get back to the
home page, where it will cause the app to exit.</li>
</ul>
<p>There are lots of pages and wizards with solutions for bits of these, but
putting them together turned out to be harder — for example, it seems that the
normal way of showing a progress bar for the whole window doesn't work if you've
gone full screen.</p>
<p>Anyway, I've completed the <a class="reference external" href="https://play.google.com/store/apps/details?id=net.learnscripture.webviewapp">learn scripture app</a>
(ha, my first Java app!), and thought I'd share the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src">complete source code</a>.</p>
<p>If you want a similar app, you are probably best off creating your basic app
structure using a wizard, but it is helpful to see a complete solution. The
important bits are:</p>
<ul class="simple">
<li>permissions for Internet access: <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b09b85147f0416ce528ff85218f1f/AndroidManifest.xml?at=default#cl-11">AndroidManifest.xml</a></li>
<li>the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b/src/net/learnscripture/webviewapp/Dashboard.java?at=default">main activity source code</a></li>
<li>the <a class="reference external" href="https://bitbucket.org/spookylukey/net.learnscripture.webviewapp/src/7161843ff07b/res/layout/activity_dashboard.xml?at=default">main layout definition</a></li>
</ul>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Why escape-on-input is a bad idea]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea" />
    <id>http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea</id>
    <updated>2012-08-06T20:59:01Z</updated>
    <published>2012-08-06T20:59:01Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Security" />
    <category scheme="http://lukeplant.me.uk/blog" term="PHP" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Why escape-on-input is a bad idea]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea"><![CDATA[<div class="document">
<p>The right way to handle issues with untrusted data is:</p>
<blockquote>
Filter on input, escape on output</blockquote>
<p>This means that you validate or limit data that comes in (filter), but only
transform (escape or encode) it at the point you are sending it as output to
another system that requires the encoding. It has been standard best practice
since just about forever <sup>[citation required]</sup>.</p>
<p>An alternative is “escape on input”: at the point that data enters your system,
you apply a transformation to it to avoid a problem further down the line when
the data is used.</p>
<p>It's come to my attention that some serious web developers (or at least, they
take themselves seriously and are taken seriously by others) are <strong>still</strong>
suggesting the practice of escape-on-input.</p>
<p>For example, with escape-on-input, to avoid XSS any data that enters your system
has HTML escaping applied to it <em>immediately</em>, before your application code
touches it.</p>
<p>I chose that example deliberately, because people are actually recommending it:</p>
<ul>
<li><p class="first"><a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html#comment-572962448">in some recent “PHP sucks” debate</a>.</p>
</li>
<li><p class="first">which, in turn, linked to a <a class="reference external" href="http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html">page by Rasmus Lerdorf recommending
escape-on-input as a sensible way to deal with XSS</a>.
The page, admittedly, is describing a ‘toy’, a ‘no-framework PHP framework’,
yet he does seem to be serious about the usefulness of escape-on-input.</p>
<p>The page is from 2006, and uses the pecl/filter extension, but the extension
has since made it into core (PHP 5.2), and the <a class="reference external" href="http://www.php.net/manual/en/filter.configuration.php">docs for it</a> suggest a
configuration that is clearly intended for XSS prevention. As recently as
2008, and probably to this day, Lerdorf is <a class="reference external" href="http://grokbase.com/p/php/php-internals/083qakz7wj/php-dev-short-open-tag">still defending and recommending
this approach</a>,
and it appears to be part of his reason for thinking that PHP templating
doesn't need an autoescape mechanism.</p>
</li>
<li><p class="first">Just as significantly, <a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security">Etsy are using and recommending escape-on-input</a> (slide
18 onward). As a very successful modern company using PHP, people will look
up to them and copy them.</p>
</li>
</ul>
<p>So, this approach, unfortunately, is popular amongst some, and I can't find a
decent post explaining why it's such a terrible idea both in theory and
practice. Here is my attempt. It should be applicable to almost any system and
any language, although I'll mainly be using examples from web development.</p>
<div class="section" id="in-theory">
<h1>In theory</h1>
<ul>
<li><p class="first">First of all, escape-on-input <strong>is just wrong</strong> — you've taken some input and
applied some transformation that is totally irrelevant to that data. If,
taking our example, you have some data collected by HTTP POST or GET
parameters, applying HTML escaping to it is a layering violation — it mixes an
output formatting concern into input handling. Layering violations make your
code much harder to understand and maintain, because you have to take into
account other layers instead of letting each component and layer do its own
job.</p>
<p>Doing things ‘right’ is very important, even if doing them ‘wrong’ seems to
work and you are tempted to be dismissive of ‘theoretical’ concerns about
purity etc. When you have to maintain code, you will be very glad if things
are in the right place, and not full of hacks and surprises.</p>
</li>
<li><p class="first">You have corrupted your data by default. The system (or the most convenient
API) is now lying about what data has come in. As you have applied a
transformation to the <strong>data itself</strong>, the layering violation is not an
isolated problem in one part of the code, but infects every part of your code,
especially if you store the corrupted data in a database.</p>
<p>Your data is <strong>everything</strong>. As I read recently, <a class="reference external" href="http://blog.datamarket.com/2012/07/08/the-11-best-data-quotes/">“data matures like wine,
applications like fish”</a>. You can
always rewrite your application, but if you corrupt your data, you've done the
worst thing you can to your system.</p>
</li>
<li><p class="first">This is exacerbated by the fact that many encodings are one-way — you cannot
losslessly or unambiguously convert them back. If at a later point you need
the original data, you might be in a pickle.</p>
</li>
<li><p class="first">Escaping your data for one output backend will only deal with <strong>that</strong>
output. A typical web app might deal with at least the following backends,
which have different characters that are dangerous, and have different
requirements for dealing with them:</p>
<ul class="simple">
<li>HTML: ' &lt; &gt; &quot; &amp;</li>
<li>SMTP and HTTP: ; : newlines</li>
<li>SQL: '</li>
<li>JSON: &quot;</li>
<li>Shell - space, quotes and various other characters</li>
</ul>
<p>Any number of others could be added, and all could have security
implications. Using escape-on-input will only fix one of these (apart from
happy coincidences where it might fix more than one), but for the others you
will still need a sensible solution to the problem. Why not have a sensible
solution for all of them?</p>
</li>
<li><p class="first">Escaping on input will not only fail to deal with the problems of more than
one output, it will actually make your data <strong>incorrect</strong> for many outputs.</p>
<p>Suppose you decide to do HTML escaping, and someone enters <em>Jack &amp; Jill</em> as a
title for something. Your escape-on-input turns this to <em>Jack &amp;amp; Jill</em> and
that goes in the DB. Suppose you want to email people and put this title in
the subject line. You now have to apply the reverse transformation to get a
sensible subject line in the email, and you have to <strong>remember</strong> to do this
for every output that is not HTML.</p>
<p>Sometimes, the bug is <a class="reference external" href="http://instagram.com/p/SVfQruppEE/">significantly more annoying</a> than an email with an incorrect title.</p>
<p>You also have daft bugs like the fact that doing a search on that field for
the string ‘amp’ (or ‘quot’, ‘apos’, ‘lt’, ‘gt’ etc. or any substrings) will get
various false matches.</p>
<p>I have seen some people respond to this by saying “it's better to have the
occasional double-encoding bug or incorrect query result than an XSS
exploit”. Well, first, that depends on your business. XSS is a problem because
it costs time and money, and so does corrupting your data. Many people have
data that actually matters, and corrupt data is a big deal, and much harder to
cope with than an XSS bug, because data lives on and on, while your code can
get replaced easily.</p>
<p>Second, this decision affects <strong>frameworks</strong> that are used to handle data of
<strong>all kinds</strong>, and the decision affects the entire code base of your
application and beyond, as described below. Data-handling frameworks that work
on the assumption that your data is not important are insanity. <a class="reference external" href="http://www.biblegateway.com/passage/?search=Psalm%2011:3&amp;version=KJV">If the
foundations be destroyed, what can the righteous do?</a></p>
<p>Third, it's entirely unnecessary. XSS is not hard to fix given decent
programming tools.</p>
</li>
<li><p class="first">At what point does data ‘enter’ your system?</p>
<p>It might sound like a simple question, but it's tricky in reality, and I'll
illustrate using an HTTP request.</p>
<p>In most web apps, the GET and POST parameters are your ‘raw input’. However,
using most normal web framework APIs, data in GET and POST parameters has
already been interpreted. The ‘raw’ data is really the bytes that make up the
HTTP request, which typically will use URL encoding for GET query parameters
and a choice of encodings for POST data (URL encoding or MIME multipart
attachment format).</p>
<p>The framework may also do another level of decoding — interpreting the
series of bytes as a series of unicode code points.</p>
<p>Both parts of this initial transformation make sense and are appropriate,
because they are reversing the encoding already applied to the data by the
protocol involved. The web browser takes the data you type in — unicode code
points — and applies a series of transformations to it, according to the HTTP
protocol, and your web framework reverses these to get the data back.</p>
<p>Now, if you want to avoid XSS problems, you have to apply the escaping
<strong>after</strong> this initial decoding has been done. But this highlights another
possibility. What if the data requires <em>further</em> decoding before you get the
‘real’ raw data? For example, some data might be sent base64 encoded for a
variety of reasons, or any other type of encoding.</p>
<p>This extra level of encoding gives two problems:</p>
<ul>
<li><p class="first">your automatic HTML escaping may have corrupted the encoded data so that it
now cannot be decoded. For example, you had a GET parameter that held a URL,
which itself had parameters in the query string:</p>
<pre class="literal-block">
GET /foo?bar=1&amp;url=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2 HTTP/1.1
</pre>
<p>Your framework's HTTP handling will produce a query dictionary that looks
something like the following:</p>
<pre class="literal-block">
{&quot;bar&quot;: &quot;1&quot;,
 &quot;url&quot;: &quot;http://example.com/?x=1&amp;y=2&quot;
 }
</pre>
<p>But your automatic escaping turns that into:</p>
<pre class="literal-block">
{&quot;bar&quot;: &quot;1&quot;,
 &quot;url&quot;: &quot;http://example.com/?x=1&amp;amp;y=2&quot;
 }
</pre>
<p>If you want to extract the 'y' parameter from 'url', you are stuck. You
can't correctly interpret the data in the 'url' parameter, because it has
been corrupted. You're going to have to re-decode the input, and you might
not even notice this problem.</p>
</li>
<li><p class="first">Even if the data comes through your automatic escaping unscathed
(e.g. base64 under HTML escaping), or you can undo the corruption and get it
properly decoded, after decoding you will have to <strong>manually apply</strong> HTML
escaping to make it match all the other automatically escaped data. If you
don't, you've potentially got a bug and an XSS exploit.</p>
<p>So your automatic escape-on-input has <strong>missed</strong> data, and this happens
because you can't really define the point at which the data has ‘entered’
your system and needs the escaping applied.</p>
</li>
</ul>
<p>This problem means that the escape-on-input approach is inherently flawed and
<strong>cannot</strong> be fixed. <strong>You just have to patch it up on a case-by-case basis,
which is exactly what escape-on-input is supposed to avoid.</strong></p>
<p>And then, what about other sources of data, such as data on the file system or
in a cache? Are these entry points? Well, it depends on how the data was put
there. You have to manually follow this all the way through your app; get it
wrong and you've got double escaping bugs or security flaws.</p>
<p>(By contrast, escape on output always works, because you apply it at the point
where you know it is needed — in the backend that knows the escaping rules.)</p>
</li>
<li><p class="first">Other systems putting data into your database, or getting data out, have to
abide by your data transformation rules.</p>
<p>These systems might have nothing to do with your primary domain (e.g. a web
site). Making them understand and obey rules that have nothing to do with the
data itself is insanity and extremely short-sighted.</p>
<p>You can't deal with this problem when you come to it, because you don't just
have to fix your code, you've got to fix all your data too, and by the time
you cross this bridge you might have a lot of data and might need a very
delicate database migration to get it right. The data may even have escaped
your control (e.g. been copied into other systems), or backwards compatibility
concerns might stop you from making the change you need to make.</p>
</li>
<li><p class="first">Within your main application, the decision to escape on input affects your
whole code base.</p>
<p>If you want to use any libraries, you need to make sure that they are using
all the same assumptions that you have in your main code base.</p>
<p>For example, if you've got a form/widget library in your web app, it will very
often need to echo user input back to the user when a form has
validation errors. This library has to know if you already escaped the input.</p>
<p>Writing the library to work in two modes is asking for trouble. Rather, you
need it to have been written from the beginning to assume the same escaping
rules.</p>
<p>This kills code re-use — you can only use code that assumes the same input
escaping — or it means that you will end up with tons of bugs due to
incompatibilities between the assumptions made in your application code and
the library.</p>
<p>Essentially, this is the problem of a global configuration setting, but worse
since it affects the <em>operand</em> of your entire application (the data going
through it), not just the functionality of various <em>operators</em>.</p>
</li>
<li><p class="first">The confusion caused by the above is likely to <em>increase</em> security
problems. “Keep It Simple, Stupid” remains a very good maxim for developers.</p>
<p>To continue an example used above: you want to send an email with some data
that has already been HTML escaped, and so you need to unescape the data to
avoid emails with the subject &quot;Jack &amp;amp; Jill&quot; when the user entered &quot;Jack &amp;
Jill&quot;. You decide it's not sensible for the mail sending functions to do this
internally (or maybe they're provided by a third party who made that decision
for you), so the calling code does the unescaping.</p>
<p>You later decide to switch to HTML emails, and the developer who implements it
thinks that since data is already escaped, there is no problem including it
without extra escaping in the body of the HTML email, leading to a
vulnerability (not classic XSS in this case, but still a problem).</p>
<p>There is also the example I gave above where an extra layer of
encoding/decoding in the raw data makes it likely you'll forget to apply the
escaping.</p>
<p>The confusion caused by escape-on-input means your entire code base becomes a
potential source not only of double-escaping bugs but of security problems as
well.</p>
</li>
</ul>
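<p>Two of the failure modes above (corrupted data reaching non-HTML outputs, and
nested encoded data that can no longer be decoded) can be reproduced in a few
lines. This is only a sketch using Python's standard library: <tt class="docutils literal">escape_on_input</tt>
and <tt class="docutils literal">parse_query</tt> are hypothetical helpers written for illustration, and the
query parser is deliberately simplified compared to what a real framework does.</p>

```python
import html
from urllib.parse import unquote, urlsplit


def escape_on_input(value):
    """Hypothetical input filter: HTML-escape data the moment it arrives."""
    return html.escape(value)


def parse_query(query):
    """Simplified query-string parser: split on '&', then URL-decode."""
    pairs = (item.split("=", 1) for item in query.split("&"))
    return {unquote(key): unquote(value) for key, value in pairs}


# 1. The stored value is no longer what the user typed.
stored_title = escape_on_input("Jack & Jill")
# A plain-text output such as an email subject would show the entity as-is,
# and an HTML backend that escapes properly escapes it a second time.
double_escaped = html.escape(stored_title)

# 2. Encoded data nested inside a parameter can no longer be decoded.
params = parse_query("bar=1&url=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2")
inner_ok = parse_query(urlsplit(params["url"]).query)
inner_corrupted = parse_query(urlsplit(escape_on_input(params["url"])).query)

print(stored_title)
print(double_escaped)
print(inner_ok)
print(inner_corrupted)
```

<p>With the unescaped URL the inner parameters come out as expected, while the
escaped copy loses its <tt class="docutils literal">y</tt> key, because the separator itself has been rewritten.</p>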
</div>
<div class="section" id="in-practice">
<h1>In practice</h1>
<p>Thankfully, we don't just have to rely on the above analysis to conclude that
escape-on-input is a terrible idea. PHP, always willing to help when it comes to
“examples of how not to do it”, provides us with a perfect case study.</p>
<div class="section" id="magic-quotes">
<h2>Magic quotes</h2>
<p>PHP used to have a feature called magic quotes. It was an escape-on-input
feature that escaped single quotes (<strong>'</strong>) with backslashes. This was to protect
you from SQL injection attacks, by making the data safe for interpolation into a
SQL query.</p>
<p>This caused all kinds of problems.</p>
<p>First, unless the data is going straight into a SQL query built by string
interpolation, you have to remember to strip those slashes using the function
<tt class="docutils literal">stripslashes()</tt>.</p>
<p>If you don't, you get double encoding. It looks like \'this\', you\'ve
almost certainly seen it across the web, though it seems we\'re thankfully
past the worst of it.</p>
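<p>To make the mechanics concrete, here is a rough simulation in Python. It is
not PHP's actual implementation: the <tt class="docutils literal">magic_quotes</tt> and <tt class="docutils literal">stripslashes</tt>
functions below are simplified stand-ins, written on the assumption that only
backslashes and quotes get escaped.</p>

```python
import re


def magic_quotes(value):
    """Rough stand-in for magic quotes: backslash-escape backslashes and quotes."""
    return re.sub(r"([\\'\"])", r"\\\1", value)


def stripslashes(value):
    """Undo magic_quotes(): drop each escaping backslash, keep the next character."""
    return re.sub(r"\\(.)", r"\1", value, flags=re.DOTALL)


user_input = "you've got mail"

# What application code sees when escaping is applied on input.
received = magic_quotes(user_input)

# Echoing the value back in a form without stripping shows the stray backslash.
echoed_without_strip = received

# Every code path that does not build SQL by interpolation has to remember this.
echoed_with_strip = stripslashes(received)

print(echoed_without_strip)
print(echoed_with_strip)
```

<p>The second value matches what the user typed; the first is the familiar
backslash-littered output.</p>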
<p>Second, even if you remember, you've added some hideous cruft to your code. In
the bit of code which is handling form validation (and is therefore echoing user
input back to the user without the database being involved), you've got these
bizarre <tt class="docutils literal">stripslashes()</tt> calls. What on earth does ‘reverse transforming a
string for SQL statement preparation’ have to do with the task of input
validation?</p>
<p>Third, it turns out that different databases need different escaping mechanisms
to do things fully correctly. So you now have to do <tt class="docutils literal">stripslashes()</tt> on data
even if you are passing it to a database using string-interpolated queries!</p>
<p>Then, since the above problems are common (building up SQL queries by string
interpolation was always a bad idea, and very often you pass on the data to
outputs that don't want SQL escaping at all), it's desirable to have a way to
turn this behaviour off completely.</p>
<p>To handle this, there is a php.ini setting to turn it on/off.</p>
<p>And there were more complications, for example:</p>
<ul class="simple">
<li>do you apply magic quotes to ‘all input’ (<tt class="docutils literal">magic_quotes_runtime</tt>) or just to
GET/POST/COOKIE data (<tt class="docutils literal">magic_quotes_gpc</tt>)? (This is the problem of defining
what exactly is ‘input’)</li>
<li>attempts to fix some of the above with yet more configuration options like
<tt class="docutils literal">magic_quotes_sybase</tt>.</li>
</ul>
<p>And so now you've got even more problems. Since these are global settings, you
can't have library code mess with them, since other code might set the global to
a different value or assume a certain value.</p>
<p>You could try making all code detect the current setting and have different code
paths depending on the result. This works very badly — having multiple code
paths is a recipe for code duplication and bug proliferation. It's extremely
easy to forget to do it, or get one of the paths wrong, since you will likely
only test one configuration value and one set of code paths in reality.</p>
<p>Alternatively, you can make one bit of code responsible for fixing the setting
to a sensible value (the only one being 'off'), and then make all code assume
that from then on. (If you can't turn it off, you can use the code included
<a class="reference external" href="http://www.php.net/manual/en/security.magicquotes.disabling.php">here</a> as a
horrible kludge to reverse its behaviour.)</p>
<p>Eventually, this final approach was the one taken by all significant
projects. <strong>Turn the whole feature off, and assume it is off from then
on</strong>. (Which means the feature is useless, of course.)</p>
<p>And of course, thankfully, the PHP developers realised that this entire thing
was a <strong>huge mistake</strong> that caused nothing but a vast amount of confusion and
bugs, and <strong>removed the whole thing</strong> for good in PHP 5.4.</p>
<p>Magic quotes, <a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">as eevee put it</a>, were “so
close to secure-by-default, and yet so far from understanding the concept at
all.”</p>
<p>To digress for a moment: we keep getting told that PHP is improving, and the
community has learnt from its mistakes. Unfortunately it seems the leaders in
the community are bent on <strong>recreating</strong> old mistakes.</p>
<p>According to Lerdorf, the much newer PHP 'filter' extension is <a class="reference external" href="http://grokbase.com/t/php/php-internals/08373a1vvf/short-open-tag/083qakz7wj#20080323qvterw1df6a006qxyg83z9qsb8">“magic_quotes
done right”</a>. But
it still suffers from almost all the problems described here, for all the
reasons described. Global HTML escaping on input is essentially the same as
magic quotes, and just as tragically bad.</p>
</div>
<div class="section" id="elgg">
<h2>Elgg</h2>
<p>While researching this post, I came across <a class="reference external" href="http://trac.elgg.org/ticket/561">this ticket for Elgg</a>, an open source social networking engine.
Just read through the ticket and see the mess they are in. It's clear they
strongly regret the decision they made to escape-on-input, and, in their own
words, have created “horrendous” problems for themselves, especially as their
application has grown to include other interfaces such as JSON REST APIs.</p>
<p>However, fixing it is very hard. They have to coordinate many changes across
their code base with a big database migration. If data has leaked from the
databases and tables they control into other systems, such as denormalised
tables, other databases, caches etc., or if there is other code by third parties
that makes the old assumptions about encoded data, they are in even more of a
pickle. And both of those things are probably inevitable in something like an
open source framework, which is designed for other people to build on and
extend.</p>
<p>This is the pain that comes from mixing input handling and output encoding,
and from corrupting the data in your database.</p>
</div>
<div class="section" id="etsy">
<h2>Etsy</h2>
<p>According to <a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security">their security presentation</a>,
Etsy are using escape-on-input for XSS protection.</p>
<p>They claim that this is a much more secure option, as it is secure by
default. (They do note, however, the problem with input that is encoded in some
other way, like base64, so they are aware of the problems.)</p>
<p>Their presentation goes on to describe an elaborate system for detecting and
fixing XSS attacks (the slides don't give enough detail for me to understand
what exactly they are doing, but it's clearly a lot of work).</p>
<p>And <a class="reference external" href="http://www.nzinfosec.com/etsy-has-been-one-of-the-best-companies-ive-reported-holes-to/">their system does indeed catch XSS bugs in the wild and allow them to fix
them within hours</a>.</p>
<p>Wait, what?</p>
<p>They've corrupted their database by doing escape-on-input, they've inflicted
all the development pain described above on themselves, and they've <strong>still</strong>
got XSS bugs?</p>
<p>Granted, they've got impressive ways of dealing with these problems. But it's
like <a class="reference external" href="http://xkcd.com/463/">virus checkers on voting machines</a>. Advanced ways
of dealing with problems that shouldn't even be possible tell you that you are
doing it wrong. They've become very fast at <a class="reference external" href="http://www.red-sweater.com/blog/125/easy-programming">re-tying their shoelaces, instead
of working out how to tie shoelaces so they don't come undone</a>.</p>
<p>They claim that with escape-on-input, XSS problems are now greppable, but it
doesn't sound like it. If they were, code audits would be a massively more
efficient way to find XSS problems than the methods they are using.</p>
<p>The main problem is almost certainly that they are using an output system for
HTML that doesn't do HTML escaping by default (I'm guessing they are using PHP
as their template language). If the backend that deals with HTML <strong>actually
deals with HTML</strong> then you eliminate the vast majority of these problems
overnight.</p>
<p>I'm willing to bet that large sites that use Django (or other frameworks that
have basically solved the XSS problem by HTML escaping on output <strong>by default</strong>)
don't have teams and automated systems dedicated to this problem, and don't need
them. In Django apps, XSS problems <strong>are</strong> greppable - you grep for
<tt class="docutils literal">mark_safe</tt> in Python and the <tt class="docutils literal">|safe</tt> filter in templates (and then,
obviously, you may have to recursively grep for any functions that call
<tt class="docutils literal">mark_safe</tt> on inputs). Since all data which isn't ‘mark_safe()’d gets escaped
by the templating engine, and all HTML comes out of the template engine, that's
basically all you need to do.</p>
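<p>The idea of escaping by default on output, with an explicit and greppable
opt-out, can be sketched with nothing but the standard library. This is not
Django's implementation: <tt class="docutils literal">SafeString</tt>, <tt class="docutils literal">mark_safe</tt> and <tt class="docutils literal">render</tt> below are
hypothetical miniatures that only borrow the name of the real function to show
the principle.</p>

```python
import html


class SafeString(str):
    """Marker type: text that is already safe HTML and must not be escaped."""


def mark_safe(value):
    """Explicit, greppable opt-out from escaping (illustrative only)."""
    return SafeString(value)


def render(template, **context):
    """Hypothetical mini renderer: escape every value unless it is marked safe."""
    escaped = {
        name: value if isinstance(value, SafeString) else html.escape(str(value))
        for name, value in context.items()
    }
    return template.format(**escaped)


# The stored data stays exactly as the user entered it.
title = "Jack & Jill"
comment = "<i>hi</i>"

heading = render("<b>{title}</b>", title=title)
untrusted = render("<p>{body}</p>", body=comment)
trusted = render("<p>{body}</p>", body=mark_safe(comment))

print(heading)
print(untrusted)
print(trusted)
```

<p>Because the raw value is never altered, the same <tt class="docutils literal">title</tt> can go straight
into an email subject or a JSON response, and the only places that need
auditing are the calls to <tt class="docutils literal">mark_safe</tt>.</p>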
</div>
</div>
<div class="section" id="now-for-some-flame-bait">
<h1>Now for some flame bait</h1>
<p>How did this happen to Etsy?</p>
<p>Are the Etsy devs stupid? I suspect not. Etsy is clearly doing well, and I
imagine they have enough money to hire top-notch developers. Some of their
<a class="reference external" href="http://www.etsy.com/careers/job_description.php?job_id=ozhhVfwM">careers pages</a> show they
are happy using a variety of languages and technologies, and their <a class="reference external" href="http://codeascraft.etsy.com/">engineering
blog</a> seems to be sane and competent. Even their
security presentation showed considerable ingenuity and technical ability in
dealing with security problems (in entirely the wrong way, unfortunately, but
still impressive).</p>
<p>I doubt they are low quality developers. Rather, I suspect that use of PHP has
addled their brains. They have become far too accustomed to working in an
environment in which insanity reigns — an environment in which <a class="reference external" href="/blogmedia/php_less_than.txt">the less than
operator pretends to work correctly with strings but it's just a trap</a>.</p>
<p>When I programmed in a Windows environment, I theorised that use of Windows
itself contributed to the poor quality of the programming in the code base, and
the fact that developers thought nothing of writing tons of tedious
code. Because Windows was so unscriptable, I imagined that Windows programmers
developed a high tolerance for tedium and repetition, which is exactly the
opposite of the qualities a programmer needs to make a computer do everything
efficiently and reliably. (Since then, I've found that Sturgeon's law was
probably a better explanation for the quality of the code, but I still think the
fundamental idea applies).</p>
<p>With PHP, the fact that it comes with a template language that is simply not fit
for purpose — because it doesn't do HTML escaping by default, or even easily —
has somehow made the Etsy developers believe that it is normal to struggle with
XSS, that it is perfectly reasonable that even after taking the drastic action
of corrupting their entire database by HTML escaping it, they should <strong>still</strong>
need elaborate XSS-catching systems.</p>
<p>Instead of <a class="reference external" href="http://www.youtube.com/watch?v=5mdy8bFiyzY">trying</a> to fix XSS,
they should just fix it. Like <a class="reference external" href="https://docs.djangoproject.com/en/dev/topics/templates/#automatic-html-escaping">this in Django</a>. Or
<a class="reference external" href="http://pypi.python.org/pypi/MarkupSafe/">this in Turbogears and Jinja</a>. Or
<a class="reference external" href="http://www.yesodweb.com/book/shakespearean-templates#types-33">this in Yesod</a>. Or even <a class="reference external" href="http://twig.sensiolabs.org/doc/templates.html#html-escaping">this
in PHP</a> (though
due to limitations of the language you won't be able to have the convenience of
things like <a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe">mark_safe</a>
in Django). But living with an environment of pain and madness makes you think
that it ought to be hard.</p>
<p>Right the way up to Rasmus Lerdorf at the top, many people in the PHP community
live with the insanity of their tools, and add more insanity to cope with it,
rather than fix their tools or choose better ones.</p>
</div>
<div class="section" id="a-lesson-for-pythonistas">
<h1>A lesson for Pythonistas</h1>
<p>Bashing other people is fun, but when I do so I always try to get something more
valuable out of it by using the opportunity to examine myself. The problem I
discussed in the last section (which is just a manifestation of the <a class="reference external" href="http://en.wikipedia.org/wiki/Broken_windows_theory">broken
windows theory</a>) applies
to other communities, and I'll attempt to apply it to the Python community.</p>
<p>Refusing to live with stupidity is one of the reasons that Python 3 is really
important.</p>
<p>Python 3 does not represent a massive leap forward in terms of additions to the
language. Mainly it just fixes a bunch of mistakes in Python 2, and introduces a
whole lot of backwards incompatibilities in the process. One of the biggest is
unicode/bytes. Python 2 was stupid here — it went directly against the Zen of
Python, and said “in the face of ambiguity about what encoding to use, guess.”
This caused a world of pain.</p>
<p>Now, you can work around it in most cases by some sensible conventions and a
certain amount of discipline. You can also cope with the fact that <tt class="docutils literal">&quot;a&quot; &lt; 1</tt>
doesn't raise an exception. You can live with <tt class="docutils literal">next()</tt> being a method in the
iterator protocol, when it should be a method called <tt class="docutils literal">__next__()</tt> and a builtin
<strong>function</strong> <tt class="docutils literal">next()</tt>. You can live with the fact that <tt class="docutils literal">print</tt> is a totally
unnecessary keyword, since it should just be a builtin function. You can get
used to the fact that <cite>class Foo:</cite> means something subtly but significantly
different from <cite>class Foo(object):</cite>. You can work around or ignore dozens of
other little niggles, gotchas and inconsistencies.</p>
<p>But all the while, you are training yourself to tolerate stupidity,
inconsistency and brokenness. Removing these warts is really important, and
worth all the pain of the migration. The alternative is for Python to become the
next PHP.</p>
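<p>For reference, here is a small sketch of how a few of the warts listed above behave once you are on Python 3. The names <tt class="docutils literal">compare_mixed</tt>, <tt class="docutils literal">Countdown</tt>, <tt class="docutils literal">Foo</tt> and <tt class="docutils literal">Bar</tt> are made up purely for illustration:</p>

```python
# Run under Python 3: each wart now either fails loudly or has one spelling.

def compare_mixed():
    """Ordering a str against an int raises instead of giving an arbitrary answer."""
    try:
        "a" < 1
    except TypeError:
        return "TypeError"
    return "no error"


class Countdown:
    """Iterator using the __next__ method; the builtin next() calls it."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Foo:
    pass


class Bar(object):
    pass


mixed_result = compare_mixed()
first = next(Countdown(3))      # builtin function, not a .next() method
rest = list(Countdown(2))
# Both spellings of the class statement now produce the same kind of class.
same_kind = type(Foo) is type(Bar) is type
```

<p>None of this is new functionality; it is just the inconsistencies being removed.</p>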
<p>On top of these things, there are other types of brokenness in Python that
people in the community seem less willing to acknowledge or tackle. For some of
these I think we need exposure to completely different languages — languages
where you can spawn thousands of ‘threads’ easily and get performance benefits,
for example, or languages where you can write code that is both very high level
<strong>and</strong> extremely fast. If we live entirely with Python and its set of
limitations, we'll think that those problems are normal and unavoidable.</p>
<hr class="docutils" />
<p>Updates:</p>
<ul class="simple">
<li>2012/08/07 - corrections about turning magic_quotes_gpc off at runtime.</li>
<li>2012/10/08 - noted bug with queries returning false matches.</li>
</ul>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Reasons to love Django, part x of y]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/" />
    <id>http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/</id>
    <updated>2012-05-19T14:12:19Z</updated>
    <published>2012-05-19T14:12:19Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Reasons to love Django, part x of y]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/reasons-to-love-django-part-x-of-y/"><![CDATA[<div class="document">
<p>I needed to add a boolean field to a model. For many web apps, this typically involves:</p>
<ol class="arabic simple">
<li>modifying the model layer, so that the field becomes available as an attribute on retrieved objects, and can be queried against etc.</li>
<li>creating a database migration script that can be run immediately on the development box, and later for staging and production.</li>
<li>running the migration against the development DB.</li>
<li>updating any admin screens for editing the field.</li>
<li>checking the changes and scripts into source control.</li>
<li>deploying - including pushing source code and running migration scripts etc.</li>
</ol>
<p>Using <a class="reference external" href="https://www.djangoproject.com/">Django</a>, from a cold start (no editor/IDE open), this just took me <strong>1 minute 45 seconds of work</strong> for steps 1 - 5, and an additional 45 seconds waiting for step 6, total 2 minutes 30 seconds, and I wasn't rushing.</p>
<p>Step 1 is a one line code addition. Pretty much everything else can and should be generated automatically.</p>
<p>Step 2 is taken care of by a one line command using <a class="reference external" href="http://south.aeracode.org/">South</a>, as are step 3 and the database part of step 6 (which is run as a matter of course from my deployment scripts).</p>
<p>Step 4 is taken care of by Django's admin, which introspects the model and generates the right form for you.</p>
<p>This is one of the reasons I love Django. It's not so much the time it saves, although that is pretty awesome, it's the <em>tedium</em> it saves.</p>
<p>This is also one of the reasons I'm not very tempted by schema-less or schema-light databases, because with Django a nice strict schema brings so little administrative overhead. I was going to have to add <em>something</em> about the change to the model anyway, even if it was only documentation, and having done that in one place, the other additional changes required by a relational DB with strong schema placed virtually no burden on me.</p>
<p>(Of course, things could be more complex on bigger apps, especially if the table is large or sharded. But then again, there's no reason why rolling out your DB change shouldn't be just as automated - it's only the 'waiting' stage that <em>has</em> to take longer for a simple change like adding a column. If the coding/work part is taking much longer than the above example, your tools probably need fixing or replacing.)</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[WordShell - WordPress command line admin utility]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/wordshell-wordpress-command-line-admin-utility/" />
    <id>http://lukeplant.me.uk/blog/posts/wordshell-wordpress-command-line-admin-utility/</id>
    <updated>2012-05-10T14:51:29Z</updated>
    <published>2012-05-10T14:51:29Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <summary type="html"><![CDATA[WordShell - WordPress command line admin utility]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/wordshell-wordpress-command-line-admin-utility/"><![CDATA[<div class="document">
<p>A friend of mine has developed a <a class="reference external" href="http://wordshell.net/">command-line utility for managing many WordPress installations - WordShell</a>.</p>
<p>I don't use WordPress myself, but having been totally converted to tools like <a class="reference external" href="http://docs.fabfile.org/">fabric</a> for managing remote sites, I'm sure this tool would be invaluable if you need to manage WordPress sites.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Async Raven/Sentry client with Django/Python]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/async-raven-sentry-client-with-django-python/" />
    <id>http://lukeplant.me.uk/blog/posts/async-raven-sentry-client-with-django-python/</id>
    <updated>2012-02-07T21:04:07Z</updated>
    <published>2012-02-07T21:04:07Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Async Raven/Sentry client with Django/Python]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/async-raven-sentry-client-with-django-python/"><![CDATA[<div class="document">
<p><a class="reference external" href="https://github.com/dcramer/sentry">Sentry</a> really ought to use UDP, not TCP, because you don't want logging functionality to stall or even slow down your main application. At the moment, it doesn't support that, although there have been some promising <a class="reference external" href="https://github.com/dcramer/sentry/pull/282">commits</a>.</p>
<p>For my usage (a web application), this means that you can really only use Sentry for logging exceptions, and not for anything less important.</p>
<p>However, there are some alternatives to UDP that make Sentry usable for more than exceptions.  You could use a queue process like Celery or RabbitMQ (<a class="reference external" href="https://groups.google.com/d/msg/disqus-opensource/q2Ej2QFRkGY/0vRRwReE3HcJ">apparently</a> what they use at Disqus).</p>
<p>A more lightweight alternative, however, is an asynchronous client that does its work in the background, and so doesn't block your web server thread.</p>
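<p>The general idea can be sketched with nothing but the standard library. This is a minimal illustration written for Python 3, not the Raven code: <tt class="docutils literal">BackgroundSender</tt> is a hypothetical name, and the "send" function here is just <tt class="docutils literal">list.append</tt> standing in for a slow network call.</p>

```python
import queue
import threading


class BackgroundSender:
    """Hypothetical helper: runs a blocking send function on one worker thread."""

    def __init__(self, send_func):
        self._send_func = send_func
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            message = self._queue.get()
            try:
                self._send_func(message)
            except Exception:
                # A logging failure must never take down the worker.
                pass
            finally:
                self._queue.task_done()

    def send(self, message):
        # Returns immediately; the caller never waits on the network.
        self._queue.put_nowait(message)

    def flush(self):
        # Block until everything queued so far has been handled.
        self._queue.join()


delivered = []
sender = BackgroundSender(delivered.append)
sender.send("first event")
sender.send("second event")
sender.flush()
```

<p>Because the worker is a daemon thread, anything still queued when the process exits is lost unless you call <tt class="docutils literal">flush()</tt> first. For logging that is usually an acceptable trade.</p>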
<p>There is some hopeful-looking code in <a class="reference external" href="https://github.com/dcramer/raven/blob/1e2f98526b852e792b572e995037007b7c7f9150/raven/contrib/async.py">raven.contrib.async</a>, but unfortunately it currently has a critical <a class="reference external" href="https://github.com/dcramer/raven/commit/1e2f98526b852e792b572e995037007b7c7f9150#commitcomment-942624">bug</a> (Raven 1.4.2).</p>
<p>However, using that code I cobbled together my own, and this one subclasses <tt class="docutils literal">DjangoClient</tt>, which is what I need:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">raven.contrib.django</span> <span class="kn">import</span> <span class="n">DjangoClient</span>
<span class="kn">from</span> <span class="nn">raven.contrib.async</span> <span class="kn">import</span> <span class="n">AsyncWorker</span>


<span class="k">class</span> <span class="nc">AsyncDjangoClient</span><span class="p">(</span><span class="n">DjangoClient</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;
    This client uses a single background thread to dispatch errors.
    &quot;&quot;&quot;</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">worker</span> <span class="o">=</span> <span class="n">AsyncWorker</span><span class="p">()</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">AsyncDjangoClient</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">send_sync</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">AsyncDjangoClient</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">send</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">worker</span><span class="o">.</span><span class="n">queue</span><span class="o">.</span><span class="n">put_nowait</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">send_sync</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">))</span>
</pre>
<p>Then you need to set SENTRY_CLIENT in your settings to point to this class.</p>
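<p>As a sketch, assuming the class above is saved in a hypothetical module <tt class="docutils literal">myproject/sentry_client.py</tt> and that the setting takes a dotted import path, the entry would look something like this:</p>

```python
# settings.py
# "myproject.sentry_client" is a hypothetical module path; point it at
# wherever you actually put the AsyncDjangoClient class.
SENTRY_CLIENT = 'myproject.sentry_client.AsyncDjangoClient'
```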
<p>(If you're not using Django, you should be able to do something similar.)</p>
<p>This is working fine for me - I can now enable the Sentry 404 middleware and not see any slowdown on my app, as opposed to the synchronous client which was slowing down 404 responses massively because my Sentry server is not on the same box as my main web app.</p>
<p>I should say this is use at your own risk - the AsyncClient in Raven is undocumented as well as broken, so I don't know if it is considered a sensible approach or not!</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Some quick Django optimisation lessons]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/some-quick-django-optimisation-lessons/" />
    <id>http://lukeplant.me.uk/blog/posts/some-quick-django-optimisation-lessons/</id>
    <updated>2012-01-18T10:53:51Z</updated>
    <published>2012-01-18T10:53:51Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Some quick Django optimisation lessons]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/some-quick-django-optimisation-lessons/"><![CDATA[<div class="document">
<p>I recently used <a class="reference external" href="https://github.com/ridethepony/django-fiber">django-fiber</a> for a simple project.</p>
<p>It's a nice CMS, a bit more lightweight than <a class="reference external" href="http://www.django-cms.org">django-cms</a>, with a slightly slicker frontend editing experience, and a bit easier when it comes to sharing content between different pages.</p>
<p>I found, however, that it was doing a rather large number of queries to render pages, and some that pulled back lots of data, especially when you were logged in (which enables frontend editing).</p>
<p>So, I set to work and created a <a class="reference external" href="https://github.com/spookylukey/django-fiber/tree/query_count_reduction">query count reduction branch</a>. Below are the results so far, and some lessons.</p>
<div class="section" id="results">
<h1>Results</h1>
<p>I used the <a class="reference external" href="https://github.com/ridethepony/django-fiber-example">example django-fiber project</a> for testing my changes (as well as my own project), and tried them out on both the home page and a more deeply nested page.</p>
<ul class="simple">
<li>URL: /<ul>
<li>Anonymous user:<ul>
<li>Original: 30 queries</li>
<li>Optimised: 15 queries - <strong>factor 2 reduction</strong></li>
</ul>
</li>
<li>Staff user:<ul>
<li>Original: 103 queries</li>
<li>Optimised: 28 queries - <strong>factor 3 reduction</strong></li>
</ul>
</li>
</ul>
</li>
<li>URL: /products/product-b/downloads/<ul>
<li>Anonymous user:<ul>
<li>Original: 64 queries</li>
<li>Optimised: 16 queries - <strong>factor 4 reduction</strong></li>
</ul>
</li>
<li>Staff user:<ul>
<li>Original: 150 queries</li>
<li>Optimised: 31 queries - <strong>factor 5 reduction</strong></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<div class="section" id="lessons">
<h1>Lessons</h1>
<p>As you can see, there was a lot of unnecessary work. Django makes it extremely easy to generate database queries, which is both a good and bad thing, and here it is bad.</p>
<p>Here are some lessons for avoiding this unnecessary work:</p>
<ul>
<li><p class="first">Use <a class="reference external" href="https://github.com/django-debug-toolbar/django-debug-toolbar">django-debug-toolbar</a> when developing, right from the beginning.</p>
</li>
<li><p class="first">Seriously, use <a class="reference external" href="https://github.com/django-debug-toolbar/django-debug-toolbar">django-debug-toolbar</a>.</p>
</li>
<li><p class="first">Use <a class="reference external" href="https://github.com/django-debug-toolbar/django-debug-toolbar">django-debug-toolbar</a> and <strong>keep it turned on</strong>. And look at it regularly. OK, point made.</p>
</li>
<li><p class="first">This project was quite a lot easier to fix than django-cms, because it is much newer, and so has less code, and - more importantly - far fewer compatibility issues to worry about with 3rd parties who depend on certain features.</p>
<p>You <strong>should</strong> think about <a class="reference external" href="http://en.wikipedia.org/wiki/Big_O_notation">big O</a> scaling issues fairly early, because you can easily put yourself into a situation where things are hard to fix, due to:</p>
<ul class="simple">
<li>schema design.</li>
<li>promises you've made regarding functionality to 3rd parties.</li>
</ul>
<p>For example, django-fiber has a concept of 'current' pages (pages which would form part of the bread crumb for the page you are on) and, in addition to the obvious ones (the 'ancestors' of the active page in the tree of pages), it has a <a class="reference external" href="https://github.com/ridethepony/django-fiber/blob/85a46b2c996e0640c1978c059d96ba39cfd36b82/fiber/context_processors.py#L80">feature</a> which allows <strong>any</strong> page in the database to be a candidate 'current page' for any other page, based on a regex field. And so you have to check all these pages when rendering any page.</p>
<p>This does not scale well. Thankfully, it's not too much of a problem if you don't use this feature, since you can do DB level filtering to eliminate most of the pages as potential candidates for this.</p>
<p>But if you do use it, you have a scaling problem: every use of it increases the amount of work needed to render <strong>any</strong> page. (And, if the DB filtering isn't efficient, you may still be paying an increasing penalty for every page added to the system even if you never use the feature.)</p>
<p><strong>EDIT:</strong> I should have mentioned on the positive side that django-fiber has obviously given thought to general scaling issues, and used django-mptt for the tree structure of their Page model. This made it relatively easy to fix the 'show_menu' template tag to do everything in 2 queries. Otherwise it would have been a nightmare.</p>
</li>
<li><p class="first">Create tests with <a class="reference external" href="https://docs.djangoproject.com/en/dev/topics/testing/#django.test.TestCase.assertNumQueries">assertNumQueries</a>, and add them fairly early. You'll then be alerted to performance regressions that affect scaling.</p>
<p>The tests will need to include tests for whole pages, not just bits, if you are going to analyse Big O scaling correctly, because a set of objects might be retrieved from the DB efficiently, but each one can easily make more database calls when it is actually used.</p>
</li>
</ul>
<p>Some more specific lessons:</p>
<ul>
<li><p class="first">Read and understand the <a class="reference external" href="https://docs.djangoproject.com/en/dev/topics/db/optimization/">Django docs on optimizing DB access</a>.</p>
</li>
<li><p class="first">Understand when QuerySets are evaluated (which is part of the above, but worth mentioning).</p>
<p>There were some examples in the fiber code of really inefficient use of QuerySets, e.g. <tt class="docutils literal">if obj in MyModel.objects.filter(foo=bar)</tt> (<a class="reference external" href="https://github.com/ridethepony/django-fiber/blob/85a46b2c996e0640c1978c059d96ba39cfd36b82/fiber/context_processors.py#L66">example</a>).  This code will load all the MyModel records with foo=bar and create MyModel instances. It does <strong>not</strong> do <tt class="docutils literal">MyModel.objects.filter(foo=bar, <span class="pre">pk=obj.pk).exists()</span></tt>, and it <strong>certainly</strong> does not do <tt class="docutils literal">if obj.foo == bar</tt>.</p>
<p>Although it could do the first of these, Django deliberately does not make this optimisation. Django's QuerySets are deliberately dumb. The rule of thumb is this: a QuerySet will only do one query - the query you have told it to do using methods such as <tt class="docutils literal">filter()</tt>, <tt class="docutils literal">order_by()</tt> and slicing. It does not respond 'intelligently' to any Python builtins such as <tt class="docutils literal">len()</tt>, <tt class="docutils literal">bool()</tt> or the <tt class="docutils literal">in</tt> operator - these simply force the QuerySet to be evaluated. There is efficiency in the way it uses its cache, but there is no 'cleverness'.</p>
<p>(BTW, I remember the <a class="reference external" href="https://groups.google.com/d/msg/django-developers/cK7o7zgZMxU/KWINMmZdJokJ">discussions</a> we had about this a long time ago on django-dev, and I'm convinced with hindsight we made the right decision. It might seem like a nice opportunity to do some clever queries, but since the cleverness does not extend to actual mind-reading, it will fail and it will get in the way. For instance, in the <a class="reference external" href="https://docs.djangoproject.com/en/dev/topics/db/optimization/#don-t-overuse-count-and-exists">template example</a> in the docs, cleverness on the part of QuerySet would only result in unnecessary work, and it would be much harder to get it to do the right thing.)</p>
</li>
<li><p class="first">Don't do queries when you've already got the information you need. There were multiple examples of this.</p>
<p>This will often mean that you will have some duplication of logic - a manager method that defines some filtering for a set of objects, and an instance level method which answers the question 'do I belong to that set of objects'.</p>
<p>This duplication is often unavoidable if you want any kind of performance. Just put a comment on the two methods indicating that they need to be synced with each other, and never use one when the other is what you need.</p>
<p>(I can conceive of a cleverer system that would allow one of these to be automatically created from the other, but it would be limited to the subset of what can be expressed in both SQL and Python, and it doesn't exist in Django).</p>
</li>
<li><p class="first">When defining special values that you need to search for, ensure that you can do efficient DB filtering.</p>
</li>
<li><p class="first">Beware this common bug - writing:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="ow">not</span> <span class="n">foo</span><span class="p">:</span>
</pre>
<p>when what you mean is:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="n">foo</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
</pre>
<p>These are completely different. If <tt class="docutils literal">None</tt> is being used as a sentinel value, the first will treat things like the empty list or empty dictionary incorrectly. If you mean <tt class="docutils literal">if foo is None</tt> or <tt class="docutils literal">if foo is not None</tt>, <strong>always</strong> write just that; never assume that no other false-y values will be passed in.</p>
<p>This is not just a performance-related bug, but it can cause a massive amount of repeated work where <tt class="docutils literal">None</tt> is a sentinel value meaning &quot;the work has not been done yet&quot;, which is very common. This bug resulted in dozens of unneeded queries (including database UPDATEs being made for every request) in django-fiber.</p>
</li>
</ul>
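<p>To make that last point concrete, here is a small sketch of the sentinel bug. The <tt class="docutils literal">Page</tt> class and its methods are hypothetical, not taken from django-fiber, and the counter stands in for a database query:</p>

```python
class Page:
    """Hypothetical object caching an expensive lookup; None means 'not done yet'."""

    def __init__(self):
        self._children = None
        self.lookups = 0

    def _load_children(self):
        self.lookups += 1   # stands in for a database query
        return []           # a legitimate, empty result

    def children_buggy(self):
        if not self._children:          # an empty list is also falsy
            self._children = self._load_children()
        return self._children

    def children_fixed(self):
        if self._children is None:      # only the sentinel triggers the work
            self._children = self._load_children()
        return self._children


buggy = Page()
for _ in range(3):
    buggy.children_buggy()

fixed = Page()
for _ in range(3):
    fixed.children_fixed()
```

<p>The buggy version repeats the lookup on every call because the cached empty list looks the same as "not computed yet"; the fixed version does it once.</p>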
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Starter fabfile and scripts for a Django project on Webfaction]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/starter-fabfile-and-scripts-for-a-django-project-on-webfaction/" />
    <id>http://lukeplant.me.uk/blog/posts/starter-fabfile-and-scripts-for-a-django-project-on-webfaction/</id>
    <updated>2011-12-23T12:05:38Z</updated>
    <published>2011-12-23T12:05:38Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[Starter fabfile and scripts for a Django project on Webfaction]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/starter-fabfile-and-scripts-for-a-django-project-on-webfaction/"><![CDATA[<div class="document">
<p><em>Note: I now recommend using gunicorn for deployment, so various things about this need changing for that setup.</em></p>
<hr class="docutils" />
<p>With the new <a class="reference external" href="http://www.webfaction.com/services/hosting">256 MB limit for WebFaction Shared accounts</a>, at a squeeze you could now host up to 10 Django apps on a single account without upgrades, for a very low price.</p>
<p>This was a game changer for me, and I realised I needed to be able to start a new Django project quickly, including setup on WebFaction, either to host a small website for a client, or to generate a demo.</p>
<p>There are a few parts required, one of which is a fabfile for fabric deployment. Since this is often a sticking point for other people, I created one that should be generic enough for most Django apps that run on shared hosting:</p>
<p><a class="reference external" href="https://bitbucket.org/spookylukey/django-fabfile-starter/src/f4c87b0b2676911de2fa4a9784ca705f708b3bf1/">https://bitbucket.org/spookylukey/django-fabfile-starter/src/f4c87b0b2676911de2fa4a9784ca705f708b3bf1/</a></p>
<p>It requires some customisation, and is fairly simplistic, but should be a good starting point for more complex setups.</p>
<p>For quick project setup, there are other things you need:</p>
<ul class="simple">
<li>Use of <a class="reference external" href="http://www.doughellmann.com/projects/virtualenvwrapper/">virtualenvwrapper</a></li>
<li>Use of django-admin.py startproject to get things going. Note some <a class="reference external" href="https://docs.djangoproject.com/en/dev/releases/1.4/#updated-default-project-layout-and-manage-py">recent changes</a> you should be aware of.</li>
<li>Scripts to create sites on WebFaction, which is simple with <a class="reference external" href="http://docs.webfaction.com/xmlrpc-api/">WebFaction API</a>. Something like this:</li>
</ul>
<pre class="code python literal-block">
<span class="c">#!/usr/bin/env python</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">xmlrpclib</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">string</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">domain</span><span class="p">,</span> <span class="n">shortname</span><span class="p">):</span>
    <span class="n">server</span> <span class="o">=</span> <span class="n">xmlrpclib</span><span class="o">.</span><span class="n">ServerProxy</span><span class="p">(</span><span class="s">'https://api.webfaction.com/'</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">session_id</span><span class="p">,</span> <span class="n">account</span> <span class="o">=</span> <span class="n">server</span><span class="o">.</span><span class="n">login</span><span class="p">(</span><span class="s">'MYUSERNAME'</span><span class="p">,</span> <span class="s">'MYPASSWORD'</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Creating apps on webfaction&quot;</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_app</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">shortname</span> <span class="o">+</span> <span class="s">&quot;_django&quot;</span><span class="p">,</span> <span class="s">&quot;djangotrunk_mw33_27&quot;</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="s">&quot;&quot;</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_app</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">shortname</span> <span class="o">+</span> <span class="s">&quot;_static&quot;</span><span class="p">,</span> <span class="s">&quot;static_only&quot;</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="s">&quot;&quot;</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_app</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">shortname</span> <span class="o">+</span> <span class="s">&quot;_usermedia&quot;</span><span class="p">,</span> <span class="s">&quot;static_only&quot;</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="s">&quot;&quot;</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Creating database on webfaction&quot;</span><span class="p">)</span>
    <span class="n">l</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">letters</span> <span class="o">+</span> <span class="n">string</span><span class="o">.</span><span class="n">digits</span><span class="p">)</span>
    <span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">l</span><span class="p">)</span>
    <span class="n">password</span> <span class="o">=</span> <span class="s">''</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">16</span><span class="p">])</span>
    <span class="n">db_name</span> <span class="o">=</span> <span class="s">&quot;MYUSERNAME_</span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">shortname</span>
    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Remote database: </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">db_name</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Password: </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">password</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_db</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">db_name</span><span class="p">,</span> <span class="s">&quot;postgresql&quot;</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Creating domain on webfaction&quot;</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_domain</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">domain</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_domain</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span> <span class="n">domain</span><span class="p">,</span> <span class="s">'www'</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Creating website entry on webfaction&quot;</span><span class="p">)</span>
    <span class="n">server</span><span class="o">.</span><span class="n">create_website</span><span class="p">(</span><span class="n">session_id</span><span class="p">,</span>
                          <span class="n">shortname</span><span class="p">,</span>
                          <span class="s">'MY.IP.ADD.RESS'</span><span class="p">,</span>
                          <span class="bp">False</span><span class="p">,</span>
                          <span class="p">[</span><span class="n">domain</span><span class="p">,</span> <span class="s">'www.'</span> <span class="o">+</span> <span class="n">domain</span><span class="p">],</span>
                          <span class="p">(</span><span class="n">shortname</span> <span class="o">+</span> <span class="s">'_django'</span><span class="p">,</span> <span class="s">'/'</span><span class="p">),</span>
                          <span class="p">(</span><span class="n">shortname</span> <span class="o">+</span> <span class="s">'_static'</span><span class="p">,</span> <span class="s">'/static'</span><span class="p">),</span>
                          <span class="p">(</span><span class="n">shortname</span> <span class="o">+</span> <span class="s">'_usermedia'</span><span class="p">,</span> <span class="s">'/usermedia'</span><span class="p">))</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">&quot;Usage: create_webfaction_site.py $DOMAIN $SHORTNAME&quot;</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">main</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</pre>
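<p>A note on the password step in the script above: <tt class="docutils literal">string.letters</tt> only exists in Python 2, the <tt class="docutils literal">random</tt> module is not intended for security purposes, and shuffling the alphabet means no character can repeat. As a minimal sketch, assuming Python 3.6 or later where the standard <tt class="docutils literal">secrets</tt> module is available, something like the following could be swapped in. The helper name <tt class="docutils literal">make_password</tt> is made up for illustration and is not part of the original script.</p>
<pre class="literal-block">
```python
import secrets
import string

# Hypothetical helper (not in the original script): build a random password
# from letters and digits using the secrets module from the standard library.
ALPHABET = string.ascii_letters + string.digits


def make_password(length=16):
    # secrets.choice draws each character independently, so repeats are allowed
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == '__main__':
    print("Password: %s" % make_password())
```
</pre>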
<ul>
<li><p class="first">Scripts for setup of your local database. Mine looks like this:</p>
<pre class="literal-block">
#!/bin/sh

sudo -u postgres psql -U postgres -d template1 -c &quot;CREATE DATABASE $1;&quot;
sudo -u postgres psql -U postgres -d template1 -c &quot;CREATE USER $1 WITH PASSWORD 'foo';&quot;
sudo -u postgres psql -U postgres -d template1 -c &quot;GRANT ALL ON DATABASE $1 TO $1;&quot;
# Need create DB privileges to run tests.
sudo -u postgres psql -U postgres -d template1 -c &quot;ALTER USER $1 CREATEDB;&quot;

echo &quot;Local database: $1&quot;
echo &quot;Password: foo&quot;
</pre>
</li>
<li><p class="first">A script to bind it all together. Mine takes just two arguments: a short name for the project and a domain name. A lot of it is fairly specific to my environment, so I won't share it, but for the sake of completeness, I'll list what it does:</p>
<ul>
<li><p class="first">runs mkvirtualenv and creates some directories for the project</p>
</li>
<li><p class="first">runs django-admin.py startproject</p>
</li>
<li><p class="first">copies in the fabfile above, and modifies some settings</p>
</li>
<li><p class="first">creates a basic requirements.txt</p>
</li>
<li><p class="first">runs the local database creation script</p>
</li>
<li><p class="first">prints a reminder about some Django settings.py changes that would be hard to make automatically, and about anything else which is still to be done, which includes:</p>
<ul class="simple">
<li>Fixing the WebFaction Apache conf to point to myproject/wsgi.py</li>
<li>Fixing the WebFaction Apache start script to include the virtualenv generated by the fabfile</li>
</ul>
<p>Some of these things could be done automatically in the future.</p>
</li>
</ul>
</li>
</ul>
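<p>To give a rough idea of the shape of such a script, here is a minimal sketch in Python that only builds and prints the list of steps, rather than running them. Everything named here (<tt class="docutils literal">plan_steps</tt>, the <tt class="docutils literal">~/projects</tt> directory, the template files and <tt class="docutils literal">create_local_db.sh</tt>) is hypothetical and is not the actual script described above.</p>
<pre class="literal-block">
```python
import os


def plan_steps(shortname, base_dir="~/projects"):
    """Return (description, shell command) pairs for setting up a new project.

    Nothing is executed here; the commands are illustrative placeholders.
    """
    project_dir = os.path.join(base_dir, shortname)
    return [
        # mkvirtualenv is a virtualenvwrapper shell function, so it has to be
        # run from a shell that has sourced virtualenvwrapper.sh
        ("Create virtualenv", "mkvirtualenv %s" % shortname),
        ("Create project directory", "mkdir -p %s" % project_dir),
        ("Start Django project",
         "cd %s; django-admin.py startproject %s" % (project_dir, shortname)),
        # Hypothetical template files, assumed to sit next to this script
        ("Copy fabfile", "cp fabfile_template.py %s/fabfile.py" % project_dir),
        ("Create requirements.txt",
         "cp requirements_template.txt %s/requirements.txt" % project_dir),
        ("Create local database", "./create_local_db.sh %s" % shortname),
    ]


# Things that are left to do by hand, as listed above
REMINDERS = [
    "Review settings.py",
    "Fix the WebFaction Apache conf to point to myproject/wsgi.py",
    "Fix the WebFaction Apache start script to include the virtualenv",
]


def main(shortname, domain):
    for description, command in plan_steps(shortname):
        print("%s: %s" % (description, command))
    print("Still to do by hand for %s:" % domain)
    for reminder in REMINDERS:
        print("  * " + reminder)


if __name__ == '__main__':
    # A real script would read these two values from sys.argv
    main("myproject", "example.com")
```
</pre>
<p>Keeping the plan as plain data makes it easy to print as a dry run first, and to hand each command to a shell later.</p>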
<p>Having this written allows me to be massively more efficient with very small
sites, so it is now feasible to use something like <a class="reference external" href="https://www.django-cms.org/">django-cms</a> for a 5-page site, because the setup costs are
so low.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[A prayer to the programming gods]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/a-prayer-to-the-programming-gods/" />
    <id>http://lukeplant.me.uk/blog/posts/a-prayer-to-the-programming-gods/</id>
    <updated>2011-09-19T12:52:12Z</updated>
    <published>2011-09-19T12:52:12Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Software development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[A prayer to the programming gods]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/a-prayer-to-the-programming-gods/"><![CDATA[<div class="document">
<div class="line-block">
<div class="line">O gods of software development and operations, I have sinned.</div>
<div class="line"><br /></div>
<div class="line">Your anger falls on me, and I feel your wrath.</div>
<div class="line"><br /></div>
<div class="line">The web site I have inherited has no unit tests.</div>
<div class="line">It has no deployment script, and no README.</div>
<div class="line">Or database migration tool.</div>
<div class="line">It makes no use of virtualenv or requirements.txt or buildout,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;nor has any description of dependencies.</div>
<div class="line">It has most of the VCS history missing.</div>
<div class="line">Source dependencies are in random folders,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;clearly checked out from private SVN clones of</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;proprietary and open source projects,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;but forked at unknown date with no history.</div>
<div class="line"><br /></div>
<div class="line">And I cry, “Why me?”</div>
<div class="line"><br /></div>
<div class="line">Have I not used a fabfile for projects I have started?</div>
<div class="line">Have I not included a setup.py for my open source apps?</div>
<div class="line">Have I not written helpful docs, or at least a README.rst?</div>
<div class="line">Have I not written correct commit messages, with carefully</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;constructed patches that didn't mix features and fixes?</div>
<div class="line">Are the projects I hand on not covered by automated tests,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;at least for the critical functions?</div>
<div class="line"><br /></div>
<div class="line">But then I consider the sins of my youth,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;and I confess: You are just.</div>
<div class="line"><br /></div>
<div class="line">You could have given me the VBA project I wrote when I was 18,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;or some of the web apps I have written since.</div>
<div class="line">It could have been the thousands of ASP.NET lines I cranked out in two short years,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;like the proverbial monkeys trying to produce Shakespeare.</div>
<div class="line">It could be raw SQL in the frontend code,</div>
<div class="line">&nbsp;&nbsp;&nbsp;&nbsp;and HTML mixed with business logic.</div>
<div class="line">You could have given me a PHP project.</div>
<div class="line"><br /></div>
<div class="line">But it is Python, and Django at that, and it is easily fixed.</div>
<div class="line">Your chastisement is light indeed.</div>
</div>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[Family Fortunes]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/family-fortunes/" />
    <id>http://lukeplant.me.uk/blog/posts/family-fortunes/</id>
    <updated>2011-05-13T13:57:27Z</updated>
    <published>2011-05-13T13:57:27Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Personal and misc" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Software projects" />
    <summary type="html"><![CDATA[Family Fortunes]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/family-fortunes/"><![CDATA[<div class="document">
<p>Yesterday I needed to play Family Fortunes (a.k.a. Family Feud) at our church youth group. Our church doesn't have an internet connection, so I needed something that would work offline.</p>
<p>There were quite a few versions you could download — for a fee. But as well as the price, getting the computer to do it all is often not so much fun as something with humans more in control, and you're limited to what the computer can do — not easy to fix when you type something in wrong.</p>
<p>So in the end, with a couple of hours to go before the club started, I reckoned I could code up a simple implementation of the score board in a web browser. I scraped some questions from <a class="reference external" href="http://www.pub-quiz.net/Family-Fortunes-quiz.htm">here</a>, used a bit of Python to parse and convert to JSON, then used jQuery and HTML5 audio to get <a class="reference external" href="http://lukeplant.me.uk/familyfortunes/">something that works pretty well</a> for our purposes and doesn't need an internet connection.</p>
<p>Usage and source code on <a class="reference external" href="https://bitbucket.org/spookylukey/familyfortunes/overview">bitbucket</a>. In brief:</p>
<ul class="simple">
<li>n - next question</li>
<li>p - previous</li>
<li>1-5 - correct answer</li>
<li>x - wrong answer</li>
</ul>
<p>We set up my laptop as the screen, with some speakers for increased volume, and one of the other leaders used a USB keyboard to control it and decide on correct answers, while I was the game show host. It ended up being pretty fun, and I can imagine re-using it on a church holiday etc., perhaps with a projector.</p>
<p>I was also impressed by just how much you can do in a browser these days, with so little time and effort. Recently the brilliant <a class="reference external" href="http://code.google.com/p/flot/">flot</a> library made me realise the same thing — more and more, web development is beating desktop development in terms of APIs, speed of development, ease of development, and even the performance of the result. I very quickly produced a great set of graphs gathering some stats on a website&nbsp;— both attractive and interactive — and realised that even with many of the easiest charting solutions for the desktop, I would not have been able to produce anything like the results.</p>
</div>
]]></content>
  </entry>
  <entry>
    <author>
      <name></name>
      <uri>http://lukeplant.me.uk/blog</uri>
    </author>
    <title type="html"><![CDATA[django-anonymizer released]]></title>
    <link rel="alternate" type="text/html" href="http://lukeplant.me.uk/blog/posts/django-anonymizer-released/" />
    <id>http://lukeplant.me.uk/blog/posts/django-anonymizer-released/</id>
    <updated>2010-12-24T12:40:59Z</updated>
    <published>2010-12-24T12:40:59Z</published>
    <category scheme="http://lukeplant.me.uk/blog" term="Python" />
    <category scheme="http://lukeplant.me.uk/blog" term="Web development" />
    <category scheme="http://lukeplant.me.uk/blog" term="Django" />
    <summary type="html"><![CDATA[django-anonymizer released]]></summary>
    <content type="html" xml:base="http://lukeplant.me.uk/blog/posts/django-anonymizer-released/"><![CDATA[<div class="document">
<p>It is common practice in development to use a database that is very similar in content to the real data. For smaller apps, it is tempting to download a copy of the production database (or a trimmed version). The problem is that this can lead to having copies of sensitive customer data on development machines and in other places (like automatic backups). This Django app provides an easy and customizable way to anonymize data in your database, while leaving the structure intact and the data looking sensible.</p>
<p>Using some <tt class="docutils literal">manage.py</tt> commands, it will introspect your models to generate basic 'anonymizers' for you, attempting to guess the type of data in fields based on Django field type and field name (e.g. 'email', 'name', 'address' etc), and generate appropriate fake data. These guesses will probably need to be tweaked with knowledge of what is actually stored in your fields. You then run the anonymizers to change your data.</p>
<p>It is currently in a working state, although not all model fields are supported. It has support for the 'unique' constraint, to avoid database IntegrityErrors, and the 'max_length' attribute, but does not yet support 'unique_together'.</p>
<p>Download: <a class="reference external" href="http://pypi.python.org/pypi/django-anonymizer/">PyPI</a></p>
<p>Source: <a class="reference external" href="https://bitbucket.org/spookylukey/django-anonymizer/src">bitbucket</a></p>
<p>Bugs: <a class="reference external" href="https://bitbucket.org/spookylukey/django-anonymizer/issues?status=new&amp;status=open">bitbucket issues</a></p>
<p>The README has fairly good documentation, but I don't know when I'll get around to writing some proper docs!</p>
<p>It uses the <a class="reference external" href="http://pypi.python.org/pypi/Faker">Faker</a> package by Dylan Clendenin (thanks!).</p>
<hr class="docutils" />
<p>UPDATE: <a class="reference external" href="http://packages.python.org/django-anonymizer/">Fuller docs available now</a></p>
</div>
]]></content>
  </entry>
</feed>
