<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="https://lukeplant.me.uk/assets/xml/atom.xsl" type="text/xsl" media="all"?>
<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom">
  <title>Luke Plant's home page (Posts about PHP)</title>
  <id>https://lukeplant.me.uk/blog/categories/php.xml</id>
  <updated>2024-12-05T14:30:10Z</updated>
  <author>
    <name>Luke Plant</name>
  </author>
  <link rel="self" type="application/atom+xml" href="https://lukeplant.me.uk/blog/categories/php.xml"/>
  <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/categories/php/"/>
  <generator uri="https://getnikola.com/">Nikola</generator>
  <entry>
    <title>WordPress 4.7.2 post mortem</title>
    <id>https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/</id>
    <updated>2017-02-17T15:49:05Z</updated>
    <published>2017-02-17T15:49:05Z</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/"/>
    <summary type="html">&lt;p&gt;Some lessons from the recent WordPress vulnerability&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;A few weeks ago, &lt;a class="reference external" href="https://wordpress.org/news/2017/01/wordpress-4-7-2-security-release/"&gt;WordPress released version 4.7.2&lt;/a&gt; to
address several security vulnerabilities, including &lt;a class="reference external" href="https://make.wordpress.org/core/2017/02/01/disclosure-of-additional-security-fix-in-wordpress-4-7-2/"&gt;one critical one&lt;/a&gt;.
This vulnerability allowed a remote, unauthorised attacker to update web pages via
the REST API. Since then, &lt;a class="reference external" href="https://wpengine.com/blog/rest-api-vulnerability/"&gt;hundreds of thousands of unpatched installations have
been defaced&lt;/a&gt;. In addition,
in some cases this can lead to remote code execution, and &lt;a class="reference external" href="https://blog.sucuri.net/2017/02/rce-attempts-against-the-latest-wordpress-rest-api-vulnerability.html"&gt;that has been seen in
the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The vulnerability was found by Sucuri, and &lt;a class="reference external" href="https://blog.sucuri.net/2017/02/content-injection-vulnerability-wordpress-rest-api.html"&gt;they have detailed the issue on
their blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I haven't found much by way of deeper analysis, and so this post is my take on
some of the deeper coding/development process issues behind such a serious
security problem. It is meant to be constructive – that is, what positive
lessons can be learned for avoiding this kind of thing in our own projects.&lt;/p&gt;
&lt;p&gt;(Upfront note about my biases: I'm not a fan of PHP, though I used to be a long
time ago, or WordPress, especially when it comes to security; after PHP I've
worked mainly with Python, and I'm a (rather inactive) Django core developer;
I've used various other languages, and I'm drawn to languages like Haskell and
similar.)&lt;/p&gt;
&lt;section id="summary-of-coding-error"&gt;
&lt;h2&gt;Summary of coding error&lt;/h2&gt;
&lt;p&gt;The primary error was a piece of code that failed to check a returned value to
see if it was an error value. In particular, the code looked like the following
(removing irrelevant details, as I will for all the samples in this post):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code php"&gt;&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-1" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-1" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-1"&gt;&lt;/a&gt;&lt;span class="x"&gt;public function update_item_permissions_check( $request ) {&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-2" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-2" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-2"&gt;&lt;/a&gt;&lt;span class="x"&gt;    $post = get_post( $request['id'] );&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-3" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-3" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-4" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-4" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-4"&gt;&lt;/a&gt;&lt;span class="x"&gt;    if ( $post &amp;amp;&amp;amp; ! $this-&amp;gt;check_update_permission( $post ) ) {&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-5" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-5" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-5"&gt;&lt;/a&gt;&lt;span class="x"&gt;        return new WP_Error(...);&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-6" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-6" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-6"&gt;&lt;/a&gt;&lt;span class="x"&gt;    }&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-7" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-7" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-7"&gt;&lt;/a&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-8" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-8" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-8"&gt;&lt;/a&gt;&lt;span class="x"&gt;    # Various other conditions checked&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-9" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-9" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-10" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-10" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-10"&gt;&lt;/a&gt;&lt;span class="x"&gt;    return true;&lt;/span&gt;
&lt;a id="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-11" name="rest_code_af902bf9b5f847d39a6b9843e5fd97ba-11" href="https://lukeplant.me.uk/blog/posts/wordpress-4.7.2-post-mortem/#rest_code_af902bf9b5f847d39a6b9843e5fd97ba-11"&gt;&lt;/a&gt;&lt;span class="x"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this case, &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; could return an error value, but this was never
checked for, so the permissions check ends up returning &lt;code class="docutils literal"&gt;true&lt;/code&gt;. Therefore, the
permissions check was inadequate, allowing the privilege escalation – i.e. an
unauthorised user could change any post by forcing the error condition in
&lt;code class="docutils literal"&gt;get_post&lt;/code&gt;, which turns out to be easy to do.&lt;/p&gt;
&lt;p&gt;There are various other bits of code that have issues, particularly failure to
properly validate the incoming post ID, and code which did it different ways. If
any one of them had been done differently, the vulnerability might have been
avoided. You can look at Sucuri's posts above to see the full details, and I'll
discuss more below.&lt;/p&gt;
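&lt;p&gt;To make the failure mode concrete, here is a minimal Python sketch of the same shape of bug. It is not the WordPress code: &lt;code class="docutils literal"&gt;POSTS&lt;/code&gt;, &lt;code class="docutils literal"&gt;can_edit&lt;/code&gt; and both check functions are hypothetical stand-ins, and the lookup is assumed to return &lt;code class="docutils literal"&gt;None&lt;/code&gt; for a malformed ID.&lt;/p&gt;

```python
# Hypothetical stand-ins for the functions discussed above;
# none of these names are real WordPress APIs.
POSTS = {123: {"id": 123, "author": "alice"}}


def get_post(post_id):
    """Return the post, or None for an unknown or malformed ID."""
    return POSTS.get(post_id) if isinstance(post_id, int) else None


def can_edit(user, post):
    return post["author"] == user


def permissions_check_flawed(user, post_id):
    post = get_post(post_id)
    # Bug: a missing post skips the check and falls through to "allow".
    if post and not can_edit(user, post):
        return False
    return True


def permissions_check_strict(user, post_id):
    post = get_post(post_id)
    # Treat "no such post" as a failure instead of ignoring it.
    if post is None:
        return False
    return can_edit(user, post)
```

&lt;p&gt;With these definitions, the flawed check allows a user who does not own post 123 as soon as the ID is something like &lt;code class="docutils literal"&gt;"123abc"&lt;/code&gt;, while the strict variant refuses.&lt;/p&gt;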
&lt;/section&gt;
&lt;section id="causes-of-the-vulnerability"&gt;
&lt;h2&gt;Causes of the vulnerability&lt;/h2&gt;
&lt;p&gt;Now I'll come to a list of issues that I think were, or might have been, behind
this vulnerability.&lt;/p&gt;
&lt;p&gt;I'm aware that some of these may annoy you if your framework/language of choice
is implicated as inferior – but if you react by dismissing that point, and
saying it was all the &lt;strong&gt;others&lt;/strong&gt; that are to blame, you've missed the whole
point of doing a post mortem on someone else's misfortune.&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Dynamic typing/null values.&lt;/p&gt;
&lt;p&gt;The &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; method was able to return a null value, instead of an
object, without forcing that value to be checked. This is a flaw in all
dynamically typed languages, and many statically typed ones. In statically
typed languages that do not allow null values (e.g. Haskell, Rust in normal
code), you'd be forced to choose an explicit error handling mechanism that
would be much less prone to this kind of issue.&lt;/p&gt;
&lt;p&gt;There are multiple other ways that these kinds of statically typed languages
would have prevented an invalid string from being passed through multiple
levels of code – the design of such languages generally forces you to
sanitise earlier, eliminating the confusion that caused this bug.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PHP flaw – very poor error handling mechanisms.&lt;/p&gt;
&lt;p&gt;Builtin PHP functions, and therefore any PHP project, have a whole range of
error handling mechanisms – errors, warnings, returning error values, and
exceptions. At every point, calling code needs to know which system will be
used to handle errors. The calling code in &lt;code class="docutils literal"&gt;update_item_permissions_check&lt;/code&gt;
above would have been fine if &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; had thrown an exception for an
invalid post, but it didn't. To review the code, you need to know the
implementation and conventions of the code that is being called, which is
seriously impeded by having multiple options.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Poor choice of error handling.&lt;/p&gt;
&lt;p&gt;While PHP has flawed error handling (as above), it is still possible to do
your best. Since PHP has no method to ensure that calling code checks for
returned errors, using returned error values (whether null values, like
&lt;code class="docutils literal"&gt;null&lt;/code&gt;, &lt;code class="docutils literal"&gt;false&lt;/code&gt; etc., or custom error objects) is one of the worst kinds
of error handling mechanisms in PHP, and WordPress made the mistake of using
it.&lt;/p&gt;
&lt;p&gt;In fact, doing it correctly in WordPress is harder, in that they use not only
builtin false-y values to indicate errors, but they also have a custom error
class &lt;code class="docutils literal"&gt;WP_Error&lt;/code&gt;, which is not false-y (and probably can't be false-y due to
PHP limitations), so that properly checking for a null/error condition is
either very verbose, or requires you to remember which convention is used.
(e.g. &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; returns &lt;code class="docutils literal"&gt;null&lt;/code&gt; for error conditions, but
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;WP_REST_Posts_Controller::get_post&lt;/span&gt;&lt;/code&gt; returns &lt;code class="docutils literal"&gt;WP_Error&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;This contrasts with languages like Go, for example. By convention, a Go
function that can fail returns an error as an extra return value, and if the
calling code assigns that value to a variable and then never uses it, that's
a compile time error. You have to discard it explicitly with
&lt;code class="docutils literal"&gt;_&lt;/code&gt;, so it is much harder to get this wrong by accident. Plus, the
language design means that basically all Go code will work the same, so you
know what correct code looks like.&lt;/p&gt;
&lt;p&gt;It also contrasts with conventions in Python and frameworks like &lt;a class="reference external" href="https://www.djangoproject.com/"&gt;Django&lt;/a&gt;, which will typically raise exceptions in
situations like this – &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;Post.objects.get(id='invalidid')&lt;/span&gt;&lt;/code&gt; will raise
&lt;code class="docutils literal"&gt;Post.DoesNotExist&lt;/code&gt;, rather than return &lt;code class="docutils literal"&gt;None&lt;/code&gt; etc.&lt;/p&gt;
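&lt;p&gt;As a rough sketch of why the exception convention is harder to misuse, consider the following. The names &lt;code class="docutils literal"&gt;PostDoesNotExist&lt;/code&gt;, &lt;code class="docutils literal"&gt;get_post_or_raise&lt;/code&gt; and &lt;code class="docutils literal"&gt;POSTS&lt;/code&gt; are made up for illustration and only loosely modelled on Django's behaviour.&lt;/p&gt;

```python
class PostDoesNotExist(Exception):
    """Hypothetical error type, loosely modelled on Django's DoesNotExist."""


POSTS = {123: {"id": 123, "author": "alice"}}


def get_post_or_raise(post_id):
    # Raising means the caller cannot carry on with a "null" post by accident.
    try:
        return POSTS[post_id]
    except (KeyError, TypeError):
        raise PostDoesNotExist(post_id) from None


def permissions_check(user, post_id):
    post = get_post_or_raise(post_id)  # propagates on a bad ID
    return post["author"] == user
```

&lt;p&gt;Here a malformed ID never reaches the "allow" branch: the lookup raises, and unless someone deliberately catches the exception, the request fails.&lt;/p&gt;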
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Very poor framework design decision – merging of parameters from URL path
component and query string parameters.&lt;/p&gt;
&lt;p&gt;The WordPress code for the REST API has some routing methodology that allows
parameters to be defined in the paths of URLs e.g.:&lt;/p&gt;
&lt;pre class="literal-block"&gt;"/(?P&amp;lt;id&amp;gt;[\d+]+)"&lt;/pre&gt;
&lt;p&gt;This regex matches part of the path component in a URL, and the name &lt;code class="docutils literal"&gt;id&lt;/code&gt;
is used as a key in a dictionary of parameters passed to the handling
function. (This is pretty much exactly like Django URL routing). These
matched parameters are available in the request object passed to the
functions above.&lt;/p&gt;
&lt;p&gt;However, that object also includes GET (and perhaps POST) parameters, and as
it happens, when you do the simple dictionary access on that object, you get
a merged version in which GET takes priority over the URL path components.
This means that it is possible to pass an &lt;code class="docutils literal"&gt;id&lt;/code&gt; value that doesn't match the
regex:&lt;/p&gt;
&lt;pre class="literal-block"&gt;/123/?id=123somethingelse&lt;/pre&gt;
&lt;p&gt;...and it is &lt;code class="docutils literal"&gt;123somethingelse&lt;/code&gt;, not &lt;code class="docutils literal"&gt;123&lt;/code&gt; that will be returned by
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;$request['id']&lt;/span&gt;&lt;/code&gt; shown in the pseudo-code above. While there are more
specific methods available for getting URL path or query string parameters,
the dictionary/array access shown is the most convenient (and therefore most
encouraged) way to get data out of it.&lt;/p&gt;
&lt;p&gt;This feature seems to be inspired by PHP's &lt;code class="docutils literal"&gt;$_REQUEST&lt;/code&gt;, but is worse,
because &lt;code class="docutils literal"&gt;$_REQUEST&lt;/code&gt; is no more convenient than &lt;code class="docutils literal"&gt;$_POST&lt;/code&gt; or &lt;code class="docutils literal"&gt;$_GET&lt;/code&gt;, and
in fact requires more characters to type. Here, however, not only is the
merged dictionary more convenient to access than the un-merged ones, it
also includes URL path components to confuse things even more.&lt;/p&gt;
&lt;p&gt;This meant that the checking of the format of the ID parameter that appeared
to take place in the routing code (via a regex that limited to numeric IDs)
was ineffective.&lt;/p&gt;
&lt;p&gt;Lesson: Merging data from multiple distinct sources into a single bag with
the same namespace is very often a bad idea. In particular, if the different
sources are not subject to the same validation rules, or correct usage might
rely on users knowing which source has precedence, you are asking for
trouble.&lt;/p&gt;
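&lt;p&gt;The precedence problem can be reproduced in a few lines of Python. This is only a sketch under assumptions: the route pattern, &lt;code class="docutils literal"&gt;parse_request&lt;/code&gt; and &lt;code class="docutils literal"&gt;merged&lt;/code&gt; are hypothetical and do not mirror the actual WordPress request class.&lt;/p&gt;

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical route: one numeric path segment, as in the regex above.
ROUTE = re.compile(r"/(\d+)/?")


def parse_request(url):
    """Return (path_params, query_params) for a URL, kept separate."""
    parts = urlsplit(url)
    match = ROUTE.fullmatch(parts.path)
    if match is None:
        raise ValueError("no route matches " + parts.path)
    path_params = {"id": match.group(1)}
    query_params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return path_params, query_params


def merged(path_params, query_params):
    # Query string wins, mirroring the behaviour described above.
    return {**path_params, **query_params}


path_params, query_params = parse_request("/123/?id=123somethingelse")
print(merged(path_params, query_params)["id"])  # 123somethingelse
print(path_params["id"])                        # 123
```

&lt;p&gt;Keeping the two dictionaries separate means the value validated by the route regex is the only one a handler sees when it asks for the path parameter.&lt;/p&gt;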
&lt;p&gt;Historic Django note: long ago, before 1.0, Django had a very similar feature –
dictionary access on the request object did a merge of &lt;code class="docutils literal"&gt;request.GET&lt;/code&gt; and
&lt;code class="docutils literal"&gt;request.POST&lt;/code&gt; parameters. It was realised this was a bad idea, and for the
1.0 release this was replaced with a more explicit and discouraged
&lt;code class="docutils literal"&gt;request.REQUEST&lt;/code&gt; object. This too was eventually &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/releases/1.7/#merging-of-post-and-get-arguments-into-wsgirequest-request"&gt;deprecated in 1.7&lt;/a&gt;
and later removed. I heard &lt;a class="reference external" href="https://twitter.com/simonw/status/702950100583227399"&gt;some protests&lt;/a&gt; about this, but I'm
glad it's gone – it is too much of a liability to justify its very occasional
usefulness.&lt;/p&gt;
&lt;p&gt;(Update – just hours after writing this post, while reviewing deprecation
warnings for a Django 1.8 project I maintain, a usage of &lt;code class="docutils literal"&gt;request.REQUEST&lt;/code&gt;
was flagged and – surprise, surprise – it harboured a security vulnerability).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple calls to retrieve the same thing from the database.&lt;/p&gt;
&lt;p&gt;You might think that since &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; returns a null object (of some kind),
then there won't be a post object to update, and therefore no vulnerability.
However, it turns out that &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; is called in multiple places within
the handler – once to check permissions, and again to actually do the update,
something like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;public function update_item( $request ) {
    $id = (int) $request['id'];
    $post = get_post( $id );

    # Some checks and processing...then:

    wp_update_post( $post, $some_other_data );
}&lt;/pre&gt;
&lt;p&gt;When this code is run, the permission checks should have already been run (in
a separate code path). This is a performance bug in itself – it is doing
multiple DB queries (assuming no caching) to get the same thing.&lt;/p&gt;
&lt;p&gt;There is a bigger problem, which is that it is possible for the different
callers to call &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; with different data, and as it turns out they
do. The difference allowed the attack to proceed – the code checking for
permission to edit the object did not find an object to edit, but the code
doing the editing did. This is a flawed permission checking API.&lt;/p&gt;
&lt;p&gt;Lesson: if your permission checking code is separate from the main path that
retrieves the data, it might be doing something critically different.&lt;/p&gt;
&lt;p&gt;Lesson: if you are loading the same data from the database multiple times,
this is a code smell that you've got duplication that might be harbouring
mistakes.&lt;/p&gt;
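&lt;p&gt;One way to avoid the split, sketched here in Python with hypothetical names (&lt;code class="docutils literal"&gt;load_post&lt;/code&gt;, &lt;code class="docutils literal"&gt;update_post&lt;/code&gt;, &lt;code class="docutils literal"&gt;POSTS&lt;/code&gt;), is to validate the ID and load the object once, then run both the permission check and the update against that single object.&lt;/p&gt;

```python
POSTS = {123: {"id": 123, "author": "alice", "title": "Hello"}}


def load_post(raw_id):
    """Validate the ID and fetch the post exactly once; hypothetical helper."""
    if not (isinstance(raw_id, str) and raw_id.isdecimal()):
        raise ValueError("invalid post id")
    return POSTS[int(raw_id)]  # KeyError if there is no such post


def update_post(user, raw_id, new_title):
    post = load_post(raw_id)
    # The check and the update both see the same object.
    if post["author"] != user:
        raise PermissionError("not allowed to edit this post")
    post["title"] = new_title
    return post
```

&lt;p&gt;Because there is only one lookup, there is no second code path that could interpret the ID differently.&lt;/p&gt;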
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple attempts at sanitising.&lt;/p&gt;
&lt;p&gt;Sanitising is good, but it should be done in the right place. In
&lt;code class="docutils literal"&gt;update_item&lt;/code&gt; above, when it called &lt;code class="docutils literal"&gt;get_post&lt;/code&gt; it made a half-hearted
attempt to sanitise the post ID value it passed in, by converting to an
integer. This made it different from the other call to &lt;code class="docutils literal"&gt;get_post&lt;/code&gt;, with
the result described above – this code was able to retrieve a post object
to update, while the permissions checking code was not.&lt;/p&gt;
&lt;p&gt;The problem is that just sanitising what you pass to a call doesn't sanitise
where you got it from, or sanitise other people's usage of it which might
have happened earlier. Sanitising is in the wrong place if it does not make
it harder for any other code to get the unsanitised data.&lt;/p&gt;
&lt;p&gt;Lesson: If you're sanitising data all over the place, it's a sign you don't
actually know how to clean input data up, and you could in fact be making
things worse.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PHP weak typing flaw – ‘casting’ (as it is called in PHP) doesn't raise
errors for invalid values, and in fact sometimes just ignores bad data.&lt;/p&gt;
&lt;p&gt;Namely, in PHP:&lt;/p&gt;
&lt;pre class="literal-block"&gt;(int)"123abc" === 123&lt;/pre&gt;
&lt;p&gt;However:&lt;/p&gt;
&lt;pre class="literal-block"&gt;is_numeric("123abc") === false&lt;/pre&gt;
&lt;p&gt;So, in &lt;code class="docutils literal"&gt;update_item&lt;/code&gt; above, ‘casting’ to an int just ignores the data that
couldn't be converted. If PHP had done something more strict (e.g. raise an
error, or even return null), this vulnerability would not have happened. In
fact, if these two methods had simply agreed with each other, the
vulnerability would not have happened either. The difference between them
meant that one bit of code thought “There isn't a valid ID that I could even
use to do a DB lookup”, and another was able to find a post just fine, by
ignoring the data that didn't look like a number.&lt;/p&gt;
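&lt;p&gt;For comparison, here is a small Python sketch of the two behaviours. &lt;code class="docutils literal"&gt;lenient_int&lt;/code&gt; is a hypothetical helper that only roughly imitates the PHP cast shown above (it ignores signs and whitespace), and &lt;code class="docutils literal"&gt;strict_int&lt;/code&gt; is one possible strict alternative.&lt;/p&gt;

```python
import re


def lenient_int(value):
    """Roughly mimic the PHP cast above: keep leading digits, drop the rest."""
    match = re.match(r"\d+", value)
    return int(match.group()) if match else 0


def strict_int(value):
    """Accept only strings made up entirely of ASCII digits."""
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError("not a valid integer ID")
    return int(value)


print(lenient_int("123abc"))  # 123
print(strict_int("123"))      # 123
# strict_int("123abc") raises ValueError
```

&lt;p&gt;If every caller used the strict version, there would be no way for one piece of code to see a valid ID where another saw garbage.&lt;/p&gt;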
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A permissions framework that defaults to “allow”.&lt;/p&gt;
&lt;p&gt;The permission method responsible for the failure basically works like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;if error_condition_1:
    return Error(...)

if missing_permission_1:
    return Error(...)

if missing_permission_2:
    return Error(...)

return True&lt;/pre&gt;
&lt;p&gt;It's basically impossible to look at the code and guess whether they missed
anything (in this case, checking for an error code in one of the called
functions was missing). A lot of permission checking code looks like this,
or has the same “default to allow” construction. For example, in Django code,
views are all public by default, and you have to remember to add a decorator
to limit access.&lt;/p&gt;
&lt;p&gt;Many database applications also work the same way – often any code in the
application can access and update any record of any table, and you have to
specifically remember every single limitation you want to apply if you don't
want the “allow all” behaviour. Fixing this can be hard, requiring big
changes in architecture.&lt;/p&gt;
&lt;p&gt;One way to do it is to have allowed functionality encapsulated into some kind
of code object, that is only returned when correct credentials are passed.
Or, permissions check methods should return some credential object that must
then be passed to every model layer function that wants to take some action.&lt;/p&gt;
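&lt;p&gt;A very rough Python sketch of the credential-object idea follows. &lt;code class="docutils literal"&gt;EditGrant&lt;/code&gt;, &lt;code class="docutils literal"&gt;check_edit_permission&lt;/code&gt; and &lt;code class="docutils literal"&gt;update_title&lt;/code&gt; are hypothetical names, and Python cannot stop code from constructing a grant by hand, so this illustrates the shape of the API rather than a real security boundary.&lt;/p&gt;

```python
class EditGrant:
    """Hypothetical credential: proof that `user` may edit `post_id`."""

    def __init__(self, user, post_id):
        self.user = user
        self.post_id = post_id


POSTS = {123: {"author": "alice", "title": "Hello"}}


def check_edit_permission(user, post_id):
    """Return an EditGrant on success; raise otherwise (deny by default)."""
    post = POSTS.get(post_id)
    if post is not None and post["author"] == user:
        return EditGrant(user, post_id)
    raise PermissionError("edit not permitted")


def update_title(grant, new_title):
    # The model layer refuses to act without a grant object.
    if not isinstance(grant, EditGrant):
        raise TypeError("an EditGrant is required")
    POSTS[grant.post_id]["title"] = new_title
```

&lt;p&gt;The point of the design is that the only path that ends in "allowed" is the one that positively established permission. A forgotten check leads to a refusal, not to access.&lt;/p&gt;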
&lt;p&gt;At the very least, rather than the REST interface and the normal admin
interface doing the same checks, the model layer should do this. This results
in less code surface area to attack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A “don't let it crash” mentality.&lt;/p&gt;
&lt;p&gt;The design of PHP has this mentality – most builtin functionality will just
try to continue and return &lt;strong&gt;something&lt;/strong&gt; rather than raise an error that might
terminate execution of the program. This is due to its history as a template
language. Unfortunately, while this might sometimes be a reasonable design
choice for templating (though that is highly debatable), it is always a
terrible choice for a general purpose language. It is better to crash than to do
the wrong thing.&lt;/p&gt;
&lt;p&gt;The same mentality can be seen in some of the code involved in this
vulnerability.&lt;/p&gt;
&lt;p&gt;For example, &lt;code class="docutils literal"&gt;update_item&lt;/code&gt; has:&lt;/p&gt;
&lt;pre class="literal-block"&gt;public function update_item( $request ) {
    $id   = (int) $request['id'];
    $post = get_post( $id );&lt;/pre&gt;
&lt;p&gt;Converting to integer presumably is designed to make sure that &lt;code class="docutils literal"&gt;get_post&lt;/code&gt;
doesn't crash – we will at least be passing it something of the right form.
It is a misguided and misimplemented form of defensive programming. If it had
been omitted, the vulnerability would have been avoided.&lt;/p&gt;
&lt;p&gt;In &lt;code class="docutils literal"&gt;update_item_permissions_check&lt;/code&gt;, we have this code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;public function update_item_permissions_check( $request ) {

    $post = get_post( $request['id'] );

    if ( $post &amp;amp;&amp;amp; ! $this-&amp;gt;check_update_permission( $post ) ) {
        ...&lt;/pre&gt;
&lt;p&gt;The code checks that &lt;code class="docutils literal"&gt;$post&lt;/code&gt; is not &lt;code class="docutils literal"&gt;null&lt;/code&gt; or false-y before attempting
to call a method on it. This is again defensive programming, and seems
reasonable, but in fact is only designed to make sure the code doesn't
crash &lt;em&gt;here&lt;/em&gt; – “I must remember not to call a function that might crash with
&lt;code class="docutils literal"&gt;null&lt;/code&gt; parameters”. However, the big problem is that the code doesn't
actually deal with the possibility that &lt;code class="docutils literal"&gt;$post == null&lt;/code&gt;. If instead the
code had avoided the &lt;code class="docutils literal"&gt;null&lt;/code&gt; check, then for the exploit scenario
&lt;code class="docutils literal"&gt;check_update_permission&lt;/code&gt; would simply have caused a crash – which would
have been harmless (you just get a 500 error, which is an error for that
user, but doesn't affect anyone else).&lt;/p&gt;
&lt;p&gt;In fact the 4.7.2 code still has the same structure, but now looks like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;$post = $this-&amp;gt;get_post( $request['id'] );
if ( is_wp_error( $post ) ) {
    return $post;
}

if ( $post &amp;amp;&amp;amp; ! $this-&amp;gt;check_update_permission( $post ) ) {
    ...&lt;/pre&gt;
&lt;p&gt;The last line here makes no sense now – unlike the global function
&lt;code class="docutils literal"&gt;get_post&lt;/code&gt;, &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;$this-&amp;gt;get_post&lt;/span&gt;&lt;/code&gt; never returns a false-y value, it returns
either a &lt;code class="docutils literal"&gt;WP_Post&lt;/code&gt; or a &lt;code class="docutils literal"&gt;WP_Error&lt;/code&gt;. The rest of the codebase is littered
with this kind of thing, and in many cases it is the right thing (at least
locally). This pattern can easily become a habit, and so it becomes hard to
spot that here it is a vulnerability (4.7.1 code), or makes no sense (4.7.2).&lt;/p&gt;
&lt;p&gt;Lesson: the aim should be that your program does the &lt;strong&gt;right&lt;/strong&gt; thing, and if
it can't do that, it should do &lt;strong&gt;nothing&lt;/strong&gt;. The worst thing you can do is
have a philosophy of “whatever happens, the program should not crash in the
line of code I'm currently writing”.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code-base compatibility constraints.&lt;/p&gt;
&lt;p&gt;Looking through the code, it becomes clear that there are multiple different
ways to do everything.&lt;/p&gt;
&lt;p&gt;The REST API uses code that is Ruby-on-Rails-ish – controller classes with
request objects being passed in to methods. This feels modern-ish (despite
various design flaws I've mentioned and being very verbose) – anyone using a
recent web framework would understand roughly how it works. Most of the rest
of the code base uses the classic “Spaghetti PHP files” anti-pattern – a
chain of includes that you have to follow to work out where things are
actually done, request handling code executing at the top level rather than
just function and class definitions.&lt;/p&gt;
&lt;p&gt;As noted above, some functions use false-y values to indicate errors, others
use &lt;code class="docutils literal"&gt;WP_Error&lt;/code&gt; (e.g. &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;WP_REST_Posts_Controller::get_post&lt;/span&gt;&lt;/code&gt;). You've just
got to remember which is which to get it right. In fact, of the first
variety, some code uses &lt;code class="docutils literal"&gt;false&lt;/code&gt; (e.g. &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;WP_Post::get_instance&lt;/span&gt;&lt;/code&gt;), while
some use &lt;code class="docutils literal"&gt;null&lt;/code&gt; (e.g. global &lt;code class="docutils literal"&gt;get_post&lt;/code&gt;), which is the kind of
inconsistency that is asking for trouble.&lt;/p&gt;
&lt;p&gt;Why hasn't any of this been cleaned up? My guess is backwards-compatibility.
WordPress's crown jewel is its installation base, which is huge. Fixing
their code to be more consistent and secure would involve breaking tons of
plugins, so they are not going to do that. Their only option is to try to
write new code using better patterns, but this is itself a problem – the
classic &lt;a class="reference external" href="http://antipatterns.com/lavaflow.htm"&gt;Lava flow anti-pattern&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sometimes people complain about the rate at which Django &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/internals/deprecation/"&gt;deprecates&lt;/a&gt; things, but
I've already given one example above of why it is very important that we do
so. Django has a pretty good track record on security only because we are
willing to sometimes break things at the &lt;strong&gt;API&lt;/strong&gt; level, rather than just
leave sharp edges everywhere and tell people not to cut themselves.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Volume of code. The more verbose your code, the easier it will be for code
reviews to miss things like this – both specific errors, and poor internal
API design.&lt;/p&gt;
&lt;p&gt;Look at the WordPress code, and you will find there is &lt;strong&gt;tons&lt;/strong&gt; of it. It is
extremely verbose – it's like you are reading code designed to mimic the bad
points of Java with none of its good points. Every class is so big it needs
its own file, despite having extremely little meat in it.&lt;/p&gt;
&lt;p&gt;WordPress is currently about 500,000 LOC (PHP+JS+CSS, including comments, not
including docs, tests and bundled themes, and not including other text that
does not need to be maintained – I tried to come up with some criteria that
would be roughly fair to several projects, knowing that they are different in
nature).&lt;/p&gt;
&lt;p&gt;Compare this (with the caveats already mentioned, and using similar criteria
for counting) to:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;the web framework Django – about 135,000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Django based CMS Mezzanine – 43,000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Django based CMS Wagtail – 57,000&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the Django based CMS django-fiber – 18,000&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Or, within a CMS, compare the &lt;a class="reference external" href="https://core.trac.wordpress.org/browser/trunk/src/wp-includes/rest-api/endpoints"&gt;REST backend in WordPress&lt;/a&gt;
which is about 9,000 LOC (not including an additional 3,000 in the base
classes and other supporting code), with the &lt;a class="reference external" href="https://github.com/ridethepony/django-fiber/blob/master/fiber/rest_api/views.py"&gt;REST backend in django-fiber&lt;/a&gt;
– about 300 LOC (it uses &lt;a class="reference external" href="http://www.django-rest-framework.org/"&gt;django-rest-framework&lt;/a&gt; to do the heavy lifting).&lt;/p&gt;
&lt;p&gt;If we decide that's not fair, we could also compare to &lt;a class="reference external" href="https://github.com/wagtail/wagtail/tree/master/wagtail/api/v2"&gt;Wagtail's REST API&lt;/a&gt; which
doesn't use a framework (other than Django), and therefore includes its own
router and serialization classes, and still only weighs in at 1,400 LOC —
about one tenth the size of the equivalent code in WordPress.&lt;/p&gt;
&lt;p&gt;Yes, django-fiber uses django-rest-framework, and all the above use Django
and other projects, but those other projects provide re-usable code, code
that has many features unused by the above CMSs, and code that is therefore
very generic. WordPress code on the other hand, despite its 'framework' code
being written for internal use, and not really being usable outside
WordPress, manages to be many times more verbose. You could put Django,
django-rest-framework (15k), Mezzanine, Wagtail, django-fiber, Pillow (image
library used by many Python CMSs, 22k), django-mptt and django-treebeard
(Django database tree libraries used by these CMSes, 4k and 6k),
django-compressor (build tool used by several, 5k) all together and
still be nowhere near matching WordPress for LOC.&lt;/p&gt;
&lt;p&gt;How does WordPress manage to be so big? Yes, WordPress code contains more
comments, but that's still code you have to scroll past, or read to
understand. Whatever the reason, one of the consequences is that when
reviewing code, you are just going to fall asleep that much sooner. I
couldn't find much by way of detailed review for the &lt;a class="reference external" href="https://core.trac.wordpress.org/ticket/33982"&gt;ticket that introduced
the WordPress REST infrastructure&lt;/a&gt; given it is such a big
patch (but I might be looking in the wrong places).&lt;/p&gt;
&lt;p&gt;I suspect that the verbosity of WordPress is caused by the poor
quality of language features and/or poor internal design.&lt;/p&gt;
&lt;p&gt;The linked Python code is generally much more compact, and this is probably
mainly due to the design of Python, especially things like first-class
classes and decorators. In some cases, there are definitions of REST API
endpoints that have no methods defined in them at all. Declarative code was
sufficient in many cases, and the code is easy to read and very free of noise.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not learning from other systems and frameworks.&lt;/p&gt;
&lt;p&gt;I've highlighted a few places in the design of WordPress's REST
infrastructure that look like very poor design decisions (e.g. merging of
parameters from different sources). These things are known to be poor
design decisions in other parts of the web development world (e.g. Django
removed or discouraged them years ago), yet experienced PHP developers
didn't see them when it came to code review time. This could have been
because they fell asleep due to the volume, but I'd tentatively suggest that
another factor might have been a narrow experience of the world, and one
particularly dominated by the norms of the PHP world, where the whole
language and framework design normalises poor design decisions.&lt;/p&gt;
&lt;p&gt;This may be unfair – a lot of the new code for the REST API looks like it is
inspired by other frameworks, which presumably means the developers have had
some exposure to other systems.&lt;/p&gt;
&lt;p&gt;But either way – the lesson is that a failure to learn from other
programming sub-cultures and other languages means we might be walking into
traps that we could easily avoid. For myself, I think that in the Python
community there can be a snobbishness about other languages (e.g. the way I
look down on PHP) that can lead to a complacent lack of knowledge of other
languages and systems that are way ahead in some areas.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Well, that's all. Hopefully there is something we can learn from this nasty
vulnerability!&lt;/p&gt;
&lt;/section&gt;</content>
    <category term="django" label="Django"/>
    <category term="php" label="PHP"/>
    <category term="python" label="Python"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>Why escape-on-input is a bad idea</title>
    <id>https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/</id>
    <updated>2012-08-06T20:59:01+01:00</updated>
    <published>2012-08-06T20:59:01+01:00</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/"/>
    <summary type="html">&lt;p&gt;With examples from the web development world, especially PHP, and lessons for Pythonistas&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;The right way to handle issues with untrusted data is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Filter on input, escape on output&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means that you validate or limit data that comes in (filter), but only
transform (escape or encode) it at the point you are sending it as output to
another system that requires the encoding. It has been standard best practice
since just about forever &lt;sup&gt;[citation required]&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;An alternative is “escape on input”: at the point that data enters your system,
you apply a transformation to it to avoid a problem further down the line when
the data is used.&lt;/p&gt;
&lt;p&gt;It's come to my attention that some serious web developers (or at least, they
take themselves seriously and are taken seriously by others) are &lt;strong&gt;still&lt;/strong&gt;
suggesting the practice of escape-on-input.&lt;/p&gt;
&lt;p&gt;For example, with escape-on-input, to avoid XSS any data that enters your system
has HTML escaping applied to it &lt;em&gt;immediately&lt;/em&gt;, before your application code
touches it.&lt;/p&gt;
&lt;p&gt;I chose that example deliberately, because people are actually recommending it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html#comment-572962448"&gt;in some recent “PHP sucks” debate&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;which, in turn, linked to a &lt;a class="reference external" href="http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html"&gt;page by Rasmus Lerdorf recommending
escape-on-input as a sensible way to deal with XSS&lt;/a&gt;.
The page, admittedly, is describing a ‘toy’, a ‘no-framework PHP framework’,
yet he does seem to be serious about the usefulness of escape-on-input.&lt;/p&gt;
&lt;p&gt;The page is from 2006, and uses the pecl/filter extension, but the extension
has since made it into core (PHP 5.2), and the &lt;a class="reference external" href="http://www.php.net/manual/en/filter.configuration.php"&gt;docs for it&lt;/a&gt; suggest a
configuration that is clearly intended for XSS prevention. As recently as
2008, and probably to this day, Lerdorf is &lt;a class="reference external" href="http://grokbase.com/p/php/php-internals/083qakz7wj/php-dev-short-open-tag"&gt;still defending and recommending
this approach&lt;/a&gt;,
and it appears to be part of his reason for thinking that PHP templating
doesn't need an autoescape mechanism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Just as significantly, &lt;a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security"&gt;Etsy are using and recommending escape-on-input&lt;/a&gt;
(slide 18 onward). As a very successful modern company using PHP, people will
look up to them and copy them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, this approach, unfortunately, is popular amongst some, and I can't find a
decent post explaining why it's such a terrible idea both in theory and
practice. Here is my attempt. It should be applicable to almost any system and
any language, although I'll mainly be using examples from web development.&lt;/p&gt;
&lt;section id="in-theory"&gt;
&lt;h2&gt;In theory&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First of all, escape-on-input &lt;strong&gt;is just wrong&lt;/strong&gt; – you've taken some input and
applied some transformation that is totally irrelevant to that data. If,
taking our example, you have some data collected by HTTP POST or GET
parameters, applying HTML escaping to it is a layering violation – it mixes an
output formatting concern into input handling. Layering violations make your
code much harder to understand and maintain, because you have to take into
account other layers instead of letting each component and layer do its own
job.&lt;/p&gt;
&lt;p&gt;Doing things ‘right’ is very important, even if doing them ‘wrong’ seems to
work and you are tempted to be dismissive of ‘theoretical’ concerns about
purity etc. When you have to maintain code, you will be very glad if things
are in the right place, and not full of hacks and surprises.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You have corrupted your data by default. The system (or the most convenient
API) is now lying about what data has come in. As you have applied a
transformation to the &lt;strong&gt;data itself&lt;/strong&gt;, the layering violation is not an
isolated problem in one part of the code, but infects every part of your code,
especially if you store the corrupted data in a database.&lt;/p&gt;
&lt;p&gt;Your data is &lt;strong&gt;everything&lt;/strong&gt;. As I read recently, &lt;a class="reference external" href="http://blog.datamarket.com/2012/07/08/the-11-best-data-quotes/"&gt;“data matures like wine,
applications like fish”&lt;/a&gt;. You can
always rewrite your application, but if you corrupt your data, you've done the
worst thing you can to your system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is exacerbated by the fact that many encodings are one-way – you cannot
losslessly or unambiguously convert them back. If at a later point you need
the original data, you might be in a pickle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping your data for one output backend will only deal with &lt;strong&gt;that&lt;/strong&gt;
output. A typical web app might deal with at least the following backends,
which have different characters that are dangerous, and have different
requirements for dealing with them:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;HTML: ' &amp;lt; &amp;gt; " &amp;amp;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;URLs: / : &amp;amp; ? # text starting 'javascript:'&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Javascript: " '&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SMTP and HTTP: ; : newlines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL: '&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JSON: "&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell: space, quotes and various other characters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any number of others could be added, and all could have security
implications. Using escape-on-input will only fix one of these - apart from
happy coincidences where it might fix more than one. Security should not rely
on happy coincidences, and for the other outputs you will still need a
sensible solution to the problem. Why not have a sensible solution for all of
them?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping for one output may not deal with even that single output correctly,
because escaping can be &lt;strong&gt;context dependent&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Various outputs can be embedded in others, and they have &lt;strong&gt;different escaping
rules&lt;/strong&gt;. So, you can embed URLs in HTML. And URLs in CSS. And CSS in HTML. And
Javascript in URLs. And Javascript in HTML...&lt;/p&gt;
&lt;p&gt;If you prepared something for HTML, did you prepare it for HTML element body
context, or HTML attribute context, or URLs in HTML attributes, or CSS in
HTML? Or URLs in CSS in HTML? If someone passes in a value for a URL which is
then used in an &lt;code class="docutils literal"&gt;href&lt;/code&gt; attribute in HTML, HTML escaping of &lt;strong&gt;&amp;lt; &amp;gt; &amp;amp; ; " '&lt;/strong&gt; won't
adequately protect you from XSS. Interactions between CSS/Javascript parsers
and HTML parsers make things &lt;a class="reference external" href="https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet"&gt;even more complex&lt;/a&gt;.
So “escape at the beginning and then forget about it” does not work even for
the single output of ‘HTML’, because it is not a single output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping on input will not only fail to deal with the problems of more than
one output, it will actually make your data &lt;strong&gt;incorrect&lt;/strong&gt; for many outputs.&lt;/p&gt;
&lt;p&gt;Suppose you decide to do HTML escaping, and someone enters &lt;em&gt;Jack &amp;amp; Jill&lt;/em&gt; as a
title for something. Your escape-on-input turns this to &lt;em&gt;Jack &amp;amp;amp; Jill&lt;/em&gt; and
that goes in the DB. Suppose you want to email people and put this title in
the subject line. You now have to apply the reverse transformation to get a
sensible subject line in the email, and you have to &lt;strong&gt;remember&lt;/strong&gt; to do this
for every output that is not HTML.&lt;/p&gt;
&lt;p&gt;Sometimes, the bug is &lt;a class="reference external" href="http://instagram.com/p/SVfQruppEE/"&gt;significantly more annoying&lt;/a&gt; than an email with an incorrect title:&lt;/p&gt;
&lt;img alt="Sweater with HTML escaping incorrectly applied to text" class="align-center" src="https://lukeplant.me.uk/blogmedia/dan_leedham_technical_and_online.jpeg"&gt;
&lt;p&gt;One ruined sweatshirt, however, is tame compared to the hassle many people
suffer due to &lt;a class="reference external" href="https://www.wsj.com/articles/internet-mangles-names-accents-web-forms-11664462695"&gt;having a name that a computer won’t accept or mangles&lt;/a&gt;.
Looking through that article, it’s clear that often the software is escaping
on input, resulting in escaped versions being stored in the database (e.g. a
woman with an apostrophe in her name is recorded as “Leah D&amp;amp;#38;andrea”),
which then causes no end of problems.&lt;/p&gt;
&lt;p&gt;You also have daft bugs like the fact that doing a search on that field for
the string ‘amp’ (or ‘quot’, ‘apos’, ‘lt’, ‘gt’ etc. or any substrings) will get
various false matches.&lt;/p&gt;
&lt;p&gt;I have seen some people respond to this by saying “it's better to have the
occasional double-encoding bug or incorrect query result than an XSS exploit”.
Well, first, that depends on your business. XSS is a problem because it costs
time and money, and so does corrupting your data. Many people have data that
actually matters, and corrupt data is a big deal, and much harder to cope with
than an XSS bug, because data lives on and on. If we took just the example
above of storing people’s names incorrectly, the grief caused by
escape-on-input is massive.&lt;/p&gt;
&lt;p&gt;Second, this decision affects &lt;strong&gt;frameworks&lt;/strong&gt; that are used to handle data of
&lt;strong&gt;all kinds&lt;/strong&gt;, and the decision affects the entire code base of your
application and beyond, as described below. Data-handling frameworks that work
on the assumption that your data is not important are insanity. &lt;a class="reference external" href="http://www.biblegateway.com/passage/?search=Psalm%2011:3&amp;amp;version=KJV"&gt;If the
foundations be destroyed, what can the righteous do?&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Third, it's entirely unnecessary. XSS is not hard to fix given decent
programming tools.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At what point does data ‘enter’ your system?&lt;/p&gt;
&lt;p&gt;It might sound like a simple question, but it's tricky in reality, and I'll
illustrate using an HTTP request.&lt;/p&gt;
&lt;p&gt;In most web apps, the GET and POST parameters are your ‘raw input’. However,
using most normal web framework APIs, data in GET and POST parameters has
already been interpreted. The ‘raw’ data is really the bytes that make up the
HTTP request, which typically will use URL encoding for GET query parameters
and a choice of encodings for POST data (URL encoding or MIME multipart
attachment format).&lt;/p&gt;
&lt;p&gt;The framework may also do another level of decoding – interpreting the
series of bytes as a series of unicode code points.&lt;/p&gt;
&lt;p&gt;Both parts of this initial transformation make sense and are appropriate,
because they are reversing the encoding already applied to the data by the
protocol involved. The web browser takes the data you type in – unicode code
points – and applies a series of transformations to it, according to the HTTP
protocol, and your web framework reverses these to get the data back.&lt;/p&gt;
&lt;p&gt;Now, if you want to avoid XSS problems, you have to apply the escaping
&lt;strong&gt;after&lt;/strong&gt; this initial decoding has been done. But this highlights another
possibility. What if the data requires &lt;em&gt;further&lt;/em&gt; decoding before you get the
‘real’ raw data? For example, some data might be sent base64 encoded for a
variety of reasons, or any other type of encoding.&lt;/p&gt;
&lt;p&gt;This extra level of encoding gives two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;your automatic HTML escaping may have corrupted the encoded data so that it
now cannot be decoded. For example, you had a GET parameter that held a URL,
which itself had parameters in the query string:&lt;/p&gt;
&lt;pre class="literal-block"&gt;GET /foo?bar=1&amp;amp;url=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2 HTTP/1.1&lt;/pre&gt;
&lt;p&gt;Your framework's HTTP handling will produce a query dictionary that looks
something like the following:&lt;/p&gt;
&lt;pre class="literal-block"&gt;{"bar": "1",
 "url": "http://example.com/?x=1&amp;amp;y=2"
 }&lt;/pre&gt;
&lt;p&gt;But your automatic escaping turns that into:&lt;/p&gt;
&lt;pre class="literal-block"&gt;{"bar": "1",
 "url": "http://example.com/?x=1&amp;amp;amp;y=2"
 }&lt;/pre&gt;
&lt;p&gt;If you want to extract the &lt;code class="docutils literal"&gt;y&lt;/code&gt; parameter from &lt;code class="docutils literal"&gt;url&lt;/code&gt;, you are stuck. You
can't correctly interpret the data in the &lt;code class="docutils literal"&gt;url&lt;/code&gt; parameter, because it has
been corrupted. You're going to have to unescape the input, and you might
not even notice this problem.&lt;/p&gt;
&lt;p&gt;A better example might be handling the ‘Referer’ header. (Which you have
presumably applied the same HTML encoding to, right? If you did, you have
this problem, if you didn't, you have to remember to do it manually, which
is a potential XSS vulnerability).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even if the data comes through your automatic escaping unscathed
(e.g. base64 under HTML escaping), or you can undo the corruption and get it
properly decoded, after decoding you will have to &lt;strong&gt;manually apply&lt;/strong&gt; HTML
escaping to make it match all the other automatically escaped data. If you
don't, you've potentially got a bug and an XSS exploit.&lt;/p&gt;
&lt;p&gt;So your automatic escape-on-input has &lt;strong&gt;missed&lt;/strong&gt; data, and this happens
because you can't really define the point at which the data has ‘entered’
your system and needs the escaping applied.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This problem means that the escape-on-input approach is inherently flawed and
&lt;strong&gt;cannot&lt;/strong&gt; be fixed. &lt;strong&gt;You just have to patch it up on a case-by-case basis,
which is exactly what escape-on-input is supposed to avoid.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;And then, what about other sources of data – data on the file system, in a
cache etc.? Are these entry points? Well, it depends on how the data was put
there. You have to manually follow this all the way through your app; get it
wrong and you've got double escaping bugs or security flaws.&lt;/p&gt;
&lt;p&gt;(By contrast, escape on output always works, because you apply it at the point
where you know it is needed – in the backend that knows the escaping rules.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other systems putting data into your database, or getting data out, have to
abide by your data transformation rules.&lt;/p&gt;
&lt;p&gt;These systems might have nothing to do with your primary domain (e.g. a web
site). Making them understand and obey rules that have nothing to do with the
data itself is insanity and extremely short sighted.&lt;/p&gt;
&lt;p&gt;You can't deal with this problem when you come to it, because you don't have
to just fix your code, you've got to fix all your data too, and by the time
you cross this bridge you might have a lot of data and might need a very
delicate database migration to get it right. The data may even have escaped
your control (e.g. been copied into other systems), or backwards compatibility
concerns might stop you from making the change you need to make.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Within your main application, the decision to escape on input affects your
whole code base.&lt;/p&gt;
&lt;p&gt;If you want to use any libraries, you need to make sure that they are using
all the same assumptions that you have in your main code base.&lt;/p&gt;
&lt;p&gt;For example, if you've got a form/widget library in your web app, it will very
often need to echo user input back to them in the case of a form that has
validation errors. This library has to know if you already escaped the input.&lt;/p&gt;
&lt;p&gt;Writing the library to work in two modes is asking for trouble. Rather, you
need it to have been written from the beginning to assume the same escaping
rules.&lt;/p&gt;
&lt;p&gt;This kills code re-use – you can only use code that assumes the same input
escaping – or it means that you will end up with tons of bugs due to
incompatibilities between the assumptions made in your application code and
the library.&lt;/p&gt;
&lt;p&gt;Essentially, this is the problem of a global configuration setting, but worse
since it affects the &lt;em&gt;operand&lt;/em&gt; of your entire application (the data going
through it), not just the functionality of various &lt;em&gt;operators&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Another example might be a cron job that sends out emails, using data from the
database. If the data comes from a web form that applied "escape on input" to
avoid XSS, then the code will need to apply HTML unescaping - despite the fact
that this script has absolutely nothing to do with the web (it reads a
database and sends plain text emails).&lt;/p&gt;
&lt;p&gt;Effectively, &lt;strong&gt;this means that the XSS solution, far from being a solution
applied at a single point, is in fact spread out over the entire code base,&lt;/strong&gt;
as it includes every time that pre-escaped data has to be un-escaped.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The confusion caused by the above is likely to &lt;em&gt;increase&lt;/em&gt; security
problems. “Keep It Simple, Stupid” remains a very good maxim for developers.&lt;/p&gt;
&lt;p&gt;To continue an example used above: you want to send an email with some data
that has already been HTML escaped, and so you need to unescape the data to
avoid emails with the subject “Jack &amp;amp;amp; Jill” when the user entered “Jack &amp;amp;
Jill”. You decide it's not sensible for the mail sending functions to do this
internally (or maybe they're provided by a third party who made that decision
for you), so the calling code does the unescaping.&lt;/p&gt;
&lt;p&gt;You later decide to switch to HTML emails, and the developer who implements it
thinks that since data is already escaped, there is no problem including it
without extra escaping in the body of the HTML email, leading to a
vulnerability (not classic XSS in this case, but still a problem).&lt;/p&gt;
&lt;p&gt;There is also the example I gave above where an extra layer of
encoding/decoding in the raw data makes it likely you'll forget to apply the
escaping.&lt;/p&gt;
&lt;p&gt;The confusion caused by escape-on-input means your entire code base becomes a
potential source not only of double-escaping bugs but of security problems as
well.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
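&lt;p&gt;The corrupted &lt;code class="docutils literal"&gt;url&lt;/code&gt; parameter described in the list above can be reproduced with a short Python sketch. It uses only the standard library; the helper name &lt;code class="docutils literal"&gt;inner_params&lt;/code&gt; is made up for illustration, and &lt;code class="docutils literal"&gt;html.escape&lt;/code&gt; merely stands in for whatever escape-on-input filter a framework might apply:&lt;/p&gt;

```python
import html
from urllib.parse import parse_qs, unquote, urlsplit

# The percent-encoded value of the `url` parameter from the request above.
raw_value = "http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2"

# Step 1: the framework reverses the URL encoding applied by the browser.
decoded = unquote(raw_value)

# Step 2 (the mistake): HTML escaping is applied on input.
escaped = html.escape(decoded)


def inner_params(url):
    """Hypothetical helper: return the query parameters embedded in `url`."""
    return parse_qs(urlsplit(url).query)


clean = inner_params(decoded)
corrupted = inner_params(escaped)

print(sorted(clean))      # parameter names from the untouched value
print(sorted(corrupted))  # parameter names after escape-on-input
```

&lt;p&gt;On Python 3.10 or later, where &lt;code class="docutils literal"&gt;parse_qs&lt;/code&gt; splits only on the ampersand, the first lookup finds both &lt;code class="docutils literal"&gt;x&lt;/code&gt; and &lt;code class="docutils literal"&gt;y&lt;/code&gt;, while the second finds &lt;code class="docutils literal"&gt;x&lt;/code&gt; and a bogus parameter named &lt;code class="docutils literal"&gt;amp;y&lt;/code&gt;. Nothing raises an error, which is why this kind of corruption is so easy to miss.&lt;/p&gt;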
&lt;/section&gt;
&lt;section id="in-practice"&gt;
&lt;h2&gt;In practice&lt;/h2&gt;
&lt;p&gt;Thankfully, we don't just have to rely on the above analysis to conclude that
escape-on-input is a terrible idea. PHP, always willing to help when it comes to
“examples of how not to do it”, provides us with a perfect case study.&lt;/p&gt;
&lt;section id="magic-quotes"&gt;
&lt;h3&gt;Magic quotes&lt;/h3&gt;
&lt;p&gt;PHP used to have a feature called magic quotes. It was an escape-on-input
feature that escaped single quotes (&lt;strong&gt;'&lt;/strong&gt;) with backslashes. This was to protect
you from SQL injection attacks, by making the data safe for interpolation into a
SQL query.&lt;/p&gt;
&lt;p&gt;This caused all kinds of problems.&lt;/p&gt;
&lt;p&gt;First, if you are not passing the data to a database via SQL queries built
by string interpolation, you have to remember to strip
those slashes using the function &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you don't, you get double encoding. It looks like \\'this\\', you\\'ve
almost certainly seen it across the web, though it seems we\\'re thankfully
past the worst of it.&lt;/p&gt;
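&lt;p&gt;To make the double encoding concrete, here is a rough Python sketch. The two functions are simplified stand-ins written for this illustration, not PHP's actual implementation: &lt;code class="docutils literal"&gt;add_slashes&lt;/code&gt; plays the part of magic quotes and &lt;code class="docutils literal"&gt;strip_slashes&lt;/code&gt; the part of &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt;:&lt;/p&gt;

```python
import re


def add_slashes(text):
    """Simplified stand-in for magic quotes: escape backslashes and single quotes."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def strip_slashes(text):
    """Undo one round of add_slashes by dropping each escaping backslash."""
    return re.sub(r"\\(.)", r"\1", text)


submitted = "you've"
once = add_slashes(submitted)   # what the application code receives
twice = add_slashes(once)       # e.g. echoed into a form and submitted again

print(once)
print(twice)
print(strip_slashes(once) == submitted)
```

&lt;p&gt;Every code path that does not end in a string-interpolated SQL query needs exactly one &lt;code class="docutils literal"&gt;strip_slashes&lt;/code&gt; call per round of escaping; forget one and the backslashes reach the user, add one too many and legitimate backslashes in the data are lost.&lt;/p&gt;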
&lt;p&gt;Second, even if you remember, you've added some hideous cruft to your code. In
the bit of code which is handling form validation (and is therefore echoing user
input back to the user without the database being involved), you've got these
bizarre &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt; calls. What on earth does ‘reverse transforming a
string for SQL statement preparation’ have to do with the task of input
validation?&lt;/p&gt;
&lt;p&gt;Third, it turns out that different databases need different escaping mechanisms
to do things fully correctly. So you now have to call &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt; on data,
and then re-escape it the way your particular database expects,
even if you are passing it to a database using string-interpolated queries!&lt;/p&gt;
&lt;p&gt;Then, since the above problems are common (building up SQL queries by string
interpolation was always a bad idea, and very often you pass on the data to
outputs that don't want SQL escaping at all), it's desirable to have a way to
turn this behaviour off completely.&lt;/p&gt;
&lt;p&gt;To handle this, there was a php.ini setting to turn it on or off.&lt;/p&gt;
&lt;p&gt;And there were more complications, for example:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;do you apply magic quotes to ‘all input’ (&lt;code class="docutils literal"&gt;magic_quotes_runtime&lt;/code&gt;) or just to
GET/POST/COOKIE data (&lt;code class="docutils literal"&gt;magic_quotes_gpc&lt;/code&gt;)? (This is the problem of defining
what exactly is ‘input’)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;attempts to fix some of the above with yet more configuration options like
&lt;code class="docutils literal"&gt;magic_quotes_sybase&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And so now you've got even more problems. Since these are global settings, you
can't have library code mess with them, since other code might set the global to
a different value or assume a certain value.&lt;/p&gt;
&lt;p&gt;You could try making all code detect the current setting and have different code
paths depending on the result. This works very badly – having multiple code
paths is a recipe for code duplication and bug proliferation. It's extremely
easy to forget to do it, or get one of the paths wrong, since you will likely
only test one configuration value and one set of code paths in reality.&lt;/p&gt;
&lt;p&gt;Alternatively, you can make one bit of code responsible for fixing the setting
to a sensible value (the only one being 'off'), and then make all code assume
that from then on. (If you can't turn it off, you can use the code included
&lt;a class="reference external" href="http://www.php.net/manual/en/security.magicquotes.disabling.php"&gt;here&lt;/a&gt; as a
horrible kludge to reverse its behaviour).&lt;/p&gt;
&lt;p&gt;Eventually, this final approach was the one taken by all significant
projects. &lt;strong&gt;Turn the whole feature off, and assume it is off from then
on&lt;/strong&gt;. (Which means the feature is useless, of course).&lt;/p&gt;
&lt;p&gt;And of course, thankfully, the PHP developers realised that this entire thing
was a &lt;strong&gt;huge mistake&lt;/strong&gt; that caused nothing but a vast amount of confusion and
bugs, and &lt;strong&gt;removed the whole thing&lt;/strong&gt; for good in PHP 5.4.&lt;/p&gt;
&lt;p&gt;Magic quotes, &lt;a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/"&gt;as eevee put it&lt;/a&gt;, were “so
close to secure-by-default, and yet so far from understanding the concept at
all.”&lt;/p&gt;
&lt;p&gt;To digress for a moment: we keep getting told that PHP is improving, and the
community has learnt from its mistakes. Unfortunately it seems the leaders in
the community are bent on &lt;strong&gt;recreating&lt;/strong&gt; old mistakes.&lt;/p&gt;
&lt;p&gt;According to Lerdorf, the much newer PHP filter extension is &lt;a class="reference external" href="http://grokbase.com/t/php/php-internals/08373a1vvf/short-open-tag/083qakz7wj#20080323qvterw1df6a006qxyg83z9qsb8"&gt;“magic_quotes
done right”&lt;/a&gt;. But
it still suffers from almost all the problems described here, for all the
reasons described. Global HTML escaping on input is essentially the same as
magic quotes, and just as tragically bad.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="elgg"&gt;
&lt;h3&gt;Elgg&lt;/h3&gt;
&lt;p&gt;In researching for this post, I came across &lt;a class="reference external" href="http://trac.elgg.org/ticket/561"&gt;this ticket for Elgg&lt;/a&gt;, an open source social networking engine.
Just read through the ticket and see the mess they are in. It's clear they
strongly regret the decision they made to escape-on-input, and, in their own
words, have created “horrendous” problems for themselves, especially as their
application has grown to include other interfaces such as JSON REST APIs.&lt;/p&gt;
&lt;p&gt;However, fixing it is very hard. They have to coordinate many changes across
their code base with a big database migration. If data has leaked from the
databases and tables they control into other systems, such as denormalised
tables, other databases, caches etc., or if there is other code by third parties
that makes the old assumptions about encoded data, they are in even more of a
pickle. And both of those things are probably inevitable in something like an
open source framework, which is designed for other people to build on and
extend.&lt;/p&gt;
&lt;p&gt;This is the pain that comes from mixing input handling and output encoding,
and from corrupting the data in your database.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="etsy"&gt;
&lt;h3&gt;Etsy&lt;/h3&gt;
&lt;p&gt;According to &lt;a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security"&gt;their security presentation&lt;/a&gt;,
Etsy are using escape-on-input for XSS protection.&lt;/p&gt;
&lt;p&gt;They claim that this is a much more secure option, as it is secure by
default. (They do note, however, the problem with input that is encoded in some
other way, like base64, so they are aware of the problems.)&lt;/p&gt;
&lt;p&gt;Their presentation goes on to describe an elaborate system for detecting and
fixing XSS attacks (the slides don't give enough detail for me to understand
what exactly they are doing, but it's clearly a lot of work).&lt;/p&gt;
&lt;p&gt;And &lt;a class="reference external" href="http://www.nzinfosec.com/etsy-has-been-one-of-the-best-companies-ive-reported-holes-to/"&gt;their system does indeed catch XSS bugs in the wild and allow them to fix
them within hours&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Wait, what?&lt;/p&gt;
&lt;p&gt;They've corrupted their database by doing escape-on-input, they've inflicted
on themselves all the development pain described above, and they've &lt;strong&gt;still&lt;/strong&gt;
got XSS bugs?&lt;/p&gt;
&lt;p&gt;Granted, they've got impressive ways of dealing with these problems. But it's
like &lt;a class="reference external" href="http://xkcd.com/463/"&gt;virus checkers on voting machines&lt;/a&gt;. Advanced ways
of dealing with problems that shouldn't even be possible tell you that you are
doing it wrong. They've become very fast at &lt;a class="reference external" href="http://www.red-sweater.com/blog/125/easy-programming"&gt;re-tying their shoelaces, instead
of working out how to tie shoelaces so they don't come undone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;They claim that with escape-on-input, XSS problems are now greppable, but it
doesn't sound like it. If they were, code audits would be a massively more
efficient way to find XSS problems than the methods they are using.&lt;/p&gt;
&lt;p&gt;The main problem is almost certainly that they are using an output system for
HTML that doesn't do HTML escaping by default (I'm guessing they are using PHP
as their template language). If the backend that deals with HTML &lt;strong&gt;actually
deals with HTML&lt;/strong&gt; then you eliminate the vast majority of these problems
overnight.&lt;/p&gt;
&lt;p&gt;I'm willing to bet that large sites that use Django (or other frameworks that
have basically solved the XSS problem by HTML escaping on output &lt;strong&gt;by default&lt;/strong&gt;)
don't have teams and automated systems dedicated to this problem, and don't need
them. In Django apps, XSS problems &lt;strong&gt;are&lt;/strong&gt; greppable - you grep for
&lt;code class="docutils literal"&gt;mark_safe&lt;/code&gt; in Python and the &lt;code class="docutils literal"&gt;|safe&lt;/code&gt; filter in templates (and then,
obviously, you may have to recursively grep for any functions that call
&lt;code class="docutils literal"&gt;mark_safe&lt;/code&gt; on inputs). Since all data which isn't &lt;code class="docutils literal"&gt;mark_safe()&lt;/code&gt; gets escaped
by the templating engine, and all HTML comes out of the template engine, that's
basically all you need to do.&lt;/p&gt;
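&lt;p&gt;As a rough illustration of what that grep amounts to, here is a minimal
Python sketch. The function name &lt;code class="docutils literal"&gt;find_escape_bypasses&lt;/code&gt; and the sample
template and view strings are made up for this example. A real audit would run
over the files of a project and then follow up on any helper functions that
wrap &lt;code class="docutils literal"&gt;mark_safe&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
import re

# Patterns for the two places where Django's auto-escaping can be bypassed.
BYPASS_PATTERNS = {
    "python": re.compile(r"\bmark_safe\s*\("),
    "template": re.compile(r"\|\s*safe\b"),
}


def find_escape_bypasses(kind, source):
    """Return (line_number, line) pairs where escaping is switched off."""
    pattern = BYPASS_PATTERNS[kind]
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical snippets standing in for a template file and a views module.
template = "{{ comment.body }}\n{{ page.intro|safe }}\n"
views = (
    "from django.utils.safestring import mark_safe\n"
    "banner = mark_safe(banner_html)\n"
)

print(find_escape_bypasses("template", template))
print(find_escape_bypasses("python", views))
```
&lt;/pre&gt;
&lt;p&gt;Each hit is a place where someone has vouched for a piece of HTML by hand,
so those are the lines worth reviewing.&lt;/p&gt;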
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="now-for-some-flame-bait"&gt;
&lt;h2&gt;Now for some flame bait&lt;/h2&gt;
&lt;p&gt;How did this happen to Etsy?&lt;/p&gt;
&lt;p&gt;Are the Etsy devs stupid? I suspect not. Etsy is clearly doing well, and I
imagine they have enough money to hire top-notch developers. Some of their
&lt;a class="reference external" href="http://www.etsy.com/careers/job_description.php?job_id=ozhhVfwM"&gt;careers pages&lt;/a&gt; show they
are happy using a variety of languages and technologies, and their &lt;a class="reference external" href="http://codeascraft.etsy.com/"&gt;engineering
blog&lt;/a&gt; seems to be sane and competent. Even their
security presentation showed considerable ingenuity and technical ability in
dealing with security problems (in entirely the wrong way, unfortunately, but
still impressive).&lt;/p&gt;
&lt;p&gt;I doubt they are low quality developers. Rather, I suspect that use of PHP has
addled their brains. They have become far too accustomed to working in an
environment in which insanity reigns – an environment in which &lt;a class="reference external" href="https://lukeplant.me.uk/blogmedia/php_less_than.txt"&gt;the less than
operator pretends to work correctly with strings but it's just a trap&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When I programmed in a Windows environment, I theorised that use of Windows
itself contributed to the poor quality of the programming in the code base, and
the fact that developers thought nothing of writing tons of tedious
code. Because Windows was so unscriptable, I imagined that Windows programmers
developed a high tolerance for tedium and repetition, which is exactly the
opposite of qualities needed by a programmer to make a computer do everything
efficiently and reliably. (Since then, I've found that Sturgeon's law was
probably a better explanation for the quality of the code, but I still think the
fundamental idea applies).&lt;/p&gt;
&lt;p&gt;With PHP, the fact that it comes with a template language that is simply not fit
for purpose – because it doesn't do HTML escaping by default, or even easily –
has somehow made the Etsy developers believe that it is normal to struggle with
XSS, that it is perfectly reasonable that even after taking the drastic action
of corrupting their entire database by HTML escaping it, they should &lt;strong&gt;still&lt;/strong&gt;
need elaborate XSS-catching systems.&lt;/p&gt;
&lt;p&gt;Instead of &lt;a class="reference external" href="http://www.youtube.com/watch?v=5mdy8bFiyzY"&gt;trying&lt;/a&gt; to fix XSS,
they should just fix it. Like &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/templates/language/#automatic-html-escaping"&gt;this in Django&lt;/a&gt;. Or
&lt;a class="reference external" href="http://pypi.python.org/pypi/MarkupSafe/"&gt;this in Turbogears and Jinja&lt;/a&gt;. Or
&lt;a class="reference external" href="http://www.yesodweb.com/book/shakespearean-templates#shakespearean-templates_types"&gt;this in Yesod&lt;/a&gt;. Or even &lt;a class="reference external" href="http://twig.sensiolabs.org/doc/templates.html#html-escaping"&gt;this
in PHP&lt;/a&gt; (though
due to limitations of the language you won't be able to have the convenience of
things like &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe"&gt;mark_safe&lt;/a&gt;
in Django). But living with an environment of pain and madness makes you think
that it ought to be hard.&lt;/p&gt;
&lt;p&gt;Right the way up to Rasmus Lerdorf at the top, many people in the PHP community
live with the insanity of their tools, and add more insanity to cope with it,
rather than fix their tools or choose better ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-lesson-for-pythonistas"&gt;
&lt;h2&gt;A lesson for Pythonistas&lt;/h2&gt;
&lt;p&gt;Bashing other languages is fun, but when I do so I always try to get something
more valuable out of it by using the opportunity to examine myself. The problem
I discussed in the last section (which is just a manifestation of the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Broken_windows_theory"&gt;broken
windows theory&lt;/a&gt;) applies
to other communities, and I'll attempt to apply it to the Python community.&lt;/p&gt;
&lt;p&gt;Refusing to live with stupidity is one of the reasons that Python 3 is really
important.&lt;/p&gt;
&lt;p&gt;Python 3 does not represent a massive leap forward in terms of additions to the
language. Mainly it just fixes a bunch of mistakes in Python 2, and introduces a
whole lot of backwards incompatibilities in the process. One of the biggest is
unicode/bytes. Python 2 was stupid here – it went directly against the Zen of
Python, and said “in the face of ambiguity about what encoding to use, guess.”
This caused a world of pain.&lt;/p&gt;
&lt;p&gt;Now, you can work around it in most cases by some sensible conventions and a
certain amount of discipline. You can also cope with the fact that &lt;code class="docutils literal"&gt;"a" &amp;lt; 1&lt;/code&gt;
doesn't raise an exception. You can live with &lt;code class="docutils literal"&gt;next()&lt;/code&gt; being a method in the
iterator protocol, when it should be a method called &lt;code class="docutils literal"&gt;__next__()&lt;/code&gt; and a builtin
&lt;strong&gt;function&lt;/strong&gt; &lt;code class="docutils literal"&gt;next()&lt;/code&gt;. You can live with the fact that &lt;code class="docutils literal"&gt;print&lt;/code&gt; is a totally
unnecessary keyword, since it should just be a builtin function. You can get
used to the fact that &lt;code class="docutils literal"&gt;class Foo:&lt;/code&gt; means something subtly but significantly
different from &lt;code class="docutils literal"&gt;class Foo(object):&lt;/code&gt;. You can work around or ignore dozens of
other little niggles, gotchas and inconsistencies.&lt;/p&gt;
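&lt;p&gt;For comparison, here is a short sketch of how Python 3 treats a few of these
cases. The helper &lt;code class="docutils literal"&gt;raises_type_error&lt;/code&gt; is a name invented for this example, and
sorting a mixed list is used as a stand-in for the &lt;code class="docutils literal"&gt;"a" &amp;lt; 1&lt;/code&gt; comparison,
since it performs the same ordering check.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
def raises_type_error(action):
    """Return True if calling action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


# Ordering a str against an int is an error in Python 3 ...
print(raises_type_error(lambda: sorted(["a", 1])))

# ... and so is mixing bytes with text without saying how to decode.
print(raises_type_error(lambda: b"caf\xc3\xa9" + "!"))

# The encoding has to be spelled out explicitly.
print(b"caf\xc3\xa9".decode("utf-8") + "!")

# The iterator protocol: the builtin next() calls the __next__() method.
print(next(iter([10, 20])))
```
&lt;/pre&gt;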
&lt;p&gt;But all the while, you are training yourself to tolerate stupidity,
inconsistency and brokenness. Removing these warts is really important, and
worth all the pain of the migration. The alternative is for Python to become the
next PHP.&lt;/p&gt;
&lt;p&gt;On top of these things, there are other types of brokenness in Python that
people in the community seem less willing to acknowledge or tackle. For some of
these I think we need exposure to completely different languages – languages
where you can spawn thousands of ‘threads’ easily and get performance benefits,
for example, or languages where you can write code that is both very high level
&lt;strong&gt;and&lt;/strong&gt; extremely fast. If we live entirely with Python and its set of
limitations, we'll think that those problems are normal and unavoidable.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Main updates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;2012/08/07 - corrections about turning magic_quotes_gpc off at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2012/10/08 - noted bug with queries returning false matches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2014/05/05 - added info about different contexts in HTML&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;</content>
    <category term="django" label="Django"/>
    <category term="php" label="PHP"/>
    <category term="python" label="Python"/>
    <category term="security" label="Security"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>PHP, Python and Persuasion</title>
    <id>https://lukeplant.me.uk/blog/posts/php-python-and-persuasion/</id>
    <updated>2012-07-04T22:22:29+01:00</updated>
    <published>2012-07-04T22:22:29+01:00</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/php-python-and-persuasion/"/>
    <summary type="html">&lt;p&gt;Notes to self on how (not) to convince people.&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;I always find it fascinating to observe conversations in which people's
arguments fail to convince each other. A few days ago we witnessed some PHP
debates, kicked off by &lt;a class="reference external" href="http://www.codinghorror.com/blog/2012/06/the-php-singularity.html"&gt;Jeff Atwood&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I foolishly got slightly involved on one ‘&lt;a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html"&gt;rebuttal&lt;/a&gt;’. At
church last Sunday I also chatted with a friend who has used PHP and likes it,
and this time I tried to put myself in his shoes. It is often much more helpful
talking to people in the flesh, and I think it is always enlightening to look at
why we fail to convince.&lt;/p&gt;
&lt;p&gt;One reason we can't rule out is simple irrationality. All of us are vulnerable
to &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Confirmation_bias"&gt;confirmation bias&lt;/a&gt;, and
people will often go to great lengths to convince themselves that they are doing
the right thing and do not need to change their views or practices. You see this
all the time when two groups of people share experiences after having made
different decisions about how to spend some leisure time. Both groups are often
desperate to believe that they haven't missed out, and will seek to persuade
each other (but in reality, persuade themselves) that they were in the group
that had the most fun.&lt;/p&gt;
&lt;p&gt;However, just assuming the other person is being irrational doesn't really help
you, and can in fact hinder communication. Below I will attempt to be more
constructive in looking at the ways we can fail to convince people. I'll try not
to turn it into a rebuttal to the “PHP isn't so bad” posts!&lt;/p&gt;
&lt;p&gt;In his great rant &lt;a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/"&gt;PHP: a fractal of bad design&lt;/a&gt;, ‘Eevee’
has lots of great arguments against PHP, and there were others in different
posts. But there are reasons why some will fail to hit home. (This is not a
criticism, by the way – many of these problems are unavoidable if you are
addressing an audience as mixed as “all the PHP developers in the world”).&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Expert understanding.&lt;/p&gt;
&lt;p&gt;Eevee writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;empty($var) is so extremely not-a-function that anything but a variable,
e.g. empty($var || $var2), is a parse error. Why on Earth does the parser
need to know about empty?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To an experienced programmer, &lt;code class="docutils literal"&gt;empty()&lt;/code&gt; is rather surprising. If you know –
or at least have some idea – about how to implement a programming language,
you'll understand terminology like lexer, parser, interpreter etc. So when
you try &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;empty($var&lt;/span&gt; || $var2)&lt;/code&gt;, and it returns a &lt;strong&gt;parse&lt;/strong&gt; error, even
though it looks like a function, you think “this programming language must
have been designed by complete amateurs – I don't want anything to do with
it”.  [ &lt;em&gt;EDIT: changed this paragraph, previously I was getting mixed up
between isset() and empty()&lt;/em&gt; ]&lt;/p&gt;
&lt;p&gt;For a less experienced or able programmer, however, none of this is a problem.
Programming languages are entirely magic, and one type of magic is no more
surprising than another.&lt;/p&gt;
&lt;p&gt;Talking about what the parser is doing is completely incomprehensible to such a
developer. You cannot communicate the reaction you feel, because it requires a
deeper level of understanding of how things are supposed to work. Less able or
experienced developers are simply unable to assess the quality of the tools they
work with.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Craftsmanship&lt;/p&gt;
&lt;p&gt;For many coders, the &lt;strong&gt;only&lt;/strong&gt; thing that matters is whether PHP allows them to
get something done. The quality of the tools or materials being used does not
matter.&lt;/p&gt;
&lt;p&gt;Not only are many coders &lt;strong&gt;unable&lt;/strong&gt; to assess the quality of the tools they
are using, they wouldn't care even if they could, because programming is simply
about getting something done. Any thought of taking pride in your work is
absent. Such a person will never be convinced by arguments that talk about the
quality of tools.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amateur vs professional&lt;/p&gt;
&lt;p&gt;Eevee accused the PHP world of being created by and filled with amateurs. I
suspect that to many PHP developers, that has the same effect as Java people
saying to Pythonistas that Python is not ‘enterprise ready’. Many Python
developers don't care about “the Enterprise” in the first place, and the word
may even have negative connotations – associations of massively over-engineered
and poorly written bloatware written in Java or C#, which could be replaced by a
Python project 10 times smaller.&lt;/p&gt;
&lt;p&gt;Many PHP users simply do not aspire to be professional, because they are happy to be
called amateurs – they &lt;strong&gt;are&lt;/strong&gt; amateurs, doing stuff just for fun, pure
hobbyists who are not making money out of what they are doing, nor relying on
PHP to behave in a sane manner with important data. PHP has brought joy to their
life. Does it matter to them that PHP doesn't live up to some standard they
don't need?&lt;/p&gt;
&lt;p&gt;Many developers simply do not care that much about the data that goes through
their website. “So what if PHP doesn't have any decent way of handling decimal
values?  It's close enough for my needs.” For these people not only is the
quality of the &lt;strong&gt;tool&lt;/strong&gt; unimportant, the quality of the &lt;strong&gt;result&lt;/strong&gt; is of
little consequence. No-one will die or sue them even if it has major bugs.&lt;/p&gt;
&lt;p&gt;Also, professionals often have (or feel) obligations to support the code they
write, whereas amateurs do not. Therefore amateurs will always trade
long-term maintainability for initial deployment speed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Defending your day job&lt;/p&gt;
&lt;p&gt;The previous argument doesn't cover a lot of users, however. Many are using PHP
‘professionally’, and even have customers who may demand that work is done with
PHP. (It is perhaps the essential problem of PHP that a language that was
designed to be a simple template language for non-programmers has turned into
the work-horse of the web, and the network effects caused by adoption amongst
amateurs have made it a language for professionals.)&lt;/p&gt;
&lt;p&gt;Now, most people want to feel good about the work they are doing. And most
people are not in a position to have much influence on whether or not they
use PHP. If you tell someone “the tool you are using is Bad For The World”,
they will get defensive, even if they would rather be using something else.&lt;/p&gt;
&lt;p&gt;I'm extremely fortunate in the work I do that I get to choose my projects
carefully, and then charge a high rate for work that I can take real pride
in. It's actually fairly unkind of me to berate people for using a language that
they didn't really choose – even if they think they are choosing it and will
defend it to the death. They are only doing so because the alternative is to say
“yes it sucks, and yes I'm doing the world a disservice by continuing to promote
it, but it pays the bills”. I've occasionally been in a similar situation in the
past, and know how demoralising and depressing it is, and it is more comforting
to rationalise your current situation.&lt;/p&gt;
&lt;p&gt;In fact, when I was last in a situation like this, I did come up with a whole
bunch of rationalisations, which I still think are valid to some degree.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Engineers must be pragmatists at some level. You're paid to achieve things,
not for the internal beauty of your code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I should take pride in customer satisfaction, because that is what matters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The language itself is not the only factor. You also must consider:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;development tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the availability and quality of libraries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;documentation and support for such libraries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;availability of people to maintain the software in the future.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these things depend on human factors and network effects, and can be
used to justify a choice on the basis that “lots of other people are choosing
this”.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(If you are a “PHP developer” reading this, I apologise for being patronising –
this post is not meant to insult, and is not aimed at you but at others).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Experience&lt;/p&gt;
&lt;p&gt;The vast majority of PHP developers that I've come across have not used any
other serious web frameworks. That is why they can say things like &lt;a class="reference external" href="http://fabien.potencier.org/article/64/php-is-much-better-than-what-you-think"&gt;“PHP is the
best web platform... ever.”&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To make such a statement, you need to have &lt;strong&gt;in-depth&lt;/strong&gt; knowledge of a &lt;strong&gt;large&lt;/strong&gt;
number of alternative web platforms – if not &lt;strong&gt;all&lt;/strong&gt; the web platforms in
existence. However, the author did not demonstrate &lt;strong&gt;any&lt;/strong&gt; knowledge of &lt;strong&gt;any&lt;/strong&gt;
alternatives. You are never going to persuade anyone that way. Rather, it will
lead people simply to dismiss your opinions entirely. Being ready to make
pronouncements on subjects for which you clearly don't have a fraction of the
required knowledge is a sign that nothing you say is trustworthy.&lt;/p&gt;
&lt;p&gt;However, the problem goes both ways. Eevee's rant, and Jeff's, both contain
statements that are to some degree out of date (or appear out of date to someone
on the other side who knows the 'standard solution' for the problems
highlighted). The reason for this is obvious. If you feel passionately about how
bad something is, you will stop using it, and your level of knowledge of the
technology will go down. This does lead to a bit of a problem. Those who are
unwilling to work with technology X because of how bad it is can always be
dismissed as being ignorant of it.&lt;/p&gt;
&lt;p&gt;A facet of this problem is that it is extremely easy to &lt;strong&gt;underestimate&lt;/strong&gt;
something that you don't have knowledge of – for instance, you can simply
underestimate the &lt;strong&gt;size&lt;/strong&gt; of a competing community.&lt;/p&gt;
&lt;p&gt;I'll admit I was surprised to learn that PHP has a package manager with &lt;a class="reference external" href="http://packagist.org/statistics"&gt;1900
packages in its main repository&lt;/a&gt;. However,
the &lt;a class="reference external" href="http://fabien.potencier.org/article/64/php-is-much-better-than-what-you-think"&gt;author&lt;/a&gt;
who pointed that out might be more surprised to know some of the following
figures:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;PyPI has 20,000 packages&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;over 2,400 are Django related&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;over 2,000 are Plone related&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;over 800 are Zope3 related&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;over 800 are Zope2 related&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;over 130 are Turbogears related&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Perl&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;CPAN has:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;107,764 Perl modules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;9,827 authors&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let's add &lt;strong&gt;Haskell&lt;/strong&gt;, as an example of a minority language if ever there was
one, and not the first language you'd think of for building web sites:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Hackage has 5,376 packages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;350 in the 'Web' category&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;55 for ‘Yesod’ (one web framework)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I'm sure other languages can boast similar or much better figures. I don't even
know where to look with Java – it's quite possibly so large that the idea of a
central repository doesn't even make sense.&lt;/p&gt;
&lt;p&gt;PHP, despite having huge numbers of developers, looks rather small in
comparison. Now, all of these statistics are flawed in a variety of ways, PHP's
included, and the bigger a community is, the more they will be flawed – for
example, PyPI download stats will often be way out because people are using
mirrors etc – but this doesn't affect the point I'm making.&lt;/p&gt;
&lt;p&gt;The point is that all the communities are large, and these figures are just the
tip of the iceberg in terms of how much is going on in each and every
community. And from within a community, you can see some big figures and think
“well I doubt anyone could seriously be competing with &lt;strong&gt;that&lt;/strong&gt;!”. This is true
no matter what community you belong to. And it makes it difficult to communicate
in a meaningful way. Very few can honestly say that they've evaluated the
alternatives in a fair way, because the alternatives are so huge.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I'm not sure I have a conclusion actually. These are just some of the pitfalls
we face in communicating across different technology communities. I often forget
these things, especially when communicating on the internet, and this post is
here mainly to help me remember!&lt;/p&gt;
&lt;p&gt;Because of the broad spectrum of PHP users, I do think that arguments against
PHP are going to have to be more &lt;strong&gt;individual&lt;/strong&gt; to be effective.&lt;/p&gt;
&lt;p&gt;For example, suppose you have a customer that wants PHP for a website that deals
with money in some way e.g. a shop. I might attempt to shock them by using
&lt;a class="reference external" href="http://www.phpsh.org/"&gt;phpsh&lt;/a&gt; to demonstrate &lt;a class="reference external" href="http://marc.info/?l=php-internals&amp;amp;m=109057070829170"&gt;PHP's inability to get some
basic arithmetic questions correct&lt;/a&gt;. I would explain that
all languages have problems with floating point arithmetic, which is why other
languages have good solutions to this problem e.g. &lt;a class="reference external" href="http://docs.python.org/library/decimal.html"&gt;Python's decimal module&lt;/a&gt;, which makes it easy to get
things right.  PHP has nothing approaching this (and yes I know about BCMath and
GNU Multiple Precision bindings). Rather, PHP's fundamental attitude is to 1)
silently ignore errors, 2) attempt to paper over errors and 3) silently convert
your data even if that means losing information. Is this the language you want to
handle your data?&lt;/p&gt;
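&lt;p&gt;A minimal sketch of what that looks like on the Python side, using only the
standard library. The prices are made up for illustration.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
print(0.1 + 0.2 == 0.3)  # False

# Decimal values built from strings keep the digits exactly as written.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True

# A hypothetical basket: three items at 19.99 each.
price = Decimal("19.99")
total = price * 3
print(total)  # 59.97
```
&lt;/pre&gt;
&lt;p&gt;Building the values from strings rather than floats matters here, because a
float has already lost the exact digits before &lt;code class="docutils literal"&gt;Decimal&lt;/code&gt; ever sees it.&lt;/p&gt;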
&lt;p&gt;For a different situation, I think you'll need a completely different
argument. And for many situations, don't even try if you think your words are
going to come over as insulting, since that will be counter-productive. That
often applies to the internet, and I need to remember that!&lt;/p&gt;
&lt;/section&gt;</content>
    <category term="django" label="Django"/>
    <category term="php" label="PHP"/>
    <category term="python" label="Python"/>
  </entry>
  <entry>
    <title>PHP View-Controller</title>
    <id>https://lukeplant.me.uk/blog/posts/php-view-controller/</id>
    <updated>2005-02-16T22:45:51Z</updated>
    <published>2005-02-16T22:45:51Z</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/php-view-controller/"/>
    <summary type="html">&lt;p&gt;Description of my PHP based model view controller implementation.&lt;/p&gt;</summary>
    <content type="html">&lt;blockquote&gt;
&lt;p&gt;2017 update: This is a really old post. I no longer think that PHP is
appropriate for web development, especially plain PHP templates because they
do not auto-escape by default.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My new blog software uses a PHP template engine for its presentation layer.
Because the templates are just plain PHP, you can put any logic in them that you
like. The point here is that &lt;strong&gt;presentation requires logic&lt;/strong&gt;, and if you limit
the logic that your presentation layer can do, you have to make up that deficit
somewhere else. For example, the template engine can use functions like PHP's
&lt;a class="reference external" href="http://uk.php.net/manual/en/function.date.php"&gt;date()&lt;/a&gt; function to format
dates that are to be displayed. If you didn't use PHP for the template engine,
you would either have to pre-format the dates before passing them to the
template, or implement some method in your own template engine for providing
this functionality, which in both cases adds to code bloat and in the former
makes your code much less clean.&lt;/p&gt;
&lt;p&gt;A template engine which is just PHP makes it possible to implement 'Model View
Controller', or something like it. In this case, the 'View' is your template.
The template is 90% HTML, and has PHP embedded in it whenever required. As
described in the article about the &lt;a class="reference external" href="http://www.massassi.com/php/articles/template_engines/"&gt;template engine&lt;/a&gt; I'm using, the syntax
for doing this is pretty light (and not much more complex than most other
template engines).&lt;/p&gt;
&lt;p&gt;I've implemented a simple controller system that ties in with the templates, and
here is a brief description of how it works. The controller is the object that
handles the incoming web requests. In my implementation, you can have
sub-controllers to which the request can be handed off if needed - this works
very well with the template system, because templates can be very easily nested.&lt;/p&gt;
&lt;p&gt;The controller is responsible for checking any input and deciding what to do.
It will usually work in conjunction with one main template, and it does two
things for the template:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;It has to get any data from the database that the template needs. This is
passed to the template with a call like &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;$template-&amp;gt;set('bloglist',&lt;/span&gt;
$list);&lt;/code&gt;, and since the template can do advanced formatting, the controller
passes the data fairly 'raw'. In my code, this happens in the function
&lt;code class="docutils literal"&gt;handleRequest()&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It provides functions that the template will need to generate its output. For
example, the 'bloglist' template needs to be able to generate permalinks, but
the knowledge of how to do this lies with the controller. So the controller
has a function &lt;code class="docutils literal"&gt;permalink()&lt;/code&gt; which accepts a post ID as a parameter, and
returns a URL.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;How exactly the controller and template are wired together, and by whom, is
flexible - for example a controller could be passed a template to attach to, or
could choose the template for itself. The strategy will often depend on whether
the controller is the main one for the page, or a sub-controller. If the
template needs 'callback' functions from a controller, then at some point
&lt;code class="docutils literal"&gt;setController()&lt;/code&gt; needs to be called on the template, passing a reference to
the controller.&lt;/p&gt;
&lt;p&gt;After all the controllers are done with processing, the &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt; function on
the main template is called. The first thing the template does is call
&lt;code class="docutils literal"&gt;fetch()&lt;/code&gt; on any embedded template variables, replacing the nested template
with its string output. It then executes the template file (which is just a case
of &lt;code class="docutils literal"&gt;include()&lt;/code&gt; ing it), and returns the output from this as a string. The
string (hopefully now a complete web page or RSS feed) is then usually echoed
back to the web browser.&lt;/p&gt;
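&lt;p&gt;The flow described above can be sketched roughly as follows. This is a
Python analogue written for illustration, not the PHP code from the blog: plain
functions stand in for the PHP template files, the data is hard-coded instead
of coming from a database, and the class and function names are assumptions
(the methods mirror &lt;code class="docutils literal"&gt;set()&lt;/code&gt;, &lt;code class="docutils literal"&gt;setController()&lt;/code&gt;, &lt;code class="docutils literal"&gt;handleRequest()&lt;/code&gt;,
&lt;code class="docutils literal"&gt;permalink()&lt;/code&gt; and &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt; in Python naming style).&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
class Template:
    """Holds variables plus a render function standing in for a template file."""

    def __init__(self, render):
        self.render = render  # callable(variables, controller) returning str
        self.variables = {}
        self.controller = None

    def set(self, name, value):
        self.variables[name] = value

    def set_controller(self, controller):
        self.controller = controller

    def fetch(self):
        # Nested templates are replaced by their string output first.
        resolved = {
            name: value.fetch() if isinstance(value, Template) else value
            for name, value in self.variables.items()
        }
        return self.render(resolved, self.controller)


class BlogController:
    def __init__(self, template):
        self.template = template
        template.set_controller(self)

    def handle_request(self):
        # Stand-in for a database query; the data is passed fairly raw.
        self.template.set("bloglist", [(1, "First post"), (2, "Second post")])

    def permalink(self, post_id):
        # Only the controller knows how URLs are built.
        return "/blog/posts/{}/".format(post_id)


def render_bloglist(variables, controller):
    return "\n".join(
        "{} ({})".format(title, controller.permalink(post_id))
        for post_id, title in variables["bloglist"]
    )


def render_page(variables, controller):
    return "My blog\n" + variables["content"]


inner = Template(render_bloglist)
BlogController(inner).handle_request()

page = Template(render_page)
page.set("content", inner)  # nesting: a template used as a variable
print(page.fetch())
```
&lt;/pre&gt;
&lt;p&gt;Nothing is printed until the final &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt;, which is what leaves the
controllers free to change course before any output exists.&lt;/p&gt;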
&lt;p&gt;A diagram is shown below, that shows the case with just a single controller and
template. Note that the instantiation start lines are dotted because the order in
which the two are instantiated is flexible. Also, whether the main controller calls
the &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt; method on its template, or whether that's called from outside,
is flexible too so that's also represented as 'fuzzy'. With nested templates
and/or controllers, the basic flow is just the same, and the first &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt;
is done when all the controllers are done with their work.&lt;/p&gt;
&lt;img alt="PHP View controller diagram" class="align-center" src="https://lukeplant.me.uk/blogmedia/phpviewcontroller.png"&gt;
&lt;p&gt;In some cases, whether the controller should 'push' data to the template or the
template should get it from the controller using functions is debatable too -
you have to use your judgement.&lt;/p&gt;
&lt;p&gt;The output of all this is a string, which gives you amazing flexibility. It
means that you can use one of these templates in any other PHP page which isn't
based on this template system - just echo the output of &lt;code class="docutils literal"&gt;fetch()&lt;/code&gt; at the
appropriate place. You could also save the output to disk, or do string
replacements, or HTML validation, or e-mail it to the user as well etc. etc.&lt;/p&gt;
&lt;p&gt;One of the things I've stressed here is the flexibility of the system, and the
low cost barrier - an include that is only 40 lines of PHP, and uses normal PHP
for templating, isn't going to break the performance bank, nor does it force you
into a certain way of working. Also, because (with the normal way of using them)
nothing is output until everything is calculated, the controllers can change
their mind at any point and, for example, redirect you to a different page, or
send cookies and headers at any time.&lt;/p&gt;
&lt;p&gt;Another advantage of this method is simplicity - the lifecycle of a web request
is in just two distinct phases - process input, create output - and following it
through is very easy. These phases correspond to the application logic and
presentation logic parts of the process respectively, and the code for these two
is separated physically as well. Compare this to the &lt;a class="reference external" href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconControlExecutionLifecycle.asp"&gt;Control Execution
Lifecycle&lt;/a&gt;
in ASP.NET that you have to understand to develop anything complex. The 'event'
driven model might seem to have some advantages, but it brings a multitude of
entry points to your code, and working out what order events happen in
becomes a total nightmare, and brings all the disadvantages and complexities of
Windows-like programming into what is essentially a very simple process.&lt;/p&gt;
&lt;p&gt;Hopefully that helps you to understand the basic system. I have more examples to
come of how this works in practice, and some other techniques you can easily add
(such as simple form persistence).&lt;/p&gt;
&lt;p&gt;In conclusion, I believe this is a great way to develop web pages. The only time
you might have difficulties is if you have a team of web designers who really
don't want to see a jot of PHP. My templates freely mix PHP and HTML, provided
the PHP is for presentation logic only. This usually means it is pretty
simple PHP, and which bits are HTML and which are PHP is very obvious, but
sometimes that won't be enough to keep designers happy.&lt;/p&gt;</content>
    <category term="php" label="PHP"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>My blog code</title>
    <id>https://lukeplant.me.uk/blog/posts/my-blog-code/</id>
    <updated>2005-02-16T13:15:21Z</updated>
    <published>2005-02-16T13:15:21Z</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/my-blog-code/"/>
    <summary type="html">&lt;p&gt;Source code for my new blog is now available&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;If anyone's interested, you can get the &lt;a class="reference external" href="https://lukeplant.me.uk/downloads/lukesblogcode-2005-02-15.tar.gz"&gt;code for my blog&lt;/a&gt; here. I've put it together pretty
quickly - I've just packaged everything from my own directories that you need
for it to work without error, plus a README. If you actually wanted to use it
you'd need to replace at least the main page template and customise the others
and the CSS. You could remove some things completely (like my blogroll and
related functions).&lt;/p&gt;
&lt;p&gt;It's licensed under the MIT license, so you can do pretty much anything with it.
As it is, it might be useful for anyone with some PHP knowledge who wants a
simple but powerful blog. It's got admin functions for managing posts, comments
and categories (quite basic looking at the moment, but they do the job). The
design is fairly clean so adding features to it should be pretty easy.&lt;/p&gt;
&lt;p&gt;I'll try and add some more code comments (which are rather lacking at the
moment), and I'll also be doing a few quick articles on the methods I've used,
which are a more important product of what I've learnt than the blog software
itself.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Update: See &lt;a class="reference external" href="https://lukeplant.me.uk/blog/posts/my-blog-code/websitecode.html"&gt;Website code&lt;/a&gt; for a new version of the code.&lt;/p&gt;</content>
    <category term="php" label="PHP"/>
    <category term="software-development" label="Software development"/>
    <category term="software-projects" label="Software projects"/>
  </entry>
  <entry>
    <title>New blog system</title>
    <id>https://lukeplant.me.uk/blog/posts/new-blog-system/</id>
    <updated>2005-02-13T00:14:42Z</updated>
    <published>2005-02-13T00:14:42Z</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/new-blog-system/"/>
    <summary type="html">&lt;p&gt;I finished my new blog software, back to actually writing some blog entries...&lt;/p&gt;</summary>
    <content type="html">&lt;blockquote&gt;
&lt;p&gt;2017 update: This is a really old post. This blog no longer uses the PHP
code described, and I no longer think that PHP is appropriate for web
development, especially plain PHP templates because they do not auto-escape
by default.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've managed to complete my new blogging software, and I'm rather pleased
with myself about it too.&lt;/p&gt;
&lt;p&gt;The web pages may look fairly similar, but under the hood it's completely brand
new. One new feature is a much better categories system - have a look at the
side bar - and a proper template engine, which is reflected in the fact that the
title of the page changes on the pages for individual items, amongst other
things. Also, for those of you using the RSS feed, have a look at my &lt;a class="reference external" href="https://lukeplant.me.uk/feeds.php"&gt;feeds&lt;/a&gt; page where you can customise your feed to posts that are of
interest to you. The comment forms now also have a 'Preview' button.&lt;/p&gt;
&lt;p&gt;The lack of filtering by category was one thing that was holding me back a bit -
I'm aware that posting about software and computers half the time is going to
make this fairly boring to most of the Christians who might read this blog. But
now you can filter that out yourself!&lt;/p&gt;
&lt;p&gt;I was pleased with how quickly I was able to develop this lot, and with the
quality of the resulting code – it took me about 5 evenings, and then about
half of today to add basic admin functions.&lt;/p&gt;
&lt;p&gt;My method is based around the &lt;a class="reference external" href="http://www.massassi.com/php/articles/template_engines/"&gt;excellent PHP Template Engine&lt;/a&gt; I mentioned
before. Did I mention that it is the &lt;a class="reference external" href="http://www.massassi.com/php/articles/template_engines/"&gt;best PHP Template Engine there is&lt;/a&gt;? It's &lt;a class="reference external" href="http://www.massassi.com/php/articles/template_engines/"&gt;really
great&lt;/a&gt; (that's for
Google). In total I've got around 1600 lines of code, but that includes the
template engine (which is only about 40 lines), admin functions for adding,
listing, deleting and editing posts, a very simple but powerful flatfile
database class I had to write at the same time, functions for getting cached
remote files (used for my blogrolls), the RSS feed, the RSS feed picker, and
a trackback server (pinched mainly from WordPress), as well as the front end
stuff. I've also implemented a kind of MVC (&lt;a class="reference external" href="http://c2.com/cgi/wiki?ModelViewController"&gt;Model View Controller&lt;/a&gt;) system, and I have a complete
data access layer (though one which is as simple as it can be - due to PHP
data structures my data model is reduced to a list of constants). I've loved
how flexible the template engine stuff is too - you can use it for as much or
as little of any page as you want.&lt;/p&gt;
&lt;p&gt;Lots of things came together to make this project firstly very quick, and secondly
something I'm proud of (can you tell?!): first was the template engine, and
realising that the goal is separation of presentation and 'business' logic, NOT
separation of PHP and HTML (or separation of declarative and imperative code, to
put it more generically); second, keeping in mind lots of the &lt;a class="reference external" href="http://directory.google.com/Top/Computers/Programming/Methodologies/Object-Oriented/Criticism/"&gt;criticisms of OOP&lt;/a&gt; I've been reading, especially &lt;a class="reference external" href="http://www.cs.loyola.edu/~binkley/772/articles/oopbad.htm"&gt;Object Oriented
Programming Oversold&lt;/a&gt; and related
articles; third, and ironically, being more aware of MVC and other design
patterns; fourth, having a better grasp of how to design databases. The
resulting code tries to use a variety of paradigms &lt;strong&gt;as and when they are
useful&lt;/strong&gt;. It also uses some techniques and ideas I've not seen before, which at
first seemed like a bit of a hack, but after reading some more of the stuff
about &lt;a class="reference external" href="http://www.geocities.com/tablizer/top.htm"&gt;Table Oriented Programming&lt;/a&gt;
I've now decided are a deliberate move!&lt;/p&gt;
&lt;p&gt;The code still isn't perfect (or complete - I haven't done bulk moderation of
comments yet, or an admin front end for adding categories), and it hasn't totally
eliminated in all areas the tedious mapping of database fields to user interface
fields, which was something I was aiming for. There are also things I'm still
working out and experimenting with, but I've got a pretty good basis for further
PHP apps.&lt;/p&gt;
&lt;p&gt;One nice thing I've implemented is separation of the presentation and control of
forms such as the comments form. This will enable me to make it fairly spam
proof in the future, which is fast becoming a priority.&lt;/p&gt;
&lt;p&gt;I'll be writing up separate entries on the range of things I've learnt with this
project - it has been frustrating not having the software complete enough to use
it for blogging for this past week!&lt;/p&gt;</content>
    <category term="blogging-and-bloggers" label="Blogging and bloggers"/>
    <category term="lukeplantmeuk" label="lukeplant.me.uk"/>
    <category term="php" label="PHP"/>
    <category term="software-development" label="Software development"/>
    <category term="web-development" label="Web development"/>
  </entry>
</feed>
