<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="https://lukeplant.me.uk/assets/xml/atom.xsl" type="text/xsl" media="all"?>
<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom">
  <title>Luke Plant's home page (Posts about Security)</title>
  <id>https://lukeplant.me.uk/blog/categories/security.xml</id>
  <updated>2025-11-11T13:59:27Z</updated>
  <author>
    <name>Luke Plant</name>
  </author>
  <link rel="self" type="application/atom+xml" href="https://lukeplant.me.uk/blog/categories/security.xml"/>
  <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/categories/security/"/>
  <generator uri="https://getnikola.com/">Nikola</generator>
  <entry>
    <title>6 digit OTP for Two Factor Auth (2FA) is brute-forceable in 3 days</title>
    <id>https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/</id>
    <updated>2019-05-11T11:50:00+01:00</updated>
    <published>2019-05-11T11:50:00+01:00</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/"/>
    <summary type="html">&lt;p&gt;OTP/TOTP for two factor auth (2FA/MFA) is very easy to misunderstand and implement insecurely&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;It is common these days to use “TOTP” as an additional factor in 2FA (Two Factor
Auth) / &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Multi-factor_authentication"&gt;multi-factor auth&lt;/a&gt;. If you have used
&lt;a class="reference external" href="https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&amp;amp;hl=en_us"&gt;Google Authenticator&lt;/a&gt;
to log in to a site (you can do this with GitHub, for example), then you have
used it, and many other apps and sites use the same scheme, and some SMS based
2FA systems may be based on the same concept. TOTP stands for Time-Based
One-Time Password, and is specified in &lt;a class="reference external" href="https://tools.ietf.org/html/rfc6238"&gt;RFC 6238&lt;/a&gt;. It is based on HOTP, &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4226"&gt;HMAC-Based
One-Time Password&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What the RFC for TOTP does not mention at all, and the RFC for HOTP &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4226#section-7.3"&gt;mentions
with very little detail&lt;/a&gt;, is
that &lt;strong&gt;the security of these methods depends critically on how you throttle
request attempts and/or lock user accounts for repeated failures.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some systems already have adequate throttling/locking in place, but some
certainly don't, and this post is aimed at the latter. Getting the throttling
right can be quite tricky too.&lt;/p&gt;
&lt;p&gt;(I should mention that this post is not really original. The insight here I got
from &lt;a class="reference external" href="https://sakurity.com/blog/2015/07/18/2fa.html"&gt;Why You don't Need 2 Factor Authentication&lt;/a&gt;, I am just presenting part of
that page in a more detailed way, doing the maths for you, and discussing the
consequences, without necessarily agreeing with the conclusions of that page).&lt;/p&gt;
&lt;p&gt;To put it simply, with conservative assumptions and common defaults, without
account locking (or something similar) an attacker can brute-force a TOTP
password &lt;strong&gt;in just 3 days&lt;/strong&gt;. In fact quite a bit faster is often possible.&lt;/p&gt;
&lt;p&gt;The attack scenario here is that you have set up 2FA using Google Authenticator
(or similar), and an attacker already has your username and password. After
getting past the username/password dance they are presented with a screen asking
for an OTP.&lt;/p&gt;
&lt;p&gt;(If you had set up SMS instead, you will at least get an unexpected text that
will alert you that someone has your password, but not with Google
Authenticator.)&lt;/p&gt;
&lt;p&gt;The whole point of 2FA is that it is supposed to stop an attacker from getting
any further. For a high value account, a motivated attacker &lt;a class="reference external" href="https://www.kennethreitz.org/essays/on-cybersecurity-and-being-targeted"&gt;can and will&lt;/a&gt;
continue at this point. (And if you don't consider your accounts high value, why
are you bothering with 2FA?).&lt;/p&gt;
&lt;p&gt;Now the attacker has to try to guess your OTP. How likely is that to succeed?
Well, Google Authenticator provides a 6 digit code, giving one million
possibilities, and it has a 30 second timeout. Let's assume the attacker can
make 10 requests per second. (This is completely reasonable in many scenarios,
and significantly higher might be possible). Since we don't have time to try all
the possibilities, the chance of success is (30 × 10)/1000000 = 0.0003 = 0.03%,
which seems pretty good. Right? Wrong.&lt;/p&gt;
&lt;p&gt;We must remember that an attacker does not need to have a 100% guarantee of
success to attempt something. An attacker will try it if they think they have a
'good' chance of success. Let's assume that is 90%.&lt;/p&gt;
&lt;p&gt;Without a timeout, the time to get to 90% chance of success is 0.9 × 1000000 /
10 requests/second = 90000 seconds = 1.04 days.&lt;/p&gt;
&lt;p&gt;Now we add a timeout, say an hour, or 90 seconds, or whatever. What happens
when the first code times out? According to the TOTP scheme, &lt;strong&gt;you can just
try the next one&lt;/strong&gt;. The timeout therefore stops an attacker from being able to
try all the possibilities, and rules out a 100% effective attack. But they don't
care about that, they just care about having a &lt;em&gt;good&lt;/em&gt; chance of success.&lt;/p&gt;
&lt;p&gt;Guessing randomly is pretty much our best strategy now. The probabilities look
like this:&lt;/p&gt;
&lt;div class="math"&gt;
\begin{equation*}
chanceOfSuccess = 1 - chanceOfFailure
\end{equation*}
&lt;/div&gt;
&lt;div class="math"&gt;
\begin{equation*}
chanceOfFailure = {chanceOfFailingOnce}^{numberOfAttempts}
\end{equation*}
&lt;/div&gt;
&lt;p&gt;This last step is a critical part – if you succeed once, you succeed, so you
have to fail &lt;strong&gt;every time&lt;/strong&gt; to fail overall. The chance of failing N times is
the chance of failing the first time, times the chance of failing the second
time.... etc. times the chance of failing the Nth time.&lt;/p&gt;
&lt;div class="math"&gt;
\begin{equation*}
chanceOfFailingOnce = 1 - \frac{1}{numberOfPossibilities}
\end{equation*}
&lt;/div&gt;
&lt;div class="math"&gt;
\begin{equation*}
numberOfAttempts = timeInSeconds \times requestsPerSecond
\end{equation*}
&lt;/div&gt;
&lt;p&gt;Re-arranging for &lt;span class="math"&gt;\(timeInSeconds\)&lt;/span&gt; and substituting:&lt;/p&gt;
&lt;div class="math"&gt;
\begin{equation*}
timeInSeconds = \frac{ln(1 - chanceOfSuccess)}{ln(1 - \frac{1}{numberOfPossibilities}) \times requestsPerSecond}
\end{equation*}
&lt;/div&gt;
&lt;p&gt;Putting in &lt;span class="math"&gt;\(chanceOfSuccess = 0.9\)&lt;/span&gt;, &lt;span class="math"&gt;\(numberOfPossibilities =
1000000\)&lt;/span&gt; (6 digit code) and 10 requests per second we get 230258 seconds or 2.67
days. (To check, put that number of seconds back into the formulas and you'll
see the probability of success does come out at 90%).&lt;/p&gt;
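&lt;p&gt;As a rough check, the formulas above can be put into a few lines of Python. This is only a sketch: the function names are my own for illustration, and the inputs are the same assumptions as in the text (a 6 digit code, 10 requests per second, a target of 90%).&lt;/p&gt;

```python
import math


def brute_force_seconds(chance_of_success, number_of_possibilities, requests_per_second):
    """Time needed for random guessing to reach the given chance of success."""
    chance_of_failing_once = 1 - 1 / number_of_possibilities
    # Solve chance_of_failing_once ** attempts == 1 - chance_of_success
    number_of_attempts = math.log(1 - chance_of_success) / math.log(chance_of_failing_once)
    return number_of_attempts / requests_per_second


def chance_of_success_after(seconds, number_of_possibilities, requests_per_second):
    """The same formulas run forwards, to check the answer."""
    number_of_attempts = seconds * requests_per_second
    return 1 - (1 - 1 / number_of_possibilities) ** number_of_attempts


seconds = brute_force_seconds(0.9, 1_000_000, 10)
print(round(seconds), "seconds, or", round(seconds / 86400, 2), "days")
print("check:", round(chance_of_success_after(seconds, 1_000_000, 10), 3))
```

&lt;p&gt;Note that the code timeout is not a parameter of either function, which is the point made in Note 1.&lt;/p&gt;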
&lt;p&gt;&lt;strong&gt;Note 1&lt;/strong&gt;: &lt;strong&gt;The timeout does not appear in that formula!&lt;/strong&gt;
&lt;a class="brackets" href="https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/#timeoutinformulae" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; Reducing your timeout could make a big difference to
usability, but makes &lt;strong&gt;zero difference to security&lt;/strong&gt;. That may be
counter-intuitive, but consider:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;Reducing the timeout from infinity to a few minutes only increased the attack
time from 1 day to 2.67 days (aiming for 90% success rate). Clearly the
timeout isn't that critical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Say you are thinking of a number between 1 and 10,000 and give me one
thousand attempts to guess it. To make it harder, you change the number every
100 guesses. Now, to make it harder still, you are thinking of changing it
every 50 guesses. Would that work? Well, in the first case I get 100 guesses
at 10 different numbers, in the second I get 50 guesses at 20 different
numbers, but that makes no (practical) difference – I get the same number of
guesses, all of them unlikely whether I guess randomly or in sequence, and I
only have to guess correctly once to succeed. Mathematically, it boils down
to the fact that &lt;span class="math"&gt;\({(x^a)}^b = x^{ab}\)&lt;/span&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
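&lt;p&gt;The number guessing example can be checked numerically. The sketch below (the helper name is mine, purely for illustration) computes the chance that every guess fails, for both random guessing and for guessing in sequence within each round.&lt;/p&gt;

```python
def chance_all_rounds_fail(possibilities, guesses_per_round, rounds, sequential):
    """Chance of never guessing correctly when the secret changes every round."""
    if sequential:
        # Distinct guesses within one round: each one rules out a different number.
        fail_one_round = 1 - guesses_per_round / possibilities
    else:
        # Independent random guesses: every guess fails with the same probability.
        fail_one_round = (1 - 1 / possibilities) ** guesses_per_round
    return fail_one_round ** rounds


# 1000 guesses at a number between 1 and 10,000, changed every 100 or every 50 guesses.
for label, per_round, rounds in [("every 100", 100, 10), ("every 50", 50, 20)]:
    random_fail = chance_all_rounds_fail(10_000, per_round, rounds, sequential=False)
    sequential_fail = chance_all_rounds_fail(10_000, per_round, rounds, sequential=True)
    print(label, "random:", round(random_fail, 4), "sequential:", round(sequential_fail, 4))
```

&lt;p&gt;With random guessing the two cases are identical, and with sequential guessing they differ only slightly, so changing the number more often buys almost nothing.&lt;/p&gt;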
&lt;p&gt;Security is often counter-intuitive, and some security policies can often be
nothing more than &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Security_theater"&gt;security theatre&lt;/a&gt;. Timeouts are a common target
for “tightening” measures, because they seem to be easily understandable by the
lay-person.&lt;/p&gt;
&lt;p&gt;A while back there was a Django ticket filed that &lt;a class="reference external" href="https://code.djangoproject.com/ticket/28622"&gt;asked for the ability to
reduce the password reset timeout to less than 1 day&lt;/a&gt;, because “In many applications a
day is far too long and doesn't meet security requirements”. I explained that
due to the way our password reset is implemented (very differently from
HOTP/TOTP), changing the timeout makes precisely &lt;a class="reference external" href="https://groups.google.com/forum/#!msg/django-developers/65iOQunvkPY/pP5mF-44AQAJ"&gt;zero difference&lt;/a&gt;
to the ability of an attacker to brute force, and with no timeout at all, or
throttling, the mechanism is many millions of times stronger than many of the
mechanisms that do indeed need timeouts for security. But I don't think anyone
believed me; a short timeout just seems better.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note 2&lt;/strong&gt;: 3 days is not very long, and entirely feasible for many attackers.
If you don't have mitigation measures in place, your 2FA is broken.&lt;/p&gt;
&lt;p&gt;In fact, TOTP also has a tolerance factor to allow for delay in transmission,
that allows &lt;code class="docutils literal"&gt;n&lt;/code&gt; previous tokens to be used, with a recommended default of &lt;code class="docutils literal"&gt;n ==
1&lt;/code&gt;. This effectively doubles your request rate (you are guessing two numbers at
once, either will count), reducing the time required to &lt;strong&gt;less than 36 hours&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you go for even odds (50%) rather than 90%, it comes down even further. Using
an out-of-the-box installation of &lt;a class="reference external" href="https://github.com/Bouke/django-two-factor-auth"&gt;django-two-factor-auth&lt;/a&gt;, which builds on &lt;a class="reference external" href="https://github.com/django-otp/django-otp/"&gt;django-otp&lt;/a&gt;, on my development machine I was
able to get 20 requests/sec for the 2FA handler without trying hard. I set up a
Google Authenticator device for an account and achieved a brute-force in &lt;strong&gt;under
8 hours&lt;/strong&gt;. An attacker could start after you went to bed and might be done by
the time you were out of the shower.&lt;/p&gt;
&lt;p&gt;One mitigation technique is attempt throttling. However, this has got to be done
carefully. It clearly can't be done globally for the 2FA handlers or you could
easily &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Denial-of-service_attack"&gt;DOS&lt;/a&gt; yourself.
It has to be done carefully on a per-user basis, which takes up storage and
would be easy to get wrong. Doing it by IP address also doesn't work well —
attackers can easily hire a large number of IP addresses these days. Even if we
reduced the number of attempts per second by a factor of 10, that attack time
would only go up to 30 days, which would still be worthwhile for some attackers.&lt;/p&gt;
&lt;p&gt;An alternative is account locking after a number of failures, which is much
better. However it also brings problems. It means that your 2FA must only be
accessible for people who already have passed one level of security, otherwise
you have a denial of service vulnerability. Plus you need all the account
unlocking procedures etc, and need to make sure they are secure, and not
actually effectively another attack route.&lt;/p&gt;
&lt;p&gt;Another option is to use some kind of back-off for failure attempts, which is
what the HOTP RFC recommends briefly. For example you could use exponential
back-off – you add enforced delays for attempting a token check, requiring the
user to wait 1, 2, 4, 8, 16s etc. after each successive failure. This has a
number of advantages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;it doesn't slow down a legitimate user who just mis-typed once or twice&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it requires very little storage - just a single database field per user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it is highly effective in terms of throttling, to the point of being a kind of
account locking &lt;a class="brackets" href="https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/#backoffeffectiveness" id="footnote-reference-2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;accompanied with appropriate error messages (“You've been temporarily blocked
due to X successive failures”) it could alert the real account owner to the
presence of an attacker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the soft “account locking” automatically expires, rather than requiring manual
intervention of any kind, so that you don't get DOS for the case of
unresponsive support staff.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
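&lt;p&gt;To make the idea concrete, here is a minimal sketch of exponential back-off around a token check. It is not the django-otp implementation: the &lt;code class="docutils literal"&gt;BackoffState&lt;/code&gt; record and both function names are hypothetical, and for clarity it keeps a failure count and the time of the last failure for each user.&lt;/p&gt;

```python
import hmac
from dataclasses import dataclass


@dataclass
class BackoffState:
    """Hypothetical per-user throttling record (illustration only)."""
    failure_count: int = 0
    last_failure_at: float = 0.0


def seconds_until_allowed(state, now):
    """Remaining enforced delay: 1s after the first failure, then 2s, 4s, 8s..."""
    if state.failure_count == 0:
        return 0.0
    delay = 2.0 ** (state.failure_count - 1)
    return max(0.0, state.last_failure_at + delay - now)


def check_token(state, submitted, expected, now):
    """Return "throttled", "ok" or "wrong", updating the state as needed."""
    if seconds_until_allowed(state, now) != 0.0:
        # Still inside the back-off window: refuse without checking the token.
        return "throttled"
    if hmac.compare_digest(submitted, expected):
        state.failure_count = 0  # a successful check resets the back-off
        return "ok"
    state.failure_count += 1
    state.last_failure_at = now
    return "wrong"
```

&lt;p&gt;In this sketch a throttled attempt does not count as a further failure, and &lt;code class="docutils literal"&gt;now&lt;/code&gt; is passed in as a number of seconds so the behaviour is easy to test. A real implementation would also need to update the record atomically.&lt;/p&gt;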
&lt;p&gt;(For the curious/worried regarding django-two-factor-auth/django-otp, a few
weeks ago I implemented exactly this for the HOTP and TOTP backends in
django-otp, and the fix is available in version &lt;a class="reference external" href="https://pypi.org/project/django-otp/"&gt;0.6.0&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;SMS-based 2FA may or may not be better than TOTP, depending on how they are
implemented, throttled etc. – some SMS systems just use TOTP and use an SMS
message to send the current token, in which case they are equivalent.&lt;/p&gt;
&lt;p&gt;SMS does have the advantage that at least the genuine account holder will
probably realise that something is going on (although that isn't how the
security of 2FA is supposed to work primarily). However, SMS does also have &lt;a class="reference external" href="https://authy.com/blog/security-of-sms-for-2fa-what-are-your-options/"&gt;a
ton of problems&lt;/a&gt;, so
maybe it's not any better overall.&lt;/p&gt;
&lt;p&gt;Increasing the length of the token &lt;strong&gt;does&lt;/strong&gt; help, as does increasing the
alphabet of characters used (although apparently that may have usability issues
on phones). Every factor of 10 in the number of possibilities for the token
results in a factor of 10 in the time required to brute force. But some apps
(e.g. Google Authenticator) only support 6 digit tokens.&lt;/p&gt;
&lt;p&gt;Given the move towards 2FA, the disappointing thing is how little info there is
about this. I found &lt;a class="reference external" href="https://security.stackexchange.com/questions/145604/best-practices-for-handling-wrong-totp-tokens/145606"&gt;this stackexchange question&lt;/a&gt;
complete with some misguided answers, and some good advice, but little by way of
rigorous best practice.&lt;/p&gt;
&lt;p&gt;The RFCs have plenty of details about the crypto and algorithms, possibly
because those are pretty easy to define and implement in a small chunk of code,
but little on the other security critical requirements which are much harder to
pin down. This is a security problem in itself – programmers are attracted to
something like TOTP because it looks like a properly thought out, defined
solution, and the core of it is a nice programming exercise. You can get it
‘working’ relatively easily – but without the critical and more fiddly parts
that need to be in place.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;[Update 2019-06-01 - added footnote calculation showing effective rate for
exponential backoff throttling]&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="backoffeffectiveness" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/#footnote-reference-2"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;Exponential back-off increases very rapidly. The limiting
factor for an attacker is how often the throttling can get reset by a
successful attempt. Let's assume everything is in the attacker's favour:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;The legitimate user does a successful 2FA login every day, at a predictable
time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The attacker times their guesses so that the backoff time expires just
before this point, so the legitimate user doesn't see an error message and
just enters the 2FA code successfully, resetting the backoff for the
attacker.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since we are starting with a 1 second delay and doubling for each failure,
this gives the attacker approx &lt;span class="math"&gt;\(log_2(secondsInADay)\)&lt;/span&gt; &lt;span class="math"&gt;\(= log_2(60 \times 60 \times 24)\)&lt;/span&gt;
&lt;span class="math"&gt;\(\approx  16\)&lt;/span&gt; attempts per day.&lt;/p&gt;
&lt;p&gt;This is an effective requests per second of &lt;span class="math"&gt;\(0.000185\)&lt;/span&gt;. At this rate,
using our formula above (and accounting for the TOTP tolerance factor which
doubles your effective rate, as above), it would require 59 years to get to
even odds of achieving a brute force on a 6 digit token, which is probably
OK.&lt;/p&gt;
&lt;p&gt;By contrast, if we go with straight “N requests per second” type rate
limiting, for the sake of usability for legitimate users you probably
wouldn't want to throttle more aggressively than 1 request every 10 seconds
per user i.e. 0.1 requests per second. In this scenario it takes just 40 days
to get to even odds of a brute force, which is certainly realistic for some
attackers.&lt;/p&gt;
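&lt;p&gt;The figures in this footnote can be reproduced with a short calculation. This is a sketch under the same assumptions as above (even odds, a 6 digit token, the tolerance factor doubling the effective rate); the function name is only for illustration.&lt;/p&gt;

```python
import math

SECONDS_PER_DAY = 60 * 60 * 24


def seconds_to_reach(chance_of_success, possibilities, effective_rate):
    """Time for random guessing to reach the given chance of success."""
    attempts = math.log(1 - chance_of_success) / math.log1p(-1 / possibilities)
    return attempts / effective_rate


# Delays of 1, 2, 4, ... seconds: about log2(seconds in a day) attempts fit in a day.
backoff_attempts_per_day = math.floor(math.log2(SECONDS_PER_DAY))
backoff_rate = backoff_attempts_per_day / SECONDS_PER_DAY

# The factor of 2 accounts for the previous token also being accepted.
backoff_years = seconds_to_reach(0.5, 10**6, 2 * backoff_rate) / (365.25 * SECONDS_PER_DAY)
fixed_rate_days = seconds_to_reach(0.5, 10**6, 2 * 0.1) / SECONDS_PER_DAY

print("back-off:", backoff_attempts_per_day, "attempts/day,", round(backoff_years), "years")
print("0.1 requests/second:", round(fixed_rate_days), "days")
```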
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="timeoutinformulae" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/6-digit-otp-for-two-factor-auth-is-brute-forceable-in-3-days/#footnote-reference-1"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;The timeout would be important if we used the strategy
of checking every number in order, because in that case the probability for
each guess is no longer independent. The above analysis applies for random
guesses, but with a fairly low timeout, in which the attacker tries only a
fairly small fraction of the possible codes before the code changes, random
and sequential are effectively the same.&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;</content>
    <category term="django" label="Django"/>
    <category term="security" label="Security"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>A simple password-less, email-only login system</title>
    <id>https://lukeplant.me.uk/blog/posts/a-simple-passwordless-email-only-login-system/</id>
    <updated>2016-07-17T16:52:10+01:00</updated>
    <published>2016-07-17T16:52:10+01:00</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/a-simple-passwordless-email-only-login-system/"/>
    <summary type="html">&lt;p&gt;A simple password-less login system to consider for some use cases, with Django code.&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;This post is about a simple password-less login system I created for one web
site which can be useful in some use cases. I’ll describe the basic process, the
rationale, and the advantages and disadvantages of the system. Then I’ll outline
some implementation considerations, and link to my source code which implements
it.&lt;/p&gt;
&lt;section id="outline"&gt;
&lt;h2&gt;Outline&lt;/h2&gt;
&lt;p&gt;The authentication system is simply this:&lt;/p&gt;
&lt;p&gt;To log in, a user enters their email address. The web site sends them an email
containing a unique link which will directly log them in to the site. There is
no option to use a password. If they have used the site in the past with the
same email address, upon logging in they will be using the same account as
before, otherwise a new account will be created. Every time the user wants to
log in, they must go through the same process (so usually you will make the
login session last a significant period of time). For this reason the method is
particularly suited to sites where people do not log in very often.&lt;/p&gt;
&lt;p&gt;This is &lt;a class="reference external" href="https://medium.com/@lucas_gonze/history-of-email-only-auth-6b33b0065f74#.wvk4gmgaz"&gt;not a new idea&lt;/a&gt;,
although I don't think I was conscious of other implementations when I created
mine. In this post I'm presenting my rationale for it, listing some advantages
that I haven't seen elsewhere, and other implementation pointers.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="rationale"&gt;
&lt;h2&gt;Rationale&lt;/h2&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;Many systems really need a working email address, because you need to be able
to contact users. In this case you have to do some kind of email verification
step at some point anyway (some systems do it at the beginning of the
process, others try to fit it in somewhere else and nag users until they have
done it). If you fail to have email verification, then people can easily get
locked out of the site because password reset usually relies on sending an
email, and you don’t have contact details when you need them.&lt;/p&gt;
&lt;p&gt;With this system, email verification and login are combined.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In terms of security from the user’s point of view, no-one can hack their
account by guessing their password, because they don’t have a password. They
can hack it only by gaining access to their email, but given the password
reset mechanism most sites have, this is no different to normal. We’ve simply
eliminated one source of getting hacked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the site implementation, not having a password to store is even better —
there is no way you can mess up password hashing and storage, no possibility
of a password database being stolen, because you simply do not have
passwords.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not having a password to enter the first time reduces friction for most
users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In terms of user experience when coming back to a site, many people end up
doing something similar to the above process anyway, because they forget
their passwords. This is especially true for sites that people are not going
to use very often – for example, a booking process for a conference that
might happen once a year. In this case, people either:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;choose weak passwords that they can remember easily, which is bad for security,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;re-use a password so they can remember it easily, again bad for security, or,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;forget their password.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, with a password reset, the process is much, much worse:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;First they have to remember if they signed up for the site in the past, to
work out if they should “log in” or “create account”.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then they have to make several attempts at remembering their password.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then they’ve got to use the password reset feature (hopefully it isn’t
hidden, but I’ve seen users struggle with this when literally the only
things on the page were the login form and a “Forgotten your password?”
link).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They then have to check their email and click the link.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now they have to negotiate a new password form, possibly including
a strength monitor that won’t allow them to choose a weak password.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Having finally set a new password, they now have to navigate to the login
form again (because sites very rarely integrate password reset with log
in, also usually for some good reasons), and re-type their email address
(often for the third time by now), and their password (again, typically for
the third time, not including all the failed password attempts).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By removing the password entirely, most of these steps are eliminated. Steps
1, 2 and 3 are replaced by a single method for logging in – “Enter your email
address”. Step 4 is the same, steps 5 and 6 are eliminated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are some additional advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;By doing email verification every time, we ensure that we still have a working
email address. If we use some email/username + password combination for login,
we have to add some kind of regular “Is this still your email address?”
feature, or find ourselves unable to contact our users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For any prompting or promotional emails that we send to a user, we can log
them in straight away using this mechanism. As already discussed, this is not
a reduction in security in the typical case. If we implement the system using
a query string parameter containing a token and a generic middleware that
checks the token, we can use this system on any page on the site with no extra
work.&lt;/p&gt;
&lt;p&gt;So, for example, if we send an email asking for payment, the link can take
them straight to the payment page, already logged in. This is the ideal
situation, and we can do it with the tiniest amount of work (adding a query
parameter to a URL in an email), because we can just re-use the existing login
mechanism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are significant improvements for privacy concerns.&lt;/p&gt;
&lt;p&gt;A typical email + password login system has some problems when it comes to
privacy, because it is often possible for an attacker to determine that a
certain person has an account with a web site. This can often be done from several
pages on the site:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;The account creation form&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The log in form&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The password reset form&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And it can be done in a number of ways:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;p&gt;By looking at the different error/validation messages that are returned by
these pages, for the cases of existing or non-existing accounts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even if the messages returned are identical, by doing &lt;a class="reference external" href="http://cwe.mitre.org/data/definitions/208.html"&gt;timing attacks&lt;/a&gt; on the pages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Fixing method 1 often results in UX problems – e.g. if a user doesn’t have an
account and is trying to log in, we can no longer tell a user that they don’t
have an account and need to create one, we can only tell them their
email/password combination is incorrect, and leave them to struggle. Similarly
with password reset. Our user encountering scenario 5 above now feels like
this:&lt;/p&gt;
&lt;img alt="Man getting very mad with computer." class="align-center" src="https://lukeplant.me.uk/blogmedia/computer_rage.gif"&gt;
&lt;p&gt;Fixing method 2 can be very hard. The use of strong password hashing makes a
timing attack on the login page trivial if no precautions are taken. &lt;a class="reference external" href="https://www.djangoproject.com/"&gt;Django&lt;/a&gt;, for instance, was vulnerable to this for a
long while. It now has &lt;a class="reference external" href="https://code.djangoproject.com/ticket/20760"&gt;rudimentary mitigation&lt;/a&gt;, which fixes trivial attacks,
but a complete fix is very hard. Making the code paths for “yes we found a
user record” and “no we didn’t” take exactly the same amount of time would be
very hard, and an attacker who was in the same data centre as your server
(where network transit noise is much reduced) would probably not have a hard
time doing a timing attack on the current code.&lt;/p&gt;
&lt;p&gt;However, with the system described in this post, these attacks, and the UX
problems, are all completely mitigated. We send the verification email whether
there is already an account or not, with exactly the same message (which
doesn’t confuse the user), without looking up the account in the database
first. We can check whether we need to create a new account or retrieve the
old one when the email has been verified, so there is no timing attack
possible on this part of the code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On a code level, the amount of code required for this is very small. Compared
to the typical alternative (email/username+password, all the forms to manage
passwords, password reset etc.), it is tiny. That alone gives big maintenance
and security advantages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="disadvantages"&gt;
&lt;h2&gt;Disadvantages&lt;/h2&gt;
&lt;p&gt;There are of course some disadvantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Not all users have secure email systems, and emails could be intercepted,
allowing an attacker to use someone else’s account. As noted already, you are
already living with the same issue if you have a password-reset-via-email-link
feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Users have to go through the “check your email” cycle every time they log in.
For the kind of site that people are using daily, and if the login session is
configured to expire relatively quickly, this will be annoying. But for use
cases where users don’t visit the site often (e.g. occasional conference
booking), this won’t be a problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If someone’s email address changes, this system has more problems, because in
essence it uses an email address as the primary key for the account. To deal
with this, you would need to store some other personal info or communication
mechanism that could be used to verify the person is the same person, and then
have some automatic or manual process for merging accounts etc.&lt;/p&gt;
&lt;p&gt;Alternatively, you can live with the fact that if their email address changes,
they no longer have access to their old account. For the site I built (a
booking system for yearly summer camps), this has not been a problem – it just
means that people don’t have the shortcut of being able to re-use information
from previous years.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="implementation-issues"&gt;
&lt;h2&gt;Implementation issues&lt;/h2&gt;
&lt;p&gt;There are some implementation issues to be aware of, especially security related:&lt;/p&gt;
&lt;ol class="arabic"&gt;
&lt;li&gt;&lt;p&gt;You need a correct and secure way of creating the unique login links. They
need to contain some kind of token that verifies an email address, a token
which cannot be guessed by an attacker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The login links should expire – so that a temporary breach of someone’s email
account, or accidentally sharing the link, doesn’t give an attacker login
access forever.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When comparing the token, you need to be aware of timing attacks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security tokens in URLs are a dangerous thing, as they can easily be pinched.
It can happen when a user copy-pastes or shares a URL, and it can happen if the
page it links to has any third party resources, which will then be able to see
the URL (and the token) via the Referer header.&lt;/p&gt;
&lt;p&gt;Because of this, the token should be checked before any page is rendered, and
you should redirect immediately, either to a failure page if it doesn’t
match, or to a URL without the token. If you use a query string parameter for
the token, this is easy – for the success case you just return an HTTP redirect
response to the same URL but without the token query parameter (and with a
login cookie attached to the response).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You should do a case-insensitive comparison on the email address when looking for
an existing account – people don’t always type their email addresses with
the same case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As with all login mechanisms, you should give attention to providing a “log
this account out on every device that is logged in to it” feature. This is often
linked to a password change mechanism (which we don't have). It can require
additional work to ensure that we are removing a session, not just removing
the session cookie from the browser.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In my implementation, I use Django’s &lt;a class="reference external" href="https://docs.djangoproject.com/en/1.8/topics/signing/#verifying-timestamped-values"&gt;TimestampSigner&lt;/a&gt;
to sign the email address. This takes care of &lt;strong&gt;1&lt;/strong&gt; (Django uses an HMAC on the
string), &lt;strong&gt;2&lt;/strong&gt; (you just pass the &lt;code class="docutils literal"&gt;max_age&lt;/code&gt; parameter to &lt;code class="docutils literal"&gt;unsign&lt;/code&gt;) and &lt;strong&gt;3&lt;/strong&gt;
(Django’s &lt;code class="docutils literal"&gt;Signer&lt;/code&gt; uses a &lt;code class="docutils literal"&gt;constant_time_compare&lt;/code&gt; function &lt;a class="reference external" href="https://docs.djangoproject.com/en/1.8/_modules/django/core/signing/#Signer"&gt;internally&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I then base64 encode the result to produce a tidy URL. This results in a longish
URL, but not so long as to be impractical. I created a &lt;a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/313dcd1780a5a375d8a95d7dd7b8b2238e830c98/cciw/bookings/email.py#L31"&gt;small class&lt;/a&gt;
to wrap up the encoding and decoding.&lt;/p&gt;
&lt;p&gt;(An alternative would be to create a nonce and store it in a database on the
server, associated with the email address. The implementation above has the
advantage that it doesn't require server side resources, but the disadvantage of
requiring a longer URL).&lt;/p&gt;
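&lt;p&gt;For readers not using Django, the same three properties (unguessable, expiring,
compared in constant time) can be sketched with only the Python standard
library. This is an illustration under stated assumptions, not the code linked
above: &lt;code class="docutils literal"&gt;SECRET_KEY&lt;/code&gt;, &lt;code class="docutils literal"&gt;make_token&lt;/code&gt; and &lt;code class="docutils literal"&gt;check_token&lt;/code&gt; are hypothetical names, and the
token layout (email, timestamp, HMAC-SHA256, then base64) is my own choice rather
than Django's format.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import time

# Hypothetical secret; a real deployment would load this from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _signature(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 of the payload under SECRET_KEY."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_token(email: str, now=None) -> str:
    """Build a URL-safe token carrying the email address, a timestamp and a MAC."""
    timestamp = int(time.time() if now is None else now)
    payload = f"{email}:{timestamp}".encode("utf-8")
    signed = payload + b":" + _signature(payload).encode("ascii")
    return base64.urlsafe_b64encode(signed).decode("ascii")


def check_token(token: str, max_age: int, now=None):
    """Return the verified email address, or None if the token is bad or expired."""
    try:
        signed = base64.urlsafe_b64decode(token.encode("ascii"))
        payload, _, signature = signed.rpartition(b":")
        email, _, timestamp = payload.decode("utf-8").rpartition(":")
        issued_at = int(timestamp)
    except ValueError:
        # Covers malformed base64, bad text encoding and a non-numeric timestamp.
        return None
    # Constant-time comparison, so response timing does not leak the MAC.
    if not hmac.compare_digest(signature, _signature(payload).encode("ascii")):
        return None
    current = time.time() if now is None else now
    if current - issued_at > max_age:
        return None
    return email
```

&lt;p&gt;Calling &lt;code class="docutils literal"&gt;check_token(make_token("user@example.com"), max_age=3600)&lt;/code&gt; returns the
address for a fresh token, and &lt;code class="docutils literal"&gt;None&lt;/code&gt; once the token is more than an hour old or
has been altered.&lt;/p&gt;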
&lt;p&gt;I do the checking in a &lt;a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/313dcd1780a5a375d8a95d7dd7b8b2238e830c98/cciw/bookings/middleware.py#L41"&gt;middleware&lt;/a&gt;,
including the redirect to handle item &lt;strong&gt;4&lt;/strong&gt; above. I currently use Django’s
signed cookies for implementing login. If a server side session was used, then
it would be easier to implement “log out from all devices”.&lt;/p&gt;
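&lt;p&gt;The “redirect to the same URL without the token” step might look roughly like
the following. It is a sketch using only the standard library; the function
name and the &lt;code class="docutils literal"&gt;token&lt;/code&gt; parameter name are assumptions for illustration, and in a
Django middleware the result would be handed to a redirect response that also
sets the login cookie.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_without_param(url: str, name: str = "token") -> str:
    """Return the same URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # Rebuild the URL from the untouched parts plus the filtered query string.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

&lt;p&gt;Any other query parameters are kept, so the user still lands on the page they
asked for, just without the secret in the address bar.&lt;/p&gt;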
&lt;p&gt;I’m using a custom model for this account, which does not have a password field,
and I’m also using a normal &lt;code class="docutils literal"&gt;User&lt;/code&gt; model for other purposes, so it doesn’t
make sense for me to release this as a standalone Django authentication library.
But feel free to take the code and do so, or borrow in any other way.&lt;/p&gt;
&lt;p&gt;There are other variations on this that could be used, but I think the basic
pattern is very useful for some use cases, eliminating a lot of the user hassles
and programmer headaches often found with passwords.&lt;/p&gt;
&lt;p&gt;This is also not meant to be an alternative to things like OAuth. It is meant to
be an alternative to email+password logins. If OAuth is used as well (should you
venture down that &lt;a class="reference external" href="http://www.oauthsecurity.com/"&gt;somewhat&lt;/a&gt; &lt;a class="reference external" href="http://oauthbible.com/"&gt;dubious&lt;/a&gt; &lt;a class="reference external" href="https://library.launchkit.io/the-unexpected-costs-of-third-party-login-cda41c087653#.jfxe2sh2q"&gt;path&lt;/a&gt;),
then enhancements are possible – for instance, for people who create accounts
via OAuth, there could be the option to disable login by email link. This would
mitigate the risk of account takeover due to people with insecure email
providers.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Updates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;2016-07-18: Note about alternatives like OAuth2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2016-07-18: Note about implementing “log out from all devices”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2016-07-19: Note about security of email services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2016-07-19: Paragraph about prior art&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(Thanks to reddit comments for the prompts that pointed some of these issues
out).&lt;/p&gt;
&lt;/section&gt;</content>
    <category term="django" label="Django"/>
    <category term="python" label="Python"/>
    <category term="security" label="Security"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>Why escape-on-input is a bad idea</title>
    <id>https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/</id>
    <updated>2012-08-06T20:59:01+01:00</updated>
    <published>2012-08-06T20:59:01+01:00</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/"/>
    <summary type="html">&lt;p&gt;With examples from the web development world, especially PHP, and lessons for Pythonistas&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;The right way to handle issues with untrusted data is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Filter on input, escape on output&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means that you validate or limit data that comes in (filter), but only
transform (escape or encode) it at the point you are sending it as output to
another system that requires the encoding. It has been standard best practice
since just about forever &lt;sup&gt;[citation required]&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;An alternative is “escape on input”: at the point that data enters your system,
you apply a transformation to it to avoid a problem further down the line when
the data is used.&lt;/p&gt;
&lt;p&gt;It's come to my attention that some serious web developers (or at least, they
take themselves seriously and are taken seriously by others) are &lt;strong&gt;still&lt;/strong&gt;
suggesting the practice of escape-on-input.&lt;/p&gt;
&lt;p&gt;For example, with escape-on-input, to avoid XSS any data that enters your system
has HTML escaping applied to it &lt;em&gt;immediately&lt;/em&gt;, before your application code
touches it.&lt;/p&gt;
&lt;p&gt;I chose that example deliberately, because people are actually recommending it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html#comment-572962448"&gt;in some recent “PHP sucks” debate&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;which, in turn, linked to a &lt;a class="reference external" href="http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html"&gt;page by Rasmus Lerdorf recommending
escape-on-input as a sensible way to deal with XSS&lt;/a&gt;.
The page, admittedly, is describing a ‘toy’, a ‘no-framework PHP framework’,
yet he does seem to be serious about the usefulness of escape-on-input.&lt;/p&gt;
&lt;p&gt;The page is from 2006, and uses the pecl/filter extension, but the extension
has since made it into core (PHP 5.2), and the &lt;a class="reference external" href="http://www.php.net/manual/en/filter.configuration.php"&gt;docs for it&lt;/a&gt; suggest a
configuration that is clearly intended for XSS prevention. As recently as
2008, and probably to this day, Lerdorf is &lt;a class="reference external" href="http://grokbase.com/p/php/php-internals/083qakz7wj/php-dev-short-open-tag"&gt;still defending and recommending
this approach&lt;/a&gt;,
and it appears to be part of his reason for thinking that PHP templating
doesn't need an autoescape mechanism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Just as significantly, &lt;a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security"&gt;Etsy are using and recommending escape-on-input&lt;/a&gt;
(slide 18 onward). As a very successful modern company using PHP, people will
look up to them and copy them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, this approach, unfortunately, is popular amongst some, and I can't find a
decent post explaining why it's such a terrible idea both in theory and
practice. Here is my attempt. It should be applicable to almost any system and
any language, although I'll mainly be using examples from web development.&lt;/p&gt;
&lt;section id="in-theory"&gt;
&lt;h2&gt;In theory&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First of all, escape-on-input &lt;strong&gt;is just wrong&lt;/strong&gt; – you've taken some input and
applied some transformation that is totally irrelevant to that data. If,
taking our example, you have some data collected by HTTP POST or GET
parameters, applying HTML escaping to it is a layering violation – it mixes an
output formatting concern into input handling. Layering violations make your
code much harder to understand and maintain, because you have to take into
account other layers instead of letting each component and layer do its own
job.&lt;/p&gt;
&lt;p&gt;Doing things ‘right’ is very important, even if doing them ‘wrong’ seems to
work and you are tempted to be dismissive of ‘theoretical’ concerns about
purity etc. When you have to maintain code, you will be very glad if things
are in the right place, and not full of hacks and surprises.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You have corrupted your data by default. The system (or the most convenient
API) is now lying about what data has come in. As you have applied a
transformation to the &lt;strong&gt;data itself&lt;/strong&gt;, the layering violation is not an
isolated problem in one part of the code, but infects every part of your code,
especially if you store the corrupted data in a database.&lt;/p&gt;
&lt;p&gt;Your data is &lt;strong&gt;everything&lt;/strong&gt;. As I read recently, &lt;a class="reference external" href="http://blog.datamarket.com/2012/07/08/the-11-best-data-quotes/"&gt;“data matures like wine,
applications like fish”&lt;/a&gt;. You can
always rewrite your application, but if you corrupt your data, you've done the
worst thing you can to your system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is exacerbated by the fact that many encodings are one-way – you cannot
losslessly or unambiguously convert them back. If at a later point you need
the original data, you might be in a pickle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping your data for one output backend will only deal with &lt;strong&gt;that&lt;/strong&gt;
output. A typical web app might deal with at least the following backends,
which have different characters that are dangerous, and have different
requirements for dealing with them:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;HTML: ' &amp;lt; &amp;gt; " &amp;amp;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;URLs: / : &amp;amp; ? # text starting 'javascript:'&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Javascript: " '&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SMTP and HTTP: ; : newlines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL: '&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JSON: "&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shell: space, quotes and various other characters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any number of others could be added, and all could have security
implications. Using escape-on-input will only fix one of these - apart from
happy coincidences where it might fix more than one. Security should not rely
on happy coincidences, and for the other outputs you will still need a
sensible solution to the problem. Why not have a sensible solution for all of
them?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping for one output may not deal with even that single output correctly,
because escaping can be &lt;strong&gt;context dependent&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Various outputs can be embedded in others, and they have &lt;strong&gt;different escaping
rules&lt;/strong&gt;. So, you can embed URLs in HTML. And URLs in CSS. And CSS in HTML. And
Javascript in URLs. And Javascript in HTML...&lt;/p&gt;
&lt;p&gt;If you prepared something for HTML, did you prepare it for HTML element body
context, or HTML attribute context, or URLs in HTML attributes, or CSS in
HTML? Or URLs in CSS in HTML? If someone passes in a value for a URL which is
then used in an &lt;code class="docutils literal"&gt;href&lt;/code&gt; attribute in HTML, HTML escaping of &lt;strong&gt;&amp;lt; &amp;gt; &amp;amp; ; " '&lt;/strong&gt; won't
adequately protect you from XSS. Interactions between CSS/Javascript parsers
and HTML parsers make things &lt;a class="reference external" href="https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet"&gt;even more complex&lt;/a&gt;.
So “escape at the beginning and then forget about it” does not work even for
the single output of ‘HTML’, because it is not a single output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Escaping on input will not only fail to deal with the problems of more than
one output, it will actually make your data &lt;strong&gt;incorrect&lt;/strong&gt; for many outputs.&lt;/p&gt;
&lt;p&gt;Suppose you decide to do HTML escaping, and someone enters &lt;em&gt;Jack &amp;amp; Jill&lt;/em&gt; as a
title for something. Your escape-on-input turns this into &lt;em&gt;Jack &amp;amp;amp; Jill&lt;/em&gt; and
that goes in the DB. Suppose you want to email people and put this title in
the subject line. You now have to apply the reverse transformation to get a
sensible subject line in the email, and you have to &lt;strong&gt;remember&lt;/strong&gt; to do this
for every output that is not HTML.&lt;/p&gt;
&lt;p&gt;Sometimes, the bug is &lt;a class="reference external" href="http://instagram.com/p/SVfQruppEE/"&gt;significantly more annoying&lt;/a&gt; than an email with an incorrect title:&lt;/p&gt;
&lt;img alt="Sweater with HTML escaping incorrectly applied to text" class="align-center" src="https://lukeplant.me.uk/blogmedia/dan_leedham_technical_and_online.jpeg"&gt;
&lt;p&gt;One ruined sweatshirt, however, is tame compared to the hassle many people
suffer due to &lt;a class="reference external" href="https://www.wsj.com/articles/internet-mangles-names-accents-web-forms-11664462695"&gt;having a name that a computer won’t accept or mangles&lt;/a&gt;.
Looking through that article, it’s clear that often the software is escaping
on input, resulting in escaped versions being stored in the database (e.g. a
woman with an apostrophe in her name is recorded as “Leah D&amp;amp;#38;andrea”),
which then causes no end of problems.&lt;/p&gt;
&lt;p&gt;You also have daft bugs like the fact that doing a search on that field for
the string ‘amp’ (or ‘quot’, ‘apos’, ‘lt’, ‘gt’ etc. or any substrings) will get
various false matches.&lt;/p&gt;
&lt;p&gt;I have seen some people respond to this by saying “it's better to have the
occasional double-encoding bug or incorrect query result than an XSS exploit”.
Well, first, that depends on your business. XSS is a problem because it costs
time and money, and so does corrupting your data. Many people have data that
actually matters, and corrupt data is a big deal, and much harder to cope with
than an XSS bug, because data lives on and on. If we took just the example
above of storing people’s names incorrectly, the grief caused by
escape-on-input is massive.&lt;/p&gt;
&lt;p&gt;Second, this decision affects &lt;strong&gt;frameworks&lt;/strong&gt; that are used to handle data of
&lt;strong&gt;all kinds&lt;/strong&gt;, and the decision affects the entire code base of your
application and beyond, as described below. Data-handling frameworks that work
on the assumption that your data is not important are insanity. &lt;a class="reference external" href="http://www.biblegateway.com/passage/?search=Psalm%2011:3&amp;amp;version=KJV"&gt;If the
foundations be destroyed, what can the righteous do?&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Third, it's entirely unnecessary. XSS is not hard to fix given decent
programming tools.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At what point does data ‘enter’ your system?&lt;/p&gt;
&lt;p&gt;It might sound like a simple question, but it's tricky in reality, and I'll
illustrate using an HTTP request.&lt;/p&gt;
&lt;p&gt;In most web apps, the GET and POST parameters are your ‘raw input’. However,
using most normal web framework APIs, data in GET and POST parameters has
already been interpreted. The ‘raw’ data is really the bytes that make up the
HTTP request, which typically will use URL encoding for GET query parameters
and a choice of encodings for POST data (URL encoding or MIME multipart
attachment format).&lt;/p&gt;
&lt;p&gt;The framework may also do another level of decoding – interpreting the
series of bytes as a series of unicode code points.&lt;/p&gt;
&lt;p&gt;Both parts of this initial transformation make sense and are appropriate,
because they are reversing the encoding already applied to the data by the
protocol involved. The web browser takes the data you type in – unicode code
points – and applies a series of transformations to it, according to the HTTP
protocol, and your web framework reverses these to get the data back.&lt;/p&gt;
&lt;p&gt;Now, if you want to avoid XSS problems, you have to apply the escaping
&lt;strong&gt;after&lt;/strong&gt; this initial decoding has been done. But this highlights another
possibility. What if the data requires &lt;em&gt;further&lt;/em&gt; decoding before you get the
‘real’ raw data? For example, some data might be sent base64 encoded for a
variety of reasons, or any other type of encoding.&lt;/p&gt;
&lt;p&gt;This extra level of encoding gives two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;your automatic HTML escaping may have corrupted the encoded data so that it
now cannot be decoded. For example, you had a GET parameter that held a URL,
which itself had parameters in the query string:&lt;/p&gt;
&lt;pre class="literal-block"&gt;GET /foo?bar=1&amp;amp;url=http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2 HTTP/1.1&lt;/pre&gt;
&lt;p&gt;Your framework's HTTP handling will produce a query dictionary that looks
something like the following:&lt;/p&gt;
&lt;pre class="literal-block"&gt;{"bar": 1,
 "url": "http://example.com/?x=1&amp;amp;y=2"
 }&lt;/pre&gt;
&lt;p&gt;But your automatic escaping turns that into:&lt;/p&gt;
&lt;pre class="literal-block"&gt;{"bar": 1,
 "url": "http://example.com/?x=1&amp;amp;amp;y=2"
 }&lt;/pre&gt;
&lt;p&gt;If you want to extract the &lt;code class="docutils literal"&gt;y&lt;/code&gt; parameter from &lt;code class="docutils literal"&gt;url&lt;/code&gt;, you are stuck. You
can't correctly interpret the data in the &lt;code class="docutils literal"&gt;url&lt;/code&gt; parameter, because it has
been corrupted. You're going to have to unescape the input, and you might
not even notice this problem.&lt;/p&gt;
&lt;p&gt;A better example might be handling the ‘Referer’ header. (Which you have
presumably applied the same HTML encoding to, right? If you did, you have
this problem, if you didn't, you have to remember to do it manually, which
is a potential XSS vulnerability).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even if the data comes through your automatic escaping unscathed
(e.g. base64 under HTML escaping), or you can undo the corruption and get it
properly decoded, after decoding you will have to &lt;strong&gt;manually apply&lt;/strong&gt; HTML
escaping to make it match all the other automatically escaped data. If you
don't, you've potentially got a bug and an XSS exploit.&lt;/p&gt;
&lt;p&gt;So your automatic escape-on-input has &lt;strong&gt;missed&lt;/strong&gt; data, and this happens
because you can't really define the point at which the data has ‘entered’
your system and needs the escaping applied.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This problem means that the escape-on-input approach is inherently flawed and
&lt;strong&gt;cannot&lt;/strong&gt; be fixed. &lt;strong&gt;You just have to patch it up on a case-by-case basis,
which is exactly what escape-on-input is supposed to avoid.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;And then, what about other sources of data – data on the file system, in a
cache etc. Are these entry points? Well, it depends on how the data was put
there. You have to manually follow this all the way through your app; get it
wrong and you've got double escaping bugs or security flaws.&lt;/p&gt;
&lt;p&gt;(By contrast, escape on output always works, because you apply it at the point
where you know it is needed – in the backend that knows the escaping rules.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other systems putting data into your database, or getting data out, have to
abide by your data transformation rules.&lt;/p&gt;
&lt;p&gt;These systems might have nothing to do with your primary domain (e.g. a web
site). Making them understand and obey rules that have nothing to do with the
data itself is insanity and extremely short sighted.&lt;/p&gt;
&lt;p&gt;You can't deal with this problem when you come to it, because you don't have
to just fix your code, you've got to fix all your data too, and by the time
you cross this bridge you might have a lot of data and might need a very
delicate database migration to get it right. The data may even have escaped
your control (e.g. been copied into other systems), or backwards compatibility
concerns might stop you from making the change you need to make.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Within your main application, the decision to escape on input affects your
whole code base.&lt;/p&gt;
&lt;p&gt;If you want to use any libraries, you need to make sure that they are using
all the same assumptions that you have in your main code base.&lt;/p&gt;
&lt;p&gt;For example, if you've got a form/widget library in your web app, it will very
often need to echo user input back to them in the case of a form that has
validation errors. This library has to know if you already escaped the input.&lt;/p&gt;
&lt;p&gt;Writing the library to work in two modes is asking for trouble. Rather, you
need it to have been written from the beginning to assume the same escaping
rules.&lt;/p&gt;
&lt;p&gt;This kills code re-use – you can only use code that assumes the same input
escaping – or it means that you will end up with tons of bugs due to
incompatibilities between the assumptions made in your application code and
the library.&lt;/p&gt;
&lt;p&gt;Essentially, this is the problem of a global configuration setting, but worse
since it affects the &lt;em&gt;operand&lt;/em&gt; of your entire application (the data going
through it), not just the functionality of various &lt;em&gt;operators&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Another example might be a cron job that sends out emails, using data from the
database. If the data comes from a web form that applied "escape on input" to
avoid XSS, then the code will need to apply HTML unescaping - despite the fact
that this script has absolutely nothing to do with the web (it reads a
database and sends plain text emails).&lt;/p&gt;
&lt;p&gt;Effectively, &lt;strong&gt;this means that the XSS solution, far from being a solution
applied at a single point, is in fact spread out over the entire code base,&lt;/strong&gt;
as it includes every time that pre-escaped data has to be un-escaped.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The confusion caused by the above is likely to &lt;em&gt;increase&lt;/em&gt; security
problems. “Keep It Simple, Stupid” remains a very good maxim for developers.&lt;/p&gt;
&lt;p&gt;To continue an example used above: you want to send an email with some data
that has already been HTML escaped, and so you need to unescape the data to
avoid emails with the subject “Jack &amp;amp;amp; Jill” when the user entered “Jack &amp;amp;
Jill”. You decide it's not sensible for the mail sending functions to do this
internally, (or maybe they're provided by a third party who made that decision
for you), so the calling code does the unescaping.&lt;/p&gt;
&lt;p&gt;You later decide to switch to HTML emails, and the developer who implements it
thinks that since data is already escaped, there is no problem including it
without extra escaping in the body of the HTML email, leading to a
vulnerability (not classic XSS in this case, but still a problem).&lt;/p&gt;
&lt;p&gt;There is also the example I gave above where an extra layer of
encoding/decoding in the raw data makes it likely you'll forget to apply the
escaping.&lt;/p&gt;
&lt;p&gt;The confusion caused by escape-on-input means your entire code base becomes a
potential source not only of double-escaping bugs but of security problems as
well.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
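&lt;p&gt;A short Python sketch of the points above, using only the standard library.
The variable names are hypothetical and the set of outputs is just a sample.
The first half keeps the stored value raw and escapes separately for each
output; the second half applies HTML escaping on input to the nested URL from
the earlier request example and shows that its query string can no longer be
parsed correctly.&lt;/p&gt;

```python
import html
import json
import shlex
from urllib.parse import parse_qsl, quote, unquote, urlsplit

# Stored exactly as the user typed it: no escaping on the way in.
name = "Leah D'andrea"

# Each output applies its own escaping at the point of use.
as_html = html.escape(name)          # HTML element body or quoted attribute
as_url_part = quote(name, safe="")   # URL path segment or query value
as_json = json.dumps(name)           # JSON document
as_shell_arg = shlex.quote(name)     # POSIX shell command line
as_email_subject = name              # plain text: HTML entities would be wrong here

# Escape-on-input instead corrupts data that has a further layer of structure.
# This is the percent-encoded url parameter from the request shown earlier.
nested_url = unquote("http%3A%2F%2Fexample.com%2F%3Fx%3D1%26y%3D2")
escaped_on_input = html.escape(nested_url)

# On Python 3.10 and later, only the ampersand separates query pairs, so the
# escaped version no longer has a key called y.
intact = dict(parse_qsl(urlsplit(nested_url).query))
corrupted = dict(parse_qsl(urlsplit(escaped_on_input).query))
```

&lt;p&gt;Here &lt;code class="docutils literal"&gt;intact&lt;/code&gt; contains both &lt;code class="docutils literal"&gt;x&lt;/code&gt; and &lt;code class="docutils literal"&gt;y&lt;/code&gt;, while &lt;code class="docutils literal"&gt;corrupted&lt;/code&gt; has lost &lt;code class="docutils literal"&gt;y&lt;/code&gt;,
and every escaped form of &lt;code class="docutils literal"&gt;name&lt;/code&gt; can be produced on demand from the one raw
value.&lt;/p&gt;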
&lt;/section&gt;
&lt;section id="in-practice"&gt;
&lt;h2&gt;In practice&lt;/h2&gt;
&lt;p&gt;Thankfully, we don't just have to rely on the above analysis to conclude that
escape-on-input is a terrible idea. PHP, always willing to help when it comes to
“examples of how not to do it”, provides us with a perfect case study.&lt;/p&gt;
&lt;section id="magic-quotes"&gt;
&lt;h3&gt;Magic quotes&lt;/h3&gt;
&lt;p&gt;PHP used to have a feature called magic quotes. It was an escape-on-input
feature that escaped single quotes (&lt;strong&gt;'&lt;/strong&gt;) with backslashes. This was to protect
you from SQL injection attacks, by making the data safe for interpolation into a
SQL query.&lt;/p&gt;
&lt;p&gt;This caused all kinds of problems.&lt;/p&gt;
&lt;p&gt;First, unless you are passing the data to a database by using string
interpolation to build up SQL queries, you have to remember to strip
those slashes using the function &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you don't, you get double encoding. It looks like \\'this\\', you\\'ve
almost certainly seen it across the web, though it seems we\\'re thankfully
past the worst of it.&lt;/p&gt;
&lt;p&gt;Second, even if you remember, you've added some hideous cruft to your code. In
the bit of code which is handling form validation (and is therefore echoing user
input back to the user without the database being involved), you've got these
bizarre &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt; calls. What on earth does ‘reverse transforming a
string for SQL statement preparation’ have to do with the task of input
validation?&lt;/p&gt;
&lt;p&gt;Third, it turns out that different databases need different escaping mechanisms
to do things fully correctly. So you now have to do &lt;code class="docutils literal"&gt;stripslashes()&lt;/code&gt; on data
even if you are passing it to a database using string-interpolated queries!&lt;/p&gt;
&lt;p&gt;Then, since the above problems are common (building up SQL queries by string
interpolation was always a bad idea, and very often you pass on the data to
outputs that don't want SQL escaping at all), it's desirable to have a way to
turn this behaviour off completely.&lt;/p&gt;
&lt;p&gt;To handle this, there is a php.ini setting to turn it on/off.&lt;/p&gt;
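&lt;p&gt;The contrast can be imitated in Python. This is only a rough sketch: the
&lt;code class="docutils literal"&gt;magic_quotes&lt;/code&gt; function is a hypothetical stand-in that handles single quotes
alone, and SQLite is used simply because it ships with Python. It shows the
stray backslash that appears when escaped input is echoed back, and the
alternative of keeping the raw value and letting the database driver bind it
as a parameter.&lt;/p&gt;

```python
import sqlite3


def magic_quotes(value: str) -> str:
    """Rough imitation of PHP's magic quotes: backslash-escape single quotes."""
    return value.replace("'", "\\'")


raw = "you've"

# Escape-on-input: the value is altered before any code sees it, so echoing it
# back to the user without a stripslashes() step shows the stray backslash.
echoed = magic_quotes(raw)

# Escape-on-output: keep the raw value and let the database driver bind it at
# the point where the SQL statement is executed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
conn.execute("INSERT INTO comments (body) VALUES (?)", (raw,))
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
conn.close()
```

&lt;p&gt;With the placeholder query, &lt;code class="docutils literal"&gt;stored&lt;/code&gt; comes back identical to &lt;code class="docutils literal"&gt;raw&lt;/code&gt;, and no other
part of the program ever has to know that SQL was involved.&lt;/p&gt;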
&lt;p&gt;And there were more complications, for example:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;do you apply magic quotes to ‘all input’ (&lt;code class="docutils literal"&gt;magic_quotes_runtime&lt;/code&gt;) or just to
GET/POST/COOKIE data (&lt;code class="docutils literal"&gt;magic_quotes_gpc&lt;/code&gt;)? (This is the problem of defining
what exactly is ‘input’)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;attempts to fix some of the above with yet more configuration options like
&lt;code class="docutils literal"&gt;magic_quotes_sybase&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And so now you've got even more problems. Since these are global settings, you
can't have library code mess with them, since other code might set the global to
a different value or assume a certain value.&lt;/p&gt;
&lt;p&gt;You could try making all code detect the current setting and have different code
paths depending on the result. This works very badly – having multiple code
paths is a recipe for code duplication and bug proliferation. It's extremely
easy to forget to do it, or get one of the paths wrong, since you will likely
only test one configuration value and one set of code paths in reality.&lt;/p&gt;
&lt;p&gt;Alternatively, you can make one bit of code responsible for fixing the setting
to a sensible value (the only one being 'off'), and then make all code assume
that from then on. (If you can't turn it off, you can use the code included
&lt;a class="reference external" href="http://www.php.net/manual/en/security.magicquotes.disabling.php"&gt;here&lt;/a&gt; as a
horrible kludge to reverse its behaviour).&lt;/p&gt;
&lt;p&gt;Eventually, this final approach was the one taken by all significant
projects. &lt;strong&gt;Turn the whole feature off, and assume it is off from then
on&lt;/strong&gt;. (Which means the feature is useless, of course).&lt;/p&gt;
&lt;p&gt;And of course, thankfully, the PHP developers realised that this entire thing
was a &lt;strong&gt;huge mistake&lt;/strong&gt; that caused nothing but a vast amount of confusion and
bugs, and &lt;strong&gt;removed the whole thing&lt;/strong&gt; for good in PHP 5.4.&lt;/p&gt;
&lt;p&gt;Magic quotes, &lt;a class="reference external" href="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/"&gt;as eevee put it&lt;/a&gt;, were “so
close to secure-by-default, and yet so far from understanding the concept at
all.”&lt;/p&gt;
&lt;p&gt;To digress for a moment: we keep getting told that PHP is improving, and the
community has learnt from its mistakes. Unfortunately it seems the leaders in
the community are bent on &lt;strong&gt;recreating&lt;/strong&gt; old mistakes.&lt;/p&gt;
&lt;p&gt;According to Lerdorf, the much newer PHP filter extension is &lt;a class="reference external" href="http://grokbase.com/t/php/php-internals/08373a1vvf/short-open-tag/083qakz7wj#20080323qvterw1df6a006qxyg83z9qsb8"&gt;“magic_quotes
done right”&lt;/a&gt;. But
it still suffers from almost all the problems described here, for all the
reasons described. Global HTML escaping on input is essentially the same as
magic quotes, and just as tragically bad.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="elgg"&gt;
&lt;h3&gt;Elgg&lt;/h3&gt;
&lt;p&gt;In researching for this post, I came across &lt;a class="reference external" href="http://trac.elgg.org/ticket/561"&gt;this ticket for Elgg&lt;/a&gt;, an open source social networking engine.
Just read through the ticket and see the mess they are in. It's clear they
strongly regret the decision they made to escape-on-input, and, in their own
words, have created “horrendous” problems for themselves, especially as their
application has grown to include other interfaces such as JSON REST APIs.&lt;/p&gt;
&lt;p&gt;However, fixing it is very hard. They have to coordinate many changes across
their code base with a big database migration. If data has leaked from the
databases and tables they control into other systems, such as denormalised
tables, other databases, caches etc., or if there is other code by third parties
that makes the old assumptions about encoded data, they are in even more of a
pickle. And both of those things are probably inevitable in something like an
open source framework, which is designed for other people to build on and
extend.&lt;/p&gt;
&lt;p&gt;This is the pain that comes from mixing input handling and output encoding,
and from corrupting the data in your database.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="etsy"&gt;
&lt;h3&gt;Etsy&lt;/h3&gt;
&lt;p&gt;According to &lt;a class="reference external" href="http://www.slideshare.net/zanelackey/effective-approaches-to-web-application-security"&gt;their security presentation&lt;/a&gt;,
Etsy are using escape-on-input for XSS protection.&lt;/p&gt;
&lt;p&gt;They claim that this is a much more secure option, as it is secure by
default. (They do note, however, the problem with input that is encoded in some
other way, like base64, so they are aware of the problems.)&lt;/p&gt;
&lt;p&gt;Their presentation goes on to describe an elaborate system for detecting and
fixing XSS attacks (the slides don't give enough detail for me to understand
what exactly they are doing, but it's clearly a lot of work).&lt;/p&gt;
&lt;p&gt;And &lt;a class="reference external" href="http://www.nzinfosec.com/etsy-has-been-one-of-the-best-companies-ive-reported-holes-to/"&gt;their system does indeed catch XSS bugs in the wild and allow them to fix
them within hours&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Wait, what?&lt;/p&gt;
&lt;p&gt;They've corrupted their database by doing escape-on-input, they've inflicted
on themselves all the development pain described above, and they've &lt;strong&gt;still&lt;/strong&gt;
got XSS bugs?&lt;/p&gt;
&lt;p&gt;Granted, they've got impressive ways of dealing with these problems. But it's
like &lt;a class="reference external" href="http://xkcd.com/463/"&gt;virus checkers on voting machines&lt;/a&gt;. Advanced ways
of dealing with problems that shouldn't even be possible tell you that you are
doing it wrong. They've become very fast at &lt;a class="reference external" href="http://www.red-sweater.com/blog/125/easy-programming"&gt;re-tying their shoelaces, instead
of working out how to tie shoelaces so they don't come undone&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;They claim that with escape-on-input, XSS problems are now greppable, but it
doesn't sound like it. If they were, code audits would be a massively more
efficient way to find XSS problems than the methods they are using.&lt;/p&gt;
&lt;p&gt;The main problem is almost certainly that they are using an output system for
HTML that doesn't do HTML escaping by default (I'm guessing they are using PHP
as their template language). If the backend that deals with HTML &lt;strong&gt;actually
deals with HTML&lt;/strong&gt; then you eliminate the vast majority of these problems
overnight.&lt;/p&gt;
&lt;p&gt;I'm willing to bet that large sites that use Django (or other frameworks that
have basically solved the XSS problem by HTML escaping on output &lt;strong&gt;by default&lt;/strong&gt;)
don't have teams and automated systems dedicated to this problem, and don't need
them. In Django apps, XSS problems &lt;strong&gt;are&lt;/strong&gt; greppable - you grep for
&lt;code class="docutils literal"&gt;mark_safe&lt;/code&gt; in Python and the &lt;code class="docutils literal"&gt;|safe&lt;/code&gt; filter in templates (and then,
obviously, you may have to recursively grep for any functions that call
&lt;code class="docutils literal"&gt;mark_safe&lt;/code&gt; on inputs). Since all data that hasn't been through &lt;code class="docutils literal"&gt;mark_safe()&lt;/code&gt; gets escaped
by the templating engine, and all HTML comes out of the template engine, that's
basically all you need to do.&lt;/p&gt;
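&lt;p&gt;To make the escape-on-output idea concrete, here is a minimal sketch using only the
Python standard library. It is a toy model and not Django's actual implementation:
the names &lt;code class="docutils literal"&gt;TrustedHtml&lt;/code&gt;, &lt;code class="docutils literal"&gt;mark_trusted&lt;/code&gt; and &lt;code class="docutils literal"&gt;render&lt;/code&gt; are hypothetical,
chosen for illustration. The point it shows is that everything is escaped at output time
unless it has been explicitly marked, so the marking calls are the only things you
need to grep for.&lt;/p&gt;

```python
import html


class TrustedHtml(str):
    """Hypothetical marker type: a string a developer has vouched for as HTML."""


def mark_trusted(value):
    # Hypothetical stand-in for the idea behind mark_safe. Every call site is a
    # place where escaping is skipped, so this is the name an audit greps for.
    return TrustedHtml(value)


def render(value):
    # Escape on output, by default. Only explicitly marked values pass through.
    if isinstance(value, TrustedHtml):
        return str(value)
    return html.escape(str(value))


# Untrusted data is stored and passed around unmodified ...
user_input = 'x" onmouseover="alert(1)'

# ... and only becomes HTML-safe at the moment it is written into HTML.
escaped_output = render(user_input)

# A fragment the developer wrote and explicitly marked is left alone.
trusted_output = render(mark_trusted('class="price"'))
```

&lt;p&gt;In this sketch the stored data is never altered, so other outputs (JSON, email,
CSV) can use it as-is, and a missing &lt;code class="docutils literal"&gt;mark_trusted&lt;/code&gt; call results in over-escaping
rather than an XSS hole.&lt;/p&gt;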
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="now-for-some-flame-bait"&gt;
&lt;h2&gt;Now for some flame bait&lt;/h2&gt;
&lt;p&gt;How did this happen to Etsy?&lt;/p&gt;
&lt;p&gt;Are the Etsy devs stupid? I suspect not. Etsy is clearly doing well, and I
imagine they have enough money to hire top-notch developers. Some of their
&lt;a class="reference external" href="http://www.etsy.com/careers/job_description.php?job_id=ozhhVfwM"&gt;careers pages&lt;/a&gt; show they
are happy using a variety of languages and technologies, and their &lt;a class="reference external" href="http://codeascraft.etsy.com/"&gt;engineering
blog&lt;/a&gt; seems to be sane and competent. Even their
security presentation showed considerable ingenuity and technical ability in
dealing with security problems (in entirely the wrong way, unfortunately, but
still impressive).&lt;/p&gt;
&lt;p&gt;I doubt they are low quality developers. Rather, I suspect that use of PHP has
addled their brains. They have become far too accustomed to working in an
environment in which insanity reigns – an environment in which &lt;a class="reference external" href="https://lukeplant.me.uk/blogmedia/php_less_than.txt"&gt;the less than
operator pretends to work correctly with strings but it's just a trap&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When I programmed in a Windows environment, I theorised that use of Windows
itself contributed to the poor quality of the programming in the code base, and
the fact that developers thought nothing of writing tons of tedious
code. Because Windows was so unscriptable, I imagined that Windows programmers
developed a high tolerance for tedium and repetition, which is exactly the
opposite of the qualities needed by a programmer to make a computer do everything
efficiently and reliably. (Since then, I've found that Sturgeon's law was
probably a better explanation for the quality of the code, but I still think the
fundamental idea applies).&lt;/p&gt;
&lt;p&gt;With PHP, the fact that it comes with a template language that is simply not fit
for purpose – because it doesn't do HTML escaping by default, or even easily –
has somehow made the Etsy developers believe that it is normal to struggle with
XSS, that it is perfectly reasonable that even after taking the drastic action
of corrupting their entire database by HTML escaping it, they should &lt;strong&gt;still&lt;/strong&gt;
need elaborate XSS-catching systems.&lt;/p&gt;
&lt;p&gt;Instead of &lt;a class="reference external" href="http://www.youtube.com/watch?v=5mdy8bFiyzY"&gt;trying&lt;/a&gt; to fix XSS,
they should just fix it. Like &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/templates/language/#automatic-html-escaping"&gt;this in Django&lt;/a&gt;. Or
&lt;a class="reference external" href="http://pypi.python.org/pypi/MarkupSafe/"&gt;this in Turbogears and Jinja&lt;/a&gt;. Or
&lt;a class="reference external" href="http://www.yesodweb.com/book/shakespearean-templates#shakespearean-templates_types"&gt;this in Yesod&lt;/a&gt;. Or even &lt;a class="reference external" href="http://twig.sensiolabs.org/doc/templates.html#html-escaping"&gt;this
in PHP&lt;/a&gt; (though
due to limitations of the language you won't be able to have the convenience of
things like &lt;a class="reference external" href="https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe"&gt;mark_safe&lt;/a&gt;
in Django). But living with an environment of pain and madness makes you think
that it ought to be hard.&lt;/p&gt;
&lt;p&gt;Right the way up to Rasmus Lerdorf at the top, many people in the PHP community
live with the insanity of their tools, and add more insanity to cope with it,
rather than fix their tools or choose better ones.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="a-lesson-for-pythonistas"&gt;
&lt;h2&gt;A lesson for Pythonistas&lt;/h2&gt;
&lt;p&gt;Bashing other languages is fun, but when I do so I always try to get something
more valuable out of it by using the opportunity to examine myself. The problem
I discussed in the last section (which is just a manifestation of the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Broken_windows_theory"&gt;broken
windows theory&lt;/a&gt;) applies
to other communities, and I'll attempt to apply it to the Python community.&lt;/p&gt;
&lt;p&gt;Refusing to live with stupidity is one of the reasons that Python 3 is really
important.&lt;/p&gt;
&lt;p&gt;Python 3 does not represent a massive leap forward in terms of additions to the
language. Mainly it just fixes a bunch of mistakes in Python 2, and introduces a
whole lot of backwards incompatibilities in the process. One of the biggest is
unicode/bytes. Python 2 was stupid here – it went directly against the Zen of
Python, and said “in the face of ambiguity about what encoding to use, guess.”
This caused a world of pain.&lt;/p&gt;
&lt;p&gt;Now, you can work around it in most cases by some sensible conventions and a
certain amount of discipline. You can also cope with the fact that &lt;code class="docutils literal"&gt;"a" &amp;lt; 1&lt;/code&gt;
doesn't raise an exception. You can live with &lt;code class="docutils literal"&gt;next()&lt;/code&gt; being a method in the
iterator protocol, when it should be a method called &lt;code class="docutils literal"&gt;__next__()&lt;/code&gt; and a builtin
&lt;strong&gt;function&lt;/strong&gt; &lt;code class="docutils literal"&gt;next()&lt;/code&gt;. You can live with the fact that &lt;code class="docutils literal"&gt;print&lt;/code&gt; is a totally
unnecessary keyword, since it should just be a builtin function. You can get
used to the fact that &lt;code class="docutils literal"&gt;class Foo:&lt;/code&gt; means something subtly but significantly
different from &lt;code class="docutils literal"&gt;class Foo(object):&lt;/code&gt;. You can work around or ignore dozens of
other little niggles, gotchas and inconsistencies.&lt;/p&gt;
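&lt;p&gt;For comparison, this is roughly how the same warts look once fixed. The sketch
assumes a Python 3 interpreter; the &lt;code class="docutils literal"&gt;Countdown&lt;/code&gt; class and the variable names are
made up for illustration. The comparison is spelled with &lt;code class="docutils literal"&gt;operator.lt&lt;/code&gt;, which
performs the same operation as the less-than operator.&lt;/p&gt;

```python
import operator

# Python 3 refuses to guess how to order a str against an int.
try:
    operator.lt("a", 1)
    mixed_comparison = "allowed"
except TypeError:
    mixed_comparison = "TypeError"

# Python 3 refuses to guess an encoding when text meets bytes.
try:
    "caf\u00e9" + b"!"
    mixed_concat = "allowed"
except TypeError:
    mixed_concat = "TypeError"


class Countdown:
    """Illustrative iterator: the protocol method is now __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current == 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


# next() is a builtin function that calls __next__ for you.
first = next(Countdown(3))

# print is an ordinary builtin function, not a keyword.
print_is_function = callable(print)


class Foo:
    pass


# There is only one kind of class: Foo inherits from object implicitly.
foo_bases = Foo.__bases__
```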
&lt;p&gt;But all the while, you are training yourself to tolerate stupidity,
inconsistency and brokenness. Removing these warts is really important, and
worth all the pain of the migration. The alternative is for Python to become the
next PHP.&lt;/p&gt;
&lt;p&gt;On top of these things, there are other types of brokenness in Python that
people in the community seem less willing to acknowledge or tackle. For some of
these I think we need exposure to completely different languages – languages
where you can spawn thousands of ‘threads’ easily and get performance benefits,
for example, or languages where you can write code that is both very high level
&lt;strong&gt;and&lt;/strong&gt; extremely fast. If we live entirely with Python and its set of
limitations, we'll think that those problems are normal and unavoidable.&lt;/p&gt;
&lt;hr class="docutils"&gt;
&lt;p&gt;Main updates:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;2012/08/07 - corrections about turning magic_quotes_gpc off at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2012/10/08 - noted bug with queries returning false matches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2014/05/05 - added info about different contexts in HTML.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;</content>
    <category term="django" label="Django"/>
    <category term="php" label="PHP"/>
    <category term="python" label="Python"/>
    <category term="security" label="Security"/>
    <category term="web-development" label="Web development"/>
  </entry>
  <entry>
    <title>Updated validator and CsrfMiddleware</title>
    <id>https://lukeplant.me.uk/blog/posts/updated-validator-and-csrfmiddleware/</id>
    <updated>2005-12-14T23:45:01Z</updated>
    <published>2005-12-14T23:45:01Z</published>
    <author>
      <name>Luke Plant</name>
    </author>
    <link rel="alternate" type="text/html" href="https://lukeplant.me.uk/blog/posts/updated-validator-and-csrfmiddleware/"/>
    <summary type="html">&lt;p&gt;I've released some small updates to my 'Django validator app' and 'CsrfMiddleware'...&lt;/p&gt;</summary>
    <content type="html">&lt;p&gt;I've released some small updates to my 'Django validator app' and
'CsrfMiddleware'. The main changes are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;added a setup.py to both of them, after working out how these work
and a lot of fiddling around.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;added support for mod_python to the validator app (thanks nesh)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;added a setting to allow the validator to ignore certain paths.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Get them here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://lukeplant.me.uk/resources/djangovalidator/"&gt;Django Validator&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://lukeplant.me.uk/resources/csrfmiddleware/"&gt;CsrfMiddleware&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I've also discovered that my CsrfMiddleware is currently number 6 in a Google
search for &lt;code class="docutils literal"&gt;Cross Site Request Forgery&lt;/code&gt;, which is rather pleasing, or
perhaps it just tells you how little there is on the web about this exploit.&lt;/p&gt;</content>
    <category term="django" label="Django"/>
    <category term="security" label="Security"/>
    <category term="software-projects" label="Software projects"/>
  </entry>
</feed>
