<p>Luke Plant's home page (Posts about Software development), https://lukeplant.me.uk/blog/categories/software-development.xml, 2024-03-11T13:22:14Z</p>
<h1><a class="reference external" href="https://lukeplant.me.uk/blog/posts/no-one-actually-wants-simplicity/">No one actually wants simplicity</a></h1>
<p>Luke Plant, 2023-08-22T18:49:31+01:00</p>
<p>We think we do, but in fact every web developer will happily sacrifice simplicity to the first shiny thing promising them relief from the mildest of ailments.</p>
<p>The reason that modern web development is <a class="reference external" href="https://www.youtube.com/watch?v=BtJAsvJOlhM">swamped with complexity</a> is that no one really wants things to be simple. We just think we do, while our choices prove otherwise.</p>
<p>A lot of developers want simplicity in the same way that a lot of clients claim they want a fast website. You respond “OK, so we can remove some of these 17 Javascript trackers and other bloat that’s making your website horribly slow?” – no, apparently those are all critical business functionality.</p>
<p>In other words, they prioritise everything over speed. And then they wonder why using their website is like rowing a boat through a lake of molasses on a cold day using nothing but a small plastic spoon.</p>
<p>The same is often true of complexity. The real test is the question “what are you willing to sacrifice to achieve simplicity?” If the answer is “nothing”, then you don’t actually love simplicity at all, it’s your lowest priority.</p>
<p>When I say “sacrifice”, I don’t mean that choosing simplicity will mean you are worse off overall – simplicity brings massive benefits. But it does mean that there will be some things that tempt you to believe you are missing out.</p>
<p>For every developer, it might be something different. For one, the tedium of having to spend half an hour a month ensuring that two different things are kept in sync easily justifies the adoption of a bulky framework that solves that particular problem. For another, the ability to control how a checkbox animates when you check it is of course a valid reason to add another 50 packages and 3 layers of frameworks to their product. For another, adding an abstraction with thousands of lines of code, dozens of classes and page after page of documentation in order to avoid manually writing a <a class="reference external" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/">tiny factory function for a test</a> is a great trade-off.</p>
<p>Of course we all claim to hate complexity, but it’s actually just complexity added by other people that we hate — our own bugbears are always exempted, and for things we understand we quickly become unable to even see there is a potential problem for other people. Certainly there are frameworks and dependencies that justify their existence and adoption, but working out which ones they are is hard.</p>
<p>I think a good test of whether you truly love simplicity is whether you are able to remove things you have added, especially code you’ve written, even when it is still providing value, because you realise it is not providing enough value.</p>
<p>Another test is what you are tempted to do when a problem arises with some of the complexity you’ve added. Is your first instinct to add even more stuff to fix it, or is it to remove and live with the loss?</p>
<p>The only path I can see through all this is to cultivate an almost obsessive suspicion of <a class="reference external" href="https://en.wikipedia.org/wiki/Fear_of_missing_out">FOMO</a>. I think that’s probably key to learning to <a class="reference external" href="https://grugbrain.dev/#grug-on-saying-no">say no</a>.</p>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/ao2x0v/no_one_actually_wants_simplicity">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>
<h1><a class="reference external" href="https://lukeplant.me.uk/blog/posts/the-curse-of-scalable-technology/">The curse of scalable technology</a></h1>
<p>Luke Plant, 2021-12-02T13:55:14Z</p>
<p>Some of the downsides of technology stacks that are massively scalable and general purpose</p>
<p>The scalability and breadth of usefulness of some tools in the software
development world is huge. Relational databases spring to mind as one example —
at the low end, a database as small as one table with a few dozen rows can be
useful, while at the higher end millions or billions of rows in thousands of
tables is not unusual, and you might choose the same technology for both.</p>
<p>A more obvious example, perhaps, is programming languages. Code bases can be
anything from a handful of lines for simple scripts, to millions of lines, and
many languages scale to both these extremes. There are many more examples in the
programming world of libraries and tools that scale like this.</p>
<p>This is an amazing success, but it brings with it some issues. In the software
development world, you can be rubbing shoulders online with people working at
very different scales to you, and in very different contexts. That can lead to:</p>
<ul class="simple">
<li><p>endless debates where we talk past each other, each convinced that our
experience and expertise (which may be real) qualify us to tell other people
they are all doing it wrong.</p></li>
<li><p>copying inappropriate solutions, resulting in over-engineering (or
under-engineering, but I suspect that happens less), due to being unaware of
the context in which solutions really make sense.</p></li>
<li><p>projects that get side-tracked trying to please everyone, where a more focused
approach would have been better. Similarly when choosing how to develop our
expertise.</p></li>
<li><p>becoming arrogant and dismissive of others when you see what looks like
obvious incompetence, due to being unaware of the context in which someone
else’s code or decisions do in fact make sense. (That’s not to say that there
is no incompetence, <a class="reference external" href="https://thedailywtf.com/">there might be plenty</a>, but
you may or may not be competent to judge that!)</p></li>
</ul>
<p><a class="reference external" href="https://scattered-thoughts.net/writing/on-bad-advice/">Jamie Brandon touched on the subject of “Context Matters” in a recent post</a>. On that theme, I
thought it might be useful to list some of the different ways in which your
context might be different from that of other people you might be interacting
with on the internet.</p>
<p>In no particular order, important aspects of development context include:</p>
<ul class="simple">
<li><p>Size of code base.</p></li>
<li><p>Size of development team.</p></li>
<li><p>Size of the larger organisation that your software development team exists
within or relates to.</p></li>
<li><p>Nature of hierarchies within the organisation running your software development.</p></li>
<li><p>Organisation type – e.g. just for fun, charity, government, non-profit,
for-profit, open source community (of many different kinds) etc.</p></li>
<li><p>Business domain e.g. medicine, law, finance, physics, gaming etc.</p></li>
<li><p>Speed at which you must react to market changes. Especially, the speed at
which you might need to scale operations, or develop new features.</p></li>
<li><p>Need for or use of specialists vs generalists e.g. for web development,
you have “full stack” developers at one end of the scale, vs specialists in
databases, backend application code (which may have multiple layers), frontend
application code, design, UI/UX etc.</p></li>
<li><p>Priority given to avoiding or fixing faults. Consider critical medical
systems, for example, compared with an experimental game you are developing
for fun in public.</p></li>
<li><p>Priority given to avoiding regressions. This can be in direct tension to the
above, due to <a class="reference external" href="https://www.hyrumslaw.com/">Hyrum's Law</a>.</p></li>
<li><p>Priority of getting the best possible performance. Performance could be
measured in terms of memory usage, CPU time, network latency, network
bandwidth etc. For some projects, a 1% improvement is considered a very worthy
goal, for others, solutions that are thousands of times slower than optimal
could be absolutely fine, and a factor of 10 improvement might not even be
worth it.</p></li>
<li><p>Kind of users your software has. They could be other developers like you,
or end users with a massive range of ability with computers.</p></li>
<li><p>Nature of your relationship with the software’s user. This can range from:</p>
<ul>
<li><p>you are the user</p></li>
<li><p>you work for the user and are paid by them</p></li>
<li><p>you work on a team with the user, and are both paid by the same people</p></li>
<li><p>you don’t know the user directly at all, but are nonetheless paid by them
(indirectly).</p></li>
<li><p>you don’t know the user and they don’t pay for the software even indirectly.</p></li>
<li><p>probably many other variations.</p></li>
</ul>
</li>
<li><p>Number of users of your software – this could easily range over 9 orders
of magnitude.</p></li>
<li><p>Amount of data you process – again, a massive range.</p></li>
<li><p>Relative value to your “business” of each customer.</p></li>
<li><p>To what degree you know what hardware your software will run on.</p></li>
<li><p>How close you are to hardware limits (compare embedded systems to most desktop
software, for example).</p></li>
<li><p>How many different environments (e.g. operating systems) your software needs
to work in.</p></li>
<li><p>Extensibility requirements – from “none”, through “extendable by developers”,
to “extendable by end users” (for example with embedded scripting engines).</p></li>
<li><p>Long term maintenance needs. Some projects are kind of “throw-away”, and some
see few changes after a “release” e.g. some (but not all) games. On the other
hand, some software will be maintained for decades, sometimes being heavily
changed after initial development, others less so.</p></li>
<li><p>Need for new developers and maintainers. In some projects, you’ll have a
pretty static team, others will see a huge amount of turnover.</p></li>
<li><p>Ability to find more developers and maintainers. Some projects have no money
at all, while for others money is no object. Some may attract new developers
by virtue of being “fun”, others may not be able to.</p></li>
<li><p>Backwards compatibility needs with external dependencies. Differences here
will make a large difference in how much design you need to do up front.</p></li>
</ul>
<p>All of the above are <strong>dimensions</strong> in which software projects live. Many of
them, on their own, make profound differences to technology decisions. So
considered together, the space is truly vast. You could be “next door” to
someone in 10 of these dimensions, but miles away in some others.</p>
<p>(Obviously some of these dimensions have strong correlations with each other
e.g. large code bases are usually managed by larger teams, which reduces the
populated possibility space, but there are probably many exceptions to whatever
correlations you assume)</p>
<p>And there are probably many more dimensions I’m not aware of.</p>
<p>Trying to work out how much advice can be generalised is extremely hard. This is
compounded by the fact that people who have experience in a lot of different
projects often do not have <strong>in depth</strong> knowledge, or knowledge spanning a
really long time period. I know from experience that conclusions I’ve come to
after 2 or 3 years on a project are different to after 1 year, and they might
change again after 5 or 10 years. So it may be that the most experienced people
(judging by breadth) are actually the least qualified to advise others, due to
lack of depth – but also the least aware of that!</p>
<p>And then you have the problem that many people with a lot of experience are
pretty silent about it, and you have no idea how many they are (because they are
not vocal about their existence either!) Further, the most vocal might not be
the best qualified to help with your situation. For example, I know from at
least 2 data points that it’s entirely possible to run a multi-million dollar
business that has a main database containing much less than 100 MB of data. But
I don’t know how common that is, and I suspect you will probably hear a lot more
from companies that have a massively different profit-to-data ratio.</p>
<p>When I think too much about this, I feel I am stuck between two extremes: wild
and unjustified extrapolations from the tiny bit of experience I have gained so
far on the one hand, and failing to learn from anything on the other. The latter
seems much worse – none of us would be alive today if we reasoned “well just
because a lion ate my friend, it would be unscientific and unjustified to jump
to the conclusion that this lion might eat me”.</p>
<p>So I think the right path is something like this:</p>
<ul class="simple">
<li><p>Try to generalise from your experiences, but don’t hold your opinions too
strongly.</p></li>
<li><p>Listen to other people’s conclusions, but try to learn as much as you can
about the context that formed them.</p></li>
<li><p>See the value in expertise and approaches that have a limited scope of
application.</p></li>
</ul>
<p>Any other ideas?</p>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://twitter.com/spookylukey/status/1466415509491134482">Discussion of this post on Twitter</a></p></li>
<li><p><a class="reference external" href="https://lobste.rs/s/eugugj/curse_scalable_technology">Discussion of this post on Lobsters</a></p></li>
<li><p><a class="reference external" href="https://news.ycombinator.com/item?id=36273917">Discussion of this post on Hacker News</a></p></li>
</ul>
</section>
<h1><a class="reference external" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/">A Django PAGNI: efficient bulk properties</a></h1>
<p>Luke Plant, 2021-08-25T16:54:17+01:00</p>
<p>When using Django database models and adding a calculated property of some kind, you should probably ensure it will be efficient in bulk even if that isn’t needed yet.</p>
<p>Adding to my “Probably Are Gonna Need It” list (started by my <a class="reference external" href="https://lukeplant.me.uk/blog/posts/yagni-exceptions/">YAGNI exceptions</a> post a few months back,
with follow ups by <a class="reference external" href="https://simonwillison.net/2021/Jul/1/pagnis/">Simon Willison</a> and <a class="reference external" href="https://jacobian.org/2021/jul/8/appsec-pagnis/">Jacob Kaplan-Moss</a>), this post is about a
pattern that often crops up in Django applications. It probably applies to many
other database-driven applications too, but I’m more confident about saying it
is a PAGNI in Django – namely, a situation where you are usually better off
taking the risk of doing the extra work up front.</p>
<p>To state it briefly:</p>
<blockquote>
<p>If you have a calculated property that relates to a Django model and
requires a database query (or other expensive work), consider making
it efficient in bulk even if you don't need it in bulk right now.</p>
</blockquote>
<section id="example-initial-requirement">
<h2>Example – initial requirement</h2>
<p>Suppose we are writing an internal task management app for our team. A
requirement comes along: the user’s dashboard page should have some text that
indicates how many “in progress” tasks they have.</p>
<p>We already have a <code class="docutils literal">Task</code> model associated with our <code class="docutils literal">User</code> model:</p>
<div class="code"><pre class="code python"><a id="rest_code_6839f8d517614cec8f763f9765bce317-1" name="rest_code_6839f8d517614cec8f763f9765bce317-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-1"></a><span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="n">AbstractBaseUser</span><span class="p">):</span>
<a id="rest_code_6839f8d517614cec8f763f9765bce317-2" name="rest_code_6839f8d517614cec8f763f9765bce317-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-2"></a>    <span class="k">pass</span>
<a id="rest_code_6839f8d517614cec8f763f9765bce317-3" name="rest_code_6839f8d517614cec8f763f9765bce317-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-3"></a>
<a id="rest_code_6839f8d517614cec8f763f9765bce317-4" name="rest_code_6839f8d517614cec8f763f9765bce317-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-4"></a><span class="k">class</span> <span class="nc">Task</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<a id="rest_code_6839f8d517614cec8f763f9765bce317-5" name="rest_code_6839f8d517614cec8f763f9765bce317-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-5"></a>    <span class="n">assigned_to</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="s1">'myproject.User'</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s1">'assigned_tasks'</span><span class="p">)</span>
<a id="rest_code_6839f8d517614cec8f763f9765bce317-6" name="rest_code_6839f8d517614cec8f763f9765bce317-6" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_6839f8d517614cec8f763f9765bce317-6"></a>    <span class="n">state</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">choices</span><span class="o">=</span><span class="p">[(</span><span class="s2">"ready_for_work"</span><span class="p">,</span> <span class="s2">"Ready for work"</span><span class="p">),</span> <span class="p">(</span><span class="s2">"in_progress"</span><span class="p">,</span> <span class="s2">"In progress"</span><span class="p">),</span> <span class="p">(</span><span class="s2">"done"</span><span class="p">,</span> <span class="s2">"Done"</span><span class="p">)])</span>
</pre></div>
<p>We might already have a custom QuerySet with an <code class="docutils literal">in_progress</code> method that does
the appropriate filtering:</p>
<div class="code"><pre class="code python"><a id="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-1" name="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-1"></a><span class="k">class</span> <span class="nc">TaskQuerySet</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">QuerySet</span><span class="p">):</span>
<a id="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-2" name="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-2"></a>    <span class="k">def</span> <span class="nf">in_progress</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-3" name="rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_69cd51cbdd7b4be3a35cb652acfb7ad2-3"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">state</span><span class="o">=</span><span class="s2">"in_progress"</span><span class="p">)</span>  <span class="c1"># or perhaps something more complex</span>
</pre></div>
<p>We could then do the calculation with just the following code:</p>
<div class="code"><pre class="code python"><a id="rest_code_e1c89de5231d45b29d932d464de01715-1" name="rest_code_e1c89de5231d45b29d932d464de01715-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_e1c89de5231d45b29d932d464de01715-1"></a><span class="n">Task</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">in_progress</span><span class="p">()</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">assigned_to</span><span class="o">=</span><span class="n">user</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
<a id="rest_code_e1c89de5231d45b29d932d464de01715-2" name="rest_code_e1c89de5231d45b29d932d464de01715-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_e1c89de5231d45b29d932d464de01715-2"></a>
<a id="rest_code_e1c89de5231d45b29d932d464de01715-3" name="rest_code_e1c89de5231d45b29d932d464de01715-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_e1c89de5231d45b29d932d464de01715-3"></a><span class="c1"># or, from a ``User`` instance it might be:</span>
<a id="rest_code_e1c89de5231d45b29d932d464de01715-4" name="rest_code_e1c89de5231d45b29d932d464de01715-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_e1c89de5231d45b29d932d464de01715-4"></a>
<a id="rest_code_e1c89de5231d45b29d932d464de01715-5" name="rest_code_e1c89de5231d45b29d932d464de01715-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_e1c89de5231d45b29d932d464de01715-5"></a><span class="n">user</span><span class="o">.</span><span class="n">assigned_tasks</span><span class="o">.</span><span class="n">in_progress</span><span class="p">()</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
<p>At a SQL level, this is doing a simple <code class="docutils literal">SELECT COUNT()</code> with some filtering e.g.:</p>
<div class="code"><pre class="code sql"><a id="rest_code_bddc1639be0a4dd39ff6c408fe901ca2-1" name="rest_code_bddc1639be0a4dd39ff6c408fe901ca2-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_bddc1639be0a4dd39ff6c408fe901ca2-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">myproject_task</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">assigned_to_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">123</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'in_progress'</span><span class="w"></span>
</pre></div>
<p>We should probably wrap that up in a method or property, which we could easily
add to our <code class="docutils literal">User</code> model like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-1" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-1"></a><span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="n">AbstractBaseUser</span><span class="p">):</span>
<a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-2" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-2"></a>    <span class="o">...</span>
<a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-3" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-3"></a>
<a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-4" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-4"></a>    <span class="nd">@cached_property</span>
<a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-5" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-5"></a>    <span class="k">def</span> <span class="nf">in_progress_tasks_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_dce3e983afb54dd48073ab8d843a1e0f-6" name="rest_code_dce3e983afb54dd48073ab8d843a1e0f-6" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_dce3e983afb54dd48073ab8d843a1e0f-6"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">assigned_tasks</span><span class="o">.</span><span class="n">in_progress</span><span class="p">()</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
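<p>As a rough illustration of what the caching decorator does and does not buy us, here is a minimal sketch in plain Python. It assumes <code class="docutils literal">functools.cached_property</code> from the standard library as a stand-in for whichever <code class="docutils literal">cached_property</code> the model imports, and <code class="docutils literal">FakeUser</code> and <code class="docutils literal">query_count</code> are hypothetical names that replace the real model and the real SQL query.</p>

```python
from functools import cached_property


class FakeUser:
    """Hypothetical stand-in for the User model; counts simulated queries."""

    query_count = 0  # shared across all instances

    def __init__(self, in_progress):
        self._in_progress = in_progress

    @cached_property
    def in_progress_tasks_count(self):
        # Stands in for the SELECT COUNT(*) query against the task table.
        FakeUser.query_count += 1
        return self._in_progress


# Repeated access on one instance runs the "query" only once.
user = FakeUser(3)
first = user.in_progress_tasks_count
second = user.in_progress_tasks_count
queries_for_one_user = FakeUser.query_count

# A list of distinct instances still runs one "query" per instance.
users = [FakeUser(n) for n in range(5)]
counts = [u.in_progress_tasks_count for u in users]
queries_for_list = FakeUser.query_count - queries_for_one_user
```

<p>The value is cached per instance, so a single dashboard page pays for one query however many times the template reads the property, while a list of distinct users pays for one query each.</p>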
<p>You can now use this property in a template very easily:</p>
<div class="code"><pre class="code html+django"><a id="rest_code_2771e99d002f4db0b6f8aee8fe72f8c1-1" name="rest_code_2771e99d002f4db0b6f8aee8fe72f8c1-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_2771e99d002f4db0b6f8aee8fe72f8c1-1"></a><span class="p"><</span><span class="nt">p</span><span class="p">></span>Tasks in progress: <span class="cp">{{</span> <span class="nv">request.user.in_progress_tasks_count</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
</pre></div>
<p>You may or may not like putting properties on the user model like that, but the
code is simple and gets the job done, and seems to be fine. An alternative
structure would put this code in a utility function in the model layer
somewhere, which would require a bit more work to make the data available in our
template, but either way it doesn’t affect how this example will unfold.</p>
</section>
<section id="example-new-requirement">
<h2>Example – new requirement</h2>
<p>Some time later, a request comes up like this:</p>
<blockquote>
<p>You know that "in progress tasks count" on the user dashboard? Can we add
that as a column to the admin screen that shows the list of users?</p>
</blockquote>
<p>This sounds very simple – they just want a piece of information we already know
how to calculate to appear in another place. What could be easier?</p>
<p>If you are using the Django admin for the admin screen, and you coded the first
part as above, the solution could indeed be very simple to execute – as simple
as adding <code class="docutils literal">"in_progress_tasks_count"</code> to the <a class="reference external" href="https://docs.djangoproject.com/en/stable/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_display">list_display</a>
property – a five minute job maximum.</p>
<p>It will work fine. But you've hit the <a class="reference external" href="https://adamj.eu/tech/2020/09/01/django-and-the-n-plus-one-queries-problem/">dreaded N+1 queries problem</a>.</p>
<p>For each user instance displayed in the list, we will end up executing a
separate SQL query to get the task count. This is completely unnecessary – there
are multiple ways to do this much more efficiently in SQL:</p>
<ul>
<li><p>Using a SQL <code class="docutils literal">COUNT</code> with a sub-query added to the main user query.</p></li>
<li><p>Using a SQL <code class="docutils literal">COUNT FILTER</code> and a join added to the main user query.</p></li>
<li><p>Using a second query like this:</p>
<div class="code"><pre class="code sql"><a id="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-1" name="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="n">assigned_to_id</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">myproject_task</span><span class="w"></span>
<a id="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-2" name="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-2"></a><span class="k">WHERE</span><span class="w"> </span><span class="n">assigned_to_id</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'in_progress'</span><span class="w"></span>
<a id="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-3" name="rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_0c4ba0c77f4f4e228f1ddc8b185eef07-3"></a><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">assigned_to_id</span><span class="p">;</span><span class="w"></span>
</pre></div>
<p>where our list of user ID values comes from the first query we executed.</p>
</li>
</ul>
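<p>To make the difference concrete, here is a small sketch using the standard library <code class="docutils literal">sqlite3</code> module rather than Django. The table and column names (<code class="docutils literal">myproject_user</code>, <code class="docutils literal">myproject_task</code>, <code class="docutils literal">assigned_to_id</code>) are my assumption of what Django's default naming would give for the models above, and the sample rows are made up. It computes the same counts three ways: one query per user, a single grouped second query, and a sub-query added to the main user query.</p>

```python
import sqlite3

# Hypothetical minimal schema mirroring the User and Task models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myproject_user (id INTEGER PRIMARY KEY);
    CREATE TABLE myproject_task (
        id INTEGER PRIMARY KEY,
        assigned_to_id INTEGER NOT NULL REFERENCES myproject_user(id),
        state TEXT NOT NULL
    );
    INSERT INTO myproject_user (id) VALUES (1), (2), (3);
    INSERT INTO myproject_task (assigned_to_id, state) VALUES
        (1, 'in_progress'), (1, 'in_progress'), (1, 'done'),
        (2, 'in_progress'), (3, 'ready_for_work');
""")

user_ids = [row[0] for row in conn.execute("SELECT id FROM myproject_user ORDER BY id")]

# N+1 approach: one COUNT query per user, on top of the user query.
n_plus_one = {}
for user_id in user_ids:
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM myproject_task"
        " WHERE assigned_to_id = ? AND state = 'in_progress'",
        (user_id,),
    ).fetchone()
    n_plus_one[user_id] = count

# Second-query approach: one grouped query covering every listed user.
placeholders = ", ".join("?" for _ in user_ids)
bulk = dict.fromkeys(user_ids, 0)  # users with no matching tasks return no row
bulk.update(conn.execute(
    "SELECT assigned_to_id, COUNT(*) FROM myproject_task"
    f" WHERE assigned_to_id IN ({placeholders}) AND state = 'in_progress'"
    " GROUP BY assigned_to_id",
    user_ids,
))

# Sub-query approach: the count is computed as part of the main user query.
subquery = dict(conn.execute(
    "SELECT u.id, (SELECT COUNT(*) FROM myproject_task t"
    " WHERE t.assigned_to_id = u.id AND t.state = 'in_progress')"
    " FROM myproject_user u"
))
```

<p>All three dictionaries hold the same counts. The loop issues a query for every user in the list, while the other two issue a fixed number of queries no matter how many users are displayed. Note that the grouped query returns no row for a user without matching tasks, so the zero has to be filled in on the Python side.</p>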
<p>Instead of those, we’ve ended up with a very slow method that is going to hurt
us quite quickly in terms of performance. We might not notice the problem on our
development machines, but it will quickly add up in production.</p>
<aside class="admonition admonition-note">
<p class="admonition-title">Note</p>
<p>If you are using SQLite, which is actually <a class="reference external" href="https://sqlite.org/np1queryprob.html">pretty good at lots of small
queries</a>, this might not actually be
a problem.</p>
</aside>
<p>Even if we do notice it, we’re going to have a problem doing it the right way at
this point:</p>
<ul>
<li><p>The weight of the existing code pushes us in the wrong direction. The easy
thing is slow.</p>
<p>Also remember that by the time we go to do this, in addition to
<code class="docutils literal">in_progress_tasks_count</code>, we might also have <code class="docutils literal">completed_tasks_count</code>,
<code class="docutils literal">deferred_tasks_count</code> etc.</p>
</li>
<li><p>The low estimate we probably made for this new requirement, either explicitly
or just internally in the time we’ve allowed for it, pushes us to find a quick
solution.</p></li>
<li><p>“One way to do it”, and “Once And Only Once” push against us implementing a
second way to do the same calculation.</p></li>
</ul>
<p>So the result will be:</p>
<ul class="simple">
<li><p>either the wrong way, which will be slow and contribute further to patterns
that will make us even slower in the future,</p></li>
<li><p>or, doing rework which will be an unexpected and unwelcome cost at this point.</p></li>
</ul>
</section>
<section id="implementation-tips">
<h2>Implementation tips</h2>
<p>So, if we want to do this in a bulk-efficient way, what are our options?</p>
<ol class="arabic">
<li><p>We can load our <code class="docutils literal">User</code> objects in a query with an annotation that does the
calculation in the database, as part of the main query.</p>
<p>For the case above, we could use a custom User QuerySet method something like:</p>
<div class="code"><pre class="code python"><a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-1" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-1"></a><span class="k">class</span> <span class="nc">UserQuerySet</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">QuerySet</span><span class="p">):</span>
<a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-2" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-2"></a>    <span class="k">def</span> <span class="nf">with_in_progress_tasks_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-3" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-3"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">in_progress_tasks_count</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">Count</span><span class="p">(</span>
<a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-4" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-4"></a>            <span class="s1">'assigned_tasks'</span><span class="p">,</span>
<a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-5" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-5"></a>            <span class="nb">filter</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">Q</span><span class="p">(</span><span class="n">assigned_tasks__state</span><span class="o">=</span><span class="s1">'in_progress'</span><span class="p">)</span>
<a id="rest_code_94e5f667a8ab4fed8503693faa6676ad-6" name="rest_code_94e5f667a8ab4fed8503693faa6676ad-6" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_94e5f667a8ab4fed8503693faa6676ad-6"></a>        <span class="p">))</span>
</pre></div>
<p>This produces pretty nice SQL, although one downside of this code is that we
have duplicated some logic from our <code class="docutils literal">TaskQuerySet.in_progress</code> filter.</p>
<p>You then need to load your user objects with
<code class="docutils literal">User.objects.with_in_progress_tasks_count()</code>, which would require
overriding <code class="docutils literal">ModelAdmin.get_queryset</code> if you are using the Django admin for
example.</p>
<p>To keep everything happy, it is often useful to still define a property on
the model like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_deb94ffd0f504398998a714f338e6330-1" name="rest_code_deb94ffd0f504398998a714f338e6330-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_deb94ffd0f504398998a714f338e6330-1"></a><span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="n">AbstractBaseUser</span><span class="p">):</span>
<a id="rest_code_deb94ffd0f504398998a714f338e6330-2" name="rest_code_deb94ffd0f504398998a714f338e6330-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_deb94ffd0f504398998a714f338e6330-2"></a>
<a id="rest_code_deb94ffd0f504398998a714f338e6330-3" name="rest_code_deb94ffd0f504398998a714f338e6330-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_deb94ffd0f504398998a714f338e6330-3"></a>    <span class="nd">@cached_property</span>
<a id="rest_code_deb94ffd0f504398998a714f338e6330-4" name="rest_code_deb94ffd0f504398998a714f338e6330-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_deb94ffd0f504398998a714f338e6330-4"></a>    <span class="k">def</span> <span class="nf">in_progress_tasks_count</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
<a id="rest_code_deb94ffd0f504398998a714f338e6330-5" name="rest_code_deb94ffd0f504398998a714f338e6330-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_deb94ffd0f504398998a714f338e6330-5"></a>        <span class="k">raise</span> <span class="ne">AssertionError</span><span class="p">(</span><span class="s2">"Use User.objects.with_in_progress_tasks_count() if you want to use this"</span><span class="p">)</span>
</pre></div>
</li>
<li><p>We can have a separate query which does the count, and then decorates the
list of user objects with the value of <code class="docutils literal">in_progress_tasks_count</code>.</p>
<p>One of the advantages of this method is that the queries can be easier to
write, whether you are using the ORM or dropping down to raw SQL, and they
don’t complicate the main query at all, which can be a big bonus. Also,
sometimes it can be easier to re-use existing custom QuerySet methods this way.</p>
<p>One of the disadvantages is that it can be difficult to insert this extra bit
of work at the right point, especially if you are in the context of framework
code (like the Django admin or DRF), where the full QuerySet is built up and
evaluated outside of your control.</p>
<p>For this situation, in a number of projects I’ve started using a <a class="reference external" href="https://gist.github.com/spookylukey/8d1a4c73845d1ec86a875fd44b6bdc32">mechanism
for adding callbacks that run immediately after a QuerySet is evaluated</a>.</p>
<p>Usage in this case would look like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-1" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-1" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-1"></a><span class="k">class</span> <span class="nc">UserQuerySet</span><span class="p">(</span><span class="n">AfterFetchQuerySetMixin</span><span class="p">,</span> <span class="n">models</span><span class="o">.</span><span class="n">QuerySet</span><span class="p">):</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-2" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-2" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-2"></a>    <span class="k">def</span> <span class="nf">with_in_progress_tasks_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-3" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-3" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-3"></a>        <span class="k">def</span> <span class="nf">add_in_progress_tasks_count</span><span class="p">(</span><span class="n">user_list</span><span class="p">):</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-4" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-4" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-4"></a>            <span class="n">user_ids</span> <span class="o">=</span> <span class="p">[</span><span class="n">u</span><span class="o">.</span><span class="n">id</span> <span class="k">for</span> <span class="n">u</span> <span class="ow">in</span> <span class="n">user_list</span><span class="p">]</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-5" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-5" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-5"></a>            <span class="n">counts</span> <span class="o">=</span> <span class="p">(</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-6" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-6" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-6"></a>                <span class="n">Task</span><span class="o">.</span><span class="n">objects</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-7" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-7" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-7"></a>                <span class="o">.</span><span class="n">in_progress</span><span class="p">()</span><span class="o">.</span><span class="n">order_by</span><span class="p">()</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-8" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-8" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-8"></a>                <span class="o">.</span><span class="n">filter</span><span class="p">(</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-9" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-9" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-9"></a>                    <span class="n">assigned_to__id__in</span><span class="o">=</span><span class="n">user_ids</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-10" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-10" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-10"></a>                <span class="p">)</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-11" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-11" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-11"></a>                <span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s1">'assigned_to_id'</span><span class="p">)</span><span class="o">.</span><span class="n">annotate</span><span class="p">(</span><span class="n">count</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">Count</span><span class="p">(</span><span class="s1">'id'</span><span class="p">))</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-12" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-12" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-12"></a>                <span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'assigned_to_id'</span><span class="p">,</span> <span class="s1">'count'</span><span class="p">)</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-13" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-13" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-13"></a>            <span class="p">)</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-14" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-14" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-14"></a>            <span class="n">counts_dict</span> <span class="o">=</span> <span class="p">{</span><span class="n">user_id</span><span class="p">:</span> <span class="n">c</span> <span class="k">for</span> <span class="n">user_id</span><span class="p">,</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">counts</span><span class="p">}</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-15" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-15" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-15"></a>            <span class="c1"># Decorate user list</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-16" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-16" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-16"></a>            <span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">user_list</span><span class="p">:</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-17" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-17" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-17"></a>                <span class="n">user</span><span class="o">.</span><span class="n">in_progress_tasks_count</span> <span class="o">=</span> <span class="n">counts_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">id</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<a id="rest_code_a3907ecca43840dabbfbd30a1a9e5138-18" name="rest_code_a3907ecca43840dabbfbd30a1a9e5138-18" href="https://lukeplant.me.uk/blog/posts/django-pagni-efficient-bulk-properties/#rest_code_a3907ecca43840dabbfbd30a1a9e5138-18"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">register_after_fetch_callback</span><span class="p">(</span><span class="n">add_in_progress_tasks_count</span><span class="p">)</span>
</pre></div>
<p>This pattern has come up often enough for me that I’m wondering whether
something like this should be included in Django itself. As it stands,
<code class="docutils literal">AfterFetchQuerySetMixin</code> has to depend on some Django internals to work.</p>
</li>
</ol>
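<p>The guard property shown in option 1 relies on ordinary Python attribute
lookup: a value set directly on an instance takes precedence over a
<code class="docutils literal">cached_property</code> of the same name, so the property body only runs
when nothing has populated the attribute. Here is a framework-free sketch of that
mechanism. It assumes the standard library’s <code class="docutils literal">functools.cached_property</code>
in place of Django’s, and the <code class="docutils literal">User</code> class is a hypothetical plain
class, not a real model:</p>

```python
from functools import cached_property


class User:
    """Hypothetical stand-in for the Django model; not a real ORM class."""

    def __init__(self, user_id):
        self.id = user_id

    @cached_property
    def in_progress_tasks_count(self) -> int:
        # Only reached when nothing has set the attribute on the instance.
        raise AssertionError(
            "Use User.objects.with_in_progress_tasks_count() if you want to use this"
        )


annotated = User(1)
# Simulates a bulk loader decorating the object: a plain attribute assignment.
annotated.in_progress_tasks_count = 3

plain = User(2)
try:
    plain.in_progress_tasks_count
    guard_triggered = False
except AssertionError:
    guard_triggered = True
```

<p>The decorated object returns its stored value, while the undecorated one fails
loudly instead of silently running a per-object query.</p>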
<p>For this kind of work, in general you might need to get good at using <a class="reference external" href="https://docs.djangoproject.com/en/stable/topics/db/aggregation/">Django’s
aggregation features</a>, which are not
the easiest in my opinion. <a class="reference external" href="https://hakibenita.com/django-group-by-sql">Haki Benita’s guide to Group By in Django</a> is invaluable!</p>
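<p>If the ORM spelling is hard to follow, it can help to see the shape of the
underlying query. The sketch below does roughly what option 2 above does (one
grouped count, then decorating a list of objects), but with the standard library’s
<code class="docutils literal">sqlite3</code> module instead of Django. The <code class="docutils literal">task</code> table, its columns
and the <code class="docutils literal">User</code> dataclass are assumptions made for illustration:</p>

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical plain object standing in for a model instance.
    id: int
    name: str
    in_progress_tasks_count: int = 0


def add_in_progress_tasks_count(conn, user_list):
    """Decorate users with their in-progress task count using one query."""
    user_ids = [u.id for u in user_list]
    if not user_ids:
        return user_list
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        f"""
        SELECT assigned_to_id, COUNT(id)
        FROM task
        WHERE state = 'in_progress' AND assigned_to_id IN ({placeholders})
        GROUP BY assigned_to_id
        """,
        user_ids,
    ).fetchall()
    counts_dict = dict(rows)
    for user in user_list:
        # Users with no matching tasks are absent from the result, so default to 0.
        user.in_progress_tasks_count = counts_dict.get(user.id, 0)
    return user_list


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, assigned_to_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO task (assigned_to_id, state) VALUES (?, ?)",
    [(1, "in_progress"), (1, "in_progress"), (1, "completed"), (2, "completed")],
)
users = add_in_progress_tasks_count(
    conn, [User(1, "alice"), User(2, "bob"), User(3, "carol")]
)
```

<p>However many users are in the list, this costs a single extra query, which is
the property we want from a bulk-efficient implementation.</p>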
</section>
<section id="what-about-bulk-write-operations">
<h2>What about bulk write operations?</h2>
<p>Above we’ve addressed bulk reads, but what about doing bulk writes in a
similarly efficient manner? In my experience, this depends much more on the task
in hand. For writes, it is less common to go from operating on a single row to
needing the operation to be efficient in bulk. Often, even if we do need
bulk operations, we can afford to do them more slowly in a background task. Your
mileage may vary.</p>
</section>
<section id="discussion-why-yagni-fails">
<h2>Discussion – why YAGNI fails</h2>
<p><a class="reference external" href="https://martinfowler.com/bliki/Yagni.html">YAGNI</a> is based on a few main
observations, which I think are normally true:</p>
<ul class="simple">
<li><p>The time to develop a feature later is (approximately) the same as the time to
develop it earlier.</p></li>
<li><p>Life is full of surprises, and you might not need the feature later even when
you suspect you will.</p></li>
<li><p>Even if you can correctly guess what features will be needed eventually, there
is always an opportunity cost of delivering something before you need to
(“cost of delay” as Martin Fowler describes it). There is almost certainly an
endless list of features/improvements you do need, so if you have implemented
a feature that is not used (yet), then that’s a planning mistake that has cost
you in terms of features needed more urgently.</p></li>
<li><p>In addition to the cost of delay, Martin Fowler also points out the <strong>cost of
carry</strong> – the complexity burden you carry for having added something unneeded.</p></li>
</ul>
<p>Something counts as a YAGNI exception not if you just correctly predict the
future, but if implementing it before you need to ends up with lower costs
overall.</p>
<p>I’m claiming these arguments fail in this case, but why?</p>
<p>First, there is always <strong>some</strong> cost associated with re-work, and for a
relatively small feature like this the overheads are significant. In particular,
implementing the bulk-inefficient way then re-working and implementing the
bulk-efficient way is always going to take longer than just implementing the
bulk-efficient way, especially once you’ve added the desire to not repeat logic.</p>
<p>In this case, it’s true that you may not need the efficiency later on, but often
you do, and the bulk-efficient way also works fine for non-bulk usage, without
introducing that much complexity, and without a large opportunity cost because
it doesn’t take that long, especially if you set up the patterns at the
beginning.</p>
<p>In addition, the attitude of “I’ll just re-design when I need it” fails to take
into account some powerful forces:</p>
<ul>
<li><p>The disproportionate effect that existing code structure has on code that
follows it. Overwhelmingly, coders will try to make new code fit the existing
pattern, <a class="reference external" href="https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence">which is not a bad instinct</a>, but isn’t always the
right thing.</p>
<p>Every bit of code you write is actually setting a fairly powerful precedent
that requires significant work to overcome.</p>
<p>This actually means that the “cost of carry” argument may go the other way – if
you establish a pattern of bulk-efficient operations, it makes it easier to
implement every other (unrelated) feature that might also need bulk-efficient
operations, whether from the beginning or later.</p>
</li>
<li><p>The pressures of time-management and estimates you’ve already made. It can be
very hard to revisit an estimate of 30 minutes for a simple task and say
“actually it’s going to take 2 days”.</p></li>
</ul>
<p>Up-front thinking about performance like this is not premature optimization —
these are not small nano-second or micro-second differences, but milli-seconds
that often add up to seconds, and the patterns you choose at the beginning will
have big consequences for your performance down the line.</p>
</section>
<p>Life on the diagonal – adventures in 2-D time<br>
https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/<br>
2021-07-19T12:59:00+01:00, Luke Plant</p>
<p>A post to help illustrate and popularise the concept of 2-D time, a technique which is essential for some situations.</p>
<p>In his post on <a class="reference external" href="https://martinfowler.com/eaaDev/timeNarrative.html">Temporal Patterns</a>,
Martin Fowler writes:</p>
<blockquote>
<p>We've all learned, if only from bad Science Fiction books, that time is the
fourth dimension. The trouble is that this is wrong… Time isn't the fourth
dimension, it's the fourth and fifth dimensions!</p>
</blockquote>
<p>When I first read this, I bookmarked the page as interesting, but had no
practical need for the pattern described, until 5 years later, when it became
indispensable in some work for a client. If you have a system that needs 2-D
time but doesn't have it, you will become as lost and confused as someone who
attempts to navigate a city with a 1-D map.</p>
<p>My aim in this post is to make this excellent pattern more well known, and to
give further examples of the pattern in action.</p>
<nav class="contents" id="contents" role="doc-toc">
<p class="topic-title">Contents</p>
<ul class="simple">
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#the-setup" id="toc-entry-1">The setup</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#the-ledger-solution" id="toc-entry-2">The ledger solution</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#the-calendar-solution" id="toc-entry-3">The calendar solution</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#data-storage" id="toc-entry-4">Data storage</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#querying" id="toc-entry-5">Querying</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#a-2-d-calendar" id="toc-entry-6">A 2-D calendar</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#vertical-time" id="toc-entry-7">Vertical time</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#horizontal-time" id="toc-entry-8">Horizontal time</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#diagonal-time" id="toc-entry-9">Diagonal time</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#understanding-diagonal-time" id="toc-entry-10">Understanding diagonal time</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#handling-the-future" id="toc-entry-11">Handling the future</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#other-tips-and-pointers" id="toc-entry-12">Other tips and pointers</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#conclusion" id="toc-entry-13">Conclusion</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#credits" id="toc-entry-14">Credits</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#footnotes" id="toc-entry-15">Footnotes</a></p></li>
</ul>
</nav>
<section id="the-setup">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-1" role="doc-backlink">The setup</a></h2>
<p>There are many ways this pattern can manifest itself or be used, and Martin
Fowler's article has some <a class="reference external" href="https://martinfowler.com/eaaDev/timeNarrative.html#dimensions">examples</a> where it is
essential. I'm going to use a slightly different example.</p>
<p>Let's say we are a business with customers, and we track the amount of money
that they have spent on various services in an account for each customer. In the
examples that follow I'll assume it's an email hosting company, that provides
several plans and other add-on services.</p>
</section>
<section id="the-ledger-solution">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-2" role="doc-backlink">The ledger solution</a></h2>
<p>We might have a simple log of every time money gets spent. But problems arise
when we need to make corrections of some kind.</p>
<p>Every bank account I've come across, at least from an external perspective,
handles this by not really handling it. Corrections to existing entries are not
allowed. So if you challenge a bank fee, and you win, the result will be a
reversal that adds a new item, cancelling out part or whole of the fee, and
restoring the correct balance (hopefully). I'll call this the “ledger” approach.</p>
<p>The immutable ledger has some compelling advantages, especially for the
implementer, but sometimes it just doesn't cut it. In particular, it means that
the balance is actually the only thing you can correct. You cannot correct the
date or amount of any item. So, if you are trying to present to your customer an
account that shows you haven't fiddled them, it's not hugely useful. They cannot
simply check that you charged them the right amount for each item – they have to
do their own list of all item amounts, then do some sums to check that the
listed item amounts plus the adjustment amounts add up to what they expect. Or
you will have to add those calculations for them in a plain text description
field, which is hard to check.</p>
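<p>To make the limitation concrete, here is a tiny sketch of a ledger in which a
charge of 10 should have been 8. The entries and amounts are made up for
illustration:</p>

```python
# Hypothetical ledger entries: (description, amount). Amounts are illustrative.
ledger = [
    ("Credit card payment", 100),
    ("Basic email plan", -10),         # overcharged: should have been -8
    ("Adjustment for overcharge", 2),  # reversal entry added later
]

# The balance comes out right...
balance = sum(amount for _, amount in ledger)

# ...but the customer has to net the adjustment against the original item
# themselves to see what they were really charged for the plan.
effective_charge = ledger[1][1] + ledger[2][1]
```

<p>The balance is correct, but no single line in the ledger shows the amount the
customer was actually charged for the plan.</p>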
<p>Can we do better?</p>
<p>We could allow destructive updates in order to make corrections. However, we may
have already sent notification emails or done automatic charges to a credit card
based on the old amounts. If we simply change the amounts, things are not going
to add up, and we'll get ourselves into hot water pretty quickly. For that
situation, we'd like to know both <strong>what we thought the value was</strong>, and <strong>what
we think it now is</strong> – and possibly any number of corrections.</p>
<p>Instead of adding hack after hack to get closer to the solution, I'll present
the answer, which is 2-D time. Then we'll see how this one conceptual jump,
which at first is a bit mind-boggling, suddenly makes almost impossibly
complicated sums extremely straight-forward.</p>
</section>
<section id="the-calendar-solution">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-3" role="doc-backlink">The calendar solution</a></h2>
<p>Instead of an immutable ledger, we are going to have a fully mutable calendar
that we can add and remove events from and alter as we wish. However, the
calendar is also fully versioned, so that we don't have to do destructive
updates and throw away any information.</p>
<p>This system will give us a bi-temporal view of the changing amount of money in
each customer's account:</p>
<ul class="simple">
<li><p>The first dimension is “actual” time, or “event” time. This is like the time
that would appear in a ledger, if we had one – the time at which money is
spent.</p></li>
<li><p>The second dimension is “record” time – the time at which we make a change to
the calendar. We can also call it “knowledge” time.</p></li>
</ul>
<p>Now, it is possible in some cases that we will hide one or even both of these
times in normal operations. When you charge a customer for a service, you might
have an interface that asks simply for the amount, which you charge immediately
— meaning that you set both the event time and the record time to “right now”.
But it is still important to understand that these are two distinct times that
are in fact in completely different dimensions, and are completely incomparable
to each other.</p>
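<p>As a rough sketch of keeping the two times apart even when they happen to
coincide, here is one version of one event modelled as a small Python dataclass.
The field names <code class="docutils literal">charged_at</code> and <code class="docutils literal">recorded_at</code> are borrowed from the
storage section that follows. The class and the helpers <code class="docutils literal">charge_now</code> and
<code class="docutils literal">amend</code> are hypothetical names for illustration only:</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CalendarEntry:
    """One version of one event; names here are illustrative assumptions."""

    event_id: str
    amount: int
    charged_at: datetime   # "event" time: when the money was spent
    recorded_at: datetime  # "record" time: when we wrote this version down


def charge_now(event_id: str, amount: int, now: Optional[datetime] = None) -> CalendarEntry:
    """Immediate charge: both times coincide, but remain separate fields."""
    now = now or datetime.now(timezone.utc)
    return CalendarEntry(event_id, amount, charged_at=now, recorded_at=now)


def amend(entry: CalendarEntry, amount: int, recorded_at: datetime) -> CalendarEntry:
    """A correction keeps the event time and gets a new record time."""
    return CalendarEntry(entry.event_id, amount, entry.charged_at, recorded_at)


original = charge_now(
    "subscription-123-month-1", -10, datetime(2021, 1, 10, tzinfo=timezone.utc)
)
corrected = amend(original, -8, datetime(2021, 1, 25, tzinfo=timezone.utc))
```

<p>The correction moves along the record-time axis only: the event still happened
on the same day, but our knowledge of it changed later.</p>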
</section>
<section id="data-storage">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-4" role="doc-backlink">Data storage</a></h2>
<p>This section is strictly optional – the remainder of the post doesn't depend on
understanding how we store and query the data – but it can help some people to
have a more concrete understanding.</p>
<p>There are multiple ways that we can store this 2-D calendar, but I'll present a
simple, flexible one that can also be made fairly efficient:</p>
<ul>
<li><p>Every event that goes on the calendar needs an ID of some kind. It mustn't
clash with any other event, obviously, but the same ID must be used for each
different version of the event if you make changes.</p>
<p>In our example, charges in our calendar could relate to different tables – for
example the payments table and the subscriptions table – so we're going to use
a string ID with different forms for different related tables. For example:</p>
<ul class="simple">
<li><p>For incoming payments from the customer, I'm going to use <code class="docutils literal"><span class="pre">payment-$P</span></code>
where <code class="docutils literal">$P</code> is the ID from the payments table.</p></li>
<li><p>For deductions against “subscriptions”, I'm going to use
<code class="docutils literal"><span class="pre">subscription-$S-month-$N</span></code> for the event ID, where:</p>
<ul>
<li><p><code class="docutils literal">$S</code> is the ID in the subscription table, which is a table of all the
services all users have subscribed to.</p></li>
<li><p><code class="docutils literal">$N</code> is an increasing integer, 1 for the first month etc.</p></li>
</ul>
</li>
</ul>
</li>
<li><p>We need columns to record the amount, the <code class="docutils literal">charged_at</code> timestamp of when the
deduction was made for the service, and the <code class="docutils literal">recorded_at</code> timestamp of when we
put the data into the database.</p></li>
<li><p>We will use the <code class="docutils literal">recorded_at</code> timestamp to distinguish between different
versions of the same event.</p></li>
<li><p>We will store this data in an append-only table with no destructive updates
(UPDATE or DELETE). This gives us the guarantee of being able to retrieve
earlier versions of the calendar.</p>
<p>We achieve “edit” by adding a new entry with the same event ID, but different
details. The row with the most recent <code class="docutils literal">recorded_at</code> “wins” in terms of
representing current data.</p>
<p>We can do deletions by adding a new record with the same <code class="docutils literal">event_id</code> as an
existing one, with the <code class="docutils literal">charged_at</code> timestamp set to NULL, indicating the
event is no longer on our calendar. (An alternative would be to just set the
<code class="docutils literal">amount</code> to zero, but that would disallow zero amount items from being
visible, which we might want).</p>
</li>
</ul>
<p>Here is an example in which we add:</p>
<ul class="simple">
<li><p>a payment</p></li>
<li><p>the first month's deduction for “Basic email plan”</p></li>
<li><p>an amendment in which we reduce the amount to give them the discounted price
which wasn't applied in the first month for some reason.</p></li>
<li><p>the second month's deduction.</p></li>
</ul>
<table>
<thead>
<tr class="header">
<th><code>id</code></th>
<th><code>customer</code></th>
<th><code>event_id</code></th>
<th><code>recorded_at</code></th>
<th><code>charged_at</code></th>
<th><code>amount</code></th>
<th><code>description</code></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>1</td>
<td>payment-1</td>
<td>2021-01-09</td>
<td>2021-01-09</td>
<td>100</td>
<td>Credit card payment</td>
</tr>
<tr class="even">
<td>2</td>
<td>1</td>
<td>subscription-123-month-1</td>
<td>2021-01-10</td>
<td>2021-01-10</td>
<td>-10</td>
<td>Basic email plan</td>
</tr>
<tr class="odd">
<td>3</td>
<td>1</td>
<td>subscription-123-month-1</td>
<td>2021-01-25</td>
<td>2021-01-10</td>
<td>-8</td>
<td>Basic email plan (discounted)</td>
</tr>
<tr class="even">
<td>4</td>
<td>1</td>
<td>subscription-123-month-2</td>
<td>2021-02-10</td>
<td>2021-02-10</td>
<td>-8</td>
<td>Basic email plan (discounted)</td>
</tr>
</tbody>
</table>
<p>Note:</p>
<ul class="simple">
<li><p>Typically I'd use a full timestamp, not just date, for both time columns
above.</p></li>
<li><p>Event ID <code class="docutils literal"><span class="pre">subscription-123-month-1</span></code> appears twice, distinguished by
different <code class="docutils literal">recorded_at</code> timestamps. The row with id 3 is an amendment to the
row with id 2, and supersedes it.</p></li>
<li><p>The <code class="docutils literal">charged_at</code> timestamp value might be the same as <code class="docutils literal">recorded_at</code>, but
it might not be.</p></li>
<li><p>The <code class="docutils literal">id</code> column here serves no purpose other than as a simple primary key for
items in the table. We should be able to add a unique constraint on
<code class="docutils literal">(event_id, recorded_at)</code>.</p></li>
</ul>
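<p>For a concrete starting point, the table above could be set up along these
lines. This is a minimal sketch using Python's <code class="docutils literal">sqlite3</code> module. The table
name <code class="docutils literal">calendar</code> and the column types are assumptions, and dates are stored
as ISO strings for brevity:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calendar (
        id INTEGER PRIMARY KEY,
        customer INTEGER NOT NULL,
        event_id TEXT NOT NULL,
        recorded_at TEXT NOT NULL,
        charged_at TEXT,          -- NULL marks the event as removed
        amount INTEGER,
        description TEXT,
        UNIQUE (event_id, recorded_at)
    )
    """
)

INSERT_SQL = (
    "INSERT INTO calendar"
    " (customer, event_id, recorded_at, charged_at, amount, description)"
    " VALUES (?, ?, ?, ?, ?, ?)"
)

# The four example rows from the table above.
rows = [
    (1, "payment-1", "2021-01-09", "2021-01-09", 100, "Credit card payment"),
    (1, "subscription-123-month-1", "2021-01-10", "2021-01-10", -10, "Basic email plan"),
    (1, "subscription-123-month-1", "2021-01-25", "2021-01-10", -8, "Basic email plan (discounted)"),
    (1, "subscription-123-month-2", "2021-02-10", "2021-02-10", -8, "Basic email plan (discounted)"),
]
conn.executemany(INSERT_SQL, rows)

# Two versions of one event cannot share a recorded_at value.
try:
    conn.execute(
        INSERT_SQL,
        (1, "payment-1", "2021-01-09", "2021-01-09", 90, "Duplicate version"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

versions = conn.execute(
    "SELECT COUNT(*) FROM calendar WHERE event_id = 'subscription-123-month-1'"
).fetchone()[0]
```

<p>Amendments are just further inserts with the same <code class="docutils literal">event_id</code>, so both
versions of the first month's deduction remain available.</p>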
<p>We might also want some additional columns:</p>
<ul class="simple">
<li><p>nullable foreign key fields relating to the subscriptions/payments table etc.,
to allow easier filtering if needed. This can be important for performance and
scalability too.</p></li>
<li><p>some auditing fields which will include the user or higher-level actions that
triggered the addition or change to the calendar.</p></li>
</ul>
<p><strong>Edit:</strong> as an alternative implementation strategy, you might consider "Temporal
Tables", a SQL:2011 extension that is available in some database systems such as
<a class="reference external" href="https://mariadb.com/kb/en/temporal-tables/">MariaDB</a> and <a class="reference external" href="https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15">SQL Server</a>,
as a <a class="reference external" href="https://pgxn.org/dist/temporal_tables/">PostgreSQL extension</a>, and
probably others. Thanks to <a class="reference external" href="https://twitter.com/AdamChainz/">Adam Johnson</a> for
the pointer. There are also databases like <a class="reference external" href="https://docs.xtdb.com/concepts/bitemporality/">XTDB with built-in support for bitemporality</a>.</p>
<section id="querying">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-5" role="doc-backlink">Querying</a></h3>
<p>There are different things we might want to do with this data, but here are some
common things that come up:</p>
<ul>
<li><p>We need to get a single version for each event. We can do this easily with
window functions and probably other techniques. Here is a simple query that
will get the most recent version of the calendar, which is typically what you
care most about. This code should work in SQLite (tested), PostgreSQL and
probably others, but may not be the most efficient way.</p>
<div class="code"><pre class="code sql"><a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-1" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-1" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="k">DISTINCT</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-2" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-2" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-2"></a><span class="w"> </span><span class="n">event_id</span><span class="p">,</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-3" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-3" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-3"></a><span class="w"> </span><span class="n">first_value</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-4" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-4" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-4"></a><span class="w"> </span><span class="n">first_value</span><span class="p">(</span><span class="n">charged_at</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">charged_at</span><span class="p">,</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-5" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-5" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-5"></a><span class="w"> </span><span class="n">first_value</span><span class="p">(</span><span class="n">description</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">description</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-6" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-6" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-6"></a><span class="k">FROM</span><span class="w"> </span><span class="n">calendar</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-7" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-7" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-7"></a><span class="n">WINDOW</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">event_id</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">recorded_at</span><span class="w"> </span><span class="k">DESC</span><span class="p">)</span><span class="w"></span>
<a id="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-8" name="rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-8" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#rest_code_e9c98c26fb61484f95b80ffbdb9f9d81-8"></a><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">charged_at</span><span class="p">;</span><span class="w"></span>
</pre></div>
<p>Results:</p>
<table>
<thead>
<tr class="header">
<th><code>event_id</code></th>
<th><code>amount</code></th>
<th><code>charged_at</code></th>
<th><code>description</code></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>payment-1</td>
<td>100</td>
<td>2021-01-09</td>
<td>Credit card payment</td>
</tr>
<tr class="even">
<td>subscription-123-month-1</td>
<td>-8</td>
<td>2021-01-10</td>
<td>Basic email plan (discounted)</td>
</tr>
<tr class="odd">
<td>subscription-123-month-2</td>
<td>-8</td>
<td>2021-02-10</td>
<td>Basic email plan (discounted)</td>
</tr>
</tbody>
</table>
<p>This query doesn't exclude null events, however – that would need to be done
by wrapping in an outer query, or with application level filtering.</p>
</li>
<li><p>We might want to get old versions of the calendar. This can be done by adding a <code class="docutils literal">WHERE
recorded_at &lt; ...</code> clause to the above query.</p></li>
</ul>
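<p>The two variations mentioned above (excluding null events and looking at an older version of the calendar) can be combined in one query. The following is a sketch only: it assumes a “null event” is stored as a version whose <code class="docutils literal">amount</code> is <code class="docutils literal">NULL</code>, it uses an inclusive <code class="docutils literal">&lt;=</code> cut-off because the example stores dates rather than timestamps, and the <code class="docutils literal">addon-1</code> rows and the function names are invented for illustration. It needs an SQLite build with window function support (3.25 or later).</p>

```python
import sqlite3

# Same query as in the post, restricted by knowledge time and wrapped in an
# outer query that drops events whose latest known version is a null event.
QUERY = """
SELECT * FROM (
    SELECT DISTINCT
        event_id,
        first_value(amount) OVER w AS amount,
        first_value(charged_at) OVER w AS charged_at,
        first_value(description) OVER w AS description
    FROM calendar
    WHERE recorded_at <= :as_of
    WINDOW w AS (PARTITION BY event_id ORDER BY recorded_at DESC)
)
WHERE amount IS NOT NULL
ORDER BY charged_at;
"""

# (event_id, recorded_at, charged_at, amount, description)
ROWS = [
    ("payment-1", "2021-01-09", "2021-01-09", 100, "Credit card payment"),
    ("subscription-123-month-1", "2021-01-10", "2021-01-10", -10, "Basic email plan"),
    ("subscription-123-month-1", "2021-01-25", "2021-01-10", -8, "Basic email plan (discounted)"),
    ("subscription-123-month-2", "2021-02-10", "2021-02-10", -8, "Basic email plan (discounted)"),
    # Hypothetical event that is later cancelled by a null amendment:
    ("addon-1", "2021-01-12", "2021-01-12", -5, "Add-on"),
    ("addon-1", "2021-01-25", "2021-01-12", None, "Add-on (cancelled)"),
]


def make_conn():
    """Create an in-memory calendar table holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calendar (event_id TEXT, recorded_at TEXT,"
        " charged_at TEXT, amount INTEGER, description TEXT)"
    )
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def calendar_as_of(conn, as_of):
    """Return the calendar as it was known on the given date (inclusive)."""
    return conn.execute(QUERY, {"as_of": as_of}).fetchall()
```

<p>Calling <code class="docutils literal">calendar_as_of(conn, "2021-01-20")</code> gives the calendar as it was known before the amendments of 2021-01-25, while a later date gives the amended version with the cancelled event filtered out.</p>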
</section>
</section>
<section id="a-2-d-calendar">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-6" role="doc-backlink">A 2-D calendar</a></h2>
<p>Let's put the example I detailed above onto a 2-D visualisation. To re-iterate
for those that skipped that section:</p>
<ul class="simple">
<li><p>we had an incoming payment of $100 on 2021-01-09</p></li>
<li><p>we had a deduction of $10 on the next day, 2021-01-10</p></li>
<li><p>on 2021-01-25, a few weeks later, we adjusted the previous deduction to $8</p></li>
<li><p>we had a further deduction of $8 on 2021-02-10</p></li>
</ul>
<p>We'll represent the calendar itself <strong>vertically</strong>. Successive new versions of
the calendar, corresponding to changes in the knowledge dimension of time, are
represented horizontally.</p>
<img alt="ditaa diagram" class="ditaa" src="https://lukeplant.me.uk/blogmedia/ditaa/2dtime-calendar.svg">
</section>
<section id="vertical-time">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-7" role="doc-backlink">Vertical time</a></h2>
<p>When a customer comes onto their account page and wants to see a list of
charges, we'll present something like the last column of the above diagram – a
simple list of charges ordered by date, just as you would have for the ledger
system. The dates shown are the ones down the left hand side, in “event time” —
we have just a simple vertical slice at a single horizontal coordinate.</p>
</section>
<section id="horizontal-time">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-8" role="doc-backlink">Horizontal time</a></h2>
<p>Horizontal movements in time provide a solid basis for all auditing features. In
auditing, we are asking “when did the calendar change?”. We might be interested
in all changes, changes to a specific event, or changes across a specific
horizontal band. The dates that appear in our auditing report are the dates
across the top of the above diagram, only some of which correspond to the dates
of charges made.</p>
</section>
<section id="diagonal-time">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-9" role="doc-backlink">Diagonal time</a></h2>
<p>Things get more interesting here!</p>
<p>Suppose we have a monthly statement email that shows the change in balance
over the last month, possibly itemised.</p>
<p>In the ledger system, this is simple. Amendments to previous months are not
allowed, so we just report on what happened over the last month, and everything
will add up relative to the previous month's statement.</p>
<p>For the calendar solution, however, it's not quite so simple.</p>
<p>Suppose the customer has challenged something we charged in a previous month,
and we agreed to reduce or remove the charge. But, for the sake of a fuller
example, let's make things a bit harder.</p>
<p>Let's say we offer a “pay up front discount” if you pay a year in advance for an
email plan. This feature is itself further complicated by the fact that
customers have other services being charged from the same account, and the
possibilities of debts etc. The upshot is that we calculate the discount on a
monthly basis using a method that depends on the current amount in credit.</p>
<p>So, if we amend a previous entry, later amounts also need to be changed. We
might have a more or less sophisticated or automatic method for handling the
dependencies and updates, but whatever we do, we now have several entries
that have been amended.</p>
<p>If we base our monthly statement email just on the most recent version of the
calendar at the time we send it, the result will be confusing. Let's see an
example:</p>
<p>The email statement for Month 1 said:</p>
<ul class="simple">
<li><p>initial balance: $0</p></li>
<li><p>payment received: $100</p></li>
<li><p>service X: -$50</p></li>
<li><p>email plan: -$10</p></li>
<li><p>final balance: $40</p></li>
</ul>
<p>Then the phone call happened where we agreed to cancel the charge for service X,
refunding them $50, and to reduce the email plan charge to $9 due to our pay up front
discount which they now qualify for, giving a total refund of $51.</p>
<p>For Month 2, if we just create the statement based on month 2 in the most recent
version of the calendar, it will read:</p>
<ul class="simple">
<li><p>initial balance: $91</p></li>
<li><p>email plan: -$9</p></li>
<li><p>final balance: $82</p></li>
</ul>
<p>Here, the “final balance” is now correct, but the “initial balance” is
confusing, since it isn't the same as the “final balance” of the previous email,
and there is no record here at all of the refund, which in some sense “happened”
in month 2.</p>
<p>Have we made things harder for ourselves? Slightly, but we've opened up more
possibilities too. The trick is to answer the question of “when” the refund
happened using 2-D time, not 1-D time.</p>
<p>The refund was done as a set of amendments to existing entries, which means it
was a “refinement of knowledge” that happened in “knowledge time”. For this
reason, the refund doesn't appear as an event on the calendar itself. It
affected two events on the calendar, so the refund was actually done at two
different points in “event time”.</p>
<p>Let's see the two dimensions on a diagram:</p>
<img alt="ditaa diagram" class="ditaa" src="https://lukeplant.me.uk/blogmedia/ditaa/2dtime-diagonal.svg">
<p>When we send out the email for month 2, the key point to understand is that
since we last sent the statement out, for the end of month 1, we have moved in
<strong>two dimensions of time</strong>, not one:</p>
<ul class="simple">
<li><p>The “event time” on the calendar (vertical), has moved on by one month,
introducing a new charge to be included in the balance.</p></li>
<li><p>But knowledge time (horizontal) has also moved on by one month. We know things
that we didn't know the previous month, or know things more accurately (e.g.
that we shouldn't have charged for service X).</p></li>
</ul>
<p>In other words, we have made a <strong>diagonal jump</strong>. Once we have this
understanding, constructing a correct email to send out actually becomes very
straightforward. We have to compare the state of affairs at <code class="docutils literal">(end of month 2, end
of month 2)</code> with <code class="docutils literal">(end of month 1, end of month 1)</code>.</p>
<p>Once we do this:</p>
<ul class="simple">
<li><p>we can easily calculate the “initial balance” of $40 – it's just the balance
at <code class="docutils literal">(end of month 1, end of month 1)</code>.</p></li>
<li><p>we calculate the final balance at <code class="docutils literal">(end of month 2, end of month 2)</code>, giving $82
as expected.</p></li>
<li><p>we can give an itemised breakdown of the difference between the initial
balance and the final balance. It is composed of 2 sections:</p>
<ul>
<li><p>new entries (which will be for month 2).</p></li>
<li><p>any amendments for previous months, for which we can give complete details
of the changes (one charge completely refunded, one charge partly refunded
in this case).</p></li>
</ul>
</li>
</ul>
<p>Now, of course, for the simple and more common case of “no amendments”, we don't
display the amendments section at all, and everything appears just as simple as
if we were using a ledger approach.</p>
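<p>The comparison of two diagonal points described above can be sketched in plain Python. This is only an illustration under stated assumptions: the dates, event IDs and function names are invented, a cancelled charge is represented by an amount of <code class="docutils literal">None</code>, and the amounts are the ones from the Month 1 and Month 2 example.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    event_id: str
    recorded_at: str        # knowledge time (ISO date)
    charged_at: str         # event time (ISO date)
    amount: Optional[int]   # None marks a removed ("null") event
    description: str


# Dates are invented for illustration; amounts follow the example in the text.
VERSIONS = [
    Version("payment-1", "2021-01-05", "2021-01-05", 100, "payment received"),
    Version("service-x-1", "2021-01-10", "2021-01-10", -50, "service X"),
    Version("email-1", "2021-01-15", "2021-01-15", -10, "email plan"),
    # Amendments recorded during month 2, after the phone call:
    Version("service-x-1", "2021-02-03", "2021-01-10", None, "service X (cancelled)"),
    Version("email-1", "2021-02-03", "2021-01-15", -9, "email plan (discounted)"),
    Version("email-2", "2021-02-15", "2021-02-15", -9, "email plan (discounted)"),
]


def snapshot(event_until, known_until):
    """Latest known version of each event at the 2-D point (event_until, known_until)."""
    latest = {}
    for v in sorted(VERSIONS, key=lambda v: v.recorded_at):
        if v.recorded_at <= known_until:
            latest[v.event_id] = v
    return {k: v for k, v in latest.items() if v.charged_at <= event_until}


def balance(snap):
    """Sum the amounts in a snapshot, counting removed events as zero."""
    return sum(v.amount or 0 for v in snap.values())


def statement(prev, curr):
    """Compare two diagonal points, each given as one date used for both dimensions."""
    old, new = snapshot(prev, prev), snapshot(curr, curr)
    new_entries = [(v.description, v.amount) for k, v in new.items() if k not in old]
    amendments = [
        (old[k].description, (new[k].amount or 0) - (old[k].amount or 0))
        for k in old
        if k in new and new[k].amount != old[k].amount
    ]
    return {
        "initial": balance(old),
        "new_entries": new_entries,
        "amendments": amendments,
        "final": balance(new),
    }
```

<p>Here <code class="docutils literal">statement("2021-01-31", "2021-02-28")</code> starts from the $40 that the customer saw last month, lists the new email plan charge, and reports the two amendments (+$50 and +$1) separately, ending at $82. By contrast, <code class="docutils literal">balance(snapshot("2021-01-31", "2021-02-28"))</code> is the confusing $91 figure that comes from reading month 1 out of the most recent calendar.</p>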
<section id="understanding-diagonal-time">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-10" role="doc-backlink">Understanding diagonal time</a></h3>
<p>A more fine-grained diagonal view is possible. Suppose a customer comes onto
their account page every day, or even every hour, and makes a note of their
balance. As charges and amendments are made, they will see the balance go up and
down accordingly. They can make a plot of this over time, and the time axis here
will in fact be a diagonal slice through our two time dimensions – they are
sampling the 2-D calendar at points where “event time” = “record time”.</p>
<p>Alternatively, they could make a plot of the balance by looking once at the end
of the year, using the list of charge amounts and dates. This will show only the
amended data, and the plot they make will correspond to a vertical cut, going
down “event time”, with the “record time” coordinate set to “most recent
knowledge”. This plot will be different from the first if there have been any
amendments, but will always end at the same balance.</p>
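<p>To make the difference between the two plots concrete, here is a small sketch using the rows from the calendar table earlier in the post. The sample dates and the <code class="docutils literal">balance_at</code> helper are assumptions for illustration.</p>

```python
# (event_id, recorded_at, charged_at, amount), taken from the table earlier in the post
VERSIONS = [
    ("payment-1", "2021-01-09", "2021-01-09", 100),
    ("subscription-123-month-1", "2021-01-10", "2021-01-10", -10),
    ("subscription-123-month-1", "2021-01-25", "2021-01-10", -8),
    ("subscription-123-month-2", "2021-02-10", "2021-02-10", -8),
]


def balance_at(event_time, record_time):
    """Balance up to event_time, using only what was known at record_time."""
    latest = {}
    for event_id, recorded_at, charged_at, amount in sorted(VERSIONS, key=lambda v: v[1]):
        if recorded_at <= record_time:
            latest[event_id] = (charged_at, amount)
    return sum(amount for charged_at, amount in latest.values() if charged_at <= event_time)


# Hypothetical dates on which the customer looks at their balance.
SAMPLE_DATES = ["2021-01-09", "2021-01-15", "2021-01-31", "2021-02-15"]

# Diagonal slice: "event time" = "record time" at every sample.
diagonal = [balance_at(d, d) for d in SAMPLE_DATES]

# Vertical slice: every sample uses the most recent knowledge.
vertical = [balance_at(d, SAMPLE_DATES[-1]) for d in SAMPLE_DATES]
```

<p>The diagonal series is <code class="docutils literal">[100, 90, 92, 84]</code> and the vertical one is <code class="docutils literal">[100, 92, 92, 84]</code>: they differ on 2021-01-15, before the amendment was known, and agree at the end.</p>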
<p>The diagonal movement in 2-D time actually corresponds to the way we live our
lives – it could be called “experiential time”.</p>
<p>Our experience of life consists of two sets of changes:</p>
<ul class="simple">
<li><p>things that happen to us as time progresses,</p></li>
<li><p>changes in knowledge about things that have happened to us.</p></li>
</ul>
<p>So, suppose I ask you what kind of week you've had. Your response might include:</p>
<ul>
<li><p>things that have happened to you this last week.</p></li>
<li><p>things that actually happened to you before, but you only found out about this
week. Perhaps:</p>
<ul class="simple">
<li><p>news about a nice payout from an investment that matured several months ago,
but you only got notified of this week.</p></li>
<li><p>new information about a childhood experience that sheds a completely
different light on it, for good or bad, and causes you to reassess some
important relationships etc.</p></li>
</ul>
<p>These things could legitimately be thought of as things that have happened to
you “this week”, because of their effects on your experience this week. <strong>We
live life on the diagonal</strong>.</p>
</li>
</ul>
</section>
</section>
<section id="handling-the-future">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-11" role="doc-backlink">Handling the future</a></h2>
<p>We can use the same calendar to track expected or planned expenditure in the
future, and so we can use it to do forecasts, for example.</p>
<p>In the ledger approach this would be very tricky – since you can't amend
anything, you want to be careful not to put anything on that you are unsure of.
But with the calendar, amendments are no problem at all. We can start with very
incomplete and uncertain knowledge, and refine it over time, and that is no
problem. If you are summing up expenditure to produce a balance, you will
presumably want to exclude anything with a future date as “speculative”, but
having it in the same calendar database table is not an issue.</p>
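<p>A minimal sketch of that split between actual and speculative entries, assuming we already have the most recent version of the calendar as a list of <code class="docutils literal">(charged_at, amount)</code> pairs. The March entry and the function names are hypothetical.</p>

```python
from datetime import date

# Most recent version of the calendar; the March entry is a planned charge.
ENTRIES = [
    (date(2021, 1, 9), 100),
    (date(2021, 1, 10), -8),
    (date(2021, 2, 10), -8),
    (date(2021, 3, 10), -8),  # hypothetical future charge
]


def current_balance(entries, today):
    """Sum only entries that have already happened in event time."""
    return sum(amount for charged_at, amount in entries if charged_at <= today)


def forecast_balance(entries, until):
    """Same sum, but deliberately including planned entries up to a future date."""
    return sum(amount for charged_at, amount in entries if charged_at <= until)
```

<p>The two functions are the same sum with a different cut-off in event time, which is why forecasts need no separate storage.</p>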
</section>
<section id="other-tips-and-pointers">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-12" role="doc-backlink">Other tips and pointers</a></h2>
<ul>
<li><p>Once this system is in place, in many cases we can actually model bug
fixes as “refinements in knowledge”, which means that rolling out a change
which corrects many incorrectly calculated charges is not a headache – it's
just updating the calendar in the normal way.</p></li>
<li><p>Having an explicit understanding of the two dimensions can be helpful for
requirements analysis and eliminating the impossible.</p>
<p>For example, when implementing refund requests, or certain kinds of refund
requests, instead of amending existing entries, we could have new ones. This
will make our calendar work more like a ledger (while still having our
calendar super-powers available for some kinds of amendments). This may be a
legitimate decision. However, a single type of refund cannot be <strong>both</strong> an
amendment to existing events <strong>and</strong> an event in its own right. You have to
choose!</p>
</li>
<li><p>You may need to think about whether other tables in your system need the full
2-D treatment, or whether it will be localised to just the calendar table.</p></li>
</ul>
</section>
<section id="conclusion">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-13" role="doc-backlink">Conclusion</a></h2>
<p>Personally, I found this temporal pattern both mind-bending (in an enjoyable
way), and indispensably useful in some situations, once I got my head around it.
In the same way I hope you'll find 2-D time both fun and profitable!</p>
</section>
<section id="credits">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-14" role="doc-backlink">Credits</a></h2>
<p>This post brought to you by:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/stathissideris/ditaa/">ditaa</a> for SVG diagrams.</p></li>
<li><p><a class="reference external" href="https://plugins.getnikola.com/">Nikola's nice plugin system</a> that enabled
me to make use of ditaa without leaving my beloved Emacs!</p></li>
</ul>
</section>
<hr class="docutils">
<section id="footnotes">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/life-on-the-diagonal-adventures-in-2d-time/#toc-entry-15" role="doc-backlink">Footnotes</a></h2>
<p>To be absolutely clear – neither Martin Fowler nor I am saying that time is
really two dimensional, but that it can be helpful to view it like that. We
might go further and ask, “Can you have more than two time dimensions?”.</p>
<p>To answer that, let's consider the base case – a database with no time
dimensions. It might be a table of facts about the colours of different things:</p>
<table>
<thead>
<tr class="header">
<th>Thing</th>
<th>Colour</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Grass</td>
<td>Green</td>
</tr>
<tr class="even">
<td>Sky</td>
<td>Blue</td>
</tr>
</tbody>
</table>
<p>This is entirely non-temporal. I could then add a <code class="docutils literal">recorded_timestamp</code> column:</p>
<table>
<thead>
<tr class="header">
<th>Thing</th>
<th>Colour</th>
<th>Timestamp</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Grass</td>
<td>Green</td>
<td>2021-07-19</td>
</tr>
<tr class="even">
<td>Sky</td>
<td>Blue</td>
<td>2021-07-20</td>
</tr>
</tbody>
</table>
<p>This reads as, “On Monday, I learnt that grass is green. On Tuesday, I learnt
that the sky is blue”. This schema also allows us to have contradictory facts if
they are interpreted as storing what I <strong>believed</strong> to be true on different
dates. This is a <strong>temporal</strong> recording of <strong>non-temporal data</strong>.</p>
<p>I could then go 2-D by having a <strong>temporal recording</strong> of <strong>temporal data</strong> – for
example, if I recorded the weather on certain days:</p>
<table>
<thead>
<tr class="header">
<th>Day</th>
<th>Weather</th>
<th>Timestamp</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2021-07-19</td>
<td>Rainy</td>
<td>2021-07-19</td>
</tr>
<tr class="even">
<td>2021-07-20</td>
<td>Sunny</td>
<td>2021-07-20</td>
</tr>
<tr class="odd">
<td>2021-07-19</td>
<td>Mostly rainy</td>
<td>2021-07-20</td>
</tr>
</tbody>
</table>
<p>Here, the information itself relates to certain times, and the timestamp at
which I learn about it is a different dimension to the timestamp in the data
itself. In the example above, on Monday I recorded the weather that day was
rainy, but on Tuesday I found out that wasn't entirely true, and recorded that
the weather on Monday was mostly rainy.</p>
<p>Can we go further? Well, suppose we have another system that takes snapshots of
the table above, recording each against a timestamp. This would be a <strong>temporal
recording</strong> of <strong>2-D temporal data</strong>, so it would be 3-D temporal data. And you can
keep going of course. However, it might be difficult to find a use for that kind
of thing...</p>
</section>YAGNI exceptionshttps://lukeplant.me.uk/blog/posts/yagni-exceptions/2021-06-29T09:42:00+01:002021-06-29T09:42:00+01:00Luke Plant<p>A list of some exceptions to the principle of “You Ain't Gonna Need It” – that is, times when doing a bit more up front usually pays off.</p><p>I'm essentially a believer in <a class="reference external" href="https://martinfowler.com/bliki/Yagni.html">You Aren't Gonna Need It</a> – the principle that you should
add features to your software – including generality and abstraction – when it
becomes clear that you need them, and not before.</p>
<p>However, there are some things which really are easier to do earlier than later,
and where natural tendencies or a ruthless application of YAGNI might neglect
them. This is my collection so far:</p>
<ul>
<li><p>Applications of <a class="reference external" href="http://wiki.c2.com/?ZeroOneInfinityRule">Zero One Many</a>. If
the requirements go from saying “we need to be able to store an address for
each user”, to “we need to be able to store two addresses for each user”, 9
times out of 10 you should go straight to “we can store many addresses for
each user”, with a soft limit of two for the user interface only, because
there is a very high chance you will need more than two. You will almost
certainly win significantly by making that assumption, and even if you lose it
won't be by much.</p></li>
<li><p>Versioning. This can apply to protocols, APIs, file formats etc. It is good to
think about how, for example, a client/server system will detect and respond
to different versions ahead of time (i.e. even when there is only one
version), especially when you don't control both ends or can't change them
together, because it is too late to think about this when you find you need a
version 2 after all. This is really an application of <a class="reference external" href="http://wiki.c2.com/?EmbraceChange">Embrace Change</a>, which is a principle at the heart of
YAGNI.</p></li>
<li><p>Logging. Especially for after-the-fact debugging, and in non-deterministic or
hard to reproduce situations, where it is often too late to add it after you
become aware of a problem.</p></li>
<li><p>Timestamps.</p>
<p>For example, creation timestamps, as <a class="reference external" href="https://twitter.com/simonw/status/1384580075329179650">Simon Willison tweeted</a>:</p>
<blockquote>
<p>A lesson I re-learn on every project: always have an automatically
populated "created_at" column on every single database table. Any time you
think "I won't need it here" you're guaranteed to want to use it for
debugging something a few weeks later.</p>
</blockquote>
<p>More generally, instead of a boolean flag, e.g. <code class="docutils literal">completed</code>, a nullable
timestamp of when the state was entered, <code class="docutils literal">completed_at</code>, can be much
more useful.</p>
</li>
<li><p>Generalising from the “logging” and “timestamps” points, collecting a bit more
data than you need right now is usually not a problem (unless it is personal
or otherwise sensitive data), because you can always throw it away. But if you
never collected it, it's gone forever. I have won significantly when I've
anticipated the need for auditing which wasn't completely explicit in the
requirements, and I've lost significantly when I've gone for data minimalism
which lost key information and limited what I could do with the data later.</p></li>
<li><p>A relational database.</p>
<p>By this I mean, if you need a database at all, you should jump to having a
relational one straight away, and default to a relational schema, even if your
earliest set of requirements could be served by a “document database” or some
basic flat-file system. Most data is relational by nature, and a
non-relational database is a very bad default for almost all applications.</p>
<p>If you choose a relational database like PostgreSQL, and it later turns out a
lot of your data is “document like”, you can use its <a class="reference external" href="https://www.postgresql.org/docs/current/datatype-json.html">excellent support for
JSON</a>.</p>
<p>However, if you choose a non-relational database like MongoDB, even when it seems like
you've got a perfect fit in terms of current schema needs, most likely a new,
“simple” requirement will cause you a lot of pain, and <a class="reference external" href="http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/">prompt a rewrite in
Postgres</a>
(see sections “How MongoDB Stores Data” and “Epilogue” in that article).</p>
<p>I thought <a class="reference external" href="https://lobste.rs/s/63eb9g/when_rewrite#c_7gwj71">a comment on Lobsters</a> I read the other day was
insightful here:</p>
<blockquote>
<p>I wonder if the reason that “don’t plan, don’t abstract, don’t engineer
for the future” is such good advice is that most people are already
building on top of highly-abstracted and featureful platforms, which don’t
need to be abstracted further?</p>
</blockquote>
<p>We can afford to do YAGNI only when the systems we are working with are
malleable and featureful. Relational databases are extremely flexible systems
that provide insurance against future requirements changes. For example, my
advice in the previous section implicitly depends on the fact that removing
data you don't need can be as simple as "DROP COLUMN", which is <a class="reference external" href="https://stackoverflow.com/questions/15699989/dropping-column-in-postgres-on-a-large-dataset">almost free</a>
(well, sometimes…).</p>
</li>
</ul>
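<p>As a small illustration of the timestamp point in the list above, here is a sketch of a record that stores a nullable <code class="docutils literal">completed_at</code> and derives the boolean from it. The <code class="docutils literal">Task</code> class and its method names are hypothetical.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Task:
    title: str
    # None means "not completed"; a value records when the state was entered.
    completed_at: Optional[datetime] = None

    @property
    def completed(self) -> bool:
        """The boolean flag is derived, so it can never disagree with the timestamp."""
        return self.completed_at is not None

    def mark_completed(self, when: Optional[datetime] = None) -> None:
        """Record the completion time once; later calls leave it untouched."""
        if self.completed_at is None:
            self.completed_at = when or datetime.now(timezone.utc)
```

<p>Code that only cares about the flag keeps using <code class="docutils literal">task.completed</code>, while debugging and reporting get the “when” for free.</p>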
<p>That's my list so far; I'll probably add to it over time. Do you agree? What did
I miss?</p>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/quywfp/yagni_exceptions_2021">Discussion of this post on Lobsters</a></p></li>
<li><p><a class="reference external" href="https://twitter.com/spookylukey/status/1409967250426281984">Discussion of this post on Twitter</a>.</p></li>
<li><p><a class="reference external" href="https://simonwillison.net/2021/Jul/1/pagnis/">Simon Willison's response post on PAGNIs</a> with <a class="reference external" href="https://twitter.com/simonw/status/1410678459756552198">Twitter discussion</a> and <a class="reference external" href="https://lobste.rs/s/nokjr0/pagnis_probably_are_gonna_need_its">Lobsters discussion</a>.</p></li>
<li><p><a class="reference external" href="https://jacobian.org/2021/jul/8/appsec-pagnis/">Jacob Kaplan-Moss's response post on security PAGNIs</a>, and <a class="reference external" href="https://twitter.com/jacobian/status/1413157068375302146">Twitter discussion</a>.</p></li>
</ul>
</section>Everything is an Xhttps://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/2020-11-11T16:00:00Z2020-11-11T16:00:00ZLuke Plant<p>Analysis and examples of the popular high-level pattern of making all the things in your system conform to a common interface.</p><p>Translations of this post (I can't vouch for their accuracy):</p>
<ul class="simple">
<li><p><a class="reference external" href="https://www.chenqing.work/?p=3430">Chinese</a></p></li>
</ul>
<hr class="docutils">
<p>“Everything is an X” is a very high level pattern that you see applied in the
design of lots of systems, including programming languages and user interfaces.
It has a lot of advantages, and some disadvantages. I'll discuss some of these,
then look at some examples, which will be necessary to understand what I'm
really talking about.</p>
<p>When we say “everything” here, we’re talking very loosely – many things in the
system are not, in fact, instances of “X”, but are something lower level, or
just completely different. “Everything is an X” really means “a surprising
number of things in this system are an X”.</p>
<section id="advantages">
<h2>Advantages</h2>
<ul>
<li><p>Simplicity of implementation for the implementer. They don't have to design
and implement a new interface for every new thing in the system. Instead they
implement a single interface that they can re-use everywhere and refine as
necessary.</p></li>
<li><p>Simplicity for the user. They only have to learn one thing, and knowledge
gained in one area is immediately applicable to many others.</p></li>
<li><p>Meta-powers. Not only can the user apply knowledge or skills from one instance
in another area, very often the “X” interface can be applied at a higher
level, and this gives the user super-powers – they can go from manipulating X
to manipulating meta-X with very little extra knowledge needed.</p>
<p>When there are multiple meta-levels that don't seem to stop, another name for
this pattern is “X all the way down”.</p>
</li>
<li><p>Elegance and beauty. Beyond utility, systems designed like this have an
aesthetic value that I think is motivating for both the implementer and the
user.</p></li>
</ul>
</section>
<section id="disadvantages">
<h2>Disadvantages</h2>
<ul class="simple">
<li><p>For some of the things in the system, making them behave as “X” can be a very
bad and awkward fit. This could have the effect of:</p>
<ul>
<li><p>producing an under-powered interface for some things, which has to be
patched up in some way.</p></li>
<li><p>giving an over-powered, bloated interface for other things.</p></li>
<li><p>making “X” too broad to be useful.</p></li>
</ul>
</li>
<li><p>Giving the user a higher-level, meta interface could bring challenges, and
limit how you restructure or optimise internals because you've given the user
too much.</p></li>
<li><p>There are also issues when you have to break the “everything is an X” pattern
for some pragmatic reason – you end up with an ugly corner of the design.</p></li>
</ul>
</section>
<section id="examples">
<h2>Examples</h2>
<section id="programming">
<h3>Programming</h3>
<ul>
<li><p>Java: everything is a class (sort of).</p>
<p>This gives a certain uniformity in how APIs work that tends to reduce the
amount of work you have to do when learning a new library.</p>
<p>In terms of meta-levels, Java doesn't have so much. It has some reflection
abilities that allow classes to be represented as runtime objects, although it
is not very deep in terms of what it offers (contrast Python below).</p>
<p>The limited power for users is undoubtedly useful for compiler writers,
however – you can get excellent performance with Java due to the more limited
nature of the language (again, contrast Python below).</p>
<p>Java breaks its own pattern of basing everything on classes with “primitive
types”, which means you also have to deal with boxing, auto-boxing etc.
Javascript has similar issues with <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number">Number</a>.</p>
</li>
<li><p>Python: everything is an object.</p>
<p>Note that this is very different from Java's version of “everything is a
class/object” – far more things are actually runtime objects in Python,
including functions, methods, types/classes and modules, as well as instances
of classes. This makes <a class="reference external" href="https://www.toptal.com/python/python-parameterized-design-patterns">parameterisation</a> an
extremely powerful and general pattern in Python.</p>
<p>You also get super powers that come from how meta-classes work:</p>
<ul class="simple">
<li><p>instances are objects</p></li>
<li><p>classes (the things that produce instances) are objects</p></li>
<li><p>meta-classes (the things that produce classes) are objects</p></li>
</ul>
<p>So meta-class programming is just normal programming. It can be done at a
REPL, or with any other technique that you have already learned. Using
meta-classes of course requires learning new things, but things still work in
very familiar ways. Python feels like it's made out of Python all the way
down.</p>
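<p>As a rough illustration of the points above (the names <code class="docutils literal">describe</code>, <code class="docutils literal">Greeter</code> and
<code class="docutils literal">Shouter</code> are made up for this sketch), the same ordinary function can inspect
an instance, a class, a function and a module, and a class can be built at
runtime by calling its meta-class <code class="docutils literal">type</code>:</p>

```python
import math


def describe(thing):
    """Return the name of the type of any object, whatever kind it is."""
    return type(thing).__name__


class Greeter:
    def greet(self):
        return "hello"


# The same function works on an instance, a class, a function and a module,
# because all of them are ordinary runtime objects.
kinds = [describe(Greeter()), describe(Greeter), describe(describe), describe(math)]
print(kinds)

# A class can be created by calling its meta-class, `type`, in the same way
# that an instance is created by calling its class.
Shouter = type("Shouter", (Greeter,), {"greet": lambda self: "HELLO"})
print(Shouter().greet())

# `type` is itself an object, and its type is `type`.
print(type(type) is type)
```

<p>Nothing here needs special syntax or a separate tool, which is what makes
it feel like normal programming.</p>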
<p>We also see the disadvantages. For example, Python integers are full Python
objects, which is extremely wasteful of memory compared to compact
representations (like C would use, or numpy uses for example). Plus, they are
immutable (for good reasons), so doing simple addition can cause memory
allocations. This is pretty horrifying from an efficiency point of view!</p>
<p>[Correction - it depends on implementation; CPython caches integers between -5
and 256 and has some <a class="reference external" href="https://www.laurentluce.com/posts/python-integer-objects-implementation/">other optimisations</a>,
but still has a lot of overhead in terms of the size of the object]</p>
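<p>A quick way to see the size overhead for yourself, assuming CPython (the
exact numbers vary between versions and platforms, so none are quoted here),
is to compare one integer object with one element of a packed array from the
standard library:</p>

```python
import sys
from array import array

# Size in bytes of a single Python int object (implementation-dependent).
boxed_size = sys.getsizeof(1)

# A packed array of signed 64-bit integers stores only the raw values.
packed = array("q", [1, 2, 3])
packed_size = packed.itemsize  # bytes used per element

print(f"int object: {boxed_size} bytes, packed element: {packed_size} bytes")
```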
</li>
<li><p>ML family languages, like Haskell/OCaml and relatives: everything is an
expression.</p>
<p>So, for example, there is no <code class="docutils literal">if</code> statement, there is instead an <code class="docutils literal">if</code>
expression. Assignments, too, are actually expressions – for example a <code class="docutils literal">let</code>
statement is instead an expression that defines some local values and returns
the value of the ‘body’.</p>
<p>Exceptions: Top level assignments, and type signatures. At least in Haskell,
the type system feels quite separate. Due to this, AFAICS the meta level is
kind of missing in Haskell. Haskell “super-powers” come from a bunch of
different features, such as <code class="docutils literal">deriving</code> and TemplateHaskell, and programming
at type level, especially with various language extensions. These are all cool
features, but they have to be learned separately, and I think this is what
gives Haskell the feeling of being a big and intimidating language.</p>
</li>
<li><p>Lisp: everything is an s-expression.</p>
<p>In contrast to Haskell, this goes much further and includes top level
statements. This uniformity of syntax is hugely helpful in writing macros,
probably the biggest meta-power that Lisp boasts.</p>
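<p>This is not Lisp, but the idea can be sketched in Python by pretending that
an s-expression is a nested list whose first element names the operation. The
<code class="docutils literal">unless</code> form and the <code class="docutils literal">expand_unless</code> function below are invented for
illustration; the point is only that a “macro” becomes a plain function from
lists to lists, and works the same at every nesting level, including the top:</p>

```python
def expand_unless(expr):
    """Rewrite ["unless", cond, body] into ["if", ["not", cond], body]."""
    if not isinstance(expr, list):
        return expr  # atoms (names, numbers) are left alone
    # Expand nested forms first, then look at this form itself.
    expr = [expand_unless(part) for part in expr]
    if expr and expr[0] == "unless":
        _, condition, body = expr
        return ["if", ["not", condition], body]
    return expr


program = ["define", "f", ["unless", ["ready?"], ["unless", "done", ["wait"]]]]
expanded = expand_unless(program)
print(expanded)
```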
</li>
</ul>
</section>
<section id="user-interfaces">
<h3>User interfaces</h3>
<ul>
<li><p>Emacs: everything is a buffer.</p>
<p>This is one of Emacs' big ideas – it has very few different UI elements,
because almost everything appears in a buffer. As well as the text of a file
to edit, you also have:</p>
<ul class="simple">
<li><p>help manuals</p></li>
<li><p>auto-generated help for the buffer you are using (such as listing all
keyboard shortcuts).</p></li>
<li><p>search results (this is an essential part of my workflow for recursive
searching that remembers context)</p></li>
<li><p>many more things.</p></li>
</ul>
<p>Once you have learned how to navigate and use buffers, you know how to
navigate these things, <a class="reference external" href="https://www.masteringemacs.org/article/why-emacs-has-buffers">and much more</a>.</p>
<p>Of course, there are other UI elements, such as the toolbar and menubar, to
make Emacs slightly less terrifying to new users. But these tend to be turned
off by experienced users – they're not very useful because…they're not
buffers!</p>
<p>You also gain meta-powers: the list of all buffers is of course a buffer, and
you can manipulate the buffer-buffer using normal mechanisms (e.g. delete line
to kill a buffer).</p>
</li>
<li><p>RDBMSs: everything is a table/relation, and can be queried as such.</p>
<p>With some implementations such as PostgreSQL, as well as the normal relations
you have created, internal structures are also presented as relations,
including the lists of tables/relations/indexes, and runtime information.</p>
<p>So you can get a table of all tables:</p>
<div class="code"><pre class="code sql"><a id="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-1" name="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-1" href="https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/#rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="k">table_name</span><span class="w"></span>
<a id="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-2" name="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-2" href="https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/#rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-2"></a><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">tables</span><span class="w"></span>
<a id="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-3" name="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-3" href="https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/#rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-3"></a><span class="k">WHERE</span><span class="w"> </span><span class="n">table_schema</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'public'</span><span class="w"></span>
<a id="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-4" name="rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-4" href="https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/#rest_code_a3ad6230fc49408fbdb4a22b5151b9eb-4"></a><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">table_name</span><span class="p">;</span><span class="w"></span>
</pre></div>
<p>And query the currently running queries:</p>
<div class="code"><pre class="code sql"><a id="rest_code_3f03ff2ad2c84f65a50cd79d5aced85d-1" name="rest_code_3f03ff2ad2c84f65a50cd79d5aced85d-1" href="https://lukeplant.me.uk/blog/posts/everything-is-an-x-pattern/#rest_code_3f03ff2ad2c84f65a50cd79d5aced85d-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_activity</span><span class="p">;</span><span class="w"></span>
</pre></div>
<p>And <a class="reference external" href="https://pgstats.dev/">many other things</a>. These things benefit from all
the usual powers of <code class="docutils literal">SELECT</code>, such as joins, filtering, ordering etc.</p>
<p>If you extended the meta-powers, you could create/delete tables by inserting
into/deleting from the “tables table”, and similarly add/remove/alter columns
by manipulating the “columns table”. Your schema then becomes data, and your
power has become a super-power. I don't know if any DB has implemented this,
or if the advantages would be really compelling over normal DDL.</p>
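<p>The read-only version of this is easy to try without a PostgreSQL server.
As a small sketch using SQLite through Python's standard <code class="docutils literal">sqlite3</code> module
(the <code class="docutils literal">authors</code> and <code class="docutils literal">books</code> tables are made up for the example), the
built-in <code class="docutils literal">sqlite_master</code> table plays the role of the “tables table”, and
<code class="docutils literal">pragma_table_info</code> exposes columns in a queryable form:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

# The list of tables is itself queried with an ordinary SELECT.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)

# Column metadata can be queried in the same way.
columns = [
    row[0] for row in conn.execute("SELECT name FROM pragma_table_info('books')")
]
print(columns)
```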
</li>
<li><p><a class="reference external" href="https://www.visidata.org/">Visidata</a>: everything is a sheet.</p>
<p>At points this seems to go further than PostgreSQL in terms of giving you
meta-powers and sticking to the pattern. In addition to the normal data
sheets:</p>
<ul class="simple">
<li><p>rows are sheets (or can be viewed as such)</p></li>
<li><p>column meta-data is a sheet (and not just read-only - you can <a class="reference external" href="https://jsvine.github.io/intro-to-visidata/basics/understanding-columns/#manipulating-columns-from-the-columns-sheet">manipulate it</a>)</p></li>
<li><p>the <a class="reference external" href="https://jsvine.github.io/intro-to-visidata/basics/understanding-sheets/#how-to-use-the-sheets-sheet">list of sheets</a> is a sheet</p></li>
<li><p><a class="reference external" href="https://jsvine.github.io/intro-to-visidata/advanced/configuring-visidata/#the-global-options-sheet">configuration</a>
is a sheet</p></li>
</ul>
<p>…and more.</p>
<p>I'm loving the power of Visidata, which is basically “Excel as a text user
interface”, and I've realised that a lot of its power comes from its <a class="reference external" href="https://lukeplant.me.uk/blog/posts/less-powerful-languages/">lack of
power</a>.</p>
<p>In Excel, you can have multiple tables in a sheet, and put them wherever you
like. You can have headers, or not, you can have a different format or data
type for every cell etc. In Visidata, you have exactly one table per sheet,
you must have headers, each column has exactly one data type etc. The enforced
uniformity of “Everything is a sheet (and a sheet has a tightly defined
structure)” is actually crucial to making Visidata so much nicer than Excel
for many tasks – without it, most of the interface shortcuts and meta-powers
would fail.</p>
</li>
<li><p><a class="reference external" href="https://en.wikipedia.org/wiki/Representational_state_transfer">REST</a>:
everything is a resource.</p>
<p>(More technically, I’m talking about <a class="reference external" href="https://en.wikipedia.org/wiki/Richardson_Maturity_Model#Level_2_:_HTTP_verbs">Pseudo-REST level 1 and 2</a>,
as opposed to <a class="reference external" href="https://htmx.org/essays/how-did-rest-come-to-mean-the-opposite-of-rest/">actual REST which is almost, but not quite, entirely different</a>).</p>
<p>The idea of everything in your system being a “resource” which you can
manipulate with a few well-understood verbs (GET, PUT, DELETE etc.) is a
really attractive part of REST.</p>
<p>It comes with some meta-powers - with things like <a class="reference external" href="https://www.openapis.org/">OpenAPI</a>, the list of resources is typically also served
as a resource.</p>
<p>Relatively frequently with JSON APIs designed like this, I feel I hit the
“awkward fit” disadvantage. Modelling some actions as “manipulating a
resource” is very bizarre, and what you really want is unembarrassed RPC.
Examples of this are “clone/copy/move” actions where there are hidden
properties that you also want to be handled.</p>
</li>
<li><p>Unix: <a class="reference external" href="https://en.wikipedia.org/wiki/Everything_is_a_file">everything is a file</a>.</p>
<p>This is a pretty successful abstraction, but it does leak quite a bit – for
example, stdout and stdin are “files”, except when they are terminals, which
you very often want to special case because they are not really very much like
files when they have humans sitting in front of them. Also, files turn out to
be <a class="reference external" href="https://danluu.com/deconstruct-files/">much harder</a> than you think, even
before you start treating things that aren't files as files. And in some cases
you might want to deal with files as if they are not files but memory anyway
(mmap).</p>
<p>A lot of people would argue that it’s been taken too far when applied to other
things, like USB interfaces etc., producing some <a class="reference external" href="https://www.youtube.com/watch?v=9-IWMbJXoLM">pretty nasty interfaces</a>.</p>
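<p>The terminal special case shows up even in tiny scripts. In this sketch
(<code class="docutils literal">write_report</code> is a hypothetical helper), the same function writes to any
file-like object, but has to ask whether the “file” is really a terminal
before deciding to add ANSI formatting:</p>

```python
import io
import sys


def write_report(stream, lines):
    """Write lines to any file-like object; add bold only for terminals."""
    use_bold = stream.isatty()
    for line in lines:
        if use_bold:
            line = f"\033[1m{line}\033[0m"  # ANSI escape codes for bold text
        stream.write(line + "\n")


# An in-memory buffer behaves like a file and is never a terminal.
buffer = io.StringIO()
write_report(buffer, ["ok", "done"])

# stdout is also a "file", but may or may not be a terminal at runtime.
write_report(sys.stdout, ["ok", "done"])
```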
</li>
</ul>
</section>
</section>
<section id="questions">
<h2>Questions</h2>
<p>I'm sure there are many, many more examples of this pattern, and I'm only
scratching the surface. <a class="reference external" href="https://wiki.c2.com/?EverythingIsa">EverythingIsa on c2.com</a> has a longer list, for example, but
without much analysis. What are your favourite examples? Are there examples
where it works very well – or very badly? Are there other
advantages/disadvantages that I've missed?</p>
</section>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://www.reddit.com/r/programming/comments/jsnffk/everything_is_an_x_lukeplantmeuk/">Discussion of this post on Reddit</a></p></li>
<li><p><a class="reference external" href="https://twitter.com/search?q=https%3A%2F%2Flukeplant.me.uk%2Fblog%2Fposts%2Feverything-is-an-x-pattern%2F&src=typed_query">Discussion of this post on Twitter</a></p></li>
<li><p><a class="reference external" href="https://lobste.rs/s/g7c661">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>
<section id="notes">
<h2>Notes</h2>
<ul class="simple">
<li><p>2023-07: For “everything is a file”, added paragraph and link to “What UNIX Cost Us” video.</p></li>
</ul>
</section>Test smarter, not harderhttps://lukeplant.me.uk/blog/posts/test-smarter-not-harder/2020-09-04T19:46:50+01:002020-09-04T19:46:50+01:00Luke Plant<p>Tips for winning the automated testing battle.</p><p>“Smarter, not harder” is a saying used in many contexts, but rowing is the
context I think I first heard it in, and I still associate it with rowing many
years later.</p>
<p>When you look at novice and more experienced rowing crews, it seems particularly
appropriate, because the primary difference is not the amount of effort that
goes in, nor even the strength of the rowers, but technique. Poor rowers still
finish a race absolutely exhausted, but they've moved at a fraction of the speed
of better crews. Sometimes the effort they put in actually slows the boat down.
They tend to make a lot of noise, splash a huge amount of water in every
direction, and pull a lot of faces. (I did a lot of all those things when I tried
rowing!).</p>
<p><a class="reference external" href="https://youtu.be/6V6va2RIdeE?t=4327">Expert crews, however, do none of these things</a>, because they don't make you go faster.
These rowers do a huge amount of training, and exercise massive amounts of
concentration, to ensure that every bit of the (very large) effort they put in
is actually contributing to speed.</p>
<p>The “smarter not harder” mindset is also essential for writing good automated
software tests.</p>
<p>It's in this context that religious devotion to things like <a class="reference external" href="https://en.wikipedia.org/wiki/Test-driven_development">TDD</a> can be really
unhelpful. For many religions, the more painful an activity, and the more you do
it, the more meritorious it is – and it may even atone for past misdeeds. If you
take that mindset with you into writing tests, you will do a rather bad job.</p>
<p>If writing tests is extremely painful, it may be a sign that something is wrong.
Huge and unnecessary quantities of tests are not meritorious, they are a massive
maintenance burden. Many of the things that make tests hard to write are also
going to make them hard (and therefore expensive) to maintain. I've seen far too
many examples where it looks like people have just sat back and accepted their
painful fate.</p>
<p>For example, good ol' Uncle Bob seems to have this attitude. He <a class="reference external" href="https://blog.cleancoder.com/uncle-bob/2017/01/11/TheDarkPath.html">wrote</a>:</p>
<blockquote>
<p>you’d better get used to writing lots and lots of tests, no matter what
language you are using!</p>
</blockquote>
<p><a class="reference external" href="https://www.hillelwayne.com/post/uncle-bob/">Don't listen to Uncle Bob!</a> (at
least, not on this subject).</p>
<p>“Test smarter, not harder” means:</p>
<ul>
<li><p>Only write necessary tests – specifically, tests whose estimated value is
greater than their estimated cost. This is a hard judgement call, of course,
but it does mean that at least some of the time you should be saying “it's not
worth it”. Some of the costs associated with tests are:</p>
<ul class="simple">
<li><p>the time taken to write them.</p></li>
<li><p>the time they add to the test suite on every run.</p></li>
<li><p>the time to maintain them - understand them, debug them, change them when
other things change.</p></li>
<li><p>every time they fail incorrectly - when the functionality works, but the
test fails.</p></li>
</ul>
<p>The value on the other hand, is found in:</p>
<ul class="simple">
<li><p>catching regressions, and doing so at low cost with a quick feedback loop.</p></li>
<li><p>enabling fearless refactoring (which is a consequence of the above, but
distinct from it).</p></li>
<li><p>providing a starting point for making changes, including a form of
documentation for the existing desirable behaviour.</p></li>
</ul>
</li>
<li><p>Write your test code with the functions/methods/classes you wish existed, not
the ones you've been given. For example, don't write this:</p>
<div class="code"><pre class="code python"><a id="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-1" name="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-1" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-1"></a><span class="bp">self</span><span class="o">.</span><span class="n">driver</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">live_server_url</span> <span class="o">+</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"contact_form"</span><span class="p">))</span>
<a id="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-2" name="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-2" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-2"></a><span class="bp">self</span><span class="o">.</span><span class="n">driver</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s2">"#id_email"</span><span class="p">)</span><span class="o">.</span><span class="n">send_keys</span><span class="p">(</span><span class="s2">"my@email.com"</span><span class="p">)</span>
<a id="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-3" name="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-3" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-3"></a><span class="bp">self</span><span class="o">.</span><span class="n">driver</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s2">"#id_message"</span><span class="p">)</span><span class="o">.</span><span class="n">send_keys</span><span class="p">(</span><span class="s2">"Hello"</span><span class="p">)</span>
<a id="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-4" name="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-4" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-4"></a><span class="bp">self</span><span class="o">.</span><span class="n">driver</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s2">"input[type=submit]"</span><span class="p">)</span><span class="o">.</span><span class="n">click</span><span class="p">()</span>
<a id="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-5" name="rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-5" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_6b39ba138bfe4e8aa6cc1891a24d3566-5"></a><span class="n">WebDriverWait</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">driver</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">until</span><span class="p">(</span><span class="k">lambda</span> <span class="n">driver</span><span class="p">:</span> <span class="n">driver</span><span class="o">.</span><span class="n">find_element_by_css_selector</span><span class="p">(</span><span class="s2">"body"</span><span class="p">))</span>
</pre></div>
<p>That looks very tedious! Write this instead:</p>
<div class="code"><pre class="code python"><a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-1" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-1" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-1"></a><span class="bp">self</span><span class="o">.</span><span class="n">get_url</span><span class="p">(</span><span class="s2">"contact_form"</span><span class="p">)</span>
<a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-2" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-2" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-2"></a><span class="bp">self</span><span class="o">.</span><span class="n">fill</span><span class="p">({</span>
<a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-3" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-3" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-3"></a> <span class="s2">"#id_email"</span><span class="p">:</span> <span class="s2">"my@email.com"</span><span class="p">,</span>
<a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-4" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-4" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-4"></a> <span class="s2">"#id_message"</span><span class="p">:</span> <span class="s2">"Hello"</span><span class="p">,</span>
<a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-5" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-5" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-5"></a><span class="p">})</span>
<a id="rest_code_5f3b6ce2322d4588a3a0480918eb6072-6" name="rest_code_5f3b6ce2322d4588a3a0480918eb6072-6" href="https://lukeplant.me.uk/blog/posts/test-smarter-not-harder/#rest_code_5f3b6ce2322d4588a3a0480918eb6072-6"></a><span class="bp">self</span><span class="o">.</span><span class="n">submit</span><span class="p">(</span><span class="s2">"input[type=submit]"</span><span class="p">)</span>
</pre></div>
<p>(Like you can with <a class="reference external" href="https://django-functest.readthedocs.io/en/latest/">django-functest</a>, but it's the principle,
not the library, that's important. If the API you want to use doesn't exist
yet, you still use it, and then make it exist.)</p>
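<p>To show what “make it exist” might look like, here is a minimal sketch that
is not taken from django-functest. <code class="docutils literal">RecordingDriver</code> is a made-up stand-in
for a real browser driver so that the example runs on its own, and
<code class="docutils literal">FormTestHelpers</code> is the hypothetical layer that provides the API the test
wanted:</p>

```python
class RecordingDriver:
    """Stand-in for a real browser driver; it only records what was asked."""

    def __init__(self):
        self.actions = []

    def type_into(self, selector, text):
        self.actions.append(("type", selector, text))

    def click(self, selector):
        self.actions.append(("click", selector))


class FormTestHelpers:
    """The API we wished existed, written once and reused by every test."""

    def __init__(self, driver):
        self.driver = driver

    def fill(self, fields):
        # One call fills any number of fields, keyed by CSS selector.
        for selector, text in fields.items():
            self.driver.type_into(selector, text)

    def submit(self, selector):
        self.driver.click(selector)


driver = RecordingDriver()
page = FormTestHelpers(driver)
page.fill({"#id_email": "my@email.com", "#id_message": "Hello"})
page.submit("input[type=submit]")
print(driver.actions)
```

<p>The tedious details live in one place, so when they change you fix one
helper rather than every test.</p>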
</li>
<li><p>Don't write tests for things that can be more effectively tested in other
ways, and lean on other correctness methodologies as much as possible. These
include:</p>
<ul class="simple">
<li><p>code review</p></li>
<li><p>static type checking (especially in languages with sound and powerful type
systems, with type inference everywhere, giving you a very good cost-benefit
ratio)</p></li>
<li><p>linters like <a class="reference external" href="https://github.com/pycqa/flake8">flake8</a> and <a class="reference external" href="https://semgrep.dev/">Semgrep</a>.</p></li>
<li><p><a class="reference external" href="https://www.hillelwayne.com/post/business-case-formal-methods/">formal methods</a></p></li>
<li><p>introspection (like <a class="reference external" href="https://docs.djangoproject.com/en/stable/topics/checks/#module-django.core.checks">Django's checks framework</a>)</p></li>
<li><p>property based testing like <a class="reference external" href="https://hypothesis.readthedocs.io/en/latest/">hypothesis</a>.</p></li>
</ul>
</li>
<li><p>Move the burden onto the computer. “Push the loop in”.</p>
<p>Take, for example, a requirement that every entry point to your web app (i.e.
a page or HTTP API), apart from a few exceptions like login and reset
password, should require authentication.</p>
<p>The “test harder” religion interprets this as:</p>
<ul class="simple">
<li><p><em>For every entry point</em></p>
<ul>
<li><p>Write a test that</p>
<ul>
<li><p>Ensures non-authenticated requests return 403</p></li>
</ul>
</li>
</ul>
</li>
</ul>
<p>That's a lot of tests, and even worse is that you have to remember to write
them.</p>
<p>“Test smarter” says:</p>
<ul class="simple">
<li><p>Write a test that</p>
<ul>
<li><p><em>For every entry point</em></p>
<ul>
<li><p>Ensures non-authenticated requests return 403</p></li>
</ul>
</li>
</ul>
</li>
</ul>
<p>That's one test. “Write a test” is executed in developer time, so in the first
example the loop ("For every entry point") is also executed in developer time.
Push the loop inside the test, and it gets executed in computer time instead.</p>
<p>Already mentioned, but <a class="reference external" href="https://hypothesis.readthedocs.io/en/latest/">hypothesis</a> is a great way to push the
loop in. Also, the implementation of the requirements can benefit from the
same techniques that the tests do.</p>
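<p>Here is a toy sketch of the “test smarter” shape. Everything in it is
hypothetical: in a real project <code class="docutils literal">ENTRY_POINTS</code> would come from introspecting
your URL configuration, and <code class="docutils literal">request</code> would be your framework's test client
rather than a three-line fake:</p>

```python
# Hypothetical stand-ins for a real web app and its test client.
ENTRY_POINTS = ["/", "/account/", "/api/orders/", "/login/", "/reset-password/"]
PUBLIC = {"/login/", "/reset-password/"}  # the agreed exceptions


def request(path, authenticated=False):
    """Toy app: return an HTTP status code for a request to `path`."""
    if path in PUBLIC or authenticated:
        return 200
    return 403


def test_all_entry_points_require_authentication():
    # The loop lives inside the test, so new entry points are covered
    # automatically without anyone remembering to write a new test.
    failures = [
        path
        for path in ENTRY_POINTS
        if path not in PUBLIC and request(path) != 403
    ]
    assert failures == [], f"Unprotected entry points: {failures}"


test_all_entry_points_require_authentication()
```

<p>Collecting all the failures before asserting means that one run tells you
about every unprotected entry point, not just the first.</p>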
</li>
<li><p>Cheat on your homework. It's smart to get help, and hard work is for suckers.
If you have a good idea, but don't know the techniques or tools you need to
implement it, or whether it is even possible (for example, in the example
above you don't know how to introspect your system to get a list of all entry
points), there are a lot of smart people on <a class="reference external" href="https://stackoverflow.com/">StackOverflow</a> who will revel in the challenge.</p>
<p>(Level up: loudly claim on Twitter that "it appears to be impossible to X with
tool Y" and know-it-alls like me will magically appear with solutions).</p>
</li>
</ul>
<p>Of course, there are still times when hard work is required for writing tests —
times when it will be tedious, and times when our instincts to skimp are
actually misplaced laziness that will cost more in the long run. But you should
hustle and cheat your way out of unnecessary effort as much as you possibly can.
Your overall testing strategy should feel like “I get that computer to do so
much work for me!”, not “My RSI and bleeding fingers have hopefully appeased the
testing gods and atoned for my previous omissions”.</p>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://www.reddit.com/r/programming/comments/imzawj/test_smarter_not_harder/">Discussion of this post on Reddit</a></p></li>
<li><p><a class="reference external" href="https://lobste.rs/s/hit4t9/test_smarter_not_harder">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>Announcement: Django Views - The Right Wayhttps://lukeplant.me.uk/blog/posts/announcement-django-views-the-right-way/2020-08-19T21:51:36+01:002020-08-19T21:51:36+01:00Luke Plant<p>Announcement of my guide to writing Django Views.</p><p>I announced this a few days back on Twitter; this is just a quick additional
blog post to announce <a class="reference external" href="https://spookylukey.github.io/django-views-the-right-way/">Django Views - The Right Way</a>. It's an
opinionated guide to writing views in Django that I've been working on for a few
months.</p>
<p>This project turned out to be much bigger than I expected. And in the end, more
about general programming and Python principles than just Django – so you may
enjoy it even if you're not into Django.</p>You can’t compare language features, only languageshttps://lukeplant.me.uk/blog/posts/you-cant-compare-language-features-only-languages/2014-11-11T09:51:43Z2014-11-11T09:51:43ZLuke Plant<p>Why I think we need the context of a language to have meaningful debate about language features.</p><p>A lot of programming language debate is of the form “feature X is really good,
every language needs it”, or “feature X is much better than its opposite feature
Y”. The classic example is static vs dynamic typing, but there are many others,
such as different types of meta-programming etc.</p>
<p>I often find myself pulled in both directions by these debates, as I’m rather
partial to both Haskell and Python. But I’d like to suggest that doing this kind
of comparison in the abstract, without talking about specific languages, is
misguided, for the following reasons:</p>
<section id="language-features-can-take-extremely-different-forms-in-different-languages">
<h2>Language features can take extremely different forms in different languages</h2>
<p>In my experience, static typing in Haskell is almost entirely unlike static
typing in C, and different again from C# 1.0, and, from what I can tell, very
different from static typing in C# 5.0. Does it really make sense to lump all
these together?</p>
<p>Similarly, the forms of dynamic typing in shell script, PHP, Python and Lisp are
perhaps more different than they are alike. You can’t even put them on a spectrum – for
example, Python is not simply a ‘tighter’ type system than PHP (in not treating
strings as numbers etc.), because it also has features that allow far greater
flexibility and power (such as dynamic subclassing due to first class classes).</p>
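<p>As a small sketch of what “dynamic subclassing due to first class classes”
can mean in practice (<code class="docutils literal">with_logging</code> is an invented helper), a function can
take a class as an argument and return a new subclass of it, created at
runtime:</p>

```python
def with_logging(base):
    """Return a new subclass of `base` that records every append call."""

    class Logged(base):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.log = []

        def append(self, item):
            self.log.append(f"append {item!r}")
            super().append(item)

    return Logged


# The class to extend is just a value passed to a function.
LoggedList = with_logging(list)
items = LoggedList([1, 2])
items.append(3)
print(items, items.log)
```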
</section>
<section id="combination-of-features-is-what-matters">
<h2>Combination of features is what matters</h2>
<p>One of my favourite features of Python, for example, is keyword arguments. They
often increase the clarity of calling code, and give functions the ability to
grow new features in a backwards compatible way. However, this feature only
makes sense in combination with other features. If you had keyword arguments
without the <code class="docutils literal">**kwargs</code> syntax for passing and receiving an unknown set of
keyword arguments, decorators would be extremely difficult to write.</p>
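<p>A minimal sketch of that interaction follows; the <code class="docutils literal">logged</code> decorator and
<code class="docutils literal">connect</code> function are made up for the example. The wrapper knows nothing
about the signature of the function it wraps, and only works because it can
receive and forward arbitrary keyword arguments:</p>

```python
import functools


def logged(func):
    """Decorator that records each call, whatever the function's signature."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Receive an unknown set of arguments and pass them straight through.
        calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@logged
def connect(host, port=5432, timeout=10):
    return f"{host}:{port} (timeout={timeout})"


result = connect("db.example.com", timeout=3)
print(result, connect.calls)
```

<p>If <code class="docutils literal">connect</code> later grows another keyword argument, the decorator does
not need to change.</p>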
<p>If you are thinking of how great Python is, I don’t think it helps to talk about
keyword arguments in general as a killer feature. It is keyword arguments <strong>in
Python</strong> that work particularly well.</p>
</section>
<section id="comparing-language-features-opens-up-lots-of-opportunities-for-bad-arguments">
<h2>Comparing language features opens up lots of opportunities for bad arguments</h2>
<p>For example:</p>
<section id="attacking-the-worst-implementation">
<h3>Attacking the worst implementation</h3>
<p>So, a dynamic typing advocate might say that static typing means lots of
repetitive and verbose boilerplate to indicate types. That criticism might apply
to Java, but it doesn’t apply to Haskell and many other modern languages, where
type inference handles 95% of the cases where you might otherwise need to specify types.</p>
</section>
<section id="defending-the-best-implementation">
<h3>Defending the best implementation</h3>
<p>The corollary to the above fallacy is that if you are only debating language
features in the abstract, you can pick whichever implementation you want in
order to refute a claim. Someone claims that dynamic typing makes IDE support
for refactoring very difficult, and a dynamic typing advocate retorts that this
isn’t the case with Smalltalk – ignoring the fact that they don’t use Smalltalk,
they have <strong>never</strong> used Smalltalk, and their dynamically-typed language of
choice does indeed present much greater or even insurmountable problems to
automated refactoring.</p>
</section>
<section id="defending-a-hypothetical-implementation">
<h3>Defending a hypothetical implementation</h3>
<p>Defending the best implementation goes further when you actually defend one that
doesn’t exist yet.</p>
<p>The mythical “smart enough compiler” is an example of this; another is when
dynamic typing advocates talk about “improving” dynamic analysis.</p>
<p>Hypothetical implementations are always great for winning arguments, especially
as they can combine all the best features of all the languages, without worrying
about whether those features will actually fit together, and produce something
that people would actually want to use. Sometimes a hybrid turns out like <a class="reference external" href="https://www.youtube.com/watch?v=191cPadaxCk">Hercules</a>, and sometimes like the
<a class="reference external" href="https://en.wikipedia.org/wiki/Africanized_bee">Africanized bee</a>.</p>
</section>
<section id="ignoring-everything-else">
<h3>Ignoring everything else</h3>
<p>In choosing a programming language, it’s not only the features of the language
that you have to consider – there is a long list of other factors, such as the
maturity of the language, the community, the libraries, the documentation, the
tooling, the availability (and quality) of programmers etc.</p>
<p>Sometimes the quality of these things is dominated by accidents of history
(which language became popular and when), and sometimes they can be traced back
to features of the language design.</p>
<p>Many language-war debates ignore all these things. But ignoring them is even
easier if you are not actually comparing real languages – just language features, abstracted
from everything else.</p>
<p>I understand that comparing everything at once is difficult, and we will always
attempt to break things down into smaller pieces for analysis. But I doubt that
this goes very far with programming languages, because of the way the different
features interact with each other, and also exert huge influence on the way that
everything else develops, e.g. libraries.</p>
</section>
</section>
<section id="conclusion">
<h2>Conclusion</h2>
<p>Language features exist within the context of a language and everything
surrounding that language. It seems to me that attempts to analyse them outside
that context simply lead to false generalisations.</p>
<p>Of course, being really concrete and talking about specific languages often ends
up even more personal, which has its own pitfalls! Is there a good way forward?</p>
</section>Things that can be self-hostinghttps://lukeplant.me.uk/blog/posts/things-that-can-be-self-hosting/2014-05-03T16:38:57+01:002014-05-03T16:38:57+01:00Luke Plant<p>An attempt to put something down about the idea of self-hosting software</p><p>With slightly different concepts of “self-hosting”:</p>
<ul class="simple">
<li><p>Compilers (if the compiler can compile itself e.g. <a class="reference external" href="http://gcc.gnu.org/install/prerequisites.html">gcc</a>, <a class="reference external" href="https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/Tools">GHC</a>, many
others)</p></li>
<li><p>Operating systems (either in terms of development of the OS, or virtual machines run on the OS)</p></li>
<li><p>Build tools (e.g. <a class="reference external" href="http://git.savannah.gnu.org/cgit/make.git/tree/">make</a>, autoconf etc.)</p></li>
<li><p>Version Control Systems (<a class="reference external" href="https://github.com/git/git/">git</a>, <a class="reference external" href="http://selenic.com/hg">mercurial</a> etc.)</p></li>
<li><p>Testing libraries and tools</p></li>
<li><p>Debuggers</p></li>
<li><p>Bug tracking tools (e.g. <a class="reference external" href="http://trac.edgewall.org/query?status=assigned&status=new&status=reopened&type=defect&component=ticket+system&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=milestone&order=priority">Trac</a>)</p></li>
<li><p>Editors and IDEs (if you use the editor/IDE to develop itself)</p></li>
<li><p>File distribution servers (e.g. web servers and BitTorrent clients/servers, which can be used to distribute themselves)</p></li>
<li><p>Grammar definition languages (e.g. BNF, which can be described in BNF)</p></li>
<li><p>Package management tools (apt, Cabal etc.)</p></li>
<li><p>Documentation tools (e.g. Sphinx)</p></li>
<li><p>Manufacture of robot parts</p></li>
<li><p>An item in a list that depends on self-reference to be properly understood, like this one :-)</p></li>
</ul>
<p>I'm struggling to define this concept exactly. A common thread with all of them
is that there will be some kind of boot-strapping problem before maturity is
reached. Also, most of these things can be classified more simply as “software
development tools” – due to the fact that most software development tools are
software, there is the possibility of this kind of “recursion”.</p>
<p>Are there other things in this category that I haven't thought of? Are there
lots of areas of life outside the software world where this is common?</p>