<p>Luke Plant's home page (Posts about Python): https://lukeplant.me.uk/blog/categories/python.xml (updated 2024-01-22T08:57:30Z, Luke Plant)</p>
<h1>Python packaging must be getting better - a datapoint</h1>
<p>Luke Plant, 2024-01-21T20:47:10Z. https://lukeplant.me.uk/blog/posts/python-packaging-must-be-getting-better-a-datapoint/</p>
<p>I “pip install”ed my app on Windows and everything just worked. Something is going right.</p>
<p>I’m developing some Python software for a client, which in its current early state is desktop software that will need to run on Windows.</p>
<p>So far, however, I have done all development on my normal comfortable Linux machine. I haven’t really used Windows in earnest for more than 15 years – to the point where my wife happily installs Linux on her own machine, knowing that I’ll be hopeless at helping her fix issues if the OS is Windows – and certainly not for development work in that time. So I was expecting a fair amount of pain.</p>
<p>There was certainly a lot of friction getting a development environment set up. <a class="reference external" href="https://realpython.com/python-coding-setup-windows/">RealPython.com have a great guide</a> which got me a long way, but even that had some holes and a lot of inconvenience, mostly due to the fact that, on the machine I needed to use, my main login and my admin login are separate. (I’m very lucky to be granted an admin login at all, so I’m not complaining). And there are lots of ways that Windows just seems to be broken, but that’s another blog post.</p>
<p>When it came to getting my app running, however, I was very pleasantly surprised.</p>
<p>At this stage in development, I just have a rough <code class="docutils literal">requirements.txt</code> that I add Python deps to manually. This might be a good thing, as it spares me the pain/confusion of <a class="reference external" href="https://chriswarrick.com/blog/2024/01/15/python-packaging-one-year-later/">some of the additional layers people have added</a>, which I’m mostly avoiding until things settle down a bit.</p>
<p>So after installing Python and creating a virtual environment on Windows, I ran <code class="docutils literal">pip install <span class="pre">-r</span> requirements.txt</code>, expecting a world of pain, especially as I already had complex non-Python dependencies, including <a class="reference external" href="https://doc.qt.io/qt-5/index.html">Qt5</a> and <a class="reference external" href="https://vtk.org/">VTK</a>. I had specified both of these as simple Python deps via the wrappers <a class="reference external" href="https://pypi.org/project/PyQt5/">pyqt5</a> and <a class="reference external" href="https://pypi.org/project/vtk/">vtk</a> in my <code class="docutils literal">requirements.txt</code>, and nothing else, with the attitude of “well I may as well dream this is going to work”.</p>
<p>And in fact, it did! Everything just downloaded as binary wheels – rather large ones, but that’s fine. I didn’t need compilers or QMake or header files or anything.</p>
<p>And when I ran my app, apart from a dependency that I’d forgotten to add to <code class="docutils literal">requirements.txt</code>, <strong>everything worked perfectly first time</strong>. This was even more surprising as I had put zero conscious effort into Windows compatibility. In retrospect I realise that use of <a class="reference external" href="https://docs.python.org/3/library/pathlib.html">pathlib</a>, which is automatic for me these days, had helped me because it smooths over some Windows/Unix differences with path handling.</p>
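<p>As a rough illustration of how <code class="docutils literal">pathlib</code> smooths over path differences, here is a minimal sketch. The path segments are hypothetical and not taken from the app; the pure path classes are used only so that both flavours can be shown on any OS.</p>

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical path segments, purely for illustration.
segments = ["data", "scans", "model.vtk"]

# The same joining code produces the right separator for each flavour.
posix_path = PurePosixPath(*segments)
windows_path = PureWindowsPath(*segments)

print(posix_path)    # data/scans/model.vtk
print(windows_path)  # data\scans\model.vtk

# Components are available without splitting strings on "/" or "\\".
print(windows_path.parts)   # ('data', 'scans', 'model.vtk')
print(windows_path.suffix)  # .vtk
```

<p>In real code you would normally just use <code class="docutils literal">Path</code>, which picks the flavour matching the OS the program runs on.</p>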
<p>Of course, this is a single datapoint. From other people’s reports there are many, many ways that this experience may not be typical. But that it is possible at all suggests that a lot of progress has been made and we are very much going in the right direction. A lot of people have put a lot of work in to achieve that, for which I’m very grateful!</p>
<h1>Python Type Hints: pyastgrep case study</h1>
<p>Luke Plant, 2023-10-05T09:45:02Z. https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/</p>
<p>A second, and more successful, attempt to use static type checking in a real Python project.</p>
<p>In a previous post, I did <a class="reference external" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study">a case study on my attempts to add type hints to parsy</a>. In this post, I’m continuing the series, but in a very different project.</p>
<p>A while back I forked an existing tool called <a class="reference external" href="https://github.com/hchasestevens/astpath">astpath</a> to create my own tool <a class="reference external" href="https://github.com/spookylukey/pyastgrep/">pyastgrep</a>, fixing various bugs and usability issues. In the process I rewrote significant parts of the existing code, and added quite a lot of my own. This was a pretty good opportunity for me to attempt to use static typing throughout, since I was not limited by backwards compatibility in the design. Plus it’s a relatively small amount of code, making it much easier than many of the larger projects I maintain, while still being big enough to be much more than a toy.</p>
<p>There are at least <a class="reference external" href="https://lukeplant.me.uk/blog/posts/the-different-uses-of-python-type-hints">5 different ways that type hints can be used in Python</a>, but this post focuses on static type checking and interactive programming help. In particular, I wanted to get mypy to catch errors for me, and I was incorporating it in my CI/testing workflows.</p>
<nav class="contents" id="contents" role="doc-toc">
<p class="topic-title">Contents</p>
<ul class="simple">
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#about-pyastgrep" id="toc-entry-1">About pyastgrep</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#things-i-liked" id="toc-entry-2">Things I liked</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#issues" id="toc-entry-3">Issues</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#mypy-just-isnt-checking-that-code" id="toc-entry-4">mypy just isn’t checking that code.</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#iobase-vs-binaryio" id="toc-entry-5">IOBase vs BinaryIO</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#missing-types-for-imports" id="toc-entry-6">Missing types for imports</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#equality-checks" id="toc-entry-7">Equality checks</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#mypy-caching-bugs" id="toc-entry-8">mypy caching bugs</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#third-party-types" id="toc-entry-9">Third party types</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#duck-typing" id="toc-entry-10">Duck typing</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#false-security" id="toc-entry-11">False security</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#exhaustiveness-checking" id="toc-entry-12">Exhaustiveness checking</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#decorators" id="toc-entry-13">Decorators</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#pyright" id="toc-entry-14">pyright</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#summary" id="toc-entry-15">Summary</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#links" id="toc-entry-16">Links</a></p></li>
</ul>
</nav>
<section id="about-pyastgrep">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-1" role="doc-backlink">About pyastgrep</a></h2>
<p>This tool is a utility to allow you to grep Python code for specific syntax elements using XPath as a powerful query language. The main functions are:</p>
<ul class="simple">
<li><p>intelligently work out which files to search (respecting .gitignore files etc.)</p></li>
<li><p>parse the Python files as AST and convert to XML</p></li>
<li><p>apply a user-supplied XPath expression to search for specific AST elements</p></li>
<li><p>print the results (with the complexity of handling different context strategies and colouring)</p></li>
</ul>
<p>Here is an example showing pyastgrep searching its own code base for all usages of names (variables etc) that contain the substring <code class="docutils literal">idx</code>:</p>
<pre style="background-color: #000; color: #fff;">
<span>$ pyastgrep './/Name[contains(@id, "idx")]'
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span style=" ">:</span><span style="color: #67b11d; font-weight: bold; ">60</span><span>:</span><span style="color: #da8b55; ">5</span><span>: </span><span> </span><span style="color: #f2241f; ">current_idx</span><span> = 0
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">64</span><span>:</span><span style="color: #da8b55; ">9</span><span>: </span><span> </span><span style="color: #f2241f; ">linebreak_idx</span><span> = python_file_bytes.find(b"\n", current_idx)
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">64</span><span>:</span><span style="color: #da8b55; ">55</span><span>: </span><span> linebreak_idx = python_file_bytes.find(b"\n", </span><span style="color: #f2241f; ">current_idx</span><span>)
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">65</span><span>:</span><span style="color: #da8b55; ">12</span><span>: </span><span> if </span><span style="color: #f2241f; ">linebreak_idx</span><span> &lt; 0:
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">66</span><span>:</span><span style="color: #da8b55; ">38</span><span>: </span><span> line = python_file_bytes[</span><span style="color: #f2241f; ">current_idx</span><span>:]
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">68</span><span>:</span><span style="color: #da8b55; ">38</span><span>: </span><span> line = python_file_bytes[</span><span style="color: #f2241f; ">current_idx</span><span>:linebreak_idx]
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">68</span><span>:</span><span style="color: #da8b55; ">50</span><span>: </span><span> line = python_file_bytes[current_idx:</span><span style="color: #f2241f; ">linebreak_idx</span><span>]
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">72</span><span>:</span><span style="color: #da8b55; ">12</span><span>: </span><span> if </span><span style="color: #f2241f; ">linebreak_idx</span><span> &lt; 0:
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">75</span><span>:</span><span style="color: #da8b55; ">13</span><span>: </span><span> </span><span style="color: #f2241f; ">current_idx</span><span> = linebreak_idx + 1
</span><span style="color: #eb30ff; ">src/pyastgrep/files.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">75</span><span>:</span><span style="color: #da8b55; ">27</span><span>: </span><span> current_idx = </span><span style="color: #f2241f; ">linebreak_idx</span><span> + 1
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">244</span><span>:</span><span style="color: #da8b55; ">9</span><span>: </span><span> </span><span style="color: #f2241f; ">start_line_idx</span><span> = line_index - before_context
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">245</span><span>:</span><span style="color: #da8b55; ">9</span><span>: </span><span> </span><span style="color: #f2241f; ">end_line_idx</span><span> = line_index + after_context
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">246</span><span>:</span><span style="color: #da8b55; ">9</span><span>: </span><span> </span><span style="color: #f2241f; ">stop_line_idx</span><span> = end_line_idx + 1
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">246</span><span>:</span><span style="color: #da8b55; ">25</span><span>: </span><span> stop_line_idx = </span><span style="color: #f2241f; ">end_line_idx</span><span> + 1
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">248</span><span>:</span><span style="color: #da8b55; ">19</span><span>: </span><span> if (path, </span><span style="color: #f2241f; ">end_line_idx</span><span>) in self.printed_context_lines:
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">252</span><span>:</span><span style="color: #da8b55; ">19</span><span>: </span><span> if (path, </span><span style="color: #f2241f; ">start_line_idx</span><span> - 1) not in self.printed_context_lines:
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">253</span><span>:</span><span style="color: #da8b55; ">57</span><span>: </span><span> header = self.formatter.format_header(path, </span><span style="color: #f2241f; ">start_line_idx</span><span>)
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">257</span><span>:</span><span style="color: #da8b55; ">44</span><span>: </span><span> code = "\n".join(result.file_lines[</span><span style="color: #f2241f; ">start_line_idx</span><span>:stop_line_idx])
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">257</span><span>:</span><span style="color: #da8b55; ">59</span><span>: </span><span> code = "\n".join(result.file_lines[start_line_idx:</span><span style="color: #f2241f; ">stop_line_idx</span><span>])
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">260</span><span>:</span><span style="color: #da8b55; ">13</span><span>: </span><span> for </span><span style="color: #f2241f; ">idx</span><span> in range(start_line_idx, stop_line_idx):
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">260</span><span>:</span><span style="color: #da8b55; ">26</span><span>: </span><span> for idx in range(</span><span style="color: #f2241f; ">start_line_idx</span><span>, stop_line_idx):
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">260</span><span>:</span><span style="color: #da8b55; ">42</span><span>: </span><span> for idx in range(start_line_idx, </span><span style="color: #f2241f; ">stop_line_idx</span><span>):
</span><span style="color: #eb30ff; ">src/pyastgrep/printer.py</span><span>:</span><span style="color: #67b11d; font-weight: bold; ">261</span><span>:</span><span style="color: #da8b55; ">51</span><span>: </span><span> self.printed_context_lines.add((path, </span><span style="color: #f2241f; ">idx</span><span>))
</span></pre></section>
<section id="things-i-liked">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-2" role="doc-backlink">Things I liked</a></h2>
<p>There were some things I really liked about the flow of using a type checker, and I was able to lean on the checker pretty heavily for some things.</p>
<p>One example of using type-driven programming was going between the layer of my code that found matches and the layer that printed them. I found that the search function needed to go from returning a simple <code class="docutils literal">Iterable[Match]</code> eventually all the way to <code class="docutils literal">Iterable[Match | MissingPath | ReadError | NonElementReturned | FileFinished]</code>. In each case, I could do something like:</p>
<ul class="simple">
<li><p>add the new return value, something like <code class="docutils literal">yield <span class="pre">MissingPath(...)</span></code>, in the body of the function.</p></li>
<li><p>fix up the function type signature, in response to the type error that mypy would now report.</p></li>
<li><p>then respond to the type error that mypy would report in the places that consumed this iterable, usually by implementing handling of the new result type. <code class="docutils literal">isinstance</code> checks can be used to drive <a class="reference external" href="https://mypy.readthedocs.io/en/stable/type_narrowing.html">type narrowing</a> and satisfy mypy that everything is fine.</p></li>
</ul>
<p>It was nice when this went through multiple layers, knowing that something was checking it for me and driving me to the next bit of code that needed fixing. Having the types there explicitly also helps to make you conscious of decisions you are making about how the layers are working, which was helpful in keeping the layers straight, so that I can use this code both as a library and a command line tool.</p>
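<p>The workflow described above can be sketched roughly as follows. This is not pyastgrep’s actual code: the class fields, <code class="docutils literal">search</code> and <code class="docutils literal">report</code> are hypothetical stand-ins, kept to two result types for brevity.</p>

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class Match:
    path: Path
    line: int


@dataclass
class MissingPath:
    path: Path


def search(paths: list[Path]) -> Iterable[Match | MissingPath]:
    # Adding the MissingPath branch forces the return type to grow,
    # which in turn makes the type checker flag every consumer.
    for path in paths:
        if not path.exists():
            yield MissingPath(path)
        else:
            yield Match(path, 1)  # placeholder line number


def report(results: Iterable[Match | MissingPath]) -> list[str]:
    lines: list[str] = []
    for result in results:
        if isinstance(result, MissingPath):
            # Narrowed to MissingPath here.
            lines.append(f"{result.path}: no such file")
        else:
            # Narrowed to Match here, so .line is known to exist.
            lines.append(f"{result.path}:{result.line}")
    return lines
```

<p>If the <code class="docutils literal">isinstance</code> branch were missing, a checker would complain that <code class="docutils literal">MissingPath</code> has no attribute <code class="docutils literal">line</code>, which is the prompt to go and handle the new case.</p>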
<p>Being able to convert various exceptions, errors and corner cases to sum types in this way was also great, and I leaned on this heavily – probably more than if I hadn’t been using a type checker. I quite like this kind of ad-hoc sum type – in some ways they often work more nicely than sum types in Haskell etc.</p>
<p>Use of mypy has encouraged me to use lots of small custom classes, because I can use “Find references” to search the code base for everything related to a particular type, or a particular attribute or method. For custom classes I create and use within the code base, this works perfectly, which is really nice. The same goes for renaming things using IDE tools (I’m using Emacs and the pyright LSP server).</p>
<p>I have a high degree of confidence that I’ll be able to come back to this code base after years without touching it and be able to navigate it and make changes very easily.</p>
</section>
<section id="issues">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-3" role="doc-backlink">Issues</a></h2>
<p>However, I have a long list of complaints about issues I found too!</p>
<section id="mypy-just-isnt-checking-that-code">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-4" role="doc-backlink">mypy just isn’t checking that code.</a></h3>
<p>You have to turn on at least:</p>
<div class="code"><pre class="code ini"><a id="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-1" name="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_4a2e4786fcd5416cb62b43b64fdb7777-1"></a><span class="na">check_untyped_defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<a id="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-2" name="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_4a2e4786fcd5416cb62b43b64fdb7777-2"></a><span class="na">disallow_untyped_calls</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<a id="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-3" name="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_4a2e4786fcd5416cb62b43b64fdb7777-3"></a><span class="na">disallow_untyped_defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<a id="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-4" name="rest_code_4a2e4786fcd5416cb62b43b64fdb7777-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_4a2e4786fcd5416cb62b43b64fdb7777-4"></a><span class="na">disallow_incomplete_defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>Otherwise you should not be expecting mypy to actually find typing errors. In general, it can be frustrating not knowing whether the lack of red is because you haven’t got errors, or because mypy just isn’t checking code, or can’t meaningfully check anything.</p>
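<p>To make the point concrete, here is a small sketch with hypothetical function names. By default mypy skips the body of a function that has no annotations at all, so the first function below passes silently unless <code class="docutils literal">check_untyped_defs</code> is enabled, even though it fails whenever it is called.</p>

```python
from __future__ import annotations


def untyped_label(items):
    # No annotations: mypy does not check this body by default, so the
    # str + int mistake goes unreported without check_untyped_defs.
    return "count: " + len(items)


def typed_label(items: list[int]) -> str:
    # With annotations the body is always checked; the same mistake
    # here would be reported as an unsupported operand type.
    return "count: " + str(len(items))


print(typed_label([1, 2, 3]))  # count: 3
```

<p>Calling <code class="docutils literal">untyped_label</code> raises <code class="docutils literal">TypeError</code> at run-time, which is exactly the class of bug a type checker is expected to catch.</p>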
</section>
<section id="iobase-vs-binaryio">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-5" role="doc-backlink">IOBase vs BinaryIO</a></h3>
<p>pyastgrep, in common with many similar tools, allows you to process standard input as well as grepping files named on the command line or found by directory walking. So I had to have branches for that, and I had tests for it too.</p>
<p>I hit a bug regarding encodings, where files not encoded in UTF-8 were causing the tool to crash. I ended up doing an internal change that switched a bunch of types from <code class="docutils literal">str</code> to <code class="docutils literal">bytes</code>. The code worked, the tests passed, and mypy reported no errors. But later – thankfully before a release – I noticed that I had broken stdin handling.</p>
<p>It turned out that according to mypy, <code class="docutils literal">IOBase.read()</code> returns <code class="docutils literal">Any</code>, and not the actual type <code class="docutils literal">bytes</code> or <code class="docutils literal">str</code>. I had been using <code class="docutils literal">IOBase</code> as a type for some of my arguments, which meant that mypy didn’t pick up the problem – if <code class="docutils literal">Any</code> appears anywhere, it’s like throwing a “silently disable everything this touches” bomb into the type checker.</p>
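<p>A minimal sketch of the effect, assuming (as described above) that the checker treats <code class="docutils literal">IOBase.read()</code> as returning <code class="docutils literal">Any</code>. The function name is hypothetical and this is not the code from pyastgrep.</p>

```python
import io
from io import IOBase


def read_source(stream: IOBase) -> str:
    # The annotation promises str, but because read() is seen as Any,
    # the checker accepts this return without knowing what the stream
    # really yields.
    return stream.read()


from_text = read_source(io.StringIO("x = 1"))
from_binary = read_source(io.BytesIO(b"x = 1"))

print(type(from_text).__name__)    # str
print(type(from_binary).__name__)  # bytes, despite the "-> str" annotation
```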
<p>Now, I had been alerted to the problem earlier – mypy thinks that <code class="docutils literal">sys.stdin</code> is of type <code class="docutils literal">typing.TextIO</code>, not <code class="docutils literal">IOBase</code>. However, <code class="docutils literal">typing.TextIO</code> is not something you test for at run-time, so it interacts badly with the very useful <code class="docutils literal">isinstance</code> type guards and type narrowing. So I had ended up using <code class="docutils literal">IOBase</code> as that seemed less problematic.</p>
<p>In other words, I <strong>had</strong> added type hints, and I had added <strong>correct</strong> type hints. But they weren’t correct <strong>enough</strong>, and therefore turned out to be “wrong”. It was very disappointing that despite the effort I had put in, this kind of type error still got through.</p>
<p>Fixing the bug involved writing a better, more accurate test that more closely emulated actual stdin handling, and then a very simple change. Fixing my type hints, however, was a much bigger task.</p>
<p>It involved a long journey of understanding regarding type guards and <a class="reference external" href="https://peps.python.org/pep-0724/">stricter type guards</a>, because non-strict type guards (which is what we have at the moment) <a class="reference external" href="https://github.com/python/mypy/issues/13957">turned out not to work how I thought</a>. The eventual refactoring now uses the <code class="docutils literal">typing.BinaryIO</code> type hint, and some code that seems somewhat fragile in terms of type checking – because there is no way of doing a “negative type guard” for <code class="docutils literal">BinaryIO</code>, I have to order my if/elif/else clauses in exactly the right way.</p>
<p>It’s also closing the barn door after the horse has bolted – I had already fixed the bug, and added much more thorough tests to prove it was fixed. I was hoping that static type checking would have caught this before I had to do that.</p>
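<p>The ordering constraint can be illustrated with a sketch like the one below. It is an assumption about the general shape of such code, not the actual refactoring: <code class="docutils literal">read_bytes_from</code> is a hypothetical name. The concrete class is tested first, and <code class="docutils literal">BinaryIO</code> is only reached by elimination.</p>

```python
from __future__ import annotations

import io
from pathlib import Path
from typing import BinaryIO


def read_bytes_from(source: Path | BinaryIO) -> bytes:
    # Path is a concrete class, so isinstance works at run-time and
    # narrows the union for the type checker.
    if isinstance(source, Path):
        return source.read_bytes()
    # Whatever is left must be BinaryIO. This branch has to come last:
    # there is no useful run-time test for BinaryIO itself.
    return source.read()


stream = io.BytesIO(b"x = 1\n")
print(isinstance(stream, BinaryIO))  # False, although it is a binary stream
print(read_bytes_from(stream))       # b'x = 1\n'
```

<p>Swapping the branches, or adding a third member to the union, would quietly change what the final branch means, which is why this feels fragile.</p>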
</section>
<section id="missing-types-for-imports">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-6" role="doc-backlink">Missing types for imports</a></h3>
<p>I saw a bug I hadn’t fixed, and one that again I thought mypy might have caught.</p>
<p>It turns out I had added <code class="docutils literal">ignore_missing_imports = true</code> early in development to reduce the errors to a manageable set. This silenced errors relating to lxml and effectively gave me a bunch of <code class="docutils literal">Any</code> types floating around instead of something useful.</p>
<p>Again, this was “my fault”, but I feel it’s fairly typical of what will happen in the real world. Switching to <code class="docutils literal">ignore_missing_imports = false</code> can cause so many problems that it will be hard to justify the cost in many cases. In this case, the type stubs it wants me to install require me to “fix” a whole load of static type checking issues relating to lxml that don’t correspond to real run-time issues.</p>
</section>
<section id="equality-checks">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-7" role="doc-backlink">Equality checks</a></h3>
<p>I switched a bunch of code paths from <code class="docutils literal">str</code> to <code class="docutils literal">Path</code> at one point. mypy gave me some help, but less than I wanted. For example, this reports no error:</p>
<div class="code"><pre class="code python"><a id="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-1" name="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-1"></a><span class="n">path</span><span class="p">:</span> <span class="n">Path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
<a id="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-2" name="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-2"></a>
<a id="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-3" name="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-3"></a><span class="k">if</span> <span class="n">path</span> <span class="o">==</span> <span class="s2">"-"</span><span class="p">:</span>
<a id="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-4" name="rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_8c0b6e91b03b4828b57313eff2cf8ec7-4"></a>    <span class="o">...</span>
</pre></div>
<p>A comparison between a <code class="docutils literal">str</code> and a <code class="docutils literal">Path</code> always returns <code class="docutils literal">False</code>, so it’s not a useful thing to do, and therefore a developer error. I meant to do the comparison to <code class="docutils literal"><span class="pre">"-"</span></code> before I converted the input <code class="docutils literal">str</code> objects to internal <code class="docutils literal">Path</code> objects. It’s conceptually a <code class="docutils literal">TypeError</code>, but not actually one. Thankfully I had tests that failed.</p>
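<p>The run-time behaviour, and the intended order of operations, can be sketched like this. <code class="docutils literal">parse_target</code> and <code class="docutils literal">STDIN_MARKER</code> are hypothetical names used for illustration only.</p>

```python
from __future__ import annotations

from pathlib import Path

STDIN_MARKER = "-"


def parse_target(raw: str) -> Path | None:
    """Return None for the stdin marker, otherwise a Path."""
    # Compare while the value is still a str, before converting.
    if raw == STDIN_MARKER:
        return None
    return Path(raw)


print(Path("-") == "-")        # False: a Path never equals a str
print(parse_target("-"))       # None
print(parse_target("app.py"))  # app.py
```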
</section>
<section id="mypy-caching-bugs">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-8" role="doc-backlink">mypy caching bugs</a></h3>
<p>Several times I had to blow away <code class="docutils literal">.mypy_cache</code> to get errors to appear. This is not a fundamental problem with the idea of static typing, but it makes a very big difference to the whole flow of leaning on mypy. I often noticed only when I <strong>knew</strong> that mypy should be reporting errors due to a change I just made, but it wasn’t – I have no idea how many other times it was happening. When interpreting “mypy reports no errors”, there are now about half a dozen reasons why that might be the case, only one of which is “you fixed everything and your code is correct”.</p>
</section>
<section id="third-party-types">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-9" role="doc-backlink">Third party types</a></h3>
<p>Are types provided by third party libs or typeshed reliable?</p>
<p>No, they are not. For example, I discovered this one in <a class="reference external" href="https://github.com/python/typeshed/blob/a094aa09c2aa47721664d3fdef91eda4fac24ebb/stdlib/_ast.pyi#L19">typeshed/stdlib/_ast.pyi</a> among many others:</p>
<div class="code"><pre class="code python"><a id="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-1" name="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_5c380eaab43d4b878a1a1aba4137e6dc-1"></a><span class="k">class</span> <span class="nc">AST</span><span class="p">:</span>
<a id="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-2" name="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_5c380eaab43d4b878a1a1aba4137e6dc-2"></a>    <span class="o">...</span>
<a id="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-3" name="rest_code_5c380eaab43d4b878a1a1aba4137e6dc-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_5c380eaab43d4b878a1a1aba4137e6dc-3"></a>    <span class="c1"># TODO: Not all nodes have all of the following attributes</span>
</pre></div>
<p>This is probably not typeshed’s “fault” – it’s a problem of trying to retro-fit static types to a language and stdlib that weren’t designed for them.</p>
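<p>What that TODO means in practice can be seen at run-time with a short sketch: position attributes such as <code class="docutils literal">lineno</code> exist on statement nodes but not on the top-level <code class="docutils literal">Module</code> node, so a stub that declares them on the base class is more permissive than reality.</p>

```python
import ast

module = ast.parse("x = 1")
assignment = module.body[0]

# Statement nodes carry position information...
print(hasattr(assignment, "lineno"))  # True
print(assignment.lineno)              # 1

# ...but the Module node does not, whatever a base-class stub may say.
print(hasattr(module, "lineno"))      # False
```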
</section>
<section id="duck-typing">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-10" role="doc-backlink">Duck typing</a></h3>
<p>As soon as you want to use duck typing, which I did want to, you’ve got more work ahead of you, work that isn’t really to do with solving your actual problem. There are solutions such as <a class="reference external" href="https://docs.python.org/3/library/typing.html#typing.Protocol">Protocol</a>, but I’m simply noting you do have significant amounts of extra work for the type checker to understand idiomatic Python.</p>
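<p>As a sketch of the extra work involved, this is roughly what a <code class="docutils literal">Protocol</code> looks like. All the names here (<code class="docutils literal">Formatter</code>, <code class="docutils literal">PlainFormatter</code>, <code class="docutils literal">BracketFormatter</code>, <code class="docutils literal">render</code>) are hypothetical and not from pyastgrep.</p>

```python
from __future__ import annotations

from typing import Protocol


class Formatter(Protocol):
    # The structural interface: anything with this method qualifies.
    def format_line(self, text: str) -> str: ...


class PlainFormatter:
    # Note: no inheritance from Formatter is needed.
    def format_line(self, text: str) -> str:
        return text


class BracketFormatter:
    def format_line(self, text: str) -> str:
        return f"[{text}]"


def render(formatter: Formatter, lines: list[str]) -> str:
    return "\n".join(formatter.format_line(line) for line in lines)


print(render(PlainFormatter(), ["a", "b"]))
print(render(BracketFormatter(), ["a", "b"]))
```

<p>The two formatter classes are what idiomatic duck-typed Python would have contained anyway; the <code class="docutils literal">Formatter</code> class exists purely so that the type checker can follow along.</p>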
</section>
<section id="false-security">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-11" role="doc-backlink">False security</a></h3>
<p>I fairly often got that sense of “it type checks, and everything works first time I run it, cool!”</p>
<p>Sometimes, it was an illusion though – take this code:</p>
<div class="code"><pre class="code python"><a id="rest_code_6bf898ff493c4551b13624ce9e746f5c-1" name="rest_code_6bf898ff493c4551b13624ce9e746f5c-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_6bf898ff493c4551b13624ce9e746f5c-1"></a><span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">color</span> <span class="o">==</span> <span class="n">UseColor</span><span class="o">.</span><span class="n">AUTO</span><span class="p">:</span>
<a id="rest_code_6bf898ff493c4551b13624ce9e746f5c-2" name="rest_code_6bf898ff493c4551b13624ce9e746f5c-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_6bf898ff493c4551b13624ce9e746f5c-2"></a> <span class="n">colorer</span> <span class="o">=</span> <span class="n">make_default_colorer</span><span class="p">()</span>
<a id="rest_code_6bf898ff493c4551b13624ce9e746f5c-3" name="rest_code_6bf898ff493c4551b13624ce9e746f5c-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_6bf898ff493c4551b13624ce9e746f5c-3"></a><span class="k">elif</span> <span class="n">args</span><span class="o">.</span><span class="n">colors</span> <span class="o">==</span> <span class="n">UseColor</span><span class="o">.</span><span class="n">NEVER</span><span class="p">:</span>
<a id="rest_code_6bf898ff493c4551b13624ce9e746f5c-4" name="rest_code_6bf898ff493c4551b13624ce9e746f5c-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_6bf898ff493c4551b13624ce9e746f5c-4"></a> <span class="n">colorer</span> <span class="o">=</span> <span class="n">NullColorer</span><span class="p">()</span>
</pre></div>
<p>I had typed <code class="docutils literal">colors</code> instead of the correct <code class="docutils literal">color</code> in the second branch, but I got no squiggly red lines. This was because of a lurking <code class="docutils literal">Any</code> – the argparse <code class="docutils literal">args</code> object is actually an <code class="docutils literal">Any</code>. This tripped me up, because I didn’t have any tests for that line of code.</p>
<p>Having <code class="docutils literal">strict = true</code> in your mypy config doesn’t fix this. I think I’d need a way to say “warn me about anywhere that <code class="docutils literal">Any</code> leaks into my code base”, but even if it existed I imagine I would not like it.</p>
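<p>One possible mitigation, sketched here as an assumption rather than as what pyastgrep actually does, is to hand argparse a typed object to populate. As far as I can tell, the typeshed stubs make <code class="docutils literal">parse_args(..., namespace=obj)</code> return the type of <code class="docutils literal">obj</code>, so attribute typos become visible to the checker. The <code class="docutils literal">Args</code> class and the enum values below are hypothetical:</p>

```python
import argparse
from enum import Enum


class UseColor(Enum):
    AUTO = "auto"
    NEVER = "never"


class Args:
    # A plain class, deliberately not an argparse.Namespace subclass, so that
    # unknown attributes are not quietly accepted as Any by the type checker.
    color: UseColor


def parse_args(argv: list[str]) -> Args:
    parser = argparse.ArgumentParser()
    parser.add_argument("--color", type=UseColor, default=UseColor.AUTO)
    # argparse sets the parsed values as attributes on the object passed in.
    return parser.parse_args(argv, namespace=Args())


args = parse_args(["--color", "never"])
```

<p>With this in place, <code class="docutils literal">args.colors</code> should be reported as an unknown attribute, at the cost of keeping <code class="docutils literal">Args</code> in sync with the parser by hand.</p>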
</section>
<section id="exhaustiveness-checking">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-12" role="doc-backlink">Exhaustiveness checking</a></h3>
<p>mypy fails to find the obvious issue in this bit of code:</p>
<div class="code"><pre class="code python"><a id="rest_code_b1aad54a40c349f0b044cbdd51a6b585-1" name="rest_code_b1aad54a40c349f0b044cbdd51a6b585-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_b1aad54a40c349f0b044cbdd51a6b585-1"></a><span class="k">def</span> <span class="nf">foo</span><span class="p">()</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<a id="rest_code_b1aad54a40c349f0b044cbdd51a6b585-2" name="rest_code_b1aad54a40c349f0b044cbdd51a6b585-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_b1aad54a40c349f0b044cbdd51a6b585-2"></a> <span class="k">if</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<a id="rest_code_b1aad54a40c349f0b044cbdd51a6b585-3" name="rest_code_b1aad54a40c349f0b044cbdd51a6b585-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_b1aad54a40c349f0b044cbdd51a6b585-3"></a> <span class="n">x</span> <span class="o">=</span> <span class="s2">"hello"</span>
<a id="rest_code_b1aad54a40c349f0b044cbdd51a6b585-4" name="rest_code_b1aad54a40c349f0b044cbdd51a6b585-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_b1aad54a40c349f0b044cbdd51a6b585-4"></a> <span class="nb">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</pre></div>
<p>I hoped I’d at least get a warning for a potential unbound variable. This comes up with cases where you want exhaustiveness checking, like:</p>
<div class="code"><pre class="code python"><a id="rest_code_345f50377f1a435c847407d55b46d8b4-1" name="rest_code_345f50377f1a435c847407d55b46d8b4-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-1"></a><span class="k">def</span> <span class="nf">print_greeting</span><span class="p">(</span><span class="n">username</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="nb">type</span><span class="p">:</span> <span class="n">Greeting</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-2" name="rest_code_345f50377f1a435c847407d55b46d8b4-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-2"></a> <span class="k">if</span> <span class="nb">type</span> <span class="o">==</span> <span class="n">Greeting</span><span class="o">.</span><span class="n">HELLO</span><span class="p">:</span>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-3" name="rest_code_345f50377f1a435c847407d55b46d8b4-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-3"></a> <span class="n">greeting</span> <span class="o">=</span> <span class="s2">"hello"</span>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-4" name="rest_code_345f50377f1a435c847407d55b46d8b4-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-4"></a> <span class="k">elif</span> <span class="nb">type</span> <span class="o">==</span> <span class="n">Greeting</span><span class="o">.</span><span class="n">GOODBYE</span><span class="p">:</span>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-5" name="rest_code_345f50377f1a435c847407d55b46d8b4-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-5"></a> <span class="n">greeting</span> <span class="o">=</span> <span class="s2">"goodbye"</span>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-6" name="rest_code_345f50377f1a435c847407d55b46d8b4-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-6"></a>
<a id="rest_code_345f50377f1a435c847407d55b46d8b4-7" name="rest_code_345f50377f1a435c847407d55b46d8b4-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_345f50377f1a435c847407d55b46d8b4-7"></a> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">greeting</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">username</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div>
<p>You can get this right using an <code class="docutils literal">else</code> branch with <a class="reference external" href="https://typing.readthedocs.io/en/latest/source/unreachable.html">assert_never</a>, but it’s annoying that you have to remember to do this.</p>
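<p>A minimal sketch of that technique follows. The <code class="docutils literal">Greeting</code> enum definition is assumed, since it is not shown above, and a local <code class="docutils literal">assert_never</code> stands in for <code class="docutils literal">typing.assert_never</code> (Python 3.11+) so that the example also runs on older versions:</p>

```python
from enum import Enum
from typing import NoReturn


class Greeting(Enum):
    HELLO = "hello"
    GOODBYE = "goodbye"


def assert_never(value: NoReturn) -> NoReturn:
    # Local stand-in for typing.assert_never; fails loudly if ever reached.
    raise AssertionError(f"Unhandled value: {value!r}")


def make_greeting(username: str, type: Greeting) -> str:
    if type is Greeting.HELLO:
        greeting = "hello"
    elif type is Greeting.GOODBYE:
        greeting = "goodbye"
    else:
        # If a new Greeting member is added and not handled above,
        # the type checker should report an error on this call.
        assert_never(type)
    return f"{greeting} {username}"
```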
</section>
<section id="decorators">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-13" role="doc-backlink">Decorators</a></h3>
<p><a class="reference external" href="https://mypy.readthedocs.io/en/stable/generics.html#declaring-decorators">Type hints for decorators</a> are … bad. If you want parameterised decorators, or other people’s decorators that don’t have types</p>
<p>[Apologies for the unfinished sentence above. I don’t want to risk a repeated head-against-table moment that the first attempt triggered, once was painful enough]</p>
</section>
</section>
<section id="pyright">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-14" role="doc-backlink">pyright</a></h2>
<p>More recently I’ve tried pyright as an alternative to mypy. Generally I’ve found it to be significantly less buggy. However, mypy has a lot going for it in terms of features and extensions, and I don’t really want to have two different type checkers. At the moment I’m experimenting with mainly using pyright for interactive checks in my editor, and using mypy for pre-commit/CI checks.</p>
<p>The overlapping feature sets can be kind of annoying though. For example, for the potential unbound variable error above, the latest version of pyright does warn you. It also has built-in exhaustiveness checking <strong>without</strong> needing the <code class="docutils literal">assert_never</code> technique. However, in one case it wasn’t working for me, until I finally tracked down the issue — mypy was able to handle this code and correctly deduce the base class of my enum, but pyright wasn’t:</p>
<div class="code"><pre class="code python"><a id="rest_code_652ff8e9800f45b38044cdcd76725632-1" name="rest_code_652ff8e9800f45b38044cdcd76725632-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_652ff8e9800f45b38044cdcd76725632-1"></a><span class="k">try</span><span class="p">:</span>
<a id="rest_code_652ff8e9800f45b38044cdcd76725632-2" name="rest_code_652ff8e9800f45b38044cdcd76725632-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_652ff8e9800f45b38044cdcd76725632-2"></a> <span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">StrEnum</span>
<a id="rest_code_652ff8e9800f45b38044cdcd76725632-3" name="rest_code_652ff8e9800f45b38044cdcd76725632-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_652ff8e9800f45b38044cdcd76725632-3"></a><span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
<a id="rest_code_652ff8e9800f45b38044cdcd76725632-4" name="rest_code_652ff8e9800f45b38044cdcd76725632-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#rest_code_652ff8e9800f45b38044cdcd76725632-4"></a> <span class="kn">from</span> <span class="nn">backports.strenum</span> <span class="kn">import</span> <span class="n">StrEnum</span> <span class="c1"># type: ignore [no-redef]</span>
</pre></div>
<p>I eventually <a class="reference external" href="https://github.com/microsoft/pyright/issues/4076">found an adequate solution</a> that keeps them both happy — but only because I’m writing this blog post and don’t want to look stupid. Normally it would be “stuff is broken, maybe it’s me, maybe it’s them, gotta move on”.</p>
<p>In addition, in some places pyright does <strong>not</strong> support the <code class="docutils literal">assert_never</code> technique that mypy needs, and reports an error. There are <a class="reference external" href="https://github.com/microsoft/pyright/issues/4706">other pain points</a> if you try to use both.</p>
<p>There are quite a few places where you find pyright doesn’t do the same thing as mypy because pyright is more correct. Microsoft people tend to know what they are talking about when it comes to type systems. But it means you may find yourself digging through <a class="reference external" href="https://github.com/microsoft/pyright/issues?q=is%3Aissue+is%3Aclosed+label%3A%22as+designed%22">large numbers of issues closed with the “as designed” tag</a> to find answers.</p>
</section>
<section id="summary">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-15" role="doc-backlink">Summary</a></h2>
<p>Overall, despite listing more bad things than good, I’m actually happy with the addition of mypy as a required static type checker in this project.</p>
<p>The disappointments I’ve listed may come from my experience and enjoyment of languages like Haskell where you really can lean on the type checker. In those languages, you find both that the rewards of static type checking are massively higher, and that the effort required to use them is massively lower. Haskell type signatures, for instance, are often not needed, and much easier to write and understand than Python’s.</p>
<p>Perhaps the most positive outlook is “static type checking in Python is just an advanced linter, of course it’s not actually reliable”. This can be hard to accept, though, due to the amount of work you have to do to get any real benefit above and beyond linters like flake8 and ruff that, with virtually no changes to your code or workflow, catch a lot of issues with a very low false positive rate.</p>
<p>In terms of tips and advice:</p>
<ul class="simple">
<li><p>You need to turn up error reporting and <a class="reference external" href="https://rtpg.co/2023/03/07/how-to-adopt-mypy-on-bigger-projects.html">spend considerable effort configuring mypy</a>, especially in larger projects.</p></li>
<li><p>If you want something approaching reliability, your entire stack of libraries needs to have been designed with static types from the beginning, so you don’t have to use stubs. This means:</p>
<ul>
<li><p>probably not much in stdlib. You’re going to need to wrap everything.</p></li>
<li><p>probably not much that was created more than 5 years ago.</p></li>
</ul>
</li>
</ul>
<p>Maybe this works well for MegaCorps with an army of developers and a very large code base that they have to get under control somehow. I think for many projects, you are going to be happy with static type checking in Python only if you can resign yourself to a very low level of reliability, and are mostly leaning on other techniques for correctness, like an extensive test suite.</p>
</section>
<hr class="docutils">
<section id="links">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-pyastgrep-case-study/#toc-entry-16" role="doc-backlink">Links</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/ecmsdo/python_type_hints_pyastgrep_case_study">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>Django and Sass/SCSS without Node.js or a build stephttps://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/2023-06-01T19:54:15Z2023-06-01T19:54:15ZLuke Plant<p>How to use Sass/SCSS in a Django project, without needing Node.js/npm or running a build process</p><p>Although they are less necessary than in the past, I like to use a <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Glossary/CSS_preprocessor">CSS pre-processor</a> when doing web development. I used to use <a class="reference external" href="https://lesscss.org/">LessCSS</a>, but recently I’ve found that I can use <a class="reference external" href="https://sass-lang.com/">Sass</a> without needing either a separate build step, or a package that requires Node.js and npm to install it. The heart of the functionality is provided by <a class="reference external" href="https://sass-lang.com/libsass">libsass</a>, an implementation of Sass as a C++ library.</p>
<p>On Linux systems, this can be installed as a package <code class="docutils literal">libsass</code> or similar, but even better is that you can pip install it as a Python package, <a class="reference external" href="https://pypi.org/project/libsass/">libsass</a>.</p>
<p>When it comes to using it from a Django project, the first step is to <a class="reference external" href="https://django-compressor.readthedocs.io/en/stable/quickstart.html">install
django-compressor</a>.</p>
<p>Then, you need to add <a class="reference external" href="https://pypi.org/project/django-libsass/">django-libsass</a> as per its instructions.</p>
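<p>For orientation, the settings those two steps usually end up with look roughly like the fragment below. This is a sketch based on the django-compressor and django-libsass instructions as I understand them, so check the linked docs for the versions you install:</p>

```python
# settings.py (fragment)

INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "compressor",
    # ... your own apps ...
]

STATICFILES_FINDERS = [
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    # Lets Django's static files machinery find django-compressor's output.
    "compressor.finders.CompressorFinder",
]

COMPRESS_PRECOMPILERS = [
    # Run anything marked type="text/x-scss" through libsass.
    ("text/x-scss", "django_libsass.SassCompiler"),
]
```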
<p>That’s about it. As per the django-libsass instructions, somewhere in your base HTML templates you’ll have something like this:</p>
<div class="code"><pre class="code html+django"><a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-1" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-1" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-1"></a><span class="c">{# at the top #}</span>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-2" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-2" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-2"></a><span class="cp">{%</span> <span class="k">load</span> <span class="nv">compress</span> <span class="cp">%}</span>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-3" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-3" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-3"></a><span class="cp">{%</span> <span class="k">load</span> <span class="nv">static</span> <span class="cp">%}</span>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-4" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-4" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-4"></a>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-5" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-5" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-5"></a>{# in the <span class="p"><</span><span class="nt">head</span><span class="p">></span> element #}
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-6" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-6" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-6"></a><span class="cp">{%</span> <span class="k">compress</span> <span class="nv">css</span> <span class="cp">%}</span>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-7" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-7" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-7"></a> <span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">type</span><span class="o">=</span><span class="s">"text/x-scss"</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{%</span> <span class="k">static</span> <span class="s2">"myapp/css/main.scss"</span> <span class="cp">%}</span><span class="s">"</span> <span class="p">/></span>
<a id="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-8" name="rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-8" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_dcd45d0f50a64dcbac57eaccef03cc8b-8"></a><span class="cp">{%</span> <span class="k">endcompress</span> <span class="cp">%}</span>
</pre></div>
<p>You write your SCSS in that <code class="docutils literal">main.scss</code> file (it doesn’t have to be called that), and it can <code class="docutils literal">@import</code> other SCSS files of course.</p>
<p>Then, when you load a page, django-compressor will take care of running the SCSS files through libsass, saving the output CSS to a file and inserting the appropriate HTML that references that CSS file into your template output. It caches things very well so that you don’t incur any penalty if files haven’t changed — and libsass is a very fast implementation for when the processing does need to happen.</p>
<p>What this means is that you have eliminated both the need for Node.js/npm, and the need for a build step/process, if you only needed these things for CSS pre-processing.</p>
<p>Of course, the SCSS → CSS compilation still has to happen, but it happens on demand in the same process that runs the web app, and it’s both fast enough and reliable enough that you simply never have to think about it again. So this is “build-less” in the same way that “server-less” means you don’t have to think about servers, and the same way that Python “doesn’t have a compilation step”.</p>
<section id="future-proofing">
<h2>Future proofing</h2>
<p>On the Sass-lang page about libsass, they say it is “deprecated”, and on the <a class="reference external" href="https://github.com/sass/libsass">project page</a> it says:</p>
<blockquote>
<p>While it will continue to receive maintenance releases indefinitely, there are no plans to add additional features or compatibility with any new CSS or Sass features.</p>
</blockquote>
<p>In other words, this is what I prefer to call “mature software” 😉. libsass already has everything I need. If it does eventually fail to be maintained or I need new features, it’s not a problem:</p>
<ul>
<li><p>Switch to Dart Sass, which can be installed as a <a class="reference external" href="https://github.com/sass/dart-sass/releases/">standalone binary</a>.</p></li>
<li><p>Set your django-compressor settings like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_803ac20f483f4fa083bfe21dcd4829c7-1" name="rest_code_803ac20f483f4fa083bfe21dcd4829c7-1" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_803ac20f483f4fa083bfe21dcd4829c7-1"></a><span class="n">COMPRESS_PRECOMPILERS</span> <span class="o">=</span> <span class="p">[</span>
<a id="rest_code_803ac20f483f4fa083bfe21dcd4829c7-2" name="rest_code_803ac20f483f4fa083bfe21dcd4829c7-2" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_803ac20f483f4fa083bfe21dcd4829c7-2"></a> <span class="p">(</span><span class="s2">"text/x-scss"</span><span class="p">,</span> <span class="s2">"sass </span><span class="si">{infile}</span><span class="s2"> </span><span class="si">{outfile}</span><span class="s2">"</span><span class="p">),</span>
<a id="rest_code_803ac20f483f4fa083bfe21dcd4829c7-3" name="rest_code_803ac20f483f4fa083bfe21dcd4829c7-3" href="https://lukeplant.me.uk/blog/posts/django-sass-scss-without-nodejs-or-build-step/#rest_code_803ac20f483f4fa083bfe21dcd4829c7-3"></a><span class="p">]</span>
</pre></div>
</li>
</ul>
<p>This covers the basic case. If you want all the features of django-libsass, which includes looking in your other static file folders for SCSS, you’ll probably need to fork <a class="reference external" href="https://github.com/torchbox/django-libsass/blob/main/django_libsass.py">the code</a> and make it work by calling Dart Sass using <a class="reference external" href="https://docs.python.org/3/library/subprocess.html">subprocess</a> — a small amount of work, and nothing that will fundamentally break this approach.</p>
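<p>To give a feel for the size of that job, here is a hedged sketch of the subprocess part only. The function names are hypothetical, the <code class="docutils literal">--stdin</code> and <code class="docutils literal">--load-path</code> flags are the Dart Sass command line options as I understand them, and the integration with django-compressor’s filter API is not shown:</p>

```python
import subprocess
from pathlib import Path


def build_sass_command(load_paths: list[Path], executable: str = "sass") -> list[str]:
    # --stdin makes Dart Sass read the stylesheet from standard input;
    # each --load-path adds a directory searched when resolving imports.
    command = [executable, "--stdin"]
    for path in load_paths:
        command.append(f"--load-path={path}")
    return command


def compile_scss(source: str, load_paths: list[Path]) -> str:
    result = subprocess.run(
        build_sass_command(load_paths),
        input=source,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sass reports an error
    )
    return result.stdout
```

<p>The list of load paths would come from your static file folders, which is the lookup that django-libsass currently does for you.</p>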
</section>The different uses of Python type hintshttps://lukeplant.me.uk/blog/posts/the-different-uses-of-python-type-hints/2023-04-05T20:49:38+01:002023-04-05T20:49:38+01:00Luke Plant<p>5 different things you might be using type annotations for, or might want to.</p><p>When you use <a class="reference external" href="https://peps.python.org/pep-0484/">type hints/annotations</a> in <a class="reference external" href="https://www.python.org/">Python</a>, you could be using them for one or more of at least 5 different things:</p>
<section id="interactive-programming-help">
<h2>Interactive programming help</h2>
<p>Many editors will be able to use type hints to give you help with:</p>
<ul class="simple">
<li><p>autocomplete (e.g. suggesting methods that actually exist on the type of objects you are dealing with)</p></li>
<li><p>immediate error checking (e.g. squiggly red lines under mistakes)</p></li>
<li><p>code navigation (e.g. jump to definition)</p></li>
<li><p>refactoring (e.g. renaming a method and all uses of it)</p></li>
</ul>
<p>To be clear, these features can often work without type hints – for example, I’ve used <a class="reference external" href="https://github.com/davidhalter/jedi/">jedi</a> very effectively to provide jump to definition etc. on code bases without any type hints, and many linters like <a class="reference external" href="https://flake8.pycqa.org/en/latest/">flake8</a> and <a class="reference external" href="https://beta.ruff.rs/docs/">ruff</a> also provide a lot of functionality without types. But type hints can help a lot in cases where static analysis wouldn’t otherwise give clear answers.</p>
</section>
<section id="static-type-checking">
<h2>Static type checking</h2>
<p>This is where a tool uses the type annotations to check the correctness of your code. I’m distinguishing this from the “immediate error checking” use case above, even though the same tool such as <a class="reference external" href="https://mypy.readthedocs.io/en/stable/">mypy</a> or <a class="reference external" href="https://microsoft.github.io/pyright/#/">pyright</a> might be behind it, because I’m specifically thinking of cases where your code will be rejected by something in your process (like checks in your CI build system) if type checking reports errors. This case is different because help in your editor can be ignored if it is wrong, but static type checks built into your deployment processes etc. either cannot be skipped, or require extra work to ignore. “Friendly assistant” and “opinionated gatekeeper” are quite different personas, and you might not appreciate them equally.</p>
</section>
<section id="runtime-behaviour-determination">
<h2>Runtime behaviour determination</h2>
<p>Running Python code can use reflection/introspection techniques to inspect type hints and change behaviour on that basis. The most obvious example in my mind is <a class="reference external" href="https://docs.pydantic.dev/">pydantic</a>, which uses type annotations to determine what correct inputs look like, and also serialisation/deserialisation behaviour. Another example would be runtime type checking like <a class="reference external" href="https://beartype.readthedocs.io/en/latest/">beartype</a> or <a class="reference external" href="https://typeguard.readthedocs.io/en/latest/">typeguard</a>.</p>
<p><strong>UPDATE:</strong> Another use is <a class="reference external" href="https://lagom-di.readthedocs.io/en/latest/">dependency injection (Lagom)</a> (thanks <a class="reference external" href="https://lobste.rs/u/antoinewdg">antoinewdg</a>), and other notable projects leaning on runtime use of type hints include <a class="reference external" href="https://fastapi.tiangolo.com/">FastAPI</a> and <a class="reference external" href="https://typer.tiangolo.com/">Typer</a> (thanks <a class="reference external" href="https://www.b-list.org/">ubernostrum</a>). There are probably lots more.</p>
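<p>To make the idea concrete, here is a toy sketch of code that reads type hints at runtime and changes behaviour based on them. It is nowhere near what pydantic or beartype do; <code class="docutils literal">check_call</code> and <code class="docutils literal">repeat</code> are made-up names, and only plain classes such as <code class="docutils literal">int</code> or <code class="docutils literal">str</code> are understood:</p>

```python
from typing import get_type_hints


def check_call(func, **kwargs):
    """Call func, first checking each keyword argument against its annotation."""
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        # This sketch only understands plain classes such as int or str.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, got {type(value).__name__}"
            )
    return func(**kwargs)


def repeat(word: str, count: int) -> str:
    return word * count


ok = check_call(repeat, word="ab", count=2)
```

<p>Calling <code class="docutils literal">check_call(repeat, word="ab", count="2")</code> raises <code class="docutils literal">TypeError</code> before <code class="docutils literal">repeat</code> is ever run.</p>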
</section>
<section id="code-documentation">
<h2>Code documentation</h2>
<p>A type hint can be used to tell users what kind of objects a function accepts or produces. This can be extracted into automatically created docs, or shown in your editor (in which case it also falls under “Interactive programming help”).</p>
<p>An interesting application of this is <a class="reference external" href="https://drf-spectacular.readthedocs.io/en/latest/">drf-spectacular</a>, which uses type hints as well as other information to extract an OpenAPI spec from a project using <a class="reference external" href="https://www.django-rest-framework.org/">DRF</a>. This spec, as well as serving as documentation or input to tools like <a class="reference external" href="https://github.com/Redocly/redoc">redoc</a> or <a class="reference external" href="https://swagger.io/tools/swagger-ui/">Swagger UI</a>, can also be used for type checking or code generation in another language, typically for web frontend code, via tools like <a class="reference external" href="https://github.com/OpenAPITools/openapi-generator">OpenAPI generator</a>.</p>
</section>
<section id="compiler-instructions">
<h2>Compiler instructions</h2>
<p>I don’t know how many people are doing this, but tools like <a class="reference external" href="https://github.com/mypyc/mypyc">mypyc</a> will use type hints to compile Python code to something faster, like C extensions. Using type annotations to speed up Python is <a class="reference external" href="https://bernsteinbear.com//blog/typed-python/">quite hard in practice</a>.</p>
</section>
<section id="conclusion">
<h2>Conclusion</h2>
<p>There isn’t really a point to this post other than to say “be aware of these different use cases”. This awareness can be very important when you are in any discussion about the usefulness or necessity of type hints – which scenarios are you thinking about?</p>
<p>Also, when you are weighing up whether to add type hints, you might decide to do so in order to support some of these but not others – <a class="reference external" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/">as I did for the parsy library</a>.</p>
<p>Finally, and this is a bit of a gotcha to leave you with, you may need to be very aware of the different use cases when thinking about correctness. If you see <code class="docutils literal">count: int</code>, what kind of guarantee do you have that the <code class="docutils literal">count</code> name is actually bound to an <code class="docutils literal">int</code> object at runtime? Is the type hint invoking some runtime checking, or is it merely docs, or hoping for a static type check that might not actually happen? You probably need to know which it is!</p>
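<p>A tiny illustration of the gotcha, using a made-up <code class="docutils literal">describe</code> function. With no static checker in the loop and nothing inspecting the hint at runtime, the annotation is effectively just documentation:</p>

```python
def describe(count: int) -> str:
    return f"count is {count!r}"


# Nothing enforces the hint here, so this call succeeds at runtime even
# though "3" is a str. A static type checker would flag this line.
result = describe("3")

# The hint is still recorded and available for tools that choose to read it.
recorded_hint = describe.__annotations__["count"]
```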
</section>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/2beggz/different_uses_python_type_hints">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>Python’s “Disappointing” Superpowershttps://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/2023-02-01T13:44:15Z2023-02-01T13:44:15ZLuke Plant<p>A response to Hillel Wayne’s “I am disappointed by dynamic typing”</p><p>In Hillel Wayne’s post <a class="reference external" href="https://buttondown.email/hillelwayne/archive/i-am-disappointed-by-dynamic-typing/">“I am disappointed by dynamic typing”</a>, he expresses his sense that the Python ecosystem doesn’t really make the most of the possibilities that Python provides as a dynamically typed language. This is an important subject, since every Python program pays a very substantial set of costs for Python’s highly dynamic nature, such as poor run-time performance, and maintainability issues. Are we getting anything out of this tradeoff?</p>
<p>I think Hillel makes some fair points, and this post is intended as a response rather than a rebuttal. Recently there has been a significant influence of static type systems which I think might be harmful. The static type system we have in the form of mypy/pyright (which is partly codified in <a class="reference external" href="https://peps.python.org/pep-0484/">PEP 484</a> and following) seems to be much too heavily inspired by what is possible to map to other languages, rather than the features that Python provides.</p>
<p>(As a simple example to support that claim, consider the fact that Python has had support for keyword arguments for as long as I can remember, and for keyword-only arguments since Python 3.0. But <code class="docutils literal">typing.Callable</code> has zero support for them, meaning they can’t be typed in a higher-order context. This is bad, since they are a key part of Python’s excellent reputation for readability, and <a class="reference external" href="https://lukeplant.me.uk/blog/posts/keyword-only-arguments-in-python/">we want more keyword-only arguments, not fewer</a>.
[<strong>EDIT:</strong> it looks like there is <a class="reference external" href="https://mypy.readthedocs.io/en/stable/protocols.html#callback-protocols">another way to do it</a>, it’s just about 10 times more work, so the point kind of stands.]
I can give more examples, but that will have to wait for another blog post).</p>
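<p>To make that aside concrete, here is a minimal sketch of the callback protocol workaround for a keyword-only argument. The names <code class="docutils literal">Formatter</code>, <code class="docutils literal">fixed</code> and <code class="docutils literal">render</code> are invented purely for illustration:</p>
<div class="code"><pre class="code python">from typing import Protocol


class Formatter(Protocol):
    # Describes any callable taking one positional argument
    # plus a keyword-only `precision` argument.
    def __call__(self, value: float, *, precision: int) -> str: ...


def fixed(value: float, *, precision: int) -> str:
    return f"{value:.{precision}f}"


def render(values: list[float], fmt: Formatter) -> list[str]:
    # A plain Callable[[float], str] annotation could not express `precision=`.
    return [fmt(v, precision=2) for v in values]


print(render([1.5, 2.25], fixed))
</pre></div>
<p>A whole class definition is needed just to say “a function with one keyword-only argument”, which is the extra work complained about above.</p>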
<p>I’m worried that a de-facto move away from dynamic stuff in the Python ecosystem, possibly motivated by those who use Python only because they have to, and just want to make it more like the C# or Java they are comfortable with, could leave us with the very worst of all worlds.</p>
<p>However, I also think there are plenty of counter-examples to Hillel’s claim, and that’s what this post will explore.</p>
<p>Hillel was specifically thinking about, in his own words:</p>
<ul class="simple">
<li><p>“runtime program manipulation”</p></li>
<li><p>“programs that take programs and output other programs”</p></li>
<li><p>“thinking of the whole runtime environment in the same way, where everything is a runtime construct”</p></li>
</ul>
<p>…and he gave some examples that included things like:</p>
<ul class="simple">
<li><p>run-time type modification</p></li>
<li><p>introspection/manipulation of the stack</p></li>
<li><p>passing very differently typed objects through normal code to collect information about it.</p></li>
</ul>
<p>I’m going to give lots of examples of this kind of thing in Python, and they will all be <strong>real world</strong> examples. This means that either I have used them myself to solve real problems, or I’m aware that other people are using them in significant numbers.</p>
<p>Before I get going, there are some things to point out.</p>
<p>First, I don’t have the exact examples Hillel is looking for – but that’s because the kind of problems I’ve needed to solve have not been exactly the same as his. My examples are all necessarily limited in scope: since Python allows unrestricted side-effects in any function, including IO and being able to modify other code, there are obviously limits to how well these techniques can work across large amounts of code.</p>
<p>I do think, however, that my examples are in the same general region, and some of them very close. On both sides we’ve got to avoid semantic hair-splitting – you can argue that every time you use the <code class="docutils literal">class</code> keyword in Python you are doing “run-time type creation”, rather than “compile-time type creation”, because that’s how Python’s classes work. But that’s not what Hillel meant.</p>
<p>Second, many of these more magical techniques involve what is called monkey patching. People are often confused about the difference between monkey patching and “dynamic meta-programming”, so I’ve prepared a handy flow chart for you:</p>
<img alt="Flow chart: Is this code I found a hacky monkey patch, or cool dynamic meta-programming? Question: who wrote it? If “Me” - it’s “Dynamic meta-programming”, if “someone else”, it’s “hacky monkey patch”" class="align-center" src="https://lukeplant.me.uk/blogmedia/monkey_patch_or_dynamic_meta_programming.png">
<p>There are, however, many instances of advanced, dynamic techniques that never get to the point of the chart above, and that’s because you never know about them. What you know is that the code does something useful, and it does so reliably enough that you don’t need to know what techniques contributed to it. And this is, I think, the biggest problem in what Hillel is asking for. The best examples of these techniques will be reliable enough that they don’t draw attention to themselves, and you immediately take them for granted.</p>
<p>Which is also to say that you cannot discount something I mention below just because it is so widely used that you, too, have taken it for granted – that would effectively be saying that the only examples that count are the ones that have proved to be so wild and wacky that everyone has decided they are a bad idea.</p>
<p>Third, you might also discount these examples as “just using features the language provides”, rather than “hyper programming” or something exotic. On the one hand, it would be true, but also unfair in the context of this debate. The most obvious example is <code class="docutils literal">eval</code>. This is clearly a very powerful technique not available to many statically typed languages, and exactly the kind that Hillel is looking for – you are literally creating more of your program as your program is running. On the other hand, it’s nothing more than a builtin function.</p>
<p>Finally, a number of these examples don’t involve “production” code i.e. the code is typically run only on developer machines or in CI. These still count, however – just like many of Hillel’s examples are in the area of testing. The reasons they still count are 1) developers are humans too, and solving their problems is still important and 2) the techniques used by developers on their own machines are useful in creating high quality code for running on other people’s machines, where we don’t want to incur the performance or robustness penalties of the techniques used.</p>
<p>So, here are my examples. The majority are not my own code, but I’ve also taken the opportunity to do some fairly obvious bragging about cool things I’ve done in Python.</p>
<nav class="contents" id="examples" role="doc-toc">
<p class="topic-title">Examples</p>
<ul class="simple">
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#gooey" id="toc-entry-1">Gooey</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#werkzeugs-interactive-debugger" id="toc-entry-2">Werkzeug’s interactive debugger</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#hybrid-attributes-in-sqlalchemy" id="toc-entry-3">Hybrid attributes in SQLAlchemy</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#pony-orm" id="toc-entry-4">Pony ORM</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#django" id="toc-entry-5">Django</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#foreignkey" id="toc-entry-6">ForeignKey</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#relatedmanager" id="toc-entry-7">RelatedManager</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#manytomany-models" id="toc-entry-8">ManyToMany models</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#consequences" id="toc-entry-9">Consequences</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#baserow" id="toc-entry-10">Baserow</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#cciw-data-retention-policy" id="toc-entry-11">CCiW data retention policy</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#query-tracing" id="toc-entry-12">Query tracing</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#time-machine-and-pyfakefs" id="toc-entry-13">time-machine and pyfakefs</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#environment-detection" id="toc-entry-14">Environment detection</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#fluent-compiler" id="toc-entry-15">fluent-compiler</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#compile-to-python" id="toc-entry-16">Compile-to-Python</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#dynamic-test-methods" id="toc-entry-17">Dynamic test methods</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#morph-into" id="toc-entry-18"><code class="docutils literal">morph_into</code></a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#pytest" id="toc-entry-19">Pytest</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#assert-rewriting" id="toc-entry-20">Assert rewriting</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#automatic-dependency-injection-of-fixtures" id="toc-entry-21">Automatic dependency injection of fixtures</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#others" id="toc-entry-22">Others</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#conclusion" id="toc-entry-23">Conclusion</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#links" id="toc-entry-24">Links</a></p></li>
</ul>
</nav>
<section id="gooey">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-1" role="doc-backlink">Gooey</a></h2>
<p><a class="reference external" href="https://github.com/chriskiehl/Gooey">Gooey</a> is a library that will re-interpret <a class="reference external" href="https://docs.python.org/3/library/argparse.html">argparse</a> entry points as if they were specifying a GUI. In other words, you do “import gooey”, add a decorator and it transforms your CLI program into a GUI program. Apparently it does this by <a class="reference external" href="https://github.com/chriskiehl/Gooey#how-does-it-work">re-parsing your entry point module</a>, for reasons I don’t know and don’t need to know. I do know that it works for programs I’ve tried it with, when I wanted to make something that I was using as a CLI, but also needed to be usable by other family members. A pretty cool tool that solves real problems.</p>
</section>
<section id="werkzeugs-interactive-debugger">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-2" role="doc-backlink">Werkzeug’s interactive debugger</a></h2>
<p>Werkzeug provide a <a class="reference external" href="https://werkzeug.palletsprojects.com/en/2.2.x/debug/">debugger middleware</a> which works with any WSGI-compliant Python web framework (which is most of them) with the following extremely useful behaviour:</p>
<ul class="simple">
<li><p>Crashing errors are automatically intercepted and an error page is shown with a stack trace instead of a generic 500 error.</p></li>
<li><p>For any and every frame of the stack trace, you can, right from your web browser, start a Python REPL at that frame – i.e. you can effectively continue execution of the crashed program at any point in the stack, or from multiple points simultaneously.</p></li>
</ul>
<p>This is extremely useful, to say the least.</p>
<img alt="Screenshot of Werkzeug debugger in action" class="align-center" src="https://lukeplant.me.uk/blogmedia/werkzeug_debugger_example.png">
<p>(For Django users – you can use this most easily using <a class="reference external" href="https://django-extensions.readthedocs.io/en/latest/">django-extensions</a>)</p>
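<p>The core idea can be sketched with the standard library alone. This is not Werkzeug’s implementation, just a minimal illustration of evaluating code inside the frames of a crashed program; <code class="docutils literal">frames_of</code>, <code class="docutils literal">eval_in_frame</code> and the toy <code class="docutils literal">handler</code>/<code class="docutils literal">divide</code> functions are hypothetical names:</p>
<div class="code"><pre class="code python">def frames_of(exc):
    """Return the frame objects from an exception's traceback, outermost first."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frames.append(tb.tb_frame)
        tb = tb.tb_next
    return frames


def eval_in_frame(frame, source):
    """Evaluate an expression using the globals and locals of a saved frame."""
    return eval(source, frame.f_globals, frame.f_locals)


def divide(numerator, denominator):
    return numerator / denominator


def handler():
    order_total = 42
    return divide(order_total, 0)


try:
    handler()
except ZeroDivisionError as e:
    crashed = e  # keep a reference; `e` is cleared when the block ends

# Index the saved frames by the name of the function they belong to.
by_name = {frame.f_code.co_name: frame for frame in frames_of(crashed)}

# Local variables of each crashed frame are still there to be inspected.
print(eval_in_frame(by_name["handler"], "order_total + 1"))
print(eval_in_frame(by_name["divide"], "numerator / 2"))
</pre></div>
<p>A debugger like Werkzeug’s wraps this kind of thing in a web UI, so that each frame in the stack trace gets its own prompt.</p>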
</section>
<section id="hybrid-attributes-in-sqlalchemy">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-3" role="doc-backlink">Hybrid attributes in SQLAlchemy</a></h2>
<p>I’m sure there are <strong>many</strong> examples of advanced dynamic techniques in SQLAlchemy, and I’m not the best qualified to talk about them, but here is a cool one I came across that helps explain the kind of thing you can do in Python.</p>
<p>Suppose you have an ORM object with some attributes that come straight from the database, along with some calculated properties. In the example below we’ve got a model representing an account that might have payments against it:</p>
<div class="code"><pre class="code python"><a id="rest_code_417660358bd94b8f8291526aadc85be5-1" name="rest_code_417660358bd94b8f8291526aadc85be5-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-1"></a><span class="k">class</span> <span class="nc">Account</span><span class="p">(</span><span class="n">Base</span><span class="p">):</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-2" name="rest_code_417660358bd94b8f8291526aadc85be5-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-2"></a>    <span class="c1"># DB columns:</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-3" name="rest_code_417660358bd94b8f8291526aadc85be5-3" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-3"></a>    <span class="n">amount_paid</span><span class="p">:</span> <span class="n">Mapped</span><span class="p">[</span><span class="n">Decimal</span><span class="p">]</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-4" name="rest_code_417660358bd94b8f8291526aadc85be5-4" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-4"></a>    <span class="n">total_purchased</span><span class="p">:</span> <span class="n">Mapped</span><span class="p">[</span><span class="n">Decimal</span><span class="p">]</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-5" name="rest_code_417660358bd94b8f8291526aadc85be5-5" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-5"></a>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-6" name="rest_code_417660358bd94b8f8291526aadc85be5-6" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-6"></a>    <span class="c1"># Calculated properties:</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-7" name="rest_code_417660358bd94b8f8291526aadc85be5-7" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-7"></a>    <span class="nd">@property</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-8" name="rest_code_417660358bd94b8f8291526aadc85be5-8" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-8"></a>    <span class="k">def</span> <span class="nf">balance_due</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">Decimal</span><span class="p">:</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-9" name="rest_code_417660358bd94b8f8291526aadc85be5-9" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-9"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">total_purchased</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">amount_paid</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-10" name="rest_code_417660358bd94b8f8291526aadc85be5-10" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-10"></a>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-11" name="rest_code_417660358bd94b8f8291526aadc85be5-11" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-11"></a>    <span class="nd">@property</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-12" name="rest_code_417660358bd94b8f8291526aadc85be5-12" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-12"></a>    <span class="k">def</span> <span class="nf">has_payment_outstanding</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
<a id="rest_code_417660358bd94b8f8291526aadc85be5-13" name="rest_code_417660358bd94b8f8291526aadc85be5-13" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_417660358bd94b8f8291526aadc85be5-13"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">balance_due</span> <span class="o">></span> <span class="mi">0</span>
</pre></div>
<p>Very often you find yourself in a situation like this:</p>
<ul class="simple">
<li><p>Sometimes you have already loaded an object from the DB, and want to know a calculated value like “does this account have an outstanding payment?”. This shouldn’t execute any more database queries, since you’ve already loaded everything you need to answer that question.</p></li>
<li><p>But sometimes, you want to re-use this logic to do something like “get me all the accounts that have outstanding payments”, and it is vital for efficiency that we do the filtering in the database as a SQL <code class="docutils literal">WHERE</code> clause, rather than loading all the records into a Python process and filtering there.</p></li>
</ul>
<p>How could we do this in SQLAlchemy <strong>without duplicating the logic</strong> regarding <code class="docutils literal">balance_due</code> and <code class="docutils literal">has_payment_outstanding</code>?</p>
<p>The answer is <a class="reference external" href="https://docs.sqlalchemy.org/en/20/orm/extensions/hybrid.html">hybrid attributes</a>:</p>
<ul class="simple">
<li><p><code class="docutils literal">from sqlalchemy.ext.hybrid import hybrid_property</code></p></li>
<li><p>replace <code class="docutils literal">property</code> with <code class="docutils literal">hybrid_property</code> on the two properties.</p></li>
</ul>
<p><strong>That is all</strong>. Then you can do:</p>
<div class="code"><pre class="code python"><a id="rest_code_f7082a943d034cafbfca71e8c6489996-1" name="rest_code_f7082a943d034cafbfca71e8c6489996-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_f7082a943d034cafbfca71e8c6489996-1"></a><span class="n">select</span><span class="p">(</span><span class="n">Account</span><span class="p">)</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">Account</span><span class="o">.</span><span class="n">has_payment_outstanding</span> <span class="o">==</span> <span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>This will generate a SQL query that looks like this:</p>
<div class="code"><pre class="code SQL"><a id="rest_code_6e503af4b3354dd3ad9d4394563f28a0-1" name="rest_code_6e503af4b3354dd3ad9d4394563f28a0-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_6e503af4b3354dd3ad9d4394563f28a0-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="n">account</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">account</span><span class="p">.</span><span class="n">amount_paid</span><span class="p">,</span><span class="w"> </span><span class="n">account</span><span class="p">.</span><span class="n">total_purchased</span><span class="w"></span>
<a id="rest_code_6e503af4b3354dd3ad9d4394563f28a0-2" name="rest_code_6e503af4b3354dd3ad9d4394563f28a0-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_6e503af4b3354dd3ad9d4394563f28a0-2"></a><span class="k">FROM</span><span class="w"> </span><span class="n">account</span><span class="w"></span>
<a id="rest_code_6e503af4b3354dd3ad9d4394563f28a0-3" name="rest_code_6e503af4b3354dd3ad9d4394563f28a0-3" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_6e503af4b3354dd3ad9d4394563f28a0-3"></a><span class="k">WHERE</span><span class="w"> </span><span class="p">(</span><span class="n">account</span><span class="p">.</span><span class="n">total_purchased</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">account</span><span class="p">.</span><span class="n">amount_paid</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
</pre></div>
<p>What’s going on here? If you have a normal model instance <code class="docutils literal">an_account</code>, retrieved from a database query, and you do <code class="docutils literal">an_account.has_payment_outstanding</code>, then in the <code class="docutils literal">has_payment_outstanding</code> function body above, everything is normal: <code class="docutils literal">self</code> is bound to <code class="docutils literal">an_account</code>, attributes like <code class="docutils literal">total_purchased</code> will be <code class="docutils literal">Decimal</code> objects that have been loaded from the database.</p>
<p>However, if you use <code class="docutils literal">Account.has_payment_outstanding</code>, the <code class="docutils literal">self</code> variable gets bound to a different type of object (the <code class="docutils literal">Account</code> class or some proxy), and so things like <code class="docutils literal">self.total_purchased</code> instead resolve to objects representing columns/fields. These classes have appropriate “dunder” methods defined (<code class="docutils literal">__add__</code>, <code class="docutils literal">__gt__</code> etc.), so that operations done on them, such as maths and comparisons, return new expression objects that track what operations were done, instead of returning values immediately. These can then be compiled to SQL later on, so we can execute the filtering as a <code class="docutils literal">WHERE</code> clause in the DB.</p>
<p>The point here is: we are passing both “normal” and “instrumented” types through the same code in order to completely change our execution strategy. This allows us to effectively compile our Python code into SQL on the fly. This is essentially identical to Hillel’s example of passing instrumented objects (“Replacer” class) through normal code to extract certain information about what operations were done.</p>
<p>This is a very neat feature in SQLAlchemy that I’m rather jealous of as a Django user. If you want the same efficiency in Django, you have to define the instance properties and the database filtering separately, and usually physically not next to each other in the code. The closest we have is <a class="reference external" href="https://docs.djangoproject.com/en/4.1/ref/models/expressions/#query-expressions">Query expressions</a> but they don’t work quite the same.</p>
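<p>The mechanism behind hybrid attributes can be imitated in a few lines of plain Python. This is a toy sketch, not SQLAlchemy’s code: <code class="docutils literal">Expr</code>, <code class="docutils literal">ColumnProxy</code> and <code class="docutils literal">to_sql</code> are invented names, and the “SQL” is just a string:</p>
<div class="code"><pre class="code python">from decimal import Decimal
from types import SimpleNamespace


class Expr:
    """A node that records operations instead of computing a value."""

    def __init__(self, sql):
        self.sql = sql

    def __sub__(self, other):
        return Expr(f"({self.sql} - {to_sql(other)})")

    def __gt__(self, other):
        return Expr(f"({self.sql} > {to_sql(other)})")


def to_sql(value):
    # Expr nodes carry their own SQL; plain Python values are rendered literally.
    return value.sql if isinstance(value, Expr) else repr(value)


class ColumnProxy:
    """Stands in for a loaded row: every attribute is a column expression."""

    def __init__(self, table):
        self._table = table

    def __getattr__(self, name):
        return Expr(f"{self._table}.{name}")


# The business logic is written once...
def balance_due(account):
    return account.total_purchased - account.amount_paid


def has_payment_outstanding(account):
    return balance_due(account) > 0


# ...and run against real values, giving a real answer:
loaded = SimpleNamespace(total_purchased=Decimal("30"), amount_paid=Decimal("10"))
print(has_payment_outstanding(loaded))

# ...or against the proxy, giving a description of the computation:
print(has_payment_outstanding(ColumnProxy("account")).sql)
</pre></div>
<p>The two functions never find out which kind of object they were handed, and that is the whole trick.</p>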
</section>
<section id="pony-orm">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-4" role="doc-backlink">Pony ORM</a></h2>
<p>This ORM has a way of writing SQL select queries that appears even more magical. Using an example from <a class="reference external" href="https://ponyorm.org/">their home page</a>, you write code like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_56c1ef670f0d4a29b02ebf928483a2ea-1" name="rest_code_56c1ef670f0d4a29b02ebf928483a2ea-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_56c1ef670f0d4a29b02ebf928483a2ea-1"></a><span class="n">select</span><span class="p">(</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">Customer</span> <span class="k">if</span> <span class="nb">sum</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">orders</span><span class="o">.</span><span class="n">price</span><span class="p">)</span> <span class="o">></span> <span class="mi">1000</span><span class="p">)</span>
</pre></div>
<p>The result of this is a SQL query that looks like this:</p>
<div class="code"><pre class="code SQL"><a id="rest_code_45902b2eed014a7da13893377fdb27a5-1" name="rest_code_45902b2eed014a7da13893377fdb27a5-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-1"></a><span class="k">SELECT</span><span class="w"> </span><span class="ss">"c"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"></span>
<a id="rest_code_45902b2eed014a7da13893377fdb27a5-2" name="rest_code_45902b2eed014a7da13893377fdb27a5-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-2"></a><span class="k">FROM</span><span class="w"> </span><span class="ss">"customer"</span><span class="w"> </span><span class="ss">"c"</span><span class="w"></span>
<a id="rest_code_45902b2eed014a7da13893377fdb27a5-3" name="rest_code_45902b2eed014a7da13893377fdb27a5-3" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-3"></a><span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="ss">"order"</span><span class="w"> </span><span class="ss">"order-1"</span><span class="w"></span>
<a id="rest_code_45902b2eed014a7da13893377fdb27a5-4" name="rest_code_45902b2eed014a7da13893377fdb27a5-4" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-4"></a><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="ss">"c"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">"order-1"</span><span class="p">.</span><span class="ss">"customer"</span><span class="w"></span>
<a id="rest_code_45902b2eed014a7da13893377fdb27a5-5" name="rest_code_45902b2eed014a7da13893377fdb27a5-5" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-5"></a><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="ss">"c"</span><span class="p">.</span><span class="ss">"id"</span><span class="w"></span>
<a id="rest_code_45902b2eed014a7da13893377fdb27a5-6" name="rest_code_45902b2eed014a7da13893377fdb27a5-6" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_45902b2eed014a7da13893377fdb27a5-6"></a><span class="k">HAVING</span><span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="k">SUM</span><span class="p">(</span><span class="ss">"order-1"</span><span class="p">.</span><span class="ss">"total_price"</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1000</span><span class="w"></span>
</pre></div>
<p>A normal understanding of generator expressions suggests that the <code class="docutils literal">select</code> function is consuming a generator. But that couldn’t explain the behaviour here. Instead, it actually <a class="reference external" href="https://github.com/ponyorm/pony/blob/27593ffc74184bc334dd301a86fc5f40fdd3ad87/pony/orm/core.py#L5542">introspects the frame object of the calling code</a>, then <a class="reference external" href="https://github.com/ponyorm/pony/blob/27593ffc74184bc334dd301a86fc5f40fdd3ad87/pony/orm/decompiling.py#L22">decompiles the byte code of the generator expression object it finds</a>, and builds a <a class="reference external" href="https://github.com/ponyorm/pony/blob/27593ffc74184bc334dd301a86fc5f40fdd3ad87/pony/orm/core.py#L5669">Query</a> based on the <a class="reference external" href="https://docs.python.org/3/library/ast.html">AST</a> objects.</p>
<p>PonyORM doesn’t advertise all that, of course. It advertises a “beautiful” syntax for writing ORM code, because that’s what matters.</p>
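<p>The first step of that trick, looking inside a generator expression rather than running it, needs nothing beyond what the interpreter already exposes. The sketch below is not Pony’s code, and <code class="docutils literal">inspect_query</code> is a made-up helper that only lists what the compiled code refers to, where Pony goes on to reconstruct a full AST:</p>
<div class="code"><pre class="code python">def inspect_query(genexpr):
    """Look inside a generator expression without iterating over it."""
    code = genexpr.gi_code  # the compiled body of the generator expression
    return {
        # Attribute names the expression touches, e.g. `c.total`
        "attributes": sorted(code.co_names),
        # Literal values it refers to (None can appear as an implicit constant)
        "constants": [value for value in code.co_consts if value is not None],
    }


# Plain objects have no `name` or `total` attribute, so actually running
# the generator would raise AttributeError. We never run it.
customers = [object(), object()]

query = (c.name for c in customers if c.total > 1000)
info = inspect_query(query)
print(info["attributes"])
print(info["constants"])
</pre></div>
<p>Everything needed to work out what the query means is available as data before a single item has been produced.</p>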
</section>
<section id="django">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-5" role="doc-backlink">Django</a></h2>
<p>This is the web framework I know well, as I used to contribute significantly, and I’ll pick just a few important examples from the ORM, and then from the broader ecosystem.</p>
<section id="foreignkey">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-6" role="doc-backlink">ForeignKey</a></h3>
<p>Suppose, to pick one example of many, you are writing <a class="reference external" href="https://github.com/django-otp/django-otp/">django-otp</a>, a third party library that provides a <a class="reference external" href="https://en.wikipedia.org/wiki/One-time_password">One Time Password</a> implementation for <a class="reference external" href="https://en.wikipedia.org/wiki/Multi-factor_authentication">2FA requirements</a>. You want to create a table of TOTP devices that are linked to user accounts, and so you have something like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_329c9295a4804c208d653734ffd2ca36-1" name="rest_code_329c9295a4804c208d653734ffd2ca36-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_329c9295a4804c208d653734ffd2ca36-1"></a><span class="k">class</span> <span class="nc">TOTPDevice</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
<a id="rest_code_329c9295a4804c208d653734ffd2ca36-2" name="rest_code_329c9295a4804c208d653734ffd2ca36-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_329c9295a4804c208d653734ffd2ca36-2"></a>    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="s1">'auth.User'</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">,</span> <span class="n">related_name</span><span class="o">=</span><span class="s1">'totp_devices'</span><span class="p">)</span>
</pre></div>
<p>Later on, you have code that starts with a <code class="docutils literal">User</code> object and retrieves their TOTP devices, and it looks something like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_2cead7e41fb54c83bfbe5bd6cf57152b-1" name="rest_code_2cead7e41fb54c83bfbe5bd6cf57152b-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_2cead7e41fb54c83bfbe5bd6cf57152b-1"></a><span class="n">devices</span> <span class="o">=</span> <span class="n">user</span><span class="o">.</span><span class="n">totp_devices</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
</pre></div>
<p>This is interesting, because my <code class="docutils literal">user</code> variable is an instance of a <code class="docutils literal">User</code> model that was provided by core Django, which has no knowledge of the third party project that provides the <code class="docutils literal">TOTPDevice</code> model.</p>
<p>In fact it goes further: I may not be using Django’s <code class="docutils literal">User</code> at all, but my own custom <code class="docutils literal">User</code> class, and the <code class="docutils literal">TOTPDevice</code> model can easily support that too just by doing this:</p>
<div class="code"><pre class="code python"><a id="rest_code_305672d98d074859b838f6b37782fd15-1" name="rest_code_305672d98d074859b838f6b37782fd15-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_305672d98d074859b838f6b37782fd15-1"></a><span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="n">settings</span><span class="p">,</span> <span class="s2">"AUTH_USER_MODEL"</span><span class="p">,</span> <span class="s2">"auth.User"</span><span class="p">),</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
</pre></div>
<p>This means that my <code class="docutils literal">User</code> model has no knowledge of the <code class="docutils literal">TOTPDevice</code> class, nor the other way around, yet instances of these classes both get wired up to refer to each other.</p>
<p>What is actually going on to enable this?</p>
<p>When you import Django and call <code class="docutils literal">setup()</code>, it imports all the apps in your
project. When it comes to the <code class="docutils literal">TOTPDevice</code> class, it sees the <code class="docutils literal">"auth.User"</code>
reference and finds the class it refers to. It then <strong>modifies that
class</strong>, adding a <code class="docutils literal">totp_devices</code> <a class="reference external" href="https://docs.python.org/3/glossary.html#term-descriptor">descriptor</a> object to the class attributes.</p>
<p>This is <strong>run-time type modification</strong>.</p>
<p>The result is that when you do <code class="docutils literal">user.totp_devices</code>, you get a <code class="docutils literal">Manager</code> instance that does queries against the <code class="docutils literal">TOTPDevice</code> table. It is a specific kind of manager, known as a <code class="docutils literal">RelatedManager</code>, with the special property that it automatically does the correct <code class="docutils literal">filter()</code> calls to limit returned values to those related to the model instance, among other things.</p>
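<p>To make the mechanism concrete, here is a minimal sketch in plain Python, with no Django involved. It is not how Django implements it; <code class="docutils literal">ReverseRelation</code>, <code class="docutils literal">Device</code> and the in-memory <code class="docutils literal">registry</code> are names I have made up for illustration, with a list standing in for the database table.</p>

```python
class ReverseRelation:
    """Descriptor that looks up the objects pointing back at an instance."""

    def __init__(self, related_class, field_name):
        self.related_class = related_class
        self.field_name = field_name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself, e.g. User.devices
            return self
        # Stand-in for a database query: filter an in-memory registry.
        return [
            obj
            for obj in self.related_class.registry
            if getattr(obj, self.field_name) is instance
        ]


class User:
    """Plays the role of a model that knows nothing about Device."""

    def __init__(self, name):
        self.name = name


class Device:
    """Plays the role of a third party model with a 'foreign key' to User."""

    registry = []

    def __init__(self, user):
        self.user = user
        Device.registry.append(self)


# The "framework" step: modify the User class after it has been defined.
setattr(User, "devices", ReverseRelation(Device, "user"))

alice = User("alice")
bob = User("bob")
phone = Device(alice)

print(alice.devices == [phone])  # True
print(bob.devices)  # []
```

<p>Neither class mentions the other in its own definition, yet <code class="docutils literal">alice.devices</code> works, because the class object was modified at run time.</p>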
</section>
<section id="relatedmanager">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-7" role="doc-backlink">RelatedManager</a></h3>
<p>The <code class="docutils literal">RelatedManager</code> class is interesting in a number of ways. First, it is <a class="reference external" href="https://github.com/django/django/blob/d54717118360e8679aa2bd0c5a1625f3e84712ba/django/db/models/fields/related_descriptors.py#L632">created as a closure</a> – meaning the class itself is created inside a function that is called for each relationship. This is <strong>run-time type creation</strong>.</p>
<p>Second, there are some additional things it needs to support. In Django, projects often override <code class="docutils literal">Manager</code> classes, and the related <code class="docutils literal">QuerySet</code> classes, to provide a lot of model layer functionality. This is Django’s answer to the “Repository” pattern – popular in some languages, but laborious to create and painful to use <a class="reference external" href="https://lukeplant.me.uk/blog/posts/evolution-of-a-django-repository-pattern/">compared to what we have</a>.</p>
<p>Because of this, it’s important that the <code class="docutils literal">RelatedManager</code> preserves any custom <code class="docutils literal">Manager</code> and <code class="docutils literal">QuerySet</code> behaviour defined on the target model. So, the solution is simply that Django makes the created <code class="docutils literal">RelatedManager</code> class inherit from your custom <code class="docutils literal">Manager</code>. This is <strong>dynamic sub-classing</strong> – sub-classing of a class that is discovered at run-time. In other OOP languages, you inherit from framework classes. In Python, framework inherits you!</p>
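<p>A rough sketch of the closure-plus-dynamic-subclassing idea, again in plain Python rather than Django’s real code. The <code class="docutils literal">Manager</code>, <code class="docutils literal">DeviceManager</code> and <code class="docutils literal">create_related_manager</code> names below are my own stand-ins, and rows are just dictionaries:</p>

```python
class Manager:
    """Stand-in for a framework-provided base manager."""

    def __init__(self, rows):
        self.rows = rows

    def all(self):
        return list(self.rows)


class DeviceManager(Manager):
    """A project's custom manager with extra behaviour."""

    def confirmed(self):
        return [row for row in self.all() if row["confirmed"]]


def create_related_manager(superclass, field_name):
    """Build a manager class for one relationship, at run time."""

    # The base class is whatever was discovered and passed in.
    class RelatedManager(superclass):
        def __init__(self, rows, instance):
            super().__init__(rows)
            self.instance = instance

        def all(self):
            # Limit results to rows that point at our instance.
            return [r for r in super().all() if r[field_name] == self.instance]

    return RelatedManager


rows = [
    {"user": "alice", "confirmed": True},
    {"user": "alice", "confirmed": False},
    {"user": "bob", "confirmed": True},
]

RelatedDeviceManager = create_related_manager(DeviceManager, "user")
manager = RelatedDeviceManager(rows, "alice")

print(len(manager.all()))  # 2
print(len(manager.confirmed()))  # 1
```

<p>The custom <code class="docutils literal">confirmed()</code> method is inherited unchanged, but because it goes through <code class="docutils literal">all()</code>, it automatically picks up the relationship filtering.</p>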
</section>
<section id="manytomany-models">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-8" role="doc-backlink">ManyToMany models</a></h3>
<p>One common need in database applications is to have <a class="reference external" href="https://en.wikipedia.org/wiki/Many-to-many_(data_model)">many-to-many relationships</a> between two models. Typically this can be modelled with a separate table that has foreign keys to the two related tables.</p>
<p>To make this easy, Django provides a <code class="docutils literal">ManyToManyField</code>. For simple cases, it’s
tedious to have to create a model for the intermediate table yourself, so of
course Django just <a class="reference external" href="https://github.com/django/django/blob/d54717118360e8679aa2bd0c5a1625f3e84712ba/django/db/models/fields/related.py#L1279">creates it for you</a>
if you don’t provide your own, using <a class="reference external" href="https://docs.python.org/3/library/functions.html#type">type() with 3 arguments</a>. This is again <strong>run-time type creation</strong>.</p>
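<p>For anyone who hasn’t met the three-argument form of <code class="docutils literal">type()</code>, here is a small sketch of the idea. It is not Django’s code; <code class="docutils literal">make_through_class</code> and the attribute names are hypothetical, and the result is an ordinary class rather than a Django model.</p>

```python
def make_through_class(from_name, to_name):
    """Create a class for the intermediate 'table' linking two models."""
    name = f"{from_name}_{to_name}"
    attrs = {
        "from_name": from_name,
        "to_name": to_name,
        "describe": lambda self: f"link from {self.from_name} to {self.to_name}",
    }
    # type(name, bases, namespace) builds a new class, exactly as a
    # `class` statement would.
    return type(name, (object,), attrs)


Through = make_through_class("Article", "Tag")

print(Through.__name__)  # Article_Tag
print(Through().describe())  # link from Article to Tag
```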
</section>
<section id="consequences">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-9" role="doc-backlink">Consequences</a></h3>
<p>These examples of run-time type modification or creation are perhaps not the most extreme or mind-bending. But they are something even better: useful. It’s these features, and things like them, that enable an ecosystem of third party Django libraries that can integrate with your own code without any problems.</p>
<p>Python also always gives us enough flexibility to have a good backwards compatibility story – so that, for example, the swappable User model was introduced with an absolute minimum of fuss for both projects and pluggable Django apps.</p>
<p>I’m interested in functional programming, Haskell in particular – this blog even ran on Haskell for a time – so I always take interest in developments in the Haskell web framework world. I see lots of cool things, but it always seems that the ecosystems around Haskell web frameworks are at least 10 years behind Django. One key issue is that in contrast to Django or other Python frameworks, Haskell web frameworks almost always have some kind of code generation layer. This can be made to work well for the purposes envisaged by the framework authors, but it never seems to enable the ecosystem of external packages that highly dynamic typing supports.</p>
<p>Please note that I’m not claiming here that Python is better than Haskell or anything so grand. I’m simply claiming that Python does enable very useful things to be built, and those things are made possible and easy <strong>because of Python’s design, rather than despite it</strong>.</p>
<p>I think this is important to say. Python has become massively more popular than it was when I first started to use it, and there are increasing numbers of people who use it only because of network effects, and don’t understand why it got so popular in the first place. These people can sometimes assume that it’s fundamentally a poorly designed language that we are just lumped with – today’s PHP – whose best trajectory is to make it more like Java or C# or Rust etc. I think that would be a big mistake.</p>
</section>
<section id="baserow">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-10" role="doc-backlink">Baserow</a></h3>
<p>One project that takes Python’s run-time type creation further is <a class="reference external" href="https://baserow.io/">Baserow</a>. In their case, their customers create database applications, and the metadata for those tables is stored in … tables. They like Django and want to use it as much as possible. But they also want their customers’ actual data tables to be normal RDBMS tables, and therefore benefit from all the typical RDBMS features to make their tables fast and compact etc. (I’ve seen and worked on systems that took the opposite approach, where customer schema was relegated to second class storage – essentially a key-value table – and the performance was predictably awful).</p>
<p>And they want plug-in authors to be able to use Django too! Some people are just greedy! They <a class="reference external" href="https://baserow.io/blog/how-baserow-lets-users-generate-django-models">have a nice article describing how they achieved all this</a>: in short, they use <code class="docutils literal">type()</code> for run-time type creation and then leverage everything Django gives them.</p>
<p>This has the interesting effect that the metadata tables, along with their own business tables, live <strong>at the same level</strong> as their customers’ tables which are described by those metadata tables. This “meta and non-meta living at the same level” is a neat illustration of what Python’s type system gives you:</p>
<p>When you first discover the mind-bending relationships around <code class="docutils literal">type(type) == type</code>, you might think of an infinitely-recursive relationship. But actually, an infinite relationship has been flattened to being just 3 layers deep – instance, class, metaclass. The last layer just recurses onto itself. The infinity has been tamed, and brought into the same structures that you can already deal with, and without changing language or switching to code generation techniques. This is one reason why many examples of the “hyper-programming” that Hillel talks about can just be dismissed as normal programming – but they are simply hyper-programming that you are now taking for granted.</p>
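<p>The three layers can be seen directly in a few lines. The <code class="docutils literal">Meta</code> and <code class="docutils literal">Model</code> classes here are placeholders of my own, not anything from Django or Baserow:</p>

```python
class Meta(type):
    """A metaclass: a class whose instances are classes."""


class Model(metaclass=Meta):
    """An ordinary class, created by Meta."""


instance = Model()

print(type(instance) is Model)  # True
print(type(Model) is Meta)  # True
print(type(Meta) is type)  # True
print(type(type) is type)  # True

# Classes and metaclasses are still ordinary objects, so they can be
# stored and passed around with the same tools as any other value.
registry = {cls.__name__: cls for cls in (Model, Meta, type)}
print(sorted(registry))  # ['Meta', 'Model', 'type']
```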
</section>
<section id="cciw-data-retention-policy">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-11" role="doc-backlink">CCiW data retention policy</a></h3>
<p><a class="reference external" href="https://www.cciw.co.uk/">CCiW</a> is a small charity I’ve been involved with for a long time. When I came to implement its <a class="reference external" href="https://gdpr-info.eu/">GDPR</a> and data retention policies, I found another example of how useful it is having access to Django’s meta-layer (generic framework code) on the same level as my normal data layer (business specific classes and tables), in ways that often aren’t the case for statically typed languages that resort to code-generation techniques for some of these things.</p>
<p>I wanted to have a data retention policy that was both human readable and machine readable, so that:</p>
<ul class="simple">
<li><p>we don’t have to keep two separate documents in sync.</p></li>
<li><p>the CCiW committee and other interested parties would be able to read the policy that actually gets applied, rather than merely what the policy was supposed to be.</p></li>
<li><p>I could have machine level checking of the exhaustiveness of the policy.</p></li>
</ul>
<p>My solution was to split the data retention policy into two parts:</p>
<ul class="simple">
<li><p>a heavily commented, human-and-machine readable <a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/master/config/data_retention.yaml">Literate YAML file</a> with a <a class="reference external" href="https://www.cciw.co.uk/data-retention-policy/">nicely formatted version</a>, that I can genuinely claim <strong>is</strong> our data retention policy, and that it is automatically applied,</p></li>
<li><p>and a <a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/master/cciw/data_retention/applying.py">Python implementation</a> that reads this file and applies it, along with some additional logic.</p></li>
</ul>
<p>A key part of the neatness of this solution is that the generic, higher level code (which reads in a YAML file, and therefore has to treat field names and table names as strings), and the business/domain specific logic can sit right next to each other. The end result is something that’s both efficient and elegant, with great separation of concerns, and virtually self-maintaining – it complains at me automatically if I fail to update it when adding new fields or tables.</p>
<p>In terms of performance, applying the data retention policy to the entire database currently requires just 5 UPDATE and 3 DELETE queries, run once a day. This is made possible by:</p>
<ul class="simple">
<li><p>using the power of an ORM,</p></li>
<li><p>using generic code to build up <code class="docutils literal">**kwargs</code> to <a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/37e6d69064c9a5d1372809fa2d723a0e203d21c3/cciw/data_retention/applying.py#L86">pass</a> to <a class="reference external" href="https://docs.djangoproject.com/en/4.1/ref/models/querysets/#django.db.models.query.QuerySet.update">QuerySet.update()</a>,</p></li>
<li><p>seamlessly integrating these two with <a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/37e6d69064c9a5d1372809fa2d723a0e203d21c3/cciw/data_retention/applying.py#L229">business specific logic</a>.</p></li>
</ul>
<p><a class="reference external" href="https://gist.github.com/spookylukey/eeafa220b61e479694e2acf44902b6e1">Here is one of the queries the ORM generates</a>, which is complex enough that I wouldn’t attempt to write this by hand, but it correctly applies business logic like not erasing any data of people who still owe us money, and combines all the erasure that needs to be done into a single query.</p>
</section>
</section>
<section id="query-tracing">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-12" role="doc-backlink">Query tracing</a></h2>
<p>A common need in database web applications is development tools that monitor what database queries your code is generating and where they are coming from in the code. In Python this is made very easy thanks to <a class="reference external" href="https://docs.python.org/3/library/sys.html?highlight=_getframe#sys._getframe">sys._getframe</a> which gives you frame objects of the currently running program.</p>
<p>For Django, the go-to tool that uses this is <a class="reference external" href="https://github.com/jazzband/django-debug-toolbar">django-debug-toolbar</a>, which does an excellent job of pinpointing where queries are coming from.</p>
<p>There have been times when it has failed me, however. In particular, when you are working with generic code, such as the Django admin or <a class="reference external" href="https://www.django-rest-framework.org/">Django REST framework</a>, in which the fields and properties that will be fetched may be defined as strings in declarative code, a stack trace alone is not enough to work out what is triggering the queries. For example, you might have an admin class defined like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_ce953786759e4a2cac950c3dc319cf48-1" name="rest_code_ce953786759e4a2cac950c3dc319cf48-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_ce953786759e4a2cac950c3dc319cf48-1"></a><span class="k">class</span> <span class="nc">MyModelAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
<a id="rest_code_ce953786759e4a2cac950c3dc319cf48-2" name="rest_code_ce953786759e4a2cac950c3dc319cf48-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_ce953786759e4a2cac950c3dc319cf48-2"></a>    <span class="n">list_display</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"field1"</span><span class="p">,</span> <span class="s2">"field2"</span><span class="p">,</span> <span class="s2">"field3"</span><span class="p">,</span> <span class="o">...</span><span class="p">]</span>
</pre></div>
<p>And the stack trace points you to <a class="reference external" href="https://github.com/django/django/blob/4470c2405c8dbb529501f9d78753e2aa4e9653a2/django/contrib/admin/templatetags/admin_list.py#L212">this code</a>:</p>
<div class="code"><pre class="code python"><a id="rest_code_e9d42b47cd754134816dd91c35dad923-1" name="rest_code_e9d42b47cd754134816dd91c35dad923-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_e9d42b47cd754134816dd91c35dad923-1"></a><span class="k">for</span> <span class="n">field_index</span><span class="p">,</span> <span class="n">field_name</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">cl</span><span class="o">.</span><span class="n">list_display</span><span class="p">):</span>
<a id="rest_code_e9d42b47cd754134816dd91c35dad923-2" name="rest_code_e9d42b47cd754134816dd91c35dad923-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_e9d42b47cd754134816dd91c35dad923-2"></a>    <span class="n">f</span><span class="p">,</span> <span class="n">attr</span><span class="p">,</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lookup_field</span><span class="p">(</span><span class="n">field_name</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">cl</span><span class="o">.</span><span class="n">model_admin</span><span class="p">)</span>
</pre></div>
<p>It’s correct, but not helpful. I need to know what the value of the local variable <code class="docutils literal">field_name</code> is in that loop to work out what is actually causing these queries.</p>
<p>In addition, in one case I was actually working with DRF endpoints, not the HTML endpoints the debug toolbar is designed for.</p>
<p>So, I wrote <a class="reference external" href="https://gist.github.com/spookylukey/cafeadfbe776ace223e5520bb0a93652#file-db_debug-py-L313">my own utilities</a> that, in addition to extracting the stack, would also include certain local variables for specified functions/methods. I then needed to add some aggregation functionality and pretty-printing for the SQL queries too. Also, I wrote a version of <a class="reference external" href="https://docs.djangoproject.com/en/4.1/topics/testing/tools/#django.test.TransactionTestCase.assertNumQueries">assertNumQueries</a> that used this better reporting.</p>
<p>This was highly effective, and enabled me and members of my team to tackle these DRF endpoints that had got entirely out of hand, often taking them from 10,000+ database queries (!) to 10 or 20.</p>
<p>This is relatively advanced stuff, but not actually all that hard, and it’s within reach of many developers. It doesn’t require learning a whole new language or deep black magic. You can call <code class="docutils literal">sys._getframe</code> interactively from a REPL and find out what it does. The biggest hurdle is actually making the mental leap that says “I need to build this, and with Python, I probably can”.</p>
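<p>As a taste of how little is needed, here is a stripped-down sketch of the core idea: walk the stack and pick out named local variables from named functions. This is not the code from my utilities; <code class="docutils literal">capture_stack</code>, <code class="docutils literal">run_query</code> and <code class="docutils literal">render_row</code> are hypothetical names, with <code class="docutils literal">run_query</code> standing in for wherever you hook into query execution.</p>

```python
import sys


def capture_stack(local_names, skip=1):
    """Return (function name, selected locals) for each calling frame.

    `local_names` maps a function name to the local variable names
    worth recording when that function appears in the stack.
    """
    frame = sys._getframe(skip)
    entries = []
    while frame is not None:
        function_name = frame.f_code.co_name
        wanted = {
            name: frame.f_locals[name]
            for name in local_names.get(function_name, ())
            if name in frame.f_locals
        }
        entries.append((function_name, wanted))
        frame = frame.f_back
    return entries


def run_query(sql):
    # Hypothetical hook: record who asked for this query, and why.
    return capture_stack({"render_row": ["field_name"]})


def render_row(field_names):
    traces = []
    for field_name in field_names:
        traces.append(run_query(f"SELECT {field_name} FROM some_table"))
    return traces


traces = render_row(["field1", "field2"])
print(traces[0][:2])  # [('run_query', {}), ('render_row', {'field_name': 'field1'})]
print(traces[1][:2])  # [('run_query', {}), ('render_row', {'field_name': 'field2'})]
```

<p>The stack trace alone would be identical for both queries. The captured <code class="docutils literal">field_name</code> is what tells them apart.</p>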
</section>
<section id="time-machine-and-pyfakefs">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-13" role="doc-backlink">time-machine and pyfakefs</a></h2>
<p>As an example of “entire program transformation”, <a class="reference external" href="https://github.com/adamchainz/time-machine">time-machine</a> is an extremely useful library that mocks out date/time functions across your entire program, and <a class="reference external" href="https://github.com/pytest-dev/pyfakefs">pyfakefs</a> is one that does the same thing for file-system calls.</p>
<p>These contrast with libraries like <a class="reference external" href="https://docs.python.org/3/library/unittest.mock.html">unittest.mock</a> that do monkey patching on a more limited, module-by-module basis.</p>
<p>This technique is primarily useful in automated test suites, but it has a profound impact on the rest of your code base. In other languages, if you want to mock out “all date/time access” or “all filesystem access”, you may end up with a lot of tedious and noisy code to pass these dependencies through layers of code, or complex automatic dependency injection frameworks to avoid that. In Python, those things are rarely necessary, precisely because of things like time-machine and pyfakefs – that is, because your entire program can be manipulated at run-time. Your code base then has the massive benefit of a direct and simple style.</p>
</section>
<section id="environment-detection">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-14" role="doc-backlink">Environment detection</a></h2>
<p>My current employer is <a class="reference external" href="https://datapane.com/">Datapane</a>, who make tools for data apps. Many of our customers use <a class="reference external" href="https://jupyter.org/">Jupyter</a> or similar environments. To make things work really smoothly, our library code detects the environment it is running in and responds, and in some cases interacts with this environment (courtesy of <a class="reference external" href="https://datacrayon.com/">Shahin</a>, our Jupyter guy). This is an application of Python’s great support for introspection of the running program. There are a bunch of ways you can do this kind of thing:</p>
<ul class="simple">
<li><p>checking the system environment in <code class="docutils literal">os.environ</code></p></li>
<li><p>checking the contents of <code class="docutils literal">sys.modules</code></p></li>
<li><p>using <code class="docutils literal">sys._getframe</code> to examine how you are being called</p></li>
<li><p>attempting to use <a class="reference external" href="https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.getipython.html#IPython.core.getipython.get_ipython">get_ipython</a> and seeing if it works etc.</p></li>
</ul>
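<p>A best-effort sketch combining a few of these checks might look like the following. This is not Datapane’s code: the function name, the returned labels and the order of the checks are all my own assumptions, and checking the <code class="docutils literal">CI</code> variable is just a common convention, not a standard.</p>

```python
import os
import sys


def detect_environment():
    """Make a best-effort guess at where this code is running."""
    # Only consult IPython if something else has already imported it,
    # so that detection itself never triggers a heavy import.
    ipython = sys.modules.get("IPython")
    if ipython is not None:
        shell = ipython.get_ipython()
        if shell is not None:
            # The shell class name differs between terminal IPython
            # and notebook kernels.
            return f"ipython:{type(shell).__name__}"
    if "ipykernel" in sys.modules:
        return "jupyter-kernel-loaded"
    if os.environ.get("CI"):
        return "ci"
    if hasattr(sys, "ps1"):
        # sys.ps1 is only defined in interactive interpreter sessions.
        return "interactive"
    return "script"


print(detect_environment())
```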
<p>This is an example of the “whole runtime environment” being dynamic and introspectable, and Jupyter Notebook and its huge ecosystem make great use of this.</p>
<p>With some of the bigger features we’re working on at the moment at Datapane, we’re needing more advanced ways of adjusting to the running environment. Of course, as long as it works, none of the implementation matters to our customers, so we don’t advertise any of that. Our marketing tagline for this is “Jupyter notebook to a shareable data app in 10 seconds”, not “we’re in your Python process, looking at your sys.modules”.</p>
<p>After doing a grep through my <code class="docutils literal"><span class="pre">site-packages</span></code>, I found that doing <code class="docutils literal">sys._getframe</code> for different kinds of environment detection is relatively common – often used for things like “raise a deprecation warning, but not if we are being called from these specific callers, like our own code”. Here’s just one more example:</p>
<p><a class="reference external" href="https://boltons.readthedocs.io/">boltons</a> provides a <a class="reference external" href="https://boltons.readthedocs.io/en/latest/typeutils.html?highlight=sentinel#boltons.typeutils.make_sentinel">make_sentinel</a> function. The docs state that if you want “pickleability”, the sentinel must be stored in a module-level constant. But the implementation goes further and <a class="reference external" href="https://boltons.readthedocs.io/en/latest/_modules/boltons/typeutils.html#make_sentinel">checks</a> you are doing that using a <code class="docutils literal">sys._getframe</code> trick. This is just a simple usability enhancement in which code checks that it is being used correctly, made possible by Python’s deep introspection support, but this kind of thing adds up. You will find many similar things in small amounts scattered across different libraries.</p>
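<p>To show the general shape of this kind of check, here is a simplified sketch. It is deliberately not the boltons implementation, which works differently; <code class="docutils literal">Sentinel</code> and <code class="docutils literal">make_checked_sentinel</code> are hypothetical names, and the rule “the caller’s code object must be named <code class="docutils literal">&lt;module&gt;</code>” is my own simplification.</p>

```python
import sys


class Sentinel:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def make_checked_sentinel(name):
    """Create a sentinel, insisting on being called at module level."""
    caller = sys._getframe(1)
    # Code running at the top level of a module has this special name.
    if caller.f_code.co_name != "<module>":
        raise ValueError(
            f"{name} must be created at module level, "
            f"not inside {caller.f_code.co_name}()"
        )
    return Sentinel(name)


def bad_usage():
    # Calling from inside a function is what the check rejects.
    return make_checked_sentinel("MISSING")


try:
    bad_usage()
except ValueError as error:
    print(error)
```

<p>At module level, <code class="docutils literal">MISSING = make_checked_sentinel("MISSING")</code> passes the check and returns the sentinel as normal.</p>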
</section>
<section id="fluent-compiler">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-15" role="doc-backlink">fluent-compiler</a></h2>
<p><a class="reference external" href="https://projectfluent.org/">Fluent</a> is a localisation system by Mozilla. I wrote and contributed the initial version of the official <a class="reference external" href="https://github.com/projectfluent/python-fluent">fluent.runtime</a> Python implementation, which is an interpreter for the Fluent language, and I also wrote a second implementation, <a class="reference external" href="https://github.com/django-ftl/fluent-compiler">fluent-compiler</a>.</p>
<p>Of all the libraries I’ve written, this was the one I enjoyed most, and it’s also the least popular it seems – not surprising, since GNU gettext provides a great 90% solution, which is enough for just about everyone, apart from Mozilla and, for some reason I can’t quite remember, me. However, I do know that Mozilla are actually using my second implementation in some of their web projects, via <a class="reference external" href="https://github.com/django-ftl/django-ftl">django-ftl</a>, and I’m using it, and it has a few GitHub stars, so that counts as real world!</p>
<p>Here are some of the Hillel-worthy Python techniques I used:</p>
<section id="compile-to-python">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-16" role="doc-backlink">Compile-to-Python</a></h3>
<p>In fluent-compiler, the implementation strategy I took was to compile the parsed Fluent AST to Python code, and <code class="docutils literal">exec</code> it. I actually use Python AST nodes rather than strings, for various security reasons, but this is basically the same as doing <code class="docutils literal">eval</code>, and that same technique is used by various other projects like <a class="reference external" href="https://jinja.palletsprojects.com/">Jinja</a> and <a class="reference external" href="https://www.makotemplates.org/">Mako</a>.</p>
<p>If anything qualifies as “programs that create programs on the fly”, then using <a class="reference external" href="https://docs.python.org/3/library/functions.html#exec">exec</a>, <a class="reference external" href="https://docs.python.org/3/library/functions.html#eval">eval</a> or <a class="reference external" href="https://docs.python.org/3/library/functions.html#compile">compile</a> must do so! The main advantage of this technique here is speed. It works particularly well with Fluent, because with a bit of static analysis, we can often completely eliminate the overhead that would otherwise be caused by its more advanced features, like <a class="reference external" href="https://projectfluent.org/fluent/guide/terms.html">terms and parameterized terms</a>, so that at run-time they cost us nothing.</p>
<p>This works even better when combined with PyPy. For the simple and common cases, under CPython 3.11 my benchmarks show a solution using fluent-compiler is about 15% faster than GNU gettext, while under PyPy it’s more than twice as fast. You should take these numbers with a pinch of salt, but I am confident that the result is not slow, despite having far more advanced capabilities than GNU gettext. The same cannot be said of the first implementation – the compiler is about 5-10x faster than the interpreter for common cases on CPython.</p>
<p>Additionally, there are some neat tricks you can do when implementing a compiler using the same language that you are compiling to, like <a class="reference external" href="https://github.com/django-ftl/fluent-compiler/blob/6b262af7ce7c5608516aa24aff868ff66f95e0af/src/fluent_compiler/compiler.py#L1366">evaluating some things ahead of time that you know are constants</a>.</p>
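<p>Here is a toy sketch of the compile-to-Python approach using AST nodes and <code class="docutils literal">exec</code>. It is far simpler than fluent-compiler and shares no code with it: the input format (a list of literal strings and <code class="docutils literal">("var", name)</code> tuples) is a made-up stand-in for a parsed Fluent message, and <code class="docutils literal">compile_message</code> is a hypothetical name. It assumes Python 3.9+, where a subscript index is a plain expression node.</p>

```python
import ast


def compile_message(parts):
    """Compile message parts into a Python function `message(args)`."""
    values = []
    for part in parts:
        if isinstance(part, str):
            values.append(ast.Constant(value=part))
        else:
            _, name = part
            # Builds the expression: str(args[name]). The variable name
            # is only ever a string constant, never spliced in as code.
            lookup = ast.Subscript(
                value=ast.Name(id="args", ctx=ast.Load()),
                slice=ast.Constant(value=name),
                ctx=ast.Load(),
            )
            values.append(
                ast.Call(
                    func=ast.Name(id="str", ctx=ast.Load()),
                    args=[lookup],
                    keywords=[],
                )
            )

    # Builds the expression: "".join([...])
    joined = ast.Call(
        func=ast.Attribute(
            value=ast.Constant(value=""), attr="join", ctx=ast.Load()
        ),
        args=[ast.List(elts=values, ctx=ast.Load())],
        keywords=[],
    )

    # Parse a function skeleton, then swap in the generated body.
    module = ast.parse("def message(args): pass")
    module.body[0].body = [ast.Return(value=joined)]
    ast.fix_missing_locations(module)

    namespace = {}
    exec(compile(module, "compiled_message", "exec"), namespace)
    return namespace["message"]


welcome = compile_message(["Hello, ", ("var", "name"), "!"])
print(welcome({"name": "World"}))  # Hello, World!
```

<p>All the work of walking the message structure happens once, at compile time. Each call to <code class="docutils literal">welcome</code> then runs only the generated function.</p>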
</section>
<section id="dynamic-test-methods">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-17" role="doc-backlink">Dynamic test methods</a></h3>
<p>While developing the second implementation, I used the first implementation as a reference. I didn’t want to duplicate every test, or really do anything manually to every test, I just wanted a large sub-set of the test suite to automatically test both implementations. I also wanted failures to clearly indicate which implementation had failed, i.e. I wanted them to run as separate test cases, because the reference implementation could potentially be at fault in some corner cases.</p>
<p>As I was using unittest, my solution was this: I <a class="reference external" href="https://github.com/django-ftl/fluent-compiler/blob/d1481d61e0bc1a28a228a4b6d5258350d436e765/fluent.runtime/tests/__init__.py#L12">added a class decorator</a> that modified the test classes by removing every method that started with <code class="docutils literal">test_</code>, replacing it with two methods, one for each implementation.</p>
<p>This provided almost exactly the same functionality as one of Hillel’s wished-for examples:</p>
<blockquote>
<p>Add an output assertion to an optimized function in dev/testing, checking that on all invocations it matches the result of an unoptimized function</p>
</blockquote>
<p>I just used a slightly different technique that better suited my needs, but also made great use of run-time program manipulation.</p>
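<p>A condensed sketch of that kind of class decorator is below. It is not the decorator from the fluent-compiler repo; <code class="docutils literal">all_implementations</code>, the two toy <code class="docutils literal">format</code> functions and the <code class="docutils literal">self.format</code> convention are all invented for this example.</p>

```python
import unittest


def interpreter_format(text):
    """Toy stand-in for the reference implementation."""
    return text.upper()


def compiler_format(text):
    """Toy stand-in for the second implementation."""
    return "".join(char.upper() for char in text)


IMPLEMENTATIONS = {
    "interpreter": interpreter_format,
    "compiler": compiler_format,
}


def all_implementations(cls):
    """Replace each test_ method with one copy per implementation."""
    for name, method in list(vars(cls).items()):
        if not name.startswith("test_") or not callable(method):
            continue
        delattr(cls, name)
        for impl_name, impl in IMPLEMENTATIONS.items():

            def make_test(method=method, impl=impl):
                def test(self):
                    # Make the chosen implementation available to the test.
                    self.format = impl
                    method(self)

                return test

            setattr(cls, f"{name}_{impl_name}", make_test())
    return cls


@all_implementations
class TestFormat(unittest.TestCase):
    def test_upper(self):
        self.assertEqual(self.format("hi"), "HI")


print(sorted(n for n in vars(TestFormat) if n.startswith("test_")))
# ['test_upper_compiler', 'test_upper_interpreter']
```

<p>Because each generated method has its own name, a failure report says which implementation was at fault.</p>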
</section>
<section id="morph-into">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-18" role="doc-backlink"><code class="docutils literal">morph_into</code></a></h3>
<p>As part of the Fluent-to-Python compilation process, I have a tree of AST objects that I want to simplify. Simplifications include things like replacing a “string join” operation that has just one string, with that single string – so we need completely different types of objects. Even in a language that has mutation this can be a bit of a pain, because we’ve got to update the parent object and tell it to replace this child with a different child, and there are many different types of parent object with very different shapes. So my solution was <code class="docutils literal">morph_into</code>:</p>
<div class="code"><pre class="code python"><a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-1" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-1" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-1"></a><span class="k">def</span> <span class="nf">morph_into</span><span class="p">(</span><span class="n">item</span><span class="p">,</span> <span class="n">new_item</span><span class="p">):</span>
<a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-2" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-2" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-2"></a>    <span class="sd">"""</span>
<a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-3" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-3" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-3"></a><span class="sd">    Change `item` into `new_item` without changing its identity</span>
<a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-4" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-4" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-4"></a><span class="sd">    """</span>
<a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-5" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-5" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-5"></a>    <span class="n">item</span><span class="o">.</span><span class="vm">__class__</span> <span class="o">=</span> <span class="n">new_item</span><span class="o">.</span><span class="vm">__class__</span>
<a id="rest_code_31ecff3114ad4381bf1274dff2c67ba8-6" name="rest_code_31ecff3114ad4381bf1274dff2c67ba8-6" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#rest_code_31ecff3114ad4381bf1274dff2c67ba8-6"></a>    <span class="n">item</span><span class="o">.</span><span class="vm">__dict__</span> <span class="o">=</span> <span class="n">new_item</span><span class="o">.</span><span class="vm">__dict__</span>
</pre></div>
<p>With this solution, we leave the identity of the object the same, so none of the pointers to it need to be updated. But its type and all associated data are changed into something else, so that, other than <a class="reference external" href="https://docs.python.org/3/library/functions.html#id">id()</a>, the behaviour of <code class="docutils literal">item</code> will now be indistinguishable from <code class="docutils literal">new_item</code>. Not many languages allow you to do this!</p>
<p>I spent quite a lot of time wondering if I should be ashamed or proud of this code. But it turned out there was nothing to be ashamed of – it saved me writing a bunch of code and has had really no downsides.</p>
<p>Now, this technique won’t work for some things, like builtin primitives, so it can’t be completely generalised. But it doesn’t need that in order to be useful – all the objects I want to do this on are my own custom AST classes that share an interface, so it works and is “type safe” in its own way.</p>
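<p>To make the effect concrete, here is a small self-contained demonstration. The <code class="docutils literal">StringJoin</code>, <code class="docutils literal">String</code> and <code class="docutils literal">Return</code> classes are made up for illustration and are not the real AST classes from the compiler; only <code class="docutils literal">morph_into</code> follows the code above.</p>

```python
class StringJoin:
    """Hypothetical AST node: a join of several parts."""

    def __init__(self, parts):
        self.parts = parts


class String:
    """Hypothetical AST node: a single string literal."""

    def __init__(self, value):
        self.value = value


class Return:
    """Hypothetical parent node that holds a reference to a child."""

    def __init__(self, value):
        self.value = value


def morph_into(item, new_item):
    # Same approach as above: swap the class and the instance data in place.
    item.__class__ = new_item.__class__
    item.__dict__ = new_item.__dict__


join = StringJoin([String("hello")])
parent = Return(join)
original_id = id(join)

# Simplify a one-element join into the element itself.
if len(join.parts) == 1:
    morph_into(join, join.parts[0])

# The parent was never touched, yet its child is now a String.
print(type(parent.value).__name__, parent.value.value)
```

<p>Note that after the call both objects share the same <code class="docutils literal">__dict__</code>, which is fine when the replacement object is thrown away afterwards, as it is here.</p>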
<p>I’m far from the first person to discover this kind of trick when implementing compilers. In <a class="reference external" href="https://thume.ca/2019/04/29/comparing-compilers-in-rust-haskell-c-and-python/">this comparison of several groups of people working on a compiler project</a>, one of the most impressive results came from a single-person team who chose Python. She used way less code, and implemented way more features than the other groups, who all had multiple people on their teams and were using C++/Rust/Haskell etc. Fancy metaprogramming and dynamic typing were a big part of the difference, and by the sounds of it she used exactly the same kinds of things I used:</p>
<blockquote>
<p>Another example of the power of metaprogramming and dynamic typing is that we have a 400 line file called <code class="docutils literal">visit.rs</code> that is mostly repetitive boilerplate code implementing a visitor on a bunch of AST structures. In Python this could be a short ~10 line function that recursively introspects on the fields of the AST node and visits them (using the <code class="docutils literal">__dict__</code> attribute).</p>
</blockquote>
<p>Again, I’m not claiming “dynamic typing is better than static typing” – <a class="reference external" href="https://lukeplant.me.uk/blog/posts/you-cant-compare-language-features-only-languages/">I don’t think it’s even meaningful to do that comparison</a>. I’m claiming that highly dynamic meta-programming tricks are indeed a significant part of real Python code, and really do make a big difference.</p>
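<p>For a sense of what the quoted “short ~10 line function” might look like, here is a minimal sketch. The node classes and the <code class="docutils literal">visit</code> helper are my own invention for illustration, not code from that project.</p>

```python
class Node:
    """Hypothetical base class for AST nodes."""


class Num(Node):
    def __init__(self, value):
        self.value = value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


class Call(Node):
    def __init__(self, name, args):
        self.name = name
        self.args = args


def visit(node, callback):
    """Call `callback` on `node`, then on every Node reachable from its fields."""
    callback(node)
    for value in vars(node).values():  # vars(node) is node.__dict__
        children = value if isinstance(value, (list, tuple)) else [value]
        for child in children:
            if isinstance(child, Node):
                visit(child, callback)


tree = Call("max", [Add(Num(1), Num(2)), Num(3)])
seen = []
visit(tree, lambda node: seen.append(type(node).__name__))
print(seen)
```

<p>Adding a new node class needs no change to <code class="docutils literal">visit</code>, because the fields are discovered at runtime rather than listed by hand.</p>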
</section>
</section>
<section id="pytest">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-19" role="doc-backlink">Pytest</a></h2>
<p>Pytest does quite a few dynamic tricks. Hillel wishes that pytest’s functionality was more easily usable elsewhere, such as from a REPL. I’ve no doubt this is a legitimate complaint – as it happens, my own use cases involve <a class="reference external" href="https://lukeplant.me.uk/blog/posts/repl-python-programming-and-debugging-with-ipython/">sticking a REPL in my test</a>, rather than sticking a pytest in my REPL. However, you can’t claim that pytest isn’t a valid example, or isn’t making use of Python’s dynamism – it does, and it provides a lot of useful functionality as a result, including:</p>
<section id="assert-rewriting">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-20" role="doc-backlink">Assert rewriting</a></h3>
<p>The most obvious is perhaps their <a class="reference external" href="https://docs.pytest.org/en/6.2.x/assert.html">assert rewriting</a>, which relies on modifying the AST of test modules to inject sub-expression information for when asserts fail. It often makes test assertions much more immediately useful.</p>
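<p>The general idea can be sketched with the stdlib <code class="docutils literal">ast</code> module. This is emphatically not pytest’s implementation, which is far more thorough; <code class="docutils literal">ExplainAsserts</code> and <code class="docutils literal">_explain_eq</code> are names I made up, only <code class="docutils literal">==</code> comparisons are handled, and <code class="docutils literal">ast.unparse</code> assumes Python 3.9 or later.</p>

```python
import ast


class ExplainAsserts(ast.NodeTransformer):
    """Rewrite `assert left == right` so that a failure reports both values."""

    def visit_Assert(self, node):
        test = node.test
        is_simple_eq = (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
        )
        if not is_simple_eq:
            return node  # leave every other kind of assert alone
        # Build the call: _explain_eq(<left>, <right>, "<source of comparison>")
        call = ast.Call(
            func=ast.Name(id="_explain_eq", ctx=ast.Load()),
            args=[test.left, test.comparators[0], ast.Constant(value=ast.unparse(test))],
            keywords=[],
        )
        return ast.copy_location(ast.Expr(value=call), node)


def _explain_eq(left, right, source):
    if left != right:
        raise AssertionError(f"assert {source}\n  left:  {left!r}\n  right: {right!r}")


source = """
def test_total():
    prices = [2, 3]
    assert sum(prices) == 6
"""

tree = ExplainAsserts().visit(ast.parse(source))
ast.fix_missing_locations(tree)
namespace = {"_explain_eq": _explain_eq}
exec(compile(tree, "<test>", "exec"), namespace)

message = None
try:
    namespace["test_total"]()
except AssertionError as error:
    message = str(error)
    print(message)
```

<p>The test author wrote a plain <code class="docutils literal">assert</code>, but the failure message includes the evaluated value of each side of the comparison.</p>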
</section>
<section id="automatic-dependency-injection-of-fixtures">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-21" role="doc-backlink">Automatic dependency injection of fixtures</a></h3>
<p>Pytest provides one of the few cases of automatic dependency injection in Python where I’ve thought it was a good idea. It also makes use of Python’s dynamism to make this dependency injection extremely low ceremony. All you need to do is add a parameter to your test function, giving the parameter the name of the <a class="reference external" href="https://docs.pytest.org/en/6.2.x/fixture.html">fixture</a> you want, and pytest will find that fixture in its registry and pass it to your function.</p>
</section>
</section>
<section id="others">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-22" role="doc-backlink">Others</a></h2>
<p>This post is way too long already, and I’ve done very little actual searching for this stuff – almost all my examples are things that I’ve heard about in the past or done myself, so there must be far more than these in the real world out there. Here are a bunch more I thought of but didn’t have time to expand on:</p>
<ul>
<li><p>PyTorch <a class="reference external" href="https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html">automatic differentiation</a> which uses instrumented objects (similar to the SQLAlchemy example I presume), plus <a class="reference external" href="https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml">some kind of pattern matching on function calls</a> that I haven’t had time to investigate.</p>
<p><a class="reference external" href="https://vmartin.fr/understanding-automatic-differentiation-in-30-lines-of-python.html">Understanding Automatic Differentiation in 30 lines of Python</a> is a great article on how you can build this kind of thing. Crucially, Python’s dynamism makes this kind of thing very accessible to mere mortals.</p>
</li>
<li><p><a class="reference external" href="https://vcrpy.readthedocs.io/en/latest/index.html">VCR.py</a>: monkey patch all HTTP functions and record interactions, so that the second time we run a test we can use canned responses and avoid the network.</p></li>
<li><p>CCiW email tests: <a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/37e6d69064c9a5d1372809fa2d723a0e203d21c3/cciw/utils/tests/base.py#L55">monkey patch Django’s Atomic decorator and mail sending functions</a> to ensure we are using “queued email” appropriately inside transactions.</p></li>
<li><p>Lots of tricks in <a class="reference external" href="https://github.com/radiac/django-tagulous">django-tagulous</a> to improve usability for developers.</p></li>
<li><p><a class="reference external" href="https://numba.pydata.org/">numba</a>: JIT compile and run your Python code on a GPU with a single decorator.</p></li>
<li><p><a class="reference external" href="https://drf-spectacular.readthedocs.io/en/latest/readme.html">drf-spectacular</a>: iterate over all endpoints in a DRF project, introspecting serializers and calling methods with dummy request objects where necessary, to produce an OpenAPI schema.</p></li>
<li><p>In the stdlib, <a class="reference external" href="https://docs.python.org/3/library/functools.html#functools.total_ordering">@total_ordering</a> will look at your class and add missing rich comparison methods.</p></li>
<li><p>Depending on an environment flag, <a class="reference external" href="https://github.com/learnscripture/learnscripture.net/blob/3063de7bd364ccf6105b39485830e75b19f902d9/learnscripture/tests/base.py#L232">automatically wrap all UI test cases in a decorator that takes a screenshot if the test fails</a>.</p></li>
</ul>
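<p>As a quick illustration of the <code class="docutils literal">@total_ordering</code> entry in the list above, here is a minimal example; the <code class="docutils literal">Version</code> class is made up for the purpose.</p>

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)


# __le__, __gt__ and __ge__ were never written; the decorator added them.
print(Version(1, 2) >= Version(1, 0))
print(Version(1, 2) <= Version(1, 0))
```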
<p>EDIT: And some more I discovered after publishing this post, which look interesting:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/amakelov/mandala">mandala</a> - “Computations that save, query and version themselves”</p></li>
<li><p><a class="reference external" href="https://github.com/google/latexify_py">latexify</a> - pretty print Python functions using LaTeX</p></li>
<li><p><a class="reference external" href="https://jax.readthedocs.io/en/latest/index.html">JAX</a> which has a JIT compiler of Python code and <a class="reference external" href="https://github.com/hips/autograd">Autograd</a> which implements automatic differentiation of Python code, possibly similar to PyTorch.</p></li>
</ul>
</section>
<section id="conclusion">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-23" role="doc-backlink">Conclusion</a></h2>
<p>Why don’t we talk about these much? I think a large part of the answer is that the Python community cares about solving problems, and not about how clever your code is. Clever code, in fact, is looked down on, which is the right attitude – cleverness for the sake of it is always bad. Problem solving is good though. So libraries and projects that do these things tend to talk about the problem they solve, rather than brag about their clever techniques.</p>
<p>Also, many libraries that use these things wrap them up so that you don’t have to know what’s going on – It Just Works. As a newbie, everything about computers is magical and you have to just accept that that’s how they work. Then you take it for granted, and just get on with using it.</p>
<p>On the other hand, for the implementer, once you understand the magic, it stops being magic, it’s just a feature that the language has.</p>
<p>Either way, pretty soon none of these things count as “hyper programming” any more – in one sense, they are just normal Python programming, and that’s the whole point: <strong>Python gives you super powers which are not super powers, they are normal powers</strong>. Everyone gets to use them, and you don’t need to learn a different language to do so.</p>
<p>Perhaps we do need to talk about them more, though. At the very least, I hope my examples have sparked some ideas about the kinds of things that are possible in Python.</p>
<p>Happy hacking!</p>
</section>
<section id="links">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/#toc-entry-24" role="doc-backlink">Links</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/9w7ylg/python_s_disappointing_superpowers">Discussion of this post on Lobsters</a></p></li>
<li><p><a class="reference external" href="https://twitter.com/spookylukey/status/1620851142849863680">Discussion of this post on Twitter</a></p></li>
<li><p><a class="reference external" href="https://news.ycombinator.com/item?id=34611969">Discussion of this post on Hacker News</a></p></li>
</ul>
</section>Test factory functions in Djangohttps://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/2022-11-25T16:07:02Z2022-11-25T16:07:02ZLuke Plant<p>Patterns for creating model instances in Django project test suites, and some anti-patterns</p><p>When writing tests for <a class="reference external" href="https://www.djangoproject.com">Django</a> projects, you
typically need to create quite a lot of instances of database model objects.
This page documents the patterns I recommend, and the ones I don’t.</p>
<p>Before I get going, I should mention that a lot of this can be avoided
altogether if you can separate out database independent logic from your models.
But you can only go so far without serious contortions, and you’ll probably
still need to write a fair number of tests that hit the database.</p>
<nav class="contents" id="contents" role="doc-toc">
<p class="topic-title">Contents</p>
<ul class="simple">
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#the-aim" id="toc-entry-1">The aim</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#custom-factory-functions" id="toc-entry-2">Custom factory functions</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#the-auto-sentinel" id="toc-entry-3">The Auto sentinel</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#constraints-and-sequences" id="toc-entry-4">Constraints and sequences</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#delegation-and-sub-objects" id="toc-entry-5">Delegation and sub-objects</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#special-purpose-factories" id="toc-entry-6">Special purpose factories</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#sensible-and-minimal-defaults" id="toc-entry-7">Sensible and minimal defaults</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#simplified-interface" id="toc-entry-8">Simplified interface</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#type-hints" id="toc-entry-9">Type hints</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#dont-depend-on-defaults" id="toc-entry-10">Don’t depend on defaults</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#enhancements" id="toc-entry-11">Enhancements</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#what-not-to-do" id="toc-entry-12">What not to do</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#json-yaml-fixtures" id="toc-entry-13">JSON/YAML fixtures</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#kwargs" id="toc-entry-14"><code class="docutils literal">**kwargs</code></a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#django-dynamic-fixture" id="toc-entry-15">django-dynamic-fixture</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#factory-boy" id="toc-entry-16">factory_boy</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#but-factory-boy-can-also-create-instances-without-saving-them" id="toc-entry-17">But factory_boy can also create instances without saving them!</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#but-factory-boy-can-specify-related-data" id="toc-entry-18">But factory_boy can specify related data!</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#but-factory-boy-has-faker-integration" id="toc-entry-19">But factory_boy has faker integration!</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#but-factory-boy-has-a-create-batch-method" id="toc-entry-20">But factory_boy has a create_batch method!</a></p></li>
</ul>
</li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#conclusion" id="toc-entry-21">Conclusion</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#footnotes" id="toc-entry-22">Footnotes</a></p></li>
</ul>
</nav>
<section id="the-aim">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-1" role="doc-backlink">The aim</a></h2>
<p>We want the following:</p>
<ul class="simple">
<li><p>Every test should specify each detail about database state it depends on</p></li>
<li><p>The test should not specify any detail it doesn’t depend on</p></li>
<li><p>We should be able to conveniently and succinctly write “what we mean”, without
having to worry about lower level details, especially database schema details
that are not intrinsic to the test.</p></li>
</ul>
<p>These things are important so that you can understand tests in isolation, and so
that changes not relevant to a test do not break that test. Otherwise you
will spend a lot of your time fixing broken tests rather than actually making the
changes you need to make.</p>
<p>Using Django ORM <a class="reference external" href="https://docs.djangoproject.com/en/stable/ref/models/querysets/#create">create</a> calls
directly in your tests is not a great solution, because database constraints often
force you to specify fields that you are not interested in.</p>
</section>
<section id="custom-factory-functions">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-2" role="doc-backlink">Custom factory functions</a></h2>
<p>The answer to this is simply to create your own “factory” functions, with
optional keyword arguments (preferably <a class="reference external" href="https://lukeplant.me.uk/blog/posts/keyword-only-arguments-in-python/">keyword only</a>) for
almost everything. You can add parameters by hand as and when you need them.</p>
<p>Here are some simple but real examples from the <a class="reference external" href="https://www.cciw.co.uk/">Christian Camps in Wales</a> booking system, which has a <code class="docutils literal">BookingAccount</code> model
and includes the ability to pay by cheque which is a <code class="docutils literal">ManualPayment</code> object:</p>
<div class="code"><pre class="code python"><a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-1" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-1"></a><span class="k">def</span> <span class="nf">create_booking_account</span><span class="p">(</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-2" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-2"></a> <span class="o">*</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-3" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-3"></a> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">"A Booker"</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-4" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-4"></a> <span class="n">address_line1</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">""</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-5" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-5"></a> <span class="n">address_post_code</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">"XYZ"</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-6" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-6"></a> <span class="n">email</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">Auto</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-7" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-7"></a><span class="p">)</span> <span class="o">-></span> <span class="n">BookingAccount</span><span class="p">:</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-8" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-8"></a> <span class="k">return</span> <span class="n">BookingAccount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-9" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-9" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-9"></a> <span class="n">name</span><span class="o">=</span><span class="n">name</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-10" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-10" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-10"></a> <span class="n">email</span><span class="o">=</span><span class="n">email</span> <span class="ow">or</span> <span class="nb">next</span><span class="p">(</span><span class="n">BOOKING_ACCOUNT_EMAIL_SEQUENCE</span><span class="p">),</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-11" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-11" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-11"></a> <span class="n">address_line1</span><span class="o">=</span><span class="n">address_line1</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-12" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-12" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-12"></a> <span class="n">address_post_code</span><span class="o">=</span><span class="n">address_post_code</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-13" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-13" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-13"></a> <span class="p">)</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-14" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-14" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-14"></a>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-15" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-15" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-15"></a><span class="k">def</span> <span class="nf">create_manual_payment</span><span class="p">(</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-16" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-16" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-16"></a> <span class="o">*</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-17" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-17" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-17"></a> <span class="n">account</span><span class="p">:</span> <span class="n">BookingAccount</span> <span class="o">=</span> <span class="n">Auto</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-18" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-18" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-18"></a> <span class="n">amount</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-19" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-19" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-19"></a><span class="p">)</span> <span class="o">-></span> <span class="n">ManualPayment</span><span class="p">:</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-20" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-20" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-20"></a> <span class="k">return</span> <span class="n">ManualPayment</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-21" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-21" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-21"></a> <span class="n">account</span><span class="o">=</span><span class="n">account</span> <span class="ow">or</span> <span class="n">create_booking_account</span><span class="p">(),</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-22" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-22" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-22"></a> <span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-23" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-23" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-23"></a> <span class="n">payment_type</span><span class="o">=</span><span class="n">ManualPaymentType</span><span class="o">.</span><span class="n">CHEQUE</span><span class="p">,</span>
<a id="rest_code_f972bcc034f344c9bfeb6b5272b2340e-24" name="rest_code_f972bcc034f344c9bfeb6b5272b2340e-24" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f972bcc034f344c9bfeb6b5272b2340e-24"></a> <span class="p">)</span>
</pre></div>
<p>You can find the rest of this project’s test factory functions <a class="reference external" href="https://github.com/search?q=%22def+create%22+repo%3Acciw-uk%2Fcciw.co.uk+path%3Afactories.py&type=code&ref=advsearch">with this search on GitHub</a>.</p>
<p>A few patterns to note:</p>
<section id="the-auto-sentinel">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-3" role="doc-backlink">The Auto sentinel</a></h3>
<p>In a number of places here we used a default value of <code class="docutils literal">Auto</code>, which is a custom
object defined as follows:</p>
<div class="code"><pre class="code python"><a id="rest_code_3a51b858c15a4e429ced6608b83b99df-1" name="rest_code_3a51b858c15a4e429ced6608b83b99df-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-1"></a><span class="k">class</span> <span class="nc">_Auto</span><span class="p">:</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-2" name="rest_code_3a51b858c15a4e429ced6608b83b99df-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-2"></a> <span class="sd">"""</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-3" name="rest_code_3a51b858c15a4e429ced6608b83b99df-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-3"></a><span class="sd"> Sentinel value indicating an automatic default will be used.</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-4" name="rest_code_3a51b858c15a4e429ced6608b83b99df-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-4"></a><span class="sd"> """</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-5" name="rest_code_3a51b858c15a4e429ced6608b83b99df-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-5"></a>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-6" name="rest_code_3a51b858c15a4e429ced6608b83b99df-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-6"></a> <span class="k">def</span> <span class="fm">__bool__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-7" name="rest_code_3a51b858c15a4e429ced6608b83b99df-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-7"></a> <span class="c1"># Allow `Auto` to be used like `None` or `False` in boolean expressions</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-8" name="rest_code_3a51b858c15a4e429ced6608b83b99df-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-8"></a> <span class="k">return</span> <span class="kc">False</span>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-9" name="rest_code_3a51b858c15a4e429ced6608b83b99df-9" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-9"></a>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-10" name="rest_code_3a51b858c15a4e429ced6608b83b99df-10" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-10"></a>
<a id="rest_code_3a51b858c15a4e429ced6608b83b99df-11" name="rest_code_3a51b858c15a4e429ced6608b83b99df-11" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_3a51b858c15a4e429ced6608b83b99df-11"></a><span class="n">Auto</span><span class="p">:</span> <span class="n">Any</span> <span class="o">=</span> <span class="n">_Auto</span><span class="p">()</span>
</pre></div>
<p>We use <code class="docutils literal">Auto</code> instead of <code class="docutils literal">None</code> or something else, because:</p>
<ul class="simple">
<li><p>Sometimes you need to specify <code class="docutils literal">None</code> as an actual value (for nullable DB fields), but don’t want it as the default.</p></li>
<li><p>Often the correct default needs to be defined dynamically:</p>
<ul>
<li><p>you need to create another object at runtime, as in the <code class="docutils literal">account:
BookingAccount = Auto</code> line above</p></li>
<li><p>a sensible and correct default depends on some other argument, so requires
some logic in the body of the function.</p></li>
</ul>
</li>
</ul>
<p>We create a singleton value <code class="docutils literal">Auto</code> so we can do <code class="docutils literal">if foo is Auto</code> checks.</p>
<p>We also give it a type <code class="docutils literal">Any</code> so that type checkers don’t complain about using
it as a default value. It doesn’t break type checking for the functions calling
our factory functions.</p>
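<p>As a minimal sketch of how the sentinel gets used, here is a factory that distinguishes “caller said nothing” from “caller explicitly passed <code class="docutils literal">None</code>”. The <code class="docutils literal">Account</code> dataclass and <code class="docutils literal">create_account</code> are hypothetical stand-ins for a Django model and its factory, and the sentinel is repeated so that the snippet runs on its own:</p>

```python
from dataclasses import dataclass
from typing import Any, Optional


class _Auto:
    """Sentinel meaning "let the factory choose a value"."""

    def __bool__(self) -> bool:
        # Falsy, mirroring the sentinel defined above.
        return False


Auto: Any = _Auto()


@dataclass
class Account:  # hypothetical stand-in for a Django model
    name: str
    email: Optional[str]


def create_account(*, name: str = Auto, email: Optional[str] = Auto) -> Account:
    if name is Auto:
        name = "Test Account"
    if email is Auto:
        # Only fill in a default when the caller passed nothing at all;
        # an explicit email=None is kept as None.
        email = "account@example.com"
    return Account(name=name, email=email)
```

<p>With this, <code class="docutils literal">create_account(email=None)</code> really does give you a null email, while <code class="docutils literal">create_account()</code> gets the default.</p>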
</section>
<section id="constraints-and-sequences">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-4" role="doc-backlink">Constraints and sequences</a></h3>
<p>Often you have the problem that a unique constraint on a field makes it
difficult to provide a static default. As in the example above, I’m using a
really simple technique to deal with this – generate a sequence of values that
are unlikely to be specified manually in a test. In the above code, you can see
<code class="docutils literal">BOOKING_ACCOUNT_EMAIL_SEQUENCE</code> which is defined like this at the module level:</p>
<div class="code"><pre class="code python"><a id="rest_code_5a494397c57d401fa751a0f7b6fbb36f-1" name="rest_code_5a494397c57d401fa751a0f7b6fbb36f-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_5a494397c57d401fa751a0f7b6fbb36f-1"></a><span class="n">BOOKING_ACCOUNT_EMAIL_SEQUENCE</span> <span class="o">=</span> <span class="n">sequence</span><span class="p">(</span><span class="k">lambda</span> <span class="n">n</span><span class="p">:</span> <span class="sa">f</span><span class="s2">"booker_</span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s2">@example.com"</span><span class="p">)</span>
</pre></div>
<p>Every time we call <code class="docutils literal">next()</code> on this object, we get a distinct value, so we avoid
issues with constraints.</p>
<p>The <code class="docutils literal">sequence</code> utility is actually super simple, but presented here in all
its type-hinted glory:</p>
<div class="code"><pre class="code python"><a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-1" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-1"></a><span class="kn">import</span> <span class="nn">itertools</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-2" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-2"></a><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Any</span><span class="p">,</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">TypeVar</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-3" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-3"></a>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-4" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-4"></a><span class="n">T</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"T"</span><span class="p">)</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-5" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-5"></a>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-6" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-6"></a>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-7" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-7"></a><span class="k">def</span> <span class="nf">sequence</span><span class="p">(</span><span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">int</span><span class="p">],</span> <span class="n">T</span><span class="p">])</span> <span class="o">-></span> <span class="n">Generator</span><span class="p">[</span><span class="n">T</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">]:</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-8" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-8"></a> <span class="sd">"""</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-9" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-9" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-9"></a><span class="sd"> Generates a sequence of values from a sequence of integers starting at zero,</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-10" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-10" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-10"></a><span class="sd"> passed through the callable, which must take an integer argument.</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-11" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-11" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-11"></a><span class="sd"> """</span>
<a id="rest_code_f85d102af18b4da78fb6589b20c6e87e-12" name="rest_code_f85d102af18b4da78fb6589b20c6e87e-12" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f85d102af18b4da78fb6589b20c6e87e-12"></a> <span class="k">return</span> <span class="p">(</span><span class="n">func</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
</pre></div>
<p>You could do something even simpler though – just use a generator expression at
the top level:</p>
<div class="code"><pre class="code python"><a id="rest_code_f472c866c0f044b19da51555eaab3444-1" name="rest_code_f472c866c0f044b19da51555eaab3444-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_f472c866c0f044b19da51555eaab3444-1"></a><span class="n">BOOKING_ACCOUNT_EMAIL_SEQUENCE</span> <span class="o">=</span> <span class="p">(</span><span class="sa">f</span><span class="s2">"booker_</span><span class="si">{</span><span class="n">n</span><span class="si">}</span><span class="s2">@example.com"</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
</pre></div>
<p>There can be some cases where you need something more complicated than this (for
example to be able to reset sequences) but they are rare in my experience and
fairly easy to write <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#advanced-sequences" id="footnote-reference-1" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a>.</p>
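<p>For illustration only, a resettable sequence might look something like the sketch below. The <code class="docutils literal">ResettableSequence</code> class and its <code class="docutils literal">reset()</code> method are names I’ve made up for this example, not part of any library:</p>

```python
import itertools
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ResettableSequence(Generic[T]):
    """Hypothetical variant of `sequence` that can be restarted from zero."""

    def __init__(self, func: Callable[[int], T]) -> None:
        self._func = func
        self._counter = itertools.count()

    def __iter__(self) -> Iterator[T]:
        return self

    def __next__(self) -> T:
        # Pass the next integer through the callable, as `sequence` does.
        return self._func(next(self._counter))

    def reset(self) -> None:
        # Start counting from zero again, e.g. between tests.
        self._counter = itertools.count()


emails = ResettableSequence(lambda n: f"booker_{n}@example.com")
```

<p>It still works with <code class="docutils literal">next()</code>, so it can be swapped in without changing the factory functions that use it.</p>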
</section>
<section id="delegation-and-sub-objects">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-5" role="doc-backlink">Delegation and sub-objects</a></h3>
<p>Factory functions often delegate to other factory functions, as in the examples
above.</p>
<p>It’s also quite common to want to specify something about a sub-object. Rather
than build up a tree of objects as the caller, I often add a parameter to the
top-level factory itself. This gives you some independence from the actual
schema.</p>
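<p>A rough sketch of what that can look like, using plain dataclasses as hypothetical stand-ins for models. The <code class="docutils literal">account_email</code> parameter is an invented example, and <code class="docutils literal">None</code> stands in for the <code class="docutils literal">Auto</code> sentinel just to keep the snippet short:</p>

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:  # hypothetical stand-in for a Django model
    email: str


@dataclass
class Booking:  # hypothetical stand-in for a Django model
    account: Account
    amount: int


_EMAILS = (f"booker_{n}@example.com" for n in itertools.count())


def create_account(*, email: Optional[str] = None) -> Account:
    return Account(email=next(_EMAILS) if email is None else email)


def create_booking(
    *,
    account: Optional[Account] = None,
    account_email: Optional[str] = None,
    amount: int = 100,
) -> Booking:
    if account is None:
        # Delegate to the other factory, forwarding the sub-object detail.
        account = create_account(email=account_email)
    elif account_email is not None:
        raise ValueError("Pass either 'account' or 'account_email', not both")
    return Booking(account=account, amount=amount)
```

<p>The test can then say <code class="docutils literal"><span class="pre">create_booking(account_email="vip@example.com")</span></code> without knowing how bookings and accounts are related.</p>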
</section>
<section id="special-purpose-factories">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-6" role="doc-backlink">Special purpose factories</a></h3>
<p>You aren’t limited to one factory function per model; you can have as many as
you like. For example you might have <code class="docutils literal">create_staff_user</code> and
<code class="docutils literal">create_customer</code> which take different parameters, but both happen to return
the same <code class="docutils literal">User</code> model.</p>
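<p>Something along these lines, where the <code class="docutils literal">User</code> dataclass and its fields are assumptions made up for the example rather than a real model:</p>

```python
from dataclasses import dataclass


@dataclass
class User:  # hypothetical stand-in for a Django model
    username: str
    is_staff: bool = False
    loyalty_points: int = 0


def create_staff_user(*, username: str = "staff") -> User:
    # Staff users don't need any customer-specific parameters.
    return User(username=username, is_staff=True)


def create_customer(*, username: str = "customer", loyalty_points: int = 0) -> User:
    # Customers get parameters that only make sense for customers.
    return User(username=username, is_staff=False, loyalty_points=loyalty_points)
```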
</section>
<section id="sensible-and-minimal-defaults">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-7" role="doc-backlink">Sensible and minimal defaults</a></h3>
<p>As far as possible, the factory function should pick sensible defaults, based on
whatever parameters were passed in, if any. If it can’t because the caller contradicted themselves, it should raise an exception.</p>
<p>I normally take the approach that the defaults should produce <strong>minimal</strong> and
<strong>pristine</strong> objects, while being <strong>complete</strong> and <strong>usable</strong>.</p>
<p>For example, if your model supports soft-delete via deactivation,
<code class="docutils literal">active=False</code> would be a bad default. On the other hand, creating lots of
related objects in order to be “realistic” would not be a good idea.</p>
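<p>Here is a sketch of defaults that depend on each other, including the “caller contradicted themselves” case. The <code class="docutils literal">deactivated_at</code> field is an assumption for the sake of the example, and the sentinel is repeated so the snippet is self-contained:</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class User:  # hypothetical stand-in for a Django model
    active: bool
    deactivated_at: Optional[datetime]


def create_user(
    *,
    active: bool = Auto,
    deactivated_at: Optional[datetime] = Auto,
) -> User:
    if deactivated_at is Auto:
        if active is Auto:
            active = True  # pristine default: not soft-deleted
        deactivated_at = None if active else datetime.now(timezone.utc)
    elif active is Auto:
        # Derive the flag from the timestamp the caller gave us.
        active = deactivated_at is None
    elif active == (deactivated_at is not None):
        # Active with a deactivation date, or inactive without one.
        raise ValueError("'active' and 'deactivated_at' contradict each other")
    return User(active=active, deactivated_at=deactivated_at)
```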
<p>You should be pragmatic. For example, for a <code class="docutils literal">User</code> object, if a brand new,
“pristine” user is always forced to go through an on-boarding flow on your
website, meaning that every single page but the on-boarding page is blocked
until they complete it, then <code class="docutils literal">has_onboarded=True</code> is probably a more sensible
default – only a few of your tests will want <code class="docutils literal">has_onboarded=False</code>.</p>
<p>In many cases, your main business logic may already have functions that initialise database objects into sensible states when creating them, or when changing their states. Test factory functions will often delegate to them, so that things are set up as close as possible to how they would be normally.</p>
</section>
<section id="simplified-interface">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-8" role="doc-backlink">Simplified interface</a></h3>
<p>A good factory function will often simplify things for the caller.</p>
<p>For example, in the CCiW project mentioned, the <code class="docutils literal">Camp</code> model has a <code class="docutils literal">leaders</code>
relationship, which is a many-to-many. For several good reasons, the leaders are
not <code class="docutils literal">User</code> objects, but <code class="docutils literal">Person</code> objects, where <code class="docutils literal">Person</code> has some metadata
and another many-to-many (!) with <code class="docutils literal">User</code> objects. However, when I’m writing a
test, I might want to be able to say something like:</p>
<div class="code"><pre class="code python"><a id="rest_code_26ef0d3a2b624c759a90e0bae12e3994-1" name="rest_code_26ef0d3a2b624c759a90e0bae12e3994-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_26ef0d3a2b624c759a90e0bae12e3994-1"></a><span class="n">user</span> <span class="o">=</span> <span class="n">create_user</span><span class="p">()</span>
<a id="rest_code_26ef0d3a2b624c759a90e0bae12e3994-2" name="rest_code_26ef0d3a2b624c759a90e0bae12e3994-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_26ef0d3a2b624c759a90e0bae12e3994-2"></a><span class="n">camp</span> <span class="o">=</span> <span class="n">create_camp</span><span class="p">(</span><span class="n">leader</span><span class="o">=</span><span class="n">user</span><span class="p">)</span>
<a id="rest_code_26ef0d3a2b624c759a90e0bae12e3994-3" name="rest_code_26ef0d3a2b624c759a90e0bae12e3994-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_26ef0d3a2b624c759a90e0bae12e3994-3"></a><span class="n">login</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
</pre></div>
<p>Here, I just care that the user is conceptually the leader of the camp. I don’t
care:</p>
<ul class="simple">
<li><p>that a camp can have more than one leader</p></li>
<li><p>that the <code class="docutils literal">Camp</code> is actually related to the <code class="docutils literal">User</code> object via a <code class="docutils literal">Person</code> object.</p></li>
</ul>
<p>Sometimes I don’t care about specifying who the leader actually is, just that
there is one, so I might want to pass <code class="docutils literal">leader=True</code>.</p>
<p>My factory function ends up looking like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_efed5e4085b440418681a4ed400d0539-1" name="rest_code_efed5e4085b440418681a4ed400d0539-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-1"></a><span class="k">def</span> <span class="nf">create_camp</span><span class="p">(</span>
<a id="rest_code_efed5e4085b440418681a4ed400d0539-2" name="rest_code_efed5e4085b440418681a4ed400d0539-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-2"></a> <span class="o">*</span><span class="p">,</span>
<a id="rest_code_efed5e4085b440418681a4ed400d0539-3" name="rest_code_efed5e4085b440418681a4ed400d0539-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-3"></a> <span class="n">leader</span><span class="p">:</span> <span class="n">Person</span> <span class="o">|</span> <span class="n">User</span> <span class="o">|</span> <span class="nb">bool</span> <span class="o">=</span> <span class="n">Auto</span><span class="p">,</span>
<a id="rest_code_efed5e4085b440418681a4ed400d0539-4" name="rest_code_efed5e4085b440418681a4ed400d0539-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-4"></a> <span class="n">leaders</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">Person</span> <span class="o">|</span> <span class="n">User</span><span class="p">]</span> <span class="o">=</span> <span class="n">Auto</span><span class="p">,</span>
<a id="rest_code_efed5e4085b440418681a4ed400d0539-5" name="rest_code_efed5e4085b440418681a4ed400d0539-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-5"></a><span class="p">)</span> <span class="o">-></span> <span class="n">Camp</span><span class="p">:</span>
<a id="rest_code_efed5e4085b440418681a4ed400d0539-6" name="rest_code_efed5e4085b440418681a4ed400d0539-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_efed5e4085b440418681a4ed400d0539-6"></a> <span class="o">...</span>
</pre></div>
<p>It’s redundant, but it’s easy to use, and this approach means many of your tests
won’t need to change when the schema does. Sometimes my factory functions end up
having a <strong>lot</strong> of parameters, and they’re unlikely to win any beauty contests
— but who really cares? They are easy to understand and modify.</p>
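<p>The body elided above might be filled in roughly as follows. This is a sketch, not the real CCiW code: the dataclasses stand in for the models, <code class="docutils literal">_as_person</code> is an invented helper, and “no leaders by default” is my assumption:</p>

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


class _Auto:
    def __bool__(self) -> bool:
        return False


Auto: Any = _Auto()


@dataclass
class User:  # hypothetical stand-ins for the real models
    username: str


@dataclass
class Person:
    name: str
    users: List[User] = field(default_factory=list)


@dataclass
class Camp:
    leaders: List[Person] = field(default_factory=list)


def create_user(*, username: str = "leader") -> User:
    return User(username=username)


def _as_person(leader: Union[Person, User]) -> Person:
    # Hide the Person indirection from the caller.
    if isinstance(leader, Person):
        return leader
    return Person(name=leader.username, users=[leader])


def create_camp(
    *,
    leader: Union[Person, User, bool] = Auto,
    leaders: List[Union[Person, User]] = Auto,
) -> Camp:
    if leader is not Auto and leaders is not Auto:
        raise ValueError("Pass either 'leader' or 'leaders', not both")
    if leaders is Auto:
        if leader is Auto or leader is False:
            leaders = []  # assumed minimal default: no leaders
        elif leader is True:
            leaders = [create_user()]  # "some leader, I don't care who"
        else:
            leaders = [leader]
    return Camp(leaders=[_as_person(item) for item in leaders])
```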
</section>
<section id="type-hints">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-9" role="doc-backlink">Type hints</a></h3>
<p>Type hints are great for getting good help in your editor when writing tests.
Use them!</p>
</section>
<section id="dont-depend-on-defaults">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-10" role="doc-backlink">Don’t depend on defaults</a></h3>
<p>If a test requires a certain value, and it happens to be the default that the
factory will use, the test should still specify it. This makes the test more
robust, and allows the factory to change the defaults. If a test doesn’t specify
it, it means it doesn’t care, and it should work with any value the factory
happens to choose.</p>
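<p>For example, in the sketch below the test passes <code class="docutils literal">has_onboarded=True</code> even though that is what the factory currently defaults to. The <code class="docutils literal">needs_onboarding_redirect</code> function is a made-up piece of business logic, and the dataclass is a stand-in for a model:</p>

```python
from dataclasses import dataclass


@dataclass
class User:  # hypothetical stand-in for a Django model
    has_onboarded: bool


def create_user(*, has_onboarded: bool = True) -> User:
    return User(has_onboarded=has_onboarded)


def needs_onboarding_redirect(user: User) -> bool:
    # Made-up business logic under test.
    return not user.has_onboarded


def test_onboarded_user_is_not_redirected() -> None:
    # Explicit, even though it matches today's default: the test depends
    # on this value, so it keeps working if the default changes.
    user = create_user(has_onboarded=True)
    assert not needs_onboarding_redirect(user)
```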
</section>
<section id="enhancements">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-11" role="doc-backlink">Enhancements</a></h3>
<p>If you are using <a class="reference external" href="https://docs.pytest.org/">pytest</a> (which I recommend, along
with <a class="reference external" href="https://pytest-django.readthedocs.io/en/latest/index.html">pytest-django</a>), Haki Benita has
a nice post that explains how to <a class="reference external" href="https://realpython.com/django-pytest-fixtures/#using-factories-as-fixtures">use factory functions as pytest fixtures</a>.</p>
</section>
</section>
<section id="what-not-to-do">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-12" role="doc-backlink">What not to do</a></h2>
<p>Now for the anti-patterns. If you’re happy with the answer above, you don’t need
to read this bit.</p>
<section id="json-yaml-fixtures">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-13" role="doc-backlink">JSON/YAML fixtures</a></h3>
<p>Django docs used to encourage you to define models in <a class="reference external" href="https://docs.djangoproject.com/en/4.1/howto/initial-data/">JSON/YAML fixtures</a> for use in tests.
Don’t do that! <a class="reference external" href="https://youtu.be/ickNQcNXiS4?t=985">I’ll let Carl Meyer tell you why</a>.</p>
<p>There are some legitimate cases for using these kinds of fixtures in tests – in
particular, where you might use the same/similar fixture files for loading data
in a production environment. This is typically when you have essentially static
data that is defined by some external reality, which happens to be stored in a
database table in your app – such as a list of countries and their ISO codes.</p>
</section>
<section id="kwargs">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-14" role="doc-backlink"><code class="docutils literal">**kwargs</code></a></h3>
<p>When writing factory functions, rather than adding loads of parameters, it may
be tempting to just let them accept <code class="docutils literal">**kwargs</code> and pass those on to the
underlying model. I usually prefer not to do that, because:</p>
<ul class="simple">
<li><p>you get much less help when writing tests</p></li>
<li><p>you tend to end up overly tied to the actual schema</p></li>
</ul>
</section>
<section id="django-dynamic-fixture">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-15" role="doc-backlink">django-dynamic-fixture</a></h3>
<p>I used to use <a class="reference external" href="https://github.com/paulocheque/django-dynamic-fixture">django-dynamic-fixture</a> to avoid the tedium of
manual factory functions, but have since moved away from that. You are just
introducing a layer between yourself and the code that you actually need to
write, and have to stop it from doing things you don’t want etc. It also doesn’t
understand the “business logic” needed to come up with sensible defaults.</p>
</section>
<section id="factory-boy">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-16" role="doc-backlink">factory_boy</a></h3>
<p>OK, <a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/index.html">factory_boy</a>,
this is like my comments for django-dynamic-fixture, only more so.</p>
<p>Let me put it this way:</p>
<p>You’ve been tasked with providing a <strong>procedure</strong> for creating model instances,
where that procedure will have sensible defaults, but will allow the caller to
override them. You have to decide what are the appropriate language features of
Python to use. Do you:</p>
<ol class="upperalpha simple">
<li><p>Create a function or a method, with parameters for overriding defaults, or,</p></li>
<li><p>Define a new class that inherits from <code class="docutils literal">Factory</code>, and use the <strong>body</strong> of
the class statement to define a procedure?</p></li>
</ol>
<p>If you chose A), congratulations, you got the right answer! You will be rewarded
for using the language as it was meant to be used, by things like:</p>
<ul class="simple">
<li><p>Automatic help inside your editor, both for the parameters and the returned
value.</p></li>
<li><p>Static type checking if you want it.</p></li>
<li><p>Everyone being able to modify your code without looking up some documentation.</p></li>
</ul>
<p>If you chose B), you get points for novelty. But you will be punished as follows:</p>
<ul class="simple">
<li><p>You will have to invent things like:</p>
<ul>
<li><p>nested <code class="docutils literal">class Meta</code> for <a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/introduction.html#basic-usage">essential configuration</a> of <code class="docutils literal">FactoryOptions</code></p></li>
<li><p>nested <code class="docutils literal">class Params</code></p></li>
<li><p><code class="docutils literal">Trait</code></p></li>
<li><p><code class="docutils literal">PostGeneration</code></p></li>
<li><p><code class="docutils literal">@post_generation</code></p></li>
<li><p><code class="docutils literal">LazyAttribute</code></p></li>
<li><p><code class="docutils literal">@lazy_attribute</code></p></li>
<li><p><code class="docutils literal">@lazy_attribute_sequence</code></p></li>
<li><p><code class="docutils literal">LazyFunction</code></p></li>
<li><p><code class="docutils literal">SubFactory</code></p></li>
<li><p><code class="docutils literal">RelatedFactory</code></p></li>
<li><p><code class="docutils literal">SelfAttribute</code></p></li>
<li><p><a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/reference.html#factory.debug">a debug mode</a> (of course)</p></li>
<li><p>and <a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/orms.html">much</a>, <a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/recipes.html">much</a> <a class="reference external" href="https://factoryboy.readthedocs.io/en/stable/reference.html">more</a>!</p></li>
</ul>
</li>
<li><p>You will have to write thousands of lines of code (1700+), thousands more of
tests (5000+), and page after page of documentation (16,000+ words) to support
all this.</p></li>
<li><p>You will have to get people to read that documentation. Instead of which, they
will spend their evenings writing snarky blog posts complaining about all your
hard work!</p></li>
<li><p>You will have an Open Source side project with <a class="reference external" href="https://github.com/FactoryBoy/factory_boy/issues">hundreds of open issues</a>, fun!</p></li>
<li><p>You will get <strong>less than zero help</strong> from your editor when using these
factories – not only will it just display <code class="docutils literal">**kwargs</code> for inputs, it will
think the output is a <code class="docutils literal">Factory</code> instance, which it is not.</p></li>
<li><p>For people to find what parameters they can pass to a <code class="docutils literal">Factory</code>, they will
have to look up the model, <strong>and</strong> inspect the <code class="docutils literal">Factory</code> definition
and decipher its “traits” etc.</p></li>
</ul>
<p>I don’t want to add any further to the burden of the authors – they have
suffered enough already! But I do want to deal with a few objections:</p>
<section id="but-factory-boy-can-also-create-instances-without-saving-them">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-17" role="doc-backlink">But factory_boy can also create instances without saving them!</a></h4>
<p>This is useful if you want to avoid hitting the DB while being able to test a
model method that doesn’t need the DB. In Django, it’s extremely easy to do that
without help, because if you aren’t going to save a model instance, you don’t
need to worry about any attributes other than the ones you specify – models
don’t run validation in the constructor – and so you don’t need factories at
all:</p>
<div class="code"><pre class="code python"><a id="rest_code_b152449d36824b8eb1af4a93df8295f7-1" name="rest_code_b152449d36824b8eb1af4a93df8295f7-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_b152449d36824b8eb1af4a93df8295f7-1"></a><span class="k">def</span> <span class="nf">test_address_formatted</span><span class="p">():</span>
<a id="rest_code_b152449d36824b8eb1af4a93df8295f7-2" name="rest_code_b152449d36824b8eb1af4a93df8295f7-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_b152449d36824b8eb1af4a93df8295f7-2"></a> <span class="n">address</span> <span class="o">=</span> <span class="n">Address</span><span class="p">(</span><span class="n">line1</span><span class="o">=</span><span class="s2">"123 Main St"</span><span class="p">,</span> <span class="n">line2</span><span class="o">=</span><span class="s2">"London"</span><span class="p">)</span>
<a id="rest_code_b152449d36824b8eb1af4a93df8295f7-3" name="rest_code_b152449d36824b8eb1af4a93df8295f7-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_b152449d36824b8eb1af4a93df8295f7-3"></a> <span class="k">assert</span> <span class="n">address</span><span class="o">.</span><span class="n">formatted</span><span class="p">()</span> <span class="o">==</span> <span class="s2">"123 Main St</span><span class="se">\n</span><span class="s2">London</span><span class="se">\n</span><span class="s2">"</span>
</pre></div>
<p>If you really need it, you could always add a <code class="docutils literal">commit: bool = True</code> parameter to your factory functions.</p>
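<p>A minimal sketch of that parameter, assuming a stand-in <code class="docutils literal">Address</code> class whose <code class="docutils literal">save()</code> just appends to a list instead of hitting a real database:</p>

```python
from dataclasses import dataclass
from typing import List

SAVED: List["Address"] = []  # stand-in for the database


@dataclass
class Address:  # hypothetical stand-in for a Django model
    line1: str
    line2: str = ""

    def save(self) -> None:
        SAVED.append(self)


def create_address(
    *,
    line1: str = "123 Main St",
    line2: str = "London",
    commit: bool = True,
) -> Address:
    address = Address(line1=line1, line2=line2)
    if commit:
        # Skip this for tests that only exercise in-memory behaviour.
        address.save()
    return address
```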
</section>
<section id="but-factory-boy-can-specify-related-data">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-18" role="doc-backlink">But factory_boy can specify related data!</a></h4>
<p>As is a common pattern in Django, you can use a double underscore in a parameter
to indicate a relationship traversal – from the example in the <a class="reference external" href="https://github.com/FactoryBoy/factory_boy">README</a>:</p>
<div class="code"><pre class="code python"><a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-1" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-1"></a><span class="n">order</span> <span class="o">=</span> <span class="n">OrderFactory</span><span class="p">(</span>
<a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-2" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-2"></a> <span class="n">amount</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span>
<a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-3" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-3"></a> <span class="n">status</span><span class="o">=</span><span class="s1">'PAID'</span><span class="p">,</span>
<a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-4" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-4"></a> <span class="n">customer__is_vip</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-5" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-5"></a> <span class="n">address__country</span><span class="o">=</span><span class="s1">'AU'</span><span class="p">,</span>
<a id="rest_code_a6be7c33c5e447779b7e6597db8845f3-6" name="rest_code_a6be7c33c5e447779b7e6597db8845f3-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_a6be7c33c5e447779b7e6597db8845f3-6"></a> <span class="p">)</span>
</pre></div>
<p>This is neat, but an anti-pattern in my opinion. As well as specifying that
the order country is Australia, you are also implicitly specifying:</p>
<ul class="simple">
<li><p>the Order model stores its address via a foreign key to a separate address model,</p></li>
<li><p>that model has a <code class="docutils literal">country</code> field</p></li>
<li><p>and you store country information using ISO-3166 country codes.</p></li>
</ul>
<p>In other words, you are tying the test more tightly to the schema than you need
to. None of these things are relevant to the test; you just want to specify that
the order is for Australia.</p>
<p>If instead you do <code class="docutils literal"><span class="pre">create_order(address_country="AU")</span></code> then you can leave the
factory function to handle the details. That can include normalising a country
code to whatever is the right thing, if it wants to, which is very easy to do
with simple functions that you are in complete control of.</p>
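<p>As a sketch of what that could look like, with dataclasses standing in for the models. The alias table, <code class="docutils literal">_normalise_country</code> and the <code class="docutils literal">"GB"</code> default are all assumptions for the example:</p>

```python
from dataclasses import dataclass


@dataclass
class Address:  # hypothetical stand-ins for the real models
    country: str


@dataclass
class Order:
    amount: int
    address: Address


# Hypothetical lookup; a real project would use whatever it already has.
_COUNTRY_ALIASES = {"AUSTRALIA": "AU", "AUS": "AU"}


def _normalise_country(value: str) -> str:
    key = value.strip().upper()
    return _COUNTRY_ALIASES.get(key, key)


def create_order(*, amount: int = 100, address_country: str = "GB") -> Order:
    # How the country is stored is the factory's business, not the test's.
    address = Address(country=_normalise_country(address_country))
    return Order(amount=amount, address=address)
```

<p>If the address later moves onto the order itself, or the storage format changes, only the factory needs updating.</p>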
</section>
<section id="but-factory-boy-has-faker-integration">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-19" role="doc-backlink">But factory_boy has faker integration!</a></h4>
<p>If you want randomized and realistic looking data, you can use <code class="docutils literal">faker</code>
directly with almost exactly the same amount of code:</p>
<div class="code"><pre class="code python"><a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-1" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-1"></a><span class="kn">from</span> <span class="nn">faker</span> <span class="kn">import</span> <span class="n">Faker</span>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-2" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-2"></a>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-3" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-3"></a><span class="n">faker</span> <span class="o">=</span> <span class="n">Faker</span><span class="p">()</span>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-4" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-4"></a>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-5" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-5"></a><span class="k">def</span> <span class="nf">create_user</span><span class="p">():</span>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-6" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-6"></a>    <span class="k">return</span> <span class="n">User</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-7" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-7"></a>        <span class="n">name</span><span class="o">=</span><span class="n">faker</span><span class="o">.</span><span class="n">name</span><span class="p">(),</span>
<a id="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-8" name="rest_code_360bcb9c511e4219b3fe8ba17a6369e8-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_360bcb9c511e4219b3fe8ba17a6369e8-8"></a>    <span class="p">)</span>
</pre></div>
</section>
<section id="but-factory-boy-has-a-create-batch-method">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-20" role="doc-backlink">But factory_boy has a create_batch method!</a></h4>
<p>If you need to create a bunch of things, you can just do this:</p>
<div class="code"><pre class="code python"><a id="rest_code_d591c52374e34f40925d5a3da4c1709a-1" name="rest_code_d591c52374e34f40925d5a3da4c1709a-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_d591c52374e34f40925d5a3da4c1709a-1"></a><span class="n">payments</span> <span class="o">=</span> <span class="p">[</span><span class="n">create_manual_payment</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)]</span>
</pre></div>
<p>which really isn’t very hard, and also means you can have arguments that vary
depending on the loop variable.</p>
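<p>As a minimal sketch of that last point, the loop variable can feed straight into the factory arguments. The <code class="docutils literal">create_manual_payment</code> below is a hypothetical stand-in that returns a plain dict rather than saving a Django model, and the <code class="docutils literal">reference</code> parameter is an assumption made up for illustration:</p>

```python
from decimal import Decimal


def create_manual_payment(amount=Decimal("10.00"), reference=""):
    # Hypothetical stand-in for the real factory: returns a plain dict
    # instead of creating a database row.
    return {"amount": amount, "reference": reference}


# Each payment gets an amount and a reference derived from the loop variable.
payments = [
    create_manual_payment(amount=Decimal(10 * i), reference=f"PAY-{i}")
    for i in range(1, 4)
]
```

<p>A generic batch helper cannot do this as directly, because it passes the same keyword arguments to every call.</p>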
<p>But, because I’m <strong>very</strong> generous, I will write you a <code class="docutils literal">create_batch</code> function
<strong>for free</strong>. Not only that, I’ll add type hints <strong>for free</strong>, and I’ll leave it
right here where you can find it, in the public domain:</p>
<div class="code"><pre class="code python"><a id="rest_code_8c3328ef380143d685ebafa5f4b28217-1" name="rest_code_8c3328ef380143d685ebafa5f4b28217-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-1"></a><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">TypeVar</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-2" name="rest_code_8c3328ef380143d685ebafa5f4b28217-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-2"></a>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-3" name="rest_code_8c3328ef380143d685ebafa5f4b28217-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-3"></a><span class="n">T</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"T"</span><span class="p">)</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-4" name="rest_code_8c3328ef380143d685ebafa5f4b28217-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-4"></a>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-5" name="rest_code_8c3328ef380143d685ebafa5f4b28217-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-5"></a>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-6" name="rest_code_8c3328ef380143d685ebafa5f4b28217-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-6"></a><span class="k">def</span> <span class="nf">create_batch</span><span class="p">(</span><span class="n">factory</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="n">T</span><span class="p">],</span> <span class="n">count</span><span class="p">,</span> <span class="o">/</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">[</span><span class="n">T</span><span class="p">]:</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-7" name="rest_code_8c3328ef380143d685ebafa5f4b28217-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-7"></a>    <span class="sd">"""</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-8" name="rest_code_8c3328ef380143d685ebafa5f4b28217-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-8"></a><span class="sd">    Use `factory` callable to create `count` objects, passing along kwargs</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-9" name="rest_code_8c3328ef380143d685ebafa5f4b28217-9" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-9"></a><span class="sd">    """</span>
<a id="rest_code_8c3328ef380143d685ebafa5f4b28217-10" name="rest_code_8c3328ef380143d685ebafa5f4b28217-10" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8c3328ef380143d685ebafa5f4b28217-10"></a>    <span class="k">return</span> <span class="p">[</span><span class="n">factory</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">count</span><span class="p">)]</span>
</pre></div>
<p>Now you can do the following, and your editor and static type checker will know
exactly what type of objects <code class="docutils literal">payment_1</code> and <code class="docutils literal">payment_2</code> are:</p>
<div class="code"><pre class="code python"><a id="rest_code_d7a2ab45868d4fa0b88b5c0cee315403-1" name="rest_code_d7a2ab45868d4fa0b88b5c0cee315403-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_d7a2ab45868d4fa0b88b5c0cee315403-1"></a><span class="n">payment_1</span><span class="p">,</span> <span class="n">payment_2</span> <span class="o">=</span> <span class="n">create_batch</span><span class="p">(</span><span class="n">create_manual_payment</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</pre></div>
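<p>To try the helper outside of Django, here is a small self-contained sketch. The <code class="docutils literal">Payment</code> dataclass and this version of <code class="docutils literal">create_manual_payment</code> are hypothetical stand-ins for a real model and factory. <code class="docutils literal">create_batch</code> is repeated so the block runs on its own, with an <code class="docutils literal">int</code> annotation on <code class="docutils literal">count</code> added here:</p>

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class Payment:
    # Hypothetical stand-in for a Django model.
    amount: int = 0


def create_manual_payment(amount: int = 0) -> Payment:
    # Hypothetical factory function; a real one would save to the database.
    return Payment(amount=amount)


def create_batch(factory: Callable[..., T], count: int, /, **kwargs) -> list[T]:
    # Same helper as above, repeated so this example is self-contained.
    return [factory(**kwargs) for _ in range(count)]


# A type checker can infer list[Payment] here, so both names are Payment.
payment_1, payment_2 = create_batch(create_manual_payment, 2, amount=10)
```

<p>The <code class="docutils literal">/</code> in the signature makes <code class="docutils literal">factory</code> and <code class="docutils literal">count</code> positional-only, so a factory that itself accepts a keyword argument named <code class="docutils literal">count</code> can still receive it through <code class="docutils literal">**kwargs</code>.</p>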
</section>
</section>
</section>
<section id="conclusion">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-21" role="doc-backlink">Conclusion</a></h2>
<p>You don’t need to install anything to create factory functions. Just use
built-in language features, and maybe a few tiny helpers like I’ve shown, and
you’re good!</p>
<p>The only real issue with my approach is that sometimes it can feel a bit tedious
adding another parameter. But slightly tedious code that is extremely easy to
understand and modify, and helps you in all the ways I’ve described, is still a
big win in my book. There will be many days when you long for slightly tedious
code that just works.</p>
<p>Happy testing!</p>
</section>
<hr class="docutils">
<section id="footnotes">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#toc-entry-22" role="doc-backlink">Footnotes</a></h2>
<aside class="footnote brackets" id="advanced-sequences" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#footnote-reference-1">1</a><span class="fn-bracket">]</span></span>
<p>Advanced sequences:</p>
<p>Sometimes you might want to reset your sequences, perhaps automatically
between every test case. I would implement that as follows. Replace the
previous <code class="docutils literal">sequence</code> implementation with:</p>
<div class="code"><pre class="code python"><a id="rest_code_8a639f20052749fc93d87714fd5e6a94-1" name="rest_code_8a639f20052749fc93d87714fd5e6a94-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-1"></a><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">annotations</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-2" name="rest_code_8a639f20052749fc93d87714fd5e6a94-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-2"></a><span class="kn">import</span> <span class="nn">itertools</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-3" name="rest_code_8a639f20052749fc93d87714fd5e6a94-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-3"></a><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Generic</span><span class="p">,</span> <span class="n">Iterator</span><span class="p">,</span> <span class="n">TypeVar</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-4" name="rest_code_8a639f20052749fc93d87714fd5e6a94-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-4"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-5" name="rest_code_8a639f20052749fc93d87714fd5e6a94-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-5"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-6" name="rest_code_8a639f20052749fc93d87714fd5e6a94-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-6"></a><span class="n">T</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"T"</span><span class="p">)</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-7" name="rest_code_8a639f20052749fc93d87714fd5e6a94-7" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-7"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-8" name="rest_code_8a639f20052749fc93d87714fd5e6a94-8" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-8"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-9" name="rest_code_8a639f20052749fc93d87714fd5e6a94-9" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-9"></a><span class="k">class</span> <span class="nc">sequence</span><span class="p">(</span><span class="n">Generic</span><span class="p">[</span><span class="n">T</span><span class="p">]):</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-10" name="rest_code_8a639f20052749fc93d87714fd5e6a94-10" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-10"></a>    <span class="n">instances</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">sequence</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-11" name="rest_code_8a639f20052749fc93d87714fd5e6a94-11" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-11"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-12" name="rest_code_8a639f20052749fc93d87714fd5e6a94-12" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-12"></a>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="nb">int</span><span class="p">],</span> <span class="n">T</span><span class="p">])</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-13" name="rest_code_8a639f20052749fc93d87714fd5e6a94-13" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-13"></a>        <span class="bp">self</span><span class="o">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">func</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-14" name="rest_code_8a639f20052749fc93d87714fd5e6a94-14" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-14"></a>        <span class="bp">self</span><span class="o">.</span><span class="n">reset_sequence</span><span class="p">()</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-15" name="rest_code_8a639f20052749fc93d87714fd5e6a94-15" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-15"></a>        <span class="bp">self</span><span class="o">.</span><span class="n">instances</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-16" name="rest_code_8a639f20052749fc93d87714fd5e6a94-16" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-16"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-17" name="rest_code_8a639f20052749fc93d87714fd5e6a94-17" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-17"></a>    <span class="k">def</span> <span class="nf">reset_sequence</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-18" name="rest_code_8a639f20052749fc93d87714fd5e6a94-18" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-18"></a>        <span class="bp">self</span><span class="o">.</span><span class="n">seq</span><span class="p">:</span> <span class="n">Iterator</span><span class="p">[</span><span class="n">T</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">func</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-19" name="rest_code_8a639f20052749fc93d87714fd5e6a94-19" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-19"></a>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-20" name="rest_code_8a639f20052749fc93d87714fd5e6a94-20" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-20"></a>    <span class="k">def</span> <span class="fm">__next__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">T</span><span class="p">:</span>
<a id="rest_code_8a639f20052749fc93d87714fd5e6a94-21" name="rest_code_8a639f20052749fc93d87714fd5e6a94-21" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_8a639f20052749fc93d87714fd5e6a94-21"></a>        <span class="k">return</span> <span class="nb">next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">seq</span><span class="p">)</span>
</pre></div>
<p>To reset automatically between each test case, assuming use of <code class="docutils literal">pytest</code>,
add the following <code class="docutils literal">autouse</code> fixture to <code class="docutils literal">conftest.py</code>:</p>
<div class="code"><pre class="code python"><a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-1" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-1" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-1"></a><span class="nd">@pytest</span><span class="o">.</span><span class="n">fixture</span><span class="p">(</span><span class="n">autouse</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-2" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-2" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-2"></a><span class="k">def</span> <span class="nf">reset_all_sequences</span><span class="p">():</span>
<a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-3" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-3" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-3"></a>    <span class="kn">from</span> <span class="nn">myproject.factory_utils</span> <span class="kn">import</span> <span class="n">sequence</span> <span class="c1"># or wherever</span>
<a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-4" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-4" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-4"></a>
<a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-5" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-5" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-5"></a>    <span class="k">for</span> <span class="n">instance</span> <span class="ow">in</span> <span class="n">sequence</span><span class="o">.</span><span class="n">instances</span><span class="p">:</span>
<a id="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-6" name="rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-6" href="https://lukeplant.me.uk/blog/posts/test-factory-functions-in-django/#rest_code_cbfb4ccde0c24bf8b89c12f175ad7d5c-6"></a>        <span class="n">instance</span><span class="o">.</span><span class="n">reset_sequence</span><span class="p">()</span>
</pre></div>
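<p>To see the reset behaviour without running a test suite, here is a self-contained sketch. It repeats the <code class="docutils literal">sequence</code> class so the block runs on its own (using a string annotation in place of the <code class="docutils literal">__future__</code> import), and <code class="docutils literal">email_sequence</code> is a hypothetical example sequence. The final loop does by hand what the fixture does between tests:</p>

```python
import itertools
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class sequence(Generic[T]):
    # Class-level registry of every sequence created, as in the footnote.
    instances: "list[sequence]" = []

    def __init__(self, func: Callable[[int], T]) -> None:
        self.func = func
        self.reset_sequence()
        self.instances.append(self)

    def reset_sequence(self) -> None:
        # A fresh generator restarts the counter from zero.
        self.seq: Iterator[T] = (self.func(n) for n in itertools.count())

    def __next__(self) -> T:
        return next(self.seq)


# Hypothetical sequence, for illustration only.
email_sequence = sequence(lambda n: f"user_{n}@example.com")

first = next(email_sequence)
second = next(email_sequence)

# What the autouse fixture does between test cases:
for instance in sequence.instances:
    instance.reset_sequence()

after_reset = next(email_sequence)
```

<p>After the reset, the sequence starts again from its first value, so each test case sees the same generated values regardless of which tests ran before it.</p>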
</aside>
</section>Python Type Hints: case study on parsyhttps://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/2022-11-21T21:07:02Z2022-11-21T21:07:02ZLuke Plant<p>How I tried and failed to add static type checking to Parsy, and settled for type hints as documentation instead.</p><p>I have been trying to like static type checking in Python. For most of my Django projects, I get annoyed and give up, so I’ve had a go with some smaller projects instead. This blog post documents how it went with <a class="reference external" href="https://github.com/python-parsy/parsy">Parsy</a>, a parser combinator library I maintain.</p>
<nav class="contents" id="contents" role="doc-toc">
<p class="topic-title">Contents</p>
<ul class="simple">
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#intro-to-parsy" id="toc-entry-1">Intro to Parsy</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#simple-types" id="toc-entry-2">Simple types</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#generics" id="toc-entry-3">Generics</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#typed-parsy-fork" id="toc-entry-4">Typed Parsy fork</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#implementation-perspective" id="toc-entry-5">Implementation perspective</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#overall" id="toc-entry-6">Overall</a></p></li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#using-it" id="toc-entry-7">Using it</a></p>
<ul>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#sequences" id="toc-entry-8">Sequences</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#error-messages" id="toc-entry-9">Error messages</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#generate-decorator" id="toc-entry-10">@generate decorator</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#overall-1" id="toc-entry-11">Overall</a></p></li>
</ul>
</li>
</ul>
</li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#types-for-documentation" id="toc-entry-12">Types for documentation</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#conclusion" id="toc-entry-13">Conclusion</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#links" id="toc-entry-14">Links</a></p></li>
<li><p><a class="reference internal" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnotes" id="toc-entry-15">Footnotes</a></p></li>
</ul>
</nav>
<section id="intro-to-parsy">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-1" role="doc-backlink">Intro to Parsy</a></h2>
<p>I need to explain a few things about Parsy.</p>
<p>In Parsy, you build up <code class="docutils literal">Parser</code> objects via a set of primitives and combinators. Each <code class="docutils literal">Parser</code> object has a <code class="docutils literal">parse</code> method that accepts strings <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#strings" id="footnote-reference-1" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a> and returns some object – which might be a string for your lowest building blocks, but quickly you build more complex parsers that return different types of objects. A lot of code is written in “fluent” style where you chain methods together.</p>
<p>Here are some basics:</p>
<p>The primitive <code class="docutils literal">string</code> just matches and returns the input:</p>
<div class="code"><pre class="code python"><a id="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-1" name="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_3da1c75f00534e15a18d4b4afd78e3c4-1"></a><span class="kn">import</span> <span class="nn">parsy</span> <span class="k">as</span> <span class="nn">P</span>
<a id="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-2" name="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_3da1c75f00534e15a18d4b4afd78e3c4-2"></a>
<a id="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-3" name="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_3da1c75f00534e15a18d4b4afd78e3c4-3"></a><span class="n">hello</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"hello"</span><span class="p">)</span>
<a id="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-4" name="rest_code_3da1c75f00534e15a18d4b4afd78e3c4-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_3da1c75f00534e15a18d4b4afd78e3c4-4"></a><span class="k">assert</span> <span class="n">hello</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"hello"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"hello"</span>
</pre></div>
<p>But we can change the result to some other type of object:</p>
<div class="code"><pre class="code python"><a id="rest_code_7042f13452a241faad95c38bdae18e46-1" name="rest_code_7042f13452a241faad95c38bdae18e46-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7042f13452a241faad95c38bdae18e46-1"></a><span class="n">true_parser</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"true"</span><span class="p">)</span><span class="o">.</span><span class="n">result</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span>
<a id="rest_code_7042f13452a241faad95c38bdae18e46-2" name="rest_code_7042f13452a241faad95c38bdae18e46-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7042f13452a241faad95c38bdae18e46-2"></a><span class="k">assert</span> <span class="n">true_parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"true"</span><span class="p">)</span> <span class="o">==</span> <span class="kc">True</span>
</pre></div>
<p>We can map the parse result using a callable. This time I’m starting with a regex primitive, which returns strings, but converting to ints:</p>
<div class="code"><pre class="code python"><a id="rest_code_703b5800a9664e75b73450af9d0a43a4-1" name="rest_code_703b5800a9664e75b73450af9d0a43a4-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_703b5800a9664e75b73450af9d0a43a4-1"></a><span class="n">int_parser</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"-?\d+"</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<a id="rest_code_703b5800a9664e75b73450af9d0a43a4-2" name="rest_code_703b5800a9664e75b73450af9d0a43a4-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_703b5800a9664e75b73450af9d0a43a4-2"></a><span class="k">assert</span> <span class="n">int_parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"123"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">123</span>
</pre></div>
<p>We can discard things we don’t care about in a number of ways, such as with these “pointy” operators that point to the important bit:</p>
<div class="code"><pre class="code python"><a id="rest_code_2b2d41a5e1f0444ca027bf7b891431c0-1" name="rest_code_2b2d41a5e1f0444ca027bf7b891431c0-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_2b2d41a5e1f0444ca027bf7b891431c0-1"></a><span class="n">whitespace</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\s*"</span><span class="p">)</span>
<a id="rest_code_2b2d41a5e1f0444ca027bf7b891431c0-2" name="rest_code_2b2d41a5e1f0444ca027bf7b891431c0-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_2b2d41a5e1f0444ca027bf7b891431c0-2"></a><span class="k">assert</span> <span class="p">(</span><span class="n">whitespace</span> <span class="o">>></span> <span class="n">int_parser</span> <span class="o"><<</span> <span class="n">whitespace</span><span class="p">)</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">" 123 "</span><span class="p">)</span> <span class="o">==</span> <span class="mi">123</span>
</pre></div>
<p>We can have a sequence of items, here with some separator we don’t care about collecting:</p>
<div class="code"><pre class="code python"><a id="rest_code_7dfd31364f444b348ab94e15d5954a82-1" name="rest_code_7dfd31364f444b348ab94e15d5954a82-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-1"></a><span class="n">three_ints</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">seq</span><span class="p">(</span>
<a id="rest_code_7dfd31364f444b348ab94e15d5954a82-2" name="rest_code_7dfd31364f444b348ab94e15d5954a82-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-2"></a>    <span class="n">int_parser</span> <span class="o"><<</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"-"</span><span class="p">),</span>
<a id="rest_code_7dfd31364f444b348ab94e15d5954a82-3" name="rest_code_7dfd31364f444b348ab94e15d5954a82-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-3"></a>    <span class="n">int_parser</span> <span class="o"><<</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"-"</span><span class="p">),</span>
<a id="rest_code_7dfd31364f444b348ab94e15d5954a82-4" name="rest_code_7dfd31364f444b348ab94e15d5954a82-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-4"></a>    <span class="n">int_parser</span>
<a id="rest_code_7dfd31364f444b348ab94e15d5954a82-5" name="rest_code_7dfd31364f444b348ab94e15d5954a82-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-5"></a><span class="p">)</span>
<a id="rest_code_7dfd31364f444b348ab94e15d5954a82-6" name="rest_code_7dfd31364f444b348ab94e15d5954a82-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_7dfd31364f444b348ab94e15d5954a82-6"></a><span class="k">assert</span> <span class="n">three_ints</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"123-45-67"</span><span class="p">)</span> <span class="o">==</span> <span class="p">[</span><span class="mi">123</span><span class="p">,</span> <span class="mi">45</span><span class="p">,</span> <span class="mi">67</span><span class="p">]</span>
</pre></div>
<p>If we want something better than a list to store the different components in (usually we do), we can use the keyword-argument form of <code class="docutils literal">seq</code>, which names the components and collects them in a dict instead of a list. Then, instead of <code class="docutils literal">.map</code>, we can use <code class="docutils literal">.combine_dict</code> to convert the result into some other object:</p>
<div class="code"><pre class="code python"><a id="rest_code_bc10665e09104dfb816e233393e143ab-1" name="rest_code_bc10665e09104dfb816e233393e143ab-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-1"></a><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-2" name="rest_code_bc10665e09104dfb816e233393e143ab-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-2"></a>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-3" name="rest_code_bc10665e09104dfb816e233393e143ab-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-3"></a><span class="nd">@dataclass</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-4" name="rest_code_bc10665e09104dfb816e233393e143ab-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-4"></a><span class="k">class</span> <span class="nc">Date</span><span class="p">:</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-5" name="rest_code_bc10665e09104dfb816e233393e143ab-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-5"></a> <span class="n">year</span><span class="p">:</span> <span class="nb">int</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-6" name="rest_code_bc10665e09104dfb816e233393e143ab-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-6"></a> <span class="n">month</span><span class="p">:</span> <span class="nb">int</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-7" name="rest_code_bc10665e09104dfb816e233393e143ab-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-7"></a> <span class="n">day</span><span class="p">:</span> <span class="nb">int</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-8" name="rest_code_bc10665e09104dfb816e233393e143ab-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-8"></a>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-9" name="rest_code_bc10665e09104dfb816e233393e143ab-9" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-9"></a><span class="n">date_parser</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">seq</span><span class="p">(</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-10" name="rest_code_bc10665e09104dfb816e233393e143ab-10" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-10"></a> <span class="n">year</span><span class="o">=</span><span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\d</span><span class="si">{4}</span><span class="s2">"</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span> <span class="o"><<</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"-"</span><span class="p">),</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-11" name="rest_code_bc10665e09104dfb816e233393e143ab-11" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-11"></a> <span class="n">month</span><span class="o">=</span><span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\d</span><span class="si">{2}</span><span class="s2">"</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span> <span class="o"><<</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"-"</span><span class="p">),</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-12" name="rest_code_bc10665e09104dfb816e233393e143ab-12" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-12"></a> <span class="n">day</span><span class="o">=</span><span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\d</span><span class="si">{2}</span><span class="s2">"</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-13" name="rest_code_bc10665e09104dfb816e233393e143ab-13" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-13"></a><span class="p">)</span><span class="o">.</span><span class="n">combine_dict</span><span class="p">(</span><span class="n">Date</span><span class="p">)</span>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-14" name="rest_code_bc10665e09104dfb816e233393e143ab-14" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-14"></a>
<a id="rest_code_bc10665e09104dfb816e233393e143ab-15" name="rest_code_bc10665e09104dfb816e233393e143ab-15" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_bc10665e09104dfb816e233393e143ab-15"></a><span class="k">assert</span> <span class="n">date_parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"2022-11-19"</span><span class="p">)</span> <span class="o">==</span> <span class="n">Date</span><span class="p">(</span><span class="n">year</span><span class="o">=</span><span class="mi">2022</span><span class="p">,</span> <span class="n">month</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">day</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span>
</pre></div>
<p>We can have alternatives using the <code class="docutils literal">|</code> operator:</p>
<div class="code"><pre class="code python"><a id="rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-1" name="rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-1"></a><span class="n">bool_parser</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"true"</span><span class="p">)</span><span class="o">.</span><span class="n">result</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span> <span class="o">|</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"false"</span><span class="p">)</span><span class="o">.</span><span class="n">result</span><span class="p">(</span><span class="kc">False</span><span class="p">)</span>
<a id="rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-2" name="rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_4763aea0c71d4c20bdc9b73b8a91f28d-2"></a><span class="k">assert</span> <span class="n">bool_parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"false"</span><span class="p">)</span> <span class="o">==</span> <span class="kc">False</span>
</pre></div>
<p>That’s enough to understand the rest of this post. Let’s have a look at my 4 different approaches to improving the static type checking story for Parsy.</p>
</section>
<section id="simple-types">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-2" role="doc-backlink">Simple types</a></h2>
<p>The most obvious thing to do is to add <code class="docutils literal">Parser</code> as the return value for a bunch of methods and operators, and other type hints wherever we can for the input arguments. For example, <code class="docutils literal">.map</code> is:</p>
<div class="code"><pre class="code python"><a id="rest_code_662795e2422843c0aee2336c26ce0ee6-1" name="rest_code_662795e2422843c0aee2336c26ce0ee6-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_662795e2422843c0aee2336c26ce0ee6-1"></a><span class="k">def</span> <span class="nf">map</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">map_function</span><span class="p">:</span> <span class="n">Callable</span><span class="p">)</span> <span class="o">-></span> <span class="n">Parser</span><span class="p">:</span>
<a id="rest_code_662795e2422843c0aee2336c26ce0ee6-2" name="rest_code_662795e2422843c0aee2336c26ce0ee6-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_662795e2422843c0aee2336c26ce0ee6-2"></a> <span class="o">...</span>
</pre></div>
<p>What type of object does the <code class="docutils literal">parse</code> method return? We don’t know, so we have to do:</p>
<div class="code"><pre class="code python"><a id="rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-1" name="rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-1"></a><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">Any</span><span class="p">:</span>
<a id="rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-2" name="rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_ed112e1c22d04833a2851e7cbc0d2f6a-2"></a> <span class="o">...</span>
</pre></div>
<p>And this is our first hint that the static type checking isn’t very useful. For example, this faulty code will now type check:</p>
<div class="code"><pre class="code python"><a id="rest_code_245d18350c53402a9260f36fc73536aa-1" name="rest_code_245d18350c53402a9260f36fc73536aa-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_245d18350c53402a9260f36fc73536aa-1"></a><span class="n">x</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="n">int_parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s2">"123"</span><span class="p">)</span>
</pre></div>
<p>We can see the return value is not going to be the right type, but our static type checker sees the <code class="docutils literal">Any</code> and allows it.</p>
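<p>Here is a minimal, self-contained sketch of the same effect. It uses a hypothetical stand-in function, <code class="docutils literal">parse_int</code>, rather than Parsy itself, and assumes nothing beyond the standard library. Annotations are not enforced at runtime, so the mismatch only shows up if you inspect the value:</p>

```python
from typing import Any


def parse_int(text: str) -> Any:
    # Hypothetical stand-in for int_parser.parse: the signature promises nothing.
    return int(text)


# A static checker accepts this assignment because Any is compatible with str.
# At runtime the annotation is ignored and x actually holds an int.
x: str = parse_int("123")
print(type(x).__name__)
```

<p>Any later code that treats <code class="docutils literal">x</code> as a string will type check too, and fail only when it runs.</p>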
<p>The type checker is also not catching <code class="docutils literal">TypeError</code> exceptions for <code class="docutils literal">.map()</code> that it definitely ought to be able to. For example, suppose we have this faulty parser which is going to attempt to construct a <code class="docutils literal">timedelta</code> from strings like <code class="docutils literal">"7 days ago"</code>:</p>
<div class="code"><pre class="code python"><a id="rest_code_e265678d0dfb4d3f815c4e822a171310-1" name="rest_code_e265678d0dfb4d3f815c4e822a171310-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e265678d0dfb4d3f815c4e822a171310-1"></a><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">timedelta</span>
<a id="rest_code_e265678d0dfb4d3f815c4e822a171310-2" name="rest_code_e265678d0dfb4d3f815c4e822a171310-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e265678d0dfb4d3f815c4e822a171310-2"></a><span class="n">days_ago_parser</span> <span class="o">=</span> <span class="p">(</span><span class="n">P</span><span class="o">.</span><span class="n">regex</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\d+"</span><span class="p">)</span> <span class="o"><<</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">" days ago"</span><span class="p">))</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">timedelta</span><span class="p">)</span>
</pre></div>
<p>This is going to fail every time with:</p>
<pre class="literal-block">TypeError: unsupported type for timedelta days component: str</pre>
<p>That’s because we forgot a <code class="docutils literal">.map(int)</code> after the <code class="docutils literal">P.regex</code> parser.</p>
<p>This kind of type error is caught for you by mypy and pyright if you try to pass a string to the <code class="docutils literal">timedelta</code> constructor, but here, we’ve got no constraints on the callable that would enable the type checker to pick it up. When we put a bare <code class="docutils literal">Callable</code> in a signature, as we did above for the <code class="docutils literal">map</code> method, we are really writing <code class="docutils literal"><span class="pre">Callable[...,</span> Any]</code>, so all proper type checking effectively gets disabled.</p>
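<p>To make the difference concrete, here is a small sketch that doesn’t involve Parsy at all. The helper names <code class="docutils literal">apply_loose</code> and <code class="docutils literal">apply_strict</code> are made up for illustration. With a parameterised <code class="docutils literal">Callable</code>, I would expect mypy or pyright to reject a call like <code class="docutils literal">apply_strict(timedelta, "7")</code>, while the loose version lets it through:</p>

```python
from datetime import timedelta
from typing import Any, Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def apply_loose(func: Callable, value: Any) -> Any:
    # Bare Callable means Callable[..., Any]: nothing links func to value.
    return func(value)


def apply_strict(func: Callable[[A], B], value: A) -> B:
    # The checker has to find an A that fits both func's parameter and value.
    return func(value)


# Accepted by a type checker, but fails when it runs.
try:
    apply_loose(timedelta, "7")
except TypeError as exc:
    error = exc

# The strict version is fine when the argument really is a number.
week = apply_strict(timedelta, 7)
```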
<p>If you want static type checking, this is not what you expect or want! It’s especially important for Parsy, because almost all the mistakes you are likely to make will be of this nature. Most Parsy code consists of parsers defined at a module level, which means that as soon as you import the module, you’ll know whether you have attempted to use combinator methods that don’t exist, for example, so there is little usefulness in a type checker being able to tell you this. What you want to know is whether you are going to get <code class="docutils literal">TypeError</code> or similar when you call the <code class="docutils literal">parse</code> method at runtime.</p>
<p>Can we achieve that?</p>
</section>
<section id="generics">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-3" role="doc-backlink">Generics</a></h2>
<p>The answer is to make the <code class="docutils literal">Parser</code> type aware of what kind of object it is going to output. This can be achieved by parameterising the type of <code class="docutils literal">Parser</code> with a generic type, and is the second main approach.</p>
<p>Very often <a class="reference external" href="https://mypy.readthedocs.io/en/stable/generics.html">generics</a> are used for homogeneous containers, to capture the type of the object they contain. Here, we don’t have a container as such. We are instead capturing the type of the object that the parser instance is going to produce when you call <code class="docutils literal">.parse()</code> (assuming it succeeds; I’m ignoring all failure cases).</p>
<p>Some of the key type signatures in our <code class="docutils literal">Parser</code> code now look like this (lots of details elided):</p>
<div class="code"><pre class="code python"><a id="rest_code_a36b4434c61649be8ca1e16419f0f601-1" name="rest_code_a36b4434c61649be8ca1e16419f0f601-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-1"></a><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">Generic</span><span class="p">,</span> <span class="n">TypeVar</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-2" name="rest_code_a36b4434c61649be8ca1e16419f0f601-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-2"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-3" name="rest_code_a36b4434c61649be8ca1e16419f0f601-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-3"></a><span class="n">OUT</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"OUT"</span><span class="p">)</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-4" name="rest_code_a36b4434c61649be8ca1e16419f0f601-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-4"></a><span class="n">OUT1</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"OUT1"</span><span class="p">)</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-5" name="rest_code_a36b4434c61649be8ca1e16419f0f601-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-5"></a><span class="n">OUT2</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"OUT2"</span><span class="p">)</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-6" name="rest_code_a36b4434c61649be8ca1e16419f0f601-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-6"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-7" name="rest_code_a36b4434c61649be8ca1e16419f0f601-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-7"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-8" name="rest_code_a36b4434c61649be8ca1e16419f0f601-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-8"></a><span class="nd">@dataclass</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-9" name="rest_code_a36b4434c61649be8ca1e16419f0f601-9" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-9"></a><span class="k">class</span> <span class="nc">Result</span><span class="p">(</span><span class="n">Generic</span><span class="p">[</span><span class="n">OUT</span><span class="p">]):</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-10" name="rest_code_a36b4434c61649be8ca1e16419f0f601-10" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-10"></a> <span class="n">value</span><span class="p">:</span> <span class="n">OUT</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-11" name="rest_code_a36b4434c61649be8ca1e16419f0f601-11" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-11"></a> <span class="o">...</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-12" name="rest_code_a36b4434c61649be8ca1e16419f0f601-12" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-12"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-13" name="rest_code_a36b4434c61649be8ca1e16419f0f601-13" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-13"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-14" name="rest_code_a36b4434c61649be8ca1e16419f0f601-14" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-14"></a><span class="k">class</span> <span class="nc">Parser</span><span class="p">(</span><span class="n">Generic</span><span class="p">[</span><span class="n">OUT</span><span class="p">]):</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-15" name="rest_code_a36b4434c61649be8ca1e16419f0f601-15" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-15"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-16" name="rest_code_a36b4434c61649be8ca1e16419f0f601-16" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-16"></a> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stream</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="n">OUT</span><span class="p">:</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-17" name="rest_code_a36b4434c61649be8ca1e16419f0f601-17" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-17"></a> <span class="o">...</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-18" name="rest_code_a36b4434c61649be8ca1e16419f0f601-18" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-18"></a>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-19" name="rest_code_a36b4434c61649be8ca1e16419f0f601-19" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-19"></a> <span class="k">def</span> <span class="nf">map</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT1</span><span class="p">],</span> <span class="n">map_function</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[</span><span class="n">OUT1</span><span class="p">],</span> <span class="n">OUT2</span><span class="p">])</span> <span class="o">-></span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT2</span><span class="p">]:</span>
<a id="rest_code_a36b4434c61649be8ca1e16419f0f601-20" name="rest_code_a36b4434c61649be8ca1e16419f0f601-20" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_a36b4434c61649be8ca1e16419f0f601-20"></a> <span class="o">...</span>
</pre></div>
<p>The main point is that we are capturing the type of output using a type parameter that our static type checker can track, which means that we can indeed catch the type errors that were ignored by the previous approach. I got this to work, and you can see my results in <a class="reference external" href="https://github.com/python-parsy/parsy/pull/58">this PR</a> (not merged).</p>
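<p>For readers who want something they can run, below is a stripped-down sketch of the idea. It is not the code from the PR: the <code class="docutils literal">run</code> field, the <code class="docutils literal">digits</code> primitive and the error handling are all simplifications I’ve assumed for illustration. The point is only that the <code class="docutils literal">OUT</code> parameter follows the value through <code class="docutils literal">.map</code>:</p>

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

OUT = TypeVar("OUT")
OUT2 = TypeVar("OUT2")


@dataclass
class Parser(Generic[OUT]):
    # Hypothetical representation: a function from (text, index) to (value, new index).
    run: Callable[[str, int], tuple[OUT, int]]

    def parse(self, stream: str) -> OUT:
        value, index = self.run(stream, 0)
        if index != len(stream):
            raise ValueError(f"unconsumed input at index {index}")
        return value

    def map(self, map_function: Callable[[OUT], OUT2]) -> "Parser[OUT2]":
        def run(stream: str, index: int) -> tuple[OUT2, int]:
            value, new_index = self.run(stream, index)
            return map_function(value), new_index

        return Parser(run)


def digits() -> Parser[str]:
    # Hypothetical primitive: consume one or more of the characters 0-9.
    def run(stream: str, index: int) -> tuple[str, int]:
        end = index
        for char in stream[index:]:
            if char not in "0123456789":
                break
            end += 1
        if end == index:
            raise ValueError(f"expected a digit at index {index}")
        return stream[index:end], end

    return Parser(run)


int_parser = digits().map(int)  # a checker should infer Parser[int] here
number = int_parser.parse("123")
```

<p>With signatures like these, a checker should flag <code class="docutils literal">digits().map(timedelta)</code>, because <code class="docutils literal">timedelta</code> does not accept the <code class="docutils literal">str</code> that <code class="docutils literal">digits()</code> produces.</p>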
<p>The reason it isn’t merged is that this approach breaks down as soon as you have things like <code class="docutils literal">*args</code> or <code class="docutils literal">**kwargs</code> where the arguments need to be of different types. We have exactly that, multiple times, once you care about generics. For example, the <a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/methods_and_combinators.html#parsy.seq">seq</a> combinator takes a sequence of parsers as input and runs them in order, collecting their results. All of them are <code class="docutils literal">Parser</code> instances, so that would work fine with the previous approach, but they could all have different output types. There is no way to specify a type signature for this, nor for <a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/methods_and_combinators.html#parsy.alt">alt</a>, <a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/methods_and_combinators.html#parsy.Parser.combine">combine</a> and <a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/methods_and_combinators.html#parsy.Parser.combine_dict">combine_dict</a>.</p>
<p>The best you can do is specify that they return <code class="docutils literal">Parser[Any]</code>. This means you are downgrading to no type checking. This problem is going to apply to all but the most trivial cases – it’s difficult to come up with many real world examples where you don’t need sequencing.</p>
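<p>A rough sketch of what that downgrade looks like, again with a hypothetical minimal <code class="docutils literal">Parser</code> and a made-up <code class="docutils literal">take</code> primitive rather than Parsy’s real code. Two parsers with precise, different output types go into <code class="docutils literal">seq</code>, and the only thing the signature can promise about the result is <code class="docutils literal">list[Any]</code>:</p>

```python
from typing import Any, Callable, Generic, TypeVar

OUT = TypeVar("OUT")


class Parser(Generic[OUT]):
    # Hypothetical minimal parser: wraps a function from (text, index) to (value, new index).
    def __init__(self, run: Callable[[str, int], tuple[OUT, int]]) -> None:
        self.run = run

    def parse(self, stream: str) -> OUT:
        value, _ = self.run(stream, 0)
        return value


def take(count: int, convert: Callable[[str], OUT]) -> Parser[OUT]:
    # Hypothetical primitive: consume `count` characters and convert them.
    def run(stream: str, index: int) -> tuple[OUT, int]:
        end = index + count
        return convert(stream[index:end]), end

    return Parser(run)


def seq(*parsers: Parser[Any]) -> Parser[list[Any]]:
    # *parsers forces a single element type for every argument, so the
    # individual output types (int, str, ...) collapse to Any.
    def run(stream: str, index: int) -> tuple[list[Any], int]:
        values = []
        for parser in parsers:
            value, index = parser.run(stream, index)
            values.append(value)
        return values, index

    return Parser(run)


pair = seq(take(2, int), take(3, str))  # Parser[int] and Parser[str] go in...
result = pair.parse("42abc")            # ...but only list[Any] comes out.
```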
<p>Some people would say “well, it works sometimes, so it is better than nothing”. The problem is that when you start writing your parser, you may well really benefit from the type checking and start to lean on it. Then, as soon as you get beyond the level of your simple (single part) objects and are creating parsers for more complex (multiple part) objects in your language, the type checking silently disappears. Or, if you have strictness turned up high, your type checker will complain about the introduction of <code class="docutils literal">Any</code>, but you won’t be able to do anything about it.</p>
<p>Both of these are really bad developer UX in my opinion. If the type checker is going to give up and go home at 2pm on the second day of work, it would be better for it not to show up, which would push developers to lean on other, more reliable methods, like writing tests.</p>
</section>
<section id="typed-parsy-fork">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-4" role="doc-backlink">Typed Parsy fork</a></h2>
<p>So we come to my third option, which builds on the second. Can we redesign Parsy so that it doesn’t have any <code class="docutils literal">Any</code>? This would be a backwards incompatible fork that removes any API that is impossible to fully type, and attempts to provide some good enough replacements.</p>
<p>This was my most ambitious foray into static type checking in Python, and below are my notes from two perspectives – first, implementation, which is important for any potential future contributors and maintainers, and second, usage, for the people who actually might want to use this fork.</p>
<section id="implementation-perspective">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-5" role="doc-backlink">Implementation perspective</a></h3>
<p>I didn’t complete this for reasons that will become clear. Overall, I’d say working “alongside” mypy and pyright was quite nice at points, and other times really difficult. To keep this article short, I’ve moved most of this section to footnotes. Here are the bullet points:</p>
<ul class="simple">
<li><p>You can see my results in the <a class="reference external" href="https://github.com/python-parsy/typed-parsy">typed-parsy</a> repo, especially the <a class="reference external" href="https://github.com/python-parsy/typed-parsy/blob/master/src/parsy/__init__.py">single source file</a>.</p></li>
<li><p>I dropped support for anything but <code class="docutils literal">str</code> as input type, as a simplification.</p></li>
<li><p>I discovered that pyright can really shine in various places that mypy is lacking, particularly error messages.</p></li>
<li><p>But sometimes they fight each other. <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#mypy-pyright-fight" id="footnote-reference-2" role="doc-noteref"><span class="fn-bracket">[</span>2<span class="fn-bracket">]</span></a></p></li>
<li><p>I couldn’t work out how Protocols work with respect to operators and dunder methods. <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#protocols" id="footnote-reference-3" role="doc-noteref"><span class="fn-bracket">[</span>3<span class="fn-bracket">]</span></a></p></li>
<li><p>Covariance is tricky, and you have to understand it. <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#covariance" id="footnote-reference-4" role="doc-noteref"><span class="fn-bracket">[</span>4<span class="fn-bracket">]</span></a></p></li>
<li><p><a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/primitives.html#parsy.forward_declaration">forward_declaration</a> made my head explode. <a class="footnote-reference brackets" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#forward-declaration-1" id="footnote-reference-5" role="doc-noteref"><span class="fn-bracket">[</span>5<span class="fn-bracket">]</span></a></p></li>
<li><p>There are lots of places marked <code class="docutils literal">TODO</code> where I just couldn’t solve the new problems I had, even after getting rid of the most problematic code, and I had to give up and do <code class="docutils literal">type: ignore</code> quite a few times.</p></li>
</ul>
<section id="overall">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-6" role="doc-backlink">Overall</a></h4>
<p>Too much of this was just too hard, especially given we are only talking about a few hundred lines of code. It seemed much worse than doing the same thing in Haskell for some reason. This might be just because the language wasn’t designed for it. Even just the syntax for types is significantly worse.</p>
<p>I think another issue is that there is no REPL for type level work. Normally when I’m trying to debug something, <a class="reference external" href="https://lukeplant.me.uk/blog/posts/repl-python-programming-and-debugging-with-ipython/">I jump into a REPL</a>. Working with actual, concrete values is so much easier, and so much closer to the <a class="reference external" href="https://www.youtube.com/watch?v=PUv66718DII">immediate connection</a> that makes programming enjoyable.</p>
<p>An additional problem is that static type checkers have to worry about issues that may not be relevant to my code.</p>
<p>Finally, the type system we have right now for Python is so far behind what Python can actually express. But it isn’t necessarily obvious when this is the case. The answer to “why doesn’t this work” is anywhere between, “I made a dumb mistake”, “I need to learn more”, “there’s a bug in mypy” and “that’s impossible (at the moment)”.</p>
<p>As someone who needs to worry about future contributors and maintainers, these are serious issues. In addition to getting code to work, contributors would also have to get the type checks to pass, and ensure they weren’t breaking type checking for users, which is an extra burden.</p>
</section>
</section>
<section id="using-it">
<h3><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-7" role="doc-backlink">Using it</a></h3>
<p>So much for the pains of implementing typed-parsy. What would it look like for a user?</p>
<p>First, it happens that for a lot of typical usage, the user wouldn’t need to worry about types or adding type hints at all, but would still get type checking, which is great.</p>
<p>Second, for the resulting code, mypy and pyright do a very good job of checking almost every type error you would normally make in your parsers. The few places where we lose type safety are limited and don’t result in <code class="docutils literal">Any</code> escaping and trashing everything from then on.</p>
<p>However, if you do need to write type signatures, which you probably will if you have mypy settings turned up high and you want to make your own combinator functions (i.e. something that takes a Parser and returns a new Parser), which is fairly common, you’re going to need to understand a lot to create type hints that are both correct and useful.</p>
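<p>As a flavour of what such a signature involves, here is a sketch of a user-defined combinator. The <code class="docutils literal">Parser</code> class, <code class="docutils literal">repeat</code> and <code class="docutils literal">digit</code> are all hypothetical stand-ins I’ve written for this example, not the typed-parsy API. Even in this simple case you need a <code class="docutils literal">TypeVar</code> to connect the input parser’s output type to the result:</p>

```python
from typing import Callable, Generic, TypeVar

OUT = TypeVar("OUT")


class Parser(Generic[OUT]):
    # Hypothetical minimal parser, standing in for the real Parser class.
    def __init__(self, run: Callable[[str, int], tuple[OUT, int]]) -> None:
        self.run = run

    def parse(self, stream: str) -> OUT:
        value, _ = self.run(stream, 0)
        return value


def repeat(parser: Parser[OUT], count: int) -> Parser[list[OUT]]:
    # A user-defined combinator: the TypeVar ties the element type of the
    # resulting list to the output type of the parser passed in.
    def run(stream: str, index: int) -> tuple[list[OUT], int]:
        values: list[OUT] = []
        for _ in range(count):
            value, index = parser.run(stream, index)
            values.append(value)
        return values, index

    return Parser(run)


def digit() -> Parser[int]:
    # Hypothetical primitive: read a single character as an int.
    def run(stream: str, index: int) -> tuple[int, int]:
        return int(stream[index]), index + 1

    return Parser(run)


three_digits = repeat(digit(), 3)  # Parser[list[int]]
digits = three_digits.parse("407")
```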
<p>In addition, to achieve all this, we had to make some big sacrifices:</p>
<section id="sequences">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-8" role="doc-backlink">Sequences</a></h4>
<p>You can’t implement <code class="docutils literal">seq</code>, <code class="docutils literal">alt</code>, <code class="docutils literal">.combine</code> or <code class="docutils literal">.combine_dict</code> in a type safe way (without degrading everything to <code class="docutils literal">Parser[Any]</code> from then on), and I had to remove them.</p>
<p>The biggest issue is <a class="reference external" href="https://parsy.readthedocs.io/en/latest/ref/methods_and_combinators.html?highlight=seq#parsy.seq">seq</a>, and especially the convenience of the keyword argument version to name things. The alternative I came up with – using the <code class="docutils literal">&</code> operator to create a tuple of two results – does work, but turns out to be pretty ugly.</p>
<p>Below are some incomplete extracts from the <a class="reference external" href="https://parsy.readthedocs.io/en/latest/howto/other_examples.html#sql-select-statement-parser">SQL SELECT example</a>, which illustrate the readability loss fairly well. We have some enum types and dataclasses to hold Abstract Syntax Tree nodes for a SQL parser:</p>
<div class="code"><pre class="code python"><a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-1" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-1"></a><span class="k">class</span> <span class="nc">Operator</span><span class="p">(</span><span class="n">enum</span><span class="o">.</span><span class="n">Enum</span><span class="p">):</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-2" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-2"></a> <span class="n">EQ</span> <span class="o">=</span> <span class="s2">"="</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-3" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-3"></a> <span class="n">LT</span> <span class="o">=</span> <span class="s2">"<"</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-4" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-4"></a> <span class="n">GT</span> <span class="o">=</span> <span class="s2">">"</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-5" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-5"></a> <span class="n">LTE</span> <span class="o">=</span> <span class="s2">"<="</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-6" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-6"></a> <span class="n">GTE</span> <span class="o">=</span> <span class="s2">">="</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-7" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-7"></a>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-8" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-8"></a><span class="nd">@dataclass</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-9" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-9" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-9"></a><span class="k">class</span> <span class="nc">Comparison</span><span class="p">:</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-10" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-10" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-10"></a> <span class="n">left</span><span class="p">:</span> <span class="n">ColumnExpression</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-11" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-11" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-11"></a> <span class="n">operator</span><span class="p">:</span> <span class="n">Operator</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-12" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-12" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-12"></a> <span class="n">right</span><span class="p">:</span> <span class="n">ColumnExpression</span>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-13" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-13" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-13"></a>
<a id="rest_code_b7f371fd4ef946c2a29bfc78476822c1-14" name="rest_code_b7f371fd4ef946c2a29bfc78476822c1-14" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_b7f371fd4ef946c2a29bfc78476822c1-14"></a><span class="c1"># dataclass for Select etc</span>
</pre></div>
<p>We then have a bunch of parsers for different components, which we assemble into larger parsers for bigger things, like <code class="docutils literal">Comparison</code> or <code class="docutils literal">Select</code>. With normal parsy it looks like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_8d1842939dbb407299aa71282a5ffe41-1" name="rest_code_8d1842939dbb407299aa71282a5ffe41-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-1"></a><span class="n">comparison</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">seq</span><span class="p">(</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-2" name="rest_code_8d1842939dbb407299aa71282a5ffe41-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-2"></a> <span class="n">left</span><span class="o">=</span><span class="n">column_expr</span> <span class="o"><<</span> <span class="n">padding</span><span class="p">,</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-3" name="rest_code_8d1842939dbb407299aa71282a5ffe41-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-3"></a> <span class="n">operator</span><span class="o">=</span><span class="n">P</span><span class="o">.</span><span class="n">from_enum</span><span class="p">(</span><span class="n">Operator</span><span class="p">),</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-4" name="rest_code_8d1842939dbb407299aa71282a5ffe41-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-4"></a> <span class="n">right</span><span class="o">=</span><span class="n">padding</span> <span class="o">>></span> <span class="n">column_expr</span><span class="p">,</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-5" name="rest_code_8d1842939dbb407299aa71282a5ffe41-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-5"></a><span class="p">)</span><span class="o">.</span><span class="n">combine_dict</span><span class="p">(</span><span class="n">Comparison</span><span class="p">)</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-6" name="rest_code_8d1842939dbb407299aa71282a5ffe41-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-6"></a>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-7" name="rest_code_8d1842939dbb407299aa71282a5ffe41-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-7"></a><span class="n">SELECT</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"SELECT"</span><span class="p">)</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-8" name="rest_code_8d1842939dbb407299aa71282a5ffe41-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-8"></a><span class="n">FROM</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"FROM"</span><span class="p">)</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-9" name="rest_code_8d1842939dbb407299aa71282a5ffe41-9" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-9"></a><span class="n">WHERE</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">"WHERE"</span><span class="p">)</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-10" name="rest_code_8d1842939dbb407299aa71282a5ffe41-10" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-10"></a>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-11" name="rest_code_8d1842939dbb407299aa71282a5ffe41-11" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-11"></a><span class="n">select</span> <span class="o">=</span> <span class="n">P</span><span class="o">.</span><span class="n">seq</span><span class="p">(</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-12" name="rest_code_8d1842939dbb407299aa71282a5ffe41-12" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-12"></a> <span class="n">_select</span><span class="o">=</span><span class="n">SELECT</span> <span class="o">+</span> <span class="n">space</span><span class="p">,</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-13" name="rest_code_8d1842939dbb407299aa71282a5ffe41-13" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-13"></a> <span class="n">columns</span><span class="o">=</span><span class="n">column_expr</span><span class="o">.</span><span class="n">sep_by</span><span class="p">(</span><span class="n">padding</span> <span class="o">+</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span> <span class="o">+</span> <span class="n">padding</span><span class="p">,</span> <span class="nb">min</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-14" name="rest_code_8d1842939dbb407299aa71282a5ffe41-14" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-14"></a> <span class="n">_from</span><span class="o">=</span><span class="n">space</span> <span class="o">+</span> <span class="n">FROM</span> <span class="o">+</span> <span class="n">space</span><span class="p">,</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-15" name="rest_code_8d1842939dbb407299aa71282a5ffe41-15" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-15"></a> <span class="n">table</span><span class="o">=</span><span class="n">table</span><span class="p">,</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-16" name="rest_code_8d1842939dbb407299aa71282a5ffe41-16" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-16"></a> <span class="n">where</span><span class="o">=</span><span class="p">(</span><span class="n">space</span> <span class="o">>></span> <span class="n">WHERE</span> <span class="o">>></span> <span class="n">space</span> <span class="o">>></span> <span class="n">comparison</span><span class="p">)</span><span class="o">.</span><span class="n">optional</span><span class="p">(),</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-17" name="rest_code_8d1842939dbb407299aa71282a5ffe41-17" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-17"></a> <span class="n">_end</span><span class="o">=</span><span class="n">padding</span> <span class="o">+</span> <span class="n">P</span><span class="o">.</span><span class="n">string</span><span class="p">(</span><span class="s2">";"</span><span class="p">),</span>
<a id="rest_code_8d1842939dbb407299aa71282a5ffe41-18" name="rest_code_8d1842939dbb407299aa71282a5ffe41-18" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_8d1842939dbb407299aa71282a5ffe41-18"></a><span class="p">)</span><span class="o">.</span><span class="n">combine_dict</span><span class="p">(</span><span class="n">Select</span><span class="p">)</span>
</pre></div>
<p>There are some things you need to understand: <code class="docutils literal">seq</code> runs a sequence of parsers in order, and with its keyword arguments version allows you to give names to each one, to produce a dictionary of results. <code class="docutils literal">.combine_dict</code> then passes these to a callable using <code class="docutils literal">**kwargs</code> syntax.</p>
<p><code class="docutils literal">.combine_dict</code> also has a neat trick of skipping items whose names start with
underscores, to allow you to deal with things that you need to parse but want to
discard, like <code class="docutils literal">_select</code>, <code class="docutils literal">_from</code> and <code class="docutils literal">_end</code> above. Notice how easy it is
to read the <code class="docutils literal">select</code> parser and see what things we are picking out.</p>
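<p>The behaviour described above can be imitated in plain Python. This is only a sketch of the idea, not Parsy’s implementation: <code class="docutils literal">combine_dict_sketch</code>, the simplified <code class="docutils literal">Select</code> dataclass and the dictionary values are all made up for illustration:</p>
<div class="code"><pre class="code python">from dataclasses import dataclass
from typing import Optional


@dataclass
class Select:
    columns: list
    table: str
    where: Optional[str] = None


def combine_dict_sketch(results: dict, fn):
    # Drop the entries whose names start with an underscore...
    kwargs = {name: value for name, value in results.items() if not name.startswith("_")}
    # ...and pass the rest to the callable as keyword arguments.
    return fn(**kwargs)


# The kind of dictionary a keyword-argument seq() might produce (values invented).
parsed = {
    "_select": "SELECT ",
    "columns": ["a", "b"],
    "_from": " FROM ",
    "table": "t",
    "where": None,
    "_end": ";",
}
select_node = combine_dict_sketch(parsed, Select)
</pre></div>
<p>The names are what make this readable, and they are also what a static type checker struggles with: the dictionary keys and the keyword arguments of the callable are only matched up at runtime.</p>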
<p>With typed-parsy, this was the best I could do, formatted using Black:</p>
<div class="code"><pre class="code python"><a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-1" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-1"></a><span class="n">comparison</span> <span class="o">=</span> <span class="p">(</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-2" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-2"></a> <span class="p">(</span><span class="n">column_expr</span> <span class="o"><<</span> <span class="n">padding</span><span class="p">)</span> <span class="o">&</span> <span class="n">P</span><span class="o">.</span><span class="n">from_enum</span><span class="p">(</span><span class="n">Operator</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">padding</span> <span class="o">>></span> <span class="n">column_expr</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-3" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-3"></a><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">t</span><span class="p">:</span> <span class="n">Comparison</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">operator</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">],</span> <span class="n">right</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-4" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-4"></a>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-5" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-5"></a><span class="n">SELECT</span> <span class="o">=</span> <span class="n">string</span><span class="p">(</span><span class="s2">"SELECT"</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-6" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-6"></a><span class="n">FROM</span> <span class="o">=</span> <span class="n">string</span><span class="p">(</span><span class="s2">"FROM"</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-7" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-7"></a><span class="n">WHERE</span> <span class="o">=</span> <span class="n">string</span><span class="p">(</span><span class="s2">"WHERE"</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-8" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-8"></a>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-9" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-9" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-9"></a><span class="n">select</span> <span class="o">=</span> <span class="p">(</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-10" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-10" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-10"></a> <span class="p">(</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-11" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-11" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-11"></a> <span class="p">(</span><span class="n">SELECT</span> <span class="o">+</span> <span class="n">space</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-12" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-12" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-12"></a> <span class="o">>></span> <span class="p">(</span><span class="n">column_expr</span><span class="o">.</span><span class="n">sep_by</span><span class="p">(</span><span class="n">padding</span> <span class="o">+</span> <span class="n">string</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span> <span class="o">+</span> <span class="n">padding</span><span class="p">,</span> <span class="nb">min</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-13" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-13" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-13"></a> <span class="o"><<</span> <span class="p">(</span><span class="n">space</span> <span class="o">+</span> <span class="n">FROM</span> <span class="o">+</span> <span class="n">space</span><span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-14" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-14" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-14"></a> <span class="p">)</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-15" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-15" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-15"></a> <span class="o">&</span> <span class="n">table</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-16" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-16" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-16"></a> <span class="o">&</span> <span class="p">((</span><span class="n">space</span> <span class="o">>></span> <span class="n">WHERE</span> <span class="o">>></span> <span class="n">space</span> <span class="o">>></span> <span class="n">comparison</span><span class="p">)</span><span class="o">.</span><span class="n">optional</span><span class="p">()</span> <span class="o"><<</span> <span class="p">(</span><span class="n">padding</span> <span class="o">+</span> <span class="n">string</span><span class="p">(</span><span class="s2">";"</span><span class="p">)))</span>
<a id="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-17" name="rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-17" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_e2c8a92c062c4a04af4c35ab6421bcf4-17"></a><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">t</span><span class="p">:</span> <span class="n">Select</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">table</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">1</span><span class="p">],</span> <span class="n">where</span><span class="o">=</span><span class="n">t</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
</pre></div>
<p>This is a pretty massive regression. We can’t use commas to separate parts as before, and we can’t use keyword arguments to name components any more. Instead we have to use tuples, and when we end up with nested tuples it’s awful – I literally couldn’t work out how to write the tuple indexing correctly and had to just keep guessing until I got it right. And that’s just with 3 items. Some parsers might have many more items in a sequence, each of which adds another level of nesting.</p>
<p>This might have been significantly better if we still had tuple unpacking within function/lambda signatures (which was removed in Python 3 by PEP 3113), but it would still not be very nice.</p>
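<p>One partial workaround is to replace the lambda with a named function that unpacks the nested tuple in its body. The sketch below is self-contained and uses plain strings in place of the real AST node types. The name <code class="docutils literal">make_comparison</code> is invented for illustration:</p>
<div class="code"><pre class="code python">from dataclasses import dataclass


@dataclass
class Comparison:
    left: str
    operator: str
    right: str


def make_comparison(t: tuple[tuple[str, str], str]) -> Comparison:
    # Unpack the nested tuple once, by shape, instead of indexing t[0][0], t[0][1], t[1].
    (left, operator), right = t
    return Comparison(left=left, operator=operator, right=right)


node = make_comparison((("a", "="), "b"))
</pre></div>
<p>This avoids the index guessing, but it costs a separate function with a fairly noisy signature for every sequence, so it only softens the problem.</p>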
<p>It is kind of impressive that mypy and pyright will handle all this and tell you about type violations very reliably. This is possible because they support indexing into tuples, i.e. in the above statements they can tell you what the types of <code class="docutils literal">t[0]</code>, <code class="docutils literal"><span class="pre">t[0][1]</span></code> etc. are. In an IDE, tools like pyright will tell you what you need to supply for <code class="docutils literal">.map()</code> – for example for the <code class="docutils literal">select</code> statement, inside the final <code class="docutils literal">.map</code> call:</p>
<pre class="literal-block">(map_fn: (tuple[tuple[list[Field | String | Number], Table], Comparison | None]) -> OUT2@map) -> Parser[OUT2@map]</pre>
<p>But this isn’t my idea of developer friendly. The loss of readability is huge, even for simple cases.</p>
<p>For comparison, I looked at <a class="reference external" href="https://funcparserlib.pirx.ru/">funcparserlib</a>, the Python parsing library closest to Parsy. They claim “fully typed”, but it turns out that their sequencing operator, which returns only tuples and so isn’t as usable as <code class="docutils literal">seq</code>, flattens nested tuples. This is much better for usability, but is also impossible to type, so they introduce <code class="docutils literal">Any</code> <a class="reference external" href="https://github.com/vlasovskikh/funcparserlib/blob/5af4f8cc445d3f919590b8729430c576d4426917/funcparserlib/parser.pyi#L66">at this point</a>, and so lose type checking, the thing I’ve been trying to avoid.</p>
<p><strong>UPDATE 2022-11-24:</strong> I had another idea about how to approach this. Parsy could provide <code class="docutils literal">seq2</code>, <code class="docutils literal">seq3</code>, <code class="docutils literal">seq4</code> etc. combinators which return 2-tuples, 3-tuples, 4-tuples etc. These functions, which would have to be implemented the long way, would handle the nested tuple unpacking for you, without losing type safety. As an overload, they could also optionally include the functionality of <code class="docutils literal">.combine</code> without loss of safety, and in this way you would get close to the usability of the <code class="docutils literal"><span class="pre">seq(*args)</span></code> version, with just the annoyance that you have to change the function if you want to add another argument. This still wouldn’t give you the usability of the keyword argument version of <code class="docutils literal">seq</code>, but it would probably be a significant improvement.</p>
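<p>As a rough sketch of what a <code class="docutils literal">seq3</code> written “the long way” could look like, here is a toy version. The <code class="docutils literal">Parser</code> class and <code class="docutils literal">string</code> helper below are minimal stand-ins invented for this example, not Parsy’s real classes, and the annotations assume Python 3.9 or later:</p>
<div class="code"><pre class="code python">from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")
T = TypeVar("T")


@dataclass
class Parser(Generic[T]):
    # Toy parser: takes (text, index), returns (value, new_index) or None on failure.
    fn: Callable[[str, int], Optional[tuple[T, int]]]

    def parse(self, text: str) -> T:
        result = self.fn(text, 0)
        if result is None or result[1] != len(text):
            raise ValueError("parse failed")
        return result[0]


def string(expected: str) -> Parser[str]:
    def fn(text: str, index: int) -> Optional[tuple[str, int]]:
        if text.startswith(expected, index):
            return expected, index + len(expected)
        return None

    return Parser(fn)


def seq3(p1: Parser[A], p2: Parser[B], p3: Parser[C]) -> Parser[tuple[A, B, C]]:
    # One explicit step per parser, producing a flat 3-tuple rather than nested pairs.
    def fn(text: str, index: int) -> Optional[tuple[tuple[A, B, C], int]]:
        r1 = p1.fn(text, index)
        if r1 is None:
            return None
        r2 = p2.fn(text, r1[1])
        if r2 is None:
            return None
        r3 = p3.fn(text, r2[1])
        if r3 is None:
            return None
        return (r1[0], r2[0], r3[0]), r3[1]

    return Parser(fn)


result = seq3(string("a"), string("="), string("b")).parse("a=b")
</pre></div>
<p>Each arity needs its own function with its own set of type variables, which is the annoyance mentioned above, but the result type stays a flat and fully typed tuple.</p>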
</section>
<section id="error-messages">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-9" role="doc-backlink">Error messages</a></h4>
<p>If you are relying on types to fix you up, instead of readable code, then you depend on the error messages your type checker will emit. Below is an example of an error I got when attempting to port the example code in <code class="docutils literal">simple_logo_lexer.py</code>, which has a function <code class="docutils literal">flatten_list</code>:</p>
<pre class="literal-block">Argument 1 to "map" of "Parser" has incompatible type
"Callable[[List[List[T]]], List[T]]"; expected
"Callable[[List[Tuple[Tuple[str, int], str]]], List[T]]"</pre>
<p>Here was another one I hit:</p>
<pre class="literal-block">Argument 1 to "map" of "Parser" has incompatible type
"Callable[[Tuple[List[List[Tuple[List[List[OUT]], List[OUT]]]],
List[Tuple[List[List[OUT]], List[OUT]]]]], List[List[Tuple[List[List[OUT]],
List[OUT]]]]]";
expected "Callable[[Tuple[List[List[Tuple[List[List[OUT]],
List[OUT]]]], List[Tuple[List[List[OUT]], List[OUT]]]]], List[OUT]]"</pre>
<p>I can’t remember what mistake produced that, but I did notice that the code worked perfectly at runtime. Also, pyright didn’t complain, only mypy. This is the kind of thing that makes people hate static typing.</p>
<p>One of the main principles of parsy is that it should be very easy to pull data into appropriate containers where every field is <strong>named</strong>, as well as <strong>typed</strong> – rather than a parse that returns lists of nested lists and tuples and dicts. This is why <code class="docutils literal">namedtuple</code> is a big improvement over <code class="docutils literal">tuple</code>, and <a class="reference external" href="https://docs.python.org/3/library/dataclasses.html">dataclasses</a> are a big step up again. But at the level of types and error messages, it seems we are back in the dark ages, with nested tuples and lists everywhere.</p>
</section>
<section id="generate-decorator">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-10" role="doc-backlink">@generate decorator</a></h4>
<p>Being able to add conditional logic and control flow into parsers is really important for some cases, and Parsy has an elegant solution in the form of the <a class="reference external" href="https://parsy.readthedocs.io/en/latest/tutorial.html#using-previously-parsed-values">@generate decorator</a>. Getting this to work in typed-parsy turned out to be only partially possible.</p>
<p>The first issue is that, unlike other ways of building up parsers, the user will need to write a type signature to get type checking, and it’s complex enough that there is no reasonable way for someone to understand what type signature they need without looking up the docs, which is poor usability.</p>
<p>Having done so, they can get type checking on the return type of the parser, and code that uses that parser. However, they get no type checking related to parsers used within the function. The <code class="docutils literal">Generator</code> type assumes a homogeneous stream of yield and send types, whereas we have pairs of yield/send types which need to match within the pair, but each pair can be completely different from the next in the stream.</p>
<p>Since you can’t sacrifice <code class="docutils literal">@generate</code> without major loss of functionality/usability, you have to live with the fact that you do not have type safety in the body of a <code class="docutils literal">@generate</code> function.</p>
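<p>The mismatch can be seen in a toy model. The code below is not Parsy’s <code class="docutils literal">@generate</code>: the <code class="docutils literal">number</code> and <code class="docutils literal">word</code> functions and the <code class="docutils literal">run</code> driver are invented stand-ins, used only to show why the single send type in <code class="docutils literal">Generator</code> ends up as <code class="docutils literal">Any</code>:</p>
<div class="code"><pre class="code python">from typing import Any, Callable, Generator

# Toy "parser": takes the remaining text, returns (value, rest of text).
ToyParser = Callable[[str], tuple[Any, str]]


def number(text: str) -> tuple[int, str]:
    digits = ""
    for ch in text:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits), text[len(digits):]


def word(text: str) -> tuple[str, str]:
    letters = ""
    for ch in text:
        if not ch.isalpha():
            break
        letters += ch
    return letters, text[len(letters):]


def number_then_word() -> Generator[ToyParser, Any, tuple[int, str]]:
    # Generator[YieldType, SendType, ReturnType] has one SendType for every yield,
    # so it has to be Any (or a union) to cover both an int and a str.
    n = yield number  # sent back an int at runtime, but typed as Any
    w = yield word    # sent back a str at runtime, but typed as Any
    return n, w


def run(gen_fn: Callable[[], Generator[ToyParser, Any, Any]], text: str) -> Any:
    # Drive the generator: apply each yielded parser and send its value back in.
    gen = gen_fn()
    try:
        parser = next(gen)
        while True:
            value, text = parser(text)
            parser = gen.send(value)
    except StopIteration as stop:
        return stop.value


result = run(number_then_word, "42abc")
</pre></div>
<p>The declared return type is still checked, but nothing ties the type of <code class="docutils literal">n</code> to <code class="docutils literal">number</code> or the type of <code class="docutils literal">w</code> to <code class="docutils literal">word</code>.</p>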
</section>
<section id="overall-1">
<h4><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-11" role="doc-backlink">Overall</a></h4>
<p>This did not feel like an upgrade for a user, but rather like a pretty big downgrade. typed-parsy was definitely going to be worse than parsy, so I stopped working on it.</p>
<p>Which brings me to my last approach:</p>
</section>
</section>
</section>
<section id="types-for-documentation">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-12" role="doc-backlink">Types for documentation</a></h2>
<p>At some point along the way, I noticed that for the original version of parsy, with no type hints at all, my language server (pyright) was able to correctly infer return types of all the methods that returned <code class="docutils literal">Parser</code> instances. The types it was inferring were the same as I would have added in the very first section, simple <code class="docutils literal">Parser</code> objects, and that meant it could reliably give help with chained method calls, which is pretty nice.</p>
<p>The biggest problems were that the docstrings weren’t helpful (in most cases missing), and that for many parameters it wasn’t entirely obvious what type of object you should be passing in.</p>
<p>So, using a small amount of effort, we could improve usability a lot. We can add those docstrings, and add type hints that are about the same level of types that pyright was inferring anyway, just a bit more complete.</p>
<p>The one thing I don’t want to do is imply that these types bring Parsy code up to the level of being “type checked”, but there is a simple way I can avoid that – by not including a <code class="docutils literal">py.typed</code> <a class="reference external" href="https://peps.python.org/pep-0561/">marker file</a> in my package.</p>
<p>So, I’m back at the beginning, but with a different aim. Now, it’s not about helping automated static type checkers – without a <code class="docutils literal">py.typed</code> marker in the package they basically ignore the types – it’s about improving usability for developers as they write. I’ve done this work in the master branch now, and will hopefully release it soon.</p>
<p>There is another interesting advantage to this: because I’ve given up on static type checks and I’m not using static type checking for internal use in Parsy at all, I can be a bit looser with the type hints, allowing for greater readability. For example, I can annotate an optional string argument as <code class="docutils literal">arg: str = None</code>, even though that’s not strictly compliant with PEP 484. In other places I can use slightly simplified type hints for the sake of having something that doesn’t make your eyes glaze over, even if it doesn’t show all the possibilities – such as saying that input types can be <code class="docutils literal">str | bytes | list</code>, when technically I should be writing something much more abstract and complicated using <code class="docutils literal">Sequence</code>.</p>
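<p>To make the first trade-off concrete, here are the two styles side by side. The function names are made up for this example. Both behave identically at runtime, and the only difference is what the hint says to a reader and to a strict type checker:</p>
<div class="code"><pre class="code python">from typing import Optional


def label_loose(text: str, description: str = None) -> str:
    # Loose hint: quick to read, but a strict checker will object to the None default.
    return text if description is None else f"{text} ({description})"


def label_strict(text: str, description: Optional[str] = None) -> str:
    # Strict hint: spells out that None is an allowed value.
    return text if description is None else f"{text} ({description})"
</pre></div>
<p>For hints that only ever show up as editor tooltips, the first form says everything a human needs to know.</p>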
</section>
<section id="conclusion">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-13" role="doc-backlink">Conclusion</a></h2>
<p>In the end, type hints didn’t work out for use by a static type checker. But as clear and concise documentation for humans that pops up in a code editor, they worked well. Unless or until there are some big improvements in what it is possible to express with static types in Python, this seems to be the best solution for Parsy.</p>
<p>I learnt a lot about the limitations of typing in Python along the way, and hope you found this helpful too!</p>
</section>
<hr class="docutils">
<section id="links">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-14" role="doc-backlink">Links</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://lobste.rs/s/1elwat/python_type_hints_case_study_on_parsy">Discussion of this post on Lobsters</a></p></li>
</ul>
</section>
<hr class="docutils">
<section id="footnotes">
<h2><a class="toc-backref" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#toc-entry-15" role="doc-backlink">Footnotes</a></h2>
<aside class="footnote brackets" id="strings" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnote-reference-1">1</a><span class="fn-bracket">]</span></span>
<p>Parser input:</p>
<p>Actually Parsy supports bytes and in fact any sequence of objects as input. But I’m ignoring that for the rest of the post for the sake of simplicity.</p>
</aside>
<aside class="footnote brackets" id="mypy-pyright-fight" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnote-reference-2">2</a><span class="fn-bracket">]</span></span>
<p>mypy and pyright can fight: For example, when implementing the operators that discard one of the parsers, pyright complained (I think rightly) about unused generic parameters. When I removed them, mypy wanted them put back in!</p>
</aside>
<aside class="footnote brackets" id="protocols" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnote-reference-3">3</a><span class="fn-bracket">]</span></span>
<p>Protocols and operators:</p>
<p>I wanted some way of expressing “an object that supports addition”, or actually “an object that supports addition and returns an object of the same type”. I needed this for the <code class="docutils literal">+</code> operator, which you can use for both things like <code class="docutils literal">Parser[str]</code> and <code class="docutils literal">Parser[list[str]]</code>.</p>
<p>But I couldn’t get this minimal test case to work:</p>
<div class="code"><pre class="code python"><a id="rest_code_18c15bf497184d5f8a05d61baec10e58-1" name="rest_code_18c15bf497184d5f8a05d61baec10e58-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-1"></a><span class="k">class</span> <span class="nc">Addable</span><span class="p">(</span><span class="n">Protocol</span><span class="p">[</span><span class="n">T</span><span class="p">]):</span>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-2" name="rest_code_18c15bf497184d5f8a05d61baec10e58-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-2"></a> <span class="nd">@abstractmethod</span>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-3" name="rest_code_18c15bf497184d5f8a05d61baec10e58-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-3"></a> <span class="k">def</span> <span class="fm">__add__</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span> <span class="n">other</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="o">-></span> <span class="n">T</span><span class="p">:</span>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-4" name="rest_code_18c15bf497184d5f8a05d61baec10e58-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-4"></a> <span class="k">pass</span>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-5" name="rest_code_18c15bf497184d5f8a05d61baec10e58-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-5"></a>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-6" name="rest_code_18c15bf497184d5f8a05d61baec10e58-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-6"></a>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-7" name="rest_code_18c15bf497184d5f8a05d61baec10e58-7" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-7"></a><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="n">Addable</span><span class="p">[</span><span class="n">T</span><span class="p">],</span> <span class="n">y</span><span class="p">:</span> <span class="n">Addable</span><span class="p">[</span><span class="n">T</span><span class="p">])</span> <span class="o">-></span> <span class="n">T</span><span class="p">:</span>
<a id="rest_code_18c15bf497184d5f8a05d61baec10e58-8" name="rest_code_18c15bf497184d5f8a05d61baec10e58-8" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_18c15bf497184d5f8a05d61baec10e58-8"></a> <span class="k">return</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
</pre></div>
<p>Mypy reports:</p>
<pre class="literal-block">Unsupported operand types for + ("Addable[T]" and "Addable[T]") [operator]</pre>
<p>This is probably my fault, but I couldn’t work it out; I obviously need to understand more about protocols.</p>
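<p>One formulation that is sometimes suggested for this situation is to make the protocol itself non-generic and bind the type variable to it. The following is only a sketch of that idea: it runs, but it has not been verified against every mypy and pyright version, and it is not what Parsy uses.</p>

```python
from typing import Protocol, TypeVar

# The bound is a forward reference to the protocol defined just below.
T = TypeVar("T", bound="Addable")


class Addable(Protocol):
    # Annotating `self` with T expresses "adding two values of the same
    # type gives back a value of that type".
    def __add__(self: T, other: T) -> T: ...


def foo(x: T, y: T) -> T:
    return x + y


if __name__ == "__main__":
    print(foo("ab", "cd"))
    print(foo([1], [2, 3]))
```

<p>The difference from the test case above is that the arguments are typed as <code class="docutils literal">T</code> directly, rather than as <code class="docutils literal">Addable[T]</code>.</p>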
</aside>
<aside class="footnote brackets" id="covariance" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnote-reference-4">4</a><span class="fn-bracket">]</span></span>
<p>Covariance:</p>
<p>As a replacement for the variadic <code class="docutils literal">seq</code> which produces a heterogeneous list (in its simplest form), I added the <code class="docutils literal">&</code> operator which returns a tuple. This can be typed correctly, because unlike <code class="docutils literal">list</code>/<code class="docutils literal">typing.List</code> which is considered to be a homogeneous container, <code class="docutils literal">tuple</code>/<code class="docutils literal">typing.Tuple</code> is treated as a product type, like it is in other languages.</p>
<p>This pairs well with the existing <code class="docutils literal">|</code> operator for alternatives, which produces a union (or sum type).</p>
<div class="code"><pre class="code python"><a id="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-1" name="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-1"></a><span class="k">def</span> <span class="fm">__or__</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT1</span><span class="p">],</span> <span class="n">other</span><span class="p">:</span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT2</span><span class="p">])</span> <span class="o">-></span> <span class="n">Parser</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span class="n">OUT1</span><span class="p">,</span> <span class="n">OUT2</span><span class="p">]]:</span>
<a id="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-2" name="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-2"></a> <span class="o">...</span>
<a id="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-3" name="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-3"></a>
<a id="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-4" name="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-4"></a><span class="k">def</span> <span class="fm">__and__</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT1</span><span class="p">],</span> <span class="n">other</span><span class="p">:</span> <span class="n">Parser</span><span class="p">[</span><span class="n">OUT2</span><span class="p">])</span> <span class="o">-></span> <span class="n">Parser</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="n">OUT1</span><span class="p">,</span> <span class="n">OUT2</span><span class="p">]]:</span>
<a id="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-5" name="rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_71ee4be3c81d4ae1a0cca083b495d5ff-5"></a> <span class="o">...</span>
</pre></div>
<p>In implementing the first, however, both mypy and pyright complain:</p>
<pre class="literal-block">Expression of type "Result[OUT1@__or__]" cannot be assigned to return type
"Result[OUT1@__or__ | OUT2@__or__]"</pre>
<p>The issue here is that the type checker needs to know that a <code class="docutils literal">Result[A]</code> is a sub-type of a <code class="docutils literal">Result[A | B]</code>, which is only true if <code class="docutils literal">Result</code> is <strong>covariant</strong> with respect to that type parameter. Since it’s an immutable container of a single item, whose only operation is that you can extract the item, it is covariant.</p>
<p>My first attempt to fix this, however, was to change all the type parameters to <code class="docutils literal">covariant=True</code>, which gave me a ton more problems – both mypy and pyright complaining "covariant type variable cannot be used in parameter type" in many places.</p>
<p>It turns out, after a lot of attempts, head scratching, and finally the fresh take involved in writing a blog post, I just needed to use a covariant type parameter only for the definition of <code class="docutils literal">Result</code>:</p>
<div class="code"><pre class="code python"><a id="rest_code_5146c044a4e44be4bcb9261148306ddd-1" name="rest_code_5146c044a4e44be4bcb9261148306ddd-1" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-1"></a><span class="n">OUT_co</span> <span class="o">=</span> <span class="n">TypeVar</span><span class="p">(</span><span class="s2">"OUT_co"</span><span class="p">,</span> <span class="n">covariant</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<a id="rest_code_5146c044a4e44be4bcb9261148306ddd-2" name="rest_code_5146c044a4e44be4bcb9261148306ddd-2" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-2"></a>
<a id="rest_code_5146c044a4e44be4bcb9261148306ddd-3" name="rest_code_5146c044a4e44be4bcb9261148306ddd-3" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-3"></a><span class="nd">@dataclass</span>
<a id="rest_code_5146c044a4e44be4bcb9261148306ddd-4" name="rest_code_5146c044a4e44be4bcb9261148306ddd-4" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-4"></a><span class="k">class</span> <span class="nc">Result</span><span class="p">(</span><span class="n">Generic</span><span class="p">[</span><span class="n">OUT_co</span><span class="p">]):</span>
<a id="rest_code_5146c044a4e44be4bcb9261148306ddd-5" name="rest_code_5146c044a4e44be4bcb9261148306ddd-5" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-5"></a> <span class="n">value</span><span class="p">:</span> <span class="n">OUT_co</span>
<a id="rest_code_5146c044a4e44be4bcb9261148306ddd-6" name="rest_code_5146c044a4e44be4bcb9261148306ddd-6" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#rest_code_5146c044a4e44be4bcb9261148306ddd-6"></a> <span class="o">...</span>
</pre></div>
<p>I think this is now correct, and the fact that I’m not using the <code class="docutils literal">OUT_co</code> type parameter in all the other places is not a problem, but to be honest I’m not entirely sure. Both mypy and pyright now seem happy, so I think I’m done?</p>
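<p>A self-contained sketch of what the covariant parameter buys is shown below. The <code class="docutils literal">widen</code> function is hypothetical, and making the dataclass frozen is an assumption added here to match the “immutable container” description; the real <code class="docutils literal">Result</code> has more fields.</p>

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

OUT_co = TypeVar("OUT_co", covariant=True)


@dataclass(frozen=True)
class Result(Generic[OUT_co]):
    value: OUT_co


def widen(result: Result[int]) -> Result[Union[int, str]]:
    # With an invariant type parameter a checker would reject this return;
    # covariance makes Result[int] a sub-type of Result[int | str].
    return result


if __name__ == "__main__":
    print(widen(Result(42)).value)
```

<p>Nothing changes at runtime; the variance only affects what a static checker will accept.</p>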
</aside>
<aside class="footnote brackets" id="forward-declaration-1" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="https://lukeplant.me.uk/blog/posts/python-type-hints-parsy-case-study/#footnote-reference-5">5</a><span class="fn-bracket">]</span></span>
<p><code class="docutils literal">forward_declaration</code>:</p>
<p>This is a neat way of untying recursive definitions. In typical uses, I think the type hint for the object produced also requires a recursive definition. Every time I tried to work out how you would declare a parser using <code class="docutils literal">forward_declaration</code> such that a type checker could still work, my brain just shut down. I think it would probably require users having to use forward references at the type level, but might also hit some unsolvable problems in how that interacts with <code class="docutils literal">forward_declaration</code>.</p>
</aside>
</section>Tools for rewriting Python codehttps://lukeplant.me.uk/blog/posts/tools-for-rewriting-python-code/2022-11-16T16:07:02Z2022-11-16T16:07:02ZLuke Plant<p>A collection of tools that can be run to automatically rewrite Python code in a number of ways</p><p>When writing (or reviewing) code, you have better things to do than concern
yourself with low-level details about coding style or other changes that are
essentially mechanical in nature. Thankfully, the tooling ecosystem for doing
these kinds of boring changes to Python code has become much stronger in the past
few years.</p>
<p>Below is my collection, with some alternatives and recommendations. These all go
beyond being <a class="reference external" href="https://en.wikipedia.org/wiki/Lint_(software)">linters</a>, which
only report problems, to being able to fix your code automatically. Most of
these work really well with tools like <a class="reference external" href="https://pre-commit.com/">pre-commit</a>
so that by the time you come to code review, all the boring stuff is already
fixed.</p>
<section id="formatting-and-coding-style">
<h2>Formatting and coding style</h2>
<ul>
<li><p><a class="reference external" href="https://github.com/psf/black">Black</a> is probably the most popular Python
code formatter today.</p>
<p><a class="reference external" href="https://github.com/google/yapf">YAPF</a> is another with a similar ethos to
Black, but less popular AFAIK, and I don’t use it.</p>
</li>
<li><p><a class="reference external" href="https://github.com/hhatto/autopep8">autopep8</a> doesn’t go as far as Black or
YAPF - it fixes <a class="reference external" href="https://pep8.org/">PEP8</a> violations but otherwise leaves
your code alone. This is useful for cases where people aren’t quite ready for
Black yet.</p></li>
<li><p><a class="reference external" href="https://github.com/PyCQA/isort">isort</a> and <a class="reference external" href="https://github.com/asottile/reorder_python_imports">reorder_python_imports</a> will sort your Python
imports for you.</p>
<p>I personally prefer the former, isort. <code class="docutils literal">reorder_python_imports</code> has a much
more verbose style, resulting in many lines for imports. This is useful for
reducing merge conflicts, but with the other tools listed here, I don’t find
those much of a problem – if you aren’t sure which imports are still needed,
include them all and let isort remove the duplicates, and autoflake remove the
unneeded ones.</p>
</li>
<li><p><a class="reference external" href="https://github.com/spookylukey/table-format">table-format</a> makes it easy
to have aligned columns in your Python source code.</p></li>
<li><p><a class="reference external" href="https://github.com/astral-sh/ruff">ruff</a> is a more recent tool that
combines quite a few tools into one — isort, flake8, Black, flynt and others —
and is very fast as well.</p></li>
</ul>
</section>
<section id="upgrades">
<h2>Upgrades</h2>
<p>The following tools will do upgrades on your code:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/asottile/pyupgrade">pyupgrade</a> – moves code to the most modern Python idioms.</p></li>
<li><p><a class="reference external" href="https://github.com/ikamensh/flynt">flynt</a> – rewrites older string formatting
code using <code class="docutils literal">%</code> or <code class="docutils literal">.format</code> to use f-strings where possible.</p></li>
<li><p><a class="reference external" href="https://github.com/adamchainz/django-upgrade">django-upgrade</a> and
<a class="reference external" href="https://github.com/browniebroke/django-codemod">django-codemod</a> – include
various fixes for breaking changes or new features in Django.</p></li>
<li><p><a class="reference external" href="https://github.com/asottile/setup-py-upgrade">setup-py-upgrade</a> – upgrades
your <code class="docutils literal">setup.py</code> to a <code class="docutils literal">setup.cfg</code> file.</p></li>
</ul>
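<p>As an illustration of the kind of string-formatting modernisation listed above, here is a hand-written before/after. It is not the output of any of these tools, just the general shape of the change.</p>

```python
name, count = "parsy", 3

# Older styles that upgrade tools typically look for...
old_percent = "%s has %d open issues" % (name, count)
old_format = "{} has {} open issues".format(name, count)

# ...and the f-string they can usually be rewritten to.
new_fstring = f"{name} has {count} open issues"

if __name__ == "__main__":
    print(old_percent == old_format == new_fstring)
```

<p>Because the result is identical, this sort of rewrite is safe to automate in the simple cases.</p>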
</section>
<section id="type-hints">
<h2>Type hints</h2>
<ul>
<li><p><a class="reference external" href="https://github.com/instagram/MonkeyType">Monkeytype</a> and <a class="reference external" href="https://github.com/dropbox/pyannotate">pyannotate</a> – add type hints based on
instrumented test suite runs.</p></li>
<li><p><a class="reference external" href="https://github.com/google/pytype">pytype</a> – this does type checking and
produces <code class="docutils literal">.pyi</code> files based on inference, and also includes a <code class="docutils literal"><span class="pre">merge-pyi</span></code>
tool that can merge <code class="docutils literal">.pyi</code> files into <code class="docutils literal">.py</code> files.</p></li>
<li><p><a class="reference external" href="https://github.com/JelleZijlstra/autotyping">autotyping</a> – a tool to add
type hints for various cases where this can be done automatically.</p>
<p>(As a comment, I’m not wild about some of these automated changes. Annotating
<code class="docutils literal">__str__</code> with <code class="docutils literal"><span class="pre">-></span> str</code>, when <a class="reference external" href="https://docs.python.org/3/reference/datamodel.html#object.__str__">it is required to be a str</a>, seems
like a failure of our static typing tools, and it adds a lot of noise.)</p>
</li>
<li><p><a class="reference external" href="https://github.com/hauntsaninja/no_implicit_optional">no_implicit_optional</a>
– a small tool to make some type hints more compliant with <a class="reference external" href="https://peps.python.org/pep-0484/">PEP 484</a>.</p></li>
</ul>
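<p>For the last item, the change involved looks roughly like the hand-written pair below. The function names are hypothetical and this is not the tool’s actual output.</p>

```python
from typing import Optional


def greet_implicit(name: str = None) -> str:
    # Readable, but the hint claims `str` while the default is None.
    return f"Hello, {name or 'world'}!"


def greet_explicit(name: Optional[str] = None) -> str:
    # The PEP 484 compliant spelling: None is stated in the hint.
    return f"Hello, {name or 'world'}!"


if __name__ == "__main__":
    print(greet_implicit.__annotations__["name"])
    print(greet_explicit.__annotations__["name"])
```

<p>Both functions behave the same at runtime; only the recorded annotation differs.</p>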
</section>
<section id="refactoring">
<h2>Refactoring</h2>
<p>Many IDEs/editors provide a bunch of tools to rewrite Python code (for example
doing renames), often by integrating with <a class="reference external" href="https://en.wikipedia.org/wiki/Language_Server_Protocol">language servers</a>.</p>
<p>In VSCode, the default is <a class="reference external" href="https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance">Pylance</a>
which is proprietary and can only be used with VSCode. However, <a class="reference external" href="https://github.com/microsoft/pyright">pyright</a> powers most of its functionality, and
is Open Source. As well as being a command line static type checker, it also
functions as a language server, and it’s the one I use from Emacs at the moment.</p>
<p>One of the issues I find with these is that they can be hard to use from the
command line, to be able to do more automated refactoring – in fact I haven’t
found a good way to do so, other than scripting things using <a class="reference external" href="https://en.wikipedia.org/wiki/Emacs_Lisp">elisp</a>.</p>
<p>So here are some other tools that are designed for more stand-alone use and have
some refactoring features:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/python-rope/rope">rope</a></p></li>
<li><p><a class="reference external" href="https://github.com/davidhalter/jedi/">jedi</a></p></li>
</ul>
</section>
<section id="other">
<h2>Other</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/PyCQA/autoflake">autoflake</a> – removes unused imports.</p></li>
<li><p><a class="reference external" href="https://github.com/Instagram/Fixit">Fixit</a> – custom linting rules with automatic fixes.</p></li>
<li><p><a class="reference external" href="https://github.com/Zac-HD/shed">shed</a> – bundles together a few of the above.</p></li>
</ul>
</section>
<section id="write-your-own">
<h2>Write your own</h2>
<p>Finally, there are great libraries like <a class="reference external" href="https://libcst.readthedocs.io/en/latest/index.html">libCST</a> that will help you to
manipulate Python code but without losing comments etc., so that writing your
own tool to do this is no longer a massive task.</p>
<p>Also looking for packages that depend on LibCST, <a class="reference external" href="https://github.com/Instagram/LibCST/network/dependents?dependent_type=PACKAGE">on GitHub</a>
or on <a class="reference external" href="https://libraries.io/pypi/libcst/dependents">libraries.io</a>, is a great
way to find more tools like this.</p>
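<p>To show the general shape of such a tool without any third-party dependency, here is a minimal sketch that uses the standard library <code class="docutils literal">ast</code> module rather than LibCST (it needs Python 3.9+ for <code class="docutils literal">ast.unparse</code>). The class and function names are made up for illustration. Note that it throws away the comment, which is exactly the problem LibCST is designed to avoid.</p>

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rename calls to one bare function name (hypothetical example)."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new
        return node


def rename_calls(source: str, old: str, new: str) -> str:
    tree = RenameCall(old, new).visit(ast.parse(source))
    # ast.unparse regenerates the code from the tree, so comments and
    # original formatting are lost.
    return ast.unparse(tree)


before = "total = old_sum(1, old_sum(2, 3))  # keep me?\n"
after = rename_calls(before, "old_sum", "new_sum")

if __name__ == "__main__":
    print(after)
```

<p>A concrete syntax tree keeps whitespace and comments attached to the nodes, so the same transformation written against it can leave everything else in the file untouched.</p>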
<p>Have fun writing code to fix your code!</p>
</section>Better Python code grepping with pyastgrephttps://lukeplant.me.uk/blog/posts/grep-python-syntax-using-ast-pyastgrep/2022-11-07T09:37:46+01:002022-11-07T09:37:46+01:00Luke Plant<p>Release announcement for pyastgrep, a tool for grepping Python code at the syntax level.</p><p>A few weeks ago I released <a class="reference external" href="https://github.com/spookylukey/pyastgrep">pyastgrep</a>, a tool for grepping Python code at
the syntax level (using AST - Abstract Syntax Trees), and today I released some
more improvements.</p>
<p>It builds on an earlier tool, <a class="reference external" href="https://github.com/hchasestevens/astpath">astpath</a> which now appears to be abandoned,
and also had quite a few bugs. I’ve fixed lots of things and re-written quite a
bit internally in backwards incompatible ways, so a fork was the easiest way
forward. I’ve also been able to make lots of improvements to default behaviour,
inspired by tools like ripgrep. For example, it automatically excludes paths that
match discovered <code class="docutils literal">.gitignore</code> files, which can make a massive performance
improvement in some cases (looking at you, <code class="docutils literal">node_modules</code>).</p>
<p>This tool can be really useful for certain kinds of linting or one-off
maintenance tasks e.g. “find me all loop variables called <code class="docutils literal">i</code> or <code class="docutils literal">j</code>”:</p>
<pre class="literal-block">pyastgrep './/For/target//Name[@id="i" or @id="j"]'</pre>
<p>With a bit more work, and more understanding of XPath expressions, you can do
some fairly advanced things. See the docs for more examples.</p>
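<p>For comparison, roughly the same loop-variable search can be written directly against the standard library <code class="docutils literal">ast</code> module. This is only an approximation of the XPath query above, not how pyastgrep works internally, and the function name is made up for illustration.</p>

```python
import ast


def find_loop_vars(source: str, names=("i", "j")):
    """Yield (line, name) for `for` loop targets that use any of `names`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For):
            # The target may be a single name or a tuple such as `k, j`.
            for target in ast.walk(node.target):
                if isinstance(target, ast.Name) and target.id in names:
                    yield target.lineno, target.id


sample = """\
for i in range(3):
    for k, j in pairs:
        print(i, j, k)
"""

hits = sorted(find_loop_vars(sample))

if __name__ == "__main__":
    print(hits)
```

<p>The one-line XPath expression replaces all of this, which is a large part of the appeal for one-off searches.</p>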
<p>Hope you find it useful!</p>Raising exceptions or returning error objects in Pythonhttps://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/2022-06-06T11:29:35+01:002022-06-06T11:29:35+01:00Luke Plant<p>How returning error objects can provide some advantages over raising exceptions in Python, such as for static type checking tools.</p><p>The other day I got a question about some old code I had written which, instead
of raising an exception for an error condition as the reader expected, returned
an error object:</p>
<blockquote>
<p>With your EmailVerifyTokenGenerator class, why do you return error classes
instead of raising custom errors? You could still pass the email to a custom
VerifyExpired exception.</p>
<p><a class="reference external" href="https://github.com/cciw-uk/cciw.co.uk/blob/eae8005feb95a5383663e69e92d80e11effe5ee6/cciw/bookings/email.py#L41">https://github.com/cciw-uk/cciw.co.uk/blob/eae8005feb95a5383663e69e92d80e11effe5ee6/cciw/bookings/email.py#L41</a></p>
<p>I think I'm too eager to raise errors but maybe there's something I'm missing with classes 😁!</p>
</blockquote>
<p>The code in question is below (slightly modified and with several uninteresting
methods removed). It is part of a system for doing email address verification
via magic links in emails.</p>
<div class="code"><pre class="code python"><a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-1" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-1"></a><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-2" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-2"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-3" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-3" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-3"></a><span class="k">class</span> <span class="nc">VerifyFailed</span><span class="p">:</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-4" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-4" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-4"></a> <span class="k">pass</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-5" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-5" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-5"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-6" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-6" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-6"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-7" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-7" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-7"></a><span class="n">VerifyFailed</span> <span class="o">=</span> <span class="n">VerifyFailed</span><span class="p">()</span> <span class="c1"># singleton sentinel value</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-8" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-8" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-8"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-9" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-9" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-9"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-10" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-10" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-10"></a><span class="nd">@dataclass</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-11" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-11" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-11"></a><span class="k">class</span> <span class="nc">VerifyExpired</span><span class="p">:</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-12" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-12" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-12"></a> <span class="n">email</span><span class="p">:</span> <span class="nb">str</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-13" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-13" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-13"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-14" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-14" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-14"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-15" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-15" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-15"></a><span class="k">class</span> <span class="nc">EmailVerifyTokenGenerator</span><span class="p">:</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-16" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-16" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-16"></a>    <span class="k">def</span> <span class="nf">token_for_email</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">email</span><span class="p">):</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-17" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-17" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-17"></a>        <span class="o">...</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-18" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-18" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-18"></a>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-19" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-19" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-19"></a>    <span class="k">def</span> <span class="nf">email_from_token</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-20" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-20" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-20"></a>        <span class="sd">"""</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-21" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-21" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-21"></a><span class="sd">        Extracts the verified email address from the token, or a VerifyFailed</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-22" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-22" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-22"></a><span class="sd">        constant if verification failed, or VerifyExpired if the link expired.</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-23" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-23" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-23"></a><span class="sd">        """</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-24" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-24" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-24"></a>        <span class="n">max_age</span> <span class="o">=</span> <span class="n">settings</span><span class="o">.</span><span class="n">EMAIL_VERIFY_TIMEOUT</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-25" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-25" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-25"></a>        <span class="k">try</span><span class="p">:</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-26" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-26" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-26"></a>            <span class="n">unencoded_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">url_safe_decode</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-27" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-27" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-27"></a>        <span class="k">except</span> <span class="p">(</span><span class="ne">UnicodeDecodeError</span><span class="p">,</span> <span class="n">binascii</span><span class="o">.</span><span class="n">Error</span><span class="p">):</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-28" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-28" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-28"></a>            <span class="k">return</span> <span class="n">VerifyFailed</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-29" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-29" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-29"></a>        <span class="k">try</span><span class="p">:</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-30" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-30" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-30"></a>            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">signer</span><span class="o">.</span><span class="n">unsign</span><span class="p">(</span><span class="n">unencoded_token</span><span class="p">,</span> <span class="n">max_age</span><span class="o">=</span><span class="n">max_age</span><span class="p">)</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-31" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-31" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-31"></a>        <span class="k">except</span> <span class="p">(</span><span class="n">SignatureExpired</span><span class="p">,):</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-32" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-32" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-32"></a>            <span class="k">return</span> <span class="n">VerifyExpired</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">signer</span><span class="o">.</span><span class="n">unsign</span><span class="p">(</span><span class="n">unencoded_token</span><span class="p">))</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-33" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-33" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-33"></a>        <span class="k">except</span> <span class="p">(</span><span class="n">BadSignature</span><span class="p">,):</span>
<a id="rest_code_968c47b7cbd149d584a89da29fe4c56e-34" name="rest_code_968c47b7cbd149d584a89da29fe4c56e-34" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_968c47b7cbd149d584a89da29fe4c56e-34"></a>            <span class="k">return</span> <span class="n">VerifyFailed</span>
</pre></div>
<p>To sum up, we have a function that extracts an email address from a token,
checking the HMAC signature that it is bundled with. There are 3 possibilities
we want to deal with:</p>
<ol class="arabic simple">
<li><p>The happy case – we’ve got a valid HMAC code, we just need the email address
returned.</p></li>
<li><p>We’ve got an invalid signature.</p></li>
<li><p>We’ve got a valid but expired signature. We want to handle this separately,
because we’d like to streamline the user experience for getting a new token
generated and sent to them, which means we need to return the email address.</p></li>
</ol>
<p>It’s using <a class="reference external" href="https://docs.djangoproject.com/en/stable/topics/signing/">Django’s signer functions</a> to do the heavy
lifting, but that doesn’t matter for our purposes, because we are wrapping it
up.</p>
<p>To get going on designing our API for this bit of code, here are some bad
options:</p>
<ol class="arabic">
<li><p>We could have a pair of methods or functions: <code class="docutils literal">extract_email_from_token</code>
and <code class="docutils literal">check_signature</code>, which can be used independently. This is bad because
you could easily use <code class="docutils literal">extract_email_from_token</code> and completely forget to
use <code class="docutils literal">check_signature</code>.</p>
<p>The principle here is that we want the developer using this API to fall into
<a class="reference external" href="https://blog.codinghorror.com/falling-into-the-pit-of-success/">the pit of success</a>. Either
the developer gets their code correct or, if they don’t, the result should be
obviously broken and not work at all, rather than subtly flawed with some
nasty bug, like a security issue.</p>
</li>
<li><p>We could have an <code class="docutils literal">email_from_token()</code> method or function that returns
a tuple containing <code class="docutils literal">(email_address: str, valid_and_not_expired_signature:
bool)</code>.</p>
<p>This has a similar issue to above – the calling code could use
<code class="docutils literal">email_address</code> and forget to check the validity boolean.</p>
</li>
</ol>
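<p>As a rough sketch of why the tuple version is risky: the function <code class="docutils literal">email_from_token_tuple</code> and its toy token format below are hypothetical stand-ins for illustration, not the real code. Nothing stops a caller from ignoring the second element:</p>

```python
from typing import Tuple


def email_from_token_tuple(token: str) -> Tuple[str, bool]:
    """Hypothetical tuple-returning API: (email_address, signature_ok).

    The "token" here is just "email:signature", and the only valid
    signature is the literal string "good". This stands in for real
    HMAC checking purely for illustration.
    """
    email, _, signature = token.partition(":")
    return email, signature == "good"


def careless_verify(token: str) -> str:
    # Bug: the validity flag is unpacked and then silently ignored,
    # so a forged token still yields a "verified" email address.
    email, _valid = email_from_token_tuple(token)
    return email


def careful_verify(token: str) -> str:
    # Correct usage, but only because the caller remembered the check.
    email, valid = email_from_token_tuple(token)
    if not valid:
        raise ValueError("signature check failed")
    return email
```

<p>Both callers run without complaint on a good token, and only one of them notices a forged one, which is exactly the kind of subtle flaw we want the API to rule out.</p>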
<p>Having ruled those out, we’ve got two main contenders for how to design
<code class="docutils literal">email_from_token()</code>:</p>
<ol class="arabic simple">
<li><p>We could make it raise exceptions for the “invalid” or “expired” cases. We need
to pass extra data for the latter, but we can put it inside the exception
object – as noted by the original questioner.</p></li>
<li><p>We could make it return error objects for the error cases, as coded above.</p></li>
</ol>
<p><strong>Both</strong> of these satisfy the “pit of success” criterion. If the developer
accidentally does not handle the error cases, they won’t have a bug where we
verified an email address that should not be verified. We will instead probably
have a crasher of some kind, which in the case of a web app, like this one,
means a 500 error page being seen, and something in our logs that makes it
pretty clear what happened.</p>
<p>If we choose to raise exceptions, naive code which doesn’t check for the
exceptions will simply get no further – the exception will propagate up and
terminate the handler. With the second option where we return error objects,
those objects can’t be accidentally converted into success values – the
<code class="docutils literal">VerifyExpired</code> object <strong>contains</strong> the email address, but it is a completely
different shape of value from the happy case.</p>
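<p>For comparison, here is a minimal sketch of what the exception-raising option might look like. The names <code class="docutils literal">EmailVerifyFailed</code>, <code class="docutils literal">EmailVerifyExpired</code> and <code class="docutils literal">email_from_token_raising</code>, and the toy <code class="docutils literal">"email:status"</code> token format, are all made up for this example; the point is only that the email address travels inside the exception object:</p>

```python
class EmailVerifyFailed(Exception):
    """Hypothetical exception for an invalid signature."""


class EmailVerifyExpired(Exception):
    """Hypothetical exception for a valid but expired signature."""

    def __init__(self, email: str):
        super().__init__(f"verification link expired for {email}")
        # Extra data travels inside the exception object.
        self.email = email


def email_from_token_raising(token: str) -> str:
    # Toy token format "email:status", standing in for real signature checks.
    email, _, status = token.partition(":")
    if status == "ok":
        return email
    if status == "expired":
        raise EmailVerifyExpired(email)
    raise EmailVerifyFailed()


def handle(token: str) -> str:
    try:
        email = email_from_token_raising(token)
    except EmailVerifyExpired as e:
        return f"resend link to {e.email}"
    except EmailVerifyFailed:
        return "invalid link"
    return f"verified {email}"
```

<p>Naive code that calls <code class="docutils literal">email_from_token_raising</code> without the <code class="docutils literal">try/except</code> never reaches the “verified” line for a bad token, because the exception propagates instead.</p>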
<p>Both of these approaches, to some degree, respect the principle that can be
summed up as <a class="reference external" href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">Parse Don’t Validate</a>. Instead
of merely validating a token and extracting an email address as two independent
things, we are parsing a token, and encoding the result of the validation in the
type of objects that will then flow through the program.</p>
<p>But which is better?</p>
<p>One of the influences on my thinking is the way types work in Haskell and other
similar languages that make it very easy to create types and constructors. In
Haskell, the following is <strong>all</strong> the code you need to define a return type for
this kind of function, and the 3 different data constructors you need, which
then do double duty for <a class="reference external" href="https://en.m.wikibooks.org/wiki/Haskell/Pattern_matching">pattern matching</a>:</p>
<div class="code"><pre class="code haskell"><a id="rest_code_6404aa9a9eee4ea59e465608e3bf3963-1" name="rest_code_6404aa9a9eee4ea59e465608e3bf3963-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_6404aa9a9eee4ea59e465608e3bf3963-1"></a><span class="kr">data</span><span class="w"> </span><span class="kt">EmailVerificationResult</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="kt">EmailVerified</span><span class="w"> </span><span class="kt">String</span><span class="w"></span>
<a id="rest_code_6404aa9a9eee4ea59e465608e3bf3963-2" name="rest_code_6404aa9a9eee4ea59e465608e3bf3963-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_6404aa9a9eee4ea59e465608e3bf3963-2"></a><span class="w">                             </span><span class="o">|</span><span class="w"> </span><span class="kt">VerifyFailed</span><span class="w"></span>
<a id="rest_code_6404aa9a9eee4ea59e465608e3bf3963-3" name="rest_code_6404aa9a9eee4ea59e465608e3bf3963-3" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_6404aa9a9eee4ea59e465608e3bf3963-3"></a><span class="w">                             </span><span class="o">|</span><span class="w"> </span><span class="kt">VerifyExpired</span><span class="w"> </span><span class="kt">String</span><span class="w"></span>
</pre></div>
<p>Now, Python is not nearly as succinct, but <a class="reference external" href="https://docs.python.org/3/library/dataclasses.html">dataclasses</a> were a big improvement
for defining things like <code class="docutils literal">VerifyExpired</code>.</p>
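<p>As an illustration, a dataclass-based rough equivalent of the Haskell type might look like the sketch below. The field name <code class="docutils literal">email</code> and the alias <code class="docutils literal">EmailVerificationResult</code> are assumptions for this example, and <code class="docutils literal">VerifyFailed</code> is written here as a class rather than a singleton constant:</p>

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class VerifyFailed:
    """The signature was invalid. Carries no data."""


@dataclass(frozen=True)
class VerifyExpired:
    """The signature was valid but too old."""

    email: str  # assumed field name


# Rough analogue of the Haskell sum type; a plain str plays the role of
# the EmailVerified constructor.
EmailVerificationResult = Union[str, VerifyFailed, VerifyExpired]
```

<p>It is still more lines than the Haskell version, but each class gets its constructor, <code class="docutils literal">repr</code> and equality for free.</p>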
<p>In Haskell, due to static type checking, this pattern makes it pretty much
impossible for the calling code to accidentally fail to handle the return value
correctly. But even in Python, which doesn’t have that built in, I think there
are some compelling advantages:</p>
<ol class="arabic">
<li><p>We expect the calling code to handle all the different return values at some
point, and <strong>at the same point</strong>. (This is unlike some code where we can
raise an exception that we never expect the calling code to specifically
handle – it will be handled by more generic methods at a different layer). It
therefore makes sense that we treat all 3 values as the same kind of thing –
they are just different return values.</p></li>
<li><p>If you instead raise exceptions, you are immediately forcing the calling
code into a special control flow structure, namely the <code class="docutils literal">try/except</code> dance,
which can be inconvenient.</p></li>
<li><p>In particular, if you want to hand off processing of the value to some other
function or code for handling, you can’t do it easily. For example, code like
this would be fine with the “return error object” method, but significantly
complicated by the “raise exception” method:</p>
<div class="code"><pre class="code python"><a id="rest_code_01bee2e5325d49249bd3b125bab79f8f-1" name="rest_code_01bee2e5325d49249bd3b125bab79f8f-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_01bee2e5325d49249bd3b125bab79f8f-1"></a><span class="n">verify_result</span> <span class="o">=</span> <span class="n">verifier</span><span class="o">.</span><span class="n">email_from_token</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<a id="rest_code_01bee2e5325d49249bd3b125bab79f8f-2" name="rest_code_01bee2e5325d49249bd3b125bab79f8f-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_01bee2e5325d49249bd3b125bab79f8f-2"></a><span class="n">log_verify_result</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">ip_address</span><span class="p">,</span> <span class="n">verify_result</span><span class="p">)</span>
<a id="rest_code_01bee2e5325d49249bd3b125bab79f8f-3" name="rest_code_01bee2e5325d49249bd3b125bab79f8f-3" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_01bee2e5325d49249bd3b125bab79f8f-3"></a><span class="c1"># etc.</span>
</pre></div>
</li>
</ol>
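<p>To make that last point concrete, here is a sketch of the same logging hand-off when the verifier raises instead of returning. Everything here is hypothetical (the exception class, the toy token format, and a <code class="docutils literal">log_verify_result</code> that just appends to a list); it only aims to show the extra repackaging the caller has to do:</p>

```python
from typing import List, Tuple


class EmailVerifyFailed(Exception):
    """Hypothetical exception for an invalid token."""


def email_from_token_raising(token: str) -> str:
    # Toy stand-in: only tokens ending in ":ok" count as valid.
    email, _, status = token.partition(":")
    if status != "ok":
        raise EmailVerifyFailed(email)
    return email


LOG: List[Tuple[str, str]] = []


def log_verify_result(ip_address: str, outcome: str) -> None:
    LOG.append((ip_address, outcome))


def verify_and_log(ip_address: str, token: str) -> None:
    # The outcome has to be caught and repackaged in each branch before
    # it can be passed on as an ordinary value.
    try:
        email = email_from_token_raising(token)
    except EmailVerifyFailed:
        log_verify_result(ip_address, "failed")
    else:
        log_verify_result(ip_address, f"verified {email}")
```

<p>With error objects, the two-line version above needs no branching at the call site at all.</p>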
<p>In the years since I wrote the code, however, some perhaps more compelling
arguments have come along for the error object method.</p>
<p>First, with some small changes (specifically, removing the sentinel singleton
value), we can now add a type signature for <code class="docutils literal">email_from_token</code>:</p>
<div class="code"><pre class="code python"><a id="rest_code_86cd4bf44b394bbf951809b34a582048-1" name="rest_code_86cd4bf44b394bbf951809b34a582048-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_86cd4bf44b394bbf951809b34a582048-1"></a><span class="k">def</span> <span class="nf">email_from_token</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">,</span> <span class="n">max_age</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span> <span class="o">|</span> <span class="n">VerifyFailed</span> <span class="o">|</span> <span class="n">VerifyExpired</span><span class="p">:</span>
<a id="rest_code_86cd4bf44b394bbf951809b34a582048-2" name="rest_code_86cd4bf44b394bbf951809b34a582048-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_86cd4bf44b394bbf951809b34a582048-2"></a>    <span class="o">...</span>
</pre></div>
<p>(The <code class="docutils literal">|</code> syntax for union types was added in Python 3.10; on older versions
you may need <a class="reference external" href="https://docs.python.org/3/library/typing.html#typing.Union">typing.Union</a> instead.)</p>
<p>This is a benefit in itself from a documentation point of view, and for better
IDE/editor help.</p>
<p>We can go further with mypy. We can structure our calling code as follows to make
use of <a class="reference external" href="https://hakibenita.com/python-mypy-exhaustive-checking">mypy exhaustiveness checking</a>:</p>
<div class="code"><pre class="code python"><a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-1" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-1"></a><span class="kn">from</span> <span class="nn">typing_extensions</span> <span class="kn">import</span> <span class="n">assert_never</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-2" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-2"></a>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-3" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-3" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-3"></a><span class="n">verified_email</span> <span class="o">=</span> <span class="n">EmailVerifyTokenGenerator</span><span class="p">()</span><span class="o">.</span><span class="n">email_from_token</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-4" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-4" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-4"></a><span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">verified_email</span><span class="p">,</span> <span class="n">VerifyFailed</span><span class="p">):</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-5" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-5" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-5"></a>    <span class="o">...</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-6" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-6" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-6"></a><span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">verified_email</span><span class="p">,</span> <span class="n">VerifyExpired</span><span class="p">):</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-7" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-7" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-7"></a>    <span class="o">...</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-8" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-8" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-8"></a><span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">verified_email</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-9" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-9" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-9"></a>    <span class="o">...</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-10" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-10" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-10"></a><span class="k">else</span><span class="p">:</span>
<a id="rest_code_18f3770dba5141e5bd6cc156537a22e4-11" name="rest_code_18f3770dba5141e5bd6cc156537a22e4-11" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_18f3770dba5141e5bd6cc156537a22e4-11"></a>    <span class="n">assert_never</span><span class="p">(</span><span class="n">verified_email</span><span class="p">)</span>
</pre></div>
<p>Now, if we remove one of these blocks, let’s say the <code class="docutils literal">VerifyExpired</code> one (or
if we added another option to <code class="docutils literal">email_from_token</code>), mypy will catch it for us:</p>
<div class="code"><pre class="code shell"><a id="rest_code_254f3e6c0cef4168b0211bac38c0f5eb-1" name="rest_code_254f3e6c0cef4168b0211bac38c0f5eb-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_254f3e6c0cef4168b0211bac38c0f5eb-1"></a>error: Argument <span class="m">1</span> to <span class="s2">"assert_never"</span> has incompatible <span class="nb">type</span> <span class="s2">"VerifyExpired"</span><span class="p">;</span> expected <span class="s2">"NoReturn"</span>
</pre></div>
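<p>If you want to try this end to end, the following self-contained sketch should do. It makes a few assumptions: the classes are simple dataclasses with an assumed <code class="docutils literal">email</code> field, <code class="docutils literal">describe</code> is a made-up handler, and it defines its own <code class="docutils literal">assert_never</code> helper so that it runs without <code class="docutils literal">typing_extensions</code> installed:</p>

```python
from dataclasses import dataclass
from typing import NoReturn, Union


def assert_never(value: NoReturn) -> NoReturn:
    # Local helper; mypy flags any call whose argument type is not
    # NoReturn. If it is ever reached at runtime, it raises.
    raise AssertionError(f"Expected code to be unreachable, but got: {value!r}")


@dataclass(frozen=True)
class VerifyFailed:
    pass


@dataclass(frozen=True)
class VerifyExpired:
    email: str  # assumed field name


Result = Union[str, VerifyFailed, VerifyExpired]


def describe(result: Result) -> str:
    if isinstance(result, VerifyFailed):
        return "invalid link"
    elif isinstance(result, VerifyExpired):
        return f"expired link for {result.email}"
    elif isinstance(result, str):
        return f"verified {result}"
    else:
        # mypy reports an error here if a branch above is missing.
        assert_never(result)
```

<p>Deleting any one of the <code class="docutils literal">isinstance</code> branches and re-running mypy should produce an error of the kind shown above.</p>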
<p>With the error object method, we could also write our handling code using
<a class="reference external" href="https://peps.python.org/pep-0636/">structural pattern matching</a>. The
equivalent code, including our mypy exhaustiveness check, now looks like this:</p>
<div class="code"><pre class="code python"><a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-1" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-1" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-1"></a><span class="n">verified_email</span> <span class="o">=</span> <span class="n">EmailVerifyTokenGenerator</span><span class="p">()</span><span class="o">.</span><span class="n">email_from_token</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-2" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-2" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-2"></a><span class="k">match</span> <span class="n">verified_email</span><span class="p">:</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-3" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-3" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-3"></a>    <span class="k">case</span> <span class="n">VerifyFailed</span><span class="p">():</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-4" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-4" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-4"></a>        <span class="o">...</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-5" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-5" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-5"></a>    <span class="k">case</span> <span class="n">VerifyExpired</span><span class="p">(</span><span class="n">expired_token_email</span><span class="p">):</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-6" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-6" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-6"></a>        <span class="o">...</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-7" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-7" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-7"></a>    <span class="k">case</span> <span class="nb">str</span><span class="p">():</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-8" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-8" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-8"></a>        <span class="o">...</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-9" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-9" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-9"></a>    <span class="k">case</span> <span class="k">_</span><span class="p">:</span>
<a id="rest_code_8e037f35c6fa45efb26ad7553c38086e-10" name="rest_code_8e037f35c6fa45efb26ad7553c38086e-10" href="https://lukeplant.me.uk/blog/posts/raising-exceptions-or-returning-error-objects-in-python/#rest_code_8e037f35c6fa45efb26ad7553c38086e-10"></a>        <span class="n">assert_never</span><span class="p">(</span><span class="n">verified_email</span><span class="p">)</span>
</pre></div>
<p>This has destructuring of the email address in <code class="docutils literal">VerifyExpired</code> built in – it
is bound to the name <code class="docutils literal">expired_token_email</code> in that branch.</p>
<p>Hopefully this gives a good justification for the approach I took with this
code. There are times when exceptions are better – generally when the things
mentioned above don’t apply, or the opposite applies – but I think error objects
also have their place, and sometimes are a much better solution.</p>
<section id="links">
<h2>Links</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://twitter.com/spookylukey/status/1533831216536997892">Discussion on Twitter</a></p></li>
</ul>
</section>