In a previous post, I did a case study on my attempts to add type hints to parsy. In this post, I’m continuing the series, but in a very different project.
A while back I forked an existing tool called astpath to create my own tool pyastgrep, fixing various bugs and usability issues. In the process I rewrote significant parts of the existing code, and added quite a lot of my own. This was a pretty good opportunity for me to attempt to use static typing throughout, since I was not limited by backwards compatibility in the design. Plus it’s a relatively small amount of code, making it much easier than many of the larger projects I maintain, while still being big enough to be much more than a toy.
There are at least 5 different ways that type hints can be used in Python, but this post focuses on static type checking and interactive programming help. In particular, I wanted to get mypy to catch errors for me, and I was incorporating it in my CI/testing workflows.
This tool is a utility to allow you to grep Python code for specific syntax elements using XPath as a powerful query language. The main functions are:
intelligently work out which files to search (respecting .gitignore files etc.)
parse the Python files as AST and convert to XML
apply a user-supplied XPath expression to search for specific AST elements
print the results (with the complexity of handling different context strategies and colouring)
Here is an example showing everywhere in the pyastgrep code base that uses a variable whose name contains idx:
    $ pyastgrep './/Name[contains(@id, "idx")]'
    src/pyastgrep/files.py:60:5: current_idx = 0
    src/pyastgrep/files.py:64:9: linebreak_idx = python_file_bytes.find(b"\n", current_idx)
    src/pyastgrep/files.py:64:55: linebreak_idx = python_file_bytes.find(b"\n", current_idx)
    src/pyastgrep/files.py:65:12: if linebreak_idx < 0:
    src/pyastgrep/files.py:66:38: line = python_file_bytes[current_idx:]
    src/pyastgrep/files.py:68:38: line = python_file_bytes[current_idx:linebreak_idx]
    src/pyastgrep/files.py:68:50: line = python_file_bytes[current_idx:linebreak_idx]
    src/pyastgrep/files.py:72:12: if linebreak_idx < 0:
    src/pyastgrep/files.py:75:13: current_idx = linebreak_idx + 1
    src/pyastgrep/files.py:75:27: current_idx = linebreak_idx + 1
    src/pyastgrep/printer.py:244:9: start_line_idx = line_index - before_context
    src/pyastgrep/printer.py:245:9: end_line_idx = line_index + after_context
    src/pyastgrep/printer.py:246:9: stop_line_idx = end_line_idx + 1
    src/pyastgrep/printer.py:246:25: stop_line_idx = end_line_idx + 1
    src/pyastgrep/printer.py:248:19: if (path, end_line_idx) in self.printed_context_lines:
    src/pyastgrep/printer.py:252:19: if (path, start_line_idx - 1) not in self.printed_context_lines:
    src/pyastgrep/printer.py:253:57: header = self.formatter.format_header(path, start_line_idx)
    src/pyastgrep/printer.py:257:44: code = "\n".join(result.file_lines[start_line_idx:stop_line_idx])
    src/pyastgrep/printer.py:257:59: code = "\n".join(result.file_lines[start_line_idx:stop_line_idx])
    src/pyastgrep/printer.py:260:13: for idx in range(start_line_idx, stop_line_idx):
    src/pyastgrep/printer.py:260:26: for idx in range(start_line_idx, stop_line_idx):
    src/pyastgrep/printer.py:260:42: for idx in range(start_line_idx, stop_line_idx):
    src/pyastgrep/printer.py:261:51: self.printed_context_lines.add((path, idx))
There were some things I really liked about the flow of using a type checker, and I was able to lean on the checker pretty heavily for some things.
One example of using type-driven programming was going between the layer of my code that found matches and the layer that printed them. I found that the search function needed to go from returning a simple Iterable[Match] eventually all the way to Iterable[Match | MissingPath | ReadError | NonElementReturned | FileFinished]. In each case, I could do something like:
add the new return value, something like yield MissingPath(...), in the body of the function.
fix up the function type signature, in response to the type error that mypy would now report.
then respond to the type error that mypy would report in the places that consumed this iterable, usually by implementing handling of the new result type.
isinstance checks can be used to drive type narrowing and satisfy mypy that everything is fine.
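The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual pyastgrep code: the fields on Match and MissingPath, the SearchResult alias, and the render function are all assumptions made for the example, and the union is cut down to two members.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Union


@dataclass
class Match:
    # Hypothetical fields, chosen only for this sketch.
    path: Path
    line: int


@dataclass
class MissingPath:
    path: Path


# Cut-down result type; the post describes a union with more members.
SearchResult = Union[Match, MissingPath]


def search(paths: Iterable[Path]) -> Iterable[SearchResult]:
    for path in paths:
        if not path.exists():
            # Yield the corner case as a value instead of raising.
            yield MissingPath(path)
            continue
        # Placeholder for real matching logic: report line 1 of each file.
        yield Match(path, 1)


def render(results: Iterable[SearchResult]) -> list[str]:
    lines: list[str] = []
    for result in results:
        # isinstance narrows the union, so each branch only sees one type.
        if isinstance(result, MissingPath):
            lines.append(f"Error: {result.path} could not be found")
        else:
            lines.append(f"{result.path}:{result.line}")
    return lines
```

If a new member is added to the union and yielded from search, the consumer's else branch no longer type checks until the new case is handled, which is the "driving" effect described above.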
It was nice when this went through multiple layers, knowing that something was checking it for me and driving me to the next bit of code that needed fixing. Having the types there explicitly also helps to make you conscious of decisions you are making about how the layers are working, which was helpful in keeping the layers straight, so that I can use this code both as a library and a command line tool.
Being able to convert various exceptions, errors and corner cases to sum types in this way was also great, and I leaned on this heavily, probably more than if I hadn’t been using a type checker. I quite like these kinds of ad-hoc sum types – in some ways they often work nicer than sum types in Haskell etc.
Use of mypy has encouraged me to use lots of small custom classes, because I can use “Find references” to search the code base for everything related to a particular type, or a particular attribute or method. For custom classes I create and use within the code base, this works perfectly, which is really nice. The same goes for renaming things using IDE tools (I’m using Emacs and the pyright LSP server).
I have a high degree of confidence that I’ll be able to come back to this code base after years without touching it and be able to navigate it and make changes very easily.
However, I have a long list of complaints about issues I found too!
You have to turn on at least:
Otherwise you should not be expecting mypy to actually find typing errors. In general, it can be frustrating not knowing whether the lack of red is because you haven’t got errors, or because mypy just isn’t checking code, or can’t meaningfully check anything.
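As an illustration of code that is silently not checked, here is a small sketch with made-up function names. By default mypy does not check the bodies of functions that have no annotations, so the first function passes unreported unless check_untyped_defs (or strict mode) is enabled.

```python
from typing import Sequence


def untyped_summary(items):
    # No annotations: by default mypy skips this body, so the str + int
    # mistake below goes unreported.
    return "count: " + len(items)


def typed_summary(items: Sequence[object]) -> str:
    # Annotated: this body is checked, and the expression used in
    # untyped_summary would be rejected here.
    return "count: " + str(len(items))
```

At run-time the first function raises a TypeError as soon as it is called, which is exactly the kind of error you might have expected the checker to find.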
pyastgrep, in common with many similar tools, allows you to process standard input as well as grepping files named on the command line or found by directory walking. So I had to have branches for that, and I had tests for it too.
I hit a bug regarding encodings, where files not encoded in UTF-8 were causing the tool to crash. I ended up doing an internal change that switched a bunch of types from str to bytes. The code worked, the tests passed, and mypy reported no errors. But later – thankfully before a release – I noticed that I had broken stdin handling.
It turned out that, according to mypy, the value in question had type Any, and not the actual type str. I had been using IOBase as a type for some of my arguments, which meant that mypy didn’t pick up the problem – if Any appears anywhere, it’s like throwing a “silently disable everything this touches” bomb into the type checker.
Now, I had been alerted to the problem earlier – mypy thinks that sys.stdin is of type typing.TextIO, but typing.TextIO is not something you test for at run-time, so it interacts badly with the very useful isinstance type guards and type narrowing. So I had ended up using IOBase as that seemed less problematic.
In other words, I had added type hints, and I had added correct type hints. But they weren’t correct enough, and therefore turned out to be “wrong”. It was very disappointing that despite the effort I had put in, this kind of type error still got through.
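The general failure mode, an Any value slipping past a bytes annotation, can be sketched like this. The functions are hypothetical stand-ins, not the pyastgrep code: read_input plays the role of any call the checker sees as returning Any while it actually returns str.

```python
from typing import Any


def first_line(data: bytes) -> bytes:
    # Written for bytes: splits on a bytes separator.
    return data.split(b"\n")[0]


def read_input() -> Any:
    # Stand-in for a read call typed as Any; at run-time it returns str.
    return "line one\nline two"


def process() -> bytes:
    # Any is compatible with every type, so a checker accepts this call.
    # At run-time it raises TypeError, because str.split does not accept
    # a bytes separator.
    return first_line(read_input())
```

Nothing in process looks suspicious to the checker, and only a test that exercises this path with realistic input reveals the problem.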
Fixing the bug involved writing a better, more accurate test that more closely emulated actual stdin handling, and then a very simple change. Fixing my type hints, however, was a much bigger task.
It involved a long journey of understanding regarding type guards and stricter type guards, because non-strict type guards (which is what we have at the moment) turned out not to work how I thought. The eventual refactoring now uses the typing.BinaryIO type hint, and some code that seems somewhat fragile in terms of type checking – because there is no way of doing a “negative type guard” for BinaryIO, I have to order my if/elif/else clauses in exactly the right way.
It’s also closing the barn door after the horse has bolted – I had already fixed the bug, and added much more thorough tests to prove it was fixed. I was hoping that static type checking would have caught this before I had to do that.
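The ordering constraint mentioned above might look something like the following sketch. The function name and signature are assumptions for illustration; the point is only that the concrete class is tested first and the BinaryIO case is whatever is left over.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union


def read_source(source: Union[Path, BinaryIO]) -> bytes:
    # Path is an ordinary concrete class, so this isinstance check works
    # at run-time and also narrows the type for the checker.
    if isinstance(source, Path):
        return source.read_bytes()
    # Everything that is not a Path is treated as the binary stream.
    # This only works because the Path check comes first.
    return source.read()
```

For example, read_source(io.BytesIO(b"x = 1\n")) returns the bytes unchanged, while passing a Path reads the file from disk.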
I saw a bug I hadn’t fixed, and one that again I thought mypy might have caught.
It turns out I had added ignore_missing_imports = true early in development to reduce the errors to a manageable set. This silenced errors relating to lxml and effectively gave me a bunch of Any types floating around instead of something useful.
Again, this was “my fault”, but I feel it’s fairly typical of what will happen in the real world. Switching to ignore_missing_imports = false can cause so many problems that it will be hard to justify the cost in many cases. In this case, the type stubs it wants me to install require me to “fix” a whole load of static type checking issues relating to lxml that don’t correspond to real run-time issues.
I switched a bunch of code paths from str to Path at one point. mypy gave me some help, but less than I wanted. For example, this reports no error:
A comparison between a str and a Path always returns False, so it’s not a useful thing to do, and therefore a developer error. I meant to do the comparison to "-" before I converted the input str objects to internal Path objects. It’s conceptually a TypeError, but not actually one. Thankfully I had tests that failed.
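A minimal sketch of this kind of mistake, using hypothetical function names rather than the original snippet: the first version compares after converting to Path, the second compares while the value is still a str.

```python
from pathlib import Path


def is_stdin_marker_buggy(raw: str) -> bool:
    path = Path(raw)
    # Bug: a Path never compares equal to a str, so this is always False.
    return path == "-"


def is_stdin_marker(raw: str) -> bool:
    # Compare first, while the value is still a str.
    return raw == "-"
```

Both versions run without raising, which is why only a failing test (or a checker that flags the comparison) reveals the first one as wrong.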
Several times I had to blow away .mypy_cache to get errors to appear. This is not a fundamental problem with the idea of static typing, but it makes a very big difference to the whole flow of leaning on mypy. I often noticed only when I knew that mypy should be reporting errors due to a change I just made, but it wasn’t – I have no idea how many other times it was happening. When interpreting “mypy reports no errors”, there are now about half a dozen reasons why that might be the case, only one of which is “you fixed everything and your code is correct”.
Are types provided by 3rd party libs or typeshed reliable?
No, they are not. For example, I discovered this one in typeshed/stdlib/_ast.pyi among many others:
This is probably not typeshed’s “fault” — it’s a problem trying to retro-fit static types to a language and stdlib that wasn’t designed for them.
As soon as you want to use duck typing, which I did want to, you’ve got more work ahead of you, work that isn’t really to do with solving your actual problem. There are solutions such as Protocol, but I’m simply noting you do have significant amounts of extra work for the type checker to understand idiomatic Python.
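To give a feel for the extra work, here is a small Protocol sketch. The class and function names are invented for the example; the idea is that two unrelated classes satisfy the same structural type without sharing a base class.

```python
from typing import Protocol


class HeaderFormatter(Protocol):
    # Structural type: anything with a matching format_header method fits.
    def format_header(self, path: str, line: int) -> str: ...


class PlainFormatter:
    def format_header(self, path: str, line: int) -> str:
        return f"{path}:{line}"


class BracketFormatter:
    def format_header(self, path: str, line: int) -> str:
        return f"[{path} @ {line}]"


def header_for(formatter: HeaderFormatter, path: str, line: int) -> str:
    # Neither formatter inherits from HeaderFormatter; the checker
    # accepts both because their methods match the protocol.
    return formatter.format_header(path, line)
```

Without type hints, header_for would simply work with any object that has the method; the Protocol class exists purely so the checker can follow along.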
I fairly often got that sense of “it type checks, and everything works first time I run it, cool!”
Sometimes, it was an illusion though – take this code:
I had typed colors instead of the correct color in the second branch, but I got no squiggly red lines. This was because of a lurking Any – the argparse args object is actually an Any. This tripped me up, because I didn’t have any tests for that line of code.
strict = true in your mypy config doesn’t fix this. I think I’d need a way to say “warn me about anywhere that Any leaks into my code base”, but even if it existed I imagine I would not like it.
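The argparse problem can be reproduced with a sketch like the one below. The option name and helper functions are assumptions for illustration, not the original code. Attribute access on an argparse.Namespace is typed as Any, so a checker has nothing to check the misspelled name against.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--color", choices=["auto", "always", "never"], default="auto")
    return parser


def use_color_buggy(argv: Sequence[str]) -> bool:
    args = build_parser().parse_args(argv)
    # Typo: "colors" instead of "color". No type error is reported, but
    # this raises AttributeError at run-time.
    if args.colors == "always":
        return True
    return False


def use_color(argv: Sequence[str]) -> bool:
    args = build_parser().parse_args(argv)
    if args.color == "always":
        return True
    return False
```

Calling use_color_buggy(["--color", "always"]) fails only when that line actually runs, so without a test covering it the typo stays hidden.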
mypy fails to find the obvious issue in this bit of code:
I hoped I’d at least get a warning for a potential unbound variable. This comes up with cases where you want exhaustiveness checking, like:
You can get this right using an else branch with assert_never, but it’s annoying that you have to remember to do this.
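Both the unbound variable problem and the assert_never workaround can be sketched as follows. The enum and function names are hypothetical. typing.assert_never exists from Python 3.11; the fallback definition is only there so the sketch runs on older versions.

```python
import enum
import sys
from typing import NoReturn

if sys.version_info >= (3, 11):
    from typing import assert_never
else:
    def assert_never(value: NoReturn) -> NoReturn:
        raise AssertionError(f"Unhandled value: {value!r}")


class ColorMode(enum.Enum):
    AUTO = "auto"
    ALWAYS = "always"
    NEVER = "never"


def describe_buggy(mode: ColorMode) -> str:
    if mode is ColorMode.AUTO:
        text = "decide automatically"
    elif mode is ColorMode.ALWAYS:
        text = "always colour"
    # ColorMode.NEVER is not handled, so "text" may be unbound here.
    return text


def describe(mode: ColorMode) -> str:
    if mode is ColorMode.AUTO:
        text = "decide automatically"
    elif mode is ColorMode.ALWAYS:
        text = "always colour"
    elif mode is ColorMode.NEVER:
        text = "never colour"
    else:
        # If a new member is added and not handled above, the checker
        # reports an error on this line.
        assert_never(mode)
    return text
```

The first function raises UnboundLocalError for ColorMode.NEVER at run-time; the second only type checks while every member has its own branch.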
Type hints for decorators are … bad. If you want parameterised decorators, or other people’s decorators that don’t have types
[Apologies for the unfinished sentence above. I don’t want to risk a repeated head-against-table moment that the first attempt triggered, once was painful enough]
More recently I’ve tried pyright as an alternative to mypy. Generally I’ve found it to be significantly less buggy. However, mypy has a lot going for it in terms of features and extensions, and I don’t really want to have two different type checkers. At the moment I’m experimenting with mainly using pyright for interactive checks in my editor, and using mypy for pre-commit/CI checks.
The overlapping feature sets can be kind of annoying though. For example, for the potential unbound variable error above, the latest version of pyright does warn you. It also has built-in exhaustiveness checking without needing the assert_never technique. However, in one case it wasn’t working for me, until I finally tracked down the issue: mypy was able to handle this code and correctly deduce the base class of my enum, but pyright wasn’t:
I eventually found an adequate solution that keeps them both happy — but only because I’m writing this blog post and don’t want to look stupid. Normally it would be “stuff is broken, maybe it’s me, maybe it’s them, gotta move on”.
In addition, in some places pyright does not support the assert_never technique that mypy needs, and reports an error. There are other pain points if you try to use both.
There are quite a few places where you find pyright doesn’t do the same thing as mypy because pyright is more correct. Microsoft people tend to know what they are talking about when it comes to type systems. But it means you may find yourself digging through large numbers of issues closed with the “as designed” tag to find answers.
Overall, despite listing more bad things than good, I’m actually happy with the addition of mypy as a required static type checker in this project.
The disappointments I’ve listed may come from my experience and enjoyment of languages like Haskell where you really can lean on the type checker. In those languages, you find both that the rewards of static type checking are massively higher, and that the effort required to use them is massively lower. Haskell type signatures, for instance, are often not needed, and much easier to write and understand than Python’s.
Perhaps the most positive outlook is “static type checking in Python is just an advanced linter, of course it’s not actually reliable”. This can be hard to accept, though, due to the amount of work you have to do to get any real benefit above and beyond linters like flake8 and ruff that, with virtually no changes to your code or workflow, catch a lot of issues with a very low false positive rate.
In terms of tips and advice:
You need to turn up error reporting and spend considerable effort configuring mypy, especially in larger projects.
If you want something approaching reliability, your entire stack of libraries needs to have been designed with static types from the beginning, so you don’t have to use stubs. This means:
probably not much in stdlib. You’re going to need to wrap everything.
probably not much that was created more than 5 years ago.
Maybe this works well for MegaCorps with an army of developers and a very large code base that they have to get under control somehow. I think for many projects, you are going to be happy with static type checking in Python only if you can resign yourself to a very low level of reliability, and are mostly leaning on other techniques for correctness, like an extensive test suite.