Python Type Hints: pyastgrep case study

by Luke Plant

— October 5, 2023 09:45

In a previous post, I did a case study on my attempts to add type hints to parsy. In this post, I’m continuing the series, but in a very different project.

A while back I forked an existing tool called astpath to create my own tool pyastgrep, fixing various bugs and usability issues. In the process I rewrote significant parts of the existing code, and added quite a lot of my own. This was a pretty good opportunity for me to attempt to use static typing throughout, since I was not limited by backwards compatibility in the design. Plus it’s a relatively small amount of code, making it much easier than many of the larger projects I maintain, while still being big enough to be much more than a toy.

There are at least 5 different ways that type hints can be used in Python, but this post focuses on static type checking and interactive programming help. In particular, I wanted to get mypy to catch errors for me, and I was incorporating it in my CI/testing workflows.

About pyastgrep

This tool is a utility to allow you to grep Python code for specific syntax elements using XPath as a powerful query language. The main functions are:

intelligently work out which files to search (respecting .gitignore files etc.)
parse the Python files as AST and convert to XML
apply a user-supplied XPath expression to search for specific AST elements
print the results (with the complexity of handling different context strategies and colouring)

Here is an example showing pyastgrep searching its own code base for all usages of names (variables etc) that contain the substring idx:

$ pyastgrep './/Name[contains(@id, "idx")]'
src/pyastgrep/files.py:60:5:    current_idx = 0
src/pyastgrep/files.py:64:9:        linebreak_idx = python_file_bytes.find(b"\n", current_idx)
src/pyastgrep/files.py:64:55:        linebreak_idx = python_file_bytes.find(b"\n", current_idx)
src/pyastgrep/files.py:65:12:        if linebreak_idx < 0:
src/pyastgrep/files.py:66:38:            line = python_file_bytes[current_idx:]
src/pyastgrep/files.py:68:38:            line = python_file_bytes[current_idx:linebreak_idx]
src/pyastgrep/files.py:68:50:            line = python_file_bytes[current_idx:linebreak_idx]
src/pyastgrep/files.py:72:12:        if linebreak_idx < 0:
src/pyastgrep/files.py:75:13:            current_idx = linebreak_idx + 1
src/pyastgrep/files.py:75:27:            current_idx = linebreak_idx + 1
src/pyastgrep/printer.py:244:9:        start_line_idx = line_index - before_context
src/pyastgrep/printer.py:245:9:        end_line_idx = line_index + after_context
src/pyastgrep/printer.py:246:9:        stop_line_idx = end_line_idx + 1
src/pyastgrep/printer.py:246:25:        stop_line_idx = end_line_idx + 1
src/pyastgrep/printer.py:248:19:        if (path, end_line_idx) in self.printed_context_lines:
src/pyastgrep/printer.py:252:19:        if (path, start_line_idx - 1) not in self.printed_context_lines:
src/pyastgrep/printer.py:253:57:            header = self.formatter.format_header(path, start_line_idx)
src/pyastgrep/printer.py:257:44:        code = "\n".join(result.file_lines[start_line_idx:stop_line_idx])
src/pyastgrep/printer.py:257:59:        code = "\n".join(result.file_lines[start_line_idx:stop_line_idx])
src/pyastgrep/printer.py:260:13:        for idx in range(start_line_idx, stop_line_idx):
src/pyastgrep/printer.py:260:26:        for idx in range(start_line_idx, stop_line_idx):
src/pyastgrep/printer.py:260:42:        for idx in range(start_line_idx, stop_line_idx):
src/pyastgrep/printer.py:261:51:            self.printed_context_lines.add((path, idx))

Things I liked

There were some things I really liked about the flow of using a type checker, and I was able to lean on the checker pretty heavily for some things.

One example of using type-driven programming was going between the layer of my code that found matches and the layer that printed them. I found that the search function needed to go from returning a simple Iterable[Match] eventually all the way to Iterable[Match | MissingPath | ReadError | NonElementReturned | FileFinished]. In each case, I could do something like:

add the new return value, something like yield MissingPath(...), in the body of the function.
fix up the function type signature, in response to the type error that mypy would now report.
then respond to the type error that mypy would report in the places that consumed this iterable, usually by implementing handling of the new result type. isinstance checks can be used to drive type narrowing and satisfy mypy that everything is fine.
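
The steps above can be sketched roughly as follows. This is a minimal illustration, not pyastgrep’s actual code: the fields on Match and MissingPath, the describe function and the placeholder line number are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Union


@dataclass
class Match:
    path: Path
    line: int  # hypothetical field, for illustration only


@dataclass
class MissingPath:
    path: Path


# An ad-hoc sum type: the search layer may yield any of these.
SearchResult = Union[Match, MissingPath]


def search(paths: Iterable[Path]) -> Iterable[SearchResult]:
    for path in paths:
        if not path.exists():
            # Step 1: a new kind of result, yielded instead of raising.
            yield MissingPath(path)
        else:
            # Placeholder for real matching logic.
            yield Match(path, 1)


def describe(result: SearchResult) -> str:
    # Step 3: isinstance checks narrow the union, one branch per variant.
    if isinstance(result, Match):
        return f"{result.path}:{result.line}"
    elif isinstance(result, MissingPath):
        return f"{result.path}: no such file"
```

Adding another variant to SearchResult without adding a branch to describe leaves a path with no return value, which is the kind of gap a type checker can point at.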

It was nice when this went through multiple layers, knowing that something was checking it for me and driving me to the next bit of code that needed fixing. Having the types there explicitly also helps to make you conscious of decisions you are making about how the layers are working, which was helpful in keeping the layers straight, so that I can use this code both as a library and a command line tool.

Being able to convert various exceptions, errors and corner cases to sum types in this way was also great, and I leaned on this heavily, probably more than if I hadn’t been using a type checker. I quite like these kinds of ad-hoc sum types; in some ways they often work more nicely than sum types in Haskell etc.

Use of mypy has encouraged me to use lots of small custom classes, because I can use “Find references” to search the code base for everything related to a particular type, or a particular attribute or method. For custom classes I create and use within the code base, this works perfectly, which is really nice. The same goes for renaming things using IDE tools (I’m using Emacs and the pyright LSP server).

I have a high degree of confidence that I’ll be able to come back to this code base after years without touching it and be able to navigate it and make changes very easily.

Issues

However, I have a long list of complaints about issues I found too!

mypy just isn’t checking that code.

You have to turn on at least:

check_untyped_defs = true
disallow_untyped_calls = true
disallow_untyped_defs = true
disallow_incomplete_defs = true

Otherwise you should not be expecting mypy to actually find typing errors. In general, it can be frustrating not knowing whether the lack of red is because you haven’t got errors, or because mypy just isn’t checking code, or can’t meaningfully check anything.

IOBase vs BinaryIO

pyastgrep, in common with many similar tools, allows you to process standard input as well as grepping files named on the command line or found by directory walking. So I had to have branches for that, and I had tests for it too.

I hit a bug regarding encodings, where files not encoded in UTF-8 were causing the tool to crash. I ended up doing an internal change that switched a bunch of types from str to bytes. The code worked, the tests passed, and mypy reported no errors. But later – thankfully before a release – I noticed that I had broken stdin handling.

It turned out that according to mypy, IOBase.read() returns Any, and not the actual type bytes or str. I had been using IOBase as a type for some of my arguments, which meant that mypy didn’t pick up the problem – if Any appears anywhere, it’s like throwing a “silently disable everything this touches” bomb into the type checker.

Now, I had been alerted to the problem earlier – mypy thinks that sys.stdin is of type typing.TextIO, not IOBase. However, typing.TextIO is not something you test for at run-time, so it interacts badly with the very useful isinstance type guards and type narrowing. So I had ended up using IOBase as that seemed less problematic.

In other words, I had added type hints, and I had added correct type hints. But they weren’t correct enough, and therefore turned out to be “wrong”. It was very disappointing that despite the effort I had put in, this kind of type error still got through.

Fixing the bug involved writing a better, more accurate test that more closely emulated actual stdin handling, and then a very simple change. Fixing my type hints, however, was a much bigger task.

It involved a long journey of understanding regarding type guards and stricter type guards, because non-strict type guards (which is what we have at the moment) turned out not to work how I thought. The eventual refactoring now uses the typing.BinaryIO type hint, and some code that seems somewhat fragile in terms of type checking – because there is no way of doing a “negative type guard” for BinaryIO, I have to order my if/elif/else clauses in exactly the right way.
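
To illustrate the ordering constraint, here is a small sketch under assumptions: the function name read_source and its signature are hypothetical and not taken from pyastgrep. The point is that the Path case is tested positively, and the BinaryIO case is only reached by elimination.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union


def read_source(source: Union[Path, BinaryIO]) -> bytes:
    # Path is a real run-time class, so isinstance works as a type guard.
    if isinstance(source, Path):
        return source.read_bytes()
    else:
        # There is no useful run-time test for typing.BinaryIO: concrete
        # streams such as io.BytesIO are not instances of it. The checker
        # narrows to BinaryIO here only because Path was ruled out first.
        return source.read()
```

Swapping the branches, or adding a third member to the union, would mean revisiting this ordering by hand.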

It’s also closing the barn door after the horse has bolted – I had already fixed the bug, and added much more thorough tests to prove it was fixed. I was hoping that static type checking would have caught this before I had to do that.

Missing types for imports

I saw a bug I hadn’t fixed, and one that again I thought mypy might have caught.

It turns out I had added ignore_missing_imports = true early in development to reduce the errors to a manageable set. This silenced errors relating to lxml and effectively gave me a bunch of Any types floating around instead of something useful.

Again, this was “my fault”, but I feel it’s fairly typical of what will happen in the real world. Switching to ignore_missing_imports = false can cause so many problems that it will be hard to justify the cost in many cases. In this case, the type stubs it wants me to install require me to “fix” a whole load of static type checking issues relating to lxml that don’t correspond to real run-time issues.

Equality checks

I switched a bunch of code paths from str to Path at one point. mypy gave me some help, but less than I wanted. For example, this reports no error:

path: Path = Path(...)

if path == "-":
    ...

A comparison between a str and a Path always returns False, so it’s not a useful thing to do, and therefore a developer error. I meant to do the comparison to "-" before I converted the input str objects to internal Path objects. It’s conceptually a TypeError, but not actually one. Thankfully I had tests that failed.
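
A short sketch of the intended order of operations, with hypothetical names (STDIN_MARKER and to_search_target are not from pyastgrep): compare against "-" while the value is still a str, and only then convert to Path.

```python
from pathlib import Path
from typing import Optional

STDIN_MARKER = "-"  # hypothetical constant for the example


def to_search_target(raw: str) -> Optional[Path]:
    """Return None to mean stdin, otherwise the path to search."""
    # Compare while the value is still a str...
    if raw == STDIN_MARKER:
        return None
    # ...and only then convert to the internal Path representation.
    return Path(raw)


# The silent failure mode: a Path never compares equal to a str.
path_equals_str = Path("-") == "-"
```

Here path_equals_str is False, with no exception raised, which is why the mistake is easy to miss.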

mypy caching bugs

Several times I had to blow away .mypy_cache to get errors to appear. This is not a fundamental problem with the idea of static typing, but it makes a very big difference to the whole flow of leaning on mypy. I often noticed only when I knew that mypy should be reporting errors due to a change I just made, but it wasn’t – I have no idea how many other times it was happening. When interpreting “mypy reports no errors”, there are now about half a dozen reasons why that might be the case, only one of which is “you fixed everything and your code is correct”.

Third party types

Are types provided by third party libs or typeshed reliable?

No, they are not. For example, I discovered this one in typeshed/stdlib/_ast.pyi among many others:

class AST:
    ...
    # TODO: Not all nodes have all of the following attributes

This is probably not typeshed’s “fault” — it’s a problem trying to retro-fit static types to a language and stdlib that wasn’t designed for them.

Duck typing

As soon as you want to use duck typing, which I did want to, you’ve got more work ahead of you, work that isn’t really to do with solving your actual problem. There are solutions such as Protocol, but I’m simply noting you do have significant amounts of extra work for the type checker to understand idiomatic Python.
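
As an example of the extra work involved, a Protocol-based version of a duck-typed interface might look like the sketch below. The Colorer protocol, its color_path method and the format_header function are invented for illustration and are not pyastgrep’s real API.

```python
from typing import Protocol


class Colorer(Protocol):
    # Anything with a matching color_path method satisfies this protocol;
    # no inheritance is required.
    def color_path(self, text: str) -> str: ...


class NullColorer:
    def color_path(self, text: str) -> str:
        return text


class AnsiColorer:
    def color_path(self, text: str) -> str:
        # Wrap the text in ANSI escape codes for magenta.
        return f"\x1b[35m{text}\x1b[0m"


def format_header(path: str, colorer: Colorer) -> str:
    return colorer.color_path(path) + ":"
```

The two colorer classes are plain duck-typed Python; the Protocol class exists only so that the type checker can understand the parameter of format_header.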

False security

I fairly often got that sense of “it type checks, and everything works first time I run it, cool!”

Sometimes, it was an illusion though – take this code:

if args.color == UseColor.AUTO:
    colorer = make_default_colorer()
elif args.colors == UseColor.NEVER:
    colorer = NullColorer()

I had typed colors instead of the correct color in the second branch, but I got no squiggly red lines. This was because of a lurking Any – the argparse args object is actually an Any. This tripped me up, because I didn’t have any tests for that line of code.

Having strict = true in your mypy config doesn’t fix this. I think I’d need a way to say “warn me about anywhere that Any leaks into my code base”, but even if it existed I imagine I would not like it.
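
One possible mitigation for the argparse case specifically, offered as a sketch rather than as what pyastgrep does: parse_args accepts a namespace object, so a small class with annotated attributes can be passed in. The Args class and the --color option below are assumptions for the example. Note that the class deliberately does not subclass argparse.Namespace, since that would bring back dynamic attribute access.

```python
import argparse
from enum import Enum


class UseColor(Enum):
    AUTO = "auto"
    NEVER = "never"


class Args:
    # Declared attributes only; a checker should be able to reject
    # access to anything else, such as a misspelled "colors".
    color: UseColor


parser = argparse.ArgumentParser()
parser.add_argument("--color", type=UseColor, default=UseColor.AUTO)

# argparse sets the parsed values as attributes on the object passed in.
args = parser.parse_args(["--color", "never"], namespace=Args())

if args.color == UseColor.AUTO:
    mode = "auto"
elif args.color == UseColor.NEVER:
    mode = "never"
```

This only helps for the argparse result; it does nothing about Any arriving from other sources.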

Exhaustiveness checking

mypy fails to find the obvious issue in this bit of code:

def foo() -> None:
    if 1 + 1 == 3:
        x = "hello"
    print(x)

I hoped I’d at least get a warning for a potential unbound variable. This comes up with cases where you want exhaustiveness checking, like:

def print_greeting(username: str, type: Greeting) -> None:
    if type == Greeting.HELLO:
        greeting = "hello"
    elif type == Greeting.GOODBYE:
        greeting = "goodbye"

    print(f"{greeting} {username}")

You can get this right using an else branch with assert_never, but it’s annoying that you have to remember to do this.
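
A sketch of that technique is below. Python 3.11 and later provide typing.assert_never; the helper here is a hand-written equivalent so that the example runs on older versions too. The Greeting members and the make_greeting function are assumptions for the example.

```python
from enum import Enum
from typing import NoReturn


class Greeting(Enum):
    HELLO = "hello"
    GOODBYE = "goodbye"


def assert_never(value: NoReturn) -> NoReturn:
    # Statically, this call only type checks if every case was handled
    # before reaching it; at run time it fails loudly if one was not.
    raise AssertionError(f"Unhandled value: {value!r}")


def make_greeting(username: str, kind: Greeting) -> str:
    if kind is Greeting.HELLO:
        greeting = "hello"
    elif kind is Greeting.GOODBYE:
        greeting = "goodbye"
    else:
        # Adding a new Greeting member without a branch above makes
        # this line a type error.
        assert_never(kind)
    return f"{greeting} {username}"
```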

[Update 2024: while mypy doesn’t see the potential unbound variable error, even with --strict, I’ve found pyright does spot it, and these days I’m using pyright more and more]

Decorators

Type hints for decorators are … bad. If you want parameterised decorators, or other people’s decorators that don’t have types

[Apologies for the unfinished sentence above. I don’t want to risk a repeated head-against-table moment that the first attempt triggered, once was painful enough]

pyright

More recently I’ve tried pyright as an alternative to mypy. Generally I’ve found it to be significantly less buggy. However, mypy has a lot going for it in terms of features and extensions, and I don’t really want to have two different type checkers. At the moment I’m experimenting with mainly using pyright for interactive checks in my editor, and using mypy for pre-commit/CI checks.

The overlapping feature sets can be kind of annoying though. For example, for the potential unbound variable error above, the latest version of pyright does warn you. It also has built-in exhaustiveness checking without needing the assert_never technique. However, in one case it wasn’t working for me, until I finally tracked down the issue — mypy was able to handle this code and correctly deduce the base class of my enum, but pyright wasn’t:

try:
    from enum import StrEnum
except ImportError:
    from backports.strenum import StrEnum  # type: ignore [no-redef]

I eventually found an adequate solution that keeps them both happy — but only because I’m writing this blog post and don’t want to look stupid. Normally it would be “stuff is broken, maybe it’s me, maybe it’s them, gotta move on”.

In addition, in some places pyright does not support the assert_never technique that mypy needs, and reports an error. There are other pain points if you try to use both.

There are quite a few places where you find pyright doesn’t do the same thing as mypy because pyright is more correct. Microsoft people tend to know what they are talking about when it comes to type systems. But it means you may find yourself digging through large numbers of issues closed with the “as designed” tag to find answers.

[Update 2024: where I have the choice, I usually use pyright exclusively these days. I use it both as a standalone tool from CLI and in CI etc, and with the lsp-pyright LSP server in Emacs. It now has enough support for assert_never – it sometimes reports a warning for unreachable code, but that’s fine as it’s not an error. pyright seems to understand my Python code a lot better, especially when it comes to type narrowing. When I run mypy over the same bit of code, it reports a large number of spurious errors where pyright is correctly silent, and I don’t think there are many cases where mypy spots errors that pyright fails to see]

Summary

Overall, despite listing more bad things than good, I’m actually happy with the addition of mypy as a required static type checker in this project.

The disappointments I’ve listed may come from my experience and enjoyment of languages like Haskell where you really can lean on the type checker. In those languages, you find both that the rewards of static type checking are massively higher, and that the effort required to use them is massively lower. Haskell type signatures, for instance, are often not needed, and much easier to write and understand than Python’s.

Perhaps the most positive outlook is “static type checking in Python is just an advanced linter, of course it’s not actually reliable”. This can be hard to accept, though, due to the amount of work you have to do to get any real benefit above and beyond linters like flake8 and ruff that, with virtually no changes to your code or workflow, catch a lot of issues with a very low false positive rate.

In terms of tips and advice:

You need to turn up error reporting and spend considerable effort configuring mypy, especially in larger projects.
If you want something approaching reliability, your entire stack of libraries needs to have been designed with static types from the beginning, so you don’t have to use stubs. This means:
- probably not much in stdlib. You’re going to need to wrap everything.
- probably not much that was created more than 5 years ago.

Maybe this works well for MegaCorps with an army of developers and a very large code base that they have to get under control somehow. I think for many projects, you are going to be happy with static type checking in Python only if you can resign yourself to a very low level of reliability, and are mostly leaning on other techniques for correctness, like an extensive test suite.