Python Type Hints: case study on parsy


I have been trying to like static type checking in Python. For most of my Django projects, I get annoyed and give up, so I’ve had a go with some smaller projects instead. This blog post documents how it went with Parsy, a parser combinator library I maintain.

Intro to Parsy

I need to explain a few things about Parsy.

In Parsy, you build up Parser objects via a set of primitives and combinators. Each Parser object has a parse method that accepts strings [1] and returns some object — which might be a string for your lowest building blocks, but quickly you build more complex parsers that return different types of objects. A lot of code is written in “fluent” style where you chain methods together.

Here are some basics:

The primitive string just matches and returns the input:

import parsy as P

hello = P.string("hello")
assert hello.parse("hello") == "hello"

But we can change the result to some other type of object:

true_parser = P.string("true").result(True)
assert true_parser.parse("true") == True

We can map the parse result using a callable. This time I’m starting with a regex primitive, which returns strings, but converting to ints:

int_parser = P.regex(r"-?\d+").map(int)
assert int_parser.parse("123") == 123

We can discard things we don’t care about in a number of ways, such as with these “pointy” operators that point to the important bit:

whitespace = P.regex(r"\s*")
assert (whitespace >> int_parser << whitespace).parse(" 123    ") == 123

We can have a sequence of items, here with some separator we don’t care about collecting:

three_ints = P.seq(
    int_parser << P.string("-"),
    int_parser << P.string("-"),
    int_parser
)
assert three_ints.parse("123-45-67") == [123, 45, 67]

If we want something better than a list to store different components in (usually we do), we can use the keyword argument form of seq to give names to the components and collect them in a dict instead of a list; then, instead of .map, we can use .combine_dict to convert the result into some other object:

from dataclasses import dataclass

@dataclass
class Date:
    year: int
    month: int
    day: int

date_parser = P.seq(
  year=P.regex(r"\d{4}").map(int) << P.string("-"),
  month=P.regex(r"\d{2}").map(int) << P.string("-"),
  day=P.regex(r"\d{2}").map(int)
).combine_dict(Date)

assert date_parser.parse("2022-11-19") == Date(year=2022, month=11, day=19)

And we can have alternatives:

bool_parser = P.string("true").result(True) | P.string("false").result(False)
assert bool_parser.parse("false") == False

With that done, let’s have a look at my 4 different approaches to improving the static type checking story for Parsy.

Simple types

The most obvious thing to do is to add Parser as the return value for a bunch of methods and operators, and other type hints wherever we can for the input arguments. For example, .map is:

def map(self, map_function: Callable) -> Parser:
   ...

What type of object does the parse method return? We don’t know, so we have to do:

def parse(self, input: str) -> Any:
    ...

And this is our first hint that the static type checking isn’t very useful. For example, this faulty code will now type check:

x: str = int_parser.parse("123")

We can see the return value is not going to be the right type, but our static type checker sees the Any and allows it.

Our .map method also isn’t catching TypeErrors that it definitely ought to be able to. For example, suppose we have this faulty parser, which attempts to construct a timedelta from strings like "7 days ago":

from datetime import timedelta
days_ago_parser = (P.regex(r"\d+") << P.string(" days ago")).map(timedelta)

This is going to fail every time with:

TypeError: unsupported type for timedelta days component: str

That’s because we forgot a .map(int) after the P.regex parser.

This kind of type error is caught for you by mypy and pyright if you try to pass a string to the timedelta constructor, but here, we’ve got no constraints on the callable that would enable the type checker to pick it up. When you put a bare Callable in a signature, as we did above for the map method, you are really writing Callable[..., Any], so all proper type checking effectively gets disabled.
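To see this concretely, here is a minimal illustration (hypothetical helper functions, nothing from Parsy) of the difference between a bare Callable and a constrained one:

```python
from typing import Any, Callable

def apply_bare(fn: Callable, value: str) -> Any:
    # `Callable` here really means Callable[..., Any]
    return fn(value)

def apply_typed(fn: Callable[[str], int], value: str) -> int:
    return fn(value)

# Both run fine with a compatible callable:
assert apply_bare(int, "5") == 5
assert apply_typed(int, "5") == 5

# But only apply_typed lets a checker reject a bad callable:
#   apply_bare(lambda a, b: a + b, "5")   # passes mypy, TypeError at runtime
#   apply_typed(lambda a, b: a + b, "5")  # rejected by mypy
```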

If you want static type checking, this is not what you expect or want! It’s especially important for Parsy, because almost all the mistakes you are likely to make will be of this nature. Most Parsy code consists of parsers defined at a module level, which means that as soon as you import the module, you’ll know whether you have attempted to use combinator methods that don’t exist, for example, so there is little usefulness in a type checker being able to tell you this. What you want to know is whether you are going to get TypeError or a similar type mistake when you call the parse method at runtime.

Can we achieve that?

Generics

The answer is to make the Parser type aware of what kind of object it is going to output. This can be achieved by parameterising Parser with a generic type parameter, and it is the second main approach.

Very often generics are used for homogeneous containers, to capture the type of the object they contain. Here, we don’t have a container as such. We are instead capturing the type of the object that the parser instance is going to produce when you call .parse() (assuming it succeeds).

Some of the key type signatures in our Parser code now look like this (lots of details elided):

from typing import TypeVar, Generic

OUT = TypeVar("OUT")
OUT1 = TypeVar("OUT1")
OUT2 = TypeVar("OUT2")


@dataclass
class Result(Generic[OUT]):
    value: OUT
    ...


class Parser(Generic[OUT]):

    def parse(self, stream: str) -> OUT:
        ...

    def map(self: Parser[OUT1], map_function: Callable[[OUT1], OUT2]) -> Parser[OUT2]:
        ...

The main point is that we are capturing the type of output using a type parameter that our static type checker can track, which means that we can indeed catch the type errors that were ignored by the previous approach. I got this to work, and you can see my results in this PR (not merged).
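As a minimal illustration of the idea (a hypothetical toy, not Parsy’s actual implementation), consider a generic parser class whose output type a checker can track through .map:

```python
from __future__ import annotations
from typing import Callable, Generic, TypeVar

OUT1 = TypeVar("OUT1")
OUT2 = TypeVar("OUT2")

class TypedParser(Generic[OUT1]):
    def __init__(self, fn: Callable[[str], OUT1]) -> None:
        self._fn = fn

    def parse(self, stream: str) -> OUT1:
        return self._fn(stream)

    def map(self, map_function: Callable[[OUT1], OUT2]) -> TypedParser[OUT2]:
        return TypedParser(lambda s: map_function(self.parse(s)))

# A TypedParser[str] primitive; .map(int) yields a TypedParser[int]:
digits: TypedParser[str] = TypedParser(lambda s: s)
int_parser = digits.map(int)
assert int_parser.parse("123") == 123
```

With this, something like the earlier `digits.map(timedelta)` mistake gets flagged by the checker, because timedelta does not accept the str the parser produces.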

The reason it isn’t merged is that this approach breaks down as soon as you have things like *args or **kwargs where the arguments need to be of different types. We have exactly that, multiple times, once you care about generics. For example, the seq combinator takes a sequence of parsers as input and runs them in order, collecting their results. All of them are Parser instances, so that would work fine with the previous approach, but they could all have different output types. There is no way to specify a correct type signature for this, and the same goes for alt, .combine and .combine_dict.

The best you can do is specify that they return Parser[Any]. This means you are downgrading to no type checking. This problem is going to apply to all but the most trivial cases — it’s difficult to come up with many real world examples where you don’t need sequencing.
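To make the limitation concrete, here is a sketch (using hypothetical function-style toy parsers, not Parsy’s real classes) of the best variadic signature available:

```python
from typing import Any, Callable, List, Tuple, TypeVar

T = TypeVar("T")
# Toy parser: a function from (input, position) to (value, new position).
ParserFn = Callable[[str, int], Tuple[T, int]]

def literal(expected: str) -> "ParserFn[str]":
    def parse(s: str, pos: int) -> Tuple[str, int]:
        end = pos + len(expected)
        assert s[pos:end] == expected, f"expected {expected!r} at {pos}"
        return expected, end
    return parse

def seq(*parsers: "ParserFn[Any]") -> "ParserFn[List[Any]]":
    # *args forces a single type parameter, so parsers with different
    # output types all collapse to Any — and so does the result.
    def parse(s: str, pos: int) -> Tuple[List[Any], int]:
        results: List[Any] = []
        for p in parsers:
            value, pos = p(s, pos)
            results.append(value)
        return results, pos
    return parse

abc = seq(literal("a"), literal("b"), literal("c"))
assert abc("abc", 0) == (["a", "b", "c"], 3)
```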

Some people would say “well, it works sometimes, so it is better than nothing”. The problem is that when you start writing your parser, you may well really benefit from the type checking and start to lean on it. Then, as soon as you get beyond the level of your simple (single part) objects and are creating parsers for more complex (multiple part) objects in your language, the type checking silently disappears. Or, if you have strictness turned up high, your type checker will complain about the introduction of Any, but you won’t be able to do anything about it.

Both of these are really bad developer UX in my opinion. If the type checker is going to give up and go home at 2pm on the second day of work, it would be better for it not to show up, which would push developers to lean on other, more reliable methods, like writing tests.

Typed Parsy fork

So we come to my third option, which builds on the second. Can we redesign Parsy so that it doesn’t have any Any? This would be a backwards incompatible fork that removes any API that is impossible to fully type, and attempts to provide some good enough replacements.

This was my most ambitious foray into static type checking in Python, and below are my notes from two perspectives — first implementation, which is important for any potential future contributors and maintainers, and secondly, usage, for the people who actually might want to use this fork.

Implementation perspective

I didn’t complete this for reasons that will become clear. Overall, I’d say working “alongside” mypy and pyright was quite nice at points, and other times really difficult. To keep this article short, I’ve moved most of this section to footnotes. Here are the bullet points:

  • You can see my results in the typed-parsy repo, especially the single source file.

  • I dropped support for anything but str as input type, as a simplification.

  • I discovered that pyright can really shine in various places that mypy is lacking, particularly error messages.

  • But sometimes they fight each other [2]

  • I couldn’t work out how Protocols work with respect to operators and dunder methods. [3]

  • Covariance is tricky, and you have to understand it. [4]

  • forward_declaration made my head explode. [5]

  • There are lots of places marked TODO where I just couldn’t solve the new problems I had, even after getting rid of the most problematic code, and I had to give up and do type: ignore quite a few times.

Overall

Too much of this was just too hard, especially given we are only talking about a few hundred lines of code. It seemed much worse than doing the same thing in Haskell for some reason. This might be just because the language wasn’t designed for it. Even just the syntax for types is significantly worse.

I think another issue is that there is no REPL for type level work. Normally when I’m trying to debug something, I jump into a REPL. Working with actual, concrete values is so much easier, and so much closer to the immediate connection that makes programming enjoyable.

An additional problem is that static type checkers have to worry about issues that may not be relevant to my code.

Finally, the type system we have right now for Python is so far behind what Python can actually express. But it isn’t necessarily obvious when this is the case. The answer to “why doesn’t this work” is anywhere between, “I made a dumb mistake”, “I need to learn more”, “there’s a bug in mypy” and “that’s impossible (at the moment)”.

As someone who needs to worry about future contributors and maintainers, these are serious issues. In addition to getting code to work, contributors would also have to get the type checks to pass, and ensure they weren’t breaking type checking for users, which is an extra burden.

Using it

So much for the pains of implementing typed-parsy, what would it look like for a user?

First, it happens that for a lot of typical usage, the user doesn’t need to worry about types or adding type hints at all, which is great.

Second, for the resulting code, mypy and pyright do a very good job of checking almost every type error you would normally make in your parsers. The few places where we lose type safety are limited and don’t result in Any escaping and trashing everything from then on.

However, if you do need to write type signatures, which you probably will if you have mypy settings turned up high and you want to make your own combinator functions (i.e. something that takes a Parser and returns a new Parser), which is fairly common, you’re going to need to understand a lot to create type hints that are both correct and useful.
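For a flavour of what a user-defined combinator involves, here is a sketch with a generic signature (hypothetical, function-style toy parsers, not the fork’s real API):

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
# Toy parser: a function from (input, position) to (value, new position).
ParserFn = Callable[[str, int], Tuple[T, int]]

def parenthesized(inner: "ParserFn[T]") -> "ParserFn[T]":
    # Wrap a parser so it requires surrounding parentheses, while
    # preserving the inner parser's output type.
    def parse(s: str, pos: int) -> Tuple[T, int]:
        assert s[pos] == "("
        value, pos = inner(s, pos + 1)
        assert s[pos] == ")"
        return value, pos + 1
    return parse

def digits(s: str, pos: int) -> Tuple[int, int]:
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    return int(s[pos:end]), end

assert parenthesized(digits)("(42)", 0) == (42, 4)
```

Getting the TypeVars right here is the part users have to learn: without the shared T, the combinator would silently erase the output type.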

In addition, to achieve all this, we had to make some big sacrifices:

Sequences

You can’t implement seq, alt, .combine or .combine_dict in a type safe way (without degrading everything to Parser[Any] from then on), and I had to remove them.

The biggest issue is seq, and especially the convenience of the keyword argument version to name things. The alternative I came up with — using the & operator to create a tuple of two results — does work, but turns out to be pretty ugly.

Below are some incomplete extracts from the SQL SELECT example, which illustrate the readability loss fairly well. We have some enum types and dataclasses to hold Abstract Syntax Tree nodes for a SQL parser:

class Operator(enum.Enum):
    EQ = "="
    LT = "<"
    GT = ">"
    LTE = "<="
    GTE = ">="

@dataclass
class Comparison:
    left: ColumnExpression
    operator: Operator
    right: ColumnExpression

# dataclass for Select etc

We then have a bunch of parsers for different components, which we assemble into larger parsers for bigger things, like Comparison or Select. With normal parsy it looks like this:

comparison = P.seq(
    left=column_expr << padding,
    operator=P.from_enum(Operator),
    right=padding >> column_expr,
).combine_dict(Comparison)

SELECT = P.string("SELECT")
FROM = P.string("FROM")
WHERE = P.string("WHERE")

select = P.seq(
    _select=SELECT + space,
    columns=column_expr.sep_by(padding + P.string(",") + padding, min=1),
    _from=space + FROM + space,
    table=table,
    where=(space >> WHERE >> space >> comparison).optional(),
    _end=padding + P.string(";"),
).combine_dict(Select)

There are some things you need to understand: seq runs a sequence of parsers in order, and with its keyword arguments version allows you to give names to each one, to produce a dictionary of results. .combine_dict then passes these to a callable using **kwargs syntax.

.combine_dict also has a neat trick of skipping items whose names start with underscores, to allow you to deal with things that you need to parse but want to discard, like _select, _from and _end above.

With typed-parsy, this was the best I could do, formatted using Black:

comparison = (
    (column_expr << padding) & P.from_enum(Operator) & (padding >> column_expr)
).map(lambda t: Comparison(left=t[0][0], operator=t[0][1], right=t[1]))

SELECT = string("SELECT")
FROM = string("FROM")
WHERE = string("WHERE")

select = (
    (
        (SELECT + space)
        >> (column_expr.sep_by(padding + string(",") + padding, min=1))
        << (space + FROM + space)
    )
    & table
    & ((space >> WHERE >> space >> comparison).optional() << (padding + string(";")))
).map(lambda t: Select(columns=t[0][0], table=t[0][1], where=t[1]))

This is a pretty massive regression. We can’t use commas to separate parts as before, and we can’t use keyword arguments to name components any more. Instead we have to use tuples, and when we end up with nested tuples it’s awful — I literally couldn’t work out how to write the tuple indexing correctly and had to just keep guessing until I got it right. And that’s just with 3 items, some parsers might have many more items in a sequence, each of which adds another level of nesting.

This might have been significantly better if we still had tuple unpacking within function/lambda signatures (which was removed in Python 3), but it would still not be very nice.

It is kind of impressive that mypy and pyright will handle all this and tell you about type violations very reliably. This is possible because they have support for indexing tuples, i.e. in the statements above they can tell you what the types of t[0], t[0][1] etc. are. In an IDE, tools like pyright will tell you what you need to supply for .map() — for example for the select statement, inside the final .map call:

(map_fn: (tuple[tuple[list[Field | String | Number], Table], Comparison | None]) -> OUT2@map) -> Parser[OUT2@map]

But this isn’t my idea of developer friendly. The loss of readability is huge, even for simple cases.

For comparison, I looked at funcparserlib, the Python parsing library closest to Parsy. They claim “fully typed”, but it turns out that their sequencing operator, which returns only tuples and so isn’t as usable as seq, flattens nested tuples. This is much better for usability, but is also impossible to type, so they introduce Any at this point, and so lose type checking, the thing I’ve been trying to avoid.

UPDATE 2022-11-24: I had another idea about how to approach this. Parsy could provide seq2, seq3, seq4 etc. combinators which return 2-tuples, 3-tuples, 4-tuples etc. These functions, which would have to be implemented the long way, would handle the nested tuple unpacking for you, without losing type safety. As an overload, they could also optionally include the functionality of .combine without loss of safety, and in this way you would get close to the usability of the seq(*args) version, with just the annoyance that you have to change the function if you want to add another argument. This still wouldn’t give you the usability of the keyword argument version of seq, but it would probably be a significant improvement.
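A sketch of what a seq2 might look like (hypothetical, function-style toy parsers rather than Parsy’s Parser class — the point is that the signature is fully expressible without Any):

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
T1 = TypeVar("T1")
T2 = TypeVar("T2")

# Toy parser: a function from (input, position) to (value, new position).
ParserFn = Callable[[str, int], Tuple[T, int]]

def literal(expected: str) -> "ParserFn[str]":
    def parse(s: str, pos: int) -> Tuple[str, int]:
        end = pos + len(expected)
        assert s[pos:end] == expected, f"expected {expected!r} at {pos}"
        return expected, end
    return parse

def seq2(p1: "ParserFn[T1]", p2: "ParserFn[T2]") -> "ParserFn[Tuple[T1, T2]]":
    # Written "the long way", but fully typed: a checker knows the result
    # is a flat 2-tuple of the two (possibly different) output types.
    def parse(s: str, pos: int) -> Tuple[Tuple[T1, T2], int]:
        v1, pos = p1(s, pos)
        v2, pos = p2(s, pos)
        return (v1, v2), pos
    return parse

pair = seq2(literal("foo"), literal("bar"))
assert pair("foobar", 0) == (("foo", "bar"), 6)
```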

Error messages

If you are relying on types to fix you up, instead of readable code, then you depend on the error messages your type checker will emit. Below is an example of an error I got when attempting to port the example code in simple_logo_lexer.py, which has a function flatten_list:

Argument 1 to "map" of "Parser" has incompatible type
"Callable[[List[List[T]]], List[T]]"; expected
"Callable[[List[Tuple[Tuple[str, int], str]]], List[T]]"

Here was another one I hit:

Argument 1 to "map" of "Parser" has incompatible type
"Callable[[Tuple[List[List[Tuple[List[List[OUT]], List[OUT]]]],
List[Tuple[List[List[OUT]], List[OUT]]]]], List[List[Tuple[List[List[OUT]],
List[OUT]]]]]";
expected "Callable[[Tuple[List[List[Tuple[List[List[OUT]],
List[OUT]]]], List[Tuple[List[List[OUT]], List[OUT]]]]], List[OUT]]"

I can’t remember what mistake produced that, but I did notice that the code worked perfectly at runtime. Also, pyright didn’t complain, only mypy. This is the kind of thing that makes people hate static typing.

One of the main principles of parsy is that it should be very easy to pull data into appropriate containers where every field is named, as well as typed — rather than a parse that returns nested lists, tuples and dicts. This is why namedtuple is a big improvement over tuple, and dataclasses are a big step up again. But at the level of types and error messages, it seems we are back in the dark ages, with nested tuples and lists everywhere.

@generate decorator

Being able to add conditional logic and control flow into parsers is really important for some cases, and Parsy has an elegant solution in the form of the @generate decorator. Getting this to work in typed-parsy turned out to be only partially possible.

The first issue is that, unlike other ways of building up parsers, the user will need to write a type signature to get type checking, and it’s complex enough that there is no reasonable way for someone to understand what type signature they need without looking up the docs, which is poor usability.

Having done so, they can get type checking on the return type of the parser, and code that uses that parser. However, they get no type checking related to parsers used within the function. The Generator type assumes a homogeneous stream of yield and send types, whereas we have pairs of yield/send types which need to match within the pair, but each pair can be completely different from the next in the stream.

Since you can’t sacrifice @generate without major loss of functionality/usability, you have to live with the fact that you do not have type safety in the body of a @generate function.
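To illustrate both the mechanism and the typing gap, here is a toy generator-driven parser (a hypothetical sketch, not Parsy’s implementation): each yielded parser may have a different output type, but the Generator type only allows one yield type and one send type for the whole stream, so the values bound by yield degrade to Any:

```python
from typing import Any, Callable, Generator, Tuple, TypeVar

OUT = TypeVar("OUT")

# Toy parser: a function from (input, position) to (value, new position).
def literal(expected: str) -> Callable[[str, int], Tuple[str, int]]:
    def parse(s: str, pos: int) -> Tuple[str, int]:
        end = pos + len(expected)
        assert s[pos:end] == expected, f"expected {expected!r} at {pos}"
        return expected, end
    return parse

def generate(
    fn: Callable[[], Generator[Callable[[str, int], Tuple[Any, int]], Any, OUT]],
) -> Callable[[str, int], Tuple[OUT, int]]:
    # Drive the generator: run each yielded parser, send its result back in.
    def parse(s: str, pos: int) -> Tuple[OUT, int]:
        gen = fn()
        try:
            parser = next(gen)
            while True:
                value, pos = parser(s, pos)
                parser = gen.send(value)
        except StopIteration as stop:
            return stop.value, pos
    return parse

@generate
def greeting() -> Generator[Any, Any, Tuple[str, str]]:
    hello = yield literal("hello")  # send type is Any: `hello` is unchecked
    yield literal(" ")
    name = yield literal("world")
    return (hello, name)

assert greeting("hello world", 0) == (("hello", "world"), 11)
```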

Overall

This did not feel like an upgrade for a user, but rather like a pretty big downgrade. typed-parsy was definitely going to be worse than parsy, so I stopped working on it.

Which brings me to my last approach:

Types for documentation

At some point along the way, I noticed that for the original version of parsy, with no type hints at all, my language server (pyright) was able to correctly infer return types of all the methods that returned Parser instances. The types it was inferring were the same as I would have added in the very first section, simple Parser objects, and that meant it could reliably give help with chained method calls, which is pretty nice.

The biggest problems were that the docstrings weren’t helpful (in most cases missing), and that for many parameters it wasn’t entirely obvious what type of object you should be passing in.

So, with a small amount of effort, we could improve usability a lot. We can add those docstrings, and add type hints at about the same level as the types pyright was inferring anyway, just a bit more complete.

The one thing I don’t want to do is imply that these types bring Parsy code up to the level of being “type checked”, and there is a simple way to avoid that — by not including a py.typed marker file in my package.

So, I’m back at the beginning, but with a different aim. Now, it’s not about helping automated static type checkers — without a py.typed marker in the package they basically ignore the types — it’s about improving usability for developers as they write. I’ve done this work in the master branch now, and will hopefully release it soon.

There is another interesting advantage to this: because I’ve given up on static type checks and I’m not using static type checking for internal use in Parsy at all, I can be a bit looser with the type hints, allowing for greater readability. For example, I can annotate an optional string argument as arg: str = None, even though that’s not strictly compliant with PEP 484. In other places I can use slightly simplified type hints for the sake of having something that doesn’t make your eyes glaze over, even if it doesn’t show all the possibilities — such as saying that input types can be str | bytes | list, when technically I should be writing something much more abstract and complicated using Sequence.

Conclusion

In the end, type hints didn’t work out for use by a static type checker. But as clear and concise documentation for humans that pop up in a code editor, they worked well. Unless or until there are some big improvements in what it is possible to express with static types in Python, this seems to be the best solution for Parsy.

I learnt a lot about the limitations of typing in Python along the way, and hope you found this helpful too!



Footnotes
