lukeplant.me.uk

Pylint false positives

Posted in: Python

In some recent discussion on Reddit, I claimed that, for cases where I’m already using flake8, it seemed as though 95% of Pylint’s reported problems were false positives. Others had very different experiences, so I was intrigued enough to actually do some measurements.

TL;DR

If you’re in a rush, see the Running total section which summarises the Good/Neutral/Bad in what Pylint complained about, in my judgement.

The setup

I took part of the code from a side project where I don’t use Pylint, ran Pylint (with some very basic tuning) and tried to analyse the result in terms of helpful warnings compared to false positives etc.

To give some background about the project:

  • It’s a fairly typical Django app, which means it is a fairly typical Python app — Django has some of its own idiosyncrasies, and as a framework rather than just a library it imposes some additional constraints, but it tries to make your code normal Python, and has been getting better at that in recent years. Some of the “framework-y” problems it brings will apply to many other libraries that use any framework-like patterns (such as callbacks or the template method design pattern).
  • It has about 22,000 lines of Python, but I only ran the Pylint on just under half of this, 11,000, or about 9000 without spaces. That half did include more than its fair share of view code and test code.
  • I do use flake8 already on this project, and keep errors to zero. The point of this experiment was for me to assess the benefit of Pylint given that I’m already using flake8.
  • The project has decent test coverage, but as I’m the only developer it has never had the benefit of peer code review.

Hopefully this analysis will be useful to other people in assessing whether to add Pylint to their stack of code quality checks. I’m assuming that if you use something like Pylint, you will generally use it systematically as part of required code quality checks, keeping issues to zero. (Otherwise you will have an ever growing list of issues, won’t take it seriously and won’t spot new problems, defeating any usefulness it might have had.)

Overall, Pylint had 1650 complaints about the code I ran it against. Below is my breakdown of that number. More details for all of the error messages can be found in Pylint’s feature list.

Bugs

Pylint found 1 bug in my code. By ‘bug’, I’m including only things that have observably bad behaviour at runtime, at least potentially. In this case, it was an overly broad except clause — broad-except as Pylint calls it. To be specific it was except Exception, not a ‘bare except’ because flake8 already catches those. This would have caused wrong behaviour in the presence of certain kinds of runtime exceptions. If this bug was ever triggered at runtime (and I don’t have evidence that is has been) in this case the resulting bad behaviour would likely have been trivial to non-existent in terms of consequences, but it could have been worse.

TOTAL: 1

Helpful

In addition to this one bug, there were other issues that I’ve classified as ‘helpful’ — Pylint found genuine issues with my code, which, while they were not causing immediate runtime problems, could cause problems in terms of future bugs or maintenance issues.

7 of these were too-many-locals / too-many-branches / too-many-local-variables that together related to 3 parts of my code that were badly structured. A better structure is not immediately obvious, but I’m sure I could do better than what is there.

The remainder were:

  • unused-argument × 3 — one of which was an actual mistake compared to what I thought I’d done, where the code only did the right thing by accident, and 2 of which were (unused and unnecessary) keyword args that would have caused problems in the future if I’d tried to use them.
  • redefined-builtin × 2
  • dangerous-default-value × 2 — these weren’t bugs in my usage because I never used the default, but good to get them fixed.
  • stop-iteration-return × 1 — I learnt something I probably would not have found out otherwise here.
  • no-self-argument × 1

TOTAL: 16

Cosmetic

These are things I considered to be very minor issues that were not causing problems and were unlikely to do so, but on the other hand fixing them certainly wouldn’t hurt. Some of them are debatable stylistic things. Some of them I put in this category while similar errors were in other categories, because of differences in context which made a difference in my judgement. If I was using Pylint, I would fix these issues to keep it happy, but in most cases probably wouldn’t bother otherwise.

  • invalid-name × 192

    These were mostly one letter variables, in contexts where they were mostly harmless e.g.

    for f in glob.glob("some_directory/*.tmp"):
        os.unlink(f)
    

    Or

    for k in keys:
        ...
    

    Many were in test code.

  • len-as-condition × 20

  • useless-object-inheritance × 16 (Python 2 leftovers)

  • no-else-return × 11

  • no-else-raise × 1

  • bad-continuation × 6

  • redefined-builtin × 4

  • inconsistent-return-statements × 1

  • consider-using-set-comprehension × 1

  • chained-comparison × 1

TOTAL: 252

Unhelpful

These are ones where I regarded the suggestion Pylint was making as unhelpful — there were good reasons for the way it was written, even if a bit unusual, and, in my judgement, the code is better off the way it is — although I could see that other people might make a different judgement, or different choices in coding style and design could potentially have avoided the issues.

  • too-many-ancestors × 76

    These were all in test code, where I’m using a bunch of mixins to provide utilities or mocking of certain things.

  • unused-variable × 43

    This were almost all in test code, where I was destructuring a tuple on assignment:

    foo, bar = get_some_stuff()
    

    and then not using one of them. There are easy ways to silence Pylint in these cases (e.g. names like unused), but the names as they were enhanced readability, and a future maintainer (including myself) who might need the value would probably do better if I left the code as it is.

  • invalid-name × 26

    These were cases where I had chosen decent names in context, but ones that happened to fall foul of Pylint’s naming standards e.g. db (which is a well accepted abbreviation for database), and some other names that ended up being non-standard but more understandable the way they were, in my opinion. If you are a stickler for consistency then you might have disagreed here.

  • redefined-outer-name × 16

    Sometimes the right name is the right name, in both inner and outer contexts, and you would never need to use the outer name from the inner context.

  • too-few-public-methods × 14

    Examples included data classes of the kind you create with attrs, which might have zero public methods, or a class that was implementing a dictionary interface but actually only needed to provide a single __getitem__ to work correctly.

  • no-self-use × 12

    These were all in test code, where I had deliberately added methods to a base class or mixin which didn’t use self, because it was more convenient to import them and make them available to a test case that way. Some of them even wrapped standalone functions that did the same thing.

  • attribute-defined-outside-init × 10

    There were good reasons in these cases! Mostly test code.

  • too-many-locals × 6, too-many-return-statements × 6, too-many-branches × 2, too-many-statements × 2

    Yes, these functions were long, but having looked at them, I didn’t think there were nice ways of cleaning them up that would be an overall improvement. One of them, while long, was straightforward and had a very clear structure and wasn’t ‘messy’, while any ways of shortening it I could think of would have involved unhelpful layers of indirection or awkward utility functions.

  • arguments-differ × 6

    Mostly due to using *args and **kwargs in an overridden method, which is usually actually a good way to protect yourself from changes in method signatures from 3rd party packages (but has downsides too, and in some cases this error could highlight genuine bugs).

  • ungrouped-imports × 4

    I already use isort to manage my imports

  • fixme × 4

    Yes, I have some TODOs, but I don’t want to fix them right now.

  • duplicate-code × 3

    Sometimes you have a small amount of boilerplate that is kind of unavoidable, and if the ‘body’ of the code is small, this warning gets triggered.

  • broad-except × 2

  • abstract-method × 2

  • redefined-builtin × 2

  • too-many-lines × 1

    I have tried to think of natural ways to break this module down, and can’t. It’s also one of those places where I think a linter is just the wrong tool. If I have a module with 980 lines of code, and add another 30 so I cross the 1,000 limit, a linter complaining at me is just not helpful. If 980 is OK, why is 1010 so bad? I don’t want to have to refactor that module in order for the build to succeed, and I also want to be keeping my linter silent, so the only thing I can sensibly do at this point is silence the linter somehow, which defeats the purpose.

  • pointless-statement × 1

  • expression-not-assigned × 1

  • cyclic-import × 1

    The import cycle had already been broken by placing one inside a function. I couldn’t see a better way to structure the code given the constraints.

  • unused-import × 1

    I already had a # NOQA to silence this for flake8.

  • too-many-public-methods × 1

    If my test class has 35 tests, instead of the maximum 20, is that actually a problem?

  • too-many-arguments × 1

TOTAL: 243

Can’t fix

This covers a category of issues that I couldn’t fix, even if I wanted to, due to external constraints, like the fact you might need to provide a callback/class to a third party library or framework that has to satisfy certain requirements.

  • unused-argument × 21

  • invalid-name × 13

  • protected-access × 3

    Included some access to “documented internals” like sys._getframe in stdlib and Django’s Model._meta (which is documented).

  • too-few-public-methods × 3

  • too-many-arguments × 2

  • wrong-import-position × 2

  • attribute-defined-outside-init × 1

  • too-many-ancestors × 1

TOTAL: 46

Incorrect

These are ones where Pylint was simply making objectively wrong assertions about my code. If the assertion had been correct, it would probably have been something to act upon, but it wasn’t.

These are not things you could reasonably file as Pylint bugs — the dynamism of Python makes some of the things Pylint is trying to detect impossible to do reliably.

  • no-member × 395

    These were due to a handful of base classes, some from Django, others I created myself, where Pylint was unable to detect the existence of the members due to dynamism/meta-programming.

    Quite a few of them were due to how I had structured my test code (using the pattern given by django-functest, which in some cases could have been fixed by adding additional base classes with ‘abstract’ methods (i.e. ones that just raise NotImplementedError), or perhaps by renaming many of my test classes (to something that in this case would have been misleading, so I wouldn’t be very inclined to do that).

  • invalid-name × 52

    These were mainly due to Pylint applying PEP8’s rule about constants, and thinking that every top-level name defined using an = sign is a ‘constant’. Defining exactly what we mean by a constant is trickier than it sounds, but it certainly doesn’t apply to some things that are constant in nature, such as functions, and shouldn’t apply to less usual ways to create functions e.g.

    def foo():
        pass   # etc
    
    cached_foo = cache(seconds=100)(foo)
    

    Some of the ones in this category are debatable due to the lack of definition of what a constant is e.g. should a module level defined instance of a class, which may or may not have mutable state, be considered a constant? Such as in this common idiom:

    logger = logging.getLogger(__name__)
    
  • no-self-use × 23

    Pylint incorrectly claimed “Method could be a function” for a bunch of cases where I’m using inheritance to provide different implementations, so couldn’t convert these to functions.

  • protected-access × 5

    Pylint incorrectly assessed who was the ‘owner’ (i.e. the current bit of code is creating a ‘protected’ attribute on an object, and using it locally, but Pylint can’t see that).

  • no-name-in-module × 1

  • import-error × 1

  • pointless-statement × 1

    This statement does indeed have an effect:

    1 / 0
    

    I was using it to deliberately throw an unusual error that would be unlikely to be caught, as part of test code. I don’t blame Pylint for not guessing that of course…

TOTAL: 477

Running total

We haven’t finished yet, but let’s put these groups together at a higher level, namely:

  1. “Good” — the “Bugs” and “Helpful” categories where Pylint would definitely have been a positive help: 17
  2. “Neutral” — the “Cosmetic” category for things that have very marginal benefit, but can’t hurt (as long as they don’t take too much time): 252
  3. “Bad” — the “Unhelpful”, “Can’t fix” and “Incorrect” categories where Pylint is wanting us to change code that is better off the way it is, or definitely can’t be changed because of external dependencies, or because Pylint has just plain got the analysis wrong: 766

The Good:Bad ratio here is very low in my book. If Pylint was a co-worker doing code review, I’d be praying for him to leave (mostly likely ‘he’ would be the correct gender here…)

To fix the false positives you could silence the entire class of errors (which increasingly makes using Pylint pointless), or add special comments to your code individually. I’m reluctant to do the latter:

  1. It takes time!
  2. I dislike the visual clutter of comments that are just there to silence a linter.

I’m happy to add these pragmas when there are compelling benefits from using the linter, but not otherwise — I think comments are important, or should be important, so my code syntax highlighting theme shows them in strong contrast, as recommended by this article. So I do have some NOQA comments to silence flake8, for example, but for this same section of the code they add up to only 5 instances.

Docstrings

The remainder of issues that Pylint found were missing docstrings. I put these in a separate category because:

  1. They are very debatable, and you as a reader might have a very different policy on this kind of thing.
  2. I didn’t have time to do an analysis on all of them.

Overall Pylint found 620 missing docstrings (modules, functions, classes methods).

In many cases I feel very justified in not adding docstrings. For example:

  • When the name is already clear. For example:
    • if I have feature Foo, I don’t have to guess what FooTests might be about. I also don’t tend to add docstrings to test methods, and prefer a long name instead.
    • A module foo.utils.html is most likely to contain HTML utilities used by the foo project.
    • And many other cases where a good name is enough.
  • When the docstring is effectively defined elsewhere — for example, if I’m implementing an interface like Django’s database router. Adding your own docstring could easily be dangerous here.

In other cases, my code certainly could have benefited from more docstrings. Probably in only about 15 to 30% of the cases that Pylint caught would I think “Yes, I should add a docstring here, thank you Pylint for reminding me”.

In general I don’t like tools that force you to write docstrings, because I think you almost always end up with bad docstrings when that happens, and they are much worse than nothing, for basically the same reasons as for bad comments:

  • they waste your time reading them, because they provide no extra information, or incorrect information,
  • they then make you subconsciously filter out docstrings as a useful source of info, so they make all the docstrings useless, when docstrings can contain useful info.

Warnings about docstrings are annoying because to silence them individually would require adding a comment, which is about the same amount of work, and adds the same amount of visual noise, as adding a docstring itself. So it’s pretty much inevitable that you will end up with docstrings that are not necessary or unhelpful (and are therefore actively harmful).

I’ve experienced this in practice, and I don’t think it can be solved by “choose better developers” or “try harder” — the problem is the process.

Conclusion

With these figures, I feel my previous estimates about the usefulness of Pylint (in the context of a code base where I’m already using flake8) were about right. The false positive rate would have to be massively reduced for me to consider using it.

In addition to the time and visual noise needed to deal with false positives, I would be reluctant to add something like this to a project because of the problem that junior devs may take a more slavish approach to keeping Pylint happy. This could result in them breaking working code because they didn’t understand that Pylint had got it wrong, or doing big code refactors just to enable Pylint to understand the code.

If you use Pylint from the beginning of a project, or in a project with very few third party dependencies, I imagine you might feel differently, as the rate at which you hit false positives might be low. On the other hand, this might simply be concealing the costs of these things.

Another approach would be to use Pylint with a very restricted set of errors. However, there were only a handful of errors which were consistently correct, or low enough in false positives (either in relative or absolute terms) that I would want them on my project. (These included: dangerous-default-value, stop-iteration-return, broad-exception, useless-object-inheritance)

Anyway, I hope this has been helpful to others in considering the use of Pylint, or in arguing about it with colleagues!

Comments §

...loading...
blog comments powered by Disqus