Dissecting Python - part 1

The first part of a series of posts that attempt to explain Python by the dissection method.

Background

On reddit a few months ago I was surprised by the popularity of a comment that explained how bound and unbound methods work in Python. One reply pleaded with me to do a series explaining Python at this kind of level, and this is the result.

I find I use things best when I understand how they work – when I understand what something is made of, what the bits are etc. I am very fortunate in having started my interest in programming at the age of about 10 with a 48k, 8-bit computer called the Oric Atmos. The Atmos came with an Advanced User Guide that told you almost everything there was to know about the computer, including a complete ROM disassembly. So, after learning BASIC, I was able to get into 6502 machine code/assembly pretty easily, and started exploring every little hack to get the machine to do my bidding. This gives you an understanding of computers which I think is indispensable. (It helped a lot that the Atmos could be re-booted within a few seconds after I locked it up :-)

I think the approach of understanding what is going on at least one level below your current level of abstraction is often neglected in other areas too.

For example, with web programming, after you have learnt HTML and CSS, I think the next step should be to learn the essentials of HTTP by literally typing commands at a telnet prompt. I have never seen anyone else suggest this method, and would like to have the opportunity to try it out with someone. It's really not that hard (as long as you can type reasonably carefully), and I think it gives you an understanding of what is going on in terms of requests and responses that would eliminate a huge number of security problems and later misunderstandings.

So, my approach to doing advanced Python is to take the bits apart and understand them. Python has an essentially simple execution and object model, which enables you to go a long way just by being an insatiable tinkerer. These articles are intended simply to point out the tinkering tools and guide you past the dead ends. Once you understand these nuts and bolts, some more advanced concepts like meta-classes actually become pretty simple.

Disclaimers

  • Almost everything here has been learnt simply by playing around, working things out, and looking up docs when I got stuck. I have not invested any time in reading the CPython sources to get a better insight.

  • I think I'm probably quite indebted to Advanced Python, but these articles are not based around that presentation, and will cover in some cases less ground, but in a way that allows you to take your own time, and encourages experimentation.

  • Sometimes I will tell deliberate fibs out of a desire to keep things simple and focused.

  • Sometimes I will tell accidental fibs out of ignorance and laziness. If you challenge me, however, I will invariably claim that these were actually the deliberate fibs described above. Detecting these meta-fibs is left as an exercise for the reader.

Requirements

I'm assuming either moderate knowledge of Python, or basic knowledge of Python with moderate knowledge of programming in general. You need to have used dictionaries, lists, functions, classes, import statements, and should know how to write and run a script and use the Python prompt interactively.

I'm going to use Python 3 for all my examples, because I want these articles to remain relevant longer, and because Python 3 has a slightly simpler and cleaner model in some cases.

However, the vast majority of example code will work just fine with Python 2.X. The biggest problem is the print statement in Python 2.X, which has been turned into a function in Python 3. But if you have Python 2.6 or 2.7, you can just start your script or Python session with from __future__ import print_function, and then almost everything will be identical.

I'm also assuming CPython, but in practice it won't make much difference.

Code samples

Most of the code samples are written to be executed as stand-alone scripts. However, you can also type them at a Python prompt, in which case you don't need print() to see the objects. The results will be identical, or almost identical, depending on what is in your .pythonrc file.

Contents

In this post, I'll cover:

  • Executing a script or module

    • Compilation

    • Module creation

    • Top-down execution

  • Function definitions

  • Digression: locals()

  • An exercise

  • A lesson

OK, let's get going.

Executing a script or module

What happens when you do python myscript.py? How does your code end up running?

Compilation

Well, the first thing that happens with CPython is that it compiles your code. The script you run directly is compiled in memory, but for any module you import the result is saved as a .pyc file – you will see these files littering your folders (inside __pycache__ directories since Python 3.2). Of course, if that file already exists and is up-to-date, Python will just use it rather than re-compile, which is the whole point of creating this file.

This file contains byte-code – a lower level language that can be run on a virtual machine. A virtual machine, in this context, is simply a program that can run these instructions, using the hardware CPU as a model for how it should work, but with much higher level commands than the machine code instructions that a real CPU understands.
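
If you are curious what this compile step produces, the builtin compile() function and the standard dis module let you peek at it. This is only a sketch: the source string and the "<example>" filename are made up for illustration, and the exact instructions printed vary between Python versions.

```python
import dis

source = 'a = "hello"\nprint(a)\n'

# compile() turns source text into a code object holding byte-code.
code = compile(source, "<example>", "exec")
print(type(code))

# dis.dis() prints a human-readable listing of the byte-code instructions.
dis.dis(code)

# exec() runs the code object, using the given dict as its namespace.
namespace = {}
exec(code, namespace)
print(namespace["a"])
```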

In CPython, this virtual machine and the corresponding byte-code work in such a way that we can actually ignore this step. From now on, we will assume that the Python interpreter is simply reading your code one statement at a time and executing it. There are very few occasions where this model of understanding will fail us.

This way of working also means that the Python REPL (i.e. the Python prompt) works in essentially the same way as a script or module. In the examples that follow, you can type the code into a Python prompt or run from a script, and you'll get the same result.

Module creation

After compilation, the first thing that needs to happen is a module needs to be created. A module, like everything in Python, is an object. An object is something that has an identity (think the id function), and some attributes. Now one way to store a bunch of attributes is a dictionary, so like other things in Python, a module is a pretty thin wrapper around a dictionary.

The dictionary is initialised almost empty (but not quite). We'll come back to thinking about this dictionary, but for now just file this knowledge away for later reference.
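
For the impatient, here is a small sketch of the "thin wrapper around a dictionary" idea. It builds a module object by hand with types.ModuleType from the standard library; the module name "demo" and the attribute names are arbitrary choices for illustration, and the interpreter does more than this when it sets up a real module.

```python
import types

# Create an empty module object by hand; "demo" is an arbitrary name.
mod = types.ModuleType("demo")

# The dictionary starts almost empty: just a few double-underscore entries.
print(sorted(mod.__dict__))

# Setting an attribute on the module stores an entry in its dictionary...
mod.a = "hello"
print(mod.__dict__["a"])

# ...and writing to the dictionary creates an attribute.
mod.__dict__["b"] = 42
print(mod.b)
```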

Top-down execution

The next thing you need to know is that Python then executes the contents of the module essentially statement by statement.

Now, there are of course different kinds of statements, including function definition, assignment, flow-control etc. and each works in its own way. So the if/elif/else statement only runs a branch if the corresponding condition matches. But we can think about the whole statement being executed when it is reached.

So, let's have an example. Look at the following code:

a = "hello"
print(a)

This, very obviously, simply prints hello. But if we switch the order:

print(a)
a = "hello"

…we get NameError: name 'a' is not defined. The statement a = "hello" creates a string object, and puts an entry in the module dictionary, and the statement print(a) attempts to look up entry a in the module dictionary and then print it. In the second, this lookup fails, because the assignment hasn't happened yet, hence the NameError.

This simply demonstrates that Python is executing your code a statement at a time. It might be obvious to some, but this simple fact makes Python very different from C, Java, C#, Haskell and others. (The difference could perhaps be better illustrated using two classes in a module, one of which inherits from the other: in Python the base class must come first, unlike, say, in C#.)
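
The two-class illustration can be sketched as follows. The class names Base and Derived are invented for this example; the try/except is only there so the script keeps running after the first failure.

```python
# Derived is defined before its base class exists, so the class statement fails.
try:
    class Derived(Base):
        pass
except NameError as exc:
    error = str(exc)
    print(error)

class Base:
    pass

# Now that the Base class statement has been executed, the same statement works.
class Derived(Base):
    pass

print(issubclass(Derived, Base))  # True
```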

Function definitions

The next thing we need to think about is function definitions.

It is crucial to understand that function definitions are also simply statements, with their own peculiar nature, but statements nonetheless. When the interpreter reaches a function definition it executes it. That doesn't mean it executes the body of the function – that will wait until the function is called. But the whole statement is executed. The particular nature of a function definition is to construct a function object, and assign it a name in the local namespace.

To summarise: A function definition is an object construction and assignment statement.

So, let's see some code that demonstrates that:

def foo():
    pass
print(foo)

The result will be something like:

<function foo at 0xb73aa32c>

Again, if we switch the lines:

print(foo)
def foo():
    pass

…we get a NameError, as before.

I want you to notice that the def foo line is doing the same job as the a = "hello" line – it creates an object, and creates a name in the module dictionary for that object.
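
One way to convince yourself of this is to let a def statement and ordinary assignments fight over the same name. A minimal sketch (first_kind and second_kind are just helper names for this example):

```python
foo = "hello"
first_kind = type(foo).__name__

# def rebinds the name foo, just as another assignment would.
def foo():
    return 1

second_kind = type(foo).__name__

# And a plain assignment can rebind it yet again.
foo = 42

print(first_kind, second_kind, foo)  # str function 42
```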

Digression: locals()

I think it's about time we actually look at this module dictionary. It isn't something we just have to imagine.

There are two ways to do this. The first is the builtin function 'locals()'. Its docstring says: "Update and return a dictionary containing the current scope's local variables." So, try the following:

a = "hello"
def foo():
    pass

print(locals())

You'll find something like:

{'a': 'hello', '__builtins__': <module 'builtins' (built-in)>, '__file__':
 'test1.py', '__package__': None, '__name__': '__main__',
 'foo': <function foo at 0xb7449bec>, '__doc__': None}

You can see the 'a' and 'foo' entries, along with a few other things that got populated automatically.

Now, the fact that locals() can produce this output doesn't mean that the module dictionary necessarily exists – it could create the dictionary on the fly. But there is another way we can test this – via sys.modules.

Try the following:

import sys
a = 1
print(__name__)
print(sys.modules[__name__])
print(sys.modules[__name__].__dict__)
print(id(sys.modules[__name__].__dict__))
print(id(locals()))

sys.modules is where Python stores all the module objects. __name__ is a local variable, created by the Python interpreter and put into each new module dictionary. It is the key of that module in sys.modules. For a script, it is always equal to "__main__", but modules imported by import will get the names you would expect.

If you try the above code in CPython, you will find that the locals() dictionary not only has the same content as the module dictionary, it is the same dictionary (same id).

And yes, it is writable. Go on, I'll wait while you play with very verbose ways of setting local variables (or setting local variables that you can't use, because they have illegal names) – you know you want to. (BTW, CPython doesn't guarantee that the output of locals() will be a writable dictionary that can be used to update the local namespace, but at module level it currently is.)
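
If you want a starting point for that playing around, here is a sketch. It uses exec() with an explicit dict standing in for the module dictionary, purely so that it behaves the same wherever you paste it; in a real script you would call locals() directly at the top level. The names answer, result and namespace are made up for the example.

```python
# A dict standing in for a module dictionary (an assumption of this sketch).
namespace = {}

source = '''
locals()["answer"] = 42             # a very verbose assignment
locals()["not a name"] = "hidden"   # an entry you can't reach by name
result = answer + 1
'''

# With a single dict passed in, the executed code sees it as both its
# global and its local namespace, just like code at the top of a module.
exec(source, namespace)

print(namespace["result"])      # 43
print(namespace["not a name"])  # hidden
```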

locals() will prove invaluable in understanding some other features.

Function definitions - continued

So, I was proving to you that the module dictionary was real. But we need to think about function objects, and my claim that a function definition is an object construction and assignment statement.

What kind of object is constructed? Well, a function object of course. What is it constructed from? It's built from byte code that corresponds to the body of the function statement. This byte code is loaded into memory from the time that the Python code is compiled, but it does not become a function until the function statement is executed and the function defined.

Now, the special syntactical form for defining functions allows something unusual to happen. Look at part of the output of locals():

def foo():
    pass

print(locals())

You'll find this:

{ ... 'foo': <function foo at 0xb74a0cec>, ... }

The string 'foo' appears as both the key and as part of the value. This means that function objects have the unusual property of knowing what name they were given – unlike strings and class instances etc. But note that this name is simply the name they were given at the time they were constructed – it doesn't magically update.

So, I'm claiming that the following two statements are equivalent:

def foo(a, b=10):
    pass

foo = make_function('foo', ['a', ('b', 10)], None)

…where make_function is a mythical builtin function that takes the name of the function, a list of arguments (with tuples being used here for arguments that have default values) and a 'code object' that is the body of the function, and returns a newly constructed function. (I deliberately chose a trivial function with no body so I didn't need to think of a way of representing the code object).
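
make_function really is mythical, but the standard library has something in the same spirit: types.FunctionType, which builds a function object from a code object, a globals dictionary, a name and a tuple of default values. The sketch below borrows the code object from an existing function (template is a made-up name, and it has a non-trivial body so there is something to call); it is an illustration of the idea, not a claim about what the def statement does internally.

```python
import types

def template(a, b=10):
    return a + b

# Build a second function object from the same code object.
# Arguments: code object, globals dict, name, tuple of default values.
foo = types.FunctionType(template.__code__, globals(), "foo", (10,))

print(foo.__name__)   # foo
print(foo(1))         # 11
print(foo(1, b=5))    # 6
```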

Given this equivalence, you'll realise that 'foo' is just another variable in the local namespace. You can give it another name in the local namespace, and it will continue to work. You can even change its internal name, and it won't be bothered (although it will make debugging harder).

Some messing around to drive home the point:

>>> def foo():
...     print(1)
...
>>> foo()
1
>>> bar = foo
>>> bar
<function foo at 0xb74a0d2c>
>>> bar()
1
>>> bar.__name__ = 'baz'
>>> bar
<function baz at 0xb74a0d2c>
>>> foo
<function baz at 0xb74a0d2c>
>>> baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'baz' is not defined

Exercise

OK, got all that? If so, you should be able to do the following exercise. What does this code do?

print(1)

class DeepThought(object):

    def __init__(self):
        print("Cogito ergo sum")

    def __str__(self):
        return "Deep Thought"

print(2)

def go(val=DeepThought()):
    print(val)

print(3)

go(val=4)
go()

Write down your answer, and make sure you do check, because you might learn something vital that will save you hours in the future!

Lessons

So, what can we do that we couldn't do already?

Well, you know how to destroy readability of your code by manipulating the locals() dictionary. Yay! Just when you thought your job security had been lost due to the clarity of the code you are writing in Python.

Seriously, the first thing to notice is that if function statements are assignments, they can appear anywhere that assignments do. So I can write code like this:

import sys

if sys.platform == 'win32':
    def some_function():
        # Some special Windows fix here
        pass
else:
    def some_function():
        pass

Assuming some_function is fairly small so there isn't much code duplication, this is much nicer than putting the if inside some_function, since the platform check happens just once – when the module is imported, rather than every time some_function is called. While you may not do this that often, it can be useful when you need it, as in this rather more complicated version in Django.

This understanding also frees you from thinking about functions being 'top level' things (or perhaps 'second level', when they appear in class statements). They can appear anywhere, and this opens up a world of possibilities once you understand it.
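
As one example of "anywhere", a def statement can sit inside another function, in which case it is executed, and a new function object constructed, every time the outer function is called. A small sketch (make_adder and the other names are invented for illustration):

```python
def make_adder(n):
    # This def statement runs each time make_adder is called,
    # constructing a fresh function object every time.
    def add(x):
        return x + n
    return add

add_one = make_adder(1)
add_ten = make_adder(10)

print(add_one(5))          # 6
print(add_ten(5))          # 15
print(add_one is add_ten)  # False
```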

In the future, I'm hoping to look at modules, importing and circular imports, class statements and meta-classes. We may have a bit of exploration of stack frames at some point. Let me know if there are particular topics you'd like covered!
