Python type annotation

When it comes to programming, I have a belt and suspenders philosophy. Anything that can help me avoid errors early is worth looking into.

The type annotation support that's been gradually added to Python is a good example. Here's how it works and how it can be helpful.

Introduction

The first important point is that the new type annotation support has no effect at runtime. Adding type annotations in your code has no risk of causing new runtime errors: Python is not going to do any additional type-checking while running.

Instead, you'll be running separate tools to type-check your programs statically during development. I say "separate tools" because there's no official Python type checking tool, but there are several third-party tools available.

So, if you chose to use the mypy tool, you might run:

$ mypy my_code.py

and it might warn you that a function that was annotated as expecting string arguments was going to be called with an integer.

Of course, for this to work, you have to be able to add information to your code to let the tools know what types are expected. We do this by adding "annotations" to our code.

One approach is to put the annotations in specially-formatted comments. The obvious advantage is that you can do this in any version of Python, since it doesn't require any changes to the Python syntax. The disadvantages are the difficulties in writing these things correctly, and the coincident difficulties in parsing them for the tools.

To help with this, Python 3.0 added support for adding annotations to functions (PEP-3107), though without specifying any semantics for the annotations. Python 3.6 adds support for annotations on variables (PEP-526).

Two additional PEPs, PEP-483 and PEP-484, define how annotations can be used for type-checking.

Since I try to write all new code in Python 3, I won't say any more about putting annotations in comments.

Getting started

Enough background, let's see what all this looks like.

Python 3.6 was just released, so I’ll be using it. I'll start with a new virtual environment, and install the type-checking tool mypy (whose package name is mypy-lang).:

$ virtualenv -p $(which python3.6) try_types
$ . try_types/bin/activate
$ pip install mypy-lang

Let's see how we might use this when writing some basic string functions. Suppose we're looking for a substring inside a longer string. We might start with:

def search_for(needle, haystack):
    offset = haystack.find(needle)
    return offset

If we were to call this with anything that's not text, we'd consider it an error. To help us avoid that, let's annotate the arguments:

def search_for(needle: str, haystack: str):
    offset = haystack.find(needle)
    return offset

Does Python care about this?:

$ python search1.py
$

Python is happy with it. There's not much yet for mypy to check, but let's try it:

$ mypy search1.py
$

In both cases, no output means everything is okay.

(Aside: mypy uses information from the files and directories on its command line plus all packages they import, but it only does type-checking on the files and directories on its command line.)

So far, so good. Now, let's call our function with a bad argument by adding this at the end:

search_for(12, "my string")

If we tried to run this, it wouldn't work:

$ python search2.py
Traceback (most recent call last):
    File "search2.py", line 4, in <module>
        search_for(12, "my string")
    File "search2.py", line 2, in search_for
        offset = haystack.find(needle)
TypeError: must be str, not int

In a more complicated program, we might not have run that line of code until sometime when it would be a real problem, and so wouldn't have known it was going to fail. Instead, let's check the code immediately:

$ mypy search2.py
search2.py:4: error: Argument 1 to "search_for" has incompatible type "int"; expected "str"

Mypy spotted the problem for us and explained exactly what was wrong and where.

We can also indicate the return type of our function:

def search_for(needle: str, haystack: str) -> str:
    offset = haystack.find(needle)
    return offset

and ask mypy to check it:

$ mypy search3.py
search3.py: note: In function "search_for":
search3.py:3: error: Incompatible return value type (got "int", expected "str")

Oops, we're actually returning an integer but we said we were going to return a string, and mypy was smart enough to work that out. Let's fix that:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    return offset

And see if it checks out:

$ mypy search4.py
$

Now, maybe later on we forget just how our function works, and try to use the return value as a string:

x = len(search_for('the', 'in the string'))

Mypy will catch this for us:

$ mypy search5.py
search5.py:5: error: Argument 1 to "len" has incompatible type "int"; expected "Sized"

We can't call len() on an integer. Mypy wants something of type Sized -- what's that?

More complicated types

The built-in types will only take us so far, so Python 3.5 added the typing module, which both gives us a bunch of new names for types, and tools to build our own types.

In this case, typing.Sized represents anything with a __len__ method, which is the only kind of thing we can call len() on.

Let's write a new function that'll return a list of the offsets of all of the instances of some string in another string. Here it is:

from typing import List

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return []
    return [offset] + multisearch(needle, haystack[offset+1:])

Look at the return type: List[int]. You can define a new type, a list of a particular type of elements, by saying List and then adding the element type in square brackets.

There are a number of these - e.g. Dict[keytype, valuetype] - but I'll let you read the documentation to find these as you need them.

mypy passed the code above, but suppose we had accidentally had it return None when there were no matches:

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return None
    return [offset] + multisearch(needle, haystack[offset+1:])

mypy should spot that there's a case where we don't return a list of integers, like this:

$ mypy search6.py
$

Uh-oh - why didn't it spot the problem here? It turns out that by default, mypy considers None compatible with everything. To my mind, that's wrong, but luckily there's an option to change that behavior:

$ mypy --strict-optional search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

I shouldn't have to remember to add that to the command line every time, though, so let's put it in a configuration file just once. Create mypy.ini in the current directory and put in:

[mypy]
strict_optional = True

And now:

$ mypy search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

But speaking of None, it's not uncommon to have functions that can either return a value or None. We might change our search_for method to return None if it doesn't find the string, instead of -1:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

But now we don't always return an int and mypy will rightly complain:

$ mypy search7.py
search7.py: note: In function "search_for":
search7.py:4: error: Incompatible return value type (got None, expected "int")

When a method can return different types, we can annotate it with a Union type:

from typing import Union

def search_for(needle: str, haystack: str) -> Union[int, None]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

There's also a shortcut, Optional, for the common case of a value being either some type or None:

from typing import Optional

def search_for(needle: str, haystack: str) -> Optional[int]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

Wrapping up

I've barely touched the surface, but you get the idea.

One nice thing is that the Python libraries are all annotated for us already. You might have noticed above that mypy knew that calling find on a str returns an int - that's because str.find is already annotated. So you can get some benefit just by calling mypy on your code without annotating anything at all -- mypy might spot some misuses of the libraries for you.

For more reading:

New Call-to-action
blog comments powered by Disqus
Times
Check

Success!

Times

You're already subscribed

Times