Writing type annotations in python can be hard.
How do I pick the right one with so many different options that look so similar?
If a function returns
[1, 2, 3], how should I annotate it?
Should it be an
Iterable, or a
Can it be a
And why not just a
The short answer is, as always, “it depends”, on what exactly you want to do.
Let’s go through some of the python types related to container of elements, discuss how those are related, and where to use them.
Do not use
tuple as type annotations in python.
In most cases, those type annotation are either too demanding (as input types), or promising too much (as output types).
The typing system in python has more narrowly scoped types.
Also, developers can build their own types that are tailored for their specific needs.
Before we start…
… lets talk about the technical setup.
The code in this post is written in pseudo
Where necessary, links to the actual code are provided, and most of the information comes directly from the python
Explicit, minimal context matters
Types imply behaviour.
If I am given a
list[int], I know that I can do a couple of things with it:
.appendsomething to the end of the list,
- find the first index of a value via
- get an item by index like
and many more.
In most of the cases however, I don’t need all of that.
I might want only to check the size of the list and whether it contains a particular element.
Python typing system allows to be quite specific in what we actually want, by using more narrowly scoped type annotation than, for example,
Being specific is great. It is easier to satisfy a more specific requirement (e.g., what an input argument to a function is expected), than a less specific requirement. It is easier to understand and safer to use a more specific result (e.g., what an output of a function is), than a less specific result. Being specific means that we, as developers, make the code more explicit, and reduce the amount of context that is necessary to understand us.
We need to know two things to be specific:
- The minimal requirements or promises that we want to provide or make,
- How to express those in terms of the python typing system.
The first one depends on the concrete case, and it is the developer’s responsibility to figure that out.
The second one is a general question, and we will discuss it in terms of the python
Ultimately, we will understand when it makes sense to annotate something as
dict, and when something much simpler would be a better choice.
Building blocks 0: primitive types
Some types that define a behaviour must be first in the chain of inheritance.
Three of such primitive types are of particular interest to us:
Those are independent, “level 0” types, and they mean only what is in their source code.
Container object must be able to tell if it contains something, as in
y in x for an
x: Container and
y: object must work:
def __contains__(self, __x: object) -> bool: ...
(The actual code is simplified but the idea is the same.
See ref for the actual definition.)
in statement calls
Sized object must be able to tell its length, as in
len(s) for an
s: Sized must work:
def __len__(self) -> int: ...
(Again, this is slightly simplified. See ref for the actual definition.)
len() statement calls
Iterable object must be able to create a “stream” of data.
This stream of data is called an iterator, and we will it discuss in the next section.
it: Iterable, a call
iter(it) must work (obviously, not everything in python is an
Iterable, for example,
iter(1) will not work because
int does not implement
__iter__ and thus is not an
def __iter__(self) -> Iterator: ...
(And this also is slightly simplified. See ref for the actual definition.)
iter() statement calls
Ok, but why?
The reason we need all three of these types is that they represent quite distinct behaviour, and one cannot replace the other.
Iterable (something that can represent the inside data as a stream) is not necessarily
Sized, i.e., does not have a fixed length (it can be representing a stream of objects that are generated on the fly).
For the same reason it is not necessarily a
Container: we cannot know in advance if a particular object will appear in the stream.
Sized (something that has a known length) does not have to provide a way to peek into its elements, and therefore is neither an
Iterable nor a
Imagine, for example, an
Sized object that represents a seating plan in an airplane.
We might want to tell how many passengers are already in, but nothing more.
Container (something that can check what is inside) can be un
Sized, e.g., like a
Container of all twin primes.
It is also not necessarily an
Iterable, following the same argument as for the
when we don’t want to disclose which elements are actually inside.
Summing up, the level 0 container types are:
Building blocks 1: add a bit of spice, and mix
We can enrich the primitive types,
Container, by adding some additional behaviour or mixing them together.
A new type can add some new functionality to an existing one.
For example, an
Iterable that can reverse the iteration direction (instead of iterating from the first element to the last, iterate from the last element to the first) is called a
Iterable cannot change the direction by itself, so it is necessary to introduce a new behaviour for that.
def __reversed__(self) -> Iterator: ...
(See ref for the full definition.)
Another example is a “stateful”
As we discussed above, an
Iterable can create a stream of data, but does not deal with consuming that stream.
Iterable that can iterate through the elements of the stream, one by one, is called an
Iterator adds a
__next__ method to an
def __next__(self) -> T: ...
(See ref for the full definition.)
Iterator object must be able to yield its next element, as in
next(it) for an
it: Iterator must work.
This is in contrast to a plain
Iterable, which does not have a state, so calling
next on it will fail.
Side note: in python, users can write
for x in it, where
This implicitly calls
__iter__ once on the iterable
it which creates an
next calls are made multiple times against the produced
Iterator, and assigns the elements to
x one by one.
A new type does not necessarily need to introduce brand new functionality, and can be a combination of the existing ones.
Collection type is a sum of
class Collection(Iterable, Container, Sized): ...
(This is slightly simplified. See ref for the actual definition.)
If there are cases when we would need to
- iterate over content,
- check if a particular object is inside,
- known the length of a container,
Collection type annotation should be just fine!
Summing up, we discussed the following level 1 container types:
Custom level 1 types
Of course, it is perfectly fine to define custom types that include only those pieces that are of interest to the developer.
For example, a
SizedContainer = Sized + Container:
_T_co = TypeVar("_T_co", covariant=True)
class SizedContainer(Sized, Container[_T_co], Protocol[_T_co]): ...
(Here a type variable
_T_co is used.
See my previous post to learn more about that.)
Building blocks 2: All around me are familiar faces
The types that we discussed so far are quite limited. It is great on one hand, as the scope around those is very concise. On the other hand, developers are likely to expect to work with more feature-rich objects. The level 2 types cover most of those needs.
Let’s talk about three common container types: sequences, hash tables, and sets.
All of those are
Collections, with extra behaviour specific to each concrete type.
Sequence is a
Collection with a
In a sense, a
Sequence is a very minimal immutable
list - we can iterate over it, get an element by index, count the number of occurrences of a particular entry.
Mapping is a
Collection that contains key-value pairs, and allows to inspect elements by key, view the keys, values, and key-value pairs.
Similar to the
Sequence, it is an immutable type.
However, it is not possible to iterate over a
Mapping directly (but we can iterate its keys and values).
AbstractSet is a
Collection of unique elements, and allows to compare itself against other
Again, it is an immutable type.
AbstractSet are great options when we need to annotate immutable container types.
Building blocks 3: They Are A-Changin’
So far, all the types that we discussed were immutable. We do want to add elements to our containers, pop elements, update values for a particular key, etc. Let’s talk about some mutable types provided by the standard typeshed.
MutableSequence is a mutable version of the
Sequence type that allows to add and remove elements anywhere in the sequence.
MutableSet are mutable versions of the
AbstractSet types, respectively.
The syntax for elements manipulation is slightly different, but the general idea is the same.
Note that these types are purely for type annotation (as they are protocols and only define the behaviour, but do not implement it), and we cannot instantiate them like
s = MutableSet().
Building blocks 4: Back Where It All Started
We are all familiar with
Those are actual python objects, and we can instantiate them with
l = list() or
s = set((1, 2)).
So, at least, they should implement the
__init__ method (which already makes them different from the protocol types discussed above).
Let’s start with the
tuple which is the only immutable object in this group.
It is a
Sequence (but not a
MutableSequence) plus a bit of extra functionality: we can add and multiply tuples, and compare tuples (which is based on element-wise comparison).
set objects are expectedly
They can also create themselves via new methods like
intersection / etc. (for
This is possibly the main difference from the
Mutable protocols, which scopes do not extend to generation of new objects.
Similarly, the built-in types implement some form of user representation (
__repr__), unlike the protocol types.
This makes sense, because, for example, a
Collection does not really care about being represented in any user-friendly form.
Side note: interestingly, a
dict is also a
I believe it is useful to be aware of the container types build chain, from a primitive
Sized to a top-level
Building a signature of a method or a function is very exciting, and it is very important to do that wisely. Being specific in both the input and output types helps the users of the function: they know what exactly is expected (inputs) so there is less to implement; they know what can they rely on (outputs). A function that requires as little as possible suits more use cases.
Similarly, if developers of a function provide only what is actually needed to the users, they would evade unnecessary work, and de-risk backwards compatibility.
Developers and users avoid potential misunderstandings: it would be unfair to the users if a function promises to return a
dict (a mutable mapping with a bunch of utility methods), but what it actually wanted to provide was a plain
Mapping as it was never expected for the users to add new elements to it.
Imagine a taxi service: you, as a service provider, allows your passangers to go from A to B for a fee. As a service provider, the only thing you require is that a client is able to pay for the service (the input requirements are minimal). Also, as a service provider, you promise a safe journey from one place to another, but nothing else. Your employee taxi driver is be capable of changing the destination to a different city - but it would not make much sense to promise that to the passanger, because this is not what you agreed on (even though the driver can do that, just a many other unrelated things). If you don’t inform the user of your service about the limits of the service, the user might be quite upset if the services they assumed to be availble are not supported.
Being able to be explicit in what is actually needed or provided reduces risks and sets the expectations, making the code easier to follow and support.