A brief history of data classes in Python
How dataclasses entered the Python landscape
In this tutorial, we will review the different ways to create data classes in Python, from the oldest to the newest. Hopefully, by the end, you will be convinced to use pydantic dataclasses as your default way to create data classes.
Default Python class creation
The oldest way to define a class in Python is via the special __init__ method. For our example, we will work with a Point class taking x and y coordinates as input.
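A minimal sketch of such a Point class, assuming integer coordinates (the type annotations are an assumption for illustration):

```python
class Point:
    """A 2D point defined the classic way, with a hand-written __init__."""

    def __init__(self, x: int, y: int):
        # Each parameter is named again when it is stored on the instance.
        self.x = x
        self.y = y


p = Point(1, 2)
print(p.x, p.y)  # prints: 1 2
```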
As you can see, there is some boilerplate code because we name the same variables twice, once as method arguments and once as instance attributes (x and y). When we print the created object to see what it looks like, we get Python's default representation, which only shows the class name and a memory address.
Even worse, we can instantiate an object with Point(1, 'foo') and Python will not complain at all.
OK, you could say that tools like mypy or pyright will help you catch this bug, but not everybody wants to use them, so we have to find another way.
Also, a plain class does not implement comparison methods, so equality falls back to object identity: p == p2 returns False even if the two objects have the same attribute values. 🥲
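The three issues described so far (default representation, no runtime type checking, identity-based equality) can be reproduced with a short sketch. The variable names p, p2 and bad are only illustrative:

```python
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


p = Point(1, 2)
p2 = Point(1, 2)

# 1. Default representation: something like <__main__.Point object at 0x7f...>
print(p)

# 2. The annotations are not enforced at runtime, so this is accepted.
bad = Point(1, "foo")
print(bad.y)  # prints: foo

# 3. Without __eq__, equality falls back to identity.
print(p == p2)  # prints: False
```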
Here is how we can fix these three issues with the following code.
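One possible hand-written fix is sketched below. It assumes that both coordinates must be int instances and that a TypeError is an acceptable way to reject bad input; the error message is an illustrative choice:

```python
class Point:
    def __init__(self, x: int, y: int):
        # Runtime type checks, since annotations alone are not enforced.
        for name, value in (("x", x), ("y", y)):
            if not isinstance(value, int):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # Only compare with other Point instances.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(p)                 # prints: Point(x=1, y=2)
print(p == Point(1, 2))  # prints: True

try:
    Point(1, "foo")
except TypeError as exc:
    print(exc)           # prints: y must be an int, got str
```

Note that only __eq__ is implemented here; ordering methods such as __lt__ would need even more code.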
Now we have a working class with a pretty representation but look at the amount of code we have to write... And we haven't even implemented all the comparison methods.
Namedtuples
I will make a small digression on namedtuples, because one can argue that they are a quick way to declare a class. That is true, but not without some caveats…
Consider the following example:
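A minimal sketch of such an example, assuming the typed typing.NamedTuple form (collections.namedtuple behaves the same way for the points made here):

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


p = Point(1, 2)
print(p)                # prints: Point(x=1, y=2)
print(Point(1, "foo"))  # prints: Point(x=1, y='foo')
print(p == (1, 2))      # prints: True
```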
OK, with namedtuples we get a pretty representation by default, but we suffer from the same lack of type verification as with the default way of creating classes. And since a namedtuple inherits from the tuple class, the comparison with a plain tuple in the last instruction returns True, which is not always what we want.
attrs
Taking into account the class-creation problems described above, a well-known Pythonista decided to provide a solution with a library called attrs. Let's see how we can rewrite our Point class.
Ok, we clearly see a difference with the handwritten class we wrote above. We have:
- A pretty default representation
- Type verification using the field function and validators.
- A default comparison implementation that takes into account the type of the compared objects. This is why the test with a tuple returns False.
It is a well-thought-out library, and it can be customized in many ways, such as defining slots, frozen classes, keyword-only arguments, and more.
dataclasses
Python 3.7 introduced dataclasses, defined in PEP 557, with the aim of simplifying the writing of classes. In fact, this standard library module is heavily inspired by attrs.
Let's see what our famous Point class looks like:
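A minimal sketch with the standard library decorator, again assuming integer coordinates:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
print(p)                 # prints: Point(x=1, y=2)
print(p == Point(1, 2))  # prints: True
print(p == (1, 2))       # prints: False
print(Point(1, "foo"))   # prints: Point(x=1, y='foo')
```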
We have almost the same advantages as with the attrs definition, except that the argument types are not verified at initialization time.
The only way to achieve this verification is to do the following:
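One way to sketch this is with the __post_init__ hook, which the generated __init__ calls once the fields are set. The explicit isinstance checks and the error messages are illustrative choices:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        # Called by the generated __init__ after x and y are assigned.
        if not isinstance(self.x, int):
            raise TypeError("x must be an int")
        if not isinstance(self.y, int):
            raise TypeError("y must be an int")


print(Point(1, 2))  # prints: Point(x=1, y=2)

try:
    Point(1, "foo")
except TypeError as exc:
    print(exc)      # prints: y must be an int
```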
Yeah, it sucks a little, but it was a deliberate choice by the CPython maintainers to ship a simplified version of attrs, without validation and other joys.
pydantic dataclasses
Finally, we will talk about pydantic, a data validation library made famous by a relatively young web framework, FastAPI. If you don't know FastAPI, I highly recommend checking out its API; it is another well-thought-out piece of software. I also wrote a blog post presenting some of its advantages.
The feature that interests us in this article is its dataclasses. Again, we will look at our Point class implementation. 😁
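A minimal sketch, assuming pydantic is installed and using its dataclass decorator in place of the standard one:

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
print(p)                 # prints: Point(x=1, y=2)
print(p == Point(1, 2))  # prints: True
print(p == (1, 2))       # prints: False

try:
    Point(1, "foo")
except ValidationError as exc:
    print("rejected:", type(exc).__name__)  # prints: rejected: ValidationError
```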
We have all the advantages that we had with the attrs
implementation and the written code is even smaller!
pydantic leverages type annotations to validate data, and we can still use the API provided by the standard library dataclasses module, like field, asdict, etc., because pydantic.dataclasses is just a wrapper around the standard one. For proof, look at the following example:
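A sketch of such a wrapping, assuming pydantic is installed. The names ValidatedPoint and pydantic_dataclass are hypothetical, chosen only to keep the two decorators apart:

```python
import dataclasses

from pydantic import ValidationError
from pydantic.dataclasses import dataclass as pydantic_dataclass


@dataclasses.dataclass
class Point:
    x: int
    y: int = dataclasses.field(default=0)


# Passing the standard dataclass to pydantic's decorator adds validation.
ValidatedPoint = pydantic_dataclass(Point)

p = ValidatedPoint(1, 2)
print(dataclasses.asdict(p))  # prints: {'x': 1, 'y': 2}

try:
    ValidatedPoint(1, "foo")
except ValidationError:
    print("rejected")  # prints: rejected
```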
Here we wrap a normal dataclass into a pydantic one, and we have the same verifications and features!
Pydantic is a fantastic library (yes I am a little biased) and I can only recommend you check out its documentation.
That is all for this tutorial; I hope you enjoyed it. Take care of yourself and see you next time! 😁