Brief history of data classes in python

Kevin Tewouda
4 min readJul 22, 2022

How dataclasses came in python landscape

A pile of books
Photo by Joanna Kosinska on Unsplash

In this tutorial, we will review the different ways we create data classes in python from the oldest way to the newer one. Hopefully, in the end, you will be convinced to use pydantic dataclasses as your default way to create data classes.

Default python class creation

The oldest way to define a class in python is via the special __init__ method. For our example, we will work with a Point class taking x and y coordinates as input.

old style class

As you can see, there is some boilerplate code because we define the same variables as method arguments and class attributes (x and y). When we try to print our created object to see what it looks like, we see a strange default representation made by python.
Even worse, we can instantiate an object like the following Point(1, 'foo') and python will not complain at all.

Ok, you can say that tools like mypy or pyright will help you catch the bug but not everybody wants to use them, so we have to find another way.
Also, the default class implementation does not have comparison methods implemented, therefore p == p2 returns False even if the attributes have the same values between the two objects. 🥲
Here is how we can fix these three issues with the following code.

old style class but better

Now we have a working class with a pretty representation but look at the amount of code we have to write... And we haven't even implemented all the comparison methods.

Namedtuples

I will make a small digression on the namedtuples because one can argue that it is a way to declare a class quickly, and yes it is the case but not without some caveats…
Consider the following example:

namedtuple example

Ok, with namedtuples, we have a pretty representation by default, but we suffer from the same lack of type verification as in the default way of creating classes and since it inherits the tuple class, the comparison with a tuple in the last instruction is correct which is not always what we will want.

attrs

Taking into account the problems related to class creation quoted above, a well-known Pythonista decides to bring a solution with a library called attrs. Let’s see how we can rewrite our Point class.

attrs example

Ok, we clearly see a difference with the handwritten class we wrote above. We have:

  • A pretty default representation
  • Type verification using the field function and validators.
  • A default comparison implementation takes into account the type of the compared objects. This is why the test with a tuple returns False.

It is a well-thought library, and it can be customized in different ways like defining slots, frozen classes, keyword-only arguments, and more.

dataclasses

In python3.7, the language introduces dataclasses defined in PEP 557 with the will to simplify the writing of classes. In fact, this new standard library is heavily inspired by attrs.

Let's see what our famous Point class looks like:

dataclass — example 1

We have almost the same advantages as the attrs definition except that the argument type is not verified at initialization time.

The only way to achieve this verification is to do the following:

dataclass — example 2

Yeah, it sucks a little, but it was a will of the CPython maintainers to have a simplified version of attrs without validation and other joys.

pydantic dataclasses

Finally, we will talk about pydantic, a data validation library made famous by
a relatively young web framework, FastAPI. If you don’t know it, I highly recommend checking its API, it is another well-thought piece of software. I also wrote a blog post presenting some of its advantages.

The feature that interests us in this article is that of the dataclasses. Again we will look at our Point class implementation. 😁

pydantic dataclass — example 1

We have all the advantages that we had with the attrs implementation and the written code is even smaller!
pydantic leverages type annotations to validate data, and we can still use the API provided by the standard library dataclasses like field, asdict, etc... because pydantic.dataclasses is just a wrapper around the standard one. For proof, look at the following example:

pydantic dataclass — example 2

Here we wrap a normal dataclass into a pydantic one, and we have the same verifications and features!
Pydantic is a fantastic library (yes I am a little biased) and I can only recommend you check out its documentation.

This is all for this tutorial, hope you enjoyed it. Take care of yourself and see you next time! 😁

This article was originally published on dev.to

If you like my article and want to continue learning with me, don’t hesitate to follow me here and subscribe to my newsletter on substack 😉

--

--

Kevin Tewouda

Déserteur camerounais résidant désormais en France. Passionné de programmation, sport, de cinéma et mangas. J’écris en français et en anglais dû à mes origines.