“Quacks like a duck” — why you probably should use pydantic more in Python apps

Ivan Trusov
5 min read · Nov 3, 2022


“Bird Seller” by Vicente Manansala, image credits to WikiArt

Borrowing time

Python is an extremely flexible language, and when I say extremely, I sometimes mean it with a grain of salt. It lets you write completely untyped code without any worries.

For instance, when you want to quickly parse an API response and extract the results you need, you might be tempted to do it “quick and dirty”, parsing the JSON and indexing straight into it without any checks.
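The original snippet isn't reproduced here. As a hypothetical stand-in, the sketch below shows the kind of code meant. To keep it runnable offline, it parses a raw response body with json.loads instead of making a real HTTP call. The extract_names function and the data and name keys are assumptions for illustration.

```python
import json
from typing import Any, List


def extract_names(raw_body: str) -> List[Any]:
    # "Quick and dirty": no check of the HTTP status, the body format,
    # or the shape of the payload before indexing into it
    payload = json.loads(raw_body)
    return [item["name"] for item in payload["data"]]


print(extract_names('{"data": [{"name": "mallard"}, {"name": "teal"}]}'))
# prints: ['mallard', 'teal']
```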

In some cases, that’s a fine solution. What’s more, it’s valid Python code in the end, and tools like pylint won’t tell you anything bad about it.

However, anyone who has worked on a bigger Python project will wince while reading this. They can just feel that this bit of code is bound to become a source of bugs and issues, especially as the project grows.

Stop for a second and consider: what kinds of issues might you face with that code?

You got it right: literally every line in this code has a potential issue. Let’s describe some of them:

  • No guarantee that the request will succeed
  • No guarantee that the response body is actually valid JSON
  • No guarantee that the response payload will be a dict and not a list
  • Even if it’s a dict, no guarantee that it contains the data key
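To make these failure modes concrete, here is a minimal sketch that feeds a few hypothetical response bodies through the same blind parse-and-index pattern and records which exception each one raises. An HTML error page stands in for a failed request. Each broken guarantee surfaces as a different exception type, somewhere inside your own logic.

```python
import json

# Hypothetical response bodies, each breaking one of the guarantees above
bad_bodies = {
    "not JSON": "<html>502 Bad Gateway</html>",
    "a list": "[1, 2, 3]",
    "no data key": '{"items": []}',
}

failures = {}
for label, body in bad_bodies.items():
    try:
        _ = json.loads(body)["data"]  # blind parse-and-index
    except Exception as exc:  # record whichever error surfaces
        failures[label] = type(exc).__name__

print(failures)
# prints: {'not JSON': 'JSONDecodeError', 'a list': 'TypeError', 'no data key': 'KeyError'}
```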

The same applies to pretty much every bit of the code above.

Someone might say: well, we have these guarantees from the API provider, right?

Yes and no. These guarantees hold as long as “everything goes as expected” beyond your application or service boundary. But let’s be honest, that is too broad an assumption to rely on.

How can I, as a developer, regain at least minimal control over the input data, especially when it crosses the service boundary and an unexpected structure becomes my problem? If I don’t enforce structures, I might end up with signatures like this:

from typing import Any, Dict
def another_transform_func(inp: Dict[Any, Any]) -> Dict[Any, Any]: ...  # anything in, anything out

Unfortunately, my experience shows that using unverified input structures, as well as Any -> Any functions, is a way to borrow time from your future self. It lets you move fast with the project, but it will cost you bug-fixing time later on.

Is there any way to make it better?

🦢 The grace of pydantic

As someone who has written a lot of Scala, I really miss the capabilities of case classes.

In my opinion, one of the best things that has happened to the Python ecosystem so far is the development and huge adoption of pydantic. In fact, I don’t see any reason why it’s still not part of core Python instead of the built-in @dataclass functionality.

In essence, pydantic lets you create logical models of the entities in your Python application. These could be client/server models or simple logical blocks that serve a specific purpose: pretty much anything.
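As a minimal sketch of such a logical model, consider the hypothetical Bird entity below. It assumes pydantic v2, which was released after this article. model_config and model_copy are v2 names; v1 used an inner Config class and copy(). Frozen models are only a loose analogue of Scala case classes: they are immutable and compare by value.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class Bird(BaseModel):
    # frozen=True makes instances immutable, somewhat like a Scala case class
    model_config = ConfigDict(frozen=True)

    name: str
    species: str
    wingspan_cm: Optional[float] = None


donald = Bird(name="Donald", species="duck")

# Equality is based on field values, as with case classes
print(donald == Bird(name="Donald", species="duck"))  # prints: True

# Derive a modified copy instead of mutating (compare Scala's .copy)
daisy = donald.model_copy(update={"name": "Daisy"})
print(daisy.name, donald.name)  # prints: Daisy Donald

try:
    donald.name = "Scrooge"  # assignment is rejected on a frozen model
except ValidationError:
    print("Bird instances are immutable")
```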

There are three absolutely great features that make pydantic stand out from the crowd:

👀 Validate ’em all

Validators are an essential part of your code with pydantic models, since they let you verify input data right at the moment of object initialization. No more “'NoneType' object has no attribute 'get'” errors deep inside your logic: malformed input is caught as soon as the object is created.
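Here is a minimal sketch of validation at initialization. It assumes pydantic v2; model_validate_json is the v2 name, and v1 offered parse_raw. The ApiResponse and Item models are hypothetical and follow the shape of the payload with a data key discussed earlier. Every malformed body is rejected up front with a single exception type, ValidationError. That includes a null data value, which would otherwise lead to the NoneType error mentioned above.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    name: str


class ApiResponse(BaseModel):
    # Expected shape: {"data": [{"name": ...}, ...]}
    data: List[Item]


# A well-formed body becomes typed objects in one step
ok = ApiResponse.model_validate_json('{"data": [{"name": "mallard"}]}')
print(ok.data[0].name)  # prints: mallard

# Hypothetical malformed bodies: not JSON, a list, no "data" key, null data
rejected = []
for body in ["<html>502</html>", "[1, 2, 3]", '{"items": []}', '{"data": null}']:
    try:
        ApiResponse.model_validate_json(body)
    except ValidationError as err:
        # One exception type, carrying a structured list of errors
        rejected.append(err.errors()[0]["type"])

print(rejected)
```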

Here are some of the nice tricks I use with pydantic:

One of them is a validation function that checks that at least one field with a specified suffix is provided. For example, you might expect the payload to contain at least one field ending with the _name suffix. It’s easy to refer to this check from the model class.
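This trick could be sketched as follows, assuming pydantic v2, where model-level checks are written with model_validator. In v1, a root_validator plays the same role. The helper require_one_with_suffix and the Person model are hypothetical names for illustration. A ValueError raised inside the validator is reported by pydantic as a ValidationError.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


def require_one_with_suffix(model: BaseModel, suffix: str) -> None:
    # Hypothetical helper: at least one field whose name ends with `suffix`
    # must hold a non-None value
    provided = [
        name
        for name in type(model).model_fields
        if name.endswith(suffix) and getattr(model, name) is not None
    ]
    if not provided:
        raise ValueError(f"at least one field ending with {suffix!r} is required")


class Person(BaseModel):
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    email: Optional[str] = None

    @model_validator(mode="after")
    def check_names(self) -> "Person":
        # Reuse the generic check for this model's *_name fields
        require_one_with_suffix(self, "_name")
        return self


print(Person(last_name="Duck"))  # valid: last_name is set

try:
    Person(email="donald@example.com")  # no *_name field provided
except ValidationError as err:
    print(err.errors()[0]["msg"])
```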

Another frequently used check is the mutual exclusivity of specific fields, and it can be applied to a model class just as easily.
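Here is a minimal sketch of a mutual-exclusivity check, again assuming pydantic v2’s model_validator. The mutually_exclusive helper and the DataSource model with path and url fields are hypothetical. Here, “mutually exclusive” means that at most one of the fields may be set. Requiring exactly one would only change the length check.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


def mutually_exclusive(model: BaseModel, *fields: str) -> None:
    # Hypothetical helper: at most one of `fields` may be set (non-None)
    provided = [f for f in fields if getattr(model, f) is not None]
    if len(provided) > 1:
        raise ValueError(f"fields {provided} are mutually exclusive")


class DataSource(BaseModel):
    path: Optional[str] = None
    url: Optional[str] = None

    @model_validator(mode="after")
    def check_exclusive(self) -> "DataSource":
        mutually_exclusive(self, "path", "url")
        return self


DataSource(path="/tmp/birds.csv")  # fine: only one field is set

try:
    DataSource(path="/tmp/birds.csv", url="https://example.com/birds.csv")
except ValidationError as err:
    print(err.errors()[0]["msg"])
```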

In the end, basic validation rules combined with well-defined structures can cover practically all issues related to incorrectly parsed input data.

More importantly, it pushes you toward using logical models further down in your code. It’s much easier to understand a function when it’s clear that it receives one model and returns another, rather than an Any -> Any function.
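Contrast this with the Dict[Any, Any] -> Dict[Any, Any] signature earlier. Below is a hypothetical model-to-model function; RawOrder, Invoice and to_invoice are illustrative names. It uses only basic BaseModel features, so it should work with pydantic v1 or v2.

```python
from pydantic import BaseModel


class RawOrder(BaseModel):
    # Hypothetical input model, e.g. parsed from an API payload
    order_id: int
    amount_cents: int


class Invoice(BaseModel):
    # Hypothetical output model handed to the next step
    order_id: int
    amount: float


def to_invoice(order: RawOrder) -> Invoice:
    # The signature documents exactly what comes in and what goes out
    return Invoice(order_id=order.order_id, amount=order.amount_cents / 100)


invoice = to_invoice(RawOrder(order_id=42, amount_cents=1999))
print(invoice.amount)  # prints: 19.99
```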

🛂 Divide et impera

Another brilliant feature of pydantic is its ability to handle nested structures containing various objects. It’s a pretty common case to receive input that contains various logical objects, where you would like to:

  • Identify the object type
  • Throw an exception if the object type cannot be identified

The example in the official docs demonstrates this concept quite well. However, it can be hard to add custom logic to the identification process, e.g. identifying an object as A if the previous object is B. Share your recipes for that one in the comments below!
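Here is a minimal sketch of a discriminated union, assuming pydantic v2. The Field(discriminator=...) annotation also exists in v1.9 and later, but model_validate is v2-only. Duck, Swan, Pond and the kind field are hypothetical. pydantic reads kind to pick the model for each element and raises a ValidationError when no model matches.

```python
from typing import Annotated, List, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Duck(BaseModel):
    kind: Literal["duck"]
    quack_volume: int


class Swan(BaseModel):
    kind: Literal["swan"]
    wingspan_cm: float


# The "kind" field tells pydantic which model to build for each element
Bird = Annotated[Union[Duck, Swan], Field(discriminator="kind")]


class Pond(BaseModel):
    birds: List[Bird]


pond = Pond.model_validate(
    {
        "birds": [
            {"kind": "duck", "quack_volume": 7},
            {"kind": "swan", "wingspan_cm": 240},
        ]
    }
)
print([type(b).__name__ for b in pond.birds])  # prints: ['Duck', 'Swan']

try:
    Pond.model_validate({"birds": [{"kind": "goose"}]})
except ValidationError:
    print("unidentifiable object rejected")
```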

🤔 What-if?

Another great capability is provided by the Hypothesis plugin. Let’s be honest: no one wants to write sample models to test their code.

It’s just something that takes time, and you’re more or less forced to aim your samples specifically at the errors you already know about (to improve test coverage, of course).

The Hypothesis plugin makes your life slightly easier and adds more varied scenarios to your tests. It sometimes helps catch the “unexpected” errors, whereas manually written samples tend to verify only the “expected” ones.
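The article refers to pydantic’s Hypothesis plugin from the v1 era. The sketch below sidesteps the plugin and assumes only that the hypothesis and pydantic packages are installed. It passes explicit strategies to st.builds, so no type inference is involved. OrderLine and line_total_cents are hypothetical names.

```python
from hypothesis import given, strategies as st
from pydantic import BaseModel, Field


class OrderLine(BaseModel):
    # Hypothetical model under test
    quantity: int = Field(ge=1)
    unit_price_cents: int = Field(ge=0)


def line_total_cents(line: OrderLine) -> int:
    # Hypothetical function under test
    return line.quantity * line.unit_price_cents


# Strategies are passed explicitly, so no plugin or type inference is needed
order_lines = st.builds(
    OrderLine,
    quantity=st.integers(min_value=1, max_value=10_000),
    unit_price_cents=st.integers(min_value=0, max_value=10**9),
)


@given(order_lines)
def test_total_is_never_below_unit_price(line: OrderLine) -> None:
    # A property that must hold for every generated order line
    assert line_total_cents(line) >= line.unit_price_cents
```

Under pytest, the decorated function is collected like any other test. Calling it directly also runs it against generated examples (up to 100 by default).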

🐳 Summary

The need for proper logical models is pretty clear in most cases. Given the simplicity pydantic brings, it makes sense to start writing new applications from the logical models, and only then move on to the application logic, organising it as a “dance of entities” (yep, it’s a reference to Clean Architecture).

There are some small things I would like to see done better in pydantic, for instance:

  • generated fields declared right in the class definition
  • custom functions for discriminated unions instead of plain property matching
  • the ability to mix in multiple root validators

But overall, my experience with this library is that it brings efficiency and better code structure, and it prevents errors.

Have you used pydantic in your projects before? Have an opinion? Feel free to share it in the comments! And hit subscribe if you liked the post: it keeps the author motivated to write more.


Ivan Trusov

Senior Specialist Solutions Architect @ Databricks. All opinions are my own.