You never know what rabbits you’ll get out of the magician’s hat. This time we’ve got two of them.

Meet the twins: Beanie and Bunnet.

About two years ago, in my previous blog posts, I outlined the mechanisms by which you can automate the serialization of Pydantic data objects into MongoDB. Fortunately, I didn’t have to implement the object-document mapping (ODM) library all the way to the end. Someone has since built the whole thing, ready for all of us!

Beanie - alongside its buddy, Bunnet - is exactly the kind of library I envisioned back then when I sketched mechanisms for connecting Pydantic objects with MongoDB. Or maybe the library already existed at the time and just didn’t catch my eye. Well, never mind. It only means I bear no responsibility for the creatures that crawled out of the magical hat.

Bunnet provides a layer on top of Pydantic data objects that allows them to be stored in and loaded from the database in a simple, synchronous way. Bunnet also has a twin, Beanie, which is an asynchronous version of the same library. But should you use Bunnet or Beanie?

When is synchronous better?

There are many use cases for synchronous processing. In data processing, as in many other things, you often can’t start the next step until the previous one has finished. For example, you can’t start tying your shoelaces until you’ve put your shoes on. Of course you can try, but then you’ll be going for a walk some other time.

In addition to putting on clothes, you may have to convert raw data from some measuring device into a reduced format that can be displayed more easily in graphs. This, too, may have to be done sequentially, step by step, as the data is refined into its final form.

This was also the case in a playful example from a blog post of mine a couple of years ago, where log data from washing machines is processed. The same pattern applies to almost anything, such as processing data from devices that measure electricity consumption.

In this case, the background process can continue to convert the measurement data to different derivatives in its own isolated container. A synchronous operating model can be suitable for this, when there is one CPU per container and each container processes a separate dataset.

What about when there are many measuring devices?

When the data of each device forms its own sequential processing entity, the system scales really well if each run takes place in its own container. Each container processes one conversion iteration for one measuring device at a time.

For example, each iteration might process the latest data that has entered the database since the previous one. When more devices are added to the cluster, you simply scale up the number of containers. This is really easy with Kubernetes, which seems made for exactly this type of work. Each container processes one data source at a time, and each measuring device is handled by only one container at a time.

Sequential but asynchronous code

It’s easy to confuse the concepts of sequential vs. concurrent and synchronous vs. asynchronous. The thing is, synchronous code can only be executed sequentially, but asynchronous code can be executed both sequentially and concurrently. It can be forced to run sequentially by placing the await keyword before each call. Such asynchronous code works almost the same way as synchronous code. The difference is that it doesn’t block the event loop within the same program. And that’s a big difference!
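A minimal asyncio sketch of the distinction (no database involved; the step names and delays are made up): awaiting each call in turn forces strict order, while gathering the same coroutines lets them interleave.

```python
import asyncio


async def step(done: list, name: str, delay: float) -> None:
    # Simulate a unit of work that yields control while "waiting" on I/O.
    await asyncio.sleep(delay)
    done.append(name)


async def sequential() -> list:
    done: list = []
    # `await` before each call forces one step to finish before the
    # next starts, just like synchronous code.
    await step(done, "shoes", 0.02)
    await step(done, "laces", 0.01)
    return done


async def concurrent() -> list:
    done: list = []
    # gather() starts both at once; the shorter delay completes first,
    # so completion order no longer matches call order.
    await asyncio.gather(step(done, "shoes", 0.02),
                         step(done, "laces", 0.01))
    return done


print(asyncio.run(sequential()))   # ['shoes', 'laces']
print(asyncio.run(concurrent()))   # ['laces', 'shoes']
```

Same coroutines, same total work; only the ordering guarantees differ.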

Or is it?

If the runtime environment has only one CPU, it cannot magically be made to run faster by writing concurrent code with asynchronous functions. The end result will not change, because the parallelism is just an illusion. One CPU can still only do one thing at a time; if you observe anything else, it is an illusion created by ultra-rapid context switching.

When your container has only one CPU, there is no point in making the code error-prone and hard to understand by writing “concurrent” code. The end result is no faster whether you execute two things quickly in succession or slowly “in parallel”. One CPU does not become two. It is therefore clearer, and more honest, to run synchronous code than to pretend to multitask. If you need real multitasking, use a second CPU: start another container that also runs synchronous code.

What about I/O Tasks?

Even if only one CPU is in use, concurrent execution can still speed things up when the process also writes to the database, because while waiting for a database write to complete, the CPU can be made to do something else. Still, as with any operation where data cannot be written until it has been processed, you have to think carefully about whether concurrent execution really provides enough benefit to be worth sacrificing, for example, the readability of the code.

However, if your code leans heavily on database interaction, asynchronous execution may actually provide a significant benefit.
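A rough sketch of that benefit, with asyncio.sleep standing in for the database round-trip (the timings are illustrative, not measured against Beanie itself): five independent “writes” complete in roughly the time of one when gathered concurrently, versus five times that when awaited one by one.

```python
import asyncio
import time


async def fake_db_write(doc_id: int) -> int:
    # Stand-in for an awaitable database write; the event loop is free
    # to run other writes while this one "waits" on I/O.
    await asyncio.sleep(0.05)
    return doc_id


async def write_sequentially(n: int) -> float:
    start = time.perf_counter()
    for i in range(n):
        await fake_db_write(i)          # total: roughly n * 0.05 s
    return time.perf_counter() - start


async def write_concurrently(n: int) -> float:
    start = time.perf_counter()
    # All writes wait on I/O at the same time: roughly 0.05 s in total.
    await asyncio.gather(*(fake_db_write(i) for i in range(n)))
    return time.perf_counter() - start


seq = asyncio.run(write_sequentially(5))
conc = asyncio.run(write_concurrently(5))
print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

This only pays off when the writes are genuinely independent of each other.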

But then again, even if you can make the code do something else while waiting for a database operation to complete, there is no point in doing so if there is nothing to do, that is, if the next operation depends on the new state in the database. A bit like the shoelaces.

Lazy as a toad?

Given such sequential data processing tasks that alternate between computation and writing to the database, don’t we end up with really lazy toads that do nothing but wait for the database to finish before the next intense burst of computation? Are they wasting time? Isn’t it wasteful to have systems that just laze around waiting for the database? Doesn’t that laziness force us to start more and more containers to make up for all the idling, eventually leaving far more containers running than optimal and burning huge amounts of resources and money?

Absolutely not!

When running containers in Kubernetes pods, the reality is that CPU resources are shared. Containers in a less CPU-intensive state (interacting with the database instead of computing) don’t actually reserve the CPU; under the hood, the kernel allocates it to the containers that currently need it more. The CPU orchestration is done at the system level.

Container-based concurrency

From an engineer’s perspective, there’s no need to complicate things by trying to manage this concurrency yourself at the code level. When parallel processing is already implemented at the system level, there is no particular need to mess things up and duplicate parallelism within individual containers.

Sequential code is simply much easier to understand, and the same holds in real life. When you go for a walk, the order is clear: first you put on your coat, then your hat, then your shoes, tie your laces, and finally put on your gloves.

Asynchronous mess

Have you ever seen that movie where our hero, Donald Duck (not the other Donald), tries to cook, starts one thing after another with a terrible fuss, and soon there is a horrific mess in the kitchen? Asynchronous code can be exactly like this at its worst. You are left wondering when an error will strike, since there is no guarantee of the order in which operations start and finish!

Meet the twins: Beanie and Bunnet

Is your use case a good fit for asynchronous processing? Or are you inclined toward the synchronous approach? You decide! Luckily, the synchronous library Bunnet has a twin named Beanie. If you need asynchronous processing, such as handling API calls, choose Beanie. If you do background processing, consider Bunnet.

I’ll be playing with my new pets Bunnet and Beanie and maybe even share my experiences with you.

Let’s see!