Django Model Bulk Load Slowness

At work, we have a database export process that transfers data from our OLTP PostgreSQL database to our OLAP RedShift database. This export is exceedingly slow, causing headaches for the engineers, as deployments are typically delayed until the export finishes.

In diving into the export process, I discovered that one major factor is the price of model object deserialization - in other words, the translation of the SQL query response into Django model objects.

Compared to returning plain tuples with values_list(), the construction of the model objects can be 800x slower.

Let's explore this, and also explore code snippets in Ghost.

With 100,000 objects in the creatively named Dese table (dese-rialize, hehe), let's compare. First, a look at the model:

class Dese(models.Model):
    content = models.TextField()
    created_at = models.DateTimeField()
    is_active = models.BooleanField()

Now, let's compare default object deserialization with values_list():

In [6]: %timeit list(Dese.objects.all())
493 ms ± 20.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit list(Dese.objects.all().values_list('id', 'content', 'created_at', 'is_active'))
151 ms ± 3.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As we can see, it's about 3x faster to use values_list(). However, this doesn't quite reach the slowness I witnessed at work. What else could it be?

At work, most of our models inherit from a BaseModel class, which in turn inherits from a ChangesMixin from the django-model-changes library. Adding the ChangesMixin to our model is revealing:

In [1]: %timeit list(Dese.objects.all())
3.4 s ± 61.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We're now at 3.4s for 100,000 objects, a roughly 7x slowdown versus not using ChangesMixin!

To be continued...

social