3 minute read

The Django ORM makes interacting with the database a breeze, but without due care can also lead to poor performance.

Example models

Let’s say we have 2 very simple models, an Artistand her Artwork:

class Artist(models.Model):
    name = models.CharField(max_length=40, blank=False)

class Artwork(models.Model):
    artist = models.ForeignKey(Artist, related_name='artworks', on_delete=models.CASCADE)
    title = models.CharField(max_length=254,  blank=True)

Fetching artworks for each artist

Imagine we had a user listing endpoints (for example in a Django rest framework API), where we serialized each user and a nested array of their artworks

artists = Artist.objects.all()

for artist in artists:
    artwork = artist.artworks.first()
    if artwork is not None:
        print(artwork.title)

How many database queries would you expect?

If we tried this in the python manage.py shell --print-sql, you would see N+1 queries, where N is the number of artists in our database.

SELECT 
       "artist"."name",
       "artist"."id",
FROM "artist"

SELECT
       "artwork_artwork"."id",
       "artwork_artwork"."title",
FROM "artwork_artwork"
WHERE ("artwork_artwork"."artist_id" = '5e9eceb7-5d4e-419a-a5e6-cb96b6e1fcca'::uuid)
 LIMIT 1


SELECT "artwork_artwork"."id",
       "artwork_artwork"."title",
FROM "artwork_artwork"
WHERE ("artwork_artwork"."artist_id" = '069d5b3c-3fa4-4236-a054-13b73183ac49'::uuid)
 LIMIT 1

.
.
.

Here first we have the 1 Artist lookup and then the ensuing N related Artwork lookups (another database hit per artist)

This N+1 problem is so common that Django gave us the prefetch_related queryset method

If we instead do

artists = Artist.objects.prefetch_related('artworks').all()

for artist in artists:
    artwork = artist.artworks.first()
    if artwork is not None:
        print(artwork.title)

and observe the SQL queries in the shell, we’d now see


SELECT 
       "artist"."name",
       "artist"."id",
FROM "artist"


SELECT "artwork_artwork"."id",
       "artwork_artwork"."title"
FROM "artwork_artwork"
 WHERE ("artwork_artwork"."artist_id" IN ('5e9eceb7-5d4e-419a-a5e6-cb96b6e1fcca'::uuid, '069d5b3c-3fa4-4236-a054-13b73183ac49'::uuid, 'b202cfbd-c34b-4453-8d96-d00c314655c1'::uuid, 'dc1497c4-cf71-4b29-bcd4-dc1ba1836f3f'::uuid, '5f2b5604-f204-4130-a7a4-529f435a11cc'::uuid))

So now we have just 2 SQL queries! No matter how many users in our database. This is a lot better than N+1

Django will execute the second query and store the results on Artist queryset, so any future lookup with artist.artwork will happen in Python without hitting the database.

The downside of this is that all this data will now be stored in memory, so care is needed that this doesn’t grow too large.

Conversely, if instead of a 1-N relationship, we were looking at fetching a related model of which there was only one, for example

artworks = Artwork.objects.all()
for artwork in artworks:
    print(f'{artwork.title} is by {artwork.artist.name})

Then once again, we’d have an N+1 issue: 1 query to grab the artworks from the database, and then 1 query per each of the N artworks to fetch the parent Artist:

SELECT "artwork_artwork"."id",
       "artwork_artwork"."title",
FROM "artwork_artwork"

SELECT "artist"."id",
    
       "artist"."name",
FROM "artist"
WHERE "artist"."id" = '034544ed-a262-4c86-a061-47891daf2824'::uuid

.
.
.

To avoid this issue, Django provides us with select_related

artworks = Artwork.objects.select_related('artist').all()

which will do a SQL JOIN to reduce all those queries to just a single SQL query before caching it on the Python queryset:


 SELECT 
       "artwork_artwork"."id",
       "artwork_artwork"."title",
       "artist"."id",
       "artist"."_name",
  FROM "artwork_artwork"
 INNER JOIN "artist"
    ON ("artwork_artwork"."artist_id" = "artist"."id")

Use with DRF

Whenever you use a nested serializer in Django-rest-framework, you run the risk of this N+1 problem, so it’s worth ensuring you are not hitting the database hard.

One way to implement the above methods would be to override the get_queryset method of the viewset

class ArtistViewSet(viewsets.ModelViewSet):

    serializer_class = ArtistSerializer

    def get_queryset(self):
        # The base user queryset
        # Without the prefetch_related we'd hit a N+1 with a fetch of those things per user rather than batch/cache
        qs = Artist.objects.select_related('auth_token')\
            .prefetch_related('artworks')
        return qs

Monitoring performance with Django

Runserver plus and shell plus

If you use Runserverplus then you develop locally with the --print-sql switch, which is very useful when trying to check what database queries are being made by each view. Similarly with shell_plus --print-sql

You could also commit to logging all slow SQL queries with a logging filter

# settings.py

SLOW_SQL_THRESHOLD = 0.001

class SlowQueriesFilter(logging.Filter):
    """Filter slow queries"""

    def filter(self, record):
        duration = record.duration
        if duration > SLOW_SQL_THRESHOLD:
            return True
        return False


LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'slow_queries': {
            '()': SlowQueriesFilter,
        },
    },
.
.
.
    'loggers': {
        'django.db.backends': {
            'handlers': ['console'],
            'filters': ['slow_queries'],
            'level': 'DEBUG'
        }
    }
}

Another useful tool is Django-debug-toolbar, which will show you how many SQL queries and a breakdown of them for each view.

The downside is that this only works in a browser, but for GET API endpoints if you execute them in a browser you can still use this tool for performance insights.

Comments