Query Expressions are Amazing

The Django 1.8 release added support for complex query expressions. The documentation has some nice examples but they don't do justice to how crazy awesome these are. In this post, we will go through some additional examples of how to leverage these expressions.

Django has had one form of a query expression for several years now: the F expression. F() can be used to reference an existing column in a query. This is often used for atomic update operations, such as incrementing a counter. However, F can also be used to compare two columns on a model when filtering. For instance, we may be interested in users who haven't logged in since their first two weeks on the site. That requires comparing the value of the last_login column and the date_joined on the standard User model from contrib.auth:

from datetime import timedelta

from django.contrib.auth.models import User
from django.db.models import F
from django.utils.timezone import now

# Create some fake data: 10 active users and 20 inactive ones
today = now()
active_count = 10
inactive_count = 20
for i in range(1, active_count + inactive_count + 1):
    active = i <= active_count
    prefix = 'in' if active else ''
    domain = 'example.com' if i % 3 == 0 else 'caktusgroup.com'
    attributes = {
        'username': '{}active-{}'.format(prefix, i),
        'email': '{}active-{}@{}'.format(prefix, i, domain),
        'date_joined': today - timedelta(days=30),
        'last_login': today - timedelta(days=0 if active else 21),
    }
    User.objects.create(**attributes)
# Query inactive users
inactive = User.objects.filter(last_login__lte=F('date_joined') + timedelta(days=14))

The F expression supports basic arithmetic operations including some date math, as seen in the example above. However, it is still very limiting in comparison to what is available in SQL.

Relational databases such as Postgresql support a number of built-in functions which you can leverage in the ORM using the Func expression added in 1.8. For example, you may want to examine the email domains of your user base. For that, you might use the split_part function in Postgresql to extract the domain of the email address column. To normalize the domain values you can compose this with the built-in Lower expression:

# Continued from above
from django.contrib.auth.models import User
from django.db.models import F, Func, Value
from django.db.models.functions import Lower

qs = User.objects.annotate(domain=Lower(
    Func(F('email'), Value('@'), Value(2), function='split_part')))

This translates into the SQL call split_part("auth_user"."email", @, 2) and annotates every user with a new domain attribute which is the domain of their email address. The value 2 passed to split_part says to take the second value after splitting the string. Unlike Python this is a 1-based index rather than a 0-based index. With this we can find out what the most popular domains are for the users:

# Continued from above
from django.db.models import Count

popular = qs.values('domain').annotate(count=Count('id')).order_by('-count')
print(popular)
# Result
# [{'count': 20, 'domain': 'caktusgroup.com'},
# {'count': 10, 'domain': 'example.com'}]

As noted in the example, this returns a list of dictionaries of the form {'domain': <domain name>, 'count': #} ordered by the highest counts first. We can take this even further using the conditional expressions.

Two more new expressions Case and When can be used to build conditional aggregates. For instance, we may want to only count users who have logged in recently:

# Continued from above
from django.db.models import Case, When
from django.utils.timezone import now

active = When(
    last_login__isnull=False,
    last_login__gte=now() - timedelta(days=14),
    then=Value(1))

active defines the conditional expression when the last_login is not null and is a date later than 14 days ago. If there is a match then this row will add the value of 1 to the aggregate. This conditional expression can be passed into an aggregate expression such as Count, Sum, or Avg. To get the popular domains, we’ll count the number of active users for a given email domain.

# Continued from above
popular = qs.values('domain').annotate(
    count=Count('id'), active_count=Count(Case(active))).order_by('-active_count')
print(popular)
# Result
# [{'active_count': 7, 'count': 20, 'domain': 'caktusgroup.com'},
#  {'active_count': 3, 'count': 10, 'domain': 'example.com'}]

This adds a new key/value to the resulting dictionaries which include the number of active users for the domain. Here caktusgroup.com has the most active registered users but it also has the most registered users overall. For one last usage, we can look at the percent of users for each domain who are active again using the F expression:

# Continued from above
popular = popular.annotate(
    percent_active=Value(1.0) * F('active_count') / F('count') * Value(100)
).order_by('-percent_active')
print(popular)
# Result
# [{'active_count': 7, 'count': 20, 'domain': 'caktusgroup.com', 'percent_active': 35},
#  {'active_count': 3, 'count': 10, 'domain': 'example.com', 'percent_active': 30}]

Again this adds another data point to the returned list of dictionaries which is the percent of active users. Now we know which email domains are associated with the most users, the most recently logged in users, and the percent of users with that domain who have been recently active.

Query expressions like Func allow you to make more complex queries, leveraging more of your chosen database’s power without having to drop to raw SQL. Combined with the aggregation and conditional expressions you can roll up additional statistics about your data set using the expressive power of the ORM. I hope these examples give a good overview of some of the queries that are now easy to handle in the ORM and which previously required raw SQL.

New Call-to-action
blog comments powered by Disqus
Times
Check

Success!

Times

You're already subscribed

Times