Query Expressions are Amazing
The Django 1.8 release added support for complex query expressions. The documentation has some nice examples but they don't do justice to how crazy awesome these are. In this post, we will go through some additional examples of how to leverage these expressions.
Django has had one form of a query expression for several years now: the
F
expression.
F()
can be used to reference an existing column in a query. This is
often used for atomic update operations, such as incrementing a counter.
However, F
can also be used to compare two columns on a model when
filtering. For instance, we may be interested in users who haven't
logged in since their first two weeks on the site. That requires
comparing the value of the last_login
column and the date_joined
on
the standard User
model from contrib.auth
:
from datetime import timedelta
from django.contrib.auth.models import User
from django.db.models import F
from django.utils.timezone import now
## Create some fake data: 10 active users and 20 inactive ones
today = now()
active_count = 10
inactive_count = 20
for i in range(1, active_count + inactive_count + 1):
active = i <= active_count
prefix = 'in' if active else ''
domain = 'example.com' if i % 3 == 0 else 'caktusgroup.com'
attributes = {
'username': '{}active-{}'.format(prefix, i),
'email': '{}active-{}@{}'.format(prefix, i, domain),
'date_joined': today - timedelta(days=30),
'last_login': today - timedelta(days=0 if active else 21),
}
User.objects.create(**attributes)
## Query inactive users
inactive = User.objects.filter(last_login__lte=F('date_joined') + timedelta(days=14))
The F
expression supports basic arithmetic operations including some
date math, as seen in the example above. However, it is still very
limiting in comparison to what is available in SQL.
Relational databases such as Postgresql support a number of built-in
functions which you can leverage in the ORM using the
Func
expression added in 1.8. For example, you may want to examine the email
domains of your user base. For that, you might use the split_part
function in Postgresql to extract the domain of the email address
column. To normalize the domain values you can compose this with the
built-in Lower
expression:
## Continued from above
from django.contrib.auth.models import User
from django.db.models import F, Func, Value
from django.db.models.functions import Lower
qs = User.objects.annotate(domain=Lower(
Func(F('email'), Value('@'), Value(2), function='split_part')))
This translates into the SQL call
split_part("auth_user"."email", @, 2)
and annotates every user with a
new domain
attribute which is the domain of their email address. The
value 2 passed to split_part
says to take the second value after
splitting the string. Unlike Python this is a 1-based index rather than
a 0-based index. With this we can find out what the most popular domains
are for the users:
## Continued from above
from django.db.models import Count
popular = qs.values('domain').annotate(count=Count('id')).order_by('-count')
print(popular)
## Result
## [{'count': 20, 'domain': 'caktusgroup.com'},
## {'count': 10, 'domain': 'example.com'}]
As noted in the example, this returns a list of dictionaries of the form
{'domain': <domain name>, 'count': #}
ordered by the highest counts
first. We can take this even further using the conditional expressions.
Two more new expressions Case
and When
can be used to build
conditional
aggregates.
For instance, we may want to only count users who have logged in
recently:
## Continued from above
from django.db.models import Case, When
from django.utils.timezone import now
active = When(
last_login__isnull=False,
last_login__gte=now() - timedelta(days=14),
then=Value(1))
active
defines the conditional expression when the last_login
is not
null and is a date later than 14 days ago. If there is a match then this
row will add the value of 1 to the aggregate. This conditional
expression can be passed into an aggregate expression such as Count
,
Sum
, or Avg
. To get the popular domains, we’ll count the number of
active users for a given email domain.
## Continued from above
popular = qs.values('domain').annotate(
count=Count('id'), active_count=Count(Case(active))).order_by('-active_count')
print(popular)
## Result
## [{'active_count': 7, 'count': 20, 'domain': 'caktusgroup.com'},
## {'active_count': 3, 'count': 10, 'domain': 'example.com'}]
This adds a new key/value to the resulting dictionaries which include
the number of active users for the domain. Here caktusgroup.com has the
most active registered users but it also has the most registered users
overall. For one last usage, we can look at the percent of users for
each domain who are active again using the F
expression:
## Continued from above
popular = popular.annotate(
percent_active=Value(1.0) * F('active_count') / F('count') * Value(100)
).order_by('-percent_active')
print(popular)
## Result
## [{'active_count': 7, 'count': 20, 'domain': 'caktusgroup.com', 'percent_active': 35},
## {'active_count': 3, 'count': 10, 'domain': 'example.com', 'percent_active': 30}]
Again this adds another data point to the returned list of dictionaries which is the percent of active users. Now we know which email domains are associated with the most users, the most recently logged in users, and the percent of users with that domain who have been recently active.
Query expressions like Func
allow you to make more complex queries,
leveraging more of your chosen database’s power without having to drop
to raw SQL. Combined with the aggregation and conditional expressions
you can roll up additional statistics about your data set using the
expressive power of the ORM. I hope these examples give a good overview
of some of the queries that are now easy to handle in the ORM and which
previously required raw SQL.