Custom JOINs with Django’s query.join()

September 28th, 2009 by Colin Copeland

Django’s ORM is great. It handles simple to fairly complex queries right out the box without having to write any SQL. If you need a complicated query, Django’s lets you use .extra(), and you can always fallback to raw SQL if need be, but then you lose the ORM’s bells and whistles. So it’s always nice to find solutions that allow you to tap into the ORM at different levels.

Recently, we were looking to perform a LEFT OUTER JOIN through a Many to Many relationship. For a lack of a better example, let’s use a Contact model (crm_contact), which has many Phones (crm_phones):

class Contact(models.Model):
    name = models.CharField(max_length=255)
    phones = models.ManyToManyField('Phone')
    addresses = models.ManyToManyField('Address')
 
class Phone(models.Model):
    number = models.CharField(max_length=16)

If we want to display each contact and corresponding phone numbers, looping through each contact in Contact.objects.all() and following the phones relationship will generate quite a few database queries (especially with a large contact table). select_related() doesn’t work in this scenario either, because it only supports Foreign Key relationships. We can use extra() to add a select parameter, but tables=['crm_phones'] will not generate a LEFT OUTER join type. We need to explicitly construct the JOIN.

DISCLAIMER: The following method does work, but should not be considered best practice. That is, there may be a better way to accomplish the same task (please comment if so!). But after sparse Google results for similar scenarios, I figured it’d at least be useful to post what we discovered.

After digging around in django.db.models.sql for a bit, we found BaseQuery.join in query.py. Among the possible arguments, the most important is connection, which is “a tuple (lhs, table, lhs_col, col) where ‘lhs’ is either an existing table alias or a table name. The join corresponds to the SQL equivalent of: lhs.lhs_col = table.col”. Further, the promote keyword argument will set the join type to be a LEFT OUTER JOIN.

Now we can explicitly setup the JOINs through crm_contact -> crm_contact_phones -> crm_phone:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# setup intial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, 'crm_contact', None, None))
 
# join to crm_contact_phones
connection = (
    'crm_contact',
    'crm_contact_phones',
    'id',
    'contact_id',
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    'crm_contact_phones',
    'crm_phone',
    'phone_id',
    'id',
)
contacts.query.join(connection, promote=True)

It’s a little verbose, but it accomplishes our goal. I used hardcoded table names/columns in the connection tuple to make it easier to follow, but we can also extract this information from the objects themselves:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# setup intial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, Contact._meta.db_table, None, None))
 
# join to crm_contact_phones
connection = (
    Contact._meta.db_table, # crm_contact
    Contact.phones.field.m2m_db_table(), # crm_contact_phones
    Contact._meta.pk.column, # etc...
    Contact.phones.field.m2m_column_name(),
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    Contact.phones.field.m2m_db_table(),
    Phone._meta.db_table,
    Contact.phones.field.m2m_reverse_name(),
    Phone._meta.pk.column,
)
contacts.query.join(connection, promote=True)

This results in a row for each phone number (Cartesian product), but we can print out each contact and corresponding phone numbers (with a single SQL statement) quickly in a template using {% ifchanged %}:

<h1>Contacts</h1>
 
{% for contact in contacts %}
    {% ifchanged contact.name %}
        <h2>{{ contact.name }}</h2>
    {% endifchanged %}
    <p>Phone: {{ contact.phone }}</p>
{% endfor %}

7 Responses to “Custom JOINs with Django’s query.join()”

  1. Claude Says:

    Well, very interesting! This is a common performance problem in Django apps.
    I has been confronted to such a situation in one of my apps, where I choose to join the results of two separate queries in Python code. That was not nice at all, but at least prevented many hundreds of database queries…

  2. Jay Says:

    Currently I use custom SQL to do complex joins. This post gives me a new choice. But what if I want to do RIGHT OUTER JOIN or FULL OUTER JOIN? Is there a method in Django’s ORM?

  3. tobias Says:

    You’ll have to fall back to raw SQL for this one. Check out ticket #11863 for upcoming changes that might help with this.

  4. Yishai Says:

    What worked for me and seemed nicer in terms of the code is use a “through” model instead of the regular M2M relation. You still get all the benefits of the M2M relation, but you can now query directly on the through model to get your cartesian product.

  5. copelco Says:

    Good point, a “through” model will work nicely in this scenario, though joining through multiple M2M relationships might not work. Another issue we have run across in the past involves using qs.extra() to call a SQL function on a related table that Django is unaware of. select_related() will not work and filter() (which will construct the JOINs) doesn’t support SQL functions. You really want to use the ORM to construct the join and just tap in at the end with qs.extra() to add some custom SQL.

  6. Vinilios Says:

    Very interesting article, its amazing that you posted this while i was trying to solve same issues in a i18n application i’m currently developing, and this is really useful joining translation objects with their translations within a single query.

    What would be super cool would be a way to keep ‘join compatibility’ within the queryset so something like the following will work within the custom joins (after a quick test it seems that queryset defines an additional join for which the filter conditions are set):

    contacts.query.join(connection, promote=True).filter(phones__phone__startswith=…..)

  7. theY4Kman Says:

    Thank you so much! This article has helped me a lot!

    I needed to use multiple LEFT JOINs on the same table, and I was pulling my hair out until I looked at query.join. It returns an alias that points to that particular join.

    alias1 = contacts.query.join(…)
    alias2 = contacts.query.join(…)
    contacts = contacts.extra(where=[alias1+'.id=1', alias2+'.id=3'])

Leave a Reply