Custom JOINs with Django’s query.join()

September 28th, 2009 by Colin Copeland

Django’s ORM is great. It handles simple to fairly complex queries right out of the box without having to write any SQL. If you need a more complicated query, Django lets you use .extra(), and you can always fall back to raw SQL if need be, but then you lose the ORM’s bells and whistles. So it’s always nice to find solutions that let you tap into the ORM at different levels.

Recently, we were looking to perform a LEFT OUTER JOIN through a many-to-many relationship. For lack of a better example, let’s use a Contact model (crm_contact), which has many Phones (crm_phone):

from django.db import models

class Contact(models.Model):
    name = models.CharField(max_length=255)
    phones = models.ManyToManyField('Phone')
    addresses = models.ManyToManyField('Address')
 
class Phone(models.Model):
    number = models.CharField(max_length=16)

If we want to display each contact and its phone numbers, looping through each contact in Contact.objects.all() and following the phones relationship will generate one extra query per contact, which adds up quickly with a large contact table. select_related() doesn’t help in this scenario either, because it only follows ForeignKey relationships. We can use extra() to add a select parameter, but tables=['crm_phone'] simply adds the table to the FROM clause and will not produce a LEFT OUTER JOIN. We need to construct the JOIN explicitly.

DISCLAIMER: The following method works, but should not be considered best practice; there may be a better way to accomplish the same task (please comment if so!). But since Google turned up sparse results for similar scenarios, I figured it’d at least be useful to post what we discovered.

After digging around in django.db.models.sql for a bit, we found BaseQuery.join in query.py. Among the possible arguments, the most important is connection, which is “a tuple (lhs, table, lhs_col, col) where ‘lhs’ is either an existing table alias or a table name. The join corresponds to the SQL equivalent of: lhs.lhs_col = table.col”. Further, the promote keyword argument will set the join type to be a LEFT OUTER JOIN.
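To make the tuple semantics concrete, here is a small standalone sketch that renders a connection tuple as the SQL clause it stands for. connection_to_join_sql is a hypothetical helper, not part of Django; it ignores table aliases and join()’s other arguments, and assumes the join is INNER unless promote is set.

```python
def connection_to_join_sql(connection, promote=False):
    """Render a (lhs, table, lhs_col, col) tuple as the JOIN it describes.

    Hypothetical helper for illustration only; it is not a Django API.
    """
    lhs, table, lhs_col, col = connection
    if lhs is None:
        # A tuple like (None, 'crm_contact', None, None) only names the
        # base table that goes into the FROM clause.
        return table
    join_type = 'LEFT OUTER JOIN' if promote else 'INNER JOIN'
    # lhs.lhs_col = table.col, exactly as the join() docstring puts it.
    return '%s %s ON (%s.%s = %s.%s)' % (
        join_type, table, lhs, lhs_col, table, col)


print(connection_to_join_sql(
    ('crm_contact', 'crm_contact_phones', 'id', 'contact_id'), promote=True))
# prints: LEFT OUTER JOIN crm_contact_phones ON (crm_contact.id = crm_contact_phones.contact_id)
```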

Now we can explicitly set up the JOINs through crm_contact -> crm_contact_phones -> crm_phone:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# set up the initial FROM clause
# (or call contacts.query.get_initial_alias())
contacts.query.join((None, 'crm_contact', None, None))
 
# join to crm_contact_phones
connection = (
    'crm_contact',
    'crm_contact_phones',
    'id',
    'contact_id',
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    'crm_contact_phones',
    'crm_phone',
    'phone_id',
    'id',
)
contacts.query.join(connection, promote=True)

It’s a little verbose, but it accomplishes our goal. I used hardcoded table names/columns in the connection tuple to make it easier to follow, but we can also extract this information from the objects themselves:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# set up the initial FROM clause
# (or call contacts.query.get_initial_alias())
contacts.query.join((None, Contact._meta.db_table, None, None))
 
# join to crm_contact_phones
connection = (
    Contact._meta.db_table, # crm_contact
    Contact.phones.field.m2m_db_table(), # crm_contact_phones
    Contact._meta.pk.column, # etc...
    Contact.phones.field.m2m_column_name(),
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    Contact.phones.field.m2m_db_table(),
    Phone._meta.db_table,
    Contact.phones.field.m2m_reverse_name(),
    Phone._meta.pk.column,
)
contacts.query.join(connection, promote=True)

This returns one row per contact/phone pair (a contact with no phones still gets a single row with a NULL phone, thanks to the LEFT OUTER JOINs), but we can print each contact and its phone numbers from a single SQL statement using {% ifchanged %} in the template:

<h1>Contacts</h1>
 
{% for contact in contacts %}
    {% ifchanged contact.name %}
        <h2>{{ contact.name }}</h2>
    {% endifchanged %}
    <p>Phone: {{ contact.phone }}</p>
{% endfor %}
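For reference, the two promoted joins amount to SQL along the lines of the query below. This sketch runs that hand-written query (not the exact SQL Django emits) against an in-memory SQLite database with made-up data, then groups the rows in Python with itertools.groupby, which applies the same rule as {% ifchanged %}: start a new group whenever the name changes. Bob, who has no phones, still shows up because of the LEFT OUTER JOINs.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE crm_contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE crm_phone (id INTEGER PRIMARY KEY, number TEXT);
    CREATE TABLE crm_contact_phones (
        id INTEGER PRIMARY KEY,
        contact_id INTEGER REFERENCES crm_contact (id),
        phone_id INTEGER REFERENCES crm_phone (id)
    );
    -- Made-up sample data: Alice has two phones, Bob has none.
    INSERT INTO crm_contact VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO crm_phone VALUES (1, '555-0100'), (2, '555-0101');
    INSERT INTO crm_contact_phones (contact_id, phone_id) VALUES (1, 1), (1, 2);
""")

# Roughly the query the two promoted joins describe (hand-written, not
# Django's exact output); number is added to ORDER BY for a stable order.
rows = conn.execute("""
    SELECT crm_contact.name, crm_phone.number AS phone
    FROM crm_contact
    LEFT OUTER JOIN crm_contact_phones
        ON (crm_contact.id = crm_contact_phones.contact_id)
    LEFT OUTER JOIN crm_phone
        ON (crm_contact_phones.phone_id = crm_phone.id)
    ORDER BY crm_contact.name, crm_phone.number
""").fetchall()

# groupby starts a new group whenever the name changes, the same test
# {% ifchanged contact.name %} makes; it relies on the rows being sorted.
phones_by_contact = {}
for name, group in groupby(rows, key=itemgetter(0)):
    phones_by_contact[name] = [phone for _, phone in group if phone is not None]

print(phones_by_contact)  # {'Alice': ['555-0100', '555-0101'], 'Bob': []}
```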

Web Developer for Hire

September 23rd, 2009 by Colin Copeland

We’re pleased to announce that Caktus is looking for a developer to join our team on a contract basis!

What do we do? We build custom web applications for local and remote clients using a variety of open-source technologies. We are a small team founded in the Chapel Hill/Carrboro area (currently residing in Carrboro Creative Coworking) who believe in face-to-face contact and employ agile development techniques that emphasize teamwork and collaboration.

We’re looking for a strong software developer who enjoys working on a team and is excited to learn and experiment with new technologies. We do have a preference for local candidates, but will consider all submissions. Initial work will focus on maintaining small Django-powered websites. This will involve HTML/CSS (including converting Photoshop designs), Django Templates, and writing Unit Tests. Later work will involve creating and integrating Django apps into larger projects, deployment, and database work.

You will be working in Linux (Debian-flavor) production environments with Apache and WSGI. Python/Django experience is not required, but will be used on a daily basis. Relational database experience is a must. HTML/CSS and JavaScript experience are also a must, and jQuery is a plus.

If you’re interested in this position, please send us your resume, some example code, links to any open-source projects you’ve contributed to, and expected compensation. We’re excited to bring on a new team member!

Open Source Django Projects from Caktus Consulting Group

September 7th, 2009 by tobias

At Caktus we’re big fans of reusing code. We leverage many open source projects (especially Django apps) to accomplish a variety of tasks. In addition, we’ve written quite a few pluggable apps over the past two years that we reuse over and over again across projects. As a way of giving back to the community, we’ve polished and released a portion of that code as open source ourselves. While some of the projects have been available on Google Code for a while now, we just put together a consolidated list of open source Django projects on our web site to serve as a jumping-off point for all the projects we like, have contributed to, or have created. Enjoy!

Caktus Consulting Group, LLC sponsors DjangoCon 2009

September 5th, 2009 by tobias

Django is a tool we use on a daily basis to build fantastic web apps here at Caktus, and DjangoCon is the annual conference for Django developers and other community members. We are proud to announce that Caktus Consulting Group, LLC is sponsoring DjangoCon 2009!

This year, the conference is being held the week of September 7th in the beautiful city of Portland, Oregon. Two Caktus partners, Colin and myself, will be attending. We hope to see you there!

Creating recursive, symmetrical many-to-many relationships in Django

August 14th, 2009 by tobias

In Django, a recursive many-to-many relationship is a ManyToManyField that points to the same model in which it’s defined ('self'). A symmetrical relationship is one in which, when a.contacts = [b], a is also in b.contacts.

In changeset 8136, support for through models was added to the Django core. This allows you to create a many-to-many relationship that goes through a model of your choice:

from django.db import models


class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
    )
 
 
class ContactRelationship(models.Model):
    types = models.ManyToManyField(
        'RelationshipType',
        related_name='contact_relationships',
        blank=True,
    )
    from_contact = models.ForeignKey('Contact', related_name='from_contacts')
    to_contact = models.ForeignKey('Contact', related_name='to_contacts')
 
    class Meta:
        unique_together = ('from_contact', 'to_contact')

According to the Django docs, you must set symmetrical=False for recursive many-to-many relationships that use a through model. Sometimes, though (as in a recent case in django-crm), what you really want is a symmetrical, recursive many-to-many relationship.

The trick to getting this working is understanding what symmetrical=True actually does. From what we can tell after a brief look through the Django core, symmetrical=True is simply a utility that (a) creates a second, reverse relationship in the many-to-many table, and (b) hides the field in the related model (in this case the same model) from use by appending a ‘+’ to its name.

Since you normally have to create many-to-many relationships manually when a through model is specified, the solution is simply to leave symmetrical=False (otherwise it’ll raise an exception) and create the reverse relationship manually yourself via the through model:

crm.ContactRelationship.objects.create(
    from_contact=contact_a,
    to_contact=contact_b,
)
crm.ContactRelationship.objects.create(
    from_contact=contact_b,
    to_contact=contact_a,
)

Additionally, you’ll have to do a little cleanup to make sure both sides of the relationship are removed when one is removed, but otherwise this should achieve the same effect as setting symmetrical=True in other many-to-many relationships.
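The pair-of-rows idea, including the cleanup, can be sketched outside Django with SQLite. The table and column names assume Django’s default naming for the ContactRelationship model (its types field is left out), and add_symmetric, remove_symmetric and related are hypothetical helpers, not part of django-crm. Running each pair of statements in one transaction keeps the two directions from drifting out of sync.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Assumed default Django table/column names for ContactRelationship.
conn.execute("""
    CREATE TABLE crm_contactrelationship (
        id INTEGER PRIMARY KEY,
        from_contact_id INTEGER NOT NULL,
        to_contact_id INTEGER NOT NULL,
        UNIQUE (from_contact_id, to_contact_id)
    )
""")

INSERT_SQL = ('INSERT INTO crm_contactrelationship '
              '(from_contact_id, to_contact_id) VALUES (?, ?)')


def add_symmetric(conn, a, b):
    """Store a -> b and its mirror b -> a; hypothetical helper."""
    with conn:  # commits on success, rolls back if either INSERT fails
        conn.execute(INSERT_SQL, (a, b))
        conn.execute(INSERT_SQL, (b, a))


def remove_symmetric(conn, a, b):
    """Delete both directions so neither side is left dangling."""
    with conn:
        conn.execute(
            'DELETE FROM crm_contactrelationship '
            'WHERE (from_contact_id = ? AND to_contact_id = ?) '
            '   OR (from_contact_id = ? AND to_contact_id = ?)',
            (a, b, b, a))


def related(conn, contact_id):
    """Contacts reachable from contact_id, like contact.contacts.all()."""
    rows = conn.execute(
        'SELECT to_contact_id FROM crm_contactrelationship '
        'WHERE from_contact_id = ? ORDER BY to_contact_id', (contact_id,))
    return [r[0] for r in rows]


add_symmetric(conn, 1, 2)
print(related(conn, 1), related(conn, 2))  # [2] [1]
remove_symmetric(conn, 1, 2)
print(related(conn, 1), related(conn, 2))  # [] []
```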

To hide the reverse side of the relationship (the related manager on the other end), append a ‘+’ to the related_name, like so:

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
        related_name='related_contacts+',
    )

Good luck and feel free to comment with any questions!

Setting PostgreSQL’s SHMMAX in Mac OS X 10.5 (Leopard)

August 13th, 2009 by Colin Copeland

If you’ve ever tried to increase the shared_buffers setting in your postgresql.conf to a value that exceeds the amount of shared memory supported by your operating system kernel, then you’ll see an error message like this:

copelco@montgomery:~$ /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
2009-07-10 10:14:04 EDTFATAL:  could not create shared memory segment: Invalid argument
2009-07-10 10:14:04 EDTDETAIL:  Failed system call was shmget(key=5432001, size=142516224, 03600).
2009-07-10 10:14:04 EDTHINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 142516224 bytes), reduce PostgreSQL's shared_buffers parameter (currently 16384) and/or its max_connections parameter (currently 23).
	If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
	The PostgreSQL documentation contains more information about shared memory configuration.

The shared_buffers default is low (for legacy reasons). If you increase it, PostgreSQL may request a shared memory segment that exceeds your kernel’s SHMMAX parameter. You can check the current values like so:

copelco@montgomery:~$ sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4194304
copelco@montgomery:~$ sysctl kern.sysv.shmall
kern.sysv.shmall: 1024

Section 17.4, “Managing Kernel Resources,” of the PostgreSQL documentation outlines methods to set the values permanently, but you can experiment with the values temporarily (until the next restart) on the command line like so:

copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmmax=1073741824
kern.sysv.shmmax: 4194304 -> 1073741824
copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmall=1073741824
kern.sysv.shmall: 1024 -> 1073741824
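A quick way to pick values: according to the PostgreSQL documentation for OS X, kern.sysv.shmall is counted in 4 kB pages rather than bytes (the defaults above agree, since 1024 pages * 4096 bytes = 4194304), and it is safest to keep shmmax a whole multiple of the page size. This small sketch, a rough guide that assumes a 4096-byte page and ignores shared memory used by other programs, turns the failed request size from the error message into matching values. By the same arithmetic, a 1 GB shmmax needs a shmall of only 262144, so the shmall used above is far larger than necessary.

```python
PAGE_SIZE = 4096  # assumption: kern.sysv.shmall counts 4 kB pages on OS X


def shm_settings(request_bytes):
    """Return (shmmax, shmall) big enough for one segment of request_bytes."""
    # Round up to whole pages so shmmax stays a multiple of the page size.
    pages = -(-request_bytes // PAGE_SIZE)  # ceiling division
    return pages * PAGE_SIZE, pages


# The failed request from the error message above.
print(shm_settings(142516224))  # (142516224, 34794)
```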

Once you have working values, you can start PostgreSQL (I’ve been happy with the kyngchaos distribution) through launchd, using a LaunchDaemon property list saved as /Library/LaunchDaemons/org.postgresql.postgres.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.postgresql.postgres</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/pgsql/bin/postmaster</string>
        <string>-D</string>
        <string>/usr/local/pgsql/data</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>UserName</key>
    <string>copelco</string>
</dict>
</plist>

Then load it with launchctl (unloading first if an older copy is already loaded):

copelco@montgomery:~$ sudo launchctl unload /Library/LaunchDaemons/org.postgresql.postgres.plist
copelco@montgomery:~$ sudo launchctl load /Library/LaunchDaemons/org.postgresql.postgres.plist

Towards a Standard for Django Session Messages

June 19th, 2009 by tobias

Django needs a standard way in which session-specific messages can be created and retrieved for display to the user. For years we’ve been surviving using user.message_set to store messages that are really specific to the current session, not the user, or using the latest and greatest Django snippet, pluggable app, or custom crafted middleware to handle messages in a more appropriate way.

While this has been discussed at length in Ticket #4604 as well as on Django Snippets, here are a few reasons that user.message_set is the wrong implementation:

  • No message_set exists for AnonymousUsers in Django, so you can’t display any messages to them.
  • What happens when the same user is logged in from two different browsers, completing two different tasks simultaneously? When using user.message_set to store feedback for the user, the messages will be distributed on a first-come, first-served basis, with no regard for which session actually generated which feedback. For this reason it’s bad to get in the habit of using user.message_set for messages like “Article updated successfully,” or other messages that have no context outside the current session.

I’ve outlined a few characteristics below that I believe would make up a solid session messaging contrib app. Please feel free to comment if I missed anything, or if you’ve got beef with any of my points. This is in many ways a work in progress, so I’ll update it as often as I can.

  • Standards. The implementation ought to make it clear how multiple messages are to be stored and retrieved for display to the user. Maybe you need to push multiple messages onto the stack from a single view, or your app performs multiple redirects through different views.
  • Persistence. In the case where your app redirects through multiple views, it’s not acceptable for session messages to disappear. The implementation needs to provide facilities for determining whether or not the messages were actually displayed, and delay purging the message list if necessary.
  • Flexibility. Support the case where a large number of independent, pluggable apps do messaging in the same project (sometimes for the same request), but don’t require it. Display all the messages created by all the apps, but don’t break (or lose messages) if one of the apps doesn’t happen to use the messaging implementation.
  • Efficiency. Avoid storing messages in the database (or another persistent store) if possible. Memcached can serve as a session backend, but it isn’t always available. One potential implementation would be to store shorter messages directly in a cookie, with a fallback to session-based storage for longer messages.
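The cookie-with-fallback idea from the last point could look roughly like this. It is a sketch under stated assumptions, not a finished design: store_messages is a hypothetical helper, plain dicts stand in for the response cookies and request.session, and the 4096-byte limit is an assumed conservative cookie size. A real implementation would also need to encode and sign the cookie value.

```python
import json

MAX_COOKIE_BYTES = 4096  # assumption: a conservative per-cookie size limit


def store_messages(messages, cookies, session, key='_messages'):
    """Put messages in a cookie if they fit, else fall back to the session.

    Hypothetical helper; `cookies` and `session` are plain dicts standing
    in for the response cookies and request.session.
    """
    payload = json.dumps(messages)
    if len(payload.encode('utf-8')) <= MAX_COOKIE_BYTES:
        cookies[key] = payload
        session.pop(key, None)  # don't leave a stale copy in the session
        return 'cookie'
    session[key] = messages
    cookies.pop(key, None)      # don't leave a stale copy in the cookie
    return 'session'


cookies, session = {}, {}
print(store_messages(['Article updated successfully.'], cookies, session))  # cookie
print(store_messages(['x' * 5000], cookies, session))  # session
```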

Here’s the implementation we use at Caktus. It’s far from complete, but it does address some of these points. The code is based on a number of snippets as well as attachments to the ticket referenced above. It could be improved by purging each message independently when it is actually retrieved, and by adding support for cookie-based storage. While I haven’t used it yet, django-notify looks a lot better than this, and I’m excited about trying it out.

from django.utils.encoding import StrAndUnicode
from django.contrib.sessions.backends.base import SessionBase
 
MESSAGES_NAME = '_messages'
 
SessionBase.get_messages = lambda self: self[MESSAGES_NAME]
 
def _session_get_and_delete_messages(self):
    messages = self.pop(MESSAGES_NAME, [])
    self[MESSAGES_NAME] = []
    return messages
SessionBase.get_and_delete_messages = \
  _session_get_and_delete_messages
 
def _session_create_message(self, message):
    self[MESSAGES_NAME].append(message)
    self.modified = True
SessionBase.create_message = _session_create_message
 
class SessionMessagesMiddleware(object):
    """
    To store messages or other user feedback in the session, add this
    class to your middleware.
 
    In your views, call request.session.create_message('the message') to
    add a message to the session.
 
    In your template(s), do this:
 
        {% if request.messages %}
            {% for message in request.messages %}<li>{{ message|escape }}</li>{% endfor %}
        {% endif %}
 
    Messages will NOT be erased from the session if you never access request.messages.
    """
 
    class LazyMessages(StrAndUnicode):
        """
        A lazy proxy for session messages.
        """
        def __init__(self, session):
            self.session = session
            super(SessionMessagesMiddleware.LazyMessages, self).__init__()
 
        def __iter__(self):
            return iter(self.messages)
 
        def __len__(self):
            return len(self.messages)
 
        def __nonzero__(self):
            return bool(self.messages)
 
        def __unicode__(self):
            return unicode(self.messages)
 
        def __getitem__(self, *args, **kwargs):
            return self.messages.__getitem__(*args, **kwargs)
 
        def _get_messages(self):
            if not hasattr(self, '_messages'):
                self._messages = self.session.get_and_delete_messages()
            return self._messages
        messages = property(_get_messages)
 
    def process_request(self, request):
        if not hasattr(request, 'session'):
            raise AttributeError('Request has no attribute "session".  Make sure session middleware is running before SessionMessages middleware.')
 
        if MESSAGES_NAME not in request.session:
            request.session[MESSAGES_NAME] = []
 
        request.messages = \
          SessionMessagesMiddleware.LazyMessages(request.session)
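The “Persistence” point hinges on the lazy proxy: messages are only purged when something actually reads request.messages. Here is a framework-free sketch of that behaviour in Python 3. FakeSession is a hypothetical dict subclass standing in for the Django session, and the proxy is trimmed to the methods the demo needs.

```python
MESSAGES_NAME = '_messages'


class FakeSession(dict):
    """A dict standing in for request.session (hypothetical)."""

    def create_message(self, message):
        self.setdefault(MESSAGES_NAME, []).append(message)

    def get_and_delete_messages(self):
        messages = self.pop(MESSAGES_NAME, [])
        self[MESSAGES_NAME] = []
        return messages


class LazyMessages(object):
    """Fetch (and purge) the session's messages only on first access."""

    def __init__(self, session):
        self.session = session

    @property
    def messages(self):
        if not hasattr(self, '_messages'):
            self._messages = self.session.get_and_delete_messages()
        return self._messages

    def __iter__(self):
        return iter(self.messages)

    def __len__(self):
        return len(self.messages)


session = FakeSession()
session.create_message('Article updated successfully.')

# A request that never touches its messages leaves them in the session...
LazyMessages(session)
print(session[MESSAGES_NAME])  # ['Article updated successfully.']

# ...while one that renders them purges the session copy.
lazy = LazyMessages(session)
print(list(lazy), session[MESSAGES_NAME])  # ['Article updated successfully.'] []
```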

Remote logging with Python logging and Django

June 9th, 2009 by tobias

As part of my work on EveryWatt, our fledgling energy monitoring web site, I needed a way to consolidate log messages from all the data loggers we have running in a single place. If you’re not familiar with it, Python’s logging module is good stuff and worth checking out. We already used it for logging to files locally, and the module defines an HTTPHandler that can deliver log messages to a remote server via HTTP.

To implement the Django side, I wrote a lightweight pluggable app to receive the log messages and store them in the database. To use the app, just create an HTTPHandler that points to your Django site, and add it to a logger:

import logging
import logging.handlers
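logger = logging.getLogger('mylogger')
# without this, the logger inherits the root logger's default WARNING
# level and silently drops the info() call below
logger.setLevel(logging.INFO)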
http_handler = logging.handlers.HTTPHandler(
    'django.app.hostname:port',
    '/remotelog/your_app_slug/log/',
    method='POST',
)
logger.addHandler(http_handler)
logger.info('testing remote logging')

On the Django side, navigate to /admin/remotelog/logmessage/ and you should have a nice interface (courtesy of the Django admin) to filter, search, and sort log messages as they come in. The app is called django-remotelog, and it’s up on Google Code. Check it out, and feel free to comment.
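If you’re curious what actually reaches the Django view, HTTPHandler url-encodes the attributes of each LogRecord into the request. The sketch below (Python 3 standard library) builds that body without sending anything, so you can see which fields a receiving view can read. The host and URL are the placeholders from the example above, and the record is constructed by hand for illustration.

```python
import logging
import logging.handlers
from urllib.parse import parse_qs, urlencode

# Constructing the handler does not open a connection; nothing is sent here.
handler = logging.handlers.HTTPHandler(
    'django.app.hostname:port',
    '/remotelog/your_app_slug/log/',
    method='POST',
)

record = logging.LogRecord(
    name='mylogger', level=logging.INFO, pathname='example.py', lineno=1,
    msg='testing remote logging', args=None, exc_info=None,
)

# HTTPHandler.emit() url-encodes the dict returned by mapLogRecord();
# by default that is simply the record's __dict__.
body = urlencode(handler.mapLogRecord(record))
fields = parse_qs(body)
print(fields['name'], fields['levelname'], fields['msg'])
# ['mylogger'] ['INFO'] ['testing remote logging']
```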

Testing Django Views for Concurrency Issues

May 26th, 2009 by tobias

At Caktus, we rely heavily on automated testing for web app development. We create tests for all the code we write, ideally before the code is written. We create tests for every bug we find and, resources permitting, ramp up the test suite with lots of random input and boundary testing.

Debugging concurrency issues or race conditions has long been a nightmare. There are only so many times you can double click the link in your web app that is generating some bizarre failure.

Using the Django test client, I created a little decorator that you can use in your unit tests to make sure a view doesn’t blow up when it’s called multiple times with the same arguments. If it does blow up, and you happen to be using PostgreSQL, chances are you can fix the issues by using Colin’s previously posted require_lock decorator.

Here’s the decorator for testing concurrency:

import threading


def test_concurrently(times):
    """
    Add this decorator to small pieces of code that you want to test
    concurrently to make sure they don't raise exceptions when run at the
    same time.  E.g., some Django views that do a SELECT and then a subsequent
    INSERT might fail when the INSERT assumes that the data has not changed
    since the SELECT.
    """
    def test_concurrently_decorator(test_func):
        def wrapper(*args, **kwargs):
            exceptions = []

            def call_test_func():
                try:
                    test_func(*args, **kwargs)
                except Exception as e:
                    # Record the failure so the main thread can report it,
                    # then re-raise so the thread prints its traceback.
                    exceptions.append(e)
                    raise

            threads = [threading.Thread(target=call_test_func)
                       for _ in range(times)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            if exceptions:
                raise Exception('test_concurrently intercepted %s exceptions: %s'
                                % (len(exceptions), exceptions))
        return wrapper
    return test_concurrently_decorator

To use this in a test, create a small function that includes the thread-safe code inside your test. Apply the decorator, passing the number of times you want to run the code simultaneously, and then call the function:

from django.core.urlresolvers import reverse
from django.test import TestCase
from django.test.client import Client

# test_concurrently is the decorator defined above.

class MyTestCase(TestCase):
    def testRegistrationThreaded(self):
        url = reverse('toggle_registration')
        @test_concurrently(15)
        def toggle_registration():
            # perform the code you want to test here; it must be thread-safe
            # (e.g., each thread must have its own Django test client)
            c = Client()
            c.login(username='user@example.com', password='abc123')
            response = c.get(url)
        toggle_registration()
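Races like this only show up when threads happen to interleave badly, so a test can pass by luck. One way to make the failure repeatable is a threading.Barrier that holds every thread between the SELECT-like check and the INSERT-like write. The sketch below is plain Python, not Django: a set stands in for a table with a unique column, and IntegrityError, register, register_locked and run_concurrently are hypothetical names. The locked variant mimics what a table lock buys you.

```python
import threading

THREADS = 5
rows = set()                  # stands in for a table with a UNIQUE column
rows_lock = threading.Lock()  # only makes the simulated INSERT atomic


class IntegrityError(Exception):
    """Hypothetical stand-in for the database's unique-constraint error."""


def insert(email):
    with rows_lock:
        if email in rows:
            raise IntegrityError('duplicate key: %s' % email)
        rows.add(email)


def register(email, barrier):
    if email not in rows:   # SELECT: "not registered yet"
        barrier.wait()      # force every thread past the check first
        insert(email)       # INSERT: assumes nothing changed since the SELECT


def register_locked(email, lock):
    with lock:              # like LOCK TABLE: serialize SELECT + INSERT
        if email not in rows:
            insert(email)


def run_concurrently(func, *args):
    """Run func(*args) in THREADS threads and return the exceptions raised."""
    errors = []

    def target():
        try:
            func(*args)
        except Exception as e:
            errors.append(e)

    threads = [threading.Thread(target=target) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


errors = run_concurrently(register, 'user@example.com', threading.Barrier(THREADS))
print(len(errors), len(rows))  # 4 1

rows.clear()
errors = run_concurrently(register_locked, 'user@example.com', threading.Lock())
print(len(errors), len(rows))  # 0 1
```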

Explicit Table Locking with PostgreSQL and Django

May 26th, 2009 by Colin Copeland

By default, Django doesn’t do explicit table locking. This is OK for most read-heavy scenarios, but sometimes you need guaranteed, exclusive access to the data. Caktus uses PostgreSQL in most of our production environments, so we can use the various lock modes it provides to control concurrent access to the data. Once we obtain a lock in PostgreSQL, it is held for the remainder of the current transaction. Django provides transaction management, so all we need to do is execute a SQL LOCK statement within a transaction, and Django and PostgreSQL will handle the rest.

Below is an example decorator we came up with to provide easy table-locking access in Django:

from functools import wraps

from django.db import connection, transaction

LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)

def require_lock(model, lock):
    """
    Decorator for PostgreSQL's table-level lock functionality

    Example:
        @transaction.commit_on_success
        @require_lock(MyModel, 'ACCESS EXCLUSIVE')
        def myview(request):
            ...

    PostgreSQL's LOCK Documentation:
    http://www.postgresql.org/docs/8.3/interactive/sql-lock.html
    """
    # Validate the mode once, when the decorator is applied.
    if lock not in LOCK_MODES:
        raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)

    def require_lock_decorator(view_func):
        @wraps(view_func)
        def wrapper(*args, **kwargs):
            cursor = connection.cursor()
            # The lock is held until the surrounding transaction ends.
            cursor.execute(
                'LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock)
            )
            return view_func(*args, **kwargs)
        return wrapper
    return require_lock_decorator

This is, by no means, a perfect solution. Feel free to comment below.
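One small improvement to consider is quoting the table name, so mixed-case or reserved-word table names still work. The statement itself is easy to check in isolation. In this standalone sketch, lock_table_sql and quote_identifier are hypothetical helpers, not Django API. It quotes the identifier the way PostgreSQL expects (double quotes, with embedded quotes doubled) and rejects unknown modes before anything touches the database. Inside Django, connection.ops.quote_name() does the quoting for the active backend.

```python
LOCK_MODES = (
    'ACCESS SHARE', 'ROW SHARE', 'ROW EXCLUSIVE', 'SHARE UPDATE EXCLUSIVE',
    'SHARE', 'SHARE ROW EXCLUSIVE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE',
)


def quote_identifier(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"%s"' % name.replace('"', '""')


def lock_table_sql(table, mode):
    """Build a LOCK TABLE statement, rejecting unknown lock modes early."""
    mode = mode.upper()
    if mode not in LOCK_MODES:
        raise ValueError('%s is not a PostgreSQL supported lock mode.' % mode)
    return 'LOCK TABLE %s IN %s MODE' % (quote_identifier(table), mode)


print(lock_table_sql('crm_contact', 'access exclusive'))
# LOCK TABLE "crm_contact" IN ACCESS EXCLUSIVE MODE
```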