Custom JOINs with Django’s query.join()

September 28th, 2009 by Colin Copeland

Django’s ORM is great. It handles simple to fairly complex queries right out the box without having to write any SQL. If you need a complicated query, Django’s lets you use .extra(), and you can always fallback to raw SQL if need be, but then you lose the ORM’s bells and whistles. So it’s always nice to find solutions that allow you to tap into the ORM at different levels.

Recently, we were looking to perform a LEFT OUTER JOIN through a Many to Many relationship. For a lack of a better example, let’s use a Contact model (crm_contact), which has many Phones (crm_phones):

class Contact(models.Model):
    name = models.CharField(max_length=255)
    phones = models.ManyToManyField('Phone')
    addresses = models.ManyToManyField('Address')
 
class Phone(models.Model):
    number = models.CharField(max_length=16)

If we want to display each contact and corresponding phone numbers, looping through each contact in Contact.objects.all() and following the phones relationship will generate quite a few database queries (especially with a large contact table). select_related() doesn’t work in this scenario either, because it only supports Foreign Key relationships. We can use extra() to add a select parameter, but tables=['crm_phones'] will not generate a LEFT OUTER join type. We need to explicitly construct the JOIN.

DISCLAIMER: The following method does work, but should not be considered best practice. That is, there may be a better way to accomplish the same task (please comment if so!). But after sparse Google results for similar scenarios, I figured it’d at least be useful to post what we discovered.

After digging around in django.db.models.sql for a bit, we found BaseQuery.join in query.py. Among the possible arguments, the most important is connection, which is “a tuple (lhs, table, lhs_col, col) where ‘lhs’ is either an existing table alias or a table name. The join corresponds to the SQL equivalent of: lhs.lhs_col = table.col”. Further, the promote keyword argument will set the join type to be a LEFT OUTER JOIN.

Now we can explicitly setup the JOINs through crm_contact -> crm_contact_phones -> crm_phone:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# setup intial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, 'crm_contact', None, None))
 
# join to crm_contact_phones
connection = (
    'crm_contact',
    'crm_contact_phones',
    'id',
    'contact_id',
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    'crm_contact_phones',
    'crm_phone',
    'phone_id',
    'id',
)
contacts.query.join(connection, promote=True)

It’s a little verbose, but it accomplishes our goal. I used hardcoded table names/columns in the connection tuple to make it easier to follow, but we can also extract this information from the objects themselves:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')
 
# setup intial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, Contact._meta.db_table, None, None))
 
# join to crm_contact_phones
connection = (
    Contact._meta.db_table, # crm_contact
    Contact.phones.field.m2m_db_table(), # crm_contact_phones
    Contact._meta.pk.column, # etc...
    Contact.phones.field.m2m_column_name(),
)
contacts.query.join(connection, promote=True)
 
# join to crm_phone
connection = (
    Contact.phones.field.m2m_db_table(),
    Phone._meta.db_table,
    Contact.phones.field.m2m_reverse_name(),
    Phone._meta.pk.column,
)
contacts.query.join(connection, promote=True)

This results in a row for each phone number (Cartesian product), but we can print out each contact and corresponding phone numbers (with a single SQL statement) quickly in a template using {% ifchanged %}:

<h1>Contacts</h1>
 
{% for contact in contacts %}
    {% ifchanged contact.name %}
        <h2>{{ contact.name }}</h2>
    {% endifchanged %}
    <p>Phone: {{ contact.phone }}</p>
{% endfor %}

Creating recursive, symmetrical many-to-many relationships in Django

August 14th, 2009 by tobias

In Django, a recursive many-to-many relationship is a ManyToManyField that points to the same model in which it’s defined (’self’). A symmetrical relationship is one in where, when a.contacts = [b], a is in b.contacts.

In changeset 8136, support for through models was added to the Django core. This allows you to create a many-to-many relationship that goes through a model of your choice:

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
    )
 
 
class ContactRelationship(models.Model):
    types = models.ManyToManyField(
        'RelationshipType',
        related_name='contact_relationships',
        blank=True,
    )
    from_contact = models.ForeignKey('Contact', related_name='from_contacts')
    to_contact = models.ForeignKey('Contact', related_name='to_contacts')
 
    class Meta:
        unique_together = ('from_contact', 'to_contact')

According to the Django Docs, you must set symmetrical=False for recursive many-to-many relationships. Sometimes–for a recent case in django-crm, for example–what you really want is a symmetrical, recursive many-to-many relationship.

The trick to getting this working is understanding what symmetrical=True actually does. From what we can tell after a brief look through the Django core, symmetrical=True is simply a utility that (a) creates a second, reverse relationship in the many-to-many table, and (b) hides the field in the related model (in this case the same model) from use by appending a ‘+’ to its name.

Since you normally have to create many-to-many relationships manually when a through model is specified, the solution is simply to leave symmetrical=False (otherwise it’ll raise an exception) and create the reverse relationship manually yourself via the through model:

crm.ContactRelationship.objects.create(
    from_contact=contact_a,
    to_contact=contact_b,
)
crm.ContactRelationship.objects.create(
    from_contact=contact_b,
    to_contact=contact_a,
)

Additionally, you’ll have to do a little cleanup to make sure both sides of the relationship are removed when one is removed, but otherwise this should achieve the same effect as setting symmetrical=True in other many-to-many relationships.

To hide the other side of the related manager, you can append a ‘+’ to the related_name, like so:

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
        related_name='related_contacts+',
    )

Good luck and feel free to comment with any questions!

Why Caktus Uses Django

January 13th, 2009 by tobias

Here at Caktus, we use the popular Django web framework for a lot of our custom web application development. We don’t use Django simply because it’s popular, easy to learn, or happened to be the first thing we found. We’ve written web apps in PHP, Java, and Ruby on Rails–all before we discovered Django–but were never quite satisfied. Following are just a few of the reasons that we both enjoy working with Django and believe it gives you (the client) the best end-product.

Django is Business-Friendly

Django is open source, free, and published under a “do anything you like” license, so it can be used to create all kinds of products, including proprietary business web apps. In addition to a flexible license, Django has a truly thriving user community and is being constantly improved by web developers like ourselves across the globe.

Built-in Admin Interface

Web application development often starts with the “data model.” A data model defines the ways in which all the different pieces of information–such as customer names and addresses or product descriptions–are organized and related to each other in the database. Finding the right data model takes time and it’s important to get it right, because a lot of development decisions will be based on the way your information is organized and accessible. When you’re building a web application from the ground up–something we do every day at Caktus–you want the flexibility to experiment with your data model and “see” what all the different options look like.

This is where Django’s built-in admin interface comes in. From the beginning, Django has included an automatically generated interface that lets you see and edit what’s in your database. It knows the structure of your data and puts together a set of search and listing pages and custom web forms for creating, modifying, and otherwise managing your data. It lets you evaluate your data model up front before making a big investment in other parts of your web app. For some sites, the admin interface even makes up a big part of the final product (e.g., for sites that primarily publish content, such as news organizations). And, we’ve found, the automatically generated admin interface is a powerful tool for showing potential clients what a web app can do.

I Trust Django With My Data

At Caktus we put a strong emphasis on “data integrity.” What is data integrity? Kevin already wrote a great post about what it is and why you should care about data integrity. In a nutshell, the “integrity” of data refers to its “completeness” or validity as a whole. For example, you probably want to limit the products that people can order on your web site to those that you actually stock in the warehouse.

Modern “relational database management systems” provide integrity “checks” for your data that verify its appropriateness–based on the conditions you supply–for a given table in the database. When you build a data model in Django, you specify the nature or “type” of each column in your database and can even specify “constraints” on the data that–if your database server supports it–will be enforced at the database level in addition to the application. While this is always a good thing, it’s even more important if other programs or users will be connecting to your database in addition to your web app. While Django does this out of the box, another popular web framework requires some under the hood “hacking” to achieve the same peace of mind about your data.

On a side note, in addition to preferring Django for web app development, Caktus also prefers PostgreSQL for data storage. Our friends over at Summersault have already written a good summary describing why PostgreSQL is often the best choice for web app development, so I won’t repeat the reasons here. We trust the Django + PostgreSQL combination so much that we even wrote our own CRM and bookkeeping package to keep track of our clients, projects, and all the related financial transactions.

Django is Written in Python

Python is a great language with no shortage of facilities and a huge (and growing) user base. A lot of Google’s infrastructure is written in Python, and it is the only language supported by the initial release of their App Engine service. According to python.org:

[Python] offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.

Based on Caktus’ experience writing Django web apps over the past 1.5+ years, this couldn’t be more true.

Separation of Application Components

Django uses a variation of the Model View Controller (MVC) architecture that ensures all the different pieces of your application end up in the right place and, for larger projects, let the people with different skills work on the things they do best, without getting in each other’s way. Moreover, Django implements its own very simple “template language” for generating web pages. While some may view its simplicity as a curse, it is actually a blessing in disguise: by allowing only very simple constructs in the template, Django forces you to keep your business logic in the controller (what Django calls a “view”) where it belongs.

At Caktus, we’re not just web developers. We’re web engineers with a passion for web apps that not only work, feel, and look great, but also have the capacity to grow, improve, and continue to perform long into the future without breaking the bank. That said, we’re truly thrilled about the Python/Django + PostgreSQL combination.

Asterisk CDR & Django integration with ODBC

October 13th, 2008 by lemanal

Tobias already mentioned how Caktus uses Asterisk as our PBX. He also mentioned how we tested various frontends both for managing the asterisk configuration and interacting with asterisk to, for example, check our voicemail. We were inticed by some of the client management solutions that we could plumb up with asterisk. Caktus has a loose administration structure, which allows us to be flexible and not have levels of managers between clients and coders. But, this flexiblility can leave loose ends unchecked when the person in charge of a project is distracted for a day or two. We saw this as an opportunity where Caktus could add a level of group accountability and use some neat technology. We also wanted a tool that would integrate well with our current homegrown DjangoERP/CRM as well as Trac, our prefered tool for project management. So, we decided to write some sweet code.

Asterisk provides Call Detail Records (CDR) information for billing calls. This is used for people reselling their asterisk setup on a per call basis which is not what we’re doing, but it automatically records a lot of useful information about calls including who called whom when and how long the call lasted. We decided to tie into this information for our interface. Asterisk provides a CDR ODBC interface. ODBC is a generic interface which sits between applications wanting to use a database and the database server itself. This was useful since a lot of the built in features of asterisk rely on MySQL databases which we don’t condone the use of. Instead, we used these instructions to setup our Asterisk ODBC interface using unixODBC in order to connect to our Postgres backend. This setup took a while to get all the pieces in place working together. It will take a lot of fiddling to get this working.

First, add an ODBC driver for your database. This is an example for using Postgres since that’s our preference.

1
2
3
4
5
6
7
/etc/odbc-pgsql.ini:
 
[PostgreSQL]
Description = PostgreSQL driver for Linux & Win32
Driver = /usr/lib/odbc/psqlodbca.so
Setup = /usr/lib/odbc/libodbcpsqlS.so
FileUsage = 1

Add an ODBC interface for the database of the Django project where you want the CDR data to show up. Fill in the blanks with the correct information for your configuration.

1
2
3
4
5
6
7
8
9
10
11
/etc/odbc.ini:
 
[django_odbc]
Description = PostgreSQL Asterisk
Driver = PostgreSQL
Servername = localhost
UserName = django_db_user
Password = django_db_password
Database = django_db_name
Port = 5432
Option = 3

Tell asterisk how to connect to the unixODBC server to log CDR data.

1
2
3
4
5
6
7
8
9
10
/etc/asterisk/cdr_odbc.conf:
 
[global]
dsn=django_odbc
username=django_db_name
password=django_db_password
loguniqueid=yes
dispositionstring=yes
table=cdr
usegmtime=no

Tell asterisk to use the CDR/ODBC configuration we just configured to store CDR data.

1
2
3
/etc/asterisk/modules.conf:
 
load => cdr_odbc.so

Now, the database must be setup as well. Use one of the schemas provided by the asterisk project for the Cdr table in the database. I used this data to create a model in our Django project and imported it into our models.py file using “python ./manage.py inspectdb”.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Cdr(models.Model):
  acctid = models.TextField(primary_key=True)
  calldate = models.DateTimeField()
  clid = models.CharField(max_length=80)
  src = models.CharField(max_length=80)
  dst = models.CharField(max_length=80)
  dcontext = models.CharField(max_length=80)
  channel = models.CharField(max_length=80)
  dstchannel = models.CharField(max_length=80)
  lastapp = models.CharField(max_length=80)
 
  lastdata = models.CharField(max_length=80)
  duration = models.IntegerField()
  billsec = models.IntegerField()
  disposition = models.CharField(max_length=45)
  amaflags = models.IntegerField()
  accountcode = models.CharField(max_length=20)
  uniqueid = models.CharField(max_length=32)
  userfield = models.CharField(max_length=255)
  class Meta:
    db_table = u'cdr'
 
 
  def __str__(self):
    return "%s -> %s" % ( self.src, self.dst )

Storing real-time call information

Since CDR data is only needed for per call billing information, the CDR information is not stored until all the data comes in, after a call is completed. We wanted to be able to creates notes on a call as it was happening, so we created a new table, Interactions. A row in this table will be populated as soon as a call is made or received and it will provide a place for our notes. In order to do this we used Asterisk’s ODBC Functions. These allow you to make any SQL call, inserting or selecting data, from within your dialplan.

Create a simple Interaction table in your django project.

1
2
3
4
5
6
7
django-project/app/models.py
 
class Interaction(models.Model):
  project = models.ForeignKey(Project, null=True)
  contacts = models.ManyToManyField(User, related_name='interactions')
  memo = models.TextField(null=True)
  cdr = models.ForeignKey('Cdr', null=True, to_field='uniqueid', editable=False)

Now enable a connection for ODBC functions using the unixODBC settings from before.

1
2
3
4
5
6
/etc/asterisk/res_odbc.conf:
 
[django_odbc]
enabled => yes
dsn => django_odbc
pre-connect => yes

Tell asterisk to use the func_odbc driver.

1
2
3
/etc/asterisk/modules.conf:
 
preload => func_odbc.so

Here is the actual SQL statement to be called from within the dialplan. We’ve added a bit of SQL to update the contact field as well based on who was called or who called us. It’s probably too tied to our CRM model to be useful for you, but it’s not that difficult to do, if you’ve made it this far. Here, you should replace app_interaction with the name of your Interaction table. It should not be important for security to escape this value since it’s internal to asterisk.

1
2
3
4
5
/etc/asterisk/func_odbc.conf:
 
[LOG_INTERACTION]
dsn=django_odbc
write=INSERT INTO app_interaction (cdr_id,project,contacts,memo) VALUES ('${VAL1}',NULL,NULL,NULL);

A call to this function should be made from your dialplan (extensions.conf). This will create an Interaction and link it to the Cdr row for the call once that is populated using the unique id that asterisk assigns each call.

Incoming

1
2
3
4
/etc/asterisk/extensions.conf:
 
exten => _XX.,1,Set(ODBC_LOG_INTERACTION()=${UNIQUEID})
exten => _XX.,2,.....

Outgoing

1
2
3
4
/etc/asterisk/extensions.conf:
 
exten => _1NXXNXXXXXX,1,Set(ODBC_LOG_INTERACTION()=${UNIQUEID})
exten => _1NXXNXXXXXX,2,....

Future plans.

Store our voicemail in a database. Also, we’ve had the CallerID information displayed on our softphones populated by our CRM information, but this is not currently working. It would definitely worth looking at again. It would help ease the context switch of stopping what we’re doing and picking up the phone. It also would let us better determine who the call is bound for and let that person pick up reducing the number of folks a client has to talk to and not make us transfer them around.

Database Explained for the Business Professional

February 22nd, 2008 by kevin

One problem with marketing is that it introduces pseudo-false concepts, arbitrarily divorces necessarily wed ones, and leaves out all the gory details. We recently had a client ask us,

Why would we do our data model/data base outside of MySQL and not have our software make calls upon it? I thought one used MySQL to manage the data and the database, and used software to call, update and display it …

The answer, of course, is “it depends.” However, given the context of the situation with our client, whose project absolutely mandated a database, this highlighted a couple of large misconceptions. The first and foremost is that a data model and database are the same thing. They are not. A data model is a description of how a system will handle data. A database is how a system enforces that description. The difference is subtle, but significant.

For example, if I put 500 USD into my bank account, I expect that when I want it later, the funds are still there. But what if I made a mistake and put a 1 instead of a 5 for the account number? Or what if the teller can’t read my handwriting and puts down a 5 instead of 6? What’s the guard to make sure that I won’t lose my money to some lucky schmo? In this hypothetical example, the bank database would see my name on the transaction and raise a flag that it didn’t match the number the teller typed. That’s the job of the database, to make sure that the constituent (me) is correctly tied to related items (500 USD, account number, name). If I had just put money in and there was no database, there would just be one record among millions that showed that I gave some random account a gift. Good luck proving that I did not mean to do that.

But let’s return to “it depends.” It depends on what one needs to do with the data. If one merely needs a log or list of data produced, one will not need the complexity of a Relational Database Management System (RDBMS). On the other hand, the minute one wants to do something fun, like see who accessed two different pages of a site on the 1st and 5th of each month, between the hours of 3p and 8p from certain departments of 4 different companies, one wants an RDBMS. As the name “Relational” implies, a well designed data model in the hands of an RDBMS makes extremely difficult and perhaps random questions easy, or at least possible.

The second misconception lay in his confusion about keeping his project outside of the database. In his mind at the time, MySQL was it. Similar to how most people don’t know that Microsoft Windows is a choice, he thought MySQL was the only database. When he heard us drop words like “SQLite,” “Postgres,” and “datastore” around, he was confused. There exist today a plethora of databases, each targeting a segment of the market and each rife with their own strengths and weaknesses. To name drop a few: DB2, Derby, Firebird, MySQL, Oracle, Postgres, Sqlite. There are others, too (hint: google for ‘database comparison wikipedia’). Each of these has a specific market segment they target. I won’t get into it here, but knowing the right one to pick for your needs is difficult at best.

Overall, I absolutely love databases. They are supremely excellent beasts. Properly designed, databases help your company enforce all the nitty-gritty details, and pull all kinds of interesting information from your data. However, like any tool, use it for the jobs at which it excels and not for the jobs at which fails. If all you need is a storage device, use a simple text file or spreadsheet. If you need long-term data integrity and powerful information scraping capabilities, absolutely use a database.