Custom JOINs with Django's query.join()

September 28 2009 by Colin Copeland

Django's ORM is great. It handles simple to fairly complex queries right out of the box without your having to write any SQL. If you need a complicated query, Django lets you use .extra(), and you can always fall back to raw SQL if need be, but then you lose the ORM's bells and whistles. So it's always nice to find solutions that let you tap into the ORM at different levels.

Recently, we were looking to perform a LEFT OUTER JOIN through a many-to-many relationship. For lack of a better example, let's use a Contact model (crm_contact), which has many Phones (crm_phone):

from django.db import models

class Contact(models.Model):
    name = models.CharField(max_length=255)
    phones = models.ManyToManyField('Phone')
    addresses = models.ManyToManyField('Address')  # Address model not shown

class Phone(models.Model):
    number = models.CharField(max_length=16)

If we want to display each contact and its phone numbers, looping through each contact in Contact.objects.all() and following the phones relationship will issue an extra query per contact (quite a few database queries with a large contact table). select_related() doesn't work in this scenario either, because it only follows ForeignKey relationships. We can use extra() to add a select parameter, but tables=['crm_phone'] simply adds the table to the FROM clause instead of generating a LEFT OUTER JOIN. We need to construct the JOIN explicitly.

DISCLAIMER: The following method works, but should not be considered best practice; there may be a better way to accomplish the same task (please comment if so!). But since Google turned up sparse results for similar scenarios, I figured it'd at least be useful to post what we discovered.

After digging around in django.db.models.sql for a bit, we found BaseQuery.join in query.py. Among the possible arguments, the most important is connection, which is "a tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing table alias or a table name. The join corresponds to the SQL equivalent of: lhs.lhs_col = table.col". Further, the promote keyword argument will set the join type to be a LEFT OUTER JOIN.
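To make the tuple's meaning concrete, here is a tiny, hypothetical helper (render_join is our own name, not part of Django) that renders a connection tuple as the SQL join clause it stands for. It only illustrates the lhs.lhs_col = table.col mapping quoted above; Django's real join() also tracks and reuses table aliases, which this sketch ignores.

```python
def render_join(connection, promote=False):
    """Render a (lhs, table, lhs_col, col) tuple as a SQL join clause.

    Hypothetical illustration only: Django's BaseQuery.join() manages
    aliases internally and does not return SQL like this.
    """
    lhs, table, lhs_col, col = connection
    if lhs is None:
        # No left-hand side: this is the base table of the FROM clause.
        return table
    join_type = 'LEFT OUTER JOIN' if promote else 'INNER JOIN'
    return '%s %s ON (%s.%s = %s.%s)' % (join_type, table, lhs, lhs_col, table, col)


print(render_join((None, 'crm_contact', None, None)))
# crm_contact
print(render_join(('crm_contact', 'crm_contact_phones', 'id', 'contact_id'), promote=True))
# LEFT OUTER JOIN crm_contact_phones ON (crm_contact.id = crm_contact_phones.contact_id)
```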

Now we can explicitly set up the JOINs through crm_contact -> crm_contact_phones -> crm_phone:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')

# set up the initial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, 'crm_contact', None, None))

# join to crm_contact_phones
connection = (
    'crm_contact',
    'crm_contact_phones',
    'id',
    'contact_id',
)
contacts.query.join(connection, promote=True)

# join to crm_phone
connection = (
    'crm_contact_phones',
    'crm_phone',
    'phone_id',
    'id',
)
contacts.query.join(connection, promote=True)

It's a little verbose, but it accomplishes our goal. I used hardcoded table names/columns in the connection tuple to make it easier to follow, but we can also extract this information from the objects themselves:

contacts = Contact.objects.extra(
    select={'phone': 'crm_phone.number'}
).order_by('name')

# set up the initial FROM clause
# OR contacts.query.get_initial_alias()
contacts.query.join((None, Contact._meta.db_table, None, None))

# join to crm_contact_phones
connection = (
    Contact._meta.db_table, # crm_contact
    Contact.phones.field.m2m_db_table(), # crm_contact_phones
    Contact._meta.pk.column, # etc...
    Contact.phones.field.m2m_column_name(),
)
contacts.query.join(connection, promote=True)

# join to crm_phone
connection = (
    Contact.phones.field.m2m_db_table(),
    Phone._meta.db_table,
    Contact.phones.field.m2m_reverse_name(),
    Phone._meta.pk.column,
)
contacts.query.join(connection, promote=True)

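This results in one row per contact/phone pair rather than one row per contact (contacts with no phones still get a single row, with an empty phone, thanks to the LEFT OUTER JOINs), but we can quickly print out each contact and its phone numbers (with a single SQL statement) in a template using {% ifchanged %}: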

<h1>Contacts</h1>

{% for contact in contacts %}
    {% ifchanged contact.name %}
        <h2>{{ contact.name }}</h2>
    {% endifchanged %}
    <p>Phone: {{ contact.phone }}</p>
{% endfor %}
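To see what the promoted joins buy us, here is a stand-alone sketch using Python's built-in sqlite3 module instead of Django. The schema mimics the tables Django would create for the models above, the sample data is made up, and the SQL is our approximation of the query the ORM builds, not its exact output. itertools.groupby plays the role of {% ifchanged %}, collapsing consecutive rows for the same contact.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE crm_contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE crm_phone (id INTEGER PRIMARY KEY, number TEXT);
    CREATE TABLE crm_contact_phones (
        id INTEGER PRIMARY KEY, contact_id INTEGER, phone_id INTEGER
    );
    INSERT INTO crm_contact VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO crm_phone VALUES (1, '555-0100'), (2, '555-0101');
    INSERT INTO crm_contact_phones VALUES (1, 1, 1), (2, 1, 2);
""")

# Both joins are LEFT OUTER, so Bob (no phones) still appears, with a NULL number.
rows = conn.execute("""
    SELECT crm_contact.id, crm_contact.name, crm_phone.number AS phone
    FROM crm_contact
    LEFT OUTER JOIN crm_contact_phones
        ON (crm_contact.id = crm_contact_phones.contact_id)
    LEFT OUTER JOIN crm_phone
        ON (crm_contact_phones.phone_id = crm_phone.id)
    ORDER BY crm_contact.name, crm_contact.id, crm_phone.number
""").fetchall()

# One row per contact/phone pair; group consecutive rows by contact,
# which is what {% ifchanged %} does in the template.
contacts = [
    (name, [phone for _, _, phone in group if phone is not None])
    for (contact_id, name), group in groupby(rows, key=lambda row: (row[0], row[1]))
]
print(contacts)  # [('Alice', ['555-0100', '555-0101']), ('Bob', [])]
```

Ordering by id after name keeps each contact's rows adjacent even when two contacts share a name; the template above keys {% ifchanged %} on contact.name, so it would merge such contacts.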

Web Developer for Hire

September 23 2009 by Colin Copeland

We're pleased to announce that Caktus is looking for a developer to join our team on a contract basis!

What do we do? We build custom web applications for local and remote clients using a variety of open-source technologies. We are a small team founded in the Chapel Hill/Carrboro area (currently residing in Carrboro Creative Coworking) who believe in face-to-face contact and employ agile development techniques that emphasize teamwork and collaboration.

We're looking for a strong software developer who enjoys working on a team and is excited to learn and experiment with new technologies. We do have a preference for local candidates, but will consider all submissions. Initial work will focus on maintaining small Django-powered websites. This will involve HTML/CSS (including converting Photoshop designs), Django Templates, and writing Unit Tests. Later work will involve creating and integrating Django apps into larger projects, deployment, and database work.

You will be working in Linux (Debian-flavor) production environments with Apache and WSGI. Python/Django experience is not required, but will be used on a daily basis. Relational database experience is a must. HTML/CSS and JavaScript experience are also a must, and jQuery is a plus.

If you're interested in this position, please send us your resume, some example code, links to any open-source projects you've contributed to, and expected compensation. We're excited to bring on a new team member!

Open Source Django Projects from Caktus Consulting Group

September 07 2009 by Tobias McNulty

At Caktus we're big fans of reusing code. We leverage many open source projects, especially Django apps, to accomplish a variety of tasks. In addition, we've written quite a few pluggable apps over the past two years that we reuse over and over again for different projects. As a way of giving back to the community, we've polished and released a portion of that code as open source ourselves. While some of the projects have been available on Google Code for a while now, we just put together a consolidated list of open source Django projects on our web site to serve as a jumping-off point for all the projects we like, have contributed to, or have created. Enjoy!

Caktus Consulting Group, LLC sponsors DjangoCon 2009

September 05 2009 by Tobias McNulty

Django is a tool we use on a daily basis to build fantastic web apps here at Caktus, and DjangoCon is the annual conference for Django developers and other community members. We are proud to announce that Caktus Consulting Group, LLC is sponsoring DjangoCon 2009!

This year, the conference is being held the week of September 7th in the beautiful city of Portland, Oregon. Two Caktus partners, Colin and myself, will be attending. We hope to see you there!

Creating recursive, symmetrical many-to-many relationships in Django

August 14 2009 by Tobias McNulty

In Django, a recursive many-to-many relationship is a ManyToManyField that points to the same model in which it's defined ('self'). A symmetrical relationship is one in which, when a.contacts = [b], a is in b.contacts.

In changeset 8136, support for through models was added to the Django core. This allows you to create a many-to-many relationship that goes through a model of your choice:

from django.db import models

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
    )


class ContactRelationship(models.Model):
    types = models.ManyToManyField(
        'RelationshipType',
        related_name='contact_relationships',
        blank=True,
    )
    from_contact = models.ForeignKey('Contact', related_name='from_contacts')
    to_contact = models.ForeignKey('Contact', related_name='to_contacts')

    class Meta:
        unique_together = ('from_contact', 'to_contact')

According to the Django docs, you must set symmetrical=False for recursive many-to-many relationships that use a through model. Sometimes (for a recent case in django-crm, for example) what you really want is a symmetrical, recursive many-to-many relationship.

The trick to getting this working is understanding what symmetrical=True actually does. From what we can tell after a brief look through the Django core, symmetrical=True is simply a utility that (a) creates a second, reverse relationship in the many-to-many table, and (b) hides the field in the related model (in this case the same model) from use by appending a '+' to its name.

Since you normally have to create many-to-many relationships manually when a through model is specified, the solution is simply to leave symmetrical=False (otherwise Django raises an exception) and create the reverse relationship yourself via the through model:

crm.ContactRelationship.objects.create(
    from_contact=contact_a,
    to_contact=contact_b,
)
crm.ContactRelationship.objects.create(
    from_contact=contact_b,
    to_contact=contact_a,
)

Additionally, you'll have to do a little cleanup to make sure both sides of the relationship are removed when one is removed, but otherwise this should achieve the same effect as setting symmetrical=True in other many-to-many relationships.
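The bookkeeping is easy to get wrong, so here is a minimal, framework-free sketch of the invariant involved. A set of (from, to) tuples stands in for ContactRelationship rows (set semantics also mirror unique_together), and the helper names add_symmetric, remove_symmetric, and related_to are hypothetical. In Django, the paired create() or delete() calls would ideally share a transaction so one side is never left behind.

```python
def add_symmetric(pairs, a, b):
    """Store both directed rows, like the two create() calls above."""
    pairs.add((a, b))
    pairs.add((b, a))


def remove_symmetric(pairs, a, b):
    """Remove both directions so the relationship is never left one-sided."""
    pairs.discard((a, b))
    pairs.discard((b, a))


def related_to(pairs, contact):
    """Contacts reachable from `contact`, i.e. what contact.contacts would hold."""
    return sorted(to for (frm, to) in pairs if frm == contact)


pairs = set()
add_symmetric(pairs, 'alice', 'bob')
add_symmetric(pairs, 'alice', 'carol')
print(related_to(pairs, 'bob'))    # ['alice']
remove_symmetric(pairs, 'bob', 'alice')
print(related_to(pairs, 'alice'))  # ['carol']
```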

To hide the other side of the related manager, you can append a '+' to the related_name, like so:

class Contact(models.Model):
    contacts = models.ManyToManyField(
        'self',
        through='ContactRelationship',
        symmetrical=False,
        related_name='related_contacts+',
    )

Good luck and feel free to comment with any questions!

Setting PostgreSQL's SHMMAX in Mac OS X 10.5 (Leopard)

August 13 2009 by Colin Copeland

If you've ever tried to increase the shared_buffers setting in your postgresql.conf to a value that exceeds the amount of shared memory supported by your operating system kernel, then you'll see an error message like this:

copelco@montgomery:~$ /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
2009-07-10 10:14:04 EDTFATAL:  could not create shared memory segment: Invalid argument
2009-07-10 10:14:04 EDTDETAIL:  Failed system call was shmget(key=5432001, size=142516224, 03600).
2009-07-10 10:14:04 EDTHINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 142516224 bytes), reduce PostgreSQL's shared_buffers parameter (currently 16384) and/or its max_connections parameter (currently 23).
	If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
	The PostgreSQL documentation contains more information about shared memory configuration.

The shared_buffers default value is low (for legacy reasons). If you increase it, PostgreSQL may request a shared memory segment that exceeds your kernel's SHMMAX parameter. You can see the current values like so:

copelco@montgomery:~$ sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4194304
copelco@montgomery:~$ sysctl kern.sysv.shmall
kern.sysv.shmall: 1024

Section 17.4, "Managing Kernel Resources," of the PostgreSQL documentation outlines methods to set these values permanently, but you can play around with the values temporarily (until the next restart) on the command line like so:

copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmmax=1073741824
kern.sysv.shmmax: 4194304 -> 1073741824
copelco@montgomery:~$ sudo sysctl -w kern.sysv.shmall=1073741824
kern.sysv.shmall: 1024 -> 1073741824
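One detail worth checking: on Mac OS X, kern.sysv.shmall is counted in 4 KB pages, not bytes (1024 pages times 4096 bytes is 4194304, which is why the two defaults above agree). The short sketch below, with helper names of our own, does the arithmetic. The 142516224-byte request from the error message needs shmall of at least 34794 pages, and a shmmax of 1073741824 bytes only needs shmall = 262144, so the much larger shmall used above is more than enough. It treats the request as the only shared memory segment, a simplification, since shmall caps the total across all segments.

```python
PAGE_SIZE = 4096  # bytes per page for kern.sysv.shmall on Mac OS X


def shmall_for(shmmax_bytes, page_size=PAGE_SIZE):
    """Smallest shmall (in pages) that covers shmmax_bytes."""
    return -(-shmmax_bytes // page_size)  # ceiling division


def request_fits(request_bytes, shmmax_bytes, shmall_pages, page_size=PAGE_SIZE):
    """Would a single segment of request_bytes pass both limits?

    Simplification: assumes no other System V shared memory is in use.
    """
    return request_bytes <= shmmax_bytes and request_bytes <= shmall_pages * page_size


request = 142516224  # from the shmget() error above
print(shmall_for(4194304))                        # 1024, the default shown above
print(shmall_for(request))                        # 34794
print(shmall_for(1073741824))                     # 262144
print(request_fits(request, 4194304, 1024))       # False
print(request_fits(request, 1073741824, 262144))  # True
```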

Once you have working values, you can fire up PostgreSQL (I've been happy with the kyngchaos distribution) with a LaunchDaemon file and launchd:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.postgresql.postgres</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/pgsql/bin/postmaster</string>
        <string>-D</string>
        <string>/usr/local/pgsql/data</string>
    </array>
    <key>RunAtLoad</key>
    <true></true>
    <key>UserName</key>
    <string>copelco</string>
</dict>
</plist>

And the launchd commands:

copelco@montgomery:~$ sudo launchctl unload /Library/LaunchDaemons/org.postgresql.postgres.plist
copelco@montgomery:~$ sudo launchctl load /Library/LaunchDaemons/org.postgresql.postgres.plist

Towards a Standard for Django Session Messages

June 19 2009 by Tobias McNulty

Django needs a standard way in which session-specific messages can be created and retrieved for display to the user. For years we've been surviving using user.message_set to store messages that are really specific to the current session, not the user, or using the latest and greatest Django snippet, pluggable app, or custom crafted middleware to handle messages in a more appropriate way.

While this has been discussed at length in Ticket #4604 as well as on Django Snippets, here are a few reasons that user.message_set is the wrong implementation:

  • No message_set exists for AnonymousUsers in Django, so you can't display any messages to them.
  • What happens when the same user is logged in from two different browsers and completing two different tasks simultaneously? When user.message_set stores feedback for the user, the messages are distributed on a first-come, first-served basis, with no regard for which session actually generated which feedback. For this reason it's bad to get in the habit of using user.message_set for messages like "Article updated successfully," or other messages that really have no context outside the current session.

I've outlined a few characteristics below that I believe would make up a solid session messaging contrib app. Please feel free to comment if I missed anything, or if you've got beef with any of my points. This is in many ways a work in progress, so I'll update it as often as I can.

  • Standards. The implementation ought to make it clear how multiple messages are to be stored and retrieved for display to the user. Maybe you need to push multiple messages onto the stack from a single view, or your app performs multiple redirects through different views.
  • Persistence. In the case where your app redirects through multiple views, it's not acceptable for session messages to disappear. The implementation needs to provide facilities for determining whether or not the messages were actually displayed, and delay purging the message list if necessary.
  • Flexibility. Support the case where a large number of independent, pluggable apps do messaging in the same project (sometimes for the same request), but don't require it. Display all the messages created by all the apps, but don't break (or lose messages) if one of the apps doesn't happen to use the messaging implementation.
  • Efficiency. Avoid storing messages in the database (or another persistent store) if possible. While memcache can serve as a session backend, that isn't always an option. One potential implementation would be to store shorter messages directly in a cookie, but provide a fallback to session-based storage for longer messages.
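The cookie-with-fallback idea from the Efficiency point could look roughly like the sketch below. Plain dicts stand in for the response cookies and the session, the 4096-byte limit is an assumed per-cookie budget, and store_messages/load_messages are hypothetical names, not an existing API.

```python
import json

COOKIE_LIMIT = 4096  # assumed per-cookie budget, in bytes
KEY = 'messages'


def store_messages(messages, cookies, session, limit=COOKIE_LIMIT):
    """Keep messages in a cookie when they fit, else fall back to the session.

    Returns where the messages ended up: 'cookie' or 'session'.
    """
    encoded = json.dumps(messages)
    if len(encoded.encode('utf-8')) <= limit:
        cookies[KEY] = encoded
        session.pop(KEY, None)  # don't leave a stale copy behind
        return 'cookie'
    session[KEY] = list(messages)
    cookies.pop(KEY, None)
    return 'session'


def load_messages(cookies, session):
    """Return pending messages and clear them from wherever they were stored."""
    if KEY in cookies:
        return json.loads(cookies.pop(KEY))
    return session.pop(KEY, [])


cookies, session = {}, {}
print(store_messages(['Article updated successfully.'], cookies, session))  # cookie
print(load_messages(cookies, session))  # ['Article updated successfully.']
```

A real implementation would also need to sign the cookie so users can't forge messages, and to quote the value for the Set-Cookie header.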

Here's the implementation we use at Caktus, which is far from complete but does address some of these points. This code is based on a number of snippets as well as attachments to the ticket referenced above. It could be improved by purging each message independently when it is actually retrieved and by adding facilities for cookie-based storage. While I haven't used it yet, django-notify looks a lot better than this, and I'm excited about trying it out.

from django.utils.encoding import StrAndUnicode
from django.contrib.sessions.backends.base import SessionBase

MESSAGES_NAME = '_messages'

SessionBase.get_messages = lambda self: self[MESSAGES_NAME]

def _session_get_and_delete_messages(self):
    messages = self.pop(MESSAGES_NAME, [])
    self[MESSAGES_NAME] = []
    return messages
SessionBase.get_and_delete_messages = \
  _session_get_and_delete_messages

def _session_create_message(self, message):
    self[MESSAGES_NAME].append(message)
    self.modified = True
SessionBase.create_message = _session_create_message

class SessionMessagesMiddleware(object):
    """
    To store messages or other user feedback in the session, add this
    class to your middleware.
    
    In your views, call request.session.create_message('the message') to
    add a message to the session.
    
    In your template(s), do this:
    
        {% if request.messages %}
            {% for message in request.messages %}<li>{{ message|escape }}</li>{% endfor %}
        {% endif %}
    
    Messages will NOT be erased from the session if you never access request.messages.
    """
    
    class LazyMessages(StrAndUnicode):
        """
        A lazy proxy for session messages.
        """
        def __init__(self, session):
            self.session = session
            super(SessionMessagesMiddleware.LazyMessages, self).__init__()
            
        def __iter__(self):
            return iter(self.messages)
    
        def __len__(self):
            return len(self.messages)
    
        def __nonzero__(self):
            return bool(self.messages)
    
        def __unicode__(self):
            return unicode(self.messages)
    
        def __getitem__(self, *args, **kwargs):
            return self.messages.__getitem__(*args, **kwargs)
    
        def _get_messages(self):
            if not hasattr(self, '_messages'):
                self._messages = self.session.get_and_delete_messages()
            return self._messages
        messages = property(_get_messages)
    
    def process_request(self, request):
        if not hasattr(request, 'session'):
            raise AttributeError('Request has no attribute "session".  Make sure session middleware is running before SessionMessages middleware.')
        
        if MESSAGES_NAME not in request.session:
            request.session[MESSAGES_NAME] = []
        
        request.messages = \
          SessionMessagesMiddleware.LazyMessages(request.session)
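The key property of the LazyMessages proxy is that messages are only purged from the session when a template actually reads them. Here is a stand-alone Python 3 sketch of that behavior, with a plain dict in place of Django's SessionBase so it runs without Django. It is a simplified re-expression (it pops the key rather than resetting it to an empty list), not a drop-in replacement for the middleware.

```python
MESSAGES_NAME = '_messages'


class LazyMessages(object):
    """Read (and purge) session messages only on first access."""

    def __init__(self, session):
        self.session = session
        self._messages = None  # not loaded yet

    @property
    def messages(self):
        if self._messages is None:
            # First access: take the messages out of the session.
            self._messages = self.session.pop(MESSAGES_NAME, [])
        return self._messages

    def __iter__(self):
        return iter(self.messages)

    def __len__(self):
        return len(self.messages)

    def __bool__(self):
        return bool(self.messages)


session = {MESSAGES_NAME: ['Article updated successfully.']}
lazy = LazyMessages(session)
print(MESSAGES_NAME in session)  # True: nothing has been read yet
print(list(lazy))                # ['Article updated successfully.']
print(MESSAGES_NAME in session)  # False: reading purged them
```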

Remote logging with Python logging and Django

June 09 2009 by Tobias McNulty

As part of my work on EveryWatt, our fledgling energy monitoring web site, I needed a way to consolidate log messages from all the data loggers we have running in a single place. If you're not familiar with it, Python's logging module is good stuff and worth checking out. We already used it for logging to files locally, and the module defines an HTTPHandler that can deliver log messages to a remote server via HTTP.

To implement the Django side, I wrote a lightweight pluggable app to receive the log messages and store them in the database. To use the app, just create an HTTPHandler that points to your Django site, and add it to a logger:

import logging
import logging.handlers
logger = logging.getLogger('mylogger')
http_handler = logging.handlers.HTTPHandler(
    'django.app.hostname:port',
    '/remotelog/your_app_slug/log/',
    method='POST',
)
logger.addHandler(http_handler)
logger.info('testing remote logging')

On the Django side, navigate to /admin/remotelog/logmessage/ and you should have a nice interface (courtesy of the Django admin) to filter, search, and sort log messages as they come in. The app is called django-remotelog, and it's up on Google Code. Check it out, and feel free to comment.
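If you want to write your own receiving view instead, it helps to know what arrives. In Python's standard library, HTTPHandler.emit() URL-encodes the dict returned by mapLogRecord(), which by default is the LogRecord's __dict__. The sketch below (Python 3 module names, placeholder host) builds that payload locally, without any network traffic, and decodes it the way a view would read the POST data.

```python
import logging
import logging.handlers
from urllib.parse import parse_qs, urlencode

handler = logging.handlers.HTTPHandler(
    'localhost:8000',                 # placeholder host; nothing is sent here
    '/remotelog/your_app_slug/log/',
    method='POST',
)

record = logging.makeLogRecord({
    'name': 'mylogger',
    'levelno': logging.INFO,
    'levelname': 'INFO',
    'msg': 'testing remote logging',
})

# This mirrors what HTTPHandler.emit() puts in the request body.
body = urlencode(handler.mapLogRecord(record))

# On the server, each field arrives as an ordinary form value.
received = {key: values[0] for key, values in parse_qs(body).items()}
print(received['name'], received['levelname'], received['msg'])
# mylogger INFO testing remote logging
```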

Testing Django Views for Concurrency Issues

May 26 2009 by Tobias McNulty

At Caktus, we rely heavily on automated testing for web app development. We create tests for all the code we write, ideally before the code is written. We create tests for every bug we find and, resources permitting, ramp up the test suite with lots of random input and boundary testing.

Debugging concurrency issues or race conditions has long been a nightmare. There are only so many times you can double click the link in your web app that is generating some bizarre failure.

Using the Django test client, I created a little decorator that you can use in your unit tests to make sure a view doesn't blow up when it's called multiple times with the same arguments. If it does blow up, and you happen to be using PostgreSQL, chances are you can fix the issues by using Colin's previously posted require_lock decorator.

Here's the decorator for testing concurrency:

def test_concurrently(times):
    """ 
    Add this decorator to small pieces of code that you want to test
    concurrently to make sure they don't raise exceptions when run at the
    same time.  E.g., some Django views that do a SELECT and then a subsequent
    INSERT might fail when the INSERT assumes that the data has not changed
    since the SELECT.
    """
    def test_concurrently_decorator(test_func):
        def wrapper(*args, **kwargs):
            exceptions = []
            import threading
            def call_test_func():
                try:
                    test_func(*args, **kwargs)
                except Exception, e:
                    exceptions.append(e)
                    raise
            threads = []
            for i in range(times):
                threads.append(threading.Thread(target=call_test_func))
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            if exceptions:
                raise Exception('test_concurrently intercepted %s exceptions: %s' % (len(exceptions), exceptions))
        return wrapper
    return test_concurrently_decorator

To use this in a test, create a small function that includes the thread-safe code inside your test. Apply the decorator, passing the number of times you want to run the code simultaneously, and then call the function:

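from django.core.urlresolvers import reverse
from django.test import TestCase
from django.test.client import Client

class MyTestCase(TestCase):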
    def testRegistrationThreaded(self):
        url = reverse('toggle_registration')
        @test_concurrently(15)
        def toggle_registration():
            # perform the code you want to test here; it must be thread-safe 
            # (e.g., each thread must have its own Django test client)
            c = Client()
            c.login(username='user@example.com', password='abc123')
            response = c.get(url)
        toggle_registration()
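To see the decorator catch something, here is a self-contained Python 3 sketch with no Django or database involved. A dict plays the table, and a threading.Barrier forces every thread to finish its check (the SELECT) before any of them inserts, so the check-then-insert race described in the docstring happens every time instead of occasionally. The decorator is renamed run_concurrently here only so test runners don't collect it as a test, and unlike the original it does not re-raise inside the worker thread.

```python
import threading


def run_concurrently(times):
    """Run the decorated function in `times` threads and report any exceptions."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            exceptions = []

            def call_func():
                try:
                    func(*args, **kwargs)
                except Exception as e:
                    exceptions.append(e)  # list.append is atomic in CPython

            threads = [threading.Thread(target=call_func) for _ in range(times)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            if exceptions:
                raise Exception('run_concurrently intercepted %s exceptions: %s'
                                % (len(exceptions), exceptions))
        return wrapper
    return decorator


def race_demo(times=5):
    """Return the decorator's error message for a deliberately racy insert."""
    table = {}  # stands in for a table with a unique column
    table_lock = threading.Lock()
    barrier = threading.Barrier(times)

    @run_concurrently(times)
    def register():
        exists = 'user@example.com' in table  # the SELECT
        barrier.wait()  # every thread finishes its SELECT before anyone INSERTs
        if not exists:
            with table_lock:  # the INSERT, enforcing uniqueness like the database
                if 'user@example.com' in table:
                    raise ValueError('duplicate key: user@example.com')
                table['user@example.com'] = True

    try:
        register()
    except Exception as e:
        return str(e)
    return 'no exceptions'


print(race_demo(5))  # starts with: run_concurrently intercepted 4 exceptions
```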

Explicit Table Locking with PostgreSQL and Django

May 26 2009 by Colin Copeland

By default, Django doesn't do explicit table locking. This is OK for most read-heavy scenarios, but sometimes you need guaranteed, exclusive access to the data. Caktus uses PostgreSQL in most of our production environments, so we can use the various lock modes it provides to control concurrent access to the data. Once we obtain a lock in PostgreSQL, it is held for the remainder of the current transaction. Django provides transaction management, so all we need to do is execute a SQL LOCK statement within a transaction, and Django and PostgreSQL will handle the rest.

Below is an example decorator we came up with to provide easy table-locking access in Django:

from django.db import transaction

LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)

def require_lock(model, lock):
    """
    Decorator for PostgreSQL's table-level lock functionality
    
    Example:
        @transaction.commit_on_success
        @require_lock(MyModel, 'ACCESS EXCLUSIVE')
        def myview(request):
            ...
    
    PostgreSQL's LOCK Documentation:
    http://www.postgresql.org/docs/8.3/interactive/sql-lock.html
    """
    def require_lock_decorator(view_func):
        def wrapper(*args, **kwargs):
            if lock not in LOCK_MODES:
                raise ValueError('%s is not a PostgreSQL supported lock mode.' % lock)
            from django.db import connection
            cursor = connection.cursor()
            cursor.execute(
                'LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock)
            )
            return view_func(*args, **kwargs)
        return wrapper
    return require_lock_decorator

This is, by no means, a perfect solution. Feel free to comment below.
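One possible refinement, sketched below without Django: validate the lock mode once, when the decorator is applied, instead of on every call, and quote the table name as a PostgreSQL identifier. The execute argument is a hypothetical stand-in for cursor.execute so the sketch runs on its own; with Django it might be something like lambda sql: connection.cursor().execute(sql), and the view would still need to run inside a transaction, since the lock is held until the transaction ends.

```python
import functools

LOCK_MODES = (
    'ACCESS SHARE',
    'ROW SHARE',
    'ROW EXCLUSIVE',
    'SHARE UPDATE EXCLUSIVE',
    'SHARE',
    'SHARE ROW EXCLUSIVE',
    'EXCLUSIVE',
    'ACCESS EXCLUSIVE',
)


def lock_table_sql(table, mode):
    """Build a LOCK TABLE statement, rejecting unknown lock modes."""
    mode = mode.upper()
    if mode not in LOCK_MODES:
        raise ValueError('%s is not a PostgreSQL supported lock mode.' % mode)
    quoted = '"%s"' % table.replace('"', '""')  # quote as an SQL identifier
    return 'LOCK TABLE %s IN %s MODE' % (quoted, mode)


def require_lock(table, mode, execute):
    """Like require_lock above, but fails at decoration time on a bad mode."""
    sql = lock_table_sql(table, mode)

    def decorator(view_func):
        @functools.wraps(view_func)
        def wrapper(*args, **kwargs):
            execute(sql)  # take the lock before the view touches the table
            return view_func(*args, **kwargs)
        return wrapper
    return decorator


executed = []

@require_lock('crm_contact', 'access exclusive', executed.append)
def myview(request):
    return 'ok'

print(myview(None))  # ok
print(executed)      # ['LOCK TABLE "crm_contact" IN ACCESS EXCLUSIVE MODE']
```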

Parsing Microseconds in a Django Form

May 26 2009 by Tobias McNulty

There's currently no way to accept microsecond-precision input through a Django form's DateTimeField. This is an acknowledged bug, but the official solution might not come very soon, because the real fix is non-trivial.

In the meantime, here's one approach that will work in most cases:

from django import forms

class DateTimeWithUsecsField(forms.DateTimeField):
    def clean(self, value):
        if value and '.' in value:
            value, usecs = value.rsplit('.', 1) # rsplit in case '.' is used elsewhere
            if not usecs.isdigit():
                raise forms.ValidationError('Microseconds must be an integer')
            # keep at most six digits (microsecond precision), right-padded with zeros
            usecs = int(usecs[:6].ljust(6, '0'))
        else:
            usecs = 0
        cleaned_value = super(DateTimeWithUsecsField, self).clean(value)
        if cleaned_value:
            cleaned_value = cleaned_value.replace(microsecond=usecs)
        return cleaned_value

To use this in a model form, you can override the field like so:

class MyForm(forms.ModelForm):
    def __init__(self, *args, **kwargs):
        super(MyForm, self).__init__(*args, **kwargs)
        self.fields['date'] = DateTimeWithUsecsField()
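The parsing step can also be tried outside Django. Below is a minimal sketch of the same rsplit-and-pad logic applied with datetime.strptime (the parse_with_usecs name and its default format are our own). For comparison, Python 2.6 and later also accept a %f directive in strptime, which parses one to six fractional digits the same way, so it may be a simpler alternative where the input format is fixed.

```python
from datetime import datetime


def parse_with_usecs(value, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse `value` with `fmt`, treating digits after the last '.' as fractional seconds."""
    usecs = 0
    if '.' in value:
        value, frac = value.rsplit('.', 1)  # rsplit in case '.' is used elsewhere
        if not frac.isdigit():
            raise ValueError('Microseconds must be an integer')
        usecs = int(frac[:6].ljust(6, '0'))  # keep six digits, right-pad with zeros
    return datetime.strptime(value, fmt).replace(microsecond=usecs)


print(parse_with_usecs('2009-05-26 10:15:30.25'))
# 2009-05-26 10:15:30.250000
print(datetime.strptime('2009-05-26 10:15:30.25', '%Y-%m-%d %H:%M:%S.%f'))
# 2009-05-26 10:15:30.250000
```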

Seamlessly switch off (and on) a Django (or other WSGI) site for upgrades

May 25 2009 by Tobias McNulty

In preparation for migrating the EveryWatt database from one machine to another, I wrote this little WSGI script to easily disable the site while I copy the data. Since it doesn't depend on Django or really anything else (other than a functioning WSGI server), you can use it for other upgrades, too.

This is useful for preventing updates to the database while you, for example, dump the database on one machine and load it on another. With everything else already in place on either side, the user should only see the "Upgrade in progress" message for a few minutes.

Since EveryWatt includes a number of data logger clients that upload utility meter readings to the site through its Open API, I wanted to make sure any POST attempts received a temporary failure message (the data logger will store the data and retry the POST every minute)--hence the 405 Method Not Allowed for all non-GET requests.

Here's the script:

import os
import sys

UPGRADING = False

#Calculate the project path based on the location of the WSGI script.
project_dir = os.path.dirname(__file__)
sys.path.append(project_dir)

def upgrade_in_progress(environ, start_response):
    upgrade_file = os.path.join(project_dir, 'media', 'html', 'upgrade.html')
    if os.path.exists(upgrade_file):
        response_headers = [('Content-type','text/html')]
        response = open(upgrade_file).read()
    else:
        response_headers = [('Content-type','text/plain')]
        response = 'Application upgrade in progress...please check back soon.'
    
    if environ['REQUEST_METHOD'] == 'GET':
        status = '503 Service Unavailable'
    else:
        status = '405 Method Not Allowed'
        response_headers.append(('Allow', 'GET')) # HTTP expects an Allow header on a 405
    start_response(status, response_headers)
    return [response]

if UPGRADING:
    application = upgrade_in_progress
else:
    os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
    import django.core.handlers.wsgi
    application = django.core.handlers.wsgi.WSGIHandler()

And in case you need it, here's one way to dump a PostgreSQL database on one machine while you load it on another, to be run on the new host, as the database superuser:

pg_dump -h  -U   | psql 

Good luck and please post your questions/comments.

Eclipse Ganymede and Subclipse on Ubuntu - JavaHL (JNI) not available

May 21 2009 by Tobias McNulty

I finally got around to updating my Eclipse, PyDev, and Subclipse environment today, which I use for Django development.

Formerly I was using the SvnKit (pure-Java) libraries. SvnKit "felt" slow to me, compared to my command line SVN client, so this time I tried to get the JavaHL (JNI) libraries working.

For the record I'm using Ubuntu (jaunty) with Eclipse 3.4 (Ganymede). This version of Ubuntu comes with Subversion 1.5, so I need to install Subclipse 1.4. See:

http://subclipse.tigris.org/servlets/ProjectProcess?pageID=p4wYuA

I installed everything through the Eclipse update manager (minus SvnKit), but JavaHL didn't show up under Preferences -> Team -> SVN. The error message was: JavaHL (JNI) not available.

I had installed Eclipse manually (not through apt-get), so the solution was to install the JavaHL libraries:

apt-get install libsvn-java

and add the following line to my eclipse.ini (usually in the top-level Eclipse directory), after the -vmargs line so that it is passed to the JVM:

-Djava.library.path=/usr/lib/jni

Restart Eclipse, and you should be good to go!

Downsizing an LVM/RAID root partition

April 23 2009 by Tobias McNulty

At Caktus we use LVM2 on a RAID1 device to ease disk management on a number of our servers. Recently I needed to downsize the root partition of one of the servers, so I rebooted onto an Ubuntu 8.10 LiveCD and attempted to load the RAID/LVM info.

The procedure was slightly more complicated than I would have imagined, so I'll document it here in case anyone else finds it useful.

  1. First, install mdadm and lvm2 on the LiveCD:
    apt-get update
    apt-get install lvm2 mdadm
    
  2. Next, scan for and load your RAID drives:
    mdadm -A -s
    cat /proc/mdstat
    
    You should see all your RAID drives listed.
  3. Then, load the device-mapper kernel module and volume group backup configuration:
    modprobe dm-mod
    vgcfgrestore vg00
    vgscan
    vgdisplay
    
    Hopefully at this point you will see your volume groups and logical volumes! If the volume group backup is not available, you may have to recover it using dd to read the first few blocks on the underlying device. Google is your friend.
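  4. Now, I'll run fsck and downsize the partition:
    e2fsck -f /dev/vg00/root
    resize2fs /dev/vg00/root 10G
    
    Note that resize2fs only shrinks the file system inside the logical volume; to return the freed space to the volume group, the logical volume itself still has to be reduced (for example, with lvreduce), and never to less than the new file system size.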

Migrating from django-photologue 1.x to 2.x

March 27 2009 by Tobias McNulty

We're in the process of updating a web app for a client that was built last year about this time using Django and Photologue. Needless to say, there have been a lot of changes to both over the past year!

We were somewhat dismayed to find no easy upgrade path for photologue, and there are a number of model changes that mean you can't just run svn up and be done with it. Using the JSON output from ./manage.py dumpdata, we created a little Python script that handles the database migrations for three of the photologue models (Gallery, Photo, and PhotoSize). Save this in a script called migrate-photologue.py:

#!/usr/bin/python

import sys
import simplejson

if len(sys.argv) != 2:
    print 'Usage: %s ' % sys.argv[0]
    sys.exit(1)

REMOVE_COLUMNS = {
    'photologue.photo': (
        'photographer',
        'info',
    ),
}

RENAME_COLUMNS = {
    'photologue.photo': {
        'pub_date': 'date_added',
        'slug': 'title_slug',
    },
    'photologue.gallery': {
        'pub_date': 'date_added',
        'slug': 'title_slug',
    },
}

ADD_COLUMNS = {
    'photologue.photo': {
        'view_count': 0,
    },
    'photologue.photosize': {
        'upscale': False,
        'increment_count': 0,
    },
}

data = simplejson.load(open(sys.argv[1]))

for obj in data:
    fields = obj['fields']
    model = obj['model']
    for col in REMOVE_COLUMNS.get(model, []):
        if col in fields:
            fields.pop(col)
    for old_name, new_name in RENAME_COLUMNS.get(model, {}).iteritems():
        if old_name in fields:
            fields[new_name] = fields[old_name]
            fields.pop(old_name)
    for col, default_value in ADD_COLUMNS.get(model, {}).iteritems():
        fields[col] = default_value

print simplejson.dumps(data, indent=4)

The script is fairly simple, but back up your database first, just in case. If you need support for additional models, just add the changes you need to the dicts at the top of the file.

During the upgrade, it might help to have two copies of the database running on the local machine, so you can switch back and forth between them at will. A typical migration might look like this:

./manage.py dumpdata photologue > photologue.json
./migrate-photologue.py photologue.json > photologue2.json
./manage.py sqlclear photologue | ./manage.py dbshell
svn up photologue # or however you do it
./manage.py syncdb
./manage.py loaddata photologue2.json
./manage.py sqlsequencereset photologue | ./manage.py dbshell # just in case

Of course, things will get more complicated if you have other models with foreign keys to any of the photologue models. You'll have to drop the constraints temporarily and then add them again after you finish the migration, or take the plunge and write the SQL to do the migration while keeping your database relationships intact.

Solving NFS issues on embedded machines

March 25 2009 by Tobias McNulty

As part of my work on EveryWatt, I set up an NFS-based development environment for one of the data loggers we use for energy monitoring in the Caktus office. The stock 2.4 Linux kernel on the machine seemed to have trouble mounting the root file system I had exported from one of our servers. The symptoms included long delays for most, if not all, activities that used the file system, and lots of messages like these in dmesg:

nsm_mon_unmon: rpc failed, status=-110
lockd: cannot monitor 192.168.x.x
lockd: failed to monitor 192.168.x.x

The issue turned out to be that statd was not running on the embedded client machine. The solution is simple: mount the file system with the -o nolock option, which gets around the statd requirement, like this:

mount -o nolock 192.168.x.x:/path/to/nfsroot /mnt/root 

Now the system is zippy as ever, and I have an embedded root file system that includes Python 2.4 and sqlite, and fits in just under 3.5 MB (compressed)!

Overriding Django admin templates for fun and profit

January 20 2009 by Alex Lemann

Motivation & Goal

I sometimes find the admin interface's lists of model instances overwhelming. The filters and search help, but it would be nice to get an overview of the data being shown. In particular, I wanted to generate a graph based on the filters selected by the user, so that only the items remaining after filtering would be graphed.

For example, if you have a Post model in your Blog application with a filter by author, this code might graph the number of Posts per day of the week to get a sense of when to release your next Post. But if you select an author from the filter, you might want to graph only that author's Posts on each day of the week.
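To make the example concrete, here is a minimal sketch of the counting behind such a graph, written in plain Python rather than through the ORM. The `posts_per_weekday` function and its `(author, date)` input format are hypothetical illustrations, not part of Django or of the code in this post; passing an author plays the role of selecting that author in the admin filter.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']


def posts_per_weekday(posts, author=None):
    """Count posts per day of the week, optionally for a single author.

    `posts` is an iterable of hypothetical (author, date) pairs;
    `author=None` means no filter is selected, so every post counts.
    """
    counts = Counter(
        day.weekday() for who, day in posts
        if author is None or who == author
    )
    # date.weekday() returns 0 for Monday through 6 for Sunday
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}
```

In the admin, the filtered queryset would supply the posts; the resulting seven counts are what a bar chart of "Posts per weekday" would plot.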

Django has amazing documentation. There is an example in the docs describing how to override the change_form.html template, but none for change_list.html, the template that renders the list of a particular model's objects in the admin interface. In particular, there is no example that uses the selected filters to change the content of that list.

Process

Following the documentation above, I overrode the templates/admin/my_app/my_model/change_list.html file. This means that my template changes will only show up on that particular model's change_list page, which is what we want: the graphs should appear above the list of objects for this one model only. Examining the original change_list.html file shows that the header for this list of objects corresponds to the pretitle block. The fact that it is so simple to override a single block in one template for one model in the admin interface speaks to how modular Django is. What do we want to put in that block? The filter data is buried deep within the context passed to this page by a view. We certainly don't want to mess with the admin's views. That leaves template tags. Now, template tags are usually a last resort for me because of their unwieldy argument-passing syntax. Caktus has its own internal way of handling this, which is very similar to a Django snippet posted recently. So, the template should look like:

{% extends "admin/change_list.html" %}
{% load graphs_from_filters %}

{% block pretitle %}
{% graphs_from_filter change_list=cl %}
{% endblock %}

Notice that we are passing the ChangeList variable, cl, from the context into our template tag.

Digging into a ChangeList

So, we've been passed a django.contrib.admin.views.main.ChangeList object. What the heck is that? Well, ChangeList objects hold django.contrib.admin.filterspecs.FilterSpec objects, which give us the name of the model, the class, and the id of the particular object. We use that to build a dictionary mapping each filter's title to the object being filtered by. It looks something like this:

from django import template
from caktus.django.templatetags import parse_args_kwargs, CaktNode
import project.graphs as graphs

class GraphFilterNode(CaktNode):
    def render_with_args(self, context, change_list):
        # Map each active filter's title to the object being filtered by
        selected = {}
        for filter_spec in change_list.filter_specs:
            if filter_spec.lookup_val:
                related_model = filter_spec.field.rel.to
                selected[filter_spec.title()] = related_model.objects.get(
                    id=filter_spec.lookup_val)
        # Turn the selected filters into a graph URL and embed it as an image
        return '<img src="%s" />' % graphs.graph_url_from_filters(selected)

register = template.Library()

@register.tag
def graphs_from_filter(parser, token):
    """
    Usage: {% graphs_from_filter change_list=<change_list_object> %}
    """
    # parse_args_kwargs splits the tag contents into positional and keyword args
    tag_name, args, kwargs = parse_args_kwargs(parser, token)
    return GraphFilterNode(*args, **kwargs)

Disclaimer

I know very little about Django internals. This was mostly worked out through ipython, introspection, and reading some code. There is a little bit of validation of this method in Django ticket #3096, but as usual, internal Django structures might change and break your code. This happened to me when that ticket was resolved. I think what I have now is a better solution, but is it the best one?

Why Caktus Uses Django

January 13 2009 by Tobias McNulty

Here at Caktus, we use the popular Django web framework for a lot of our custom web application development. We don't use Django simply because it's popular, easy to learn, or happened to be the first thing we found. We've written web apps in PHP, Java, and Ruby on Rails--all before we discovered Django--but were never quite satisfied. Following are just a few of the reasons that we both enjoy working with Django and believe it gives you (the client) the best end-product.

Django is Business-Friendly

Django is open source, free, and published under a "do anything you like" license, so it can be used to create all kinds of products, including proprietary business web apps. In addition to a flexible license, Django has a truly thriving user community and is being constantly improved by web developers like ourselves across the globe.

Built-in Admin Interface

Web application development often starts with the "data model." A data model defines the ways in which all the different pieces of information--such as customer names and addresses or product descriptions--are organized and related to each other in the database. Finding the right data model takes time and it's important to get it right, because a lot of development decisions will be based on the way your information is organized and accessible. When you're building a web application from the ground up--something we do every day at Caktus--you want the flexibility to experiment with your data model and "see" what all the different options look like.

This is where Django's built-in admin interface comes in. From the beginning, Django has included an automatically generated interface that lets you see and edit what's in your database. It knows the structure of your data and puts together a set of search and listing pages and custom web forms for creating, modifying, and otherwise managing your data. It lets you evaluate your data model up front before making a big investment in other parts of your web app. For some sites, the admin interface even makes up a big part of the final product (e.g., for sites that primarily publish content, such as news organizations). And, we've found, the automatically generated admin interface is a powerful tool for showing potential clients what a web app can do.

I Trust Django With My Data

At Caktus we put a strong emphasis on "data integrity." What is data integrity? Kevin already wrote a great post about what it is and why you should care. In a nutshell, the "integrity" of data refers to its "completeness" or validity as a whole. For example, you probably want to limit the products that people can order on your web site to those that you actually stock in the warehouse. Modern "relational database management systems" provide integrity "checks" that verify the appropriateness of the data in a given table, based on the conditions you supply. When you build a data model in Django, you specify the nature or "type" of each column in your database and can even specify "constraints" on the data that, if your database server supports them, will be enforced at the database level in addition to the application. While this is always a good thing, it's even more important if other programs or users will be connecting to your database in addition to your web app. While Django does this out of the box, another popular web framework requires some under-the-hood "hacking" to achieve the same peace of mind about your data.

On a side note, in addition to preferring Django for web app development, Caktus also prefers PostgreSQL for data storage. Our friends over at Summersault have already written a good summary describing why PostgreSQL is often the best choice for web app development, so I won't repeat the reasons here. We trust the Django + PostgreSQL combination so much that we even wrote our own CRM and bookkeeping package to keep track of our clients, projects, and all the related financial transactions.
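As a minimal sketch of what database-level enforcement means, the snippet below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The tables and the `try_insert` helper are hypothetical and hand-written, not the SQL Django would generate; the point is that the database itself rejects bad rows, no matter which program sends them.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# SQLite only enforces foreign keys once this pragma is switched on
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript('''
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        in_stock INTEGER NOT NULL CHECK (in_stock >= 0)
    );
    CREATE TABLE order_line (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product (id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
''')
conn.execute("INSERT INTO product (id, name, in_stock) VALUES (1, 'Widget', 10)")


def try_insert(sql, params):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        # Raised by the database for NOT NULL, CHECK and foreign key violations
        return False
```

Here `try_insert('INSERT INTO order_line (product_id, quantity) VALUES (?, ?)', (1, 2))` succeeds, while the same statement with a nonexistent product `(99, 1)` or a zero quantity `(1, 0)` is rejected by the database and returns `False`. PostgreSQL enforces foreign keys by default, without a pragma.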

Django is Written in Python

Python is a great language with no shortage of facilities and a huge (and growing) user base. A lot of Google's infrastructure is written in Python, and it is the only language supported by the initial release of their App Engine service. According to python.org:

[Python] offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.

Based on Caktus' experience writing Django web apps over the past 1.5+ years, this couldn't be more true.

Separation of Application Components

Django uses a variation of the Model View Controller (MVC) architecture that ensures all the different pieces of your application end up in the right place and, for larger projects, lets people with different skills work on the things they do best without getting in each other's way. Moreover, Django implements its own very simple "template language" for generating web pages. While some may view its simplicity as a curse, it is actually a blessing in disguise: by allowing only very simple constructs in the template, Django forces you to keep your business logic in the controller (what Django calls a "view") where it belongs. At Caktus, we're not just web developers. We're web engineers with a passion for web apps that not only work, feel, and look great, but also have the capacity to grow, improve, and continue to perform long into the future without breaking the bank. That's why we're truly thrilled about the Python/Django + PostgreSQL combination.

minibooks: Small Business Bookkeeping

January 07 2009 by Colin Copeland

Caktus released minibooks (open-sourced under the AGPL) as a bookkeeping package for small tech agencies. Boasting a double-entry accounting system, customer relationship management (CRM) and transaction reconciliation, minibooks provides a clean, multiuser web-based interface to manage simple accounting needs for small businesses.

minibooks was originally developed out of our frustration with single-user, desktop-oriented accounting packages like QuickBooks. We wanted a team-accessible system, so everyone on our VPN could access the CRM and manage bookkeeping tasks remotely. So Caktus developed a lightweight web app using Django and PostgreSQL to handle our basic needs. We use it every day at Caktus, so we're continually improving it and adding features (most recently recurring transactions and flags to monitor delivered and undelivered exchanges) to make things easier.

I'd like to spend some time highlighting a few of the many great features found in minibooks:

ExchangeType Model

Invoices, Receipts, Orders, Purchases, etc., are represented by a single, generalized Exchange model in minibooks. At a very low level, they all do the same thing (record an exchange between two entities), but vary slightly in their characteristics. So, they all reside in the same database table and are distinguished by their type. Exchanges have a foreign key relationship to the ExchangeType model.

ExchangeTypes provide a powerful interface to streamline repetitive tasks and customize interface elements based on sets of characteristics. For example, if most purchases are made through your credit card, you can create a Purchase ExchangeType to credit your Credit Card Account by default while still allowing you to debit each item on the receipt to a separate account (e.g., Food Expenses, Office Supplies, etc.). Invoices can also use this system with accounts receivable (accrual accounting). Simply set up an Invoice ExchangeType with Accounts Receivable as the common account, and separate income credit accounts (e.g., Consulting, Hosting, etc.) will be available for each item on the invoice.
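The ExchangeType idea maps neatly onto double-entry bookkeeping. The following is a minimal sketch under a simplified model that is not minibooks' actual code: the `ExchangeType` dataclass, its `common_is_credit` flag, and `post_exchange` are hypothetical. Each item posts to its own account, and the common account takes the total on the opposite side, so debits and credits always balance.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ExchangeType:
    """Hypothetical stand-in for minibooks' ExchangeType model."""
    name: str
    common_account: str
    common_is_credit: bool  # True: credit the common account, debit each item


def post_exchange(exchange_type, items):
    """Turn (account, amount) items into balanced double-entry lines.

    Returns (account, debit, credit) tuples: each item posts to its own
    account on one side and the common account receives the total on
    the other, so total debits always equal total credits.
    """
    zero = Decimal('0')
    total = sum((amount for _, amount in items), zero)
    lines = []
    for account, amount in items:
        if exchange_type.common_is_credit:
            lines.append((account, amount, zero))  # debit the item account
        else:
            lines.append((account, zero, amount))  # credit the item account
    if exchange_type.common_is_credit:
        lines.append((exchange_type.common_account, zero, total))
    else:
        lines.append((exchange_type.common_account, total, zero))
    return lines
```

Under these assumptions a Purchase type would be `ExchangeType('Purchase', 'Credit Card', True)`, crediting the card and debiting each line item, while an Invoice type would be `ExchangeType('Invoice', 'Accounts Receivable', False)`, debiting receivables and crediting each income account.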

Quick Search

minibooks' CRM stores all of the contacts, businesses and projects associated with Caktus. This way everything is consolidated in one spot, from a client phone number to a project invoice. Being able to jump around quickly is a necessity, so we created an AJAX auto-complete search bar that's displayed at the top of every page. Need to enter a receipt from the client meeting at the coffee shop or quickly find a phone number? Just type the first few letters and minibooks will search project, business and customer names and emails and return a list of matches instantly. Then just arrow down and hit return, or click the one you're looking for. Better yet, the Quick Search field is accessible through the "f" access key, so hit Control+Alt+F on a Mac or Shift+Alt+F on Linux (and Windows?) and you'll never have to leave the keyboard!

Transaction Reconciliation

Balancing your checkbook can be a slow and tedious process, so we tried to make it easier for your business accounts. When a monthly credit card statement arrives, jump over to the Accounts tab and go to the credit card ledger. Here you'll find all credits (purchases) and debits (payments) ordered by date alongside a running total. Once you've confirmed a transaction's amount against the credit card statement, check the checkbox next to it and minibooks will update the current reconciled balance using AJAX! Now you can make sure your bookkeeping records always match your bank and credit card statements with ease.
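The running total and the reconciled balance are both simple sums. A minimal sketch follows, assuming transactions are plain dicts with hypothetical `date`, `credit`, `debit`, and `reconciled` keys (minibooks keeps them in its own models). For a credit card, which is a liability account, credits increase the balance owed and debits reduce it; only checked-off transactions count toward the reconciled balance.

```python
from decimal import Decimal


def ledger_view(transactions):
    """Return (rows, reconciled_balance) for a credit card ledger.

    Each row is (date, change, running_total), ordered by date.
    Credits (purchases) raise the balance owed; debits (payments) lower it.
    """
    rows = []
    running = Decimal('0')
    reconciled = Decimal('0')
    for txn in sorted(transactions, key=lambda t: t['date']):
        change = txn['credit'] - txn['debit']
        running += change
        if txn['reconciled']:
            # Only transactions checked off against the statement count here
            reconciled += change
        rows.append((txn['date'], change, running))
    return rows, reconciled
```

Once every transaction on a statement has been checked off, the reconciled balance should agree with the statement, and any difference points to a missing or mistyped entry.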

Recurring Exchanges

Caktus provides web and email hosting services for our clients, so we wanted an easy way to automatically generate invoices for recurring services. The exchange creation form has the option to repeat items at a set interval (days, weeks, months or years). Just set up cron to access /ledger/cron/ every night and all of your recurring exchanges will be generated automatically. You'll also receive an itemized email of the generated invoices and other exchanges.
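Generating recurring exchanges comes down to working out which dates are due. Here is a minimal sketch, assuming an interval is stored as a count plus one of 'days', 'weeks', 'months', or 'years'; `add_interval` and `due_occurrences` are hypothetical helpers, not minibooks' implementation. A nightly job such as the /ledger/cron/ view could then create an exchange for each due date that has not been generated yet.

```python
import calendar
from datetime import date, timedelta


def add_interval(start, count, unit):
    """Shift `start` by `count` days, weeks, months or years.

    Month and year steps clamp to the last day of the target month,
    so Jan 31 + 1 month gives Feb 28 (or Feb 29 in a leap year).
    """
    if unit == 'days':
        return start + timedelta(days=count)
    if unit == 'weeks':
        return start + timedelta(weeks=count)
    if unit == 'years':
        count, unit = count * 12, 'months'
    if unit == 'months':
        month_index = start.month - 1 + count
        year = start.year + month_index // 12
        month = month_index % 12 + 1
        day = min(start.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)
    raise ValueError('unknown interval unit: %r' % unit)


def due_occurrences(first, count, unit, today):
    """List every occurrence from `first` up to and including `today`."""
    if count <= 0:
        raise ValueError('interval count must be positive')
    dates = []
    n = 0
    current = first
    while current <= today:
        dates.append(current)
        n += 1
        # Step from `first` each time so month-end clamping never drifts
        current = add_interval(first, n * count, unit)
    return dates
```

Stepping from the first date each time, rather than from the previous occurrence, keeps a monthly series that starts on the 31st from getting stuck on the 28th after February.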

LaTeX Exchange and Report Generation

minibooks uses LaTeX to generate PDFs for exchanges (Invoices, Receipts, Member Distributions, etc.) and project reports. LaTeX files fit nicely into the Django template system, so you can use Django variables, tags and filters right in your .tex document. By default, minibooks includes the template we use here at Caktus, but the possibilities are endless, so feel free to create your own and style your invoices any way you like! Further, the CRM automatically attaches the PDFs when sending exchanges to clients through minibooks.
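One thing to watch when dropping Django variables into a .tex template is that characters such as `&`, `%`, `$`, `#`, and `_` have special meaning in LaTeX, so a client name like "R&D" can break the PDF build. Below is a minimal sketch of an escaping helper; `latex_escape` is a hypothetical name, not something minibooks is known to ship, and in a Django project it could be registered as a template filter with `@register.filter`.

```python
import re

# Characters with special meaning in LaTeX and their escaped forms
LATEX_SPECIAL = {
    '&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#', '_': r'\_',
    '{': r'\{', '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
    '\\': r'\textbackslash{}',
}
_LATEX_RE = re.compile(r'[\\&%$#_{}~^]')


def latex_escape(value):
    """Escape text so it can be inserted into a .tex template.

    A single regex pass means the backslashes and braces produced by one
    replacement are never escaped a second time.
    """
    return _LATEX_RE.sub(lambda m: LATEX_SPECIAL[m.group()], str(value))
```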

Learn More

These are just some of the features found in minibooks. The source code and installation instructions can be found on the minibooks project page. An online demo (login:demo@minibooksdemo.com, pass:demo) is available for testing as well. minibooks is far from feature complete, so feel free to hack away at it and add features as you see fit!

Entering Contacts in CiviCRM

November 05 2008 by Tobias McNulty

One of our non-profit clients recently asked for help entering the numerous business cards they get at trade shows, etc., into their Customer Relationship Management database, a copy of CiviCRM that we set up and manage for them. The best path for entering contacts isn't necessarily obvious from the get-go, but the following procedure is the best we've found and has the lowest up-front investment (you might be able to do something more efficient with a Profile, but that doesn't seem as flexible with respect to matching existing contacts).

Before we get started, if you're using Drupal, make sure you've added the CiviCRM Shortcuts block to an accessible location on your site, such as the navigation bar on the left. To do this, go into Administer → Site building → Blocks and define a region for the "CiviCRM Shortcuts" block. I'm not familiar with Joomla, but it probably supports something comparable.

  1. First, log in and click New Individual under CiviCRM Shortcuts. If you're using CiviCRM 2.1, you can also get to the New Individual screen by hitting Alt-Shift-I (on a Mac, use Control in place of Alt).
  2. Enter the individual's First Name, Last Name, and Current Employer. If you're using CiviCRM 2.0, make sure you type the employer name exactly as it appears on the card (especially if that organization might already exist in the database). CiviCRM 2.0 will match organizations in the Current Employer field, but they must be typed exactly as they show up in the database. CiviCRM 2.1 has an auto-complete field for the Current Employer. If you type in the first few letters of an organization and it already exists in the database, it'll fill it out for you and link up the contacts automatically.
  3. Click Check for Matching Contact(s). If the contact already exists and you need to make changes, click the contact's name to edit.
  4. If the contact doesn't already exist, enter any additional contact information you want to associate with the individual, such as his or her e-mail address and direct telephone number and/or cell phone number and click Save.
  5. Now click on the Relationships tab in the individual's contact record. You should be able to see from that page whether the organization already has an address and/or telephone number associated with it.
  6. If the organization doesn't have any information other than a name, chances are it was just created for you with the individual (using the contents of the Current Employer field). To add contact information for the organization, click on the organization's name and then the Edit button on the following screen.
  7. Enter any additional information from the business card, such as the organization's mailing address or main telephone number, and click Save.

Good luck, and by all means let us know if you find a more efficient method of entering a large number of contacts!
